smooth.spline {modreg}    R Documentation

Fit a Smoothing Spline

Description

Fits a cubic smoothing spline to the supplied data.

Usage

smooth.spline(x, y, w = rep(1, length(x)), df = 5, spar = NULL,
              cv = FALSE, all.knots = FALSE, df.offset = 0, penalty = 1,
              control.spar = list())

Arguments

x a vector giving the values of the predictor variable, or a list or a two-column matrix specifying x and y.
y responses. If y is missing, the responses are assumed to be specified by x.
w optional vector of weights
df the desired equivalent number of degrees of freedom (trace of the smoother matrix).
spar smoothing parameter, typically (but not necessarily) in (0,1]. The coefficient λ of the integral of the squared second derivative in the fit (penalized log likelihood) criterion is a monotone function of spar, see the details below.
cv ordinary (TRUE) or `generalized' (FALSE) cross-validation.
all.knots if TRUE, all points in x are used as knots. If FALSE, a suitably fine grid of knots is used.
df.offset allows the degrees of freedom to be increased by df.offset in the GCV criterion.
penalty the coefficient of the penalty for degrees of freedom in the GCV criterion.
control.spar optional list with named components controlling the root finding when the smoothing parameter spar is computed.
Note that this is partly experimental and may change with general spar computation improvements!
low:
lower bound for spar; defaults to -1.5 (used to implicitly default to 0 in R versions earlier than 1.4).
high:
upper bound for spar; defaults to +1.5.
tol:
the absolute precision (tolerance) used; defaults to 1e-4 (formerly 1e-3).
eps:
the relative precision used; defaults to 2e-8 (formerly 0.00244).
trace:
logical indicating if iterations should be traced.
maxit:
integer giving the maximal number of iterations; defaults to 500.
Note that spar is only searched for in the interval [low, high].
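
As a minimal sketch of how the search interval can be narrowed, the call below restricts spar to [0, 1] on the cars data. It assumes a current R installation in which smooth.spline can be called directly; under the R version documented here, library(modreg) may be needed first. The bounds 0 and 1 are arbitrary illustrative choices, not recommendations.

```r
## Restrict the spar search to [0, 1]; the bounds are illustrative only.
data(cars)
fit <- smooth.spline(cars$speed, cars$dist,
                     control.spar = list(low = 0, high = 1))
fit$spar   # the selected value lies inside [low, high]
```

If the criterion keeps improving towards one end of the interval, the selected spar can end up at or very near that bound, which is usually a hint to inspect the fit rather than to widen the interval blindly.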

Details

The x vector should contain at least ten distinct values.

The computational λ used (as a function of spar) is lambda = r * 256^(3*spar - 1) where r = tr(X' W^2 X) / tr(Σ), Σ is the matrix given by Sigma[i,j] = Integral B''[i](t) B''[j](t) dt, X is given by X[i,j] = B[j](x[i]), W^2 is the diagonal matrix of scaled weights, W = diag(w)/n (i.e., the identity for default weights), and B[k](.) is the k-th B-spline.

Note that with these definitions, f_i = f(x_i), and the B-spline basis representation f = X c (i.e., c is the vector of spline coefficients), the penalized log likelihood is L = (y - f)' W^2 (y - f) + λ c' Σ c, and hence c is the solution of the (ridge regression) system (X' W^2 X + λ Σ) c = X' W^2 y.
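
The step from the criterion to the linear system can be written out explicitly. This is only a sketch in the symbols defined above, with f = X c substituted and using the fact that W^2 and Σ are symmetric (which follows from their definitions):

```latex
\begin{aligned}
L(c) &= (y - Xc)^{\top} W^{2} (y - Xc) + \lambda\, c^{\top} \Sigma\, c, \\
\frac{\partial L}{\partial c} &= -2\, X^{\top} W^{2} (y - Xc) + 2 \lambda\, \Sigma\, c = 0, \\
\Longrightarrow\quad (X^{\top} W^{2} X + \lambda \Sigma)\, c &= X^{\top} W^{2} y .
\end{aligned}
```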

If spar is missing or NULL, the value of df is used to determine the degree of smoothing. If both are missing, cross-validation (ordinary leave-one-out or generalized, as selected by cv) is used to determine λ. Note that from the above relation, spar = s0 + 0.0601 * log(lambda), which is intentionally different from the S-PLUS implementation of smooth.spline (where spar is proportional to λ). In R's (log λ) scale, it makes more sense to vary spar linearly.
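
One consequence of lambda = r * 256^(3*spar - 1) is that changing spar by d multiplies λ by 256^(3*d), because r does not depend on spar. The sketch below checks this on two fits with fixed spar, using the y18 data from the Examples section. It assumes a current R installation (under the version documented here, library(modreg) may be needed first) and that the returned lambda component follows the formula above.

```r
## Two fits that differ only in spar (data taken from the Examples section).
y18 <- c(1:3, 5, 4, 7:3, 2*(2:5), rep(10, 4))
fit.a <- smooth.spline(y18, spar = 0.2)
fit.b <- smooth.spline(y18, spar = 0.5)

## r depends only on x, the weights and the knots, so it cancels in the ratio:
fit.b$lambda / fit.a$lambda   # observed ratio of the two lambdas
256^(3 * (0.5 - 0.2))         # ratio implied by lambda = r * 256^(3*spar - 1)

## The slope of spar against log(lambda) is 1 / (3 * log(256)):
1 / (3 * log(256))            # approximately 0.0601
```

Solving the formula for spar gives spar = 1/3 + (log(lambda) - log(r)) / (3 * log(256)), so the constant s0 corresponds to 1/3 - log(r) / (3 * log(256)).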

Note however that currently the results may become very unreliable for spar values smaller than about -1 or -2. The same may happen for values larger than 2 or so. Don't think of setting spar or the controls low and high outside such a safe range, unless you know what you are doing!

The `generalized' cross-validation method will work correctly when there are duplicated points in x. However, it is ambiguous what leave-one-out cross-validation means with duplicated points, and the internal code uses an approximation that involves leaving out groups of duplicated points. cv=TRUE is best avoided in that case.

Value

An object of class "smooth.spline" with components

x the distinct x values in increasing order.
y the fitted values corresponding to x.
w the weights used at the unique values of x.
yin the y values used at the unique x values.
lev leverages, the diagonal values of the smoother matrix.
cv.crit (generalized) cross-validation score.
pen.crit penalized criterion
crit the criterion value minimized in the underlying .Fortran routine `sslvrg'.
df equivalent degrees of freedom used. Note that (currently) this value may become quite imprecise when the true df is between 1 and 2.
spar the value of spar computed or given.
lambda the value of λ corresponding to spar, see the details above.
iparms named integer(3) vector where ..$iparms["iter"] gives the number of spar computing iterations used.
fit list for use by predict.smooth.spline.
call the matched call.
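
A brief sketch of how some of these components might be inspected, and how the fit can be evaluated at new points with the predict method. It assumes a current R installation (under the version documented here, library(modreg) may be needed first); df = 5 and the new x values are arbitrary illustrative choices.

```r
data(cars)
fit <- smooth.spline(cars$speed, cars$dist, df = 5)

fit$x          # distinct speeds, in increasing order
fit$df         # equivalent degrees of freedom actually achieved
sum(fit$lev)   # trace of the smoother matrix; should agree with fit$df

## Evaluate the fitted curve at new x values via the predict method.
predict(fit, x = c(10, 15, 20))
```

Because cars contains repeated speeds, fit$x is shorter than the original speed vector; fit$w and fit$yin refer to these distinct values.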

Author(s)

B.D. Ripley and Martin Maechler (spar/lambda, etc).

See Also

predict.smooth.spline

Examples

data(cars)
attach(cars)
plot(speed, dist, main = "data(cars)  &  smoothing splines")
cars.spl <- smooth.spline(speed, dist)
(cars.spl)
## This example has duplicate points, so avoid cv=TRUE

lines(cars.spl, col = "blue")
lines(smooth.spline(speed, dist, df=10), lty=2, col = "red")
legend(5,120,c(paste("default [C.V.] => df =",round(cars.spl$df,1)),
               "s( * , df = 10)"), col = c("blue","red"), lty = 1:2,
       bg='bisque')
detach()

##-- artificial example
y18 <- c(1:3,5,4,7:3,2*(2:5),rep(10,4))
xx  <- seq(1,length(y18), len=201)
(s2  <- smooth.spline(y18)) # GCV
(s02 <- smooth.spline(y18, spar = 0.2))
plot(y18, main=deparse(s2$call), col.main=2)
lines(s2, col = "gray"); lines(predict(s2, xx), col = 2)
lines(predict(s02, xx), col = 3); mtext(deparse(s02$call), col = 3)

## The following shows the problematic behavior of `spar' searching:
(s2  <- smooth.spline(y18,            control.spar = list(trace = TRUE, tol = 1e-6, low = -1.5)))
(s2m <- smooth.spline(y18, cv = TRUE, control.spar = list(trace = TRUE, tol = 1e-6, low = -1.5)))
## both above do quite similarly (Df = 8.5 +- 0.2)


