| Cross-validation in penalized generalized linear models {penalized} | R Documentation |
Cross-validating generalized linear models with L1 (lasso) and/or L2 (ridge) penalties, using likelihood cross-validation.
cvl (response, penalized, unpenalized, lambda1 = 0, lambda2 = 0,
data, model = c("cox", "logistic", "linear"), startbeta,
startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
trace = TRUE)
optL1 (response, penalized, unpenalized, minlambda1, maxlambda1,
lambda2 = 0, data, model = c("cox", "logistic", "linear"),
startbeta, startgamma, fold, epsilon = 1e-10, maxiter,
standardize = FALSE, trace = TRUE, tol = .Machine$double.eps^0.25)
optL2 (response, penalized, unpenalized, lambda1 = 0, minlambda2,
maxlambda2, data, model = c("cox", "logistic", "linear"),
startbeta, startgamma, fold, epsilon = 1e-10, maxiter,
standardize = FALSE, trace = TRUE, tol = .Machine$double.eps^0.25)
profL1 (response, penalized, unpenalized, minlambda1, maxlambda1,
lambda2 = 0, data, model = c("cox", "logistic", "linear"), startbeta,
startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
trace = TRUE, steps = 100, minsteps = steps/4, log = FALSE)
profL2 (response, penalized, unpenalized, lambda1 = 0, minlambda2,
maxlambda2, data, model = c("cox", "logistic", "linear"), startbeta,
startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
trace = TRUE, steps = 100, minsteps = steps/4, log = TRUE)
response |
The response variable (vector). This should be a numeric vector for
linear regression, a Surv object for Cox regression and
a vector of 0/1 values for logistic regression. |
penalized |
The penalized covariates. These may be specified
either as a matrix or as a (one-sided) formula object.
See also under data. |
unpenalized |
Additional unpenalized covariates.
Specified as under penalized.
Note that an unpenalized intercept is included in the model by default (except in the Cox model).
This can be suppressed by specifying unpenalized = ~0. |
lambda1, lambda2 |
The fixed values of the tuning parameters for L1 and L2 penalization. Both may be vectors if different covariates are to be penalized differently. |
minlambda1, minlambda2, maxlambda1, maxlambda2 |
The values of the tuning parameters for L1 or L2 penalization between which the cross-validated likelihood is to be profiled or optimized. |
data |
A data.frame used to evaluate response, and the terms of
penalized or unpenalized when these have been specified as a
formula object. |
model |
The model to be used. If missing, the model will be guessed from the response input. |
startbeta |
Starting values for the regression coefficients of the penalized covariates.
These starting values will be used only for the first values of lambda1 and lambda2. |
startgamma |
Starting values for the regression coefficients of the unpenalized covariates.
These starting values will be used only for the first values of lambda1 and lambda2. |
fold |
The fold for cross-validation. May be supplied as a single number
(between 2 and n) giving the number of folds, or, alternatively, as a length n
vector with values in 1:fold, specifying exactly which subjects are assigned
to which fold. The default is fold = 1:n, resulting in leave-one-out (n-fold)
cross-validation. |
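As a minimal sketch of the vector form of fold, the labels 1:k can be repeated up to length n and then shuffled in base R. The values of n and k and the seed below are illustrative assumptions, not package defaults.

```r
# Illustrative values (assumptions, not taken from the package)
n <- 144   # number of subjects
k <- 10    # number of folds

set.seed(1)  # make the random allocation reproducible

# Repeat the labels 1..k until there are n of them, then shuffle
fold <- sample(rep(seq_len(k), length.out = n))

# Every subject has one label in 1:k; fold sizes differ by at most one
table(fold)
```

A vector built this way can be passed as fold to several calls so that they all use the same allocation of subjects.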
epsilon |
The convergence criterion. As in glm.
Convergence is judged separately on the likelihood and on the penalty. |
maxiter |
The maximum number of iterations allowed. Set by default at 25 when lambda1 = 0, infinite otherwise. |
standardize |
If TRUE, standardizes all penalized covariates to
unit central L2-norm before applying penalization. |
trace |
If TRUE, prints progress information. Note that setting
trace = TRUE may slow down the algorithm (although it often feels quicker). |
steps |
The maximum number of steps between minlambda1 and
maxlambda1 or minlambda2 and maxlambda2 at which the cross-validated likelihood is to
be calculated. |
minsteps |
The minimum number of steps between minlambda1 and
maxlambda1 or minlambda2 and maxlambda2 at which the cross-validated likelihood is to
be calculated. If minsteps is smaller than steps, the algorithm will
automatically stop when the cross-validated likelihood drops below the cross-validated
likelihood of the null model, provided it has done at least minsteps steps. |
log |
If FALSE, the steps between minlambda1 and
maxlambda1 or minlambda2 and maxlambda2 are equidistant on a linear scale; if TRUE,
on a logarithmic scale. Please note the different defaults for profL1 (FALSE)
and profL2 (TRUE). |
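The difference between the two spacings can be illustrated in base R. This is only a sketch of what "equidistant on a linear scale" and "equidistant on a logarithmic scale" mean; the range and number of points are illustrative assumptions, and the way the package builds its grid internally may differ.

```r
# Illustrative range and number of grid points (assumptions)
minlambda <- 0.01
maxlambda <- 100
steps <- 5

# Equidistant on a linear scale: constant difference between neighbours
linear_grid <- seq(minlambda, maxlambda, length.out = steps)

# Equidistant on a logarithmic scale: equal spacing in log(lambda),
# i.e. a constant ratio between neighbouring values
log_grid <- exp(seq(log(minlambda), log(maxlambda), length.out = steps))

linear_grid
log_grid
```

The logarithmic grid in this sketch requires a strictly positive lower bound, since log(0) is not finite.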
tol |
The tolerance of the Brent algorithm used for minimization.
See also optimize. |
All five functions return a list with the following named elements:
lambda: For optL1 and optL2, the optimal value of the tuning parameter found. For profL1 and profL2, the vector of values of the tuning parameter for which the cross-validated likelihood has been calculated. Absent in the output of cvl.
cvl: The cross-validated likelihood. For optL1 and optL2, this is the cross-validated likelihood at the optimal value of the tuning parameter.
fold: The allocation of subjects to cross-validation folds, as used in a call to optL1, optL2, profL1 or profL2.
predictions: The cross-validated predictions; see breslow for survival models. The functions profL1 and profL2 return a list here, whereas optL1 and optL2 return the predictions for the optimal value of the tuning parameter only.
fullfit: The fit on the full data. The functions profL1 and profL2 return a list here, whereas optL1 and optL2 return the full data fit for the optimal value of the tuning parameter only.
A named list. See details.
The optL1 and optL2 functions use Brent's algorithm for
minimization without derivatives (see also optimize).
There is a risk that these functions converge to a local instead of to a global
optimum. This is especially the case for optL1, as the cross-validated
likelihood as a function of lambda1 quite often has local optima. It is
recommended to use optL1 in combination with profL1 to check whether
optL1 has converged to the right optimum.
See also the notes under penalized.
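The recommended check can be sketched as follows: run optL1, profile the cross-validated likelihood with profL1 on the same folds, and compare the optimum found with the best grid point. The seed, the number of folds, and the number of steps are illustrative choices, and the variable name best is introduced here only for the example.

```r
library(penalized)
library(survival)  # provides Surv()
data(nki70)

set.seed(1)  # with fold = 10, subjects are allocated to folds at random

# Optimize lambda1 with Brent's algorithm
opt <- optL1(Surv(time, event), penalized = nki70[, 8:77],
             data = nki70, fold = 10, trace = FALSE)

# Profile the cross-validated likelihood on the same folds
prof <- profL1(Surv(time, event), penalized = nki70[, 8:77],
               data = nki70, fold = opt$fold, steps = 20, trace = FALSE)

# Compare the optimum found by optL1 with the best point on the grid
best <- which.max(prof$cvl)
c(opt_lambda = opt$lambda, grid_lambda = prof$lambda[best])
c(opt_cvl = opt$cvl, grid_cvl = prof$cvl[best])
```

If a grid point has a clearly higher cross-validated likelihood than opt$cvl, optL1 has probably stopped at a local optimum. One option in that case is to rerun optL1 with minlambda1 and maxlambda1 set around that grid point.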
Jelle Goeman: j.j.goeman@lumc.nl
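library(penalized)
library(survival)   # provides Surv()
data(nki70)
# Find the value of lambda1 that maximizes the cross-validated likelihood
opt <- optL1(Surv(time, event), penalized = nki70[,8:77], data = nki70, fold = 10)
coefficients(opt$fullfit)   # coefficients of the full-data fit at the optimum
plot(opt$predictions)       # cross-validated predictions
# Plot the profile of the cross-validated likelihood, reusing the same folds
prof <- profL1(Surv(time, event), penalized = nki70[,8:77],
data = nki70, fold = opt$fold, steps = 20)
plot(prof$lambda, prof$cvl, type = "l")
plotpath(prof$fullfit)      # coefficient paths over the profiled lambda1 values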