| cv.CoxBoost {CoxBoost} | R Documentation |
Performs K-fold cross-validation for CoxBoost to search for the optimal number of boosting steps.
cv.CoxBoost(time,status,x,maxstepno=100,K=10,type=c("verweij","naive"),
parallel=FALSE,upload.x=TRUE,folds=NULL,trace=FALSE,...)
time |
vector of length n specifying the observed times. |
status |
censoring indicator, i.e., vector of length n with entries 0 for censored observations and 1 for uncensored observations. If this vector contains elements not equal to 0 or 1, these are taken to indicate events from a competing risk and a model for the subdistribution hazard with respect to event 1 is fitted (see e.g. Fine and Gray, 1999). |
x |
n * p matrix of covariates. |
maxstepno |
maximum number of boosting steps to evaluate, i.e., the returned "optimal" number of boosting steps will be in the range [0,maxstepno]. |
K |
number of folds to be used for cross-validation. If K is larger than or equal to the number of non-zero elements in status, leave-one-out cross-validation is performed. |
type |
way of calculating the partial likelihood contribution of the observations in the hold-out folds: "verweij" uses the more appropriate method described in Verweij and van Houwelingen (1993), "naive" uses the approach where the observations that are not in the hold-out folds are ignored (often found in other R packages). |
parallel |
logical value indicating whether computations in the cross-validation folds should be performed in parallel on a compute cluster. Parallelization is performed via the package snowfall, and the initialization function of this package, sfInit, should be called before calling cv.CoxBoost. |
upload.x |
logical value indicating whether x should be (or has to be) uploaded to the compute cluster for parallel computation. Uploading x only once beforehand (using sfExport(x) from package snowfall) and then setting upload.x=FALSE can save much time for large data sets. |
folds |
if not NULL, this has to be a list of length K, each element being a vector of indices of fold elements. Useful for employing the same folds for repeated runs. |
trace |
logical value indicating whether progress in estimation should be indicated by printing the number of the cross-validation fold and the index of the covariate updated. |
... |
miscellaneous parameters for the calls to CoxBoost |
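The folds argument expects a list of index vectors. A minimal sketch of how such a list could be built with base R follows; the seed, the sizes n and K, and the random-split scheme are assumptions chosen for illustration, not necessarily what cv.CoxBoost does internally when folds=NULL.

```r
set.seed(42)   # arbitrary seed, only so the split is reproducible
n <- 200       # number of observations (assumed; use length(time) in practice)
K <- 10        # number of folds

# Randomly permute the indices 1..n and deal them into K groups,
# giving a list of length K whose elements are index vectors
folds <- unname(split(sample(seq_len(n)), rep(seq_len(K), length.out = n)))

length(folds)    # number of folds
lengths(folds)   # number of observations in each fold
```

A list like this could then be passed as folds=folds, together with K=length(folds), to several cv.CoxBoost calls (for example with different penalty values) so that all runs are evaluated on identical folds.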
List with the following components:
mean.logplik |
vector of length maxstepno+1 with the mean partial log-likelihood for boosting steps 0 to maxstepno |
se.logplik |
vector with standard error estimates for the mean partial log-likelihood criterion for each boosting step. |
optimal.step |
optimal boosting step number, i.e., the step with the maximum mean partial log-likelihood. |
folds |
list of length K, where the elements are vectors of the indices of observations in the respective folds. |
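Because mean.logplik covers boosting steps 0 to maxstepno, its first element belongs to step 0. The sketch below illustrates this index shift on a toy list; the numbers are made up for illustration, cv.toy is a hypothetical stand-in for a real cv.CoxBoost result, and the sketch assumes the optimal step is the one with the largest mean partial log-likelihood.

```r
# Toy stand-in for a cv.CoxBoost result (values are invented, not real output)
cv.toy <- list(mean.logplik = c(-5.2, -4.8, -4.5, -4.6, -4.9),
               se.logplik   = c(0.30, 0.28, 0.27, 0.29, 0.31))

# Element i of mean.logplik corresponds to boosting step i - 1
steps <- seq_along(cv.toy$mean.logplik) - 1

# Step with the largest mean partial log-likelihood
best.step <- steps[which.max(cv.toy$mean.logplik)]
best.step   # 2 for the toy values above

# Rough pointwise band of plus/minus one standard error around the mean
band <- cbind(step  = steps,
              lower = cv.toy$mean.logplik - cv.toy$se.logplik,
              upper = cv.toy$mean.logplik + cv.toy$se.logplik)
band
```

With a real result, the same indexing applies to cv.res$mean.logplik and cv.res$se.logplik, e.g. when plotting the criterion against 0:maxstepno.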
Harald Binder binderh@fdm.uni-freiburg.de
Verweij, P. J. M. and van Houwelingen, H. C. (1993). Cross-validation in survival analysis. Statistics in Medicine, 12(24):2305-2314.
CoxBoost, optimCoxBoostPenalty
## Not run:
# Generate some survival data with 10 informative covariates
n <- 200; p <- 100
beta <- c(rep(1,10),rep(0,p-10))
x <- matrix(rnorm(n*p),n,p)
real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta)))
cens.time <- rexp(n,rate=1/10)
status <- ifelse(real.time <= cens.time,1,0)
obs.time <- ifelse(real.time <= cens.time,real.time,cens.time)
# 10-fold cross-validation
cv.res <- cv.CoxBoost(time=obs.time,status=status,x=x,maxstepno=500,
K=10,type="verweij",penalty=100)
# examine mean partial log-likelihood in the course of the boosting steps
plot(cv.res$mean.logplik)
# Fit with optimal number of boosting steps
cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=cv.res$optimal.step,
penalty=100)
summary(cbfit)
## End(Not run)