| cvrisk {mboost} | R Documentation |
Cross-validated estimation of the empirical risk for hyper-parameter selection.
cvrisk(object, folds, grid = c(1:mstop(object)))
| object | an object of class gb. |
| folds | a weight matrix with number of rows equal to the number of observations. The number of columns corresponds to the number of cross-validation runs. |
| grid | a vector of stopping parameters the empirical risk is to be evaluated for. |
The number of boosting iterations is a hyper-parameter of the
boosting algorithms implemented in this package. This function
computes honest, i.e., cross-validated, estimates of the empirical
risk for different stopping parameters mstop. These estimates
can be used to choose an appropriate number of boosting iterations.
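As a rough illustration of how such estimates lead to a choice of mstop, the following sketch uses a made-up risk matrix (rows are cross-validation runs, columns are candidate stopping values) and picks the stopping value with the smallest average risk. The matrix layout, the numbers, and the averaging rule are assumptions for illustration only; the mstop method for cvrisk objects may differ in detail.

```r
## Hypothetical out-of-sample risks: 5 cross-validation runs (rows)
## evaluated at 4 candidate stopping values (columns).
grid <- c(10, 50, 100, 200)
risk <- rbind(c(9.1, 6.2, 5.8, 6.4),
              c(8.7, 6.0, 5.9, 6.6),
              c(9.4, 6.5, 6.1, 6.3),
              c(9.0, 6.1, 5.7, 6.5),
              c(8.9, 6.3, 6.0, 6.7))
colnames(risk) <- grid

## Average the risk over the runs and pick the stopping value
## that minimises the averaged risk.
avg_risk <- colMeans(risk)
best_mstop <- grid[which.min(avg_risk)]
best_mstop
```

Stopping too early leaves the averaged risk high because the model underfits, while stopping too late lets it rise again through overfitting, so the minimum typically lies in between.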
Different forms of cross-validation can be applied, for example
10-fold cross-validation or bootstrapping. The scheme is defined via
the folds matrix: each column holds the observation weights for one
cross-validation run, and zero weights correspond to test cases.
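A weight matrix of this form can be built in base R by assigning every observation to a fold and setting its weight to zero in the corresponding column. The sketch below is only one possible construction; the helper name make_kfold_weights and the values n = 71, k = 10 and 25 bootstrap runs are illustrative and not part of mboost.

```r
## Build an n x k weight matrix: column j has weight 0 for the observations
## held out in run j and weight 1 for all other observations.
make_kfold_weights <- function(n, k) {
  ## Randomly assign each observation to one of k folds of near-equal size.
  fold_id <- sample(rep(seq_len(k), length.out = n))
  ## outer() compares each observation's fold with each column index;
  ## multiplying by 1 turns the logical matrix into 0/1 weights.
  outer(fold_id, seq_len(k), FUN = "!=") * 1
}

set.seed(1)
n <- 71
folds <- make_kfold_weights(n = n, k = 10)
dim(folds)

## Bootstrap alternative: each column counts how often an observation was
## drawn; observations with count 0 are out-of-bag and act as test cases.
boot <- rmultinom(25, size = n, prob = rep(1, n) / n)
colSums(boot == 0)  # number of test cases per bootstrap run
```

Either matrix has one row per observation and one column per run, which is the shape the folds argument expects.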
An object of class cvrisk, basically a matrix
containing estimates of the empirical risk for a varying number
of boosting iterations. plot and print methods
are available, as well as an mstop method.
The model object needs to be fitted with option
savedata = TRUE in boost_control.
Torsten Hothorn, Friedrich Leisch, Achim Zeileis and Kurt Hornik (2005), The design and analysis of benchmark experiments. Journal of Computational and Graphical Statistics, 14(3), 675–699.
data("bodyfat", package = "mboost")
### fit linear model to data
model <- glmboost(DEXfat ~ ., data = bodyfat,
                  control = boost_control(center = TRUE))
### AIC-based selection of number of boosting iterations
maic <- AIC(model)
maic
### inspect coefficient path and AIC-based stopping criterion
par(mai = par("mai") * c(1, 1, 1, 1.8))
plot(model)
abline(v = mstop(maic), col = "lightgray")
### 10-fold cross-validation
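n <- nrow(bodyfat)
k <- 10
ntest <- floor(n / k)
### the pattern "ntest zeros, n ones" is written column-wise, so the block of
### zero weights (test cases) shifts down by ntest rows from column to column;
### the trailing zeros form the test set of the last column
cv10f <- matrix(c(rep(c(rep(0, ntest), rep(1, n)), k - 1),
                  rep(0, n * k - (k - 1) * (n + ntest))), nrow = n)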
cvm <- cvrisk(model, folds = cv10f)
print(cvm)
mstop(cvm)
plot(cvm)
### 25 bootstrap iterations
set.seed(290875)
bs25 <- rmultinom(25, n, rep(1, n)/n)
cvm <- cvrisk(model, folds = bs25)
print(cvm)
mstop(cvm)
layout(matrix(1:2, ncol = 2))
plot(cvm)
### trees
blackbox <- blackboost(DEXfat ~ ., data = bodyfat)
cvtree <- cvrisk(blackbox, folds = bs25)
plot(cvtree)