| createDataPartition {caret} | R Documentation |
createDataPartition creates a series of test/training partitions,
while createResample creates one or more bootstrap samples.
createFolds splits the data into k groups.
createDataPartition(y, times = 1, p = 0.5, list = TRUE, groups = min(5, length(y)))
createResample(y, times = 10, list = TRUE)
createFolds(y, k = 10, list = TRUE, returnTrain = FALSE)
y |
a vector of outcomes |
times |
the number of partitions to create |
p |
the proportion of data that goes to training, given as a value between 0 and 1 |
list |
logical - should the results be in a list (TRUE) or a matrix
with the number of rows equal to floor(p * length(y)) and times
columns. |
groups |
for numeric y, the number of breaks in the quantiles
(see below) |
k |
an integer for the number of folds. |
returnTrain |
a logical. When true, the values returned are the
sample positions corresponding to the data used during
training. This argument only works in conjunction with list = TRUE |
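The effect of list and returnTrain on the shape of the result can be checked with a short sketch like the one below. The outcome vector y is made up for illustration, and the seed is arbitrary; only the function calls come from the usage above.

```r
library(caret)

set.seed(1)
# Hypothetical two-class outcome: 30 "a" and 20 "b"
y <- factor(rep(c("a", "b"), times = c(30, 20)))

# list = TRUE (the default): a list with one integer vector per partition
parts <- createDataPartition(y, times = 2, p = 0.8)
str(parts)

# list = FALSE: an integer matrix with one column per partition
idx <- createDataPartition(y, times = 2, p = 0.8, list = FALSE)
dim(idx)

# createFolds returns the held-out positions of each fold by default;
# returnTrain = TRUE returns the training positions instead
held_out <- createFolds(y, k = 5)
train_folds <- createFolds(y, k = 5, returnTrain = TRUE)
lengths(held_out)
lengths(train_folds)
```

With the default returnTrain = FALSE, the folds together should cover every position of y exactly once.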
For bootstrap samples, simple random sampling is used.
For other data splitting, the random sampling is done within the
levels of y when y is a factor in an attempt to balance
the class distributions within the splits. For numeric y, the
sample is split into sections based on quantiles (the groups
argument sets the number of breaks) and sampling is done within
these subgroups. Also, for very small class sizes (<= 3) the classes
may not show up in both the training and test data.
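To make the numeric case concrete, here is a minimal base-R sketch of sampling within quantile-based groups. It is an illustration of the idea described above, not caret's own code: the helper name stratified_split and the use of ceiling() for the per-group sample size are assumptions.

```r
# Hypothetical helper (not part of caret): bin a numeric vector at evenly
# spaced quantiles and sample a proportion p of positions within each bin.
stratified_split <- function(y, p = 0.5, groups = min(5, length(y))) {
  # Break points at evenly spaced quantiles; unique() drops tied breaks
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)

  # Sample within each bin, then combine and sort the row positions
  picked <- lapply(split(seq_along(y), bins), function(pos) {
    n_take <- ceiling(length(pos) * p)  # assumed rounding rule
    pos[sample.int(length(pos), n_take)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(42)
x <- rgamma(50, 3, 0.5)
in_train <- stratified_split(x, p = 0.5)

# Training and held-out values should have similar distributions
summary(x[in_train])
summary(x[-in_train])
```

Because sampling happens inside each quantile bin, both the low and high ends of x are represented in the training positions, which simple random sampling does not guarantee for small samples.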
A list or matrix of row positions (e.g. 1, 15) corresponding to the training data.
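Since the returned values are row positions, a common way to use them is to index a data frame directly. The sketch below assumes the built-in iris data set and an arbitrary seed; the object names are illustrative.

```r
library(caret)

set.seed(123)
# One partition, returned as a single-column matrix of row positions
in_train <- createDataPartition(iris$Species, p = 0.75, list = FALSE)

# Positive indexing selects the training rows, negative indexing the rest
training <- iris[in_train, ]
testing <- iris[-in_train, ]

# Class counts in each part, to check that the split is balanced
table(training$Species)
table(testing$Species)
```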
Max Kuhn
data(oil)
createDataPartition(oilType, 2)

x <- rgamma(50, 3, .5)
inA <- createDataPartition(x, list = FALSE)

plot(density(x[inA]))
rug(x[inA])

points(density(x[-inA]), type = "l", col = 4)
rug(x[-inA], col = 4)

createResample(oilType, 2)

createFolds(oilType, 10)
createFolds(oilType, 5, FALSE)

createFolds(rnorm(21))