clusterSetup {simFrame}    R Documentation
Generic function for setting up multiple samples on a snow cluster.
Usage
clusterSetup(cl, x, control, ...)

## S4 method for signature 'ANY, data.frame, SampleControl'
clusterSetup(cl, x, control)
Arguments
cl        a snow cluster.
x         the data.frame to sample from.
control   a control object inheriting from the virtual class
          "VirtualSampleControl", or a character string specifying such a
          control class (the default being "SampleControl").
...       if control is a character string or missing, the slots of
          the control object may be supplied as additional arguments.
Details
Setting up a large number of samples can be sped up considerably by
parallel computing. In simFrame, parallel computing is implemented using
the package snow. Note that all objects and packages required for the
computations (including simFrame) need to be made available on every
worker process, for instance with the snow functions clusterEvalQ or
clusterExport.
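A minimal sketch of making data and packages available on every worker. It assumes the parallel package that ships with R, whose makeCluster, clusterExport and clusterEvalQ functions derive from snow, in place of snow itself; the data frame pop and the function count_rows are hypothetical and serve only as an illustration.

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# Hypothetical toy population that exists only on the master process.
pop <- data.frame(id = 1:100, region = rep(c("a", "b"), each = 50))

# Copy 'pop' into the global environment of every worker.
clusterExport(cl, "pop", envir = environment())

# Load packages on every worker; 'stats' is used only because it ships with R.
invisible(clusterEvalQ(cl, library(stats)))

# Each worker now finds its own copy of 'pop'.
count_rows <- function(i) nrow(pop)
n_rows <- unlist(parLapply(cl, 1:2, count_rows))
stopCluster(cl)
n_rows  # 100 100
```

Without the clusterExport call, count_rows would fail on the workers because pop is not defined there.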
To avoid overlapping or correlated random number sequences on the workers
and to ensure reproducibility, random number streams should be used. In R,
the packages rlecuyer and rsprng can create such streams, and snow
supports both via the function clusterSetupRNG.
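The sketch below illustrates reproducible random number streams on a cluster. It assumes the parallel package and its clusterSetRNGStream function, which uses L'Ecuyer's MRG32k3a generator (the generator behind rlecuyer), as a stand-in for snow's clusterSetupRNG; the helper functions draw5 and draw_on_cluster are hypothetical.

```r
library(parallel)

# Hypothetical worker task: draw 5 integers from 1..1000 with the worker's RNG.
draw5 <- function(i) sample.int(1000, 5)

# Start a 2-worker cluster, give each worker its own L'Ecuyer-CMRG stream
# derived from 'seed', and collect one draw per worker.
draw_on_cluster <- function(seed) {
  cl <- makeCluster(2, type = "PSOCK")
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  parLapply(cl, 1:2, draw5)
}

a <- draw_on_cluster(42)
b <- draw_on_cluster(42)
identical(a, b)  # TRUE: same seed, same streams, same draws
```

Because each worker receives a separate stream, the draws on the two workers differ from each other, while repeating the whole computation with the same seed reproduces them exactly.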
The control class "SampleControl" is highly flexible and supports
stratified sampling as well as sampling whole groups rather than
individuals, in both cases with a user-specified sampling method. Hence,
to extend the existing framework, it is often sufficient to implement the
desired sampling method for the simple non-stratified case; stratification
and group sampling are then handled by "SampleControl". See
"SampleControl" for some restrictions on the argument names of such a
function, which should return a vector containing the indices of the
sampled observations.
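As an illustration of such a function, here is a sketch of systematic sampling for the non-stratified case. The name sys_sample is hypothetical, and the argument names N (population size) and size (sample size) are an assumption that should be checked against the restrictions documented for "SampleControl".

```r
# Hypothetical systematic sampling function for the non-stratified case.
# Assumption: the population size is passed as 'N' and the sample size as
# 'size'; check the "SampleControl" documentation for the exact names.
sys_sample <- function(N, size) {
  if (size > N) stop("'size' must not exceed 'N'")
  if (size == 0) return(integer(0))
  step <- N / size                     # sampling interval, may be fractional
  start <- runif(1, 0, step)           # random start within the first interval
  # 1-based indices of the sampled observations
  as.integer(floor(start + step * (seq_len(size) - 1))) + 1L
}

set.seed(1)
sys_sample(100, 10)  # 10 indices spaced exactly 10 apart
```

If the sampling function slot of "SampleControl" is named fun (again an assumption to verify), such a function might be used as clusterSetup(cl, eusilc, size = 20, k = 4, fun = sys_sample).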
For very complex sampling procedures, however, it is possible to define a
control class "MySampleControl" extending "VirtualSampleControl",
together with a corresponding method clusterSetup(cl, x, control) with
signature 'ANY, data.frame, MySampleControl'. For good computational
performance, such a method should set up all samples efficiently in a
single call: the slot k of "VirtualSampleControl" must be used to
control the number of samples, and the method must return an object of
class "SampleSetup".
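The following sketch shows only the core idea of setting up k samples in one call by distributing the draws over the workers. It is not simFrame's implementation: draw_srs and setup_k_samples are hypothetical helpers, and the parallel package is assumed in place of snow.

```r
library(parallel)

# Hypothetical worker task: one simple random sample of 'size' indices from 1:N.
draw_srs <- function(i, N, size) sample.int(N, size)

# Hypothetical helper (not simFrame code): set up k samples in a single call
# by distributing the k draws over the workers of 'cl'.
setup_k_samples <- function(cl, N, size, k) {
  parLapply(cl, seq_len(k), draw_srs, N = N, size = size)
}

cl <- makeCluster(2, type = "PSOCK")
clusterSetRNGStream(cl, iseed = 2024)  # reproducible, independent streams
indices <- setup_k_samples(cl, N = 100, size = 20, k = 4)
stopCluster(cl)
```

In an actual clusterSetup method for a class extending "VirtualSampleControl", k would be taken from the control object and the resulting list of indices would have to be stored in an object of class "SampleSetup"; the slots of that class are not shown here.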
Value
An object of class "SampleSetup".
Author(s)
Andreas Alfons, alfons@statistik.tuwien.ac.at
References
L'Ecuyer, P., Simard, R., Chen, E. and Kelton, W. (2002) An object-oriented random-number package with many long streams and substreams. Operations Research, 50(6), 1073–1075.
Mascagni, M. and Srinivasan, A. (2000) Algorithm 806: SPRNG: a scalable library for pseudorandom number generation. ACM Transactions on Mathematical Software, 26(3), 436–461.
Rossini, A., Tierney, L. and Li, N. (2007) Simple parallel statistical computing in R. Journal of Computational and Graphical Statistics, 16(2), 399–420.
Tierney, L., Rossini, A. and Li, N. (2009) snow: A parallel computing framework for the R system. International Journal of Parallel Programming, 37(1), 78–90.
See Also
makeCluster, clusterSetupRNG, setup, draw, SampleControl,
VirtualSampleControl, SampleSetup
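Examples
## Not run:
# these examples require at least a dual-core processor
library(simFrame)
library(snow)
# load data
data(eusilc)
# start snow cluster
cl <- makeCluster(2, type = "SOCK")
# load package and data on workers
clusterEvalQ(cl, {
    library(simFrame)
    data(eusilc)
})
# set up random number streams (default type "RNGstream", requires rlecuyer)
clusterSetupRNG(cl)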
# simple random sampling
srss <- clusterSetup(cl, eusilc, size = 20, k = 4)
draw(eusilc[, c("id", "eqIncome")], srss, i = 1)
# group sampling
gss <- clusterSetup(cl, eusilc, group = "hid", size = 10, k = 4)
draw(eusilc[, c("hid", "id", "eqIncome")], gss, i = 2)
# stratified sampling
stss <- clusterSetup(cl, eusilc, design = "region",
size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
draw(eusilc[, c("id", "region", "eqIncome")], stss, i = 3)
# stop cluster
stopCluster(cl)
## End(Not run)