| clusterRunSimulation {simFrame} | R Documentation |
Generic function for running a simulation experiment on a snow cluster.
clusterRunSimulation(cl, x, setup, nrep, control,
                     contControl = NULL, NAControl = NULL,
                     design = character(), fun, ...,
                     SAE = FALSE)
| Argument | Description |
|---|---|
| cl | a snow cluster. |
| x | a data.frame (for design-based simulation or simulation based on real data) or a control object for data generation inheriting from "VirtualDataControl" (for model-based simulation). |
| setup | an object of class "SampleSetup", containing previously set up samples, or a control object for setting up samples inheriting from "VirtualSampleControl". |
| nrep | a non-negative integer giving the number of repetitions of the simulation experiment (for model-based simulation or simulation based on real data). |
| control | a control object of class "SimControl". |
| contControl | an object of a class inheriting from "VirtualContControl", controlling contamination in the simulation experiment. |
| NAControl | an object of a class inheriting from "VirtualNAControl", controlling the insertion of missing values in the simulation experiment. |
| design | a character vector specifying the variables (columns) to be used for splitting the data into domains. The simulations, including contamination and the insertion of missing values (unless SAE=TRUE), are then performed on every domain. |
| fun | a function to be applied in each simulation run. |
| ... | additional arguments to be passed to fun. |
| SAE | a logical indicating whether small area estimation will be used in the simulation. |
Statistical simulation is embarrassingly parallel: the simulation runs are
independent of each other, so computational performance can be increased by
parallel computing. In simFrame, parallel computing is implemented using the
package snow. Note that all objects and packages required for the
computations (including simFrame) need to be made available on every worker
process, e.g., via clusterEvalQ and clusterExport.
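Because the runs are independent, they can be split into chunks that are processed separately and then combined without changing the result. The sketch below illustrates this in plain R; it is not simFrame's internal code, and it starts no cluster. The function one_run is a hypothetical stand-in for fun, and seeding each run by its replication index is a simplification used only so that serial and chunked results can be compared directly.

```r
library(parallel)  # only splitIndices() is used; no cluster is started

# Hypothetical stand-in for `fun`: one simulation run. Seeding by the
# replication index makes the result independent of which process runs it
# (for illustration only; real code should use random number streams).
one_run <- function(i, n = 100) {
  set.seed(i)
  x <- rnorm(n)
  c(mean = mean(x), median = median(x))
}

nrep <- 10

# Serial execution: one row per replication.
serial <- t(sapply(seq_len(nrep), one_run))

# Split the replications into two chunks, one per (hypothetical) worker,
# process each chunk on its own and combine the results.
chunks <- splitIndices(nrep, 2)
chunked <- do.call(rbind, lapply(chunks, function(idx) t(sapply(idx, one_run))))
```

On a real cluster, each chunk would be evaluated on a different worker process, which is why every object used inside one_run would have to be exported there first.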
To prevent problems with random numbers (such as overlapping or correlated
sequences on different worker processes) and to ensure reproducibility,
random number streams should be used. In R, the packages rlecuyer and rsprng
are available for creating random number streams, which are supported by snow
via the function clusterSetupRNG.
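The idea of independent streams derived from a single seed can be sketched with the L'Ecuyer-CMRG generator built into R's parallel package. This swaps in parallel::nextRNGStream instead of rlecuyer and snow's clusterSetupRNG; the helper names make_streams and draw are hypothetical and serve only to show that the same seed reproduces the same streams, while different streams yield different numbers.

```r
library(parallel)  # provides nextRNGStream(); ships with R >= 2.14.0

# Derive `n` independent L'Ecuyer-CMRG streams from a single seed,
# e.g., one stream per worker process.
make_streams <- function(n, seed) {
  RNGkind("L'Ecuyer-CMRG")  # switches the session's generator (side effect)
  set.seed(seed)
  streams <- vector("list", n)
  streams[[1]] <- get(".Random.seed", envir = globalenv())
  for (i in seq_len(n)[-1]) {
    streams[[i]] <- nextRNGStream(streams[[i - 1]])
  }
  streams
}

# Draw `k` uniform random numbers from a given stream.
draw <- function(stream, k) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(k)
}

streams <- make_streams(2, seed = 1234)
```

On an actual cluster created with the parallel package, clusterSetRNGStream(cl, iseed = 1234) assigns such streams to the workers.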
The function in slot fun of the control object control must satisfy some
requirements. It must return a numeric vector or an object of class
"SimResult", which consists of a slot values (a numeric vector) and a slot
add (additional results of any class, e.g., statistical models). Note that
returning a "SimResult" object is computationally more expensive than
returning a numeric vector. Returning a list with components values and add
is also accepted and is slightly faster than using a "SimResult" object. In
every simulation run, a data.frame is passed to fun; the corresponding
argument must be called x. If comparisons with the original data need to be
made, e.g., for evaluating the quality of imputation methods, the function
should have an argument called orig. If different domains are used in the
simulation, the indices of the current domain can be passed to the function
via an argument called domain.
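A minimal sketch of a function meeting these requirements follows: it takes the arguments x and orig and returns a list with components values and add. The function name sim_impute, the column value and the simple mean imputation are hypothetical choices for illustration; the function is called directly here rather than through simFrame.

```r
# Hypothetical simulation function for evaluating mean imputation.
# `x` is the (possibly incomplete) data of the current run,
# `orig` the corresponding original data.
sim_impute <- function(x, orig) {
  imputed <- x$value
  imputed[is.na(imputed)] <- mean(imputed, na.rm = TRUE)
  list(
    # numeric results, one element per quantity of interest
    values = c(imputed = mean(imputed), original = mean(orig$value)),
    # additional results of any class, here the number of imputed values
    add = list(nImputed = sum(is.na(x$value)))
  )
}

# Stand-alone check outside simFrame
orig <- data.frame(value = c(1, 2, 3, 4))
x <- orig
x$value[2] <- NA
res <- sim_impute(x, orig)
```

Dropping the add component and returning only the numeric vector in values would be the cheapest option when no additional results are needed.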
For small area estimation, keep the following points in mind. The slot design
of control, which specifies how the data are split into domains, must be
supplied, and the slot SAE must be set to TRUE. However, the data are not
actually split into the specified domains; instead, the whole data set
(sample) is passed to fun. Contamination and missing values are likewise added
to the whole data set (sample). Finally, the function must have a domain
argument so that the current domain can be extracted from the whole data set
(sample).
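The following sketch shows how such a function might use its domain argument to extract the current domain from the whole sample. It assumes that domain holds the row indices of the current domain; the name sim_sae and the two estimates (a direct domain mean and a synthetic whole-sample mean) are hypothetical illustrations, not an SAE method provided by simFrame.

```r
# Hypothetical simulation function for small area estimation.
# `x` is the whole sample; `domain` is assumed to hold the row indices
# of the current domain.
sim_sae <- function(x, domain) {
  in_domain <- x$value[domain]
  c(
    direct = mean(in_domain),  # uses only the observations in the domain
    overall = mean(x$value)    # synthetic estimate using the whole sample
  )
}

# Stand-alone check outside simFrame
x <- data.frame(area = c(1, 1, 2, 2, 2), value = c(1, 3, 10, 20, 30))
res <- sim_sae(x, domain = which(x$area == 2))
```

Because the whole sample is available, estimators that borrow information from other domains can be computed alongside direct estimates.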
In every simulation run, fun is evaluated using try. Hence, if computations
fail in some simulation runs, the results of the remaining runs are not lost.
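The effect of evaluating each run with try can be mimicked in plain R, as sketched below; this is not simFrame's internal code. The function risky_run is hypothetical and deliberately fails in every fourth replication, yet the results of all other replications are kept.

```r
# Hypothetical run function that fails in some replications.
risky_run <- function(i) {
  if (i %% 4 == 0) stop("estimator did not converge")
  c(estimate = i / 2)
}

# Evaluate every run with try() so that an error does not abort the loop.
results <- lapply(1:8, function(i) try(risky_run(i), silent = TRUE))

# Identify failed runs and combine the successful ones.
failed <- vapply(results, inherits, logical(1), what = "try-error")
successful <- do.call(rbind, results[!failed])
```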
The function returns an object of class "SimResults".
Andreas Alfons, alfons@statistik.tuwien.ac.at
L'Ecuyer, P., Simard, R., Chen, E. and Kelton, W. (2002) An object-oriented random-number package with many long streams and substreams. Operations Research, 50(6), 1073–1075.
Mascagni, M. and Srinivasan, A. (2000) Algorithm 806: SPRNG: a scalable
library for pseudorandom number generation. ACM Transactions on
Mathematical Software, 26(3), 436–461.
Rossini, A., Tierney, L. and Li, N. (2007) Simple parallel statistical computing in R. Journal of Computational and Graphical Statistics, 16(2), 399–420.
Tierney, L., Rossini, A. and Li, N. (2009) snow: A parallel computing
framework for the R system. International Journal of Parallel
Programming, 37(1), 78–90.
makeCluster,
clusterSetupRNG,
runSimulation, SimControl,
SimResults, simBwplot,
simDensityplot, simXyplot
## Not run:
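# these examples require at least a dual-core processor
# load packages (snow provides makeCluster, clusterEvalQ,
# clusterExport and clusterSetupRNG)
library(simFrame)
library(snow)
# start snow cluster
cl <- makeCluster(2, type = "SOCK")
# load package on workers
clusterEvalQ(cl, library(simFrame))
# set up random number streams (the default type "RNGstream"
# requires the rlecuyer package)
clusterSetupRNG(cl, seed = "1234")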
# function for generating data
grnorm <- function(n, means) {
    group <- sample(1:2, n, replace = TRUE)
    data.frame(group = group, value = rnorm(n) + means[group])
}
# control objects for data generation and contamination
means <- c(0, 0.5)
dc <- DataControl(size = 500, distribution = grnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.1, dots = list(mean = 10))
# function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
      trimmed = mean(x$value, trim = 0.1),
      median = median(x$value))
}
# export objects to workers
clusterExport(cl, c("grnorm", "means", "dc", "cc", "sim"))
# run simulation
results <- clusterRunSimulation(cl, dc, nrep = 100,
    contControl = cc, design = "group", fun = sim)
# plot results
plot(results, true = means)
# stop snow cluster
stopCluster(cl)
## End(Not run)