| mi {mi} | R Documentation |
Produces a multiply imputed matrix by applying the elementary functions iteratively to the variables with missingness in the data, randomly imputing each variable and looping through until approximate convergence.
## S4 method for signature 'data.frame':
mi( object, info, n.imp = 3, n.iter = 30,
R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap",
preprocess = TRUE, run.past.convergence = FALSE,
seed = NA, check.coef.convergence = FALSE,
add.noise = noise.control(), post.run = TRUE)
## S4 method for signature 'mi':
mi( object, info, n.iter = 30,
R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap",
run.past.convergence = FALSE, seed = NA)
object |
A data frame or an mi object that contains incomplete data. mi identifies NAs as the missing data. |
info |
The mi.info object. |
n.imp |
The number of multiple imputations. Default is 3 chains. |
n.iter |
The maximum number of imputation iterations. Default is 30 iterations. |
R.hat |
The value of the R.hat statistic used as a convergence criterion. Default is 1.1. |
max.minutes |
The maximum minutes to operate the whole imputation process. Default is 20 minutes. |
rand.imp.method |
The method for random imputation. Currently, mi implements only the bootstrap method. |
preprocess |
Default is TRUE. mi will transform the variables that are of nonnegative, positive-continuous, and proportion types. |
run.past.convergence |
Default is FALSE. If set to TRUE, mi will keep running until either n.iter or max.minutes is reached, even if the imputation has converged. |
seed |
The random number seed. |
check.coef.convergence |
Default is FALSE. If the value is set to be TRUE, mi will check the convergence of the coefficients of imputation models. |
add.noise |
A list of parameters for controlling the process of adding noise to mi via noise.control. |
post.run |
Default is TRUE. mi will run 20 more iterations after an imputation process is finished if and only if add.noise is not FALSE. This is to mitigate the influence of the noise on the whole imputation process. |
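The R.hat argument sets a threshold on a between-chain versus within-chain variance comparison. As a rough illustration of what such a criterion measures, the sketch below computes the classic Gelman-Rubin statistic for one scalar summary tracked across chains. The helper r_hat and the simulated inputs are hypothetical and are not part of the mi package; this page does not specify the exact computation mi performs internally.

```r
# Hypothetical helper, not part of the mi package: classic Gelman-Rubin
# potential scale reduction factor for one scalar summary.
# `draws` is an n.iter x n.chains matrix (rows = iterations, columns = chains).
r_hat <- function(draws) {
  n <- nrow(draws)                      # iterations per chain
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)             # between-chain variance
  W <- mean(apply(draws, 2, var))       # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled estimate of the target variance
  sqrt(var_plus / W)
}

set.seed(1)
# Three chains sampling from the same distribution: R-hat should be near 1.
mixed <- matrix(rnorm(30 * 3), nrow = 30, ncol = 3)
# Three chains centred at different values: R-hat should be far above 1.1.
stuck <- sweep(mixed, 2, c(0, 5, 10), "+")

r_hat(mixed) < 1.1   # chains agree, so the criterion would be met
r_hat(stuck) < 1.1   # chains disagree, so more iterations would be needed
```

With the default R.hat = 1.1, a value below the threshold is read as approximate convergence of the chains for that summary.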
Generate multiple imputations for incomplete data using iterative regression imputation. If the variables with missingness are a matrix Y with columns Y(1), . . . , Y(K) and the fully observed predictors are X, this entails first imputing all the missing Y values using some crude approach (for example, choosing imputed values for each variable by randomly selecting from the observed outcomes of that variable); and then imputing Y(1) given Y(2), . . . , Y(K) and X; imputing Y(2) given Y(1), Y(3), . . . , Y(K) and X (using the newly imputed values for Y(1)), and so forth, randomly imputing each variable and looping through until approximate convergence.
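The loop described above can be sketched in base R for a single chain. This is a minimal illustration under simplifying assumptions, not the code mi runs: it uses only linear models, two continuous variables with missingness (y1, y2) and one fully observed predictor (x), a fixed number of sweeps instead of a convergence check, and it ignores uncertainty in the regression coefficients. All object and function names are hypothetical.

```r
# Illustrative sketch only; names are hypothetical and not part of mi.
set.seed(42)
n  <- 200
x  <- rnorm(n)
y1 <- 1 + 2 * x + rnorm(n)
y2 <- -1 + 0.5 * x + 0.5 * y1 + rnorm(n)
dat <- data.frame(y1, y2, x)
dat$y1[sample(n, 40)] <- NA
dat$y2[sample(n, 40)] <- NA

miss <- lapply(dat[c("y1", "y2")], is.na)   # remember which cells were missing

# Step 1: crude start, fill each NA with a random draw from the observed values.
imp <- dat
for (v in names(miss)) {
  obs <- dat[[v]][!miss[[v]]]
  imp[[v]][miss[[v]]] <- sample(obs, sum(miss[[v]]), replace = TRUE)
}

# Step 2: re-impute one variable from a regression on all the others,
# using the most recent imputations of those other variables.
impute_once <- function(imp, target, miss) {
  fit <- lm(reformulate(setdiff(names(imp), target), response = target),
            data = imp[!miss[[target]], ])
  pred <- predict(fit, newdata = imp[miss[[target]], ])
  # Add residual noise so imputations are random draws, not fitted means.
  imp[[target]][miss[[target]]] <- pred + rnorm(length(pred), 0, sigma(fit))
  imp
}

# Step 3: loop through the variables repeatedly (fixed at 10 sweeps here).
for (iter in 1:10) {
  for (v in names(miss)) imp <- impute_once(imp, v, miss)
}

anyNA(imp)   # checks whether any missing values remain
```

Running several such chains from different random starts and comparing them is what makes a convergence check across chains possible.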
A list of objects of class mi, which stands for “multiple imputation”.
Each object is itself a list of 10 elements.
call |
The imputation model. |
data |
The original data frame. |
m |
The number of imputations. |
mi.info |
Information matrix of the mi. |
imp |
A list of length m of imputations. |
converged |
Binary variable to indicate if the mi has converged. |
coef.conv |
Binary variable to indicate whether the coefficients of the mi model have converged. Returns NULL if check.coef.convergence = FALSE. |
bugs |
BUGS array of the mean and sd of each iteration. |
preprocess |
Binary variable to indicate if preprocess=TRUE in the mi process |
mi.info.preprocessed |
Information matrix that is actually used in the mi if preprocess=TRUE. |
|
the specified models used for imputing missing values |
|
a list of vectors of length n-n.mis (number of complete observed data), specifying the estimated values of the models |
|
a list of vectors of length n.mis (number of NAs), specifying the random predicted values for imputing missing data |
Masanao Yajima yajima@stat.columbia.edu, Yu-Sung Su ys463@columbia.edu, M. Grazia Pittau grazia@stat.columbia.edu, Andrew Gelman gelman@stat.columbia.edu
Yu-Sung Su, Andrew Gelman, Jennifer Hill, Masanao Yajima. Forthcoming. “Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box”. Journal of Statistical Software.
Kobi Abayomi, Andrew Gelman and Marc Levy. (2008). “Diagnostics for multivariate imputations”. Applied Statistics 57, Part 3: 273–291.
Andrew Gelman and Jennifer Hill. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
mi.completed, mi.data.frame,
mi.continuous, mi.binary,
mi.count, mi.categorical,
mi.polr, typecast,
mi.info, mi.preprocess
# simulate fake data
set.seed(100)
n <- 100
u1 <- rbinom(n, 1, .5)
v1 <- log(rnorm(n, 5, 1))
x1 <- u1*exp(v1)
u2 <- rbinom(n, 1, .5)
v2 <- log(rnorm(n, 5, 1))
x2 <- u2*exp(v2)
x3 <- rbinom(n, 1, prob=0.45)
x4 <- ordered(rep(seq(1, 5),100)[sample(1:n, n)])
x5 <- rep(letters[1:10],10)[sample(1:n, n)]
x6 <- trunc(runif(n, 1, 10))
x7 <- rnorm(n)
x8 <- factor(rep(seq(1,10),10)[sample(1:n, n)])
x9 <- runif(n, 0.1, .99)
x10 <- rpois(n, 10)
y <- x1 + x2 + x7 + x9 + rnorm(n)
fakedata <- cbind.data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)

# randomly create missing values
dat <- mi:::.create.missing(fakedata, pct.mis=30)

# get information matrix of the data
inf <- mi.info(dat)

# update the variable type of a specific variable to mi.info
inf <- update(inf, "type", list(x10="count"))

# run the imputation
## this is for test only
IMP <- mi(dat, info=inf, n.iter=6, post.run=FALSE)

# no noise
# IMP <- mi(dat, info=inf, n.iter=6, add.noise=FALSE)

# pick up where you left off
# IMP <- mi(IMP) ## NOT RUN

## this is the suggested (default) way of running mi, NOT RUN
# IMP <- mi(dat, info=inf)

# convergence checking
converged(IMP) ## You should get FALSE here because n.iter is small
bugs.mi(IMP) ## BUGS object to look at the R hat statistics
plot(IMP@bugs) ## visually check R.hat

# visually check the imputation
plot(IMP)