| att {cem} | R Documentation |
An example of ATT estimation from CEM output
att(obj, formula, data, model="linear", extrapolate=FALSE, ntree=2000)
## S3 method for class 'cem.att':
plot(x, obj, data, vars=NULL, ...)
| obj | a cem.match or cem.match.list object |
| formula | a model formula. See Details. |
| data | a single data.frame, or a list of data.frames in the case of cem.match.list |
| model | the model used to estimate the ATT; one of the options listed in Details. |
| extrapolate | extrapolate the CEM-restricted estimate to the whole data. Default = FALSE. |
| ntree | number of trees to generate in the random forest model. Default = 2000. |
| x | the output from the att function |
| vars | a vector of variable names to be used in the parallel plots. By default all variables involved in data matching are used. |
| ... | passed to the plot function. |
Argument model can be lm or linear for the linear regression
model; logit for the logistic model;
lme or linear-RE for the linear model with random effects;
and rf or forest for the random forest algorithm.
If the outcome is y and the
treatment variable is T, then a formula like y ~ T
will produce the simplest estimate of the ATT: with lm, it is just the
coefficient on T, which is the same as the difference in means,
weighted by CEM stratum size. Users can add covariates to span any
remaining imbalance after the match, such as y ~ T + age + sex,
to adjust for variables age and sex.
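The equivalence described above can be checked by hand in base R. The following is only a minimal sketch: the toy data frame d, its columns stratum, treated, y and w, and the weight formula (treated units get weight 1, controls in a stratum get the stratum's treated-to-control ratio rescaled by the overall control-to-treated ratio) are assumptions made for illustration, not output from cem or att.

```r
# Toy matched data: two strata, a binary treatment and an outcome
# (all names and values are hypothetical)
d <- data.frame(
  stratum = c(1, 1, 1, 2, 2, 2, 2),
  treated = c(1, 0, 0, 1, 1, 0, 0),
  y       = c(10, 7, 9, 20, 22, 15, 17)
)

# Assumed CEM-style weights: 1 for treated units; for controls,
# (treated in stratum / controls in stratum) * (all controls / all treated)
mT   <- sum(d$treated == 1)
mC   <- sum(d$treated == 0)
nT_s <- ave(d$treated, d$stratum, FUN = sum)
nC_s <- ave(1 - d$treated, d$stratum, FUN = sum)
d$w  <- ifelse(d$treated == 1, 1, (nT_s / nC_s) * (mC / mT))

# 1. Coefficient on the treatment indicator in a weighted linear model
fit    <- lm(y ~ treated, data = d, weights = w)
att_lm <- unname(coef(fit)["treated"])

# 2. Weighted difference in means between treated and control units
is_t   <- d$treated == 1
att_dm <- weighted.mean(d$y[is_t], d$w[is_t]) -
          weighted.mean(d$y[!is_t], d$w[!is_t])

# 3. Within-stratum differences, averaged by number of treated per stratum
diff_s <- sapply(split(d, d$stratum), function(s)
  mean(s$y[s$treated == 1]) - mean(s$y[s$treated == 0]))
nT     <- tapply(d$treated, d$stratum, sum)
att_strata <- sum(diff_s * nT) / sum(nT)

c(lm = att_lm, diff_means = att_dm, by_stratum = att_strata)
```

All three quantities agree on this toy data, which is why adding covariates to the formula only matters when some imbalance remains within strata.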
In the case of multiply imputed datasets, the model is applied to each matched data set separately, and the ATT and its standard error are estimated using the standard formulas for combining results of multiply imputed data.
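The "standard formulas" for multiply imputed data are usually taken to be Rubin's combining rules. The sketch below illustrates them in base R; the helper combine_mi and the numbers in est and se are hypothetical, and whether att follows exactly this computation internally is an assumption.

```r
# Hypothetical ATT estimates and standard errors from m = 5 imputed data sets
est <- c(1.20, 1.35, 1.10, 1.28, 1.22)
se  <- c(0.40, 0.42, 0.39, 0.41, 0.40)

# Rubin's rules (illustrative helper, not part of the cem package)
combine_mi <- function(est, se) {
  m       <- length(est)
  q_bar   <- mean(est)                     # pooled point estimate
  within  <- mean(se^2)                    # average within-imputation variance
  between <- var(est)                      # between-imputation variance
  total   <- within + (1 + 1 / m) * between
  c(estimate = q_bar, std.error = sqrt(total))
}

res <- combine_mi(est, se)
res
```

The pooled standard error is never smaller than the square root of the average within-imputation variance, because the between-imputation term adds the uncertainty due to the missing data.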
When extrapolate = TRUE, the estimated model is extrapolated
to the whole set of data.
There is a print method for the output of att. Specifying the
option TRUE in a print command gives complete output from the
estimated model when available.
A matrix of estimates with their standard error, or a list in
the case of cem.match.list.
Stefano Iacus, Gary King, and Giuseppe Porro
Stefano Iacus, Gary King, Giuseppe Porro, "Matching for Causal Inference Without Balance Checking," http://gking.harvard.edu/files/abs/cem-abs.shtml
data(LL)
# cem match: automatic bin choice
mat <- cem(treatment="treated",data=LL, drop="re78")
mat
mat$k2k
# ATT estimate
homo1 <- att(mat, re78~treated, data=LL)
rand1 <- att(mat, re78~treated, data=LL, model="linear-RE")
rf1 <- att(mat, re78~treated, data=LL, model="rf")
# same models, extrapolated to the whole data set
homo2 <- att(mat, re78~treated, data=LL, extrapolate=TRUE)
rand2 <- att(mat, re78~treated, data=LL, model="linear-RE", extrapolate=TRUE)
rf2 <- att(mat, re78~treated, data=LL, model="rf", extrapolate=TRUE)
homo1
rand1
rf1
homo2
rand2
rf2
plot( homo1, mat, LL, vars=c("age","education","re74","re75"))
plot( rand1, mat, LL, vars=c("age","education","re74","re75"))
plot( rf1, mat, LL, vars=c("age","education","re74","re75"))
plot( homo2, mat, LL, vars=c("age","education","re74","re75"))
plot( rand2, mat, LL, vars=c("age","education","re74","re75"))
plot( rf2, mat, LL, vars=c("age","education","re74","re75"))
# reduce the match into k2k using euclidean distance within cem strata
mat2 <- k2k(mat, LL, "euclidean", 1)
mat2
mat2$k2k
# ATT estimate after k2k
att(mat2, re78~treated, data=LL)
# example with missing data
# using multiply imputed data
# we use Amelia for multiple imputation
if(require(Amelia)){
data(LL)
n <- dim(LL)[1]
k <- dim(LL)[2]
# we generate missing values in 30% of the rows,
# randomly in one column per row
LL1 <- LL
idx <- sample(1:n, .3*n)
invisible(sapply(idx, function(x) LL1[x,sample(2:k,1)] <<- NA))
imputed <- amelia(LL1)$imputations[1:5]  # list of imputed data.frames
mat <- cem("treated", datalist=imputed, data=LL1, drop="re78")
print(mat)
att(mat, re78 ~ treated, data=imputed)
}