| ps {twang} | R Documentation |
ps calculates propensity scores and diagnoses them using
a variety of methods, centered on boosted logistic regression as
implemented in gbm
ps(formula = formula(data), data, sampw = rep(1, nrow(data)), title = NULL,
   stop.method = stop.methods[1:2], plots = "all", pdf.plots = FALSE,
   n.trees = 10000, interaction.depth = 3, shrinkage = 0.01,
   perm.test.iters = 0, print.level = 2, iterlim = 1000, verbose = TRUE)
formula |
a formula for the propensity score model with the treatment indicator on the left side of the formula and the potential confounding variables on the right side. |
title |
a short text title, used in plots and in the names of saved files |
data |
the dataset, including the treatment assignment as well as the covariates |
sampw |
optional sampling weights |
stop.method |
a stop.methods object, or a list of such
objects, containing the metrics and rules for evaluating
the quality of the propensity scores |
plots |
a character vector indicating which plots to create. The options
are "all" (the default), "optimize", "ps boxplot", "weight histogram",
"t pvalues", "ks pvalues", and "es". Any other value (such as "none") will
produce no plots. See the help for diag.plot for details
on the plotted figures |
pdf.plots |
if TRUE then all plots are dumped to a pdf file with
the name specified in title |
n.trees |
number of gbm iterations passed on to gbm |
interaction.depth |
interaction.depth passed on to
gbm |
shrinkage |
shrinkage passed on to gbm |
perm.test.iters |
a non-negative integer giving the number of iterations
of the permutation test for the KS statistic. If perm.test.iters=0
then the function returns an analytic approximation to the p-value. Setting
perm.test.iters=200 will yield precision to within 3% if the true
p-value is 0.05. Use perm.test.iters=500 to be within 2% |
print.level |
the amount of detail to print to the screen |
iterlim |
maximum number of iterations for the direct optimization |
verbose |
if TRUE, detailed information will be printed to monitor the progress of the fitting |
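The precision figures quoted for perm.test.iters are consistent with a simple binomial approximation: if the true p-value is p and the permutation test runs n iterations, the Monte Carlo standard error of the estimated p-value is about sqrt(p(1-p)/n), and two standard errors gives the quoted margin. This is an assumption about where the figures come from, not something this page states, and the helper name mc_margin is hypothetical. A minimal sketch:

```r
# Approximate margin of error (two standard errors) for a permutation
# p-value estimated from n_iter random permutations.
# Assumption: the estimate behaves like a binomial proportion.
mc_margin <- function(p, n_iter) {
  2 * sqrt(p * (1 - p) / n_iter)
}

margin_200 <- mc_margin(0.05, 200)  # roughly 0.03
margin_500 <- mc_margin(0.05, 500)  # roughly 0.02
print(round(c(iters_200 = margin_200, iters_500 = margin_500), 3))
```

Under this approximation the margin shrinks with the square root of the iteration count, so halving it takes about four times as many iterations.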
formula should be something like "treatment ~ X1 + X2 + X3". The
treatment variable should be a 0/1 indicator. There is no need to specify
interaction terms in the formula. interaction.depth controls the level
of interactions to allow in the propensity score model.
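As an illustration of the formula requirements, the following base R sketch recodes a labelled treatment variable into a 0/1 indicator and builds a main-effects formula from a vector of covariate names. The data frame dat and its columns are hypothetical, and no call to ps is made here.

```r
# Hypothetical data: treatment stored as a labelled factor.
dat <- data.frame(
  group = factor(c("treated", "control", "control", "treated")),
  X1 = c(23, 35, 41, 29),
  X2 = c(1, 0, 0, 1),
  X3 = c(10.5, 8.2, 9.9, 12.1)
)

# Recode the treatment as the 0/1 indicator the formula expects.
dat$treatment <- as.integer(dat$group == "treated")

# Build treatment ~ X1 + X2 + X3 from a vector of covariate names.
# Only main effects are listed; no interaction terms are written out.
covariates <- c("X1", "X2", "X3")
ps_formula <- reformulate(covariates, response = "treatment")
print(ps_formula)
```

The resulting ps_formula and dat could then be supplied as the formula and data arguments.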
If pdf.plots=TRUE then ps causes plots to be saved as a single
pdf file with the name "[title].pdf" in the working directory. See
diag.plot for details of the plots.
Returns an object of class ps, a list containing
gbm.obj |
The returned gbm object |
ps |
a data frame containing the estimated propensity scores. Each
column is associated with one of the methods selected in
stop.methods |
w |
a data frame containing the propensity score weights. Each
column is associated with one of the methods selected in
stop.methods. If sampling weights were given then these are
incorporated into these weights |
plot.info |
a list containing the raw data used to generate the plots |
desc |
a list containing balance tables for each method selected in
stop.methods. Includes a component for the unweighted
analysis, named “unw”. Each desc component includes
a list with the following components
|
datestamp |
Records the date of the analysis |
parameters |
Saves the ps call |
alerts |
Text containing any warnings accumulated during the estimation |
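The relationship between the ps and w components can be sketched in base R. The examples on this page rebuild the weights as ps/(1-ps) for control cases and 1 for treated cases; the sketch below does the same for a single column. All input values are hypothetical, and treating the sampling weights as a multiplicative factor is an assumption rather than something this page states.

```r
# Hypothetical inputs: estimated propensity scores, treatment indicator,
# and sampling weights for five cases.
p     <- c(0.80, 0.25, 0.50, 0.10, 0.60)
treat <- c(1, 0, 0, 0, 1)
sampw <- c(1, 2, 1, 1, 3)

# Controls get the odds of treatment; treated cases get weight 1.
att_w <- ifelse(treat == 1, 1, p / (1 - p))

# Assumption: sampling weights enter multiplicatively.
final_w <- att_w * sampw
print(data.frame(p, treat, sampw, att_w, final_w))
```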
Greg Ridgeway gregr@rand.org, Dan McCaffrey danielm@rand.org, Andrew Morral morral@rand.org
D. McCaffrey, G. Ridgeway, A. Morral (2004). “Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment,” Psychological Methods 9(4):403-425.
data(lalonde)
print(nrow(lalonde))
ps.lalonde <- ps(treat ~ age + educ + black + hispan + nodegree +
                   married + re74 + re75,
                 data = lalonde,
                 title = "Lalonde example",
                 stop.method = stop.methods[c("ks.stat.mean", "ks.stat.max")],
                 # generate plots?
                 plots = "all",
                 pdf.plots = FALSE,
                 # gbm options
                 n.trees = 2000,
                 interaction.depth = 3,
                 shrinkage = 0.005,
                 perm.test.iters = 0,
                 verbose = TRUE)
# get the balance tables
bal.table(ps.lalonde)
# diagnose the weights using a ps object
a <- dx.wts(ps.lalonde,data=lalonde,treat.var="treat")
print(a)
bal.table(a)
# diagnose the weights as propensity score weights
# will be the same as before, except for MC variation in the KS p-values
# when perm.test.iters is greater than 0
w <- with(ps.lalonde, ps/(1-ps))
w[lalonde$treat==1,] <- 1
dx.wts(w, data = lalonde, treat.var = "treat",
       perm.test.iters = 0)
# diagnose the weights as propensity scores
p <- ps.lalonde$ps
dx.wts(p,data=lalonde,treat.var="treat",x.as.weights=FALSE)
# look at propensity scores
names(ps.lalonde$ps)
hist(ps.lalonde$ps$ks.stat.max)
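# split by the treatment indicator in the data; the returned components
# listed above do not include a treat element
boxplot(split(ps.lalonde$ps$ks.stat.max, lalonde$treat),
        ylab = "estimated propensity scores",
        names = c("control", "treatment"))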
# check out the balance
names(ps.lalonde$desc)
# unweighted
ps.lalonde$desc$unw
# optimized for ks.stat.max
ps.lalonde$desc$ks.stat.max
# check out the gbm object, indicates which variables are most influential in
# estimating the propensity score
summary(ps.lalonde$gbm.obj, n.trees=ps.lalonde$desc$ks.stat.max$n.trees)
# bal.stat() can use an arbitrary set of weights
bal.stat(data = lalonde,
         w.all = w[, 1],
         vars = names(lalonde),
         treat.var = "treat",
         get.means = TRUE,
         get.ks = TRUE,
         na.action = "level")
# sensitivity analysis
sensitivity(ps.lalonde,lalonde,"re78")
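The ks.stat.mean and ks.stat.max stopping rules used in the examples above summarise Kolmogorov-Smirnov statistics across covariates. As a rough illustration of the quantity involved, the base R sketch below computes a weighted KS statistic for one covariate as the largest gap between the weighted empirical distribution functions of the treated and control groups. This is one common definition, not necessarily how the package computes it; the function weighted_ks and the toy data are hypothetical.

```r
# Weighted KS statistic between treated (1) and control (0) groups
# for a single covariate x, using case weights w.
weighted_ks <- function(x, treat, w = rep(1, length(x))) {
  grid <- sort(unique(x))
  # Weighted empirical CDF of xs evaluated at each grid point.
  ecdf_w <- function(xs, ws) {
    vapply(grid, function(g) sum(ws[xs <= g]) / sum(ws), numeric(1))
  }
  f1 <- ecdf_w(x[treat == 1], w[treat == 1])
  f0 <- ecdf_w(x[treat == 0], w[treat == 0])
  max(abs(f1 - f0))
}

# Toy data: the groups do not overlap at all, so the statistic is 1.
ks_separated <- weighted_ks(c(1, 2, 3, 4), c(1, 1, 0, 0))

# Same values in both groups; weights on the controls shift their CDF.
ks_equal    <- weighted_ks(c(1, 2, 1, 2), c(1, 1, 0, 0))
ks_weighted <- weighted_ks(c(1, 2, 1, 2), c(1, 1, 0, 0), w = c(1, 1, 3, 1))
print(c(ks_separated, ks_equal, ks_weighted))
```

A value near 0 indicates that the weighted covariate distributions are similar across the two groups.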