estweights              package:survey              R Documentation

_E_s_t_i_m_a_t_e_d _w_e_i_g_h_t_s _f_o_r _m_i_s_s_i_n_g _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     Creates or adjusts a two-phase survey design object using a
     logistic regression model for second-phase sampling probability. 
     This function should be particularly useful in reweighting to
     account for missing data.

_U_s_a_g_e:

     estWeights(data,formula,...)
     ## S3 method for class 'twophase':
     estWeights(data,formula=NULL, working.model=NULL,...)
     ## S3 method for class 'data.frame':
     estWeights(data,formula=NULL, working.model=NULL,
           subset=NULL, strata=NULL,...)

_A_r_g_u_m_e_n_t_s:

    data: twophase design object or data frame

 formula: Predictors for estimating weights

working.model: Model fitted to the full (i.e. phase 1) data

  subset: Subset of the data frame with complete data (i.e. the phase 2
          sample). If 'NULL', use all complete cases

  strata: Stratification (if any) of phase 2 sampling

     ...: for future expansion

_D_e_t_a_i_l_s:

     If 'data' is a data frame, 'estWeights' first creates a two-phase
     design object. The 'strata' argument is used only to compute
     finite population corrections; the same variables must also be
     included in 'formula' to compute stratified sampling
     probabilities.

     With a two-phase design object, 'estWeights' estimates the
     sampling probabilities using logistic regression as described by
     Robins et al. (1994) and adds information to the object to enable
     correct sandwich standard errors to be computed.
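
     The idea of estimating sampling probabilities by logistic
     regression can be sketched in base R. This is an illustration
     only, not the code used inside 'estWeights': the names 'observed',
     'pmodel', 'phat', 'ipw' and 'wfit' are hypothetical, and 'lm' is
     used here just to show where the weights enter.

```r
# Illustration only: inverse-probability weighting in base R,
# not the internals of estWeights().
data(airquality)

# Phase-2 indicator: TRUE where Ozone was observed.
observed <- !is.na(airquality$Ozone)

# Logistic regression for the probability of being observed,
# fitted to all rows (Temp, Wind and Month are never missing).
pmodel <- glm(observed ~ Temp + Wind + Month, data = airquality,
              family = binomial())

# Estimated sampling probabilities, and the corresponding
# inverse-probability weights for the complete cases.
phat <- fitted(pmodel)
ipw  <- 1 / phat[observed]

# Weighted fit of the model of interest on the complete cases.
wfit <- lm(log(Ozone) ~ Temp + Wind, data = airquality[observed, ],
           weights = ipw)
coef(wfit)
```

     The standard errors reported by 'lm' in this sketch treat the
     weights as known, which is why the design object carries the
     extra information needed for sandwich standard errors.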

     An alternative to specifying 'formula' is to specify
     'working.model'. The estimating functions from this model will be
     used as predictors of the sampling probabilities, which will
     increase efficiency to the extent that the working model and the
     model of interest estimate the same parameters (Kulich & Lin,
     2004).
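
     What "estimating functions as predictors" means can be sketched in
     base R for a Gaussian working model. The working model below is a
     hypothetical choice made only to show the mechanics (airquality
     has no surrogate for Ozone, so no efficiency gain should be
     expected), and the code is not taken from the package.

```r
# Illustration only: estimating functions of a working model used
# as predictors of the probability of being observed.
data(airquality)
observed <- !is.na(airquality$Ozone)

# Hypothetical working model, fitted to all rows (phase 1). It uses
# only variables that are never missing in airquality.
working <- glm(Temp ~ Wind + Month, data = airquality)

# Estimating functions of a Gaussian glm, up to a constant scale:
# each row of the model matrix times that row's residual.
efun <- model.matrix(working) * residuals(working, type = "response")

# Logistic regression of the phase-2 indicator on the estimating
# functions, then inverse-probability weights for complete cases.
pmodel <- glm(observed ~ efun, family = binomial())
ipw <- 1 / fitted(pmodel)[observed]
```

     Each column of 'efun' sums to zero over the phase 1 data, because
     those sums are the score equations that the working model solves.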

     The effect on a two-phase design object is very similar to
     'calibrate', and is identical when 'formula' specifies a saturated
     model.
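
     One way to see why a saturated model behaves like calibration: with
     a single categorical predictor, the fitted probabilities are the
     observed fractions in each cell, so the inverse-probability
     weights are the post-stratification ratios. A base-R sketch, with
     hypothetical names and no use of the package itself:

```r
# Illustration only: a saturated logistic model reproduces
# post-stratification weights.
data(airquality)
observed <- !is.na(airquality$Ozone)

# Saturated model in one categorical predictor.
pmodel <- glm(observed ~ factor(Month), data = airquality,
              family = binomial())
ipw <- 1 / fitted(pmodel)

# Post-stratification ratio for each row:
# (rows in the month) / (rows in the month with Ozone observed).
n_all <- table(airquality$Month)
n_obs <- table(airquality$Month[observed])
ratio <- as.numeric(n_all / n_obs)
psw   <- ratio[match(airquality$Month, names(n_all))]

# The two sets of weights agree up to the glm convergence tolerance.
all.equal(unname(ipw), psw, tolerance = 1e-6)
```

     With these weights, the weighted count of complete cases in each
     month equals the number of phase 1 rows in that month.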

_V_a_l_u_e:

     A two-phase survey design object.

_R_e_f_e_r_e_n_c_e_s:

     Robins JM, Rotnitzky A, Zhao LP (1994). Estimation of regression
     coefficients when some regressors are not always observed. Journal
     of the American Statistical Association, 89, 846-866.

     Kulich M, Lin DY (2004). Improving the Efficiency of Relative-Risk
     Estimation in Case-Cohort Studies. Journal of the American
     Statistical Association, 99, 832-844.

_S_e_e _A_l_s_o:

     'postStratify', 'calibrate', 'twophase'

_E_x_a_m_p_l_e_s:

     library(survey)
     data(airquality)

     ## ignoring missingness, using model-based standard error
     summary(lm(log(Ozone)~Temp+Wind, data=airquality))

     ## Without covariates to predict missingness we get the same
     ## point estimates, but different (sandwich) standard errors
     daq<-estWeights(airquality, formula=~1,subset=~I(!is.na(Ozone)))
     summary(svyglm(log(Ozone)~Temp+Wind,design=daq))

     ## Reweighting based on weather, month
     d2aq<-estWeights(airquality, formula=~Temp+Wind+Month,
                      subset=~I(!is.na(Ozone)))
     summary(svyglm(log(Ozone)~Temp+Wind,design=d2aq))

