robcov                package:Design                R Documentation

_R_o_b_u_s_t _C_o_v_a_r_i_a_n_c_e _M_a_t_r_i_x _E_s_t_i_m_a_t_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Uses the Huber-White method to adjust the variance-covariance
     matrix of a fit from maximum likelihood or least squares, to
     correct for heteroscedasticity and for correlated responses from
     cluster samples. The method keeps the ordinary estimates of the
     regression coefficients and other model parameters, but corrects
     the covariance matrix for model misspecification and sampling
     design. Models currently supported are those that have a
     'residuals(fit, type="score")' method, such as 'lrm', 'cph',
     'coxph', and ordinary linear models ('ols'). For certain models
     the fit must have been created with the 'x=TRUE' and 'y=TRUE'
     options. Observations in different clusters are assumed to be
     independent. In the special case where every cluster contains one
     observation, the corrected covariance matrix returned is the
     "sandwich" estimator (see Lin and Wei). This is a consistent
     estimate of the covariance matrix even if the model is
     misspecified (e.g., heteroscedasticity, underdispersion, or wrong
     covariate form).
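     In its standard textbook form the correction can be written as
     below. The notation is ours and is an assumption: V is the
     model-based covariance matrix of the fit, u_i is the score vector
     (gradient of observation i's log-likelihood contribution at the
     estimates), and c indexes clusters. This is a sketch of the
     general Huber-White estimator; the scaling of score residuals
     inside 'robcov' may differ in detail.

```latex
\hat{V}_{\mathrm{rob}} = \hat{V}\Bigl(\sum_{c=1}^{C} U_c U_c^{\top}\Bigr)\hat{V},
\qquad U_c = \sum_{i \in c} u_i .

% Least squares: u_i = x_i e_i / \sigma^2 and \hat{V} = \sigma^2 (X^{\top}X)^{-1};
% with one observation per cluster the sigma^2 terms cancel, giving the sandwich
\hat{V}_{\mathrm{rob}} = (X^{\top}X)^{-1}\Bigl(\sum_{i=1}^{n} e_i^{2}\, x_i x_i^{\top}\Bigr)(X^{\top}X)^{-1}.
```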

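     For 'ols' fits, 'robcov' can instead compute the Efron estimator
     (estimator HC3 of Long and Ervin (2000)), which improves on the
     Huber-White estimator, especially in small samples, by adjusting
     for the natural heterogeneity of the residuals: even when the
     errors are homoscedastic, the ordinary residual for observation i
     has variance proportional to 1 - h_ii, where h_ii is its leverage.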
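     A minimal base-R sketch of the two estimators for an ordinary
     linear model follows. It assumes 'lm' in place of 'ols' so that it
     runs without Design, and it evaluates the textbook HC0
     (Huber-White) and HC3 (Long and Ervin) formulas directly; it is an
     illustration, not the code inside 'robcov'. The simulated data and
     the names 'V.hc0' and 'V.hc3' are hypothetical.

```r
# Sketch only: textbook HC0 and HC3 covariance estimators for lm(),
# not the implementation used by robcov(). Data are simulated.
set.seed(1)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.2 + 0.3 * x)  # error SD grows with x
fit <- lm(y ~ x)

X <- model.matrix(fit)          # n x p design matrix, including intercept
e <- residuals(fit)             # ordinary residuals
h <- hatvalues(fit)             # leverages h_ii
bread <- solve(crossprod(X))    # (X'X)^{-1}

# HC0: meat = sum_i e_i^2 x_i x_i' (row i of X scaled by e_i)
V.hc0 <- bread %*% crossprod(X * e) %*% bread
# HC3: each squared residual inflated by 1 / (1 - h_ii)^2
V.hc3 <- bread %*% crossprod(X * (e / (1 - h))) %*% bread

# Compare model-based, HC0, and HC3 standard errors
cbind(model = sqrt(diag(vcov(fit))),
      HC0   = sqrt(diag(V.hc0)),
      HC3   = sqrt(diag(V.hc3)))
```

     Because 1/(1 - h_ii)^2 >= 1, the HC3 variances are never smaller
     than the HC0 ones; the difference matters most when some
     observations have high leverage.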

_U_s_a_g_e:

     robcov(fit, cluster, method=c('huber','efron'))

_A_r_g_u_m_e_n_t_s:

     fit: a fit object from the 'Design()' series of modeling functions.

 cluster: a variable indicating groupings. 'cluster' may be any type of
          vector (factor, character, integer). NAs are not allowed.
          Unique values of 'cluster' indicate possibly correlated
          groupings of observations. Note that observations containing
          missing values may have been deleted from the data used in
          the fit and stored in 'fit$x' and 'fit$y'. If any NAs were
          removed during the original model fitting, it is assumed
          that an 'naresid' function exists to restore them, so that
          the rows of the score matrix line up with 'cluster'. If
          'cluster' is omitted, it defaults to the integers 1,2,...,n,
          which yields the "sandwich" robust covariance matrix
          estimate.

  method: can be set to '"efron"' for 'ols' fits only. The default,
          '"huber"', gives the Huber-White estimator of the covariance
          matrix.
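     The role of 'cluster' can be sketched in base R: per-observation
     score contributions are summed within each cluster before the
     "meat" of the sandwich is formed. This sketch assumes an 'lm' fit
     with no missing values (so no 'naresid' step is needed); the
     function name 'cluster_sandwich' and the simulated data are
     hypothetical, and this is not the code inside 'robcov'.

```r
# Hypothetical helper: cluster-robust sandwich covariance for lm().
# Scores x_i * e_i are summed within clusters via rowsum(); assumes
# no NAs, so rows of the score matrix line up with 'cluster'.
cluster_sandwich <- function(fit, cluster) {
  X <- model.matrix(fit)
  U <- X * residuals(fit)              # n x p matrix of per-observation scores
  Uc <- rowsum(U, cluster)             # one row of summed scores per cluster
  bread <- solve(crossprod(X))         # (X'X)^{-1}
  bread %*% crossprod(Uc) %*% bread
}

set.seed(2)
id <- rep(1:20, each = 5)              # 20 clusters of 5 observations
u  <- rnorm(20)[id]                    # effect shared within each cluster
x  <- rnorm(100)
y  <- 2 + x + u + rnorm(100)
fit <- lm(y ~ x)

V.clust <- cluster_sandwich(fit, id)
V.indep <- cluster_sandwich(fit, seq_along(id))  # one obs per cluster: HC0
# Per-coefficient design effects on the standard-error scale
sqrt(diag(V.clust)) / sqrt(diag(V.indep))
```

     Passing 'seq_along(id)' mirrors the default 'cluster' of
     1,2,...,n: every cluster has one observation and the result is
     the ordinary sandwich estimate.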

_V_a_l_u_e:

     a new fit object with the same class as the original fit, with
     the element 'orig.var' (the covariance matrix of the original
     fit) added and the original 'var' component replaced by the new
     Huberized estimates.

_W_a_r_n_i_n_g_s:

     'print.ols' does not print the corrected standard errors of
     adjusted 'ols' fits. Use 'sqrt(diag(adjfit$var))' to obtain them,
     where 'adjfit' is the result of 'robcov'.
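     The Value and Warnings conventions can be illustrated with a plain
     list standing in for the object 'robcov' returns. The list
     'adjfit' below is hypothetical: it mimics only the 'var' and
     'orig.var' components, assumes an 'lm' fit with the HC0 sandwich
     as the robust estimate, and is not a Design fit object.

```r
# Hypothetical stand-in for a robcov() result: 'var' holds the robust
# (HC0 sandwich) covariance, 'orig.var' the model-based one.
set.seed(3)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50, sd = abs(x) + 0.5)   # heteroscedastic errors
fit <- lm(y ~ x)

X <- model.matrix(fit)
bread <- solve(crossprod(X))
adjfit <- list(coefficients = coef(fit),
               orig.var     = vcov(fit),
               var          = bread %*% crossprod(X * residuals(fit)) %*% bread)

se.orig <- sqrt(diag(adjfit$orig.var))      # model-based standard errors
se.rob  <- sqrt(diag(adjfit$var))           # as suggested under Warnings
z <- adjfit$coefficients / se.rob           # robust Wald z statistics
cbind(se.orig, se.rob, z, p = 2 * pnorm(-abs(z)))
```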

_A_u_t_h_o_r(_s):

     Frank Harrell
      Department of Biostatistics
      Vanderbilt University
      f.harrell@vanderbilt.edu

_R_e_f_e_r_e_n_c_e_s:

     Huber, PJ. Proc Fifth Berkeley Symposium Math Stat 1:221-33, 1967.

     White, H. Econometrica 50:1-25, 1982.

     Lin, DY, Wei, LJ. JASA 84:1074-8, 1989.

     Rogers, W. Stata Technical Bulletin STB-8:15-17, 1992.

     Rogers, W. Stata Release 3 Manual, 'deff', 'loneway', 'huber',
     'hreg', 'hlogit' functions.

     Long, JS, Ervin, LH. The American Statistician 54:217-224, 2000.

_S_e_e _A_l_s_o:

     'bootcov', 'naresid', 'residuals.cph'

_E_x_a_m_p_l_e_s:

     # A dataset contains a variable number of observations per subject,
     # and all observations are laid out in separate rows. The responses
     # represent whether or not a given segment of the coronary arteries
     # is occluded. Segments of arteries may not operate independently
     # in the same patient.  We assume a "working independence model" to
     # get estimates of the coefficients, i.e., that estimates assuming
     # independence are reasonably efficient. The job is then to get
     # consistent estimates of the variances and covariances of these
     # estimates.

     set.seed(1)
     n.subjects <- 30
     ages <- rnorm(n.subjects, 50, 15)
     sexes  <- factor(sample(c('female','male'), n.subjects, TRUE))
     logit <- (ages-50)/5
     prob <- plogis(logit)  # true prob not related to sex
     id <- sample(1:n.subjects, 300, TRUE) # subjects sampled multiple times
     table(table(id))  # frequencies of number of obs/subject
     age <- ages[id]
     sex <- sexes[id]
     # In truth, observations within subject are independent:
     y   <- ifelse(runif(300) <= prob[id], 1, 0)
     f <- lrm(y ~ lsp(age,50)*sex, x=TRUE, y=TRUE)
     g <- robcov(f, id)
     diag(g$var)/diag(f$var)
     # (bootcov only: add group=w to resample within each level of w)
     anova(g)            # cluster-adjusted Wald statistics
     # fastbw(g)         # cluster-adjusted backward elimination
     plot(g, age=30:70, sex='female')  # cluster-adjusted confidence bands

     # Get design effects based on inflation of the variances when compared
     # with sandwich estimates that ignore clustering
     g2 <- robcov(f)
     diag(g$var)/diag(g2$var)

     # Get design effects based on pooled tests of factors in model
     anova(g2)[,1] / anova(g)[,1]



     # A dataset contains one observation per subject, but there may be
     # heteroscedasticity or other model misspecification. Obtain
     # the robust sandwich estimator of the covariance matrix.

     # f <- ols(y ~ pol(age,3), x=TRUE, y=TRUE)
     # f.adj <- robcov(f)

