svymle                package:survey                R Documentation

_M_a_x_i_m_u_m _p_s_e_u_d_o_l_i_k_e_l_i_h_o_o_d _e_s_t_i_m_a_t_i_o_n _i_n _c_o_m_p_l_e_x _s_u_r_v_e_y_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a user-specified likelihood parametrised by multiple linear
     predictors to data from a complex sample survey and computes the
     sandwich variance estimator of the coefficients. Note that this
     function maximises an estimated population likelihood, it is not
     the sample MLE.

_U_s_a_g_e:

     svymle(loglike, gradient = NULL, design, formulas, start = NULL, control
     = list(maxit=1000), na.action="na.fail", method=NULL, ...)
     ## S3 method for class 'svymle':
     summary(object, stderr=c("robust", "model"),...)

_A_r_g_u_m_e_n_t_s:

 loglike: vectorised loglikelihood function

gradient: Derivative of 'loglike'. Required for variance computation
          and helpful for fitting

  design: a 'survey.design' object 

formulas: A list of formulas specifying the variable and linear
          predictors: see Details below

   start: Starting values for parameters

 control: control options for 'optim'

na.action: Handling of 'NA's

  method: '"nlm"' to use 'nlm', otherwise passed to 'optim'

     ...: Arguments to 'loglike' and 'gradient' that are not to be
          optimised over.

  object: 'svymle' object

  stderr: Choice of standard error estimator. The default is a standard
          sandwich estimator. See Details below.

_D_e_t_a_i_l_s:

     Optimization is done by 'nlm' by default or if 'method=="nlm"'.
     Otherwise 'optim' is used and 'method' specifies the method and
     'control' specifies control parameters.

     The 'design' object contains all the data and design information
     from the survey, so all the formulas refer to variables in this
     object. The 'formulas' argument needs to specify the response
     variable and a linear predictor for each freely varying argument
     of 'loglike'.

     Consider for example the 'dnorm' function, with arguments 'x',
     'mean', 'sd' and 'log', and suppose we want to estimate the mean
     of 'y' as a linear function of a variable 'z', and to estimate a
     constant standard deviation.  The 'log' argument must be fixed at
     'FALSE' to get the loglikelihood.  A 'formulas' argument would be
     'list(~y, mean=~z, sd=~1)'. Note that the data variable 'y' must
     be the first argument to 'dnorm' and the first formula and that
     all the other formulas are labelled.  It is also permitted to have
     the data variable as the left-hand side of one of the formulas: eg
     'list( mean=y~z, sd=~1)'.

     The usual variance estimator for MLEs in a survey sample is a
     `sandwich' variance that requires the score vector and the
     information matrix. It requires only sampling assumptions to be
     valid (though some model assumptions are required for it to be
     useful). This is the 'stderr="robust"' option, which is available
     only when the 'gradient' argument was specified.

     If the model is correctly specified and the sampling is at random
     conditional on variables in the model then standard errors based
     on just the information matrix will be approximately valid.  In
     particular, for independent sampling where weights and strata
     depend on variables in the model the 'stderr="model"' should work
     fairly well.

_V_a_l_u_e:

     An object of class 'svymle'

_A_u_t_h_o_r(_s):

     Thomas Lumley

_S_e_e _A_l_s_o:

     'svydesign', 'svyglm'

_E_x_a_m_p_l_e_s:

      data(api)

      dstrat<-svydesign(id=~1, strata=~stype, weight=~pw, fpc=~fpc, data=apistrat)

      ## fit with glm
      m0 <- svyglm(api00~api99+ell,family="gaussian",design=dstrat)
      ## fit as mle (without gradient)
      m1 <- svymle(loglike=dnorm,gradient=NULL, design=dstrat, formulas=list(mean=api00~api99+ell, sd=~1),start=list(c(80,1,0),c(20)), log=TRUE)
      ## with gradient
      gr<- function(x,mean,sd,log){
              dm<-2*(x - mean)/(2*sd^2)
              ds<-(x-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi))
              cbind(dm,ds)
           }
      m2 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, formulas=list(mean=api00~api99+ell, sd=~1),
                    start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")

      summary(m0)
      summary(m1,stderr="model")
      summary(m2)

      ## More complicated censored data example
      ## showing that the response variable can be multivariate

      data(pbc, package="survival")
      pbc$randomized <- with(pbc, !is.na(trt) & trt>0)
      biasmodel<-glm(randomized~age*edema,data=pbc)
      pbc$randprob<-fitted(biasmodel)
      dpbc<-svydesign(id=~1, prob=~randprob, strata=~edema,
         data=subset(pbc,randomized))

      lcens<-function(x,mean,sd){
         ifelse(x[,2]==1,
                dnorm(log(x[,1]),mean,sd,log=TRUE),
                pnorm(log(x[,1]),mean,sd,log=TRUE,lower.tail=FALSE)
                )
      }

      gcens<- function(x,mean,sd){

             dz<- -dnorm(log(x[,1]),mean,sd)/pnorm(log(x[,1]),mean,sd,lower.tail=FALSE)

             dm<-ifelse(x[,2]==1,
                        2*(log(x[,1]) - mean)/(2*sd^2),
                        dz*-1/sd)
             ds<-ifelse(x[,2]==1,
                        (log(x[,1])-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi)),
                        ds<- dz*-(log(x[,1])-mean)/(sd*sd))
             cbind(dm,ds)      
      }

     if(!is.null(pbc$albumin)){
       svymle(loglike=lcens, gradient=gcens, design=dpbc,
           formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin,
                         sd=~1),
              start=list(c(10,0,0,0),c(1)))
     } else {
       svymle(loglike=lcens, gradient=gcens, design=dpbc,
           formulas=list(mean=I(cbind(time,status>0))~bili+protime+alb,
                         sd=~1),
              start=list(c(10,0,0,0),c(1)))
     }

