drm                   package:drm                   R Documentation

_C_o_m_b_i_n_e_d _r_e_g_r_e_s_s_i_o_n _a_n_d _a_s_s_o_c_i_a_t_i_o_n _m_o_d_e_l_s _f_o_r _c_l_u_s_t_e_r_e_d _c_a_t_e_g_o_r_i_c_a_l _r_e_s_p_o_n_s_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     'drm' fits a combined regression and association model for
     longitudinal or otherwise clustered categorical responses, using
     the dependence ratio as a measure of association.

_U_s_a_g_e:

     drm(formula, family=binomial, data=sys.parent(), weights, offset,
     subset=NULL, na.action, start=NULL, link="cum", dep="I", Ncond=TRUE,
     Lclass=2, dropout=FALSE, drop.x=NULL, save.profiles=TRUE, pmatrix=NULL,
     print.level=2, iterlim=200, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula expression as for other regression models. In
          addition, the cluster term has to be specified in the
          expression by 'cluster()' and, if a temporal association
          structure is used, the temporal term has to be specified by
          'Time()'. See examples below and the documentation of 'lm'
          and 'formula' for further details.

  family: a description of the link function to be used in the model
          for a binary response. The default is the logit link. See
          'family' for details. For an ordinal response, the link is
          defined for the cumulative probabilities when the
          'link'-argument is set to "cum". See 'link' below.

    data: an optional data frame containing the variables in the
          model.

 weights: an optional vector of weights to be used in the fitting
          process. Only equal weights within a cluster are allowed.

  offset: this can be used to specify an a priori known component to be
          included in the linear predictor during fitting.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

na.action: a function which indicates what should happen when the data
          contain NAs. The default is 'na.include', after which the
          analysis assumes that the missing data mechanism is missing
          at random (MAR) if 'dropout=FALSE', and missing not at
          random if 'dropout=TRUE'. See 'dropout' below.

   start: an optional vector of starting values for the parameters. By
          default, the starting values are estimated from the
          'glm'-procedure assuming independence.

    link: this can be used to specify alternative link functions for
          nominal and ordinal responses. The default is "cum", in
          which case the link for the cumulative probabilities is
          specified through 'family = binomial(link=?)'. Alternative
          links include adjacent category logit "acl" and baseline
          category logit "bcl" (the baseline category being the last
          category). For "bcl", the regression parameters are estimated
          for each logit level. For a binary response, this argument is
          ignored.

     dep: 'dep' defines the association structure. The default is
          independence "I". Other single options are, for the
          exchangeable association: Necessary factor "N", Latent
          categorical factor "L", Latent Beta-distributed propensity
          "B" (binary response), Latent Dirichlet-distributed
          propensities "D" (multicategorical response), and for the
          temporal association: first order Markov "M" and second
          order Markov "M2" (binary response). By default, the Markov
          structure for the adjacent 2-way dependence ratios is assumed
          to be stationary. Superpositions of these structures can be
          imposed, such as "NL", "NB", "ND", "NM", "LM", "NLM", "NM2".
          See [3-7] for further details. Parameter restrictions,
          covariates and functional forms for the association
          parameters can also be specified. In that case the
          'dep'-argument must be a list. See examples below. For the
          interpretation of the association parameters, see the
          documentation of the support function 'getass.drm'.

   Ncond: logical argument defining whether the regression model is
          marginal or conditional when the association is "N". The
          default is 'TRUE',  i.e. the regression estimation is
          conditional on {N=1}. If covariates are used for the
          "N"-association, it is advisable to set 'Ncond=FALSE', since
          otherwise the interpretation of the regression parameters is
          less clear.

  Lclass: number of latent classes in the population when the
          association is "L". The default is 2. Available only for a
          binary response. Note that in the current implementation, the
          conditional probabilities are not calculated for 'Lclass'>2.
          To check the validity of the model, the user needs to check
          whether the estimated conditional probabilities fall between
          0 and 1. See the example in 'getass.drm' for parameter
          interpretation and how to calculate the conditional
          probabilities.

 dropout: logical argument. For monotone missing patterns in
          longitudinal studies, this argument allows a selection model
          (see [8] for details) to be imposed on top of the regression
          and association model, to investigate the sensitivity of the
          results to missingness. The model formula notation is:
          'logit(hz(drop.cur)) =
          (Intercept)d+response.cur+response.prev', where
          'response.cur' denotes the effect of the current, possibly
          missing response value and 'response.prev' denotes the effect
          of the previous response value. MCAR, MAR and MNAR models can
          be specified by imposing restrictions on the selection model
          parameters in the 'dep'-argument, as for the association
          parameters. See 'dep' above and examples below. If the
          response is a factor, the effects of the factor levels are
          estimated as contrasts to the lowest level.

  drop.x: an optional covariate vector for the selection model. The
          covariate's previous value (notation: 'covariate.prev') is
          used in the selection model.

save.profiles: logical argument defining whether the fitted values of
          all possible profiles are saved. If 'FALSE', only the
          indicator vector (-1 for a negative, 1 for a positive
          profile) over all units will be saved. If the cluster size is
          large, using 'save.profiles=TRUE' may result in a very large
          object.

 pmatrix: a character object specifying the name of the matrix for all
          possible profiles, created using 'profiles.drm'. If the
          cluster size is large, this speeds up the estimation in case
          several models are fitted. See examples below.

print.level: level of printing during numerical optimisation. The
          default is 2. See 'nlm' for further details.

 iterlim: maximum number of iterations for the numerical maximisation.
          See 'nlm' for further details.

     ...: other arguments passed to 'nlm', e.g. controlling the
          convergence. See 'nlm' for further details.
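
     The selection model described under 'dropout' can be illustrated
     with a small base R sketch. It does not call 'drm'; the
     coefficient values and the helper 'dropout_hazard' are
     hypothetical, and a binary response coded 0/1 is assumed. It only
     shows how restricting the 'response.cur' and 'response.prev'
     effects to zero corresponds to MAR and MCAR.

```r
# Hypothetical selection-model coefficients (illustrative values only).
b0     <- -4    # plays the role of (Intercept)d
b_cur  <- 0.5   # effect of the current, possibly missing response
b_prev <- 1.0   # effect of the previous response

# Dropout hazard on the probability scale for a binary response coded 0/1:
# logit(hz) = b0 + b_cur * cur + b_prev * prev
dropout_hazard <- function(cur, prev, b0, b_cur, b_prev) {
  plogis(b0 + b_cur * cur + b_prev * prev)
}

# All combinations of current and previous response.
grid <- expand.grid(cur = 0:1, prev = 0:1)

# MNAR: both response effects are free.
grid$mnar <- dropout_hazard(grid$cur, grid$prev, b0, b_cur, b_prev)
# MAR: the effect of the current response is restricted to zero.
grid$mar  <- dropout_hazard(grid$cur, grid$prev, b0, 0, b_prev)
# MCAR: both response effects are restricted to zero.
grid$mcar <- dropout_hazard(grid$cur, grid$prev, b0, 0, 0)

print(grid)
```

     Under MCAR the hazard is constant, under MAR it varies only with
     the previous response, and under MNAR it also depends on the
     current, possibly unobserved response.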

_D_e_t_a_i_l_s:

     'drm' gives maximum likelihood estimates for the combined
     regression and association model by decomposing the joint
     probability of the responses in a cluster into univariate marginal
     or cumulative probabilities and dependence ratios of all orders.
     See [1] and [5] for further details. The dimensionality of the
     association part is reduced by imposing a model for the
     association structure with the 'dep'-argument. See 'getass.drm'
     and [3-7] for details. Furthermore, a selection model can be added
     on top of the regression and association model. See examples below
     and [5] and [8] for details.
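
     As a rough illustration of the quantity involved, the second
     order dependence ratio of two binary responses is the joint
     success probability divided by the product of the marginal
     success probabilities. The base R sketch below computes an
     empirical version from made-up data; the vectors 'y1' and 'y2'
     are hypothetical and the code does not use the package.

```r
# Hypothetical paired binary responses for 8 clusters (made-up data).
y1 <- c(1, 1, 0, 0, 1, 0, 1, 0)
y2 <- c(1, 1, 0, 0, 0, 0, 1, 1)

p1  <- mean(y1)                  # marginal success probability of y1
p2  <- mean(y2)                  # marginal success probability of y2
p12 <- mean(y1 == 1 & y2 == 1)   # joint success probability

# Second order dependence ratio: joint probability relative to the
# value it would take under independence.
tau12 <- p12 / (p1 * p2)
print(tau12)
```

     A value of 1 corresponds to independence; values above 1 indicate
     that joint successes are more common than independence would
     imply.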

_V_a_l_u_e:

     'drm' returns an object of class 'drm'. The function 'summary'
     (i.e., 'summary.drm') can be used to obtain or print a summary of
     the results. The generic accessor function 'coefficients' can be
     used to extract coefficients.

     An object of class 'drm' is a list containing at least the
     following components:

coefficients: a named vector of regression, and possibly association
          and selection model coefficients.

cov.scaled: a variance-covariance matrix of the parameter estimates.

fitted.marginals: the fitted values for the univariate means, obtained
          by transforming the linear predictors by the inverse of the
          link function.

fitted.conditionals: in case of "L"-structure, the fitted values for
          the conditional univariate means, otherwise NULL. Not yet
          implemented for 'Lclass'>2; see also 'getass.drm'.

fitted.profiles: the fitted response profile probabilities within each
          cluster, calculated by using the maximum likelihood estimates
          from the model. See also 'save.profiles' above. Note that
          within each cluster, the order of the responses is by Time
          for Markov structures, and for exchangeable structures with
          missing values, by response value, with missing values (NA)
          last.

deviance: minus twice the maximised log-likelihood.

     aic: An Information Criterion: minus twice the maximised
          log-likelihood plus twice the number of coefficients. Not
          available if the likelihood is weighted with the dropout
          probabilities.

   niter: the number of iterations that 'nlm' used.

    code: convergence code from 'nlm'. See 'nlm' for details.

    call: the matched call.

   terms: the `terms' object used.
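
     The relation between 'deviance' and 'aic' stated above can be
     checked on an ordinary 'glm' fit, used here only as a stand-in
     for an independence model; the simulated data and object names
     are hypothetical, and no 'drm' object is involved.

```r
# Simulated binary data (illustrative only).
set.seed(1)
x <- rnorm(50)
y <- rbinom(50, 1, plogis(-0.5 + x))

# Logistic regression assuming independent observations.
fit <- glm(y ~ x, family = binomial)

# Minus twice the maximised log-likelihood.
dev <- -2 * as.numeric(logLik(fit))

# Add twice the number of coefficients to obtain the AIC.
aic <- dev + 2 * length(coef(fit))

print(c(deviance = dev, aic = aic))
```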

_W_A_R_N_I_N_G:

     The maximum likelihood estimates may sometimes lead to negative
     fitted probabilities. Both generic print-methods warn when this
     happens. The model is then considered to be wrongly specified,
     and the model specification should be changed.
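
     Why negative probabilities can arise is easiest to see for two
     binary responses: the four profile probabilities are determined
     by the two marginal probabilities and the dependence ratio, and
     some combinations are not compatible with a valid distribution.
     The sketch below uses a hypothetical helper 'joint_probs' and
     made-up values; it is not the check performed by the package.

```r
# Profile probabilities of two binary responses from the marginals
# p1, p2 and the second order dependence ratio tau.
joint_probs <- function(p1, p2, tau) {
  p11 <- tau * p1 * p2
  c(p11 = p11,
    p10 = p1 - p11,
    p01 = p2 - p11,
    p00 = 1 - p1 - p2 + p11)
}

ok  <- joint_probs(0.3, 0.4, tau = 1.5)  # a valid combination
bad <- joint_probs(0.3, 0.4, tau = 3)    # tau too large for these marginals

print(ok)
print(bad)

# Flag invalid specifications.
any(ok < 0)
any(bad < 0)
```

     In the second case the joint success probability exceeds one of
     the marginals, so at least one profile probability is negative.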

_A_u_t_h_o_r(_s):

     Jukka Jokinen, jukka.jokinen@helsinki.fi

_R_e_f_e_r_e_n_c_e_s:

     1. Ekholm A, Smith PWF, McDonald JW. Marginal regression
     analysis of a multivariate binary response. _Biometrika_ 1995;
     82(4):847-854.

     2. Ekholm A, Skinner C. The Muscatine children's obesity data
     reanalysed using pattern mixture models. _Applied Statistics_
     1998; 47:251-263.

     3. Ekholm A, McDonald JW, Smith PWF. Association models for a
     multivariate binary response. _Biometrics_ 2000; 56:712-718.

     4. Ekholm A, Jokinen J, Kilpi T. Combining regression and
     association modelling on longitudinal data on bacterial carriage.
     _Statistics in Medicine_ 2002; 21:773-791.

     5. Ekholm A, Jokinen J, McDonald JW, Smith PWF. Joint regression
     and association modelling of longitudinal ordinal data.
     _Biometrics_ 2003; 59:795-803.

     6. Jokinen J, McDonald JW, Smith PWF. Meaningful regression and
     association models for clustered ordinal data. _Sociological
     Methodology_ 2006; 36:173-199.

     7. Jokinen J. Fast estimation algorithm for likelihood-based
     analysis of repeated categorical responses. _Computational
     Statistics and Data Analysis_ 2006; 51:1509-1522.

     8. Diggle PJ, Kenward MJ. Informative dropout in longitudinal data
     analysis. _Applied Statistics_ 1994; 43:49-94.

_S_e_e _A_l_s_o:

     'getass.drm', 'nlm', 'cluster', 'Time', 'profiles.drm', 'depratio'

_E_x_a_m_p_l_e_s:

     ######################################################
     ## Examples for binary responses
     ###########################################
     ## Wheeze among Steubenville children (see [3]):
     ## Latent Beta-distributed propensity
     data(wheeze)
     fit1 <- drm(wheeze~I(age>9)+smoking+cluster(id),data=wheeze,dep="B", print=0)

     ## Obesity among Muscatine children (see [2]):
     ## Analysis for completers: M2 for girls
     data(obese)
     fit2 <- drm(obese~age+cluster(id)+Time(year), subset=sex=="female",
                 dep="M2",data=obese)

     ## Not run: 
     ## Muscatine children continued (see [3]):
     ## LM for boys and girls separately
     fit3 <- drm(obese~age+cluster(id)+Time(age), subset=sex=="male",
                 dep="LM",data=obese)

     fit4 <- drm(obese~age+cluster(id)+Time(age), subset=sex=="female",
                 dep="LM",data=obese)
     ## End(Not run)
     ############################################
     ## Examples for ordinal responses
     ############################################
     ## Movie critic example (see [6]):
     ## Latent Dirichlet propensities with baseline category link.
     data(movie)

     options(contrasts=c("contr.treatment","contr.treatment"))
     fit5 <- drm(y~critic+cluster(movie), data=movie, dep="D", link="bcl")

     ## Longitudinal dataset on teenage marijuana use (see [6]):
     ## Superposition of structures N, L and M for the girls.
     data(marijuana)

     fit6 <- drm(y~age+cluster(id)+Time(age), data=marijuana,
                 subset=sex=="female", dep=list("NLM", ~kappa1==1,
                 ~kappa2==0, ~tau12==1, ~tau21==1, ~tau11==tau22))

     ## Parameter restrictions with functions using M-structure for the boys.
     ## Plot the second order dependence ratios:
     plot(depratio(y~cluster(id)+Time(age), data=marijuana,
          subset=sex=="male"))

     ## fit the model in [6]:
     fit7 <- drm(y~age+cluster(id)+Time(age), data=marijuana,
                 subset=sex=="male", dep=list("M", 
                 tau12~function(a=1,b=0) a+b*c(0:3),
                 tau21~function(a=1,b=0) a+b*c(0:3)))

     ## Not run: 
     ##############################################
     ## Covariates for the association (see [7]):
     ##############################################
     data(madras)

     ## plot empirical 2nd order dependence ratios with bootstrap CI's
     tau.madras <- depratio(symptom~cluster(id)+Time(month), data=madras,
                            boot.ci = TRUE, n.boot = 1000)
     plot(tau.madras, log="y", ylim=c(1,40), plot.ci=TRUE)

     ## create matrix for profiles:
     W.madras <- profiles.drm(n.categories=2, n.repetitions=12, "M")

     ## create four-level covariate, combining age and sex:
     madras$age.sex <- factor(paste(madras$age,madras$sex,sep="."))

     ## fit the model in [7], Section 4:
     fit8 <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month),
                 data=madras, Ncond=FALSE, save.profiles=FALSE, pmatrix="W.madras",
                 dep=list("NM",nu~nu:age.sex,
                          tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10))), print=2)

     ###################################################
     ## Dropout model on top of regression & association 
     ###################################################
     ## Continue with the madras data.
     ## fit a model without the dropout model:
     fit9 <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month),
                 data=madras, save.profiles=FALSE, pmatrix="W.madras", print=0,
                 dep=list("NM", tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10))))

     ## A dropout model assuming MCAR for the thought disorders:

     mcar <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month),
                 data=madras, save.profiles=FALSE, pmatrix="W.madras",
                 dep=list("NM", tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10)),
                          ~symptom.cur==0,~symptom.prev==0),
                 dropout=TRUE, start=c(coef(fit9), -4))

     ## A dropout model assuming MAR; including sex as a covariate:

     mar <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month),
                data=madras, save.profiles=FALSE, pmatrix="W.madras",
                dep=list("NM", tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10)),
                         ~symptom.cur==0), dropout=TRUE, drop.x=sex,
                start=c(coef(mcar),0,0))

     ## A dropout model assuming MNAR and sex as a covariate:

     mnar <- drm(symptom~age+sex+month+month:age+month:sex+cluster(id)+Time(month),
                 data=madras, save.profiles=FALSE, pmatrix="W.madras",
                 dep=list("NM", tau~function(a0=0,a1=0) 1+a0*exp(a1*c(0:10))),
                 dropout=TRUE, drop.x=sex, start=c(coef(mcar),0,0,0))

     ## print out coefficients and std.errors:
     coef(summary(mnar))
     ## End(Not run)
     ## std.error of `symptom.cur' all over the place; too few dropouts
     ## for a comprehensive evaluation of the dropout mechanism

