mclgen                package:catspec                R Documentation

_R_e_s_t_r_u_c_t_u_r_e _a _d_a_t_a-_f_r_a_m_e _a_s _a "_p_e_r_s_o_n-_c_h_o_i_c_e" _f_i_l_e

_D_e_s_c_r_i_p_t_i_o_n:

     'mclgen' restructures a data-frame into a _person-choice_ file for
     estimation of a multinomial logit as a conditional logit model.

_U_s_a_g_e:

     mclgen(datamat, catvar)

_A_r_g_u_m_e_n_t_s:

 datamat: A data-frame to be transformed into a _person-choice_ file

  catvar: A factor representing the response variable (i.e. the
          dependent variable in a multinomial logistic model)

_D_e_t_a_i_l_s:

     A multinomial logit model can be estimated using a program for
     conditional logit regression. This will produce the same
     coefficients and standard errors but allows greater flexibility
     for imposing restrictions on the dependent variable.

     To estimate the multinomial logistic model as a conditional logit
     model, the data must be restructured as a _person-choice_ file.
     'mclgen' performs this operation such that:

        *  Each record of the data-frame is duplicated _ncat_ times,
           where _ncat_ is the number of categories of the response
           variable

        *  A new variable 'id' is created to index respondents. This
           variable is used as the stratifying variable in 'clogit'

        *  A new variable 'newy' is created to index response options
           for each respondent

        *  A new variable 'depvar' is created which is equal to 1 for
           the record corresponding with the respondent's actual choice
           and is 0 otherwise
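
     The steps above can be sketched in base R as follows. This is only
     an illustration of the resulting layout, not the code of 'mclgen'
     itself; the helper name 'person_choice_sketch' and the toy data
     are hypothetical.

```r
# Hypothetical helper illustrating the person-choice layout; not mclgen's code.
person_choice_sketch <- function(datamat, catname) {
  stopifnot(is.data.frame(datamat), is.factor(datamat[[catname]]))
  levs <- levels(datamat[[catname]])
  ncat <- length(levs)
  n <- nrow(datamat)
  # duplicate each record ncat times, keeping a respondent's copies together
  pc <- datamat[rep(seq_len(n), each = ncat), , drop = FALSE]
  rownames(pc) <- NULL
  pc$id <- rep(seq_len(n), each = ncat)      # indexes respondents
  pc$newy <- rep(seq_len(ncat), times = n)   # indexes response options
  # 1 for the option the respondent actually chose, 0 otherwise
  pc$depvar <- as.integer(as.integer(pc[[catname]]) == pc$newy)
  # Assumption of this sketch: the response factor is recoded to label the
  # option each record stands for, so that it varies within each id
  pc[[catname]] <- factor(pc$newy, levels = seq_len(ncat), labels = levs)
  pc
}

# Made-up data: three respondents choosing among options "a", "b", "c"
toy <- data.frame(choice = factor(c("a", "c", "b"), levels = c("a", "b", "c")),
                  x = c(1.5, 2.0, 0.5))
pc_toy <- person_choice_sketch(toy, "choice")
print(pc_toy)
```

     Each respondent contributes one record per response option, and
     exactly one of those records has 'depvar' equal to 1.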

     'depvar' is the dependent variable in 'clogit'. The main effects
     of 'catvar' correspond with the intercept term of a multinomial
     logit model; interactions of 'catvar' with predictor variables
     correspond with the effects of these variables in a multinomial
     logit model.

     Since 'catvar' is now on the right-hand side of the model
     equation, restrictions can be imposed in the usual fashion. For
     example, using 'contr.sdif' from the MASS package as the contrast
     for 'catvar' yields an adjacent logit model (Agresti 1990: 318).
     By summing the dummy variables for two categories of 'catvar', an
     equality constraint can be imposed on those categories. Such
     equality constraints can be imposed on the effects of some
     predictor variables but not others.
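
     As a rough illustration of the equality constraint, the dummies
     for two categories can be summed into one column, so that both
     categories share a single coefficient. The factor 'y', the
     covariate 'x' and the object names below are made up for this
     sketch.

```r
# Toy response-option factor with three levels (hypothetical data)
y <- factor(c("a", "b", "c", "a", "b", "c"))
x <- c(0.2, 0.2, 0.2, 1.4, 1.4, 1.4)

# Treatment dummies for levels "b" and "c" (reference level "a")
y.X <- model.matrix(~ y)[, -1]

# Summing the two dummies gives one indicator for "b or c"
y.bc <- y.X[, "yb"] + y.X[, "yc"]

# Unconstrained effect of x: separate columns for "b" and "c";
# constrained effect: one shared column, i.e. equal effects of x
X.free  <- y.X * x
X.equal <- y.bc * x
print(cbind(X.free, X.equal))
```

     In a model formula, the shared column would take the place of the
     two separate interaction columns for the predictor whose effect is
     to be constrained, while other predictors keep separate dummies.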

     Another use is to include a mobility model in a multinomial
     logistic regression model. Mobility models are loglinear models
     for square tables. They lie in the space between a model of
     independence and a saturated model. This is accomplished by
     imposing restrictions on the interaction effect of the row and
     column variable. A number of these special models have been
     developed; see Hout (1983) or Goodman (1984) for an overview.

     These loglinear mobility models can be seen as multinomial
     logistic regression models with special restrictions on the
     dependent variable. The nature of the restriction depends on the
     category of the predictor variable. In practice, mobility models
     can be included in an MCL model using the same specification as
     for a loglinear model. R functions for several common mobility
     models can be found in 'sqtab'.
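
     To make the idea of a restricted row-by-column interaction
     concrete, the following sketch fits a quasi-independence model to
     a small square table with 'glm'. The counts are invented for
     illustration, and the diagonal term is built by hand rather than
     with the functions documented in 'sqtab'.

```r
# Invented 3 x 3 mobility table (origin by destination), in long format;
# expand.grid varies 'origin' fastest
tab <- expand.grid(origin = factor(1:3), dest = factor(1:3))
tab$Freq <- c(50, 10, 5,
              12, 40, 9,
              6, 11, 45)

# Quasi-independence: one extra parameter per diagonal cell;
# level 0 collects all off-diagonal cells
tab$qi <- factor(ifelse(tab$origin == tab$dest, as.integer(tab$origin), 0))

indep <- glm(Freq ~ origin + dest, family = poisson, data = tab)
quasi <- glm(Freq ~ origin + dest + qi, family = poisson, data = tab)

# Residual degrees of freedom: the quasi-independence model sits between
# independence and the saturated model (which has 0 df)
print(c(indep = df.residual(indep), quasi = df.residual(quasi)))
print(anova(indep, quasi, test = "Chisq"))
```

     Because each diagonal cell has its own parameter, the diagonal
     counts are reproduced exactly and only the off-diagonal cells are
     held to independence.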

_V_a_l_u_e:

     A data-frame is returned, restructured as a _person-choice_ file.

_N_o_t_e:

     The effects of predictor variables are modelled as interactions
     with 'catvar', but the main effects of the predictor variables are
     not included in the model since these are constant within strata.
     This causes problems due to the way R handles contrasts in
     interaction effects. If lower order effects of variables are not
     included in a model then contrasts are not used in interactions
     with these variables (see 'terms.object' under the heading
     'factors'). For example, in the model '~Y+Y:X1+Y:X2', where 'Y'
     stands for 'catvar', contrasts will only be used for the main
     effect of 'Y'. For the interaction effects 'Y:X1' and 'Y:X2',
     dummies for all levels of 'Y' will be used. These dummies are
     linearly dependent within strata of the conditional logit model,
     so a warning is issued that _"X matrix deemed to be singular"_
     and the dummy for the last level of 'Y' is dropped. The model is
     still estimated, but 'Y' is not coded as intended.

     The simplest workaround is to include the main effects of the
     predictor variables in the model, i.e. use '~Y*X1+Y*X2'. The main
     effects of 'X1' and 'X2' are constant within strata, so a warning
     is issued that _"X matrix deemed to be singular"_ and these
     effects are dropped from the model. This has no further
     consequences and 'Y' is coded as intended.
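
     The contrast behaviour described above can be inspected with
     'model.matrix' on a toy data-frame. The variable names follow the
     formulas in this note; the data are arbitrary and no conditional
     logit model is fitted.

```r
# Arbitrary toy data: Y has three levels, X1 is numeric
toy <- data.frame(Y = factor(rep(c("a", "b", "c"), times = 2)),
                  X1 = c(1, 1, 1, 2, 2, 2))

# Without the main effect of X1, the interaction uses dummies
# for all three levels of Y
cols.int  <- colnames(model.matrix(~ Y + Y:X1, data = toy))

# With the main effect of X1, contrasts are applied to Y in the
# interaction, so the first level is omitted
cols.full <- colnames(model.matrix(~ Y * X1, data = toy))

print(cols.int)
print(cols.full)
```

     Comparing the two sets of column names shows which level of 'Y'
     is left out of the interaction in each specification.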

     A second workaround is to use 'model.matrix' to create the
     dummies for 'catvar' using the intended contrast. See the first
     example below.

     'clogit' does not accept weights. A workaround is to call 'coxph'
     directly (see example below).

_A_u_t_h_o_r(_s):

     John Hendrickx John_Hendrickx@yahoo.com

_R_e_f_e_r_e_n_c_e_s:

     Agresti, Alan. (1990). Categorical data analysis. New York: John
     Wiley & Sons.

     Allison, Paul D. and Nicholas Christakis. (1994). Logit models for
     sets of ranked items. Pp. 199-228 in Peter V. Marsden (ed.),
     Sociological Methodology. Oxford: Basil Blackwell.

     Breen, Richard. (1994). Individual Level Models for Mobility
     Tables and Other Cross-Classifications. Sociological Methods &
     Research 33: 147-173.

     Goodman, Leo A. (1984). The analysis of cross-classified data
     having ordered categories. Cambridge, Mass.: Harvard University
     Press.

     Hendrickx, John. (2000). Special restrictions in multinomial
     logistic regression. Stata Technical Bulletin 56: 18-26.

     Hendrickx, John and Harry B.G. Ganzeboom. (1998). Occupational
     Status Attainment in the Netherlands, 1920-1990. A Multinomial
     Logistic Analysis. European Sociological Review 14: 387-403.

     Hout, Michael. (1983). Mobility Tables. Sage Publication 07-031.

     Logan, John A. (1983). A Multivariate Model for Mobility Tables.
     American Journal of Sociology 89: 324-349.

     <URL: http://www.xs4all.nl/~jhckx/>

_S_e_e _A_l_s_o:

     'clogit' and 'coxph' (package survival), 'multinom' (package
     nnet), 'sqtab'

_E_x_a_m_p_l_e_s:

     ## Example 1
     # Data from the 1972-78 GSS used by Logan (1983)
     data(logan)

     # create the "person-choice" file
     pc<-mclgen(logan,occ)
     summary(pc)
     attach(pc)

     library(survival)
     # The following specification will work:
     # cl.lr<-clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)
     # However, R won't drop the first category of "occ"
     # in the interaction effects. The last category will be omitted
     # instead due to linear dependence within strata.
     # To fix the problem, create dummies manually for "occ"
     occ.X<-model.matrix(~pc$occ)
     occ.X<-occ.X[,attributes(occ.X)$assign==1]
     cl.lr<-clogit(depvar~occ.X+occ.X:educ+occ.X:black+strata(id),data=pc)
     summary(cl.lr)

     # Estimate a "quasi-uniform association" loglinear model for "focc" and "occ"
     # with "educ" and "black" as covariates at the respondent level
     cl.qu<-clogit(depvar~occ.X+occ.X:educ+occ.X:black+
       mob.qi(focc,occ)+mob.unif(focc,occ)+strata(id),data=pc)
     summary(cl.qu)

     ## Example 2
     # Copenhagen housing conditions survey data from the MASS package
     data(housing,package="MASS")
     housing.prsch<-mclgen(housing,Sat)
     library(survival)
     # clogit doesn't support the weights argument at present;
     # a workaround is to call coxph directly.
     # coxph warns that X is singular because the main effects of
     # Infl, Type, and Cont are constant within strata; they are dropped
     coxph.prsch<-coxph(Surv(rep(1, NROW(housing.prsch)), depvar) ~
       Sat+Sat*Infl+Sat*Type+Sat*Cont+strata(id),
       weights = housing.prsch$Freq, data = housing.prsch)
     summary(coxph.prsch)

     # the same model using multinomial logistic regression
     library(nnet)
     house.mult<- multinom(Sat ~ Infl + Type + Cont, weights = Freq,
                           data = housing)
     summary(house.mult,correlation=FALSE)

     # compare the coefficients
     m1<-coef(coxph.prsch)
     m1<-m1[!is.na(m1)]
     dim(m1)<-c(2,7)
     m2<-coef(house.mult)
     m1
     m2
     m1-m2
     max(abs(m1-m2))
     mean(abs(m1-m2))

