moc                   package:moc                   R Documentation

_F_i_t _a _G_e_n_e_r_a_l _N_o_n_l_i_n_e_a_r _M_u_l_t_i_v_a_r_i_a_t_e _M_i_x_t_u_r_e _M_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     `moc' fits user-specified mixture models with one,two and three
     parameters distributions to multivariate data that can be of
     discrete or continuous type and a mix of both types. The
     likelihood for the vector of observations or repeated measurements
     for subject i has the form


 f( Y[i] = y[i] | z[i], x[i]) = Sum_k P( G[i] = k | z[i]) h( y[i] | G[i] = k, x[i])


     The G[i] represent the mixture groups and h() the conditional
     joint density of the observations Y[i] given the covariates
     (Z[i],X[i]). The user supplies either the joint or marginal
     conditional density(ies) of the components of Y[i]. In the latter
     case, the joint conditional density is constructed by taking the
     product of the marginal densities ( assuming conditional
     independence ).

_U_s_a_g_e:

     moc(y, density=NULL, joint=FALSE, groups=1,
     gmu=NULL, gshape=NULL, gextra=NULL, gmixture=inv.glogit, expected = NULL,
     pgmu=NULL, pgshape=NULL, pgextra=NULL, pgmix=NULL,
     check.length=TRUE, scale.weight=FALSE, wt=1, data=NULL,
     ndigit=10, gradtol=0.0001, steptol=gradtol,iterlim=100,print.level=2,...)

_A_r_g_u_m_e_n_t_s:

       y: A matrix or data frame giving the vectors of observations (of
          length nvar) for each subject.

 density: A function returning the conditional joint or marginal
          density of the observations and calling the location, shape
          and extra functions.

   joint: Specify if the density gives the joint or common marginal
          density of the vector of observations. When using a joint
          density remenber that this density will receive its
          parameters as matrices.

  groups: Number of mixtures.

gmu, gshape, gextra: User-specified `lists' of functions returning the
          location, shape and extra density parameters for each mixture
          group and observation as a function of the parameters pgmu,
          pgshape, pgextra and covariates.

gmixture: A user-specified function of pgmix, giving the regression 
          function of the mixture probabilities. The default is the
          inverse generalized logit with respect to the first group.

expected: A `list' of functions returning the expected response value
          with respect to all the parameters
          `c(pgmu,pgshape,pgextra,pgmix)' for each mixture groups.
          Defaults to `gmu'.

pgmu, pgshape, pgextra, pgmix: Vector of initial estimates for the
          parameters of the location, shape, extra and mixture
          functions.

      wt: Vector of subjects sampling weights. Actually the program
          uses standard sample-weighted log-Likelihood assuming fixed
          weights. 

scale.weight: Logical value specifying if the vector of weights  wt
          should be rescaled to sum to the sample size.

check.length: Logical value specifying weither to check the rows length
          returned by the functions in `gextra' against the number of
          variables in y. Especially usefull when the `density'
          requires more paramaters then the number of variables like
          covariance parameters for multivariate normal.

    data: An optional data frame or list containing some or all
          variables and functions required to fit the model. It is
          `attach'ed at run time and subsequently by `moc' `methods'
          applied to the fitted model. Be aware that `objects' in
          .GlobalEnv will mask any `object' with the same name in data. 

ndigit, gradtol, steptol, iterlim, print.level, ...: Arguments
          controlling `nlm'.

_D_e_t_a_i_l_s:

     The procedure minimizes the resulting -log(Likelihood) without
     constraints, the parameters are all assumed to be real numbers.
     Thus the user should supply appropriate link functions and
     parameterize the density and parameters functions accordingly (
     see the examples ). By default missing values in the response
     variables y are assumed to be missing at random, that is the
     Likelihood for the subset of valid observations is just the
     marginal Likelihood for this subset in each mixture. Specific
     treatment of missing values in the response variables can be
     achieved by handling them explicitly in the functions
     `density',`gmixture',`gmu',`gshape' and `gextra'. The function
     `density' can return `NA' and yields the default treatment of
     missing values in the response. The functions
     `gmixture',`gmu',`gshape' and `gextra', can not return `NA' thus
     missing values in the covariates should be treated explicitly by
     these functions.

     The `lists' of functions `gmu', `gshape', `gextra' returns the
     location, shape and extra parameters to the density for each
     observation and mixture group as a function of pgmu, pgshape and
     pgextra and covariates. Each function should return a vector of
     length nvar or a matrix of such vectors. The first function in the
     `list' is for the first group, the second function for the second
     group and so on. The functions in the same `list' share the same
     parameters but the different `lists' have different parameters (
     see the examples ). 

     Setting the `attributes' parameters for functions
     `gmu',`gshape',`gextra' and `gmixture' will generate parameter
     labels in the printout of the object.

     The deviance residuals, fitted values and posterior probabilities
     are obtained through the use of the methods `residuals',`fitted'
     and `post'.

_V_a_l_u_e:

     A list of class moc is returned that contains all of the relevant
     information calculated, including error code generated by `nlm'.
     The printed output includes  -2 log(Likelihood), the corresponding
     df , AIC, BIC, entropy and ICL-BIC ( entropy corrected BIC ), mean
     mixture probabilities, mean expected and observed values for each
     mixture group, the maximum likelihood estimates and standard
     errors.

_N_o_t_e:

     The `expected' function is used to compute the fitted values and
     response residuals (not deviance). It is especially useful when
     the expected value differs from the location parameters as for
     censored normal or zero inflated Poisson distributions.

     The method of fixed sample-weight provides design-consistent
     parameters estimates. However, for the moment the program does not
     provide any methods to include sampling variances resulting from
     weights estimation. If the user wants to incorporate weights
     estimation sampling variances it could be achieved, for example,
     by including `moc' model estimation in a jackknife loop.

     Be aware that degrees of freedom (df) for mixture models are
     usually useless ( if not meaningless ) and likelihood-ratio of
     apparently nested models often does not converge to a Chi-Square
     with corresponding df.

_A_u_t_h_o_r(_s):

     Bernard Boulerice <Bernard.Boulerice@umontreal.ca>

_R_e_f_e_r_e_n_c_e_s:

     McLachlan, Geoffrey and Peel, David (2000) Finite mixture
     models,Wiley-Interscience, New York.

     Roeder, K., Lynch, K. and Nagin, D. (1999) Modeling Uncertainty in
     Latent Class Membership: A Case Study in Criminology, J. Amer.
     Statist. Assoc., 94, pp. 766-776.

     Lindsay, Bruce G. and Roeder, K. (1992) Residual diagnostics for
     mixture models, J. Amer. Statist. Assoc., 87, pp. 785-794.

_S_e_e _A_l_s_o:

     `print.moc',`plot.moc',`residuals.moc',
     `plot.residuals.moc',`fitted.moc',`post.moc',`AIC.moc',
     `logLik.moc',`obsfit.moc',`nlm'

_E_x_a_m_p_l_e_s:

     data(moc.dat)

     cnorm.dat<-list()   #This is used as a container for functions and data

     # Censored Normal (marginal density)

     cnorm.dat$cnorm<-function(x,mu,sig,min,max)
     {mi<-(x==min)*1
     ma<-(x==max)*1
     mi*pnorm((min-mu)/sig)+ma*(1-pnorm((max-mu)/sig))+
     (1-mi-ma)*dnorm((x-mu)/sig)/sig}

     # For this data set the range of the dependent variables is [0,14]

     cnorm.dat$cnorm1<-function(x,mu,sig,...) {cnorm(x,mu,sig,0,14)}

     # We have 4 observations

     cnorm.dat$gmu1<- list(
        Group1=function(pmu) {t(1)%*%rep(pmu[1],4)},
        Group2=function(pmu) {t(1)%*%rep(pmu[2],4)},
        Group3=function(pmu) {t(1)%*%rep(pmu[3],4)})

     attr(cnorm.dat$gmu1,"parameters")<-c("  cons1","  cons2","  cons3")

     # Expected value of a general censored normal

     cnorm.dat$cmean<-function(mu,sig,min,max) {
     max-(max-mu)*pnorm((max-mu)/sig)+(min-mu)*pnorm((min-mu)/sig)-
     sig*(dnorm((max-mu)/sig)-dnorm((min-mu)/sig)) }

     # Homogeneous variances

     cnorm.dat$gshape1<- list(
        Group1=function(psh) {t(1)%*%rep(exp(psh[1]),4)},
        Group2=function(psh) {t(1)%*%rep(exp(psh[1]),4)},
        Group3=function(psh) {t(1)%*%rep(exp(psh[1]),4)})

     attr(cnorm.dat$gshape1,"parameters")<-c("  log(std.dev)")

     cnorm.dat$cmean1<- list(
        Group1=function(p) {cmean(gmu1[[1]](p[1:3]),gshape1[[1]](p[4]),0,14) },
        Group2=function(p) {cmean(gmu1[[2]](p[1:3]),gshape1[[2]](p[4]),0,14) },
        Group3=function(p) {cmean(gmu1[[3]](p[1:3]),gshape1[[3]](p[4]),0,14) })

     moc1<-
     moc(moc.dat[,1:4],density=cnorm1,groups=3,gmu=gmu1,gshape=gshape1,expected=cmean1,
     pgmu=c(0.5, 2, 5),pgshape=c(0.7),pgmix=c(-0.6, -2.0), data=cnorm.dat,gradtol=1E-4)


     # Heterogeneous variances across mixture groups

     cnorm.dat$gshape2<-list(
        Group1=function(psh) {t(1)%*%rep(exp(psh[1]),4)},
        Group2=function(psh) {t(1)%*%rep(exp(psh[2]),4)},
        Group3=function(psh) {t(1)%*%rep(exp(psh[3]),4)})

     cnorm.dat$cmean2<-list(
        Group1=function(p) {cmean(gmu1[[1]](p[1:3]),gshape2[[1]](p[4:6]),0,14) },
        Group2=function(p) {cmean(gmu1[[2]](p[1:3]),gshape2[[2]](p[4:6]),0,14) },
        Group3=function(p) {cmean(gmu1[[3]](p[1:3]),gshape2[[3]](p[4:6]),0,14) })

     moc2<-
     moc(moc.dat[,1:4],density=cnorm1,groups=3,gmu=gmu1,gshape=gshape2,expected=cmean2,
     pgmu=moc1$coef[1:3],pgshape=c(rep(moc1$coef[4],3)),pgmix=moc1$coef[5:6]
     ,data=cnorm.dat,gradtol=1E-4)

     # Heterogeneous variances across time

     cnorm.dat$gshape3<-list(
        Group1=function(psh) {exp(t(1)%*%psh[1:4])},
        Group2=function(psh) {exp(t(1)%*%psh[1:4])},
        Group3=function(psh) {exp(t(1)%*%psh[1:4])})

     cnorm.dat$cmean3<-list(
        Group1=function(p) {cmean(gmu1[[1]](p[1:3]),gshape3[[1]](p[4:7]),0,14)},
        Group2=function(p) {cmean(gmu1[[2]](p[1:3]),gshape3[[2]](p[4:7]),0,14)},
        Group3=function(p) {cmean(gmu1[[3]](p[1:3]),gshape3[[3]](p[4:7]),0,14)})

     moc3<-
     moc(moc.dat[,1:4],density=cnorm1,groups=3,gmu=gmu1,gshape=gshape3,expected=cmean3,
     pgmu=moc1$coef[1:3],pgshape=c(rep(moc1$coef[4],4)),pgmix=moc1$coef[5:6],
     data=cnorm.dat,gradtol=1E-4)

     cnorm.dat$ages<-cbind(1.7,3,4.2,5.6)


     # Last group is a linear function of time

     cnorm.dat$gmu1t<-list(
        Group1=function(pmu) {pmu[1]*ages^0},
        Group2=function(pmu) {pmu[2]+pmu[3]*ages},
        Group3=function(pmu) {pmu[4]*ages^0})

     cnorm.dat$cmean1t<-list(
        Group1=function(p) {cmean(gmu1t[[1]](p[1:4]),gshape1[[1]](p[5]),0,14)},
        Group2=function(p) {cmean(gmu1t[[2]](p[1:4]),gshape1[[2]](p[5]),0,14)},
        Group3=function(p) {cmean(gmu1t[[3]](p[1:4]),gshape1[[3]](p[5]),0,14)})

     moc4<-
     moc(moc.dat[,1:4],density=cnorm1,groups=3,gmu=gmu1t,gshape=gshape1,expected=cmean1t,
     pgmu=append(moc1$coef[1:3],0,after=2),pgshape=c(moc1$coef[4]),pgmix=moc1$coef[5:6],
     data=cnorm.dat,gradtol=1E-4)

     # Zero inflated Poisson log-linear in time for the third group
     # Be carefull dpois requires integer values

     zip<- function(x,la,shape=1,extra)
     { mix<- exp(extra)/(1+exp(extra))
       mix*(x==0)+(1-mix)*dpois(x,la) }


     ages<-cbind(1.7,3,4.2,5.6)

     gmup<-list(
        Group1=function(pmu) {exp(pmu[1]*ages^0)},
        Group2=function(pmu) {exp(pmu[2]+pmu[3]*ages)},
        Group3=function(pmu) {exp(pmu[4]*ages^0)})


     zipfit<-list(
        Group1=function(p) { gmup[[1]](p)/(1+exp(p[5]))},
        Group2=function(p) { gmup[[2]](p)/(1+exp(p[5]))},
        Group3=function(p) { gmup[[3]](p)/(1+exp(p[5]))})

     gextrap<-list(
        Group1=function(pxt) {t(1)%*%rep(pxt[1],4)},
        Group2=function(pxt) {t(1)%*%rep(pxt[1],4)},
        Group3=function(pxt) {t(1)%*%rep(pxt[1],4)})

     moc5<-
     moc(moc.dat[,1:4],density=zip,groups=3,gmu=gmup,gextra=gextrap,expected = zipfit,
     pgmu=c(-0.6, 0.64,0, 1.6),pgextra=c(-3),pgmix=c(-0.7, -2), gradtol=1E-4)


     # Standard Poisson with mixture depending on time independent
     # dichotomous covariate
     # Be aware that dpoiss require integer values

     dumm<-moc.dat[,5]-1
     gmixt<-function(pm){
     mix<-cbind(1,dumm)%*%matrix(pm[1:4],2,2)
     cbind(1,exp(mix))/(1+apply(exp(mix),1,sum))}

     poiss<-function(x,la,...) {dpois(x,la)}

     moc6<-
     moc(moc.dat[,1:4],density=poiss,groups=3,gmu=gmup,gmixture=gmixt,
     pgmu=c(-0.6,0.64, 0, 1.6),pgmix=c(-0.2,-1, -1 ,-2),gradtol=1E-4)

     obsfit.moc(moc6,along=dumm)

      plot(moc6,against=ages,main="MOC profiles",xlab="age",ylab="Y")
      plot(residuals(moc6))

