gllm                  package:gllm                  R Documentation

_G_e_n_e_r_a_l_i_z_e_d _l_o_g-_l_i_n_e_a_r _m_o_d_e_l_l_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Fits log-linear models for incomplete contingency tables,
     including some latent class models, via EM and Fisher scoring
     approaches.

_U_s_a_g_e:

     gllm(y,s,X,method="hybrid",em.maxit=1,tol=0.00001)

_A_r_g_u_m_e_n_t_s:

       y: is the observed contingency table.

       s: is a vector of indices, one for each cell of the full
          (unobserved) contingency table, giving the cell of 'y' to
          which that cell contributes.

       X: is the design matrix, or a formula.

  method: chooses the EM, Fisher scoring or a hybrid (EM then scoring)
          method for fitting the model.

em.maxit: is the number of EM iterations.

     tol: is the convergence tolerance for the likelihood ratio (LR)
          criterion.
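
     When 'X' is given as a formula it has to be expanded into a
     design matrix with one row per cell of the full table.  The
     following is a minimal sketch of such an expansion in base R.  It
     assumes (the text above does not confirm this) that the expansion
     is equivalent to a 'model.matrix()' call on variables found in the
     calling environment.  The variable names are borrowed from the
     latent class example.

```r
# Indicator variables for a 2x2x2x2x2x2 full table (64 cells):
# x is the latent class, a-e are the five observed binary items.
x <- as.integer(gl(2, 32, 64)) - 1
a <- as.integer(gl(2, 16, 64)) - 1
b <- as.integer(gl(2, 8, 64)) - 1
c <- as.integer(gl(2, 4, 64)) - 1
d <- as.integer(gl(2, 2, 64)) - 1
e <- as.integer(gl(2, 1, 64)) - 1

# ASSUMPTION: base R's model.matrix() stands in for whatever expansion
# gllm performs internally when it is handed a formula.
X <- model.matrix(~ x * (a + b + c + d + e))

dim(X)       # one row per full-table cell, one column per parameter
colnames(X)  # intercept, main effects, then the x:item interactions
```

     Building the matrix by hand in this way makes it easy to inspect
     or drop individual columns before the model is fitted.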

_D_e_t_a_i_l_s:

     The generalized log-linear model allows for modelling of
     incomplete contingency tables, that is, tables where one or more
     dimensions have been collapsed over.  These include situations
     where imprecise measures have been calibrated using a "perfect"
     gold standard, and the true association between imperfectly
     measured variables is to be estimated; where data are missing for
     a subsample of the population; latent variable models where
     latent variables are "errorless" functions of observed variables,
     e.g. ML gene frequency estimation from counts of observed
     phenotypes; specialised measurement models, e.g. where observed
     counts are mixtures due to perfect measures and error-prone
     measures; standard latent class analysis; and symmetry and
     quasi-symmetry models for square tables.

     The general framework underlying these models is summarised by
     Espeland (1986), and Espeland & Hui (1987), and is originally due
     to Thompson & Baker (1981).  An observed contingency table y,
     which will be treated as a vector, is modelled as arising from an
     underlying complete table z, where observed count y_j is the sum
     of a number of elements of z, such that each z_i contributes to no
     more than one y_j. Therefore one can write y=F'z, where F is made
     up of orthogonal columns of ones and zeros.
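
     The relationship y=F'z can be illustrated with a short base R
     sketch.  It builds F from a scatter vector 's' (here the one from
     the gametic association example) and collapses a complete table
     onto the observed one.  The counts in 'z' are hypothetical and the
     name 'Fmat' is chosen only for this illustration.

```r
# Scatter vector from the gametic association example: 16 full-table
# cells map onto 9 observed cells.
s <- c(1, 2, 2, 3,
       4, 5, 5, 6,
       4, 5, 5, 6,
       7, 8, 8, 9)

# HYPOTHETICAL complete-table counts, for illustration only.
z <- 1:16

# Fmat[i, j] is 1 when full-table cell i contributes to observed cell j.
# (Named Fmat rather than F, because F is shorthand for FALSE in R.)
Fmat <- outer(s, seq_len(max(s)), "==") * 1

# Collapse the complete table onto the observed table: y = F'z.
y <- as.vector(crossprod(Fmat, z))

# Each row of Fmat holds exactly one 1, so distinct columns never
# overlap and F'F is diagonal, i.e. the columns are orthogonal.
FtF <- crossprod(Fmat)
```

     The same collapse can be obtained with 'rowsum(z, s)', which
     avoids forming F explicitly.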

     We then specify a loglinear model for z, so that log(E(z))=Xb,
     where X is a design matrix with one row per cell of z, and b is a
     vector of loglinear parameters.  The loglinear model for z, and
     thus for y, can be fitted using two methods, both of which are
     available in 'gllm'.  The first was presented as AS207 by Michael
     Haber (1984) and combines an iterative proportional fitting
     algorithm for b and z with an EM fitting for y, z and b.  The
     second is a Fisher scoring approach, presented in Espeland
     (1986).
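
     The EM idea can be sketched in a few lines of base R.  This is
     not the AS207 algorithm and not the code used by 'gllm': the
     M-step below swaps in 'glm.fit()' for iterative proportional
     fitting, the helper 'em_sketch()' is hypothetical, and the toy
     data (a dominant trait under Hardy-Weinberg proportions) are
     invented for illustration.  X is assumed to have one row per cell
     of the full table.

```r
# ASSUMPTION: em_sketch() is a hypothetical helper, not part of gllm.
# Its M-step uses glm.fit() rather than the IPF step of AS207.
em_sketch <- function(y, s, X, maxit = 200, tol = 1e-6) {
  m <- rep(sum(y) / length(s), length(s))  # flat start for E(z)
  for (iter in seq_len(maxit)) {
    # E-step: share each observed count among its full-table cells in
    # proportion to the current fitted values.
    z <- y[s] * m / ave(m, s, FUN = sum)
    # M-step: Poisson log-linear fit to the completed table.  The
    # quasipoisson family yields the same estimates as poisson and does
    # not warn about non-integer counts.
    fit <- glm.fit(X, z, family = quasipoisson())
    m_new <- fit$fitted.values
    converged <- max(abs(m_new - m)) < tol
    m <- m_new
    if (converged) break
  }
  list(iter = iter,
       coefficients = fit$coefficients,
       full.table = m,                          # fitted complete table
       fitted.values = as.vector(rowsum(m, s))) # fitted observed table
}

# Toy data: genotypes AA, Aa, aA are seen as one dominant phenotype,
# aa as the recessive phenotype.
y <- c(84, 16)
s <- c(1, 1, 1, 2)
# Intercept plus the number of 'a' alleles in each full-table cell.
X <- cbind(Intercept = 1, a = c(0, 1, 1, 2))

res <- em_sketch(y, s, X)
res$full.table
```

     With these toy counts the maximum likelihood estimate of the
     recessive allele frequency is sqrt(16/100) = 0.4, so the fitted
     full table should approach (36, 24, 24, 16).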

     The 'gllm' function is actually a simple wrapper for
     'scoregllm()'.

_V_a_l_u_e:

     A list with components: 

    iter: 

deviance: 

      df: 

coefficients: 

      se: 

       V: 

observed.values: 

fitted.values: 

residuals: 

full.table: the expected counts for the full (unobserved) table.

_R_e_f_e_r_e_n_c_e_s:

     Espeland MA (1986).  A general class of models for discrete
     multivariate data. _Commun. Statist.-Simula_ 15:405-424.

     Espeland MA, Hui SL (1987).  A general approach to analyzing
     epidemiologic data that contains misclassification errors. 
     _Biometrics_ 43:1001-1012.

     Haber M (1984).  AS207: Fitting a general log-linear model.  _Appl
     Statist_ 33:358-362.

     Thompson R, Baker RJ (1981).  Composite link functions in
     generalized linear models.  _Appl Statist_ 30:125-131.

_E_x_a_m_p_l_e_s:

      
     library(gllm)
     #
     # Latent class analysis: two latent classes
     #
     # Data: 2x2x2x2x2 table of responses to five binary items
     # (items 11-15 of sections 6-7 of the Law School Admission Test)
     #
     y<-c( 3,   6,   2,   11,   1,   1,   3,    4,
           1,   8,   0,   16,   0,   3,   2,   15,
          10,  29,  14,   81,   3,  28,  15,   80,
          16,  56,  21,  173,  11,  61,  28,  298)
     #
     # Scatter vector: the full table is 2x2x2x2x2x2, so each observed
     # cell is the sum of two full-table cells (one per latent class)
     #
     s<-  c(1:32,1:32)
     #
     # Design matrix: x is the latent variable (2 levels), 
     # a-e are the observed variables
     #
     x<-as.integer(gl(2,32,64))-1
     a<-as.integer(gl(2,16,64))-1
     b<-as.integer(gl(2,8 ,64))-1
     c<-as.integer(gl(2,4 ,64))-1
     d<-as.integer(gl(2,2 ,64))-1
     e<-as.integer(gl(2,1 ,64))-1

     res1<-gllm(y,s,~x*(a+b+c+d+e),method="em",tol=0.01)
     res1
     #
     # Gametic association between two diallelic loci
     #
     # Data: observed counts (9 cells)
     #
     y<-c( 187,386,156,
           352,310,20,
           136,0  ,0)
     #
     # Scatter vector: maps the 16 cells of the full table onto the 9
     # observed cells
     #
     s<-  c( 1, 2, 2, 3,
             4, 5, 5, 6,
             4, 5, 5, 6,
             7, 8, 8, 9)
     #
     # Design matrix
     #
     X<-  matrix(c( 1,0,0,0,0,0,1,
                    1,0,1,0,0,0,0,
                    1,0,1,0,0,0,0,
                    1,0,2,0,1,0,0,
                    1,1,0,0,0,0,0,
                    1,1,1,0,0,1,0,
                    1,1,1,0,0,0,1,
                    1,1,2,0,1,1,1,
                    1,1,0,0,0,0,0,
                    1,1,1,0,0,0,1,
                    1,1,1,0,0,1,0,
                    1,1,2,0,1,1,1,
                    1,2,0,1,0,0,0,
                    1,2,1,1,0,1,1,
                    1,2,1,1,0,1,1,
                    1,2,2,1,1,2,2), byrow=TRUE, ncol=7) 

     colnames(X)<-c("Intercept", "A", "B", "P1", "P2", "Delta", "Epsilon")
     # Fit using the first six columns only (the "Epsilon" column is dropped)
     res2<-gllm(y,s,X[,c(1:6)],method="hybrid",em.maxit=1,tol=0.00001)
     res2

