geese                package:geepack                R Documentation

_F_u_n_c_t_i_o_n _t_o _s_o_l_v_e _a _G_e_n_e_r_a_l_i_z_e_d _E_s_t_i_m_a_t_i_n_g _E_q_u_a_t_i_o_n _M_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     Produces an object of class `geese' which is a Generalized
     Estimating Equation fit of the data.

_U_s_a_g_e:

     geese(formula = formula(data), sformula = ~1, id, waves = NULL,
           data = parent.frame(), subset = NULL, na.action = na.omit,
           contrasts = NULL, weights = NULL, zcor = NULL, corp = NULL,
           control = geese.control(...), b = NULL, alpha = NULL, gm = NULL,
           family = gaussian(), mean.link = NULL, variance = NULL,
           cor.link = "identity", sca.link = "identity", link.same = TRUE,
           scale.fix = FALSE, scale.value = 1, corstr = "independence", ...)

     geese.fit(x, y, id, offset = rep(0, N), soffset = rep(0, N),
               weights = rep(1,N), waves = NULL, zsca = matrix(1, N, 1),
               zcor = NULL, corp = NULL, control = geese.control(...),
               b = NULL, alpha = NULL, gm = NULL, family = gaussian(),
               mean.link = NULL, variance = NULL, cor.link = "identity",
               sca.link = "identity", link.same = TRUE, scale.fix = FALSE,
               scale.value = 1, corstr = "independence", ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula expression as for 'glm', of the form 'response ~
          predictors'. See the documentation of lm and formula for
          details. As for glm, this specifies the linear predictor for
          modeling the mean. A term of the form 'offset(expression)' is
          allowed. 

sformula: a formula expression of the form ' ~ predictor', the response
          being ignored. This specifies the linear predictor for
          modeling the dispersion. A term of the form
          'offset(expression)' is allowed. 

      id: a vector which identifies the clusters.  The length of `id'
          should be the same as the number of observations.  Data are
          assumed to be sorted so that observations on a cluster are
          contiguous rows for all entities in the formula. 

   waves: an integer vector which identifies components in clusters.
          The length of 'waves' should be the same as the number of
          observation. components with the same 'waves' value will have
          the same link functions.  

    data: an optional data frame in which to interpret the variables
          occurring in the 'formula', along with the 'id' and 'n'
          variables. 

  subset: expression saying which subset of the rows of the data should
          be used in the fit.  This can be a logical vector (which is
          replicated to have length equal to the number of
          observations), or a numeric vector indicating which
          observation numbers are to be included, or a character vector
          of the row names to be included. All observations are
          included by default. 

na.action: a function to filter missing data.  For 'gee' only 'na.omit'
          should be used here. 

contrasts: a list giving contrasts for some or all of the factors
          appearing in the model formula.  The elements of the list
          should have the same name as the variable and should be
          either a contrast matrix (specifically, any full-rank matrix
          with as many rows as there are levels in the factor), or else
          a function to compute such a matrix given the number of
          levels. 

 weights: an optional vector of weights to be used  in the fitting
          process. The length of 'weights' should be the same as the
          number of observations. This weights is not (yet) the weight
          as in sas proc genmod, and hence is not recommended to use.  

    zcor: a design matrix for correlation parameters.

    corp: known parameters such as coordinates used for correlation
          coefficients. 

 control: a list of iteration and algorithmic constants. See
          'geese.control' for their names and default values. These can
          also be set as arguments to 'geese' itself. 

       b: an initial estimate for the mean parameters.

   alpha: an initial estimate for the correlation parameters.

      gm: an initial estimate for the scale parameters.

  family: a description of the error distribution and link function to
          be used in the model, as for 'glm'. 

mean.link: a character string specifying the link function for the
          means. The following are allowed: '"identity"', '"logit"',
          '"probit"', '"cloglog"', '"log"', and '"inverse"'. The
          default value is determined from family. 

variance: a character string specifying the variance function in terms
          of the mean. The following are allowed: '"gaussian"',
          '"binomial"', '"poisson"', and '"gamma"'. The default value
          is determined from family. 

cor.link: a character string specifying the link function for the
          correlation coefficients. The following are allowed:
          '"identity"', and '"fisherz"'. 

sca.link: a character string specifying the link function for the
          scales. The following are allowed: '"identity"', and '"log"'. 

link.same: a logical indicating if all the components in a cluster
          should use the same link. 

scale.fix: a logical variable; if true, the scale parameter is fixed at
          the value of 'scale.value'. 

scale.value: numeric variable giving the value to which the scale
          parameter should be fixed; used only if 'scale.fix == TRUE'. 

  corstr: a character string specifying the correlation structure. The
          following are permitted: '"independence"', '"exchangeable"',
          '"ar1"', '"unstructured"', '"userdefined"', and '"fixed"' 

    x, y: 'x' is a design matrix of dimension 'n * p', and 'y' is a
          vector of observations of length 'n'. 

offset, soffset: vector of offset for the mean and for the scale,
          respectively. 

    zsca: a design matrix of dimension 'n * r' for the scales. 

     ...: further arguments passed to or from other methods.

_D_e_t_a_i_l_s:

     when the correlation structure is 'fixed', the specification of
     'Zcor' should be a vector of length 'sum(clusz * (clusz - 1)) /
     2.'

_V_a_l_u_e:

     An object of class '"geese"' representing the fit.

_A_u_t_h_o_r(_s):

     Jun Yan jyan@stat.uiowa.edu

_R_e_f_e_r_e_n_c_e_s:

     Yan, J. and J.P. Fine (2004) Estimating Equations for Association
     Structures. _Statistics in Medicine_, *23*, 859-880.

_S_e_e _A_l_s_o:

     'glm', 'lm', 'ordgee'.

_E_x_a_m_p_l_e_s:

     data(seizure)
     ## Diggle, Liang, and Zeger (1994) pp166-168, compare Table 8.10
     seiz.l <- reshape(seizure,
                       varying=list(c("base","y1", "y2", "y3", "y4")),
                       v.names="y", times=0:4, direction="long")
     seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),]
     seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2)
     seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1)
     m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
                 data=seiz.l, corstr="exch", family=poisson)
     summary(m1)
     m2 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
                 data = seiz.l, subset = id!=49,
                 corstr = "exch", family=poisson)
     summary(m2)
     ## Using fixed correlation matrix
     cor.fixed <- matrix(c(1, 0.5, 0.25, 0.125, 0.125,
                           0.5, 1, 0.25, 0.125, 0.125,
                           0.25, 0.25, 1, 0.5, 0.125,
                           0.125, 0.125, 0.5, 1, 0.125,
                           0.125, 0.125, 0.125, 0.125, 1), 5, 5)
     cor.fixed
     zcor <- rep(cor.fixed[lower.tri(cor.fixed)], 59)
     m3 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
                 data = seiz.l, family = poisson,
                 corstr = "fixed", zcor = zcor)
     summary(m3)

     data(ohio)
     fit <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio,
                  family=binomial, corstr="exch", scale.fix=TRUE)
     summary(fit)
     fit.ar1 <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio,
                      family=binomial, corstr="ar1", scale.fix=TRUE)
     summary(fit.ar1)

     ###### simulated data
     ## a function to generate a dataset
     gendat <- function() {
       id <- gl(50, 4, 200)
       visit <- rep(1:4, 50)
       x1 <- rbinom(200, 1, 0.6) ## within cluster varying binary covariate
       x2 <- runif(200, 0, 1)   ## within cluster varying continuous covariate
       phi <- 1 + 2 * x1         ## true scale model
       ## the true correlation coefficient rho for an ar(1)
       ## correlation structure is 0.667.
       rhomat <- 0.667 ^ outer(1:4, 1:4, function(x, y) abs(x - y))
       chol.u <- chol(rhomat)
       noise <- as.vector(sapply(1:50, function(x) chol.u %*% rnorm(4)))
       e <- sqrt(phi) * noise
       y <- 1 + 3 * x1 - 2 * x2 + e
       dat <- data.frame(y, id, visit, x1, x2)
       dat
     }

     dat <- gendat()
     fit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1,
                  corstr = "ar1", jack = TRUE, j1s = TRUE, fij = TRUE)
     summary(fit)

     #### create user-defined design matrix of unstrctured correlation.
     #### in this case, zcor has 4*3/2 = 6 columns, and 50 * 6 = 300 rows
     zcor <- genZcor(clusz = rep(4, 50), waves = dat$visit, "unstr")
     zfit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1,
                   corstr = "userdefined", zcor = zcor,
                   jack = TRUE, j1s = TRUE, fij = TRUE)
     summary(zfit)

     #### Now, suppose that we want the correlation of 1-2, 2-3, and 3-4
     #### to be the same. Then zcor should have 4 columns.
     z2 <- matrix(NA, 300, 4)
     z2[,1] <- zcor[,1] + zcor[,4] + zcor[,6]
     z2[,2:4] <- zcor[, c(2, 3, 5)]
     summary(geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1,
                   corstr = "userdefined", zcor = z2,
                   jack = TRUE, j1s = TRUE, fij = TRUE))

     #### Next, we introduce non-constant cluster sizes by
     #### randomly selecting 60 percent of the data
     good <- sort(sample(1:nrow(dat), .6 * nrow(dat))) 
     mdat <- dat[good,]

     summary(geese(y ~ x1 + x2, id = id, data = mdat, waves = visit,
                   sformula = ~ x1, corstr="ar1",
                   jack = TRUE, j1s = TRUE, fij = TRUE))

