grouped               package:grouped               R Documentation

_R_e_g_r_e_s_s_i_o_n _f_o_r _G_r_o_u_p_e_d _D_a_t_a - _C_o_a_r_s_e _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     'grouped' is used to fit regression models for grouped or coarse
     data under the assumption that the data are Coarsened At Random.

_U_s_a_g_e:

     grouped(formula, link = c("identity", "log", "logit"), 
                 distribution = c("normal", "t", "logistic"), data,
                 subset, na.action, str.values, df = NULL, iter = 3, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a two-sided formula describing the model structure. The
          left-hand side must be a two-column response matrix
          specifying the lower and upper limits (1st and 2nd column,
          respectively) of the interval in which the true response
          lies. The limits can be defined arbitrarily or constructed
          with the functions 'equispaced' and 'rounding'.

    link: the link function under which the underlying response
          variable follows the distribution given by the 
          'distribution' argument. Available choices are '"identity"',
          '"log"' and '"logit"'.  See *Details* for more info.

distribution: the assumed distribution for the true latent response
          variable. Available choices are  '"normal"', '"t"' and
          '"logistic"'. See *Details* for more info.

    data: an optional 'data.frame' containing the variables in the
          model. If not found in data, the variables  are taken from
          'environment(formula)', typically the environment from which
          'grouped' is  called.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.  

na.action: a function which indicates what should happen when the data
          contain 'NA's. 

str.values: a numeric vector of starting values. 

      df: a scalar numeric value denoting the degrees of freedom when
          the underlying distribution for the response variable is
          assumed to be Student's-t. 

    iter: the number of extra times to call 'optim' in case the first
          optimization has not converged.

     ...: additional arguments; currently none is used. 

_D_e_t_a_i_l_s:

     Let Z_i, i = 1, ..., n be a random sample from a response variable
     of interest. In many problems one can think of the sample space
     S_i of Z_i as being partitioned into a number of groups; one then
     observes not the exact value of Z_i but the group into which it
     falls. Data generated in this way are called grouped (Heitjan,
     1989). The function 'grouped' and this package are devoted to the
     analysis of such data when the data are Coarsened At Random
     (Heitjan and Rubin, 1991).
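
     The grouping mechanism described above can be illustrated with a
     small base R sketch. The simulated values, the boundaries in
     'breaks' and all object names are assumptions made for
     illustration only; they are not part of the package.

```r
set.seed(42)
z <- rnorm(10, mean = 5, sd = 2)   # exact values, unobserved in practice
breaks <- seq(-5, 15, by = 2)      # hypothetical group boundaries

# index of the group [breaks[g], breaks[g + 1]) that contains each z
g <- findInterval(z, breaks)

# what is actually observed: lower and upper limit of the group
y <- cbind(lower = breaks[g], upper = breaks[g + 1])

# every exact value lies inside its observed interval
stopifnot(all(y[, "lower"] <= z & z < y[, "upper"]))
```

     A two-column matrix such as 'y' has the layout that the left-hand
     side of 'formula' expects.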

     The framework we use assumes a latent variable Z_i which is
     coarsely measured and for which we only know Y_{li} and Y_{ui},
     i.e., the limits of the interval in which Z_i lies. Given some
     covariates X_i, Z_i|X_i is assumed to follow a Normal, a Logistic
     or a (generalized) Student's-t distribution. In addition, three
     link functions are available for greater flexibility. In
     particular, the likelihood contribution of the ith observation is
     of the following form

 L_i(beta, sigma) = F[(y_u^* - xbeta)/sigma] - F[(y_l^* - xbeta)/sigma],

     where F(.) denotes the cdf of the assumed distribution given by
     the argument 'distribution' and y_l^* = phi(y_l), where phi(.)
     denotes the link function, and y_u^* is defined analogously.
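
     The likelihood above can be written down directly in base R. The
     following is a minimal sketch, not the code used by 'grouped':
     the function name 'interval_loglik', its arguments and the choice
     to pass sigma on the log scale are all assumptions made for
     illustration.

```r
# Log-likelihood of interval data: sum over i of
#   log( F[(phi(yu) - x beta)/sigma] - F[(phi(yl) - x beta)/sigma] ).
# 'par' holds beta followed by log(sigma) (an illustrative
# parametrization that keeps sigma positive).
interval_loglik <- function(par, X, yl, yu, link = identity, cdf = pnorm) {
    p <- ncol(X)
    beta <- par[seq_len(p)]
    sigma <- exp(par[p + 1])
    eta <- drop(X %*% beta)                  # linear predictor x beta
    upper <- cdf((link(yu) - eta) / sigma)   # F at the transformed upper limit
    lower <- cdf((link(yl) - eta) / sigma)   # F at the transformed lower limit
    sum(log(upper - lower))
}

# three unit-width intervals, intercept 0.5, slope 1, sigma 1
X <- cbind(1, c(0, 1, 2))
yl <- c(0, 1, 2)
yu <- c(1, 2, 3)
ll <- interval_loglik(c(0.5, 1, log(1)), X, yl, yu)
```

     Other choices can be plugged in through the arguments, e.g.
     'link = log' or 'link = qlogis' for the log and logit links and
     'cdf = plogis' for the logistic distribution.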

     An interesting example of coarse data is provided by quality of
     life indexes. The observed value of such an index can be thought
     of as a rounded version of the _true_ latent quality of life that
     the index attempts to capture. Applications of this approach can
     be found in Lesaffre et al. (2005) and Tsonaka et al. (2005).
     Various other examples of grouped and coarse data can be found in
     Heitjan (1989; 1993).

_V_a_l_u_e:

     an object of class 'grouped' is a list with the following
     components: 

coefficients: the estimated coefficients, including the standard
          deviation sigma.

 hessian: the approximate Hessian matrix at convergence returned by
          'optim'.

  fitted: the fitted values.

 details: a list with components: (i) 'X' the design matrix, (ii) 'y'
          the response data matrix, (iii) 'convergence' the convergence
          identifier returned by 'optim', (iv) 'logLik' the value of
          the log-likelihood at convergence, (v) 'k' the number of
          outer iterations used, (vi) 'n' the sample size, (vii) 'df'
          the degrees of freedom; 'NULL' except for the t distribution,
          (viii) 'link' the link function used, (ix) 'distribution' the
          distribution assumed for the true latent response variable
          and (x) 'max.sc' the maximum absolute value of the  score
          vector at convergence.  

    call: the matched call.

_A_u_t_h_o_r(_s):

     Dimitris Rizopoulos <dimitris.rizopoulos@med.kuleuven.be>

_R_e_f_e_r_e_n_c_e_s:

     Heitjan, D. (1989) Inference from grouped continuous data: A
     review (with discussion). _Statistical Science_, *4*, 164-183.

     Heitjan, D. (1993) Ignorability and coarse data: some biomedical
     examples. _Biometrics_, *49*, 1099-1109.

     Heitjan, D. and Rubin, D. (1991) Ignorability and coarse data.
     _Annals of Statistics_, *19*, 2244-2253.

     Lesaffre, E., Rizopoulos, D. and Tsonaka, S. (2005) The
     logistic-transform for bounded outcome scores. _Biostatistics_
     (to appear).

     Tsonaka, S., Rizopoulos, D. and Lesaffre, E. (2005) Power and
     sample size calculations for discrete bounded outcomes.
     _Statistics in Medicine_ (to appear).

_S_e_e _A_l_s_o:

     'anova.grouped', 'plot.grouped', 'residuals.grouped',
     'summary.grouped', 'power.grouped'

_E_x_a_m_p_l_e_s:

         
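     # interval limits supplied directly as a two-column response
     grouped(cbind(lo, up) ~ treat * x, link = "logit", data = Sdata)

     # interval limits constructed with 'equispaced'
     grouped(equispaced(r, n) ~ x1 * x2, link = "logit", data = Seeds)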

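     # See Figure 1 and Table 1 in Heitjan (1989)
     y <- iris[iris$Species == "setosa", "Petal.Width"]
     # lower (column 1) and upper (column 2) limits of the groups
     index <- cbind(seq(0.05, 0.55, 0.1), seq(0.15, 0.65, 0.1))
     n <- length(y)
     a <- b <- numeric(n)
     for (i in seq_len(n)) {
         # first group whose upper limit exceeds y[i]
         ind <- which(index[, 2] - y[i] > 0)[1]
         a[i] <- index[ind, 1]
         b[i] <- index[ind, 2]
     }
     # intercept-only model for the grouped petal widths
     summary(grouped(cbind(a, b) ~ 1))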

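     # See Figure 1 and Table 1 in Heitjan (1989)
     y <- iris[iris$Species == "setosa", "Petal.Length"]
     # lower (column 1) and upper (column 2) limits of the groups
     index <- cbind(seq(0.95, 1.75, 0.2), seq(1.15, 1.95, 0.2))
     n <- length(y)
     a <- b <- numeric(n)
     for (i in seq_len(n)) {
         # first group whose upper limit exceeds y[i]
         ind <- which(index[, 2] - y[i] > 0)[1]
         a[i] <- index[ind, 1]
         b[i] <- index[ind, 2]
     }
     # intercept-only model for the grouped petal lengths
     summary(grouped(cbind(a, b) ~ 1))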
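
     The interval limits computed by the loops in the last two
     examples can also be obtained without an explicit loop. A minimal
     sketch using base R's 'findInterval', shown for the petal widths
     and assuming the upper limits in 'index[, 2]' are sorted in
     increasing order and every observation lies below the largest
     one:

```r
y <- iris[iris$Species == "setosa", "Petal.Width"]
index <- cbind(seq(0.05, 0.55, 0.1), seq(0.15, 0.65, 0.1))

# findInterval() counts the upper limits that are <= y, so adding 1
# gives the first group whose upper limit exceeds y
ind <- findInterval(y, index[, 2]) + 1
a <- index[ind, 1]   # lower limits
b <- index[ind, 2]   # upper limits
```

     The resulting 'a' and 'b' can be passed to 'grouped' as
     'cbind(a, b) ~ 1', exactly as in the examples above.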

