dglm                  package:dglm                  R Documentation

_D_O_U_B_L_E _G_E_N_E_R_A_L_I_Z_E_D _L_I_N_E_A_R _M_O_D_E_L_S

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a generalized linear model with a link-linear model for the
     dispersion as well as for the mean.

_U_s_a_g_e:

     dglm(formula=formula(data), dformula = ~ 1, family = gaussian, dlink = "log", 
     data = sys.parent(), subset = NULL, weights = NULL, contrasts = NULL, 
     method = "ml", mustart = NULL, betastart = NULL, etastart = NULL, phistart = NULL, 
     control = dglm.control(...), ykeep = TRUE, xkeep = FALSE, zkeep = FALSE, ...)

     dglm.constant(y, family, weights = 1)

_A_r_g_u_m_e_n_t_s:

 formula: a symbolic description of the mean model to be fitted.
          The details of model specification are given under 'glm'
          and 'formula'.

dformula: a formula expression of the form '~ predictor', the
          response being ignored.  This specifies the linear predictor
          for modelling the dispersion.  A term of the form
          'offset(expression)' is allowed.

  family: a description of the error distribution and link function to
          be used in the model.  See 'glm' for more information.

   dlink: link function for modelling the dispersion.  Any link
          function accepted by the 'quasi' family is allowed, including
          'power(x)'. See details below.

    data: an optional data frame containing the variables in the model.
          See 'glm' for more information.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

 weights: an optional vector of weights to be used in the fitting
          process.

contrasts: an optional list.  See the 'contrasts.arg' argument of
          'model.matrix.default'.

  method: the method used to estimate the dispersion parameters; the
          default is '"ml"' for maximum likelihood and the alternative
          is '"reml"' for restricted maximum likelihood.  Upper case
          and partial matches are allowed.

 mustart: numeric vector giving starting values for the fitted values
          or expected responses.  Must be of the same length as the
          response, or of length 1 if a constant starting vector is
          desired.  Ignored if 'betastart' is supplied.

betastart: numeric vector giving starting values for the regression
          coefficients in the link-linear model for the mean.

etastart: numeric vector giving starting values for the linear
          predictor for the mean model.

phistart: numeric vector giving starting values for the dispersion
          parameters.

 control: a list of iteration and algorithmic constants.  See
          'dglm.control' for their names and default values.  These can
          also be set as arguments to 'dglm' itself.

   ykeep: logical flag: if 'TRUE', the vector of responses is returned.

   xkeep: logical flag: if 'TRUE', the 'model.matrix' for the mean
          model is returned.

   zkeep: logical flag: if 'TRUE', the 'model.matrix' for the
          dispersion model is returned.

     ...: further arguments passed to or from other methods.

       y: a numeric response vector (used by 'dglm.constant').
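     The 'method' argument matters most when the number of mean
     coefficients is not small relative to the number of observations.
     As a minimal base-R illustration (it does not call 'dglm'), take a
     normal linear model with constant dispersion: the ML estimate
     divides the residual sum of squares by n, while the REML estimate
     divides by n - p, where p is the number of mean coefficients.  The
     simulated data are arbitrary.

```r
# Base-R illustration (not dglm itself): ML versus REML estimates of a
# constant dispersion in a normal linear model with p mean coefficients.
set.seed(42)
n <- 30
x <- seq(1, 10, length.out = n)
y <- 3 + 0.5 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)
rss <- sum(residuals(fit)^2)
p <- length(coef(fit))     # number of mean coefficients (here 2)
phi_ml <- rss / n          # maximum likelihood: divides by n
phi_reml <- rss / (n - p)  # REML: allows for the p estimated mean coefficients
c(ml = phi_ml, reml = phi_reml)
```

     The REML value equals 'summary(fit)$sigma^2', and the ratio of the
     two estimates is (n - p)/n, so the difference fades as n grows
     relative to p.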

_D_e_t_a_i_l_s:

     Write m_i = E(y_i) for the expectation of the ith response.  Then
     Var(y_i) = s_i V(m_i), where V is the variance function and s_i is
     the dispersion of the ith response (often denoted by the Greek
     letter phi).  We assume the link-linear models g(m_i) = x_i^T b
     and h(s_i) = z_i^T a, where x_i and z_i are vectors of
     covariates, and b and a are vectors of regression coefficients
     affecting the mean and dispersion respectively.  The argument
     'dlink' specifies h.  See 'family' for how to specify g.  The
     optional arguments 'mustart', 'betastart' and 'phistart' specify
     starting values for m_i, b and s_i respectively.
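     As a concrete case of Var(y_i) = s_i V(m_i), the base-R sketch
     below uses the Gamma family, whose variance function is
     V(m) = m^2, with a log link for g chosen purely for illustration
     and the default log link for h.  The means and dispersions are
     made-up values; the 'rgamma' parameterization (shape 1/s_i, scale
     s_i m_i) is one standard way to get a Gamma response with this
     mean and variance.

```r
# Base-R sketch: Var(y_i) = s_i V(m_i) for the Gamma family, where V(m) = m^2.
fam <- Gamma(link = "log")    # g = log (chosen for illustration)
h <- make.link("log")         # h = log, the default 'dlink'
m <- c(2, 5)                  # means m_i
s <- c(0.1, 0.4)              # dispersions s_i
var_y <- s * fam$variance(m)  # s_i V(m_i): 0.4 and 10
eta <- fam$linkfun(m)         # g(m_i), the mean linear predictor x_i^T b
eta_d <- h$linkfun(s)         # h(s_i), the dispersion linear predictor z_i^T a
# A Gamma variate with shape 1/s_i and scale s_i m_i has mean m_i
# and variance s_i m_i^2, matching the formula above.
set.seed(1)
y2 <- rgamma(1e5, shape = 1 / s[2], scale = s[2] * m[2])
c(mean = mean(y2), var = var(y2))   # close to 5 and 10
```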

     The parameters b are estimated as for an ordinary glm. The
     parameters a are estimated by way of a dual glm in which the
     deviance components of the ordinary glm appear as responses. The
     estimation procedure alternates between one iteration for the mean
     submodel  and one iteration for the dispersion submodel until
     overall convergence.
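     The alternation can be sketched for the simplest case: a gaussian
     mean submodel with identity link, the default log 'dlink', and
     'method = "ml"'.  The helper 'fit_normal_dglm' below is
     hypothetical and is not the code used by 'dglm'; it omits the REML
     adjustment, offsets and prior weights.  For the identity link one
     weighted least squares step (weights 1/s_i) solves the mean
     submodel exactly.  For the dispersion submodel it takes one
     Fisher-scoring step of a gamma glm with log link on the deviance
     components d_i = (y_i - m_i)^2; with that link the working weights
     are all 1 and the working response is h(s_i) + (d_i - s_i)/s_i.

```r
# Hypothetical helper (not part of dglm): alternating ML fit of a double glm
# with a gaussian mean model (identity link) and a log-linear dispersion model.
fit_normal_dglm <- function(y, X, Z, maxit = 100, tol = 1e-10) {
  phi <- rep(mean((y - mean(y))^2), length(y))  # constant starting dispersions s_i
  eta_d <- log(phi)                             # dispersion linear predictor h(s_i)
  alpha <- rep(0, ncol(Z))
  for (iter in seq_len(maxit)) {
    # Mean submodel: weighted least squares with weights 1/s_i
    # (a single IRLS step is exact for the identity link)
    beta <- lm.wfit(X, y, w = 1 / phi)$coefficients
    mu <- drop(X %*% beta)
    d <- (y - mu)^2                             # gaussian deviance components
    # Dispersion submodel: one Fisher-scoring step of a gamma glm with log
    # link, responses d_i and prior weights 1 (working weights are all 1)
    z <- eta_d + (d - phi) / phi                # working response
    alpha_new <- lm.fit(Z, z)$coefficients
    eta_d <- drop(Z %*% alpha_new)
    phi <- exp(eta_d)
    converged <- max(abs(alpha_new - alpha)) < tol
    alpha <- alpha_new
    if (converged) break
  }
  list(beta = beta, alpha = alpha, phi = phi, iter = iter)
}

# Usage on simulated data whose log-dispersion is linear in x
set.seed(1)
n <- 200
x <- runif(n)
X <- cbind(1, x)
Z <- cbind(1, x)
y <- 1 + 2 * x + rnorm(n, sd = exp(0.5 * (-1 + 2 * x)))
fit <- fit_normal_dglm(y, X, Z)
fit$alpha    # estimates of a; the simulated values are -1 and 2
```

     At convergence the estimates satisfy the score equations of both
     submodels.  The dual glm's dispersion parameter affects standard
     errors but not these estimates.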

     The output from 'dglm', 'out' say, consists of two 'glm' objects
     (that for the dispersion submodel is 'out$dispersion.fit') with a
     few more components for the outer iteration and overall
     likelihood.  The 'summary' and 'anova' functions have special
     methods for 'dglm' objects.  Any generic function which has
     methods for 'glm's or 'lm's will work on 'out', giving information
     about the mean submodel.  Information about the dispersion
     submodel can be obtained by using 'out$dispersion.fit' as the
     argument rather than 'out' itself.  In particular,
     'drop1(out, scale = 1)' gives correct score statistics for
     removing terms from the mean submodel, while
     'drop1(out$dispersion.fit, scale = 2)' gives correct score
     statistics for removing terms from the dispersion submodel.

     The dispersion submodel is treated as a gamma family unless the
     original responses are gamma, in which case the dispersion
     submodel is digamma.  (Note that the digamma and trigamma
     functions are required to fit a digamma family.)  This is exact if
     the original glm family is 'gaussian', 'Gamma' or
     'inverse.gaussian'.  In other cases it can be justified by the
     saddle-point approximation to the density of the responses.  The
     results will therefore be close to exact ML or REML when the
     dispersions are small compared to the means.  In all cases the
     dispersion submodel has prior weights 1 and its own dispersion
     parameter, which equals 2.
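     For gaussian responses the claim can be checked directly:
     d_i = (y_i - m_i)^2 is s_i times a chi-squared variable on one
     degree of freedom, so E(d_i) = s_i and Var(d_i) = 2 s_i^2, a
     gamma-type variance s_i^2 with dispersion 2.  The base-R simulation
     below uses the true mean and an arbitrary s_i; with fitted means
     the d_i are biased downward, which the REML method is intended to
     correct.

```r
# Base-R check: for gaussian y, d = (y - m)^2 has mean s and variance 2 s^2,
# i.e. gamma-type variance function V(s) = s^2 with dispersion 2.
set.seed(2)
s <- 0.7                            # a dispersion s_i (arbitrary)
m <- 10                             # a mean m_i (arbitrary)
y <- rnorm(1e6, mean = m, sd = sqrt(s))
d <- (y - m)^2                      # deviance components, using the true mean
mean(d)                             # close to s = 0.7
var(d) / mean(d)^2                  # close to 2, the dispersion of the dual glm
```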

_V_a_l_u_e:

     An object of class 'dglm' is returned, which inherits from 'glm'
     and 'lm'.  See 'dglm.object' for details.

_N_o_t_e:

     The anova method is questionable when applied to a 'dglm' object
     fitted with 'method = "reml"'; use 'method = "ml"' instead.

_A_u_t_h_o_r(_s):

     Gordon Smyth, ported to R by Peter Dunn (dunn@usq.edu.au)

_R_e_f_e_r_e_n_c_e_s:

     Smyth, G. K. (1989). Generalized linear models with varying
     dispersion.  _J. R. Statist. Soc. B_, *51*, 47-60.

     Smyth, G. K., and Verbyla, A. P. (1999).  Adjusted likelihood
     methods for modelling dispersion in generalized linear models.
     _Environmetrics_, *10*, 696-709.

     Verbyla, A. P., and Smyth, G. K. (1998). Double generalized linear
     models: approximate residual maximum likelihood and diagnostics. 
     Research Report, Department of Statistics, University of Adelaide.

_S_e_e _A_l_s_o:

     'dglm.object', 'dglm.control', 'Digamma family', 'Polygamma'

_E_x_a_m_p_l_e_s:

     # Continuing the example from glm, but this time try
     # fitting a Gamma double generalized linear model also.
     library(dglm)
     library(statmod)
     clotting <- data.frame(
           u = c(5,10,15,20,30,40,60,80,100),
           lot1 = c(118,58,42,35,27,25,21,19,18),
           lot2 = c(69,35,26,21,18,16,13,12,12))
              
     # The same example as in glm: the dispersion is modelled as constant
     out <- dglm(lot1 ~ log(u), ~1, data=clotting, family=Gamma)
     summary(out)

     # Try a double glm 
     out2 <- dglm(lot1 ~ log(u), ~u, data=clotting, family=Gamma)

     summary(out2)
     anova(out2)

     # Summarize the mean model as for a glm
     summary.glm(out2)
         
     # Summarize the dispersion model as for a glm
     summary(out2$dispersion.fit)

     # Examine goodness of fit of dispersion model by plotting residuals
     plot(fitted(out2$dispersion.fit),residuals(out2$dispersion.fit)) 

