gam                   package:gam                   R Documentation

_F_i_t_t_i_n_g _G_e_n_e_r_a_l_i_z_e_d _A_d_d_i_t_i_v_e _M_o_d_e_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     'gam' is used to fit generalized additive models, specified by
     giving a symbolic description of the additive predictor and a
     description of the error distribution. 'gam' uses the _backfitting
     algorithm_ to combine different smoothing or fitting methods. The
     methods currently supported are local regression and smoothing
     splines.

_U_s_a_g_e:

     gam(formula, family = gaussian, data, weights, subset, na.action,
         start, etastart, mustart, control = gam.control(...),
         model = FALSE, method, x = FALSE, y = TRUE, ...)

     gam.fit(x, y, smooth.frame, weights = rep(1,nobs), start = NULL, 
         etastart = NULL, mustart = NULL, offset = rep(0, nobs), family = gaussian(), 
         control = gam.control()) 

_A_r_g_u_m_e_n_t_s:

 formula: a formula expression as for other regression models, of the
          form 'response ~ predictors'. See the documentation of 'lm'
          and 'formula' for details.  Built-in nonparametric smoothing
          terms are indicated by 's' for smoothing splines or 'lo' for
          'loess' smooth terms.  See the documentation for 's' and 'lo'
          for their arguments. Additional smoothers can be added by
          creating the appropriate interface functions. Interactions
          with nonparametric smooth terms are not fully supported, but
          will not produce errors; they will simply produce the usual
          parametric interaction.

  family: a description of the error distribution and link function to
          be used in the model. This can be a character string naming a
          family function, a family function or the result of a call to
          a family function.  (See 'family' for details of family
          functions.)

    data: an optional data frame containing the variables in the model.
          If not found in 'data', the variables are taken from
          'environment(formula)', typically the environment from which
          'gam' is called.

 weights: an optional vector of weights to be used in the fitting
          process.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset.  The
          "factory-fresh" default is 'na.omit'. A special method,
          'na.gam.replace', imputes missing values by their mean
          (assuming they are missing at random) and works gracefully
          with 'gam'.

   start: starting values for the parameters in the additive predictor.

etastart: starting values for the additive predictor.

 mustart: starting values for the vector of means.

  offset: this can be used to specify an _a priori_ known component to
          be included in the additive predictor during fitting.

 control: a list of parameters for controlling the fitting process. 
          See the documentation for 'gam.control' for details. These
          can also be set as arguments to 'gam()' itself. 

   model: a logical value indicating whether the _model frame_ should
          be included as a component of the returned value.

  method: the method to be used in fitting the parametric part of the
          model. The default method '"glm.fit"' uses iteratively
          reweighted least squares (IWLS).  The only current
          alternative is '"model.frame"' which returns the model frame
          and does no fitting.

    x, y: For 'gam': logical values indicating whether the model matrix
          ('x') and the response vector ('y') used in the fitting
          process should be returned as components of the returned
          value.

          For 'gam.fit': 'x' is a model matrix of dimension 'n * p',
          and 'y' is a vector of observations of length 'n'. 

smooth.frame: for 'gam.fit' only. This is essentially a subset of the
          model frame corresponding to the smooth terms, and has the
          ingredients needed for smoothing each variable in the
          backfitting algorithm. The elements of this frame are
          produced by the formula functions 'lo' and 's'.

     ...: further arguments passed to or from other methods.
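
     The three accepted forms of the 'family' argument can be reduced
     to a single family object. The sketch below mirrors the approach
     taken by 'stats::glm'; that 'gam' resolves the argument in exactly
     this way is an assumption, and 'resolve_family' is a hypothetical
     helper, not part of the package.

```r
# Hypothetical helper (not part of 'gam'): normalise the three accepted
# forms of 'family' into a family object, following the logic of stats::glm.
resolve_family <- function(family) {
  if (is.character(family)) {
    # A character string names a family function; look it up.
    family <- get(family, mode = "function", envir = parent.frame())
  }
  if (is.function(family)) {
    # A family function is called with its default link.
    family <- family()
  }
  if (is.null(family$family)) {
    stop("'family' not recognized")
  }
  family
}

f1 <- resolve_family("binomial")                # character string
f2 <- resolve_family(binomial)                  # family function
f3 <- resolve_family(binomial(link = "logit"))  # result of a call
```

     All three calls yield a binomial family object with a logit link,
     which is why they can be used interchangeably in a model call.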

_D_e_t_a_i_l_s:

     The gam model is fit using the local scoring algorithm, which
     iteratively fits weighted additive models by backfitting. The
     backfitting algorithm is a Gauss-Seidel method for fitting
     additive models, by iteratively smoothing partial residuals.  The
     algorithm separates the parametric from the nonparametric part of
     the fit, and fits the parametric part using weighted linear least
     squares within the backfitting algorithm. This version of 'gam'
     remains faithful to the philosophy of GAM models as outlined in
     the references below.
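
     The Gauss-Seidel idea can be sketched in base R for the Gaussian
     case with two smooth terms. This is a minimal illustration under
     stated assumptions, not the package's implementation: 'backfit' is
     a hypothetical function, 'stats::smooth.spline' stands in for the
     smoother, the parametric part is reduced to an intercept, and the
     convergence rule is an arbitrary choice.

```r
# Backfitting sketch for y = alpha + f1(x1) + f2(x2) + error.
# Hypothetical helper; 'gam' itself uses its own smoothers and Fortran code.
backfit <- function(y, x1, x2, df = 4, max_iter = 50, tol = 1e-8) {
  alpha <- mean(y)                       # parametric part: intercept only
  f1 <- f2 <- numeric(length(y))
  for (iter in seq_len(max_iter)) {
    f1_old <- f1
    f2_old <- f2
    # Smooth the partial residuals for x1, holding f2 fixed.
    f1 <- predict(smooth.spline(x1, y - alpha - f2, df = df), x1)$y
    f1 <- f1 - mean(f1)                  # centre to keep alpha identifiable
    # Smooth the partial residuals for x2, using the updated f1.
    f2 <- predict(smooth.spline(x2, y - alpha - f1, df = df), x2)$y
    f2 <- f2 - mean(f2)
    # Stop when the smooth terms no longer change appreciably.
    change <- sum((f1 - f1_old)^2 + (f2 - f2_old)^2)
    if (change < tol * (sum(f1_old^2 + f2_old^2) + tol)) break
  }
  list(alpha = alpha, f1 = f1, f2 = f2,
       fitted = alpha + f1 + f2, iter = iter)
}

# Simulated data (illustrative only).
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.2)
fit <- backfit(y, x1, x2)
```

     Local scoring wraps a loop of this kind inside an outer iteration
     that recomputes a working response and working weights from the
     current fit, in the same way that IWLS wraps weighted least squares.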

     An object 'gam.slist' (currently set to 'c("lo","s","random")')
     lists the smoothers supported by 'gam'. Corresponding to each of
     these is a smoothing function ('gam.lo', 'gam.s', etc.) that takes
     particular arguments and produces particular output, custom built
     to serve as a building block in the backfitting algorithm. This
     allows users to add their own smoothing methods. See the
     documentation for these methods for further information. In
     addition, the object 'gam.wlist' (currently set to 'c("s","lo")')
     lists the smoothers for which efficient backfitters are provided.
     These are invoked if all the smoothing methods are of one kind
     (either all '"lo"' or all '"s"').

_V_a_l_u_e:

     'gam' returns an object of class 'gam', which inherits from both
     'glm' and 'lm'.

     Gam objects can be examined by 'print', 'summary', 'plot', and
     'anova'.  Components can be extracted using the extractor
     functions 'predict', 'fitted', 'residuals', 'deviance', 'formula',
     and 'family'. A 'gam' object can be modified using 'update'. It
     has all the components of a 'glm' object, with a few more, so it
     can also be queried, summarized etc. by methods for 'glm' and 'lm'
     objects. Other generic functions that have methods for 'gam'
     objects are 'step' and 'preplot'.

     The following components must be included in a legitimate 'gam'
     object. The residuals, fitted values, coefficients and effects
     should be extracted by the generic functions of the same name,
     rather than by the '$' operator.  The 'family' function returns
     the entire family object used in the fitting, and 'deviance' can
     be used to extract the deviance of the fit.

coefficients: the coefficients of the parametric part of the
          'additive.predictors', which multiply the columns of the
          model matrix. The names of the coefficients are the names of
          the single-degree-of-freedom effects (the columns of the
          model matrix). If the model is overdetermined there will be
          missing values in the coefficients corresponding to
          inestimable coefficients.

additive.predictors: the additive fit, given by the product of the
          model matrix and the coefficients, plus the columns of the
          '$smooth' component. 

fitted.values: the fitted mean values, obtained by transforming the
          component 'additive.predictors' using the inverse link
          function.

smooth, nl.df, nl.chisq, var: these four characterize the nonparametric
          aspect of the fit. 'smooth' is a matrix of smooth terms, with
          a column corresponding to each smooth term in the model; if
          no smooth terms are in the 'gam' model, all these components
          will be missing.  Each column corresponds to the strictly 
          nonparametric part of the term, while the parametric part is
          obtained from the model matrix. 'nl.df' is a vector giving
          the approximate degrees of freedom for each column of
          'smooth'. For smoothing splines specified by 's(x)', the
          approximate 'df' will be the trace of the implicit smoother
          matrix minus 2. 'nl.chisq' is a vector containing a type of
          score test for the removal of each of the columns of
          'smooth'. 'var' is a matrix like 'smooth', containing the
          approximate pointwise variances for the columns of 'smooth'. 

smooth.frame: this is essentially a subset of the model frame
          corresponding to the smooth terms, and has the ingredients
          needed for making predictions from a 'gam' object.

residuals: the residuals from the final weighted additive fit; also
          known as _working_ residuals, these are typically not
          interpretable without rescaling by the weights.

deviance: up to a constant, minus twice the maximized log-likelihood.
          Similar to the residual sum of squares. Where sensible, the
          constant is chosen so that a saturated model has deviance
          zero. 

null.deviance: the deviance for the null model, comparable with
          'deviance'. The null model will include the offset, and an
          intercept if there is one in the model.

    iter: the number of local scoring iterations used to compute the
          estimates.

  family: a three-element character vector giving the name of the
          family, the link, and the variance function; mainly for
          printing purposes. 

 weights: the _working_ weights, that is the weights in the final
          iteration of the local scoring fit.

prior.weights: the case weights initially supplied.

df.residual: the residual degrees of freedom.

 df.null: the residual degrees of freedom for the null model.


     The object will also have the components of a 'lm' object:
     'coefficients', 'residuals', 'fitted.values', 'call', 'terms', 
     and some others involving the numerical fit.  See 'lm.object'.
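
     The relationships among several of these components can be checked
     in base R. Because a 'gam' object carries the 'glm' components,
     the sketch below uses a plain 'glm' fit on simulated data as a
     stand-in; that a 'gam' fit behaves analogously (with
     'additive.predictors' in place of 'linear.predictors') is an
     assumption, not something verified here.

```r
# Stand-in fit on simulated data (illustrative only).
set.seed(42)
x <- runif(100)
y <- rbinom(100, size = 1, prob = plogis(2 * x - 1))
fit <- glm(y ~ x, family = binomial)

fam <- family(fit)              # the entire family object
eta <- fit$linear.predictors    # 'additive.predictors' in a gam object

# Fitted means are the inverse link applied to the predictor.
mu <- fam$linkinv(eta)
same_fitted <- isTRUE(all.equal(unname(mu), unname(fitted(fit))))

# The deviance is the sum of the family's deviance contributions.
dev <- sum(fam$dev.resids(y, mu, fit$prior.weights))
same_deviance <- isTRUE(all.equal(dev, deviance(fit)))

# The stored residuals are working residuals, whereas the extractor
# returns deviance residuals by default.
working <- residuals(fit, type = "working")
stored_is_working <- isTRUE(all.equal(unname(fit$residuals), unname(working)))
stored_is_default <- isTRUE(all.equal(unname(fit$residuals),
                                      unname(residuals(fit))))
```

     The last comparison illustrates why the extractor functions are
     preferred over direct component access: the stored 'residuals'
     component is on the working scale and is not what 'residuals()'
     returns by default.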

_A_u_t_h_o_r(_s):

     Written by Trevor Hastie, following closely the design in the
     "Generalized Additive Models" chapter (Hastie, 1992) in Chambers
     and Hastie (1992), and the philosophy in Hastie and Tibshirani
     (1990). This version of 'gam' is adapted from the S version to
     match the 'glm' and 'lm' functions in R.

     Note that this version of 'gam' is different from the function
     with the same name in the R package 'mgcv', which uses only
     smoothing splines with a focus on automatic smoothing parameter
     selection via GCV.

_R_e_f_e_r_e_n_c_e_s:

     Hastie, T. J. (1992) _Generalized additive models._ Chapter 7 of
     _Statistical Models in S_ eds J. M. Chambers and T. J. Hastie,
     Wadsworth & Brooks/Cole.

     Hastie, T. and Tibshirani, R. (1990) _Generalized Additive
     Models._ London: Chapman and Hall.

     Venables, W. N. and Ripley, B. D. (2002) _Modern Applied
     Statistics with S._ New York: Springer.

_S_e_e _A_l_s_o:

     'glm',  'family', 'lm'.

_E_x_a_m_p_l_e_s:

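     library(gam)

     # Logistic additive model: smoothing spline in Age (4 df), linear in Number
     data(kyphosis)
     gam(Kyphosis ~ s(Age, 4) + Number, family = binomial, data = kyphosis,
         trace = TRUE)

     # Loess terms, with mean imputation of missing values
     data(airquality)
     gam(Ozone^(1/3) ~ lo(Solar.R) + lo(Wind, Temp), data = airquality,
         na.action = na.gam.replace)

     # Parametric and smooth terms combined, fitted on a subset
     gam(Kyphosis ~ poly(Age, 2) + s(Start), data = kyphosis,
         family = binomial, subset = Number > 2)

     # Fit, summarize, plot with standard-error bands, then predict by term
     data(gam.data)
     gam.object <- gam(y ~ s(x, 6) + z, data = gam.data)
     summary(gam.object)
     plot(gam.object, se = TRUE)
     data(gam.newdata)
     predict(gam.object, type = "terms", newdata = gam.newdata)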

