gssanova                 package:gss                 R Documentation

_F_i_t_t_i_n_g _S_m_o_o_t_h_i_n_g _S_p_l_i_n_e _A_N_O_V_A _M_o_d_e_l_s _w_i_t_h _N_o_n-_G_a_u_s_s_i_a_n _R_e_s_p_o_n_s_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fit smoothing spline ANOVA models in non-Gaussian regression.  The
     symbolic model specification via 'formula' follows the same rules
     as in 'lm' and 'glm'.

_U_s_a_g_e:

     gssanova(formula, family, type=NULL, data=list(), weights, subset,
              offset, na.action=na.omit, partial=NULL, alpha=NULL, nu=NULL,
              id.basis=NULL, nbasis=NULL, seed=NULL, random=NULL)

_A_r_g_u_m_e_n_t_s:

 formula: Symbolic description of the model to be fit.

  family: Description of the error distribution.  Supported are
          exponential families '"binomial"', '"poisson"', '"Gamma"',
          and '"nbinomial"'.  Also supported are accelerated life model
          families '"weibull"', '"lognorm"', and '"loglogis"'.

    type: List specifying the type of spline for each variable. See
          'mkterm' for details.

    data: Optional data frame containing the variables in the model.

 weights: Optional vector of weights to be used in the fitting process.

  subset: Optional vector specifying a subset of observations to be
          used in the fitting process.

  offset: Optional offset term with known parameter 1.

na.action: Function which indicates what should happen when the data
          contain NAs.

 partial: Optional extra unpenalized terms in partial spline models.

   alpha: Tuning parameter defining cross-validation; larger values
          yield smoother fits.  Defaults are 'alpha=1' for
          'family="binomial"' and 'alpha=1.4' otherwise.

      nu: Inverse scale parameter in accelerated life model families. 
          Ignored for exponential families.

id.basis: Index designating selected "knots".

  nbasis: Number of "knots" to be selected.  Ignored when 'id.basis' is
          supplied.

    seed: Seed for reproducible random selection of "knots". Ignored
          when 'id.basis' is supplied.

  random: Input for parametric random effects in nonparametric
          mixed-effect models.  See 'mkran' for details.

_D_e_t_a_i_l_s:

     The model specification via 'formula' is intuitive.  For example,
     'y~x1*x2' yields a model of the form

          y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e

     with the terms denoted by '"1"', '"x1"', '"x2"', and '"x1:x2"'.

     The model terms are sums of unpenalized and penalized terms.
     Attached to every penalized term there is a smoothing parameter,
     and the model complexity is largely determined by the number of
     smoothing parameters.

     Only one link is implemented for each 'family'.  It is the logit
     link for '"binomial"', and the log link for '"poisson"', and
     '"Gamma"'.  For '"nbinomial"', the working parameter is the logit
     of the probability p; see 'NegBinomial'.  For '"weibull"',
     '"lognorm"', and '"loglogis"', it is the location parameter for
     the log lifetime.

     The selection of smoothing parameters is through direct
     cross-validation.  The cross-validation score used for
     'family="poisson"' is taken from density estimation as in Gu and
     Wang (2003), and those used for other families are derived
     following the lines of Gu and Xiang (2001).

     A subset of the observations are selected as "knots."  Unless
     specified via 'id.basis' or 'nbasis', the number of "knots" q is
     determined by max(30,10n^{2/9}), which is appropriate for the
     default cubic splines for numerical vectors.

_V_a_l_u_e:

     'gssanova' returns a list object of class
     'c("gssanova","ssanova")'.

     The method 'summary.gssanova' can be used to obtain summaries of
     the fits.  The method 'predict.ssanova' can be used to evaluate
     the fits at arbitrary points along with standard errors, on the
     link scale.  The method 'project.gssanova' can be used to
     calculate the Kullback-Leibler projection for model selection. 
     The methods 'residuals.gssanova' and 'fitted.gssanova' extract the
     respective traits from the fits.

_R_e_s_p_o_n_s_e_s:

     For 'family="binomial"', the response can be specified either as
     two columns of counts or as a column of sample proportions plus a
     column of total counts entered through the argument 'weights', as
     in 'glm'.

     For 'family="nbinomial"', the response may be specified as two
     columns with the second being the known sizes, or simply as a
     single column with the common unknown size to be estimated through
     the maximum likelihood.

     For 'family="weibull"', '"lognorm"', or '"loglogis"', the response
     consists of three columns, with the first giving the follow-up
     time, the second the censoring status, and the third the
     left-truncation time.  For data with no truncation, the third
     column can be omitted.

_N_o_t_e:

     For simpler models and moderate sample sizes, the exact solution
     of 'gssanova0' can be faster.

     The results may vary from run to run. For consistency, specify
     'id.basis' or set 'seed'.

     In _gss_ versions earlier than 1.0, 'gssanova' was under the name
     'gssanova1'.

_A_u_t_h_o_r(_s):

     Chong Gu, chong@stat.purdue.edu

_R_e_f_e_r_e_n_c_e_s:

     Gu, C. and Xiang, D. (2001), Cross validating non Gaussian data:
     generalized approximate cross validation revisited.  _Journal of
     Computational and Graphical Statistics_, *10*, 581-591.

     Gu, C. and Wang, J. (2003), Penalized likelihood density
     estimation: Direct cross-validation and scalable approximation.
     _Statistica Sinica_, *13*, 811-826.

_E_x_a_m_p_l_e_s:

     ## Fit a cubic smoothing spline logistic regression model
     test <- function(x)
             {.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2}
     x <- (0:100)/100
     p <- 1-1/(1+exp(test(x)))
     y <- rbinom(x,3,p)
     logit.fit <- gssanova(cbind(y,3-y)~x,family="binomial")
     ## The same fit
     logit.fit1 <- gssanova(y/3~x,"binomial",weights=rep(3,101),
                            id.basis=logit.fit$id.basis)
     ## Obtain estimates and standard errors on a grid
     est <- predict(logit.fit,data.frame(x=x),se=TRUE)
     ## Plot the fit and the Bayesian confidence intervals
     plot(x,y/3,ylab="p")
     lines(x,p,col=1)
     lines(x,1-1/(1+exp(est$fit)),col=2)
     lines(x,1-1/(1+exp(est$fit+1.96*est$se)),col=3)
     lines(x,1-1/(1+exp(est$fit-1.96*est$se)),col=3)

     ## Fit a mixed-effect logistic model
     data(bacteriuria)
     bact.fit <- gssanova(infect~trt+time,family="binomial",data=bacteriuria,
                          id.basis=(1:820)[bacteriuria$id%in%c(3,38)],random=~1|id)
     ## Predict fixed effects
     predict(bact.fit,data.frame(time=2:16,trt=as.factor(rep(1,15))),se=TRUE)
     ## Estimated random effects
     bact.fit$b

     ## Clean up
     ## Not run: 
     rm(test,x,p,y,logit.fit,logit.fit1,est,bacteriuria,bact.fit)
     dev.off()
     ## End(Not run)

