gssanova0                package:gss                R Documentation

_F_i_t_t_i_n_g _S_m_o_o_t_h_i_n_g _S_p_l_i_n_e _A_N_O_V_A _M_o_d_e_l_s _w_i_t_h _N_o_n-_G_a_u_s_s_i_a_n _R_e_s_p_o_n_s_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fit smoothing spline ANOVA models in non-Gaussian regression.  The
     symbolic model specification via 'formula' follows the same rules
     as in 'lm' and 'glm'.

_U_s_a_g_e:

     gssanova0(formula, family, type=NULL, data=list(), weights, subset,
               offset, na.action=na.omit, partial=NULL, method=NULL,
               varht=1, nu=NULL, prec=1e-7, maxiter=30)

_A_r_g_u_m_e_n_t_s:

 formula: Symbolic description of the model to be fit.

  family: Description of the error distribution.  Supported are
          exponential families '"binomial"', '"poisson"', '"Gamma"',
          '"inverse.gaussian"', and '"nbinomial"'.  Also supported are
          accelerated life model families '"weibull"', '"lognorm"', and
          '"loglogis"'.

    type: List specifying the type of spline for each variable. See
          'mkterm' for details.

    data: Optional data frame containing the variables in the model.

 weights: Optional vector of weights to be used in the fitting process.

  subset: Optional vector specifying a subset of observations to be
          used in the fitting process.

  offset: Optional offset term with known parameter 1.

na.action: Function which indicates what should happen when the data
          contain NAs.

 partial: Optional extra unpenalized terms in partial spline models.

  method: Score used to drive the performance-oriented iteration.
          Supported are 'method="v"' for GCV, 'method="m"' for GML, and
          'method="u"' for Mallows' CL.

   varht: Dispersion parameter needed for 'method="u"'. Ignored when
          'method="v"' or 'method="m"' is specified.

      nu: Inverse scale parameter in accelerated life model families. 
          Ignored for exponential families.

    prec: Precision requirement for the iterations.

 maxiter: Maximum number of iterations allowed for performance-oriented
          iteration, and for inner-loop multiple smoothing parameter
          selection when applicable.

_D_e_t_a_i_l_s:

     The model specification via 'formula' is intuitive.  For example,
     'y~x1*x2' yields a model of the form

          y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e

     with the terms denoted by '"1"', '"x1"', '"x2"', and '"x1:x2"'.
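     As a small illustration of how these labels arise, the sketch below
     uses only base R's 'terms' on the same formula; it does not call
     'gssanova0', and the variable names 'y', 'x1', and 'x2' are
     placeholders.

```r
## Expand the formula symbolically; no data are needed for this step
tt <- terms(y ~ x1 * x2)

## Labels of the main effects and the interaction
labs <- attr(tt, "term.labels")

## 1 indicates that the constant term "1" is part of the model
has.const <- attr(tt, "intercept")

print(labs)
print(has.const)
```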

     The model terms are sums of unpenalized and penalized terms.
     Attached to every penalized term there is a smoothing parameter,
     and the model complexity is largely determined by the number of
     smoothing parameters.

     Only one link is implemented for each 'family'.  It is the logit
     link for '"binomial"', and the log link for '"poisson"',
     '"Gamma"', and '"inverse.gaussian"'. For '"nbinomial"', the
     working parameter is the logit of the probability p; see
     'NegBinomial'.  For '"weibull"', '"lognorm"', and '"loglogis"', it
     is the location parameter for the log lifetime.
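     Because fits are evaluated on the link scale, it can help to see
     the back-transformations spelled out.  The following is a minimal
     base R sketch; the vector 'eta' holds made-up link-scale values
     and is not the output of any fit.

```r
## Hypothetical values on the link scale (for illustration only)
eta <- c(-2, 0, 1.5)

## Logit link ("binomial"): back-transform to probabilities.
## plogis(eta) is the same as 1 - 1/(1 + exp(eta)).
p <- plogis(eta)

## Log link ("poisson", "Gamma", "inverse.gaussian"):
## back-transform to the mean scale.
mu <- exp(eta)

print(cbind(eta, p, mu))
```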

     The models are fitted by the penalized likelihood method through
     the performance-oriented iteration described in the reference; the
     O(n^{3}) algorithms of RKPACK are used for numerical calculations.
     For 'family="binomial"', '"poisson"', '"nbinomial"', '"weibull"',
     '"lognorm"', and '"loglogis"', the score driving the
     performance-oriented iteration defaults to 'method="u"' with
     'varht=1'.  For 'family="Gamma"' and '"inverse.gaussian"', the
     default is 'method="v"'.
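     The choice of score can be made explicit in the call.  The sketch
     below assumes the 'gss' package is installed and uses simulated
     Poisson counts; the mean function and the seed are arbitrary
     choices for illustration.

```r
library(gss)   # assumes the gss package is installed

## Simulated Poisson counts (illustrative data, not from the package)
set.seed(1)
x <- (0:100)/100
y <- rpois(length(x), exp(1 + sin(2*pi*x)))

## Default score for "poisson": method="u" with varht=1
fit.u <- gssanova0(y ~ x, family = "poisson")

## Request GCV instead of the default
fit.v <- gssanova0(y ~ x, family = "poisson", method = "v")
```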

_V_a_l_u_e:

     'gssanova0' returns a list object of 'class'
     'c("gssanova0","ssanova0","gssanova")'.

     The method 'summary.gssanova0' can be used to obtain summaries of
     the fits.  The method 'predict.ssanova0' can be used to evaluate
     the fits at arbitrary points along with standard errors, on the
     link scale.  The methods 'residuals.gssanova' and
     'fitted.gssanova' extract the respective traits from the fits.

_R_e_s_p_o_n_s_e_s:

     For 'family="binomial"', the response can be specified either as
     two columns of counts or as a column of sample proportions plus a
     column of total counts entered through the argument 'weights', as
     in 'glm'.
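     Since the convention is borrowed from 'glm', the equivalence of
     the two forms can be checked with 'glm' itself.  This is a base R
     sketch on simulated data with 3 trials per observation; it
     illustrates the response layout only and does not fit a smoothing
     spline.

```r
## Simulated binomial counts with 3 trials each (illustrative data)
set.seed(1)
x <- (0:100)/100
y <- rbinom(length(x), 3, plogis(2*x - 1))

## Form 1: two columns of counts (successes, failures)
fit.counts <- glm(cbind(y, 3 - y) ~ x, family = binomial)

## Form 2: sample proportions, with total counts passed as weights
fit.props <- glm(y/3 ~ x, family = binomial,
                 weights = rep(3, length(x)))

## The two specifications describe the same likelihood
same <- isTRUE(all.equal(coef(fit.counts), coef(fit.props),
                         tolerance = 1e-6))
print(same)
```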

     For 'family="nbinomial"', the response may be specified as two
     columns with the second being the known sizes, or simply as a
     single column with the common unknown size to be estimated by
     maximum likelihood.

     For 'family="weibull"', '"lognorm"', or '"loglogis"', the response
     consists of three columns, with the first giving the follow-up
     time, the second the censoring status, and the third the
     left-truncation time.  For data with no truncation, the third
     column can be omitted.

_N_o_t_e:

     The direct cross-validation of 'gssanova' can be more effective in
     general, and more stable for complex models.

     For large sample sizes, the approximate solution of 'gssanova' can
     be faster.

     The method 'project' is not implemented for 'gssanova0', nor is
     the mixed-effect model support through 'mkran'.

     In _gss_ versions earlier than 1.0, 'gssanova0' was under the name
     'gssanova'.

_A_u_t_h_o_r(_s):

     Chong Gu, chong@stat.purdue.edu

_R_e_f_e_r_e_n_c_e_s:

     Gu, C. (1992), Cross-validating non-Gaussian data. _Journal of
     Computational and Graphical Statistics_, *1*, 169-179.

_E_x_a_m_p_l_e_s:

     ## Fit a cubic smoothing spline logistic regression model
     test <- function(x)
             {.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2}
     x <- (0:100)/100
     p <- 1-1/(1+exp(test(x)))
     y <- rbinom(length(x),3,p)
     logit.fit <- gssanova0(cbind(y,3-y)~x,family="binomial")
     ## The same fit
     logit.fit1 <- gssanova0(y/3~x,"binomial",weights=rep(3,101))
     ## Obtain estimates and standard errors on a grid (link scale)
     est <- predict(logit.fit,data.frame(x=x),se.fit=TRUE)
     ## Plot the fit and the Bayesian confidence intervals
     plot(x,y/3,ylab="p")
     lines(x,p,col=1)
     lines(x,plogis(est$fit),col=2)
     lines(x,plogis(est$fit+1.96*est$se.fit),col=3)
     lines(x,plogis(est$fit-1.96*est$se.fit),col=3)
     ## Clean up
     ## Not run: 
     rm(test,x,p,y,logit.fit,logit.fit1,est)
     dev.off()
     ## End(Not run)

