sshzd                  package:gss                  R Documentation

_E_s_t_i_m_a_t_i_n_g _H_a_z_a_r_d _F_u_n_c_t_i_o_n _U_s_i_n_g _S_m_o_o_t_h_i_n_g _S_p_l_i_n_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Estimate hazard function using smoothing spline ANOVA models.  The
     symbolic model specification via 'formula' follows the same rules
     as in 'lm', but with the response of a special form.

_U_s_a_g_e:

     sshzd(formula, type=NULL, data=list(), alpha=1.4, weights=NULL,
           subset, na.action=na.omit, id.basis=NULL, nbasis=NULL, seed=NULL,
           prec=1e-7, maxiter=30)

_A_r_g_u_m_e_n_t_s:

 formula: Symbolic description of the model to be fit, where the
          response is of the form 'Surv(futime,status,start=0)'.

    type: List specifying the type of spline for each variable. See
          'mkterm' for details.

    data: Optional data frame containing the variables in the model.

   alpha: Parameter defining cross-validation score for smoothing
          parameter selection.

 weights: Optional vector of counts for duplicated data.

  subset: Optional vector specifying a subset of observations to be
          used in the fitting process.

na.action: Function which indicates what should happen when the data
          contain NAs.

id.basis: Index of observations to be used as "knots."

  nbasis: Number of "knots" to be used.  Ignored when 'id.basis' is
          specified.

    seed: Seed to be used for the random generation of "knots." Ignored
          when 'id.basis' is specified.

    prec: Precision requirement for internal iterations.

 maxiter: Maximum number of iterations allowed for internal iterations.

_D_e_t_a_i_l_s:

     The model specification via 'formula' is for the log hazard. For
     example, 'Suve(t,d)~t*u' prescribes a model of the form

         log f(t,u) = C + g_{t}(t) + g_{u}(u) + g_{t,u}(t,u)

     with the terms denoted by '"1"', '"t"', '"u"', and '"t:u"'. 
     Replacing 't*u' by 't+u' in the 'formula', one gets a proportional
     hazard model with g_{t,u}=0.

     'sshzd' takes standard right-censored lifetime data, with possible
     left-truncation and covariates; in
     'Surv(futime,status,start=0)~...', 'futime' is the   follow-up
     time, 'status' is the censoring indicator, and 'start' is the
     optional left-truncation time.  The main effect of 'futime' must
     appear in the model terms specified via '...'.

     Parallel to those in a 'ssanova' object, the model terms are sums
     of unpenalized and penalized terms.  Attached to every penalized
     term there is a smoothing parameter, and the model complexity is
     largely determined by the number of smoothing parameters.

     The selection of smoothing parameters is through a
     cross-validation mechanism described in Gu (2002, Sec. 7.2), with
     a parameter 'alpha'; 'alpha=1' is "unbiased" for the minimization
     of Kullback-Leibler loss but may yield severe undersmoothing,
     whereas larger 'alpha' yields smoother estimates.

     A subset of the observations are selected as "knots."  Unless
     specified via 'id.basis' or 'nbasis', the number of "knots" q is
     determined by max(30,10n^{2/9}), which is appropriate for the
     default cubic splines for numerical vectors.

_V_a_l_u_e:

     'sshzd' returns a list object of 'class "sshzd"'.

     'hzdrate.sshzd' can be used to evaluate the estimated hazard
     function.  'hzdcurve.sshzd' can be used to evaluate hazard curves
     with fixed covariates. 'survexp.sshzd' can be used to calculated
     estimated expected survival.  The method 'project.sshzd' can be
     used to calculate the Kullback-Leibler projection for model
     selection.

_N_o_t_e:

     The function 'Surv(futime,status,start=0)' is defined and parsed
     inside 'sshzd', not quite the same as the one in the 'survival'
     package.

     Integration on the time axis is done by the 200-point
     Gauss-Legendre formula on 'c(min(start),max(futime))', returned
     from 'gauss.quad'.

     The results may vary from run to run.  For consistency, specify
     'id.basis' or set 'seed'.

_A_u_t_h_o_r(_s):

     Chong Gu, chong@stat.purdue.edu

_R_e_f_e_r_e_n_c_e_s:

     Gu, C. (2002), _Smoothing Spline ANOVA Models_.  New York:
     Springer-Verlag.

     Du, P. and Gu, C. (2006), Penalized likelihood hazard estimation:
     efficient approximation and Bayesian confidence intervals.
     _Statistics and Probability Letters_, *76*, 244-254.

_E_x_a_m_p_l_e_s:

     ## Model with interaction
     data(gastric)
     gastric.fit <- sshzd(Surv(futime,status)~futime*trt,data=gastric)
     ## exp(-Lambda(600)), exp(-(Lambda(1200)-Lambda(600))), and exp(-Lambda(1200))
     survexp.sshzd(gastric.fit,c(600,1200,1200),data.frame(trt=as.factor(1)),c(0,600,0))
     ## Clean up
     ## Not run: 
     rm(gastric,gastric.fit)
     dev.off()
     ## End(Not run)

     ## THE FOLLOWING EXAMPLE IS TIME-CONSUMING
     ## Proportional hazard model
     ## Not run: 
     data(stan)
     stan.fit <- sshzd(Surv(futime,status)~futime+age,data=stan)
     ## Evaluate fitted hazard
     hzdrate.sshzd(stan.fit,data.frame(futime=c(10,20),age=c(20,30)))
     ## Plot lambda(t,age=20)
     tt <- seq(0,60,leng=101)
     hh <- hzdcurve.sshzd(stan.fit,tt,data.frame(age=20))
     plot(tt,hh,type="l")
     ## Clean up
     rm(stan,stan.fit,tt,hh)
     dev.off()
     ## End(Not run)

