validate               package:Design               R Documentation

_R_e_s_a_m_p_l_i_n_g _V_a_l_i_d_a_t_i_o_n _o_f _a _F_i_t_t_e_d _M_o_d_e_l'_s _I_n_d_e_x_e_s _o_f _F_i_t

_D_e_s_c_r_i_p_t_i_o_n:

     The 'validate' function, when used on an object created by one of
     the 'Design' series, does resampling validation of a regression
     model, with or without backward step-down variable deletion. It
     provides bias-corrected indexes that are specific to each type of
     model. For 'validate.cph' and 'validate.psm', see 'validate.lrm',
     which is similar. 'validate.cph' and 'validate.psm' take an extra
     argument 'dxy', which if 'TRUE' causes the 'rcorr.cens' function
     to be invoked to compute Somers' D_{xy} rank correlation at each
     resample (this takes a bit longer than the likelihood-based
     statistics). For 'validate.cph' with 'dxy=TRUE', you must specify
     an argument 'u' if the model is stratified, since survival curves
     can then cross and X beta is not 1-1 with predicted survival.
     There is also a 'validate' method for 'tree', which only does
     cross-validation and which has a different list of arguments.

_U_s_a_g_e:

     # fit <- fitting.function(formula=response ~ terms, x=TRUE, y=TRUE)
     validate(fit, method="boot", B=40,
              bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0, 
              pr=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

     fit: a fit derived by e.g. 'lrm', 'cph', 'psm', 'ols'. The options
          'x=TRUE' and 'y=TRUE' must have been specified. 

  method: may be '"crossvalidation"', '"boot"' (the default), '".632"',
          or '"randomization"'. See 'predab.resample' for details.  Can
          abbreviate, e.g. '"cross", "b", ".6"'. 

       B: number of repetitions.  For 'method="crossvalidation"', 'B'
          is the number of groups of omitted observations. 

      bw: 'TRUE' to do fast step-down using the 'fastbw' function, for
          both the overall model and for each repetition. 'fastbw'
          keeps parameters together that represent the same factor. 

    rule: Applies if 'bw=TRUE'.  '"aic"' to use Akaike's information
          criterion as a stopping rule (i.e., a factor is deleted if
          the chi-square falls below twice its degrees of freedom), or
          '"p"' to use P-values. 

    type: '"residual"' or '"individual"' - whether the stopping rule
          applies to the residual chi-square for all variables deleted
          or to individual factors. 

     sls: significance level for a factor to be kept in a model, or for
          judging the residual chi-square. 

    aics: cutoff on AIC when 'rule="aic"'. 

      pr: 'TRUE' to print results of each repetition 

     ...: parameters for each specific validate function, and
          parameters to pass to 'predab.resample' (note especially the
          'group', 'cluster', and 'subset' parameters). For 'psm', you
          can pass the 'maxiter' parameter here (passed to
          'survreg.control', default is 15 iterations) as well as a
          'tol' parameter for judging matrix singularity in 'solvet'
          (default is 1e-12) and a 'rel.tolerance' parameter that is
          passed to 'survreg.control' (default is 1e-5). 

_V_a_l_u_e:

     a matrix with rows corresponding to the statistical indexes and
     columns for the original index, resample estimates, indexes
     applied to the whole or omitted sample using the model derived
     from the resample, average optimism, corrected index, and number
     of successful re-samples.

_S_i_d_e _E_f_f_e_c_t_s:

     prints a summary, and optionally statistics for each re-fit

_A_u_t_h_o_r(_s):

     Frank Harrell
      Department of Biostatistics, Vanderbilt University
      f.harrell@vanderbilt.edu

_S_e_e _A_l_s_o:

     'validate.ols', 'validate.cph', 'validate.lrm', 'validate.tree',
     'predab.resample', 'fastbw', 'Design', 'Design.trans', 'calibrate'

_E_x_a_m_p_l_e_s:

     # See examples for validate.cph, validate.lrm, validate.ols
     # Example of validating a parametric survival model:

     n <- 1000
     set.seed(731)
     age <- 50 + 12*rnorm(n)
     label(age) <- "Age"
     sex <- factor(sample(c('Male','Female'), n, TRUE))
     cens <- 15*runif(n)
     h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
     dt <- -log(runif(n))/h
     e <- ifelse(dt <= cens,1,0)
     dt <- pmin(dt, cens)
     units(dt) <- "Year"
     S <- Surv(dt,e)

     f <- psm(S ~ age*sex, x=TRUE, y=TRUE)  # Weibull model
     # Validate full model fit
     validate(f, B=10)                # usually B=150
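
     # A minimal sketch (not part of the original examples): the value
     # is a matrix, so it can be stored and inspected.  The name 'v'
     # is chosen here for illustration only.
     v <- validate(f, B=10)
     dimnames(v)                      # index (row) and column labels

     # 10-fold cross-validation instead of the bootstrap; for this
     # method B is the number of groups of omitted observations
     validate(f, method="crossvalidation", B=10)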

     # Validate stepwise model with typical (not so good) stopping rule
     # bw=TRUE does not preserve hierarchy of terms at present
     validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")
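
     # Sketch of step-down with the AIC stopping rule instead of
     # P-values (again not from the original examples)
     validate(f, B=10, bw=TRUE, rule="aic", aics=0, type="individual")

     # Arithmetic behind rule="aic", using hypothetical numbers and
     # assuming the default cutoff aics=0: a factor with chi-square 3.1
     # on 2 d.f. falls below twice its d.f., so it would be deleted
     chisq <- 3.1
     dof <- 2
     chisq < 2*dof                    # TRUE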

