pentrace               package:Design               R Documentation

_T_r_a_c_e _A_I_C _a_n_d _B_I_C _v_s. _P_e_n_a_l_t_y

_D_e_s_c_r_i_p_t_i_o_n:

     For an ordinary unpenalized fit from 'lrm' or 'ols' and for a
     vector or list of penalties,  fits a series of logistic or linear
     models using penalized maximum likelihood estimation, and saves
     the effective degrees of freedom, Akaike Information Criterion
     (AIC), Schwarz Bayesian Information Criterion (BIC), and Hurvich
     and Tsai's corrected AIC (AIC_c).  Optionally 'pentrace' can  use
     the 'nlminb' function to solve for the optimum penalty factor or
     combination of factors penalizing different kinds of terms in the
     model. The 'effective.df' function prints the original and
     effective degrees of freedom for a penalized fit or for an
     unpenalized fit and the best penalization determined from a
     previous invocation of 'pentrace' if 'method="grid"' (the
     default). The effective d.f. is computed separately for each class
     of terms in the model (e.g., interaction, nonlinear). A 'plot'
     method exists to plot the results, and a 'print' method exists to
     print the most pertinent components.  Both AIC and BIC may be
     plotted if  there is only one penalty factor type specified in
     'penalty'.  Otherwise, the first two types of penalty factors are
     plotted, showing only the AIC.

_U_s_a_g_e:

     pentrace(fit, penalty, penalty.matrix, 
              method=c('grid','optimize'),
              which=c('aic.c','aic','bic'), target.df,
              fitter, pr=FALSE, tol=1e-7,
              keep.coef=FALSE, complex.more=TRUE, verbose=FALSE, maxit=12, subset)

     effective.df(fit, object)

     ## S3 method for class 'pentrace':
     print(x, ...)

     ## S3 method for class 'pentrace':
     plot(x, method=c('points','image'), 
          which=c('effective.df','aic','aic.c','bic'), pch=2, add=FALSE, 
          ylim, ...)

_A_r_g_u_m_e_n_t_s:

     fit: a result from 'lrm' or 'ols' with 'x=TRUE, y=TRUE' and
          without using 'penalty' or 'penalty.matrix' (or optionally
          using penalization in the case of 'effective.df') 

 penalty: can be a vector or a list.  If it is a vector, all types of
          terms in the model will be penalized by the same amount,
          specified by elements in 'penalty', with a penalty of zero
          automatically added.  'penalty' can also be a list in the
          format documented in the 'lrm' function, except that elements
          of the list can be vectors.  The 'expand.grid' function is
          invoked by 'pentrace' to generate all possible combinations
          of penalties.  For example, specifying 
          'penalty=list(simple=1:2, nonlinear=1:3)' will generate 6
          combinations to try, so that the analyst can attempt to
          determine whether penalizing more complex terms in the model
          more than the linear or categorical variable terms will be
          beneficial.  If 'complex.more=TRUE', it is assumed that the
          variables given in 'penalty' are listed in order from less
          complex to more complex.  With 'method="optimize"' 'penalty'
          specifies an initial guess for the penalty or penalties.  If
          all term types are to be equally penalized, 'penalty' should
          be a single number, otherwise it should be a list containing
          single numbers as elements, e.g., 'penalty=list(simple=1,
          nonlinear=2)'.  Experience has shown that the optimization
          algorithm is more likely to find a reasonable solution when
          the starting value specified in 'penalty' is too large rather
          than too small. 

  object: an object returned by 'pentrace'.  For 'effective.df',
          'object' can be omitted if the 'fit' was penalized. 

penalty.matrix: see 'lrm' 

  method: The default is 'method="grid"' to print various indexes for
          all combinations of penalty parameters given by the user. 
          Specify 'method="optimize"' to have 'pentrace' use 'nlminb'
          to solve for the combination of penalty parameters that gives
          the maximum value of the objective named in 'which', or, if
          'target.df' is given, to find the combination that yields
          'target.df' effective total degrees of freedom for the model.
           When 'target.df' is specified, 'method' is set to
          '"optimize"' automatically. For 'plot.pentrace' this
          parameter applies only if more than one penalty term-type was
          used.  The default is to use open triangles whose sizes are
          proportional to the ranks of the AICs, plotting the first two
          penalty factors respectively on the x and y  axes.  Use
          'method="image"' to plot an image plot.  

   which: the objective to maximize for either 'method'.  Default is
          '"aic.c"' (corrected AIC). For 'plot.pentrace', 'which' is a
          vector of names of criteria to show; default is to plot all 4
          types, with effective d.f. in its own separate plot 

target.df: applies only to 'method="optimize"'.  See 'method'. 
          'target.df' makes sense mainly when a single type of penalty
          factor is specified. 

  fitter: a fitting function.  Default is 'lrm.fit' ('lm.pfit' is
          always used for 'ols'). 

      pr: set to 'TRUE' to print intermediate results 

     tol: tolerance for declaring a matrix singular (see 'lrm.fit,
          solvet') 

keep.coef: set to 'TRUE' to store matrix of regression  coefficients
          for all the fits (corresponding to increasing values of
          'penalty') in object 'Coefficients' in the returned list. 
          Rows correspond to penalties, columns to regression
          parameters. 

complex.more: By default if 'penalty' is a list, combinations of
          penalties for which complex terms are penalized less than
          less complex terms will be dropped after 'expand.grid' is
          invoked.  Set 'complex.more=FALSE' to allow more complex
          terms to be penalized less.  Currently this option is ignored
          for 'method="optimize"'. 

 verbose: set to 'TRUE' to print number of intercepts and sum of
          effective degrees of freedom

   maxit: maximum number of iterations to allow in a model fit
          (default=12). This is passed to the appropriate fitter
          function with the correct argument name.  Increase 'maxit' if
          you had to when fitting the original unpenalized model. 

  subset: a logical or integer vector specifying rows of the design and
          response matrices to subset in fitting models.  This is most
          useful for bootstrapping 'pentrace' to see if the best
          penalty can be estimated with little error so that variation
          due to selecting the optimal penalty can be safely ignored
          when bootstrapping standard errors of regression coefficients
          and measures of predictive accuracy.  See an example below. 

       x: a result from 'pentrace'

     pch: used for 'method="points"'

     add: set to 'TRUE' to add to an existing plot.  In that case, the
          effective d.f. plot is not re-drawn, but the AIC/BIC plot is
          added to. 

    ylim: 2-vector of y-axis limits for plots other than effective d.f. 

     ...: other arguments passed to 'plot', 'lines', or 'image' 

_V_a_l_u_e:

     a list of class '"pentrace"' with elements 'penalty, df,
     objective, fit, var.adj, diag, results.all', and optionally
     'Coefficients'. The first 6 elements correspond to the fit that
     had the best objective as named in the 'which' argument, from the
     sequence of fits tried. Here 'fit' is the fit object from 'fitter'
     which was a penalized fit, 'diag' is the diagonal of the matrix
     used to compute the effective d.f., and 'var.adj' is Gray (1992)
     Equation 2.9, which is an improved covariance matrix for the
     penalized beta. 'results.all' is a data frame whose first few
     variables are the components of 'penalty' and whose other columns
     are 'df, aic, bic, aic.c'.  'results.all' thus contains a summary
     of results for all fits attempted.  When 'method="optimize"', only
     two components are returned: 'penalty' and 'objective', and the
     object does not have a class.

_A_u_t_h_o_r(_s):

     Frank Harrell
      Department of Biostatistics
      Vanderbilt University
      f.harrell@vanderbilt.edu

_R_e_f_e_r_e_n_c_e_s:

     Gray RJ: Flexible methods for analyzing survival data using
     splines, with applications to breast cancer prognosis.  JASA
     87:942-951, 1992.

     Hurvich CM, Tsai, CL: Regression and time series model selection
     in small samples.  Biometrika 76:297-307, 1989.

_S_e_e _A_l_s_o:

     'lrm', 'ols', 'solvet', 'Design.Misc', 'image'

_E_x_a_m_p_l_e_s:

     n <- 1000    # define sample size
     set.seed(17) # so can reproduce the results
     age            <- rnorm(n, 50, 10)
     blood.pressure <- rnorm(n, 120, 15)
     cholesterol    <- rnorm(n, 200, 25)
     sex            <- factor(sample(c('female','male'), n,TRUE))
     # Specify population model for log odds that Y=1
     L <- .4*(sex=='male') + .045*(age-50) +
       (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
     # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
     y <- ifelse(runif(n) < plogis(L), 1, 0)

     f <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)),
              x=TRUE, y=TRUE)
     p <- pentrace(f, seq(.2,1,by=.05))
     plot(p)
     p$diag      # may learn something about fractional effective d.f. 
                 # for each original parameter
     pentrace(f, list(simple=c(0,.2,.4), nonlinear=c(0,.2,.4,.8,1)))

     # Bootstrap pentrace 5 times, making a plot of corrected AIC plot with 5 reps
     n <- nrow(f$x)
     plot(pentrace(f, seq(.2,1,by=.05)), which='aic.c', 
          col=1, ylim=c(30,120)) #original in black
     for(j in 1:5)
       plot(pentrace(f, seq(.2,1,by=.05), subset=sample(n,n,TRUE)), 
            which='aic.c', col=j+1, add=TRUE)

     # Find penalty giving optimum corrected AIC.  Initial guess is 1.0
     if(!.R.) pentrace(f, 1, method='optimize')

     # Find penalty reducing total regression d.f. effectively to 5
     if(!.R.) pentrace(f, 1, target.df=5)

     # Re-fit with penalty giving best aic.c without differential penalization
     f <- update(f, penalty=p$penalty)
     effective.df(f)

