bj                  package:Design                  R Documentation

_B_u_c_k_l_e_y-_J_a_m_e_s _M_u_l_t_i_p_l_e _R_e_g_r_e_s_s_i_o_n _M_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     'bj' fits the Buckley-James distribution-free least squares
     multiple regression model to a possibly right-censored response
     variable.   This model reduces to ordinary least squares if there
     is no censoring.  By default, model fitting is done after taking
     logs of the response variable. 'bj' uses the 'Design' class for
     automatic 'anova', 'fastbw', 'validate', 'Function', 'nomogram',
     'summary', 'plot', 'bootcov', and other functions.  The 'bootcov'
     function may be worth using with 'bj' fits, as the properties of
     the Buckley-James covariance matrix estimator are not fully known
     for strange censoring patterns.

     The 'residuals.bj' function exists mainly to compute  residuals
     and to censor them (i.e., return them as 'Surv' objects) just as
     the original failure time variable was censored.  These residuals
     are useful for checking to see if the model also satisfies certain
     distributional assumptions. To get these residuals, the fit must
     have specified 'y=TRUE'.

     The 'bjplot' function is a special plotting function for objects
     created by 'bj' with 'x=TRUE, y=TRUE' in effect.  It produces
     three scatterplots for every covariate in the model: the first
     plots the original situation, where censored data are distingushed
     from non-censored data by a different plotting symbol. In the
     second plot, called a renovated plot, vertical lines show how
     censored data were changed by the procedure, and the third is
     equal to the second, but without vertical lines.  Imputed data are
     again distinguished from the non-censored by a different symbol.

     The 'validate' method for 'bj' validates the Somers' 'Dxy' rank
     correlation between predicted and observed responses, accounting
     for censoring.

     The primary fitting function for 'bj' is 'bj.fit', which does not
     allow missing data and expects a full design matrix as input.

_U_s_a_g_e:

     bj(formula=formula(data), data, subset, na.action=na.delete,
        link="log", control, method='fit', x=FALSE, y=FALSE, 
        time.inc)

     ## S3 method for class 'bj':
     print(x, digits=4, long=FALSE, ...)

     ## S3 method for class 'bj':
     predict{...}

     ## S3 method for class 'bj':
     residuals(object, type=c("censored","censored.normalized"),...)

     bjplot(fit, which=1:dim(X)[[2]])

     ## S3 method for class 'bj':
     validate(fit, method="boot", B=40,
              bw=FALSE,rule="aic",type="residual",sls=.05,aics=0,pr=FALSE,
                      dxy=TRUE, tol=1e-7, rel.tolerance=1e-3, maxiter=15, ...)

     bj.fit(x, y, control)

_A_r_g_u_m_e_n_t_s:

 formula: an S statistical model formula. Interactions up to third
          order are supported. The left hand side must be a 'Surv'
          object. 

    data: 

  subset: 

na.action: the usual statistical model fitting arguments

     fit: a fit created by 'bj', required for all functions except
          'bj'. 

       x: a design matrix with or without a first column of ones, to
          pass to 'bj.fit'.  All models will have an intercept.  For
          'print.bj' is a result of 'bj'.  For 'bj', set 'x=TRUE' to
          include the design matrix in the fit object.  

       y: a 'Surv' object to pass to 'bj.fit' as the two-column
          response variable.  Only right censoring is allowed, and
          there need not be any censoring.  For 'bj', set 'y' to 'TRUE'
          to include the two-column response matrix, with the 
          event/censoring indicator in the second column.  The first
          column will be transformed according to 'link', and depending
          on 'na.action', rows with missing data in the predictors or
          the response will be deleted. 

    link: set to, for example, '"log"' (the default) to model the log
          of the response, or '"identity"' to model the untransformed
          response. 

 control: a list containing any or all of the following components:
          'iter.max' (maximum number of iterations allowed, default is
          20), 'eps' (convergence criterion: concergence is assumed
          when the ratio of sum of squared errors from one iteration to
          the next is between 1-'eps' and 1+'eps'), 'trace' (set to
          'TRUE' to monitor iterations),  'tol' (matrix singularity
          criterion, default is 1e-7), and 'max.cycle'  (in case of
          nonconvergence the program looks for a cycle that repeats
          itself,  default is 30).   

  method: set to '"model.frame"' or '"model.matrix"' to return one of
          those objects rather than the model fit. 

     dxy: set to 'FALSE' to prevent Somers' D_{xy} from being computed
          by 'validate' (saves time for very large datasets)

time.inc: setting for default time spacing. Default is 30 if time
          variable has 'units="Day"', 1 otherwise, unless maximum
          follow-up time < 1. Then max time/10 is used as 'time.inc'.
          If 'time.inc' is not given and max time/default 'time.inc' is
          > 25, 'time.inc' is increased. 

  digits: number of significant digits to print if not 4. 

    long: set to 'TRUE' to print the correlation matrix for parameter
          estimates 

  object: the result of 'bj'

    type: type of residual desired.  Default is censored unnormalized
          residuals, defined as link(Y) - linear.predictors, where the
          link function was usually the log function.  You can specify
          'type="censored.normalized"' to divide the residuals by the
          estimate of 'sigma'. 

   which: vector of integers or character strings naming elements of
          the design matrix (the names of the original predictors if
          they entered the model linearly) for which to have 'bjplot'
          make plots of only the variables listed in 'which' (names or
          numbers). 

       B: 

      bw: 

    rule: 

     sls: 

    aics: 

      pr: 

     tol: 

rel.tolerance: 

 maxiter: see 'predab.resample'

     ...: ignored for 'print'; passed through to 'predab.resample' for
          'validate'

_D_e_t_a_i_l_s:

     The program implements the algorithm as described in the original
     article by Buckley & James. Also, we have used the original
     Buckley & James prescription for computing variance/covariance
     estimator.  This is based on non-censored observations only and
     does not have any theoretical justification, but has been shown in
     simulation studies to behave well. Our experience confirms this
     view.  Convergence is rather slow with this method, so you may
     want to increase the number of iterations.  Our experience shows
     that often, in particular with high censoring, 100 iterations is
     not too many. Sometimes the method will not converge, but will
     instead enter a loop of repeating values (this is due to the
     discrete nature  of Kaplan and Meier estimator and usually happens
     with small sample sizes). The program will look for such a loop
     and return the average betas. It will also  issue a warning
     message and give the size of the cycle (usually less than 6).

_V_a_l_u_e:

     'bj' returns a fit object with similar information to what
     'survreg', 'psm', 'cph' would store as  well as what 'Design'
     stores and 'units' and 'time.inc'. 'residuals.bj' returns a 'Surv'
     object.  One of the components of the 'fit' object produced by
     'bj' (and 'bj.fit') is a vector called 'stats' which contains the
     following names elements:  '"Obs", "Events", "d.f.","error
     d.f.","sigma"'.  Here 'sigma' is the estimate of the residual
     standard deviation.

_A_u_t_h_o_r(_s):

     Janez Stare
      Department of Biomedical Informatics
      Ljubljana University
      Ljubljana, Slovenia
      janez.stare@mf.uni-lj.si

     Harald Heinzl
      Department of Medical Computer Sciences
      Vienna University
      Vienna, Austria
      harald.heinzl@akh-wien.ac.at

     Frank Harrell
      Department of Biostatistics
      Vanderbilt University
      f.harrell@vanderbilt.edu

_R_e_f_e_r_e_n_c_e_s:

     Buckley JJ, James IR. Linear regression with censored data.
     Biometrika 1979;  66:429-36.

     Miller RG, Halpern J. Regression with censored data. Biometrika
     1982; 69:  521-31.

     James IR, Smith PJ. Consistency results for linear regression with
     censored  data. Ann Statist 1984; 12: 590-600.

     Lai TL, Ying Z. Large sample theory of a modified Buckley-James
     estimator for  regression analysis  with censored data. Ann
     Statist 1991; 19: 1370-402.

     Hillis SL. Residual plots for the censored data linear regression
     model.  Stat in Med 1995; 14: 2023-2036.

_S_e_e _A_l_s_o:

     'Design', 'psm', 'survreg', 'cph', 'Surv', 'na.delete',
     'na.detail.response', 'datadist', 'rcorr.cens'.

_E_x_a_m_p_l_e_s:

     set.seed(1)
     ftime  <- 10*rexp(200)
     stroke <- ifelse(ftime > 10, 0, 1)
     ftime  <- pmin(ftime, 10)
     units(ftime) <- "Month"
     age <- rnorm(200, 70, 10)
     hospital <- factor(sample(c('a','b'),200,TRUE))
     dd <- datadist(age, hospital)
     options(datadist="dd")

     f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, x=TRUE, y=TRUE)
     # add link="identity" to use a censored normal regression model instead
     # of a lognormal one
     anova(f)
     fastbw(f)
     validate(f, B=15)
     plot(f, age=NA, hospital=NA)  # needs datadist since no explicit age,hosp.
     coef(f)               # look at regression coefficients
     coef(psm(Surv(ftime, stroke) ~ rcs(age,5) + hospital, dist='lognormal'))
                           # compare with coefficients from likelihood-based
                           # log-normal regression model
                           # use dist='gau' not under R 

     r <- resid(f, 'censored.normalized')
     survplot(survfit(r), conf='none') 
                           # plot Kaplan-Meier estimate of 
                           # survival function of standardized residuals
     survplot(survfit(r ~ cut2(age, g=2)), conf='none')  
                           # may desire both strata to be n(0,1)
     options(datadist=NULL)

