logistf               package:logistf               R Documentation

_B_i_a_s-_r_e_d_u_c_e_d _l_o_g_i_s_t_i_c _r_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Implements Firth's penalized-likelihood logistic regression.

_U_s_a_g_e:

     logistf(formula=attr(data, "formula"), data=sys.parent(),
       pl = TRUE, alpha = 0.05, maxit = 25, maxhs=5, epsilon = .0001,
       maxstep = 10, firth=TRUE, beta0)

_A_r_g_u_m_e_n_t_s:

 formula: a formula object, with the response on the left of a '~'
          operator, and the model terms on the right. The response must
          be a vector with 0 and 1 or FALSE and TRUE for the model
          outcome, where the higher value (1 or TRUE) is modeled. It is
          possible to include contrasts, interactions, nested effects,
          cubic or polynomial splines and all the S-PLUS features as
          well, e.g. 'Y ~ X1*X2 + ns(X3, df=4)'.

    data: a data.frame where the variables named in the formula can be
          found, i.e. the variables containing the binary response and
          the covariates.

      pl: specifies if confidence intervals and tests should be based
          on the profile penalized log likelihood (pl=TRUE, the
          default) or on the Wald method (pl=FALSE).

   alpha: the significance level (1-alpha the confidence level, 0.05 as
          default).

   maxit: maximum number of iterations (default value is 25)

   maxhs: maximum number of step-halvings per iteration (default value
          is 5)

 epsilon: specifies the maximum allowed change in penalized log
          likelihood to declare convergence. Default value is 0.0001.

 maxstep: specifies the maximum change of (standardized) parameter
          values allowed in one iteration. Default value is 10.

   firth: use of Firth's penalized maximum likelihood (firth=TRUE,
          default) or the standard maximum likelihood method
          (firth=FALSE) for the logistic regression. Note that by
          specifying pl=TRUE and firth=FALSE (and probably a lower
          number of iterations)  one obtains profile likelihood
          confidence intervals for maximum likelihood logistic
          regression parameters.

   beta0: specifies the initial values of the coefficients for the
          fitting algorithm.

_D_e_t_a_i_l_s:

     The package logistf provides a comprehensive tool to facilitate
     the application of Firth's modified score procedure in logistic
     regression analysis. It was written on a PC with S-PLUS 4.0 but
     also runs on R and on newer versions of S, as well as under other
     operating systems such as UNIX. The library is available at the
     web-site <URL:
     http://www.akh-wien.ac.at/imc/biometrie/programme/fl/index.html>.

     The call of the main function of the library follows the structure
     of standard functions such as lm or glm, requiring a data.frame
     and a formula for the model specification.  The resulting object
     belongs to the new class logistf, which includes penalized maximum
     likelihood (`Firth-Logistic'- or `FL'-type) logistic regression
     parameters, standard errors, confidence limits, p-values, the
     value of the maximized penalized log likelihood, the linear
     predictors, the number of iterations needed to arrive at the
     maximum and much more.  Furthermore, specific methods for the
     resulting object are supplied. Additionally, a function to plot
     profiles of the penalized likelihood function and a function to
     perform penalized likelihood ratio tests have been included.

     In explaining the details of the estimation process we follow
     mainly the description in Heinze & Ploner (2003). In general,
     maximum likelihood estimates are often prone to small sample bias.
     To reduce this bias, Firth (1993) suggested maximizing the
     penalized log likelihood log L(beta)^* = log L(beta) + 0.5 log
     |I(beta)|, where I(beta) is the Fisher information matrix, i.e.
     minus the second derivative of the log likelihood. Applying this
     idea to logistic regression, the score function U(beta) is
     replaced by the modified score function U(beta)^* = U(beta) + a,
     where a has rth entry a_r = 0.5 tr{I(beta)^{-1}
     [dI(beta)/dbeta_r]}, r = 1,...,k. Heinze and Schemper (2002)
     give the explicit formulae for I(beta) and dI(beta)/dbeta_r.
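
     As an illustration of the penalized log likelihood defined above,
     the following base R sketch evaluates log L(beta)^* for a logistic
     model. It is not the package's internal code: the helper name
     pen_loglik and the toy data are hypothetical, and I(beta) is taken
     to be X'WX with W = diag(pi_i (1 - pi_i)), the usual logistic
     regression form.

```r
# Penalized log likelihood log L(beta)^* = log L(beta) + 0.5 log|I(beta)|
# for a logistic model; X is the design matrix including the intercept column.
# pen_loglik is a hypothetical helper name, not a logistf function.
pen_loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  # Bernoulli log likelihood, using plogis(log.p = TRUE) for stability
  loglik <- sum(y * plogis(eta, log.p = TRUE) +
                (1 - y) * plogis(-eta, log.p = TRUE))
  # Fisher information I(beta) = X' W X, with W = diag(p * (1 - p))
  info <- crossprod(X, X * dlogis(eta))
  log_det <- as.numeric(determinant(info, logarithm = TRUE)$modulus)
  loglik + 0.5 * log_det
}

# Hypothetical toy data with complete separation (y = 1 exactly when x > 3)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)
X <- cbind(1, x)

# Scaling the coefficients up improves the ordinary likelihood under
# separation, but the penalty term 0.5 log|I(beta)| pulls the value down.
pl_small <- pen_loglik(c(-7, 2), X, y)
pl_large <- pen_loglik(c(-70, 20), X, y)
```

     On these separated data the penalized log likelihood is larger at
     the moderate coefficient vector than at the tenfold one, which is
     why the penalized estimate stays finite.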

     In our programs estimation of beta is based on a Newton-Raphson
     algorithm. Parameter values are initialized usually with 0, but in
     general the user can specify arbitrary starting values.

     With a starting value of beta^{(0)}, the penalized maximum
     likelihood estimate beta is obtained iteratively:


    beta^{(s+1)}= beta^{(s)} + I(beta^{(s)})^{-1} U(beta^{(s)})^*


     If the penalized log likelihood evaluated at beta^{(s+1)} is less
     than that evaluated at beta^{(s)}, then beta^{(s+1)} is
     recomputed by step-halving. For each entry r of beta with r =
     1,...,k the absolute step size |beta_r^{(s+1)} - beta_r^{(s)}| is
     restricted to a maximal allowed value zeta. These two measures
     help to avoid numerical problems during estimation. The iterative
     process is continued until the parameter estimates converge.
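
     The iteration described above can be sketched in base R as
     follows. This is a minimal illustration, not the code used by
     logistf: the function name firth_fit is hypothetical, the modified
     score is written in the form U(beta)^*_r = sum_i (y_i - pi_i +
     h_i (0.5 - pi_i)) x_ir with h_i the diagonal of the hat matrix,
     the step limit is applied by rescaling the whole step, and
     convergence is judged by the change in penalized log likelihood
     (as the epsilon argument suggests). The argument names mirror the
     usage section only for readability.

```r
# Sketch of Newton-Raphson with modified score, step limit and step-halving.
# firth_fit is a hypothetical name; defaults mirror the Usage section.
firth_fit <- function(X, y, beta0 = rep(0, ncol(X)), maxit = 25, maxhs = 5,
                      maxstep = 10, epsilon = 1e-4) {
  pen_loglik <- function(beta) {
    eta <- drop(X %*% beta)
    info <- crossprod(X, X * dlogis(eta))
    sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE)) +
      0.5 * as.numeric(determinant(info, logarithm = TRUE)$modulus)
  }
  beta <- beta0
  ll <- pen_loglik(beta)
  for (iter in seq_len(maxit)) {
    p <- plogis(drop(X %*% beta))
    w <- p * (1 - p)
    info_inv <- solve(crossprod(X, X * w))        # I(beta)^{-1}
    h <- w * rowSums((X %*% info_inv) * X)        # hat matrix diagonal
    u_star <- drop(crossprod(X, y - p + h * (0.5 - p)))  # modified score
    step <- drop(info_inv %*% u_star)
    # Limit the largest absolute change of any coefficient to maxstep
    biggest <- max(abs(step))
    if (biggest > maxstep) step <- step * maxstep / biggest
    beta_new <- beta + step
    ll_new <- pen_loglik(beta_new)
    # Step-halving while the penalized log likelihood has decreased
    hs <- 0
    while (ll_new < ll && hs < maxhs) {
      step <- step / 2
      beta_new <- beta + step
      ll_new <- pen_loglik(beta_new)
      hs <- hs + 1
    }
    converged <- abs(ll_new - ll) < epsilon
    beta <- beta_new
    ll <- ll_new
    if (converged) break
  }
  list(coefficients = beta, loglik = ll, iter = iter)
}

# Hypothetical toy data with complete separation
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)
fit <- firth_fit(cbind(1, x), y)
fit$coefficients
```

     Ordinary maximum likelihood has no finite solution for these
     separated data, whereas the sketch returns finite coefficients.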

     Computation of profile penalized likelihood confidence intervals
     for parameters ('logistpl') follows the algorithm of Venzon and
     Moolgavkar (1988). For testing the hypothesis of gamma = gamma_0,
     let the likelihood ratio statistic be


   LR = 2 [ log L(gamma, delta)^* - log L(gamma_0, delta_{gamma_0})^* ],


     where (gamma, delta) in the first term is the joint penalized
     maximum likelihood estimate of beta = (gamma, delta), and
     delta_{gamma_0} is the penalized maximum likelihood estimate of
     delta when gamma = gamma_0. The profile penalized likelihood
     confidence interval is the continuous set of values gamma_0 for
     which LR does not exceed the (1 - alpha)100th percentile of the
     chi^2_1-distribution. The confidence limits can therefore be
     found iteratively by approximating the penalized log likelihood
     function in a neighborhood of beta by the quadratic function


       l(beta+delta) = l(beta) + delta'U^* - 0.5 delta' I delta


     where U^* = U(beta)^* and -I = -I(beta).
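
     To make the definition of the interval concrete, the sketch below
     finds the two values gamma_0 at which LR equals the chi^2_1
     critical value. It deliberately does not implement the Venzon and
     Moolgavkar algorithm: it substitutes generic optimization
     (optim) and root finding (uniroot) from base R, which is slower
     but easier to read. The names pen_loglik, profile_pl and
     profile_ci, the search width and the toy data are all
     hypothetical.

```r
# Penalized log likelihood log L(beta)^* = log L(beta) + 0.5 log|I(beta)|
pen_loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  info <- crossprod(X, X * dlogis(eta))           # I(beta) = X' W X
  sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE)) +
    0.5 * as.numeric(determinant(info, logarithm = TRUE)$modulus)
}

# Maximized penalized log likelihood with coefficient j held fixed at g
profile_pl <- function(g, j, X, y) {
  neg_pl <- function(delta) {
    beta <- numeric(ncol(X))
    beta[j] <- g
    beta[-j] <- delta
    -pen_loglik(beta, X, y)
  }
  -optim(rep(0, ncol(X) - 1), neg_pl, method = "BFGS")$value
}

# Interval for coefficient j: all g with LR(g) <= chi-square critical value.
# 'width' is an assumed search range on each side of the estimate.
profile_ci <- function(j, X, y, alpha = 0.05, width = 5) {
  full <- optim(rep(0, ncol(X)), function(b) -pen_loglik(b, X, y),
                method = "BFGS")
  est <- full$par[j]
  crit <- qchisq(1 - alpha, df = 1)
  # LR(g) minus the critical value; its two roots are the limits
  f <- function(g) 2 * (-full$value - profile_pl(g, j, X, y)) - crit
  c(lower = uniroot(f, c(est - width, est))$root,
    estimate = est,
    upper = uniroot(f, c(est, est + width))$root)
}

# Hypothetical toy data without separation
x <- 1:8
y <- c(0, 0, 1, 0, 1, 0, 1, 1)
ci <- profile_ci(2, cbind(1, x), y)
ci
```

     The resulting limits need not be symmetric about the estimate,
     which is the practical difference from a Wald interval.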

     In some situations computation of profile penalized likelihood
     confidence intervals may be time consuming since the iterative
     procedure outlined above has to be repeated for the lower and for
     the upper confidence limits of each of the k parameters. In other
     problems one may not be interested in interval estimation, anyway.
     In such cases, the user can request computation of Wald confidence
     intervals and P-values, which are based on the normal
     approximation of the parameter estimates and do not need any
     iterative estimation process. Standard errors sigma_r, r =
     1,...,k, of the parameter estimates are computed as the square
     roots of the diagonal elements of the variance matrix V(beta) =
     I(beta)^{-1}. A 100(1 - alpha) per cent Wald confidence interval
     for parameter beta_r is then defined as [beta_r +
     Psi_{alpha/2} sigma_r, beta_r + Psi_{1-alpha/2} sigma_r], where
     Psi_{alpha} denotes the alpha-quantile of the standard normal
     distribution function. The adequacy of Wald confidence intervals
     for parameter estimates should be verified by plotting the
     profile penalized log likelihood (PPL) function. A symmetric
     shape of the PPL function allows use of Wald intervals, while an
     asymmetric shape demands profile penalized likelihood intervals
     (Heinze & Schemper, 2002).
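
     The Wald computation described above amounts to a few lines of
     base R. In this sketch the estimates and the variance matrix are
     made-up numbers chosen only to show the arithmetic; they are not
     results from any data set, and the two-sided p-value formula is
     the standard normal-approximation form.

```r
# Hypothetical estimates and variance matrix V(beta) = I(beta)^{-1}
beta_hat <- c(0.12, 1.80)
V <- matrix(c(0.25, -0.05,
              -0.05, 0.49), nrow = 2, byrow = TRUE)
alpha <- 0.05

# Standard errors: square roots of the diagonal elements of V
se <- sqrt(diag(V))

# Wald limits: beta_r + Psi_{alpha/2} sigma_r and beta_r + Psi_{1-alpha/2} sigma_r
lower <- beta_hat + qnorm(alpha / 2) * se
upper <- beta_hat + qnorm(1 - alpha / 2) * se

# Two-sided Wald p-values from the normal approximation
p_value <- 2 * pnorm(-abs(beta_hat / se))

cbind(estimate = beta_hat, se = se, lower = lower, upper = upper,
      p = p_value)
```

     Because qnorm(alpha / 2) is the negative of qnorm(1 - alpha / 2),
     these intervals are always symmetric about the estimate.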

_R_e_f_e_r_e_n_c_e_s:

     Firth D (1993). Bias reduction of maximum likelihood estimates.
     _Biometrika_ 80: 27-38.

     Heinze G (1999). Technical Report 10: The application of Firth's
     procedure to Cox and logistic regression. Department of Medical
     Computer Sciences, Section of Clinical Biometrics, Vienna
     University, Vienna.

     Heinze G, Schemper M (2002). A solution to the problem of 
     separation in logistic regression. _Statistics in Medicine_ 21:
     2409-2419.

     Heinze G, Ploner M (2003). Fixing the nonconvergence bug in 
     logistic regression with SPLUS and SAS. _Computer Methods and 
     Programs in Biomedicine_ 71: 181-187.

     Ploner M (2001). Technical Report 2/2001: An SPLUS library to
     perform logistic regression without convergence problems. Section
     of Clinical Biometrics, Department of Medical Computer Sciences,
     University of Vienna, Vienna.

     Venzon DJ, Moolgavkar SH (1988). A method for computing
     profile-likelihood based confidence intervals. _Applied
     Statistics_ 37: 87-94.

_E_x_a_m_p_l_e_s:

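     # Load the example data set used for the fit below
     data(sex2)
     # Fit the model with the default settings (firth=TRUE, pl=TRUE)
     fit <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)
     summary(fit)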

