rda                   package:klaR                   R Documentation

_R_e_g_u_l_a_r_i_z_e_d _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s (_R_D_A)

_D_e_s_c_r_i_p_t_i_o_n:

     Builds a classification rule using regularized group covariance 
     matrices that are supposed to be more robust against 
     multicollinearity in the data.

_U_s_a_g_e:

     rda(x, ...)

     ## Default S3 method:
     rda(x, grouping = NULL, prior = NULL, gamma = NA, 
         lambda = NA, regularization = c(gamma = gamma, lambda = lambda), 
         crossval = TRUE, fold = 10, train.fraction = 0.5, 
         estimate.error = TRUE, output = FALSE, startsimplex = NULL, 
         max.iter = 100, trafo = TRUE, simAnn = FALSE, schedule = 2, 
         T.start = 0.1, halflife = 50, zero.temp = 0.01, alpha = 2, 
         K = 100, ...)
     ## S3 method for class 'formula':
     rda(formula, data, ...)

_A_r_g_u_m_e_n_t_s:

       x: Matrix or data frame containing the explanatory variables 
          (required if 'formula' is not given).

 formula: Formula of the form 'groups ~ x1 + x2 + ...'.

    data: A data frame (or matrix) containing the explanatory 
          variables.

grouping: (Optional) a vector specifying the class for each
          observation; if not specified, the first column of 'data'
          is taken.

   prior: (Optional) prior probabilities for the classes. Default:
          proportional to training sample sizes. 'prior = 1' indicates
          equally likely classes.

gamma, lambda, regularization: One or both of the rda-parameters may be
          fixed manually.  Unspecified parameters are determined by
          minimizing the  estimated error rate (see below).

crossval: Logical. If 'TRUE', the error rate in the optimization step
          is estimated by cross-validation, otherwise by repeatedly
          drawing training and test samples.

    fold: The number of Cross-Validation- or Bootstrap-samples to be
          drawn.

train.fraction: In case of Bootstrapping: the fraction of  the data to
          be used for training in each Bootstrap-sample;  the remainder
          is used to estimate the misclassification rate.

estimate.error: Logical. If 'TRUE', the apparent  error rate for the
          final parameter set is estimated.

  output: Logical flag to indicate whether text output  during
          computation is desired.

startsimplex: (Optional) a starting simplex for the 
          Nelder-Mead-minimization.

max.iter: Maximum number of iterations for Nelder-Mead.

   trafo: Logical; indicates whether minimization is carried out
          using transformed parameters.

  simAnn: Logical; indicates whether Simulated Annealing  shall be
          used.

schedule: Annealing schedule 1 or 2 (exponential or polynomial).

 T.start: Starting temperature for Simulated Annealing.

halflife: Number of iterations until temperature is reduced to a half 
          (schedule 1).

zero.temp: Temperature below which it is set to zero (schedule 1).

   alpha: Power of temperature reduction (linear, quadratic, cubic,...)
           (schedule 2).

       K: Number of iterations until temperature = 0  (schedule 2).

     ...: Further arguments passed to or from other methods.

_D_e_t_a_i_l_s:

     J.H. Friedman (see references below) suggested a method to fix 
     almost singular covariance matrices in discriminant analysis. 
     Basically, individual covariances as in QDA are used, but 
     depending on two parameters (gamma and  lambda), these can be
     shifted towards a  diagonal matrix and/or the pooled covariance 
     matrix. For (gamma=0, lambda=0) it equals QDA,  for (gamma=0,
     lambda=1) it equals LDA.

     You may fix these parameters at certain values or leave it to  the
     function to try to find "optimal" values. If one  parameter is
     given, the other one is determined using the R-function
     'optimize'. If no parameter is given, both are determined
     numerically by a  Nelder-Mead-(Simplex-)algorithm with the option
     of using  Simulated Annealing. The goal function to be minimized
     is the (estimated)  misclassification rate; the misclassification
     rate is estimated  either by Cross-Validation or by repeatedly
     dividing the data into training and test sets (Bootstrapping).
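     For instance, one parameter can be fixed while the other is left
     to the function (a sketch; assumes the klaR package is installed
     and uses the iris data):

```r
library(klaR)
data(iris)

# lambda is fixed manually; gamma is then determined by minimizing
# the cross-validated error rate via optimize()
fit <- rda(Species ~ ., data = iris, lambda = 0.5,
           crossval = TRUE, fold = 10)
fit$regularization  # lambda stays at 0.5, gamma chosen numerically
```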

     Since the Nelder-Mead-algorithm is actually intended for 
     _continuous_ functions while the observed error rate  by its
     nature is _discrete_, a greater number of Bootstrap-samples might
     improve the optimization by increasing  the smoothness of the
     response surface (and, of course, by  reducing variance and bias).
      If a set of parameters leads to singular covariance  matrices, a
     penalty term is added to the misclassification rate  which will
     hopefully help to maneuver back out of singularity (so do not
     worry about error rates greater than one during  optimization).

_V_a_l_u_e:

     A list of class 'rda' containing the following components: 

    call: The (matched) function call.

regularization: Vector containing the two regularization parameters
          (gamma, lambda).

 classes: The names of the classes.

   prior: The prior probabilities for the classes.

error.rate: Apparent error rate (if computation was not suppressed),
          and, if any optimization took place, the final
          (cross-validated or bootstrapped) error rate estimate as 
          well.

   means: Group means.

covariances: Array of group covariances.

covpooled: Pooled covariance.

converged: (Logical) indicator of convergence (only for  Nelder-Mead).

    iter: Number of iterations actually performed (only for 
          Nelder-Mead).
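     The returned components can be inspected directly; a sketch,
     fitting with both parameters fixed so that no optimization takes
     place:

```r
library(klaR)
data(iris)

fit <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)

fit$regularization  # the fixed (gamma, lambda) pair
fit$classes         # class labels found in the data
fit$prior           # prior probabilities (proportional by default)
fit$error.rate      # apparent error rate on the training data
```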

_M_o_r_e _d_e_t_a_i_l_s:

     The explicit definition of gamma and lambda and the resulting
     covariance estimates is as follows:

     The pooled covariance estimate SigmaHat is  given as well as the
     individual covariance estimates  SigmaHat_k for each group.

     First, using lambda, a convex combination of  these two is
     computed:

    SigmaHat_k(lambda) := (1-lambda)*SigmaHat_k + lambda*SigmaHat.

     Then, another convex combination is constructed using the  above
     estimate and a (scaled) identity matrix:

 SigmaHat_k(lambda, gamma) := (1-gamma)*SigmaHat_k(lambda) + gamma * (1/d)*trace(SigmaHat_k(lambda)) * I.

     The factor (1/d)*trace(SigmaHat_k(lambda)) in front of the
     identity matrix I is the mean of the diagonal elements of
     SigmaHat_k(lambda), i.e. the mean variance of all d variables
     under the group covariance SigmaHat_k(lambda).
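     The two convex combinations above can be written out in a few
     lines of base R (a sketch; 'S' below is only an illustrative
     stand-in for the pooled estimate, not klaR's internal one):

```r
# Regularized group covariance as defined above (base R only).
# S_k: covariance of group k; S: pooled covariance; gamma, lambda in [0, 1].
reg_cov <- function(S_k, S, gamma, lambda) {
  d <- nrow(S_k)
  S_lam <- (1 - lambda) * S_k + lambda * S        # SigmaHat_k(lambda)
  (1 - gamma) * S_lam +
    gamma * (sum(diag(S_lam)) / d) * diag(d)      # SigmaHat_k(lambda, gamma)
}

data(iris)
S_k <- cov(iris[iris$Species == "setosa", 1:4])   # one group's covariance
S   <- cov(iris[, 1:4])                           # stand-in pooled estimate

# The extremes recover the special cases:
all.equal(reg_cov(S_k, S, gamma = 0, lambda = 0), S_k)  # individual (QDA)
all.equal(reg_cov(S_k, S, gamma = 0, lambda = 1), S)    # common (LDA)
```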

     For the four extremes of (gamma,lambda)  the covariance structure
     reduces to special cases:

        *  (gamma=0, lambda=0):  QDA - individual covariance for each
           group.

        *  (gamma=0, lambda=1):  LDA - a common covariance matrix.

        *  (gamma=1, lambda=0):  Conditionally independent variables -
           similar to Naive Bayes, but within each group the variable
           variances (main diagonal elements) are equal.

        *  (gamma=1, lambda=1): Classification using Euclidean distance
           - as in the previous case, but variances are the same for
           all groups. Objects are assigned to the group with the
           nearest mean.
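     The LDA extreme can be checked empirically against 'lda' from
     MASS (a sketch; the class predictions should essentially
     coincide, though the two fits are computed independently):

```r
library(klaR)
library(MASS)
data(iris)

fit.rda <- rda(Species ~ ., data = iris, gamma = 0, lambda = 1)
fit.lda <- lda(Species ~ ., data = iris)

# Cross-tabulate the two sets of predicted classes
table(rda = predict(fit.rda, iris)$class,
      lda = predict(fit.lda)$class)
```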

_A_u_t_h_o_r(_s):

     Christian Röver, roever@statistik.uni-dortmund.de

_R_e_f_e_r_e_n_c_e_s:

     Friedman, J.H. (1989): Regularized Discriminant Analysis.
     _Journal of the American Statistical Association_ 84, 165-175.

     Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T.
     (1992):  _Numerical Recipes in C_.  Cambridge: Cambridge
     University Press.

_S_e_e _A_l_s_o:

     'predict.rda', 'lda', 'qda'

_E_x_a_m_p_l_e_s:

     data(iris)
     x <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
     predict(x, iris)

