friedman.data              package:klaR              R Documentation

_F_r_i_e_d_m_a_n'_s _c_l_a_s_s_i_f_i_c_a_t_i_o_n _b_e_n_c_h_m_a_r_k _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     Function to generate 3-class classification benchmark data as
     introduced by J.H. Friedman (1989).

_U_s_a_g_e:

     friedman.data(setting = 1, p = 6, samplesize = 40, asmatrix = FALSE)

_A_r_g_u_m_e_n_t_s:

 setting: the problem setting (integer 1,2,...,6).

       p: number of variables (6, 10, 20 or 40).

samplesize: sample size (number of observations, >=6).

asmatrix: if 'TRUE', results are returned as a matrix, otherwise as a
          data frame (default).

_D_e_t_a_i_l_s:

     When J.H. Friedman introduced Regularized Discriminant Analysis
     ('rda') in 1989, he used artificially generated data to test the
     procedure and to compare its performance with that of Linear and
     Quadratic Discriminant Analysis (see also 'lda' and 'qda').

     6 different settings were considered to demonstrate potential
     strengths and weaknesses of the new method:

        1.  equal spherical covariance matrices,

        2.  unequal spherical covariance matrices,

        3.  equal, highly ellipsoidal covariance matrices with mean
           differences in low-variance subspace,

        4.  equal, highly ellipsoidal covariance matrices with mean
           differences in high-variance subspace,

        5.  unequal, highly ellipsoidal covariance matrices with zero
           mean differences, and

        6.  unequal, highly ellipsoidal covariance matrices with
           nonzero mean differences.
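
     The contrast between spherical and highly ellipsoidal covariance
     matrices in the list above can be illustrated with a small base R
     sketch. The variances (1 to 100) and the zero means are
     assumptions chosen for illustration; they are not the values used
     by Friedman (1989) or by 'friedman.data', and 'draw_diag_normal'
     and 'var_ratio' are hypothetical helper names.

```r
# Illustrative sketch only: variances and means below are assumptions,
# not the values used by Friedman (1989) or by friedman.data().
set.seed(1)
p <- 6
n <- 200

# Spherical covariance: identity matrix, all variances equal.
sigma_spherical <- diag(p)

# Highly ellipsoidal covariance: diagonal matrix with widely spread
# variances (assumed range 1 to 100).
sigma_ellipsoidal <- diag(seq(1, 100, length.out = p))

# Draw n observations from N(mu, sigma); valid for diagonal sigma only,
# because sqrt() is applied elementwise.
draw_diag_normal <- function(n, mu, sigma) {
  p <- length(mu)
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  sweep(z %*% sqrt(sigma), 2, mu, "+")
}

x_sph <- draw_diag_normal(n, mu = rep(0, p), sigma = sigma_spherical)
x_ell <- draw_diag_normal(n, mu = rep(0, p), sigma = sigma_ellipsoidal)

# Ratio of largest to smallest sample variance across the variables.
var_ratio <- function(x) {
  v <- apply(x, 2, var)
  max(v) / min(v)
}
ratio_sph <- var_ratio(x_sph)
ratio_ell <- var_ratio(x_ell)
```

     The ratio stays close to 1 in the spherical case and becomes
     large in the ellipsoidal case, which is the property that
     distinguishes settings 1 and 2 from settings 3 to 6.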

     For each of the 6 settings data was generated with 6, 10, 20 and
     40 variables.

     Classification performance was then measured by repeatedly
     creating training datasets of 40 observations and estimating the
     misclassification rates on test sets of 100 observations.
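
     The repeated train/test procedure described above could be
     sketched as follows. The number of repetitions ('n_rep') is an
     assumption chosen to keep the run short, and the 'gamma' and
     'lambda' values are simply those from the Examples section of
     this page, not tuned parameters.

```r
library(klaR)

set.seed(123)
n_rep <- 20  # assumed number of repetitions, for illustration only

error_rates <- replicate(n_rep, {
  # Fresh training set of 40 and test set of 100 observations.
  training <- friedman.data(setting = 1, p = 6, samplesize = 40)
  test     <- friedman.data(setting = 1, p = 6, samplesize = 100)

  fit  <- rda(class ~ ., data = training, gamma = 0.74, lambda = 0.77)
  pred <- predict(fit, test[, -1])

  # Misclassification rate on the test set; labels are compared as
  # character strings to avoid depending on factor level ordering.
  mean(as.character(pred$class) != as.character(test[, 1]))
})

mean(error_rates)  # average test error over the repetitions
```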

     The number of classes is always 3. Class labels are assigned
     randomly (with equal probabilities) to observations, so the
     contributions of the classes to the data differ from dataset to
     dataset. To make sure covariances can be estimated at all, there
     are always at least two observations from each class in a dataset.
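
     One way to obtain labels with these properties is rejection
     sampling, sketched below in base R. This is an assumption about
     how such labels could be drawn, not a description of the
     package's internal code; 'draw_labels' is a hypothetical helper
     that is not part of klaR.

```r
# Hypothetical helper (not part of klaR): draw class labels uniformly
# at random and redraw until every class has at least `min_per_class`
# observations.
draw_labels <- function(samplesize, n_classes = 3, min_per_class = 2) {
  stopifnot(samplesize >= n_classes * min_per_class)
  repeat {
    labels <- sample.int(n_classes, samplesize, replace = TRUE)
    counts <- tabulate(labels, nbins = n_classes)
    if (all(counts >= min_per_class)) {
      return(factor(labels, levels = seq_len(n_classes)))
    }
  }
}

set.seed(42)
labels <- draw_labels(40)
table(labels)  # class counts vary from draw to draw
```

     With 3 classes and at least two observations per class, the
     smallest feasible sample has 6 observations, which is consistent
     with the documented lower bound on 'samplesize'.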

_V_a_l_u_e:

     Depending on 'asmatrix', either a data frame or a matrix with
     'samplesize' rows and 'p+1' columns. The first column contains
     the class labels; the remaining columns are the variables.
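
     The shape of the return value can be checked directly. The
     argument values below (setting 3, 10 variables, 50 observations)
     are arbitrary choices for illustration.

```r
library(klaR)

set.seed(7)
df <- friedman.data(setting = 3, p = 10, samplesize = 50)
m  <- friedman.data(setting = 3, p = 10, samplesize = 50, asmatrix = TRUE)

dim(df)            # samplesize rows, p + 1 columns
is.data.frame(df)  # default return type
is.matrix(m)       # matrix when asmatrix = TRUE
table(df[, 1])     # class counts in the first column
```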

_A_u_t_h_o_r(_s):

     Christian Röver, roever@statistik.uni-dortmund.de

_R_e_f_e_r_e_n_c_e_s:

     Friedman, J.H. (1989): Regularized Discriminant Analysis. In:
     _Journal of the American Statistical Association_ 84, 165-175.

_S_e_e _A_l_s_o:

     'rda'

_E_x_a_m_p_l_e_s:

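     # Reproduce the 1st setting with 6 variables.
     # Error rate should be somewhat near 9 percent.
     library(klaR)
     # Training set of 40 observations.
     training <- friedman.data(1, 6, 40)
     x <- rda(class ~ ., data = training, gamma = 0.74, lambda = 0.77)
     # Test set of 100 observations; drop the class column for prediction.
     test <- friedman.data(1, 6, 100)
     y <- predict(x, test[,-1])
     # Compare true and predicted classes.
     errormatrix(test[,1], y$class)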

