stepclass                package:klaR                R Documentation

_S_t_e_p_w_i_s_e _v_a_r_i_a_b_l_e _s_e_l_e_c_t_i_o_n _f_o_r _c_l_a_s_s_i_f_i_c_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Forward/backward variable selection for classification using any
     specified classification function, selecting by an estimated
     classification performance measure from 'ucpm'.

_U_s_a_g_e:

     stepclass(x, ...)

     ## Default S3 method:
     stepclass(x, grouping, method, improvement = 0.05, maxvar = Inf, 
         start.vars = NULL, direction = c("both", "forward", "backward"), 
         criterion = "CR",  fold = 10, cv.groups = NULL, output = TRUE, ...)
     ## S3 method for class 'formula':
     stepclass(formula, data, method, ...)

_A_r_g_u_m_e_n_t_s:

       x: matrix or data frame containing the explanatory variables 
          (required, if 'formula' is not given).

 formula: A formula of the form 'groups ~ x1 + x2 + ...'.  That is, the
          response is the grouping factor and the right hand side 
          specifies the (non-factor) discriminators.  Interaction terms
          are not supported.

    data: data matrix (rows=cases, columns=variables)

grouping: class indicator vector (a factor)

  method: character, name of classification function (e.g. '"lda"').

improvement: minimum improvement in the performance measure required
          to include or exclude a variable (<= 1)

  maxvar: maximum number of variables in model

start.vars: set of variables to start with (indices or names).  Default
          is no variables if 'direction' is '"forward"' or '"both"',
          and all variables if 'direction' is '"backward"'.

direction: '"forward"', '"backward"' or '"both"' (default)

criterion: performance measure taken from 'ucpm'.

    fold: number of folds for cross-validation; ignored if 'cv.groups'
          is specified.

cv.groups: vector of group indicators for cross-validation.  By default
          assigned automatically.

  output: indicator (logical) for text output during computation (slows
          down computation!)

     ...: further parameters passed to classification function
          ('method'), e.g. priors etc.
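
     The 'cv.groups' argument above can be prepared by hand when the
     automatic assignment is not wanted. The sketch below builds fold
     labels that are balanced within each class. It assumes that
     'cv.groups' holds one fold label per observation, which the
     argument description suggests but does not state outright.

```r
# Minimal sketch: stratified fold labels for cross-validation.
# Assumption: cv.groups is a vector with one fold label per observation.
set.seed(42)
fold <- 5
classes <- iris$Species

cv.groups <- integer(length(classes))
for (cl in levels(classes)) {
  idx <- which(classes == cl)
  # Spread the cases of this class evenly over the folds, in random order
  cv.groups[idx] <- sample(rep(seq_len(fold), length.out = length(idx)))
}

# Cross-tabulate fold labels against classes to check the balance
fold_table <- table(cv.groups, classes)
print(fold_table)

# Assumed usage (requires klaR and MASS):
# stepclass(iris[, 1:4], iris$Species, "lda", cv.groups = cv.groups)
```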

_D_e_t_a_i_l_s:

     The classification method (e.g. 'lda') must have its own
     'predict' method (like 'predict.lda' for 'lda') that returns
     either a matrix of posterior probabilities or a list with an
     element 'posterior' containing that matrix. It must also accept
     matrix input, as in 'method(x, grouping, ...)'.
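
     To illustrate the interface just described, the following sketch
     defines a toy nearest-centroid classifier together with a
     matching 'predict' method that returns a list with a 'posterior'
     element. The names 'ncm' and 'predict.ncm' are hypothetical and
     not part of klaR, and the "posterior" is only a softmax of
     negative squared distances, chosen for brevity.

```r
# Hypothetical classifier with the calling convention method(x, grouping, ...)
ncm <- function(x, grouping, ...) {
  x <- as.matrix(x)
  grouping <- factor(grouping)
  lev <- levels(grouping)
  # One row of variable means per class
  centers <- do.call(rbind, lapply(lev, function(g)
    colMeans(x[grouping == g, , drop = FALSE])))
  rownames(centers) <- lev
  structure(list(centers = centers, lev = lev), class = "ncm")
}

# Matching predict method: returns a list with 'class' and 'posterior'
predict.ncm <- function(object, newdata, ...) {
  newdata <- as.matrix(newdata)
  # Squared Euclidean distance of every case to every class centre
  d2 <- vapply(seq_along(object$lev), function(k)
    rowSums(sweep(newdata, 2, object$centers[k, ])^2),
    numeric(nrow(newdata)))
  d2 <- matrix(d2, nrow = nrow(newdata))
  # Pseudo-posterior: softmax of negative distances (rows sum to 1)
  w <- exp(-(d2 - apply(d2, 1, min)))
  posterior <- w / rowSums(w)
  colnames(posterior) <- object$lev
  cls <- factor(object$lev[max.col(posterior, ties.method = "first")],
                levels = object$lev)
  list(class = cls, posterior = posterior)
}

fit <- ncm(iris[, 1:4], iris$Species)
pred <- predict(fit, iris[, 1:4])
print(head(round(pred$posterior, 3)))

# Assumed usage with klaR installed:
# stepclass(iris[, 1:4], iris$Species, method = "ncm")
```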

     A stepwise variable selection is then performed.  The initial
     model is defined by the provided starting variables; in every
     step new models are generated by including every single variable
     that is not in the model, and by excluding every single variable
     that is in the model. The performance measures of these models
     are estimated by cross-validation, and if the best value of the
     chosen criterion improves on the value so far by more than
     'improvement', the corresponding variable is included or
     excluded. The procedure stops if the new best value is not good
     enough, or if the specified maximum number of variables is
     reached.
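
     The selection loop described above can be sketched for the
     forward direction only. This is a simplified illustration, not
     klaR's implementation: 'cv_rate' is a hypothetical helper, the
     criterion is the plain cross-validated correctness rate, and the
     empty model is assumed to start at a value of 0.

```r
library(MASS)  # lda() and its predict method

# Hypothetical helper: cross-validated correctness rate for a variable set
cv_rate <- function(x, grouping, vars, folds) {
  hits <- 0
  for (f in unique(folds)) {
    test <- folds == f
    fit <- lda(x[!test, vars, drop = FALSE], grouping[!test])
    pred <- predict(fit, x[test, vars, drop = FALSE])$class
    hits <- hits + sum(pred == grouping[test])
  }
  hits / length(grouping)
}

set.seed(1)
x <- as.matrix(iris[, 1:4])
g <- iris$Species
folds <- sample(rep(1:10, length.out = nrow(x)))
improvement <- 0.05

selected <- character(0)
best <- 0  # assumed starting value for the empty model
repeat {
  candidates <- setdiff(colnames(x), selected)
  if (length(candidates) == 0) break
  # Estimate the criterion for every model with one more variable
  rates <- sapply(candidates,
                  function(v) cv_rate(x, g, c(selected, v), folds))
  # Stop unless the best candidate beats the current value by 'improvement'
  if (max(rates) <= best + improvement) break
  best <- max(rates)
  selected <- c(selected, names(which.max(rates)))
}
print(selected)
print(best)
```

     A backward step would mirror this by evaluating every model with
     one variable removed.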

     If 'direction' is '"forward"', the model is only extended (by
     including further variables); if 'direction' is '"backward"',
     the model is only reduced (by excluding variables from the
     model).

_V_a_l_u_e:

     A list of class 'stepclass' containing the following components:

    call: the (matched) function call.

  method: name of classification function used (e.g. '"lda"').

start.variables: vector of starting variables.

 process: data frame showing selection process (included/excluded
          variables and performance measure).

   model: the final model: data frame with 2 columns; indices and names
          of variables.

performance.measure: value of the criterion used by 'ucpm'

_A_u_t_h_o_r(_s):

     Christian Röver, roever@statistik.uni-dortmund.de, Irina Czogiel

_S_e_e _A_l_s_o:

     'step', 'stepAIC'

_E_x_a_m_p_l_e_s:

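     library(klaR)   # provides stepclass()
     library(MASS)   # provides lda() and qda()
     data(iris)
     iris.d <- iris[,1:4]  # the data
     iris.c <- iris[,5]    # the classes
     # default interface: start from Sepal.Width, classify with lda
     x <- stepclass(iris.d, iris.c, "lda", start.vars = "Sepal.Width")
     # formula interface, here with qda and the "AS" criterion
     y <- stepclass(Species ~ ., data = iris, method = "qda",
         start.vars = "Sepal.Width", criterion = "AS")
     plot(x)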

