maxsel              package:exactmaxsel              R Documentation

_M_a_x_i_m_a_l_l_y _s_e_l_e_c_t_e_d _c_r_i_t_e_r_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     The function 'maxsel' computes the maximal value of the criterion
     of interest (either the Gini-gain or the chi-square statistic)
     over a set of candidate binary splits. The set of candidate
     binary splits depends on 'type' (see Details).

_U_s_a_g_e:

     maxsel(x, y=NULL, type, statistic)

_A_r_g_u_m_e_n_t_s:

       x: a numeric vector of length n giving the values of the
          variable X for the n observations. The values must be
          coded as 1,...,K. Alternatively, 'x' can be a 2 x K
          matrix corresponding to a contingency table, where the two
          rows are for the values of Y (Y=0,1) and the K columns are
          for the values of X (X=1,...,K). In this case, 'y' must be
          set to 'y=NULL'.

       y: a numeric vector of length n giving the class (response
          variable Y) of the considered observations. The classes must
          be coded as 0 and 1. If 'x' is a contingency table, 'y' must
          be set to 'y=NULL'.

    type: the type of the considered binary splits. 'type="ord"'
          corresponds to an ordinal X variable, 'type="cat"'
          corresponds to a categorical X variable with unordered
          categories, 'type="ord2"' corresponds to an ordinal X
          variable with 2 cutpoints (non-monotonic association).

statistic: the association measure used as criterion to select the best
          split. Currently, only 'statistic="chi2"' (chi-square
          statistic) and 'statistic="gini"' (the Gini-gain from machine
          learning) are implemented.
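
     The two input forms and the two criteria can be illustrated in
     base R. The sketch below is not the package's implementation: the
     helper names 'to_table', 'collapse', 'chi2_split' and
     'gini_gain_split' are hypothetical, the chi-square statistic is
     taken to be the Pearson statistic (without continuity correction)
     of the 2 x 2 table induced by a split, and the Gini-gain is assumed
     to be the decrease in Gini impurity of Y, which may differ from the
     package's definition by a scaling factor.

```r
# Hypothetical helpers, base R only; not part of exactmaxsel.

# Build the 2 x K contingency table (rows Y=0,1; columns X=1,...,K)
# from data vectors x (coded 1,...,K) and y (coded 0,1).
to_table <- function(x, y, K = max(x)) {
  unclass(table(factor(y, levels = 0:1), factor(x, levels = seq_len(K))))
}

# Collapse a 2 x K table to the 2 x 2 table of a binary split, where
# 'left' holds the X values (column indices) assigned to the first group.
collapse <- function(tab, left) {
  cbind(rowSums(tab[, left, drop = FALSE]),
        rowSums(tab[, -left, drop = FALSE]))
}

# Pearson chi-square statistic (no continuity correction) of one split.
chi2_split <- function(tab, left) {
  obs <- collapse(tab, left)
  expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  sum((obs - expd)^2 / expd)
}

# Decrease in Gini impurity of Y achieved by one split
# (assumed definition, see the note above).
gini_gain_split <- function(tab, left) {
  obs <- collapse(tab, left)
  gini <- function(n) {
    p <- n / sum(n)
    1 - sum(p^2)
  }
  w <- colSums(obs) / sum(obs)
  gini(rowSums(obs)) - sum(w * apply(obs, 2, gini))
}

# A 2 x 3 contingency table and the split {1,2}{3}
tab <- matrix(c(8, 10, 40, 13, 15, 4), 2, 3, byrow = TRUE)
chi2_split(tab, left = 1:2)
gini_gain_split(tab, left = 1:2)
```

     Under these assumptions, the maximally selected criterion would be
     the largest such value over all candidate splits allowed by
     'type'.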

_D_e_t_a_i_l_s:

     For example, consider a variable X with the possible values
     {1,2,3,4}. If 'type="ord"', the set of candidate splits consists
     of {1}{2,3,4}, {1,2}{3,4} and {1,2,3}{4}. If 'type="cat"', the set
     of candidate splits consists of {1}{2,3,4}, {1,2}{3,4},
     {1,2,3}{4}, {1,2,4}{3}, {1,4}{2,3}, {1,3,4}{2}, {1,3}{2,4}. If
     'type="ord2"', the set of candidate splits consists of {1}{2,3,4},
     {1,2}{3,4}, {1,2,3}{4}, {1,2,4}{3}, {1,4}{2,3}, {1,3,4}{2}.
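
     The candidate sets listed above can be enumerated with a short
     base R sketch. The helper 'candidate_splits' is hypothetical and
     not part of the package; the rule used for 'type="ord2"' (one of
     the two groups must be a run of consecutive values) is inferred
     from the example above and assumes K >= 2.

```r
# Hypothetical helper, base R only; not part of exactmaxsel.
# Returns the candidate splits of {1,...,K} as a list of "left" groups;
# the right group is the complement. Each split is listed once.
candidate_splits <- function(K, type = c("ord", "cat", "ord2")) {
  type <- match.arg(type)
  if (type == "ord") {
    # one cutpoint: {1,...,k} versus {k+1,...,K}
    return(lapply(seq_len(K - 1), seq_len))
  }
  # every subset that contains 1 and is not the full set
  # (fixing 1 on the left avoids listing a split twice)
  masks <- seq(0, 2^(K - 1) - 2)
  subsets <- lapply(masks, function(m) {
    bits <- (m %/% 2^(seq_len(K - 1) - 1)) %% 2
    c(1, 1 + which(bits == 1))
  })
  if (type == "cat") return(subsets)
  # two cutpoints: keep splits where one group is a run of consecutive values
  is_run <- function(s) length(s) > 0 && all(diff(s) == 1)
  Filter(function(s) is_run(s) || is_run(setdiff(seq_len(K), s)), subsets)
}

# Number of candidate splits for K = 4
sapply(c("ord", "cat", "ord2"),
       function(t) length(candidate_splits(4, t)))
# ord 3, cat 7, ord2 6
```

     For K = 4, the only split of the 'cat' set that is dropped under
     'type="ord2"' is {1,3}{2,4}, since neither group consists of
     consecutive values.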

_V_a_l_u_e:

     the value of the maximally selected criterion.

_A_u_t_h_o_r(_s):

     Anne-Laure Boulesteix (<URL:
     http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/>)

_R_e_f_e_r_e_n_c_e_s:

     A.-L. Boulesteix (2006), Maximally selected chi-square statistics
     for ordinal variables, Biometrical Journal 48:451-462.

     A.-L. Boulesteix (2006), Maximally selected chi-square statistics
     and binary splits of nominal variables, Biometrical Journal
     48:838-848.

     C. Strobl, A.-L. Boulesteix and T. Augustin (2007), Unbiased split
     selection for classification trees based on the Gini index,
     Computational Statistics and Data Analysis 52:483-501.

     A.-L. Boulesteix and C. Strobl (2007), Maximally selected
     chi-square statistics and non-monotonic associations: an exact
     approach based on two cutpoints, Computational Statistics and Data
     Analysis 51:6295-6306.

_S_e_e _A_l_s_o:

     'maxsel.test'.

_E_x_a_m_p_l_e_s:

     # load exactmaxsel library
     library(exactmaxsel)

     # First case: x and y are data vectors
     # Simulate x and y
     x<-sample(4,30,replace=TRUE)
     y<-sample(c(0,1),30,replace=TRUE)

     maxsel(x=x,y=y,type="ord",statistic="chi2")
     maxsel(x=x,y=y,type="cat",statistic="gini")

     # Second case: x is a 2 x K contingency table (here K=3), y=NULL.
     # The six counts fill a 2 x 3 matrix row by row.
     x<-matrix(c(8,10,40,13,15,4),2,3,byrow=TRUE)
     maxsel(x=x,y=NULL,type="ord",statistic="chi2")
     maxsel(x=x,y=NULL,type="cat",statistic="gini")

