divideUp               package:hddplot               R Documentation

_P_a_r_t_i_t_i_o_n _d_a_t_a _i_n_t_o _m_u_t_i_p_l_e _n_e_a_r_l_y _e_q_u_a_l _s_u_b_s_e_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Randomly partition data into nearly equal subsets. If
     'balanced=TRUE', the subsets are additionally required to be, as
     far as possible, balanced with respect to a classifying factor.
     The resulting subsets are suitable for defining the folds in a
     cross-validation.

_U_s_a_g_e:

     divideUp(cl, nset = 2, seed = NULL, balanced=TRUE)

_A_r_g_u_m_e_n_t_s:

      cl: classifying factor

    nset: number of subsets into which to partition data

    seed: optional seed, passed to 'set.seed' to obtain reproducible
          results; if 'NULL' (the default), the seed is not set

balanced: logical: should subsets be as far as possible balanced with
          respect to the classifying factor?

_D_e_t_a_i_l_s:

_V_a_l_u_e:

     A vector of the same length as 'cl'. Element i gives the subset,
     numbered from 1 to 'nset', to which observation i is assigned.

_N_o_t_e:

_A_u_t_h_o_r(_s):

     John Maindonald

_R_e_f_e_r_e_n_c_e_s:

_S_e_e _A_l_s_o:

     ~~objects to See Also as 'help', ~~~

_E_x_a_m_p_l_e_s:

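     ## Balanced folds: each class is spread across the 10 subsets
     foldid <- divideUp(cl=rep(1:3, c(17,14,8)), nset=10)
     table(rep(1:3, c(17,14,8)), foldid)
     ## Unbalanced folds: class membership is ignored
     foldid <- divideUp(cl=rep(1:3, c(17,14,8)), nset=10,
                 balanced=FALSE)
     table(rep(1:3, c(17,14,8)), foldid)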
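
     ## A minimal sketch (not part of the package) of how a vector of
     ## fold ids might be used to loop over cross-validation folds.
     ## The toy feature 'x', the nearest-class-mean rule and the names
     ## 'ctr' and 'pred' are hypothetical, chosen for illustration only.
     ## The fold ids are built with the same base R expression as the
     ## 'balanced=FALSE' branch, so the sketch runs without the package.
     set.seed(29)
     cl <- rep(1:3, c(17,14,8))
     x <- rnorm(length(cl), mean=cl)       # toy one-dimensional feature
     nset <- 10
     foldid <- sample(rep(1:nset, length.out=length(cl)))
     pred <- integer(length(cl))
     for(i in 1:nset){
       test <- foldid == i                 # observations held out in fold i
       ## class means estimated from the remaining observations
       ctr <- tapply(x[!test], cl[!test], mean)
       ## assign each held-out observation to the class with nearest mean
       pred[test] <- as.integer(names(ctr))[
           sapply(x[test], function(z) which.min(abs(z - ctr)))]
     }
     mean(pred == cl)                      # cross-validated accuracy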

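     ## The function is currently defined as
     function(cl = rep(1:3, c(7, 4, 8)), nset=2, seed=NULL, balanced=TRUE){
         ## Optionally fix the random seed for reproducibility
         if(!is.null(seed))set.seed(seed)
         if(balanced){
           ## Sort by class, then deal out a random permutation of
           ## 1:nset cyclically along the sorted observations
           ord <- order(cl)
           ordcl <- cl[ord]
           gp0 <- rep(sample(1:nset), length.out=length(cl))
           ## NB: the function here is matched positionally to the
           ## 'recursive' argument of unlist()
           gp <- unlist(split(gp0,ordcl), function(x)sample(x))
           ## Map the fold ids back to the original observation order
           gp[ord] <- gp
         } else
         ## Unbalanced: randomly permute a cyclic sequence of fold ids
         gp <- sample(rep(1:nset, length.out=length(cl)))
         as.vector(gp)
       }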

