cforest                package:party                R Documentation

_R_a_n_d_o_m _F_o_r_e_s_t

_D_e_s_c_r_i_p_t_i_o_n:

     An implementation of the random forest and bagging ensemble
     algorithms utilizing conditional inference trees as base learners.

_U_s_a_g_e:

     cforest(formula, data = list(), subset = NULL, weights = NULL, 
             controls = cforest_control(),
             xtrafo = ptrafo, ytrafo = ptrafo, scores = NULL)

_A_r_g_u_m_e_n_t_s:

 formula: a symbolic description of the model to be fit. 

    data: a data frame containing the variables in the model. 

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

 weights: an optional vector of weights to be used in the fitting
          process. Only non-negative integer valued weights are
          allowed.

controls: an object of class 'ForestControl-class', which can be
          obtained using 'cforest_control'.

  xtrafo: a function to be applied to all input variables. By default,
          the 'ptrafo' function is applied.

  ytrafo: a function to be applied to all response variables. By
          default, the 'ptrafo' function is applied.

  scores: an optional named list of scores to be attached to ordered
          factors.
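
     For illustration, a minimal call might combine these arguments as
     follows (a sketch only; the data set 'mydata', the case weights
     'mycaseweights' and the variable names are hypothetical, not part
     of the package):

         ## 'grade' is assumed to be an ordered factor in 'mydata'
         fit <- cforest(y ~ x1 + x2 + grade, data = mydata,
                        weights = mycaseweights,  # non-negative integers
                        scores = list(grade = c(1, 2, 4)),
                        controls = cforest_control(ntree = 100))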

_D_e_t_a_i_l_s:

     This implementation of the random forest (and bagging) algorithm
     differs from the reference implementation in 'randomForest' with
     respect to the base learner used and the aggregation scheme
     applied.

     Conditional inference trees, see 'ctree', are fitted to each of
     the 'B' bootstrap samples of the learning sample. There are many
     hyperparameters that can be controlled; see 'cforest_control'.
     You MUST NOT change anything you don't understand completely.
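
     For instance, the number of trees and the splitting criterion can
     be adjusted via 'cforest_control' (a sketch; the values shown are
     illustrative only):

         ## grow 100 trees and require a larger test statistic for splitting
         ct <- cforest_control(ntree = 100, mincriterion = qnorm(0.95))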

     The aggregation scheme works by averaging observation weights
     extracted from each of the 'B' trees and NOT by averaging
     predictions directly. See Hothorn et al. (2004) for a description. 
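
     Conceptually, the scheme looks like this (a toy sketch, not the
     package internals; 'tree_weights' is a hypothetical B x n matrix
     whose b-th row holds the observation weights the b-th tree assigns
     to the training responses 'y' for one new observation):

         agg <- colMeans(tree_weights)   # average weights over the B trees
         pred <- weighted.mean(y, agg)   # weighted mean of responses,
                                         # not a mean of per-tree predictions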

     Ensembles of conditional inference trees have not yet been
     extensively tested, so this routine is meant for the expert user
     only and its current state is rather experimental. However, there
     are some things that can't be done with 'randomForest', for
     example fitting forests to censored response variables.

     By default, raw test statistics are maximized and five inputs are
     randomly examined for possible splits in each node.
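
     Both defaults can be changed through 'cforest_control' (a sketch;
     the values shown are illustrative only):

         ## examine ten instead of five randomly selected inputs and
         ## use Bonferroni-adjusted p-values instead of raw statistics
         ct <- cforest_control(mtry = 10, testtype = "Bonferroni",
                               mincriterion = 0.95)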

_V_a_l_u_e:

     An object of class 'RandomForest-class'.

_R_e_f_e_r_e_n_c_e_s:

     Leo Breiman (2001). Random Forests. _Machine Learning_, *45*(1),
     5-32.

     Torsten Hothorn, Berthold Lausen, Axel Benner and Martin
     Radespiel-Troeger (2004). Bagging Survival Trees. _Statistics in
     Medicine_, *23*(1), 77-91.

_E_x_a_m_p_l_e_s:

         ### honest (i.e., out-of-bag) cross-classification of
         ### true vs. predicted classes
         data("mammoexp", package = "party")
         table(mammoexp$ME, predict(cforest(ME ~ ., data = mammoexp, 
                                    controls = cforest_control(ntree = 50)),
                                    OOB = TRUE))

         ### fit forest to censored response
         if (require("ipred")) {

             data("GBSG2", package = "ipred")
             bst <- cforest(Surv(time, cens) ~ ., data = GBSG2, 
                        controls = cforest_control(ntree = 50))

             ### estimate conditional Kaplan-Meier curves
             treeresponse(bst, newdata = GBSG2[1:2,], OOB = TRUE)

             ### if you can't resist looking at individual trees ...
             party:::prettytree(bst@ensemble[[1]], names(bst@data@get("input")))
         }

