Conditional Trees           package:party           R Documentation

_C_o_n_d_i_t_i_o_n_a_l _T_r_e_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Recursive partitioning for continuous, censored, ordered, nominal
     and multivariate response variables in a conditional inference
     framework.

_U_s_a_g_e:

     ctree(formula, data, subset = NULL, weights = NULL, 
           controls = ctree_control(), xtrafo = ptrafo, ytrafo = ptrafo, 
           scores = NULL)

_A_r_g_u_m_e_n_t_s:

 formula: a symbolic description of the model to be fit. 

    data: a data frame containing the variables in the model. 

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

 weights: an optional vector of weights to be used in the fitting
          process. Only non-negative integer valued weights are
          allowed.

controls: an object of class 'TreeControl', which can be obtained using
          'ctree_control'.

  xtrafo: a function to be applied to all input variables. By default,
          the 'ptrafo' function is applied.

  ytrafo: a function to be applied to all response variables.  By
          default, the 'ptrafo' function is applied.

  scores: an optional named list of scores to be attached to ordered
          factors.
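
     As an illustration of the 'weights' argument, the sketch below
     fits a tree to a bootstrap sample by passing the resampling counts
     as non-negative integer case weights. The variable names 'n' and
     'w' are chosen here for illustration and are not part of the
     package.

```r
library("party")

set.seed(290875)
airq <- subset(airquality, !is.na(Ozone))
n <- nrow(airq)

# Bootstrap counts: how often each observation is drawn when sampling
# n observations with replacement (non-negative integers summing to n).
w <- tabulate(sample(n, replace = TRUE), nbins = n)

# Observations with weight 0 do not contribute to the fit.
airct_boot <- ctree(Ozone ~ ., data = airq, weights = w)
airct_boot
```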

_D_e_t_a_i_l_s:

     Conditional trees estimate a regression relationship by binary
     recursive partitioning in a conditional inference framework.
     Roughly, the algorithm works as follows: 1) Test the global null
     hypothesis of independence between any of the input variables and
     the response (which may be multivariate as well).  Stop if this
     hypothesis cannot be rejected. Otherwise select the input variable
     with the strongest association to the response. This association
     is measured by a p-value corresponding to a test for the partial
     null hypothesis of independence between a single input variable
     and the response. 2) Implement a binary split in the selected
     input variable.  3) Recursively repeat steps 1) and 2).
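
     Step 1) can be sketched in base R as follows. This is a simplified
     illustration only: it uses plain Monte Carlo permutation tests on
     the absolute correlation and a simple Bonferroni bound, not the
     test statistics implemented in the package. The helper
     'perm_pvalue' and the 0.05 level are assumptions made for this
     sketch.

```r
set.seed(1)
airq <- subset(airquality, !is.na(Ozone))
inputs <- setdiff(names(airq), "Ozone")

# Hypothetical helper: permutation p-value for the absolute correlation
# between one input x and the response y (missing inputs are dropped).
perm_pvalue <- function(x, y, B = 999) {
    ok <- !is.na(x)
    x <- x[ok]
    y <- y[ok]
    observed <- abs(cor(x, y))
    permuted <- replicate(B, abs(cor(x, sample(y))))
    (1 + sum(permuted >= observed)) / (B + 1)
}

# Partial null hypotheses: one p-value per input variable.
pvals <- sapply(inputs, function(v) perm_pvalue(airq[[v]], airq$Ozone))

# Global null hypothesis via a Bonferroni bound on the smallest p-value.
global_p <- min(1, length(pvals) * min(pvals))

# Stop if the global hypothesis cannot be rejected; otherwise select
# the input with the smallest p-value as the split variable.
split_var <- if (global_p < 0.05) names(which.min(pvals)) else NA
split_var
```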

     The implementation utilizes a unified framework for conditional
     inference, or permutation tests, developed by Strasser and Weber
     (1999). The stop criterion in step 1) is either based on
     multiplicity adjusted p-values  ('testtype = "Bonferroni"'  or
     'testtype = "MonteCarlo"' in 'ctree_control') or on the univariate
     p-values ('testtype = "Univariate"'). In both cases, the criterion
     is maximized, i.e., 1 - p-value is used. A split is implemented 
     when the criterion exceeds the value given by 'mincriterion' as
     specified in 'ctree_control'. For example, when 'mincriterion =
     0.95', the p-value must be smaller than 0.05 in order to split
     this node. This statistical approach ensures that a right-sized
     tree is grown without any form of pruning or cross-validation.
     The selection of the input variable to split on is based on the
     univariate p-values, which avoids a variable selection bias
     towards input variables with many possible cutpoints.
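
     The effect of the stop criterion can be explored by fitting the
     same data under two settings, as in the sketch below. The object
     names 'strict' and 'loose' and the chosen values of 'mincriterion'
     are illustrative; counting terminal nodes via 'where' assumes that
     this method is available for the fitted tree in the installed
     version of the package.

```r
library("party")

airq <- subset(airquality, !is.na(Ozone))

# Multiplicity-adjusted p-values with a demanding threshold:
# a node is split only if the adjusted p-value is below 0.01.
strict <- ctree(Ozone ~ ., data = airq,
                controls = ctree_control(testtype = "Bonferroni",
                                         mincriterion = 0.99))

# Unadjusted univariate p-values with a lenient threshold:
# a node is split if the p-value is below 0.10.
loose <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(testtype = "Univariate",
                                        mincriterion = 0.90))

# Number of terminal nodes under each setting.
n_strict <- length(unique(where(strict)))
n_loose <- length(unique(where(loose)))
c(strict = n_strict, loose = n_loose)
```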

     By default, the scores for each ordinal factor 'x' are
     '1:length(x)'. This may be changed using, for example, 'scores =
     list(x = c(1,5,6))'.
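
     A minimal sketch of custom scores follows. The ordered factor
     'TempLevel', its cut points and its labels are constructed here
     purely for illustration; only the 'scores' argument itself comes
     from the package.

```r
library("party")

airq <- subset(airquality, !is.na(Ozone))

# Hypothetical ordered input: temperature binned into three levels.
airq$TempLevel <- cut(airq$Temp, breaks = c(-Inf, 70, 85, Inf),
                      labels = c("cool", "warm", "hot"),
                      ordered_result = TRUE)
airq$Temp <- NULL

# Attach unequally spaced scores to the three ordered levels instead
# of the default equally spaced ones.
scorect <- ctree(Ozone ~ ., data = airq,
                 scores = list(TempLevel = c(1, 5, 6)))
scorect
```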

     For a general description of the methodology see Hothorn, Hornik
     and Zeileis (2006).

_V_a_l_u_e:

     An object of class 'BinaryTree'.

_R_e_f_e_r_e_n_c_e_s:

     Torsten Hothorn, Kurt Hornik and Achim Zeileis (2006). Unbiased
     Recursive Partitioning: A Conditional Inference Framework.
     _Journal of Computational and Graphical Statistics_ (accepted).
     Preprint available from
     <URL: http://statmath.wu-wien.ac.at/~zeileis/papers/Hothorn+Hornik+Zeileis-2006.pdf>

     Helmut Strasser and Christian Weber (1999). On the asymptotic
     theory of permutation statistics. _Mathematical Methods of
     Statistics_, *8*, 220-250.

_E_x_a_m_p_l_e_s:

         ### regression
         airq <- subset(airquality, !is.na(Ozone))
         airct <- ctree(Ozone ~ ., data = airq, 
                        controls = ctree_control(maxsurrogate = 3))
         airct
         plot(airct)
         mean((airq$Ozone - predict(airct))^2)

         ### classification
         irisct <- ctree(Species ~ ., data = iris)
         irisct
         plot(irisct)
         table(predict(irisct), iris$Species)

         ### estimated class probabilities, a list
         tr <- treeresponse(irisct, newdata = iris[1:10,])

         ### ordinal regression
         mammoct <- ctree(ME ~ ., data = mammoexp) 
         plot(mammoct)

         ### estimated class probabilities
         treeresponse(mammoct, newdata = mammoexp[1:10,])

         ### survival analysis
         if (require("ipred")) {
             data("GBSG2", package = "ipred")
             GBSG2ct <- ctree(Surv(time, cens) ~ ., data = GBSG2)
             plot(GBSG2ct)
             treeresponse(GBSG2ct, newdata = GBSG2[1:2,])        
         }

