TWIX                  package:TWIX                  R Documentation

_T_r_e_e_s _w_i_t_h _e_x_t_r_a _s_p_l_i_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Trees with extra splits

_U_s_a_g_e:

     TWIX(formula, data = NULL, test.data = 0, subset = NULL,
             method = "deviance", topn.method = "complete", cluster = NULL,
             minsplit = 30, minbucket = round(minsplit/3), Devmin = 0.05,
             topN = 1, level = 30, st = 1, cl.level = 2, tol = 0.15, score = 1,
             k = 0, trace.plot=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: formula of the form 'y ~ x1 + x2 + ...', where 'y' must be a
          factor and 'x1,x2,...' are numeric or factor.

    data: an optional data frame containing the variables in the
          model (training data).

test.data: a data frame containing new data, '0' (default), or
          '"NULL"'. If set to '"NULL"', the bad observations of the
          training data are identified (see 'Bad.id' under Value).

  subset: an optional vector specifying a subset of observations to be
          used.

  method: which split points are used. One of '"deviance"' (default),
          '"grid"' or '"local"':
           '"local"' - the local maxima of the split function
          (entropy),
           '"deviance"' - all values of the entropy,
           '"grid"' - grid points.

topn.method: one of '"complete"' (default) or '"single"'. Specifies
          how split points are considered. If set to '"complete"',
          split points from all variables are used; otherwise split
          points are used per variable.

 cluster: name of the cluster, if parallel computing is used.

minsplit: the minimum number of observations that must exist in a node.

minbucket: the minimum number of observations in any terminal (leaf)
          node.

  Devmin: the minimum improvement in entropy required for a split.

    topN: integer vector giving the number of splits selected at each
          level. If of length 1, the same number of splits is
          selected at every level. If of length > 1, for example
          'topN=c(3,2)', 3 splits are chosen at the first level, 2
          splits at the second level and 1 split at every subsequent
          level.

   level: maximum depth of the trees. If 'level' is set to 1, the
          trees consist of the root node only.

      st: step parameter for method '"grid"'.

cl.level: parameter for parallel computing.

     tol: parameter used only if 'topn.method' is set to '"single"'.

   score: a parameter, which can be '1' (default) or '2'. If it is
          '2', the _sort_-function is used;
           if it is '1', the _weight_-function is used:
           'score =
          0.25*scale(dev.tr)+0.6*scale(fit.tr)+0.15*(tree.structure)'


       k: k-fold cross-validation of the split function. 'k' specifies
          the fraction of observations placed in the hold-out sample
          ('k' can lie in the interval (0, 0.5)).

trace.plot: Should the trace plot be plotted?

     ...: further arguments to be passed to or from methods.
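
     The rule described for 'topN' can be illustrated with a small
     sketch in base R. The helper 'expand_topN' and its argument
     'n_levels' are hypothetical names introduced here for
     illustration only; they are not part of the TWIX package, and
     the package's internal handling may differ.

```r
# Hypothetical helper (not part of TWIX): expand a 'topN' vector to one
# entry per splitting level, following the rule described for 'topN'.
expand_topN <- function(topN, n_levels) {
  stopifnot(length(topN) >= 1, n_levels >= 1)
  if (length(topN) == 1) {
    # A single value applies to every level.
    rep(topN, n_levels)
  } else {
    # Listed values apply to the first levels; later levels get 1 split.
    padded <- c(topN, rep(1, max(0, n_levels - length(topN))))
    padded[seq_len(n_levels)]
  }
}

expand_topN(3, n_levels = 4)        # 3 3 3 3
expand_topN(c(3, 2), n_levels = 4)  # 3 2 1 1
```

     Under this reading, 'topN=c(3,2)' is equivalent to writing out
     'c(3,2,1,1,...)' for as many levels as the tree has.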

_V_a_l_u_e:

     A list with the following components:

    call: the call generating the object.

   trees: a list of all constructed trees, including ID, Dev ...
          for each tree.

greedy.tree: greedy tree

multitree: database

  agg.id: vector specifying trees for aggregation.

  Bad.id: ID-vector of bad observations from the training data.

_S_e_e _A_l_s_o:

     'get.tree', 'predict.TWIX', 'print.single.tree', 'plot.TWIX',
     'deviance.TWIX'

_E_x_a_m_p_l_e_s:

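     data(olives)
     # Hold out 150 of the 572 rows as test data; the rest is training data.
     i <- sample(572, 150)
     ic <- setdiff(1:572, i)
     training <- olives[ic, ]
     test <- olives[i, ]
     #
     # Fit with 9 splits at the first level and 2 at the second:
     #Tree1 <- TWIX(Region ~ ., data = training[, 1:9], topN = c(9, 2), method = "local")
     #Tree1$trees
     #
     # Predict on the test data:
     #pred <- predict(Tree1, newdata = test, sq = 1:2)
     #
     #predict(Tree1, newdata = test, sq = 1:2, ccr = TRUE)$CCR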
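
     The hold-out split used in the example can also be written
     without hard-coding the number of rows. The sketch below uses
     the built-in 'iris' data as a stand-in (an assumption made so
     that it runs without the TWIX package); the seed and the
     one-quarter test fraction are arbitrary illustrative choices.

```r
set.seed(1)                    # fixed seed so the split is reproducible
n  <- nrow(iris)               # number of observations
i  <- sample(n, floor(n / 4))  # row indices of the test set
ic <- setdiff(seq_len(n), i)   # remaining row indices form the training set
training <- iris[ic, ]
test     <- iris[i, ]
```

     The resulting 'training' and 'test' data frames play the same
     roles as in the example above and could be passed as 'data' and
     'newdata' respectively.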

