Weka_classifier_trees         package:RWeka         R Documentation

_R/_W_e_k_a _C_l_a_s_s_i_f_i_e_r _T_r_e_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     R interfaces to Weka regression and classification tree learners.

_U_s_a_g_e:

     J48(formula, data, subset, na.action,
         control = Weka_control(), options = NULL)
     LMT(formula, data, subset, na.action,
         control = Weka_control(), options = NULL)
     M5P(formula, data, subset, na.action,
         control = Weka_control(), options = NULL)
     DecisionStump(formula, data, subset, na.action,
                   control = Weka_control(), options = NULL)

_A_r_g_u_m_e_n_t_s:

 formula: a symbolic description of the model to be fit.

    data: an optional data frame containing the variables in the model.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

na.action: a function which indicates what should happen when the data
          contain 'NA's.

 control: an object of class 'Weka_control' giving options to be passed
          to the Weka learner.  Available options can be listed
          interactively using the Weka Option Wizard 'WOW', or looked
          up in the Weka documentation.

 options: a named list of further options, or 'NULL' (default).  See
          *Details*.
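
     As a minimal sketch of how 'control' is typically used, the code
     below lists the options of the Weka J48 learner with 'WOW' and
     then passes two of them through 'Weka_control'.  The option
     values (M = 5, C = 0.1) are arbitrary choices for illustration,
     not recommendations.

```r
library(RWeka)

## List the options understood by the Weka J48 learner.
WOW("J48")

## Illustrative values only: require at least 5 instances per leaf (-M 5)
## and use a pruning confidence threshold of 0.1 (-C 0.1).
m_ctrl <- J48(Species ~ ., data = iris,
              control = Weka_control(M = 5, C = 0.1))
m_ctrl
```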

_D_e_t_a_i_l_s:

     A 'predict' method is available for predicting from the fitted
     models, along with a 'summary' method based on
     'evaluate_Weka_classifier'.
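
     The following sketch shows these methods on a J48 tree fitted to
     the iris data.  The three rows used as new data and the
     cross-validation settings (10 folds, seed 1) are assumptions
     made for the example.

```r
library(RWeka)

m <- J48(Species ~ ., data = iris)

## Predicted classes for a few observations (rows chosen arbitrarily).
new_obs <- iris[c(1, 51, 101), ]
pred_class <- predict(m, newdata = new_obs)

## Class membership probabilities for the same observations.
pred_prob <- predict(m, newdata = new_obs, type = "probability")

## Evaluation on the training data.
summary(m)

## Cross-validated evaluation; fold count and seed are illustrative.
ev <- evaluate_Weka_classifier(m, numFolds = 10, seed = 1)
ev
```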

     There is also a 'plot' method for fitted binary 'Weka_tree's via
     the facilities provided by package 'party'. This converts the
     'Weka_tree' to a 'BinaryTree' and then simply calls the plot
     method of this class (see 'plot.BinaryTree') with slight
     modifications to the default arguments.

     Provided the Weka classification tree learner implements the
     Drawable interface (i.e., provides a 'graph' method),
     'write_to_dot' can be used to create a DOT representation of the
     tree for visualization via Graphviz or the 'Rgraphviz' package.

     'J48' generates unpruned or pruned C4.5 decision trees (Quinlan,
     1993).

     'LMT' implements Logistic Model Trees (Landwehr, 2003; Landwehr
     et al., 2005).

     'M5P' (where the 'P' stands for prime) generates M5 model trees
     using the M5' algorithm, which was introduced in Wang & Witten
     (1997) and enhances the original M5 algorithm by Quinlan (1992).

     'DecisionStump' implements decision stumps (trees with a single
     split only), which are frequently used as base learners for meta
     learners such as Boosting.
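
     As a sketch of the base learner use case, a decision stump can
     be fitted on its own or handed to the RWeka meta learner
     'AdaBoostM1' through the 'W' control option.  The choice of the
     iris data is only for illustration.

```r
library(RWeka)

## A single decision stump.
stump <- DecisionStump(Species ~ ., data = iris)
stump

## Boosting with decision stumps as the base learner.
boosted <- AdaBoostM1(Species ~ ., data = iris,
                      control = Weka_control(W = "DecisionStump"))
table(iris$Species, predict(boosted))
```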

     The model formulae should only use the '+' and '-' operators to
     indicate the variables to be included or not used, respectively.
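
     For instance, the following formulae stay within these
     operators.  This is a sketch on the iris data; which variables
     are dropped or selected is arbitrary.

```r
library(RWeka)

## All remaining variables as predictors.
m_all <- J48(Species ~ ., data = iris)

## All variables except the two sepal measurements.
m_drop <- J48(Species ~ . - Sepal.Length - Sepal.Width, data = iris)

## Only the two petal measurements.
m_pick <- J48(Species ~ Petal.Length + Petal.Width, data = iris)
```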

     Argument 'options' allows further customization.  Currently,
     options 'model' and 'instances' (or partial matches for these) are
     used: if set to 'TRUE', the model frame or the corresponding Weka
     instances, respectively, are included in the fitted model object,
     possibly speeding up subsequent computations on the object.  By
     default, neither is included.
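
     A minimal sketch of requesting both extras follows.  The names
     under which they are stored in the fitted object are not stated
     here, so the example only compares the component names of the
     two fits rather than assuming them.

```r
library(RWeka)

## Default fit: neither the model frame nor the Weka instances are kept.
m_plain <- J48(Species ~ ., data = iris)

## Keep both the model frame and the Weka instances.
m_keep <- J48(Species ~ ., data = iris,
              options = list(model = TRUE, instances = TRUE))

## Components present only in the second fit.
setdiff(names(m_keep), names(m_plain))
```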

_V_a_l_u_e:

     A list inheriting from classes 'Weka_tree' and 'Weka_classifiers'
     with components including:

classifier: a reference (of class 'jobjRef') to a Java object obtained
          by applying the Weka 'buildClassifier' method to build the
          specified model using the given control options.

predictions: a numeric vector or factor with the model predictions for
          the training instances (the results of calling the Weka
          'classifyInstance' method for the built classifier and each
          instance).

    call: the matched call.
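
     The components can be inspected directly on a fitted object, as
     in this short sketch using a J48 tree on the iris data:

```r
library(RWeka)

m <- J48(Species ~ ., data = iris)

inherits(m, "Weka_tree")   # class membership of the fitted object
class(m$classifier)        # reference to the underlying Java object
head(m$predictions)        # predictions for the training instances
m$call                     # the matched call
```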

_R_e_f_e_r_e_n_c_e_s:

     N. Landwehr (2003). _Logistic Model Trees_. Master's thesis,
     Institute for Computer Science, University of Freiburg, Germany.
     <URL: http://www.informatik.uni-freiburg.de/~ml/thesis_landwehr2003.html>

     N. Landwehr, M. Hall, and E. Frank (2005). Logistic Model Trees.
     _Machine Learning_, *59*, 161-205.

     R. Quinlan (1993). _C4.5: Programs for Machine Learning_. Morgan
     Kaufmann Publishers, San Mateo, CA.

     R. Quinlan (1992). Learning with continuous classes. _Proceedings
     of the Australian Joint Conference on Artificial Intelligence_,
     343-348. World Scientific, Singapore. 

     Y. Wang and I. H. Witten (1997). Induction of model trees for
     predicting continuous classes. _Proceedings of the European
     Conference on Machine Learning_. University of Economics, Faculty
     of Informatics and Statistics, Prague.

     I. H. Witten and E. Frank (2005). _Data Mining: Practical Machine
     Learning Tools and Techniques_. 2nd Edition, Morgan Kaufmann, San
     Francisco.

_S_e_e _A_l_s_o:

     Weka_classifiers

_E_x_a_m_p_l_e_s:

     ## Fit a C4.5 decision tree to the iris data
     m1 <- J48(Species ~ ., data = iris)

     ## print and summary
     m1
     summary(m1) # calls evaluate_Weka_classifier()
     table(iris$Species, predict(m1)) # by hand

     ## visualization
     ## use party package
     if(require("party", quietly = TRUE)) plot(m1)
     ## or Graphviz
     write_to_dot(m1)
     ## or Rgraphviz
     ## Not run: 
     library("Rgraphviz")
     ff <- tempfile()
     write_to_dot(m1, ff)
     plot(agread(ff))
     ## End(Not run)

     ## Using some Weka data sets ...

     ## J48
     DF2 <- read.arff(system.file("arff", "contact-lenses.arff",
                                  package = "RWeka"))
     m2 <- J48(`contact-lenses` ~ ., data = DF2)
     m2
     table(DF2$`contact-lenses`, predict(m2))
     if(require("party", quietly = TRUE)) plot(m2)

     ## M5P
     DF3 <- read.arff(system.file("arff", "cpu.arff", package = "RWeka"))
     m3 <- M5P(class ~ ., data = DF3)
     m3
     if(require("party", quietly = TRUE)) plot(m3)

     ## Logistic Model Tree.
     DF4 <- read.arff(system.file("arff", "weather.arff", package = "RWeka"))
     m4 <- LMT(play ~ ., data = DF4)
     m4
     table(DF4$play, predict(m4))

     ## Larger scale example.
     if(require("mlbench", quietly = TRUE)
        && require("party", quietly = TRUE)) {
         ## Predict diabetes status for Pima Indian women
         data("PimaIndiansDiabetes", package = "mlbench")
         ## Fit J48 tree with reduced error pruning
         m5 <- J48(diabetes ~ ., data = PimaIndiansDiabetes,
                   control = Weka_control(R = TRUE))
         plot(m5)
         ## (Make sure that the plotting device is big enough for the tree.)
     }

