Weka_classifier_functions       package:RWeka       R Documentation

_R/_W_e_k_a _C_l_a_s_s_i_f_i_e_r _F_u_n_c_t_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     R interfaces to Weka regression and classification function
     learners.

_U_s_a_g_e:

     LinearRegression(formula, data, subset, na.action,
                      control = Weka_control(), options = NULL)
     Logistic(formula, data, subset, na.action,
              control = Weka_control(), options = NULL)
     SMO(formula, data, subset, na.action,
         control = Weka_control(), options = NULL)

_A_r_g_u_m_e_n_t_s:

 formula: a symbolic description of the model to be fit.

    data: an optional data frame containing the variables in the model.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

na.action: a function which indicates what should happen when the data
          contain 'NA's.

 control: an object of class 'Weka_control' giving options to be passed
          to the Weka learner.  Available options can be listed
          interactively using the Weka Option Wizard 'WOW', or looked
          up in the Weka documentation.

 options: a named list of further options, or 'NULL' (default).  See
          *Details*.

_D_e_t_a_i_l_s:

     There is a 'predict' method for predicting from the fitted
     models, and a 'summary' method based on
     'evaluate_Weka_classifier'.

     'LinearRegression' builds suitable linear regression models, using
     the Akaike criterion for model selection.

     'Logistic' builds multinomial logistic regression models based on
     ridge estimation (le Cessie and van Houwelingen, 1992).

     'SMO' implements John C. Platt's sequential minimal optimization
     algorithm for training a support vector classifier using
     polynomial or RBF kernels.  Multi-class problems are solved using
     pairwise classification.  

     The model formulae should only use the '+' and '-' operators,
     which indicate the variables to be included or excluded,
     respectively.

     Argument 'options' allows further customization.  Currently,
     options 'model' and 'instances' (or partial matches for these) are
     used: if set to 'TRUE', the model frame or the corresponding Weka
     instances, respectively, are included in the fitted model object,
     possibly speeding up subsequent computations on the object.  By
     default, neither is included.

_V_a_l_u_e:

     A list inheriting from classes 'Weka_functions' and
     'Weka_classifiers', with components including:

classifier: a reference (of class 'jobjRef') to a Java object obtained
          by applying the Weka 'buildClassifier' method to build the
          specified model using the given control options.

predictions: a numeric vector or factor with the model predictions for
          the training instances (the results of calling the Weka
          'classifyInstance' method for the built classifier and each
          instance).

    call: the matched call.

_R_e_f_e_r_e_n_c_e_s:

     J. C. Platt (1998). Fast training of Support Vector Machines using
     Sequential Minimal Optimization. In B. Schoelkopf, C. Burges, and
     A. Smola (eds.), _Advances in Kernel Methods - Support Vector
     Learning_. MIT Press.

     I. H. Witten and E. Frank (2005). _Data Mining: Practical Machine
     Learning Tools and Techniques_. 2nd Edition, Morgan Kaufmann, San
     Francisco.

_S_e_e _A_l_s_o:

     Weka_classifiers

_E_x_a_m_p_l_e_s:

     ## Linear regression:
     ## Using standard data set 'mtcars'.
     LinearRegression(mpg ~ ., data = mtcars)
     ## Compare to R:
     step(lm(mpg ~ ., data = mtcars), trace = 0)

     ## Using standard data set 'chickwts'.
     LinearRegression(weight ~ feed, data = chickwts)
     ## (Note the interactions!)
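
     ## A minimal sketch of the formula operators described in Details;
     ## the choice of variables is arbitrary and only for illustration.
     ## Include two predictors explicitly with '+':
     LinearRegression(mpg ~ wt + hp, data = mtcars)
     ## Start from all variables and mark 'cyl' as not to be used with '-':
     LinearRegression(mpg ~ . - cyl, data = mtcars)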

     ## Logistic regression:
     ## Using standard data set 'infert'.
     STATUS <- factor(infert$case, labels = c("control", "case"))
     Logistic(STATUS ~ spontaneous + induced, data = infert)
     ## Compare to R:
     glm(STATUS ~ spontaneous + induced, data = infert, family = binomial())
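
     ## A sketch of passing a Weka option through 'control'.  First list
     ## the options Weka reports for the learner, then set one of them.
     ## 'R' is assumed here to be the ridge option shown by WOW(); the
     ## value 0.01 is an arbitrary illustration, not a recommendation.
     WOW("Logistic")
     Logistic(STATUS ~ spontaneous + induced, data = infert,
              control = Weka_control(R = 0.01))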

     ## Sequential minimal optimization algorithm for training a support
     ## vector classifier, using an RBF kernel with a non-default gamma
     ## parameter (argument '-G') instead of the default polynomial kernel
     ## (from a question on r-help):
     SMO(Species ~ ., data = iris,
         control = Weka_control(K =
         list("weka.classifiers.functions.supportVector.RBFKernel", G = 2)))
     ## In fact, by some hidden magic it also "works" to give the "base" name
     ## of the Weka kernel class:
     SMO(Species ~ ., data = iris,
         control = Weka_control(K = list("RBFKernel", G = 2)))
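
     ## A minimal sketch of the 'predict' and 'summary' methods and of the
     ## 'options' argument described in Details.  The object name 'fit'
     ## is illustrative only.  Keeping the model frame and the Weka
     ## instances in the fitted object may speed up later computations.
     fit <- SMO(Species ~ ., data = iris,
                options = list(model = TRUE, instances = TRUE))
     ## Predicted classes for a few rows of the training data:
     predict(fit, newdata = head(iris))
     ## Class probabilities instead of class labels:
     predict(fit, newdata = head(iris), type = "probability")
     ## Performance statistics based on evaluate_Weka_classifier():
     summary(fit)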

