ada                   package:ada                   R Documentation

_F_i_t_t_i_n_g _S_t_o_c_h_a_s_t_i_c _B_o_o_s_t_i_n_g _M_o_d_e_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     'ada' is used to fit a variety of stochastic boosting models for a
     binary response, as described in _Additive Logistic Regression: A
     Statistical View of Boosting_ by Friedman et al. (2000).

_U_s_a_g_e:

     ada(x, ...)
     ## Default S3 method:
     ada(x, y, test.x, test.y=NULL, loss=c("exponential","logistic"),
         type=c("discrete","real","gentle"), iter=50, nu=0.1, bag.frac=0.5,
         model.coef=TRUE, bag.shift=FALSE, max.iter=20, delta=10^(-10),
         verbose=FALSE, na.action=na.rpart, ...)

     ## S3 method for class 'formula':
     ada(formula, data, ..., subset, na.action=na.rpart)

_A_r_g_u_m_e_n_t_s:

       x: matrix of descriptors.

       y: vector of responses.  'y' may have only two unique values.

  test.x: testing matrix of descriptors (optional)

  test.y: vector of testing responses (optional)

    loss: loss function to boost under.  "exponential", "ada", "e" or
          any variation corresponds to the default boosting under
          exponential loss.  "logistic", "l2", "l" provides boosting
          under logistic loss.

    type: type of boosting algorithm to perform. "discrete" performs
          discrete Boosting (default). "real" performs Real Boost.
          "gentle" performs Gentle Boost.

    iter: number of boosting iterations to perform.  Default = 50.

      nu: shrinkage parameter for boosting.  Default = 0.1.

bag.frac: sampling fraction for samples taken out-of-bag.  This allows
          one to use random permutation, which improves performance.
          Default = 0.5.

model.coef: flag to use stage weights in boosting.  If FALSE then the
          procedure corresponds to epsilon-boosting.

bag.shift: flag to determine whether the stage weights should go to one
          as nu goes to zero.  This only makes sense if bag.frac is
          small.  The rationale behind this parameter is discussed in
          (Culp et al., 2006).

max.iter: number of iterations to perform in the Newton step to
          determine the coefficient.

   delta: tolerance for convergence of the Newton step that determines
          the coefficient.  Default = 10^(-10).

 verbose: print the number of iterations necessary for convergence of a
          coefficient.

 formula: a symbolic description of the model to be fit.

    data: an optional data frame containing the variables in the model.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

na.action: a function that indicates how to process 'NA' values. 
          Default=na.rpart.

     ...: arguments passed to 'rpart.control'.  For stumps, use
          'rpart.control(maxdepth=1,cp=-1,minsplit=0,xval=0)'.
          'maxdepth' controls the depth of trees, and 'cp' controls the
          complexity of trees.  The priors should also be fixed through
          the parms argument as discussed in the second reference.
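
     As a minimal sketch of the stump settings quoted above, the
     following builds the control object with 'rpart.control' and
     checks, using 'rpart' alone, that it yields a single-split tree.
     The two-class subset of iris and the names 'two', 'stump' and
     'one_tree' are illustrative only.  The commented-out call at the
     end assumes that 'ada' forwards a 'control' argument to 'rpart';
     that is an assumption, not something stated on this page.

```r
library(rpart)

# Illustrative two-class data: iris without setosa.
two <- droplevels(iris[iris$Species != "setosa", ])

# Stump settings quoted in the argument description.
stump <- rpart.control(maxdepth = 1, cp = -1, minsplit = 0, xval = 0)

# Fit one classification tree with these settings to see their effect.
one_tree <- rpart(Species ~ ., data = two, method = "class", control = stump)

# A stump has a root and two leaves, i.e. three rows in the tree frame.
n_nodes  <- nrow(one_tree$frame)
n_leaves <- sum(one_tree$frame$var == "<leaf>")
print(c(nodes = n_nodes, leaves = n_leaves))

# Assumption (not run): 'ada' passes 'control' through to rpart, e.g.
# fit <- ada(Species ~ ., data = two, iter = 20, control = stump)
```

     Larger values of 'maxdepth' in the same control object would give
     deeper base learners.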

_D_e_t_a_i_l_s:

     This function directly follows the algorithms listed in _"Additive
     Logistic Regression:  A Statistical View of Boosting"_.
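
     To make the roles of 'iter', 'nu' and 'bag.frac' concrete, here is
     a simplified sketch of a discrete AdaBoost loop with shrinkage and
     subsampling, built on 'rpart' stumps.  It is not the package's
     implementation: the helper 'boost_sketch', the -1/+1 coding of the
     response and the exact weight update are assumptions chosen for
     illustration.

```r
library(rpart)

# Hypothetical helper: simplified discrete AdaBoost, not ada's internals.
boost_sketch <- function(x, y, iter = 20, nu = 1, bag.frac = 0.5) {
  stopifnot(all(y %in% c(-1, 1)))
  n   <- nrow(x)
  w   <- rep(1 / n, n)                 # observation weights
  Fx  <- numeric(n)                    # running additive score
  dat <- data.frame(x, .y = factor(y, levels = c(-1, 1)))
  stump <- rpart.control(maxdepth = 1, cp = -1, minsplit = 0, xval = 0)
  alpha <- numeric(iter)
  for (m in seq_len(iter)) {
    idx <- sample(n, floor(bag.frac * n))   # subsample without replacement
    d   <- dat[idx, ]
    wi  <- w[idx]
    fit <- rpart(.y ~ ., data = d, weights = wi,
                 method = "class", control = stump)
    # Base-learner prediction on all observations, coded as -1/+1.
    f   <- ifelse(predict(fit, dat, type = "class") == "1", 1, -1)
    err <- sum(w * (f != y)) / sum(w)        # weighted error
    err <- min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
    alpha[m] <- nu * 0.5 * log((1 - err) / err)  # shrunken stage weight
    Fx <- Fx + alpha[m] * f
    w  <- w * exp(-alpha[m] * y * f)         # up-weight misclassified cases
    w  <- w / sum(w)
  }
  list(alpha = alpha, F = Fx, fit = sign(Fx))
}

set.seed(1)
two <- iris[iris$Species != "setosa", ]
x <- two[, 1:4]
y <- ifelse(two$Species == "virginica", 1, -1)
res <- boost_sketch(x, y, iter = 20, nu = 1, bag.frac = 0.5)
print(mean(res$fit == y))   # training accuracy of the sketch
```

     Setting 'nu' below 1 shrinks each stage weight, so more iterations
     are typically needed to reach the same training fit.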

     When using the 'ada(x,y)' interface: 'x' can be a data frame or a
     matrix.  'y' can be a data frame, factor, matrix, array, or table.
     Missing values must be removed from the data prior to execution.

     When using the formula interface 'ada(y~.)': data must be in a
     data frame.  The response can have factor or numeric values.
     Missing values can be present in the descriptor data whenever
     na.action is set to any option other than na.pass.
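
     A short base R sketch of preparing inputs for the 'ada(x,y)'
     interface follows: rows with missing values are dropped and the
     response is reduced to exactly two levels.  The artificial 'NA'
     values and the object names are illustrative assumptions.

```r
# Illustrative data: two-class iris with a few artificial missing values.
dat <- iris[iris$Species != "setosa", ]
dat$Sepal.Length[c(3, 10)] <- NA

# Keep complete rows only, since the x/y interface expects no missing values.
keep <- complete.cases(dat)
x <- as.matrix(dat[keep, 1:4])

# Drop the unused "setosa" level so that y has exactly two unique values.
y <- droplevels(dat$Species[keep])

stopifnot(!anyNA(x), length(unique(y)) == 2)
print(dim(x))
print(table(y))
```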

     After the model is fit, 'ada' prints a summary of the function
     call, the method used for boosting, the number of iterations,
     the final confusion matrix (observed classification vs predicted
     classification; labels for classes are the same as in the
     response), the error for the training set, and testing, training,
     and kappa estimates of the appropriate number of iterations.

     A summary of this information can also be obtained with the
     command 'print(x)'.

     Corresponding functions (Use help with summary.ada, predict.ada,
     ... varplot for additional information on these commands):

     summary :  function to print a summary of the original function
     call, method used for boosting, number of iterations, final
     confusion matrix, accuracy, and kappa statistic (a measure of
     agreement between the observed classification and predicted
     classification). 'summary' can be used for training, testing, or
     validation data.  

     predict :  function to predict the response for any data set
     (train, test, or validation)

     plot    :  function to plot performance of the algorithm across
     boosting iterations. Default plot is iteration number (x-axis)
     versus prediction error (y-axis) for the data set used to build
     the model.  Function can also simultaneously produce an error plot
     for an external test set and a kappa plot for training and test
     sets. 

     pairs   :  function to produce pairwise plots of descriptors.
     Descriptors are arranged by decreasing frequency of selection by
     boosting (upper left = most frequently chosen). The color of the
     marker in the plot represents class membership; the size of the
     marker represents predicted class probability.  The larger the
     marker, the higher the probability of classification.

     varplot :  plot of variables ordered by the variable importance
     measure (based on improvement).

     addtest : add a testing data set to the 'ada' object, so that the
     testing errors only have to be computed once.

     update : add more trees to the 'ada' object.
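
     The kappa statistic mentioned under 'summary' can be illustrated
     with a small base R sketch.  It assumes the standard Cohen's kappa
     computed from a confusion matrix; the helper name 'kappa_stat' and
     the toy labels are hypothetical, and the package's own computation
     is not shown here.

```r
# Hypothetical helper: Cohen's kappa from a square confusion matrix.
kappa_stat <- function(tab) {
  tab   <- as.matrix(tab)
  n     <- sum(tab)
  p_obs <- sum(diag(tab)) / n                       # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2   # chance agreement
  (p_obs - p_exp) / (1 - p_exp)
}

# Toy labels: 48 cases of class "a" and 52 of class "b".
observed  <- factor(c(rep("a", 48), rep("b", 52)))
predicted <- factor(c(rep("a", 45), rep("b", 3), rep("b", 47), rep("a", 5)))

conf <- table(observed, predicted)
print(conf)

accuracy <- sum(diag(conf)) / sum(conf)
kap <- kappa_stat(conf)
print(c(accuracy = accuracy, kappa = kap))
```

     Kappa is 1 for perfect agreement and 0 when agreement is no better
     than chance, which makes it more informative than accuracy when
     the classes are unbalanced.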

_V_a_l_u_e:

   model: The following items are the different components created by
          the algorithms:
          trees:  ensemble of rpart trees used to fit the model.
          alpha:  the weights of the trees used in the final aggregate
                  model (AdaBoost only; see references for more
                  information).
          F    :  F[[1]] corresponds to the training sum; F[[2]], ...
                  correspond to testing sums.
          errs :  matrix of errs: training, kappa, testing 1,
                  kappa 1, ...
          lw   :  last weights calculated, used by the update routine.

     fit: The predicted classification for each observation in the
          original level of the response.

    call: The function call. 

      nu: The shrinkage parameter.

    type: The type of AdaBoost performed: 'discrete', 'real', 'logit',
          or 'gentle'.

confusion: The confusion matrix (True value vs. Predicted value) for
          the training data. 

    iter: The number of boosting iterations that were performed. 

  actual: The original response vector. 

_W_a_r_n_i_n_g_s:

     For LogitBoost and Gentle Boost, under certain circumstances, the
     methods will fail to classify the data into more than one
     category. If this occurs, try modifying the rpart.control options
     such as 'minsplit', 'cp', and 'maxdepth'.

     'ada' does not currently handle multiclass problems.  However,
     there is an example in (Culp et al., 2006) that shows how to use
     this code in that setting.  Plots and other functions are not set
     up for this analysis.

_A_u_t_h_o_r(_s):

     Mark Culp, University of Michigan
     Kjell Johnson, Pfizer, Inc.
     George Michailidis, University of Michigan

     Special thanks go to:
     Zhiguang Qian, Georgia Tech University
     Greg Warnes, Pfizer, Inc.

_R_e_f_e_r_e_n_c_e_s:

     Friedman, J. (1999). _Greedy Function Approximation: A Gradient
     Boosting Machine_.  Technical Report, Department of Statistics,
     Stanford University.

     Friedman, J., Hastie, T., and Tibshirani, R. (2000). _Additive
     Logistic Regression: A Statistical View of Boosting_.  Annals of
     Statistics, 28(2), 337-374.

     Friedman, J. (2002). _Stochastic Gradient Boosting_.  Computational
     Statistics & Data Analysis, 38.

     Culp, M., Johnson, K., and Michailidis, G. (2006). _ada: an R
     Package for Boosting_.  Journal of Statistical Software.

_S_e_e _A_l_s_o:

     'print.ada', 'summary.ada', 'predict.ada', 'plot.ada',
     'pairs.ada', 'update.ada', 'addtest'

_E_x_a_m_p_l_e_s:

     ## fit discrete AdaBoost to a simple two-class example
     library(ada)
     data(iris)
     ## drop setosa and its now-empty factor level
     iris <- iris[iris$Species != "setosa", ]
     iris[, 5] <- factor(iris[, 5])
     ## set up testing and training data (60% for training)
     n <- dim(iris)[1]
     trind <- sample(1:n, floor(.6 * n), FALSE)
     teind <- setdiff(1:n, trind)
     ## fit discrete AdaBoost: 20 iterations, no shrinkage (nu=1)
     gdis <- ada(Species ~ ., data = iris[trind, ], iter = 20, nu = 1,
                 type = "discrete")
     ## add testing data set
     gdis <- addtest(gdis, iris[teind, -5], iris[teind, 5])
     ## plot gdis (error and kappa, training and testing data)
     plot(gdis, TRUE, TRUE)
     ## variable selection plot
     varplot(gdis)
     ## pairwise plot
     pairs(gdis, iris[trind, -5], maxvar = 2)

     ##for many more examples refer to reference (Culp et al., 2006)

