mob                  package:party                  R Documentation

_M_o_d_e_l-_b_a_s_e_d _R_e_c_u_r_s_i_v_e _P_a_r_t_i_t_i_o_n_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     MOB is an algorithm for model-based recursive partitioning
     yielding a tree with fitted models associated with each terminal
     node.

_U_s_a_g_e:

     mob(formula, weights, data = list(), model = glinearModel,
       control = mob_control(), ...)

     ## S3 method for class 'mob':
     predict(object, newdata = NULL, type = c("response", "node"), ...)
     ## S3 method for class 'mob':
     summary(object, node = NULL, ...)
     ## S3 method for class 'mob':
     coef(object, node = NULL, ...)
     ## S3 method for class 'mob':
     sctest(x, node = NULL, ...)

_A_r_g_u_m_e_n_t_s:

 formula: A symbolic description of the model to be fit. This should be
          of type 'y ~ x1 + ... + xk | z1 + ... + zl' where the
          variables before the '|' are passed to the 'model' and the
          variables after the '|' are used for partitioning.

 weights: An optional vector of weights to be used in the fitting
          process. Only non-negative integer-valued weights are allowed
          (default = 1).

    data: A data frame containing the variables in the model.

   model: A model of class 'StatModel-class'. See details for
          requirements.

 control: A list with control parameters as returned by 'mob_control'.

     ...: Additional arguments passed to the 'fit' call for the
          'model'.

object, x: A fitted 'mob' object.

 newdata: A data frame with new inputs; by default, the learning data
          is used.

    type: A character string specifying whether to predict the response
          (using the 'predict' method inherited from the 'model') or to
          return the ID of the associated terminal node.

    node: A vector of node IDs for which the corresponding method
          should be applied.
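
     As a minimal illustration of the two-part 'formula' (a sketch in
     base R only, not necessarily how 'mob' processes the formula
     internally), the parts before and after the '|' can be pulled
     apart as language objects. The variable names are borrowed from
     the Boston housing example in the Examples section; 'f', 'rhs',
     'model_vars' and 'part_vars' are names introduced here purely for
     illustration.

```r
## A two-part formula parses as  response ~ (regressors | partitioners),
## because '|' binds more tightly than '~'.
f <- medv ~ lstat + rm | zn + indus + chas
rhs <- f[[3]]                       # the call `|`(lstat + rm, zn + indus + chas)
model_vars <- all.vars(rhs[[2]])    # variables passed to the 'model'
part_vars  <- all.vars(rhs[[3]])    # variables used for partitioning
print(model_vars)
print(part_vars)
```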

_D_e_t_a_i_l_s:

     Model-based partitioning fits a model tree using the following
     algorithm:

        1.  'fit' a 'model' (default: a generalized linear model
           'glinearModel') with formula 'y ~ x1 + ... + xk' for the
           observations in the current node.

        2.  Assess the stability of the model parameters with respect
           to each of the partitioning variables 'z1', ..., 'zl'. If
           there is some overall instability, choose the variable 'z'
           associated with the smallest p value for partitioning,
           otherwise stop. To perform the parameter instability
           fluctuation test, an 'estfun' method and a 'weights' method
           are needed.

        3.  Search for the locally optimal split in 'z' by minimizing
           the objective function of the 'model'. Typically, this will
           be something like 'deviance' or the negative 'logLik' and
           can be specified in 'mob_control'.

        4.  Re-fit the 'model' in both children using 'reweight', and
           repeat from step 2.
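
     The split search in step 3 can be sketched in base R as an
     exhaustive search over candidate cut points. This is a simplified
     illustration under stated assumptions, not the 'party'
     implementation: it handles a single numeric partitioning variable,
     uses 'lm' as the model and the residual sum of squares (the
     'deviance' of an 'lm' fit) as the objective. The helper
     'find_split', its 'minsize' argument and the simulated data are
     hypothetical.

```r
## Hypothetical helper (not part of 'party'): exhaustive search for the
## cut point in a numeric variable z that minimizes the summed residual
## sum of squares of two separate linear models y ~ x.
find_split <- function(d, minsize = 20) {
  cand <- sort(unique(d$z))
  cand <- cand[-length(cand)]   # a cut at max(z) would leave the right child empty
  best <- list(cut = NA_real_, objective = Inf)
  for (cut in cand) {
    left <- d$z <= cut
    ## skip cuts that would create a child with too few observations
    if (sum(left) < minsize || sum(!left) < minsize) next
    obj <- deviance(lm(y ~ x, data = d[left, ])) +
           deviance(lm(y ~ x, data = d[!left, ]))
    if (obj < best$objective) best <- list(cut = cut, objective = obj)
  }
  best
}

## Simulated data (an assumption for illustration): the regression of y
## on x changes intercept and slope at z = 0.5.
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n), z = runif(n))
d$y <- ifelse(d$z <= 0.5, 1 + 2 * d$x, -1 - 2 * d$x) + rnorm(n, sd = 0.1)

best_split <- find_split(d)
print(best_split$cut)
```

     In this sketch the two children are re-fitted from scratch for
     every candidate cut, which is simple but not efficient.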

     More details on the conceptual design of the algorithm can be
     found in Zeileis, Hothorn and Hornik (2005); some illustrations
     are provided in 'vignette("MOB")'.

     For the fitted MOB tree, several standard methods are inherited if
     they are available for fitted 'model's, such as 'print',
     'predict', 'residuals', 'logLik', 'deviance', 'weights', 'coef'
     and 'summary'. By default, the latter four return the result
     (deviance, weights, coefficients, summary) for all terminal nodes,
     but take a 'node' argument that can be set to any node ID. The
     'sctest' method extracts the results of the parameter stability
     tests (aka structural change tests) for any given node, by default
     for all nodes. Some examples are given in the Examples section.
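
     To give a rough idea of what a parameter stability test measures,
     the following base R sketch computes a supLM-type statistic from
     the cumulative sums of the model scores, with observations ordered
     by a candidate partitioning variable. It is a simplified
     illustration only: it assumes an 'lm' fit and a numeric
     partitioning variable, computes no p values, and is not the code
     used by 'mob' or 'sctest'. The helper 'instability_stat', its
     'trim' argument and the simulated data are hypothetical.

```r
## Hypothetical helper (not part of 'party'): supLM-type statistic for a
## linear model y ~ x, with observations ordered by the variable z.
instability_stat <- function(y, x, z, trim = 0.1) {
  fit <- lm(y ~ x)
  ## score contributions of a linear model: residual times regressors
  psi <- as.vector(residuals(fit)) * model.matrix(fit)
  n <- nrow(psi)
  psi <- psi[order(z), , drop = FALSE]
  J <- crossprod(psi) / n                  # outer-product covariance estimate
  e <- eigen(J, symmetric = TRUE)
  J_inv_sqrt <- e$vectors %*%
    diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
  ## decorrelated cumulative score process, scaled by sqrt(n)
  W <- apply(psi %*% J_inv_sqrt, 2, cumsum) / sqrt(n)
  tt <- seq_len(n) / n
  keep <- tt >= trim & tt <= 1 - trim      # trim both ends of the ordering
  max(rowSums(W[keep, , drop = FALSE]^2) / (tt[keep] * (1 - tt[keep])))
}

## Simulated data (an assumption for illustration): the coefficients
## change at z = 0.5, while u is unrelated to the model.
set.seed(1)
n <- 200
x <- rnorm(n)
z <- runif(n)
u <- runif(n)
y <- ifelse(z <= 0.5, 1 + 2 * x, -1 - 2 * x) + rnorm(n, sd = 0.1)

stat_z <- instability_stat(y, x, z)
stat_u <- instability_stat(y, x, u)
print(c(z = stat_z, u = stat_u))
```

     Ordering by 'z' should yield a much larger statistic than ordering
     by the unrelated variable 'u', which is the kind of evidence that
     step 2 of the algorithm uses to select a partitioning variable.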

_V_a_l_u_e:

     An object of class 'mob' inheriting from 'BinaryTree-class'. Every
     node of the tree is additionally associated with a fitted model.

_R_e_f_e_r_e_n_c_e_s:

     Achim Zeileis, Torsten Hothorn and Kurt Hornik (2005). Model-based
     Recursive Partitioning. _Report 19_, Department of Statistics and
     Mathematics, Wirtschaftsuniversitaet Wien, Research Report
     Series. <URL:
     http://epub.wu-wien.ac.at/dyn/openURL?id=oai:epub-wu-01_86e>

_S_e_e _A_l_s_o:

     'plot.mob', 'mob_control'

_E_x_a_m_p_l_e_s:

     if(require("mlbench")) {

     ## recursive partitioning of a linear regression model
     ## load data
     data("BostonHousing", package = "mlbench")
     ## and transform variables appropriately (for a linear regression)
     BostonHousing$lstat <- log(BostonHousing$lstat)
     BostonHousing$rm <- BostonHousing$rm^2
     ## as well as partitioning variables (for fluctuation testing)
     BostonHousing$chas <- factor(BostonHousing$chas, levels = 0:1, labels = c("no", "yes"))
     BostonHousing$rad <- factor(BostonHousing$rad, ordered = TRUE)

     ## partition the linear regression model medv ~ lstat + rm
     ## with respect to all remaining variables:
     fmBH <- mob(medv ~ lstat + rm | zn + indus + chas + nox + age + dis + rad + tax + crim + b + ptratio,
       control = mob_control(minsplit = 40), data = BostonHousing, model = linearModel)

     ## print the resulting tree
     fmBH
     ## or better visualize it
     plot(fmBH)

     ## extract coefficients in all terminal nodes
     coef(fmBH)
     ## look at full summary, e.g., for node 7
     summary(fmBH, node = 7)
     ## results of parameter stability tests for that node
     sctest(fmBH, node = 7)
     ## -> no further significant instabilities (at 5% level)

     ## compute mean squared error (on training data)
     mean((BostonHousing$medv - fitted(fmBH))^2)
     mean(residuals(fmBH)^2)
     deviance(fmBH)/sum(weights(fmBH))

     ## evaluate logLik and AIC
     logLik(fmBH)
     AIC(fmBH)
     ## (Note that this penalizes estimation of error variances, which
     ## were treated as nuisance parameters in the fitting process.)

     ## recursive partitioning of a logistic regression model
     ## load data
     data("PimaIndiansDiabetes", package = "mlbench")
     ## partition logistic regression diabetes ~ glucose
     ## with respect to all remaining variables
     fmPID <- mob(diabetes ~ glucose | pregnant + pressure + triceps + insulin + mass + pedigree + age,
       data = PimaIndiansDiabetes, model = glinearModel, family = binomial())

     ## fitted model
     coef(fmPID)
     plot(fmPID)
     plot(fmPID, tp_args = list(cdplot = TRUE))
     }

