inbagg                 package:ipred                 R Documentation

_I_n_d_i_r_e_c_t _B_a_g_g_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Function to perform indirect bagging and subagging.

_U_s_a_g_e:

     inbagg.data.frame(formula, data, pFUN=NULL, 
       cFUN=list(model = NULL, predict = NULL, training.set = NULL), 
       nbagg = 25, ns = 0.5, replace = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: formula. A `formula' of the form `y~w1+w2+w3~x1+x2+x3'
          describes how to model the intermediate variables `w1, w2,
          w3' and the response variable `y', unless other formulas are
          specified by the elements of `pFUN' or in `cFUN'.

    data: data frame of explanatory, intermediate and response
          variables.

    pFUN: list of lists describing the models for the intermediate
          variables; details are given below.

    cFUN: either a fixed function with argument `newdata' that returns
          the class membership by default, or a list specifying a
          classifying model, similar to one element of `pFUN'. Details
          are given below.

   nbagg: number of bootstrap samples.

      ns: proportion of the learning sample to be drawn for each bag.
          By default, subagging with 50% is performed, i.e. 0.5*n out of
          n observations are drawn without replacement.

 replace: logical. Whether to draw with (`TRUE') or without (`FALSE',
          the default) replacement.

     ...: additional arguments (e.g. `subset').
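
     The following base-R sketch illustrates how `nbagg', `ns' and
     `replace' together determine the bag and out-of-bag samples. The
     helper `draw_bag' is hypothetical and not part of ipred, and
     rounding `ns*n' down with `floor()' is an assumption about how the
     sample size is obtained.

```r
## Hypothetical helper, not part of ipred: draws one bag of size
## floor(ns * n) (the rounding is an assumption) and returns the bag and
## out-of-bag (OOB) indices.
draw_bag <- function(n, ns = 0.5, replace = FALSE) {
  bindx <- sample(seq_len(n), size = floor(ns * n), replace = replace)
  list(bag = bindx, oob = setdiff(seq_len(n), bindx))
}

set.seed(1)
n <- 200
nbagg <- 25
bags <- lapply(seq_len(nbagg), function(b) draw_bag(n, ns = 0.5, replace = FALSE))

## With replace = FALSE (subagging) every bag holds 100 distinct observations
## and the remaining 100 form the OOB sample.
sapply(bags[1:3], function(b) c(bag = length(unique(b$bag)), oob = length(b$oob)))
```

     Setting `ns = 1' and `replace = TRUE' instead yields ordinary
     bootstrap samples, in which on average about 63% of the
     observations appear at least once.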

_D_e_t_a_i_l_s:

     A given data set is subdivided into three types of variables:
     explanatory, intermediate and response variables.
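
     As a sketch of how the three groups of variables can be read off a
     double formula such as `y~w1+w2+w3~x1+x2+x3', the hypothetical
     helper `split_double_formula' below (not ipred code) splits it
     into response, intermediate and explanatory variables. It relies
     on `~' being left-associative in R. The default submodels it
     builds (each intermediate variable on all explanatory variables,
     the response on all intermediate variables) are an assumption
     based on the description of `formula'.

```r
## Hypothetical helper, not part of ipred.
## Because `~` is left-associative in R, y ~ w1 + w2 + w3 ~ x1 + x2 + x3
## parses as (y ~ w1 + w2 + w3) ~ (x1 + x2 + x3).
split_double_formula <- function(formula) {
  inner <- formula[[2]]  # the call y ~ w1 + w2 + w3
  list(response     = all.vars(inner[[2]]),
       intermediate = all.vars(inner[[3]]),
       explanatory  = all.vars(formula[[3]]))
}

f <- y ~ w1 + w2 + w3 ~ x1 + x2 + x3
parts <- split_double_formula(f)

## Assumed default submodels: each intermediate on all explanatory variables
w_formulas <- lapply(parts$intermediate, function(w)
  reformulate(parts$explanatory, response = w))
## ... and the response on all intermediate variables
y_formula <- reformulate(parts$intermediate, response = parts$response)

sapply(w_formulas, deparse)  # "w1 ~ x1 + x2 + x3" "w2 ~ x1 + x2 + x3" "w3 ~ x1 + x2 + x3"
deparse(y_formula)           # "y ~ w1 + w2 + w3"
```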

     Here, each specified intermediate variable is modelled separately
     following `pFUN', a list of lists with elements specifying an
     arbitrary number of models for the intermediate variables and an
     optional element `training.set = c("oob", "bag", "all")'. The
     element `training.set' determines whether the predictive models
     for the intermediate variables are fitted on the out-of-bag sample
     (`"oob"', the default), on the bag sample (`"bag"') or on all
     available observations (`"all"'). The elements of `pFUN' that
     specify the models for the intermediate variables are lists as
     described in `inclass'. If no formula is given in these elements,
     the functional relationship of `formula' is used.
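
     The three choices of `training.set' in `pFUN' amount to different
     sets of row indices. The sketch below uses a hypothetical helper
     `training_rows' (not part of ipred) that returns the rows on which
     the models for the intermediate variables would be fitted, given
     the bag indices `bindx' of one bag.

```r
## Hypothetical helper, not part of ipred: row indices used to fit the
## intermediate-variable models for one bag with indices `bindx`.
training_rows <- function(bindx, n, training.set = c("oob", "bag", "all")) {
  training.set <- match.arg(training.set)  # defaults to "oob"
  switch(training.set,
         oob = setdiff(seq_len(n), bindx),  # observations not in the bag
         bag = bindx,                       # the bag sample itself
         all = seq_len(n))                  # every available observation
}

n <- 10
bindx <- c(2, 4, 5, 7, 9)
training_rows(bindx, n)         # 1 3 6 8 10
training_rows(bindx, n, "bag")  # 2 4 5 7 9
training_rows(bindx, n, "all")  # 1 2 3 4 5 6 7 8 9 10
```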

     The response variable is modelled following `cFUN'. This can
     either be a fixed classifying function as described in Peters et
     al. (2003) or a list specifying the modelling technique to be
     applied. The list contains the elements `model' (the model to be
     fitted), `predict' (optional, how to predict), `formula'
     (optional, of type `y~w1+w2+w3+x1+x2', determines the variables the
     classifying function is based on) and the optional element
     `training.set = c("fitted.bag", "original", "fitted.subset")',
     specifying whether the classifying function is trained on the
     predicted observations of the bag sample (`"fitted.bag"'), on the
     original observations (`"original"') or on the predicted
     observations not included in a defined subset (`"fitted.subset"').
     By default, the formula specified in `formula' determines the
     variables the classifying function is based on.

     Note that the default `cFUN = list(model = NULL, training.set =
     "fitted.bag")' uses the function `rpart' as the model and
     `predict(object, newdata, type = "class")' as the predict function.
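
     A minimal sketch of the two forms `cFUN' can take. The threshold
     rule in `my_cFUN' is a made-up illustration, not the classifier of
     Peters et al. (2003). In the list form, the wrapper signatures
     `function(formula, data)' and `function(object, newdata)' are
     assumptions modelled on the default described above.

```r
## Fixed form: a function of `newdata` returning the class membership.
## Hypothetical rule: class 2 when at least two of the three intermediate
## variables exceed `threshold`, otherwise class 1.
my_cFUN <- function(newdata, threshold = 0) {
  votes <- (newdata$w1 > threshold) + (newdata$w2 > threshold) +
           (newdata$w3 > threshold)
  factor(ifelse(votes >= 2, 2, 1), levels = 1:2)
}

newdata <- data.frame(w1 = c(-1, 0.5, 2), w2 = c(-1, 0.3, -2),
                      w3 = c(0.1, -0.4, 1))
my_cFUN(newdata)  # 1 2 2, levels 1 2

## List form mirroring the documented default (rpart plus class-type
## predictions); rpart is only looked up when `model` is actually called.
cFUN_list <- list(
  model = function(formula, data) rpart::rpart(formula, data = data),
  predict = function(object, newdata) predict(object, newdata, type = "class"),
  formula = y ~ w1 + w2 + w3 + x1 + x2,
  training.set = "fitted.bag")
```

     Either object could then be supplied as `cFUN' in a call like the
     one shown under Examples.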

_V_a_l_u_e:

     An object of class `"inbagg"', that is a list with elements 

  mtrees: a list of length `nbagg' describing the prediction models
          corresponding to each bootstrap sample. Each element of
          `mtrees' is a list with elements `bindx' (observations of the
          bag sample), `btree' (classifying function of the bag sample)
          and `bfct' (predictive models for the intermediate variables
          of the bag sample).

       y: vector of response values.

       W: data frame of intermediate variables.

       X: data frame of explanatory variables.

_A_u_t_h_o_r(_s):

     Andrea Peters <Peters.Andrea@imbe.imed.uni-erlangen.de>

_R_e_f_e_r_e_n_c_e_s:

     David J. Hand, Hua Gui Li, Niall M. Adams (2001), Supervised
     classification with structured class definitions. Computational
     Statistics & Data Analysis 36, 209-225.

     Andrea Peters, Berthold Lausen, Georg Michelson and Olaf Gefeller
     (2003), Diagnosis of glaucoma by indirect classifiers. Methods of
     Information in Medicine 1, 99-103.

_S_e_e _A_l_s_o:

     `rpart', `bagging', `lm'

_E_x_a_m_p_l_e_s:

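     library(ipred)
     library(rpart)
     library(mvtnorm)
     set.seed(290875)

     ## 200 observations: a binary response, three intermediate and
     ## three explanatory variables
     y <- as.factor(sample(1:2, 200, replace = TRUE))
     W <- rmvnorm(200, mean = rep(0, 3))
     X <- rmvnorm(200, mean = rep(2, 3))
     colnames(W) <- c("w1", "w2", "w3")
     colnames(X) <- c("x1", "x2", "x3")
     DATA <- data.frame(y, W, X)

     ## w1 is modelled by lm() on x1 and x2; the second element has no
     ## formula, so the relationship given in `formula' is used
     pFUN <- list(list(formula = w1~x1+x2, model = lm, predict = mypredict.lm),
                  list(model = rpart))

     inbagg(y~w1+w2+w3~x1+x2+x3, data = DATA, pFUN = pFUN)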

