polyclass             package:polspline             R Documentation

_P_o_l_y_c_l_a_s_s: _p_o_l_y_c_h_o_t_o_m_o_u_s _r_e_g_r_e_s_s_i_o_n _a_n_d _m_u_l_t_i_p_l_e _c_l_a_s_s_i_f_i_c_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Fit a polychotomous regression and multiple classification model
     using linear splines and selected tensor products.
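
     As a rough illustration of the building blocks named above, the
     sketch below constructs linear spline basis functions of the form
     (x - k)_+ with a knot k, and a tensor product of two of them. The
     helper 'hinge', the data, and the knots are hypothetical; this is
     not the code that 'polyclass' uses internally, and the knots here
     are fixed by hand rather than chosen adaptively.

```r
# Hypothetical helper: a linear spline basis function (x - knot)_+.
hinge <- function(x, knot) pmax(x - knot, 0)

x1 <- c(1, 2, 3, 4)
x2 <- c(4, 3, 2, 1)

b1  <- hinge(x1, 2)     # piecewise linear in x1, knot at 2
b2  <- hinge(x2, 1.5)   # piecewise linear in x2, knot at 1.5
b12 <- b1 * b2          # tensor product: an interaction between x1 and x2
```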

_U_s_a_g_e:

     polyclass(data, cov, weight, penalty, maxdim, exclude, include,
     additive = FALSE, linear, delete = 2, fit,  silent = TRUE, 
     normweight = TRUE, tdata, tcov, tweight, cv, select, loss, seed) 

_A_r_g_u_m_e_n_t_s:

    data: vector of classes: 'data' should range over consecutive
          integers with 0 or 1 as the minimum value.
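
          Class labels stored as a factor can be converted to
          consecutive integers with base R, as in the minimal sketch
          below (which assumes the 'iris' data from the 'datasets'
          package). Subtracting 1 gives labels starting at 0, which
          also satisfies the requirement.

```r
# Species is a factor with three levels; as.integer() maps the levels,
# in the order given by levels(), to 1, 2, 3.
classes  <- as.integer(iris[, 5])
classes0 <- classes - 1L   # the same classes labelled 0, 1, 2
```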

     cov: covariates: matrix with as many rows as the length of 'data'.            

  weight: optional vector of case-weights.  Should have the same length
          as  'data'.

 penalty: the parameter to be used in the AIC criterion if the model
          selection is carried out by AIC. The program chooses the
          number of knots that minimizes '-2 * loglikelihood + penalty *
          (dimension)'. The default is to use 'penalty =
          log(length(data))' as in BIC. If the model selection is
          carried out by cross-validation or using a test set, the
          program uses the number of knots that minimizes 'loss +
          penalty * dimension * (loss for smallest model)'. In this
          case the default of 'penalty' is 0.
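
          Both selection rules can be written as one-line functions.
          The sketch below uses hypothetical log-likelihoods and losses
          for four candidate models and hypothetical helper names
          ('aic_criterion', 'cv_criterion'); it only illustrates the
          formulas above and is not part of 'polspline'. Here 'loss for
          smallest model' is read as the loss of the candidate with the
          smallest dimension.

```r
# Hypothetical candidate models: dimension, log-likelihood, test set loss.
n      <- 150
dims   <- c(2, 4, 6, 8)
loglik <- c(-160, -120, -110, -108)
loss   <- c(0.30, 0.10, 0.06, 0.07)

# AIC-type rule: -2 * loglikelihood + penalty * dimension.
aic_criterion <- function(loglik, dims, penalty) {
  -2 * loglik + penalty * dims
}

# Test set / cross-validation rule:
# loss + penalty * dimension * (loss of the smallest model).
cv_criterion <- function(loss, dims, penalty = 0) {
  loss + penalty * dims * loss[which.min(dims)]
}

best_aic <- dims[which.min(aic_criterion(loglik, dims, penalty = log(n)))]
best_cv  <- dims[which.min(cv_criterion(loss, dims))]
```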

  maxdim: maximum dimension (default is 'min(n, 4 * n^(1/3) * (cl - 1))',
          where n is 'length(data)' and cl is the number of classes).
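
          The default can be evaluated directly. The helper
          'default_maxdim' below is hypothetical and simply transcribes
          the formula; this page does not say whether 'polyclass'
          rounds the result.

```r
# Hypothetical helper transcribing the documented default for 'maxdim'.
default_maxdim <- function(n, cl) min(n, 4 * n^(1/3) * (cl - 1))

default_maxdim(150, 3)   # iris: 150 cases and 3 classes, about 42.5
default_maxdim(10, 3)    # small samples are capped at n, here 10
```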

 exclude: combinations to be excluded. This should be a matrix with 2
          columns; if, for example, 'exclude[1, 1] = 2' and 'exclude[1,
          2] = 3', no interaction between covariates 2 and 3 is
          included. 0 represents time.

 include: those combinations that can be included. Should have the same
          format as 'exclude'. Only one of 'exclude' and 'include' can
          be specified.
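
          A matrix for 'exclude' (or 'include') can be built with
          'rbind', one pair of covariate indices per row. The helper
          'is_listed' is hypothetical and treats each pair as
          unordered, which is an assumption about how 'polyclass' reads
          the matrix. The result would be passed as
          'polyclass(data, cov, exclude = exclude)'.

```r
# Exclude the interactions between covariates 2 and 3 and between 1 and 4.
exclude <- rbind(c(2, 3),
                 c(1, 4))

# Hypothetical helper: is the pair (i, j) listed, in either order?
is_listed <- function(pairs, i, j) {
  any((pairs[, 1] == i & pairs[, 2] == j) |
      (pairs[, 1] == j & pairs[, 2] == i))
}
```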

additive: should the model selection be restricted to additive models?  

  linear: vector indicating the variables for which no knots should be
          entered. For example, if 'linear = c(2, 3)', no knots are
          entered for covariate 2 or 3. 0 represents time.

  delete: should complete basis functions be deleted at once (2),
          should only individual dimensions be deleted (1), or should
          only the addition stage of the model selection be carried
          out (0)?

     fit: 'polyclass' object. If 'fit' is specified, 'polyclass' adds
          basis functions starting with those in 'fit'.

  silent: suppresses the printing of diagnostic output about basis
          functions added or deleted, Rao statistics, Wald statistics,
          and log-likelihoods.

normweight: should the weights be normalized so that they average to
          one? This option only has an effect if the model is selected
          using AIC.
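
          Presumably normalization rescales the weights by their mean,
          which leaves the relative weights unchanged; that reading is
          an assumption. A minimal sketch with hypothetical weights:

```r
w      <- c(2, 1, 1, 4)   # hypothetical case weights
w_norm <- w / mean(w)     # rescaled to average one: 1, 0.5, 0.5, 2
```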

tdata,tcov,tweight: test set. Should satisfy the same requirements as
          'data', 'cov' and 'weight'. If all test set weights are one,
          'tweight' can be omitted. If 'tdata' and 'tcov' are
          specified, the model selection is carried out using this test
          set, irrespective of the input for 'penalty' or 'cv'.

      cv: into how many subsets should the data be divided for
          cross-validation? If 'cv' is specified and 'tdata' is omitted,
          the model selection is carried out by cross-validation.

  select: if a test set is provided, or if the model is selected using
          cross-validation, should the selected model be the one that
          minimizes (misclassification) loss (0), maximizes test set
          log-likelihood (1), or minimizes test set squared error loss
          (2)?
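
          For a test set with true classes and fitted class
          probabilities, the three criteria can be computed as in the
          sketch below. The data are hypothetical, the most probable
          class is used as the prediction (which matches the default
          0-1 'loss'), and the squared error is averaged over cases;
          whether 'polyclass' sums or averages, or applies 'tweight',
          is not stated here.

```r
# Hypothetical test set: true classes (1..3) and fitted probabilities.
y <- c(1, 2, 3, 2)
p <- rbind(c(0.7, 0.2, 0.1),
           c(0.1, 0.6, 0.3),
           c(0.2, 0.3, 0.5),
           c(0.5, 0.4, 0.1))

# select = 0: misclassification loss of the most probable class.
pred     <- max.col(p, ties.method = "first")
misclass <- mean(pred != y)

# select = 1: test set log-likelihood (larger is better).
loglik <- sum(log(p[cbind(seq_along(y), y)]))

# select = 2: squared error between class indicators and probabilities.
indicator <- diag(ncol(p))[y, ]
sq_error  <- mean(rowSums((indicator - p)^2))
```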

    loss: a rectangular matrix specifying the loss function, with one
          row per class and one column per action. Used for
          cross-validation and test set model selection. 'loss[i, j]'
          contains the loss for assigning action 'j' to an object whose
          true class is 'i'. The default is 1 minus the identity matrix.
          'loss' does not need to be square.
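
          Given class probabilities, a natural action is the one with
          the smallest expected loss, where the expected loss of action
          'j' is the sum over classes 'i' of the probability of 'i'
          times 'loss[i, j]'. The sketch below uses a hypothetical
          helper 'bayes_action' and a hypothetical non-square loss
          matrix with a fourth 'no decision' action; it illustrates the
          standard minimum expected loss rule and is not taken from
          'cpolyclass'.

```r
K     <- 3
loss  <- 1 - diag(K)             # default: 0 if correct, 1 otherwise
# Hypothetical non-square loss: a fourth action, "no decision", costs 0.2.
loss2 <- cbind(1 - diag(K), 0.2)

# Hypothetical helper: for each case (row of p), the action with the
# smallest expected loss p %*% loss, breaking ties by the first action.
bayes_action <- function(p, loss) {
  max.col(-(p %*% loss), ties.method = "first")
}

p <- rbind(c(0.90, 0.05, 0.05),
           c(0.40, 0.35, 0.25))
bayes_action(p, loss)    # 1 1
bayes_action(p, loss2)   # 1 4
```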

    seed: optional seed for the random number generator that
          determines the sequence of the cases for cross-validation. If
          the seed has length 12 or more, the first twelve elements are
          assumed to be '.Random.seed'; otherwise the function
          'set.seed' is used. If 'seed' is 0 or 'rep(0, 12)', it is
          assumed that the user has already provided a (random)
          ordering. If 'seed' is not provided, while a fit with an
          element 'fit$seed' is provided, '.Random.seed' is set using
          'set.seed(fit$seed)'. Otherwise the present value of
          '.Random.seed' is used.
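
          The seed makes the random assignment of cases to
          cross-validation subsets reproducible. The sketch below shows
          one common way to form balanced folds in base R; the helper
          'make_folds' is hypothetical and the ordering actually used by
          'polyclass' may differ.

```r
# Hypothetical helper: assign n cases to cv balanced folds at random.
make_folds <- function(n, cv, seed) {
  set.seed(seed)                          # fix the random number stream
  sample(rep(seq_len(cv), length.out = n))
}

folds_a <- make_folds(150, 10, seed = 1)
folds_b <- make_folds(150, 10, seed = 1)  # same seed, same assignment
```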

_V_a_l_u_e:

     The output is an object of class 'polyclass', organized to serve
     as input for 'plot.polyclass',  'beta.polyclass',
     'summary.polyclass', 'ppolyclass' (fitted probabilities),
     'cpolyclass' (fitted classes) and 'rpolyclass' (random classes). 
     The function returns a list with the following members: 

    call: the command that was executed.  

    ncov: number of covariates.  

    ndim: number of dimensions of the fitted model.  

  nclass: number of classes.  

    nbas: number of basis functions.  

 naction: number of possible actions that are considered.  

    fcts: matrix of size 'nbas x (nclass + 4)'. Each row is a basis
          function. First element: first covariate involved ('NA' =
          constant);

          second element: which knot ('NA' means: constant or linear); 

          third element: second covariate involved ('NA' means: this is
          a function  of one variable); 

          fourth element: knot involved (if the third element is 'NA',
          of no relevance); 

          fifth, sixth,...  element: beta (coefficient) for class one,
          two, ...  
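
          To see how per-class coefficients turn into fitted
          probabilities, the sketch below evaluates a small hypothetical
          basis matrix 'B' (one column per basis function) against a
          coefficient matrix 'beta' (one column per class, playing the
          role of the coefficient columns of 'fcts'), assuming a
          multinomial logistic link as in Kooperberg, Bose and Stone
          (1997). The helper 'softmax_rows' is hypothetical; 'ppolyclass'
          is the supported way to obtain fitted probabilities.

```r
# Hypothetical helper: row-wise softmax, exp(eta) / rowSums(exp(eta)).
softmax_rows <- function(eta) {
  eta <- eta - apply(eta, 1, max)   # subtract each row's maximum for stability
  e <- exp(eta)
  e / rowSums(e)
}

B    <- cbind(1, c(0, 1, 2))        # 3 cases: a constant and one linear term
beta <- cbind(c(0, 0),              # class 1 (reference, all zero)
              c(1, -1),             # class 2
              c(0.5, 0))            # class 3
p <- softmax_rows(B %*% beta)       # 3 cases x 3 classes
```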

   knots: a matrix with 'ncov' rows. Covariate 'i' has row 'i+1', time
          has row 1.  First column: number of knots in this dimension;
          other columns: the knots, appended with 'NA's to make it a
          matrix. 

      cv: into how many sets the data were divided for
          cross-validation. Only provided if 'method = 2'.

    loss: the loss matrix used in cross-validation and test set.  Only
          provided if 'method = 1' or 'method = 2'. 

 penalty: the parameter used in the AIC criterion. Only provided if
          'method = 0'. 

  method: 0 = AIC, 1 = test set, 2 = cross-validation.  

  ranges: column 'i' gives the range of the 'i'-th covariate.  

    logl: matrix with eight or eleven columns that summarizes the fits.
          Column one gives the dimension; column two gives the AIC or
          loss value, whichever was used during model selection;
          columns three, four, and five give the training set
          log-likelihood, (misclassification) loss, and squared error
          loss; columns six to eight give the same information for the
          test set; column nine (or column six if 'method = 0' or
          'method = 2') indicates whether the model was fitted during
          the addition stage (1) or during the deletion stage (0);
          columns ten and eleven (or seven and eight) give the minimum
          and maximum penalty parameter for which AIC would have
          selected this model.
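
          The penalty range for a model follows from comparing AIC
          values across the fitted models: model 'm' is selected for
          every penalty 'p' at which '-2 * logl[m] + p * dim[m]' is no
          larger than the value for any other model. The hypothetical
          helper 'aic_penalty_range' below computes that interval for
          made-up log-likelihoods, returning 'NA' when a model is never
          selected; it sketches the idea and is not the code used by
          'polyclass'.

```r
# Hypothetical helper: for each model m, the interval of penalties p >= 0
# for which -2 * loglik[m] + p * dims[m] is minimal among all candidates.
aic_penalty_range <- function(loglik, dims) {
  t(sapply(seq_along(dims), function(m) {
    smaller <- dims < dims[m]
    larger  <- dims > dims[m]
    # m beats each smaller model only while p is small enough ...
    upper <- if (any(smaller))
      min(2 * (loglik[m] - loglik[smaller]) / (dims[m] - dims[smaller])) else Inf
    # ... and beats each larger model only once p is large enough.
    lower <- if (any(larger))
      max(0, 2 * (loglik[larger] - loglik[m]) / (dims[larger] - dims[m])) else 0
    if (lower > upper) c(NA, NA) else c(lower, upper)
  }))
}

rng <- aic_penalty_range(loglik = c(-100, -90, -88), dims = c(1, 2, 3))
```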

  sample: sample size.  

 tsample: the sample size of the test set. Only provided if 'method =
          1'.

  wgtsum: sum of the case weights.  

covnames: names of the covariates.  

classnames: (numerical) names of the classes.  

  cv.aic: the penalty value that was determined to be optimal by
          cross-validation. Only provided if 'method = 2'.

  cv.tab: table with three columns. Columns one and two give the
          penalty parameter range for which the cv-loss in column three
          would be realized. Only provided if 'method = 2'.

    seed: the random seed that was used to determine the order  of the
          cases for cross-validation.  Only provided if 'method = 2'.

  delete: were complete basis functions deleted at once (2), were only
          individual dimensions deleted (1), or was only the addition
          stage of the model selection carried out (0)?

    beta: moments of basis functions. Needed for 'beta.polyclass'.

  select: if a test set is provided, or if the model is selected using
          cross-validation, was the selected model the one that
          minimized (misclassification) loss (0), maximized test set
          log-likelihood (1), or minimized test set squared error loss
          (2)?

   anova: matrix with three columns. The first two elements in a line 
          indicate the subspace to which the line refers. The third
          element indicates  the percentage of variance explained by
          that subspace.  

 twgtsum: sum of the test set case weights (only if 'method = 1').  

_A_u_t_h_o_r(_s):

     Charles Kooperberg clk@fhcrc.org.

_R_e_f_e_r_e_n_c_e_s:

     Charles Kooperberg, Smarajit Bose, and Charles J. Stone (1997).
     Polychotomous regression. _Journal of the American Statistical
     Association_, *92*, 117-127.

     Charles J. Stone, Mark Hansen, Charles Kooperberg, and Young K.
     Truong (1997). The use of polynomial splines and their tensor
     products in extended linear modeling (with discussion). _Annals of
     Statistics_, *25*, 1371-1470.

_S_e_e _A_l_s_o:

     'polymars', 'plot.polyclass', 'summary.polyclass',
     'beta.polyclass', 'cpolyclass', 'ppolyclass', 'rpolyclass'.

_E_x_a_m_p_l_e_s:

     data(iris)
     # classes: Species (column 5); covariates: the four measurements
     fit.iris <- polyclass(iris[,5], iris[,1:4])
     summary(fit.iris)

