fda                   package:mda                   R Documentation

_F_l_e_x_i_b_l_e _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Flexible discriminant analysis.

_U_s_a_g_e:

     fda(formula, data, weights, theta, dimension, eps, method,
         keep.fitted, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula of the form 'y~x' describing the response and the
          predictors.  The formula can be more complicated, such as
          'y~log(x)+z' (see 'formula' for more details).  The response
          should be a factor, or any vector that can be coerced to one
          (such as a logical variable).

    data: data frame containing the variables in the formula
          (optional).

 weights: an optional vector of observation weights.

   theta: an optional matrix of class scores, typically with fewer than
          'J-1' columns.

dimension: the dimension of the solution, no greater than 'J-1', where
          'J' is the number of classes.  Default is 'J-1'.

     eps: a threshold for small singular values for excluding
          discriminant variables; default is '.Machine$double.eps'.

  method: regression method used in optimal scaling.  Default is linear
          regression via the function 'polyreg', resulting in linear
          discriminant analysis.  Other possibilities are 'mars' and
          'bruto'.  For penalized discriminant analysis, 'gen.ridge' is
          appropriate.

keep.fitted: a logical variable, which determines whether the
          (sometimes large) component '"fitted.values"' of the 'fit'
          component of the returned fda object should be kept.  The
          default is 'TRUE' if 'n * dimension < 1000'.

     ...: additional arguments to 'method'.
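
     As a hedged illustration of the arguments above, the sketch below
     fits two models to the 'iris' data: one restricting 'dimension',
     and one forwarding 'degree' through '...' to the default method
     'polyreg'.  It assumes the 'mda' package is installed; the object
     names are illustrative only.

```r
library(mda)

# Restrict the solution to one discriminant dimension
# (iris has J = 3 classes, so the default would be J - 1 = 2).
fit_1d <- fda(Species ~ ., data = iris, dimension = 1)

# 'degree' is not an argument of fda itself; it is passed on
# via '...' to the regression method, here polyreg.
fit_quad <- fda(Species ~ ., data = iris, method = polyreg, degree = 2)

fit_1d$dimension
```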

_V_a_l_u_e:

     an object of class '"fda"'.  Use 'predict' to extract discriminant
     variables, posterior probabilities or predicted class memberships.
      Other extractor functions are 'coef', 'confusion' and 'plot'.
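
     A minimal sketch of the three kinds of prediction mentioned
     above, assuming the 'mda' package is installed.  The 'type'
     values used here ('"class"', '"posterior"', '"variates"') are
     taken from the 'predict.fda' documentation rather than from this
     page, so check that page for the authoritative list.

```r
library(mda)

fit <- fda(Species ~ ., data = iris)

# Predicted class memberships (a factor, one entry per row of newdata)
cls <- predict(fit, newdata = iris, type = "class")

# Posterior class probabilities (one column per class)
post <- predict(fit, newdata = iris, type = "posterior")

# Discriminant variables (one column per discriminant dimension)
vars <- predict(fit, newdata = iris, type = "variates")

# Training misclassification rate computed by hand
mean(cls != iris$Species)
```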

     The object has the following components: 

percent.explained: the percentage of between-group variance explained
          by each dimension (relative to the total explained).

  values: optimal scaling regression sum-of-squares for each dimension
          (see reference).  The usual discriminant analysis eigenvalues
          are given by 'values / (1-values)', which are used to define
          'percent.explained'.

   means: class means in the discriminant space.  These are also scaled
          versions of the final theta's or class scores, and can be
          used in a subsequent call to 'fda' (this only makes sense if
          some columns of theta are omitted; see the references).

theta.mod: (internal) a class scoring matrix which allows 'predict' to
          work properly.

dimension: dimension of discriminant space.

   prior: class proportions for the training data.

     fit: fit object returned by 'method'.

    call: the call that created this object (allowing it to be
          'update'-able)

confusion: confusion matrix when classifying the training data.
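
     The relationship between 'values' and 'percent.explained'
     described above can be worked through by hand.  The numbers
     below are made up for illustration and do not come from any real
     fit.  The example output on this page ('99.12 100.00') suggests
     the percentages are reported cumulatively, so both forms are
     shown; treat that reading as an assumption.

```r
# Hypothetical optimal scaling sums of squares for a two-dimensional fit
values <- c(0.97, 0.22)

# Usual discriminant analysis eigenvalues: values / (1 - values)
eig <- values / (1 - values)

# Share of the total explained by each dimension, in percent
pct <- 100 * eig / sum(eig)

# Running total over dimensions (assumed to match the printed output)
cum_pct <- cumsum(pct)

round(rbind(eig, pct, cum_pct), 2)
```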


     The 'method' functions are required to take arguments 'x' and 'y',
     both of which can be matrices, and should produce a matrix of
     'fitted.values' the same size as 'y'.  They can take an additional
     argument 'weights' and should all have a '...' for safety's sake.
     Any arguments to 'method' can be passed on via the '...' argument
     of 'fda'.  The default method 'polyreg' has a 'degree' argument
     which allows polynomial regression of the required total degree.
     See the documentation for 'predict.fda' for further requirements
     of 'method'.
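
     To make the interface above concrete, here is a rough sketch of
     a function with the stated shape: it accepts matrices 'x' and
     'y', an optional 'weights' vector and '...', and returns a list
     whose 'fitted.values' matrix has the same size as 'y'.  The name
     'ls_method' and its class are hypothetical, and the sketch covers
     only the requirements listed here, not the further ones in
     'predict.fda' (so it is not claimed to be usable for prediction
     as written).  It uses only base R.

```r
# Hypothetical method function: weighted least squares with an intercept.
ls_method <- function(x, y, weights = rep(1, nrow(x)), ...) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  design <- cbind(Intercept = 1, x)
  fit <- lm.wfit(design, y, w = weights)
  structure(
    list(
      fitted.values = as.matrix(fit$fitted.values),  # same size as y
      coefficients  = fit$coefficients
    ),
    class = "ls_method"
  )
}

# Stand-alone check on iris, with a class indicator matrix as the response
x <- as.matrix(iris[, 1:4])
y <- model.matrix(~ Species - 1, data = iris)
out <- ls_method(x, y)
dim(out$fitted.values)
```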

_N_o_t_e:

     This software is not well tested; we would like to hear of any
     bugs.

_A_u_t_h_o_r(_s):

     Trevor Hastie and Robert Tibshirani

_R_e_f_e_r_e_n_c_e_s:

     ``Flexible Discriminant Analysis by Optimal Scoring'' by Hastie,
     Tibshirani and Buja, 1994, JASA, 1255-1270.

     ``Penalized Discriminant Analysis'' by Hastie, Buja and
     Tibshirani, Annals of Statistics, 1995 (in press).

_S_e_e _A_l_s_o:

     'predict.fda', 'mars', 'bruto', 'polyreg', 'softmax', 'confusion'

_E_x_a_m_p_l_e_s:

     data(iris)
     irisfit <- fda(Species ~ ., data = iris)
     irisfit
     ## fda(formula = Species ~ ., data = iris)
     ##
     ## Dimension: 2 
     ##
     ## Percent Between-Group Variance Explained:
     ##     v1     v2 
     ##  99.12 100.00 
     ##
     ## Degrees of Freedom (per dimension): 5 
     ##
     ## Training Misclassification Error: 0.02 ( N = 150 )

     confusion(irisfit, iris)
     ##            Setosa Versicolor Virginica 
     ##     Setosa     50          0         0
     ## Versicolor      0         48         1
     ##  Virginica      0          2        49
     ## attr(, "error"):
     ## [1] 0.02

     plot(irisfit)

     coef(irisfit)
     ##           [,1]        [,2]
     ## [1,] -2.126479 -6.72910343
     ## [2,] -0.837798  0.02434685
     ## [3,] -1.550052  2.18649663
     ## [4,]  2.223560 -0.94138258
     ## [5,]  2.838994  2.86801283

     marsfit <- fda(Species ~ ., data = iris, method = mars)
     marsfit2 <- update(marsfit, degree = 2)
     marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2]) 
     ## this refits the model, using the fitted means (scaled theta's)
     ## from marsfit to start the iterations

