diagDA                package:sfsmisc                R Documentation

_D_i_a_g_o_n_a_l _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function implements a simple Gaussian maximum likelihood
     discriminant rule, for diagonal class covariance matrices.
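     To make the rule concrete, the following base R sketch classifies
     each test case into the class whose diagonal Gaussian density is
     largest.  It is an illustration only, not the package's code:
     'diag_da_sketch' is a hypothetical helper, and the variance
     estimates (denominator n_k - 1, pooled with weights n_k - 1) are
     assumptions of this sketch.  It also assumes that every class has
     at least two cases and no predictor has zero variance.

```r
## Hypothetical helper, NOT part of 'sfsmisc': a plain-R sketch of a
## Gaussian maximum likelihood rule with diagonal covariance matrices.
diag_da_sketch <- function(train, cl, test, pool = TRUE) {
    train <- as.matrix(train)
    test  <- as.matrix(test)
    classes <- sort(unique(cl))
    K <- length(classes)
    p <- ncol(train)
    nk <- vapply(classes, function(k) sum(cl == k), numeric(1))
    stopifnot(nk >= 2) # assumption: variances need >= 2 cases per class

    ## per-class means and variances, stored as p x K matrices
    mu <- matrix(vapply(classes, function(k)
        colMeans(train[cl == k, , drop = FALSE]), numeric(p)), nrow = p)
    v  <- matrix(vapply(classes, function(k)
        apply(train[cl == k, , drop = FALSE], 2, var), numeric(p)), nrow = p)

    if (pool) {
        ## one common variance per predictor (assumed pooling weights n_k - 1)
        pv <- as.vector(v %*% (nk - 1)) / (length(cl) - K)
        v  <- matrix(pv, nrow = p, ncol = K)
    }

    ## score[i, k] = sum_j (x_ij - mu_kj)^2 / v_kj + sum_j log(v_kj);
    ## the smallest score corresponds to the largest Gaussian likelihood
    score <- matrix(vapply(seq_len(K), function(k) {
        d2 <- sweep(test, 2, mu[, k])^2
        as.vector(d2 %*% (1 / v[, k])) + sum(log(v[, k]))
    }, numeric(nrow(test))), nrow = nrow(test))

    classes[apply(score, 1, which.min)]
}

## Usage on two well separated groups (made-up data):
train <- rbind(c(0, 0), c(1, 1), c(0, 1), c(10, 10), c(11, 11), c(10, 11))
cl    <- c(1, 1, 1, 2, 2, 2)
test  <- rbind(c(0.5, 0.5), c(10.5, 10.5))
diag_da_sketch(train, cl, test)               # pooled variances
diag_da_sketch(train, cl, test, pool = FALSE) # class-specific variances
```

     With pooled variances the log-variance term is the same for every
     class, so it does not affect which class attains the minimum.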

_U_s_a_g_e:

     dDA(x, cll, pool = TRUE)
     ## S3 method for class 'dDA':
     predict(object, newdata, pool = object$pool, ...)
     ## S3 method for class 'dDA':
     print(x, ...)

     diagDA(ls, cll, ts, pool = TRUE)

_A_r_g_u_m_e_n_t_s:

    x,ls: learning set data matrix, with rows corresponding to cases
          (e.g., mRNA samples) and columns to predictor variables
          (e.g., genes).

     cll: class labels for the learning set; must be consecutive
          integers.

  object: object of class 'dDA'.

ts, newdata: test set (prediction) data matrix, with rows corresponding
          to cases and columns to predictor variables.

    pool: logical flag.  If true (the default), the covariance matrices
          are assumed to be constant across classes and the
          discriminant rule is linear in the data.  Otherwise
          ('pool = FALSE'), the covariance matrices may vary across
          classes and the discriminant rule is quadratic in the data.

     ...: further arguments passed to or from methods.
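
     As a hedged note on why 'pool' switches between a linear and a
     quadratic rule: the derivation below is the standard one for
     Gaussian classes with diagonal covariances and equal class
     weights.  The symbols are notation introduced here, not taken
     from the package: mu_kj and sigma_kj^2 are the mean and variance
     of predictor j in class k, and d_k(x) is the score that the rule
     minimizes over k.

```latex
% General case: class-specific variances, quadratic in x
d_k(x) = \sum_{j=1}^{p} \left[ \frac{(x_j - \mu_{kj})^2}{\sigma_{kj}^2}
                               + \log \sigma_{kj}^2 \right]

% Pooled case: \sigma_{kj}^2 = \sigma_j^2 for all k.  The terms
% x_j^2 / \sigma_j^2 and \log \sigma_j^2 do not depend on k and drop
% out of the comparison, leaving a score that is linear in x:
\tilde{d}_k(x) = \sum_{j=1}^{p} \frac{\mu_{kj}^2 - 2\, x_j\, \mu_{kj}}{\sigma_j^2}
```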

_V_a_l_u_e:

     'dDA()' returns an object of class 'dDA' for which there are
     'print' and 'predict' methods.  The latter returns the same as
     'diagDA()':

     'diagDA()' returns an integer vector of class predictions for the
     test set.

_A_u_t_h_o_r(_s):

     Sandrine Dudoit <sandrine@stat.berkeley.edu> and Jane Fridlyand
     <janef@stat.berkeley.edu> originally wrote 'stat.diag.da()' in the
     CRAN package 'sma'.  Martin Maechler <maechler@R-project.org>
     modified it for speed and introduced 'dDA' etc.

_R_e_f_e_r_e_n_c_e_s:

     S. Dudoit, J. Fridlyand, and T. P. Speed. (2000) Comparison of
     Discrimination Methods for the Classification of Tumors Using Gene
     Expression Data. (Statistics, UC Berkeley, June 2000, Tech Report
     #576)

_S_e_e _A_l_s_o:

     'lda' and 'qda' from the 'MASS' package.

_E_x_a_m_p_l_e_s:

     ## two artificial examples by Andreas Greutert:
     library(sfsmisc) # provides n.plot(), dDA() and diagDA()
     d1 <- data.frame(x = c(1, 5, 5, 5, 10, 25, 25, 25, 25, 29),
                      y = c(4, 1, 2, 4,  4,  4,     6:8,     7))
     n.plot(d1)       # plot with labels instead of points
     library(cluster) # provides pam()
     (cl1P <- pam(d1, k = 4)$cluster) # 4 surprising clusters
     with(d1, points(x+0.5, y, col = cl1P, pch = cl1P))

     i1 <- c(1,3,5,6)
     tr1 <- d1[-i1,]
     cl1. <- c(1,2,1,2,1,3)
     cl1  <- c(2,2,1,1,1,3)
     plot(tr1, cex=2, col = cl1, pch = 20+cl1)
     (dd.<- diagDA(tr1, cl1., ts = d1[ i1,]))# ok
     (dd <- diagDA(tr1, cl1 , ts = d1[ i1,]))# ok, too!
     points(d1[ i1,], pch = 10, cex=3, col = dd)

     ## use new fit + predict instead :
     (r1 <- dDA(tr1, cl1))
     (r1.<- dDA(tr1, cl1.))
     ## spell out 'newdata' rather than relying on partial matching
     stopifnot(dd == predict(r1,  newdata = d1[ i1,]),
               dd.== predict(r1., newdata = d1[ i1,]))

     plot(tr1, cex=2, col = cl1, bg = cl1, pch = 20+cl1,
          xlim=c(1,30), ylim= c(0,10))
     xy <- cbind(x= runif(500, min=1,max=30), y = runif(500, min=0, max=10))
     points(xy, cex= 0.5, col = predict(r1, newdata = xy))
     abline(v=c( mean(c(5,25)), mean(c(25,29))))

