mclustDA               package:mclust               R Documentation

_M_c_l_u_s_t_D_A _d_i_s_c_r_i_m_i_n_a_n_t _a_n_a_l_y_s_i_s.

_D_e_s_c_r_i_p_t_i_o_n:

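     MclustDA training and testing in a single call: fits a mixture
     model to each class of the training data, then classifies the
     test data.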

_U_s_a_g_e:

     mclustDA(train, test, pro=NULL, G=NULL, modelNames=NULL, prior=NULL, 
              control=emControl(), initialization=NULL, 
              warn=FALSE, verbose=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

   train: A list with two named components: 'data' giving the data and
          'labels' giving the class labels for the observations in the
          data. 

    test: A list with two named components: 'data' giving the data and
          'labels' giving the class labels for the observations in the
          data. The labels are used only to compute the error rate in
          the 'print' method and  can be set to 'NULL' if unknown. The
          default is to test the training data. 

     pro: Optional prior probabilities for each class in the training
          data. 

       G: An integer vector specifying the numbers of mixture
          components (clusters) for which the BIC is to be calculated. 
          The default is 'G=1:9'.  

modelNames: A vector of character strings indicating the models to be
          fitted  in the EM phase of clustering. The help file for
          'mclustModelNames' describes the available models. The
          default is 'c("E", "V")' for univariate data and
          'mclustOptions()$emModelNames' for multivariate data. 

   prior: The default assumes no prior, but this argument allows
          specification of a  conjugate prior on the means and
          variances through the function  'priorControl'. 

 control: A list of control parameters for EM. The defaults are set by
          the call 'emControl()'.  

initialization: A list containing zero or more of the following
          components:

        _h_c_P_a_i_r_s A matrix of merge pairs for hierarchical clustering
             such as produced by function 'hc'. The default is to
             compute a hierarchical clustering tree by applying
             function 'hc' with 'modelName = "E"' to univariate data
             and 'modelName = "VVV"' to multivariate data or a subset
             as indicated by the 'subset' argument.  The hierarchical
             clustering results are used as starting values for EM.  

        _s_u_b_s_e_t A logical or numeric vector specifying a subset of the
             data to be used in the initial hierarchical clustering
             phase.

    warn: A logical value indicating whether or not certain warnings
          (usually related to singularity) should be issued when
          estimation fails. The default is to suppress these warnings. 

 verbose: A logical variable telling whether or not to print an
          indication that the function is in the training phase, which
          may take some time to complete. 

    ... : Catches unused arguments in indirect or list calls via
          'do.call'. 
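
     As a minimal sketch of the 'train' and 'test' arguments described
     above, the lists can be assembled with base R alone. The built-in
     'iris' data and the odd/even split are illustrative choices, and
     the name 'testUnlabeled' is hypothetical; no mclust function is
     called here.

```r
## Sketch: assemble the 'train' and 'test' list arguments from a data frame.
## Uses only base R and the built-in 'iris' data.
odd  <- seq(from = 1, to = nrow(iris), by = 2)   # rows 1, 3, 5, ...
even <- odd + 1                                   # rows 2, 4, 6, ...

## Each list has exactly two named components: 'data' and 'labels'.
train <- list(data = iris[odd, -5],  labels = iris[odd, 5])
test  <- list(data = iris[even, -5], labels = iris[even, 5])

## When the true classes of the test observations are unknown,
## 'labels' can be set to NULL explicitly (hypothetical object name).
testUnlabeled <- list(data = iris[even, -5], labels = NULL)

## Inspect the top-level structure of the training list.
str(train, max.level = 1)
```

     Lists built this way can then be passed as 'train=' and 'test=',
     as in the Examples section.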

_D_e_t_a_i_l_s:

     'mclustDA' combines functions 'mclustDAtrain' and  'mclustDAtest'
     and their summaries. This is suitable when  all test data are
     available in advance, so that the training model is only used
     once.

_V_a_l_u_e:

     A list with the following components:  

    test: A list with the following components:

          _c_l_a_s_s_i_f_i_c_a_t_i_o_n The classification of the test data for this
               instance of  'mclustDA'.

          _u_n_c_e_r_t_a_i_n_t_y The uncertainty of the classification (0 least
               certain, 1 most certain).

          _l_a_b_e_l_s The test labels (if any) from the input.

training: A list with the following components:

          _c_l_a_s_s_i_f_i_c_a_t_i_o_n The classification of the training data for
               this instance of  'mclustDA'.

          _z A  matrix whose _[i,k]_th entry is the probability that 
               observation _i_ in the training data belongs to the 
               _k_th class.

          _l_a_b_e_l_s The training labels from the input.

 summary: A data frame summarizing the 'mclustDA' results including the
          mixture models and numbers of components for the training
          classes. 
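
     The relationship between a membership matrix such as 'z', the
     classification, the uncertainty, and the error rate can be
     sketched in base R. The matrix and the true labels below are
     made-up values for illustration, and the sketch assumes the usual
     mclust convention that uncertainty is one minus the largest
     membership probability; it does not reproduce the internals of
     'mclustDA'.

```r
## Hypothetical membership matrix: 4 observations (rows), 3 classes (columns).
z <- rbind(c(0.90, 0.05, 0.05),
           c(0.10, 0.80, 0.10),
           c(0.40, 0.35, 0.25),
           c(0.00, 0.00, 1.00))
colnames(z) <- c("1", "2", "3")

## Assign each observation to its most probable class.
classification <- colnames(z)[max.col(z, ties.method = "first")]

## Assumed convention: uncertainty = 1 - largest membership probability,
## so 0 means the assignment is certain.
uncertainty <- 1 - apply(z, 1, max)

## Error rate against hypothetical true labels: the fraction misclassified.
trueLabels <- c("1", "2", "2", "3")
errorRate <- mean(classification != trueLabels)

print(data.frame(classification, uncertainty, trueLabels))
print(errorRate)
```

     In this toy example the third observation is the least certain
     and is also the only one misclassified.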

_R_e_f_e_r_e_n_c_e_s:

     C. Fraley and A. E. Raftery (2002). Model-based clustering,
     discriminant analysis, and density estimation. _Journal of the
     American Statistical Association 97:611-631_. 

     C. Fraley and A. E. Raftery (2006). MCLUST Version 3 for R: Normal
     Mixture Modeling and Model-Based Clustering,  Technical Report no.
     504, Department of Statistics, University of Washington.

_S_e_e _A_l_s_o:

     'plot.mclustDA',  'mclustDAtrain',  'mclustDAtest', 'classError'

_E_x_a_m_p_l_e_s:

     n <- 250 ## create artificial data
     set.seed(1)
     triModal <- c(rnorm(n,-5), rnorm(n,0), rnorm(n,5))
     triClass <- c(rep(1,n), rep(2,n), rep(3,n))

     odd <- seq(from = 1, to = length(triModal), by = 2)
     even <- odd + 1
     triMclustDA <- mclustDA(train=list(data=triModal[odd],labels=triClass[odd]),
                        test= list(data=triModal[even],labels=triClass[even]),
                            verbose = TRUE)

     names(triMclustDA)
     ## Not run: 
       plot(triMclustDA, trainData = triModal[odd], testData = triModal[even])
     ## End(Not run)

     odd <- seq(from = 1, to = nrow(cross), by = 2)
     even <- odd + 1
     crossMclustDA <- mclustDA( train=list(data=cross[odd,-1],
                                           labels=cross[odd,1]),
                            test= list(data=cross[even,-1],labels=cross[even,1]),
                            verbose = TRUE)

     ## Not run: 
       plot(crossMclustDA, trainData = cross[odd,-1], testData = cross[even,-1])
     ## End(Not run)

     odd <- seq(from = 1, to = nrow(iris), by = 2)
     even <- odd + 1
     irisMclustDA <- mclustDA(train=list(data=iris[odd,-5],labels=iris[odd,5]),
                            test= list(data=iris[even,-5],labels=iris[even,5]),
                            verbose = TRUE)

     ## Not run: 
       plot(irisMclustDA, trainData = iris[odd,-5], testData = iris[even,-5])
     ## End(Not run)

