getSepProj         package:clusterGeneration         R Documentation

_O_P_T_I_M_A_L _P_R_O_J_E_C_T_I_O_N _D_I_R_E_C_T_I_O_N _A_N_D _C_O_R_R_E_S_P_O_N_D_I_N_G _S_E_P_A_R_A_T_I_O_N _I_N_D_E_X _F_O_R _P_A_I_R_S _O_F _C_L_U_S_T_E_R_S

_D_e_s_c_r_i_p_t_i_o_n:

     Optimal projection direction and corresponding separation index
     for pairs of clusters.

_U_s_a_g_e:

     getSepProjTheory(muMat, SigmaArray, 
          iniProjDirMethod=c("SL", "naive"), 
          projDirMethod=c("newton", "fixedpoint"), 
          alpha=0.05, ITMAX=20, eps=1.0e-10, quiet=TRUE)

     getSepProjData(y, cl, 
          iniProjDirMethod=c("SL", "naive"), 
          projDirMethod=c("newton", "fixedpoint"), 
          alpha=0.05, ITMAX=20, eps=1.0e-10, quiet=TRUE)

_A_r_g_u_m_e_n_t_s:

   muMat: Matrix of mean vectors. Rows correspond to mean vectors for
          clusters. 

SigmaArray: Array of covariance matrices. 'SigmaArray[,,i]' records the
          covariance matrix of the $i$-th cluster. 

       y: Data matrix. Rows correspond to observations. Columns
          correspond to variables. 

      cl: Cluster membership vector. 

iniProjDirMethod: Indicating the method to get the initial projection
          direction when calculating the separation index between a
          pair of clusters (cf. Qiu and Joe, 2006a, 2006b). 
          'iniProjDirMethod'="SL" indicates that the initial projection 
          direction is the sample version of the SL projection
          direction (Su and Liu, 1993),
          $(\boldsymbol{\Sigma}_1+\boldsymbol{\Sigma}_2)^{-1}(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1)$.
          'iniProjDirMethod'="naive" indicates that the initial
          projection direction is $\boldsymbol{\mu}_2-\boldsymbol{\mu}_1$. 

projDirMethod: Indicating the method to get the optimal projection
          direction when calculating the separation index between a
          pair of clusters (cf. Qiu and Joe, 2006a, 2006b). 
          'projDirMethod'="newton" indicates that the Newton-Raphson
          method is used to search for the optimal projection
          direction (cf. Qiu and Joe, 2006a). This requires the
          assumption that both covariance matrices of the pair of 
          clusters are positive-definite. If this assumption is
          violated, the "fixedpoint" method can be used. The
          "fixedpoint" method iteratively searches for the optimal
          projection direction based on the first derivative of the
          separation index with respect to the projection direction
          (cf. Qiu and Joe, 2006b). 

   alpha: Tuning parameter reflecting the percentage in the two tails
          of a projected cluster that might be outlying. The default,
          'alpha'=0.05, mirrors the conventional 0.05 significance
          level in hypothesis testing. 

   ITMAX: Maximum number of iterations allowed when iteratively
          calculating the optimal projection direction. The actual
          number of iterations is usually much smaller than the
          default value 20. 

     eps: Convergence threshold. A small positive number used to check
          whether a quantity q is equal to zero. If |q|<'eps', then q
          is regarded as equal to zero. 'eps' is used to check whether
          an algorithm has converged. The default value is 1.0e-10. 

   quiet: A flag to switch on/off the outputs of intermediate results
          and/or possible warning messages. The default value is
          'TRUE'. 
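
     As an illustration of the layout that 'muMat' and 'SigmaArray'
     expect, the following base-R sketch builds both from a data matrix
     'y' and a membership vector 'cl'. The helper name 'clusterMoments'
     is hypothetical and not part of the package, and whether
     'getSepProjData' forms its sample estimates in exactly this way is
     an assumption.

```r
# Hypothetical helper (not part of clusterGeneration): per-cluster sample
# means and covariance matrices in the muMat / SigmaArray layout.
clusterMoments <- function(y, cl) {
  labs <- sort(unique(cl))
  p <- ncol(y)
  muMat <- matrix(0, length(labs), p)            # one row per cluster
  SigmaArray <- array(0, c(p, p, length(labs)))  # one p x p slice per cluster
  for (i in seq_along(labs)) {
    yi <- y[cl == labs[i], , drop = FALSE]
    muMat[i, ] <- colMeans(yi)
    SigmaArray[, , i] <- cov(yi)
  }
  list(muMat = muMat, SigmaArray = SigmaArray)
}

# Simulated data: 50 points around (0, 0) and 100 points around (3, 3)
set.seed(1)
y <- rbind(matrix(rnorm(100), 50, 2),
           matrix(rnorm(200, mean = 3), 100, 2))
cl <- rep(1:2, c(50, 100))
m <- clusterMoments(y, cl)
dim(m$muMat)       # clusters x variables
dim(m$SigmaArray)  # variables x variables x clusters
```

     Rows of 'muMat' and the third index of 'SigmaArray' follow the
     same cluster order, so 'SigmaArray[,,i]' belongs to 'muMat[i,]'.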

_D_e_t_a_i_l_s:

     When calculating the optimal projection direction and the
     corresponding optimal separation index for a pair of clusters, if
     one or both cluster covariance matrices are singular, the
     'newton' method cannot be used. In this case, the functions
     'getSepProjTheory' and 'getSepProjData' will automatically use
     the 'fixedpoint' method to search for the optimal projection
     direction, even if the user specifies the value of the argument 
     'projDirMethod' as 'newton'. Also, multiple initial projection 
     directions will be evaluated. 

     Specifically, $2+2p$ projection directions will be evaluated. The
     first projection direction is the "naive" direction 
     $\boldsymbol{\mu}_2-\boldsymbol{\mu}_1$. The second projection
     direction is the "SL" projection direction 
     $(\boldsymbol{\Sigma}_1+\boldsymbol{\Sigma}_2)^{-1}(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1)$.
     The next $p$ projection directions are the $p$ eigenvectors of the
     covariance matrix of the first cluster. The remaining $p$
     projection directions are the $p$ eigenvectors of the covariance
     matrix of the second cluster. 

     Each of these $2+2p$ projection directions is in turn used as
     the initial projection direction for the 'fixedpoint' algorithm
     to obtain an optimal projection direction and the corresponding
     optimal separation index. We also obtain $2+2p$ separation
     indices by projecting the two clusters along each of these $2+2p$
     projection directions.

     Finally, the projection direction with the largest separation
     index among the $2(2+2p)$ separation indices is chosen as the
     optimal projection direction. The corresponding separation
     index is chosen as the optimal separation index.
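
     The candidate set described above can be sketched in base R. This
     is only an illustration under stated assumptions: the helper names
     'candidateDirs' and 'sepIndexAlong' are hypothetical and not part
     of the package, the directions are scaled to unit length for
     convenience, and the index is assumed to take the normal-theory
     form (gap - spread)/(gap + spread) from Qiu and Joe (2006a), where
     gap is the projected mean difference and spread is the upper
     alpha/2 normal quantile times the sum of the two projected
     standard deviations. The 'fixedpoint' refinement is not reproduced.

```r
# Hypothetical helper: the 2 + 2p candidate directions as unit-length columns.
candidateDirs <- function(mu1, mu2, Sigma1, Sigma2) {
  d <- mu2 - mu1
  sl <- solve(Sigma1 + Sigma2, d)                  # "SL" direction
  e1 <- eigen(Sigma1, symmetric = TRUE)$vectors    # p eigenvectors, cluster 1
  e2 <- eigen(Sigma2, symmetric = TRUE)$vectors    # p eigenvectors, cluster 2
  dirs <- unname(cbind(d, sl, e1, e2))             # "naive" direction comes first
  sweep(dirs, 2, sqrt(colSums(dirs^2)), "/")       # scale columns to unit length
}

# Hypothetical helper: assumed normal-theory separation index along direction a.
sepIndexAlong <- function(a, mu1, mu2, Sigma1, Sigma2, alpha = 0.05) {
  za <- qnorm(1 - alpha / 2)
  # abs() orients the direction so that cluster 2 lies to the right of cluster 1
  gap <- abs(sum(a * (mu2 - mu1)))
  # max(0, .) guards against tiny negative values from rounding
  s1 <- sqrt(max(0, drop(t(a) %*% Sigma1 %*% a)))
  s2 <- sqrt(max(0, drop(t(a) %*% Sigma2 %*% a)))
  spread <- za * (s1 + s2)
  (gap - spread) / (gap + spread)
}

mu1 <- c(0, 0)
mu2 <- c(10, 0)
Sigma1 <- matrix(c(2, 1, 1, 5), 2, 2)
Sigma2 <- matrix(c(5, -1, -1, 2), 2, 2)

dirs <- candidateDirs(mu1, mu2, Sigma1, Sigma2)
idx <- apply(dirs, 2, sepIndexAlong, mu1 = mu1, mu2 = mu2,
             Sigma1 = Sigma1, Sigma2 = Sigma2)
best <- dirs[, which.max(idx)]   # candidate with the largest index
```

     In this example 'Sigma1 + Sigma2' is a multiple of the identity
     matrix, so the "naive" and "SL" candidates coincide after scaling.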

_V_a_l_u_e:

sepValMat: Separation index matrix 

projDirArray: Array of projection directions for each pair of clusters 

_A_u_t_h_o_r(_s):

     Weiliang Qiu stwxq@channing.harvard.edu
     Harry Joe harry@stat.ubc.ca

_R_e_f_e_r_e_n_c_e_s:

     Qiu, W.-L. and Joe, H. (2006a) Generation of Random Clusters with
     Specified Degree of Separation. _Journal of Classification_,
     *23*(2), 315-334.

     Qiu, W.-L. and Joe, H. (2006b) Separation Index and Partial
     Membership for Clustering. _Computational Statistics and Data
     Analysis_, *50*, 585-603.

     Su, J. Q. and Liu, J. S. (1993) Linear Combinations of Multiple
     Diagnostic Markers. _Journal of the American Statistical
     Association_, *88*, 1350-1355.

_E_x_a_m_p_l_e_s:

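     library(clusterGeneration)

     # cluster 1: size, mean vector and covariance matrix
     n1<-50
     mu1<-c(0,0)
     Sigma1<-matrix(c(2,1,1,5),2,2)
     # cluster 2: size, mean vector and covariance matrix
     n2<-100
     mu2<-c(10,0)
     Sigma2<-matrix(c(5,-1,-1,2),2,2)
     projDir<-c(1, 0)
     # stack the means by row and the covariance matrices along the
     # third dimension
     muMat<-rbind(mu1, mu2)
     SigmaArray<-array(0, c(2,2,2))
     SigmaArray[,,1]<-Sigma1
     SigmaArray[,,2]<-Sigma2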

     a<-getSepProjTheory(muMat, SigmaArray, iniProjDirMethod="SL")
     # separation index for cluster distributions 1 and 2
     a$sepValMat[1,2]
     # projection direction for cluster distributions 1 and 2
     a$projDirArray[1,2,]

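     # simulate the two clusters from bivariate normal distributions
     library(MASS)
     set.seed(1234)
     y1<-mvrnorm(n1, mu1, Sigma1)
     y2<-mvrnorm(n2, mu2, Sigma2)
     y<-rbind(y1, y2)
     # membership vector: n1 ones followed by n2 twos
     cl<-rep(1:2, c(n1, n2))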

     b<-getSepProjData(y, cl, iniProjDirMethod="SL", projDirMethod="newton")
     # separation index for clusters 1 and 2
     b$sepValMat[1,2]
     # projection direction for clusters 1 and 2
     b$projDirArray[1,2,]

