clusterMix              package:bayesm              R Documentation

_C_l_u_s_t_e_r _O_b_s_e_r_v_a_t_i_o_n_s _B_a_s_e_d _o_n _I_n_d_i_c_a_t_o_r _M_C_M_C _D_r_a_w_s

_D_e_s_c_r_i_p_t_i_o_n:

     'clusterMix' uses MCMC draws of indicator variables from a normal
     component mixture model to cluster observations based on a
     similarity matrix.

_U_s_a_g_e:

     clusterMix(zdraw, cutoff = 0.9, SILENT = FALSE)

_A_r_g_u_m_e_n_t_s:

   zdraw: R x nobs matrix of indicator draws (one row per MCMC draw,
          one column per observation)

  cutoff: cutoff probability for similarity (def = 0.9)

  SILENT: logical flag for silent operation (def = FALSE)

_D_e_t_a_i_l_s:

     Define a similarity matrix, Sim, where Sim[i,j] = 1 if
     observations i and j are in the same component and 0 otherwise.
     Compute the posterior mean of Sim, E[Sim], by averaging over the
     indicator draws.
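
     As an illustration, the posterior mean similarity matrix can be
     sketched in base R as follows. The helper 'z_to_sim' and the toy
     'zdraw' matrix are hypothetical names introduced here; they are
     not part of bayesm, and this is not necessarily how 'clusterMix'
     computes the matrix internally.

```r
# Hypothetical helper (not part of bayesm): similarity matrix for one
# indicator vector z. Entry [i, j] is 1 when z[i] == z[j], else 0.
z_to_sim <- function(z) {
  outer(z, z, "==") * 1
}

# Toy R x nobs matrix of indicator draws (3 draws, 4 observations).
zdraw <- rbind(c(1, 1, 2, 2),
               c(1, 1, 2, 1),
               c(2, 2, 1, 1))

# Posterior mean of Sim: average the per-draw similarity matrices.
nobs <- ncol(zdraw)
post_sim <- matrix(0, nobs, nobs)
for (r in seq_len(nrow(zdraw))) {
  post_sim <- post_sim + z_to_sim(zdraw[r, ])
}
post_sim <- post_sim / nrow(zdraw)
print(post_sim)
```

     Because Sim depends only on which observations share a label,
     draws that differ only by a relabeling of the components (the
     first and third rows of the toy matrix) give the same similarity
     matrix.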

     Clustering is achieved by two methods:

     Method A: Find the indicator draw z whose similarity matrix,
     Sim(z), minimizes loss(E[Sim] - Sim(z)), where loss is absolute
     deviation.

     Method B: Define a similarity matrix by setting each element of
     E[Sim] to 1 if it exceeds cutoff. Compute the clustering scheme
     associated with this "winsorized" similarity matrix.
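
     The two methods can be sketched on a toy example in base R. This
     is an illustration under stated assumptions, not the package's
     own code: 'z_to_sim', 'cluster_a' and 'cluster_b' are hypothetical
     names; the loss is taken to be the sum of absolute deviations
     over all matrix elements; elements at or below the cutoff are set
     to 0; and the thresholded matrix is converted to cluster labels
     with a simple greedy pass, which is only one of several possible
     choices.

```r
# Hypothetical helper (not part of bayesm): similarity matrix of one draw.
z_to_sim <- function(z) outer(z, z, "==") * 1

# Toy R x nobs matrix of indicator draws (4 draws, 4 observations).
zdraw <- rbind(c(1, 1, 2, 2),
               c(1, 1, 2, 1),
               c(2, 2, 1, 1),
               c(1, 1, 2, 2))
nobs <- ncol(zdraw)

# Posterior mean similarity matrix, E[Sim].
sims <- lapply(seq_len(nrow(zdraw)), function(r) z_to_sim(zdraw[r, ]))
post_sim <- Reduce(`+`, sims) / nrow(zdraw)

# Method A: pick the draw whose similarity matrix is closest to E[Sim].
# Assumption: loss is the sum of absolute deviations over all elements.
loss <- apply(zdraw, 1, function(z) sum(abs(post_sim - z_to_sim(z))))
cluster_a <- zdraw[which.min(loss), ]

# Method B: threshold E[Sim] at the cutoff, then read off clusters.
# Assumption: elements not exceeding the cutoff are set to 0.
cutoff <- 0.7  # toy value; the documented default is 0.9
sim_b <- (post_sim > cutoff) * 1

# Greedy labelling (an assumption): each still-unlabelled observation
# starts a new cluster containing every unlabelled observation that is
# similar to it in the thresholded matrix.
cluster_b <- integer(nobs)
label <- 0L
for (i in seq_len(nobs)) {
  if (cluster_b[i] == 0L) {
    label <- label + 1L
    cluster_b[sim_b[i, ] == 1 & cluster_b == 0L] <- label
  }
}

print(cluster_a)
print(cluster_b)
```

     A higher cutoff makes Method B stricter: fewer pairs pass the
     threshold, so observations tend to be split into more, smaller
     clusters.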

_V_a_l_u_e:

clustera: vector of cluster indicators based on method A above

clusterb: vector of cluster indicators based on method B above

_W_a_r_n_i_n_g:

     This routine is a utility routine that does *not* check the input
     arguments for proper dimensions and type.
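
     Since the routine does not validate its input, a caller may wish
     to check 'zdraw' first. The following is a minimal sketch;
     'check_zdraw' is a hypothetical helper, not part of bayesm, and
     it assumes that indicators are positive whole numbers stored in a
     numeric matrix.

```r
# Hypothetical helper (not part of bayesm): basic sanity checks for an
# R x nobs matrix of indicator draws. Stops with an error on failure
# and returns its argument invisibly on success.
check_zdraw <- function(zdraw) {
  stopifnot(
    is.matrix(zdraw),            # must be two-dimensional
    is.numeric(zdraw),           # indicators stored as numbers
    !anyNA(zdraw),               # no missing draws
    all(zdraw == round(zdraw)),  # whole numbers only
    all(zdraw >= 1)              # assumption: labels start at 1
  )
  invisible(zdraw)
}

# Toy matrix of indicator draws (2 draws, 3 observations).
zdraw <- rbind(c(1, 2, 2),
               c(1, 1, 2))
check_zdraw(zdraw)

# With bayesm loaded, the checked matrix could then be passed on:
# outclusterMix <- clusterMix(check_zdraw(zdraw))
```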

_A_u_t_h_o_r(_s):

     Peter Rossi, Graduate School of Business, University of Chicago,
     Peter.Rossi@ChicagoGsb.edu.

_R_e_f_e_r_e_n_c_e_s:

     For further discussion, see _Bayesian Statistics and Marketing_ by
     Rossi, Allenby and McCulloch, Chapter 3.
     <URL: http://gsbwww.uchicago.edu/fac/peter.rossi/research/bsm.html>

_S_e_e _A_l_s_o:

     'rnmixGibbs'

_E_x_a_m_p_l_e_s:

     ##
     if(nchar(Sys.getenv("LONG_TEST")) != 0) 
     {
     ## simulate data from mixture of normals
     n=500
     pvec=c(.5,.5)
     mu1=c(2,2)
     mu2=c(-2,-2)
     Sigma1=matrix(c(1,.5,.5,1),ncol=2)
     Sigma2=matrix(c(1,.5,.5,1),ncol=2)
     comps=NULL
     comps[[1]]=list(mu1,backsolve(chol(Sigma1),diag(2)))
     comps[[2]]=list(mu2,backsolve(chol(Sigma2),diag(2)))
     dm=rmixture(n,pvec,comps)
     ## run MCMC on normal mixture
     R=2000
     Data=list(y=dm$x)
     ncomp=2
     Prior=list(ncomp=ncomp,a=c(rep(100,ncomp)))
     Mcmc=list(R=R,keep=1)
     out=rnmixGibbs(Data=Data,Prior=Prior,Mcmc=Mcmc)
     ## discard the draws before 'begin' as burn-in
     begin=500
     end=R
     ## find clusters using the retained indicator draws
     outclusterMix=clusterMix(out$zdraw[begin:end,])
     ##
     ## check on clustering versus "truth"
     ##  note: there could be switched labels
     ##
     table(outclusterMix$clustera,dm$z)
     table(outclusterMix$clusterb,dm$z)
     }
     ##

