kgs                 package:maptree                 R Documentation

_K_G_S _M_e_a_s_u_r_e _f_o_r _P_r_u_n_i_n_g _H_i_e_r_a_r_c_h_i_c_a_l _C_l_u_s_t_e_r_s

_D_e_s_c_r_i_p_t_i_o_n:

     Computes the Kelley-Gardner-Sutcliffe penalty function for a 
     hierarchical cluster tree.

_U_s_a_g_e:

       kgs (cluster, diss, alpha=1, maxclust=NULL)

_A_r_g_u_m_e_n_t_s:

 cluster: object of class 'hclust' or 'twins'.

    diss: object of class 'dissimilarity' or 'dist'.

   alpha: weight for number of clusters.

maxclust: maximum number of clusters for which to compute measure.

_D_e_t_a_i_l_s:

     Kelley et al. (see References) proposed a method that can help
     decide where to prune a hierarchical cluster tree.  At each level
     of the tree, the mean dissimilarity within each cluster is
     computed and then averaged across all clusters.  After
     normalizing, the number of clusters times alpha is added.  The
     minimum of this function corresponds to the suggested pruning
     size.
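
     The calculation described above can be sketched in base R as
     follows.  This is an illustration, not the 'kgs' source: the name
     'kgs_sketch' is hypothetical, it accepts only 'hclust' trees,
     singleton clusters are assumed to have a spread of zero, and the
     normalization (rescaling the average spread to the range 1 to
     maxclust - 1) is an assumption, since the text does not say which
     normalization is applied.

```r
# Sketch of a KGS-style penalty using base R only (not the maptree source).
kgs_sketch <- function (tree, d, alpha = 1, maxclust = NULL) {
  dm <- as.matrix (d)
  n <- nrow (dm)
  if (is.null (maxclust)) maxclust <- n - 1
  stopifnot (maxclust >= 2)
  sizes <- 2:maxclust

  # Mean pairwise dissimilarity among the members of one cluster.
  spread <- function (idx) {
    if (length (idx) < 2) return (0)   # assumption: singletons count as 0
    sub <- dm[idx, idx]
    mean (sub[upper.tri (sub)])
  }

  # Average of the cluster spreads at each tree size.
  avsp <- sapply (sizes, function (k) {
    memb <- cutree (tree, k = k)
    mean (sapply (split (seq_len (n), memb), spread))
  })

  # Assumed normalization: rescale to the range [1, maxclust - 1].
  rng <- range (avsp)
  norm <- if (diff (rng) > 0) {
    (maxclust - 2) * (avsp - rng[1]) / diff (rng) + 1
  } else {
    rep (1, length (avsp))
  }

  # Add the weighted number of clusters and label by tree size.
  pen <- norm + alpha * sizes
  names (pen) <- sizes
  pen
}

d <- dist (USArrests)
h <- hclust (d, method = "average")
p <- kgs_sketch (h, d, maxclust = 10)
```

     A larger alpha penalizes each additional cluster more heavily, so
     the minimum tends to move toward smaller trees.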

     The current implementation has complexity O(n*n*maxclust) and is
     therefore very slow for large n.  A first improvement would be to
     calculate the spread (the mean within-cluster dissimilarity) only
     for the clusters that are split at each level, rather than
     recalculating it for all clusters.
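
     One way the suggested improvement might look is to cache the
     spread of every cluster already seen, so that each new level only
     evaluates the two clusters produced by the split.  The sketch
     below uses base R only; 'avsp_cached' is a hypothetical name, it
     is not part of the package, and singleton clusters are assumed to
     have a spread of zero.

```r
# Average spread per tree size, evaluating each distinct cluster only once.
avsp_cached <- function (tree, d, maxclust) {
  dm <- as.matrix (d)
  n <- nrow (dm)
  cache <- new.env ()

  spread <- function (idx) {
    # Clusters in one hierarchy are nested or disjoint, so the pair
    # (smallest member, size) identifies a cluster uniquely.
    key <- paste (idx[1], length (idx))
    if (is.null (cache[[key]])) {
      cache[[key]] <- if (length (idx) < 2) {
        0                              # assumption: singletons count as 0
      } else {
        sub <- dm[idx, idx]
        mean (sub[upper.tri (sub)])
      }
    }
    cache[[key]]
  }

  avsp <- sapply (2:maxclust, function (k) {
    memb <- cutree (tree, k = k)
    mean (sapply (split (seq_len (n), memb), spread))
  })
  list (avsp = avsp, evaluated = length (ls (cache)))
}

d <- dist (USArrests)
h <- hclust (d, method = "average")
res <- avsp_cached (h, d, maxclust = 10)
```

     Here 'res$evaluated' counts the clusters whose spread was actually
     computed: two at the first level and two more for each further
     split, instead of every cluster at every level.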

     Uses contributed function 'combn'.

_V_a_l_u_e:

     Vector of the penalty function for trees of size 2:maxclust.  The
     names of vector elements are the respective numbers of clusters.
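
     Because the numbers of clusters are stored as element names, the
     suggested tree size can be read off as the name of the smallest
     element.  A minimal sketch, in which 'best_size' is a hypothetical
     helper and the penalty values are made up for illustration:

```r
# Return the number of clusters at which the penalty is smallest.
best_size <- function (pen) as.integer (names (pen)[which.min (pen)])

# Made-up penalty values for trees of size 2, 3 and 4 (illustration only).
pen <- c ("2" = 5.1, "3" = 4.2, "4" = 4.9)
k <- best_size (pen)
```

     If several sizes tie for the minimum, 'which.min' returns the
     first of them, that is, the smallest tree.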

_A_u_t_h_o_r(_s):

     Denis White, white.denis@epa.gov

_R_e_f_e_r_e_n_c_e_s:

     Kelley, L.A., Gardner, S.P., Sutcliffe, M.J. (1996) An automated
     approach for clustering an ensemble of NMR-derived protein
     structures into conformationally-related subfamilies, _Protein
     Engineering_, *9*, 1063-1065.

_S_e_e _A_l_s_o:

     'twins.object', 'dissimilarity.object', 'hclust', 'dist',
     'prune.clust', 'combn'

_E_x_a_m_p_l_e_s:

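       library (maptree)    # kgs
       library (cluster)    # agnes, votes.repub
       library (combinat)   # combn
       data (votes.repub)

       # Ward clustering, then the penalty for trees of 2 to 20 clusters
       a <- agnes (votes.repub, method="ward")
       b <- kgs (a, a$diss, maxclust=20)
       # names (b) are character strings, so convert them for the x axis
       plot (as.integer (names (b)), b, xlab="# clusters", ylab="penalty")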

