hcluster                package:amap                R Documentation

_H_i_e_r_a_r_c_h_i_c_a_l _C_l_u_s_t_e_r_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Hierarchical cluster analysis.

_U_s_a_g_e:

     hcluster(x, method = "euclidean", diag = FALSE, upper = FALSE,
              link = "complete", members = NULL, nbproc = 2,
              doubleprecision = TRUE)

_A_r_g_u_m_e_n_t_s:

       x: A numeric matrix of data, or an object that can be coerced to
          such a matrix (such as a numeric vector or a data frame with
          all numeric columns), or an object of class '"exprSet"'.

  method: the distance measure to be used. This must be one of
          '"euclidean"', '"maximum"', '"manhattan"', '"canberra"',
          '"binary"', '"pearson"', '"correlation"', '"spearman"' or
          '"kendall"'. Any unambiguous substring can be given.

    diag: logical value indicating whether the diagonal of the distance
          matrix should be printed by 'print.dist'.

   upper: logical value indicating whether the upper triangle of the
          distance matrix should be printed by 'print.dist'.

    link: the agglomeration method to be used. This should be (an
          unambiguous abbreviation of) one of '"ward"', '"single"',
          '"complete"', '"average"', '"mcquitty"', '"median"' or
          '"centroid"'.

 members: 'NULL' or a vector whose length is the number of observations
          (rows of 'x'); see the 'members' argument of 'hclust'.

  nbproc: integer, the number of subprocesses to use for parallel
          computation of the distance matrix.

doubleprecision: logical. If 'TRUE', compute the distance matrix in
          double precision; if 'FALSE', use single precision.

_D_e_t_a_i_l_s:

     This function combines the functionality of 'hclust' and 'dist':
     'hcluster(x, method = "euclidean", link = "complete")' is
     equivalent to 'hclust(dist(x, method = "euclidean"), method =
     "complete")'. It uses about half the memory, as it does not store
     the distance matrix.

     For more details, see documentation of 'hclust' and 'dist'.
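     A quick way to check the equivalence stated above is to fit both
     trees on the same data and compare the merge heights. This is a
     minimal sketch, assuming the 'amap' package is installed; the
     amap-specific part is guarded so the 'hclust' half runs with base
     R alone.

```r
## Compare hcluster() with the equivalent hclust(dist(...)) call.
hc2 <- hclust(dist(USArrests, method = "euclidean"), method = "complete")

if (requireNamespace("amap", quietly = TRUE)) {
  hc1 <- amap::hcluster(USArrests, method = "euclidean", link = "complete")
  ## Merge heights should agree up to numerical precision
  ## (tie-breaking may still reorder the 'merge' rows).
  print(all.equal(hc1$height, hc2$height))
}
```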

_V_a_l_u_e:

     An object of class *hclust* which describes the tree produced by
     the clustering process. The object is a list with components: 

   merge: an n-1 by 2 matrix. Row i of 'merge' describes the merging of
          clusters at step i of the clustering. If an element j in the
          row is negative, then observation -j was merged at this
          stage. If j is positive then the merge was with the cluster
          formed at the (earlier) stage j of the algorithm. Thus
          negative entries in 'merge' indicate agglomerations of
          singletons, and positive entries indicate agglomerations of
          non-singletons.

  height: a set of n-1 non-decreasing real values. The clustering
          _height_: that is, the value of the criterion associated with
          the clustering 'method' for the particular agglomeration.

   order: a vector giving the permutation of the original observations
          suitable for plotting, in the sense that a cluster plot using
          this ordering and matrix 'merge' will not have crossings of
          the branches.

  labels: labels for each of the objects being clustered.

    call: the call which produced the result.

  method: the cluster method that has been used.

dist.method: the distance measure that has been used (the value of the
          'method' argument).


     There is a 'print' and a 'plot' method for 'hclust' objects. The
     'plclust()' function is basically the same as the plot method,
     'plot.hclust', primarily for back compatibility with S-plus. Its
     extra arguments are not yet implemented.
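     The structure of the returned components can be inspected on a
     tiny data set. Base 'hclust' is used in this sketch so it runs
     without 'amap' installed; 'hcluster' returns the same kind of
     object.

```r
## Five 1-d points: two tight pairs and one outlier.
x <- matrix(c(0, 0.1, 1, 1.1, 5), ncol = 1)
hc <- hclust(dist(x), method = "complete")

hc$merge   # n-1 = 4 rows; negative entries are singleton observations,
           # positive entries refer to clusters formed at earlier rows
hc$height  # non-decreasing criterion values, one per row of 'merge'
```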

_A_u_t_h_o_r(_s):

     The 'hcluster' function is based on C code adapted by Antoine
     Lucas from the original Fortran routine used by 'hclust' <URL:
     http://mulcyber.toulouse.inra.fr/projects/amap/>.

_R_e_f_e_r_e_n_c_e_s:

     Antoine Lucas and Sylvain Jasson, _Using amap and ctc Packages for
     Huge Clustering_, R News, 2006, vol. 6, issue 5, pages 58-60.

_S_e_e _A_l_s_o:

     'Dist', 'hclust', 'kmeans'.

_E_x_a_m_p_l_e_s:

     data(USArrests)
     hc <- hcluster(USArrests,link = "ave")
     plot(hc)
     plot(hc, hang = -1)

     ## Do the same with centroid clustering and squared Euclidean distance,
     ## cut the tree into ten clusters and reconstruct the upper part of the
     ## tree from the cluster centers.
     hc <- hclust(dist(USArrests)^2, "cen")
     memb <- cutree(hc, k = 10)
     cent <- NULL
     for(k in 1:10){
       cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE]))
     }
     hc1 <- hclust(dist(cent)^2, method = "cen", members = table(memb))
     opar <- par(mfrow = c(1, 2))
     plot(hc,  labels = FALSE, hang = -1, main = "Original Tree")
     plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters")
     par(opar)

     ## other combinations are possible

     hc <- hcluster(USArrests, method = "euc", link = "ward", nbproc = 1,
                    doubleprecision = TRUE)
     hc <- hcluster(USArrests, method = "max", link = "single", nbproc = 2,
                    doubleprecision = TRUE)
     hc <- hcluster(USArrests, method = "man", link = "complete", nbproc = 1,
                    doubleprecision = TRUE)
     hc <- hcluster(USArrests, method = "can", link = "average", nbproc = 2,
                    doubleprecision = TRUE)
     hc <- hcluster(USArrests, method = "bin", link = "mcquitty", nbproc = 1,
                    doubleprecision = FALSE)
     hc <- hcluster(USArrests, method = "pea", link = "median", nbproc = 2,
                    doubleprecision = FALSE)
     hc <- hcluster(USArrests, method = "cor", link = "centroid", nbproc = 1,
                    doubleprecision = FALSE)
     hc <- hcluster(USArrests, method = "spe", link = "complete", nbproc = 2,
                    doubleprecision = FALSE)

