edist                 package:energy                 R Documentation

_E-_d_i_s_t_a_n_c_e

_D_e_s_c_r_i_p_t_i_o_n:

     Returns the E-distances (energy statistics) between clusters.

_U_s_a_g_e:

      edist(x, sizes, distance = FALSE, ix = 1:sum(sizes), alpha = 1)

_A_r_g_u_m_e_n_t_s:

       x: data matrix of pooled sample or Euclidean distances

   sizes: vector of sample sizes

distance: logical: if TRUE, x is a distance matrix

      ix: a permutation of the row indices of x 

   alpha: distance exponent in the interval (0,2]

_D_e_t_a_i_l_s:

     Returns the pairwise two-sample multivariate E-statistics for
     comparing clusters or samples. The e-distance between clusters is
     computed either from the original pooled data, stacked in matrix
     'x' with one multivariate observation per row, or from the
     distance matrix 'x' of the original data, which may also be a
     distance object returned by 'dist'. The first 'sizes[1]' rows of
     the original data matrix are the first sample, the next
     'sizes[2]' rows are the second sample, and so on. The permutation
     vector 'ix' may be used to obtain e-distances corresponding to a
     clustering solution at a given level in the hierarchy.
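
     The stacking convention above can be illustrated with base R
     alone. The following is a minimal sketch, assuming the group
     labels arrive unsorted; the names 'x', 'g', 'ord', 'x_stacked' and
     'blocks' are hypothetical and not part of the package.

```r
set.seed(1)
x <- matrix(rnorm(40), nrow = 20)          # 20 observations in 2 dimensions
g <- sample(rep(1:3, times = c(8, 7, 5)))  # hypothetical, unsorted group labels

# Reorder the rows so that each sample occupies a contiguous block
ord <- order(g)
x_stacked <- x[ord, , drop = FALSE]

# Sample sizes in the same order as the blocks
sizes <- as.vector(table(g))

# Row ranges occupied by each sample in the stacked matrix
ends <- cumsum(sizes)
starts <- ends - sizes + 1
blocks <- cbind(start = starts, end = ends)
blocks
```

     With the rows stacked this way, 'x_stacked' and 'sizes' have the
     layout that 'edist' expects when 'ix' is left at its default.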

     The e-distance between two clusters C_i, C_j of sizes n_i, n_j
     proposed by Szekely and Rizzo (2003) is

       e(C_i, C_j) = (n_i n_j)/(n_i+n_j) [2M_(ij) - M_(ii) - M_(jj)],

     where

       M_(ij) = 1/(n_i n_j) sum(p = 1..n_i) sum(q = 1..n_j) ||X_(ip) - X_(jq)||^a,

     || || denotes the Euclidean norm, a = 'alpha', and X_(ip) denotes
     the p-th observation in the i-th cluster. The exponent 'alpha'
     should be in the interval (0,2].
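
     To make the definition concrete, the formula can be evaluated
     directly in base R for a single pair of samples. This is only a
     sketch of the definition, not the package's implementation;
     'edist_pair' is a hypothetical helper name.

```r
# Hypothetical helper (not part of the energy package): e-distance between
# two samples given as numeric matrices with one observation per row.
edist_pair <- function(a, b, alpha = 1) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  n_a <- nrow(a)
  n_b <- nrow(b)
  # Euclidean distances among all pooled observations, raised to alpha
  d <- as.matrix(dist(rbind(a, b)))^alpha
  ia <- seq_len(n_a)
  ib <- n_a + seq_len(n_b)
  m_ab <- mean(d[ia, ib])  # M_(ij): mean between-sample distance
  m_aa <- mean(d[ia, ia])  # M_(ii): mean within-sample distance, zero diagonal included
  m_bb <- mean(d[ib, ib])  # M_(jj)
  (n_a * n_b) / (n_a + n_b) * (2 * m_ab - m_aa - m_bb)
}

# Example: first two species of the iris data
edist_pair(iris[1:50, 1:4], iris[51:100, 1:4])
```

     The means are taken over all n_i * n_j (or n_i^2) pairs, so the
     zero distances on the diagonal count toward M_(ii) and M_(jj), as
     in the double sum above.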

_V_a_l_u_e:

     An object of class 'dist' containing the lower triangle of the
     e-distance matrix of cluster distances, corresponding to the
     permutation of indices 'ix'.
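
     Because the result is a 'dist' object, the usual base R tools for
     that class apply. The sketch below uses a hand-built stand-in with
     made-up values for three clusters, rather than real 'edist'
     output, to show how individual pairs can be read off.

```r
# Stand-in for an edist() result: a 'dist' object for 3 clusters.
# The values are made up for illustration only.
m <- matrix(0, 3, 3)
m[lower.tri(m)] <- c(10, 20, 5)  # pairs (2,1), (3,1), (3,2) in column order
e <- as.dist(m)

# Expand to a full symmetric matrix for indexing by cluster number
e_mat <- as.matrix(e)
e_mat[3, 1]                      # value stored for clusters 1 and 3

# Locate the closest pair of clusters (lower triangle only)
closest <- which(e_mat == min(e) & lower.tri(e_mat), arr.ind = TRUE)
closest
```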

_A_u_t_h_o_r(_s):

     Maria L. Rizzo rizzo@math.ohiou.edu and Gabor J. Szekely
     gabors@bgnet.bgsu.edu

_R_e_f_e_r_e_n_c_e_s:

     Szekely, G. J. and Rizzo, M. L. (2005) Hierarchical Clustering via
     Joint Between-Within Distances: Extending Ward's Minimum Variance
     Method, _Journal of Classification_ 22(2) (in press).

     Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal
     Distributions in High Dimension, InterStat, November (5).

     Szekely, G. J. (2000) Technical Report 03-05, E-statistics: Energy
     of Statistical Samples, Department of Mathematics and Statistics,
     Bowling Green State University.

_S_e_e _A_l_s_o:

     'energy.hclust', 'eqdist.etest', 'ksample.e'

_E_x_a_m_p_l_e_s:

      ## compute e-distances for 3 samples of iris data
      data(iris)
      edist(iris[,1:4], c(50,50,50))


      ## compute e-distances from vector of group labels
      d <- dist(matrix(rnorm(100), nrow=50))
      g <- cutree(energy.hclust(d), k=4)
      edist(d, sizes=table(g), ix=rank(g, ties.method="first"))
      

