Dist                  package:amap                  R Documentation

_D_i_s_t_a_n_c_e _M_a_t_r_i_x _C_o_m_p_u_t_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     This function computes and returns the distance matrix obtained by
     applying the specified distance measure to the rows of a data
     matrix.

_U_s_a_g_e:

     Dist(x, method = "euclidean", diag = FALSE, upper = FALSE)

_A_r_g_u_m_e_n_t_s:

       x: numeric matrix or data frame.  Distances between the rows
          of 'x' will be computed.

  method: the distance measure to be used. This must be one of
          '"euclidean"', '"maximum"', '"manhattan"', '"canberra"',
          '"binary"', '"pearson"', '"correlation"' or '"spearman"'. Any
          unambiguous substring can be given.

    diag: logical value indicating whether the diagonal of the distance
          matrix should be printed by 'print.dist'.

   upper: logical value indicating whether the upper triangle of the
          distance matrix should be printed by 'print.dist'.

_D_e_t_a_i_l_s:

     Available distance measures are (written for two vectors x and y):

     '_e_u_c_l_i_d_e_a_n': Usual square distance between the two vectors (2
          norm).

     '_m_a_x_i_m_u_m': Maximum distance between two components of x and y
          (supremum norm)

     '_m_a_n_h_a_t_t_a_n': Absolute distance between the two vectors (1 norm).

     '_c_a_n_b_e_r_r_a': sum(|x_i - y_i| / |x_i + y_i|).  Terms with zero
          numerator and denominator are omitted from the sum and
          treated as if the values were missing.


     '_b_i_n_a_r_y': (aka _asymmetric binary_): The vectors are regarded as
          binary bits, so non-zero elements are `on' and zero elements
          are `off'.  The distance is the _proportion_ of bits in which
          only one is on amongst those in which at least one is on.

     '_p_e_a_r_s_o_n': Also named "not centered Pearson" sum(x_i y_i) / sqrt
          [sum(x_i^2) sum(y_i^2)].


     '_c_o_r_r_e_l_a_t_i_o_n': Also named "Centered Pearson" 1 - corr(x,y).  

     '_s_p_e_a_r_m_a_n': Compute a distance based on rank. sum (d_i^2) where
          d_i is the difference in rank between x_i and y_i.

          'Dist(x,method="spearman")[i,j] ='

          'cor.test(x[i,],x[j,],method="spearman")$statistic'
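
     The measures listed above can be written out by hand for a single
     pair of vectors.  The following is a minimal sketch in base R, not
     the package's implementation: the vectors are made-up values, the
     Canberra rescaling and the "1 - term" form of the not-centered
     Pearson distance are assumptions, and the cross-check uses
     'stats::dist()' (which covers only the first five measures) rather
     than 'Dist()' itself.

```r
# Two short example vectors (values chosen for illustration only).
x <- c(1, 0, 3, 0, 2)
y <- c(2, 0, 0, 4, 6)

euclidean <- sqrt(sum((x - y)^2))   # 2-norm
maximum   <- max(abs(x - y))        # supremum norm
manhattan <- sum(abs(x - y))        # 1-norm

# Canberra: skip 0/0 terms, then rescale as if those entries were missing
# (the rescaling follows base R's dist(); assumed here, not confirmed for Dist).
keep     <- !(x == 0 & y == 0)
canberra <- sum(abs(x - y)[keep] / abs(x + y)[keep]) * length(x) / sum(keep)

# Binary: proportion of positions where exactly one vector is non-zero,
# among positions where at least one is non-zero.
binary <- sum(xor(x != 0, y != 0)) / sum(x != 0 | y != 0)

# Not-centered Pearson term and the dissimilarity 1 - term (assumed form).
pearson_term <- sum(x * y) / sqrt(sum(x^2) * sum(y^2))
pearson_dist <- 1 - pearson_term

correlation <- 1 - cor(x, y)                 # centered Pearson
spearman    <- sum((rank(x) - rank(y))^2)    # sum of squared rank differences

# Cross-check the measures that base R's dist() also offers.
base_dist <- function(m) as.numeric(stats::dist(rbind(x, y), method = m))
hand <- c(euclidean = euclidean, maximum = maximum, manhattan = manhattan,
          canberra = canberra, binary = binary)
print(cbind(by_hand = hand, base_R = sapply(names(hand), base_dist)))
```

     Note that 'rank()' assigns average ranks to ties, so with tied
     values the hand-computed Spearman sum need not equal the
     'cor.test' statistic.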


     Missing values are allowed, and are excluded from all computations
     involving the rows within which they occur.  If some columns are
     excluded in calculating a Euclidean, Manhattan or Canberra
     distance, the sum is scaled up proportionally to the number of
     columns used. If all pairs are excluded when calculating a
     particular distance, the value is 'NA'.
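
     The rescaling rule for missing values can be illustrated on one
     pair of rows.  This is a sketch under the assumption that the
     scaling is applied to the sum before any square root is taken, as
     base R's 'stats::dist()' does; the vectors are hypothetical and
     'Dist()' itself is not called.

```r
# One pair of rows with a missing value in different columns.
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, NA)

ok   <- !is.na(x) & !is.na(y)   # columns usable for this pair
p    <- length(x)               # total number of columns
used <- sum(ok)                 # number of columns actually used

# Scale the partial sums up by p / used.
manhattan <- sum(abs(x[ok] - y[ok])) * p / used
euclidean <- sqrt(sum((x[ok] - y[ok])^2) * p / used)

# Compare with base R, which documents the same rule.
m <- rbind(x, y)
all.equal(manhattan, as.numeric(stats::dist(m, method = "manhattan")))
all.equal(euclidean, as.numeric(stats::dist(m, method = "euclidean")))

# If no column is usable for a pair, the distance is NA.
all_missing <- as.numeric(stats::dist(rbind(c(NA, 1), c(1, NA))))
is.na(all_missing)
```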

     The functions 'as.matrix.dist()' and 'as.dist()' can be used to
     convert between objects of class '"dist"' and conventional
     distance matrices.
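
     A round trip between the two representations might look as
     follows.  The sketch builds the '"dist"' object with base R's
     'stats::dist()' as a stand-in; an object returned by 'Dist()' has
     the same class, so it is assumed to convert the same way.

```r
# Three points in the plane with easy-to-check distances.
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

d  <- stats::dist(x)   # object of class "dist" (lower triangle only)
m  <- as.matrix(d)     # full symmetric matrix with a zero diagonal
d2 <- as.dist(m)       # back to class "dist"

print(m)
all.equal(as.vector(d), as.vector(d2))   # the round trip keeps the values
```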

_V_a_l_u_e:

     An object of class '"dist"'.

     The lower triangle of the distance matrix is stored by columns in
     a vector, say 'do'. If 'n' is the number of observations, i.e.,
     'n <- attr(do, "Size")', then for i < j <= n, the dissimilarity
     between (row) i and j is 'do[n*(i-1) - i*(i-1)/2 + j-i]'. The
     length of the vector is n*(n-1)/2, i.e., of order n^2.
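
     The index formula can be checked against the full matrix.  In this
     sketch the helper 'dist_at()' is a hypothetical name introduced
     for illustration, and the '"dist"' object comes from base R's
     'stats::dist()' on random data; the storage layout is assumed to
     be the same for 'Dist()'.

```r
set.seed(1)
x  <- matrix(rnorm(20), nrow = 5)
do <- stats::dist(x)

# Hypothetical helper: look up the dissimilarity between rows i < j.
dist_at <- function(do, i, j) {
  n <- attr(do, "Size")
  stopifnot(i < j, j <= n)
  as.vector(do)[n * (i - 1) - i * (i - 1) / 2 + j - i]
}

n <- attr(do, "Size")
m <- as.matrix(do)

# Compare every pair i < j with the corresponding matrix entry.
agree <- TRUE
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    agree <- agree && isTRUE(all.equal(dist_at(do, i, j), m[i, j]))
  }
}
agree
length(do) == n * (n - 1) / 2
```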

     The object has the following attributes (besides '"class"' equal
     to '"dist"'): 

    Size: integer, the number of observations in the dataset.

  Labels: optionally, contains the labels, if any, of the observations
          of the dataset.

Diag, Upper: logicals corresponding to the arguments 'diag' and 'upper'
          above, specifying how the object should be printed.

    call: optionally, the 'call' used to create the object.

 methods: optionally, the distance method used; resulting from
          'dist()', the ('match.arg()'ed) 'method' argument.
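
     The attributes can be inspected with 'attr()'.  The sketch below
     uses base R's 'stats::dist()', which stores the method under the
     attribute name '"method"'; whether 'Dist()' uses exactly the same
     attribute names is an assumption not confirmed here.

```r
# Small labelled data set: three observations, two variables.
x <- matrix(1:6, nrow = 3, dimnames = list(c("a", "b", "c"), NULL))

# "man" is an unambiguous abbreviation of "manhattan".
d <- stats::dist(x, method = "man")

attr(d, "Size")     # number of observations
attr(d, "Labels")   # row labels, if any
attr(d, "Diag")     # printing flag from the 'diag' argument
attr(d, "Upper")    # printing flag from the 'upper' argument
attr(d, "method")   # fully matched method name
```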

_R_e_f_e_r_e_n_c_e_s:

     Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) _Multivariate
     Analysis._ London: Academic Press.

_S_e_e _A_l_s_o:

     'daisy' in the 'cluster' package with more possibilities in the
     case of _mixed_ (continuous / categorical) variables. 'dist',
     'hclust'.

_E_x_a_m_p_l_e_s:

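     library(amap)                     # provides Dist()
     x <- matrix(rnorm(100), nrow=5)   # 5 observations, 20 variables
     Dist(x)                           # prints the lower triangle only
     Dist(x, diag = TRUE)              # also prints the diagonal
     Dist(x, upper = TRUE)             # also prints the upper triangle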

