dist                  package:proxy                  R Documentation

_M_a_t_r_i_x _D_i_s_t_a_n_c_e/_S_i_m_i_l_a_r_i_t_y _C_o_m_p_u_t_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     These functions compute and return the auto-distance/similarity
     matrix between either the rows or the columns of a matrix or data
     frame, or between the elements of a list, as well as the
     cross-distance/similarity matrix between two matrices, data frames
     or lists.
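
     To make the auto/cross distinction concrete, here is a minimal
     base-R sketch for Euclidean distance between rows. The helper
     'cross_euclidean' is hypothetical and is not part of 'proxy'.
     Cross-distances form a full n x m matrix; computing them for 'x'
     against itself gives a symmetric matrix whose lower triangle holds
     the auto-distances that 'stats::dist' returns.

```r
## Hypothetical helper (base R only, not the proxy implementation):
## Euclidean cross-distance matrix between the rows of x and of y.
cross_euclidean <- function(x, y) {
  d <- outer(seq_len(nrow(x)), seq_len(nrow(y)),
             Vectorize(function(i, j) sqrt(sum((x[i, ] - y[j, ])^2))))
  dimnames(d) <- list(rownames(x), rownames(y))  # keep row labels
  d
}

x <- matrix(c(0, 0, 3, 4, 6, 8), ncol = 2, byrow = TRUE)  # rows (0,0), (3,4), (6,8)
y <- matrix(c(0, 0, 1, 0), ncol = 2, byrow = TRUE)        # rows (0,0), (1,0)

cross_euclidean(x, y)   # 3 x 2 cross-distance matrix
cross_euclidean(x, x)   # square and symmetric: auto-distances
stats::dist(x)          # same auto-distances, lower triangle only
```

     The cross matrix is never folded into a triangle because x[i, ]
     versus y[j, ] and x[j, ] versus y[i, ] are different pairs.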

_U_s_a_g_e:

     dist(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
          pairwise = FALSE, by_rows = TRUE, convert_similarities = TRUE,
          auto_convert_data_frames = TRUE)
     simil(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
           pairwise = FALSE, by_rows = TRUE, convert_distances = TRUE,
           auto_convert_data_frames = TRUE)

     pr_dist2simil(x)
     pr_simil2dist(x)

     as.dist(x, FUN = NULL)
     as.simil(x, FUN = NULL)

     ## S3 method for class 'dist':
     as.matrix(x, diag = NA, ...)
     ## S3 method for class 'simil':
     as.matrix(x, diag = NA, ...)

_A_r_g_u_m_e_n_t_s:

       x: For 'dist' and 'simil', a numeric matrix object, a data
          frame, or a list. A vector will be converted into a column
          matrix. For 'as.simil' and 'as.dist', an object of class
          'dist' and 'simil', respectively, or a numeric matrix. For
          'pr_dist2simil' and 'pr_simil2dist', any numeric vector.

       y: 'NULL', or an object of the same kind as 'x'.

  method: a function, or a mnemonic string referencing the proximity
          measure. A list of all available measures can be obtained
          using 'pr_DB' (see examples). The default for 'dist' is
          '"Euclidean"', and for 'simil' '"correlation"'.

    diag: logical value indicating whether the diagonal of the
          distance/similarity matrix should be printed by
          'print.dist'/'print.simil'.

          In the context of 'as.matrix' the value to use on the
          diagonal representing self-proximities.

   upper: logical value indicating whether the upper triangle of the
          distance/similarity matrix should be printed by
          'print.dist'/'print.simil'.

pairwise: logical value indicating whether proximities should be
          computed only for corresponding pairs, i.e., between the i-th
          row (column) of 'x' and the i-th row (column) of 'y'.

 by_rows: logical indicating whether proximities should be computed
          between rows ('TRUE') or between columns ('FALSE').

convert_similarities, convert_distances: logical indicating whether
          distances should be automatically converted into similarities
          (and the other way round) if needed.

auto_convert_data_frames: logical indicating whether data frames should
          be converted to matrices if all variables are numeric, or all
          are logical, or all are complex.

     FUN: optional function to be used by 'as.dist' and 'as.simil'. If
          'NULL', it is looked up in the method registry. If there is
          none specified there, 'FUN' defaults to 'pr_simil2dist' and
          'pr_dist2simil', respectively.

     ...: further arguments passed to the proximity function.
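
     The selections made by 'pairwise' and 'by_rows' can be sketched in
     base R for Euclidean distance. This is an illustration under that
     assumption, not the proxy code; 'euc', 'full', 'pairwise_d' and
     'col_d' are hypothetical names. Pairwise results are the diagonal
     of the full cross-distance matrix, and column proximities equal
     row proximities of the transposed matrix.

```r
## Hypothetical sketch (base R, Euclidean distance) of the pairs that
## 'pairwise' and 'by_rows' select; not the proxy implementation.
euc <- function(a, b) sqrt(sum((a - b)^2))

x <- matrix(1:6, nrow = 3)          # rows (1,4), (2,5), (3,6)
y <- matrix(1, nrow = 3, ncol = 2)  # three rows of (1,1)

## Full cross-distance matrix: every row of x against every row of y
full <- outer(seq_len(nrow(x)), seq_len(nrow(y)),
              Vectorize(function(i, j) euc(x[i, ], y[j, ])))

## pairwise = TRUE keeps only x[i, ] versus y[i, ], i.e. the diagonal
pairwise_d <- sapply(seq_len(nrow(x)), function(i) euc(x[i, ], y[i, ]))

## by_rows = FALSE compares columns, which equals comparing rows of t(x)
col_d <- stats::dist(t(x))
```

     Computing only the n pairwise values avoids the n x m work of the
     full cross matrix when just the diagonal is needed.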

_D_e_t_a_i_l_s:

     The interface is modeled on 'dist' in package 'stats', but can also
     compute cross-distances, and allows user extensions by means of a
     registry of all proximity measures (see 'pr_DB').

     Missing values are allowed: for each pair of rows, any column in
     which either row has a missing value is excluded from the
     computation. If some columns are excluded in calculating a
     Euclidean, Manhattan, Canberra or Minkowski distance, the sum is
     scaled up proportionally to the number of columns used (compare
     'dist' in package 'stats').
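
     The scaling rule can be checked against 'stats::dist' with a small
     base-R sketch for the Euclidean case. The helper
     'scaled_euclidean' is hypothetical: it drops the columns missing
     in either row, then multiplies the partial sum of squares by
     (number of columns) / (number of columns used) before taking the
     square root.

```r
## Hypothetical helper showing the NA rule for Euclidean distance:
## drop columns missing in either row, then rescale the partial sum.
scaled_euclidean <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)          # columns usable for this pair
  partial <- sum((a[ok] - b[ok])^2)    # sum over usable columns only
  sqrt(partial * length(a) / sum(ok))  # scale up to all columns
}

x <- rbind(c(1, NA, 3),
           c(4,  5, 6))

scaled_euclidean(x[1, ], x[2, ])  # sqrt(18 * 3 / 2) = sqrt(27)
stats::dist(x)                    # stats::dist gives the same value
```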

     Data frames are silently coerced to matrix if
     'auto_convert_data_frames = TRUE' and all columns are of the same
     mode, 'numeric', 'logical' or 'complex'.
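
     The coercion rule stated for 'auto_convert_data_frames' (all
     columns numeric, or all logical, or all complex) can be sketched as
     follows. 'coerce_if_homogeneous' is a hypothetical helper written
     for illustration; it is not the package's own check.

```r
## Hypothetical helper mirroring the documented rule: coerce a data
## frame to a matrix only if all columns share one permitted mode.
coerce_if_homogeneous <- function(df) {
  modes <- vapply(df, function(col) {
    if (is.logical(col)) "logical"
    else if (is.complex(col)) "complex"
    else if (is.numeric(col)) "numeric"
    else "other"                        # e.g. character or factor
  }, character(1))
  if (length(unique(modes)) == 1 && modes[1] != "other") as.matrix(df) else df
}

num_df   <- data.frame(a = c(1, 2), b = c(3, 4))
mixed_df <- data.frame(a = c(1, 2), b = c("u", "v"))

is.matrix(coerce_if_homogeneous(num_df))    # TRUE
is.matrix(coerce_if_homogeneous(mixed_df))  # FALSE
```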

     Distance measures can be used with 'simil', and similarity
     measures with 'dist'. In these cases, the result is transformed
     accordingly using the specified conversion functions (defaults:
     pr_simil2dist(s) = 1 - s and pr_dist2simil(d) = 1 / (1 + d)).
     Objects of class 'simil' and 'dist' can be converted into each
     other using 'as.dist' and 'as.simil', respectively.
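
     The default conversions are 1 / (1 + d) from distance to
     similarity and 1 - s from similarity to distance. The sketch below
     re-implements them under the hypothetical names 'to_simil' and
     'to_dist' to show that they are not inverses of each other: a
     round trip maps d to d / (1 + d), which preserves the ordering of
     distances but not their values.

```r
## Re-implementations of the documented default conversions under
## hypothetical names (the package's own are pr_dist2simil/pr_simil2dist).
to_simil <- function(d) 1 / (1 + d)   # distance -> similarity
to_dist  <- function(s) 1 - s         # similarity -> distance

d <- c(0, 1, 3)
s <- to_simil(d)   # 1.00 0.50 0.25
to_dist(s)         # 0.00 0.50 0.75, not the original distances
```

     When exact round trips matter, a custom pair of functions can be
     supplied instead, e.g. via the 'FUN' argument of 'as.dist' and
     'as.simil'.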

     Distance and similarity objects can conveniently be subsetted 
     (see examples). Note that duplicate indexes are silently ignored.
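
     The subsetting semantics can be approximated in base R by expanding
     the triangle to a full matrix, selecting the unique indexes, and
     folding back. 'subset_dist' is a hypothetical helper for
     illustration; the package's own subsetting methods may be
     implemented differently.

```r
## Hypothetical base-R sketch of subsetting a 'dist' object by
## observation indexes; duplicate indexes are dropped first.
subset_dist <- function(d, idx) {
  idx <- unique(idx)                       # duplicates are ignored
  stats::as.dist(as.matrix(d)[idx, idx])   # back to a lower triangle
}

x <- matrix(c(0, 3, 6, 0, 4, 8), ncol = 2)  # rows (0,0), (3,4), (6,8)
d <- stats::dist(x)

subset_dist(d, c(1, 2))     # distance between observations 1 and 2
subset_dist(d, c(1, 2, 2))  # same result: the repeated 2 is ignored
```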

_V_a_l_u_e:

     Auto-distances/similarities are returned as an object of class
     'dist'/'simil', and cross-distances/similarities as an object of
     class 'crossdist'/'crosssimil'.

_A_u_t_h_o_r(_s):

     David Meyer <David.Meyer@R-project.org> and Christian Buchta
     <Christian.Buchta@wu-wien.ac.at>

_R_e_f_e_r_e_n_c_e_s:

     Anderberg, M.R. (1973), _Cluster analysis for applications_, 359
     pp., Academic Press, New York, NY, USA.

     Cox, T.F. and Cox, M.A.A. (2001), _Multidimensional Scaling_,
     Chapman and Hall.

     Sokal, R.R. and Sneath, P.H.A. (1963), _Principles of Numerical
     Taxonomy_, W. H. Freeman and Co., San Francisco.

_S_e_e _A_l_s_o:

     'dist' in package 'stats' for compatibility information.

_E_x_a_m_p_l_e_s:

     ### show available proximities
     summary(pr_DB)

     ### binary data
     x <- matrix(sample(c(FALSE, TRUE), 8, replace = TRUE), ncol = 2)
     dist(x, method = "Jaccard")

     ### for real-valued data
     dist(x, method = "eJaccard")

     ### for positive real-valued data
     dist(x, method = "fJaccard")

     ### cross distances
     dist(x, x, method = "Jaccard")

     ### pairwise (diagonal)
     dist(x, x, method = "Jaccard", pairwise = TRUE)

     ### same values as the cross distances dist(x, x, method = "Jaccard"), but less efficient
     as.matrix(stats::dist(x, method = "binary"))

     ### numeric data
     x <- matrix(rnorm(16), ncol = 4)

     ## test inheritance of names
     rownames(x) <- LETTERS[1:4]
     colnames(x) <- letters[1:4]
     dist(x)
     dist(x, x)

     ## custom distance function
     f <- function(x, y) sum(x * y)
     dist(x, method = f)

     ## working with lists
     z <- unlist(apply(x, 1, list), recursive = FALSE)
     (d <- dist(z))
     dist(z, z)

     ## subsetting
     d[[1:2]]
     subset(d, c(1,3,4))
     d[[c(1,2,2)]]       # duplicate index gets ignored

     ## transformations and self-proximities
     as.matrix(as.simil(d, function(x) exp(-x)), diag = 1)

     ## row and column indexes
     row.dist(d)
     col.dist(d)

