vegdist                package:vegan                R Documentation

_D_i_s_s_i_m_i_l_a_r_i_t_y _I_n_d_i_c_e_s _f_o_r _C_o_m_m_u_n_i_t_y _E_c_o_l_o_g_i_s_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     The function computes dissimilarity indices that are useful for or
     popular with community ecologists. Gower, Bray-Curtis, Jaccard and
     Kulczynski indices are good at detecting underlying ecological
     gradients (Faith et al. 1987). Morisita, Horn-Morisita and
     Binomial indices should be able to handle different sample sizes
     (Wolda 1981, Krebs 1999, Anderson & Millar 2004), and Mountford
     (1962) and Raup-Crick indices for presence-absence data should be
     able to handle unknown (and variable) sample sizes.

_U_s_a_g_e:

     vegdist(x, method="bray", binary=FALSE, diag=FALSE, upper=FALSE,
             na.rm = FALSE, ...) 

_A_r_g_u_m_e_n_t_s:

       x: Community data matrix.

  method: Dissimilarity index, partial match to '"manhattan"',
          '"euclidean"', '"canberra"', '"bray"', '"kulczynski"',
          '"jaccard"', '"gower"', '"morisita"', '"horn"',
          '"mountford"', '"raup"' or '"binomial"'.

  binary: Perform presence/absence standardization before analysis
          using 'decostand'.

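    diag: Include the diagonal when the result is printed (stored in
          the returned 'dist' object).

   upper: Include the upper triangle when the result is printed
          (stored in the returned 'dist' object).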

   na.rm: Pairwise deletion of missing observations when computing
          dissimilarities.

     ...: Other parameters.  These are ignored, except with 'method =
          "gower"', which accepts the 'range.global' parameter of
          'decostand'.

_D_e_t_a_i_l_s:

     Jaccard ('"jaccard"'), Mountford ('"mountford"'), Raup-Crick
     ('"raup"') and Binomial indices are discussed below. The other
     indices are defined as:

       'euclidean'   d[jk] = sqrt(sum((x[ij]-x[ik])^2))
       'manhattan'   d[jk] = sum(abs(x[ij]-x[ik]))
       'gower'       d[jk] = sum(abs(x[ij]-x[ik])/(max(x[i])-min(x[i])))
       'canberra'    d[jk] = (1/NZ) sum(abs(x[ij]-x[ik])/(x[ij]+x[ik]))
                     where NZ is the number of non-zero entries.
       'bray'        d[jk] = sum(abs(x[ij]-x[ik])) / sum(x[ij]+x[ik])
       'kulczynski'  d[jk] = 1 - 0.5*(sum(min(x[ij],x[ik]))/sum(x[ij]) + sum(min(x[ij],x[ik]))/sum(x[ik]))
       'morisita'    d[jk] = 1 - 2*sum(x[ij]*x[ik])/((lambda[j]+lambda[k]) * sum(x[ij])*sum(x[ik]))
                     where lambda[j] = sum(x[ij]*(x[ij]-1))/(sum(x[ij])*(sum(x[ij])-1))
       'horn'        Like 'morisita', but lambda[j] = sum(x[ij]^2)/(sum(x[ij])^2)
       'binomial'    d[jk] = sum((x[ij]*log(x[ij]/n[i]) + x[ik]*log(x[ik]/n[i]) - n[i]*log(1/2))/n[i])
                     where n[i] = x[ij] + x[ik], and 0*log(0) is taken as 0
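     The quantitative formulas above can be read as ordinary R code on
     two site vectors.  The following is a minimal base-R sketch, not
     the code used inside 'vegdist': the names 'bray_sketch',
     'kulczynski_sketch' and 'canberra_sketch' are hypothetical, each
     argument holds the species abundances of one site, and the
     Canberra helper assumes that NZ counts the species present in at
     least one of the two sites.

```r
# Hypothetical helpers: base-R readings of three formulas in the table.
# xj and xk are non-negative abundance vectors for sites j and k
# (one element per species i). Not vegan's internal implementation.

bray_sketch <- function(xj, xk) {
  sum(abs(xj - xk)) / sum(xj + xk)
}

kulczynski_sketch <- function(xj, xk) {
  shared <- sum(pmin(xj, xk))              # sum of species-wise minima
  1 - 0.5 * (shared / sum(xj) + shared / sum(xk))
}

canberra_sketch <- function(xj, xk) {
  # Assumption: NZ = species present in at least one of the two sites,
  # which also avoids 0/0 terms for species absent from both.
  keep <- (xj + xk) > 0
  terms <- abs(xj[keep] - xk[keep]) / (xj[keep] + xk[keep])
  sum(terms) / sum(keep)
}

xj <- c(10, 0, 5, 3)
xk <- c(4, 2, 5, 0)
bray_sketch(xj, xk)         # 11/29, about 0.379
kulczynski_sketch(xj, xk)   # 1 - 0.5 * (9/18 + 9/11), about 0.341
canberra_sketch(xj, xk)     # (6/14 + 1 + 0 + 1)/4, about 0.607
```

     With 'vegan' loaded, we would expect 'vegdist(rbind(xj, xk))' to
     give the same Bray-Curtis value as 'bray_sketch'.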

     Jaccard index is computed as 2B/(1+B), where B is Bray-Curtis
     dissimilarity.
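     Because 2B/(1+B) increases monotonically with B, the Jaccard index
     keeps the rank order of Bray-Curtis.  The base-R sketch below
     (hypothetical helper 'bray_to_jaccard') also checks numerically
     that the quantitative form equals 1 - sum(min(x[ij],x[ik])) /
     sum(max(x[ij],x[ik])), the form usually attributed to Růžička;
     this follows algebraically from the Bray-Curtis formula.

```r
# Hypothetical helper: Jaccard from Bray-Curtis, J = 2B / (1 + B).
bray_to_jaccard <- function(b) 2 * b / (1 + b)

# The transformation is monotone, so ranks are unchanged.
b <- c(0, 0.25, 0.5, 1)
jac <- bray_to_jaccard(b)        # 0, 0.4, 2/3, 1

# With S = sum of minima and L = sum of maxima, sum|xj - xk| = L - S
# and sum(xj + xk) = L + S, so 2B/(1 + B) simplifies to 1 - S/L.
xj <- c(10, 0, 5, 3)
xk <- c(4, 2, 5, 0)
bc <- sum(abs(xj - xk)) / sum(xj + xk)
ruzicka <- 1 - sum(pmin(xj, xk)) / sum(pmax(xj, xk))
all.equal(bray_to_jaccard(bc), ruzicka)   # TRUE (both 0.55)
```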

     The Binomial index is derived from the binomial deviance under the
     null hypothesis that the two compared communities are equal. It
     should be able to handle variable sample sizes. The index does not
     have a fixed upper limit, but can vary among sites with no shared
     species. For further discussion, see Anderson & Millar (2004).
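     A minimal base-R sketch of the binomial formula, with the
     hypothetical name 'binomial_sketch'.  It assumes that species
     absent from both sites are skipped and that 0*log(0) is taken as
     0.  Each species found in only one of the sites then contributes
     exactly log(2), so two sites with no shared species get log(2)
     times the number of species they contain together, which is why
     the index has no fixed upper limit.

```r
# Hypothetical helper: binomial deviance dissimilarity for two sites.
binomial_sketch <- function(xj, xk) {
  n <- xj + xk
  keep <- n > 0                            # skip species absent from both
  xj <- xj[keep]; xk <- xk[keep]; n <- n[keep]
  # x * log(x / n) with the convention 0 * log(0) = 0
  xlogx <- function(x, n) ifelse(x > 0, x * log(x / n), 0)
  sum((xlogx(xj, n) + xlogx(xk, n) - n * log(1/2)) / n)
}

binomial_sketch(c(2, 4, 0), c(2, 4, 0))    # identical sites: 0
binomial_sketch(c(3, 0, 0), c(0, 5, 2))    # no shared species: 3 * log(2)
binomial_sketch(c(3, 0), c(0, 5))          # no shared species: 2 * log(2)
```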

     Mountford index is defined as M = 1/alpha where alpha is the
     parameter of Fisher's logseries assuming that the compared
     communities are samples from the same community (cf. 'fisherfit',
     'fisher.alpha'). The index M is found as the positive root of
     equation exp(a*M) + exp(b*M) = 1 + exp((a+b-j)*M), where j is the
     number of species occurring in both communities, and a and b are
     the number of species in each separate community (so the index
     uses presence-absence information). Mountford index is usually
     misrepresented in the literature: Mountford (1962) indeed
     suggested an approximation to be used as a starting value in
     iterations, but the proper index is defined as the root of the
     equation above. The function 'vegdist' solves M with the Newton
     method. Note that if either a or b is equal to j, one of the
     communities is a subset of the other, and the dissimilarity is 0,
     meaning that non-identical objects may be regarded as identical
     and the index is non-metric. The Mountford index is in the range
     0 ... log(2), but the dissimilarities are divided by log(2) so
     that the results will be in the conventional range 0 ... 1.
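     The root can be found with any one-dimensional solver.  The sketch
     below uses base R's 'uniroot' (Brent's method) instead of the
     Newton iteration used by 'vegdist', so it illustrates the equation
     rather than the implementation.  For 0 < j < min(a, b) the
     equation has a root in (0, log(2)): the left-hand side minus the
     right-hand side has slope j > 0 at M = 0 and is negative at M =
     log(2).  The special cases and the final scaling 1 - M/log(2) are
     assumptions chosen to match the text (subsets give 0, results lie
     in 0 ... 1); the name 'mountford_sketch' is hypothetical.

```r
# Hypothetical helper: Mountford dissimilarity from presence/absence counts.
# a, b: species in each site; j: species shared by both.
mountford_sketch <- function(a, b, j) {
  stopifnot(j >= 0, j <= min(a, b))
  if (j == 0) {
    M <- 0                   # no shared species: no positive root
  } else if (j == a || j == b) {
    M <- log(2)              # subset case (assumption: top of the range)
  } else {
    # exp(a*M) + exp(b*M) - 1 - exp((a+b-j)*M), written with expm1()
    # to avoid cancellation for small M
    f <- function(M) expm1(a * M) + expm1(b * M) - expm1((a + b - j) * M)
    M <- uniroot(f, lower = 1e-10, upper = log(2), tol = 1e-12)$root
  }
  1 - M / log(2)             # assumption: scaled to 0 ... 1
}

# a = b = 10, j = 5: with u = exp(5*M) the equation becomes
# u^3 - 2*u^2 + 1 = 0, whose root above 1 is the golden ratio.
M_exact <- log((1 + sqrt(5)) / 2) / 5
mountford_sketch(10, 10, 5)   # close to 1 - M_exact / log(2)
```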

     Raup-Crick dissimilarity ('method = "raup"') is a probabilistic
     index based on presence/absence data.  It is defined as 1 -
     prob(j), that is, it is based on the probability of observing at
     least j species shared by the compared communities.  Legendre &
     Legendre (1998) suggest using simulations to assess the
     probability, but the current function uses the analytic result
     from the hypergeometric distribution ('phyper') instead.  This
     probability (and the index)
     is dependent on the number of species missing in both sites, and
     adding all-zero species to the data or removing missing species
     from the data will influence the index.  The probability (and the
     index) may be almost zero or almost one for a wide range of
     parameter values.  The index is nonmetric: two communities with no
     shared species may have a dissimilarity slightly below one, and
     two identical communities may have dissimilarity slightly above
     zero.
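     A minimal sketch of the hypergeometric approach, assuming that the
     species pool is the number of columns in the data, that the second
     site is treated as a random draw of b species from that pool (a of
     which occur in the first site), and that the index is the
     probability of observing at least j shared species.  The name
     'raup_sketch' is hypothetical and details may differ from
     'vegdist'.  It shows the dependence on the pool: two identical
     sites with three species get 1/choose(10, 3) = 1/120 with 10
     columns, but 1/choose(20, 3) = 1/1140 once ten all-zero species
     are added.

```r
# Hypothetical helper: Raup-Crick style dissimilarity for two sites.
raup_sketch <- function(xj, xk) {
  pj <- xj > 0
  pk <- xk > 0
  P <- length(xj)            # assumption: species pool = number of columns
  a <- sum(pj)               # species in site j
  b <- sum(pk)               # species in site k
  j <- sum(pj & pk)          # shared species
  # Probability of at least j shared species when b species are drawn
  # at random from P, a of which occur in site j:
  # P(X >= j) = 1 - P(X <= j - 1)
  1 - phyper(j - 1, a, P - a, b)
}

x1 <- c(1, 1, 1, rep(0, 7))          # 10 columns, 3 species present
raup_sketch(x1, x1)                  # 1/120: identical, yet above zero
x2 <- c(x1, rep(0, 10))              # add 10 all-zero species
raup_sketch(x2, x2)                  # 1/1140: the pool changed the index
```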

     Morisita index can be used with genuine count data (integers)
     only. Its Horn-Morisita variant is able to handle any abundance
     data.
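     The Morisita and Horn-Morisita indices differ only in lambda.  In
     this base-R sketch (hypothetical name 'morisita_sketch'), computed
     as a dissimilarity, that is 1 minus the overlap, the Morisita
     lambda uses x*(x-1), which is meaningful for counts (for an
     abundance between 0 and 1 the term is negative) and requires site
     totals above one, while the Horn variant uses x^2 and accepts any
     non-negative abundances, such as proportions.

```r
# Hypothetical helper: Morisita (horn = FALSE) or Horn-Morisita
# (horn = TRUE) dissimilarity for two sites.
morisita_sketch <- function(xj, xk, horn = FALSE) {
  Nj <- sum(xj)
  Nk <- sum(xk)
  if (horn) {
    lj <- sum(xj^2) / Nj^2
    lk <- sum(xk^2) / Nk^2
  } else {
    # needs counts; Nj and Nk must exceed 1
    lj <- sum(xj * (xj - 1)) / (Nj * (Nj - 1))
    lk <- sum(xk * (xk - 1)) / (Nk * (Nk - 1))
  }
  1 - 2 * sum(xj * xk) / ((lj + lk) * Nj * Nk)
}

xj <- c(5, 3, 0)
xk <- c(0, 0, 4)
morisita_sketch(xj, xk)                        # no shared species: 1
morisita_sketch(xj, xj, horn = TRUE)           # identical sites: 0
morisita_sketch(xj / 8, xj / 8, horn = TRUE)   # proportions work for Horn: 0
```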

     Euclidean and Manhattan dissimilarities are not good at gradient
     separation without proper standardization but are still included
     for comparison and special needs.

     Bray-Curtis and Jaccard indices are rank-order similar, and some
     other indices become identical or rank-order similar after some 
     standardizations, especially with presence/absence transformation
     or equalizing site totals with 'decostand'. Jaccard index is
     metric, and should probably be preferred to the default
     Bray-Curtis, which is semimetric.

     The naming conventions vary. The one adopted here is traditional
     rather than truthful to priority. The function finds either
     quantitative or binary variants of the indices under the same
     name, which correctly may refer only to one of these
     alternatives. For instance, the Bray index is also known as the
     Steinhaus, Czekanowski and Sørensen index. The quantitative
     version of Jaccard should probably be called the Růžička index.
     The abbreviation '"horn"' for the
     Horn-Morisita index is misleading, since there is a separate Horn
     index. The abbreviation will be changed if that index is
     implemented in 'vegan'.

_V_a_l_u_e:

     Should provide a drop-in replacement for 'dist' and return a
     distance object of the same type.

_N_o_t_e:

     The function is an alternative to 'dist', adding some ecologically
     meaningful indices.  Both methods should produce similar types of
     objects which can be interchanged in any method accepting either. 
     Manhattan and Euclidean dissimilarities should be identical in
     both methods. Canberra index is divided by the number of variables
     in 'vegdist', but not in 'dist'. So these differ by a constant
     multiplier, and the alternative in 'vegdist' is in range (0,1). 
     Function 'daisy' (package 'cluster') provides an alternative
     implementation of the Gower index for mixed data of numeric and
     class variables (but it works for mixed variables only).

     Most dissimilarity indices in 'vegdist' are designed for community
     data, and they will give misleading values if there are negative
     data entries.  The results may also be misleading or 'NA' or 'NaN'
     if there are empty sites.  In principle, you cannot study species
     composition without species, and you should remove empty sites from
     community data.
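     The reason is visible in the Bray-Curtis formula: for two empty
     sites both the numerator and the denominator are zero, and R
     evaluates 0/0 as NaN.  A minimal base-R way to drop empty sites
     before calling 'vegdist' is to keep the rows with a positive
     total; the data and the helper 'bray' below are hypothetical.

```r
# Hypothetical community matrix: sites in rows, species in columns.
x <- rbind(site1 = c(3, 0, 2),
           site2 = c(0, 0, 0),     # empty site
           site3 = c(1, 4, 0))

bray <- function(xj, xk) sum(abs(xj - xk)) / sum(xj + xk)
bray(x["site2", ], x["site2", ])   # 0/0 gives NaN

# Keep only sites that contain at least one species.
x <- x[rowSums(x) > 0, , drop = FALSE]
rownames(x)                        # "site1" "site3"
```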

_A_u_t_h_o_r(_s):

     Jari Oksanen, with contributions from Tyler Smith (Gower index)
     and Michael Bedward (Raup-Crick index).

_R_e_f_e_r_e_n_c_e_s:

     Anderson, M. J. and Millar, R. B. (2004). Spatial variation and
     effects of habitat on temperate reef fish assemblages in
     northeastern New Zealand.  _Journal of Experimental Marine Biology
     and Ecology_ 305, 191-221.

     Faith, D. P., Minchin, P. R. and Belbin, L. (1987). Compositional
     dissimilarity as a robust measure of ecological distance.
     _Vegetatio_ 69, 57-68.

     Krebs, C. J. (1999). _Ecological Methodology._ Addison Wesley
     Longman.

     Legendre, P. and Legendre, L. (1998). _Numerical Ecology_. 2nd
     English Edition. Elsevier.

     Mountford, M. D. (1962). An index of similarity and its
     application to classification problems. In: P. W. Murphy (ed.),
     _Progress in Soil Zoology_, 43-50. Butterworths.

     Wolda, H. (1981). Similarity indices, sample size and diversity.
     _Oecologia_ 50, 296-302.

_S_e_e _A_l_s_o:

     'decostand', 'dist', 'rankindex', 'isoMDS', 'stepacross', 'daisy',
     'dsvdis'.

_E_x_a_m_p_l_e_s:

     data(varespec)
     vare.dist <- vegdist(varespec)
     # Orlóci's chord distance: range 0 .. sqrt(2)
     vare.dist <- vegdist(decostand(varespec, "norm"), "euclidean")
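     The chord distance in the last example is the Euclidean distance
     after scaling each site to unit length, which is what we
     understand 'decostand(varespec, "norm")' to do.  A base-R sketch
     with the hypothetical helper 'chord_sketch' shows the two ends of
     the 0 .. sqrt(2) range: sites with proportional abundances are at
     distance 0, and sites with no shared species at sqrt(2).

```r
# Hypothetical helper: chord distance between the rows of x.
chord_sketch <- function(x) {
  xn <- x / sqrt(rowSums(x^2))    # each row (site) scaled to unit length
  dist(xn, method = "euclidean")  # Euclidean distance of scaled rows
}

x <- rbind(c(1, 0, 0),
           c(0, 2, 0),
           c(3, 0, 0))
d <- as.matrix(chord_sketch(x))
d[1, 2]   # no shared species: sqrt(2)
d[1, 3]   # proportional abundances: 0
```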

