distconnected             package:vegan             R Documentation

_C_o_n_n_e_c_t_e_d_n_e_s_s _a_n_d _M_i_n_i_m_u_m _S_p_a_n_n_i_n_g _T_r_e_e _f_o_r _D_i_s_s_i_m_i_l_a_r_i_t_i_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Function 'distconnected' finds groups that are connected
     disregarding dissimilarities that are at or above a threshold or
     'NA'. The function can be used to find groups that can be
     ordinated together or transformed by 'stepacross'. Function
     'no.shared' returns a logical dissimilarity object, where 'TRUE'
     means that sites have no species in common. This is a minimal
     input for 'distconnected', or it can be used to set selected
     dissimilarities to 'NA'. Function 'spantree' finds a minimum spanning
     tree connecting all points, but disregarding dissimilarities that
     are at or above the threshold or 'NA'.

_U_s_a_g_e:

     distconnected(dis, toolong = 1, trace = TRUE)
     no.shared(x)
     spantree(dis, toolong = 1)

_A_r_g_u_m_e_n_t_s:

     dis: Dissimilarity data inheriting from class 'dist', or an object,
          such as a matrix, that can be converted to a dissimilarity
          matrix. Functions 'vegdist' and 'dist' are examples of
          functions that produce suitable dissimilarity data.

 toolong: Shortest dissimilarity regarded as 'NA'. The function uses a
          fuzz factor, so that dissimilarities close to the limit will
          be made 'NA', too. 

   trace: Summarize results of 'distconnected'.

       x: Community data.

_D_e_t_a_i_l_s:

     Data sets are disconnected if they have sample plots or groups of
     sample plots which share no species with other sites or groups of
     sites. Such data sets cannot be sensibly ordinated by any
     unconstrained method, because these subsets cannot be related to
     each other. For instance, correspondence analysis will polarize
     these subsets with eigenvalue 1. Neither can such dissimilarities
     be transformed with 'stepacross', because there is no path between
     all points, and the result will contain 'NA's. Function
     'distconnected' will find such subsets in dissimilarity matrices.
     The function will return a grouping vector that can be used for
     subsetting the data. If data are connected, the result vector will
     be all 1s. The connectedness between two points can be defined
     either by a threshold 'toolong' or using input dissimilarities
     with 'NA's. If 'toolong' is zero or negative, no threshold will be
     used.
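
     The grouping vector can be used for subsetting along the following
     lines. This is a minimal sketch that assumes the 'vegan' package and
     its 'dune' data are available. The threshold 0.4 is artificially
     low and is used for illustration only, and the object names
     'biggest', 'keep', 'dune.sub' and 'dis.sub' are illustrative.

```r
library(vegan)
data(dune)
dis <- vegdist(dune)

## Integer label of the connected group of each site
gr <- distconnected(dis, toolong = 0.4, trace = FALSE)
table(gr)

## Keep only the sites of the largest connected group
biggest <- as.integer(names(which.max(table(gr))))
keep <- gr == biggest
dune.sub <- dune[keep, , drop = FALSE]

## Matching subset of the dissimilarities (no recalculation needed)
dis.sub <- as.dist(as.matrix(dis)[keep, keep])
```

     The subset 'dis.sub' is connected at the same threshold, so it
     can be ordinated or passed on without the links to other groups.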

     Function 'no.shared' returns a 'dist' structure having value
     'TRUE' when two sites have nothing in common, and value 'FALSE'
     when they have at least one shared species. This is a minimal
     structure that can be analysed with 'distconnected'. The function
     can be used to select dissimilarities with no shared species in
     indices which do not have a fixed upper limit.
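
     The idea behind 'no.shared' can be sketched in base R as follows.
     The helper 'no_shared_sketch' and the toy data are hypothetical
     and only illustrate the definition above; this is not the code
     used in vegan.

```r
## Hypothetical helper: TRUE when two sites share no species
no_shared_sketch <- function(x) {
  pa <- (as.matrix(x) > 0) * 1     # presence/absence as 0/1
  shared <- tcrossprod(pa)         # number of shared species per site pair
  as.dist(shared == 0)             # logical 'dist' object
}

## Toy community data: sites in rows, species in columns
x <- matrix(c(1, 0, 0,
              2, 3, 0,
              0, 0, 4),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("sp1", "sp2", "sp3")))
ns <- no_shared_sketch(x)
ns
```

     Sites 'a' and 'b' share 'sp1', so their entry is 'FALSE'; site
     'c' shares nothing with the other two, so its entries are 'TRUE'.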

     Function 'spantree' finds a minimum spanning tree for
     dissimilarities (there may be several minimum spanning trees, but
     the function finds only one). Dissimilarities at or above the
     threshold 'toolong' and 'NA's are disregarded, and the spanning
     tree is found through other dissimilarities. If the data are
     disconnected, the function will return a disconnected tree (or a
     forest), and the corresponding link is 'NA'. The results of
     'spantree' can be overlaid onto an ordination diagram using
     function 'ordispantree'. 
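
     The behaviour described above can be illustrated with a small
     base R sketch of Prim's algorithm on a dense matrix. The helper
     'spantree_sketch' is hypothetical: it ignores the fuzz factor of
     'toolong', and the direction in which it records links may differ
     from that of 'spantree'.

```r
## Hypothetical helper: Prim's algorithm that returns a forest when
## the data are disconnected
spantree_sketch <- function(dis, toolong = 1) {
  m <- as.matrix(dis)
  ## Disregard missing values and, if requested, too long dissimilarities
  drop <- is.na(m)
  if (toolong > 0) drop <- drop | m >= toolong
  m[drop] <- Inf
  n <- nrow(m)
  kid <- rep(NA_integer_, n)    # node to which each node is linked
  dist <- rep(0, n)             # length of that link
  intree <- logical(n)
  best <- rep(Inf, n)           # shortest known link to the tree
  from <- rep(NA_integer_, n)   # tree node giving that shortest link
  node <- 1L
  for (step in seq_len(n)) {
    intree[node] <- TRUE
    if (is.finite(best[node])) {
      kid[node] <- from[node]
      dist[node] <- best[node]
    }
    ## Update the shortest links of the nodes still outside the tree
    better <- !intree & m[node, ] < best
    best[better] <- m[node, better]
    from[better] <- node
    if (all(intree)) break
    cand <- which(!intree)
    node <- cand[which.min(best[cand])]
  }
  ## The first node has no link, so it is omitted
  list(kid = kid[-1], dist = dist[-1])
}

## Two pairs of points with no link between the pairs at toolong = 1
d <- as.dist(matrix(c(0,   0.2, 1,   1,
                      0.2, 0,   1,   1,
                      1,   1,   0,   0.3,
                      1,   1,   0.3, 0), nrow = 4))
res <- spantree_sketch(d, toolong = 1)
res$kid    # 1 NA 3
res$dist   # 0.2 0.0 0.3
```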

     Function 'distconnected' uses depth-first search (Sedgewick 1990).
     Function 'spantree' uses Prim's method implemented as
     priority-first search for dense graphs (Sedgewick 1990).
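
     A depth-first search for connected groups can be sketched in
     base R as follows. The helper 'connected_sketch' is hypothetical
     and does not reproduce the fuzz factor of 'toolong'; it only
     shows how links below the threshold define the groups.

```r
## Hypothetical helper: label connected groups by depth-first search
connected_sketch <- function(dis, toolong = 1) {
  m <- as.matrix(dis)
  ## A link exists for non-missing dissimilarities below the threshold
  link <- !is.na(m)
  if (toolong > 0) link <- link & m < toolong
  diag(link) <- FALSE
  n <- nrow(m)
  group <- integer(n)           # 0 means not yet visited
  current <- 0L
  for (start in seq_len(n)) {
    if (group[start] != 0L) next
    current <- current + 1L     # start a new group
    stack <- start
    while (length(stack) > 0) {
      node <- stack[length(stack)]
      stack <- stack[-length(stack)]
      if (group[node] != 0L) next
      group[node] <- current
      ## Push unvisited neighbours of the current node
      stack <- c(stack, which(link[node, ] & group == 0L))
    }
  }
  group
}

d <- as.dist(matrix(c(0,   0.2, 1,   1,
                      0.2, 0,   1,   1,
                      1,   1,   0,   0.3,
                      1,   1,   0.3, 0), nrow = 4))
grp <- connected_sketch(d, toolong = 1)
grp   # 1 1 2 2
```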

_V_a_l_u_e:

     Function 'distconnected' returns a vector for observations using
     integers to identify connected groups. If the data are connected,
     values will be all '1'. Function 'no.shared' returns an object of
     class 'dist'. Function 'spantree' returns a list with two vectors,
     each of length n-1. The number of links in a tree is one less than
     the number of observations, and the first item is omitted. The
     items are

    kid : The child node of the parent, starting from parent number
          two. If there is no link from the parent, the value will be
          'NA' and the tree is disconnected at the node.

   dist : Corresponding distance. If 'kid = NA', then 'dist = 0'.
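
     The two vectors can be turned into an edge list as sketched
     below. The list 'tr' is a hand-made, hypothetical result for four
     observations in the shape described above, not output from
     'spantree'.

```r
## Hypothetical result for four observations; items refer to nodes 2..4
tr <- list(kid = c(1, NA, 3), dist = c(0.2, 0, 0.3))

## Item i describes the link of node i + 1
edges <- data.frame(node = seq_along(tr$kid) + 1,
                    kid  = tr$kid,
                    dist = tr$dist)

## Missing links mark the places where the tree is disconnected
edges <- edges[!is.na(edges$kid), ]
ntrees <- 1 + sum(is.na(tr$kid))
edges
ntrees   # 2
```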

_N_o_t_e:

     In principle, a minimum spanning tree is equivalent to single
     linkage clustering that can be performed using 'hclust' or
     'agnes'. However, these functions combine clusters with each other,
     and the information on the actually connected points (the "single
     link") cannot be recovered from the result. The graphical output
     of a single linkage clustering plotted with 'ordicluster' will
     look very different from an equivalent spanning tree plotted with
     'ordispantree'.

_A_u_t_h_o_r(_s):

     Jari Oksanen

_R_e_f_e_r_e_n_c_e_s:

     Sedgewick, R. (1990). _Algorithms in C_. Addison Wesley.

_S_e_e _A_l_s_o:

     'vegdist' or 'dist' for getting dissimilarities, 'stepacross' for
     a case where you may need 'distconnected', 'ordispantree' for
     displaying results of 'spantree', and 'hclust' or 'agnes' for
     single linkage clustering.

_E_x_a_m_p_l_e_s:

     ## There are no disconnected data in vegan, and the following uses an
     ## extremely low threshold limit for connectedness. This is for
     ## illustration only, and not a recommended practice.
     library(vegan)
     data(dune)
     dis <- vegdist(dune)
     ord <- cmdscale(dis) ## metric MDS
     gr <- distconnected(dis, toolong=0.4)
     tr <- spantree(dis, toolong=0.4)
     ordiplot(ord, type="n")
     ordispantree(ord, tr, col="red", lwd=2)
     points(ord, cex=1.3, pch=21, col=1, bg = gr)
     ## Set Manhattan dissimilarities to NA for sites with no shared species
     dis <- vegdist(dune, "manhattan")
     is.na(dis) <- no.shared(dune)

