cca                  package:vegan                  R Documentation

[_P_a_r_t_i_a_l] [_C_o_n_s_t_r_a_i_n_e_d] _C_o_r_r_e_s_p_o_n_d_e_n_c_e _A_n_a_l_y_s_i_s _a_n_d _R_e_d_u_n_d_a_n_c_y
_A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Function 'cca' performs correspondence analysis, or optionally
     constrained correspondence analysis (a.k.a. canonical
     correspondence analysis), or optionally partial constrained
     correspondence analysis. Function 'rda' performs redundancy
     analysis, or optionally principal components analysis. These are
     all very popular ordination techniques in community ecology.

_U_s_a_g_e:

     ## S3 method for class 'formula':
     cca(formula, data)
     ## Default S3 method:
     cca(X, Y, Z, ...)
     ## S3 method for class 'formula':
     rda(formula, data, scale=FALSE)
     ## Default S3 method:
     rda(X, Y, Z, scale=FALSE, ...)
     ## S3 method for class 'cca':
     summary(object, scaling=2, axes=6, digits, ...)

_A_r_g_u_m_e_n_t_s:

 formula: Model formula, where the left hand side gives the community
          data matrix, right hand side gives the constraining
          variables, and conditioning variables can be given within a
          special function 'Condition'.

    data: Data frame containing the variables on the right hand side of
          the model formula.

       X: Community data matrix. 

       Y: Constraining matrix, typically of environmental variables.
          Can be missing. 

       Z: Conditioning matrix, the effect of which is removed
          (`partialled out') before the next step. Can be missing.

  object: A 'cca' result object.

 scaling: Scaling for species and site scores. Either species ('2') or
          site ('1') scores are scaled by eigenvalues, and the other
          set of scores is left unscaled, or with '3' both are scaled
          symmetrically by the square root of eigenvalues.
          Corresponding negative values can be used in 'cca' to
          additionally multiply results with sqrt(1/(1-lambda)).  This
          scaling is known as Hill scaling (although it has nothing to
          do with Hill's rescaling of 'decorana'). With corresponding
          negative values in 'rda', species scores are divided by the
          standard deviation of each species. Unscaled raw scores
          stored in the result can be accessed with 'scaling = 0'.

    axes: Number of axes in summaries.

  digits: Number of digits in output.

   scale: Scale species to unit variance (like correlations do).

     ...: Other parameters for 'print' or 'plot' functions.

_D_e_t_a_i_l_s:

     Since their introduction (ter Braak 1986), constrained or
     canonical correspondence analysis, and its spin-off, redundancy
     analysis, have been the most popular ordination methods in
     community ecology. Functions 'cca' and 'rda' are similar to the
     popular proprietary software 'Canoco', although the implementation
     is completely different.  The functions are based on Legendre &
     Legendre's (1998) algorithm: in 'cca', the Chi-square transformed
     data matrix is subjected to weighted linear regression on the
     constraining variables, and the fitted values are submitted to
     correspondence analysis performed via singular value decomposition
     ('svd'). Function 'rda' is similar, but uses ordinary, unweighted
     linear regression and unweighted SVD.
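
     The steps of the 'cca' algorithm described above can be sketched
     in base R.  This is only an illustration under stated
     assumptions, not the code used in the package: the toy matrices
     'X' and 'Y' are made up, and the names 'Qbar', 'Yw' and 'Qfit'
     are chosen here for the example.

     ```r
     # Hypothetical toy data: 6 sites x 4 species and one constraint
     X <- matrix(c(10, 0, 3, 1,
                    8, 1, 4, 0,
                    5, 3, 2, 2,
                    2, 6, 1, 4,
                    1, 8, 0, 5,
                    0, 9, 1, 7), nrow = 6, byrow = TRUE)
     Y <- cbind(env = c(1.0, 1.5, 2.7, 3.9, 4.4, 5.2))

     P  <- X / sum(X)            # proportions
     rw <- rowSums(P)            # site weights
     cw <- colSums(P)            # species weights
     E  <- outer(rw, cw)         # expected proportions under independence
     Qbar <- (P - E) / sqrt(E)   # Chi-square transformed data

     # Weighted regression written as ordinary least squares: centre the
     # constraints by their weighted means, then multiply by sqrt(weights)
     Yw   <- sweep(Y, 2, colSums(Y * rw)) * sqrt(rw)
     Qfit <- qr.fitted(qr(Yw), Qbar)

     # SVD of the fitted values; squared singular values are the
     # constrained eigenvalues, to be compared with the total inertia
     eig   <- svd(Qfit)$d^2
     total <- sum(Qbar^2)
     eig
     total
     ```

     With a single constraint only the first eigenvalue is expected to
     differ from zero, and it cannot exceed the total inertia.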

     The functions can be called either with matrix entries for
     community data and constraints, or with the formula interface.  In
     general, the formula interface is preferred, because it allows
     better control of the model and allows factor constraints.

     In the matrix interface, the community data matrix 'X' must be
     given, but any other data matrix can be omitted, and the
     corresponding stage of analysis is skipped.  If matrix 'Z' is
     supplied, its
     effects are removed from the community matrix, and the residual
     matrix is submitted to the next stage.  This is called `partial'
     correspondence or redundancy analysis.  If matrix 'Y' is supplied,
     it is used to constrain the ordination, resulting in constrained
     or canonical correspondence analysis, or redundancy analysis.
     Finally, the residual is submitted to ordinary correspondence
     analysis (or principal components analysis).  If both matrices 'Z'
     and 'Y' are missing, the data matrix is analysed by ordinary
     correspondence analysis (or principal components analysis).

     Instead of separate matrices, the model can be defined using a
     model 'formula'.  The left hand side must be the community data
     matrix ('X').  The right hand side defines the constraining model.
     The constraints can contain ordered or unordered factors,
     interactions among variables and functions of variables.  The
     defined 'contrasts' are honoured in 'factor' variables.  The
     formula can include a special term 'Condition' for conditioning
     variables (``covariables'') ``partialled out'' before analysis. 
     So the following commands are equivalent: 'cca(X, y, z)', 'cca(X ~
     y + Condition(z))', where 'y' and 'z' refer to single variable
     constraints and conditions.
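
     A minimal check of this equivalence, assuming the 'vegan' package
     and its 'varespec' and 'varechem' data sets are available.  The
     choice of 'Al' as constraint and 'pH' as condition is arbitrary,
     and the object names 'm.mat' and 'm.frm' are made up for the
     example.

     ```r
     library(vegan)   # assumes the vegan package is installed
     data(varespec)
     data(varechem)

     # Matrix interface: one-column data frames keep the variable names
     m.mat <- cca(varespec,
                  varechem[, "Al", drop = FALSE],
                  varechem[, "pH", drop = FALSE])

     # Formula interface with the same constraint and condition
     m.frm <- cca(varespec ~ Al + Condition(pH), data = varechem)

     # The constrained eigenvalues of both fits should agree
     all.equal(unname(m.mat$CCA$eig), unname(m.frm$CCA$eig))
     ```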

     Constrained correspondence analysis is indeed a constrained
     method: CCA does not try to display all variation in the data, but
     only the part that can be explained by the constraints used.
     Consequently, the results are strongly dependent on the set of
     constraints and their transformations or interactions among the
     constraints.  The shotgun method is to use all environmental
     variables as constraints.  However, such exploratory problems are
     better analysed with unconstrained methods such as correspondence
     analysis ('decorana', 'ca') or non-metric multidimensional scaling
     ('isoMDS') and environmental interpretation after analysis
     ('envfit', 'ordisurf'). CCA is a good choice if the user has clear
     and strong _a priori_ hypotheses on constraints and is not
     interested in the major structure in the data set.  
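
     The exploratory alternative mentioned above could look like the
     following sketch, which assumes the 'vegan' package is installed.
     It runs an unconstrained correspondence analysis with 'cca' and
     fits the environmental variables afterwards with 'envfit'; the
     seed is arbitrary and only makes the permutations repeatable.

     ```r
     library(vegan)   # assumes the vegan package is installed
     data(varespec)
     data(varechem)

     # No constraints: ordinary correspondence analysis
     ord <- cca(varespec)

     # Interpret the axes afterwards by fitting environmental vectors
     set.seed(1)      # envfit assesses the fit by random permutations
     fit <- envfit(ord, varechem)
     fit

     plot(ord, display = "sites")
     plot(fit)        # adds the fitted vectors to the ordination plot
     ```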

     CCA is able to correct a common curve artefact in correspondence
     analysis by forcing the configuration into linear constraints. 
     However, the curve artefact can be avoided only with a low number
     of constraints that do not have a curvilinear relation with each
     other.  The curve can reappear even with two badly chosen
     constraints or a single factor.  Although the formula interface
     makes it easy to include polynomial or interaction terms, such
     terms often allow the curve artefact to reappear (and are
     difficult to interpret), and should probably be avoided.

     According to folklore, 'rda' should be used with ``short
     gradients'' rather than 'cca'. However, this advice is not
     supported by research, which finds methods based on the Euclidean
     metric uniformly weaker than those based on the Chi-squared
     metric.

     Partial CCA (pCCA; or alternatively partial RDA) can be used to
     remove the effect of some conditioning or ``background'' or
     ``random'' variables or ``covariables'' before CCA proper.  In
     fact, pCCA compares models 'cca(X ~ z)' and 'cca(X ~ y + z)' and
     attributes their difference to the effect of 'y' cleansed of the
     effect of 'z'.  Some people have used the method for extracting
     ``components of variance'' in CCA.  However, if the effect of
     variables together is stronger than the sum of both separately,
     this
     can increase total Chi-square after ``partialling out'' some
     variation, and give negative ``components of variance''.  In
     general, such components of ``variance'' are not to be trusted due
     to interactions between two sets of variables.
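
     The comparison behind pCCA can be checked numerically.  The sketch
     below assumes the 'vegan' package is installed and uses 'Ca' and
     'pH' from 'varechem', as in the examples of this page; the object
     names are made up for the illustration.

     ```r
     library(vegan)   # assumes the vegan package is installed
     data(varespec)
     data(varechem)

     m.z    <- cca(varespec ~ pH, data = varechem)       # condition only
     m.full <- cca(varespec ~ Ca + pH, data = varechem)  # both as constraints
     m.part <- cca(varespec ~ Ca + Condition(pH), data = varechem)

     # Constrained inertia of the partial model against the difference
     # between the two ordinary models
     c(difference = m.full$CCA$tot.chi - m.z$CCA$tot.chi,
       partial    = m.part$CCA$tot.chi)

     # Effect of Ca alone against its effect after removing pH
     m.y <- cca(varespec ~ Ca, data = varechem)
     c(alone = m.y$CCA$tot.chi, after.pH = m.part$CCA$tot.chi)
     ```

     If the second pair shows a larger value after conditioning than
     alone, the naive ``component of variance'' shared by the two
     variables would be negative.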

     The functions have 'summary' and 'plot' methods.  The 'summary'
     method lists all species and site scores, and results may be very
     long.  Palmer (1993) suggested using linear constraints (``LC
     scores'') in ordination diagrams, because these gave better
     results in simulations and site scores (``WA scores'') are a step
     from constrained to unconstrained analysis.  However, McCune
     (1997) showed that noisy environmental variables (and all
     environmental measurements are noisy) destroy ``LC scores''
     whereas ``WA scores'' were little affected.  Therefore the 'plot'
     function uses site scores (``WA scores'') as the default. This is
     consistent with the usage in statistics and other functions in R
     ('lda', 'cancor').
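
     Both kinds of site scores can be extracted and compared with
     'scores'.  The sketch below assumes the 'vegan' package is
     installed; the model with 'Al', 'P' and 'K' is an arbitrary
     choice for illustration.

     ```r
     library(vegan)   # assumes the vegan package is installed
     data(varespec)
     data(varechem)

     m <- cca(varespec ~ Al + P + K, data = varechem)

     # Site scores as weighted averages of species scores (``WA scores'')
     wa <- scores(m, display = "wa", choices = 1:2)
     # Site scores as linear combinations of constraints (``LC scores'')
     lc <- scores(m, display = "lc", choices = 1:2)

     # Unweighted correlation between the two sets, axis by axis
     diag(cor(wa, lc))

     # Plot with LC scores instead of the default WA scores
     plot(m, display = c("sp", "lc"))
     ```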

_V_a_l_u_e:

     Function 'cca' returns a huge object of class 'cca', which is
     described separately in 'cca.object'.

     Function 'rda' returns an object of class 'rda' which inherits
     from class 'cca' and is described in 'cca.object'. The scaling
     used in 'rda' scores is described in a separate vignette with this
     package.

_A_u_t_h_o_r(_s):

     The responsible author was Jari Oksanen, but the code borrows
     heavily from Dave Roberts (<URL: http://labdsv.nr.usu.edu/>).

_R_e_f_e_r_e_n_c_e_s:

     The original method was by ter Braak, but the current
     implementation follows Legendre and Legendre.

     Legendre, P. and Legendre, L. (1998) _Numerical Ecology_. 2nd
     English ed. Elsevier.

     McCune, B. (1997) Influence of noisy environmental data on
     canonical correspondence analysis. _Ecology_ 78, 2617-2623.

     Palmer, M. W. (1993) Putting things in even better order: The
     advantages of canonical correspondence analysis.  _Ecology_ 74,
     2215-2230. 

     Ter Braak, C. J. F. (1986) Canonical Correspondence Analysis: a
     new eigenvector technique for multivariate direct gradient
     analysis. _Ecology_ 67, 1167-1179.

_S_e_e _A_l_s_o:

     There is separate documentation for the 'plot.cca' function and
     its helper functions ('text.cca', 'points.cca', 'scores.cca').
     Function 'anova.cca' provides an ANOVA-like permutation test for
     the ``significance'' of constraints. Automatic model building
     (dangerous!) is discussed in 'deviance.cca'.  Diagnostic tools,
     prediction and adding new points to an ordination are discussed in
     'goodness.cca' and 'predict.cca'. Functions 'CAIV' (package
     'CoCoAn') and 'cca' (package 'ade4') provide alternative
     implementations of CCA (these are internally quite different).
     Function 'capscale' is a non-Euclidean generalization of 'rda'.

_E_x_a_m_p_l_e_s:

     data(varespec)
     data(varechem)
     ## Common but bad way: use all variables you happen to have in your
     ## environmental data matrix
     vare.cca <- cca(varespec, varechem)
     vare.cca
     plot(vare.cca)
     ## Formula interface and a better model
     vare.cca <- cca(varespec ~ Al + P*(K + Baresoil), data=varechem)
     vare.cca
     plot(vare.cca)
     ## `Partialling out' and `negative components of variance'
     cca(varespec ~ Ca, varechem)
     cca(varespec ~ Ca + Condition(pH), varechem)
     ## RDA
     data(dune)
     data(dune.env)
     dune.Manure <- rda(dune ~ Manure, dune.env)
     plot(dune.Manure) 

