\name{BCI}
\alias{BCI}
\docType{data}
\title{Barro Colorado Island Tree Counts}
\description{
  Tree counts in 1-hectare plots on Barro Colorado Island.
}
\usage{data(BCI)}
\format{
  A data frame with 50 plots (rows) of 1 hectare each, giving counts of
  trees on each plot for a total of 225 species (columns). Full Latin
  names are used for tree species.
 }
\details{
  Data give the numbers of trees at least 10 cm in
  diameter at breast height (1.3 m above the ground) in each one hectare
  square of forest. Within each one hectare square, all individuals of
  all species were tallied and are recorded in this table.

  The data frame contains only the Barro Colorado Island subset of the
  original data.
}
\source{
  \url{http://www.sciencemag.org/cgi/content/full/295/5555/666/DC1}
}
\references{
  Condit, R., Pitman, N., Leigh, E.G., Chave, J., Terborgh, J., Foster,
  R.B., Nunez, P., Aguilar, S., Valencia, R., Villa, G., Muller-Landau,
  H.C., Losos, E. & Hubbell, S.P. (2002). Beta-diversity in tropical
  forest trees. \emph{Science} 295, 666--669.
}
\examples{
data(BCI)
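## A minimal sketch for a first look at the data; it relies only on the
## layout described in Format (plots are rows, species are columns).
dim(BCI)
## Number of trees tallied in each plot:
rowSums(BCI)
## Number of species present in each plot:
rowSums(BCI > 0)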
}
\keyword{datasets}

\eof
\name{anosim}
\alias{anosim}
\alias{print.anosim}
\alias{summary.anosim}
\alias{plot.anosim}

\title{ Analysis of Similarities }
\description{
  Analysis of similarities (ANOSIM) provides a way to test statistically
  whether there is a significant difference between two or more groups
  of sampling units.
}
\usage{
anosim(dis, grouping, permutations=1000, strata)
}

\arguments{
  \item{dis}{Dissimilarity matrix.}
  \item{grouping}{Factor for grouping observations.}
  \item{permutations}{Number of permutations used to assess the significance
    of the ANOSIM statistic. }
  \item{strata}{An integer vector or factor specifying the strata for
    permutation. If supplied, observations are permuted only within the
    specified strata.}
}
\details{
  Analysis of similarities (ANOSIM) provides a way to test statistically
  whether there is a significant difference between two or more groups
  of sampling units.  Function \code{anosim} operates directly on a
  dissimilarity matrix.  A suitable dissimilarity matrix is produced by
  functions \code{\link[mva]{dist}} or \code{\link{vegdist}}.  The
  method is philosophically allied with NMDS ordination
  (\code{\link[MASS]{isoMDS}}), in that it uses only the rank order of
  dissimilarity values.

  If two groups of sampling units are really different in their species
  composition, then compositional dissimilarities between the groups
  ought to be greater than those within the groups.  The \code{anosim}
  statistic \eqn{R} is based on the difference of mean ranks between
  groups (\eqn{r_B}) and within groups (\eqn{r_W}):

  \deqn{R = (r_B - r_W)/(N (N-1) / 4)}

  where \eqn{N} is the number of sampling units.  The divisor is chosen
  so that \eqn{R} will be in the interval \eqn{-1 \dots +1}, with value
  \eqn{0} indicating completely random grouping.

  The statistical significance of the observed \eqn{R} is assessed by
  permuting the grouping vector to obtain the empirical
  distribution of \eqn{R} under the null model.

  The function has \code{summary} and \code{plot} methods.  Both
  show information that helps to assess the validity of the method:  the
  function assumes that all ranked dissimilarities within groups
  have about equal median and range.  The \code{plot} method uses
  \code{\link{boxplot}} with options \code{notch=TRUE} and
  \code{varwidth=TRUE}. 
}
\value{
  The function returns a list of class \code{anosim} with the following items:
  \item{call }{Function call.}
  \item{statistic}{The value of ANOSIM statistic \eqn{R}}
  \item{signif}{Significance from permutation.}
  \item{perm}{Permutation values of \eqn{R}}
  \item{class.vec}{Factor with value \code{Between} for dissimilarities
    between classes and class name for corresponding dissimilarity
    within class.}
  \item{dis.rank}{Rank of dissimilarity entry.}
  \item{dissimilarity}{The name of the dissimilarity index: the
    \code{"method"} entry of the \code{dist} object.}
}
\references{
  Clarke, K. R. (1993). Non-parametric multivariate analysis of changes
  in community structure. \emph{Australian Journal of Ecology} 18, 117--143.
}
\author{Jari Oksanen, with help from Peter R. Minchin.}
\note{
  I don't quite trust this method.  Somebody should study its
  performance carefully.  The function returns a lot of information 
  to ease further scrutiny.

}

\seealso{\code{\link[mva]{dist}} and \code{\link{vegdist}} for obtaining
  dissimilarities, and \code{\link{rank}} for ranking real values.  For
  comparing dissimilarities against continuous variables, see
  \code{\link{mantel}}.  }

\examples{
data(dune)
data(dune.env)
dune.dist <- vegdist(dune)
dune.ano <- with(dune.env, anosim(dune.dist, Management))
summary(dune.ano)
plot(dune.ano)
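## Hand computation of the ANOSIM statistic as a rough cross-check.
## A sketch only: it assumes that N is the number of sampling units and
## that ranks are taken over all pairwise dissimilarities (see Details).
## Names d, N, grp, idx, within, r and R.hand are used only here.
d <- as.vector(dune.dist)             # lower triangle, column by column
N <- attr(dune.dist, "Size")
grp <- dune.env$Management
## Pair indices in the same column-major order as the 'dist' object
idx <- which(lower.tri(matrix(0, N, N)), arr.ind = TRUE)
within <- grp[idx[, "row"]] == grp[idx[, "col"]]
r <- rank(d)
R.hand <- (mean(r[!within]) - mean(r[within])) / (N * (N - 1) / 4)
R.hand
dune.ano$statistic                    # should agree with R.hand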
}
\keyword{multivariate }
\keyword{ nonparametric }
\keyword{ htest }

\eof
\name{anova.cca}
\alias{anova.cca}
\alias{permutest.cca}
\alias{print.permutest.cca}
\alias{print.anova.cca}

\title{Permutation Test for Constrained Correspondence Analysis,
  Redundancy Analysis and Constrained Analysis of Principal Coordinates }
\description{
  The function performs an ANOVA-like permutation test for Constrained
  Correspondence Analysis (\code{\link{cca}}), Redundancy Analysis
  (\code{\link{rda}}) or Constrained Analysis of Principal Coordinates
  (\code{\link{capscale}}) to assess the significance of constraints.
}
\usage{
\method{anova}{cca}(object, alpha=0.05, beta=0.01, step=100, perm.max=10000, ...)
permutest.cca(x, permutations=100, model=c("reduced","full"), strata)
}

\arguments{
  \item{object,x}{A result object from \code{\link{cca}}. }
  \item{alpha}{Targeted Type I error rate. }
  \item{beta}{Accepted Type II error rate. }
  \item{step}{Number of permutations during one step. }
  \item{perm.max}{Maximum number of permutations. }
  \item{\dots}{Parameters to \code{permutest.cca}. }
  \item{permutations}{Number of permutations for assessing significance
    of constraints.}
  \item{model}{Permutation model (partial match).}
  \item{strata}{An integer vector or factor specifying the strata for
    permutation. If supplied, observations are permuted only within the
    specified strata.}
}
\details{
  Functions \code{anova.cca} and \code{permutest.cca} implement an
  ANOVA-like permutation test for the joint effect of constraints in
  \code{\link{cca}}, \code{\link{rda}} or \code{\link{capscale}}.
  Functions \code{anova.cca} and \code{permutest.cca} differ in printout
  style and in interface.
  Function \code{permutest.cca} is the proper workhorse, but
  \code{anova.cca} passes all parameters to \code{permutest.cca}.

  In \code{anova.cca} the number of permutations is controlled by the
  targeted ``critical'' \eqn{P} value (\code{alpha}) and the accepted Type
  II or rejection error (\code{beta}).  Permutations are run in batches
  of \code{step}.  If the current estimate of \eqn{P} differs from the
  targeted \code{alpha} at the risk level given by \code{beta}, the
  permutations are terminated.  If the estimate does not differ
  significantly from \code{alpha}, the permutations are continued with
  \code{step} new permutations, up to a maximum of \code{perm.max}.
  
  The function \code{permutest.cca} implements a permutation test for
  the ``significance'' of constraints in \code{\link{cca}},
  \code{\link{rda}} or \code{\link{capscale}}.  Residuals after
  partial CCA/RDA/CAP are permuted with choice \code{model = "reduced"},
  and residuals after CCA/RDA/CAP under choice \code{model = "full"}.
  If there is no partial CCA/RDA/CAP stage, the former simply permutes
  the data. The test statistic is ``pseudo-\eqn{F}'', which is the ratio
  of constrained and unconstrained total Inertia (Chi-squares, variances
  or something similar), each divided by their respective ranks.  In plain
  CCA/RDA/CAP under \code{reduced} model, the community data is permuted, and
  the sum of all eigenvalues
  remains constant, so that pseudo-\eqn{F} and eigenvalues would give
  equal results.  In partial CCA/RDA/CAP, the effect of conditioning variables
  (``covariables'') is removed before permutation, and these residuals
  are added to the non-permuted fitted values of partial CCA (fitted
  values of \code{X ~ Z}).  Consequently, the total Chi-square is not
  fixed, and test based on pseudo-\eqn{F} would differ from the test based on
  plain eigenvalues.
}
\value{
  Function \code{permutest.cca} returns an object of class
  \code{permutest.cca} which has its own \code{print} method.  The
  function \code{anova.cca} calls \code{permutest.cca}, fills an
  \code{\link{anova}} table and uses \code{\link{print.anova}} for printing.
}
\references{
  Legendre, P. and Legendre, L. (1998). \emph{Numerical Ecology}. 2nd English
  ed. Elsevier.
}
\author{Jari  Oksanen}
\seealso{\code{\link{cca}}, \code{\link{rda}}, \code{\link{capscale}}. }

\examples{
data(varespec)
data(varechem)
vare.cca <- cca(varespec ~ Al + P + K, varechem)
anova(vare.cca)
permutest.cca(vare.cca)
## Test for adding variable N to the previous model:
anova(cca(varespec ~ N + Condition(Al + P + K), varechem), step=40)
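## Sketch: permutest.cca can also be called directly to choose the number
## of permutations and the permutation model described in Details.
## The value 200 is an arbitrary choice for illustration.
vare.perm <- permutest.cca(vare.cca, permutations = 200, model = "full")
vare.perm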
}
\keyword{ multivariate }
\keyword{ htest }


\eof
\name{bioenv}
\alias{bioenv}
\alias{bioenv.default}
\alias{bioenv.formula}
\alias{print.bioenv}
\alias{summary.bioenv}
\alias{print.summary.bioenv}
\alias{ripley.subsets}
\alias{ripley.subs}

\title{Best subset of environmental variables with
  maximum (rank) correlation with community dissimilarities. }
\description{
  Function finds the best subset of environmental variables, so that
  the Euclidean distances of scaled environmental variables have the
  highest (rank) correlation with community dissimilarities.  
}
\usage{
\method{bioenv}{default}(comm, env, method = "spearman", index = "bray",
upto = ncol(env), ...)
\method{bioenv}{formula}(formula, data, ...)
}

\arguments{
  \item{comm}{Community data frame. }
  \item{env}{Data frame of continuous environmental variables. }
  \item{method}{The correlation method used in \code{\link{cor.test}}.}
  \item{index}{The dissimilarity index used for community data in
    \code{\link{vegdist}}. }
  \item{upto}{Maximum number of parameters in studied subsets.}
  \item{formula, data}{Model \code{\link{formula}} and data.}
  \item{...}{Other parameters passed to function.}
}
\details{
  The function calculates a community dissimilarity matrix using
  \code{\link{vegdist}}.  Then it selects all possible subsets of
  environmental variables, \code{\link{scale}}s the variables, and
  calculates Euclidean distances for this subset using
  \code{\link{dist}}.  Then it finds the correlation between community
  dissimilarities and environmental distances, and for each size of
  subsets, saves the best result. 
  There are \eqn{2^p-1} subsets of \eqn{p} variables, and exhaustive
  search may take a very, very, very long time (parameter \code{upto} offers a
  partial relief). 

  The function can be called with a model \code{\link{formula}} where
  the LHS is the data matrix and RHS lists the environmental variables.
  The formula interface is practical in selecting or transforming
  environmental variables.

  Clarke & Ainsworth (1993) suggested this method to be used for
  selecting the best subset of environmental variables in interpreting
  results of nonmetric multidimensional scaling (NMDS). They recommended a
  parallel display of NMDS of the community dissimilarities and NMDS of
  Euclidean distances for the best subset of scaled environmental
  variables.  They warned against the use of Procrustes analysis, but
  to me this looks like a good way of comparing these two ordinations.

  Clarke & Ainsworth wrote a computer program BIO-ENV giving the name to
  the current function. Presumably BIO-ENV
  was later incorporated in Clarke's PRIMER software (available for
  Windows).  In addition, Clarke & Ainsworth suggested a novel method of
  rank correlation which is not available in the current function.
}
\value{
  The function returns an object of class \code{bioenv} with a
  \code{summary} method.
}
\references{
  Clarke, K. R. & Ainsworth, M. (1993). A method of linking multivariate
  community structure to environmental variables. \emph{Marine Ecology
    Progress Series}, 92, 205--219.
}
\author{ Jari Oksanen. The code for selecting all possible subsets was
  posted to the R mailing list by Prof. B. D. Ripley in 1999. }
\note{
  Function \code{\link{cor.test}} will give harmless warnings on ties.
}

\seealso{\code{\link{vegdist}},
  \code{\link{dist}}, \code{\link{cor.test}} for underlying routines,
  \code{\link[MASS]{isoMDS}} for ordination, \code{\link{procrustes}}
  for Procrustes analysis, \code{\link{protest}} for an alternative, and
  \code{\link{rankindex}} for studying alternatives to the default
  Bray-Curtis index.}

\examples{
# The method is very slow for large number of possible subsets.
# Therefore only 6 variables in this example.
data(varespec)
data(varechem)
sol <- bioenv(wisconsin(varespec) ~ log(N) + P + K + Ca + pH + Al, varechem)
sol
summary(sol)
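## Why the search is slow: a sketch of the subset count given in Details.
## With p variables there are 2^p - 1 non-empty subsets.
p <- 6
sum(choose(p, 1:p))   # 63 subsets, i.e. 2^6 - 1
## Restricting subset size (argument 'upto') reduces the count:
sum(choose(p, 1:3))   # 41 subsets of at most 3 variables
## One step of the search done by hand for a single subset (log(N) and P),
## assuming Bray-Curtis dissimilarities and Spearman correlation as in the
## defaults. Names comm.dis and env.dis are used only for this illustration.
comm.dis <- vegdist(wisconsin(varespec))
env.dis <- dist(scale(cbind(log(varechem$N), varechem$P)))
cor(as.vector(comm.dis), as.vector(env.dis), method = "spearman")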
}
\keyword{ multivariate }


\eof
\name{capscale}
\alias{capscale}

\title{[Partial] Constrained Analysis of Principal Coordinates }
\description{
  Constrained Analysis of Principal Coordinates (CAP) is an ordination method
  similar to Redundancy Analysis (\code{\link{rda}}), but it allows
  non-Euclidean dissimilarity indices, such as Manhattan or
  Bray--Curtis distance. Despite this non-Euclidean feature, the analysis
  is strictly linear and metric. If called with Euclidean distance,
  the results are identical to \code{\link{rda}}, but \code{capscale}
  will be much less efficient. Function \code{capscale} may be useful
  with other dissimilarity measures, since the Euclidean distances inherent in
  \code{\link{rda}} are generally poor with community data.
}
\usage{
capscale(formula, data, distance = "euclidean", comm = NULL, ...)
}

\arguments{
  \item{formula}{Model formula. The function can be called only with the
  formula interface. Most usual features of \code{\link{formula}} hold,
  especially as defined in \code{\link{cca}} and \code{\link{rda}}. The
  LHS must be either a community data matrix or a dissimilarity matrix,
  e.g., from
  \code{\link{vegdist}} or \code{\link[mva]{dist}}.
  If the LHS is a data matrix, function \code{\link{vegdist}}
  will be used to find the dissimilarities. RHS defines the constraints.
  The constraints can be continuous or factors, they can be transformed
  within the formula, and they can have interactions as in typical
  \code{\link{formula}}. The RHS can have a special term \code{Condition}
  that defines variables ``partialled out'' before constraints, just like
  in \code{\link{rda}} or \code{\link{cca}}. This allows the use of
  partial CAP.}
\item{data}{ Data frame containing the variables on the right hand side of
  the model formula. }
  \item{distance}{Dissimilarity (or distance) index  in
    \code{\link{vegdist}} used if the LHS of the \code{formula} is a
    data frame instead of dissimilarity matrix. }
  \item{comm}{ Community data frame which will be used for finding
    species scores when the LHS of the \code{formula} was a
    dissimilarity matrix. This is not used if the LHS is a data
    frame. If this is not supplied, the ``species scores'' are the axes
    of initial metric scaling (\code{\link[mva]{cmdscale}}) and may be confusing.}
  \item{\dots}{Other parameters passed to \code{\link{rda}}. }
}
\details{
  The Canonical Analysis of Principal Coordinates (CAP) is simply a
  Redundancy Analysis of results of Metric (Classical) Multidimensional
  Scaling (Anderson & Willis 2003). Function \code{capscale} uses two steps:
  (1) it ordinates the dissimilarity matrix using
  \code{\link[mva]{cmdscale}} and (2) analyses these results using
  \code{\link{rda}}. If the user supplied a community data frame instead
  of dissimilarities, the function will find the needed dissimilarity
  matrix using \code{\link{vegdist}} with specified
  \code{distance}. However, the method will accept dissimilarity
  matrices from \code{\link{vegdist}}, \code{\link[mva]{dist}}, or any
  other method producing similar matrices. The constraining variables can be
  continuous or factors or both, they can have interaction terms,
  or they can be transformed in the call. Moreover, there can be a
  special term
  \code{Condition} just like in \code{\link{rda}} and \code{\link{cca}}
  so that ``partial'' CAP can be performed.

  The current implementation  differs from the method suggested by
  Anderson & Willis (2003) in three major points:
  \enumerate{
    \item Anderson & Willis used orthonormal solution of
    \code{\link[mva]{cmdscale}}, whereas \code{capscale} uses axes
    weighted by corresponding eigenvalues, so that the ordination
    distances are best approximations of original dissimilarities. In
    the original method, later ``noise'' axes are just as important as
    first major axes.
    \item Anderson & Willis take only a subset of axes, whereas 
    \code{capscale} uses all axes with positive eigenvalues. The use of
    subset is necessary with orthonormal axes to chop off some
    ``noise'', but the use of all axes guarantees that the results are
    the best approximation of original dissimilarities.
    \item Function \code{capscale} adds species scores as weighted sums
    of (residual) community matrix (if the matrix is available), whereas
    Anderson & Willis have no fixed method for adding species scores.
  }
  With these definitions, function \code{capscale} with Euclidean
  distances will be identical to \code{\link{rda}} in eigenvalues and
  in site, species and biplot scores (except for possible sign
  reversal). 
  However, it makes no sense to use \code{capscale} with
  Euclidean distances, since direct use of \code{\link{rda}} is much more
  efficient. Even with non-Euclidean dissimilarities, the
  rest of the analysis will be metric and linear.
  
}
\value{
  The function returns an object of class \code{capscale} which is
  identical to the result of \code{\link{rda}}. At the moment,
  \code{capscale} does not have specific methods, but it uses
  \code{\link{cca}} and \code{\link{rda}} methods
  \code{\link{plot.cca}}, \code{\link{summary.rda}} etc. Moreover, you
  can use \code{\link{anova.cca}} for permutation tests of
  ``significance'' of the results.
}
\references{
  Anderson, M.J. & Willis, T.J. (2003). Canonical analysis of principal
  coordinates: a useful method of constrained ordination for
  ecology. \emph{Ecology} 84, 511--525.
}
\author{ Jari Oksanen }
\note{ 
  Function \code{\link{rda}} usually divides the ordination results by
  number of
  sites minus one. In this way, the inertia is variance instead of sum
  of squares, and the eigenvalues sum up to variance. Many
  dissimilarity measures are in the range 0 to 1, so they have already
  made a similar division. If the largest original dissimilarity is less
  than or equal to 4
  (allowing for \code{\link{stepacross}}), this division is undone in
  \code{capscale} and original dissimilarities are used. The inertia is
  named \code{squared dissimilarity}
  (as defined in the dissimilarity matrix), but keyword
  \code{mean} is added to the inertia in cases where division was
  made, e.g. in Euclidean and Manhattan distances.
}


\seealso{\code{\link{rda}}, \code{\link{cca}}, \code{\link{plot.cca}},
  \code{\link{anova.cca}}, \code{\link{vegdist}},
  \code{\link[mva]{dist}}, \code{\link[mva]{cmdscale}}.}
\examples{
data(varespec)
data(varechem)
vare.cap <- capscale(varespec ~ N + P + K + Condition(Al), varechem, dist="bray")
vare.cap
plot(vare.cap)
anova(vare.cap)
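## Sketch of the claim in Details: with Euclidean distances, capscale
## should reproduce the eigenvalues of rda, so the two printouts can be
## compared. Object names vare.euc and vare.rda are only for illustration.
vare.euc <- capscale(varespec ~ N + P + K + Condition(Al), varechem,
                     distance = "euclidean")
vare.rda <- rda(varespec ~ N + P + K + Condition(Al), varechem)
vare.euc
vare.rda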
}
\keyword{ multivariate }


\eof
\name{cca}
\alias{cca}
\alias{cca.default}
\alias{cca.formula}
\alias{print.cca}
\alias{summary.cca}
\alias{print.summary.cca}
\alias{rda}
\alias{rda.default}
\alias{rda.formula}
\alias{summary.rda}

\title{ [Partial] [Constrained] Correspondence Analysis and Redundancy
  Analysis } 
\description{
  Function \code{cca} performs correspondence analysis, or optionally
  constrained correspondence analysis (a.k.a. canonical correspondence
  analysis), or optionally partial constrained correspondence
  analysis. Function \code{rda} performs redundancy analysis, or
  optionally principal components analysis.
  These are all very popular ordination techniques in community ecology.
}
\usage{
\method{cca}{formula}(formula, data)
\method{cca}{default}(X, Y, Z, ...)
\method{rda}{formula}(formula, data, scale=FALSE)
\method{rda}{default}(X, Y, Z, scale=FALSE, ...)
\method{summary}{cca}(object, scaling=2, axes=6, digits, ...)
}

\arguments{
  \item{formula}{Model formula, where the left hand side gives the
    community data matrix, right hand side gives the constraining variables,
    and conditioning variables can be given within a special function
    \code{Condition}.}
  \item{data}{Data frame containing the variables on the right hand side
    of the model formula.}
  \item{X}{ Community data matrix. }
  \item{Y}{ Constraining matrix, typically of environmental variables.
    Can be missing. }
  \item{Z}{ Conditioning matrix, the effect of which is removed
    (`partialled out') before next step. Can be missing.}
  \item{object}{A \code{cca} result object.}
  \item{scaling}{Scaling for species and site scores. Either species
    (\code{2}) or site (\code{1}) scores are scaled by eigenvalues, and
    the other set of scores is left unscaled, or with \code{3} both are
    scaled symmetrically by square root of eigenvalues.  Corresponding
    negative values can be used in \code{cca} to additionally multiply
    results with \eqn{\sqrt{1/(1-\lambda)}}{sqrt(1/(1-lambda))}.  This scaling is known by
    the misleading name of Hill scaling (although it has nothing to do
    with Hill's rescaling of \code{\link{decorana}}).  Negative values
    are not recognized for results of \code{rda}.
  }
  \item{axes}{Number of axes in summaries.}
  \item{digits}{Number of digits in output.}
  \item{scale}{Scale species to unit variance (like correlations do).}
  \item{...}{Other parameters for \code{print} or \code{plot} functions.}
}
\details{
  Since their introduction (ter Braak 1986), constrained or canonical
  correspondence analysis, and its spin-off, redundancy analysis, have
  been the most popular ordination methods in community ecology.
  Functions \code{cca} and \code{rda} are  similar to popular
  proprietary software \code{Canoco}, although implementation is
  completely different.  The functions are based on Legendre &
  Legendre's (1998) algorithm: in \code{cca}
  Chi-square transformed data matrix is subjected to weighted linear
  regression on constraining variables, and the fitted values are
  submitted to correspondence analysis performed via singular value
  decomposition (\code{\link{svd}}). Function \code{rda} is similar, but uses
  ordinary, unweighted linear regression and unweighted SVD.

  The functions can be called either with matrix entries for community
  data and constraints, or with formula interface.  In general, the
  formula interface is preferred, because it allows a better control of
  the model and allows factor constraints.

  In matrix interface, the
  community data matrix \code{X} must be given, but any other data
  matrix can be omitted, and the corresponding stage of analysis is
  skipped.  If matrix \code{Z} is supplied, its effects are removed from
  the community matrix, and the residual matrix is submitted to the next
  stage.  This is called `partial' correspondence or redundancy
  analysis.  If matrix
  \code{Y} is supplied, it is used to constrain the ordination,
  resulting in constrained or canonical correspondence analysis, or
  redundancy analysis.
  Finally, the residual is submitted to ordinary correspondence
  analysis (or principal components analysis).  If both matrices
  \code{Z} and \code{Y} are missing, the
  data matrix is analysed by ordinary correspondence analysis (or
  principal components analysis).

  Instead of separate matrices, the model can be defined using a model
  \code{\link{formula}}.  The left hand side must be the
  community data matrix (\code{X}).  The right hand side defines the
  constraining model.
  The constraints can contain ordered or unordered factors,
  interactions among variables and functions of variables.  The defined
  \code{\link{contrasts}} are honoured in \code{\link{factor}}
  variables.  The formula can include a special term \code{Condition}
  for conditioning variables (``covariables'') ``partialled out'' before
  analysis.  So the following commands are equivalent: \code{cca(X, y,
    z)}, \code{cca(X ~ y + Condition(z))}, where \code{y} and \code{z}
  refer to single variable constraints and conditions.  

  Constrained correspondence analysis is indeed a constrained method:
  CCA does not try to display all variation in the
  data, but only the part that can be explained by the used constraints.
  Consequently, the results are strongly dependent on the set of
  constraints and their transformations or interactions among the
  constraints.  The shotgun method is to use all environmental variables
  as constraints.  However, such exploratory problems are better
  analysed with
  unconstrained methods such as correspondence analysis
  (\code{\link{decorana}}, \code{\link[mva]{ca}}) or non-metric
  multidimensional scaling (\code{\link[MASS]{isoMDS}}) and
  environmental interpretation after analysis
  (\code{\link{envfit}}, \code{\link{ordisurf}}).
  CCA is a good choice if the user has
  clear and strong \emph{a priori} hypotheses on constraints and is not
  interested in the major structure in the data set.  

  CCA is able to correct a common
  curve artefact in correspondence analysis by
  forcing the configuration into linear constraints.  However, the curve
  artefact can be avoided only with a low number of constraints that do
  not have a curvilinear relation with each other.  The curve can reappear
  even with two badly chosen constraints or a single factor.  Although
  the formula
  interface makes it easy to include polynomial or interaction terms, such
  terms often allow the curve artefact (and are difficult to interpret), and
  should probably be avoided.

  According to folklore, \code{rda} should be used with ``short
  gradients'' rather than \code{cca}. However, this is not supported by
  research, which finds methods based on the Euclidean metric uniformly
  weaker than those based on the Chi-squared metric.
  
  Partial CCA (pCCA; or alternatively partial RDA) can be used to remove
  the effect of some
  conditioning or ``background'' or ``random'' variables or
  ``covariables'' before CCA proper.  In fact, pCCA compares models
  \code{cca(X ~ z)} and \code{cca(X ~ y + z)} and attributes their
  difference to the effect of \code{y} cleansed of the effect of
  \code{z}.  Some people have used the method for extracting
  ``components of variance'' in CCA.  However, if the effect of
  variables together is stronger than sum of both separately, this can
  increase total Chi-square after ``partialling out'' some
  variation, and give negative ``components of variance''.  In general,
  such components of ``variance'' are not to be trusted due to
  interactions between two sets of variables.
  
  The functions have \code{summary} and \code{plot} methods.  The
  \code{summary} method lists all species and site scores, and results
  may be very long.  Palmer (1993) suggested using linear constraints
  (``LC scores'') in ordination diagrams, because these gave better
  results in simulations and site scores (``WA scores'') are a step from
  constrained to unconstrained analysis.  However, McCune (1997) showed
  that noisy environmental variables (and all environmental
  measurements are noisy) destroy ``LC scores'' whereas ``WA scores''
  were little affected.  Therefore the \code{plot} function uses site
  scores (``WA scores'') as the default. This is consistent with the
  usage in statistics and other functions in \R
  (\code{\link[MASS]{lda}}, \code{\link[mva]{cancor}}).
}
\value{
  Function \code{cca} returns a big object of class \code{cca}.  It has
  as elements
  separate lists for pCCA, CCA and CA.  These lists have information on
  total Chi-square and estimated rank of the stage.  Lists \code{CCA}
  and \code{CA} contain scores for species (\code{v}) and sites
  (\code{u}).  These site scores are linear constraints in \code{CCA} and
  weighted averages in \code{CA}.  In addition, list \code{CCA} has
  item \code{wa} for site scores and \code{biplot} for endpoints of
  biplot arrows.  All these scores are unscaled (actually, their
  weighted sum of squares is one). The result object can be
  accessed with functions \code{summary} and \code{\link{scores.cca}}
  which
  know how to scale the results for display.  The traditional 
  alternative in correspondence analysis (see \code{\link{decorana}})
  was to scale sites by
  eigenvalues and leave species unscaled (\code{scaling=1}),
  so that configuration of sites
  would reflect the real structure in data (longer axes for higher
  eigenvalues).  Species scores would not reflect axis lengths, and they
  would have larger variation than site scores, which was motivated
  by some species having their optima outside studied range.  Later the
  common practice was to leave sites unscaled (\code{scaling=2}),
  so that they would have a better relation with biplot arrows.

  Function \code{rda} returns an object of class \code{rda} which
  inherits from class \code{cca}.  Function \code{rda} is really only a
  spin-off from \code{cca}, and the object uses the same item names as
  \code{cca}, which are misleading in this case. The only specific
  function is \code{summary.rda}, but otherwise the object \code{rda} is
  accessed with \code{cca} methods (\code{print}, \code{\link{plot.cca}},
  \code{\link{scores.cca}}, \code{\link{anova.cca}}). The analysis
  stores unscaled
  results named similarly as in \code{cca}, but \code{summary.rda} (and
  hence \code{plot} and \code{scores} functions) scales these so that
  site and species scores are similarly scaled to each other as in
  \code{Canoco}. However, the \code{summary.rda} scores differ from
  \code{Canoco} by a constant multiplier so that they define a real
  biplot or an approximation of the data.
}
\references{ The original method was by ter Braak, but the current
  implementation follows Legendre and Legendre.

  Legendre, P. and Legendre, L. (1998) \emph{Numerical Ecology}. 2nd English
  ed. Elsevier.

  McCune, B. (1997) Influence of noisy environmental data on canonical
  correspondence analysis. \emph{Ecology} 78, 2617--2623.
  
  Palmer, M. W. (1993) Putting things in even better order: The
  advantages of canonical correspondence analysis.  \emph{Ecology} 74,
  2215--2230.
  
  Ter Braak, C. J. F. (1986) Canonical Correspondence Analysis: a new
  eigenvector technique for multivariate direct gradient
  analysis. \emph{Ecology} 67, 1167--1179.
  
}
\author{
  The responsible author was Jari Oksanen, but the code borrows heavily
  from Dave Roberts (\url{http://labdsv.nr.usu.edu/}).
}

\seealso{
  There is a special documentation for \code{\link{plot.cca}} function
  with its helper functions (\code{\link{text.cca}},
  \code{\link{points.cca}}, \code{\link{scores.cca}}).
  Function \code{\link{anova.cca}} provides an ANOVA-like permutation
  test for the ``significance'' of constraints.
  Functions \code{\link[CoCoAn]{CAIV}} (library \code{CoCoAn}) and
  \code{\link[ade4]{cca}} (library \code{ade4}) provide alternative
  implementations of CCA (these are internally quite
  different). Function \code{\link{capscale}} is a non-Euclidean generalization of
  \code{rda}.
}

\examples{
data(varespec)
data(varechem)
## Common but bad way: use all variables you happen to have in your
## environmental data matrix
vare.cca <- cca(varespec, varechem)
vare.cca
plot(vare.cca)
## Formula interface and a better model
vare.cca <- cca(varespec ~ Al + P*(K + Baresoil), data=varechem)
vare.cca
plot(vare.cca)
## `Partialling out' and `negative components of variance'
cca(varespec ~ Ca, varechem)
cca(varespec ~ Ca + Condition(pH), varechem)
## RDA
data(dune)
data(dune.env)
dune.Manure <- rda(dune ~ Manure, dune.env)
plot(dune.Manure) 
}
\keyword{ multivariate }


\eof
\name{decorana}
\alias{decorana}
\alias{print.decorana}
\alias{summary.decorana}
\alias{print.summary.decorana}
\alias{plot.decorana}
\alias{downweight}
\alias{scores.decorana}

\title{Detrended Correspondence Analysis and Basic Reciprocal Averaging }
\description{
  Performs detrended correspondence analysis and basic reciprocal
  averaging or orthogonal correspondence analysis.
}
\usage{
decorana(veg, iweigh=0, iresc=4, ira=0, mk=26, short=0, before=NULL,
         after=NULL)
\method{plot}{decorana}(x, choices=c(1,2), origin=TRUE, display=c("both","sites","species","none"),
     cex = 0.8, cols = c(1,2), type, ...)
\method{summary}{decorana}(object, digits=3, origin=TRUE,
     display=c("both", "species","sites","none"), ...)
downweight(veg, fraction = 5)
\method{scores}{decorana}(x, display=c("sites","species"), choices =1:4, origin=TRUE, ...)
}

\arguments{
  \item{veg}{Community data matrix. }
  \item{iweigh}{Downweighting of rare species (0: no). }
  \item{iresc}{Number of rescaling cycles (0: no rescaling). }
  \item{ira}{Type of analysis (0: detrended, 1: basic reciprocal averaging). }
  \item{mk}{Number of segments in rescaling. }
  \item{short}{Shortest gradient to be rescaled. }
  \item{before}{Hill's piecewise transformation: values before transformation. }
  \item{after}{Hill's piecewise transformation: values after
  transformation -- these must correspond to values in \code{before}.}
  \item{x, object}{A \code{decorana} result object.}
  \item{choices}{Axes shown.}
  \item{origin}{Use true origin even in detrended correspondence analysis.}
  \item{display}{Display only sites, only species, both or neither.}
  \item{cex}{Plot character size.}
  \item{cols}{Colours used for sites and species.}
  \item{type}{Type of plots, partial match to \code{"text"},
    \code{"points"} or \code{"none"}.} 
  \item{digits}{Number of digits in summary output.}
  \item{fraction}{Abundance fraction where downweighting begins.}
  \item{...}{Other parameters for \code{plot} function.}
}
\details{
  In the late 1970s, correspondence analysis became the method of choice
  for ordination in vegetation science, since it seemed to cope
  with non-linear species responses better than principal components
  analysis.  However, even correspondence analysis produced an arc-shaped
  configuration of a single gradient.  Mark Hill developed
  detrended correspondence analysis to correct two assumed `faults' in
  correspondence analysis: curvature of straight gradients and packing
  of sites at the ends of the gradient.  

  The curvature is removed by replacing the orthogonalization of axes
  with detrending.  In orthogonalization the successive axes are made
  non-correlated, but detrending should remove all systematic dependence
  between axes.  Detrending is performed using a five-segment smoothing
  window with weights (1,2,3,2,1) on \code{mk} segments, which indeed
  is more robust than the suggested alternative of detrending by
  polynomials. The
  packing of sites at the ends of the gradient is undone by rescaling
  the axes after extraction.  After rescaling, the axis is supposed to be
  scaled in `SD' units, so that the average width of Gaussian species
  responses is supposed to be one over the whole axis.  Other innovations
  were the piecewise linear transformation of species abundances and
  downweighting of rare species, which were regarded as having an
  unduly high influence on ordination axes.  

  It seems that detrending actually works by twisting the ordination
  space, so that the results look non-curved in two-dimensional projections
  (`lolly paper effect').  As a result, the points usually have an
  easily recognized triangle or diamond shaped pattern, obviously as a
  detrending artefact.  Rescaling also works differently from the way
  it is commonly presented.  \code{Decorana} does not use, or even
  evaluate, the widths of species responses.  Instead, it tries to
  equalize the weighted variance of species scores on axis segments
  (parameter \code{mk} has only a small effect, since \code{decorana}
  finds the segment number from the current estimate of axis length).
  This equalizes response widths only for the idealized species packing
  model, where all species initially have unit width responses and
  equally spaced modes.

  Function \code{summary} prints the ordination scores,
  possible prior weights used in downweighting, and the marginal totals
  after applying these weights. Function \code{plot} plots
  species and site scores.  Classical \code{decorana} scaled the axes
  so that smallest site score was 0 (and smallest species score was
  negative), but \code{summary}, \code{plot} and
  \code{scores}  use the true origin, unless \code{origin = FALSE}.

  In addition to proper eigenvalues, the function also reports `decorana
  values' in detrended analysis. These are the values that the legacy
  code of \code{decorana} returns as `eigenvalues'.
  They are estimated internally during
  iteration, and it seems that detrending interferes with the estimation 
  so that these values are generally too low and have an unclear
  interpretation. Moreover, `decorana values' are estimated before
  rescaling, which will change the eigenvalues. The
  proper eigenvalues are estimated after extraction of the axes and
  they are always the ratio of biased weighted variances of site and
  species scores even in detrended and rescaled solutions. The
  `decorana values' are provided only for compatibility with
  legacy software, and they should not be used.
}
\value{
  Function returns an object of class \code{decorana}, which has
  \code{print}, \code{summary} and \code{plot} methods.
}
\references{
  Hill, M.O. and Gauch, H.G. (1980). Detrended correspondence analysis:
  an improved ordination technique. \emph{Vegetatio} 42, 47--58.

  Oksanen, J. and Minchin, P.R. (1997). Instability of ordination
  results under changes in input data order: explanations and
  remedies. \emph{Journal of Vegetation Science} 8, 447--454.
}
\author{Mark O. Hill wrote the original Fortran code, \R port was by Jari
  Oksanen.  }
\note{
  Function \code{decorana} uses the central numerical engine of the
  original Fortran code (which is in the public domain), or about 1/3 of
  the original program.  I have tried to implement the original
  behaviour, although a great part of the preparatory steps was written
  in the \R language and may differ somewhat from the original code.
  However, well-known bugs are corrected and strict criteria used
  (Oksanen & Minchin 1997).

  Please
  note that there really is no need for piecewise transformation or even
  downweighting within \code{decorana}, since there are more powerful
  and extensive alternatives in \R, but these options are included for
  compliance with the original software.  If a different fraction of
  abundance is needed in downweighting, function \code{downweight} must
  be applied before \code{decorana}.  Function \code{downweight} 
  can indeed be applied prior to correspondence analysis, and so it can be
  used together with \code{\link{cca}}, \code{\link[CoCoAn]{CAIV}} and
  \code{\link[multiv]{ca}} as well.

  The function finds only four axes: this is not easily changed.
}


 \seealso{
   For unconstrained ordination, non-metric multidimensional scaling in
   \code{\link[MASS]{isoMDS}} may be more robust.  Constrained (or
   `canonical') correspondence analysis can be made with
   \code{\link{cca}}.  Orthogonal correspondence analysis can be
   made with \code{\link[multiv]{ca}}, or with \code{decorana} or
   \code{\link{cca}}, but the scaling of results varies.  
 }

\examples{
data(varespec)
vare.dca <- decorana(varespec)
vare.dca
summary(vare.dca)
plot(vare.dca)
### the detrending rationale:
gaussresp <- function(x,u) exp(-(x-u)^2/2)
x <- seq(0,6,length=15) ## The gradient
u <- seq(-2,8,len=23)   ## The optima
pack <- outer(x,u,gaussresp)
matplot(x, pack, type="l", main="Species packing")
if (!exists("prcomp")) library(mva)  ## prcomp: 'mva' in old R, 'stats' since R 1.9.0
opar <- par(mfrow=c(2,2))
plot(scores(prcomp(pack)), asp=1, type="b", main="PCA")
plot(scores(decorana(pack, ira=1)), asp=1, type="b", main="CA")
plot(scores(decorana(pack)), asp=1, type="b", main="DCA")
plot(scores(cca(pack ~ x), dis="sites"), asp=1, type="b", main="CCA")
### Let's add some noise:
noisy <- (0.5 + runif(length(pack)))*pack
par(mfrow=c(2,1))
matplot(x, pack, type="l", main="Ideal model")
matplot(x, noisy, type="l", main="Noisy model")
par(mfrow=c(2,2))
plot(scores(prcomp(noisy)), type="b", main="PCA", asp=1)
plot(scores(decorana(noisy, ira=1)), type="b", main="CA", asp=1)
plot(scores(decorana(noisy)), type="b", main="DCA", asp=1)
plot(scores(cca(noisy ~ x), dis="sites"), asp=1, type="b", main="CCA")
par(opar)
  }
\keyword{ multivariate }


\eof
\name{decostand}
\alias{decostand}
\alias{wisconsin}

\title{Standardization Methods for Community Ecology}
\description{
The function provides some popular (and effective) standardization
methods for community ecologists.
}
\usage{
decostand(x, method, MARGIN)
wisconsin(x)
}

\arguments{
  \item{x}{Community data matrix.}
  \item{method}{Standardization method.}
  \item{MARGIN}{Margin, if default is not acceptable.}
}
\details{
  The function offers the following standardization methods for community
  data:
  \itemize{
    \item \code{total}: divide by margin total (default \code{MARGIN = 1}).
    \item \code{max}: divide by margin maximum (default \code{MARGIN = 2}).
    \item \code{freq}: divide by margin total and multiply by number of
    non-zero items, so that the average of non-zero entries is one
    (Oksanen 1983; default \code{MARGIN = 2}).
    \item \code{normalize}: make margin sum of squares equal to one (default
    \code{MARGIN = 1}).
    \item \code{range}: standardize values into range 0 \dots 1 (default
    \code{MARGIN = 2}).
    \item \code{standardize}: scale into zero mean and unit variance
    (default \code{MARGIN = 2}).
    \item \code{pa}: scale into presence/absence scale (0/1).
    \item \code{chi.square}: divide by row sums and square root of
    column sums, and adjust for square root of matrix total
    (Legendre & Gallagher 2001). When used with Euclidean
    distance, the resulting distances should be similar to the
    Chi-square distance used in correspondence analysis. However, the
    results from \code{\link[mva]{cmdscale}} would still differ, since
    CA is a weighted ordination method (default \code{MARGIN =
      1}).
  }
  Standardization, as contrasted to transformation, means that the
  entries are transformed relative to other entries.

  All methods have a default margin.  \code{MARGIN=1} means rows (sites
  in a
  normal data set) and \code{MARGIN=2} means columns (species in a normal
  data set).

  Command \code{wisconsin} is a shortcut to the common Wisconsin double
  standardization where species (\code{MARGIN=2}) are first standardized
  by maxima (\code{max}) and then sites (\code{MARGIN=1}) by
  site totals (\code{total}). 
}
\value{
  Returns the standardized data frame.
}
\author{Jari Oksanen}
\note{Common transformations can be made with standard \R functions.}

\references{
  Legendre, P. & Gallagher, E.D. (2001) Ecologically meaningful
  transformations for ordination of species data. \emph{Oecologia} 129:
  271--280.

  Oksanen, J. (1983) Ordination of boreal heath-like vegetation with
  principal component analysis, correspondence analysis and
  multidimensional scaling. \emph{Vegetatio} 52, 181--189.
  }

\examples{
data(varespec)
sptrans <- decostand(varespec, "max")
apply(sptrans, 2, max)
sptrans <- wisconsin(varespec)
# Chi-square: Similar but not identical to Correspondence Analysis.
sptrans <- decostand(varespec, "chi.square")
plot(procrustes(rda(sptrans), cca(varespec)))
# Hellinger transformation (Legendre & Gallagher 2001):
sptrans <- sqrt(decostand(varespec, "total"))
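# A hand-worked sketch of some standardizations described above, using
# only base R on a small made-up matrix 'm' (hypothetical data, not
# part of the package). It illustrates the definitions and is not the
# internal code of decostand.
m <- matrix(c(1, 0, 3,
              2, 2, 0,
              0, 4, 4), nrow = 3, byrow = TRUE)
# "total" with MARGIN = 1: divide each row by its row total
m.total <- sweep(m, 1, rowSums(m), "/")
rowSums(m.total)
# "max" with MARGIN = 2: divide each column by its column maximum
m.max <- sweep(m, 2, apply(m, 2, max), "/")
apply(m.max, 2, max)
# Wisconsin double standardization: species maxima first, then site totals
m.wis <- sweep(m.max, 1, rowSums(m.max), "/")
rowSums(m.wis)
# "normalize" with MARGIN = 1: row sums of squares become one
m.norm <- sweep(m, 1, sqrt(rowSums(m^2)), "/")
rowSums(m.norm^2)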
}
\keyword{ multivariate }
\keyword{ manip }


\eof
\name{deviance.cca}
\alias{deviance.cca}
\alias{deviance.rda}
\alias{deviance.capscale}
\alias{extractAIC.cca}
\title{ Statistics Resembling Deviance and AIC for Constrained Ordination}
\description{
  The functions extract statistics that resemble deviance and AIC from the
  result of constrained correspondence analysis \code{\link{cca}} or
  redundancy analysis \code{\link{rda}}.  These functions are rarely
  needed directly, but they are called by \code{\link{step}} in
  automatic model building.  Actually, \code{\link{cca}} and
  \code{\link{rda}} do not have \code{\link{AIC}} and these functions
  are certainly wrong.
}
\usage{
\method{deviance}{cca}(object, ...)
\method{extractAIC}{cca}(fit, scale = 0, k = 2, ...)
}

\arguments{
  \item{object}{the result of a constrained ordination
    (\code{\link{cca}} or \code{\link{rda}}). }
  \item{fit}{fitted model from constrained ordination.}
  \item{scale}{optional numeric specifying the scale parameter of the model,
    see \code{scale} in \code{\link{step}}.}
  \item{k}{numeric specifying the "weight" of the \emph{equivalent degrees of
    freedom} (=:\code{edf}) part in the AIC formula.}
  \item{\dots}{further arguments. }
}
\details{
  The functions find statistics that
  resemble \code{\link{deviance}} and \code{\link{AIC}} in constrained
  ordination.  Actually,
  constrained ordination methods do not have a log-likelihood, which means
  that they cannot have AIC and deviance.  Therefore you should not use
  these functions, and if you use them, you should not trust them.  If
  you use these functions, it remains your responsibility to check
  the adequacy of the result.

  The deviance of \code{\link{cca}} is equal to Chi-square of
  the residual data matrix after fitting the constraints.  The deviance of
  \code{\link{rda}} is defined as the residual sum of squares.
  The
  deviance of \code{\link{capscale}} is \code{NA}.
  Function \code{extractAIC} mimics
  \code{extractAIC.lm} in translating deviance to AIC.

  There is little need to call these functions directly.  However, they
  are called implicitly in the \code{\link{step}} function used in automatic
  selection of constraining variables.  You should check the
  resulting model with some other criteria, because the statistics used
  here are unfounded. In particular, the penalty \code{k} is not properly
  defined, and the default \code{k = 2} is not justified
  theoretically. If you have only continuous covariates, the \code{step}
  function will base the model building on the magnitude of eigenvalues, and
  the value of \code{k} only influences the stopping point (but the
  variable with the highest eigenvalue is not necessarily the most
  significant one in permutation
  tests in \code{\link{anova.cca}}). If you also
  have multi-class factors, the value of \code{k} will have a
  capricious effect in model building.  
 
}
\value{
  The \code{deviance} functions return ``deviance'', and
  \code{extractAIC} returns effective degrees of freedom and ``AIC''. 
}
\references{
  I do not know any publication using these functions. However, their
  derivation is so obvious that somebody may have published them, or
  somebody probably will publish them some day.  } 

\author{ Jari  Oksanen }

\note{
  These functions are unfounded and untested and they should not be used
  directly or implicitly.  Moreover, usual caveats in using 
  \code{\link{step}} are very valid.
}

\seealso{\code{\link{cca}}, \code{\link{rda}}, \code{\link{anova.cca}},
    \code{\link{step}}, \code{\link{extractAIC}}.}
\examples{
# The deviance of correspondence analysis equals Chi-square
data(dune)
data(dune.env)
chisq.test(dune)
deviance(cca(dune))
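# The Chi-square statistic referred to above can also be computed by
# hand. This sketch uses a small made-up table 'tab' (hypothetical
# data) and base R only, and compares the result with chisq.test.
tab <- matrix(c(20,  4,  1,
                 8, 12,  6,
                 1, 10, 18), nrow = 3, byrow = TRUE)
# Expected counts under independence of rows and columns
expected <- outer(rowSums(tab), colSums(tab))/sum(tab)
chi.hand <- sum((tab - expected)^2/expected)
chi.hand
chisq.test(tab)$statistic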
# Backward elimination from a complete model "dune ~ ."
ord <- cca(dune ~ ., dune.env)
ord
step(ord)
# Stepwise selection (forward from an empty model "dune ~ 1")
step(cca(dune ~ 1, dune.env), scope = formula(ord))
# ANOVA for the added variable
anova(cca(dune ~ Moisture, dune.env))
# ANOVA for the next candidate variable that was not added
anova(cca(dune ~ Condition(Moisture) + Management, dune.env), perm.max=1000)
}
\keyword{ multivariate }
\keyword{ models }

\eof
\name{distconnected}
\alias{distconnected}
\alias{no.shared}
\alias{spantree}

\title{Connectedness and Minimum Spanning Tree for Dissimilarities }
\description{
  Function \code{distconnected} finds groups that are connected
  disregarding dissimilarities that are at or above a threshold or
  \code{NA}. The function can be used to find groups that can be
  ordinated together or transformed by
  \code{\link{stepacross}}. Function \code{no.shared} returns a logical
  dissimilarity object, where \code{TRUE} means that sites have no
  species in common. This is a minimal structure for
  \code{distconnected} or can be used to set missing values to
  dissimilarities.
  Function \code{spantree} finds a minimum spanning tree
  connecting all points, but disregarding dissimilarities that are at or
  above the threshold or \code{NA}.  
}
\usage{
distconnected(dis, toolong = 1, trace = TRUE)
no.shared(x)
spantree(dis, toolong = 1)
}

\arguments{
  \item{dis}{Dissimilarity data inheriting from class \code{dist} or
    an object, such as a matrix, that can be converted to a
    dissimilarity matrix. Functions \code{\link{vegdist}} and
    \code{\link[mva]{dist}} are some functions producing suitable
    dissimilarity data.}
  \item{toolong}{ Shortest dissimilarity regarded as \code{NA}.
    The function uses a fuzz factor, so
    that dissimilarities close to the limit will be made \code{NA}, too. }
  \item{trace}{Summarize results of \code{distconnected}.}
  \item{x}{Community data.}
  
}
\details{
  Data sets are disconnected if they have sample plots or groups of
  sample plots which share no species with other sites or groups of
  sites. Such data sets
  cannot be sensibly ordinated by any unconstrained method, because
  these subsets cannot be related to each other. For instance,
  correspondence analysis will polarize these subsets with eigenvalue
  1. Neither can such dissimilarities be transformed with
  \code{\link{stepacross}}, because there is no path between all points,
  and the result will contain \code{NA}s. Function \code{distconnected} will
  find such subsets in dissimilarity matrices. The function will return
  a grouping vector that can be used for subsetting the
  data. If data are connected, the result vector will be all
  \eqn{1}s. The connectedness between two points can be defined either
  by a threshold \code{toolong} or using input dissimilarities
  with \code{NA}s. If \code{toolong} is zero or negative, no
  threshold will be used.

  Function \code{no.shared} returns a \code{dist} structure having value
  \code{TRUE} when two sites have nothing in common, and value
  \code{FALSE} when they have at least one shared species. This is a
  minimal structure that can be analysed with \code{distconnected}. The
  function can be used to select dissimilarities with no shared species
  in indices which do not have a fixed upper limit.
  
  Function \code{spantree} finds a minimum spanning tree for
  dissimilarities (there may be several minimum spanning trees, but the
  function finds only one). Dissimilarities at or above the threshold
  \code{toolong} and \code{NA}s are disregarded, and the spanning tree
  is found through other dissimilarities. If the data are disconnected,
  the function will return a disconnected tree (or a forest), and the
  corresponding link is \code{NA}. The results of \code{spantree} can be
  overlaid onto an ordination diagram using function
  \code{\link{ordispantree}}. 

  Function \code{distconnected} uses depth-first search
  (Sedgewick 1990). Function \code{spantree} uses Prim's method
  implemented as priority-first search for dense graphs (Sedgewick
  1990). 
}
\value{
  Function \code{distconnected} returns a vector for
  observations using integers to identify connected groups. If the data
  are connected, values will be all \code{1}. Function \code{no.shared}
  returns an object of class \code{\link[mva]{dist}}. Function \code{spantree}
  returns a list with two vectors, each of length \eqn{n-1}. The
  number of links in a tree is one less than the number of observations, and
  the first item is omitted. The items are  
  \item{kid }{The child node of the parent, starting from parent number
    two. If there is no link from the parent, value will be \code{NA}
    and tree is disconnected at the node.}
  \item{dist }{Corresponding distance. If \code{kid = NA}, then
    \code{dist = 0}.}
}
\references{
 Sedgewick, R. (1990). \emph{Algorithms in C}. Addison Wesley. 
}
\author{ Jari Oksanen }
\note{
  In principle, a minimum spanning tree is equivalent to single linkage
  clustering that can be performed using \code{\link[mva]{hclust}} or
  \code{\link[cluster]{agnes}}. However, these functions combine
  clusters to each other and the information of the actually connected points
  (the ``single link'') cannot be recovered from the result. The
  graphical output of a single linkage clustering plotted with
  \code{\link{ordicluster}} will look very different from an equivalent
  spanning tree plotted with \code{\link{ordispantree}}.
}


\seealso{\code{\link{vegdist}} or \code{\link[mva]{dist}} for getting
  dissimilarities, \code{\link{stepacross}} for a case where you may need
    \code{distconnected}, \code{\link{ordispantree}} for displaying
    results of \code{spantree}, and \code{\link[mva]{hclust}} or
    \code{\link[cluster]{agnes}} for single linkage clustering. 
}
\examples{
## There are no disconnected data in vegan, and the following uses an
## extremely low threshold limit for connectedness. This is for
## illustration only, and not a recommended practice.
data(dune)
dis <- vegdist(dune)
ord <- cmdscale(dis) ## metric MDS
gr <- distconnected(dis, toolong=0.4)
tr <- spantree(dis, toolong=0.4)
ordiplot(ord, type="n")
ordispantree(ord, tr, col="red", lwd=2)
points(ord, cex=1.3, pch=21, col=1, bg = gr)
# Make sites with no shared species as NA in Manhattan dissimilarities
dis <- vegdist(dune, "manhattan")
is.na(dis) <- no.shared(dune)
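# The idea behind distconnected can be sketched in base R as a
# depth-first search over links shorter than a threshold. This is a
# simplified illustration under stated assumptions: 'components' is a
# hypothetical helper, it works on the full symmetric matrix, and it
# ignores the fuzz factor that the real function applies near the limit.
components <- function(d, toolong = 1) {
    d <- as.matrix(d)
    n <- nrow(d)
    ok <- !is.na(d) & d < toolong     # usable links
    group <- integer(n)               # 0 means not yet visited
    current <- 0L
    for (start in seq(length = n)) {
        if (group[start] > 0) next
        current <- current + 1L
        stack <- start
        while (length(stack) > 0) {
            node <- stack[length(stack)]       # pop the last node
            stack <- stack[-length(stack)]
            if (group[node] > 0) next
            group[node] <- current
            stack <- c(stack, which(ok[node, ] & group == 0))
        }
    }
    group
}
# Four points on a line: the gap between 1 and 10 splits them in two groups
toy <- dist(c(0, 1, 10, 11))
components(toy, toolong = 5)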
}
\keyword{ multivariate}


\eof
\name{diversity}
\alias{diversity}
\alias{rarefy}
\alias{renyi}
\alias{fisher.alpha}
\alias{specnumber}

\title{ Ecological Diversity Indices and Rarefaction Species Richness }

\description{ Shannon, Simpson, Rényi, Hill and Fisher diversity indices
and rarefied species richness for community ecologists.  }

\usage{
diversity(x, index = "shannon", MARGIN = 1, base = exp(1))
rarefy(x, sample, se = FALSE, MARGIN = 1)
renyi(x, scales=c(0,0.25,0.5,1,2,4,8,16,32,64,Inf), hill = FALSE)
fisher.alpha(x, MARGIN = 1, se = FALSE, ...)
specnumber(x, MARGIN = 1)
}

\arguments{
  \item{x}{Community data matrix.}
  \item{index}{Diversity index, one of \code{shannon}, \code{simpson} or
  \code{invsimpson}.}
  \item{MARGIN}{Margin for which the index is computed. }
  \item{base}{ The logarithm \code{base} used in \code{shannon}.}
  \item{sample}{Subsample size for rarefying community.}
  \item{se}{Estimate standard errors.}
  \item{scales}{Scales of Rényi diversity.}
  \item{hill}{Calculate Hill numbers.}
  \item{...}{Parameters passed to \code{\link{nlm}}}
}
\details{
  Shannon or Shannon--Weaver (or Shannon--Wiener) index is defined as
  \eqn{H' = -\sum_i p_i \log_{b} p_i}{H = -sum p_i log(b) p_i}, where
  \eqn{p_i} is the proportional abundance of species \eqn{i} and \eqn{b}
  is the base of the logarithm.  It is most popular to use natural
  logarithms, but some argue for base \eqn{b = 2} (which makes sense,
  but makes no real difference).

  Both variants of Simpson's index are based on \eqn{D = \sum p_i^2}{D =
    sum p_i^2}. Choice \code{simpson} returns \eqn{1-D} and
  \code{invsimpson} returns \eqn{1/D}.

  Shannon and Simpson indices are both special cases of Rényi
  diversity
  \deqn{H_a = \frac{1}{1-a} \log \sum p_i^a}{H.a = 1/(1-a) log
    sum(p^a)}
  where \eqn{a} is a scale parameter, and Hill (1973) suggested using
  so-called ``Hill numbers'' defined as \eqn{N_a = \exp(H_a)}{N.a =
    exp(H.a)}.  Some Hill numbers are the number of species with
  \eqn{a = 0}, \eqn{\exp(H')}{exp(H')} or the exponent of Shannon
  diversity with \eqn{a = 1}, inverse Simpson with \eqn{a = 2} and
  \eqn{1/ \max(p_i)}{1/max(p)} with \eqn{a = \infty}{a = Inf}. According
  to the theory of diversity ordering, one community can be regarded as
  more diverse than another only if its Rényi diversities are all higher 
  (Tóthmérész 1995). 
 
  Function \code{rarefy} gives the expected species richness in random
  subsamples of size \code{sample} from the community. The size of
  \code{sample}
  should be smaller than total community size, but the function will
  silently work for larger \code{sample} as well and return non-rarefied
  species richness (and standard error = 0).
  Rarefaction can be performed only with genuine
  counts of individuals: the current function will silently truncate
  abundances to integers and give wrong results. The function
  \code{rarefy} is based on Hurlbert's (1971) formulation, and the
  standard errors on Heck et al. (1975).  

  Function \code{fisher.alpha} estimates the \eqn{\alpha} parameter of
  Fisher's logarithmic series (see \code{\link{fisherfit}}). 
  The estimation is possible only for genuine
  counts of individuals. The function can optionally return standard
  errors of \eqn{\alpha}.  These should be regarded only as rough
  indicators of the accuracy: the confidence limits of \eqn{\alpha} are
  strongly
  non-symmetric and standard errors cannot be used in Normal inference.

  Function \code{specnumber} finds the number of species. With
  \code{MARGIN = 2}, it finds frequencies of species. The function is
  extremely simple, and shortcuts are easy in plain \R.
  
  Better stories can be told about Simpson's index than about
  Shannon's index, and still more grandiose stories about
  rarefaction (Hurlbert 1971).  However, these indices are all very
  closely related (Hill 1973), and there is no reason to despise one more than
  others (but if you are a graduate student, don't drag me in, but obey
  your Professor's orders). In particular, exponent of the Shannon
  index is linearly related to inverse Simpson (Hill 1973) although the
  former may be more sensitive to rare species. Moreover, inverse
  Simpson is asymptotically equal to rarefied species richness in a sample
  of two individuals, and Fisher's \eqn{\alpha} is very similar to
  inverse Simpson.
}

\value{
  Vector of diversity indices or rarefied species richness values. With
  option \code{se = TRUE}, function \code{rarefy} returns a 2-row matrix
  with rarefied richness (\code{S}) and its standard error
  (\code{se}). Function \code{renyi} returns a data frame of selected
  indices. 
  With option \code{se = TRUE}, function \code{fisher.alpha} returns a
  data frame with items for \eqn{\alpha} (\code{alpha}), its approximate
  standard errors (\code{se}), residual degrees of freedom
  (\code{df.residual}), and the \code{code} returned by
  \code{\link{nlm}} on the success of estimation.
}

\references{
  Fisher, R.A., Corbet, A.S. & Williams, C.B. (1943). The relation
  between the number of species and the number of individuals in a
  random sample of an animal population. \emph{Journal of Animal Ecology}
  12, 42--58.

  Heck, K.L., van Belle, G. & Simberloff, D. (1975). Explicit
  calculation of the rarefaction diversity measurement and the
  determination of sufficient sample size. \emph{Ecology} 56,
  1459--1461.  
  
  Hill, M.O. (1973). Diversity and evenness: a unifying notation and its
  consequences. \emph{Ecology} 54, 427--432.
  
  Hurlbert, S.H. (1971). The nonconcept of species diversity: a critique
  and alternative parameters. \emph{Ecology} 52, 577--586.
 
  Tóthmérész, B. (1995). Comparison of different methods for diversity
  ordering. \emph{Journal of Vegetation Science} 6, 283--290.
}

\author{ Jari Oksanen, Roeland Kindt \email{r.kindt@cgiar.org}
  (\code{renyi}) and  Bob O'Hara \email{bob.ohara@helsinki.fi}
    (\code{fisher.alpha}). }

\examples{
data(BCI)
H <- diversity(BCI)
simp <- diversity(BCI, "simpson")
invsimp <- diversity(BCI, "inv")
r.2 <- rarefy(BCI, 2)
alpha <- fisher.alpha(BCI)
pairs(cbind(H, simp, invsimp, r.2, alpha), pch="+", col="blue")
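## The definitions in Details can be checked by hand in base R. The
## abundance vector 'n' is made up for illustration, and 'renyi.hand'
## is a hypothetical helper, not a function of the package.
n <- c(50, 30, 15, 4, 1)
p <- n/sum(n)
H.hand <- -sum(p * log(p))           # Shannon with natural logarithms
D <- sum(p^2)
c(shannon = H.hand, simpson = 1 - D, invsimpson = 1/D)
## Renyi diversity H_a = log(sum(p^a))/(1 - a) for finite a other than 1;
## the corresponding Hill number is N_a = exp(H_a)
renyi.hand <- function(p, a) log(sum(p^a))/(1 - a)
exp(renyi.hand(p, 0))                # number of species
exp(renyi.hand(p, 2))                # inverse Simpson, 1/D
exp(renyi.hand(p, 0.999))            # approaches exp(H.hand) as a -> 1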
## Species richness (S) and Pielou's evenness (J):
S <- specnumber(BCI) ## rowSums(BCI > 0) does the same...
J <- H/log(S)
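## The expected richness of Hurlbert (1971) in a random subsample can be
## written directly with binomial coefficients. 'rarefy.hand' is a
## hypothetical base R helper for a single vector of counts; it is a
## sketch of the formula, not the code used by rarefy.
rarefy.hand <- function(n, sample) {
    n <- n[n > 0]
    N <- sum(n)
    ## probability that a species is missing from the subsample
    p.miss <- exp(lchoose(N - n, sample) - lchoose(N, sample))
    sum(1 - p.miss)
}
counts <- c(50, 30, 15, 4, 1)
rarefy.hand(counts, 2)
## With two individuals this is one plus the probability that the two
## individuals belong to different species
N <- sum(counts)
2 - sum(counts * (counts - 1))/(N * (N - 1))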
}
\keyword{ univar }



\eof
\name{dune}
\alias{dune}
\alias{dune.env}
\docType{data}
\title{Vegetation and Environment in Dutch Dune Meadows}
\usage{
  data(dune)
  data(dune.env)
}
\description{
The dune meadow vegetation data \code{dune} has cover class values of 30
species on 20 sites. The corresponding environmental data frame
\code{dune.env} has the following entries:
}
\format{
  \describe{
    \item{A1}{a numeric vector of thickness of A1 horizon.}
    \item{Moisture}{an ordered factor with levels
      \code{1} < \code{2} < \code{4} < \code{5}.}
    \item{Management}{a factor with levels \code{BF}: Biological
      Farming, \code{HF}: Hobby Farming, \code{NM}: Nature Conservation
      Management and \code{SF}: Standard Farming.}
    \item{Use}{an ordered factor of landuse with levels
      \code{Hayfield} < \code{Haypastu} < \code{Pasture}.}
    \item{Manure}{an ordered factor with levels
      \code{0} < \code{1} < \code{2} < \code{3} < \code{4}.}
  }
}
\source{
  Jongman, R.H.G, ter Braak, C.J.F. & van Tongeren,
  O.F.R. (1987). \emph{Data Analysis in Community and Landscape
    Ecology}. Pudoc, Wageningen.
}
\examples{
data(dune)
}
\keyword{datasets}

\eof
\name{envfit}
\alias{envfit}
\alias{vectorfit}
\alias{factorfit}
\alias{plot.envfit}
\alias{print.envfit}
\alias{print.factorfit}
\alias{print.vectorfit}

\title{Fits an Environmental Vector or Factor onto an Ordination }
\description{
  The function fits environmental vectors or factors onto an
  ordination. The projections of points onto vectors have
  maximum correlations with corresponding environmental variables, and
  the factors show the averages of factor levels.
}
\usage{
envfit(X, P, permutations = 0, strata, choices=c(1,2))
\method{plot}{envfit}(x, choices = c(1,2), arrow.mul = 1, p.max = NULL,
   col = "blue", add = TRUE, ...)
vectorfit(X, P, permutations = 0, strata, choices=c(1,2))
factorfit(X, P, permutations = 0, strata, choices=c(1,2))
}

\arguments{
  \item{X}{ Ordination configuration.}
  \item{P}{ Matrix or vector of environmental variable(s). }
  \item{permutations}{ Number of permutations for assessing significance
    of vectors or factors.}
  \item{x}{A result object from \code{envfit}.}
  \item{choices}{Axes to be plotted.}
  \item{arrow.mul}{Multiplier for vector lengths.}
  \item{p.max}{Maximum estimated \eqn{P} value for displayed
    variables.  You must calculate \eqn{P} values with setting
    \code{permutations} to use this option. }
  \item{col}{Colour in plotting.}
  \item{add}{Results added to an existing ordination plot.}
  \item{strata}{An integer vector or factor specifying the strata for
    permutation. If supplied, observations are permuted only within the
    specified strata.}
  \item{...}{Parameters to \code{text} function.}
}
\details{
  Function \code{envfit} finds vectors or factor averages of
  environmental variables.  Function \code{plot.envfit} adds these in an
  ordination diagram.  If \code{X} is a \code{\link{data.frame}},
  \code{envfit}
  uses \code{factorfit} for \code{\link{factor}} variables and
  \code{vectorfit} for other variables.  If \code{X} is a matrix or a
  vector, \code{envfit} uses only \code{vectorfit}.
  
  Functions \code{vectorfit} and \code{factorfit} can be called directly.
  Function \code{vectorfit} finds the direction in the ordination space
  along which each environmental variable changes most rapidly and
  with which it has the maximal correlation in the ordination
  configuration.  Function \code{factorfit} finds averages of ordination
  scores for factor levels.

  If \code{permutations} \eqn{> 0}, the `significance' of fitted vectors
  or factors is assessed using permutation of environmental variables.
  The goodness of fit statistic is the squared correlation coefficient
  (\eqn{r^2}).
  For factors this is defined as \eqn{r^2 = 1 - ss_w/ss_t}, where
  \eqn{ss_w} and \eqn{ss_t} are within-group and total sums of squares.
}
\value{
  Functions \code{vectorfit} and \code{factorfit} return lists of
  classes \code{vectorfit} and \code{factorfit} which have a
  \code{print} method.  The result objects have the following items:

  \item{arrows}{Arrow endpoints from \code{vectorfit}. The arrows are
    scaled to unit length.}
  \item{centroids}{Class centroids from \code{factorfit}.}
  \item{r}{Goodness of fit statistic: squared correlation coefficient.}
  \item{permutations}{Number of permutations.}
  \item{pvals}{Empirical P-values for each variable.}

  Function \code{envfit} returns a list of class \code{envfit} with
  results of \code{vectorfit} and \code{factorfit} as items.
  
  Function \code{plot.envfit} scales the vectors by correlation.
}

\author{Jari Oksanen.  The permutation test derives from the code
  suggested by Michael Scroggie. }

\note{
  Fitted vectors have become the method of choice in displaying
  environmental variables in ordination.  Indeed, they are the optimal
  way of presenting environmental variables in Constrained
  Correspondence Analysis \code{\link{cca}}, since there they are the
  linear constraints.
  In unconstrained ordination the relation between external variables
  and ordination configuration may be less linear, and therefore other
  methods than arrows may be more useful.  The simplest is to adjust the
  plotting symbol sizes (\code{cex}, \code{\link{symbols}}) by
  environmental variables.
  Fancier methods involve smoothing and regression methods that
  abound in \R, and \code{\link{ordisurf}} provides a wrapper for some.
  }

\seealso{
  A better alternative to vectors may be \code{\link{ordisurf}}.    
  }

\examples{
data(varespec)
data(varechem)
library(MASS)
library(mva)
vare.dist <- vegdist(wisconsin(varespec))
vare.mds <- isoMDS(vare.dist)
vare.mds <- postMDS(vare.mds, vare.dist)
vare.fit <- envfit(vare.mds$points, varechem, 1000)
vare.fit
ordiplot(vare.mds)
plot(vare.fit)
plot(vare.fit, p.max = 0.05, col = "red")
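## A minimal base-R sketch of the goodness of fit for factors described
## in Details, r^2 = 1 - ss_w/ss_t, with a permutation P value.
## The names (conf.sim, grp, r2.factor), the simulated data and the
## P value convention are illustrative assumptions; this is not the
## code used by factorfit.
set.seed(1)
conf.sim <- matrix(rnorm(40), ncol = 2)      # hypothetical 2-dim configuration
grp <- factor(rep(c("a", "b"), each = 10))   # hypothetical factor
conf.sim[grp == "b", 1] <- conf.sim[grp == "b", 1] + 2
r2.factor <- function(X, f) {
    ## total sum of squares around the overall centroid
    sst <- sum(scale(X, scale = FALSE)^2)
    ## within-group sum of squares around the group centroids
    ssw <- sum(sapply(split(as.data.frame(X), f),
                      function(z) sum(scale(z, scale = FALSE)^2)))
    1 - ssw/sst
}
r2.obs <- r2.factor(conf.sim, grp)
## permute the factor, keep the configuration fixed
r2.perm <- replicate(199, r2.factor(conf.sim, sample(grp)))
## proportion of values at least as large as the observed one
p.perm <- (sum(r2.perm >= r2.obs) + 1)/(199 + 1)
c(r2 = r2.obs, P = p.perm)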
}
\keyword{multivariate }
\keyword{aplot}
\keyword{htest}





\eof
\name{fisherfit}
\alias{fisherfit}
\alias{as.fisher}
\alias{plot.fisherfit}
\alias{print.fisherfit}
\alias{prestonfit}
\alias{prestondistr}
\alias{as.preston}
\alias{plot.prestonfit}
\alias{lines.prestonfit}
\alias{print.prestonfit}
\alias{veiledspec}

\title{ Fit Fisher's Logseries and Preston's Lognormal model to Abundance Data }
\description{
  Function \code{fisherfit} fits Fisher's logseries to abundance
  data. Function \code{prestonfit} groups species frequencies into
  doubling octave classes and fits Preston's lognormal model, and
  function \code{prestondistr} fits the truncated lognormal model
  without pooling the data into octaves.
}
\usage{
fisherfit(x, ...)
prestonfit(x, ...)
prestondistr(x, truncate = -1, ...)
\method{plot}{prestonfit}(x, xlab = "Frequency", ylab = "Species", bar.col = "skyblue", 
    line.col = "red", lwd = 2, ...)
\method{lines}{prestonfit}(x, line.col = "red", lwd = 2, ...)
veiledspec(x, ...)
as.fisher(x, ...)
}

\arguments{
  \item{x}{Community data vector for fitting functions or their result
    object for \code{plot} functions.}
  \item{truncate}{Truncation point for log-Normal model, in log2
    units. Default value \eqn{-1} corresponds to the left border of zero
    Octave. The choice strongly influences the fitting results.}
  \item{xlab, ylab}{Labels for \code{x} and \code{y} axes.}
  \item{bar.col}{Colour of data bars.}
  \item{line.col}{Colour of fitted line.}
  \item{lwd}{Width of fitted line.}
  \item{\dots}{Other parameters passed to functions. }
}
\details{
  In Fisher's logarithmic series the expected
  number of species \eqn{f} with \eqn{n} observed individuals is
  \eqn{f_n = \alpha x^n / n} (Fisher et al. 1943). The estimation
  follows Kempton & Taylor (1974) and uses function
  \code{\link{nlm}}. The estimation is possible only for genuine
  counts of individuals. The parameter \eqn{\alpha} is used as a
  diversity index, and \eqn{\alpha} and its standard error can be
  estimated with a separate function \code{\link{fisher.alpha}}. The
  parameter \eqn{x} is taken as a nuisance parameter which is not
  estimated separately but taken to be \eqn{N/(N+\alpha)}. Helper
  function \code{as.fisher} transforms abundance data into a Fisher
  frequency table.

  Preston (1948) was not satisfied with Fisher's model which seemed to
  imply infinite species richness, and postulated that rare species are a
  diminishing class and most species are in the middle of the frequency
  scale. This was achieved by collapsing higher frequency classes into
  wider and wider ``octaves'' of doubling class limits: 1, 2, 3--4,
  5--8, 9--16 etc. occurrences. Any logseries data will look like
  lognormal when plotted this way. The expected frequency \eqn{f} at abundance
  octave \eqn{o} is defined by \eqn{f_o = S_0 \exp(-(\log_2(o) -
    \mu)^2/2/\sigma^2)}{f = S0 exp(-(log2(o)-mu)^2/2/sigma^2)}, where
  \eqn{\mu} is the location of the mode and \eqn{\sigma} the width, both
  in \eqn{\log_2}{log2} scale, and \eqn{S_0}{S0} is the expected number
  of species at mode. The lognormal model is usually truncated on the
  left so that some rare species are not observed. Function
  \code{prestonfit} fits the truncated lognormal model as a second
  degree log-polynomial to the octave pooled data using Poisson
  error. Function \code{prestondistr} fits left-truncated Normal distribution to
  \eqn{\log_2}{log2} transformed non-pooled observations with direct
  maximization of log-likelihood. Function \code{prestondistr} is
  modelled after function \code{\link[MASS]{fitdistr}} which can be used
  for alternative distribution models. The functions have common \code{print},
  \code{plot} and \code{lines} methods. The \code{lines} function adds
  the fitted curve to the octave range with line segments showing the
  location of the mode and the width (sd) of the response.

  The total
  extrapolated richness from a fitted Preston model can be found with
  function \code{veiledspec}. The function accepts results both from
  \code{prestonfit} and from \code{prestondistr}. If \code{veiledspec} is
  called with a species count vector, it will internally use
  \code{prestonfit}. Function \code{\link{specpool}} provides
  alternative ways of estimating the number of unseen species. In fact,
  Preston's lognormal model seems to be truncated at both ends, and this
  may be the main reason why its results differ from lognormal models
  fitted in Rank--Abundance diagrams with functions
  \code{\link{rad.lognormal}} or \code{\link{rad.veil}}. 
}
\value{
  The function \code{prestonfit} returns an object with fitted
  \code{coefficients}, and with observed (\code{freq}) and fitted
  (\code{fitted}) frequencies, and a string describing the fitting
  \code{method}. Function \code{prestondistr} omits the entry \code{fitted}.
  The function \code{fisherfit} returns the result of \code{\link{nlm}}, where item
  \code{estimate} is \eqn{\alpha}. The result object is amended with the
  following items:
  \item{df.residuals}{Residual degrees of freedom.}
  \item{nuisance}{Parameter \eqn{x}.}
  \item{fisher}{Observed data from \code{as.fisher}.}

}
\references{
  Fisher, R.A., Corbet, A.S. & Williams, C.B. (1943). The relation
  between the number of species and the number of individuals in a
  random sample of an animal population. \emph{Journal of Animal Ecology}
  12, 42--58.

  Kempton, R.A. & Taylor, L.R. (1974). Log-series and log-normal
  parameters as diversity discriminants for the Lepidoptera. \emph{Journal
    of Animal Ecology} 43, 381--399.

  Preston, F.W. (1948). The commonness and rarity of
  species. \emph{Ecology} 29, 254--283.
}
\author{Bob O'Hara \email{bob.ohara@helsinki.fi} (\code{fisherfit}) and Jari Oksanen. }

\note{It seems that Preston regarded frequencies 1, 2, 4, \emph{etc.} as ``tied''
  between octaves. This means that only half of the species with
  frequency 1 were shown in the lowest octave, and the rest were
  transferred to the second octave. Half of the species from the second
  octave were transferred to the higher one as well, but this is usually
  not as large a number of species. This practice makes data look more
  lognormal by reducing the usually high lowest octaves, but is too
  unfair to be followed. Therefore the octaves used in this function
  include the upper limit. If you do not accept this, you must change
  the function yourself.
}

\seealso{\code{\link{diversity}}, \code{\link{fisher.alpha}},
  \code{\link{radfit}}, \code{\link{specpool}}. Function
  \code{\link[MASS]{fitdistr}} of \code{MASS} package was used as the
  model for \code{prestondistr}. Function \code{\link{density}} can be used for
  smoothed ``non-parametric'' estimation of responses, and
  \code{\link{qqplot}} is an alternative, traditional and more effective
  way of studying concordance of observed abundances to any distribution model.
}
\examples{
data(BCI)
plot(fisherfit(BCI[5,]))
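## A base-R sketch of the logseries in Details: with x = N/(N + alpha),
## the expected number of species with n individuals is alpha * x^n / n.
## The counts are made up, and solving alpha from the number of species
## and the number of individuals with uniroot is an illustrative
## shortcut, not the nlm fit used by fisherfit. All names are
## hypothetical.
cnt <- c(1, 1, 1, 1, 2, 2, 3, 5, 8, 20)   # hypothetical counts per species
S.obs <- length(cnt)
N.obs <- sum(cnt)
## S = alpha * log(1 + N/alpha) defines alpha
alpha.hat <- uniroot(function(a) a * log(1 + N.obs/a) - S.obs,
                     c(0.01, 1000))$root
x.hat <- N.obs/(N.obs + alpha.hat)        # nuisance parameter
n <- 1:N.obs
f.n <- alpha.hat * x.hat^n / n
table(cnt)            # observed frequencies, in the spirit of as.fisher
round(f.n[1:8], 2)    # expected frequencies for n = 1, ..., 8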
# prestonfit seems to need large samples
mod.oct <- prestonfit(colSums(BCI))
mod.ll <- prestondistr(colSums(BCI))
mod.oct
mod.ll
plot(mod.oct)  
lines(mod.ll, line.col="blue3") # Different
## Smoothed density
den <- density(log2(colSums(BCI)))
lines(den$x, ncol(BCI)*den$y, lwd=2) # Fairly similar to mod.oct
## Extrapolated richness
veiledspec(mod.oct)
veiledspec(mod.ll)
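## A base-R sketch of the octave pooling described in Details and Note:
## classes 1, 2, 3-4, 5-8, ... include their upper limits, so the
## octave of frequency n can be written as ceiling(log2(n)). The counts
## are made up and the names are illustrative; prestonfit may pool
## data differently.
cnt <- c(1, 1, 1, 2, 2, 3, 4, 5, 8, 9, 16, 17)
oct <- ceiling(log2(cnt))
table(oct)
## Expected frequency for a given mode (mu), width (sigma) and height
## (S0), taking the octave number as the log2 abundance (an assumption)
preston.sketch <- function(o, mu, sigma, S0)
    S0 * exp(-(o - mu)^2/2/sigma^2)
round(preston.sketch(0:5, mu = 2, sigma = 1.5, S0 = 3), 2)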
}
\keyword{ univar }
\keyword{ distribution }

\eof
\name{humpfit}
\alias{humpfit}
\alias{print.humpfit}
\alias{summary.humpfit}
\alias{print.summary.humpfit}
\alias{lines.humpfit}
\alias{plot.humpfit}
\alias{points.humpfit}
\alias{predict.humpfit}

\title{No-interaction Model for Hump-backed Species Richness vs. Biomass }
\description{
  Function \code{humpfit} fits a no-interaction model for species
  richness vs. biomass data (Oksanen 1996). This is a null model that
  produces a hump-backed response as an artifact of plant size and
  density. 
}
\usage{
humpfit(mass, spno, family = poisson)
\method{summary}{humpfit}(object, ...)
\method{predict}{humpfit}(object, newdata = NULL, ...) 
\method{plot}{humpfit}(x, xlab = "Biomass", ylab = "Species Richness", lwd = 2, 
    l.col = "blue", p.col = 1, type = "b", ...)
\method{points}{humpfit}(x, ...)
\method{lines}{humpfit}(x, segments=101,  ...)
}

\arguments{
  \item{mass}{Biomass. }
  \item{spno}{Species richness.}
  \item{family}{Family of error distribution. Any \code{\link{family}}
    can be used, but the link function is always Fisher's diversity
    model, and other \code{link} functions are silently ignored. }
  \item{x, object}{Result object of \code{humpfit}}
  \item{newdata}{Values of \code{mass} used in \code{predict}. The
    original data values are used if missing.}
  \item{xlab,ylab}{Axis labels in \code{plot}}
  \item{lwd}{Line width}
  \item{l.col, p.col}{Line and point colour in \code{plot}}
  \item{type}{Type of \code{plot}: \code{"p"} for observed points,
    \code{"l"} for fitted lines, \code{"b"} for both, and \code{"n"} for
    only setting axes.}
  \item{segments}{Number of segments used for fitted lines.}
  \item{...}{Other parameters to functions.}
}
\details{
  The no-interaction model assumes that the humped species richness
  pattern along a biomass gradient is an artifact of plant size and
  density (Oksanen 1996). For low-biomass sites, it assumes that plants
  have a fixed size, and biomass increases with increasing number of
  plants. When the sites become crowded, the number of plants and
  species richness reach their maximum. Higher biomass is reached by
  increasing the plant size, and then the number of plants and species
  richness will decrease. At biomasses below the hump, plant number and
  biomass are linearly related, and above the hump, plant number is
  proportional to inverse squared biomass. The number of plants is
  related to the number of species by the relationship (\code{link}
  function) from Fisher's log-series (Fisher et al. 1943).

  The parameters of the model are:
  \enumerate{
    \item \code{hump}: the location of the hump on the biomass gradient.
    \item \code{scale}: an arbitrary multiplier to translate the biomass
    into virtual number of plants.
    \item \code{alpha}: Fisher's \eqn{\alpha}{alpha} to translate the
    virtual number of plants into number of species.
  }
  The parameters \code{scale} and \code{alpha} are intermingled and this
  function should not be used for estimating Fisher's
  \eqn{\alpha}{alpha}.  Probably the only meaningful and interesting
  parameter is the location of the \code{hump}.

  The original model was intended to show that there is no need to speculate
  about `competition' and `stress' (Al-Mufti et al. 1977), but that a humped
  response can be produced as an artifact of using a fixed plot size for
  varying plant sizes and densities.  
}
\value{
  The function returns an object of class \code{"humpfit"} inheriting
  from class \code{"glm"}. The result object has specific
  \code{summary}, \code{predict}, \code{plot}, \code{points} and
  \code{lines} methods. In addition, it can be accessed by the following
  methods for \code{\link{glm}} objects: \code{\link{AIC}},
  \code{\link{extractAIC}}, \code{\link{deviance}}, \code{\link{coef}},
  \code{\link{residuals.glm}} (except \code{type = "partial"}),
  \code{\link{fitted}}, and perhaps some others. 
}
\references{
  Al-Mufti, M.M., Sykes, C.L., Furness, S.B., Grime, J.P. & Band,
  S.R. (1977) A quantitative analysis of shoot phenology and dominance
  in herbaceous vegetation. \emph{Journal of Ecology} 65, 759--791.
  
  Fisher, R.A., Corbet, A.S. & Williams, C.B. (1943) The relation
  between the number of species and the number of individuals in a
  random sample of an animal population. \emph{Journal of Animal
    Ecology} 12, 42--58.
  
  Oksanen, J. (1996) Is the humped relationship between species richness
  and biomass an artefact due to plot size? \emph{Journal of Ecology}
  84, 293--295.
}
\author{ Jari Oksanen }
\note{
  The function is a replacement for the original \code{GLIM4} function
  at the archive of Journal of Ecology.  There the function was
  represented as a mixed \code{\link{glm}} with one non-linear
  parameter (\code{hump}) and a special one-parameter link function from
  Fisher's log-series.  The current function directly applies non-linear
  maximum likelihood fitting using function \code{\link{nlm}}.  Some
  expected problems with the current approach are:
  \itemize{
    \item The derivative of the function is discontinuous at
    \code{hump}, which may make the function difficult to optimize in
    some cases (the lines will always join, but the derivative jumps).
    \item The function does not try very hard to find sensible starting
    values and can fail. This really should be improved.
    \item The estimation is unconstrained, but both \code{scale} and
    \code{alpha} should always be positive.  Perhaps they should be
    fitted as logarithmic (which could improve the symmetry of
    confidence limits, too, but this needs research, and a
    \code{\link{profile}} function). Fitting \code{\link{Gamma}}
    \code{\link{family}} models might become easier, too.
    }
}


\seealso{\code{\link{fisherfit}}. }
\examples{
##
## Data approximated from Al-Mufti et al. (1977)
##
mass <- c(140,230,310,310,400,510,610,670,860,900,1050,1160,1900,2480)
spno <- c(1,  4,  3,  9, 18, 30, 20, 14,  3,  2,  3,  2,  5,  2)
sol <- humpfit(mass, spno)
summary(sol) # Almost infinite alpha...
plot(sol)
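## A base-R sketch of the mean function described in Details. The
## parameterization is an assumption made for illustration: the virtual
## number of plants equals 'scale' at the hump, rises linearly below
## it, declines with inverse squared biomass above it, and is turned
## into species richness with the log-series relation
## S = alpha * log(1 + N/alpha). The function name and the parameter
## values are hypothetical, not estimates from humpfit.
hump.mean <- function(mass, hump, scale, alpha) {
    nplant <- scale * ifelse(mass < hump, mass/hump, (hump/mass)^2)
    alpha * log(1 + nplant/alpha)
}
m.grid <- seq(100, 2500, by = 100)
s.grid <- hump.mean(m.grid, hump = 500, scale = 50, alpha = 10)
m.grid[which.max(s.grid)]   # richness peaks at the hump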
}
\keyword{models }
\keyword{regression }
\keyword{nonlinear}

\eof
\name{initMDS}
\alias{initMDS}
\alias{postMDS}
\title{ Random Start and Axis Scaling for isoMDS }
\description{
  Function \code{initMDS} gives a random  start for multidimensional
  scaling, and \code{postMDS} performs some post-standardizations for
  multidimensional scaling, in order to make the configurations easier
  to interpret.
}
\usage{
initMDS(x, k=2)
postMDS(X, dist, pc=TRUE, center=TRUE, halfchange=TRUE, threshold=0.8,
        nthreshold=10, plot=FALSE)
}
\arguments{
  \item{x}{Dissimilarity matrix for isoMDS.}
  \item{k}{Number of dimensions.}
  \item{X}{Configuration from multidimensional scaling. }
  \item{dist}{Dissimilarity matrix used in multidimensional scaling. }
  \item{pc}{Rotate to principal components. }
  \item{center}{Centre the configuration. }
  \item{halfchange}{Scale axes to half-change units.}
  \item{threshold}{Largest dissimilarity used in half-change scaling. }
  \item{nthreshold}{ Minimum number of points in half-change scaling. }
  \item{plot}{Show half-change scaling plot.} 
}
\details{
  Non-metric Multidimensional Scaling (NMDS) is commonly regarded as the
  most robust unconstrained ordination method in community ecology (Minchin
  1987).  Functions \code{initMDS} and \code{postMDS} together with some
  other functions are intended to 
  help run NMDS in \code{\link[MASS]{isoMDS}} as recommended by
  Minchin (1987)  -- NMDS is not a
  strong choice unless used correctly:
  \enumerate{
    \item You should use a dissimilarity index that has a good rank
    order relation with ecological gradients.  Function
    \code{\link{rankindex}} may help in choosing an adequate index.
    Often a good \emph{a priori} choice is to use Bray--Curtis index,
    perhaps with \code{\link{wisconsin}} double standardization.  Some
    recommended indices are available in function
    \code{\link{vegdist}}.

    \item NMDS should be run with several random starts.  It is
    dangerous to follow the common default of starting with metric
    scaling (\code{\link[mva]{cmdscale}}), because this may be close to
    a local optimum which will trap the iteration.  Function
    \code{initMDS} provides such random starts.

    \item NMDS solutions with minimum stress should be compared for
    consistency.  You should be satisfied only when several minimum
    stress solutions have similar configurations. In particular in large data
    sets, single points may be unstable even with about
    equal stress.  Function \code{postMDS} provides a simple solution
    for fixing the scaling of NMDS.  Function \code{\link{procrustes}}
    provides Procrustes rotation for more formal inspection.
    }

  Function \code{postMDS} provides the following ways of ``fixing'' the
  indeterminacy of scaling and orientation of axes in NMDS:
  Centring moves the origin to the
  average of each axis.  Principal components rotate the configuration
  so that the variance of points is maximized on first
  dimensions. Half-change scaling scales the configuration so that one 
  unit means halving of community similarity from replicate similarity.
  Half-change scaling is
  based on closer dissimilarities where the relation between ordination
  distance and community dissimilarity is rather linear; the limit is
  controlled by parameter \code{threshold}.  If there are enough points
  below this threshold (controlled by the parameter
  \code{nthreshold}), dissimilarities are regressed on distances.
  The intercept of this regression is taken as the replicate
  dissimilarity, and half-change is the distance where similarity
  halves according to linear regression.  Obviously the method is
  applicable only for dissimilarity indices scaled to \eqn{0 \ldots 1},
  such as Kulczynski, Bray-Curtis and Canberra indices. 
}

\value{
  Function \code{initMDS} returns a random configuration which is
  intended to be used within \code{\link[MASS]{isoMDS}} only.  Function
  \code{postMDS} returns the result of \code{\link[MASS]{isoMDS}} with
  updated configuration. 
}  
\references{ Minchin, P.R. (1987)  An evaluation of relative robustness
  of techniques for ecological ordinations. \emph{Vegetatio} 71, 145-156. }

\author{ Jari Oksanen }

\seealso{\code{\link[MASS]{isoMDS}}, \code{\link[mva]{cmdscale}},
  \code{\link{procrustes}}.  }

\examples{
## The recommended way of running NMDS (Minchin 1987)
##
data(varespec)
data(varechem)
library(MASS) ## isoMDS
library(mva)  ## cmdscale: default start to isoMDS
# Select a good dissimilarity index
rankindex(scale(varechem),varespec)
rankindex(scale(varechem),wisconsin(varespec))
vare.dist <- vegdist(wisconsin(varespec), "bray")
# Get the baseline solution: start with cmdscale
mds.null <- isoMDS(vare.dist, tol=1e-7)
## See if you can get any better.
repeat{
  mds.1<- isoMDS(vare.dist, initMDS(vare.dist), maxit=200, trace=FALSE, tol=1e-7)
  if(mds.1$stress < mds.null$stress) break
}
# Scale solutions ("fix translation, rotation and scale")
mds.null <- postMDS(mds.null, vare.dist)
mds.1 <- postMDS(mds.1, vare.dist)
# Compare solutions
plot(procrustes(mds.1, mds.null))
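## A base-R sketch of the post-standardizations described in Details,
## using simulated data. All names and the simulated dissimilarities
## are illustrative assumptions; postMDS may differ in details.
set.seed(2)
pts <- matrix(rnorm(60), ncol = 2)      # hypothetical configuration
pts <- prcomp(pts)$x                    # centre and rotate to principal components
d.ord <- as.vector(dist(pts))           # ordination distances
## hypothetical dissimilarities: about 0.1 between replicates,
## approaching 1 with increasing distance
dis <- 1 - 0.9 * exp(-d.ord/2) + rnorm(length(d.ord), sd = 0.01)
use <- dis < 0.8                        # plays the role of 'threshold'
if (sum(use) >= 10) {                   # plays the role of 'nthreshold'
    k <- unname(coef(lm(dis[use] ~ d.ord[use])))
    ## similarity at zero distance is 1 - k[1]; according to the
    ## regression line it halves at distance hc
    hc <- (1 - k[1])/2/k[2]
    pts <- pts/hc                       # one unit is now one half-change
}
hc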
}
\keyword{ multivariate }

\eof
\name{make.cepnames}
\alias{make.cepnames}

\title{Abbreviates a Botanical or Zoological Latin Name into an Eight-character Name} 
\description{
  A standard CEP name has the first four letters of the generic name and
  the first four letters of the specific epithet of a Latin name. The last
  epithet, which may be a subspecific name, is used in the current
  function. The returned names are made unique with function
  \code{\link{make.unique}} which adds numbers to the end of CEP names if needed.
}
\usage{
make.cepnames(names)
}
\arguments{
  \item{names}{The names to be formatted into CEP names. }
}
\details{
  Cornell Ecology Programs (CEP) used eight-letter abbreviations for
  species and site names. For species, the names were formed by taking
  the first four letters of the generic name and the first four letters of the
  specific or subspecific epithet. The CEP names were originally used
  because old \code{FORTRAN IV} did not have a \code{CHARACTER} data type,
  and text variables had to be stored in numerical variables, which in
  popular computers could hold four characters. In modern times,
  there is no reason for this limitation, but ecologists are used to
  these names, and they may be practical to avoid congestion in
  ordination plots.
}
\value{
  Function returns CEP names.
}
\author{ Jari Oksanen }
\note{
  The function is simpleminded and rigid. You must write a better one if
  you need one.
}
\seealso{
  \code{\link{make.names}}, \code{\link{strsplit}},
  \code{\link{substring}}, \code{\link{paste}}.
  }
\examples{
make.cepnames(c("Aa maderoi", "Poa sp.", "Cladina rangiferina",
"Cladonia cornuta", "Cladonia cornuta var. groenlandica",
"Cladonia rangiformis"))
data(BCI)
colnames(BCI) <- make.cepnames(colnames(BCI))
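## A base-R sketch of the abbreviation rule in Description: the first
## four letters of the first word plus the first four letters of the
## last word, made unique with make.unique. The helper cep.sketch is
## hypothetical and is not the code of make.cepnames; it does not
## handle one-word names sensibly.
cep.sketch <- function(names) {
    words <- strsplit(names, " ")
    abbr <- sapply(words, function(w)
        paste(substring(w[1], 1, 4), substring(w[length(w)], 1, 4),
              sep = ""))
    make.unique(abbr)
}
cep <- cep.sketch(c("Cladina rangiferina", "Cladonia rangiformis",
                    "Cladonia cornuta var. groenlandica"))
cep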
}
\keyword{ character }


\eof
\name{mantel}
\alias{mantel}
\alias{print.mantel}

\title{Mantel Test for Two Dissimilarity Matrices }
\description{
  Function \code{mantel} finds the Mantel statistic as a matrix
  correlation between two dissimilarity matrices, and evaluates the
  significance of the statistic by permuting rows and columns of the
  dissimilarity matrix.  The statistic is evaluated either as a moment
  correlation or as a rank correlation.

}
\usage{
mantel(xdis, ydis, method="pearson", permutations=1000, strata)
}

\arguments{
  \item{xdis}{ First dissimilarity matrix or a \code{dist} object. }
  \item{ydis}{ Second dissimilarity matrix or a \code{dist} object. }
  \item{method}{ Correlation method, as accepted by \code{\link[ctest]{cor.test}}:
    \code{pearson}, \code{spearman} or \code{kendall}. }
  \item{permutations}{Number of permutations in assessing significance. }
  \item{strata}{An integer vector or factor specifying the strata for
    permutation. If supplied, observations are permuted only within the
    specified strata.}
}
\details{
  The Mantel statistic is simply a correlation between entries of two
  dissimilarity matrices (some use cross products, but these are linearly
  related).  However, the significance cannot be directly assessed,
  because there are \eqn{N(N-1)/2} entries for just \eqn{N} observations.
  Mantel developed an asymptotic test, but here we use permutations of
  \eqn{N} rows and columns of the dissimilarity matrix.

  The function uses \code{\link[ctest]{cor.test}}, which should accept
  alternatives \code{pearson} for product moment correlations and
  \code{spearman} or \code{kendall} for rank correlations.
}
\value{
  The function returns a list of class \code{mantel} with the following
  components: 
  \item{Call }{Function call.}
  \item{method }{Correlation method used, as returned by
    \code{\link[ctest]{cor.test}}.}
  \item{statistic}{The Mantel statistic.}
  \item{signif}{Empirical significance level from permutations.}
  \item{perm}{A vector of permuted values.}
  \item{permutations}{Number of permutations.}
}
\references{ The test is due to Mantel, of course, but the
  current implementation is based on Legendre and Legendre.

  Legendre, P. and Legendre, L. (1998) \emph{Numerical Ecology}. 2nd English
  Edition. Elsevier.
  
}

\author{Jari Oksanen }


\seealso{\code{\link[ctest]{cor.test}} for correlation tests,
  \code{\link{protest}} (``Procrustes test'') for an alternative with
  ordination diagrams,
  and \code{\link{anosim}} for comparing dissimilarities against
  classification.  For dissimilarity matrices, see \code{\link{vegdist}}
  or \code{\link[mva]{dist}}. }

\examples{
## Is vegetation related to environment?
data(varespec)
data(varechem)
veg.dist <- vegdist(varespec) # Bray-Curtis
env.dist <- vegdist(scale(varechem), "euclid")
mantel(veg.dist, env.dist)
mantel(veg.dist, env.dist, method="spear")
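## A base-R sketch of the permutation test described in Details: the
## statistic is the correlation between the entries of two
## dissimilarity matrices, and the reference distribution comes from
## permuting rows and columns of one matrix together. The data are
## simulated, mantel.sketch is a hypothetical helper, and the
## significance formula is one common convention, not necessarily the
## one used by mantel.
mantel.sketch <- function(xdis, ydis, permutations = 199) {
    xmat <- as.matrix(xdis)
    y <- as.vector(as.dist(ydis))
    stat <- cor(as.vector(as.dist(xmat)), y)
    perm <- replicate(permutations, {
        i <- sample(nrow(xmat))         # same order for rows and columns
        cor(as.vector(as.dist(xmat[i, i])), y)
    })
    list(statistic = stat, perm = perm,
         signif = (sum(perm >= stat) + 1)/(permutations + 1))
}
set.seed(3)
a <- matrix(rnorm(40), ncol = 2)
b <- a + matrix(rnorm(40, sd = 0.3), ncol = 2)
res <- mantel.sketch(dist(a), dist(b))
res$statistic
res$signif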
}
\keyword{ multivariate }
\keyword{ htest }

\eof
\name{ordihull}
\alias{ordihull}
\alias{ordiarrows}
\alias{ordisegments}
\alias{ordigrid}
\alias{ordispider}
\alias{ordiellipse}
\alias{ordicluster}
\alias{ordispantree}
\alias{weights.cca}
\alias{weights.rda}
\alias{weights.decorana}

\title{Add Graphical Items to Ordination Diagrams}
\description{
  Functions to add convex hulls, arrows, line segments, regular grids of
  points, `spider' graphs, ellipses, cluster dendrogram or spanning
 trees to ordination diagrams. The
  ordination diagrams can be produced by \code{vegan}
  \code{\link{plot.cca}}, \code{\link{plot.decorana}} or
  \code{\link{ordiplot}}.
}
\usage{
ordihull(ord, groups, display = "sites", draw = c("lines","polygon"), ...)
ordiarrows(ord, groups, levels, replicates, display = "sites", ...)
ordisegments(ord, groups, levels, replicates, display = "sites", ...)
ordigrid(ord, levels, replicates, display = "sites",  ...)
ordispider(ord, groups, display="sites", w = weights(ord, display), ...)
ordiellipse(ord, groups, display="sites", kind = c("sd","se"), conf,
            draw = c("lines","polygon"), w = weights(ord, display), ...)
ordicluster(ord, cluster, prune = 0, display = "sites",
            w = weights(ord, display), ...)
ordispantree(ord, tree, display = "sites", ...)
}

\arguments{
  \item{ord}{An ordination object or an \code{\link{ordiplot}} object. }
  \item{groups}{Factor giving the groups for which the graphical item is
    drawn. }
  \item{levels, replicates}{Alternatively, regular
    groups can be defined with arguments \code{levels} and
    \code{replicates}, where \code{levels} gives the number of groups,
    and \code{replicates} the number of successive items at the same
    group.}
  \item{display}{Item to be displayed. }
  \item{draw}{Use either \code{\link{lines}} or \code{\link{polygon}} to
    draw the
    line. Graphical parameters are passed to both. The main difference
    is that \code{polygon}s may be filled and non-transparent.}
  \item{w}{Weights used to find the average within group. Weights are
    used automatically for \code{\link{cca}}
    and \code{\link{decorana}} results, unless undone by the
    user. \code{w=NULL} sets equal weights to all points. }
  \item{kind}{Whether standard deviations of points (\code{sd}) or
    standard deviations of their (weighted) averages (\code{se}) are
    used.}
  \item{conf}{Confidence limit for ellipses, e.g. 0.95. If given, the
    corresponding \code{sd} or \code{se} is multiplied with the
    corresponding value found from the Chi-squared distribution with
    2df. }
  \item{cluster}{Result of hierarchic cluster analysis, such as
    \code{\link[mva]{hclust}} or \code{\link[cluster]{agnes}}.}
  \item{prune}{Number of upper level hierarchies removed from the
    dendrogram. If \code{prune} \eqn{>0}, dendrogram will be
    disconnected.}
  \item{tree}{Structure defining a spanning tree. This can be a result
    of \code{\link{spantree}} or a vector giving the child node of each
    parent omitting the first point. A value of \code{NA} means that there
    is no link from the corresponding parent.}
  \item{\dots}{Parameters passed to graphical functions such
    as \code{\link{lines}}, \code{\link{segments}},
    \code{\link{arrows}}, \code{\link{polygon}} or to
    \code{\link{scores}} to select axes and
    scaling etc. }
  }
\details{
  Function \code{ordihull} draws \code{\link{lines}} or
  \code{\link{polygon}}s for the convex
  hulls found by function \code{\link{chull}} encircling
  the items in the groups.

  Function \code{ordiarrows} draws
  \code{\link{arrows}} and \code{ordisegments} draws line
  \code{\link{segments}} between successive items in the
  groups. Function \code{ordigrid} draws line
  \code{\link{segments}} both within the groups and for the
  corresponding items among the groups.

  Function \code{ordispider} draws a `spider' diagram where each point
  is connected to the group centroid with
  \code{\link{segments}}. Weighted centroids are used in the
  correspondence analysis  methods \code{\link{cca}} and
  \code{\link{decorana}} or if the user gives the weights in the
  call. If \code{ordispider} is called with \code{\link{cca}} or
  \code{\link{rda}} result without \code{groups} argument, the function
  connects each `WA' score to the corresponding `LC' score.

  Function \code{ordiellipse} draws \code{\link{lines}} or
  \code{\link{polygon}}s for dispersion
  \code{\link[ellipse]{ellipse}} using either standard deviation of
  point scores or standard error of the (weighted) average of
  scores, and the (weighted) correlation defines the direction of the
  principal axis of the ellipse. The function requires library
  \code{ellipse}. An ellipsoid hull can be drawn with function
  \code{\link[cluster]{ellipsoidhull}} of package \code{cluster}.

  Function \code{ordicluster} overlays a cluster dendrogram onto
  ordination. It needs the result from a hierarchic clustering such as
  \code{\link[mva]{hclust}} or \code{\link[cluster]{agnes}}, or other
  with a similar structure. Function \code{ordicluster} connects
  cluster centroids to each other with line
  \code{\link{segments}}. Function uses centroids of all points in the 
  clusters, and is therefore similar to average linkage methods.

  Function \code{ordispantree} overlays a (minimum) spanning tree onto
  ordination. It needs a result from \code{\link{spantree}} or a
  vector listing children of each parent, starting from the second (i.e.,
  omitting the first: the number of links is one less than the number of
  points). Missing links are denoted as \code{NA}. For an example, see
  \code{\link{spantree}}. 
}

\note{These functions add graphical items to an ordination graph: you must
  draw a graph first.
  }
\author{ Jari Oksanen }

\seealso{The functions pass parameters to basic graphical functions, and
  you may wish to change the default values in \code{\link{arrows}},
  \code{\link{lines}}, \code{\link{segments}} and
  \code{\link{polygon}}. You can pass
  parameters to \code{\link{scores}} as well. Other underlying functions
  are \code{\link{chull}} and \code{\link[ellipse]{ellipse}}.
}


\examples{
data(dune)
data(dune.env)
mod <- cca(dune ~ Moisture, dune.env)
attach(dune.env)
plot(mod, type="n")
ordihull(mod, Moisture)
ordispider(mod, col="red")
plot(mod, type = "p", display="sites")
ordicluster(mod, hclust(vegdist(dune)), prune=3, col = "blue")
# The following is not executed automatically because it needs
# a non-standard library `ellipse'. 
\dontrun{ordiellipse(mod, Moisture, kind="se", level=0.95, lwd=2, col="blue")}
}
\keyword{aplot }


\eof
\name{ordiplot}
\alias{ordiplot}
\alias{identify.ordiplot}
\alias{scores.ordiplot}
\alias{points.ordiplot}
\alias{text.ordiplot}

\title{ Alternative plot and identify Functions for Ordination }
\description{
  Ordination plot function especially for congested plots. Function
  \code{ordiplot} always plots only unlabelled points, but
  \code{identify.ordiplot} can be used to add labels to selected site,
  species or constraint points.  Function \code{identify.ordiplot} can
  be used to identify points from \code{\link{plot.cca}},
  \code{\link{plot.decorana}}, \code{\link{plot.procrustes}} and
  \code{\link{plot.rad}} as well.
}
\usage{
ordiplot(ord, choices = c(1, 2), type="points", ...)
\method{identify}{ordiplot}(x, what, ...)
\method{points}{ordiplot}(x, what, ...)
\method{text}{ordiplot}(x, what, ...)
}

\arguments{
  \item{ord}{A result from an ordination.}
  \item{choices}{Axes shown. }
  \item{type}{The type of graph which may be \code{"points"},
    \code{"text"} or
    \code{"none"} for any ordination method, or any of the alternatives
    in \code{\link{plot.cca}} or \code{\link{plot.decorana}} in
    \code{\link{cca}}, \code{\link{rda}} or \code{\link{decorana}}
    graphs.}
  \item{\dots}{Other graphical parameters. }
  \item{x}{A result object from \code{ordiplot}.}
  \item{what}{Items identified in the ordination plot. The types depend
    on the kind of plot used. Most methods know \code{sites} and
    \code{species}, functions \code{\link{cca}} and \code{\link{rda}}
    know in addition 
    \code{constraints} (for `LC' scores), \code{centroids} and
    \code{biplot}, and \code{\link{plot.procrustes}} ordination plot has
    \code{heads} and \code{points}.}
}
\details{
  Function \code{ordiplot} draws an ordination diagram using black circles for
  sites and red crosses for species.  It returns invisibly an object of
  class \code{ordiplot} which can be used by \code{identify.ordiplot}
  to label selected sites or species, or constraints in
  \code{\link{cca}} and \code{\link{rda}}.

  The function can handle output from several alternative ordination
  methods. For \code{\link{cca}}, \code{\link{rda}} and
  \code{\link{decorana}} it uses their \code{plot} method with option
  \code{type = "points"}. In addition, the \code{plot} functions of
  these methods return invisibly an \code{ordiplot} object which can
  be used by \code{identify.ordiplot} to label points. For other
  ordinations it relies on \code{\link{scores}} to extract the scores.

  For full user control of plots, it is best to call \code{ordiplot}
  with \code{type = "none"} and save the result, and then add sites and
  species using \code{points.ordiplot} or \code{text.ordiplot} which
  both pass all their arguments to the corresponding default graphical
  functions. 
}
\value{
  Function \code{ordiplot} returns invisibly an object of class
  \code{ordiplot} with items \code{sites}, \code{species} and
  \code{constraints} (if these are available in the ordination
  object). Function \code{identify.ordiplot} uses this object to label
  the points.
}

\author{
  Jari Oksanen
}
\note{
  The purpose of these functions is to provide functionality similar to
  the \code{plot}, \code{plotid} and \code{specid} methods in library
  \code{labdsv}. The functions are somewhat limited in parametrization,
  but you can call the standard \code{\link{identify}} and
  \code{\link{plot}} functions directly for better user control.
}

\seealso{ \code{\link{identify}} for basic operations, \code{\link{plot.cca}},
  \code{\link{plot.decorana}}, \code{\link{plot.procrustes}} which also
  produce objects for
  \code{identify.ordiplot} and \code{\link{scores}} for extracting
  scores from non-\code{vegan} ordinations.   
}

\examples{
# Draw a cute NMDS plot
data(dune)
dune.dis <- vegdist(wisconsin(dune))
library(MASS)
dune.mds <- isoMDS(dune.dis)
dune.mds <- postMDS(dune.mds, dune.dis)
# Dirty trick: Save species weighted averages in cproj which we
# know in ordiplot... (you should ask me to improve the function)
dune.mds$cproj <- wascores(dune.mds$points, dune, expand = TRUE)
fig <- ordiplot(dune.mds, type = "none")
points(fig, "sites", pch=21, col="red", bg="yellow")
text(fig, "species", col="blue", cex=0.9)
# A quick plot of the previous.
# identify is not run automatically because it needs user interaction:
\dontrun{fig <- ordiplot(dune.mds)}
\dontrun{identify(fig, "spec")}
}
\keyword{ hplot }
\keyword{ iplot }
\keyword{ aplot }

\eof
\name{ordisurf}
\alias{ordisurf}

\title{ Smooths Variables and Plots Contours on Ordination }
\description{
  Function \code{ordisurf} fits a smooth surface for a given variable and
  plots the result on an ordination diagram.
}
\usage{
ordisurf(x, y, choices=c(1, 2), knots=10, family="gaussian", col="red",
     thinplate = TRUE, add = FALSE, ...)
}

\arguments{
  \item{x}{Ordination configuration, either a matrix or a result known
    by \code{\link{scores}}. }
  \item{y}{ Variable to be plotted. }
  \item{choices}{Ordination axes. }
  \item{knots}{Number of initial knots in \code{\link[mgcv]{gam}} (one
    more than degrees of freedom). }
  \item{family}{ Error distribution in  \code{\link[mgcv]{gam}}. }
  \item{col}{ Colour of contours. }
  \item{thinplate}{Use thinplate splines in \code{\link[mgcv]{gam}}.}
  \item{add}{Add contours on an existing diagram or draw a new plot. }
  \item{\dots}{ Other graphical parameters. }
}
\details{
  Function \code{ordisurf} fits a smooth surface using thinplate spline
  fitting in \code{\link[mgcv]{gam}}, and  interpolates the fitted
  values  into a
  regular grid using \code{\link[akima]{interp}}.  Finally, it plots the
  results either over an existing ordination diagram or draws a new plot
  with sample plots and fitted contours.  The function uses
  \code{\link{scores}} to extract ordination scores, and \code{x} can be
  any result object known by that function.
}
\value{
  Function is usually called for its side effect of drawing the contour
  plot, but it returns the result object of \code{\link[mgcv]{gam}}.
}
\author{ Dave Roberts and Jari Oksanen }
\note{ The function requires libraries \code{mgcv}
  (\code{\link[mgcv]{gam}}) and \code{akima}
  (\code{\link[akima]{interp}}).  In fact, it is a very primitive wrapper
  for these.

  The default is to use thinplate splines.  These make sense in
  ordination as they have equal smoothing in all directions and are
  rotation invariant.  However, they seem to fail badly in some cases,
  and then separate spline smoothing may be used.

  The function was called \code{surf} in older versions of \code{vegan};
  the new name was chosen to avoid name clashes with the \code{labdsv} package.
}

\seealso{ For basic routines \code{\link[mgcv]{gam}},
  \code{\link[akima]{interp}} and \code{\link{scores}}. Function 
  \code{\link{envfit}} provides a poorer but more traditional and compact
  alternative. } 

\examples{
## The examples are not run by `example(ordisurf)' because they need
## libraries `mgcv' and `akima' which may not exist in every system.
\dontrun{data(varespec)}
\dontrun{data(varechem)}
\dontrun{library(MASS)}
\dontrun{library(mva)}
\dontrun{vare.dist <- vegdist(varespec)}
\dontrun{vare.mds <- isoMDS(vare.dist)}
\dontrun{attach(varespec)}
\dontrun{attach(varechem)}
\dontrun{ordisurf(vare.mds, Baresoil, xlab="Dim1", ylab="Dim2")}
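## A sketch of overlaying a second fitted surface on the previous plot
## with add = TRUE. The choice of Humdepth (a variable in varechem) and
## the colour are arbitrary and only illustrative.
\dontrun{ordisurf(vare.mds, Humdepth, add = TRUE, col = "blue")}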
## Total cover of reindeer lichens
\dontrun{ordisurf(vare.mds, Cla.ste+Cla.arb+Cla.ran, xlab="Dim1", ylab="Dim2")} 
}
\keyword{ multivariate }
\keyword{ aplot }

\eof
\name{plot.cca}
\alias{plot.cca}
\alias{text.cca}
\alias{points.cca}
\alias{scores.cca}

\title{Plot or Extract Results of Constrained Correspondence Analysis
  or Redundancy Analysis}
\description{
  Functions to plot or extract results of constrained correspondence analysis
  (\code{\link{cca}}), redundancy analysis (\code{\link{rda}}) or
  constrained analysis of principal coordinates (\code{\link{capscale}}).
}
\usage{
\method{plot}{cca}(x, choices = c(1, 2), display = c("sp", "wa", "cn"),
         scaling = 2, type, ...)
\method{text}{cca}(x, display = "sites", choices = c(1, 2), scaling = 2,
    mul.arrow = 1, head.arrow = 0.05, ...)
\method{points}{cca}(x, display = "sites", choices = c(1, 2), scaling = 2,
    mul.arrow = 1, head.arrow = 0.05, ...)
\method{scores}{cca}(x, choices=c(1,2), display=c("sp","wa","cn"),scaling=2, ...)
}

\arguments{
  \item{x}{A \code{cca} result object.}
    \item{choices}{Axes shown.}
  \item{display}{Scores shown.  These must be some of the alternatives
    \code{sp} for species scores, \code{wa} for site scores, \code{lc}
    for linear constraints or ``LC scores'', or \code{bp} for biplot
    arrows or \code{cn} for centroids of factor constraints instead of
    an arrow.}
  \item{type}{Type of plot: partial match to \code{text}
    for text labels, \code{points} for points, and \code{none} for
    setting frames only.  If omitted, \code{text} is selected for
    smaller data sets, and \code{points} for larger.}
  \item{scaling}{Scaling for species and site scores. Either species
    (\code{2}) or site (\code{1}) scores are scaled by eigenvalues, and
    the other set of scores is left unscaled, or with \code{3} both are
    scaled symmetrically by square root of eigenvalues. }
  \item{mul.arrow}{Factor to expand arrows to fit the graph.}
  \item{head.arrow}{Default length of arrow heads.}
  \item{...}{Other parameters for plotting functions.}
}

\details{
  The same \code{plot} function will be used for \code{\link{cca}} and
  \code{\link{rda}}. This produces a quick, standard plot with current
  \code{scaling}.

  The \code{plot} function sets colours (\code{col}), plotting
  characters (\code{pch}) and character sizes (\code{cex}) to
  certain standard values. For fuller control of the produced plot, it is
  best to call \code{plot} with \code{type="none"} first, and then add
  each plotting item separately using \code{text.cca} or
  \code{points.cca} functions. These use the default settings of standard
  \code{\link{text}} and \code{\link{points}} functions and accept all
  their parameters, thus allowing full user control of produced plots.

  Environmental variables receive a special treatment. With
  \code{display="bp"}, arrows will be drawn. These are labelled with
  \code{text} and unlabelled with \code{points}. The basic \code{plot}
  function uses a simple (but not very clever) heuristic for adjusting
  arrow lengths to plots, but with \code{points.cca} and \code{text.cca}
  the user must give the expansion factor in
  \code{mul.arrow}. The behaviour is still more peculiar with
  \code{display="cn"} which requests centroids of levels of
  \code{\link{factor}} variables (these are available only if there were
  factors and a formula interface was used in \code{\link{cca}} or
  \code{\link{rda}}). With this option, biplot arrows are plotted in
  addition to centroids in cases which do not have a centroid: Continuous
  variables are presented with arrows and ordered factors with arrows
  and centroids. 

  If you want still better control of plots, it is better to
  produce them using primitive \code{plot} commands. Function
  \code{scores} helps in extracting the 
  needed components with the selected \code{scaling}.
}
\value{
  The \code{plot} function returns invisibly a plotting structure which
  can be used by function \code{\link{identify.ordiplot}} to identify
  the points or other functions in the \code{\link{ordiplot}} family. 
}

\author{Jari Oksanen }
\note{ Option \code{display="cn"} (centroids and biplot arrows) may
  become the default instead of the current \code{display="bp"} in a
  future version.
  }

\seealso{\code{\link{cca}}, \code{\link{rda}} and \code{\link{capscale}}
  for getting something
  to plot, \code{\link{ordiplot}} for an alternative plotting routine
  and more support functions, and \code{\link{text}},
  \code{\link{points}} and \code{\link{arrows}} for the basic routines.  }

\examples{
data(dune)
data(dune.env)
mod <- cca(dune ~ A1 + Moisture + Management, dune.env)
plot(mod, type="n")
text(mod, dis="cn", mul=2)
points(mod, pch=21, col="red", bg="yellow", cex=1.2)
text(mod, "species", col="blue", cex=0.8)
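## A sketch of extracting scores for hand-made plots with primitive
## commands, as suggested in Details. The choice of scaling = 1 and of
## the displayed score types is arbitrary and only illustrative.
sc <- scores(mod, display = c("sp", "wa"), scaling = 1)
str(sc)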
}
\keyword{hplot}
\keyword{aplot}

\eof
\name{procrustes}
\alias{procrustes}
\alias{print.procrustes}
\alias{summary.procrustes}
\alias{print.summary.procrustes}
\alias{plot.procrustes}
\alias{points.procrustes}
\alias{lines.procrustes}
\alias{residuals.procrustes}
\alias{fitted.procrustes}
\alias{protest}
\alias{print.protest}

\title{Procrustes Rotation of Two Configurations }
\description{
 Function \code{procrustes} rotates a configuration to maximum similarity
  with another configuration. Function \code{protest} tests the
  non-randomness (`significance') between two configurations.
}
\usage{
procrustes(X, Y, scale = TRUE, symmetric = FALSE)
\method{summary}{procrustes}(object, ...)
\method{plot}{procrustes}(x, kind=1, choices=c(1,2), xlab, ylab, main,
     ar.col = "blue", len=0.05, ...)
\method{points}{procrustes}(x, display = c("target", "rotated"), ...)
\method{lines}{procrustes}(x, type = c("segments", "arrows"), choices = c(1, 2), ...)  
\method{residuals}{procrustes}(object, ...)
\method{fitted}{procrustes}(object, truemean = TRUE, ...)
protest(X, Y, permutations = 1000, strata)
}
%- maybe also `usage' for other objects documented here.
\arguments{
  \item{X}{Target matrix}
  \item{Y}{Matrix to be rotated.}
  \item{scale}{Allow scaling of axes of \code{Y}.}
  \item{symmetric}{Use symmetric Procrustes statistic (the rotation will
    still be non-symmetric).}
  \item{x, object}{An object of class \code{procrustes}.}
  \item{kind}{For \code{plot} function, the kind of plot produced:
    \code{kind = 1} plots shifts in two configurations, \code{kind = 0}
    draws a corresponding empty plot, and \code{kind = 2}
    plots an impulse diagram of residuals.}
  \item{choices}{Axes (dimensions) plotted.}
  \item{xlab, ylab}{Axis labels, if defaults unacceptable.}
  \item{main}{Plot title, if default unacceptable.}
  \item{display}{Show only the \code{"target"} or \code{"rotated"}
    matrix as points.}
  \item{type}{Combine \code{target} and \code{rotated} points with line
    segments or arrows.}
  \item{truemean}{Use the original range of target matrix instead of
    centring the fitted values.}
  \item{permutations}{Number of permutations to assess the significance
    of the symmetric Procrustes statistic. }
  \item{strata}{An integer vector or factor specifying the strata for
    permutation. If supplied, observations are permuted only within the
    specified strata.}
  \item{ar.col}{Arrow colour.}
  \item{len}{Width of the arrow head.}
  \item{...}{Other parameters passed to functions.}
}
\details{
  Procrustes rotation rotates a matrix to maximum similarity with a
  target matrix minimizing sum of squared differences.  Procrustes
  rotation is typically used in comparison of ordination results.  It is
  particularly useful in comparing alternative solutions in
  multidimensional scaling.  If \code{scale=FALSE}, the function only
  rotates matrix \code{Y}. If \code{scale=TRUE}, it scales linearly
  configuration \code{Y} for maximum similarity.  Since \code{Y} is scaled
  to fit \code{X}, the scaling is non-symmetric. However, with
  \code{symmetric=TRUE}, the configurations are scaled to equal
  dispersions and  a symmetric version of the Procrustes statistic
  is computed.

  Instead of matrix, \code{X} and \code{Y} can be results from an
  ordination from which \code{\link{scores}} can extract results.

  Function \code{plot} plots a \code{procrustes}
  object and  returns invisibly an \code{ordiplot} object so that
  function \code{\link{identify.ordiplot}} can be used for identifying
  points. The items in the \code{ordiplot} object are called
  \code{heads} and \code{points} with \code{kind=1} (ordination diagram)
  and \code{sites} with \code{kind=2} (residuals).  In ordination
  diagrams, the arrow heads point to the target configuration, which may
  be either logical or illogical.  Function \code{plot} passes
  parameters to underlying plotting functions.  For full control of
  plots, you can draw the axes using \code{plot} with \code{kind = 0},
  and then add items with \code{points} or \code{lines}.  These
  functions pass all parameters to the underlying functions so that you
  can select the plotting characters, their size, colours etc., or you
  can select the width, colour and type of line \code{\link{segments}} or
  arrows, or you can select the orientation and head width of
  \code{\link{arrows}}.

  Function \code{residuals} returns the pointwise
  residuals, and \code{fitted} the fitted values, either centred to zero
  mean (if \code{truemean=FALSE}) or with the original scale (these
  hardly make sense if \code{symmetric = TRUE}). In
  addition, there are \code{summary} and \code{print} methods.

  If matrix \code{X} has a lower number of columns than matrix
  \code{Y}, then matrix \code{X} will be filled with zero columns to
  match dimensions. This means that the function can be used to rotate
  an ordination configuration to an environmental variable (most
  practically extracting the result with the \code{fitted} function).

  Function \code{protest} calls \code{procrustes(..., symmetric = TRUE)}
  repeatedly to estimate the `significance' of the Procrustes
  statistic. Function \code{protest} uses a correlation-like statistic
  derived from the symmetric Procrustes sum of squares \eqn{ss} as
  \eqn{r =\sqrt{(1-ss)}}, and sometimes called \eqn{m_{12}}. Function
  \code{protest} has its own \code{print} method, but otherwise uses
  \code{procrustes} methods. Thus \code{plot} with a \code{protest} object
  yields a ``Procrustean superimposition plot.''
}

\value{
  Function \code{procrustes} returns an object of class
  \code{procrustes} with the items listed below. Function \code{protest}
  inherits from \code{procrustes}, but amends that with some new items:
  \item{Yrot }{Rotated matrix \code{Y}.}
  \item{X}{Target matrix.}
  \item{ss }{Sum of squared differences between \code{X} and \code{Yrot}.}
  \item{rotation}{Orthogonal rotation matrix.}
  \item{translation}{Translation of the origin.}
  \item{scale}{Scaling factor.}
  \item{symmetric}{Type of \code{ss} statistic.}
  \item{call}{Function call.}
  \item{t0}{This and the following items are only in class
    \code{protest}:  Procrustes correlation from non-permuted solution.}
  \item{t}{Procrustes correlations from permutations.}
  \item{signif}{`Significance' of \code{t}}
  \item{permutations}{Number of permutations.}
  \item{strata}{The name of the stratifying variable.}
  \item{stratum.values}{Values of the stratifying variable.}
}
\references{
  Mardia, K.V., Kent, J.T. and Bibby,
  J.M. (1979). \emph{Multivariate Analysis}. Academic Press.

  Peres-Neto, P.R. and Jackson, D.A. (2001). How well do multivariate
  data sets match? The advantages of a Procrustean superimposition
  approach over the Mantel test. \emph{Oecologia} 129, 169--178.
  
}
\author{Jari Oksanen }

\note{The function \code{protest} follows Peres-Neto & Jackson (2001),
  but the implementation is still after Mardia \emph{et al.}
  (1979).}

\seealso{\code{\link[MASS]{isoMDS}}, \code{\link{initMDS}} for obtaining
objects for \code{procrustes}, and \code{\link{mantel}} for an
alternative to \code{protest} without need of dimension reduction.} 

\examples{
data(varespec)
vare.dist <- vegdist(wisconsin(varespec))
library(MASS)  ## isoMDS
library(mva)   ## cmdscale to start isoMDS
mds.null <- isoMDS(vare.dist, tol=1e-7)
## This was a good seed for me: your rng may vary.
set.seed(237)
mds.alt <- isoMDS(vare.dist, initMDS(vare.dist), maxit=200, tol=1e-7)
vare.proc <- procrustes(mds.alt$points, mds.null$points)
vare.proc
summary(vare.proc)
plot(vare.proc)
plot(vare.proc, kind=2)
residuals(vare.proc)
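## A sketch of the full-control approach described in Details: draw the
## empty axes with kind = 0, then add points and line segments
## separately. Colours and plotting characters are arbitrary choices.
plot(vare.proc, kind = 0)
points(vare.proc, display = "target", pch = 16, col = "red")
points(vare.proc, display = "rotated", pch = 1)
lines(vare.proc, type = "segments", col = "blue")
## Permutation test of the same two configurations. The small number
## of permutations is used only to keep the example fast.
vare.prot <- protest(mds.alt$points, mds.null$points, permutations = 99)
vare.prot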
## Reset rng:
rm(.Random.seed)
}
\keyword{multivariate }%-- one or more ...
\keyword{htest}

\eof
\name{radfit}
\alias{radfit}
\alias{radfit.default}
\alias{radfit.data.frame}
\alias{AIC.radfit}
\alias{as.rad}
\alias{coef.radfit}
\alias{fitted.radfit}
\alias{lines.radline}
\alias{plot.radfit.frame}
\alias{plot.radfit}
\alias{plot.radline}
\alias{plot.rad}
\alias{points.radline}
\alias{print.radfit.frame}
\alias{print.radfit}
\alias{print.radline}
\alias{rad.preempt}
\alias{rad.lognormal}
\alias{rad.veil}
\alias{rad.zipf}
\alias{rad.zipfbrot}

\title{ Rank -- Abundance or Dominance / Diversity Models}
\description{
  Functions construct rank -- abundance or dominance / diversity or
  Whittaker plots and fit pre-emption, log-Normal, veiled log-Normal,
  Zipf and Zipf -- Mandelbrot models of species abundance.
}
\usage{
\method{radfit}{data.frame}(df, ...)
\method{plot}{radfit.frame}(x, order.by, BIC = FALSE, model, legend = TRUE,
     as.table = TRUE, ...)
\method{radfit}{default}(x, ...)
\method{plot}{radfit}(x, BIC = FALSE, legend = TRUE, ...)  
rad.preempt(x, family = poisson, ...)
rad.lognormal(x, family = poisson, ...)
rad.veil(x, family = poisson, ...)
rad.zipf(x, family = poisson, ...)
rad.zipfbrot(x, family = poisson, ...)
\method{plot}{radline}(x, xlab = "Rank", ylab = "Abundance", type = "b", ...)
\method{lines}{radline}(x, ...)
\method{points}{radline}(x, ...)
as.rad(x)
\method{plot}{rad}(x, xlab = "Rank", ylab = "Abundance", ...)
}

\arguments{
  \item{df}{Data frame where sites are rows and species are columns.}
  \item{x}{A vector giving species abundances in a site, or an object to
    be plotted.}
  \item{order.by}{A vector used for ordering sites in plots.}
  \item{BIC}{Use Bayesian Information Criterion, BIC, instead of
    Akaike's AIC. The penalty for a parameter is \eqn{k = \log(S)} where
    \eqn{S} is the number of species, whereas AIC uses \eqn{k = 2}.}
  \item{model}{Show only the specified model. If missing, AIC is used to
    select the model. The model names (which can be abbreviated) are
    \code{Preemption}, \code{Lognormal}, \code{Veiled.LN},
    \code{Zipf}, \code{Mandelbrot}. }
  \item{legend}{Add legend of line colours.}
  \item{as.table}{Arrange panels starting from upper left corner (passed
    to \code{\link[lattice]{xyplot}}).}
  \item{family}{Error distribution (passed to \code{\link{glm}}). All
    alternatives accepting \code{link = "log"} in \code{\link{family}}
    can be used, although not all make sense.}
  \item{xlab,ylab}{Labels for \code{x} and \code{y} axes.}
  \item{type}{Type of the plot, \code{"b"} for plotting both observed points
    and fitted lines, \code{"p"} for only points, \code{"l"} for only
    fitted lines, and \code{"n"} for only setting the frame. }
  \item{\dots}{Other parameters to functions. }
}
\details{
  Rank -- Abundance Dominance (RAD) or Dominance/Diversity plots
  (Whittaker 1965) display logarithmic species abundances against
  species rank order in the community. These plots are supposed to be
  effective in analysing types of abundance distributions in
  communities. These functions fit some of the most popular models
  following Wilson (1991). Function \code{as.rad} constructs observed
  RAD data.
  Functions \code{rad.XXXX} (where \code{XXXX} is a name) fit
  the individual models, and
  function \code{radfit} fits all models.  The
  argument of the function \code{radfit} can be either a vector for a
  single community or a data frame where each row represents a
  distinct community. All these functions have their own \code{plot}
  functions. When the argument is a data frame, \code{plot} uses
  \code{\link[lattice]{Lattice}} graphics, and other functions use
  ordinary graphics. The ordinary graphics functions return invisibly an
  \code{\link{ordiplot}} object for observed points, and function
  \code{\link{identify.ordiplot}} can be used to label selected
  species. The most complete control of graphics can be achieved
  with \code{rad.XXXX} methods which have \code{points} and \code{lines}
  functions to add observed values and fitted models into existing
  graphs.  

  Function \code{rad.preempt} fits the niche preemption model,
  a.k.a. geometric series or Motomura model, where the expected
  abundance \eqn{a} of species at rank \eqn{r} is \eqn{a_r = J \alpha (1 -
    \alpha)^{r-1}}{a[r] = J*alpha*(1-alpha)^(r-1)}. The only estimated
  parameter is the preemption coefficient \eqn{\alpha} which gives the
  decay rate of abundance per rank. In addition there is a fixed scaling
  parameter \eqn{J} which is the total abundance. 
  The niche preemption model is a straight line in a
  RAD plot. Function \code{rad.lognormal} fits a log-Normal model which
  assumes that the logarithmic abundances are distributed Normally, or
  \eqn{a_r =  \exp( \log \mu + \log \sigma N)}{a[r] = exp(log(mu) +
    log(sigma) * N)}, where \eqn{N} is a Normal deviate. 
  Function \code{rad.veil} is similar, but it assumes
  that only a proportion \code{veil} of most common species were
  observed in the community, the rest being too rare or scanty to occur
  in a sample plot of this size (but would occur in a larger
  plot). Function \code{rad.zipf} fits the Zipf model \eqn{a_r = J p_1
    r^\gamma}{a[r] = J*p1*r^gamma} where \eqn{p_1}{p1} is the fitted
  proportion of the most abundant species, and \eqn{\gamma} is a decay coefficient. The
  Zipf -- Mandelbrot 
  model (\code{rad.zipfbrot}) adds one parameter: \eqn{a_r = J c
    (r + \beta)^\gamma}{a[r] = J*c*(r+beta)^gamma} after which
  \eqn{p_1}{p1} of the Zipf model changes into a meaningless scaling
  constant \eqn{c}. There are grand stories about ecological
  mechanisms behind each model (Wilson 1991), but
  several alternative and contrasting mechanisms can produce
  similar models and a good fit does not imply a specific mechanism.

  Log-Normal and Zipf models are generalized linear
  models (\code{\link{glm}}) with logarithmic link function.
  Veiled log-Normal and Zipf -- Mandelbrot add one
  nonlinear parameter, and these two models are fitted using
  \code{\link{nlm}} for the nonlinear parameter and estimating other
  parameters and log-Likelihood with \code{\link{glm}}. Pre-emption
  model is fitted as purely nonlinear model.  The default
  \code{\link{family}} is \code{poisson} which is appropriate only for
  genuine counts (integers), but other families that accept \code{link =
    "log"} can be used. Family \code{\link{Gamma}} may be
  appropriate for abundance data, such as cover. The ``best''
  model is selected by \code{\link{AIC}}. Therefore ``quasi'' families
  such as \code{\link{quasipoisson}} cannot be used: they do not
  have \code{\link{AIC}} nor log-Likelihood needed in non-linear
  models.
}

\value{
  Function \code{rad.XXXX} will return an object of class
  \code{radline}, which is constructed to resemble results of \code{\link{glm}}
  and has many (but not all) of its components, even when only
  \code{\link{nlm}} was used in fitting. At least the following
  \code{\link{glm}} methods can be applied to the result:
  \code{\link{fitted}}, \code{\link{residuals.glm}}  with alternatives
  \code{"deviance"} (default), \code{"pearson"}, \code{"response"},
  function \code{\link{coef}}, \code{\link{AIC}},
  \code{\link{extractAIC}}, and \code{\link{deviance}}.
  Function \code{radfit} applied to a vector will return
  an object of class \code{radfit} with item \code{y} for the
  constructed RAD, item \code{family} for the error distribution, and
  item \code{models} containing each \code{radline} object as an
  item. In addition, there are special \code{AIC}, \code{coef} and
  \code{fitted} implementations for \code{radfit} results. 
  When applied to a data frame
  \code{radfit} will return an object of class \code{radfit.frame} which
  is a list of \code{radfit} objects. The functions are still
  preliminary, and the items in the \code{radline} objects may change.
}

\references{
  Preston, F.W. (1948) The commonness and rarity of
  species. \emph{Ecology} 29, 254--283.
  
  Whittaker, R. H. (1965) Dominance and diversity in plant
  communities. \emph{Science} 147, 250--260.

  Wilson, J. B. (1991) Methods for fitting dominance/diversity
  curves. \emph{Journal of Vegetation Science} 2, 35--46.
}
\author{ Jari Oksanen }
\note{
  The RAD models are usually fitted for proportions instead of original
  abundances. However, nothing in these models seems to require division
  of abundances by site totals, and original observations are used in
  these functions. If you wish to use proportions, you must standardize
  your data by site totals, e.g. with \code{\link{decostand}} and use
  appropriate \code{\link{family}} such as \code{\link{Gamma}}.

  The lognormal model is fitted in a standard way, but I think this is
  not quite correct -- at least it is not equivalent to fitting Normal
  density to log abundances like originally suggested (Preston 1948). 
  
  Some models may fail. In particular, \code{rad.veil} often tends to
  \code{veil = 0} meaning that none of the community is present, and the
  function prints an error message \code{Error: NA/NaN/Inf in foreign
    function call (arg 1)}. The error is caught and \code{NA} are
  returned.

  Wilson (1991) defined preemption model as \eqn{a_r = J p_1 (1
    - \alpha)^{r-1}}{a[r] = J*p1*(1 - alpha)^(r-1)}, where \eqn{p_1}{p1}
    is the fitted proportion of the first species. However, parameter
    \eqn{p_1}{p1} is completely defined by \eqn{\alpha} since the fitted
    proportions must add to one, and therefore I handle preemption as a
    one-parameter model.  
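
    As a sketch of that argument, assume that the geometric series is
    summed over an unbounded number of ranks and that
    \eqn{0 < \alpha < 1}. Then
    \deqn{\sum_{r=1}^{\infty} p_1 (1 - \alpha)^{r-1} = \frac{p_1}{\alpha} = 1
      \quad \Rightarrow \quad p_1 = \alpha}{sum(p1*(1 - alpha)^(r-1), r = 1, 2, ...) = p1/alpha = 1, so that p1 = alpha}
    With a finite number of species \eqn{S} the sum is
    \eqn{p_1 (1 - (1 - \alpha)^S)/\alpha}{p1*(1 - (1 - alpha)^S)/alpha},
    so the identity holds only approximately.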
}
\seealso{\code{\link{fisherfit}} and \code{\link{prestonfit}}.
  An alternative approach is to use
  \code{\link{qqnorm}} or  \code{\link{qqplot}} with any distribution.
  For controlling graphics: \code{\link[lattice]{Lattice}},
  \code{\link[lattice]{xyplot}}, \code{\link[lattice]{lset}}. }
\examples{
data(BCI)
mod <- rad.veil(BCI[1,])
mod
plot(mod)
mod <- radfit(BCI[1,])
plot(mod)
# Take a subset of BCI to save time and nerves
mod <- radfit(BCI[2:5,])
mod
plot(mod, pch=".")
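# A sketch of fitting two individual models to one site and comparing
# them. The choice of site and of models is arbitrary and only
# illustrative.
mod.pre <- rad.preempt(BCI[1,])
mod.zipf <- rad.zipf(BCI[1,])
AIC(mod.pre)
AIC(mod.zipf)
# Observed points with the fitted pre-emption line, then the Zipf line
plot(mod.pre)
lines(mod.zipf, col = "blue")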
}
\keyword{ univar }
\keyword{ distribution }

\eof
\name{rankindex}
\alias{rankindex}

\title{Compares Dissimilarity Indices for Gradient Detection }
\description{
  Rank correlations between dissimilarity indices
  and gradient separation.
}
\usage{
rankindex(grad, veg, indices = c("euc", "man", "gow", "bra", "kul"),
          stepacross = FALSE, method = "kendall", ...)
}

\arguments{
  \item{grad}{The gradient variable or matrix. }
  \item{veg}{The community data matrix. }
  \item{indices}{Dissimilarity indices compared, partial matches to
    alternatives in \code{\link{vegdist}}.}
  \item{stepacross}{Use \code{\link{stepacross}} to find
    a shorter path dissimilarity. The dissimilarities for site pairs
    with no shared species are set \code{NA} using
    \code{\link{no.shared}} so that indices with no fixed
    upper limit can also be analysed.}
  \item{method}{Rank correlation method used. }
  \item{...}{Other parameters to \code{\link{stepacross}}.}
}
\details{
  A good dissimilarity index for multidimensional scaling 
  should have a high rank-order similarity with gradient separation.
  The function compares most indices in \code{\link{vegdist}} against
  gradient separation using rank correlation coefficients in
  \code{\link[ctest]{cor.test}}.
}
\value{
  Returns a named vector of rank correlations.
}
\references{ Faith, D.P., Minchin, P.R. and Belbin,
  L. (1987).  Compositional dissimilarity as a robust measure of
    ecological distance. \emph{Vegetatio} 69, 57--68. }
\author{Jari Oksanen }
\note{
  There are several problems in using rank correlation coefficients.
  Typically there are very many ties when \eqn{n(n-1)/2} gradient
  separation values are derived from just \eqn{n} observations.
  Due to floating point arithmetic, many tied values differ by
  machine epsilon and are arbitrarily ranked differently by
  \code{\link{rank}} used in \code{\link[ctest]{cor.test}}.  Two indices
  that are identical under a certain
  transformation or standardization may differ slightly
  (magnitude \eqn{10^{-15}}), and this may lead to third or fourth decimal
  instability in rank correlations.  Small differences in rank
  correlations should not be taken too seriously.  Probably this method
  should be replaced with a sounder method, but I do not yet know
  which\ldots  You may experiment with \code{\link{mantel}},
  \code{\link{anosim}} or even \code{\link{protest}}. 

  Kendall's rank correlation may be very slow with large data sets and
  you may consider
  other alternatives in \code{\link[ctest]{cor.test}}.
}


\seealso{\code{\link{vegdist}}, \code{\link{stepacross}},
  \code{\link{no.shared}}, \code{\link[MASS]{isoMDS}},
    \code{\link[ctest]{cor.test}}, \code{\link{Machine}}, and for
    alternatives \code{\link{anosim}}, \code{\link{mantel}} and
    \code{\link{protest}}. }

\examples{
data(varespec)
data(varechem)
## The next scales all environmental variables to unit variance.
## Some would use PCA transformation.
rankindex(scale(varechem), varespec)
rankindex(scale(varechem), wisconsin(varespec))
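## A minimal base R sketch (not part of rankindex) of the tie problem
## described in the Note; the toy gradient values are made up.
toy.grad <- c(1, 2, 3, 4, 5)
toy.sep <- dist(toy.grad)            # n(n-1)/2 = 10 separations from n = 5
length(toy.sep)
sum(duplicated(as.vector(toy.sep)))  # number of tied separation values
## Values that are equal on paper can differ in floating point:
(0.1 + 0.2) == 0.3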
}
\keyword{ multivariate }

\eof
\name{read.cep}
\alias{read.cep}

\title{Reads a CEP (Canoco) data file }
\description{
  \code{read.cep} reads a file in the relaxed strict CEP format
  used by \code{Canoco} software, among others.
}
\usage{
read.cep(file, maxdata=10000, positive=TRUE, trace=FALSE, force=FALSE)
}

\arguments{
  \item{file}{File name (character variable). }
  \item{maxdata}{Maximum number of non-zero entries. }
  \item{positive}{Only positive entries, like in community data.}
  \item{trace}{Work verbosely.}
  \item{force}{Run function, even if \R refuses first.}
}
\details{
  Cornell Ecology Programs (CEP) introduced several data formats
  designed for punched cards.  One of these was the `condensed strict'
  format which was adopted by popular software \code{DECORANA} and
  \code{TWINSPAN}. Later, Cajo ter Braak wrote \code{Canoco}
  based on \code{DECORANA}, where he adopted the format, but relaxed it
  somewhat (that's why I call it a `relaxed strict' format). Further, he
  introduced a more ordinary `free' format, and allowed the use of
  classical Fortran style `open' format with fixed field widths.  This
  function should be able to deal with all these \code{Canoco} formats,
  whereas it cannot read many of the traditional CEP alternatives.

  All variants of CEP formats have:
  \itemize{
    \item Two or three title cards, most importantly specifying the format (or word
    \code{FREE}) and the number of items per record (number of species
    and sites for \code{FREE} format).
    \item Data in one of three accepted formats:
    \enumerate{
      \item Condensed format: First number on the line is the site
      identifier, and it is followed by pairs (`couplets') of numbers
      identifying the species and its abundance (an integer and a floating
      point number).
      \item Open Fortran format, where the first number on the line must
      be the site number, followed by abundance values in fields of
      fixed widths. Empty fields are interpreted as zeros.
      \item `Free' format, where the numbers are interpreted as
      abundance values.  These numbers must be separated by blank space,
      and zeros must be written as zeros.
    }
    \item Species and site names, given in Fortran format \code{(10A8)}:
    Ten names per line, eight columns for each.
  }

  With option \code{positive = TRUE} the function removes all lines and
  columns with zero or negative marginal sums.  In community data
  with only positive entries, this removes empty sites and species.
  If data entries can be negative, this ruins data, and such data sets
  should be read in with option \code{positive = FALSE}.
}
\value{
  Returns a data frame, where columns are species and rows are
  sites. Column and row names are taken from the CEP file, and changed
  into unique \R names by \code{\link{make.names}} after stripping the blanks.
}
\references{ 
  Ter Braak, C.J.F. (1984--): CANOCO -- a FORTRAN program for \emph{cano}nical
  \emph{c}ommunity \emph{o}rdination by [partial] [detrended] [canonical]
  correspondence analysis, principal components analysis and redundancy
  analysis. \emph{TNO Inst. of Applied Computer Sci., Stat. Dept. Wageningen,
  The Netherlands}. 
}
\author{ Jari Oksanen }

\note{
  The function relies on smooth linking of Fortran file IO in an \R
  session.  This is not guaranteed to work, and therefore the function
  may not work in \emph{your} system, and it
  can even crash the \R session.  Therefore the default is that the function
  does not run.  If you still want to try:
  \enumerate{
    \item
    Save your session
    \item
    Run \code{read.cep()} with switch \code{force=TRUE}
  }
}


\examples{
## Provided that you have the file `dune.spe'
\dontrun{theclassic <- read.cep("dune.spe", force=TRUE)}
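## A rough base R sketch of what positive = TRUE is described to do
## (an illustration only, not the code used by read.cep); the toy
## matrix is made up.
toy <- matrix(c(1, 0, 2,
                0, 0, 0,
                3, 0, 1), nrow = 3, byrow = TRUE)
## Drop rows and columns whose marginal sums are not positive
toy.pos <- toy[rowSums(toy) > 0, colSums(toy) > 0, drop = FALSE]
toy.pos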
}
\keyword{ IO }
\keyword{ file }





\eof
\name{scores}
\alias{scores}
\alias{scores.default}

\title{ Get Species or Site Scores from an Ordination }
\description{
  Function to access either species or site scores for specified axes in
  some ordination methods.
}
\usage{
\method{scores}{default}(x, display=c("sites", "species"), choices, ...)
}

\arguments{
  \item{x}{ An ordination result. }
  \item{display}{ Partial match to access scores for \code{sites} or
    \code{species}.  }
  \item{choices}{ Ordination axes.  If missing, returns all axes.}
  \item{...}{ Other parameters (unused). }
}
\details{
  Functions \code{\link{cca}} and \code{\link{decorana}} have specific
  \code{scores} functions to access their ordination scores.  Most
  standard ordination methods of libraries \code{mva}, \code{multiv} and
  \code{MASS} do not have a specific \code{class}, and no specific method can be
  written for them.  However, \code{scores.default} guesses where
  some commonly used functions keep their site scores and possible
  species scores.  For site scores, the function seeks items in order
  \code{points}, \code{rproj}, \code{x}, and \code{scores}.  For species,
  the seeking order is \code{cproj}, \code{rotation}, and
  \code{loadings}.
  If \code{x} is a matrix, \code{scores.default} returns the chosen
  columns of that matrix, ignoring whether species or sites were
  requested (do not regard this as a bug but as a feature, please).
  Currently the function seems to work at least for \code{\link[MASS]{isoMDS}},
  \code{\link[mva]{prcomp}}, \code{\link[mva]{princomp}},
  \code{\link[multiv]{ca}}, \code{\link[multiv]{pca}}.  It may work in
  other cases or fail mysteriously.
}
\value{
  The function returns a matrix of requested scores.
}
\author{Jari Oksanen }

\seealso{\code{\link{scores.cca}}, \code{\link{scores.decorana}}.  These
have a somewhat different interface -- \code{\link{scores.cca}} in
particular -- but all work with keywords \code{display="sites"} and
\code{display="species"} and return a matrix with these.
}
\examples{
data(varespec)
library(mva)
vare.pca <- prcomp(varespec)
scores(vare.pca, choices=c(1,2))
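## A hypothetical helper (not vegan code) sketching the documented
## search order for site scores; find.site.scores is an invented name
## and the toy data are made up.
find.site.scores <- function(x) {
    for (item in c("points", "rproj", "x", "scores"))
        if (!is.null(x[[item]])) return(x[[item]])
    NULL
}
toy.pca <- prcomp(matrix(c(1, 2, 3, 4,  2, 1, 4, 3,  0, 1, 0, 2), ncol = 3))
## prcomp keeps its site scores in item x, the third name tried
identical(find.site.scores(toy.pca), toy.pca$x)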
}
\keyword{ multivariate }





\eof
\name{specaccum}
\alias{specaccum}
\alias{print.specaccum}
\alias{summary.specaccum}
\alias{plot.specaccum}
\alias{boxplot.specaccum}

\title{Species Accumulation Curves }
\description{
  Function \code{specaccum} finds species accumulation curves or the
  number of species for a certain number of sampled sites or
  individuals. 
}
\usage{
specaccum(comm, method = "exact", permutations = 100, ...)
\method{plot}{specaccum}(x, add = FALSE, ci = 2, ci.type = c("bar", "line", "polygon"), 
    col = par("fg"), ci.col = col, ci.lty = 1, xlab = "Sites", 
    ylab = x$method, ...)
\method{boxplot}{specaccum}(x, add = FALSE, ...)
}

\arguments{
  \item{comm}{Community data set.}
  \item{method}{Species accumulation method (partial match). Method
    \code{"collector"}
    adds sites in the order they happen to be in the data,
    \code{"random"} adds sites in random order, \code{"exact"} finds the
    expected (mean) species richness, \code{"coleman"} finds the
    expected richness following
    Coleman et al. 1982, and \code{"rarefaction"} finds the mean when
    accumulating individuals instead of sites.  }
  \item{permutations}{Number of permutations with \code{method =
      "random"}.}
  \item{x}{A \code{specaccum} result object}
  \item{add}{Add to an existing graph.}
  \item{ci}{Multiplier used to get confidence intervals from standard
    deviation (standard error of the estimate). Value \code{ci = 0}
    suppresses drawing confidence intervals.}
  \item{ci.type}{Type of confidence intervals in the graph: \code{"bar"}
    draws vertical bars, \code{"line"} draws lines, and
    \code{"polygon"} draws a shaded area.}
  \item{col}{Colour for drawing lines.}
  \item{ci.col}{Colour for drawing lines or filling the
    \code{"polygon"}.}
  \item{ci.lty}{Line type for confidence intervals or border of the
    \code{"polygon"}.}
  \item{xlab,ylab}{Labels for \code{x} and \code{y} axis.}
  \item{...}{Other parameters to functions.}
}
\details{
  Species accumulation curves (SAC) are used to compare diversity properties
  of community data sets using different accumulator functions. The
  classic method is \code{"random"} which finds the mean SAC and its
  standard deviation from random permutations of the data, or
  subsampling without replacement (Gotelli & Colwell 2001).
  The \code{"exact"} method finds the
  expected SAC using the method of Kindt (2003), and its standard deviation.
  Method \code{"coleman"} finds the expected SAC and its standard
  deviation following Coleman et al. (1982).  All these methods are
  based on sampling sites without replacement. In contrast, the
  \code{method = "rarefaction"} finds the expected species richness and
  its standard deviation by sampling individuals instead of sites. It
  achieves this by applying function \code{\link{rarefy}} with number of individuals
  corresponding to average number of individuals per site.

  The function has a \code{plot} method. In addition, \code{method =
    "random"} has \code{summary} and \code{boxplot} methods. 
}

\value{
  The function returns an object of class \code{"specaccum"} with items:
  \item{call }{Function call.}
  \item{method}{Accumulator method.}
  \item{sites}{Number of sites.  For \code{method = "rarefaction"} this
    is the average number of sites corresponding to a certain number of
    individuals.}
  \item{richness}{The number of species corresponding to number of
    sites.  With \code{method = "collector"} this is the observed
    richness, for other methods the average or expected richness.}
  \item{sd}{The standard deviation of SAC (or its standard error). This
    is \code{NULL} in \code{method = "collector"}, and it
    is estimated from permutations in \code{method = "random"}, and from
    analytic equations in other methods.}
  \item{perm}{Permutation results with \code{method = "random"} and
    \code{NULL} in other cases. Each column in \code{perm} holds one
    permutation.}
}
\references{
  Coleman, B.D., Mares, M.A., Willis, M.R. & Hsieh,
  Y. (1982). Randomness, area and species richness. \emph{Ecology} 63:
  1121--1133. 
  
  Gotelli, N.J. & Colwell, R.K. (2001). Quantifying biodiversity:
  procedures and pitfalls in measurement and comparison of species
  richness. \emph{Ecol. Lett.} 4, 379--391.

  Kindt, R. (2003). Exact species richness for sample-based accumulation
  curves. \emph{Manuscript.}
}
\author{Roeland Kindt \email{r.kindt@cgiar.org} and Jari Oksanen.}
\note{
  The SAC with \code{method = "exact"} was
  developed by Roeland Kindt, and its standard deviation by Jari
  Oksanen (both are unpublished). The \code{method = "coleman"}
  underestimates the SAC because it does not handle properly sampling
  without replacement.  Further, its standard deviation does not take
  into account species correlations, and is generally too low. }

\seealso{\code{\link{rarefy}}. Underlying graphical functions are
  \code{\link{boxplot}}, \code{\link{matlines}}, \code{\link{segments}}
    and \code{\link{polygon}}. }
\examples{
data(BCI)
sp1 <- specaccum(BCI)
sp2 <- specaccum(BCI, "random")
sp2
summary(sp2)
plot(sp1, ci.type="poly", col="blue", lwd=2, ci.lty=0, ci.col="lightblue")
boxplot(sp2, col="yellow", add=TRUE, pch="+")
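## A base R sketch of the idea behind method = "collector" (an
## illustration, not the code of specaccum); the toy data are made up.
toy <- matrix(c(1, 0, 0,
                0, 2, 0,
                1, 0, 3), nrow = 3, byrow = TRUE)
## Richness after pooling the first i sites in the order given
toy.sac <- sapply(seq_len(nrow(toy)), function(i)
                  sum(colSums(toy[seq_len(i), , drop = FALSE]) > 0))
toy.sac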
}
\keyword{univar }


\eof
\name{specpool}
\alias{specpool}
\alias{specpool2vect}
\title{ Extrapolated Species Richness in a Species Pool}
\description{
  The function estimates the extrapolated species richness in a species
  pool, or the number of unobserved species.
}
\usage{
specpool(x, pool)
specpool2vect(X, index = c("Jack.1","Jack.2", "Chao", "Boot", "Species"))
}

\arguments{
  \item{x}{Data frame or matrix with species data.}
  \item{pool}{A vector giving a classification for pooling the sites in
    the species data. If missing, all sites are pooled together.}
  \item{X}{A \code{specpool} result object.}
  \item{index}{The selected index of extrapolated richness.}
}
\details{
  Many species will always remain unseen or undetected in a collection
  of sample plots.  The function uses some popular ways of estimating
  the number of these unseen species and adding them to the observed
  species richness (Palmer 1990, Colwell & Coddington 1994).

  In the following, \eqn{S_P} is the extrapolated richness in a pool,
  \eqn{S_0} is the observed number of species in the
  collection, \eqn{a_1}{a1} and \eqn{a_2}{a2} are the number of species
  occurring only in one or only in two sites in the collection, \eqn{p_i}
  is the frequency of species \eqn{i}, and \eqn{N} is the number of
  sites in the collection.  The variants of extrapolated richness are:
  \tabular{ll}{
    Chao
    \tab \eqn{S_P = S_0 + \frac{a_1^2}{2 a_2}}{S_P = S_0 + a1^2/(2*a2)}
    \cr
    First order jackknife
    \tab \eqn{S_P = S_0 + a_1 \frac{N-1}{N}}{S_P = S_0 + a1*(N-1)/N}
    \cr
    Second order jackknife
    \tab \eqn{S_P = S_0 + a_1 \frac{2N - 3}{N} - a_2 \frac{(N-2)^2}{N
	(N-1)}}{S_P = S_0 + a1*(2*N-3)/N - a2*(N-2)^2/N/(N-1)}
    \cr
    Bootstrap
    \tab \eqn{S_P = S_0 + \sum_{i=1}^{S_0} (1 - p_i)^N}{S_P = S_0 + Sum
      (1-p_i)^N}
    }
 
}
\value{
  The function returns a data frame with entries for observed richness
  and each of the indices for each class in \code{pool} vector.  The
  utility function \code{specpool2vect} maps the pooled values into
  a vector giving the value of selected \code{index} for each original
  site. 
}
\references{
  Colwell, R.K. & Coddington, J.A. (1994). Estimating terrestrial
  biodiversity through
  extrapolation. \emph{Phil. Trans. Roy. Soc. London} B 345, 101--118.

  Palmer, M.W. (1990). The estimation of species richness by
  extrapolation. \emph{Ecology} 71, 1195--1198.


}
\author{Jari Oksanen }
\note{
  The functions are based on the assumption that there is a species pool:
  The community is closed so that there is a fixed pool size \eqn{S_P}.
  Such cases may exist, although I have not seen them yet.  All indices
  are biased for open communities.

  An approximate ("traditional") variant is used for the Chao index.

  The function is still preliminary.  I may add variances, although
  these seem to be biased and confusing.

  See \url{http://viceroy.eeb.uconn.edu/EstimateS} for a more complete
  (and positive) discussion and alternative software for some platforms.
}
\seealso{\code{\link{veiledspec}}, \code{\link{diversity}}. }
\examples{
data(dune)
data(dune.env)
attach(dune.env)
pool <- specpool(dune, Management)
pool
op <- par(mfrow=c(1,2))
boxplot(specnumber(dune) ~ Management, col="hotpink", border="cyan3",
 notch=TRUE)
boxplot(specnumber(dune)/specpool2vect(pool) ~ Management, col="hotpink",
 border="cyan3", notch=TRUE)
par(op)
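## The formulas of the Details section worked by hand in base R (a
## sketch, not the code of specpool).  The toy incidence data are made
## up, and p_i is assumed to be the proportion of sites with species i.
toy <- matrix(c(1, 1, 1, 0, 0,
                0, 1, 1, 0, 1,
                0, 0, 1, 1, 0,
                0, 0, 1, 0, 1), nrow = 4, byrow = TRUE)
N <- nrow(toy)              # number of sites
freq <- colSums(toy > 0)    # number of sites where each species occurs
S0 <- sum(freq > 0)         # observed number of species
a1 <- sum(freq == 1)        # species found in exactly one site
a2 <- sum(freq == 2)        # species found in exactly two sites
toy.pool <- c(Chao   = S0 + a1^2/(2*a2),
              Jack.1 = S0 + a1*(N-1)/N,
              Jack.2 = S0 + a1*(2*N-3)/N - a2*(N-2)^2/(N*(N-1)),
              Boot   = S0 + sum((1 - freq/N)^N))
toy.pool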
}
\keyword{ univar }


\eof
\name{stepacross}
\alias{stepacross}

\title{Stepacross as Flexible Shortest Paths or Extended Dissimilarities } 
\description{
  Function \code{stepacross} tries to replace dissimilarities with
  shortest paths stepping across intermediate 
  sites while regarding dissimilarities above a threshold as missing
  data (\code{NA}). With \code{path = "shortest"} this is the flexible shortest
  path (Williamson 1978, Bradfield & Kenkel 1987),
  and with \code{path = "extended"} an
  approximation known as extended dissimilarities (De'ath 1999).
  The use of \code{stepacross} should improve the ordination with high
  beta diversity, when there are many sites with no species in common.
}
\usage{
stepacross(dis, path = "shortest", toolong = 1, trace = TRUE)
}
\arguments{
  \item{dis}{Dissimilarity data inheriting from class \code{dist} or
    an object, such as a matrix, that can be converted to a
    dissimilarity matrix. Functions \code{\link{vegdist}} and
    \code{\link[mva]{dist}} are some functions producing suitable
    dissimilarity data. }
  \item{path}{The method of stepping across (partial match).
    Alternative \code{"shortest"} finds the shortest paths, and
    \code{"extended"}  their approximation known as extended
    dissimilarities.} 
  \item{toolong}{Shortest dissimilarity regarded as \code{NA}.
    The function uses a fuzz factor, so
    that dissimilarities close to the limit will be made \code{NA}, too. }
  \item{trace}{ Trace the calculations.}
}
\details{
  Williamson (1978) suggested using flexible shortest paths to estimate
  dissimilarities between sites which have nothing in common, or no shared
  species. With \code{path = "shortest"} function \code{stepacross}
  replaces dissimilarities that are
  \code{toolong} or longer with \code{NA}, and tries to find shortest
  paths between all sites using remaining dissimilarities. Several
  dissimilarity indices are semi-metric which means that they do not
  obey the triangle inequality \eqn{d_{ij} \leq d_{ik} + d_{kj}}{d[ij] <=
    d[ik] + d[kj]}, and shortest path algorithm can replace these
  dissimilarities as well, even when they are shorter than
  \code{toolong}. 

  De'ath (1999) suggested a simplified method known as extended
  dissimilarities, which are calculated with \code{path =
    "extended"}. In this method, dissimilarities that are
  \code{toolong} or longer are first made \code{NA}, and then the function
  tries to replace these \code{NA} dissimilarities with a path through
  single stepping stone points. If not all \code{NA} could be 
  replaced with one pass, the function will make new passes with updated
  dissimilarities until
  all \code{NA} are replaced with extended dissimilarities. This means
  that in the second and further passes, the remaining \code{NA}
  dissimilarities are allowed to have more than one stepping stone site,
  but previously replaced dissimilarities are not updated. Further, the
  function does not consider dissimilarities shorter than \code{toolong},
  although some of these could be replaced with a shorter path in
  semi-metric indices, and used as a part of other paths. In optimal
  cases, the extended dissimilarities are equal to shortest paths, but
  in several cases they are longer.  

  As an alternative to defining too long dissimilarities with parameter
  \code{toolong}, the input dissimilarities can contain \code{NA}s. If
  \code{toolong} is zero or negative, the function does not make any
  dissimilarities into \code{NA}. If there are no \code{NA}s in the
  input and \code{toolong = 0}, \code{path = "shortest"}
  will find shorter paths for semi-metric indices, and \code{path =
    "extended"} will do nothing. Function \code{\link{no.shared}} can be
  used to set dissimilarities to \code{NA}.
  
  If the data are disconnected or there is no path between all points,
  the result will
  contain \code{NA}s and a warning is issued. Several methods cannot
  handle \code{NA} dissimilarities, and this warning should be taken
  seriously. Function \code{\link{distconnected}} can be used to find
  connected groups and remove rare outlier observations or groups of
  observations.

  Alternative \code{path = "shortest"} uses Dijkstra's method for
  finding flexible shortest paths, implemented as priority-first search
  for dense graphs (Sedgewick 1990). Alternative \code{path =
    "extended"} follows De'ath (1999), but implementation is simpler
  than in his code in \code{\link[pcurve]{pcdists}}.
  
}
\value{
  Function returns an object of class \code{dist} with extended
  dissimilarities (see functions \code{\link{vegdist}} and
  \code{\link[mva]{dist}}). 
  The value of \code{path} is appended to the \code{method} attribute.
}
\references{
  Bradfield, G.E. & Kenkel, N.C. (1987). Nonlinear ordination using
  flexible shortest path adjustment of ecological
  distances. \emph{Ecology} 68, 750--753.
  
  De'ath, G. (1999). Extended dissimilarity: a method of robust
  estimation of ecological distances from high beta diversity
  data. \emph{Plant Ecol.} 144, 191--199.

  Sedgewick, R. (1990). \emph{Algorithms in C}. Addison Wesley. 

  Williamson, M.H. (1978). The ordination of incidence
  data. \emph{J. Ecol.} 66, 911--920.
}
\author{ Jari Oksanen}
\note{
  The function changes the original dissimilarities, and not everyone
  likes this. It may be best to use the
  function only when you really \emph{must}:  extremely high
  beta diversity where a large proportion of dissimilarities are at their
  upper limit (no species in common). 

  Semi-metric indices vary in their degree of violating the triangle
  inequality. Morisita and Horn--Morisita indices of
  \code{\link{vegdist}} may be very strongly semi-metric, and shortest
  paths can change these indices very much. Mountford index violates
  basic rules of dissimilarities: non-identical sites have zero
  dissimilarity if species composition of the poorer site is a subset of
  the richer. With Mountford index, you can find three sites \eqn{i, j,
    k} so that \eqn{d_{ik} = 0}{d[ik] = 0} and \eqn{d_{jk} = 0}{d[jk] =
    0}, but \eqn{d_{ij} > 0}{d[ij] > 0}. The results of \code{stepacross}
  on Mountford index can be very weird. If \code{stepacross} is needed,
  it is best to try to use it with more metric indices only.
}

\seealso{
  Function \code{\link{distconnected}} can find connected groups in
  disconnected data, and function \code{\link{no.shared}} can be used to
  set dissimilarities as \code{NA}. 
  Function \code{\link[pcurve]{pcdists}} in library \code{pcurve}
  contains an alternative implementation. 
 }
\examples{
# There are no data sets with high beta diversity in vegan, but this
# should give an idea.
data(dune)
dis <- vegdist(dune)
edis <- stepacross(dis)
plot(edis, dis, xlab = "Shortest path", ylab = "Original")
## Manhattan distances have no fixed upper limit.
dis <- vegdist(dune, "manhattan")
is.na(dis) <- no.shared(dune)
dis <- stepacross(dis, toolong=0)
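## A base R sketch of the shortest path idea with a made-up semi-metric
## triple.  It uses a plain Floyd-Warshall style update, not the
## Dijkstra method of stepacross.
toy <- matrix(c(0,    0.25, 1,
                0.25, 0,    0.5,
                1,    0.5,  0), nrow = 3, byrow = TRUE)
toy[1,3] > toy[1,2] + toy[2,3]   # triangle inequality is violated
sp <- toy
for (k in 1:3) for (i in 1:3) for (j in 1:3)
    sp[i,j] <- min(sp[i,j], sp[i,k] + sp[k,j])
sp[1,3]                          # replaced by the path through site 2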
}
\keyword{multivariate }


\eof
\name{varespec}
\alias{varechem}
\alias{varespec}
\docType{data}
\title{Vegetation and environment in lichen pastures}
\usage{
       data(varechem)
       data(varespec)
}
\description{
  The \code{varespec} data frame has 24 rows and 44 columns.  Columns
  are estimated cover values of 44 species.  The variable names are
  formed from the scientific names, and are self explanatory for anybody
  familiar with the vegetation type.
The \code{varechem} data frame has 24 rows and 14 columns, giving the
soil characteristics of the very same sites as in the \code{varespec}
data frame. The chemical measurements have obvious names.
\code{Baresoil} gives the estimated cover of bare soil, \code{Humdepth}
the thickness of the humus layer.

}


\references{
Väre, H., Ohtonen, R. and Oksanen, J. (1995) Effects of reindeer
grazing on understorey vegetation in dry Pinus sylvestris
forests. \emph{Journal of Vegetation Science} 6, 523--530.  
}
\examples{
data(varespec)
data(varechem)
}
\keyword{datasets}

\eof
\name{vegan-internal}
\alias{ordiParseFormula}
\alias{permuted.index}
\alias{centroids.cca}
\alias{spider.cca}

\title{Internal vegan functions}

\description{
  Internal vegan functions.
}
\usage{
ordiParseFormula(formula, data)
centroids.cca(x, mf, wt)
permuted.index(n, strata)
spider.cca(x, ...)
}

\details{
  These are not to be called by the user. Function \code{spider.cca} was
  replaced with \code{\link{ordispider}} and will be removed in the
  future. 
}

\keyword{internal }


\eof
\name{vegdist}
\alias{vegdist}
\title{Dissimilarity Indices for Community Ecologists }
\description{
  The function computes dissimilarity indices that are useful for or
  popular with community ecologists.
  Gower, Bray--Curtis, Jaccard and
  Kulczynski indices are good in detecting underlying
  ecological gradients (Faith et al. 1987). Morisita and Horn--Morisita
  indices should be able to handle different sample sizes (Wolda 1981,
  Krebs 1999),
  and Mountford (1962) index for presence--absence data should
  be able to handle unknown (and variable) sample sizes.
}

\usage{ vegdist(x, method="bray", diag=FALSE, upper=FALSE) } 
\arguments{
  \item{x}{ Community data matrix.}
  \item{method}{Dissimilarity index, partial match to  \code{"manhattan"},
    \code{"euclidean"}, \code{"canberra"}, \code{"bray"}, \code{"kulczynski"},
     \code{"jaccard"}, \code{"gower"}, \code{"morisita"}, \code{"horn"} or
    \code{"mountford"}.}
  \item{diag}{Compute diagonals. }
  \item{upper}{Return only the upper diagonal. }
}
\details{
  Jaccard and Mountford indices are discussed below.
  The other indices are defined as:
  \tabular{ll}{
    \code{euclidean}
    \tab \eqn{d_{jk} = \sqrt{\sum_i (x_{ij}-x_{ik})^2}}{d[jk] = sqrt(sum (x[ij]-x[ik])^2)}
    \cr
    \code{manhattan}
    \tab \eqn{d_{jk} = \sum_i |x_{ij} - x_{ik}|}{d[jk] = sum(abs(x[ij] -
      x[ik]))}
    \cr
    \code{gower}
    \tab \eqn{d_{jk} = \sum_i \frac{|x_{ij}-x_{ik}|}{\max x_i-\min x_i}}{d[jk] = sum (abs(x[ij]-x[ik])/(max(x[i])-min(x[i])))}
    \cr
    \code{canberra}
    \tab \eqn{d_{jk}=\frac{1}{NZ} \sum_i
      \frac{|x_{ij}-x_{ik}|}{x_{ij}+x_{ik}}}{d[jk] = (1/NZ) sum
      (abs(x[ij]-x[ik])/(x[ij]+x[ik]))}
    \cr
    \tab where \eqn{NZ} is the number of non-zero entries.
    \cr
    \code{bray}
    \tab \eqn{d_{jk} = \frac{\sum_i |x_{ij}-x_{ik}|}{\sum_i (x_{ij}+x_{ik})}}{d[jk] = (sum abs(x[ij]-x[ik]))/(sum (x[ij]+x[ik]))}
    \cr
    \code{kulczynski}
    \tab \eqn{d_{jk} = 1-0.5(\frac{\sum_i \min(x_{ij},x_{ik})}{\sum_i x_{ij}} +
      \frac{\sum_i \min(x_{ij},x_{ik})}{\sum_i x_{ik}} )}{d[jk] = 1 - 0.5*((sum min(x[ij],x[ik]))/(sum x[ij]) + (sum
      min(x[ij],x[ik]))/(sum x[ik]))}
    \cr
    \code{morisita}
    \tab {\eqn{d_{jk} = 1 - \frac{2 \sum_i x_{ij} x_{ik}}{(\lambda_j +
	  \lambda_k) \sum_i x_{ij} \sum_i
	  x_{ik}}}{d[jk] = 1 - 2*sum(x[ij]*x[ik])/((lambda[j]+lambda[k]) *
	sum(x[ij])*sum(x[ik]))}  }
    \cr
    \tab where \eqn{\lambda_j = \frac{\sum_i x_{ij} (x_{ij} - 1)}{\sum_i
      x_{ij} \sum_i (x_{ij} - 1)}}{lambda[j] =
      sum(x[ij]*(x[ij]-1))/(sum(x[ij])*sum(x[ij]-1))}
    \cr
    \code{horn}
    \tab Like \code{morisita}, but \eqn{\lambda_j = \sum_i
      x_{ij}^2/(\sum_i x_{ij})^2}{lambda[j] = sum(x[ij]^2)/(sum(x[ij])^2)}
  }

  Jaccard index is computed as \eqn{2B/(1+B)}, where \eqn{B} is
  Bray--Curtis dissimilarity.

  Mountford index is defined as \eqn{M = 1/\alpha} where \eqn{\alpha} is
  the parameter of Fisher's logseries assuming that the compared
  communities are samples from the same community
  (cf. \code{\link{fisherfit}}, \code{\link{fisher.alpha}}). The index
  \eqn{M} is found as the positive root of equation \eqn{\exp(aM) +
  \exp(bM) = 1 + \exp[(a+b-j)M]}{exp(a*M) + exp(b*M) = 1 +
  exp((a+b-j)*M)}, where \eqn{j} is the number of species occurring in
  both communities, and \eqn{a} and \eqn{b} are the number of species in
  each separate community (so the index uses presence--absence
  information). Mountford index is usually misrepresented in the
  literature: indeed Mountford (1962) suggested an approximation to be used as starting
  value in iterations, but the proper index is defined as the root of the equation
  above. The function \code{vegdist} solves \eqn{M} with the Newton
  method. Please note that if either \eqn{a} or \eqn{b} is equal to
  \eqn{j}, one of the communities is a subset of the other, and the
  dissimilarity is \eqn{0}, meaning that non-identical objects may be
  regarded as similar and the index is non-metric. The Mountford index
  is in the range \eqn{0 \dots \log(2)}, but the dissimilarities are
  divided by \eqn{\log(2)} 
  so that the results will be in the conventional range \eqn{0 \dots 1}.

  Morisita index can be used with genuine count data only. Its
  Horn--Morisita variant is able to handle any abundance data.

  Euclidean and Manhattan dissimilarities are not good in gradient
  separation without proper standardization but are still included for
  comparison and special needs.

  Bray--Curtis and Jaccard indices are rank-order similar, and some
  other indices become identical or rank-order similar after some 
  standardizations, especially with presence/absence transformation or
  equalizing site totals with \code{\link{decostand}}.

  The naming conventions vary. The one adopted here is traditional
  rather than truthful to priority. The abbreviation \code{"horn"} for
  the Horn--Morisita index is misleading, since there is a separate
  Horn index. The abbreviation will be changed if that index is implemented in
  \code{vegan}. 
}
\value{
  Should provide a drop-in replacement for \code{\link[mva]{dist}} and
  return a distance object of the same type. 
}
\references{
  Faith, D. P, Minchin, P. R. and Belbin, L. (1987).
  Compositional dissimilarity as a robust measure of ecological
  distance. \emph{Vegetatio} 69, 57--68.

  Krebs, C. J. (1999). \emph{Ecological Methodology.} Addison Wesley Longman.
  
  Mountford, M. D. (1962). An index of similarity and its application to
  classification problems. In: P.W.Murphy (ed.),
  \emph{Progress in Soil Zoology}, 43--50. Butterworths.

  Wolda, H. (1981). Similarity indices, sample size and
  diversity. \emph{Oecologia} 50, 296--302.
}

\author{ Jari Oksanen }

\note{The function is an alternative to \code{\link[mva]{dist}} adding
  some ecologically meaningful indices.  Both methods should produce
  similar types of objects which can be interchanged in any method
  accepting either.  Manhattan and Euclidean dissimilarities should be
  identical in both methods, and Canberra dissimilarity may be similar.
}

\seealso{ \code{\link{decostand}}, \code{\link[mva]{dist}},
  \code{\link{rankindex}}, \code{\link[MASS]{isoMDS}}, \code{\link{stepacross}}. }

\examples{
data(varespec)
vare.dist <- vegdist(varespec)
# Orlóci's Chord distance: range 0 .. sqrt(2)
vare.dist <- vegdist(decostand(varespec, "norm"), "euclidean")
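## Bray-Curtis and Jaccard worked by hand in base R for two made-up
## sites (a sketch of the formulas in Details, not the code of vegdist).
x1 <- c(1, 0, 3)
x2 <- c(2, 1, 0)
B <- sum(abs(x1 - x2))/sum(x1 + x2)
J <- 2*B/(1 + B)
c(bray = B, jaccard = J)
## The same Jaccard value follows from sums of minima and maxima
J.minmax <- 1 - sum(pmin(x1, x2))/sum(pmax(x1, x2))
all.equal(J, J.minmax)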
}
\keyword{ multivariate }

\eof
\name{vegemite}
\alias{vegemite}
\alias{coverscale}

\title{Prints a Compact, Ordered Vegetation Table }
\description{
  The function prints a compact vegetation table, where species are
  rows, and each site takes only one column without spaces.  The
  vegetation table can be ordered by explicit indexing, by environmental
  variables or results from an ordination or cluster analysis.
}
\usage{
vegemite(x, use, scale, sp.ind, site.ind, zero=".")
coverscale(x, scale=c("Braun.Blanquet", "Domin", "Hult", "Hill", "fix", "log"))
}

\arguments{
  \item{x}{Vegetation data. }
  \item{use}{Either a vector or an object from \code{cca},
    \code{decorana} \emph{etc.} or \code{hclust} for ordering sites and species.}
  \item{sp.ind}{Species indices. }
  \item{site.ind}{Site indices. }
  \item{zero}{Character used for zeros. }
  \item{scale}{Cover scale used (can be abbreviated).}
}
\details{
  The function prints a traditional vegetation table.
  Unlike in ordinary data matrices, species are used as rows and sites
  as columns.  The table is printed in compact form:  only one character
  can be used for abundance, and there are no spaces between columns.

  The parameter \code{use} can be a vector or an object from
  \code{\link[mva]{hclust}} or any ordination result recognized by
  \code{\link{scores}}. 
  If \code{use} is a vector, it is used for ordering sites.  If
  \code{use} is an object from ordination, both sites and species are
  arranged by the first axis.  When \code{use} is an object from
  \code{\link[mva]{hclust}}, the sites are ordered as in the cluster
  dendrogram.  If ordination methods provide species scores, these are
  used for ordering species.  In all cases where species scores are
  missing, species are ordered by their weighted averages
  (\code{\link{wascores}}) on site scores.  There is no natural, unique
  ordering in hierarchic clustering, but in some cases species are
  still nicely ordered.  Alternatively, species and sites can be
  ordered explicitly by giving their indices or names in parameters
  \code{sp.ind} and \code{site.ind}.  If these are given, they take
  precedence over \code{use}.

  If \code{scale} is given, \code{vegemite} calls
  \code{coverscale} to transform percent cover
  scale or some other scales into traditional class scales used in
  vegetation science (\code{coverscale} can be called directly, too).
  Braun-Blanquet and Domin scales are actually not
  strict cover scales, and the limits used for codes \code{r} and
  \code{+} are arbitrary.  Scale \code{Hill} may be
  inappropriately named, since Mark O. Hill probably never intended this
  as a cover scale.  However, it is used as default `cut levels' in his
  \code{TWINSPAN}, and surprisingly many users stick to this default,
  and so this is a \emph{de facto} standard in publications.  All
  traditional
  scales assume that values are cover percentages with maximum 100.
  However, the non-traditional alternative \code{log} can be used with any
  scale range.  Its class limits are integer powers of 1/2 of the
  observed maximum in the data, with \code{+} used for non-zero entries
  less than 1/512 of the data maximum (\code{log} stands alternatively for
  logarithmic or logical).  Scale \code{fix} is intended for `fixing'
  10-point scales: it truncates scale values to integers, and replaces
  10 with \code{X} and positive values below 1 with \code{+}. 
}
\value{
  The function is used mainly to print a table, but it returns
  (invisibly) a list
  with items:
  \item{spec}{Ordered species indices.}
  \item{sites}{Ordered site indices.}
}
\references{ The cover scales are presented in many textbooks of vegetation
  science; I used:

  Shimwell, D.W. (1971) \emph{The Description and Classification of
  Vegetation}. Sidgwick & Jackson.
}
\author{Jari Oksanen}

\seealso{\code{\link{cut}} and \code{\link{approx}} for making your own
  `cover scales', \code{\link{wascores}} for weighted averages.
  }

\note{ This function was called \code{vegetab} in older versions of
  \code{vegan}.  The new name was chosen  because the output is so
  compact (and to avoid confusion with the \code{vegtab} function in the
  \code{labdsv} package).
    }
\examples{
data(varespec)
## Print only more common species 
freq <- apply(varespec > 0, 2, sum)
vegemite(varespec, scale="Hult", sp.ind = freq > 10)
## Order by detrended correspondence analysis, use Hill scaling and "-" for zeros:
dca <- decorana(varespec)
vegemite(varespec, dca, "Hill", zero="-")
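## Further hedged sketches of options described in Details (the object
## name vare.hult and the choice of pH are illustrative only).
## coverscale can be called directly to inspect the class codes:
vare.hult <- coverscale(varespec, scale = "Hult")
vare.hult[1:5, 1:5]
## A vector in 'use' orders the sites; species scores are missing in
## this case, so species should follow their weighted averages:
data(varechem)
vegemite(varespec, use = varechem$pH, scale = "Hult")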
}
\keyword{ print }
\keyword{ manip }


\eof
\name{wascores}
\alias{wascores}
\alias{eigengrad}

\title{ Weighted Averages Scores for Species }
\description{
  Computes Weighted Averages scores of species for ordination
  configuration or for environmental variables.
}
\usage{
wascores(x, w, expand=FALSE)
eigengrad(x, w)
} 

\arguments{
  \item{x}{Environmental variables or ordination scores.}
  \item{w}{Weights: species abundances.}
  \item{expand}{Expand weighted averages so that they have the same
    weighted variance as the corresponding environmental variables.  }
}
\details{
  Function \code{wascores} computes weighted averages. Weighted averages
  `shrink': they cannot be more extreme than values used for calculating
  the averages. With \code{expand = TRUE}, the function `deshrinks' the
  weighted averages by making their biased weighted variance equal to
  the biased weighted variance of the corresponding environmental
  variable.  Function \code{eigengrad} returns the inverses of squared
  expansion factors or the attribute \code{shrinkage} of the
  \code{wascores} result for each environmental gradient.  This is equal
  to the constrained eigenvalue of \code{\link{cca}} when only this one
  gradient is used as a constraint, and describes the strength of the
  gradient. 
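
  As a sketch in notation assumed here (the symbols do not come from
  the function itself): with \eqn{w_{ij}} the abundance of species
  \eqn{j} at site \eqn{i}, and \eqn{x_i} the value of an environmental
  variable or ordination axis at site \eqn{i}, the weighted average
  for species \eqn{j} can be written as
  \deqn{u_j = \frac{\sum_i w_{ij} x_i}{\sum_i w_{ij}}}{u[j] = sum(w[ij] * x[i]) / sum(w[ij])}
  Because \eqn{u_j} is a convex combination of the \eqn{x_i} when
  abundances are non-negative, it cannot lie outside their range.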
}
\value{
  Function \code{wascores} returns a matrix where species define rows
  and ordination axes or environmental variables define columns. If
  \code{expand = TRUE}, attribute \code{shrinkage} has the inverses of
  squared expansion factors or \code{\link{cca}} eigenvalues for the
  variable.  Function \code{eigengrad} returns only the \code{shrinkage}
  attribute. 
}

\author{ Jari Oksanen }

\seealso{ \code{\link[MASS]{isoMDS}}, \code{\link{cca}}. }

\examples{
data(varespec)
data(varechem)
library(MASS)  ## isoMDS
library(mva)   ## cmdscale to start isoMDS
vare.dist <- vegdist(wisconsin(varespec))
vare.mds <- isoMDS(vare.dist)
vare.points <- postMDS(vare.mds$points, vare.dist)
vare.wa <- wascores(vare.points, varespec)
plot(scores(vare.points), pch="+", asp=1)
text(vare.wa, rownames(vare.wa), cex=0.8, col="blue")
## Omit rare species (frequency <= 4)
freq <- apply(varespec>0, 2, sum)
plot(scores(vare.points), pch="+", asp=1)
text(vare.wa[freq > 4,], rownames(vare.wa)[freq > 4],cex=0.8,col="blue")
## Works for environmental variables, too.
wascores(varechem, varespec)
## And the strengths of these variables are:
eigengrad(varechem, varespec)
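## A minimal by-hand sketch, assuming (as described in Details) that
## each species score is the abundance-weighted mean of the site
## values.  The object names x, w and wa.hand are illustrative only.
x <- as.matrix(varechem)
w <- as.matrix(varespec)
## crossprod(w, x) is t(w) times x: species by variables
wa.hand <- sweep(crossprod(w, x), 1, apply(w, 2, sum), "/")
## Largest absolute difference from wascores without expansion
max(abs(wa.hand - wascores(varechem, varespec)))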
}

\keyword{ multivariate }
\keyword{ univar }




\eof
