pcurve                package:pcurve                R Documentation

_P_r_i_n_c_i_p_a_l _C_u_r_v_e _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a principal curve to a numeric multivariate dataset in
     arbitrary dimensions. Produces diagnostic plots.

_U_s_a_g_e:

     pcurve(x, xcan = NULL, start = "ca", rank = FALSE, cv.fit = FALSE,
            penalty = 1, cv.all = FALSE, df = "vary", fit.meth = "spline",
            canfit = "lm", candf = FALSE, vary.adj = FALSE, subset,
            robust = FALSE, lowf = 0.5, min.df, max.df, max.df.cv.fit,
            ext.dist = TRUE, ext.dc = 0.9, metric = "bray", latent = FALSE,
            plot.pca = TRUE, thresh = 0.001, plot.true = TRUE,
            plot.init = FALSE, plot.segs = TRUE, plot.resp = TRUE,
            plot.cov = TRUE, maxit = 10, stretch = 2, fits = FALSE,
            prnt.fits = TRUE, trace = TRUE, trace.all = FALSE, pch = 1,
            row.chk0 = FALSE, col.chk0 = TRUE, use.loc = FALSE)
      

_A_r_g_u_m_e_n_t_s:

       x: numeric data matrix or data.frame.

    xcan: data.frame or matrix of explanatory variables to be used in
          constrained PCs.

   start: specifies how to determine the starting configuration
          (location of points on initial curve): "ca" = correspondence
          analysis; "pca" = principal components analysis with Euclidean
          metric; "pca.bc" = principal components analysis with
          Bray-Curtis metric; "mds" = non-metric multidimensional
          scaling with Euclidean metric; "mds.bc" = non-metric
          multidimensional scaling with Bray-Curtis metric; "cs.bc" =
          classical scaling (metric multidimensional scaling) with
          Bray-Curtis metric; "ran" = random start. Alternatively, if
          start is numeric and of length dim(x)[1], it is used as a
          user-supplied configuration.

    rank: if TRUE the starting configuration is transformed to ranks.

  cv.fit: if TRUE a final iteration using cross-validation is done.

 penalty: penalty for smoothing spline. A value of 1 corresponds to no
          penalty, with values > 1 giving a smoother fit. Increasing
          the penalty for small data sets can reduce over-fitting. If
          penalty = "np", penalty = 1 for N > 1000, penalty = 2 for
          N <= 100, and penalty = 4 - log(N, 10) for N > 100 and
          N <= 1000.
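
          The "np" rule above can be written out as a small function.
          This is only a sketch of the stated piecewise formula;
          np_penalty is a hypothetical helper name and is not part of
          the pcurve package.

```r
# Hypothetical helper (not part of pcurve): the penalty implied by
# penalty = "np" for a data set with n rows, following the rule above.
np_penalty <- function(n) {
  if (n > 1000) {
    1
  } else if (n <= 100) {
    2
  } else {
    4 - log(n, 10)  # falls from 2 at N = 100 towards 1 at N = 1000
  }
}

# Penalty for a range of sample sizes
sapply(c(50, 100, 500, 1000, 5000), np_penalty)
```

          The two outer branches agree with the middle one at its end
          points (N = 100 and N = 1000), so the penalty changes
          continuously with N.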

  cv.all: if TRUE a cross-validated smoothing spline is fitted at each
          iteration.

      df: if numeric specifies the df for the smoothing spline.

fit.meth: specifies the smoother: "spline" = smooth.spline, "poisson" =
          Poisson generalized additive model, "binomial" = binomial
          generalized additive model, "lowess" = lowess smoother (this
          argument is overridden by robust = TRUE).

  canfit: "lm" or "gam", model used to relate pc to xcan.

   candf: if canfit = "gam", df for model. May be a single value or a
          vector of FALSE or positive integers indicating dfs for each
          explanatory variable in xcan. If FALSE, this is equivalent
          to fx = FALSE in 'gam', and the df are selected by GCV/UBRE.

vary.adj: if FALSE the same df are used for the smooth of each
          variable, otherwise each variable has its own df.

  subset: used to take a subset of x and start (if numeric).

  robust: if TRUE uses lowess smooths, if FALSE uses smoothing splines.

    lowf: specifies the span of the lowess smooth.
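
          The two smoother families selected by robust and fit.meth
          can be compared on toy data using base R alone. This is a
          rough sketch: it assumes, without confirmation from the
          pcurve source, that lowf plays the role of the f argument of
          'lowess' and that a numeric df is handed to 'smooth.spline'.
          The variables lambda and y are simulated for illustration.

```r
set.seed(1)

# Simulated example: a Gaussian response along a gradient, plus noise
lambda <- seq(0, 10, length.out = 100)
y <- exp(-(lambda - 5)^2 / 4) + rnorm(100, sd = 0.1)

# Smoothing spline with a fixed number of degrees of freedom
fit_spline <- smooth.spline(lambda, y, df = 5)

# Lowess smooth; f is the span (proportion of points used per local fit)
fit_lowess <- lowess(lambda, y, f = 0.5)

# Overlay both smooths on the data
plot(lambda, y, pch = 1)
lines(predict(fit_spline, lambda), lty = 1)
lines(fit_lowess, lty = 2)
```

          A larger span gives a smoother lowess curve, in the same way
          that a smaller df gives a smoother spline.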

  min.df: specifies the min df for the smoothing.

  max.df: specifies the max df for the smoothing.

max.df.cv.fit: presumably the max df for the smoothing in the final
          cross-validated iteration (see cv.fit).

ext.dist: if TRUE extended dissimilarities are used in the calculation
          of the initial configuration, using the flexible shortest
          path. If FALSE standard dissimilarities are used (see De'ath,
          1999b and 'stepacross' in package vegan).

  ext.dc: critical distance, the toolong argument in 'stepacross'.

  metric: dissimilarity metric, the method argument in 'vegdist' in
          package vegan.

  latent: if FALSE locations are rescaled after each iteration to give
          distance along the curve; if TRUE no rescaling is done.

plot.pca: if TRUE the fitting is plotted (assuming plot.true = TRUE) in
          the first 2 dimensions of PCA space.

  thresh: threshold value of difference in cross-validation for ceasing
          iteration.

plot.true: if TRUE the fitting process is plotted.

plot.init: if TRUE the initial fits to each variable are plotted.

plot.segs: if TRUE segments linking the fitted points on the curves to
          their corresponding data points are plotted.

plot.resp: if TRUE the final response curves are plotted.

plot.cov: if TRUE covariate partial effects are plotted (only if xcan
          is not null).

   maxit: specifies the maximum number of iterations.

 stretch: end segments of the curve are stretched by this factor at
          each iteration.

    fits: if TRUE value of pcurve includes diagnostics for each
          variable.

prnt.fits: if TRUE statistics on model fits are printed.

   trace: if TRUE prints out fitting diagnostics at each iteration.

trace.all: if TRUE prints out all curve details at each iteration.

     pch: plotting symbol used in the plots.

row.chk0: if TRUE checks for and removes rows of x identically 0.

col.chk0: if TRUE checks for and removes columns of x identically 0.
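
          The effect of these two checks can be sketched in base R.
          This is an illustration of removing all-zero rows and
          columns, not the package's internal code; x, keep_rows,
          keep_cols and x_clean are illustrative names, and the small
          matrix is made up.

```r
# Toy data matrix with one all-zero row and one all-zero column
x <- matrix(c(1, 0, 2,
              0, 0, 0,
              3, 0, 4), nrow = 3, byrow = TRUE)

# A row or column is kept if it has at least one non-zero entry
keep_rows <- rowSums(x != 0) > 0   # analogous to row.chk0 = TRUE
keep_cols <- colSums(x != 0) > 0   # analogous to col.chk0 = TRUE

# drop = FALSE keeps the result a matrix even if one row or column remains
x_clean <- x[keep_rows, keep_cols, drop = FALSE]
dim(x_clean)
```

          All-zero rows matter for abundance-based metrics such as
          Bray-Curtis, where the dissimilarity between two empty
          samples is undefined (0/0).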

 use.loc: if TRUE pauses between plots during fitting (left mouse-click
          to progress to the next plot).

_D_e_t_a_i_l_s:

     See De'ath (1999a) for a full discussion of the functions and
     their application.

_V_a_l_u_e:

     An object of class principal curve containing a list comprising:

       s: fitted values

     tag: order of points along the curve

  lambda: locations along the curve

    dist: sum of squared distances of points from the curve

       c: call to pcurve

       x: data to which the curve was fitted

      df: degrees of freedom for the smoothers used in the fit

fit.list: diagnostics for each variable, only included if fits = TRUE.
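
     The relationship between lambda and tag can be sketched with a
     made-up vector. The sketch assumes, as in Hastie's original
     principal curve code, that tag is the permutation that sorts the
     points by their location lambda; the values below are illustrative
     and are not pcurve output.

```r
# Made-up locations along a curve for four points (not pcurve output)
lambda <- c(2.7, 0.0, 1.3, 4.1)

# Assumed relationship: tag holds the indices of the points in the
# order in which they occur along the curve
tag <- order(lambda)
tag

# Locations read off in curve order are non-decreasing
lambda[tag]
```

     Indexing the rows of the fitted values by such an ordering would
     trace the curve from one end to the other.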

_A_u_t_h_o_r(_s):

     R port by Chris Walsh Chris.Walsh@sci.monash.edu.au from S+
     library by Glenn De'ath g.death@aims.gov.au. Original S code for
     principal curve analysis by Trevor Hastie
     hastie@stat.stanford.edu.

_R_e_f_e_r_e_n_c_e_s:

     De'ath, G. 1999a Principal Curves: a new technique for indirect
     and direct gradient analysis. _Ecology_ *80*, 2237-2253.

     De'ath, G. 1999b Extended dissimilarity: a method of robust
     estimation of ecological distances from high beta diversity data.
     _Plant Ecology_ *144*, 191-199.

     Gittins, R. 1985 _Canonical Analysis. A review with applications
     in ecology._ Berlin: Springer-Verlag.

     Hastie, T.J. and Tibshirani, R.J. 1990 _Generalized additive
     models._ London: Chapman and Hall.

     Hastie, T.J. and Stuetzle, W. 1989 Principal Curves. _Journal of
     the American Statistical Association_ *84*, 502-516.

_S_e_e _A_l_s_o:

     'pcdiags.plt', 'vegdist', 'stepacross'

_E_x_a_m_p_l_e_s:

     # A simulated dataset with 4 response variables (taxa 1-4), n = 100.
     # The response curve is Gaussian and the noise is Poisson.
         data(sim4var)
         sim4fit <- pcurve(sim4var, plot.init = FALSE, use.loc = TRUE)

     # Limestone grassland community example worked by De'ath (1999a),
     # from data in Gittins (1985).
         data(soilspec)
         # square-root transform of columns 2-9
         species <- sqrt(soilspec[, 2:9])
         # columns 10-12 are used as explanatory variables
         envvar <- soilspec[, 10:12]
     # Indirect gradient analysis
         spec.fit <- pcurve(species, start = "mds.bc", plot.init = FALSE,
                            use.loc = TRUE)
     # Direct gradient analysis (curve constrained by xcan)
         soilspec.fit <- pcurve(species, xcan = envvar,
                                start = "mds.bc", plot.init = FALSE,
                                fits = TRUE, prnt.fits = TRUE,
                                use.loc = TRUE)

