regress               package:monomvn               R Documentation

_S_w_i_t_c_h _f_u_n_c_t_i_o_n _f_o_r _l_e_a_s_t _s_q_u_a_r_e_s _a_n_d _p_a_r_s_i_m_o_n_i_o_u_s _m_o_n_o_m_v_n _r_e_g_r_e_s_s_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function fits the specified ordinary least squares or
     parsimonious regression (plsr, pcr, ridge, and lars methods),
     depending on the arguments provided, and returns estimates of the
     coefficients and (co-)variances in a 'monomvn'-friendly format

_U_s_a_g_e:

     regress(X, y, method = c("lsr", "plsr", "pcr", "lasso", "lar",
          "forward.stagewise", "stepwise", "ridge", "factor"), p = 0,
          ncomp.max = Inf, validation = c("CV", "LOO", "Cp"),
          verb = 0, quiet = TRUE)

_A_r_g_u_m_e_n_t_s:

       X: 'data.frame', 'matrix', or vector of inputs 'X' 

       y: matrix of responses 'y' of row-length equal to the leading
          dimension (rows) of 'X', i.e., 'nrow(y) == nrow(X)'; if 'y'
          is a vector, then 'nrow' may be interpreted as 'length' 

  method: describes the type of _parsimonious_ (or _shrinkage_)
          regression, or ordinary least squares. From the 'pls' package
          we have '"plsr"' (plsr, the default) for  partial least
          squares and '"pcr"' (pcr) for standard principal component
          regression.  From the 'lars' package (see the '"type"'
          argument to lars) we have '"lasso"' for L1-constrained
          regression, '"lar"' for least angle regression,
          '"forward.stagewise"' and '"stepwise"' for fast
          implementations of classical forward selection of covariates.
           From the 'MASS' package we have '"ridge"' as implemented by
          the 'lm.ridge' function.  The '"factor"' method treats the
          first 'p' columns of 'y' as known factors

       p: when performing regressions, '0 <= p <= 1' is the proportion
          of the number of columns to rows in the design matrix before
          an alternative regression 'method' (except '"lsr"') is
          performed, as if least-squares regression had failed.
          Least-squares regression is known to fail when the number of
          columns is greater than or equal to the number of rows. The
          default setting, 'p = 0', forces the specified 'method' to be
          used for _every_ regression; specifying 'method = "lsr"' with
          'p = 0' can therefore be unstable. Intermediate settings of
          'p' allow the user to specify that least-squares regressions
          are preferred only when there are sufficiently more rows in
          the design matrix ('X') than columns. When 'method = "factor"',
          the 'p' argument instead gives the (positive) integer number
          of initial columns of 'y' to treat as known factors

ncomp.max: maximal number of (principal) components to consider in a
          'method'; only meaningful for the '"plsr"' or '"pcr"' methods.
          Large settings can cause the execution to be slow, as they
          drastically increase the cross-validation (CV) time

validation: method for cross-validation when applying a _parsimonious_
          regression method.  The default setting of '"CV"' (randomized
          10-fold cross-validation) is the faster method, but does not
          yield a deterministic result and does not apply for
          regressions on fewer than ten responses.  '"LOO"'
          (leave-one-out cross-validation) is deterministic, always
          applicable, and applied automatically whenever '"CV"' cannot
          be used.  When standard least squares is appropriate, the
          methods implemented in the 'lars' package (e.g. lasso) support
          model choice via the '"Cp"' statistic, which defaults to the
          '"CV"' method when least squares fails.  This argument is
          ignored for the '"ridge"' method; see details below

    verb: whether or not to print progress indicators.  The default
          ('verb = 0') keeps quiet.  This argument is provided for
          'monomvn' and is not intended to be set by the user via this
          interface 

   quiet: causes 'warning's about regressions to be silenced when
          'TRUE'
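
     The 'p' threshold can be illustrated with a minimal base-R sketch;
     the helper name 'use.alternative' is hypothetical and not part of
     'monomvn':

```r
## use the parsimonious 'method' when the ratio of columns to rows in
## the design matrix meets or exceeds the threshold 'p' (p = 0: always)
use.alternative <- function(X, p) ncol(X) / nrow(X) >= p

X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
use.alternative(X, p = 0.25)  ## FALSE: 5/100 < 0.25, so lsr is preferred
use.alternative(X, p = 0)     ## TRUE: the specified 'method' is always used
```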

_D_e_t_a_i_l_s:

     All 'method's (except '"lsr"') require a scheme for estimating the
     amount of variability explained by increasing numbers of non-zero
     coefficients (or principal components) in the model. Towards this
     end, the 'pls' and 'lars' packages support 10-fold cross
     validation (CV) or leave-one-out (LOO) CV estimates of root mean
     squared error.  See 'pls' and 'lars' for more details.  The
     'regress' function uses CV in all cases except when 'nrow(X) <=
     10', in which case CV fails and LOO is used.  Whenever 'nrow(X) <=
     3', 'pcr' fails, so 'plsr' is used instead.  If 'quiet = FALSE',
     then a 'warning' is given whenever the first choice for a
     regression fails.
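
     The fallback logic just described can be sketched in base R; the
     helper names are hypothetical and not 'monomvn' internals:

```r
## regress falls back from CV to LOO for small samples, and from pcr
## to plsr for very small ones ('n' plays the role of nrow(X))
choose.validation <- function(n, validation = "CV") {
  if (validation == "CV" && n <= 10) "LOO" else validation
}
choose.method <- function(n, method) {
  if (method == "pcr" && n <= 3) "plsr" else method
}

choose.validation(8)     ## "LOO": CV fails when nrow(X) <= 10
choose.validation(50)    ## "CV"
choose.method(3, "pcr")  ## "plsr": pcr fails when nrow(X) <= 3
```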

     For 'pls' methods, RMSEs are calculated for a number of components
     in '1:ncomp.max', where a 'NULL' value for 'ncomp.max' is replaced
     with

     'ncomp.max <- min(ncomp.max, ncol(y), nrow(X)-1)'

     which is the max allowed by the 'pls' package.
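
     For illustration, the cap above can be computed directly (a 'NULL'
     'ncomp.max' is treated here as 'Inf'):

```r
## the maximum number of components allowed by the pls package,
## following the formula quoted above
ncomp.cap <- function(ncomp.max, y, X) min(ncomp.max, ncol(y), nrow(X) - 1)

X <- matrix(0, nrow = 20, ncol = 8)
y <- matrix(0, nrow = 20, ncol = 3)
ncomp.cap(Inf, y, X)  ## 3 = min(Inf, ncol(y) = 3, nrow(X) - 1 = 19)
```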

     Simple heuristics are used to select a small number of components
     ('ncomp' for 'pls'), or a number of coefficients (for 'lars'), that
     explains a large amount of the variability (RMSE).  The 'lars'
     methods use the one-standard-error rule outlined in Section 7.10,
     page 216 of HTF.  The 'pls' package does not currently support the
     calculation of standard errors for CV estimates of RMSE, so a
     simple linear penalty for increasing 'ncomp' is used instead.  The
     ridge constant (lambda) for 'lm.ridge' is set using the 'optimize'
     function on the 'GCV' output.
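
     The one-standard-error rule can be sketched in base R; 'cv.err' and
     'cv.se' stand in for per-model CV error estimates and their
     standard errors, and the helper name is hypothetical:

```r
## one-standard-error rule (HTF, Section 7.10): choose the smallest
## (most parsimonious) model whose CV error is within one standard
## error of the error-minimizing model
one.se.choice <- function(cv.err, cv.se) {
  i.min <- which.min(cv.err)
  min(which(cv.err <= cv.err[i.min] + cv.se[i.min]))
}

cv.err <- c(10, 6, 4.1, 4.0, 4.05)
cv.se  <- c(1.0, 0.8, 0.5, 0.5, 0.6)
one.se.choice(cv.err, cv.se)  ## 3: model 3's error 4.1 <= 4.0 + 0.5
```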

_V_a_l_u_e:

     'regress' returns a 'list' containing the components listed below.

   call : a copy of the function call as used

 method : a copy of the 'method' input argument

  ncomp : depends on the 'method' used: is 'NA' when 'method = "lsr"';
          is the number of principal components for 'method = "pcr"'
          and 'method = "plsr"'; is the number of non-zero components
          in the coefficient vector ('$b', not counting the intercept)
          for any of the 'lars' methods; and gives the chosen lambda
          penalty parameter for 'method = "ridge"'

 lambda : if 'method' is one of 'c("lasso", "forward.stagewise",
          "ridge")', then this field records the lambda penalty
          parameter used

      b : matrix containing the estimated regression coefficients, with
          'ncol(b) = ncol(y)' and the intercept in the first row

       S : (bias-corrected) maximum likelihood estimate of the residual
           covariance matrix

_N_o_t_e:

     The CV in 'plsr' and 'lars' is random in nature, and so can depend
     on the random seed.  Use 'validation = "LOO"' for a deterministic
     (but slower) result

     Be warned that the 'lars' implementation of '"forward.stagewise"'
     can sometimes get stuck in (what seems like) an infinite loop.
     This is not a bug in the 'regress' function; the bug has been
     reported to the authors of 'lars'

_A_u_t_h_o_r(_s):

     Robert B. Gramacy bobby@statslab.cam.ac.uk

_R_e_f_e_r_e_n_c_e_s:

     Bjorn-Helge Mevik and Ron Wehrens (2007). _The 'pls' Package:
     Principal Component and Partial Least Squares Regression in R._ 
     Journal of Statistical Software *18*(2)

     Bradley Efron, Trevor Hastie, Ian Johnstone and Robert Tibshirani
     (2003). _Least Angle Regression (with discussion)._ Annals of
     Statistics *32*(2); see also <URL:
     http://www-stat.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf>

     <URL: http://www.statslab.cam.ac.uk/~bobby/monomvn.html>

_S_e_e _A_l_s_o:

     'monomvn', 'blasso', 'lars' in the 'lars' library, 'lm.ridge' in
     the 'MASS' library, 'plsr' and 'pcr' in the 'pls' library

_E_x_a_m_p_l_e_s:

     ## following the lars diabetes example
     data(diabetes)
     attach(diabetes)

     ## Ordinary Least Squares regression
     reg.ols <- regress(x, y)

     ## Lasso regression
     reg.lasso <- regress(x, y, method="lasso")

     ## partial least squares regression
     reg.plsr <- regress(x, y, method="plsr")

     ## ridge regression
     reg.ridge <- regress(x, y, method="ridge")

     ## compare the coefs
     data.frame(ols=reg.ols$b, lasso=reg.lasso$b,
                plsr=reg.plsr$b, ridge=reg.ridge$b)

     ## clean up
     detach(diabetes)

