dr                    package:dr                    R Documentation

_M_a_i_n _f_u_n_c_t_i_o_n _f_o_r _d_i_m_e_n_s_i_o_n _r_e_d_u_c_t_i_o_n _r_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     This is the main function in the dr package.  It creates objects
     of class dr to estimate the central (mean) subspace and perform
     tests concerning its dimension.  Several helper functions that
     require a dr object can then be applied to the output from this
     function.

_U_s_a_g_e:

     dr (formula, data, subset, group=NULL, na.action = na.fail, weights, ...)
         
     dr.compute (x, y, weights, group=NULL, method = "sir", chi2approx="bx",...)
      

_A_r_g_u_m_e_n_t_s:

 formula: a two-sided formula like 'y~x1+x2+x3', where the left-hand
          side variable is a vector or a matrix of the response
          variable(s), and the right-hand side variables represent the
          predictors.  While any legal formula in the Wilkinson-Rogers
          notation can appear, dimension reduction methods generally
          expect the predictors to be numeric, not factors, with no
          nesting.  Full rank models are recommended, although rank
          deficient models are permitted.

          The left-hand side of the formula will generally be a single
          vector, but it can also be a matrix, such as
          'cbind(y1,y2)~x1+x2+x3', if the 'method' is '"save"' or
          '"sir"'.  Both of these methods are based on slicing, and for
          the multivariate case slices are determined by slicing on all
          the columns of the left-hand side variables.

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from the environment from
          which 'dr' is called.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

   group: If used, this argument specifies a grouping variable so that 
          dimension reduction is done separately for each distinct
          level.  This is implemented only when 'method' is one of
          '"sir"',  '"save"', or '"ire"'.  This argument must be a
          one-sided formula. For example, '~Location' would fit
          separately for each level of the variable 'Location'.  The
          formula '~A:B' would fit separately for each combination of
          'A' and 'B', provided that both have been declared factors.

 weights: an optional vector of weights to be used where appropriate. 
          In the context of dimension reduction methods, weights are
          used to obtain elliptical symmetry, not constant variance.  

na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is 'na.fail', which will stop
          calculations. The option 'na.omit' is also permitted, but it
          may not work correctly when weights are used.

       x: The design matrix.  This will be computed from the formula by
          'dr' and then passed to 'dr.compute', or you can create it
          yourself.

       y: The response vector or matrix.

  method: This character string specifies the method of fitting.  The
          options include '"sir"', '"save"', '"phdy"', '"phdres"' and 
          '"ire"'.  Each method may have its own additional arguments,
          or its own defaults; see the details below for more
          information.

chi2approx: Several dr methods compute significance levels using 
          statistics that are asymptotically distributed as a linear
          combination of chi^2(1) random variables.  This argument
          chooses the method for approximating that distribution:
          either '"bx"', the default, for a method suggested by Bentler
          and Xie (2000), or '"wood"' for a method proposed by Wood
          (1989).

     ...: For 'dr', all additional arguments are passed to
          'dr.compute'.  For 'dr.compute', additional arguments may be
          required by the particular dimension reduction method.  For
          example, 'nslices' is the number of slices used by '"sir"'
          and '"save"'.  'numdir' is the maximum number of directions
          to compute, with default equal to 4.  Other methods may have
          other defaults.

_D_e_t_a_i_l_s:

     The general regression problem studies F(y|x), the conditional
     distribution of a response y given a set of predictors x.   This
     function provides methods for estimating the dimension and central
     subspace of a general regression problem.  That is, we want to
     find a  p by d matrix B of minimal rank d such that 

                           F(y|x)=F(y|B'x)

     Both the dimension d and the subspace R(B) are unknown.  These
     methods make few assumptions.  Many methods are based on the
     inverse distribution, F(x|y).  

     For the methods '"sir"', '"save"', '"phdy"' and  '"phdres"', a
     kernel matrix M is estimated such that the  column space of M
     should be close to the central subspace  R(B).  The eigenvectors
     corresponding to the 'd' largest  eigenvalues of M provide an
     estimate of R(B).
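
     As an illustration of the kernel-matrix idea, the following is a
     minimal base R sketch of a SIR-type estimate on simulated data.
     It is not the code used by 'dr': the simulated model, the slicing
     rule, and all variable names are assumptions made for the example.

```r
# Sketch only: a SIR-type kernel matrix computed with base R.
# The data-generating model and the slicing rule are assumptions.
set.seed(1)
n <- 500
p <- 4
x <- matrix(rnorm(n * p), n, p)
b <- c(1, 1, 0, 0) / sqrt(2)               # true direction used to simulate y
y <- as.vector(x %*% b)^3 + 0.1 * rnorm(n)

# Standardize the predictors: z = (x - mean) %*% Sigma^(-1/2)
xc <- scale(x, center = TRUE, scale = FALSE)
e <- eigen(cov(x), symmetric = TRUE)
sig_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
z <- xc %*% sig_inv_sqrt

# Slice on the ranks of y so that slices have roughly equal size
nslices <- 8
slice <- ceiling(rank(y, ties.method = "first") * nslices / n)

# Kernel matrix: weighted sum of outer products of the slice means of z
M <- matrix(0, p, p)
for (h in seq_len(nslices)) {
  idx <- slice == h
  m_h <- colMeans(z[idx, , drop = FALSE])
  M <- M + mean(idx) * tcrossprod(m_h)
}

# Eigenvectors for the d largest eigenvalues, mapped back to the x scale
ev <- eigen(M, symmetric = TRUE)
d <- 1
b_hat <- sig_inv_sqrt %*% ev$vectors[, seq_len(d), drop = FALSE]
b_hat <- b_hat / sqrt(sum(b_hat^2))
```

     Here 'b_hat' estimates the direction 'b' only up to sign, since
     only the span of the eigenvectors is meaningful.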

     For the method '"ire"', subspaces are estimated by minimizing  an
     objective function.

     Categorical predictors can be included using the 'group' 
     argument, with the methods '"sir"', '"save"' and  '"ire"', using
     the ideas from Chiaromonte, Cook and Li (2002).

     The primary output from this method is (1) a set of vectors whose 
     span estimates 'R(B)'; and (2) various tests concerning the
     dimension 'd'.
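
     Several of these tests refer a statistic to a linear combination
     of chi^2(1) variables, which is where the 'chi2approx' argument
     matters.  The sketch below shows one common way to approximate
     such a p-value, a two-moment (Satterthwaite-type) scaled
     chi-squared adjustment.  The function name is hypothetical, and
     whether this matches the package's '"bx"' computation exactly is
     an assumption.

```r
# Hypothetical helper, not part of dr: approximate p-value for a statistic
# distributed as sum(coef[i] * chisq(1)), by matching the mean and variance
# of the linear combination to a scaled chi-squared distribution.
pvalue_scaled_chisq <- function(stat, coef) {
  tr1 <- sum(coef)                 # mean of the linear combination
  tr2 <- sum(coef^2)               # half of its variance
  df_adj <- tr1^2 / tr2            # adjusted degrees of freedom
  stat_adj <- stat * df_adj / tr1  # rescaled statistic
  pchisq(stat_adj, df = df_adj, lower.tail = FALSE)
}

# With equal unit coefficients the adjustment reduces to an ordinary
# chi-squared test with length(coef) degrees of freedom.
p_equal <- pvalue_scaled_chisq(5, rep(1, 3))
p_unequal <- pvalue_scaled_chisq(5, c(2, 1, 0.5))
```

     When the coefficients are very unequal, the adjusted degrees of
     freedom fall well below the number of terms.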

     Weights can be used, essentially to specify the relative 
     frequency of each case in the data.  Empirical weights that make 
     the contours of the weighted sample closer to elliptical can be 
     computed using 'dr.weights'.   This will usually result in zero
     weight for some  cases.  The function will set zero estimated
     weights to missing.

_V_a_l_u_e:

     dr returns an object that inherits from dr (the name of the type
     is the value of the 'method' argument), with attributes: 

       x: The design matrix

       y: The response vector

 weights: The weights used, normalized to add to n.

      qr: QR factorization of x.

   cases: Number of cases used.

    call: The initial call to 'dr'.

       M: A matrix that depends on the method of computing.  The column
          space of M should be close to the central subspace.

 evalues: The eigenvalues of M (or squared singular values if M is not
          symmetric).

evectors: The eigenvectors of M (or of M'M if M is not square and
          symmetric) ordered according to the eigenvalues.

chi2approx: Value of the input argument of this name.

  numdir: The maximum number of directions to be found.  The output
          value of numdir may be smaller than the input value.

slice.info: output from 'sir.slice', used by sir and save.

  method: the dimension reduction method used.

   terms: same as terms attribute in lm or glm.  Needed to make
          'update' work correctly.

       A: If method='"save"', then 'A' is a three dimensional array
          needed to compute test statistics.

_A_u_t_h_o_r(_s):

     Sanford Weisberg, <sandy@stat.umn.edu>.

_R_e_f_e_r_e_n_c_e_s:

     Bentler, P. M. and Xie, J. (2000), Corrections to test statistics
     in  principal Hessian directions.  _Statistics and Probability 
     Letters_, 47, 381-389.  Approximate p-values.

     Cook, R. D. (1998).  _Regression Graphics_.  New York:  Wiley.  
     This book provides the basic results for dimension reduction 
     methods, including detailed discussion of the methods '"sir"', 
     '"phdy"' and '"phdres"'.

     Cook, R. D. (2004). Testing predictor contributions in sufficient 
     dimension reduction. _Annals of Statistics_, 32, 1062-1092.  
     Introduced marginal coordinate tests.

     Cook, R. D. and Nachtsheim, C. (1994), Reweighting to achieve 
     elliptically contoured predictors in regression.  _Journal of  the
     American Statistical Association_, 89, 592-599.  Describes the 
     weighting scheme used by 'dr.weights'.

     Cook, R. D. and Ni, L. (2005). Sufficient dimension reduction via 
     inverse regression:  A minimum discrepancy approach, _Journal  of
     the American Statistical Association_, 100, 410-428. The  '"ire"'
     method is described in this paper.

     Cook, R. D. and Weisberg, S. (1999).  _Applied Regression 
     Including Computing and Graphics_, New York:  Wiley,  <URL:
     http://www.stat.umn.edu/arc>.  The program 'arc' described  in
     this book also computes most of the dimension reduction methods 
     described here.

     Chiaromonte, F., Cook, R. D. and Li, B. (2002). Sufficient
     dimension  reduction in regressions with categorical predictors.
     _Annals of Statistics_, 30, 475-497.  Introduced grouping, or
     conditioning on factors.

     Shao, Y., Cook, R. D. and Weisberg, S. (2007).  Marginal tests with 
     sliced average variance estimation.  _Biometrika_.  Describes  the
     tests used for '"save"'. 

     Wen, X. and Cook, R. D. (2007).  Optimal Sufficient Dimension 
     Reduction in Regressions with Categorical Predictors, _Journal  of
     Statistical Planning and Inference_.   This paper extends the 
     '"ire"' method to grouping.  

     Wood, A. T. A. (1989) An F approximation to the distribution  of a
     linear combination of chi-squared variables.  _Communications in
     Statistics: Simulation and Computation_, 18,  1439-1456. 
     Approximations for p-values.

_E_x_a_m_p_l_e_s:

     data(ais)
     # default fitting method is "sir"
     s0 <- dr(LBM~log(SSF)+log(Wt)+log(Hg)+log(Ht)+log(WCC)+log(RCC)+
       log(Hc)+log(Ferr),data=ais) 
     # Refit, using a different function for slicing to agree with arc.
     summary(s1 <- update(s0,slice.function=dr.slices.arc))
     # Refit again, using save, with 10 slices; the default is max(8,ncol+3)
     summary(s2<-update(s1,nslices=10,method="save"))
     # Refit using phdres; output is similar for phdy, but tests are not justifiable.
     summary(s3<- update(s1,method="phdres"))
     # fit using ire:
     summary(s4 <- update(s1,method="ire"))
     # fit using Sex as a grouping variable.  
     s5 <- update(s4,group=~Sex)

