covMest                package:rrcov                R Documentation

_C_o_n_s_t_r_a_i_n_e_d _M-_E_s_t_i_m_a_t_e_s _o_f _L_o_c_a_t_i_o_n _a_n_d _S_c_a_t_t_e_r

_D_e_s_c_r_i_p_t_i_o_n:

     Computes constrained M-estimates of multivariate location and
     scatter based on the translated biweight function ('t-biweight'),
     using a high breakdown point initial estimate (the Minimum
     Covariance Determinant, 'Fast MCD').

_U_s_a_g_e:

     covMest(x, cor = FALSE, r = 0.45, arp = 0.05, eps = 1e-3,
             maxiter = 120, control, t0, S0)

_A_r_g_u_m_e_n_t_s:

       x: a matrix or data frame. 

     cor: should the returned result include a correlation matrix?
          Default is 'cor = FALSE'.

       r: required breakdown point. Allowed values are between
          '(n - p)/(2 * n)' and 1; the default is 0.45.

     arp: asymptotic rejection point, i.e. the fraction of points
          receiving zero weight (see Rocke (1996)). Default is '0.05'.

     eps: a numeric value specifying the relative precision of the
          solution of the M-estimate. Defaults to '1e-3'.

 maxiter: maximum number of iterations allowed in the computation of
          the M-estimate. Defaults to 120.

 control: a list with estimation options, the same as those provided
          in the function specification. If the control object is
          supplied, the parameters from it will be used. If parameters
          are also passed in the invocation statement, they override
          the corresponding elements of the control object.

      t0: optional initial high breakdown point estimate of the
          location. If not supplied, the MCD will be used.

      S0: optional initial high breakdown point estimate of the
          scatter. If not supplied, the MCD will be used.
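
     The meaning of 'arp' can be illustrated in base R. Assuming that
     for clean multivariate normal data the squared Mahalanobis
     distances follow a chi-squared distribution with p degrees of
     freedom, the distance beyond which points get zero weight is the
     square root of the (1 - arp) quantile of that distribution. This is
     only a sketch of the idea: 'zero_weight_cutoff' is a hypothetical
     helper, not a function of rrcov, and the way 'covMest' derives its
     internal tuning constants may differ.

```r
## Hypothetical helper (not part of rrcov): distance beyond which a point
## would receive zero weight, ASSUMING squared distances of clean data
## follow a chi-squared distribution with p degrees of freedom.
zero_weight_cutoff <- function(arp, p) {
  sqrt(qchisq(1 - arp, df = p))
}

## Check the interpretation on simulated standard normal data:
## roughly a fraction 'arp' of the points lies beyond the cutoff.
p   <- 3
cut <- zero_weight_cutoff(arp = 0.05, p = p)
set.seed(1)
z <- matrix(rnorm(10000 * p), ncol = p)
d <- sqrt(rowSums(z^2))   # Mahalanobis distances for N(0, I) data
mean(d > cut)             # close to 0.05
```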

_D_e_t_a_i_l_s:

     Rocke (1996) has shown that S-estimates of multivariate location
     and scatter in high dimensions can be sensitive to outliers even
     if the breakdown point is set near 0.5. To mitigate this problem,
     he proposed the translated biweight (t-biweight) method with a
     standardization step that equates the median of 'rho(d)' with its
     median under normality. The result is then not an S-estimate but
     a constrained M-estimate. For such smooth estimators to work, a
     reasonable starting point is necessary, one that reliably leads to
     a good solution of the estimator. In 'covMest' the MCD computed by
     'covMcd' is used, but users can supply their own initial estimates
     through 't0' and 'S0'.
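
     The interplay of starting point, reweighting and standardization
     can be sketched as a simplified iteratively reweighted scheme. This
     is NOT the rrcov implementation: 'm_iterate' and 'tbw_weight' are
     hypothetical helpers, the tuning constants M and c are fixed by
     hand (in 'covMest' they are derived from 'r' and 'arp'), and the
     standardization matches the median squared distance to the
     chi-squared(p) median rather than matching the median of 'rho(d)'
     as in Rocke (1996). Since 'rho' is non-decreasing the two are
     closely related, but not identical.

```r
## Translated biweight weight as a function of the (non-squared) distance d.
tbw_weight <- function(d, M, c) {
  w <- ifelse(d < M, 1, (1 - ((d - M) / c)^2)^2)
  w[d > M + c] <- 0
  w
}

## Simplified sketch (hypothetical, not rrcov code): reweighted location and
## scatter from a start (t0, S0), with a median-based scale standardization
## and a relative-change stopping rule controlled by eps and maxiter.
m_iterate <- function(x, t0, S0, M, c, eps = 1e-3, maxiter = 120) {
  p   <- ncol(x)
  loc <- t0
  S   <- S0
  for (iter in seq_len(maxiter)) {
    d       <- sqrt(mahalanobis(x, loc, S))
    w       <- tbw_weight(d, M, c)
    loc_new <- colSums(w * x) / sum(w)          # weighted mean
    xc      <- sweep(x, 2, loc_new)
    S_new   <- crossprod(xc * sqrt(w)) / sum(w) # weighted scatter
    ## standardization: rescale so the median squared distance equals
    ## the chi-squared(p) median (mahalanobis with k * S gives d2 / k)
    d2      <- mahalanobis(x, loc_new, S_new)
    S_new   <- S_new * median(d2) / qchisq(0.5, df = p)
    done    <- max(abs(loc_new - loc)) < eps * (1 + max(abs(loc))) &&
               max(abs(S_new - S)) < eps * max(abs(S))
    loc <- loc_new
    S   <- S_new
    if (done) break
  }
  list(center = loc, cov = S, iter = iter)
}

## Usage: 95 clean points plus 5 gross outliers around (10, 10).
set.seed(4)
x <- rbind(matrix(rnorm(95 * 2), ncol = 2),
           matrix(rnorm(5 * 2, mean = 10), ncol = 2))
cut <- sqrt(qchisq(0.95, df = 2))   # illustrative choice: M + c = cut
fit <- m_iterate(x, t0 = apply(x, 2, median), S0 = diag(2),
                 M = 1, c = cut - 1)
fit$center   # close to c(0, 0)
```

     Starting from the coordinatewise median keeps the outliers far away
     from the first location estimate, so they get zero weight from the
     first iteration on; a poor start could instead let them pull the
     estimate towards themselves.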

_V_a_l_u_e:

     An object of class '"mest"', which is basically a 'list' with the
     following components. This class is "derived" from '"mcd"', so the
     same generic functions ('print', 'plot', 'summary') can be used.
     NOTE: this is going to change; in one of the next revisions
     'covMest' will return an S4 class '"mest"' which is derived
     (i.e. 'contains') from class '"cov"'.

  center: the final estimate of location.

     cov: the final estimate of scatter.

     cor: the estimate of the correlation matrix (only if 'cor =
          TRUE').

     mah: Mahalanobis distances of the observations using the
          M-estimate of location and scatter.

       X: the input data as a matrix.

   n.obs: total number of observations.

  method: character string naming the method (M-Estimates).

    call: the call used (see 'match.call').
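
     The relation between the 'center', 'cov', 'mah' and 'cor'
     components can be sketched in base R. This assumes, without
     confirmation from this page, that 'mah' holds squared distances as
     returned by 'stats::mahalanobis' and that 'cor' is the correlation
     matrix implied by 'cov'. The classical 'colMeans' and 'cov' below
     are only stand-ins for the robust estimates returned by 'covMest'.

```r
## Stand-ins for the fitted components (assumption: a real fit would use
## the robust 'center' and 'cov' from covMest instead).
set.seed(3)
X      <- matrix(rnorm(50 * 3), ncol = 3)
center <- colMeans(X)
S      <- cov(X)

## 'mah': assumed to be squared Mahalanobis distances of each row of X
mah <- mahalanobis(X, center, S)

## 'cor': correlation matrix implied by the scatter estimate
R <- cov2cor(S)
```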

_N_o_t_e:

     The psi, rho and weight functions for the M-estimation are
     encapsulated in a virtual S4 class 'PsiFun', from which a 'PsiBwt'
     class, implementing the translated biweight (t-biweight), is
     derived. The base class 'PsiFun' also contains the M-iteration
     itself. Although these classes are not documented and not directly
     accessible to the user, they will form the basis for adding other
     functions (biweight, LWS, etc.) as well as S-estimates.
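
     The weight, psi and rho functions of the translated biweight can be
     written as plain R functions of the (non-squared) distance d with
     tuning constants M and c: the weight is 1 below M, decreases as
     (1 - ((d - M)/c)^2)^2 between M and M + c, and is 0 beyond M + c;
     psi(d) = d * w(d), and rho is the integral of psi from 0 to d. The
     helpers 'tbw_weight', 'tbw_psi' and 'tbw_rho' are hypothetical
     stand-alone versions for illustration, not the 'PsiBwt' methods,
     and 'rho' is computed here by numerical integration rather than in
     closed form.

```r
## Hypothetical stand-alone t-biweight helpers (not the rrcov 'PsiBwt' class).
tbw_weight <- function(d, M, c) {
  w <- ifelse(d < M, 1, (1 - ((d - M) / c)^2)^2)
  w[d > M + c] <- 0   # points beyond M + c get zero weight
  w
}

tbw_psi <- function(d, M, c) d * tbw_weight(d, M, c)

## rho(d) = integral of psi from 0 to d, evaluated numerically
tbw_rho <- function(d, M, c) {
  sapply(d, function(di)
    if (di <= 0) 0 else integrate(tbw_psi, 0, di, M = M, c = c)$value)
}

tbw_weight(c(0.5, 1.5, 2.5), M = 1, c = 1)   # 1, 0.5625, 0
tbw_rho(0.5, M = 1, c = 1)                  # 0.125, i.e. 0.5^2 / 2
```

     Below M the estimator behaves like the quadratic rho(d) = d^2/2 of
     the classical estimates; beyond M + c, rho is constant, which is
     what gives outlying points zero influence.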

_A_u_t_h_o_r(_s):

     Valentin Todorov valentin.todorov@chello.at

     (some code from C. Becker:
     http://www.sfb475.uni-dortmund.de/dienst/de/content/struk-d/bereicha-d/tpa1softw-d.html)

_R_e_f_e_r_e_n_c_e_s:

     D.L. Woodruff and D.M. Rocke (1994) Computable robust estimation of
     multivariate location and shape in high dimension using compound
     estimators. _Journal of the American Statistical Association_,
     *89*, 888-896.

     D.M. Rocke (1996) Robustness properties of S-estimates of
     multivariate location and shape in high dimension. _Annals of
     Statistics_, *24*, 1327-1345.

     D.M. Rocke and D.L. Woodruff (1996) Identification of outliers in
     multivariate data. _Journal of the American Statistical
     Association_, *91*, 1047-1061.

_S_e_e _A_l_s_o:

     'covMcd'

_E_x_a_m_p_l_e_s:

     library(rrcov)
     data(hbk)
     hbk.x <- data.matrix(hbk[, 1:3])
     covMest(hbk.x)

     ## the following three statements are equivalent
     c0 <- covMest(hbk.x)
     c1 <- covMest(hbk.x, r = 0.45)
     c2 <- covMest(hbk.x, control = rrcov.control(r = 0.45))
     ## direct specification overrides the control object:
     c3 <- covMest(hbk.x, r = 0.45,
                   control = rrcov.control(r = 0.25))
     c1

