CauRuimet                package:PTAk                R Documentation

_R_o_b_u_s_t _e_s_t_i_m_a_t_i_o_n _o_f _w_i_t_h_i_n _g_r_o_u_p _c_o_v_a_r_i_a_n_c_e

_D_e_s_c_r_i_p_t_i_o_n:

     Gives a robust estimate of an unknown within-group covariance,
     aimed either at revealing dense groups or at revealing sparse
     groups (outliers), depending on the choice of _local variance and
     weighting function_.

_U_s_a_g_e:

      CauRuimet(Z,ker=1,m0=1,withingroup=TRUE,
                   loc=substitute(apply(Z,2,mean,trim=.1)),matrixmethod=TRUE)

             

_A_r_g_u_m_e_n_t_s:

       Z: a numeric matrix, with observations in rows

     ker: either a number or a function. If numeric, the weighting
          function is e^{-ker t}; otherwise it must be a positive
          decreasing function, written as
          ker=function(t){return(expression)}.

      m0: a neighbourhood graph or another proximity matrix; it is
          combined with the kernel proximities through a Hadamard
          (elementwise) product

withingroup: logical; if 'TRUE' the aim is to give a robust estimate for
          dense groups, if 'FALSE' the aim is to give a robust estimate
          for outliers

     loc: a vector of locations, or a function (using for example the
          mean or the median) that gives an estimate of it

matrixmethod: if 'TRUE' (only with 'withingroup=TRUE'), uses matrix
          computation rather than the double loop suggested by the
          formula below
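
     As an illustration of the forms these arguments can take, the
     sketch below builds candidate values in base R without calling
     the package. The names 'ker_exp', 'ker_inv', 'loc_median',
     'loc_trim' and 'm0_chain' are hypothetical, and the chain graph
     is only an assumed example of a neighbourhood graph.

```r
# Illustrative objects only; nothing here calls PTAk.
Z <- as.matrix(iris[, 1:4])
n <- nrow(Z)

# A numeric ker = a corresponds to the weighting function t -> exp(-a * t).
a <- 1
ker_exp <- function(t) exp(-a * t)

# A user-supplied alternative (hypothetical choice):
# positive and decreasing for t >= 0.
ker_inv <- function(t) 1 / (1 + t)

# Two possible location vectors: column medians and 10% trimmed means
# (the trimmed mean mirrors the default shown in the usage).
loc_median <- apply(Z, 2, median)
loc_trim   <- apply(Z, 2, mean, trim = 0.1)

# A toy neighbourhood graph (assumption): rows i and i + 1 are neighbours.
m0_chain <- 1 * (abs(outer(1:n, 1:n, "-")) == 1)
```

     Any of these objects would then be passed as 'ker', 'loc' or 'm0'
     respectively.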

_D_e_t_a_i_l_s:

     When 'withingroup' is 'TRUE', the local variance formula (local
     as defined by the weighting) is used, aiming at finding dense
     groups:

 W_g = \frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} ker(d^2_{S^-}(Z_i,Z_j)) (Z_i-Z_j)'(Z_i-Z_j)}{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} ker(d^2_{S^-}(Z_i,Z_j))}

     where d^2_{S^-}(.,.) is the squared Euclidean distance in the
     metric S^-, the inverse of a robust sample covariance (i.e. using
     'loc' instead of the mean). When 'withingroup' is 'FALSE', a
     weighted global variance is used:

 W_o = \frac{\sum_{i=1}^{n} ker(d^2_{S^-}(Z_i,\tilde{Z})) (Z_i-\tilde{Z})'(Z_i-\tilde{Z})}{\sum_{i=1}^{n} ker(d^2_{S^-}(Z_i,\tilde{Z}))}

     where \tilde{Z} is the vector 'loc'.
     If 'm0' is a neighbourhood graph and 'ker' is the function
     returning 1 (no proximity due to distance is used), the function
     returns (when 'withingroup=TRUE') the _local
     variance-covariance_ matrix as defined in Lebart (1969).
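
     The two formulas above can be sketched in base R as follows. This
     is a minimal illustration, not the package source: the helpers
     'robust_cov', 'local_cov' and 'outlier_cov' are hypothetical
     names, the robust covariance is assumed to be the cross-product
     of the data centred on 'loc' divided by n, and the double loop is
     written out literally rather than with matrix computations.

```r
# Hypothetical helpers written for illustration; not the PTAk source.

# Covariance around a location vector `loc` (assumed normalisation: 1/n).
robust_cov <- function(Z, loc) {
  Zc <- sweep(Z, 2, loc)              # centre each column on loc
  crossprod(Zc) / nrow(Z)             # p x p matrix
}

# W_g: weighted sum over pairs i < j of the outer products of (Z_i - Z_j).
local_cov <- function(Z, ker = function(t) exp(-t), m0 = 1,
                      loc = apply(Z, 2, mean, trim = 0.1)) {
  n <- nrow(Z)
  Sinv <- solve(robust_cov(Z, loc))   # the metric S^-
  if (length(m0) == 1) m0 <- matrix(m0, n, n)  # scalar m0: no extra proximity
  num <- matrix(0, ncol(Z), ncol(Z))
  den <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      d  <- Z[i, ] - Z[j, ]
      d2 <- drop(t(d) %*% Sinv %*% d) # squared distance in the metric S^-
      w  <- ker(d2) * m0[i, j]        # kernel weight times proximity m0
      num <- num + w * tcrossprod(d)  # p x p outer product
      den <- den + w
    }
  }
  num / den
}

# W_o: weighted global variance around loc.
outlier_cov <- function(Z, ker = function(t) exp(-0.15 * t),
                        loc = apply(Z, 2, mean, trim = 0.1)) {
  Sinv <- solve(robust_cov(Z, loc))
  Zc <- sweep(Z, 2, loc)
  d2 <- rowSums((Zc %*% Sinv) * Zc)   # squared distances to loc
  w  <- vapply(d2, ker, numeric(1))   # one weight per observation
  crossprod(Zc * w, Zc) / sum(w)      # row i of Zc is scaled by w[i]
}

Z  <- as.matrix(iris[, 1:4])
Wg <- local_cov(Z)                    # aims at dense groups
Wo <- outlier_cov(Z)                  # aims at outliers
```

     With a constant kernel and no proximity matrix, the first helper
     reduces to twice the usual sample covariance, which gives a quick
     sanity check of the double sum.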

_V_a_l_u_e:

     a matrix

_N_o_t_e:

     As mentioned by Caussinus and Ruiz, a good strategy to reveal
     dense groups with generalised PCA is to reveal outliers first
     using the metric W_o^{-1}, and to remove them before using the
     metric W_g^{-1}. Based on theoretical considerations, they
     recommend for the choice of 'ker', with the decreasing function
     e^{-ker t}: a lower bound of 1 if 'withingroup' is 'TRUE', and
     something fairly small, say in the interval [0.05, 0.3],
     otherwise.
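
     To see what these recommendations mean for the weights, the short
     base R sketch below evaluates e^{-ker t} at a few squared
     distances for ker=1 and ker=0.15. The distances in 'tt' are
     arbitrary illustrative values, and 'w_group' and 'w_outl' are
     hypothetical names.

```r
# Arbitrary squared distances chosen for illustration.
tt <- c(0, 1, 4, 9)

w_group <- exp(-1 * tt)      # ker = 1: weights vanish quickly with distance
w_outl  <- exp(-0.15 * tt)   # ker = 0.15: weights decay slowly

# Side-by-side comparison of the two weighting schemes.
round(cbind(t = tt, w_group = w_group, w_outl = w_outl), 3)
```

     With the larger value only close pairs keep a noticeable weight,
     while with the smaller value only remote points are noticeably
     down-weighted.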

_A_u_t_h_o_r(_s):

     Didier Leibovici c3s2i@free.fr

_R_e_f_e_r_e_n_c_e_s:

     Caussinus, H and Ruiz, A (1990) _Interesting Projections of
     Multidimensional Data by Means of Generalized Principal Components
     Analysis_. COMPSTAT90, Physica-Verlag, Heidelberg, 121-126.

     Faraj, A (1994) _Interpretation tools for Generalized Discriminant
     Analysis_. In: New Approaches in Classification and Data Analysis,
     Springer-Verlag, Heidelberg, 286-291.

     Lebart, L (1969) _Analyse statistique de la
     contiguite_. Publication de l'Institut de Statistiques
     Universitaire de Paris, XVIII, 81-112.

_S_e_e _A_l_s_o:

     'SVDgen'

_E_x_a_m_p_l_e_s:

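      library(PTAk)
      data(iris)
      iris2 <- as.matrix(iris[,1:4])
      dimnames(iris2)[[1]] <- as.character(iris[,5])

      # robust within-group covariance, then its inverse used as a metric
      D2 <- CauRuimet(iris2,ker=1,withingroup=TRUE)
      D2 <- Powmat(D2,(-1))
      # centre the columns, then run the generalised PCA with metric D2
      iris2 <- sweep(iris2,2,apply(iris2,2,mean))
      res <- SVDgen(iris2,D2=D2,D1=1)
      plot(res,nb1=1,nb2=2,cex=0.5)
      summary(res,testvar=0)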

      # the same in a demo function

      # demo.CauRuimet(ker=4,withingroup=TRUE,openX11s=FALSE)
      # demo.CauRuimet(ker=0.15,withingroup=FALSE,openX11s=FALSE)

