trimkmeans            package:trimcluster            R Documentation

_T_r_i_m_m_e_d _k-_m_e_a_n_s _c_l_u_s_t_e_r_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Trimmed k-means clustering as proposed by Cuesta-Albertos,
     Gordaliza and Matran (1997). The method optimizes the k-means
     criterion while trimming a given proportion of the points, so
     that the trimmed points do not contribute to the criterion.
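
     As a rough illustration of the criterion, the base-R sketch below
     computes the mean squared distance to the closest center after
     discarding the most distant points. The helper 'trimmed_crit' is
     hypothetical and not part of 'trimcluster'; the rule that
     'floor(trim * n)' points are discarded is an assumption made for
     the sketch, and the package may round differently.

```r
# Illustrative helper (not part of 'trimcluster'): trimmed k-means criterion.
trimmed_crit <- function(x, centers, trim = 0.1) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  # Squared Euclidean distance from every point to every center (n x k).
  d2 <- sapply(seq_len(nrow(centers)),
               function(j) rowSums(sweep(x, 2, centers[j, ])^2))
  d2 <- matrix(d2, nrow = nrow(x))
  # Squared distance of each point to its closest center.
  dmin <- apply(d2, 1, min)
  # Assumption: the floor(trim * n) points farthest from their center are trimmed.
  n_keep <- nrow(x) - floor(trim * nrow(x))
  # Mean squared distance over the retained points.
  mean(sort(dmin)[seq_len(n_keep)])
}

# Four points in two tight pairs plus one gross outlier.
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11), c(100, 100))
centers <- rbind(c(0, 0.5), c(10, 10.5))
crit_trimmed   <- trimmed_crit(x, centers, trim = 0.2)  # outlier discarded
crit_untrimmed <- trimmed_crit(x, centers, trim = 0)    # outlier dominates
```

     With trimming, the outlier no longer inflates the criterion, which
     is the robustness property the method aims for.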

_U_s_a_g_e:

       trimkmeans(data,k,trim=0.1, scaling=FALSE, runs=100, points=NULL,
                            countmode=runs+1, printcrit=FALSE,
                            maxit=2*nrow(as.matrix(data)))

       ## S3 method for class 'tkm':
       print(x, ...)
       ## S3 method for class 'tkm':
       plot(x, data, ...)

_A_r_g_u_m_e_n_t_s:

    data: matrix or data.frame with raw data

       k: integer. Number of clusters.

    trim: numeric between 0 and 1. Proportion of points to be trimmed.

 scaling: logical. If 'TRUE', the variables are centered at their means
          and scaled to unit variance before execution.

    runs: integer. Number of algorithm runs, each started from initial
          means randomly chosen from the data points.

  points: 'NULL' or a matrix with k vectors used as means to initialize
          the algorithm. If initial mean vectors are specified, 'runs'
          should be 1 (otherwise the same initial means are used for
          all runs).

countmode: optional positive integer. 'trimkmeans' shows a message
          after every 'countmode' algorithm runs.

printcrit: logical. If 'TRUE', all criterion values (mean squares) of
          the algorithm runs are printed.

   maxit: integer. Maximum number of iterations within an algorithm
          run. Each iteration determines all points which are closer to
          a different cluster center than the one to which they are
          currently assigned. The algorithm terminates if no more
          points have to be reassigned, or if 'maxit' is reached.

       x: object of class 'tkm'.

     ...: further arguments to be passed on to 'plot' or
          'plotcluster'.
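
     The iteration described under 'maxit' can be sketched in base R as
     follows. This is a simplified illustration and not the package's
     implementation: the function name 'trimmed_kmeans_sketch', the
     rounding rule for the number of trimmed points, and the handling
     of empty clusters are all assumptions.

```r
# Illustrative sketch (not the package's code): trimmed k-means iterations.
trimmed_kmeans_sketch <- function(x, centers, trim = 0.1, maxit = 100) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  n <- nrow(x)
  k <- nrow(centers)
  n_trim <- floor(trim * n)   # assumption: rounding rule chosen for the sketch
  cl <- rep(0L, n)            # no point is assigned before the first pass
  it <- 0L
  for (it in seq_len(maxit)) {
    # Squared distances of all points to all current centers (n x k).
    d2 <- matrix(sapply(seq_len(k),
                        function(j) rowSums(sweep(x, 2, centers[j, ])^2)),
                 nrow = n)
    new_cl <- max.col(-d2, ties.method = "first")  # index of the closest center
    dmin <- d2[cbind(seq_len(n), new_cl)]
    # Code the n_trim farthest points as k + 1 (trimmed).
    if (n_trim > 0) {
      new_cl[order(dmin, decreasing = TRUE)[seq_len(n_trim)]] <- k + 1L
    }
    if (all(new_cl == cl)) break   # nothing was reassigned: stop
    cl <- new_cl
    # Recompute each center from its non-trimmed members; an empty
    # cluster keeps its previous center.
    for (j in seq_len(k)) {
      if (any(cl == j)) centers[j, ] <- colMeans(x[cl == j, , drop = FALSE])
    }
  }
  list(classification = cl, means = centers, iterations = it)
}

x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11), c(100, 100))
res <- trimmed_kmeans_sketch(x, centers = x[c(1, 3), ], trim = 0.2)
```

     The loop stops either when a pass reassigns no point or when
     'maxit' passes have been made, mirroring the two termination
     conditions described for 'maxit'.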

_D_e_t_a_i_l_s:

     'plot.tkm' calls 'plotcluster' if the dimensionality 'p' of the
     data is 1. It shows a scatterplot with non-trimmed regions if
     'p=2', and discriminant coordinates computed from the clusters
     (ignoring the trimmed points) if 'p>2'.

_V_a_l_u_e:

     An object of class 'tkm', which is a list with components

classification: integer vector coding cluster membership with trimmed
          observations coded as 'k+1'.

   means: numerical matrix giving the mean vectors of the k clusters.

 disttom: vector of squared Euclidean distances of all points to the
          closest mean.

    ropt: largest value of 'disttom' among the points that are not
          trimmed.

       k: see above.

    trim: see above.

    runs: see above.

 scaling: see above.
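
     The components above can be used to separate trimmed from retained
     observations. The sketch below works on a hand-built list that
     only mimics the documented component names; 'fit' and its values
     are made up for illustration and are not output of 'trimkmeans'.

```r
# Hand-built stand-in with the documented component names; in practice
# this would be the value returned by trimkmeans().
fit <- list(classification = c(1L, 1L, 2L, 2L, 3L),
            disttom = c(0.25, 0.25, 0.25, 0.25, 16110.25),
            k = 2, trim = 0.2)

# Trimmed observations are coded as k + 1 in 'classification'.
trimmed <- fit$classification == fit$k + 1
trimmed_idx <- which(trimmed)

# Cluster sizes, counting only the retained observations.
sizes <- table(fit$classification[!trimmed])

# By the description of 'ropt', it should correspond to the largest
# squared distance among the retained observations.
largest_retained <- max(fit$disttom[!trimmed])
```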

_A_u_t_h_o_r(_s):

     Christian Hennig chrish@stats.ucl.ac.uk <URL:
     http://www.homepages.ucl.ac.uk/~ucakche/>

_R_e_f_e_r_e_n_c_e_s:

     Cuesta-Albertos, J. A., Gordaliza, A., and Matran, C. (1997)
     Trimmed k-Means: An Attempt to Robustify Quantizers, Annals of
     Statistics, 25, 553-576.

_S_e_e _A_l_s_o:

     'plotcluster'

_E_x_a_m_p_l_e_s:

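       set.seed(10001)
       # Sizes of three Gaussian clusters (n1, n2, n3) and of a noise group (n0).
       n1 <- 60
       n2 <- 60
       n3 <- 70
       n0 <- 10
       nn <- n1+n2+n3+n0
       pp <- 2
       X <- matrix(rep(0,nn*pp),nrow=nn)
       ii <- 0
       # Cluster 1: centered at (5,-5), standard deviation 1.
       for (i in 1:n1){
         ii <- ii+1
         X[ii,] <- c(5,-5)+rnorm(2)
       }
       # Cluster 2: centered at (5,5), standard deviation 0.75.
       for (i in 1:n2){
         ii <- ii+1
         X[ii,] <- c(5,5)+rnorm(2)*0.75
       }
       # Cluster 3: centered at (-5,-5), standard deviation 0.75.
       for (i in 1:n3){
         ii <- ii+1
         X[ii,] <- c(-5,-5)+rnorm(2)*0.75
       }
       # Noise: widely scattered points around the origin (standard deviation 8).
       for (i in 1:n0){
         ii <- ii+1
         X[ii,] <- rnorm(2)*8
       }
       # runs=3 is used to save computing time.
       tkm1 <- trimkmeans(X,k=3,trim=0.1,runs=3)
       print(tkm1)
       plot(tkm1,X)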

