cclust               package:flexclust               R Documentation

_C_o_n_v_e_x _C_l_u_s_t_e_r_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Perform k-means clustering, hard competitive learning or neural
     gas on a data matrix.

_U_s_a_g_e:

     cclust(x, k, dist = "euclidean", method = "kmeans",
            weights=NULL, control=NULL, group=NULL, simple=FALSE)

_A_r_g_u_m_e_n_t_s:

       x: A numeric matrix of data, or an object that can be coerced to
          such a matrix (such as a numeric vector or a data frame with
          all numeric columns).

       k: Either the number of clusters or a set of initial (distinct)
          cluster centroids.  If a number, a random set of (distinct)
          rows in 'x' is chosen as the initial centroids.

    dist: Distance measure, one of '"euclidean"' (mean square distance)
          or '"manhattan"' (absolute distance).

  method: Clustering algorithm: one of '"kmeans"', '"hardcl"' or
          '"neuralgas"', see details below.

 weights: An optional vector of weights to be used in the fitting
          process. Works only in combination with hard competitive
          learning.

 control: An object of class 'cclustControl'.

   group: Currently ignored.

  simple: Return an object of class 'kccasimple'?

_D_e_t_a_i_l_s:

     This function uses the same computational engine as the earlier
     function of the same name from package 'cclust'. The main
     difference is that it returns an S4 object of class '"kcca"',
     hence all available methods for '"kcca"' objects can be used. By
     default 'kcca' and 'cclust' use exactly the same algorithm, but
     'cclust' will usually be much faster because it uses compiled
     code.

     If 'dist' is '"euclidean"', the distance between the cluster
     center and the data points is the Euclidean distance (ordinary
     kmeans algorithm), and cluster means are used as centroids. If
     '"manhattan"', the distance between the cluster center and the
     data points is the sum of the absolute values of the distances,
     and the column-wise cluster medians are used as centroids.
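The mean/median distinction can be illustrated with a short sketch (this is not the flexclust internals; 'pts' is a hypothetical matrix of the points currently assigned to one cluster):

```r
## Column-wise mean vs. column-wise median as the cluster centroid.
## 'pts' is an assumed example matrix: one row per point, one column
## per coordinate.
pts <- matrix(c(1, 2, 10,    # first coordinate (contains an outlier)
                4, 5, 6),    # second coordinate
              ncol = 2)
centroid_euclidean <- colMeans(pts)          # minimises squared distances
centroid_manhattan <- apply(pts, 2, median)  # minimises absolute distances
```

With the outlier 10 in the first column, the mean centroid (13/3, 5) is pulled towards it, while the median centroid (2, 5) is not.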

     If 'method' is '"kmeans"', the classic kmeans algorithm as given
     by MacQueen (1967) is used, which works by repeatedly moving all
     cluster centers to the mean of their respective Voronoi sets. If
     '"hardcl"', on-line updates are used (AKA hard competitive
     learning), which work by randomly drawing an observation from 'x'
     and moving the closest center towards that point (e.g., Ripley
     1996). If '"neuralgas"' then the neural gas algorithm by
     Martinetz et al. (1993) is used. It is similar to hard competitive
     learning, but in addition to the closest centroid also the second
     closest centroid is moved in each iteration.
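A single on-line update of the hard competitive learning kind can be sketched as follows (an illustrative sketch, not the compiled flexclust code; the fixed learning rate 'lr' is an assumption, the actual rate schedule is set via the 'control' argument):

```r
## One "hardcl" step: draw a random observation and move only the
## closest center towards it.
set.seed(1)
x <- matrix(rnorm(100), ncol = 2)
centers <- x[sample(nrow(x), 3), ]      # three random initial centers
lr <- 0.1                               # assumed fixed learning rate
obs <- x[sample(nrow(x), 1), ]          # randomly drawn observation
d <- colSums((t(centers) - obs)^2)      # squared distance to each center
w <- which.min(d)                       # winning (closest) center
old <- centers[w, ]
centers[w, ] <- old + lr * (obs - old)  # move the winner towards obs
```

Iterating this step over many random draws, with a decaying learning rate, gives the hard competitive learning fit; neural gas would additionally move the second closest center by a smaller step.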

_V_a_l_u_e:

     An object of class '"kcca"'.

_A_u_t_h_o_r(_s):

     Evgenia Dimitriadou and Friedrich Leisch

_R_e_f_e_r_e_n_c_e_s:

     MacQueen, J. (1967).  Some methods for classification and analysis
     of multivariate observations. In _Proceedings of the Fifth
     Berkeley Symposium on  Mathematical Statistics and  Probability_,
     eds L. M. Le Cam & J. Neyman, 1, pp. 281-297. Berkeley, CA:
     University of California Press.

     Martinetz, T., Berkovich, S., and Schulten, K. (1993). 'Neural-Gas'
     Network for Vector Quantization and its Application to Time-Series
     Prediction. _IEEE Transactions on Neural Networks_, 4 (4), pp.
     558-569.

     Ripley, B. D. (1996). _Pattern Recognition and Neural Networks._
     Cambridge: Cambridge University Press.

_S_e_e _A_l_s_o:

     'cclustControl-class', 'kcca'

_E_x_a_m_p_l_e_s:

     ## a 2-dimensional example
     x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
     cl <- cclust(x, 2)
     plot(x, col = predict(cl))
     points(cl@centers, pch = "x", cex = 2, col = 3)

     ## a 3-dimensional example
     x <- rbind(matrix(rnorm(150, sd = 0.3), ncol = 3),
                matrix(rnorm(150, mean = 2, sd = 0.3), ncol = 3),
                matrix(rnorm(150, mean = 4, sd = 0.3), ncol = 3))
     cl <- cclust(x, 6, method = "neuralgas")
     pairs(x, col = predict(cl))
     plot(cl, data = x)

