kkmeans               package:kernlab               R Documentation

_K_e_r_n_e_l _k-_m_e_a_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     A weighted kernel version of the famous k-means algorithm.

_U_s_a_g_e:

     ## S4 method for signature 'formula':
     kkmeans(x, data = NULL, na.action = na.omit, ...)

     ## S4 method for signature 'matrix':
     kkmeans(x, centers, kernel = "rbfdot", kpar = list(sigma = 0.1),
             alg = "kkmeans", p = 1, na.action = na.omit, ...)

     ## S4 method for signature 'kernelMatrix':
     kkmeans(x, centers, ...)

     ## S4 method for signature 'list':
     kkmeans(x, centers, kernel = "stringdot",
             kpar = list(length = 4, lambda = 0.5),
             alg = "kkmeans", p = 1, na.action = na.omit, ...)

_A_r_g_u_m_e_n_t_s:

       x: the matrix of data to be clustered, or a symbolic description
          of the model to be fit, or a kernel Matrix of class
          'kernelMatrix', or a list of character vectors.

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from the environment which
          `kkmeans' is called from.

 centers: Either the number of clusters or a set of initial cluster
          centers. If the former, a random set of rows in the
          eigenvectors matrix is chosen as the initial centers.

  kernel: the kernel function used in training and predicting. This
          parameter can be set to any function, of class kernel, which
          computes an inner product in feature space between two vector
          arguments (see 'kernels'). 'kernlab' provides the most
          popular kernel functions which can be used by setting the
          kernel parameter to the following strings:

             *  'rbfdot' Radial Basis ("Gaussian") kernel

             *  'polydot' Polynomial kernel 

             *  'vanilladot' Linear kernel 

             *  'tanhdot' Hyperbolic tangent kernel 

             *  'laplacedot' Laplacian kernel 

             *  'besseldot' Bessel kernel 

             *  'anovadot' ANOVA RBF kernel 

             *  'splinedot' Spline kernel

             *  'stringdot' String kernel

          Setting the kernel parameter to "matrix" treats 'x' as a
          kernel matrix and calls the 'kernelMatrix' interface.

          The kernel parameter can also be set to a user defined
          function of class kernel by passing the function name as an
          argument.

    kpar: a character string or the list of hyper-parameters (kernel
          parameters). The default character string '"automatic"' uses
          a heuristic to determine a suitable value for the width
          parameter of the RBF kernel.

          A list can also be used containing the parameters to be used
          with the kernel function. Valid parameters for existing
          kernels are:


             *  'sigma' inverse kernel width for the Radial Basis
                kernel function "rbfdot" and the Laplacian kernel
                "laplacedot".

             *  'degree, scale, offset' for the Polynomial kernel
                "polydot"

             *  'scale, offset' for the Hyperbolic tangent kernel
                function "tanhdot"

             *  'sigma, order, degree' for the Bessel kernel
                "besseldot". 

             *  'sigma, degree' for the ANOVA kernel "anovadot".

             *  'length, lambda, normalized' for the "stringdot" kernel
                where length is the length of the strings considered,
                lambda the decay factor and normalized a logical
                parameter determining if the kernel evaluations should
                be normalized.

          Hyper-parameters for user defined kernels can be passed
          through the kpar parameter as well.

     alg: the algorithm to use. Options currently include 'kkmeans' and
          'kerninghan'. 

       p: a parameter used to keep the affinity matrix positive
          semidefinite

na.action: The action to perform on NA

     ...: additional parameters
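
     As a rough illustration of the 'kernel' and 'kpar' arguments, the
     sketch below first selects a built-in kernel by name and supplies
     its hyper-parameters as a list, then passes a kernel object
     created with 'rbfdot' instead. The seed, the value of sigma and
     the variable names are arbitrary choices made for this example;
     the resulting clusters depend on the random initialisation.

```r
library(kernlab)

data(iris)
x <- as.matrix(iris[, -5])

set.seed(1)  # arbitrary seed; initial centers are chosen at random

## Built-in kernel selected by name; hyper-parameters supplied via kpar.
cl_name <- kkmeans(x, centers = 3, kernel = "rbfdot",
                   kpar = list(sigma = 0.1))

## The same kernel passed as an object of class 'kernel'.
rbf <- rbfdot(sigma = 0.1)
cl_obj <- kkmeans(x, centers = 3, kernel = rbf)

## Cluster sizes of both fits (illustrative only; they may differ).
size(cl_name)
size(cl_obj)
```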

_D_e_t_a_i_l_s:

     The algorithm is implemented using the triangle inequality to
     avoid unnecessary and computationally expensive distance
     calculations. This leads to a significant speedup, particularly on
     large data sets with a high number of clusters.
      With a particular choice of weights this algorithm becomes
     equivalent to the Kernighan-Lin and the norm-cut graph
     partitioning algorithms.
      The data can be passed to the 'kkmeans' function in a 'matrix' or
     a 'data.frame'. In addition, 'kkmeans' supports input in the form
     of a kernel matrix of class 'kernelMatrix', or a list of character
     vectors (for example for text clustering), in which case a string
     kernel has to be used.
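
     To make the idea behind kernel k-means concrete, the following
     base R sketch assigns each point to the cluster whose centroid is
     nearest in feature space, using only entries of the kernel matrix.
     It is a simplified illustration and not the 'kernlab'
     implementation: it is unweighted, it does not use the triangle
     inequality to skip distance calculations, and the helper names
     ('rbf_kernel_matrix', 'assign_step', 'kernel_kmeans_sketch') are
     hypothetical.

```r
## Gaussian (RBF) kernel matrix: K[i, j] = exp(-sigma * ||x_i - x_j||^2)
rbf_kernel_matrix <- function(x, sigma) {
  exp(-sigma * as.matrix(dist(x))^2)
}

## One assignment step of unweighted kernel k-means.
## Squared feature-space distance from point i to the centroid of cluster C:
##   K[i, i] - 2 * mean(K[i, C]) + mean(K[C, C])
assign_step <- function(K, cluster, k) {
  d <- matrix(Inf, nrow(K), k)
  for (j in seq_len(k)) {
    idx <- which(cluster == j)
    if (length(idx) == 0) next  # an empty cluster keeps distance Inf
    d[, j] <- diag(K) - 2 * rowMeans(K[, idx, drop = FALSE]) +
      mean(K[idx, idx, drop = FALSE])
  }
  apply(d, 1, which.min)
}

## Repeat the assignment step until the labels stop changing
## or max_iter is reached.
kernel_kmeans_sketch <- function(K, k, init, max_iter = 100) {
  cluster <- init
  for (it in seq_len(max_iter)) {
    new_cluster <- assign_step(K, cluster, k)
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
  }
  cluster
}

set.seed(1)  # arbitrary seed for the random initial labels
x <- as.matrix(iris[, -5])
K <- rbf_kernel_matrix(x, sigma = 0.1)
init <- sample(rep_len(1:3, nrow(x)))
cl <- kernel_kmeans_sketch(K, k = 3, init = init)
table(cl)
```

     With a linear kernel (K = x %*% t(x)) the distance above reduces
     to the squared Euclidean distance to the cluster mean, so the
     sketch then behaves like ordinary k-means.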

_V_a_l_u_e:

     An S4 object of class 'specc' which extends the class 'vector'
     containing integers indicating the cluster to which each point is
     allocated. The following slots contain useful information:

 centers: A matrix of cluster centers.

    size: The number of points in each cluster

withinss: The within-cluster sum of squares for each cluster

 kernelf: The kernel function used

_A_u_t_h_o_r(_s):

     Alexandros Karatzoglou 
      alexandros.karatzoglou@ci.tuwien.ac.at

_R_e_f_e_r_e_n_c_e_s:

     Inderjit Dhillon, Yuqiang Guan, Brian Kulis
      A Unified view of Kernel k-means, Spectral Clustering and Graph
     Partitioning
      UTCS Technical Report
      <URL:
     http://www.cs.utexas.edu/users/kulis/pubs/spectral_techreport.pdf>

_S_e_e _A_l_s_o:

     'specc', 'kpca', 'kcca'

_E_x_a_m_p_l_e_s:

     ## Cluster the iris data set.
     data(iris)

     sc <- kkmeans(as.matrix(iris[,-5]), centers=3)

     sc
     centers(sc)
     size(sc)
     withinss(sc)
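
     A further sketch, using the 'kernelMatrix' interface: the kernel
     matrix is computed first with 'kernelMatrix' and then passed to
     'kkmeans'. The choice of kernel, the sigma value and the seed are
     assumptions made for illustration only.

```r
library(kernlab)

data(iris)
x <- as.matrix(iris[, -5])

## Precompute the kernel matrix (an object of class 'kernelMatrix').
K <- kernelMatrix(rbfdot(sigma = 0.1), x)

set.seed(42)  # arbitrary seed; initial centers are chosen at random
sc_km <- kkmeans(K, centers = 3)

sc_km
size(sc_km)
```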

