specc                package:kernlab                R Documentation

_S_p_e_c_t_r_a_l _C_l_u_s_t_e_r_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     A spectral clustering algorithm. Clustering is performed by
     embedding the data into the subspace of the eigenvectors of a
     kernel matrix.

_U_s_a_g_e:

     ## S4 method for signature 'formula':
     specc(x, data = NULL, na.action = na.omit, ...)

     ## S4 method for signature 'matrix':
     specc(x, centers, kernel = "rbfdot", kpar = list(sigma = 0.1), 
            nystrom.red = FALSE, nystrom.sample = dim(x)[1]/6, iterations = 200,
            mod.sample = 0.75, na.action = na.omit, ...)

     ## S4 method for signature 'kernelMatrix':
     specc(x, centers, nystrom.red = FALSE, iterations = 200, ...)

     ## S4 method for signature 'list':
     specc(x, centers, kernel = "stringdot", kpar = list(length = 4, lambda = 0.5),
            nystrom.red = FALSE, nystrom.sample = length(x)/6, iterations = 200,
            mod.sample = 0.75, na.action = na.omit, ...)

_A_r_g_u_m_e_n_t_s:

       x: the matrix of data to be clustered, or a symbolic description
          of the model to be fit, or a kernel matrix of class
          'kernelMatrix', or a list of character vectors.

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from the environment which
          `specc' is called from.

 centers: Either the number of clusters or a set of initial cluster
          centers. If a number is given, a random set of rows of the
          eigenvector matrix is chosen as the initial centers.

  kernel: the kernel function used in training and predicting. This
          parameter can be set to any function, of class kernel, which
          computes a dot product between two vector arguments. kernlab
          provides the most popular kernel functions which can be used
          by setting the kernel parameter to the following strings:

             *  'rbfdot' Radial Basis kernel function "Gaussian"

             *  'polydot' Polynomial kernel function

             *  'vanilladot' Linear kernel function

             *  'tanhdot' Hyperbolic tangent kernel function

             *  'laplacedot' Laplacian kernel function

             *  'besseldot' Bessel kernel function

             *  'anovadot' ANOVA RBF kernel function

             *  'splinedot' Spline kernel 

          The kernel parameter can also be set to a user-defined
          function of class kernel by passing the function name as an
          argument.

    kpar: a character string or the list of hyper-parameters (kernel
          parameters). The default character string '"automatic"' uses
          a heuristic to determine a suitable value for the width
          parameter of the RBF kernel. The second option '"local"'
          (local scaling) uses a more advanced heuristic and sets a
          width parameter for every point in the data set. This is
          particularly useful when the data incorporates multiple
          scales. A list can also be used, containing the parameters to
          be used with the kernel function. Valid parameters for
          existing kernels are:

             *  'sigma' inverse kernel width for the Radial Basis
                kernel function "rbfdot" and the Laplacian kernel
                "laplacedot".

             *  'degree, scale, offset' for the Polynomial kernel
                "polydot"

             *  'scale, offset' for the Hyperbolic tangent kernel
                function "tanhdot"

             *  'sigma, order, degree' for the Bessel kernel
                "besseldot". 

             *  'sigma, degree' for the ANOVA kernel "anovadot".

          Hyper-parameters for user-defined kernels can be passed
          through the kpar parameter as well.

nystrom.red: use the Nystrom method to calculate eigenvectors. When 'TRUE',
          a sample of the dataset is used to calculate the eigenvalues,
          so only an n x m matrix, where n is the sample size, is stored
          in memory (default: 'FALSE').

nystrom.sample: Number of data points to use for estimating the
          eigenvalues when using the Nystrom method (default:
          dim(x)[1]/6).

mod.sample: Proportion of data to use when estimating sigma (default:
          0.75)

iterations: The maximum number of iterations allowed (default: 200).

na.action: The action to perform on NA values (default: 'na.omit').

     ...: additional parameters
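
     As a hedged illustration of the 'kernel' and 'kpar' arguments, the
     sketch below clusters the 'spirals' data three ways. It assumes
     the installed kernlab version accepts the character value
     '"automatic"' described under 'kpar'; the value 'sigma = 100' and
     the polynomial settings are arbitrary choices for illustration,
     not recommendations.

```r
library(kernlab)
data(spirals)

# Let the heuristic pick the width of the RBF kernel.
sc_auto <- specc(spirals, centers = 2, kernel = "rbfdot", kpar = "automatic")

# Fix the inverse kernel width by hand; sigma = 100 is an arbitrary value.
sc_rbf <- specc(spirals, centers = 2, kernel = "rbfdot",
                kpar = list(sigma = 100))

# Polynomial kernel with its three hyper-parameters spelled out.
sc_poly <- specc(spirals, centers = 2, kernel = "polydot",
                 kpar = list(degree = 2, scale = 1, offset = 1))

# Compare how the points were split between the two clusters.
size(sc_auto)
size(sc_rbf)
size(sc_poly)
```

     The clusterings will generally differ between kernels, so it is
     worth inspecting the cluster sizes or a plot before settling on
     one setting.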

_D_e_t_a_i_l_s:

     Spectral clustering works by embedding the data points of the
     partitioning problem into the subspace spanned by the eigenvectors
     belonging to the k largest eigenvalues of a normalized
     affinity/kernel matrix. Using a simple clustering method like
     kmeans on the embedded points usually leads to good performance.
     It can be shown that spectral clustering methods boil down to a
     graph partitioning problem.

     The data can be passed to the 'specc' function in a 'matrix' or a
     'data.frame'. In addition, 'specc' supports input in the form of a
     kernel matrix of class 'kernelMatrix' or as a list of character
     vectors, in which case a string kernel has to be used.
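
     The embedding step described in this section can be sketched in
     base R as follows. This is a minimal illustration in the spirit of
     the Ng, Jordan and Weiss algorithm, not kernlab's implementation;
     the function name 'njw_sketch', its arguments, and the toy data
     are hypothetical and exist only for this example.

```r
# Minimal base-R sketch of spectral clustering (not kernlab's code).
# `njw_sketch`, `x`, `k` and `sigma` are hypothetical names.
njw_sketch <- function(x, k, sigma = 1) {
  # Gaussian affinity exp(-sigma * ||xi - xj||^2) with a zero diagonal.
  A <- exp(-sigma * as.matrix(dist(x))^2)
  diag(A) <- 0
  # Symmetric normalization D^{-1/2} A D^{-1/2}.
  d <- 1 / sqrt(rowSums(A))
  L <- A * outer(d, d)
  # Eigenvectors of the k largest eigenvalues
  # (eigen() returns eigenvalues in decreasing order).
  X <- eigen(L, symmetric = TRUE)$vectors[, seq_len(k), drop = FALSE]
  # Rescale every row to unit length, then run k-means on the rows.
  Y <- X / sqrt(rowSums(X^2))
  kmeans(Y, centers = k, nstart = 10)$cluster
}

set.seed(1)
# Toy data: two well-separated groups of 20 points each.
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.2), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.2), ncol = 2))
cl <- njw_sketch(x, k = 2)
table(cl)
```

     The row rescaling before k-means is what lets points from the same
     group collapse onto nearly the same direction in the embedded
     space, which is why a simple method such as kmeans suffices there.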

_V_a_l_u_e:

     An S4 object of class 'specc' which extends the class 'vector'
     and contains integers indicating the cluster to which each point
     is allocated. The following slots contain useful information:

 centers: A matrix of cluster centers.

    size: The number of points in each cluster

withinss: The within-cluster sum of squares for each cluster

 kernelf: The kernel function used

_A_u_t_h_o_r(_s):

     Alexandros Karatzoglou 
      alexandros.karatzoglou@ci.tuwien.ac.at

_R_e_f_e_r_e_n_c_e_s:

     Andrew Y. Ng, Michael I. Jordan, Yair Weiss
      _On Spectral Clustering: Analysis and an Algorithm_
      Neural Information Processing Systems (NIPS) 2001
      <URL: http://www.nips.cc/NIPS2001/papers/psgz/AA35.ps.gz>

_S_e_e _A_l_s_o:

     'kkmeans', 'kpca', 'kcca'

_E_x_a_m_p_l_e_s:

     ## Cluster the spirals data set.
     data(spirals)

     sc <- specc(spirals, centers=2)

     sc
     centers(sc)
     size(sc)
     withinss(sc)

     plot(spirals, col=sc)
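
     A further sketch, using the 'kernelMatrix' method listed under
     Usage: the kernel matrix is computed first and then passed to
     'specc'. The value 'sigma = 100' is an arbitrary illustrative
     choice, and the result need not match the run on the raw matrix,
     since the kernel parameters differ.

```r
library(kernlab)
data(spirals)

# Precompute the kernel matrix; sigma = 100 is an arbitrary value.
K <- kernelMatrix(rbfdot(sigma = 100), spirals)

# Cluster the precomputed kernel matrix directly.
sc_km <- specc(K, centers = 2)

# One cluster label per data point.
length(sc_km)
plot(spirals, col = sc_km)
```

     Precomputing the matrix can be convenient when the same kernel
     matrix is reused across several kernel methods.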

