| clustering {clues} | R Documentation |
Data clustering (after data shrinking).
clustering(y, disMethod = "Euclidean")
y |
data matrix, given as an R matrix object (for dimension > 1) or a vector object (for dimension = 1), with rows as observations and columns as variables. |
disMethod |
specification of the dissimilarity measure. The available measures are "Euclidean" and "1-corr". |
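As an illustration of the two dissimilarity measures, the sketch below computes each one for a pair of observations. The helper name `dissim` is hypothetical and not part of the clues package, and it assumes that "1-corr" means one minus the Pearson correlation between the two observation vectors.

```r
# Hypothetical helper (not part of clues): dissimilarity between two
# observations a and b, given as numeric vectors of equal length.
dissim <- function(a, b, disMethod = "Euclidean") {
  if (disMethod == "Euclidean") {
    # Euclidean distance
    sqrt(sum((a - b)^2))
  } else if (disMethod == "1-corr") {
    # Assumption: "1-corr" is one minus the Pearson correlation
    1 - cor(a, b)
  } else {
    stop("disMethod must be \"Euclidean\" or \"1-corr\"")
  }
}

dissim(c(0, 0), c(3, 4))                  # Euclidean distance
dissim(c(1, 2, 3), c(2, 4, 6), "1-corr")  # perfectly correlated profiles
```

Under this assumption, "1-corr" compares the shape of two profiles rather than their magnitude, so it needs more than one variable per observation to be meaningful.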
We first store the first observation (data point) in point[1].
We then find the nearest neighbor of point[1] and store it in
point[2]. We store the dissimilarity between point[1] and
point[2] in db[1]. We next remove point[1].
We then find the nearest neighbor of point[2] among the remaining points
and store it in point[3]. We store the dissimilarity between point[2]
and point[3] in db[2]. We then remove point[2]
and find the nearest neighbor of point[3]. We repeat this procedure
until we find point[n] and db[n-1], where n is the
total number of data points.
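The chaining step described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name `nn_chain` is hypothetical, only the Euclidean measure is used, and `point` is assumed to hold row indices rather than the data values themselves.

```r
# Hypothetical sketch (not the clues implementation): build the
# nearest-neighbor chain starting from the first observation.
nn_chain <- function(y) {
  y <- as.matrix(y)            # a vector becomes an n x 1 matrix
  n <- nrow(y)
  point <- integer(n)          # assumption: row indices in visiting order
  db <- numeric(n - 1)         # dissimilarities between consecutive points
  remaining <- seq_len(n)
  point[1] <- 1L
  for (i in seq_len(n - 1)) {
    # remove the current point, then search the points that are left
    remaining <- setdiff(remaining, point[i])
    current <- matrix(y[point[i], ], nrow = length(remaining),
                      ncol = ncol(y), byrow = TRUE)
    d <- sqrt(rowSums((y[remaining, , drop = FALSE] - current)^2))
    j <- which.min(d)          # ties go to the first minimum
    point[i + 1] <- remaining[j]
    db[i] <- d[j]
  }
  list(point = point, db = db)
}

# One-dimensional toy data with two well-separated groups
ch <- nn_chain(c(0, 1, 10, 11, 2))
ch$point  # 1 2 5 3 4
ch$db     # 1 1 8 1
```

In the toy data the chain walks through 0, 1, 2 before jumping to 10, so the single large entry of db marks the gap between the two groups.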
Next, we calculate the interquartile range (IQR) of the vector db.
We then check which elements of db are larger than avg + 1.5 * IQR,
where avg is the average of the vector db. The minimum value of
these outlier dissimilarities will be stored in omin.
An estimate of the number of clusters is g where g-1 is the number
of the outlier dissimilarities.
The position of an outlier dissimilarity
indicates the end of a cluster and the start of a new cluster.
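The outlier rule and the resulting cluster assignment can be sketched as below. The function name `split_chain` is hypothetical, `point` is assumed to hold row indices in chain order, and returning `NA` for omin when no dissimilarity exceeds the cutoff is an assumption of this sketch rather than documented behavior.

```r
# Hypothetical sketch (not the clues implementation): cut the chain at
# outlier dissimilarities and derive the cluster membership.
split_chain <- function(point, db) {
  cutoff <- mean(db) + 1.5 * IQR(db)   # avg + 1.5 * IQR
  is_out <- db > cutoff
  # Assumption: omin is NA when there is no outlier dissimilarity
  omin <- if (any(is_out)) min(db[is_out]) else NA_real_
  g <- sum(is_out) + 1                 # g - 1 outliers give g clusters
  # db[i] links point[i] and point[i + 1]; an outlier at position i
  # closes the current cluster and point[i + 1] opens the next one
  chain_label <- c(1, 1 + cumsum(is_out))
  mem <- integer(length(point))
  mem[point] <- chain_label            # back to original observation order
  list(mem = mem, size = as.vector(table(mem)), g = g, omin = omin)
}

# Chain for the toy data c(0, 1, 10, 11, 2), visited in the order 1 2 5 3 4
cl <- split_chain(point = c(1, 2, 5, 3, 4), db = c(1, 1, 8, 1))
cl$g     # 2
cl$mem   # 1 1 2 2 1
cl$omin  # 8
```

Here avg is 2.75 and the IQR is 1.75, so the cutoff is 5.375 and only the dissimilarity 8 is flagged, which splits the chain into clusters of sizes 3 and 2.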
To get a reasonable clustering result, data sharpening (shrinking) is recommended before data clustering.
mem |
vector of the cluster membership of data points. The cluster membership takes values 1, 2, ..., g, where g is the estimated number of clusters. |
size |
vector of the number of data points in each cluster. |
g |
an estimate of the number of clusters. |
db |
vector of dissimilarities between consecutive data points (see details). |
point |
vector of consecutive data points (see details). |
omin |
the minimum value of the outlier dissimilarities (see details). |
Wang, S., Qiu, W., and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.
# ruspini data
data(Ruspini)
# data matrix
ruspini <- Ruspini$ruspini
tt <- clustering(ruspini)
plotClusters(ruspini, tt$mem)