| diffusionKmeans {diffusionMap} | R Documentation |
Clusters a data set based on its diffusion coordinates.
diffusionKmeans(dmap, K, params = c(), Niter = 50, epsilon = 0.001)
dmap |
a "dmap" object, computed by diffuse() |
K |
number of clusters |
params |
optional parameters for each data point: a vector of length n or a matrix with n rows. If supplied, per-cluster parameters are also returned (as centparams). |
Niter |
number of K-means runs performed; the run with the smallest total distortion is returned |
epsilon |
stopping criterion: each K-means run stops once the relative change in distortion between successive iterations falls below epsilon |
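Read together, the descriptions of Niter, epsilon, D and DK suggest a restart scheme: several K-means runs from random starting centroids, each stopped when the relative decrease in distortion drops below epsilon, with the lowest-distortion run kept. The base-R sketch below illustrates that scheme on a plain coordinate matrix X. It is not the package's source: the function name kmeans_restarts and the max_steps safety cap are hypothetical, and the random initialization is an assumption.

```r
# Hypothetical sketch (not the diffusionMap implementation): K-means with
# Niter random restarts; each run stops when the relative decrease in
# distortion is below epsilon, and the lowest-distortion run is returned.
kmeans_restarts <- function(X, K, Niter = 50, epsilon = 0.001, max_steps = 100) {
  X <- as.matrix(X)
  n <- nrow(X)
  best <- list(D = Inf)
  for (run in seq_len(Niter)) {
    # Assumed initialization: K distinct data points as starting centroids
    cent <- X[sample.int(n, K), , drop = FALSE]
    D_old <- Inf
    for (step in seq_len(max_steps)) {
      # n x K matrix of squared Euclidean distances to each centroid
      DK <- matrix(sapply(seq_len(K), function(k) colSums((t(X) - cent[k, ])^2)),
                   nrow = n)
      # Assign every point to its nearest centroid
      part <- max.col(-DK, ties.method = "first")
      # Total distortion of this assignment
      D <- sum(DK[cbind(seq_len(n), part)])
      # Stop when the relative decrease in distortion is below epsilon
      if (is.finite(D_old) && D_old - D <= epsilon * D_old) break
      D_old <- D
      # Move each non-empty cluster's centroid to the mean of its points
      for (k in seq_len(K)) {
        members <- part == k
        if (any(members)) cent[k, ] <- colMeans(X[members, , drop = FALSE])
      }
    }
    # Keep the run with the smallest final distortion
    if (D < best$D) best <- list(part = part, cent = cent, D = D, DK = DK)
  }
  best
}

# Usage on two well-separated toy clusters of 20 points each
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.1), ncol = 2))
fit <- kmeans_restarts(X, K = 2, Niter = 5)
table(fit$part)
```

Restarting from several random initializations is a common guard against K-means settling in a poor local minimum; keeping the minimum-distortion run matches how D is described on this page.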
The input is a "dmap" object computed by diffuse(), so diffuse() must be run first. The function is designed this way so that the K-means parameters can be varied without recomputing the diffusion map coordinates on each run.
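Because the embedding is computed once, several clusterings can be compared cheaply afterwards. The sketch below illustrates that "embed once, cluster many times" pattern in base R; it substitutes stats::kmeans for diffusionKmeans and simulated points for the diffusion coordinates of a "dmap" object, both assumptions made so the block runs without the diffusionMap package.

```r
# Sketch of the "compute once, cluster many times" pattern.
# Assumption: X stands in for the diffusion coordinates held in a dmap
# object; it is simulated here. stats::kmeans replaces diffusionKmeans.
set.seed(42)
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

# The expensive embedding step would happen once, before this point;
# only the comparatively cheap clustering step is repeated per K.
Ks <- 2:5
fits <- lapply(Ks, function(K) kmeans(X, centers = K, nstart = 10))

# Total within-cluster sum of squares (distortion) for each K
distortion <- sapply(fits, function(f) f$tot.withinss)
names(distortion) <- Ks
print(distortion)
```

With the package installed, the analogous loop would call diffusionKmeans(dmap, K) for each candidate K on the same dmap and compare the returned D values.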
The returned value is a list with components
part |
final cluster labels from K-means: a vector of length n with integer entries between 1 and K |
cent |
K geometric centroids found by K-means |
D |
minimum of total distortion (loss function of K-means) found across K-means runs |
DK |
n by K matrix of squared (Euclidean) distances from each point to every centroid for the optimal K-means run |
centparams |
parameters for each centroid, returned only if params is specified in the function call; a matrix with K rows. |
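To make the relationships among part, DK, D and centparams concrete, the sketch below computes them for fixed toy centroids: each point is labelled by its nearest centroid, D sums the squared distances to the assigned centroids, and centparams averages params within each cluster. These are unweighted quantities chosen for illustration only. Lafon & Lee (2006) describe a variant in which points are weighted by the stationary distribution of the diffusion, and this page does not state whether diffusionKmeans applies such weights, so the package's D and centparams may differ from this sketch.

```r
# Hypothetical illustration of how the returned components relate, using
# unweighted Euclidean quantities. X plays the role of the diffusion
# coordinates and params the optional per-point parameters (toy data).
X <- matrix(c(0, 0,
              0, 1,
              5, 5,
              5, 6), ncol = 2, byrow = TRUE)
cent <- matrix(c(0, 0.5,
                 5, 5.5), ncol = 2, byrow = TRUE)  # K = 2 centroids
params <- c(10, 20, 30, 50)                         # one value per point

# DK: n x K squared Euclidean distances, via |x|^2 + |c|^2 - 2 x.c
DK <- outer(rowSums(X^2), rowSums(cent^2), "+") - 2 * X %*% t(cent)
DK <- pmax(DK, 0)  # guard against tiny negative values from rounding

# part: label of the nearest centroid for each point
part <- max.col(-DK, ties.method = "first")

# D: total distortion, each point's squared distance to its own centroid
D <- sum(DK[cbind(seq_len(nrow(X)), part)])

# centparams (assumed unweighted): mean of params over each cluster's points
centparams <- rowsum(as.matrix(params), part) / as.vector(table(part))

print(part)        # labels 1 1 2 2
print(D)           # 1
print(centparams)  # 15 for cluster 1, 40 for cluster 2
```

The same code works when params is a matrix with n rows: rowsum() sums each column within clusters, giving one row per cluster.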
Joseph Richards jwrichar@stat.cmu.edu
Lafon, S., & Lee, A. B. (2006), IEEE Trans. Pattern Anal. Mach. Intell., 28, 1393
Richards, J. W., Freeman, P. E., Lee, A. B., & Schafer, C. M. (2009), ApJ, 691, 32
## example with annulus data set
library(diffusionMap) # provides diffuse(), diffusionKmeans() and the example data
data(annulus)
par(mfrow=c(2,1))
plot(annulus,main="Annulus Data",pch=20,cex=.7)
D = dist(annulus) # use Euclidean distance
dmap = diffuse(D, eps.val = 0.03) # compute diffusion map
k = 2 # number of clusters
dkmeans = diffusionKmeans(dmap, K = k, Niter = 25)
plot(annulus,main="Colored by diffusion K-means clustering",pch=20,
cex=.7,col=dkmeans$part)
## example with Chainlink data set
library(scatterplot3d) # provides scatterplot3d()
data(Chainlink)
lab.col = c(rep("red",500),rep("blue",500))
scatterplot3d(Chainlink$C1,Chainlink$C2,Chainlink$C3,color=lab.col,
main="Chainlink Data") # plot Chainlink data
D = dist(Chainlink) # use Euclidean distance
dmap = diffuse(D, neigen = 3, eps.val = 0.01) # compute diffusion map & plot
plot(dmap)
print(dmap)
dkmeans = diffusionKmeans(dmap, K=2, Niter=25)
col.dkmeans=ifelse(dkmeans$part==1,"red","blue")
scatterplot3d(Chainlink$C1,Chainlink$C2,Chainlink$C3,color=col.dkmeans,
main="Chainlink Data, colored by diffusion K-means classification")