ML | spectral clustering

In multivariate statistics and the clustering of data, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.



Different spectral clustering algorithms differ mainly in how they compute the Laplacian matrix.

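  1. Compute the similarity matrix S (similar points are connected by an edge);
  2. Compute the Laplacian matrix L (a concept from graph theory);
  3. Compute the eigenvectors of L (note: the k eigenvectors with the smallest eigenvalues) and stack them as columns to form the transformation matrix;
  4. Reduce dimensionality: each point is represented by its row of the transformation matrix;
  5. Cluster the reduced points (k-means).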
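
The five steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not a reference implementation. It makes several assumptions that are not in the notes: the similarity is a Gaussian kernel with a hypothetical bandwidth `sigma`, the Laplacian is the unnormalized form built from the weighted similarity matrix, and k-means is a plain Lloyd loop with a simple farthest-point initialization. The function name `spectral_clustering` and the toy data are likewise made up for illustration.

```python
import numpy as np


def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Illustrative sketch of the five steps; sigma and n_iter are assumed parameters."""
    # Step 1: similarity matrix S (assumption: Gaussian kernel on squared distances).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops

    # Step 2: unnormalized Laplacian L = D - S, with D the diagonal degree matrix.
    D = np.diag(S.sum(axis=1))
    L = D - S

    # Step 3: eigenvectors for the k smallest eigenvalues.
    # np.linalg.eigh returns eigenvalues in ascending order for symmetric input.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]  # n x k transformation matrix

    # Step 4: dimensionality reduction: row i of U is the new k-dim point i.
    # Step 5: k-means on the rows of U (assumption: farthest-point init, Lloyd updates).
    centers = [U[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [((U - c) ** 2).sum(axis=1) for c in centers], axis=0
        )
        centers.append(U[np.argmax(dist_to_nearest)])
    centers = np.array(centers)

    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


# Toy data (hypothetical): two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
labels = spectral_clustering(X, k=2)
print(labels)
```

The label numbering itself is arbitrary. What matters is that points in the same group receive the same label. In practice a library k-means and a normalized Laplacian are common alternatives to the choices made here.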

The simplest algorithm

Given a simple graph G with n vertices, its Laplacian matrix $L:=(\ell_{i,j})_{n \times n}$ is defined as:

$L = D - A.$
That is, it is the difference of the degree matrix D and the adjacency matrix A of the graph. In the case of directed graphs, either the indegree or outdegree might be used, depending on the application.
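
As a small worked example of $L = D - A$, the sketch below builds the Laplacian of a made-up undirected graph on four vertices (the edge set is hypothetical and chosen only for illustration). The degree matrix is obtained by summing each row of the adjacency matrix.

```python
import numpy as np

# Hypothetical simple graph with edges 0-1, 1-2, 1-3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
])

# Degree matrix D: the degree of each vertex on the diagonal.
D = np.diag(A.sum(axis=1))

# Laplacian matrix: difference of the degree and adjacency matrices.
L = D - A
print(L)
```

Every row of $L$ sums to zero, so the all-ones vector is an eigenvector with eigenvalue 0. This is why spectral clustering looks at the smallest eigenvalues of $L$.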

