The Idea of Spectral Clustering
The idea of spectral clustering comes from graph theory. Each sample in the data set to be clustered is viewed as a vertex of a graph; the vertices are connected by weighted edges, where the weight expresses how similar two samples are. Vertices in the same cluster are highly similar, which in graph terms means the edges connecting them carry large weights, while edges between vertices of different clusters carry small weights. The goal of spectral clustering is therefore to find a way of cutting the graph such that, after the cut, the weights inside each subgraph are large and the weights between subgraphs are small.
---------------------------------------------------------------------------------------------------------------------------------------
Input: sample set $X=\{x_1,x_2,\dots,x_N\}$ and the number of clusters $K$
Output: a clustering $C^*$ of the sample set
Optimization problem:
$$
\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)=\underset{\{A_k\}_{k=1}^K}{\min}\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
$$

$$
\Rightarrow \{A_k\}_{k=1}^K=\arg\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)=\arg\underset{\{A_k\}_{k=1}^K}{\min}\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
$$

$$
\Rightarrow \hat Y=\arg\underset{Y}{\min}\sum_{k=1}^K\frac{y_k^TLy_k}{y_k^TDy_k}
$$

where $y_k$ is the $k$-th column of $Y\in\mathbb{R}^{N\times K}$, the cluster indicator matrix, in which $y_{il}=1$ indicates that $x_i$ is assigned to the $l$-th cluster:

$$
\begin{cases} y_i \in\{0,1\}^K \\ \sum_{j=1}^K y_{ij}=1 \end{cases}
\qquad
y_i=\begin{bmatrix} y_{i1}\\ y_{i2}\\ \vdots \\ y_{iK} \end{bmatrix}
$$
$$
\hat{Y}=\arg\underset{Y}{\min}\ Tr(Y^TLY(Y^TDY)^{-1})
$$
where $L=D-W$ is the Laplacian matrix.
---------------------------------------------------------------------------------------------------------------------------------------
Notation (graph-based: a weighted undirected graph):
Sample data: $X=(x_1,\dots,x_N)^\top$
Undirected graph: $G=\{V,E\}$
Vertex set: $V=\{1,2,\dots,N\}\Leftrightarrow X$
Edge set: $E$, encoded by the similarity (affinity) matrix
Weight matrix: $W$
$$
W=\begin{bmatrix} w_{11} & w_{12} & \dots & w_{1N} \\ w_{21} & w_{22} & \dots & w_{2N}\\ \vdots & \vdots & & \vdots\\ w_{N1} & w_{N2} & \dots & w_{NN} \end{bmatrix}=[w_{ij}],\quad 1\le i,j\le N
$$

where

$$
w_{ij}=\begin{cases} K(x_i,x_j)=\exp\{-\frac{\|x_i-x_j\|_2^2}{2\theta^2}\} & \text{if } (i,j)\in E \\ 0 & \text{if } (i,j)\notin E \end{cases}
$$
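As a minimal numeric sketch (assuming a fully connected graph, so every pair $(i,j)$ is an edge, and an illustrative bandwidth $\theta$), the Gaussian-kernel weight matrix can be built as:

```python
import numpy as np

def affinity_matrix(X, theta=1.0):
    """Gaussian-kernel weights w_ij = exp(-||x_i - x_j||^2 / (2 theta^2)).

    A sketch assuming a fully connected graph; the diagonal is zeroed
    so that a vertex carries no self-loop.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * theta ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = affinity_matrix(X, theta=1.0)
# W is symmetric, and the two nearby points get a much larger weight
# than either of them shares with the far-away point.
```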
Degree of vertex $i$: $d_i=\sum_{j=1}^N w_{ij}$
Degree matrix: $D=diag(W\cdot\mathbf{1}_N)$
$$
D=diag(W\cdot\mathbf{1}_N)=\begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_N \end{bmatrix}=\begin{bmatrix} \sum_{j=1}^N w_{1j} & & & \\ & \sum_{j=1}^N w_{2j} & & \\ & & \ddots & \\ & & & \sum_{j=1}^N w_{Nj} \end{bmatrix}
$$
Laplacian matrix: $L=D-W$
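These definitions can be checked on a toy weight matrix (values chosen only for illustration): the Laplacian satisfies $L\cdot\mathbf{1}=0$ and is positive semi-definite, which is exactly what the theorems later in these notes rely on.

```python
import numpy as np

# Degree matrix D = diag(W · 1_N) and Laplacian L = D - W for a toy
# symmetric weight matrix (values chosen only for illustration).
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

D = np.diag(W @ np.ones(len(W)))     # d_i = sum_j w_ij on the diagonal
L = D - W

# Two standard properties of the graph Laplacian:
row_sums = L @ np.ones(len(W))       # L · 1 = 0, since D absorbs the row sums
eigenvalues = np.linalg.eigvalsh(L)  # all eigenvalues are >= 0 (L is PSD)
```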
Related definitions:
For sets $A$ and $B$ that are disjoint subsets of the vertex set $V$:
$$
A \subset V,\ B \subset V,\ A\cap B=\emptyset \;\Rightarrow\; W(A,B)=\sum_{i\in A,\,j\in B} w_{ij}
$$
That is, for two clusters $A$ and $B$, $W(A,B)$ sums the weights of all edges running from the nodes of one cluster to the nodes of the other.
The cut is defined more precisely as follows. Suppose there are $K$ clusters, i.e., the vertex set is partitioned into $K$ subsets:
$$
V=\cup_{k=1}^K A_k=A_1\cup A_2\cup A_3\cup\dots\cup A_K
$$
$$
Cut(V)=Cut(A_1,\dots,A_K)=\sum_{k=1}^K W(A_k,\overline A_k)=\sum_{k=1}^K W(A_k,V)-\sum_{k=1}^K W(A_k,A_k)
$$
Our goal is $\underset{\{A_k\}_{k=1}^K}{\min}\ Cut(V)$ (large weights inside each subgraph, small weights between subgraphs). But taking $Cut$ directly as the objective is problematic: as in the figure below, we could cut off a single peripheral vertex across a minimum-weight edge, say between C and H, which minimizes $Cut(A_1,A_2,\dots,A_K)$ but is not the optimal cut.
So we normalize the cut.
Definition of Ncut:
We normalize by dividing by the degree of each subset, written $degree(A_k)$.
$$
cut(V)=\sum_{k=1}^K W(A_k,\overline A_k)
\;\Rightarrow\; Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\Delta},\qquad \Delta=degree(A_k)=\sum_{i\in A_k}d_i,\quad d_i=\sum_{j=1}^N w_{ij}
$$

$$
\Rightarrow Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}d_i}
=\sum_{k=1}^K\frac{W(A_k,V)-W(A_k,A_k)}{\sum_{i\in A_k}\sum_{j=1}^N w_{ij}}
$$
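A direct transcription of this formula, on a toy graph of two dense pairs joined by one weak edge (weights and the `labels` encoding are illustrative): the balanced split should score a lower Ncut than a lopsided one.

```python
import numpy as np

def ncut(W, labels):
    """Ncut(V) = sum_k W(A_k, complement of A_k) / degree(A_k).

    labels[i] is the cluster index of vertex i.
    """
    d = W.sum(axis=1)                      # vertex degrees d_i
    total = 0.0
    for k in np.unique(labels):
        in_k = labels == k
        cut_k = W[in_k][:, ~in_k].sum()    # W(A_k, complement of A_k)
        vol_k = d[in_k].sum()              # degree(A_k)
        total += cut_k / vol_k
    return total

# Two dense pairs {0,1} and {2,3} joined by one weak edge (0,2).
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
good = ncut(W, np.array([0, 0, 1, 1]))   # natural split
bad = ncut(W, np.array([0, 1, 1, 1]))    # cut off a single vertex
```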
Optimization objective:
$$
\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)
$$
Model
$$
\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)=\underset{\{A_k\}_{k=1}^K}{\min}\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
$$

$$
\Rightarrow \{A_k\}_{k=1}^K=\arg\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)=\arg\underset{\{A_k\}_{k=1}^K}{\min}\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
$$
Introduce an indicator vector for each sample:
$$
\begin{cases} y_i \in\{0,1\}^K \\ \sum_{j=1}^K y_{ij}=1 \end{cases}
\qquad
y_i=\begin{bmatrix} y_{i1}\\ y_{i2}\\ \vdots \\ y_{iK} \end{bmatrix}
\qquad
y_{ij}=1 \Leftrightarrow \text{sample } i \text{ belongs to cluster } j
$$
Stack the indicator vectors as rows:

$$
Y=[y_1,\dots,y_N]^\top_{N\times K}
$$

The goal is to recast the problem as $\hat Y=\arg\underset{Y}{\min}\ Ncut(V)$.
Rewrite Ncut in matrix form:
$$
Ncut=\sum_{k=1}^K\frac{W(A_k,\overline A_k)}{\sum_{i\in A_k}d_i}
=Tr\begin{bmatrix} \frac{W(A_1,\overline A_1)}{\sum_{i\in A_1}d_i} & & & \\ & \frac{W(A_2,\overline A_2)}{\sum_{i\in A_2}d_i} & & \\ & & \ddots & \\ & & & \frac{W(A_K,\overline A_K)}{\sum_{i\in A_K}d_i} \end{bmatrix}
$$

$$
=Tr\left(\begin{bmatrix} W(A_1,\overline A_1) & & & \\ & W(A_2,\overline A_2) & & \\ & & \ddots & \\ & & & W(A_K,\overline A_K) \end{bmatrix}
\begin{bmatrix} \sum_{i\in A_1}d_i & & & \\ & \sum_{i\in A_2}d_i & & \\ & & \ddots & \\ & & & \sum_{i\in A_K}d_i \end{bmatrix}^{-1}\right)
$$
Let

$$
O_{K\times K}=\begin{bmatrix} W(A_1,\overline A_1) & & & \\ & W(A_2,\overline A_2) & & \\ & & \ddots & \\ & & & W(A_K,\overline A_K) \end{bmatrix},
\qquad
P_{K\times K}=\begin{bmatrix} \sum_{i\in A_1}d_i & & & \\ & \sum_{i\in A_2}d_i & & \\ & & \ddots & \\ & & & \sum_{i\in A_K}d_i \end{bmatrix}
$$
The problem now becomes: $\underset{\{A_k\}_{k=1}^K}{\min}\ Ncut(V)=\underset{\{A_k\}_{k=1}^K}{\min}\ Tr(OP^{-1})$
Given $W$ and $Y$, we want to express $O$ and $P$ in terms of $Y$ and $W$.
First, solve for $P$:
$$
Y^\top Y=[y_1,\dots,y_N]\begin{bmatrix} y_{1}^T\\ y_{2}^T\\ \vdots \\ y_{N}^T \end{bmatrix}
=\sum_{i=1}^N y_i y_i^T
=\begin{bmatrix} N_1 & & & \\ & N_2 & & \\ & & \ddots & \\ & & & N_K \end{bmatrix}_{K\times K}
=\begin{bmatrix} \sum_{i\in A_1}1 & & & \\ & \sum_{i\in A_2}1 & & \\ & & \ddots & \\ & & & \sum_{i\in A_K}1 \end{bmatrix}_{K\times K}
$$
Here $N_k$ is the number of samples belonging to cluster $k$: $\sum_{k=1}^K N_k=N$, and $N_k=|A_k|=\sum_{i\in A_k}1$.
By the same argument, weighting the $i$-th term by $d_i$ gives

$$
\sum_{i=1}^N y_i d_i y_i^T=y_1d_1y_1^T+y_2d_2y_2^T+\dots+y_Nd_Ny_N^T=Y^TDY
$$

$$
P_{K\times K}=\begin{bmatrix} \sum_{i\in A_1}d_i & & & \\ & \sum_{i\in A_2}d_i & & \\ & & \ddots & \\ & & & \sum_{i\in A_K}d_i \end{bmatrix}=Y^TDY
$$

where

$$
D=\begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_N \end{bmatrix}=diag(W\cdot\mathbf{1}_N)=\begin{bmatrix} \sum_{j=1}^N w_{1j} & & & \\ & \sum_{j=1}^N w_{2j} & & \\ & & \ddots & \\ & & & \sum_{j=1}^N w_{Nj} \end{bmatrix}
$$
So the $P$ we are solving for is:

$$
P=Y^TDY,\qquad D=diag(W\cdot\mathbf{1}_N)
$$
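A quick numeric check of $P=Y^TDY$ on a small example (weights and cluster assignment chosen only for illustration): the product should come out diagonal, with the summed degrees of each cluster on the diagonal.

```python
import numpy as np

# Toy symmetric weight matrix; samples 0,1 form cluster 0, samples 2,3 cluster 1.
W = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
d = W.sum(axis=1)
D = np.diag(d)

# Indicator matrix Y with one-hot rows.
Y = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)

P = Y.T @ D @ Y
expected = np.diag([d[0] + d[1], d[2] + d[3]])  # cluster-wise degree sums
```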
Next, solve for $O$:
$$
O_{K\times K}=\begin{bmatrix} W(A_1,\overline A_1) & & & \\ & W(A_2,\overline A_2) & & \\ & & \ddots & \\ & & & W(A_K,\overline A_K) \end{bmatrix}
$$

$$
W(A_k,\overline{A_k})=\underbrace{W(A_k,V)}_{\sum_{i\in A_k}d_i}-\underbrace{W(A_k,A_k)}_{\sum_{i\in A_k}\sum_{j\in A_k}w_{ij}}
$$

$$
\Rightarrow O=\begin{bmatrix} \sum_{i\in A_1}d_i & & & \\ & \sum_{i\in A_2}d_i & & \\ & & \ddots & \\ & & & \sum_{i\in A_K}d_i \end{bmatrix}-\begin{bmatrix} W(A_1,A_1) & & & \\ & W(A_2,A_2) & & \\ & & \ddots & \\ & & & W(A_K,A_K) \end{bmatrix}
$$
The first matrix we already know: it is $Y^TDY$. Now look at the second part:
$$
\begin{bmatrix} W(A_1,A_1) & & & \\ & W(A_2,A_2) & & \\ & & \ddots & \\ & & & W(A_K,A_K) \end{bmatrix}
=\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & & & \\ & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & & \\ & & \ddots & \\ & & & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}
$$
Conjecture: does this second part equal $Y^TWY$? Let us check:
$Y^TWY$ has dimension $K\times K$:

$$
Y^TWY=[y_1,\dots,y_N]\begin{bmatrix} w_{11} & w_{12} & \dots & w_{1N} \\ w_{21} & w_{22} & \dots & w_{2N}\\ \vdots & \vdots & & \vdots\\ w_{N1} & w_{N2} & \dots & w_{NN} \end{bmatrix}\begin{bmatrix} y_{1}^T\\ y_{2}^T\\ \vdots \\ y_{N}^T \end{bmatrix}
=\Big[\sum_{i=1}^N y_i w_{i1},\dots,\sum_{i=1}^N y_i w_{iN}\Big]\begin{bmatrix} y_{1}^T\\ y_{2}^T\\ \vdots \\ y_{N}^T \end{bmatrix}
$$

$$
=\sum_{i=1}^N\sum_{j=1}^N y_i w_{ij} y_j^T
=\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_1}\sum_{j\in A_2}w_{ij} & \dots & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij} \\ \sum_{i\in A_2}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & \dots & \sum_{i\in A_2}\sum_{j\in A_K}w_{ij}\\ \vdots & \vdots & & \vdots\\ \sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \sum_{i\in A_K}\sum_{j\in A_2}w_{ij} & \dots & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}
$$
Compare the matrix above with the second part of $O$:
$$
\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & & & \\ & \sum_{i\in A_2}\sum_{j\in A_2}w_{ij} & & \\ & & \ddots & \\ & & & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}
$$

versus

$$
\begin{bmatrix} \sum_{i\in A_1}\sum_{j\in A_1}w_{ij} & \dots & \sum_{i\in A_1}\sum_{j\in A_K}w_{ij} \\ \vdots & \ddots & \vdots\\ \sum_{i\in A_K}\sum_{j\in A_1}w_{ij} & \dots & \sum_{i\in A_K}\sum_{j\in A_K}w_{ij} \end{bmatrix}
$$
These two matrices have identical diagonal entries, and since we minimize a trace, only the diagonal matters. So replacing the second part of $O$ with $Y^TWY$ does not change the result.
Let $O'=Y^TDY-Y^TWY$. Multiplying by the diagonal matrix $P^{-1}$ only rescales the diagonal of $O'$, so $Tr(OP^{-1})=Tr(O'P^{-1})$.
We have thus solved for $O$, and shown that $O'$ can replace $O$ for our purposes:

$$
O'=Y^TDY-Y^TWY
$$
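This substitution can be spot-checked numerically: since $O'P^{-1}=Y^TLY(Y^TDY)^{-1}$, its trace should match the sum form of Ncut exactly (toy weights and an illustrative assignment below).

```python
import numpy as np

# Tr(Y^T L Y (Y^T D Y)^{-1}) versus the sum form of Ncut.
W = np.array([[0.0, 0.7, 0.2, 0.0],
              [0.7, 0.0, 0.0, 0.1],
              [0.2, 0.0, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])
d = W.sum(axis=1)
D, L = np.diag(d), np.diag(d) - W

# Samples 0,1 in cluster 0; samples 2,3 in cluster 1.
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
trace_form = np.trace(Y.T @ L @ Y @ np.linalg.inv(Y.T @ D @ Y))

# Sum form, cluster by cluster:
ncut = 0.0
for k in range(Y.shape[1]):
    in_k = Y[:, k] == 1
    ncut += W[in_k][:, ~in_k].sum() / d[in_k].sum()
```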
Our final optimization problem becomes:
$$
\hat Y=\arg\underset{Y}{\min}\ Tr(Y^T(D-W)Y(Y^TDY)^{-1})=\arg\underset{Y}{\min}\ Tr(Y^TLY(Y^TDY)^{-1})
$$

where $L=D-W$ is the Laplacian matrix.
$$
Tr(Y^TLY(Y^TDY)^{-1})
$$
Here $Y\in\mathbb{R}^{N\times K}$, and each row is one-hot, indicating which cluster sample $i$ belongs to. $Y$ looks like:
$$
\begin{bmatrix} 0 & \dots & 0 & 1 & 0 & \dots & 0 \\ 0 & \dots & 1 & 0 & 0 & \dots & 0\\ \vdots & & & & & & \vdots\\ 0 & \dots & 0 & 0 & 0 & \dots & 1\\ 1 & \dots & 0 & 0 & 0 & \dots & 0 \end{bmatrix}
$$
Let:

$$
P=Y^TDY=diag\Big(\sum_{i\in A_1}d_i,\sum_{i\in A_2}d_i,\dots,\sum_{i\in A_K}d_i\Big)=diag(p_1,p_2,\dots,p_K)
$$

Then the objective can be symmetrized:

$$
Tr(Y^TLYP^{-1})=Tr(Y^TLYP^{-\frac12}P^{-\frac12})=Tr(P^{-\frac12}Y^TLYP^{-\frac12})
$$
Let:

$$
H=YP^{-\frac12},\qquad H^T=P^{-\frac12}Y^T
$$

$$
H^TDH=P^{-\frac12}Y^TDYP^{-\frac12}=P^{-\frac12}PP^{-\frac12}=I
$$
The objective becomes $Tr(H^TLH)$.
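This is where the practical spectral algorithm comes from: relaxing the discreteness of $H$ while keeping $H^TDH=I$, the minimizers of $Tr(H^TLH)$ are the $K$ smallest generalized eigenvectors of $Lu=\lambda Du$, which can be computed through the symmetric matrix $D^{-1/2}LD^{-1/2}$ and then rounded back to a discrete clustering (usually with k-means). A minimal sketch for $K=2$, where the sign of the second eigenvector already recovers the partition (toy weights, illustrative only):

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5}, joined by two weak edges.
W = np.array([[0.0, 1.0, 0.9, 0.1, 0.0, 0.0],
              [1.0, 0.0, 0.8, 0.0, 0.0, 0.0],
              [0.9, 0.8, 0.0, 0.0, 0.1, 0.0],
              [0.1, 0.0, 0.0, 0.0, 0.9, 1.0],
              [0.0, 0.0, 0.1, 0.9, 0.0, 0.8],
              [0.0, 0.0, 0.0, 1.0, 0.8, 0.0]])
d = W.sum(axis=1)
L = np.diag(d) - W
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt      # D^{-1/2} L D^{-1/2}

eigvals, eigvecs = np.linalg.eigh(L_sym)  # ascending eigenvalues
u2 = D_inv_sqrt @ eigvecs[:, 1]           # 2nd generalized eigenvector of Lu = lambda Du
labels = (u2 > 0).astype(int)             # 2-way split by sign
```

With $K>2$ one would keep the first $K$ eigenvectors as rows of an embedding and run k-means on them; the sign trick is a shortcut that only works for two clusters.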
Theorem 1:
For a positive semi-definite matrix $L$ with eigenvalues $0\le\lambda_1\le\lambda_2\le\dots\le\lambda_n$
and an eigenbasis $\{\overline v_1,\overline v_2,\dots,\overline v_n\}$ (orthonormal eigenvectors):
when $\mathbf{x}\in\mathbb{R}^{N}$ and $\mathbf{x}^T\mathbf{x}=1$, the minimum of $\mathbf{x}^TL\mathbf{x}$ is attained at $\mathbf{x}=\overline v_1$.
proof:
Since the eigenbasis is orthonormal, $\mathbf{x}$ can be expanded in it:

$$
\mathbf{x}=c_1\overline v_1+c_2\overline v_2+\dots+c_n\overline v_n
$$

$$
L\mathbf{x}=c_1\lambda_1\overline v_1+c_2\lambda_2\overline v_2+\dots+c_n\lambda_n\overline v_n
$$

$$
\Rightarrow \mathbf{x}^TL\mathbf{x}=c_1^2\lambda_1+c_2^2\lambda_2+\dots+c_n^2\lambda_n
$$

Because $\mathbf{x}^T\mathbf{x}=1$, we have $c_1^2+c_2^2+\dots+c_n^2=1$, so

$$
\mathbf{x}^TL\mathbf{x}=c_1^2\lambda_1+c_2^2\lambda_2+\dots+c_n^2\lambda_n\ge\lambda_1
$$

Equality holds when $c_1^2=1$ and $c_i=0$ for $i\ne1$, i.e. $\mathbf{x}=\overline v_1$ or $\mathbf{x}=-\overline v_1$.
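Theorem 1 can be sanity-checked numerically on a random PSD matrix: the quadratic form at $\overline v_1$ equals $\lambda_1$, and no random unit vector gets below it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
L = A @ A.T                              # a random PSD matrix
eigvals, eigvecs = np.linalg.eigh(L)     # ascending; v_1 = eigvecs[:, 0]

v1 = eigvecs[:, 0]
at_v1 = v1 @ L @ v1                      # should equal lambda_1

# 1000 random unit vectors; their quadratic forms are all >= lambda_1.
xs = rng.standard_normal((1000, 5))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
quad = np.einsum('ni,ij,nj->n', xs, L, xs)
```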
Theorem 2:
For a positive semi-definite matrix $L$ with eigenvalues $0\le\lambda_1\le\lambda_2\le\dots\le\lambda_n$
and an orthonormal eigenbasis $\{\overline v_1,\overline v_2,\dots,\overline v_n\}$:
when $F\in\mathbb{R}^{N\times K}$ and $F^TF=I$, the minimum of $Tr(F^TLF)$ is attained at $F=[\overline v_1,\overline v_2,\dots,\overline v_K]$.
proof:
Denote $F=[f_1,f_2,\dots,f_K]$. Then

$$
Tr(F^TLF)=\sum_{i=1}^K f_i^TLf_i
$$

By Theorem 1 (applied column by column under the orthonormality constraint), $Tr(F^TLF)$ is minimized when $f_1=\overline v_1, f_2=\overline v_2,\dots,f_K=\overline v_K$.
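Theorem 2 admits the same kind of numeric sanity check: the trace at $F=[\overline v_1,\dots,\overline v_K]$ equals $\lambda_1+\dots+\lambda_K$, and random orthonormal $F$ (sampled via QR) never does better.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
L = A @ A.T                              # a random PSD matrix
eigvals, eigvecs = np.linalg.eigh(L)     # ascending eigenvalues

K = 2
F_star = eigvecs[:, :K]                  # [v_1, ..., v_K]
best = np.trace(F_star.T @ L @ F_star)   # should equal lambda_1 + lambda_2

# Random orthonormal K-frames via QR; none should beat `best`.
worst_gap = np.inf
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((6, K)))
    worst_gap = min(worst_gap, np.trace(Q.T @ L @ Q) - best)
```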