最近对流形学习比较感兴趣,因为流形学习用到了MDS算法,所以写一篇博客记录下来,主要是对西瓜书里面的公式进行详细的推导,并给出Python代码实现。MDS算法即多维缩放(Multiple Dimensional Scaling)算法,是一种常见的降维算法,核心思想是要求原始空间样本间的距离在低维空间得到保持。
现给定训练样本 χ = { x 1 , x 2 , . . . , x m } \chi=\{x_1, x_2,...,x_m\} χ={x1,x2,...,xm},其中 x i ∈ R d , i = 1 , . . . , m x_i\in R^{d},i=1,...,m xi∈Rd,i=1,...,m,共 m m m个样本,每个样本的长度为 d d d。我们的目标是获得在低维 d ′ d' d′空间的表示 Z = { z 1 , z 2 , . . . , z m } ∈ R d ′ × m Z=\{z_1, z_2,...,z_m\}\in R^{d'\times{m}} Z={z1,z2,...,zm}∈Rd′×m,且 d ′ < d d'
令 B = Z T Z ∈ R m × m B=Z^TZ\in R^{m\times{m}} B=ZTZ∈Rm×m, B B B为降维后的内积矩阵,即 b i j = z i T z j b_{ij}=z_i^Tz_j bij=ziTzj,则 d i j = ∣ ∣ z i − z j ∣ ∣ 2 2 = ( z i ) 2 + ( z j ) 2 − 2 z i T z j = b i i + b j j − 2 b i j d_{ij}=||z_i-z_j||_2^2=(z_i)^2+(z_j)^2-2z_i^Tz_j=b_{ii}+b_{jj}-2b_{ij} dij=∣∣zi−zj∣∣22=(zi)2+(zj)2−2ziTzj=bii+bjj−2bij此时令降维后的样本 Z Z Z被中心化 ∑ i = 1 m z i = 0 \sum_{i=1}^m z_i=0 ∑i=1mzi=0,这里的中心化要说明一下,第一遍看的时候迷迷糊糊,跑完实验才彻底理解,中心化指的是所有的低维向量的某一个维度之和等于零,而非低维向量自身的所有维度相加等于零。那么 ∑ i = 1 m b i j = b 1 j + b 2 j + . . . + b m j = z 1 T z j + z 2 T z j + . . . + z m T z j = ( z 1 T + z 2 T + . . . + z m T ) z j = 0 \begin{aligned} \sum_{i=1}^m b_{ij} &= b_{1j}+b_{2j}+...+b_{mj} \\ & = z_1^Tz_j+z_2^Tz_j+...+z_m^Tz_j \\ & = (z_1^T+z_2^T+...+z_m^T)z_j \\ & = 0 \end{aligned} i=1∑mbij=b1j+b2j+...+bmj=z1Tzj+z2Tzj+...+zmTzj=(z1T+z2T+...+zmT)zj=0同理, ∑ j = 1 m b i j = 0 \sum_{j=1}^m b_{ij}=0 ∑j=1mbij=0。接下来可以推导得到: ∑ i = 1 m d i j 2 = d 1 j 2 + d 2 j 2 + . . . + d m j 2 = b 11 + b j j − 2 b 1 j + b 22 + b j j − 2 b 2 j + . . . + b m m + b j j − 2 b m j = b 11 + b 22 + . . . + b m m + m b j j − 2 ( b 1 j + b 2 j + . . . + b m j ) = t r ( B ) + m b j j \begin{aligned} \sum_{i=1}^m d_{ij}^2 &=d_{1j}^2+d_{2j}^2+...+d_{mj}^2 \\ &= b_{11}+b_{jj}-2b_{1j}+b_{22}+b_{jj}-2b_{2j}+...+b_{mm}+b_{jj}-2b_{mj} \\ &= b_{11}+b_{22}+...+b_{mm}+mb_{jj}-2(b_{1j}+b_{2j}+...+b_{mj})\\ &= tr(B)+mb_{jj} \end{aligned} i=1∑mdij2=d1j2+d2j2+...+dmj2=b11+bjj−2b1j+b22+bjj−2b2j+...+bmm+bjj−2bmj=b11+b22+...+bmm+mbjj−2(b1j+b2j+...+bmj)=tr(B)+mbjj同理, ∑ j = 1 m d i j 2 = t r ( B ) + m b i i \sum_{j=1}^m d_{ij}^2=tr(B)+mb_{ii} ∑j=1mdij2=tr(B)+mbii,最后: ∑ i = 1 m ∑ j = 1 m d i j 2 = ∑ j = 1 m d 1 j 2 + ∑ j = 1 m d 2 j 2 + . . . + ∑ j = 1 m d m j 2 = t r ( B ) + m b 11 + t r ( B ) + m b 22 + . . . + t r ( B ) + m b m m = m × t r ( B ) + m ( b 11 + b 22 + . . . + b m m ) = 2 m × t r ( B ) \begin{aligned} \sum_{i=1}^m \sum_{j=1}^m d_{ij}^2 &=\sum_{j=1}^m d_{1j}^2+ \sum_{j=1}^m d_{2j}^2+...+\sum_{j=1}^m d_{mj}^2\\ &= tr(B)+mb_{11}+tr(B)+mb_{22}+...+tr(B)+mb_{mm} \\ &= m\times tr(B)+m(b_{11}+b_{22}+...+b_{mm}) \\ &= 2m\times tr(B) \end{aligned} i=1∑mj=1∑mdij2=j=1∑md1j2+j=1∑md2j2+...+j=1∑mdmj2=tr(B)+mb11+tr(B)+mb22+...+tr(B)+mbmm=m×tr(B)+m(b11+b22+...+bmm)=2m×tr(B)在得到原始空间样本的距离矩阵 D D D和 ∑ i = 1 m d i j 2 \sum_{i=1}^m d_{ij}^2 ∑i=1mdij2、 ∑ j = 1 m d i j 2 \sum_{j=1}^m d_{ij}^2 ∑j=1mdij2以及 ∑ i = 1 m ∑ j = 1 m d i j 2 \sum_{i=1}^m \sum_{j=1}^m d_{ij}^2 ∑i=1m∑j=1mdij2之后,就可以求得降维后的内积矩阵 B B B: b i j = 1 2 ( b i i + b j j − d i j 2 ) = 1 2 ( 1 m [ t r ( B ) + m b j j ] + 1 m [ t r ( B ) + m b i i ] − 2 m t r ( B ) − d i j 2 ) = 1 2 ( 1 m ∑ i = 1 m d i j 2 + 1 m ∑ j = 1 m d i j 2 − 1 m 2 ∑ i = 1 m ∑ j = 1 m d i j 2 − d i j 2 ) \begin{aligned} b_{ij} & = \frac{1}{2} (b_{ii}+b_{jj}-d_{ij}^2) \\ & = \frac{1}{2} (\frac{1}{m}[tr(B)+mb_{jj}]+\frac{1}{m}[tr(B)+mb_{ii}]-\frac{2}{m}tr(B)-d_{ij}^2) \\ &= \frac{1}{2} (\frac{1}{m}\sum_{i=1}^m d_{ij}^2+\frac{1}{m}\sum_{j=1}^m d_{ij}^2-\frac{1}{m^2}\sum_{i=1}^m\sum_{j=1}^m d_{ij}^2-d_{ij}^2) \end{aligned} bij=21(bii+bjj−dij2)=21(m1[tr(B)+mbjj]+m1[tr(B)+mbii]−m2tr(B)−dij2)=21(m1i=1∑mdij2+m1j=1∑mdij2−m21i=1∑mj=1∑mdij2−dij2) 下一步就是对内积矩阵 B B B进行特征值分解: B = V Λ V T B=V \Lambda V^T B=VΛVT,其中 Λ = d i a g ( λ 1 , λ 2 , . . . , λ d ) \Lambda=diag(\lambda_1,\lambda_2,..., \lambda_d) Λ=diag(λ1,λ2,...,λd)特征值构成的对角矩阵,且 λ 1 ≥ λ 2 ≥ . . . ≥ λ d \lambda_1 \geq \lambda_2 \geq ... \geq \lambda_d λ1≥λ2≥...≥λd。现取前 d ′ d' d′个特征值构成对角矩阵 Λ ′ = d i a g ( λ 1 , λ 2 , . . . , λ d ′ ) ∈ R d ′ × d ′ \Lambda'=diag(\lambda_1,\lambda_2,..., \lambda_{d'}) \in R^{d'\times{d'}} Λ′=diag(λ1,λ2,...,λd′)∈Rd′×d′,对应的特征向量矩阵 V ′ ∈ R m × d ′ V' \in R^{m\times{d'}} V′∈Rm×d′。因为 B = Z T Z B=Z^TZ B=ZTZ,所以 Z = Λ ′ 1 / 2 V ′ T ∈ R d ′ × m Z=\Lambda'^{1/2} V'^T \in R^{d'\times{m}} Z=Λ′1/2V′T∈Rd′×m,这就得到了最后的低维嵌入向量 Z Z Z。公式一大堆,看似很复杂,最后总结一下就三个步骤:首先求原始空间的距离矩阵 D D D,然后根据 D D D求降维后的内积矩阵 B B B,最后根据 B B B求低维嵌入 Z Z Z。
下面通过代码来说明一下,这里只给出部分核心代码,完整代码见我的GitHub,代码部分参考了博客MDS算法,在此表示感谢。首先是距离矩阵 D D D的求解:
def get_distance_matrix(data):
expand_ = data[:, np.newaxis, :]
repeat1 = np.repeat(expand_, data.shape[0], axis=1)
repeat2 = np.swapaxes(repeat1, 0, 1)
D = np.linalg.norm(repeat1 - repeat2, ord=2, axis=-1, keepdims=True).squeeze(-1)
return D
然后是求解内积矩阵 B B B:
def get_matrix_B(D):
assert D.shape[0] == D.shape[1]
DD = np.square(D)
sum_ = np.sum(DD, axis=1) / D.shape[0]
Di = np.repeat(sum_[:, np.newaxis], D.shape[0], axis=1)
Dj = np.repeat(sum_[np.newaxis, :], D.shape[0], axis=0)
Dij = np.sum(DD) / ((D.shape[0])**2) * np.ones([D.shape[0], D.shape[0]])
B = (Di + Dj - DD- Dij) / 2
return B
最后根据 B B B求解 Z Z Z:
def MDS(data, n=2):
D = get_distance_matrix(data)
B = get_matrix_B(D)
B_value, B_vector = np.linalg.eigh(B)
Be_sort = np.argsort(-B_value)
B_value = B_value[Be_sort] # 降序排列的特征值
B_vector = B_vector[:,Be_sort] # 降序排列的特征值对应的特征向量
Bez = np.diag(B_value[0:n])
Bvz = B_vector[:, 0:n]
Z = np.dot(np.sqrt(Bez), Bvz.T).T
return Z