Derivation of Equation (10.31) in the Watermelon Book (西瓜书)

Watermelon Book, Section 10.5.2: Locally Linear Embedding

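Unlike Isomap, which tries to preserve the distances between neighboring samples, Locally Linear Embedding (LLE) tries to preserve the linear relationships among the samples within a neighborhood.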

LLE first finds, for each sample $x_i$, the index set $Q_i$ of its nearest neighbors, and then computes the coefficients $w_i$ with which the samples in $Q_i$ linearly reconstruct $x_i$.

LLE keeps $w_i$ unchanged in the low-dimensional space (of dimension $d'$), so the low-dimensional coordinates $z_i$ corresponding to $x_i$ can be obtained by solving:

$$
\min_{z_1,z_2,\ldots,z_m}\sum_{i=1}^m \left\lVert z_i-\sum_{j\in Q_i} w_{ij} z_j\right\rVert_2^2 \tag{10.29}
$$

Let $Z=\left(z_1, z_2, \ldots, z_m\right) \in \mathbb{R}^{d'\times m}$, $(W)_{ij}=w_{ij}$, and

$$
M = (I - W)^T(I-W). \tag{10.30}
$$

Then Eq. (10.29) can be rewritten as

$$
\min_{Z}\ \operatorname{tr}(ZMZ^T), \quad \text{s.t.}\ \ ZZ^T = I. \tag{10.31}
$$

Here the $d'$ columns of $Z^T \in \mathbb{R}^{m \times d'}$ are the eigenvectors of $M$ corresponding to its $d'$ smallest eigenvalues. The constraint $ZZ^T=I$ requires these eigenvectors to be orthonormal, that is, of unit length and mutually orthogonal.
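As a rough sketch of why eigenvectors show up in the solution of Eq. (10.31), one can apply the method of Lagrange multipliers. This step is not spelled out in the text; the symmetric multiplier matrix $\Lambda \in \mathbb{R}^{d'\times d'}$ is notation introduced here purely for illustration, and the computation uses the fact that $M$ is symmetric:

$$
\begin{aligned}
L(Z,\Lambda) &= \operatorname{tr}(ZMZ^T) - \operatorname{tr}\left(\Lambda\,(ZZ^T - I)\right), \\
\frac{\partial L}{\partial Z} &= 2ZM - 2\Lambda Z = 0 \;\Longrightarrow\; MZ^T = Z^T\Lambda .
\end{aligned}
$$

If $\Lambda$ is taken to be diagonal (replacing $Z$ by $RZ$ for an orthogonal matrix $R$ changes neither the objective nor the constraint), each column of $Z^T$ is an eigenvector of $M$, and the objective becomes $\operatorname{tr}(ZMZ^T)=\operatorname{tr}(\Lambda ZZ^T)=\operatorname{tr}(\Lambda)$, the sum of the selected eigenvalues. Under these assumptions it is smallest when the $d'$ smallest eigenvalues are selected.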

We now derive Eq. (10.31) by expanding its objective:
$$
\begin{aligned}
\operatorname{tr}(ZMZ^T)
&=\operatorname{tr}\left(Z(I - W)^T(I-W)Z^T\right) \\
&=\operatorname{tr}\left(ZZ^T-ZW^TZ^T-ZWZ^T+ZW^TWZ^T\right) \\
&=\operatorname{tr}(ZZ^T)-\operatorname{tr}(ZW^TZ^T)-\operatorname{tr}(ZWZ^T)+\operatorname{tr}(ZW^TWZ^T) \\
&=\operatorname{tr}(ZZ^T)-2\cdot \operatorname{tr}(ZWZ^T)+\operatorname{tr}(ZW^TWZ^T) \\
&=\sum_{j=1}^{d'}\left((ZZ^T)_{jj}-2\cdot(ZWZ^T)_{jj}+ (ZW^TWZ^T)_{jj}\right) \\
&=\sum_{j=1}^{d'}\left[ \sum_{i=1}^{m}Z_{ji}(Z^T)_{ij} - 2 \cdot \sum_{i=1}^{m}Z_{ji}(WZ^T)_{ij} + \sum_{i=1}^{m}(ZW^T)_{ji} (WZ^T)_{ij} \right]\\
&=\sum_{i=1}^m\left[ \sum_{j=1}^{d'}Z_{ji}(Z^T)_{ij} - 2 \cdot \sum_{j=1}^{d'}Z_{ji}(WZ^T)_{ij} + \sum_{j=1}^{d'}(ZW^T)_{ji} (WZ^T)_{ij} \right]\\
&=\sum_{i=1}^m\left[ \sum_{j=1}^{d'}(Z^T)_{ij}Z_{ji} - 2 \cdot \sum_{j=1}^{d'}(WZ^T)_{ij}Z_{ji} + \sum_{j=1}^{d'}(WZ^T)_{ij}(ZW^T)_{ji} \right]\\
&=\sum_{i=1}^m\left[ (Z^T)_{i}(Z^T)_{i}^T - 2 (WZ^T)_{i}(Z^T)_{i}^T + (WZ^T)_{i}(WZ^T)_{i}^T \right]\\
&=\sum_{i=1}^m\left[ z_{i}^T z_{i}- 2 (WZ^T)_{i}z_{i} + (WZ^T)_{i}(WZ^T)_{i}^T \right]
\end{aligned}
$$

Here $(A)_i$ denotes the $i$-th row of a matrix $A$, so $(Z^T)_i = z_i^T$ and

$$
\begin{aligned}
(WZ^T)_i&=W_iZ^T \\
&=\sum_{j=1}^m W_{ij}(Z^T)_{j} \\
&=\sum_{j=1}^m w_{ij}z_{j}^T = \left(\sum_{j=1}^m w_{ij}z_{j}\right)^T .
\end{aligned}
$$

Substituting this into the expression above gives

$$
\begin{aligned}
\operatorname{tr}(ZMZ^T)
&=\sum_{i=1}^m\left[ z_{i}^T z_{i}- 2 (WZ^T)_{i}z_{i} + (WZ^T)_{i}(WZ^T)_{i}^T \right] \\
&=\sum_{i=1}^m\left[ z_{i}^T z_{i}- 2 \left(\sum_{j=1}^m w_{ij}z_{j} \right)^T z_{i} + \left(\sum_{j=1}^m w_{ij}z_{j} \right)^T\left(\sum_{j=1}^m w_{ij}z_{j} \right) \right] \\
&=\sum_{i=1}^m \left( z_i- \sum_{j=1}^m w_{ij}z_{j} \right)^T\left( z_i- \sum_{j=1}^m w_{ij}z_{j} \right) \\
&=\sum_{i=1}^m \left\lVert z_i- \sum_{j=1}^m w_{ij}z_{j} \right\rVert_2^2
\end{aligned}
$$

Since $w_{ij}=0$ whenever $j \notin Q_i$, the sum over $j=1,\ldots,m$ reduces to a sum over $j \in Q_i$, and therefore

$$
\begin{aligned}
\min_{Z} \operatorname{tr}(ZMZ^T)
&= \min_{z_1,z_2,\ldots,z_m}\sum_{i=1}^m \left\lVert z_i- \sum_{j=1}^m w_{ij}z_{j} \right\rVert_2^2 \\
&=\min_{z_1,z_2,\ldots,z_m}\sum_{i=1}^m \left\lVert z_i-\sum_{j\in Q_i} w_{ij} z_j\right\rVert_2^2 ,
\end{aligned}
$$

which is exactly the objective of Eq. (10.29).
Q.E.D.
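The identity $\operatorname{tr}(ZMZ^T)=\sum_{i}\lVert z_i-\sum_{j} w_{ij}z_j\rVert_2^2$ can also be checked numerically. The following is a minimal sketch in plain Python, not part of the book: the helper names (`matmul`, `lle_objectives`, `random_problem`), the problem sizes, and the randomly generated neighbor sets and weights are all assumptions made for illustration. The weights in each row are normalized to sum to 1, although the identity itself does not depend on that.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def lle_objectives(Z, W):
    """Return (tr(Z M Z^T), sum_i ||z_i - sum_j w_ij z_j||^2) for Z (d' x m), W (m x m)."""
    d, m = len(Z), len(Z[0])
    # Build I - W and M = (I - W)^T (I - W), as in Eq. (10.30).
    I_minus_W = [[(1.0 if i == j else 0.0) - W[i][j] for j in range(m)] for i in range(m)]
    M = matmul(transpose(I_minus_W), I_minus_W)
    trace_form = trace(matmul(matmul(Z, M), transpose(Z)))

    # z_i is the i-th column of Z, i.e. the i-th row of Z^T.
    z = transpose(Z)
    sum_form = 0.0
    for i in range(m):
        recon = [sum(W[i][j] * z[j][k] for j in range(m)) for k in range(d)]
        sum_form += sum((z[i][k] - recon[k]) ** 2 for k in range(d))
    return trace_form, sum_form


def random_problem(m=6, d=2, k=3, seed=0):
    """Hypothetical test data: random Z and a W with k nonzero weights per row."""
    rng = random.Random(seed)
    Z = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(d)]
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        Q_i = rng.sample([j for j in range(m) if j != i], k)  # stand-in neighbor set
        raw = [rng.uniform(0.1, 1.0) for _ in Q_i]
        total = sum(raw)
        for j, r in zip(Q_i, raw):
            W[i][j] = r / total  # w_ij = 0 for j not in Q_i
    return Z, W


Z, W = random_problem()
trace_form, sum_form = lle_objectives(Z, W)
print(trace_form, sum_form)
```

The two printed values should agree up to floating-point rounding. Note that `Z` here is arbitrary and does not satisfy $ZZ^T=I$; the identity holds for any $Z$, while the constraint only matters for the minimization.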
