[Whiteboard Derivation Series Notes] Dimensionality Reduction - PCA - Maximum Projection Variance & Minimum Reconstruction Cost

Author: shuhuai008
Link: 【机器学习】【白板推导系列】【合集 1~33】_哔哩哔哩_bilibili

The core of PCA is a reconstruction of the original feature space: a set of possibly linearly correlated variables is transformed, via an orthogonal transformation, into a set of linearly uncorrelated variables.
There are two basic requirements: maximum projection variance (the chosen projection direction maximizes the variance of the projected data) and minimum reconstruction cost (the dimensionality-reduced data loses as little information as possible compared with the original data).

$$
\begin{gathered}
X=\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{N} \end{pmatrix}^{T}_{N \times p}
 =\begin{pmatrix} x_{1}^{T} \\ x_{2}^{T} \\ \vdots \\ x_{N}^{T} \end{pmatrix}
 =\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{pmatrix}_{N \times p},
 \quad x_{i}\in \mathbb{R}^{p},\ i=1,2,\cdots ,N\\
1_{N}=\begin{pmatrix}1 \\ 1 \\ \vdots \\ 1\end{pmatrix}_{N \times 1},\quad
\bar{x}=\frac{1}{N}X^{T}1_{N},\quad
S=\frac{1}{N}X^{T}\mathbb{H}X
\end{gathered}
$$
where $\mathbb{H}=I_{N}-\frac{1}{N}1_{N}1_{N}^{T}$ is the centering matrix, $\bar{x}$ is the sample mean, and $S$ is the sample covariance matrix.
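A minimal NumPy sketch (my own illustration, not part of the original notes; the toy data and variable names are assumptions) that builds the centering matrix and checks that $\frac{1}{N}X^{T}\mathbb{H}X$ agrees with the sample covariance computed directly:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = rng.normal(size=(N, p))          # data matrix, rows are samples x_i^T

ones = np.ones((N, 1))               # the column vector 1_N
H = np.eye(N) - ones @ ones.T / N    # centering matrix H = I_N - (1/N) 1_N 1_N^T

x_bar = (X.T @ ones / N).ravel()     # mean  x_bar = (1/N) X^T 1_N
S = X.T @ H @ X / N                  # covariance  S = (1/N) X^T H X

# direct computation for comparison (biased estimator, i.e. divide by N)
S_direct = np.cov(X, rowvar=False, bias=True)
assert np.allclose(x_bar, X.mean(axis=0))
assert np.allclose(S, S_direct)
```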
For a new direction vector $u_{1}$, the projection of the centered data onto $u_{1}$ is
$$
(x_{i}-\bar{x})^{T}u_{1}
$$
Because the data have been centered, the mean of the new dataset $\{x_{i}-\bar{x}\}$ is zero, so the projection variance is
$$
\begin{aligned}
J&=\frac{1}{N}\sum\limits_{i=1}^{N}\left[(x_{i}-\bar{x})^{T}u_{1}\right]^{2}-0^{2}\\
&=\frac{1}{N}\sum\limits_{i=1}^{N}\left[(x_{i}-\bar{x})^{T}u_{1}\right]^{2}\\
&=u_{1}^{T}\left(\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}-\bar{x})(x_{i}-\bar{x})^{T}\right)u_{1}\\
&=u_{1}^{T}\cdot S \cdot u_{1}
\end{aligned}
$$
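The identity $J=u_{1}^{T}Su_{1}$ is easy to check numerically; a small, self-contained sketch with made-up data (again my own illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # toy data, rows are samples x_i^T
x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False, bias=True)      # S = (1/N) X^T H X, as verified above

u1 = rng.normal(size=3)
u1 /= np.linalg.norm(u1)                    # an arbitrary unit direction

proj = (X - x_bar) @ u1                     # projections (x_i - x_bar)^T u1
J = np.mean(proj ** 2)                      # projection variance (projections have zero mean)
assert np.allclose(J, u1 @ S @ u1)          # J = u1^T S u1
```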
For $\hat{u}_{1}$, we have
$$
\hat{u}_{1}=\mathop{\text{argmax}}\limits_{u_{1}}\ u_{1}^{T}\cdot S \cdot u_{1}
$$
Here we impose the constraint $u_{1}^{T}u_{1}=1$, so by the method of Lagrange multipliers,
$$
\begin{aligned}
L(u_{1},\lambda)&=u_{1}^{T}Su_{1}+\lambda(1-u_{1}^{T}u_{1})\\
\frac{\partial L(u_{1},\lambda)}{\partial u_{1}}&=2S u_{1}-2\lambda u_{1}=0\\
S u_{1}&=\lambda u_{1}
\end{aligned}
$$
This says that $u_{1}$ is an eigenvector of the covariance matrix $S$ and $\lambda$ is the corresponding eigenvalue. Substituting back into the expression for $\hat{u}_{1}$:
$$
\begin{aligned}
\hat{u}_{1}&=\mathop{\text{argmax}}\limits_{u_{1}}\ u_{1}^{T}\lambda u_{1}\\
&=\mathop{\text{argmax}}\limits_{u_{1}}\ \lambda u_{1}^{T}u_{1}\\
&=\mathop{\text{argmax}}\limits_{u_{1}}\ \lambda
\end{aligned}
$$
So the projection variance is maximized, with value $\lambda_{\max}$, when $u_{1}$ is the eigenvector of $S$ associated with the largest eigenvalue.
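Numerically this is easy to confirm (an illustrative sketch with made-up data of my own): the eigenvector of $S$ with the largest eigenvalue attains $u^{T}Su=\lambda_{\max}$, and no other unit direction does better.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.3])   # anisotropic toy data
S = np.cov(X, rowvar=False, bias=True)

eigvals, eigvecs = np.linalg.eigh(S)        # ascending eigenvalues, orthonormal eigenvectors
u1_hat = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
lam_max = eigvals[-1]

assert np.allclose(u1_hat @ S @ u1_hat, lam_max)

# no random unit direction beats the top eigenvector
for _ in range(1000):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)
    assert u @ S @ u <= lam_max + 1e-12
```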
The above is for the case of a single direction after dimensionality reduction; if we want $q$ directions, the idea is essentially the same:
$$
\begin{aligned}
J&=\sum\limits_{j=1}^{q}u_{j}^{T}Su_{j}\\
&=\sum\limits_{j=1}^{q}\lambda_{j}\quad (\text{taking the }\lambda\text{'s in descending order})
\end{aligned}
$$
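A quick sketch of the $q$-direction case (toy data and variable names are my own): the total projected variance over the top $q$ eigenvectors equals the sum of the $q$ largest eigenvalues of $S$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ np.diag([4.0, 2.0, 1.0, 0.5, 0.1])
S = np.cov(X, rowvar=False, bias=True)

q = 2
eigvals, eigvecs = np.linalg.eigh(S)
U_q = eigvecs[:, ::-1][:, :q]               # top-q eigenvectors (largest eigenvalues first)
top_q = np.sort(eigvals)[::-1][:q]          # the q largest eigenvalues

J = sum(U_q[:, j] @ S @ U_q[:, j] for j in range(q))   # sum_j u_j^T S u_j
assert np.allclose(J, top_q.sum())
```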
The other perspective is minimum reconstruction cost. The original data point $x_{i}$ is a $p$-dimensional vector; if we keep all $p$ basis vectors $u_{k}$, it can be written as
$$
x_{i}=\sum\limits_{k=1}^{p}(x_{i}^{T}u_{k})u_{k}
$$
where $x_{i}^{T}u_{k}$ is the projection length of $x_{i}$ onto $u_{k}$ (the coordinate in that direction), since each $u_{k}$ is a unit vector.
The reconstruction $\hat{x}_{i}$ is also $p$-dimensional, but suppose we keep only the $q$ dimensions corresponding to the largest eigenvalues $\lambda$; then it can be written as
$$
\hat{x}_{i}=\sum\limits_{k=1}^{q}(x_{i}^{T}u_{k})u_{k}
$$
In PCA, since the data are centered first, $x_{i}$ above should be replaced by $x_{i}-\bar{x}$ (the reasoning is otherwise the same). The corresponding loss function is
$$
\begin{aligned}
J&=\frac{1}{N}\sum\limits_{i=1}^{N}\left|\left|(x_{i}-\bar{x})-\hat{x}_{i}\right|\right|^{2}\\
&=\frac{1}{N}\sum\limits_{i=1}^{N}\left|\left|\sum\limits_{k=q+1}^{p}\left[(x_{i}-\bar{x})^{T}u_{k}\right]u_{k}\right|\right|^{2}\\
&=\frac{1}{N}\sum\limits_{i=1}^{N}\sum\limits_{k=q+1}^{p}\left[(x_{i}-\bar{x})^{T}u_{k}\right]^{2}\\
&=\sum\limits_{k=q+1}^{p}\underbrace{\frac{1}{N}\sum\limits_{i=1}^{N}\left[(x_{i}-\bar{x})^{T}u_{k}\right]^{2}}_{u_{k}^{T}\cdot S \cdot u_{k}}\\
&=\sum\limits_{k=q+1}^{p}u_{k}^{T}\cdot S \cdot u_{k}
\end{aligned}
$$
Therefore, for each $\hat{u}_{k}$ with $k=q+1,\cdots,p$, we have
$$
\hat{u}_{k}=\mathop{\text{argmin}}\limits_{u_{k}}\ u_{k}^{T}Su_{k}
$$
With the same constraint as before, $u_{k}^{T}u_{k}=1$, the resulting Lagrangian is identical in form to the one in the maximum-projection-variance case, so it is not repeated here.
This shows that the two perspectives, maximum projection variance and minimum reconstruction cost, are equivalent: minimizing the reconstruction cost assigns the $p-q$ smallest eigenvalues to the discarded directions, which is the same as keeping the $q$ largest for the projection.
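To make the equivalence concrete, here is a hedged NumPy sketch (toy data and names are my own, not from the notes): reconstructing the centered data from the top $q$ eigenvectors leaves an average squared error equal to the sum of the $p-q$ smallest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ np.diag([4.0, 2.0, 1.0, 0.5, 0.1])
x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False, bias=True)

q = 2
eigvals, eigvecs = np.linalg.eigh(S)
U_q = eigvecs[:, ::-1][:, :q]                    # top-q principal directions u_1, ..., u_q

Xc = X - x_bar                                   # centered data x_i - x_bar
X_hat = (Xc @ U_q) @ U_q.T                       # reconstruction from the q coordinates
J = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))   # (1/N) sum_i ||(x_i - x_bar) - x_hat_i||^2

discarded = np.sort(eigvals)[: X.shape[1] - q]   # the p - q smallest eigenvalues
assert np.allclose(J, discarded.sum())           # reconstruction cost = sum of discarded eigenvalues
```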
