Linear Discriminant Analysis (LDA) is a classic classification algorithm. Its core idea is to project the training data onto a line chosen so that projections of samples from the same class stay close together while projections of samples from different classes stay far apart.
When predicting a new sample, we simply project it onto the learned line and assign it to the class whose projection it lies nearest to.
In short: small within-class scatter, large between-class separation.
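For orientation, here is a minimal usage sketch with scikit-learn's `LinearDiscriminantAnalysis`; the toy data below is invented purely for illustration and is not part of the derivation that follows.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two Gaussian blobs in 2-D: class 1 around the origin, class 2 around (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(3.0, 1.0, (50, 2))])
y = np.array([1] * 50 + [2] * 50)

clf = LinearDiscriminantAnalysis()
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [2.5, 3.0]]))   # expected: [1 2]
```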
We take binary classification as an example.
Assume the input samples are $(X, Y)$, with $N$ samples in total and two classes $C_1$ and $C_2$:
$$X=\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{pmatrix}_{N \times p}$$
$$Y=\left(y_{1}, y_{2}, \cdots, y_{N}\right)^{\top}, \quad y_i \in\{C_1, C_2\}$$
The projection line we seek can then be written as:
$$y = w^{\top} x$$
where $w=\left(w_{1}, w_{2}, \cdots, w_{p}\right)^{\top}$.
To simplify the algebra, we assume $\|w\|=1$.
The projection of a sample point $x_i$ onto the line $y = w^{\top} x$ has length:
$$\|\vec{x}_{i}\| \cos \theta=\|\vec{x}_{i}\| \cdot\|\vec{w}\| \cdot \cos \theta=w^{\top} x_{i}$$
We take this projection length as the one-dimensional coordinate of $x_i$ on the line $y = w^{\top} x$.
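As a quick illustration, the one-dimensional coordinates of all samples are just the matrix-vector product `X @ w`; the toy values below are made up.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [0.5, 0.5]])       # toy samples, p = 2
w = np.array([1.0, 1.0])
w = w / np.linalg.norm(w)        # enforce ||w|| = 1
coords = X @ w                   # w^T x_i for every sample, shape (N,)
print(coords)                    # 1-D coordinates on the projection line
```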
We measure the between-class separation by the difference of the projected class means, and the within-class scatter by the projected variance of each class.
Projected sample mean: $$\bar{y}=\frac{1}{N} \sum_{i=1}^{N} w^{\top} x_{i}$$
Projected sample variance: $$S=\frac{1}{N} \sum_{i=1}^{N}\left(w^{\top} x_{i}-\bar{y}\right)\left(w^{\top} x_{i}-\bar{y}\right)^{\top}$$
Between-class separation: $\left(\bar{y}_{1}-\bar{y}_{2}\right)^{2}$
Within-class scatter: $S_1+S_2$
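These projected quantities are easy to compute for any candidate direction. A small sketch follows; the helper name `projected_stats` and the per-class arrays `X1`, `X2` are my own notation, not part of the original derivation.

```python
import numpy as np

def projected_stats(X1, X2, w):
    """Projected class means and variances for a given direction w."""
    y1, y2 = X1 @ w, X2 @ w                 # 1-D coordinates of each class
    ybar1, ybar2 = y1.mean(), y2.mean()     # projected class means  \bar{y}_1, \bar{y}_2
    S1, S2 = y1.var(), y2.var()             # projected variances S_1, S_2 (1/N convention)
    return ybar1, ybar2, S1, S2
```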
The objective function is:
$$J(w)=\frac{\left(\bar{y}_{1}-\bar{y}_{2}\right)^{2}}{S_{1}+S_{2}}$$
$$\widehat{w}=\arg \max _{w} J(w)=\arg \max _{w} \frac{\left(\bar{y}_{1}-\bar{y}_{2}\right)^{2}}{S_{1}+S_{2}}$$
where:
$$\begin{aligned} \left(\bar{y}_{1}-\bar{y}_{2}\right)^{2} & = \left(\frac{1}{N_{1}} \sum_{i=1}^{N_{1}} w^{\top} x_{i}-\frac{1}{N_{2}} \sum_{i=1}^{N_{2}} w^{\top} x_{i}\right)^{2} \\ & =\left(w^{\top}\left(\frac{1}{N_{1}} \sum_{i=1}^{N_{1}} x_{i}-\frac{1}{N_{2}} \sum_{i=1}^{N_{2}} x_{i}\right)\right)^{2} \\ & =w^{\top}\left(\bar{x}_{c_{1}}-\bar{x}_{c_2}\right)\left(\bar{x}_{c_1}-\bar{x}_{c_2}\right)^{\top} w \end{aligned}$$
$$\begin{aligned} S_{1} &=\frac{1}{N_{1}} \sum_{i=1}^{N_{1}}\left(w^{\top} x_{i}-\bar{y}_{1}\right)\left(w^{\top} x_{i}-\bar{y}_{1}\right)^{\top} \\ &=\frac{1}{N_{1}} \sum_{i=1}^{N_{1}}\left(w^{\top} x_{i}-\frac{1}{N_{1}} \sum_{j=1}^{N_{1}} w^{\top} x_{j}\right)\left(w^{\top} x_{i}-\frac{1}{N_{1}} \sum_{j=1}^{N_{1}} w^{\top} x_{j}\right)^{\top} \\ &=\frac{1}{N_{1}} \sum_{i=1}^{N_{1}} w^{\top}\left(x_{i}-\bar{x}_{c_1}\right)\left(x_{i}-\bar{x}_{c_1}\right)^{\top} w \\ &=w^{\top}\left[\frac{1}{N_{1}} \sum_{i=1}^{N_{1}}\left(x_{i}-\bar{x}_{c_1}\right)\left(x_{i}-\bar{x}_{c_1}\right)^{\top}\right] w \\ &=w^{\top} S_{C_1} w \end{aligned}$$
Similarly, $S_2 = w^{\top} S_{C_2} w$.
Hence:
$$\begin{aligned} J(w)&= \frac{\left(\bar{y}_{1}-\bar{y}_{2}\right)^{2}}{S_{1}+S_{2}} \\ & = \frac{w^{\top}\left(\bar{x}_{c_{1}}-\bar{x}_{c_2}\right)\left(\bar{x}_{c_1}-\bar{x}_{c_2}\right)^{\top} w}{w^{\top}\left(S_{C_1}+S_{C_2}\right) w} \end{aligned}$$
Since $\left(\bar{x}_{c_{1}}-\bar{x}_{c_2}\right)\left(\bar{x}_{c_1}-\bar{x}_{c_2}\right)^{\top}$ and $S_{C_1}+S_{C_2}$ do not depend on $w$, we define, for convenience, the between-class scatter matrix $S_{b}=\left(\bar{x}_{c_{1}}-\bar{x}_{c_2}\right)\left(\bar{x}_{c_1}-\bar{x}_{c_2}\right)^{\top}$ and the within-class scatter matrix $S_{w}=S_{C_{1}}+S_{C_{2}}$.
Then $$J(w) = \frac{w^{\top} S_{b} w}{w^{\top} S_{w} w}$$
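In this matrix form, $S_b$ and $S_w$ depend only on the data, so $J(w)$ can be evaluated cheaply for any direction. A sketch follows; the helper names and the per-class arrays `X1`, `X2` are illustrative assumptions.

```python
import numpy as np

def scatter_matrices(X1, X2):
    """Between-class (S_b) and within-class (S_w) scatter matrices."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sb = np.outer(m1 - m2, m1 - m2)                           # (x̄_c1 - x̄_c2)(x̄_c1 - x̄_c2)^T
    Sw = np.cov(X1.T, bias=True) + np.cov(X2.T, bias=True)    # S_C1 + S_C2, 1/N normalisation
    return Sb, Sw

def J(w, Sb, Sw):
    """Fisher criterion J(w) = (w^T S_b w) / (w^T S_w w)."""
    return (w @ Sb @ w) / (w @ Sw @ w)
```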
Taking the derivative with respect to $w$ and setting it to zero:
$$\begin{aligned} \frac{\partial J(w)}{\partial w} &=\frac{\partial}{\partial w}\left[w^{\top} S_{b} w\left(w^{\top} S_{w} w\right)^{-1}\right] \\ &=2 S_{b} w\left(w^{\top} S_{w} w\right)^{-1}-w^{\top} S_{b} w\left(w^{\top} S_{w} w\right)^{-2} \cdot 2 S_{w} w=0 \end{aligned}$$
Multiplying both sides by $\left(w^{\top} S_{w} w\right)^{2}$ (and dividing by 2) gives:
$$S_{b} w\left(w^{\top} S_{w} w\right)-w^{\top} S_{b} w \cdot S_{w} w=0$$
$$S_{b} w\left(w^{\top} S_{w} w\right)=\left(w^{\top} S_{b} w\right) S_{w} w$$
Here $S_b$ and $S_w$ are both $p \times p$ matrices while $w$ is $p \times 1$,
so $w^{\top} S_{w} w$ and $w^{\top} S_{b} w$ are both scalars; their ratio is therefore just a constant, which we denote by $C$. Then
$$S_{w} w=\frac{w^{\top} S_{w} w}{w^{\top} S_{b} w} S_{b} w$$
$$w=C\, S_{w}^{-1} S_{b} w$$
$$w \propto S_{w}^{-1} S_{b} w$$
$$w \propto S_{w}^{-1}\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)^{\top} w$$
Likewise, $\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)^{\top} w$ is a scalar, so it only rescales $w$.
We therefore finally obtain:
$$\begin{array}{l} w \propto S_{w}^{-1}\left(\bar{x}_{c_1}-\bar{x}_{c_{2}}\right) \\ w \propto\left(S_{C_1}+S_{C_{2}}\right)^{-1}\left(\bar{x}_{c_1}-\bar{x}_{c_2}\right) \end{array}$$
That is, we have obtained the direction of $w$; the separating hyperplane (perpendicular to this line) follows immediately, and we can carry out the classification task.
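Putting the result to work, a compact NumPy sketch of the whole two-class recipe might look like the following. The nearest-projected-mean decision rule matches the "assign to the closer class" idea above, but all function names and the toy data are illustrative assumptions, not a fixed prescription.

```python
import numpy as np

def fit_lda(X1, X2):
    """Return the LDA direction w ∝ S_w^{-1}(x̄_c1 - x̄_c2) and the class means."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = np.cov(X1.T, bias=True) + np.cov(X2.T, bias=True)   # within-class scatter S_w
    w = np.linalg.solve(Sw, m1 - m2)                         # w ∝ S_w^{-1}(x̄_c1 - x̄_c2)
    return w / np.linalg.norm(w), m1, m2

def predict(X, w, m1, m2):
    """Assign each row of X to the class whose projected mean is closer."""
    proj = X @ w
    return np.where(np.abs(proj - m1 @ w) < np.abs(proj - m2 @ w), 1, 2)

# Toy example: two Gaussian blobs.
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, (50, 2))   # class C1
X2 = rng.normal([4.0, 4.0], 1.0, (50, 2))   # class C2
w, m1, m2 = fit_lda(X1, X2)
print(predict(np.array([[0.5, 0.2], [3.8, 4.1]]), w, m1, m2))   # expected: [1 2]
```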
To be updated.