Linear Discriminant Analysis

Table of Contents

  • Linear Discriminant Analysis
    • Core Idea
    • Mathematical Derivation
    • Algorithm Implementation

Core Idea

Linear Discriminant Analysis (LDA) is a classic classification algorithm. Its core idea is to project the training data onto a line chosen so that:

  • Among the projected points on this line, samples of the same class are as close together as possible (within-class closeness).
  • Among the projected points on this line, samples of different classes are as far apart as possible (between-class separation).

To predict a new sample, we project it onto the learned line in the same way and assign it to whichever class its projected point is closer to.

It all boils down to one phrase: close within classes, far between classes.

Mathematical Derivation

We take binary classification as an example.

Suppose the input samples are $(X, Y)$, with $N$ samples in total and two classes $C_1$ and $C_2$:

$$X=\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{pmatrix}_{N \times p}$$

$$Y=\left(y_{1}, y_{2}, \cdots, y_{N}\right)^{\top}, \quad y_i \in\{C_1, C_2\}$$

The line we seek can then be expressed as:

$$y = w^{\top} x$$

where $w=\left(w_{1}, w_{2}, \cdots, w_{p}\right)^{\top}$.

For convenience, we assume $\|w\|=1$.

[Figure 1: projecting a sample point onto the line $y = w^{\top} x$]

The projected length of a sample point $x_i$ onto the line $y = w^{\top} x$ is:

$$\|\vec{x}_{i}\| \cdot \cos \theta=\|\vec{x}_{i}\| \cdot\|\vec{w}\| \cdot \cos \theta=w^{\top} x_{i}$$
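As a quick sanity check of this identity, here is a minimal NumPy sketch (all values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

x_i = rng.normal(size=3)      # a sample point x_i
w = rng.normal(size=3)
w /= np.linalg.norm(w)        # enforce ||w|| = 1, as assumed above

# cos(theta) between x_i and w
cos_theta = (x_i @ w) / (np.linalg.norm(x_i) * np.linalg.norm(w))

# ||x_i|| * cos(theta) equals w^T x_i when ||w|| = 1
print(np.isclose(np.linalg.norm(x_i) * cos_theta, w @ x_i))  # True
```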

We take this projected length as the one-dimensional coordinate of the sample point $x_i$ on the line $y = w^{\top} x$.

We measure the between-class distance using the means of the projected samples, and the within-class scatter using their variances.

Sample mean: $\bar{y}=\frac{1}{N} \sum_{i=1}^{N} w^{\top} x_{i}$

Sample variance: $S=\frac{1}{N} \sum_{i=1}^{N}\left(w^{\top} x_{i}-\bar{y}\right)\left(w^{\top} x_{i}-\bar{y}\right)^{\top}$

Between-class distance: $\left(\bar{y}_{1}-\bar{y}_{2}\right)^{2}$

Within-class scatter: $S_1+S_2$

The objective function is:

$$J(w)=\frac{\left(\bar{y}_{1}-\bar{y}_{2}\right)^{2}}{S_{1}+S_{2}}$$

$$\widehat{w}=\arg \max _{w} J(w)=\arg \max _{w} \frac{\left(\bar{y}_{1}-\bar{y}_{2}\right)^{2}}{S_{1}+S_{2}}$$

where:

$$\begin{aligned} \left(\bar{y}_{1}-\bar{y}_{2}\right)^{2} &= \left(\frac{1}{N_{1}} \sum_{i=1}^{N_{1}} w^{\top} x_{i}-\frac{1}{N_{2}} \sum_{i=1}^{N_{2}} w^{\top} x_{i}\right)^{2} \\ &=\left(w^{\top}\left(\frac{1}{N_{1}} \sum_{i=1}^{N_{1}} x_{i}-\frac{1}{N_{2}} \sum_{i=1}^{N_{2}} x_{i}\right)\right)^{2} \\ &=w^{\top}\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)^{\top} w \end{aligned}$$

$$\begin{aligned} S_{1} &=\frac{1}{N_{1}} \sum_{i=1}^{N_{1}}\left(w^{\top} x_{i}-\bar{y}_{1}\right)\left(w^{\top} x_{i}-\bar{y}_{1}\right)^{\top} \\ &=\frac{1}{N_{1}} \sum_{i=1}^{N_{1}}\left(w^{\top} x_{i}-\frac{1}{N_{1}} \sum_{j=1}^{N_{1}} w^{\top} x_{j}\right)\left(w^{\top} x_{i}-\frac{1}{N_{1}} \sum_{j=1}^{N_{1}} w^{\top} x_{j}\right)^{\top} \\ &=\frac{1}{N_{1}} \sum_{i=1}^{N_{1}} w^{\top}\left(x_{i}-\bar{x}_{c_{1}}\right)\left(x_{i}-\bar{x}_{c_{1}}\right)^{\top} w \\ &=w^{\top}\left[\frac{1}{N_{1}} \sum_{i=1}^{N_{1}}\left(x_{i}-\bar{x}_{c_{1}}\right)\left(x_{i}-\bar{x}_{c_{1}}\right)^{\top}\right] w \\ &=w^{\top} S_{C_{1}} w \end{aligned}$$

Similarly, $S_2 = w^{\top} S_{C_2} w$.
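The two identities above can also be checked numerically; here is a small sketch on synthetic two-class data (names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(50, 3))   # class C1 samples, shape (N1, p)
X2 = rng.normal(loc=2.0, size=(40, 3))   # class C2 samples, shape (N2, p)
w = rng.normal(size=3)                   # any projection direction

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Between-class identity: (ybar_1 - ybar_2)^2 == w^T S_b w
Sb = np.outer(m1 - m2, m1 - m2)
diff = (X1 @ w).mean() - (X2 @ w).mean()
print(np.isclose(diff**2, w @ Sb @ w))   # True

# Within-class identity: S_1 == w^T S_C1 w
SC1 = (X1 - m1).T @ (X1 - m1) / len(X1)
y1 = X1 @ w
print(np.isclose(((y1 - y1.mean())**2).mean(), w @ SC1 @ w))  # True
```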

Hence:

$$\begin{aligned} J(w)&= \frac{\left(\bar{y}_{1}-\bar{y}_{2}\right)^{2}}{S_{1}+S_{2}} \\ &= \frac{w^{\top}\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)^{\top} w}{w^{\top}\left(S_{C_{1}}+S_{C_{2}}\right) w} \end{aligned}$$

Here $\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)^{\top}$ and $S_{C_{1}}+S_{C_{2}}$ do not depend on $w$. For convenience, we define $S_{b}=\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)^{\top}$ and $S_{w}=S_{C_{1}}+S_{C_{2}}$.

$$J(w) = \frac{w^{\top} S_{b} w}{w^{\top} S_{w} w}$$
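As an aside (standard LDA theory, not spelled out in the original derivation): this ratio is a generalized Rayleigh quotient, and its maximizer can equivalently be characterized through a generalized eigenvalue problem:

$$S_{b} w=\lambda S_{w} w \quad\Longleftrightarrow\quad S_{w}^{-1} S_{b} w=\lambda w \quad (\text{when } S_{w} \text{ is invertible})$$

with the optimal $w$ being the eigenvector associated with the largest eigenvalue $\lambda$. The derivative-based derivation below reaches the same conclusion.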

Taking the derivative with respect to $w$ and setting it to zero:

$$\begin{aligned} \frac{\partial J(w)}{\partial w} &=\frac{\partial}{\partial w}\left[w^{\top} S_{b} w\left(w^{\top} S_{w} w\right)^{-1}\right] \\ &=2 S_{b} w\left(w^{\top} S_{w} w\right)^{-1}+w^{\top} S_{b} w \cdot(-1)\left(w^{\top} S_{w} w\right)^{-2} \cdot 2 S_{w} w=0 \end{aligned}$$

Multiplying both sides by $\frac{1}{2}\left(w^{\top} S_{w} w\right)^{2}$ gives:

$$S_{b} w\left(w^{\top} S_{w} w\right)-w^{\top} S_{b} w \cdot S_{w} w=0$$

$$S_{b} w\left(w^{\top} S_{w} w\right)=\left(w^{\top} S_{b} w\right) S_{w} w$$

Here $S_b$ and $S_w$ are both $p \times p$ matrices, while $w$ is $p \times 1$, so $w^{\top} S_{w} w$ and $w^{\top} S_{b} w$ are both scalars. They can therefore be pulled out and absorbed into a constant, which we denote $C$. Then:

$$S_{w} w=\frac{w^{\top} S_{w} w}{w^{\top} S_{b} w} S_{b} w$$

$$w=C\, S_{w}^{-1} S_{b} w$$

$$w \propto S_{w}^{-1} S_{b} w$$

$$w \propto S_{w}^{-1}\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)^{\top} w$$

Likewise, $\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right)^{\top} w$ is a scalar constant.
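Dropping these scalar factors is harmless because $J(w)$ depends only on the direction of $w$ (a standard observation, stated here for completeness):

$$J(\alpha w)=\frac{\alpha^{2}\, w^{\top} S_{b} w}{\alpha^{2}\, w^{\top} S_{w} w}=J(w) \quad \text{for all } \alpha \neq 0$$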

So finally we obtain:

$$\begin{array}{l} w \propto S_{w}^{-1}\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right) \\ w \propto\left(S_{C_{1}}+S_{C_{2}}\right)^{-1}\left(\bar{x}_{c_{1}}-\bar{x}_{c_{2}}\right) \end{array}$$

That is, we have found the direction of $w$, and with it the classification hyperplane (perpendicular to this line), so the classification task can be carried out.
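Concretely, the nearest-projected-class rule described in the Core Idea section can be written using the projected class means $\bar{y}_1$ and $\bar{y}_2$ (one natural choice; the original text does not fix a specific rule):

$$\hat{y}(x)=\begin{cases}C_{1}, & \left|w^{\top} x-\bar{y}_{1}\right|<\left|w^{\top} x-\bar{y}_{2}\right| \\ C_{2}, & \text{otherwise}\end{cases}$$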

Algorithm Implementation

To be updated.
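In the meantime, here is a minimal NumPy sketch of binary LDA following the derivation above, paired with the nearest-projected-mean decision rule; the class and function names are my own illustrative choices, not a fixed API:

```python
import numpy as np

class BinaryLDA:
    """Minimal two-class LDA sketch: w is proportional to S_w^{-1} (mean_1 - mean_2)."""

    def fit(self, X1, X2):
        """X1, X2: samples of class C1 and C2, shapes (N1, p) and (N2, p)."""
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        # Within-class scatter S_w = S_C1 + S_C2, as defined in the derivation
        Sw = ((X1 - m1).T @ (X1 - m1) / len(X1)
              + (X2 - m2).T @ (X2 - m2) / len(X2))
        # Solve S_w w = (m1 - m2) instead of forming the inverse explicitly
        self.w = np.linalg.solve(Sw, m1 - m2)
        self.w /= np.linalg.norm(self.w)          # normalize so ||w|| = 1
        # Projected class means, used by the nearest-mean decision rule
        self.y1, self.y2 = m1 @ self.w, m2 @ self.w
        return self

    def predict(self, X):
        """Assign each row of X to the class whose projected mean is nearer."""
        y = X @ self.w
        return np.where(np.abs(y - self.y1) < np.abs(y - self.y2), 1, 2)

# Usage on synthetic data (illustrative):
rng = np.random.default_rng(42)
X1 = rng.normal(loc=0.0, size=(100, 2))
X2 = rng.normal(loc=3.0, size=(100, 2))
model = BinaryLDA().fit(X1, X2)
print(model.predict(np.array([[0.1, -0.2], [2.9, 3.1]])))  # expected: [1 2]
```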

