LDA (Fisher) Linear Discriminant Analysis

LDA (Linear Discriminant Analysis) is a classic linear classification method, also known as Fisher discriminant analysis. The idea is simple: given a set of training examples, find a projection of the examples onto a one-dimensional line such that projections of same-class examples lie as close together as possible (i.e., the within-class scatter should be small), while projections of different classes lie as far apart as possible (i.e., the distance between the two projected class means should be as large as possible).


The centroids of the two classes are $\mu_{1}=\frac{1}{|C_{1}|}\sum_{x\in C_{1}}x$ and $\mu_{2}=\frac{1}{|C_{2}|}\sum_{x\in C_{2}}x$.
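
As a quick numeric check, the centroids can be computed directly with numpy (a minimal sketch, reusing the toy data from the sklearn example at the end of this post):

import numpy as np

# toy data: class 1 in the lower-left quadrant, class 2 in the upper-right
X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [1., 1.], [2., 1.], [3., 2.]])
y = np.array([1, 1, 1, 2, 2, 2])

mu1 = X[y == 1].mean(axis=0)  # centroid of C1: [-2., -1.333...]
mu2 = X[y == 2].mean(axis=0)  # centroid of C2: [ 2.,  1.333...]
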
Projecting a sample point $x$ onto the direction $w$ gives a point on the one-dimensional line: $y=w^{T}x$.
The projected centroids are: $m_{k}=\frac{1}{|C_{k}|}\sum_{x\in C_{k}}w^{T}x=w^{T}\frac{1}{|C_{k}|}\sum_{x\in C_{k}}x=w^{T}\mu_{k}$.
The squared distance between the projected centroids is: $(m_{1}-m_{2})^{2}=(m_{1}-m_{2})(m_{1}-m_{2})^{T}=w^{T}(\mu_{1}-\mu_{2})(\mu_{1}-\mu_{2})^{T}w=w^{T}S_{b}w$,
where $S_{b}$ is called the between-class scatter matrix: $S_{b}=(\mu_{1}-\mu_{2})(\mu_{1}-\mu_{2})^{T}$.
The within-class distance is measured by the variance of the projected samples within each class. For the $k$-th class: $S_{k}^{2}=\sum_{x\in C_{k}}(y-m_{k})^{2}=\sum_{x\in C_{k}}\left(w^{T}(x-\mu_{k})\right)^{2}=\sum_{x\in C_{k}}\left(w^{T}(x-\mu_{k})\right)\left(w^{T}(x-\mu_{k})\right)^{T}=\sum_{x\in C_{k}}w^{T}(x-\mu_{k})(x-\mu_{k})^{T}w=w^{T}\left[\sum_{x\in C_{k}}(x-\mu_{k})(x-\mu_{k})^{T}\right]w$
Summing the within-class distances over both classes: $S_{1}^{2}+S_{2}^{2}=w^{T}\left[\sum_{x\in C_{1}}(x-\mu_{1})(x-\mu_{1})^{T}+\sum_{x\in C_{2}}(x-\mu_{2})(x-\mu_{2})^{T}\right]w$,
so the within-class scatter matrix is: $S_{w}=\sum_{x\in C_{1}}(x-\mu_{1})(x-\mu_{1})^{T}+\sum_{x\in C_{2}}(x-\mu_{2})(x-\mu_{2})^{T}$.
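
In numpy, both scatter matrices can be built as matrix products of the centered data, which is equivalent to the sums of outer products above (a minimal sketch continuing the toy data):

import numpy as np

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [1., 1.], [2., 1.], [3., 2.]])
y = np.array([1, 1, 1, 2, 2, 2])
mu1, mu2 = X[y == 1].mean(axis=0), X[y == 2].mean(axis=0)

d = (mu1 - mu2).reshape(-1, 1)
S_b = d @ d.T                    # between-class scatter (mu1 - mu2)(mu1 - mu2)^T

X1c = X[y == 1] - mu1            # centered class-1 samples
X2c = X[y == 2] - mu2            # centered class-2 samples
S_w = X1c.T @ X1c + X2c.T @ X2c  # within-class scatter: sum of outer products
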

Our objective is to increase the between-class distance while reducing the within-class distance, so we maximize: $J(w)=\frac{(m_{1}-m_{2})^{2}}{S_{1}^{2}+S_{2}^{2}}=\frac{w^{T}S_{b}w}{w^{T}S_{w}w}$
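
As discussed next, $J$ depends only on the direction of $w$; this is easy to verify numerically, since scaling $w$ leaves the ratio unchanged (a minimal sketch using the scatter matrices built above):

import numpy as np

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [1., 1.], [2., 1.], [3., 2.]])
y = np.array([1, 1, 1, 2, 2, 2])
mu1, mu2 = X[y == 1].mean(axis=0), X[y == 2].mean(axis=0)
d = (mu1 - mu2).reshape(-1, 1)
S_b = d @ d.T
X1c, X2c = X[y == 1] - mu1, X[y == 2] - mu2
S_w = X1c.T @ X1c + X2c.T @ X2c

def J(w):
    # Fisher criterion: between-class over within-class scatter along w
    return (w @ S_b @ w) / (w @ S_w @ w)

w = np.array([0.3, 1.0])   # an arbitrary direction
print(J(w), J(5 * w))      # identical: J depends only on the direction of w
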
Indeed, $J$ depends only on the direction of $w$: once the direction is fixed, the length of $w$ is irrelevant, because scaling $w$ rescales the numerator and denominator by the same factor. Since the two would otherwise vary together during optimization, we first fix the denominator to a nonzero constant, $w^{T}S_{w}w=c,\ c\neq 0$. Maximizing $J(w)$ is then equivalent to: $\max_{w}\ w^{T}S_{b}w \quad \text{s.t.}\ w^{T}S_{w}w=c,\ c\neq 0$
We can now apply the method of Lagrange multipliers: $L(w,\lambda)=w^{T}S_{b}w-\lambda(w^{T}S_{w}w-c)$
Setting the gradient to zero, and using the fact that $S_{b}$ and $S_{w}$ are symmetric: $\frac{\partial L(w,\lambda)}{\partial w}=(S_{b}+S_{b}^{T})w-\lambda(S_{w}+S_{w}^{T})w=2S_{b}w-2\lambda S_{w}w=0$
which simplifies to:
$S_{w}^{-1}S_{b}w=\lambda w$
i.e., $w$ is an eigenvector of $S_{w}^{-1}S_{b}$.
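
A standard eigensolver can therefore also find the optimal direction (a minimal sketch; for two classes the closed form derived below is simpler, but this eigenvalue view is what generalizes to more than two classes):

import numpy as np

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [1., 1.], [2., 1.], [3., 2.]])
y = np.array([1, 1, 1, 2, 2, 2])
mu1, mu2 = X[y == 1].mean(axis=0), X[y == 2].mean(axis=0)
d = (mu1 - mu2).reshape(-1, 1)
S_b = d @ d.T
X1c, X2c = X[y == 1] - mu1, X[y == 2] - mu2
S_w = X1c.T @ X1c + X2c.T @ X2c

M = np.linalg.inv(S_w) @ S_b
eigvals, eigvecs = np.linalg.eig(M)
w = eigvecs[:, np.argmax(eigvals.real)].real  # eigenvector of the largest eigenvalue
print(w)  # proportional to S_w^{-1}(mu1 - mu2)
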
Note that $S_{b}w=(\mu_{1}-\mu_{2})(\mu_{1}-\mu_{2})^{T}w=\beta(\mu_{1}-\mu_{2})$, where $\beta=(\mu_{1}-\mu_{2})^{T}w$ is a scalar. This shows that $S_{b}w$ always points in the direction $\mu_{1}-\mu_{2}$; substituting into the equation above gives:
$w=\frac{\beta}{\lambda}S_{w}^{-1}(\mu_{1}-\mu_{2})$
Since only the direction of $w$ matters and not its length, this can be written as:
$w=S_{w}^{-1}(\mu_{1}-\mu_{2})$
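
Putting the pieces together gives a complete two-class Fisher discriminant in a few lines. Classifying a new point by the nearest projected centroid is one common decision rule; the derivation above only fixes the direction, so this rule is an assumption of the sketch:

import numpy as np

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [1., 1.], [2., 1.], [3., 2.]])
y = np.array([1, 1, 1, 2, 2, 2])
mu1, mu2 = X[y == 1].mean(axis=0), X[y == 2].mean(axis=0)
X1c, X2c = X[y == 1] - mu1, X[y == 2] - mu2
S_w = X1c.T @ X1c + X2c.T @ X2c

w = np.linalg.inv(S_w) @ (mu1 - mu2)  # Fisher direction, defined up to scale

x_new = np.array([-0.8, -1.0])
proj, m1, m2 = w @ x_new, w @ mu1, w @ mu2
# assign the class whose projected centroid is nearest (assumed decision rule)
pred = 1 if abs(proj - m1) < abs(proj - m2) else 2
print(pred)  # 1, matching the sklearn example below
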
For numerical stability, in practice $S_{w}$ is usually inverted via its singular value decomposition: $S_{w}=U\Sigma V^{T}$, so that $S_{w}^{-1}=V\Sigma^{-1}U^{T}$. For background on the SVD, see: https://blog.csdn.net/winycg/article/details/83005881
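
A minimal sketch of the SVD route (full inversion here; in practice near-zero singular values would be dropped or clipped, which is essentially what np.linalg.pinv does):

import numpy as np

# within-class scatter of the toy data used throughout this post
S_w = np.array([[4., 2.], [2., 4. / 3.]])

U, s, Vt = np.linalg.svd(S_w)
S_w_inv = Vt.T @ np.diag(1.0 / s) @ U.T          # S_w^{-1} = V Sigma^{-1} U^T
print(np.allclose(S_w_inv, np.linalg.inv(S_w)))  # True
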

Fisher LDA with sklearn's LinearDiscriminantAnalysis:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# same toy data as above: two classes of three 2-D points each
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
# solver='svd' (the default) works via an SVD rather than an explicit inverse
clf = LinearDiscriminantAnalysis(solver='svd')
clf.fit(X, y)
print(clf.predict([[-0.8, -1]]))  # [1]
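
As a sanity check connecting the derivation to the library: up to scale and sign, the direction sklearn learns should coincide with the closed form $S_{w}^{-1}(\mu_{1}-\mu_{2})$ (a sketch, comparing directions via their cosine rather than relying on sklearn's internal scaling):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [1., 1.], [2., 1.], [3., 2.]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = LinearDiscriminantAnalysis(solver='svd').fit(X, y)

mu1, mu2 = X[y == 1].mean(axis=0), X[y == 2].mean(axis=0)
X1c, X2c = X[y == 1] - mu1, X[y == 2] - mu2
S_w = X1c.T @ X1c + X2c.T @ X2c
w = np.linalg.inv(S_w) @ (mu1 - mu2)

c = clf.coef_.ravel()
cos = abs(w @ c) / (np.linalg.norm(w) * np.linalg.norm(c))
print(cos)  # ~1.0: the two directions are collinear
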
