LDA (Linear Discriminant Analysis): Derivation of the Principle and a Python Implementation

LDA (Linear Discriminant Analysis) is a supervised classification algorithm. It is commonly used for dimensionality reduction in data preprocessing or directly for classification tasks.

Objective
The goal of LDA is to find the axis components that maximize the separability between classes, projecting the feature space (the multi-dimensional samples of the dataset) onto a smaller k-dimensional subspace while preserving the information that distinguishes the classes.

Principle
Project the original data onto a lower-dimensional space so that the projected points form clusters grouped by class; points of the same class end up closer to one another in the projected space.

Difference from PCA
First, LDA is supervised. Beyond that, the biggest difference is that LDA looks for a different kind of direction: it cares about class separability rather than variance.
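
As a quick illustration of this difference, here is a minimal sketch (it uses scikit-learn's bundled Iris loader, which is not part of the implementation below) that projects the same data with PCA and with LDA; PCA ignores the labels and maximizes projected variance, while LDA uses the labels to maximize class separability:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA is unsupervised: it only sees X and keeps the directions of largest variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: it also sees y and keeps the directions that best separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca[:3])
print(X_lda[:3])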

Derivation of the objective function
The goal of LDA is to find the $w$ in $y = w^T x$, i.e. the projection direction.

One goal of the LDA classifier is to make the distance between different classes as large as possible and the distance within a class as small as possible, so both the within-class and the between-class distances need to be quantified.

The mean of the samples in class $i$ is
$$\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x$$
After projection, the mean of each class becomes
$$\tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y = \frac{1}{N_i} \sum_{x \in \omega_i} w^T x = w^T \mu_i$$
To push the projected class centers as far apart as possible, set
$$J(w) = \left| \tilde{\mu}_1 - \tilde{\mu}_2 \right| = \left| w^T (\mu_1 - \mu_2) \right|$$
Maximizing $J(w)$ alone only makes the between-class distance large; it does nothing to make the within-class distance small, so additionally define
$$\tilde{s}_i^2 = \sum_{y \in \omega_i} \left( y - \tilde{\mu}_i \right)^2$$
Combining $J(w)$ and $\tilde{s}_i^2$, the objective function is redefined as
$$J(w) = \frac{\left| \tilde{\mu}_1 - \tilde{\mu}_2 \right|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$$
Here $\tilde{s}_i^2$ is called the scatter: the smaller it is, the more concentrated the sample points of that class are; the larger it is, the more spread out they are. Expanding it gives
$$\tilde{s}_i^2 = \sum_{y \in \omega_i} \left( y - \tilde{\mu}_i \right)^2 = \sum_{x \in \omega_i} \left( w^T x - w^T \mu_i \right)^2 = \sum_{x \in \omega_i} w^T (x - \mu_i)(x - \mu_i)^T w$$
The middle factor is called the scatter matrix of class $i$:
$$S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$$
Define the within-class scatter matrix before projection as $S_W = S_1 + S_2$.
After projection the scatter of each class is then $\tilde{s}_i^2 = w^T S_i w$, so the total within-class scatter, i.e. the denominator of the objective function, is
$$\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_W w$$
Expanding the numerator of the objective function:
$$\left( \tilde{\mu}_1 - \tilde{\mu}_2 \right)^2 = \left( w^T \mu_1 - w^T \mu_2 \right)^2 = w^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T w = w^T S_B w$$
where $(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$ is called the between-class scatter matrix $S_B$.

So the objective function finally simplifies to
$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$
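
The matrix form above can be checked numerically. The following is a minimal sketch on synthetic two-class data (the samples and the direction w are made up purely for illustration); it verifies that J(w) computed from the projected samples equals the matrix expression:

import numpy as np

rng = np.random.RandomState(0)
X1 = rng.randn(50, 2)                          # class 1 samples
X2 = rng.randn(60, 2) + np.array([3.0, 1.0])   # class 2 samples, shifted away from class 1
w = np.array([1.0, 2.0])                       # an arbitrary projection direction

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - mu1).T.dot(X1 - mu1)                # per-class scatter matrices
S2 = (X2 - mu2).T.dot(X2 - mu2)
S_W = S1 + S2                                  # within-class scatter matrix
S_B = np.outer(mu1 - mu2, mu1 - mu2)           # between-class scatter matrix

# J(w) computed directly from the projected samples
y1, y2 = X1.dot(w), X2.dot(w)
J_direct = (y1.mean() - y2.mean()) ** 2 / (((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum())

# J(w) computed from the matrix form derived above
J_matrix = w.dot(S_B).dot(w) / w.dot(S_W).dot(w)

print(J_direct, J_matrix)  # the two values agree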

Solving the objective function
Since the objective function is a ratio, either the numerator or the denominator has to be fixed first; otherwise any rescaling of $w$ gives the same value and there are infinitely many solutions.
Here the denominator is fixed to 1 (which also acts as a normalization),
so the problem becomes
$$\max_w \; w^T S_B w \quad \text{s.t.} \quad w^T S_W w = 1$$
Solve it with a Lagrange multiplier:
$$c(w) = w^T S_B w - \lambda \left( w^T S_W w - 1 \right)$$
$$\Rightarrow \frac{dc}{dw} = 2 S_B w - 2 \lambda S_W w = 0$$
$$\Rightarrow S_B w = \lambda S_W w$$
Multiplying both sides by the inverse of $S_W$ gives
$$S_W^{-1} S_B w = \lambda w$$
Therefore $w$ is an eigenvector of the matrix $S_W^{-1} S_B$, and maximizing $J(w)$ amounts to taking the eigenvector(s) with the largest eigenvalue(s).
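
In practice the product $S_W^{-1} S_B$ can be formed explicitly, which is what the implementation below does with np.linalg.inv, or the generalized eigenvalue problem $S_B w = \lambda S_W w$ can be handed to a solver directly. A minimal sketch of the latter, assuming SciPy is available (the 2x2 scatter matrices here are toy values used only to show the call):

import numpy as np
from scipy.linalg import eigh

S_W = np.array([[2.0, 0.3], [0.3, 1.0]])   # toy within-class scatter (symmetric positive definite)
S_B = np.array([[1.0, 0.5], [0.5, 0.25]])  # toy between-class scatter (symmetric)

# eigh solves S_B w = lambda * S_W w; eigenvalues are returned in ascending order
eigvals, eigvecs = eigh(S_B, S_W)
w = eigvecs[:, -1]                         # eigenvector with the largest eigenvalue = best projection direction
print(eigvals[-1], w)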

feature_dict = {i: label for i, label in
                zip(range(4), ("sepal length in cm", "sepal width in cm", "petal length in cm", "petal width in cm"))}
import pandas as pd
# Read the data---------------------------------------------------------------------------------------------------------------
df = pd.read_csv("iris.data", header=None, sep=",")
df.columns = [l for i, l in sorted(feature_dict.items())] + ["class label"]
df.dropna(how="all", inplace=True)
print(df.tail())
from sklearn.preprocessing import LabelEncoder  # LabelEncoder converts the string class labels into integers

X = df[["sepal length in cm", "sepal width in cm", "petal length in cm", "petal width in cm"]].values
y = df["class label"].values

enc = LabelEncoder()
enc.fit(y)
y = enc.transform(y) + 1
print(y)






# Compute S_W---------------------------------------------------------------------------------------------------------------
import numpy as np

np.set_printoptions(precision=4)
mean_vectors = []
for cl in range(1, 4):
    mean_vectors.append(np.mean(X[y == cl], axis=0))
    print("%s mean is %s" % (cl, mean_vectors))

S_W = np.zeros((4, 4))  # within-class scatter matrix: sum of the per-class scatter matrices
for cl, mv in zip(range(1, 4), mean_vectors):
    class_sc_mat = np.zeros((4, 4))
    for row in X[y == cl]:
        row, mv = row.reshape(4, 1), mv.reshape(4, 1)
        class_sc_mat += (row - mv).dot((row - mv).T)
    S_W += class_sc_mat







# Compute S_B---------------------------------------------------------------------------------------------------------------
overall_mean = np.mean(X, axis=0)

S_B = np.zeros((4, 4))
for i, mean_vec in enumerate(mean_vectors):
    n = X[y == i + 1].shape[0]
    mean_vec = mean_vec.reshape(4, 1)
    overall_mean = overall_mean.reshape(4, 1)
    S_B += n * (mean_vec - overall_mean).dot((mean_vec - overall_mean).T)








# Compute eigenvalues and eigenvectors---------------------------------------------------------------------------------------------------------------
eigen_vals, eigen_vecs = np.linalg.eig(np.linalg.inv(S_W).dot(S_B))
eigen_pairs = [(np.abs(eigen_vals[i]), eigen_vecs[:, i]) for i in range(len(eigen_vals))]
eigen_pairs = sorted(eigen_pairs, key=lambda k: k[0], reverse=True)
for eigen_pair in eigen_pairs:
    print(eigen_pair[0])
print(eigen_pairs)





# Plot the proportion of each eigenvalue---------------------------------------------------------------------------------------------------------------
tot = sum(eigen_vals.real)
discr = [(i / tot) for i in sorted(eigen_vals.real, reverse=True)]
cum_discr = np.cumsum(discr)
import matplotlib.pyplot as plt

plt.bar(range(1, 5), discr, alpha=0.5, align='center', label='individual discriminability')
plt.step(range(1, 5), cum_discr, where='mid', label='cumulative discriminability')
plt.ylabel('discriminability ratio')
plt.xlabel('Linear Discriminants')
plt.ylim([-0.1, 1.1])
plt.legend(loc='best')
plt.show()






# Horizontally stack the top eigenvectors to get W---------------------------------------------------------------------------------------------------------------
W = np.hstack((eigen_pairs[0][1][:, np.newaxis], eigen_pairs[1][1][:, np.newaxis]))
print(W)





# Project the original data with W to get the new data--------------------------------------------------------------------------------------------------------------
X_new = X.dot(W)





# Plot the projected data--------------------------------------------------------------------------------------------------------------
colors = ['r', 'b', 'g']
markers = ['s', 'x', 'o']
for l, c, m in zip(np.unique(y), colors, markers):
    plt.scatter(X_new[y == l, 0], X_new[y == l, 1] * (-1), c=c, label=l, marker=m)  # eigenvector signs are arbitrary; flip LD 2 for a consistent orientation
plt.xlabel('LD 1')
plt.ylabel('LD 2')
plt.legend(loc='lower right')
plt.show()

# Compare with sklearn's built-in implementation--------------------------------------------------------------------------------------------------------------
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis(n_components=2)
X_new = lda.fit_transform(X,y)
for l, c, m in zip(np.unique(y), colors, markers):
    plt.scatter(X_new[y == l, 0], X_new[y == l, 1] , c=c, label=l, marker=m)
plt.show()
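
# As a sanity check against the manual derivation (a small sketch; it assumes the
# lda, eigen_pairs and tot variables defined above are still in scope), sklearn's
# explained_variance_ratio_ should roughly match the manually computed eigenvalue ratios.
print([val / tot for val, vec in eigen_pairs[:2]])
print(lda.explained_variance_ratio_)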
