12. Softmax multiclass classification and sklearn implementation

Contents

  • 1. The softmax function
  • 2. The softmax multiclass model
  • 3. The loss function
  • 4. Parameter learning
  • 5. Multiclass classification with sklearn
    • (1) About the dataset
    • (2) Preparing the data
    • (3) Standardizing the data
    • (4) Training, prediction, and evaluation

1. The softmax function

The softmax function converts a vector of raw output values into a probability distribution:

$$f(x)=\frac{e^{x}}{\sum e^{x}}$$

so that $f(x) \in [0,1]$ and $\sum f(x)=1$.
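As a quick illustration (not part of the original derivation), here is a minimal NumPy sketch of the softmax function; subtracting the maximum is a standard trick to keep the exponentials from overflowing:

```python
import numpy as np

def softmax(x):
    """Map a score vector to a probability distribution."""
    # Subtracting the max leaves the result unchanged but
    # prevents overflow in np.exp for large scores
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())  # probabilities in [0, 1] that sum to 1
```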

2. The softmax multiclass model

Consider a multiclass task with $m$ samples, where each sample has $n$ features and a label $y \in \{1,2,\dots,C\}$ over $C$ classes. Let $\mathbf{w}_{c}\in \mathbb{R}^{1\times n}$ be the weight vector of class $c$. The probability that the $i$-th sample $\mathbf{x}^{i}\in \mathbb{R}^{n\times 1}$ has label $y^{i}=c$ is:

$$p(y^{i}=c\mid \mathbf{x}^{i})=\frac{e^{\mathbf{w}_{c}\mathbf{x}^{i}}}{\sum_{k=1}^{C}e^{\mathbf{w}_{k}\mathbf{x}^{i}}}$$

Stacking the class weights into a matrix $\mathbf{W}$, the predicted distribution $\widehat{\mathbf{y}}^{i}$ for sample $\mathbf{x}^{i}$ (to be compared against the one-hot label vector $\mathbf{y}^{i}$) is:

$$\widehat{\mathbf{y}}^{i}=\frac{e^{\mathbf{W}\mathbf{x}^{i}}}{E^{T}e^{\mathbf{W}\mathbf{x}^{i}}}$$

where $\mathbf{W}\in \mathbb{R}^{C\times n}$, $\mathbf{x}^{i}\in \mathbb{R}^{n\times 1}$, $E \in \mathbb{R}^{C\times 1}$ is the all-ones vector, and $\widehat{\mathbf{y}}^{i}\in \mathbb{R}^{C\times 1}$.
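The model above is straightforward to sketch in NumPy; the sizes and names below are illustrative, not from the original post:

```python
import numpy as np

C, n = 3, 4                          # number of classes and features (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(size=(C, n))          # weight matrix W in R^{C x n}
x = rng.normal(size=n)               # one sample x^i in R^n

z = W @ x                            # scores Wx^i, shape (C,)
y_hat = np.exp(z) / np.exp(z).sum()  # yhat^i = e^{Wx^i} / (E^T e^{Wx^i})
print(y_hat, y_hat.sum())            # a probability distribution over the C classes
```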

3. The loss function

The softmax model uses the cross-entropy loss, where $\mathbf{y}^{i}$ is the one-hot label vector:

$$J(\mathbf{W})=-\frac{1}{m}\sum_{i=1}^{m}(\mathbf{y}^{i})^{T}\log \widehat{\mathbf{y}}^{i}=-\frac{1}{m}\sum_{i=1}^{m}(\mathbf{y}^{i})^{T}\log \frac{e^{\mathbf{W}\mathbf{x}^{i}}}{E^{T}e^{\mathbf{W}\mathbf{x}^{i}}}$$
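A minimal sketch of this loss (illustrative names; `Y` and `Y_hat` hold one sample per row):

```python
import numpy as np

def cross_entropy(Y, Y_hat, eps=1e-12):
    """J(W) = -(1/m) * sum_i (y^i)^T log(yhat^i)."""
    m = Y.shape[0]
    # eps guards against log(0) for very confident wrong predictions
    return -np.sum(Y * np.log(Y_hat + eps)) / m

Y = np.array([[1, 0, 0], [0, 1, 0]])                   # one-hot labels y^i
Y_hat = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # predictions yhat^i
print(cross_entropy(Y, Y_hat))
```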

4. Parameter learning

The parameters are learned by iterative gradient descent. Consider the per-sample loss:

$$L=-(\mathbf{y}^{i})^{T}\log \frac{e^{\mathbf{W}\mathbf{x}^{i}}}{E^{T}e^{\mathbf{W}\mathbf{x}^{i}}}=-(\mathbf{y}^{i})^{T}\left[\mathbf{W}\mathbf{x}^{i}-E\log E^{T}e^{\mathbf{W}\mathbf{x}^{i}}\right]$$
where:

$$\mathbf{W}\in \mathbb{R}^{C\times n},\quad \mathbf{x}^{i}\in \mathbb{R}^{n\times 1},\quad E \in \mathbb{R}^{C\times 1},\quad \mathbf{y}^{i}\in \mathbb{R}^{C\times 1},\quad L\in \mathbb{R}$$
Since $(\mathbf{y}^{i})^{T}E=1$ (a one-hot vector sums to 1), this simplifies to:

$$L=-(\mathbf{y}^{i})^{T}\mathbf{W}\mathbf{x}^{i}+\log E^{T}e^{\mathbf{W}\mathbf{x}^{i}}$$
To obtain $\frac{\partial L}{\partial \mathbf{W}}$, the derivative of a scalar with respect to a matrix, we use matrix differentials (the product rule and the rule for elementwise functions):

$$dL=-(\mathbf{y}^{i})^{T}d\mathbf{W}\,\mathbf{x}^{i}+\frac{E^{T}(e^{\mathbf{W}\mathbf{x}^{i}}\odot d\mathbf{W}\,\mathbf{x}^{i})}{E^{T}e^{\mathbf{W}\mathbf{x}^{i}}}$$
For vectors, $E^{T}(\mathbf{U}\odot \mathbf{V})=\mathbf{U}^{T}\mathbf{V}$, so:

$$E^{T}(e^{\mathbf{W}\mathbf{x}^{i}}\odot d\mathbf{W}\,\mathbf{x}^{i})=(e^{\mathbf{W}\mathbf{x}^{i}})^{T} d\mathbf{W}\,\mathbf{x}^{i}$$
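This identity is easy to verify numerically; a throwaway check, not part of the original derivation:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
E = np.ones(3)              # the all-ones vector E
print(E @ (u * v), u @ v)   # both give 32.0: E^T(u ⊙ v) = u^T v
```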
Putting these together:

$$\begin{aligned} dL&=-(\mathbf{y}^{i})^{T}d\mathbf{W}\,\mathbf{x}^{i}+\frac{E^{T}(e^{\mathbf{W}\mathbf{x}^{i}}\odot d\mathbf{W}\,\mathbf{x}^{i})}{E^{T}e^{\mathbf{W}\mathbf{x}^{i}}} \\ &=-(\mathbf{y}^{i})^{T}d\mathbf{W}\,\mathbf{x}^{i}+\frac{(e^{\mathbf{W}\mathbf{x}^{i}})^{T}d\mathbf{W}\,\mathbf{x}^{i}}{E^{T}e^{\mathbf{W}\mathbf{x}^{i}}} \\ &=\left[-(\mathbf{y}^{i})^{T}+\frac{(e^{\mathbf{W}\mathbf{x}^{i}})^{T}}{E^{T}e^{\mathbf{W}\mathbf{x}^{i}}}\right]d\mathbf{W}\,\mathbf{x}^{i} \\ &=\left[(\widehat{\mathbf{y}}^{i})^{T}-(\mathbf{y}^{i})^{T}\right]d\mathbf{W}\,\mathbf{x}^{i} \\ &=\operatorname{tr}\left[(\widehat{\mathbf{y}}^{i}-\mathbf{y}^{i})^{T}d\mathbf{W}\,\mathbf{x}^{i}\right] \\ &=\operatorname{tr}\left[\mathbf{x}^{i}(\widehat{\mathbf{y}}^{i}-\mathbf{y}^{i})^{T}d\mathbf{W}\right] \end{aligned}$$
Finally, matching this against $dL=\operatorname{tr}\left[\left(\frac{\partial L}{\partial \mathbf{W}}\right)^{T}d\mathbf{W}\right]$ gives the gradient:

$$\frac{\partial L}{\partial \mathbf{W}}=\left[\mathbf{x}^{i}(\widehat{\mathbf{y}}^{i}-\mathbf{y}^{i})^{T}\right]^{T}=(\widehat{\mathbf{y}}^{i}-\mathbf{y}^{i})(\mathbf{x}^{i})^{T}$$

$$\frac{\partial J}{\partial \mathbf{W}}=\frac{1}{m}\sum_{i=1}^{m}(\widehat{\mathbf{y}}^{i}-\mathbf{y}^{i})(\mathbf{x}^{i})^{T}$$
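The closed form can be sanity-checked against a finite-difference approximation of $\partial L/\partial \mathbf{W}$; the sketch below (illustrative names, not from the original post) perturbs one entry of $\mathbf{W}$ at a time:

```python
import numpy as np

def loss(W, x, y):
    # L = -(y^i)^T log softmax(Wx^i)
    z = W @ x
    p = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    return -y @ np.log(p)

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y = np.array([0.0, 1.0, 0.0])           # one-hot label

z = W @ x
y_hat = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
analytic = np.outer(y_hat - y, x)       # (yhat^i - y^i)(x^i)^T

# Central differences, one entry of W at a time
numeric = np.zeros_like(W)
h = 1e-6
for idx in np.ndindex(*W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += h
    Wm[idx] -= h
    numeric[idx] = (loss(Wp, x, y) - loss(Wm, x, y)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))  # close to zero: the two forms agree
```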
The update at iteration $k+1$ is then:

$$\mathbf{W}^{k+1}=\mathbf{W}^{k}-\lambda\,\frac{1}{m}\sum_{i=1}^{m}(\widehat{\mathbf{y}}^{i}-\mathbf{y}^{i})(\mathbf{x}^{i})^{T}$$
where:

$$\widehat{\mathbf{y}}^{i}=\frac{e^{\mathbf{W}\mathbf{x}^{i}}}{E^{T}e^{\mathbf{W}\mathbf{x}^{i}}}$$
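Putting the pieces together, a minimal batch gradient-descent trainer for this model might look like the sketch below; the step size $\lambda$, iteration count, and toy data are arbitrary choices, not from the original post:

```python
import numpy as np

def softmax_rows(Z):
    # Row-wise softmax with the max-subtraction trick
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, C, lam=0.1, iters=500):
    """X: (m, n) samples; y: (m,) integer labels in {0, ..., C-1}."""
    m, n = X.shape
    Y = np.eye(C)[y]                    # one-hot labels, shape (m, C)
    W = np.zeros((C, n))
    for _ in range(iters):
        Y_hat = softmax_rows(X @ W.T)   # yhat^i = softmax(Wx^i), shape (m, C)
        grad = (Y_hat - Y).T @ X / m    # (1/m) sum_i (yhat^i - y^i)(x^i)^T
        W -= lam * grad                 # W^{k+1} = W^k - lambda * dJ/dW
    return W

# Tiny usage example on random blobs
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=2.0 * c, size=(20, 2)) for c in range(3)])
y = np.repeat(np.arange(3), 20)
W = fit_softmax(X, y, C=3)
pred = softmax_rows(X @ W.T).argmax(axis=1)
print((pred == y).mean())               # training accuracy
```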

5. Multiclass classification with sklearn

(1) About the dataset

The iris dataset is a classic benchmark. It contains 150 records in 3 classes, 50 per class, and each record has 4 features: sepal length, sepal width, petal length, and petal width. The task is to predict which of the 3 species an iris belongs to from these 4 features.

(2) Preparing the data

Load the dataset and split it into training and test sets:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the iris dataset: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
x, y = iris.data, iris.target

# Split into training and test sets (default test_size=0.25)
x_train, x_test, y_train, y_test = train_test_split(x, y)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```

Output:

(112, 4) (38, 4) (112,) (38,)

(3) Standardizing the data

```python
from sklearn.preprocessing import StandardScaler

std = StandardScaler()
# Fit the scaler on the training set only, then apply the same
# statistics to the test set (calling fit_transform on the test
# set would leak test-set statistics into the preprocessing)
x_train = std.fit_transform(x_train)
x_test = std.transform(x_test)
```

(4) Training, prediction, and evaluation

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# With the default lbfgs solver, LogisticRegression fits a
# multinomial (softmax) model for multiclass targets
model = LogisticRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print(accuracy_score(y_test, y_pred))  # accuracy_score(y_true, y_pred)
```

Output:

0.9736842105263158
