flyfish
Theory

One picture to show the difference between Multi-Class classification and Multi-Label classification.

Suppose there are C classes, here C = 3. If an image contains only one target, the label vector contains exactly one 1. This is Multi-Class: there is exactly one positive class and C−1 negative classes.
If there are 3 classes and an image contains several targets, the label vector contains several 1s, i.e. there can be more than one positive class. This is Multi-Label.

In code the ground truth is usually called target: the labels of an image are stored in a target vector, and a 1 at a class index means that a target of that class appears in the image.
With sun, moon and cloud at indices 0, 1 and 2:
If the label is 1, 0, 0, the image contains a sun (index 0 is set to 1).
If the label is 0, 1, 0, the image contains a moon (index 1 is set to 1).
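As a small illustration (a sketch; the variable names are mine, and the sun-plus-cloud combination is just an example):

import torch

# class indices: 0 = sun, 1 = moon, 2 = cloud
multi_class_target = torch.tensor([1., 0., 0.])# exactly one 1: only the sun is in the image
multi_label_target = torch.tensor([1., 0., 1.])# several 1s: the sun and a cloud are both in the image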
The two important arguments of CrossEntropyLoss:
Input: (N, C), where C is the number of classes and N is the mini-batch size; the entries are the scores for each class.
Target: (N), where N is the mini-batch size and 0 <= target[i] <= C−1.
The target stores class indices. With 3 classes, a target of 2 corresponds to the one-hot vector 001, and a target of 1 corresponds to 010.
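As a quick shape check (a minimal sketch; N, C and the random values are arbitrary, chosen only for illustration):

import torch
import torch.nn as nn

N, C = 4, 3# batch size and number of classes
scores = torch.randn(N, C)# Input: (N, C) raw per-class scores
labels = torch.randint(0, C, (N,))# Target: (N) class indices in [0, C-1]
print(nn.CrossEntropyLoss()(scores, labels))# a scalar loss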
The code below walks through the cross-entropy of Multi-Class classification.
Suppose we have the following data (target may also be called label or ground truth; here it is written as a one-hot vector):
| input (predicted values) | target (ground truth values) |
| --- | --- |
| 1, 2, 7 | 0, 0, 1 |
The cross-entropy loss is usually defined as follows. The log in the formula is the natural logarithm (base e); instead of the mathematical ln we write log, and the corresponding function is usually exp.
$$\operatorname{Loss}=-\sum_{i=1}^{N} \hat{y}_{i} \log \left(y_{i}\right)$$
$\hat{y}_{i}$ is the ground truth value, i.e. the labelled value, given as a one-hot vector; in code these values are usually stored in a vector called target.
Define the data:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
input = torch.tensor([[1, 2, 7]]).float()
target = torch.tensor([2]).long()# class index 2, one-hot 001
This can be computed directly with the CrossEntropyLoss API:
loss = nn.CrossEntropyLoss()
output = loss(input, target)
print(output)#tensor(0.0092)
$y_{i}$ is the predicted value, i.e. the result of softmax.
Let us first compute the softmax. As usual I list several common ways of writing it, so you can pick the notation you are most familiar with:
$$S(f_{y_i}) = \frac{e^{f_{y_i}}}{\sum_{j}e^{f_j}}$$

$$\operatorname{softmax}(x_j)=\frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}}$$

$$s\left(x_{i}\right)=\frac{e^{x_{i}}}{\sum_{j=1}^{n} e^{x_{j}}}$$

$$\operatorname{Softmax}\left(x_{i}\right)=\frac{\exp \left(x_{i}\right)}{\sum_{j} \exp \left(x_{j}\right)}$$

$$\operatorname{softmax}\left(z_{j}\right)=\frac{e^{z_{j}}}{\sum_{k=1}^{K} e^{z_{k}}} \quad \text{for } j=1, \ldots, K$$
Breaking the formula apart: suppose there are four numbers $z_1, z_2, z_3, z_4$ and we compute the softmax of one of them, $z_1$:
$$\operatorname{softmax}\left(z_{1}\right)=\frac{e^{z_{1}}}{\sum_{k=1}^{K} e^{z_{k}}}=\frac{e^{z_{1}}}{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}+e^{z_{4}}}$$
The outputs all sum to 1:
$$\frac{e^{z_{1}}}{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}+e^{z_{4}}}+\frac{e^{z_{2}}}{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}+e^{z_{4}}}+\frac{e^{z_{3}}}{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}+e^{z_{4}}}+\frac{e^{z_{4}}}{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}+e^{z_{4}}}=\frac{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}+e^{z_{4}}}{e^{z_{1}}+e^{z_{2}}+e^{z_{3}}+e^{z_{4}}}=1$$
In code this is:
def softmax1(x):
    result = np.exp(x) / np.sum(np.exp(x), axis=0)
    return result
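Continuing from this definition, a quick check with the example scores confirms that the outputs sum to 1 (a minimal sketch; numpy is imported above as np):

scores = np.array([1, 2, 7])
p = softmax1(scores)
print(p)#[0.00245611 0.00667641 0.99086747]
print(p.sum())#approximately 1.0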
A problem appears with $e^n$:
when n is fairly large, np.exp overflows and the result is [nan nan nan];
when n is very small, e.g. -100000, you get an 'invalid value encountered in true_divide' warning.
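For example (a sketch continuing from softmax1 above; the values are chosen only to trigger the failure modes just described):

big = np.array([1000., 2000., 7000.])
print(softmax1(big))#np.exp overflows -> [nan nan nan]
tiny = np.array([-100000., -100000., -100000.])
print(softmax1(tiny))#everything underflows to 0, so 0/0 -> nan with an "invalid value encountered in true_divide" warning (exact wording depends on the numpy version)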
So the formula needs to be improved.
To make the softmax numerically stable, the values in the vector are normalized by multiplying both the numerator and the denominator by a constant. The derivation is as follows:
$$\operatorname{softmax}\left(z_{j}\right)=\frac{e^{z_{j}}}{\sum_{k=1}^{N} e^{z_{k}}}=\frac{C e^{z_{j}}}{C \sum_{k=1}^{N} e^{z_{k}}}=\frac{e^{z_{j}+\log (C)}}{\sum_{k=1}^{N} e^{z_{k}+\log (C)}}$$
Choosing $\log(C) = -\max(z)$, the code becomes:
def softmax2(X):
    result = np.exp(X - np.max(X))
    return result / np.sum(result, axis=0)
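A quick check that softmax2 really is more stable (a sketch, reusing the hypothetical large values from above):

big = np.array([1000., 2000., 7000.])
print(softmax2(big))#[0. 0. 1.], no overflow thanks to the shift by np.max(X)
print(softmax2(np.array([1, 2, 7])))#[0.00245611 0.00667641 0.99086747], identical to softmax1 on well-behaved inputs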
After the softmax we compute the log-softmax, log(Softmax(x)).
The PyTorch documentation expands the expression as follows; from here on I use the same symbols as the PyTorch docs:
$$\log \operatorname{Softmax}\left(x_{i}\right)=\log \left(\frac{\exp \left(x_{i}\right)}{\sum_{j} \exp \left(x_{j}\right)}\right)$$
So the code can look like this (with an extra numpy version for comparison):
scores = np.array([1, 2, 7])
Intermediate1=np.log(softmax2(scores))
Intermediate2=torch.log(torch.tensor(softmax2(scores)))
print("Intermediate1:",Intermediate1)#[-6.00917448 -5.00917448 -0.00917448], rounded to 4 decimals this matches the tensor below
print("Intermediate2:",Intermediate2)#tensor([-6.0092, -5.0092, -0.0092], dtype=torch.float64)
loss = nn.NLLLoss()
output = loss(Intermediate2.view(1,3), target)
print(output)#tensor(0.0092)
With the built-in LogSoftmax API it can be written as:
ls = nn.LogSoftmax(dim=1)
Intermediate = ls(input)
print("Intermediate:",Intermediate)
#Intermediate: tensor([[-6.0092, -5.0092, -0.0092]])
loss = nn.NLLLoss()
output = loss(Intermediate, target)
print(output)#tensor(0.0092)
NLLLoss is short for negative log likelihood loss, and it is computed as follows. In this example the target is 2, whose one-hot representation is (0, 0, 1), so NLLLoss evaluates to
$$-(0 \times (-6.0092) + 0 \times (-5.0092) + 1 \times (-0.0092)) = 0.0092$$
The final result is 0.0092.
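The same number can be reproduced by hand as a dot product between the one-hot target and the log-softmax vector (a minimal sketch using the values printed above):

log_probs = torch.tensor([[-6.0092, -5.0092, -0.0092]])# the log-softmax values from above
one_hot = torch.tensor([[0., 0., 1.]])# target class 2 as a one-hot vector
print(-(one_hot * log_probs).sum(dim=1))#tensor([0.0092])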
A useful transformation of log-softmax:
$$\log \left(\frac{e^{x_{j}}}{\sum_{i=1}^{n} e^{x_{i}}}\right)=\log \left(e^{x_{j}}\right)-\log \left(\sum_{i=1}^{n} e^{x_{i}}\right)=x_{j}-\log \left(\sum_{i=1}^{n} e^{x_{i}}\right)$$
Defining
$$\operatorname{logSumExp}\left(x_{1} \ldots x_{n}\right)=\log \left(\sum_{i=1}^{n} e^{x_{i}}\right)$$
the code is based on the identity
$$\log \left(\operatorname{Softmax}\left(x_{j}, x_{1} \ldots x_{n}\right)\right)=x_{j}-\operatorname{logSumExp}\left(x_{1} \ldots x_{n}\right)=x_{j}-\left(\log \left(\sum_{i=1}^{n} e^{x_{i}-c}\right)+c\right)$$
The intermediate steps of logSumExp follow from the laws of logarithms:
$$e^{a} \cdot e^{b}=e^{a+b}, \qquad \log (a \cdot b)=\log (a)+\log (b)$$

$$\begin{aligned} \operatorname{logSumExp}\left(x_{1} \ldots x_{n}\right) &=\log \left(\sum_{i=1}^{n} e^{x_{i}}\right) \\ &=\log \left(\sum_{i=1}^{n} e^{x_{i}-c} e^{c}\right) \\ &=\log \left(e^{c} \sum_{i=1}^{n} e^{x_{i}-c}\right) \\ &=\log \left(\sum_{i=1}^{n} e^{x_{i}-c}\right)+\log \left(e^{c}\right) \\ &=\log \left(\sum_{i=1}^{n} e^{x_{i}-c}\right)+c \end{aligned}$$
Derived in another way, with
$$y=\operatorname{logSumExp}\left(x_{1} \ldots x_{n}\right)$$
$$\begin{aligned} y &=\log \sum_{i=1}^{n} e^{x_{i}} \\ e^{y} &=\sum_{i=1}^{n} e^{x_{i}} \\ e^{-c} e^{y} &=e^{-c} \sum_{i=1}^{n} e^{x_{i}} \\ e^{y-c} &=\sum_{i=1}^{n} e^{-c} e^{x_{i}} \\ y-c &=\log \sum_{i=1}^{n} e^{x_{i}-c} \\ y &=\log \sum_{i=1}^{n} e^{x_{i}-c}+c \end{aligned}$$
The value of c is taken to be
$$c=\max _{i} x_{i}$$
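Before the PyTorch version, here is a quick numerical check of the identity on the example scores [1, 2, 7] (a sketch; numpy is imported above as np):

x = np.array([1.0, 2.0, 7.0])
c = x.max()
print(np.log(np.sum(np.exp(x))))#about 7.0092, logSumExp computed directly
print(np.log(np.sum(np.exp(x - c))) + c)#about 7.0092, the shifted form gives the same value without the overflow risk
#subtracting this from x gives the log-softmax values [-6.0092, -5.0092, -0.0092] seen above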
The code can then be written like this:
c=input.max(dim=1,keepdim=True)# not actually needed below: torch.logsumexp applies the max shift internally
d=torch.logsumexp(input, 1,keepdim=True)
Intermediate=input-d
print(Intermediate)#tensor([[-6.0092, -5.0092, -0.0092]]), identical to the values above
loss = nn.NLLLoss()
output = loss(Intermediate, target)
print(output)#tensor(0.0092)
If torch.logsumexp is expanded by hand, it can be written as:
c = input.data.max()
d=(torch.log(torch.sum(torch.exp(input - c), 1, keepdim=True)) + c)
Intermediate=input-d
print(Intermediate)#tensor([[-6.0092, -5.0092, -0.0092]]), identical to the values above
loss = nn.NLLLoss()
output = loss(Intermediate, target)
print(output)#tensor(0.0092)
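For completeness, the functional API gives the same number; a minimal sketch using torch.nn.functional, which was imported above as F:

print(F.cross_entropy(input, target))#tensor(0.0092)
print(F.nll_loss(F.log_softmax(input, dim=1), target))#tensor(0.0092)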
The complete source code is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
input = torch.tensor([[1, 2, 7]]).float()
target = torch.tensor([2]).long()# class index 2, one-hot 001
#-----------------------------------------------------
# Method 1
loss = nn.CrossEntropyLoss()
output = loss(input, target)
print(output)#tensor(0.0092)
#-----------------------------------------------------
# Method 2
ls = nn.LogSoftmax(dim=1)
Intermediate = ls(input)
print("Intermediate:",Intermediate)
#Intermediate: tensor([[-6.0092, -5.0092, -0.0092]])
loss = nn.NLLLoss()
output = loss(Intermediate, target)
print(output)#tensor(0.0092)
#-----------------------------------------------------
# Method 3
scores = np.array([1, 2, 7])
def softmax1(x):
    result = np.exp(x) / np.sum(np.exp(x), axis=0)
    return result

def softmax2(X):
    result = np.exp(X - np.max(X))
    return result / np.sum(result, axis=0)
print(softmax1(scores))#[0.00245611 0.00667641 0.99086747]
print(softmax2(scores))#[0.00245611 0.00667641 0.99086747]
Intermediate1=np.log(softmax2(scores))
Intermediate2=torch.log(torch.tensor(softmax2(scores)))
print("Intermediate1:",Intermediate1)#[-6.00917448 -5.00917448 -0.00917448] 如果末尾保留4为小数,则与tensor数据一致
print("Intermediate2:",Intermediate2)#tensor([-6.0092, -5.0092, -0.0092], dtype=torch.float64)
loss = nn.NLLLoss()
output = loss(Intermediate2.view(1,3), target)
print(output)#tensor(0.0092)
#one is a numpy array, the other a torch tensor
#the output matches the Intermediate values above, so NLLLoss can be applied again
#the main point is that softmax2 is numerically more stable than softmax1
#-----------------------------------------------------
#Method 4
c=input.max(dim=1,keepdim=True)# not actually needed: torch.logsumexp applies the max shift internally
d=torch.logsumexp(input, 1,keepdim=True)
Intermediate=input-d
print(Intermediate)#tensor([[-6.0092, -5.0092, -0.0092]]), identical to the values above
loss = nn.NLLLoss()
output = loss(Intermediate, target)
print(output)#tensor(0.0092)
#-----------------------------------------------------
#Method 5
c = input.data.max()
d=(torch.log(torch.sum(torch.exp(input - c), 1, keepdim=True)) + c)
Intermediate=input-d
print(Intermediate)#tensor([[-6.0092, -5.0092, -0.0092]]), identical to the values above
loss = nn.NLLLoss()
output = loss(Intermediate, target)
print(output)#tensor(0.0092)
#-----------------------------------------------------
Version: PyTorch 1.5.
The size_average and reduce arguments are deprecated, so the argument that matters is reduction.
It can take one of three values: 'none' | 'mean' | 'sum'.
When the batch size is 1,
input = torch.tensor([[1, 2, 7]]).float()
target = torch.tensor([2]).long()# class index 2, one-hot 001
with the default and with reduction set to 'none', 'mean' and 'sum', the outputs are respectively:
tensor(0.0092)
tensor([0.0092])
tensor(0.0092)
tensor(0.0092)
Essentially no difference; only 'none' keeps the batch dimension.
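For reference, the outputs listed above can be reproduced with calls like the following (a minimal sketch; the first value in the list is the default, which is the same as 'mean'):

import torch
import torch.nn as nn

input = torch.tensor([[1, 2, 7]]).float()
target = torch.tensor([2]).long()
for reduction in ["none", "mean", "sum"]:
    print(reduction, nn.CrossEntropyLoss(reduction=reduction)(input, target))
#none tensor([0.0092])
#mean tensor(0.0092)
#sum tensor(0.0092)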
Now suppose the batch size is 3, i.e. there are 3 samples:
import torch
import torch.nn as nn
# 3 samples
input = torch.tensor([[1, 2, 7],[4,7,9],[8,3,4]]).float()
target = torch.tensor([2,2,0]).long()# class indices 2, 2, 0 (one-hot: 001, 001, 100)
loss = nn.CrossEntropyLoss()
loss1 = nn.CrossEntropyLoss(reduction="none")
loss2 = nn.CrossEntropyLoss(reduction="mean")
loss3 = nn.CrossEntropyLoss(reduction="sum")
output = loss(input, target)
output1 = loss1(input, target)
output2 = loss2(input, target)
output3 = loss3(input, target)
print(output)
print(output1)
print(output2)
print(output3)
# tensor(0.0556)
# tensor([0.0092, 0.1328, 0.0247])#none
# tensor(0.0556)#mean
# tensor(0.1668)#sum
The default is reduction="mean".
mean is the average: 0.1668 / 3 = 0.0556.
sum is the total: 0.0092 + 0.1328 + 0.0247 = 0.1668.
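As a quick consistency check, continuing from the snippet above (a sketch): 'mean' and 'sum' are just reductions of the per-sample losses returned by 'none'.

print(torch.allclose(output1.mean(), output2))#True
print(torch.allclose(output1.sum(), output3))#True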
References (papers, articles and documentation):
tricks-of-the-trade-logsumexp
Understanding Categorical Cross-Entropy Loss, Binary Cross-Entropy Loss, Softmax Loss, Logistic Loss, Focal Loss and all those confusing names
Classification and Loss Evaluation - Softmax and Cross Entropy Loss
The log-sum-exp trick in Machine Learning
torch.nn.CrossEntropyLoss
torch.nn.LogSoftmax
Understanding softmax and the negative log-likelihood