A loss function measures the discrepancy between a model's predictions and the ground truth. Strictly speaking, a loss function is defined for a single sample, whereas a cost function is defined over the whole training set, but in practice the two terms are used interchangeably. The function that is ultimately optimized is the objective function, which is the cost function plus a regularization term.
PyTorch provides 18 commonly used loss functions as classes in the torch.nn module. They are defined as subclasses of torch.nn.Module and implemented by overriding forward, which in turn calls the corresponding functions in torch.nn.functional.
from torch.nn import Module, CrossEntropyLoss
issubclass(CrossEntropyLoss, Module) # True
When instantiating these classes, a reduction argument is passed in:
'mean' computes $\text{Cost} = \frac{1}{N} \sum_i^N f(\hat{y}, y)$
'sum' computes $\text{Cost} = \sum_i^N f(\hat{y}, y)$
'none' computes the element-wise $\text{Loss} = f(\hat{y}, y)$
from torch.nn import L1Loss
l1_loss = L1Loss(reduction='mean')
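As a quick sanity check of the three modes (a minimal sketch using L1Loss), 'sum' and 'mean' are simply reductions of the element-wise losses returned by 'none':
import torch
from torch.nn import L1Loss
inputs = torch.tensor([1., 2., 3.])
target = torch.tensor([2., 2., 5.])
per_element = L1Loss(reduction='none')(inputs, target)              # tensor([1., 0., 2.])
print(L1Loss(reduction='sum')(inputs, target), per_element.sum())   # both tensor(3.)
print(L1Loss(reduction='mean')(inputs, target), per_element.mean()) # both tensor(1.)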
Below is a quick look at each of these loss classes; some I have not used in practice yet, but this makes a handy reference for later ✅
L1Loss
Computes the absolute difference between inputs and target: $loss = |\hat{y} - y|$
import torch
from torch.nn import L1Loss
# create data
inputs = torch.tensor([1, 5, 3, 9, 7], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
l1_loss = L1Loss(reduction='none')
print(l1_loss(inputs, target))
# tensor([3., 3., 3., 9., 1.])
SmoothL1Loss
The smoothed L1 loss, computed as
$loss = \begin{cases} \frac{1}{2} (\hat{y} - y)^2, & \text{if } |\hat{y} - y| < 1 \\ |\hat{y} - y| - \frac{1}{2}, & \text{otherwise} \end{cases}$
import torch
from torch.nn import SmoothL1Loss
# create data
inputs = torch.tensor([1, 5, 3, 9, 7.6], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
smooth_l1_loss = SmoothL1Loss(reduction='none')
print(smooth_l1_loss(inputs, target))
# tensor([2.5000, 2.5000, 2.5000, 8.5000, 0.0800])
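Checking the two branches of the formula by hand (plain Python, just to confirm the last two entries above):
d_large, d_small = abs(9 - 0), abs(7.6 - 8)  # 9 and ~0.4
print(d_large - 0.5)                         # 8.5
print(0.5 * d_small ** 2)                    # ~0.08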
MSELoss
Computes the squared difference between inputs and target: $loss = (\hat{y} - y)^2$
import torch
from torch.nn import MSELoss
# create data
inputs = torch.tensor([1, 5, 3, 9, 7], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
l2_loss = MSELoss(reduction='none')
print(l2_loss(inputs, target))
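# tensor([ 9.,  9.,  9., 81.,  1.])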
BCELoss
Computes the binary cross-entropy; the values of inputs must lie in $[0, 1]$
import torch
from torch.nn import BCELoss
# create data
inputs = torch.tensor([
[1, 3],
[4, 2]
], dtype=torch.float)
target = torch.tensor([
[1, 0],
[1, 0]
], dtype=torch.float)
binary_crossentropy_loss = BCELoss(
weight=None,
reduction='none'
)
# squash the inputs into (0, 1) with sigmoid
print(binary_crossentropy_loss(torch.sigmoid(inputs), target))
# tensor([[0.3133, 3.0486],
# [0.0181, 2.1269]])
In BCELoss, the inputs must lie in $[0, 1]$, which is why torch.sigmoid is applied before computing the binary cross-entropy.
With BCEWithLogitsLoss there is no need to call torch.sigmoid beforehand. It computes
$loss = -(y \log{\sigma(\hat{y})} + (1 - y) \log{(1 - \sigma(\hat{y}))})$
import torch
from torch.nn import BCEWithLogitsLoss
# create data
inputs = torch.tensor([
[1, 3],
[4, 2]
], dtype=torch.float)
target = torch.tensor([
[1, 0],
[1, 0]
], dtype=torch.float)
bce_with_logits_loss = BCEWithLogitsLoss(
weight=None,
reduction='none',
pos_weight=None # weight for positive samples
)
print(bce_with_logits_loss(inputs, target))
# tensor([[0.3133, 3.0486],
# [0.0181, 2.1269]])
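As a manual check of the formula above (a small sketch for the first element, where $\hat{y} = 1$ and $y = 1$):
import torch
y_hat, y = torch.tensor(1.), torch.tensor(1.)
sigma = torch.sigmoid(y_hat)
print(-(y * torch.log(sigma) + (1 - y) * torch.log(1 - sigma)))
# tensor(0.3133)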
CrossEntropyLoss
Combines LogSoftmax and NLLLoss to compute the cross-entropy loss. See the post Pytorch详解NLLLoss和CrossEntropyLoss for details.
import torch
from torch.nn import CrossEntropyLoss
# create data
inputs = torch.tensor([
[1, 2],
[1, 3],
[1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
cross_entropy_loss = CrossEntropyLoss(
weight=None, # per-class loss weights
ignore_index=-1, # class index to ignore
reduction='none'
)
print(cross_entropy_loss(inputs, target))
# tensor([1.3133, 0.1269, 0.1269])
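Since CrossEntropyLoss is LogSoftmax followed by NLLLoss, the first value can be checked by hand (first sample: logits [1, 2], true class 0):
import torch
logits = torch.tensor([1., 2.])
log_probs = logits - torch.log(torch.exp(logits).sum())  # log-softmax
print(-log_probs[0])
# tensor(1.3133)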
Passing the weight argument
import torch
from torch.nn import CrossEntropyLoss
# create data
inputs = torch.tensor([
[1, 2],
[1, 3],
[1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
# weight below means: samples with label 0 get weight 1, samples with label 1 get weight 2
weight = torch.tensor([1, 2], dtype=torch.float)
cross_entropy_loss = CrossEntropyLoss(weight, reduction='none')
print(cross_entropy_loss(inputs, target))
# tensor([1.3133, 0.2539, 0.2539])
NLLLoss
Takes the prediction score at the index of the true label and negates it; the inputs are normally log-probabilities (e.g. from LogSoftmax).
import torch
from torch.nn import NLLLoss
# create data
inputs = torch.tensor([
[1, 2],
[1, 3],
[1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
nll = NLLLoss(
weight=None,
ignore_index=-1,
reduction='none'
)
print(nll(inputs, target))
# tensor([-1., -3., -3.])
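As a sketch of the relation noted earlier, feeding LogSoftmax outputs into NLLLoss reproduces the CrossEntropyLoss values from the previous example:
import torch
from torch.nn import NLLLoss, LogSoftmax, CrossEntropyLoss
inputs = torch.tensor([[1., 2.], [1., 3.], [1., 3.]])
target = torch.tensor([0, 1, 1])
log_probs = LogSoftmax(dim=1)(inputs)
print(NLLLoss(reduction='none')(log_probs, target))
print(CrossEntropyLoss(reduction='none')(inputs, target))
# both: tensor([1.3133, 0.1269, 0.1269])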
PoissonNLLLoss
The negative log-likelihood loss for a Poisson-distributed target, computed as follows:
if log_input is True: $loss = e^{\hat{y}} - y \times \hat{y}$
if log_input is False: $loss = \hat{y} - y \times \log{(\hat{y} + eps)}$
import torch
from torch.nn import PoissonNLLLoss
# create data
inputs = torch.tensor([
[0.3, 0.7],
[0.6, 0.4]
], dtype=torch.float)
target = torch.tensor([
[1, 0],
[1, 0]
], dtype=torch.long)
poisson_nll_loss = PoissonNLLLoss(
log_input=True, # whether the inputs are already log-predictions
full=False, # whether to compute the full loss with the Stirling approximation term, default False
eps=1e-8, # small constant that avoids log(0) when log_input=False
reduction='none'
)
print(poisson_nll_loss(inputs, target))
# tensor([[1.0499, 2.0138],
# [1.2221, 1.4918]])
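A manual check of the log_input=True branch for the first element ($\hat{y} = 0.3$, $y = 1$):
import torch
print(torch.exp(torch.tensor(0.3)) - 1 * 0.3)
# tensor(1.0499)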
KLDivLoss
Computes the KL divergence (Kullback-Leibler divergence), i.e. the relative entropy.
The theoretical definition is
$D_{KL}(P \| Q) = E_{x \sim P}\left[\log{\frac{P(x)}{Q(x)}}\right] = E_{x \sim P}[\log{P(x)} - \log{Q(x)}]$
but PyTorch computes
$loss = y (\log{y} - \hat{y})$
which means $\hat{y}$ has to be a log-probability before it is passed in; this can be done with LogSoftmax.
import torch
from torch.nn import KLDivLoss, LogSoftmax
# create data
inputs = torch.tensor([
[0.5, 0.3, 0.2],
[0.2, 0.3, 0.5]
])
target = torch.tensor([
[0.9, 0.05, 0.05],
[0.1, 0.7, 0.2]
])
# log-probability
inputs = LogSoftmax(dim=1)(inputs)
# reduction can also be 'batchmean'; in a later version reduction='mean' will behave the same as 'batchmean'
kl_div_loss = KLDivLoss(reduction='none')
print(kl_div_loss(inputs, target))
# tensor([[ 0.7510, -0.0928, -0.0878],
# [-0.1063, 0.5482, -0.1339]])
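A manual check of the first row against $loss = y(\log{y} - \hat{y})$ (a small sketch):
import torch
from torch.nn import LogSoftmax
y = torch.tensor([0.9, 0.05, 0.05])
y_hat = LogSoftmax(dim=0)(torch.tensor([0.5, 0.3, 0.2]))  # log-probabilities
print(y * (torch.log(y) - y_hat))
# tensor([ 0.7510, -0.0928, -0.0878])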
MarginRankingLoss
Measures the similarity (ranking relation) between two input vectors; commonly used in ranking tasks.
Formula: $loss = \max\{0, -y \times (\hat{y_1} - \hat{y_2}) + margin\}$
import torch
from torch.nn import MarginRankingLoss
# create data
y1 = torch.tensor([
[1],
[2],
[3]
], dtype=torch.float)
y2 = torch.tensor([
[2],
[2],
[2]
], dtype=torch.float)
y_true = torch.tensor([1, 1, -1], dtype=torch.float)
margin_ranking_loss = MarginRankingLoss(
margin=0.0, # margin, the required gap between \hat{y_1} and \hat{y_2}
reduction='none'
)
# Because y1 and y2 have shape (3, 1) while y_true has shape (3,),
# broadcasting yields a 3 x 3 loss matrix:
# row i is the pair (y1[i], y2[i]) evaluated against each label in y_true.
print(margin_ranking_loss(y1, y2, y_true))
# tensor([[1., 1., 0.],
# [0., 0., 0.],
# [0., 0., 1.]])
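A manual check of two entries of the matrix using the formula above:
# element (0, 0): pair (y1[0], y2[0]) = (1, 2) with label y_true[0] = 1
print(max(0., -1 * (1. - 2.) + 0.0))    # 1.0
# element (2, 2): pair (y1[2], y2[2]) = (3, 2) with label y_true[2] = -1
print(max(0., -(-1) * (3. - 2.) + 0.0)) # 1.0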
HingeEmbeddingLoss
Measures the similarity between predictions and targets; commonly used for nonlinear embeddings and semi-supervised learning.
Formula:
$loss = \begin{cases} \hat{y}, & \text{if } y = 1 \\ \max\{0, \Delta - \hat{y}\}, & \text{if } y = -1 \end{cases}$
import torch
from torch.nn import HingeEmbeddingLoss
# create data
# the inputs should be the absolute difference (distance) between two predictions
inputs = torch.tensor([[1., .8, .5]])
target = torch.tensor([[1, 1, -1]])
hinge_embedding_loss = HingeEmbeddingLoss(margin=1.0, reduction='none')
print(hinge_embedding_loss(inputs, target))
# tensor([[1.0000, 0.8000, 0.5000]])
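A manual check of the third element ($\hat{y} = 0.5$, $y = -1$, margin 1.0); for $y = 1$ the loss is simply the input itself:
print(max(0., 1.0 - 0.5))  # 0.5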
MultiLabelMarginLoss
The multi-label margin loss, computed as
$loss = \sum_{ij} \frac{\max\{0, 1 - (x[y[j]] - x[i])\}}{\text{x.size}(0)}$
where $i = 0$ to $\text{x.size}(0)-1$, $j = 0$ to $\text{y.size}(0)-1$, $0 \leq y[j] \leq \text{x.size}(0)-1$, and $i \neq y[j]$ for all $i$ and $j$.
import torch
from torch.nn import MultiLabelMarginLoss
# create data
inputs = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
target = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)
multi_label_margin_loss = MultiLabelMarginLoss(reduction='none')
print(multi_label_margin_loss(inputs, target))
# tensor([0.8500])
Manual computation
import torch
# create data
inputs = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
target = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)
input_ = inputs[0]
item1 = (1 - (input_[0] - input_[1])) + (1 - (input_[0] - input_[2]))
item2 = (1 - (input_[3] - input_[1])) + (1 - (input_[3] - input_[2]))
print((item1 + item2) / input_.size(0))
# tensor([0.8500])
SoftMarginLoss
The two-class logistic loss, computed as
$loss = \log{(1 + e^{-y \times \hat{y}})}$
import torch
from torch.nn import SoftMarginLoss
# create data
inputs = torch.tensor([
[0.3, 0.7],
[0.5, 0.5]
])
target = torch.tensor([
[-1, 1],
[1, -1]
], dtype=torch.float)
soft_margin_loss = SoftMarginLoss(reduction='none')
print(soft_margin_loss(inputs, target))
# tensor([[0.8544, 0.4032],
# [0.4741, 0.9741]])
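A manual check of the first element ($\hat{y} = 0.3$, $y = -1$):
import torch
# -y * y_hat = -(-1) * 0.3 = 0.3
print(torch.log(1 + torch.exp(torch.tensor(0.3))))
# tensor(0.8544)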
MultiLabelSoftMarginLoss
The multi-label version of SoftMarginLoss, computed as
$loss = -\frac{1}{C} \sum_i \left[ y_i \log{\frac{1}{1 + e^{-\hat{y_i}}}} + (1 - y_i) \log{\frac{e^{-\hat{y_i}}}{1 + e^{-\hat{y_i}}}} \right]$
where $C$ is the number of labels, $y_i$ is the ground truth for label $i$, and $\hat{y_i}$ is the prediction for label $i$.
import torch
from torch.nn import MultiLabelSoftMarginLoss
# create data
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)
multi_label_soft_margin_loss = MultiLabelSoftMarginLoss(weight=None, reduction='none')
print(multi_label_soft_margin_loss(inputs, target))
# tensor([0.5429])
Manual computation
import torch
# create data
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)
C = 3
i_0 = torch.log(torch.exp(-inputs[0][0]) / (1 + torch.exp(-inputs[0][0])))
i_1 = torch.log(1 / (1 + torch.exp(-inputs[0][1])))
i_2 = torch.log(1 / (1 + torch.exp(-inputs[0][2])))
res = -(1 / C) * (i_0 + i_1 + i_2)
print(res)
# tensor([0.5429])
CosineEmbeddingLoss
Uses cosine similarity to measure how similar two inputs are; commonly used for nonlinear embeddings and semi-supervised learning. Formula:
$loss = \begin{cases} 1 - \cos{(x_1, x_2)}, & \text{if } y = 1 \\ \max\{0, \cos{(x_1, x_2)} - margin\}, & \text{if } y = -1 \end{cases}$
import torch
from torch.nn import CosineEmbeddingLoss
# create data
inputs1 = torch.tensor([
[.3, .5, .7],
[.3, .5, .7]
])
inputs2 = torch.tensor([
[.1, .3, .5],
[.1, .3, .5]
])
target = torch.tensor([1, -1], dtype=torch.float)
cosine_embedding_loss = CosineEmbeddingLoss(
margin=0.0, # margin, valid range [-1, 1]; [0, 0.5] is suggested
reduction='none'
)
print(cosine_embedding_loss(inputs1, inputs2, target))
# tensor([0.0167, 0.9833])
Manual computation
import torch
# create data
inputs1 = torch.tensor([
[.3, .5, .7],
[.3, .5, .7]
])
inputs2 = torch.tensor([
[.1, .3, .5],
[.1, .3, .5]
])
target = torch.tensor([1, -1], dtype=torch.float)
def cosine(a, b):
numerator = a @ b
denominator = torch.norm(a, 2) * torch.norm(b, 2)
return numerator / denominator
res_y_pos = 1 - cosine(inputs1[0], inputs2[0]) # y = 1
res_y_neg = max(0, cosine(inputs1[1], inputs2[1])) # y = -1
print(res_y_pos, res_y_neg)
# tensor(0.0167) tensor(0.9833)
MultiMarginLoss
The hinge loss for multi-class classification, computed as
$loss = \frac{1}{C} \sum_{i \neq y} (\max\{0, margin - x[y] + x[i]\})^p$
where $C$ is the number of classes.
import torch
from torch.nn import MultiMarginLoss
# create data
inputs = torch.tensor([
[0.1, 0.2, 0.7],
[0.2, 0.5, 0.3]
])
target = torch.tensor([1, 2], dtype=torch.long)
multi_margin_loss = MultiMarginLoss(
p=1, # exponent, may be 1 or 2
margin=1.0, # margin
weight=None, # per-class loss weights
reduction='none'
)
print(multi_margin_loss(inputs, target))
# tensor([0.8000, 0.7000])
Manual computation
import torch
# create data
inputs = torch.tensor([
[0.1, 0.2, 0.7],
[0.2, 0.5, 0.3]
])
target = torch.tensor([1, 2], dtype=torch.long)
# for the first sample
inputs_ = inputs[0]
margin = 1
i_0 = margin - (inputs_[1] - inputs_[0]) # > 0
i_2 = margin - (inputs_[1] - inputs_[2]) # > 0
res = (i_0 + i_2) / inputs_.size(0)
print(res)
# tensor(0.8000)
TripletMarginLoss
The triplet loss, commonly used in face recognition. Formula:
$L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + margin, 0\}$
where $d(x, y) = \|x - y\|_p$
import torch
from torch.nn import TripletMarginLoss
# create data
anchor = torch.tensor([[1.]])
pos = torch.tensor([[2.]])
neg = torch.tensor([[0.5]])
triplet_margin_loss = TripletMarginLoss(
margin=1.0, # margin
p=2.0, # order of the norm, default 2
eps=1e-06,
swap=False,
reduction='none'
)
print(triplet_margin_loss(anchor, pos, neg))
# tensor([1.5000])
Manual computation
import numpy as np
np.sqrt((1. - 2.)**2) - np.sqrt((1. - .5)**2) + 1 # 1.5
CTCLoss
Computes the CTC (Connectionist Temporal Classification) loss, used for classification of sequential data.
from torch.nn import CTCLoss
CTCLoss(
blank=0, # index of the blank label
reduction='mean',
zero_infinity=False # zero out infinite losses and the associated gradients
)
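The forward call of CTCLoss takes log-probabilities of shape (T, N, C) together with the input and target lengths. A minimal usage sketch with random data (the sizes below are arbitrary assumptions, chosen only for illustration):
import torch
from torch.nn import CTCLoss
T, N, C = 10, 2, 5 # time steps, batch size, number of classes (class 0 is the blank)
log_probs = torch.randn(T, N, C).log_softmax(dim=2) # (T, N, C) log-probabilities
targets = torch.randint(1, C, (N, 5), dtype=torch.long) # label sequences, excluding the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 5, dtype=torch.long)
ctc_loss = CTCLoss(blank=0, reduction='mean', zero_infinity=False)
print(ctc_loss(log_probs, targets, input_lengths, target_lengths))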
Sequence modeling is something I still need to look into further…