PyTorch Learning Notes (4): Loss Functions

Environment

  • OS: macOS Mojave
  • Python version: 3.7
  • PyTorch version: 1.4.0
  • IDE: PyCharm

Table of Contents

  • 0. Preface
  • 1. L1Loss
  • 2. SmoothL1Loss
  • 3. MSELoss
  • 4. BCELoss
  • 5. BCEWithLogitsLoss
  • 6. CrossEntropyLoss
  • 7. NLLLoss
  • 8. PoissonNLLLoss
  • 9. KLDivLoss
  • 10. MarginRankingLoss
  • 11. HingeEmbeddingLoss
  • 12. MultiLabelMarginLoss
  • 13. SoftMarginLoss
  • 14. MultiLabelSoftMarginLoss
  • 15. CosineEmbeddingLoss
  • 16. MultiMarginLoss
  • 17. TripletMarginLoss
  • 18. CTCLoss


0. Preface

A loss function describes the discrepancy between a model's predictions and the ground truth. Strictly speaking, the loss function is defined for a single sample, whereas the cost function is defined over the whole training set:

  • Loss function: $\text{Loss} = f(\hat{y}, y)$
  • Cost function: $\text{Cost} = \sum_i^N f(\hat{y}_i, y_i)$ or $\text{Cost} = \frac{1}{N} \sum_i^N f(\hat{y}_i, y_i)$

In practice, however, the two terms are used interchangeably. The function actually optimized is the objective function, which is the cost function plus a regularization term:

  • Objective function: $\text{Obj} = \text{Cost} + \text{Regularization}$

PyTorch provides 18 commonly used loss function classes in the torch.nn module. They are defined as subclasses of torch.nn.Module; each overrides the forward method, which in turn calls the corresponding function in torch.nn.functional.

from torch.nn import Module, CrossEntropyLoss

issubclass(CrossEntropyLoss, Module)  # True

When instantiating any of these classes, a reduction argument can be passed (a quick comparison of the three modes follows the snippet below):

  • The default is 'mean', which computes $\text{Cost} = \frac{1}{N} \sum_i^N f(\hat{y}_i, y_i)$
  • Passing 'sum' computes $\text{Cost} = \sum_i^N f(\hat{y}_i, y_i)$
  • Passing 'none' computes the element-wise $\text{Loss} = f(\hat{y}, y)$ without any reduction

from torch.nn import L1Loss

l1_loss = L1Loss(reduction='mean')
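
A quick side-by-side of the three modes (a minimal sketch with made-up values):

import torch
from torch.nn import L1Loss

inputs = torch.tensor([1., 2., 3.])
target = torch.tensor([2., 2., 5.])

print(L1Loss(reduction='none')(inputs, target))  # tensor([1., 0., 2.])
print(L1Loss(reduction='sum')(inputs, target))   # tensor(3.)
print(L1Loss(reduction='mean')(inputs, target))  # tensor(1.)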

Below is a brief look at each of these loss classes. Some of them I have not used in practice yet; this note is meant as a reference for later. ✅

1. L1Loss

The L1Loss class computes the absolute difference between inputs and target: $loss = |\hat{y} - y|$

import torch
from torch.nn import L1Loss

# create data
inputs = torch.tensor([1, 5, 3, 9, 7], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)

l1_loss = L1Loss(reduction='none')
print(l1_loss(inputs, target))
# tensor([3., 3., 3., 9., 1.])

2. SmoothL1Loss

The SmoothL1Loss class implements the smooth L1 loss, defined as
$$loss = \begin{cases} \frac{1}{2} (\hat{y} - y)^2, & \text{if } |\hat{y} - y| < 1 \\ |\hat{y} - y| - \frac{1}{2}, & \text{otherwise} \end{cases}$$

import torch
from torch.nn import SmoothL1Loss

# create data
inputs = torch.tensor([1, 5, 3, 9, 7.6], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)

smooth_l1_loss = SmoothL1Loss(reduction='none')
print(smooth_l1_loss(inputs, target))
# tensor([2.5000, 2.5000, 2.5000, 8.5000, 0.0800])
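
As a quick hand check of the last element, where $|\hat{y} - y| = 0.4 < 1$:

# |7.6 - 8| = 0.4 < 1, so the smooth L1 loss is 0.5 * diff ** 2
diff = abs(7.6 - 8)
print(0.5 * diff ** 2)  # approximately 0.08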

3. MSELoss

The MSELoss class computes the squared difference between inputs and target: $loss = (\hat{y} - y)^2$

import torch
from torch.nn import MSELoss

# create data
inputs = torch.tensor([1, 5, 3, 9, 7], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)

l2_loss = MSELoss(reduction='none')

print(l2_loss(inputs, target))
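# tensor([ 9.,  9.,  9., 81.,  1.])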

4. BCELoss

The BCELoss class computes the binary cross-entropy; the values of inputs must lie in $[0, 1]$

import torch
from torch.nn import BCELoss

# create data
inputs = torch.tensor([
    [1, 3],
    [4, 2]
], dtype=torch.float)
target = torch.tensor([
    [1, 0],
    [1, 0]
], dtype=torch.float)

binary_crossentropy_loss = BCELoss(
    weight=None,
    reduction='none'
)

# squash the inputs into (0, 1) with sigmoid
print(binary_crossentropy_loss(torch.sigmoid(inputs), target))
# tensor([[0.3133, 3.0486],
#         [0.0181, 2.1269]])
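
As a quick hand check, the first element can be reproduced directly from the binary cross-entropy formula with $p = \sigma(1)$:

import torch

# -(y * log(p) + (1 - y) * log(1 - p)) with y = 1 and p = sigmoid(1)
p = torch.sigmoid(torch.tensor(1.))
print(-(1 * torch.log(p) + (1 - 1) * torch.log(1 - p)))  # tensor(0.3133)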

5. BCEWithLogitsLoss

The BCELoss class requires its inputs to lie in $[0, 1]$, so an extra torch.sigmoid call is needed before computing the binary cross-entropy.

With the BCEWithLogitsLoss class the extra torch.sigmoid call is not needed. The formula is
$$loss = -(y \log{\sigma(\hat{y})} + (1 - y) \log{(1 - \sigma(\hat{y}))})$$

import torch
from torch.nn import BCEWithLogitsLoss

# create data
inputs = torch.tensor([
    [1, 3],
    [4, 2]
], dtype=torch.float)
target = torch.tensor([
    [1, 0],
    [1, 0]
], dtype=torch.float)

bce_with_logits_loss = BCEWithLogitsLoss(
    weight=None,
    reduction='none',
    pos_weight=None  # weight for positive samples
)
print(bce_with_logits_loss(inputs, target))
# tensor([[0.3133, 3.0486],
#         [0.0181, 2.1269]])
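
A hand check of element [0, 1] (logit 3, target 0) against the formula above:

import torch

# -(y * log(sigmoid(3)) + (1 - y) * log(1 - sigmoid(3))) with y = 0
print(-torch.log(1 - torch.sigmoid(torch.tensor(3.))))  # tensor(3.0486)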

6. CrossEntropyLoss

The CrossEntropyLoss class combines LogSoftmax and NLLLoss to compute the cross-entropy loss. For more details see the blog post Pytorch详解NLLLoss和CrossEntropyLoss.

import torch
from torch.nn import CrossEntropyLoss

# create data
inputs = torch.tensor([
    [1, 2],
    [1, 3],
    [1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

cross_entropy_loss = CrossEntropyLoss(
    weight=None,  # per-class weights for the loss
    ignore_index=-1,  # target class index to ignore
    reduction='none'
)
print(cross_entropy_loss(inputs, target))
# tensor([1.3133, 0.1269, 0.1269])
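
Manual computation of the first element (a small check, using LogSoftmax as described above):

import torch
from torch.nn import LogSoftmax

# cross-entropy of the first sample: -log_softmax(inputs)[0, target[0]]
log_prob = LogSoftmax(dim=1)(torch.tensor([[1., 2.]]))
print(-log_prob[0, 0])  # tensor(1.3133)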

Passing the weight argument:

import torch
from torch.nn import CrossEntropyLoss

# create data
inputs = torch.tensor([
    [1, 2],
    [1, 3],
    [1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# set weight: samples with label 0 get weight 1, samples with label 1 get weight 2
weight = torch.tensor([1, 2], dtype=torch.float)
cross_entropy_loss = CrossEntropyLoss(weight, reduction='none')
print(cross_entropy_loss(inputs, target))
# tensor([1.3133, 0.2539, 0.2539]) 

7. NLLLoss

The NLLLoss class takes the predicted score at the true class index and negates it

import torch
from torch.nn import NLLLoss

# create data
inputs = torch.tensor([
    [1, 2],
    [1, 3],
    [1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

nll = NLLLoss(
    weight=None,
    ignore_index=-1,
    reduction='none'
)
print(nll(inputs, target))
# tensor([-1., -3., -3.])
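
Applying NLLLoss to LogSoftmax outputs reproduces the CrossEntropyLoss result from section 6 (a small consistency check):

import torch
from torch.nn import NLLLoss, LogSoftmax

inputs = torch.tensor([
    [1, 2],
    [1, 3],
    [1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

print(NLLLoss(reduction='none')(LogSoftmax(dim=1)(inputs), target))
# tensor([1.3133, 0.1269, 0.1269])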

8. PoissonNLLLoss

The PoissonNLLLoss class computes the negative log-likelihood loss for a target that follows a Poisson distribution:

If log_input is True: $loss = e^{\hat{y}} - y \times \hat{y}$

If log_input is False: $loss = \hat{y} - y \times \log{(\hat{y} + eps)}$

import torch
from torch.nn import PoissonNLLLoss

# create data
inputs = torch.tensor([
    [0.3, 0.7],
    [0.6, 0.4]
], dtype=torch.float)
target = torch.tensor([
    [1, 0],
    [1, 0]
], dtype=torch.long)

poisson_nll_loss = PoissonNLLLoss(
    log_input=True,  # whether the input predictions are already in log space
    full=False,  # whether to compute the full loss (adds the Stirling approximation term); default False
    eps=1e-8,  # small constant to avoid log(0) when log_input=False
    reduction='none'
)

print(poisson_nll_loss(inputs, target))
# tensor([[1.0499, 2.0138],
#         [1.2221, 1.4918]])
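
Manual computation of element [0, 0] with log_input=True:

import torch

# exp(y_hat) - y * y_hat with y_hat = 0.3 and y = 1
print(torch.exp(torch.tensor(0.3)) - 1 * 0.3)  # tensor(1.0499)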

9. KLDivLoss

The KLDivLoss class computes the KL divergence (Kullback-Leibler divergence), i.e. the relative entropy

The theoretical definition of relative entropy is
$$D_{KL}(P \| Q) = E_{x \sim P} \left[ \log{\frac{P(x)}{Q(x)}} \right] = E_{x \sim P} [\log{P(x)} - \log{Q(x)}]$$

PyTorch, however, computes
$$loss = y (\log{y} - \hat{y})$$

This means that $\hat{y}$ must already be a log-probability before being passed in, which can be obtained with LogSoftmax

import torch
from torch.nn import KLDivLoss, LogSoftmax

# create data
inputs = torch.tensor([
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5]
])
target = torch.tensor([
    [0.9, 0.05, 0.05],
    [0.1, 0.7, 0.2]
])

# log-probability
inputs = LogSoftmax(dim=1)(inputs)

# reduction can also be 'batchmean'; in a future release reduction='mean' will behave the same as reduction='batchmean'
kl_div_loss = KLDivLoss(reduction='none')

print(kl_div_loss(inputs, target))
# tensor([[ 0.7510, -0.0928, -0.0878],
#         [-0.1063,  0.5482, -0.1339]])
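
Manual computation of element [0, 0] (a small check; note that the input must already be a log-probability):

import torch
from torch.nn import LogSoftmax

# y * (log(y) - y_hat) with y = 0.9 and y_hat the first log-probability of the first row
y = torch.tensor(0.9)
y_hat = LogSoftmax(dim=0)(torch.tensor([0.5, 0.3, 0.2]))[0]
print(y * (torch.log(y) - y_hat))  # tensor(0.7510)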

10. MarginRankingLoss

The MarginRankingLoss class measures the relative ranking of two input vectors and is commonly used in ranking tasks.

Formula: $loss = \max\{0, -y \times (\hat{y}_1 - \hat{y}_2) + margin\}$

import torch
from torch.nn import MarginRankingLoss

# create data
y1 = torch.tensor([
    [1],
    [2],
    [3]
], dtype=torch.float)
y2 = torch.tensor([
    [2],
    [2],
    [2]
], dtype=torch.float)
y_true = torch.tensor([1, 1, -1], dtype=torch.float)

margin_ranking_loss = MarginRankingLoss(
    margin=0.0,  # margin, the required gap between \hat{y}_1 and \hat{y}_2
    reduction='none'
)

# broadcasting returns an n x n loss matrix:
# row i holds the loss computed from the i-th elements of y1 and y2
# against each element of y_true.
print(margin_ranking_loss(y1, y2, y_true))
# tensor([[1., 1., 0.],
#         [0., 0., 0.],
#         [0., 0., 1.]])
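
Manual computation of element [0, 0]:

import torch

# max(0, -y_true[0] * (y1[0] - y2[0]) + margin) with y_true[0] = 1 and margin = 0
print(torch.clamp(-1. * (torch.tensor(1.) - torch.tensor(2.)) + 0.0, min=0))  # tensor(1.)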

11. HingeEmbeddingLoss

The HingeEmbeddingLoss class measures whether two inputs are similar or dissimilar; it is commonly used for nonlinear embeddings and semi-supervised learning.

The formula is
$$loss = \begin{cases} \hat{y}, & \text{if } y = 1 \\ \max\{0, \Delta - \hat{y}\}, & \text{if } y = -1 \end{cases}$$
where $\Delta$ is the margin.

import torch
from torch.nn import HingeEmbeddingLoss

# create data
# inputs should be the absolute difference (distance) between two predictions
inputs = torch.tensor([[1., .8, .5]])
target = torch.tensor([[1, 1, -1]])

hinge_embedding_loss = HingeEmbeddingLoss(margin=1.0, reduction='none')
print(hinge_embedding_loss(inputs, target))
# tensor([[1.0000, 0.8000, 0.5000]])

12. MultiLabelMarginLoss

The MultiLabelMarginLoss class implements the multi-label margin loss, defined as
$$loss = \sum_{ij} \frac{\max\{0, 1 - (x[y[j]] - x[i])\}}{\text{x.size}(0)}$$
where $i = 0$ to $\text{x.size}(0)-1$, $j = 0$ to $\text{y.size}(0)-1$, $0 \leq y[j] \leq \text{x.size}(0)-1$, and $i \neq y[j]$ for all $i$ and $j$.

import torch
from torch.nn import MultiLabelMarginLoss

# create data
inputs = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
target = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

multi_label_margin_loss = MultiLabelMarginLoss(reduction='none')
print(multi_label_margin_loss(inputs, target))
# tensor([0.8500])

Manual computation:

import torch

# create data
inputs = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
target = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

input_ = inputs[0]
item1 = (1 - (input_[0] - input_[1])) + (1 - (input_[0] - input_[2]))
item2 = (1 - (input_[3] - input_[1])) + (1 - (input_[3] - input_[2]))

print((item1 + item2) / input_.size(0))
# tensor(0.8500)

13. SoftMarginLoss

The SoftMarginLoss class implements the two-class logistic loss:
$$loss = \log{(1 + e^{-y \times \hat{y}})}$$

import torch
from torch.nn import SoftMarginLoss

# create data
inputs = torch.tensor([
    [0.3, 0.7],
    [0.5, 0.5]
])
target = torch.tensor([
    [-1, 1],
    [1, -1]
], dtype=torch.float)

soft_margin_loss = SoftMarginLoss(reduction='none')
print(soft_margin_loss(inputs, target))
# tensor([[0.8544, 0.4032],
#         [0.4741, 0.9741]])
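
Manual computation of element [0, 0]:

import torch

# log(1 + exp(-y * y_hat)) with y = -1 and y_hat = 0.3
print(torch.log(1 + torch.exp(torch.tensor(-(-1.) * 0.3))))  # tensor(0.8544)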

14. MultiLabelSoftMarginLoss

The MultiLabelSoftMarginLoss class is the multi-label version of SoftMarginLoss:
$$loss = -\frac{1}{C} \sum_i \left[ y_i \log{\left( \frac{1}{1 + e^{-\hat{y}_i}} \right)} + (1 - y_i) \log{\left( \frac{e^{-\hat{y}_i}}{1 + e^{-\hat{y}_i}} \right)} \right]$$
where $C$ is the number of labels, $y_i$ is the true value of label $i$, and $\hat{y}_i$ is the predicted value of label $i$.

import torch
from torch.nn import MultiLabelSoftMarginLoss

# create data
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)

multi_label_soft_margin_loss = MultiLabelSoftMarginLoss(weight=None, reduction='none')
print(multi_label_soft_margin_loss(inputs, target))
# tensor([0.5429])

Manual computation:

import torch

# create data
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)

C = 3
i_0 = torch.log(torch.exp(-inputs[0][0]) / (1 + torch.exp(-inputs[0][0])))
i_1 = torch.log(1 / (1 + torch.exp(-inputs[0][1])))
i_2 = torch.log(1 / (1 + torch.exp(-inputs[0][2])))
res = -(1 / C) * (i_0 + i_1 + i_2)
print(res)
# tensor(0.5429)

15. CosineEmbeddingLoss

The CosineEmbeddingLoss class uses cosine similarity to measure how similar two inputs are; it is commonly used for nonlinear embeddings and semi-supervised learning. The formula is
$$loss = \begin{cases} 1 - \cos{(x_1, x_2)}, & \text{if } y = 1 \\ \max\{0, \cos{(x_1, x_2)} - margin\}, & \text{if } y = -1 \end{cases}$$

import torch
from torch.nn import CosineEmbeddingLoss

# create data
inputs1 = torch.tensor([
    [.3, .5, .7],
    [.3, .5, .7]
])
inputs2 = torch.tensor([
    [.1, .3, .5],
    [.1, .3, .5]
])
target = torch.tensor([1, -1], dtype=torch.float)

cosine_embedding_loss = CosineEmbeddingLoss(
    margin=0.0,  # margin in [-1, 1]; values in [0, 0.5] are recommended
    reduction='none'
)
print(cosine_embedding_loss(inputs1, inputs2, target))
# tensor([0.0167, 0.9833])

Manual computation:

import torch

# create data
inputs1 = torch.tensor([
    [.3, .5, .7],
    [.3, .5, .7]
])
inputs2 = torch.tensor([
    [.1, .3, .5],
    [.1, .3, .5]
])
target = torch.tensor([1, -1], dtype=torch.float)


def cosine(a, b):
    numerator = a @ b
    denominator = torch.norm(a, 2) * torch.norm(b, 2)
    return numerator / denominator


res_y_pos = 1 - cosine(inputs1[0], inputs2[0])  # y = 1
res_y_neg = max(0, cosine(inputs1[1], inputs2[1]))  # y = -1
print(res_y_pos, res_y_neg)
# tensor(0.0167) tensor(0.9833)

16. MultiMarginLoss

The MultiMarginLoss class computes the hinge loss for multi-class classification:
$$loss = \frac{1}{C} \sum_i (\max\{0, margin - x[y] + x[i]\})^p$$
where $C$ is the number of classes.

import torch
from torch.nn import MultiMarginLoss

# create data
inputs = torch.tensor([
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3]
])
target = torch.tensor([1, 2], dtype=torch.long)

multi_margin_loss = MultiMarginLoss(
    p=1,  # exponent, can be 1 or 2
    margin=1.0,  # margin
    weight=None,  # per-class loss weights
    reduction='none'
)
print(multi_margin_loss(inputs, target))
# tensor([0.8000, 0.7000])

Manual computation:

import torch

# create data
inputs = torch.tensor([
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3]
])
target = torch.tensor([1, 2], dtype=torch.long)

# for the first sample
inputs_ = inputs[0]
margin = 1

i_0 = margin - (inputs_[1] - inputs_[0])  # > 0
i_2 = margin - (inputs_[1] - inputs_[2])  # > 0
res = (i_0 + i_2) / inputs_.size(0)
print(res)
# tensor(0.8000)

17. TripletMarginLoss

The TripletMarginLoss class computes the triplet loss, commonly used in face recognition. The formula is
$$L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + margin, 0\}$$
where $d(x_i, y_i) = \lVert x_i - y_i \rVert_p$

import torch
from torch.nn import TripletMarginLoss

# create data
anchor = torch.tensor([[1.]])
pos = torch.tensor([[2.]])
neg = torch.tensor([[0.5]])

triplet_margin_loss = TripletMarginLoss(
    margin=1.0,  # margin
    p=2.0,  # order of the norm, default 2
    eps=1e-06,
    swap=False,
    reduction='none'
)
print(triplet_margin_loss(anchor, pos, neg))
# tensor([1.5000])

Manual computation:

import numpy as np

# d(a, p) - d(a, n) + margin, using the L2 distance on 1-D inputs
np.sqrt((1. - 2.) ** 2) - np.sqrt((1. - .5) ** 2) + 1  # 1.5

18. CTCLoss

Computes the CTC (Connectionist Temporal Classification) loss, used for classification of sequential data.

from torch.nn import CTCLoss

CTCLoss(
    blank=0,  # index of the blank label
    reduction='mean',
    zero_infinity=False  # zero out infinite losses and the associated gradients
)
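
A minimal, illustrative call with random data (the tensor shapes follow what the class expects; the values here are made up):

import torch
from torch.nn import CTCLoss

T, N, C = 50, 2, 20  # input length, batch size, number of classes (including blank)
log_probs = torch.randn(T, N, C).log_softmax(dim=2)       # (T, N, C) log-probabilities
targets = torch.randint(1, C, (N, 10), dtype=torch.long)  # class 0 is reserved for the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc_loss = CTCLoss(blank=0, reduction='mean', zero_infinity=False)
print(ctc_loss(log_probs, targets, input_lengths, target_lengths))  # a scalar loss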

Sequence modeling is something I still need to study further…
