Paper: https://arxiv.org/pdf/1801.07698.pdf
Authors' open-source code: https://github.com/deepinsight/insightface
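Face recognition research follows two main lines. One treats it as a classification problem and trains with a softmax loss on the training set; the other learns an embedding directly in metric space, for example with triplet loss. Both have drawbacks. For softmax loss: (1) the weight matrix of the final fully connected layer grows linearly with the number of identities in the training set; (2) the learned features are separable in the closed-set setting (identities seen during training), but they are not discriminative enough for open-set face recognition (test identities that do not overlap with the training set). For triplet loss: (1) on large datasets the number of triplets explodes combinatorially, which sharply increases the number of iterations; (2) semi-hard sample mining, which effective training depends on, is a fairly difficult problem.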
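To make the two scaling arguments concrete, here is a rough back-of-the-envelope sketch. The dataset sizes, the 512-dimensional embedding, and the assumption that every identity has the same number of images are illustrative choices, not figures from the paper.

```python
def fc_weight_count(embedding_dim, num_identities):
    """Number of weights in the final classification layer (bias ignored)."""
    return embedding_dim * num_identities


def triplet_count(num_identities, images_per_identity):
    """Ordered (anchor, positive, negative) triplets, assuming every
    identity has the same number of images."""
    n, k = num_identities, images_per_identity
    anchors = n * k            # any image can be the anchor
    positives = k - 1          # other images of the same identity
    negatives = (n - 1) * k    # images of any other identity
    return anchors * positives * negatives


for n in (1_000, 10_000, 100_000):
    # The first number grows linearly in n, the second roughly quadratically.
    print(n, fc_weight_count(512, n), triplet_count(n, 10))
```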
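Additive Angular Margin Loss (ArcFace) further improves the discriminative power of face recognition models and also makes training more stable (the earlier A-Softmax had to be trained jointly with softmax in order to converge). As shown in the figure below, once the features and the weights of the final fully connected layer are both L2-normalized, their dot product equals the cosine of the angle between them (the cosine similarity). So one can first recover the angle between the feature and the weight vector with the arccosine function, and then add a margin to that angle.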
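The steps just described can be sketched for a single feature and a single class weight in plain Python. The two vectors and the margin value m = 0.5 are made-up illustrative inputs, and l2_normalize and dot are small helpers written only for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


feature = [3.0, 4.0]        # illustrative embedding
class_weight = [1.0, 0.0]   # illustrative weight vector of the true class
m = 0.5                     # additive angular margin in radians

# After normalization the dot product is cos(theta).
cos_theta = dot(l2_normalize(feature), l2_normalize(class_weight))
# Recover the angle, add the margin, and go back to a cosine.
theta = math.acos(cos_theta)
target_logit = math.cos(theta + m)

print(cos_theta, theta, target_logit)
```

Because the margin makes the angle larger, the logit of the true class becomes smaller than the plain cosine, so the network has to pull the feature closer to its class weight to reach the same score.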
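Without further ado, here is the formula:
In the formula above, x is the feature vector fed into the final fc (classification) layer, that is, the network output before any softmax is applied, and W is the weight of that fc layer.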
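For reference, the ArcFace loss as defined in the paper can be written as follows. Here N is the batch size, n the number of classes, y_i the label of sample i, s the feature scale, m the angular margin, and theta_j the angle between the feature x_i and the class weight W_j.

```latex
L = -\frac{1}{N} \sum_{i=1}^{N}
    \log \frac{e^{\,s \cos(\theta_{y_i} + m)}}
              {e^{\,s \cos(\theta_{y_i} + m)} + \sum_{j=1,\, j \neq y_i}^{n} e^{\,s \cos \theta_j}},
\qquad
\cos \theta_j = \frac{W_j^{\top} x_i}{\lVert W_j \rVert \, \lVert x_i \rVert}
```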
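For the mathematical theory behind the ArcFace loss, see the original paper. Understanding it is best, but it is fine if you do not: the extensive experiments in the paper show that it works, and being able to use it is enough. So the rest of this post is about how to use it. I recently wanted to port this loss into torchreid, a person re-identification framework, and ran into a few problems, which I record here.
First, I also just found code that someone else had written online and used it directly. The code I used is as follows: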
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceLoss(nn.Module):
    r"""Implementation of the large-margin arc distance, cos(theta + m).

    Args:
        in_features: size of each input sample
        num_classes: number of classes (size of each output sample)
        s: scale applied to the logits (norm of the input feature)
        m: additive angular margin, in radians
    """

    def __init__(self, in_features, num_classes, s=30.0, m=0.50, easy_margin=False, use_gpu=True):
        super(ArcFaceLoss, self).__init__()
        self.in_features = in_features
        self.out_features = num_classes
        self.s = s
        self.m = m
        self.use_gpu = use_gpu
        self.logsoftmax = nn.LogSoftmax(dim=1)
        # What nn.Parameter does:
        # it turns a plain Tensor into a trainable parameter and registers it
        # on this module, so it shows up in net.parameters() and can be
        # updated by the optimizer.
        # https://www.jianshu.com/p/d8b77cc02410
        # Initialize the weights
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        # self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

        self.easy_margin = easy_margin
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        self.th = math.cos(math.pi - m)
        self.mm = math.sin(math.pi - m) * m

    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        # torch.nn.functional.linear(input, weight, bias=None)
        # y = x * W^T + b
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        # clamp guards against tiny negative values caused by rounding
        sine = torch.sqrt((1.0 - torch.pow(cosine, 2)).clamp(0, 1))
        # cos(a + b) = cos(a) * cos(b) - sin(a) * sin(b)
        phi = cosine * self.cos_m - sine * self.sin_m
        if self.easy_margin:
            # torch.where(condition, x, y) -> Tensor
            # condition (BoolTensor): where True, yield x, otherwise yield y
            # x (Tensor): values selected at indices where condition is True
            # y (Tensor): values selected at indices where condition is False
            # returns:
            # a tensor of shape equal to the broadcasted shape of condition, x, y
            # apply the margin only where cosine > 0 (theta below 90 degrees)
            phi = torch.where(cosine > 0, phi, cosine)
        else:
            phi = torch.where(cosine > self.th, phi, cosine - self.mm)
        # --------------------------- convert label to one-hot ---------------------------
        # one_hot = torch.zeros(cosine.size(), requires_grad=True, device='cuda')
        # The one-hot mask is used to write cos(theta + m) into the target positions.
        # Created on the same device as cosine instead of a hard-coded 'cuda'.
        one_hot = torch.zeros(cosine.size(), device=cosine.device)
        # scatter_(dim, index, value)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # ------------- out_i = x_i if condition_i else y_i -------------
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        # torch.where can be used here instead if torch.__version__ >= 0.4
        output *= self.s

        return output
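One part of the code above that is easy to skip over is the else branch with th and mm. As far as I can tell (this is my reading of the code, not something I verified against the paper), cos(theta + m) stops decreasing once theta + m goes past pi, so for cosine <= th = cos(pi - m) the code falls back to cos(theta) - m * sin(m), which keeps the target logit decreasing as the angle grows. The pure-Python sketch below mirrors the two branches for a single scalar; arcface_target_logit is a hypothetical helper name used only here.

```python
import math


def arcface_target_logit(cosine, m=0.5, easy_margin=False):
    """Scalar version of the phi computation for the true class."""
    sine = math.sqrt(max(0.0, 1.0 - cosine * cosine))
    phi = cosine * math.cos(m) - sine * math.sin(m)   # cos(theta + m)
    if easy_margin:
        return phi if cosine > 0 else cosine
    th = math.cos(math.pi - m)
    mm = math.sin(math.pi - m) * m                    # equals m * sin(m)
    return phi if cosine > th else cosine - mm


# Sweep theta from 0 to pi and compare with the raw cos(theta + m).
thetas = [i * math.pi / 200 for i in range(201)]
values = [arcface_target_logit(math.cos(t)) for t in thetas]
raw = [math.cos(t + 0.5) for t in thetas]

is_monotone = all(a >= b for a, b in zip(values, values[1:]))
raw_turns_back_up = any(b > a for a, b in zip(raw, raw[1:]))
print(is_monotone, raw_turns_back_up)
```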
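There are two things worth pointing out. The first concerns this line:
self.weight = nn.Parameter(torch.randn(num_classes, in_features))
This weight is trained together with the network, so it has to be on the GPU as well; otherwise you may get an error complaining that CPU and GPU tensors are being mixed (a device mismatch). Changing it to the following form fixes that:
if self.use_gpu:
    self.weight = nn.Parameter(torch.randn(num_classes, in_features).cuda())
else:
    self.weight = nn.Parameter(torch.randn(num_classes, in_features))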
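An alternative sketch, which is not what the code in this post does: because the weight is registered as an nn.Parameter, moving the whole loss module with .to(device) moves the weight along with it, so .cuda() does not have to be hard-coded in the constructor. The loss module's parameters also need to be handed to the optimizer, otherwise the class weights are never updated. TinyArcHead below is a hypothetical stand-in for the ArcFaceLoss module, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn


class TinyArcHead(nn.Module):
    """Hypothetical stand-in that only holds the class weight matrix."""

    def __init__(self, in_features, num_classes):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, in_features))
        nn.init.xavier_uniform_(self.weight)


# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

backbone = nn.Linear(128, 512).to(device)   # placeholder feature extractor
head = TinyArcHead(512, 10).to(device)      # .to() moves the registered weight too

# Optimize the backbone and the class weights together.
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(head.parameters()), lr=0.01
)
print(head.weight.device)
```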
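The second thing to watch is the return value: output is a tensor of scaled logits, not the final loss value. It still has to go through a softmax and then be evaluated the way a cross-entropy loss is. Using code without reading it first really does not work, so I added that remaining computation as well. The code is as follows: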
# ArcFace
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceLoss(nn.Module):
    r"""Implementation of the large-margin arc distance, cos(theta + m).

    Args:
        in_features: size of each input sample
        num_classes: number of classes (size of each output sample)
        s: scale applied to the logits (norm of the input feature)
        m: additive angular margin, in radians
    """

    def __init__(self, in_features, num_classes, s=30.0, m=0.50, easy_margin=False, use_gpu=True):
        super(ArcFaceLoss, self).__init__()
        self.in_features = in_features
        self.out_features = num_classes
        self.s = s
        self.m = m
        self.use_gpu = use_gpu
        self.logsoftmax = nn.LogSoftmax(dim=1)
        # What nn.Parameter does:
        # it turns a plain Tensor into a trainable parameter and registers it
        # on this module, so it shows up in net.parameters() and can be
        # updated by the optimizer.
        # https://www.jianshu.com/p/d8b77cc02410
        # Initialize the weights
        if self.use_gpu:
            self.weight = nn.Parameter(torch.randn(num_classes, in_features).cuda())
        else:
            self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        # self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

        self.easy_margin = easy_margin
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        self.th = math.cos(math.pi - m)
        self.mm = math.sin(math.pi - m) * m

    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        # torch.nn.functional.linear(input, weight, bias=None)
        # y = x * W^T + b
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        # clamp guards against tiny negative values caused by rounding
        sine = torch.sqrt((1.0 - torch.pow(cosine, 2)).clamp(0, 1))
        # cos(a + b) = cos(a) * cos(b) - sin(a) * sin(b)
        phi = cosine * self.cos_m - sine * self.sin_m
        if self.easy_margin:
            # torch.where(condition, x, y) -> Tensor
            # condition (BoolTensor): where True, yield x, otherwise yield y
            # x (Tensor): values selected at indices where condition is True
            # y (Tensor): values selected at indices where condition is False
            # returns:
            # a tensor of shape equal to the broadcasted shape of condition, x, y
            # apply the margin only where cosine > 0 (theta below 90 degrees)
            phi = torch.where(cosine > 0, phi, cosine)
        else:
            phi = torch.where(cosine > self.th, phi, cosine - self.mm)
        # --------------------------- convert label to one-hot ---------------------------
        # one_hot = torch.zeros(cosine.size(), requires_grad=True, device='cuda')
        # The one-hot mask is used to write cos(theta + m) into the target positions.
        # Created on the same device as cosine instead of a hard-coded 'cuda'.
        one_hot = torch.zeros(cosine.size(), device=cosine.device)
        # scatter_(dim, index, value)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # ------------- out_i = x_i if condition_i else y_i -------------
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        # torch.where can be used here instead if torch.__version__ >= 0.4
        output *= self.s

        log_probs = self.logsoftmax(output)
        # Mean over the batch, then sum over classes: the average cross-entropy.
        return (-one_hot * log_probs).mean(0).sum()
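As a sanity check on that last line, the sketch below compares the manual one-hot computation with PyTorch's built-in F.cross_entropy on random logits. The random logits only stand in for the scaled ArcFace output; the batch size, class count, and labels are arbitrary illustrative values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_classes = 4, 6
# Stand-in for the scaled logits produced by the forward pass above.
logits = torch.randn(batch_size, num_classes) * 30.0
labels = torch.tensor([0, 3, 5, 2])

# Manual version, mirroring the return statement of ArcFaceLoss.forward.
one_hot = torch.zeros_like(logits)
one_hot.scatter_(1, labels.view(-1, 1), 1.0)
log_probs = F.log_softmax(logits, dim=1)
manual_loss = (-one_hot * log_probs).mean(0).sum()

# Built-in cross-entropy with the default mean reduction.
builtin_loss = F.cross_entropy(logits, labels)

print(manual_loss.item(), builtin_loss.item())
```

If the two values agree, an equivalent design would be to let forward return the scaled logits and apply F.cross_entropy (or nn.CrossEntropyLoss) outside the module, which also makes the logits available for computing accuracy.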