arXiv-2021
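Scale variation is one of the most challenging problems in face detection.
Modern face detectors employ feature pyramids to deal with scale variation.
However, feature pyramids have the following problem:
it might break the feature consistency across different scales of faces. (In other words, faces within a range of sizes, e.g. 50 to 100 pixels, all land on the same feature map, and that one feature map may not be able to represent all of them well.)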
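As a rough illustration of that concern, the sketch below assigns faces to pyramid levels by log-scale distance to the anchor size. The six strides (4 to 128), the "anchor side = 4 × stride" rule and the helper name `assign_level` are all assumptions for illustration, not details taken from the EMface paper or repo.

```python
import math

# Hypothetical pyramid strides (P2..P7) and anchor rule; not from EMface.
STRIDES = [4, 8, 16, 32, 64, 128]
ANCHOR_PER_STRIDE = 4  # assumed: anchor side length = 4 * stride


def assign_level(face_size: float) -> int:
    """Index of the pyramid level whose anchor size is closest to
    face_size on a log2 scale."""
    anchors = [ANCHOR_PER_STRIDE * s for s in STRIDES]
    target = math.log2(face_size)
    return min(range(len(anchors)), key=lambda i: abs(target - math.log2(anchors[i])))


for size in (48, 64, 88, 100):
    level = assign_level(size)
    print(f"{size:>3}px face -> level {level} (stride {STRIDES[level]})")
```

Under these assumptions every face between roughly 45 and 90 pixels lands on the stride-16 level, so a single feature map has to represent faces whose sizes differ by about 2×.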
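The authors improve on the feature pyramid and propose EMface (Exploring Receptive Field Pyramids), built around a receptive field pyramid (RFP) module.
Branch pooling balances the representations of the parallel branches during training and allows a single branch to perform inference at test time.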
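A minimal sketch of this train/test asymmetry, assuming branch pooling is an element-wise mean as in the MRF code below; `branch_pooling` and its `keep` argument are hypothetical names, not part of the EMface repo.

```python
import torch


def branch_pooling(branch_outputs, training: bool, keep: int = 1):
    """Average all branch outputs while training; at inference return only
    the branch at index `keep` (index 1 = Branch-2)."""
    if training:
        return torch.stack(branch_outputs, dim=0).mean(dim=0)
    return branch_outputs[keep]


# Toy branch outputs filled with constants so the result is easy to check.
outs = [torch.full((1, 4, 2, 2), v) for v in (1.0, 2.0, 6.0)]
train_out = branch_pooling(outs, training=True)   # every element is 3.0
test_out = branch_pooling(outs, training=False)   # every element is 2.0
print(train_out.shape, train_out.flatten()[0].item(), test_out.flatten()[0].item())
```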
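Details of the RFP:
First, the input passes through three dilated convolutions that share one set of weights (dilations 1, 3 and 5 in the code), each followed by BN, a residual connection and ReLU; branch pooling then averages the three branch outputs.
The RFP output has the same dimensions as its input (same number of channels, same spatial size).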
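The "same dimensions" property follows from setting the padding equal to the dilation for a 3×3 kernel with stride 1. The helpers below (hypothetical names) use the standard convolution output-size formula, the one PyTorch documents for Conv2d, to check this for the three dilations in the code; the 80-pixel input size is arbitrary.

```python
def effective_kernel(k: int, d: int) -> int:
    # A k x k kernel with dilation d spans d*(k-1)+1 pixels per side.
    return d * (k - 1) + 1


def conv_out_size(n: int, k: int, d: int, p: int, s: int = 1) -> int:
    # Standard convolution output-size formula (as documented for Conv2d).
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


for d in (1, 3, 5):
    print(f"d={d}: effective kernel {effective_kernel(3, d)}, "
          f"80 -> {conv_out_size(80, 3, d, p=d)}")
```

So the three branches see 3×3, 7×7 and 11×11 neighbourhoods per layer while all producing maps of the input size, which is what makes the branch outputs directly poolable and the residual add valid.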
Code: https://github.com/emdata-ailab/EMface
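```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MRF(nn.Module):
    """RFP block: one shared 3x3 kernel applied with dilations 1, 3 and 5,
    each branch followed by BN + residual + ReLU, then branch pooling
    (mean over the three branches). Input and output shapes are equal."""

    def __init__(self, in_planes):
        super(MRF, self).__init__()
        # Weight shared by all three branches: 3x3 kernel, C_in = C_out
        self.share_weight = nn.Parameter(torch.randn(in_planes, in_planes, 3, 3))
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.bn2 = nn.BatchNorm2d(in_planes)
        self.bn3 = nn.BatchNorm2d(in_planes)
        self.relu1 = nn.ReLU(inplace=True)
        self.relu2 = nn.ReLU(inplace=True)
        self.relu3 = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x
        # Branch-1: dilation 1 (padding = dilation keeps H and W unchanged)
        x1 = F.conv2d(x, self.share_weight, stride=1, padding=1, bias=None, dilation=1)
        x1 = self.bn1(x1)
        x1 = x1 + residual
        x1 = self.relu1(x1)
        # Branch-2: dilation 3
        x2 = F.conv2d(x, self.share_weight, stride=1, padding=3, bias=None, dilation=3)
        x2 = self.bn2(x2)
        x2 = x2 + residual
        x2 = self.relu2(x2)
        # Branch-3: dilation 5
        x3 = F.conv2d(x, self.share_weight, stride=1, padding=5, bias=None, dilation=5)
        x3 = self.bn3(x3)
        x3 = x3 + residual
        x3 = self.relu3(x3)
        # Branch pooling: average the three branch outputs.
        # (At inference the paper keeps only Branch-2; this forward still computes all three.)
        y = torch.cat((x1, x2, x3), dim=1)  # (n, 3*in_planes, h, w)
        b, c, h, w = y.size()
        y = y.view(b, 3, c // 3, h, w)      # (n, 3, in_planes, h, w)
        y1 = y.mean(dim=1, keepdim=True)    # (n, 1, in_planes, h, w)
        y = y1.view(b, -1, h, w)            # (n, in_planes, h, w)
        return y
```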
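The cat, view, mean sequence at the end of forward is simply an element-wise average of the three branch outputs. The self-contained check below (toy shapes, random inputs) compares it with a more direct torch.stack(...).mean(...); both helper names are hypothetical.

```python
import torch


def pool_stack(branches):
    # Element-wise mean over the branch outputs.
    return torch.stack(branches, dim=1).mean(dim=1)


def pool_cat_view(branches):
    # The MRF formulation: concatenate on channels, reshape to
    # (n, num_branches, c, h, w), then average over the branch axis.
    y = torch.cat(branches, dim=1)
    n, c, h, w = y.size()
    k = len(branches)
    return y.view(n, k, c // k, h, w).mean(dim=1)


torch.manual_seed(0)
branches = [torch.randn(2, 8, 5, 5) for _ in range(3)]
a, b = pool_stack(branches), pool_cat_view(branches)
print(a.shape, torch.allclose(a, b))  # torch.Size([2, 8, 5, 5]) True
```

Both return an (n, c, h, w) tensor, so the block's output channel count matches its input.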
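Usage (in the detector's __init__; six 256-channel modules, presumably one per pyramid level):
```python
self.MR1 = MRF(256)
self.MR2 = MRF(256)
self.MR3 = MRF(256)
self.MR4 = MRF(256)
self.MR5 = MRF(256)
self.MR6 = MRF(256)
```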
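To try the block end to end, below is a hypothetical condensed rewrite of MRF applied to six made-up pyramid feature maps, one block per level as with MR1 to MR6. The name `RFPBlock`, the `single_branch` flag and the Kaiming initialisation are assumptions, not taken from the repo; `single_branch=True` mimics the Branch-2-only inference described in the ablation below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RFPBlock(nn.Module):
    """Hypothetical condensed MRF: one shared 3x3 kernel applied with several
    dilations, BN + residual + ReLU per branch, then mean branch pooling.
    With single_branch=True only the middle branch (Branch-2, d=3) runs."""

    def __init__(self, channels: int, dilations=(1, 3, 5)):
        super().__init__()
        self.dilations = dilations
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        nn.init.kaiming_normal_(self.weight, nonlinearity="relu")  # assumed init
        self.bns = nn.ModuleList([nn.BatchNorm2d(channels) for _ in dilations])
        self.single_branch = False

    def _branch(self, x, i):
        d = self.dilations[i]
        y = F.conv2d(x, self.weight, padding=d, dilation=d)  # padding=d keeps H, W
        return F.relu(self.bns[i](y) + x)

    def forward(self, x):
        if self.single_branch:
            return self._branch(x, 1)  # Branch-2 only
        outs = [self._branch(x, i) for i in range(len(self.dilations))]
        return torch.stack(outs, dim=0).mean(dim=0)  # branch pooling


# One block per level, like MR1..MR6; feature-map sizes are made up.
torch.manual_seed(0)
rfps = nn.ModuleList([RFPBlock(256) for _ in range(6)])
feats = [torch.randn(2, 256, s, s) for s in (16, 8, 4, 2, 1, 1)]
train_outs = [m(f) for m, f in zip(rfps, feats)]

# Inference: BN uses running statistics and only Branch-2 is computed.
rfps.eval()
for m in rfps:
    m.single_branch = True
with torch.no_grad():
    test_outs = [m(f) for m, f in zip(rfps, feats)]

print([tuple(o.shape) for o in test_outs])
```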
Backbone: ResNet50 + FPN
Datasets: WIDER FACE and UFDD
Metric: AP
(1) Number of Branches
(2) Weight Sharing
Sharing the weights of the multi-branch convolutions in RFP greatly reduces the number of parameters, at the cost of a slight drop in accuracy.
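A back-of-the-envelope count of what sharing saves, assuming the unshared baseline uses three independent bias-free 3×3 convolutions with 256 input and output channels (the same shape as share_weight in the code) and unchanged BN layers. These numbers are pure arithmetic, not figures reported in the paper.

```python
C, K, BRANCHES, MODULES = 256, 3, 3, 6

conv = C * C * K * K              # one 3x3 conv, no bias
bn = BRANCHES * 2 * C             # gamma and beta of three BatchNorm2d layers

shared = conv + bn                # one kernel reused by all branches
unshared = BRANCHES * conv + bn   # a separate kernel per branch

print(f"per module: shared {shared:,}  unshared {unshared:,}")
print(f"six modules: shared {MODULES * shared:,}  unshared {MODULES * unshared:,}")
```

Under these assumptions sharing cuts each module's convolution weights to one third (about 1.77M to 0.59M parameters per module, including BN).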
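(3) Branch Pooling
Training always uses all 3 branches; at test time the authors compare different output settings: which branches are output, how many, and how they are combined (BP, add, concat).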
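The three combination forms differ mainly in output shape. The toy tensors below are random and only the shapes and the sum/mean relation matter; the observation that concat would need an extra layer to map back to 256 channels is an assumption about how the downstream head consumes the RFP output, not something stated in the notes.

```python
import torch

torch.manual_seed(0)
n, c, h, w = 1, 256, 20, 20
branches = [torch.randn(n, c, h, w) for _ in range(3)]

bp = torch.stack(branches).mean(dim=0)   # branch pooling (mean): (n, c, h, w)
add = torch.stack(branches).sum(dim=0)   # element-wise sum:      (n, c, h, w)
concat = torch.cat(branches, dim=1)      # concatenation:         (n, 3c, h, w)

print(tuple(bp.shape), tuple(add.shape), tuple(concat.shape))
print(torch.allclose(add, 3 * bp))  # add is 3x the mean
```

Mean and sum keep the 256-channel interface that the RFP relies on (input and output have the same dimensions); concat triples the channel count.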
"We drop out the Branch-1 and Branch-3 (d=1 and d=5 in Figure 4) in RFP and only keep the Branch-2 to output in the inference phase."
In the end, the authors keep only Branch-2 (d=3) as the output at test time.
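Keeping one branch also cuts the RFP's convolution cost at test time. The arithmetic below counts only multiply-accumulates (MACs) of the 3×3 convolutions, ignores BN, ReLU and the residual add, and uses a hypothetical 160×160 feature map with 256 channels.

```python
def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> int:
    # MACs of one stride-1 k x k convolution whose output is h x w.
    # Dilation spreads the taps out but does not add any.
    return h * w * c_in * c_out * k * k


per_branch = conv_macs(160, 160, 256, 256)
three_branches = 3 * per_branch   # training: all branches computed
branch2_only = per_branch         # inference: Branch-2 only

print(f"{three_branches / 1e9:.1f} GMACs -> {branch2_only / 1e9:.1f} GMACs")
```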