M2Det Reading Notes

@InProceedings{
author = {Qijie Zhao and Tao Sheng and Yongtao Wang and Zhi Tang and Ying Chen and Ling Cai and Haibin Ling},
title = {M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network},
booktitle = {The AAAI Conference on Artificial Intelligence},
year = {2019}
}

Problem

Two strategies are commonly used to handle scale variation across object instances:

  1. Use an image pyramid at testing time, which increases memory use and computational complexity.
  2. Use a feature pyramid in both the training and testing phases, which is more efficient.
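
To make the cost of the first strategy concrete, here is a rough back-of-the-envelope sketch. It assumes the detector's cost grows linearly with the number of input pixels, which is an approximation for fully convolutional networks; the function name and the example scales are illustrative and do not come from the paper.

```python
def image_pyramid_relative_cost(scales):
    """Cost of one detector pass per scale, relative to a single pass at scale 1.0.

    Assumption: cost is proportional to pixel count, so a scale factor s
    on each side costs s * s.
    """
    return sum(s * s for s in scales)


# Four test-time scales (hypothetical values chosen for illustration).
print(image_pyramid_relative_cost([0.5, 1.0, 1.5, 2.0]))  # 7.5
```

A feature pyramid avoids this multiplier because the backbone runs once and the scales are read from its intermediate feature maps.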

Limitation

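Existing detectors simply construct the feature pyramid according to the inherent multi-scale, pyramidal architecture of the backbones, which are actually designed for the object classification task.
Features extracted by a classification network are not representative enough for the detection task.
Each feature map in the pyramid is built mainly from single-level layers of the backbone, so it carries insufficient semantic information.
High-level features suit classification and objects with complex appearances; low-level features suit location regression and objects with simple appearances.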

Solution

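The paper proposes a new way to construct the feature pyramid for feature extraction: the Multi-Level Feature Pyramid Network (MLFPN).
Each feature map in the pyramid is built from layers at multiple levels. The network uses 6 scales and 8 levels.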


FFMv1 enriches semantic information into base features by fusing feature maps of the backbone.
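
The fusion step can be illustrated with a minimal pure-Python sketch. It assumes the fusion amounts to upsampling the deeper (smaller) feature map to the size of the shallower one and concatenating along the channel axis; the convolutions that compress channels are left out, and the function names are hypothetical.

```python
def upsample_nearest(channel, factor):
    """Repeat every value `factor` times along both spatial axes."""
    return [
        [value for value in row for _ in range(factor)]
        for row in channel
        for _ in range(factor)
    ]


def ffmv1_fuse(shallow, deep, factor=2):
    """Concatenate shallow channels with upsampled deep channels.

    Each argument is a list of channels; a channel is a 2D list (rows of values).
    Assumption: the channel-compressing convolutions are omitted, so only the
    upsample-and-concatenate data flow is shown.
    """
    upsampled = [upsample_nearest(ch, factor) for ch in deep]
    for ch in upsampled:
        if len(ch) != len(shallow[0]) or len(ch[0]) != len(shallow[0][0]):
            raise ValueError("spatial sizes must match after upsampling")
    return shallow + upsampled


shallow = [[[1, 2], [3, 4]]]  # one 2x2 channel from a shallower layer
deep = [[[9]]]                # one 1x1 channel from a deeper layer
base = ffmv1_fuse(shallow, deep)
print(len(base))  # 2
```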

Each TUM generates a group of multi-scale features, and then the alternating joint TUMs and FFMv2s extract multi-level multiscale features.
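
The alternation between TUMs and FFMv2s can be sketched as a simple loop. This is a data-flow sketch only: `toy_tum` and `toy_ffmv2` are placeholder stand-ins rather than real modules, and the choice to feed FFMv2 the largest-scale output of the previous TUM together with the base feature is an assumption that these notes do not state explicitly.

```python
def mlfpn_forward(base, tums, ffmv2):
    """Run the TUMs in sequence and return pyramid[level][scale]."""
    pyramid = []
    for level, tum in enumerate(tums):
        if level == 0:
            tum_input = base  # the first TUM sees only the base feature
        else:
            # Assumption: FFMv2 fuses the base feature with the largest-scale
            # output (index 0 here) of the previous TUM.
            tum_input = ffmv2(base, pyramid[-1][0])
        pyramid.append(tum(tum_input))
    return pyramid


def toy_tum(x, num_scales=6):
    """Stand-in TUM: tags its input with a scale index instead of convolving."""
    return [(x, scale) for scale in range(num_scales)]


def toy_ffmv2(base, prev_largest):
    """Stand-in FFMv2: records what it was asked to fuse."""
    return ("fused", base, prev_largest)


pyramid = mlfpn_forward("base", [toy_tum] * 8, toy_ffmv2)
print(len(pyramid), len(pyramid[0]))  # 8 6
```

With 8 TUMs producing 6 scales each, the result is indexed as level first, scale second, which is the layout SFAM then regroups by scale.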

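SFAM aggregates the features into the multi-level feature pyramid through a scale-wise feature concatenation operation and an adaptive attention mechanism. In other words, it gathers the multi-level, multi-scale features generated by the TUMs into a single multi-level feature pyramid.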
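
The two SFAM stages can be sketched in pure Python on tiny nested lists. The sketch assumes the attention is a squeeze-and-excitation style channel reweighting (global average, two fully connected layers, sigmoid gate), which these notes do not spell out; the weight matrices `w1` and `w2` are hypothetical parameters that would be learned in practice.

```python
import math


def concat_by_scale(pyramid):
    """pyramid[level][scale] is a list of channels; join all levels per scale."""
    num_scales = len(pyramid[0])
    return [
        [channel for level in pyramid for channel in level[scale]]
        for scale in range(num_scales)
    ]


def channel_attention(channels, w1, w2):
    """Reweight channels: global average, FC + ReLU, FC + sigmoid, then scale.

    Assumption: w1 has shape (hidden, C) and w2 has shape (C, hidden);
    biases are omitted for brevity.
    """
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels
    ]
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    return [[[g * v for v in row] for row in ch] for ch, g in zip(channels, gates)]
```

For example, with 8 levels that each emit C channels at every scale, `concat_by_scale` yields 6 feature maps of 8 * C channels, and `channel_attention` is then applied to each of them separately.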

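At the detection stage, each pyramid feature is followed by two convolution layers, one for location regression and one for classification. Six anchors with three aspect ratios are set at each pixel of the feature maps. Soft-NMS is applied as the final post-processing step.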
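
For reference, a minimal pure-Python sketch of soft-NMS follows. It assumes the linear decay variant (a box overlapping the selected box by more than `iou_thresh` has its score multiplied by `1 - IoU`) and a score cutoff of 0.05; both defaults are assumptions, since these notes only say that soft-NMS is used.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms_linear(boxes, scores, iou_thresh=0.5, score_thresh=0.05):
    """Return the (box, score) pairs kept by linear soft-NMS, in selection order."""
    candidates = [(b, s) for b, s in zip(boxes, scores) if s >= score_thresh]
    kept = []
    while candidates:
        # Select the highest-scoring remaining candidate.
        best_idx = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_idx)
        kept.append((best_box, best_score))
        rescored = []
        for box, score in candidates:
            overlap = iou(best_box, box)
            if overlap > iou_thresh:
                score *= 1.0 - overlap  # decay the score instead of discarding
            if score >= score_thresh:
                rescored.append((box, score))
        candidates = rescored
    return kept
```

Unlike hard NMS, a heavily overlapping box is not removed outright; it survives with a lower score, which helps when two objects of the same class really do overlap.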


Discussion

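  1. The detection accuracy improvement of M2Det is mainly brought by the proposed MLFPN.
    (1) The pyramid is built from deeper features, which are more representative.
    (2) Each scale of feature map used for the final detection is built from multiple levels, which is better for handling appearance-complexity variation across object instances.
    MLFPN learns very effective features to handle scale variation and appearance-complexity variation across object instances.
    It is necessary to use multi-level features to detect objects of similar size.
  2. Although the architecture is elaborate, especially the TUM, the speed is not low, because each TUM is lightweight.
  3. Both increasing the number of TUMs and increasing the channels per TUM improve accuracy, but increasing the number of TUMs is more effective.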
