TMM-2019
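Pedestrian detection has many applications, e.g., security and surveillance, mobile robotics, autonomous driving, and crowd sourcing
Pedestrian detection performance depends on both the detection algorithm and the data; the two push each other forward in an alternating, spiral fashion
There is a gap in diversity and density between real-world requirements and current pedestrian detection benchmarks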
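The authors propose a large and diverse dataset named WiderPerson for dense pedestrian detection in the wild, with 29.87 annotations per image
Image sources: Google, Bing, and Baidu
More than 50 keywords (e.g., pedestrian, cyclist, walking, running, marathon, square dance and group photo)
∼50,000 candidate images; 13,382 remain after filtering
The training, validation, and test sets contain 8,000, 1,000, and 4,382 images
Two points are annotated, the top of the head and the center of the feet; the rectangular box label is then generated from the fixed aspect ratio $\frac{w}{h} = 0.41$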
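The box generation from the two annotated points can be sketched as follows. This is a minimal illustration, not the authors' annotation tool: the function name `box_from_head_feet` is hypothetical, and centering the box horizontally on the midpoint of the two points is an assumption.

```python
def box_from_head_feet(head, feet, aspect_ratio=0.41):
    """Return (x1, y1, x2, y2) for a box spanning head top to feet center.

    head, feet: (x, y) points in pixel coordinates.
    aspect_ratio: fixed w / h, 0.41 as stated in the notes.
    """
    hx, hy = head
    fx, fy = feet
    height = abs(fy - hy)
    width = aspect_ratio * height
    # Assumption: the box is centered horizontally on the midpoint
    # of the two annotated points.
    cx = (hx + fx) / 2.0
    top = min(hy, fy)
    return (cx - width / 2.0, top, cx + width / 2.0, top + height)
```

For a person whose head top is at (100, 50) and feet center at (100, 250), the height is 200 px, so the width is 0.41 × 200 = 82 px.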
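Fake humans are annotated as well, e.g., humans on posters, reflections, mannequins, and statues
After annotation, three-fold cross-validation is used to check the annotations strictly (three people review each annotation; if more than half find it flawed, it is redone)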
1)Capacity
Three levels of difficulty are defined
The recall of proposals generated by the Edgebox method is used to reflect the differences between data of different difficulty levels
2)Scale
Horizontal axis: bbox height in pixels; vertical axis: frequency
Evaluation metric: MR (miss rate); lower is better
average log miss rate over false positives per image (FPPI) ranging in $[10^{-2}, 10^{0}]$
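A minimal sketch of how this metric is commonly computed, assuming the Caltech evaluation convention of nine FPPI reference points evenly spaced in log space (the notes do not state the number of points). The function name and the input format (a miss-rate curve sorted by increasing FPPI) are hypothetical.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at log-spaced FPPI values.

    fppi: FPPI values of the curve, sorted in increasing order.
    miss_rate: miss rate at each FPPI value (same length as fppi).
    num_points: number of reference points in [1e-2, 1e0] (assumed 9).
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        mr = 1.0  # if the curve never reaches this FPPI, count a full miss
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m  # keep the last curve point not exceeding ref
            else:
                break
        sampled.append(max(mr, 1e-10))  # avoid log(0)
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))
```

Averaging in log space means that a detector has to do well across the whole FPPI range; a low miss rate at one operating point only cannot compensate for a high one elsewhere.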
riders / partially-visible persons / crowd / ignore regions are ignored
11 different anchor-box scales and 1 aspect ratio (w/h = 0.41)
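The anchor configuration can be illustrated with a short sketch. The notes give only the count (11 scales) and the aspect ratio (0.41); the concrete heights below, a geometric progression from 16 to 512 px, are hypothetical placeholders, as is the function name `make_anchors`.

```python
def make_anchors(cx, cy, heights, aspect_ratio=0.41):
    """Return one (x1, y1, x2, y2) anchor per height, centered at (cx, cy)."""
    anchors = []
    for h in heights:
        w = aspect_ratio * h  # single aspect ratio w / h = 0.41
        anchors.append((cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0))
    return anchors


# Hypothetical scale set: 11 heights from 16 px to 512 px, each sqrt(2)
# times the previous one. The actual scales are not given in the notes.
heights = [16 * (2 ** (i / 2)) for i in range(11)]
anchors = make_anchors(cx=320.0, cy=240.0, heights=heights)
```

Using one aspect ratio matches the annotation rule, since every ground-truth box is generated with the same w/h.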
1)Finer Feature Map
The fourth down-sampling operation is removed, so the output feature map stride changes from 16 to 8
all layers before the fourth down-sampling operation are unchanged and all convolutional filters after it are modified by the “hole algorithm”
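The idea behind the "hole algorithm" (dilated convolution) can be shown with a 1-D toy example. This is a simplified sketch, not the network itself: it assumes the removed down-sampling is plain subsampling by 2, and `conv1d` is a hypothetical helper. It shows that a filter with dilation 2 on the full-resolution signal yields the same values as the original filter on the down-sampled signal, only at twice the density.

```python
def conv1d(signal, kernel, stride=1, dilation=1):
    """Valid 1-D correlation with optional stride and dilation."""
    span = (len(kernel) - 1) * dilation + 1  # effective kernel extent
    return [
        sum(k * signal[start + j * dilation] for j, k in enumerate(kernel))
        for start in range(0, len(signal) - span + 1, stride)
    ]


signal = [float(v * v % 7) for v in range(16)]
kernel = [1.0, -2.0, 1.0]

# Original path: down-sample by 2, then apply the filter.
coarse = conv1d(signal[::2], kernel)

# Modified path: keep full resolution, insert "holes" into the filter.
dense = conv1d(signal, kernel, dilation=2)

# Every second dense response equals the corresponding coarse response.
assert dense[::2] == coarse
```

The pretrained filter weights are therefore reused unchanged, while the output map becomes finer.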
2)Ignore Region and Tiny Pedestrian Handling
3)RoI Feature Enhancing
4)Dynamic Sample Strategy
In Faster RCNN, 256 and 128 samples are used for RPN and Fast R-CNN, with 1 : 1 and 1 : 3 positive-negative ratios respectively
The authors' dataset has about 29.87 targets per image on average, so the fixed sample strategy will lead to inadequate use of training positive samples
The authors' improvement (dynamic sample strategy)
if there are too many positive samples, we determine the number of negative samples based on the above positive-negative ratio to ensure that all positive samples are used, otherwise we follow the original strategy.
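The rule above can be sketched as a small function. This is an illustration under stated assumptions, not the authors' code: the name `sample_counts` is hypothetical, and "the original strategy" is taken to mean the usual Faster R-CNN behavior of filling the rest of the fixed-size batch with negatives.

```python
def sample_counts(num_pos, num_neg, batch_size, ratio):
    """Return (positives, negatives) to sample for one image.

    num_pos, num_neg: available positive / negative candidates.
    batch_size: fixed batch size of the original strategy (256 or 128).
    ratio: (pos, neg) ratio, e.g. (1, 1) for RPN or (1, 3) for Fast R-CNN.
    """
    p, n = ratio
    max_pos = batch_size * p // (p + n)  # positive quota of the fixed strategy
    if num_pos > max_pos:
        # Dynamic branch: keep all positives and scale negatives by the ratio.
        use_pos = num_pos
        use_neg = num_pos * n // p
    else:
        # Original branch (assumed): fill the rest of the batch with negatives.
        use_pos = num_pos
        use_neg = batch_size - num_pos
    return use_pos, min(use_neg, num_neg)
```

For example, with 100 positive RoIs in the Fast R-CNN stage (batch 128, ratio 1 : 3), the fixed strategy would keep only 32 positives, whereas the dynamic branch keeps all 100 and samples 300 negatives.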
【Focal Loss】《Focal Loss for Dense Object Detection》
prevent the sampling of background boxes in those ignored areas
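One possible way to keep background samples out of ignore regions is to discard negative candidates that are mostly covered by such a region. The sketch below is an assumption about how this could be done; the function names and the 0.5 coverage threshold are hypothetical and not taken from the notes.

```python
def intersection_area(a, b):
    """Area of overlap between two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def filter_background(candidates, ignore_regions, max_coverage=0.5):
    """Keep background candidates not dominated by an ignore region.

    max_coverage: hypothetical threshold on the fraction of the
    candidate's own area that lies inside a single ignore region.
    """
    kept = []
    for box in candidates:
        area = (box[2] - box[0]) * (box[3] - box[1])
        if area <= 0:
            continue  # skip degenerate boxes
        coverage = max(
            (intersection_area(box, r) / area for r in ignore_regions),
            default=0.0,
        )
        if coverage <= max_coverage:
            kept.append(box)
    return kept
```

Coverage is measured against the candidate's own area rather than with IoU, because ignore regions are often much larger than a single box.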
1)Detection Results
WiderPerson
Caltech-USA
WiderPerson $\Rightarrow$ Caltech-USA means pre-training on WiderPerson and fine-tuning on Caltech-USA; the entries below the double slash are SOTA methods
A look at the demo
2)Quantity Analysis
a logarithmic relation between the amount of training data and the performance of deep learning methods.
3)Quality Analysis
adding fine-grained annotations for riders is helpful for the pedestrian detection performance
4)Error Analysis
LOC indicates the localization errors that occur when a pedestrian is detected with a misaligned bounding box, and BG indicates that a background region is mistakenly detected as a pedestrian
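The two error types can be separated with a simple overlap test on each false positive. The sketch below is only an illustration: the 0.1 IoU threshold and the function names are hypothetical, and the notes do not give the exact criteria used in the paper.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(w, 0.0) * max(h, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def classify_false_positive(det, gt_boxes, loc_iou=0.1):
    """Label a detection already known to be a false positive.

    loc_iou: hypothetical threshold; a false positive that still
    touches a pedestrian (including duplicate detections) counts as
    LOC, anything else as BG.
    """
    best = max((iou(det, g) for g in gt_boxes), default=0.0)
    return "LOC" if best >= loc_iou else "BG"
```

A detection shifted by half a box width from its pedestrian has IoU 1/3 and is labeled LOC, while a detection far from every pedestrian is labeled BG.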
pre-trained + fine-tune
1)Caltech
Table VII
2)CityPersons
Table VIII
NMS is usually not trained but has a great influence on detection performance.
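For reference, a minimal sketch of standard greedy NMS in plain Python. The 0.5 IoU threshold is a common default rather than a value from the notes, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(w, 0.0) * max(h, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In dense crowds the threshold is a delicate trade-off: a low value suppresses true pedestrians that stand close together, while a high value lets duplicate detections through.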
Reducing false positives
How can an object detection algorithm be made to stop raising false alarms?