Paper Reading: Learning Data Augmentation Strategies for Object Detection

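Contents

      • 1. Paper overview
      • 2. Prior work on learning data augmentation
      • 3. How the paper tackles the discrete optimization problem
      • 4. The most effective augmentation operations found by the search
      • 5. Results table
      • 6. Two interesting findings
      • 7. Learned data augmentation improves model regularization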

1. Paper overview

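This paper builds on AutoAugment, which was designed for classification networks, and explores automatically learning data augmentation for object detection. The authors first list a large set of augmentation operations, define a policy as 5 sub-policies (each made up of 2 operations), and then search this space for the best combination of operations. Because the search space is too large to enumerate, the search uses an RNN controller trained with reinforcement learning. Finally, they verify that the learned augmentation policy helps across different backbones, different detection models, different datasets, and different dataset sizes.
There is a blog post that summarizes this paper well; it is worth reading alongside the paper.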

In this work, we create a set of simple transformations that may be applied to object detection datasets and then transfer these transformations to other detection datasets and architectures. These transformations are only used during training and not test time. Our transformations include those that can be applied to the whole image without affecting the bounding box locations (e.g. color transformations borrowed from image classification models), transformations that affect the whole image while changing the bounding box locations (e.g., translating or shearing of the whole image), and transformations that are only applied to objects within the bounding boxes.
As the number of transformations becomes large, it becomes non-trivial to manually combine them effectively. We therefore search for policies specifically designed for object detection datasets. Experiments show that this method achieves very good performance across different datasets, dataset sizes, backbone architectures and detection algorithms. Additionally, we investigate how the performance of a data augmentation policy depends on the number of operations included in the search space and how the effectiveness of the augmentation technique varies as dataset size changes.

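The authors group the augmentations into three categories: (1) operations on the whole image that leave the bounding boxes untouched, such as brightness and contrast changes; (2) operations that change both the image and the bounding box locations, such as translating or shearing the whole image (flips and crops fall in the same group); (3) operations applied only to the content inside the bounding boxes, such as the BBox Only ShearY operation mentioned in the paper, which alters only the pixels inside each box.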
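To make the three categories concrete, here is a minimal NumPy sketch with one toy operation per category. It is an illustration only, not the paper's implementation: the function names, the `[x_min, y_min, x_max, y_max]` pixel box format, and the choice of operations (brightness scaling, a rightward shift, and a mirror inside each box) are all assumptions made for this example.

```python
import numpy as np

# Assumed box format for this sketch: integer pixels [x_min, y_min, x_max, y_max].

def adjust_brightness(image, boxes, factor):
    """Category 1: colour change on the whole image; boxes are returned unchanged."""
    out = np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)
    return out, boxes.copy()

def translate_x(image, boxes, shift):
    """Category 2: shift the whole image right by `shift` pixels and move the boxes with it.

    Assumes 0 <= shift < image width; the vacated columns are filled with zeros.
    """
    w = image.shape[1]
    out = np.zeros_like(image)
    out[:, shift:] = image[:, :w - shift]
    new_boxes = boxes.copy()
    new_boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]] + shift, 0, w)
    return out, new_boxes

def flip_inside_boxes(image, boxes):
    """Category 3: mirror only the pixels inside each box; box coordinates stay the same."""
    out = image.copy()
    for x0, y0, x1, y1 in boxes.astype(int):
        out[y0:y1, x0:x1] = image[y0:y1, x0:x1][:, ::-1]
    return out, boxes.copy()

# Tiny example: a 6x8 RGB image with a single box.
image = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)
boxes = np.array([[1, 1, 5, 4]])

_, boxes_color = adjust_brightness(image, boxes, 1.5)
shifted_image, boxes_shifted = translate_x(image, boxes, 2)
flipped_image, boxes_flipped = flip_inside_boxes(image, boxes)
```

Only the second function changes the box coordinates. The first and third return the boxes as they were, which is the distinction the three categories draw.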

2. Prior work on learning data augmentation

To avoid the data-specific nature of data augmentation, recent work has focused on learning data augmentation strategies directly from data itself. For example, Smart Augmentation uses a network that generates new data by merging two or more samples from the same class [22]. Tran et al. generate augmented data, using a Bayesian approach, based on the distribution learned from the training set [45]. DeVries and Taylor used simple transformations like noise, interpolations and extrapolations in the learned feature space to augment data [8]. Ratner et al. used generative adversarial networks to generate sequences of data augmentation operations [37]. More recently, several papers used the AutoAugment [5] search space with improved optimization algorithms to find AutoAugment policies more efficiently [17, 23].

All of the work above learns data augmentation for classification networks.
This paper learns data augmentation for detection networks:

While all of the above approaches have worked on classification problems, we take an automated approach to finding optimal data augmentation policies for object detection. Unlike classification, labeled data for object detection is more scarce because it is more costly to annotate detection data. Compared to image classification, developing a data augmentation strategy for object detection is harder because there are more ways and complexities introduced by distorting the image, bounding box locations, and the sizes of the objects in detection datasets.

3. How the paper tackles the discrete optimization problem

Many methods exist for addressing the discrete optimization problem including reinforcement learning [55], evolutionary methods [38] and sequential model-based optimization [26]. In this work, we choose to build on previous work by structuring the discrete optimization problem as the output space of an RNN and employ reinforcement learning to update the weights of the model [55]. (In short: an RNN plus reinforcement learning.)
The training setup for the RNN is similar to [55, 56, 6, 5]. We employ the proximal policy optimization (PPO) [41] for the search algorithm. The RNN is unrolled 30 steps to predict a single augmentation policy. The number of unrolled steps, 30, corresponds to the number of discrete predictions that must be made in order to enumerate 5 sub-policies. Each subpolicy consists of 2 operations and each operation consists of 3 predictions corresponding to the selected image transformation, probability of application and magnitude of the transformation.
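The arithmetic behind the 30 unrolled steps, and the shape of one policy, can be sketched as follows. This is a hedged illustration: a uniform random sampler stands in for the RNN controller (no PPO training is shown), the operation list is just the handful of names that appear in these notes, and `NUM_LEVELS` is a hypothetical number of discrete bins. Picking one sub-policy at random per training image follows the AutoAugment convention and is an assumption relative to the excerpt above.

```python
import random

NUM_SUB_POLICIES = 5     # from the excerpt
OPS_PER_SUB_POLICY = 2   # from the excerpt
PREDICTIONS_PER_OP = 3   # operation, probability, magnitude (from the excerpt)

# Assumptions for illustration only: a short op list and a made-up bin count.
OPERATIONS = ["Rotate", "Equalize", "BBox_Only_TranslateY", "BBox_Only_ShearY"]
NUM_LEVELS = 6

def unrolled_steps():
    """Number of discrete predictions needed to describe one full policy."""
    return NUM_SUB_POLICIES * OPS_PER_SUB_POLICY * PREDICTIONS_PER_OP

def sample_policy(rng):
    """Random stand-in for the controller: one discrete choice per prediction."""
    policy = []
    for _ in range(NUM_SUB_POLICIES):
        sub_policy = []
        for _ in range(OPS_PER_SUB_POLICY):
            op = rng.choice(OPERATIONS)
            prob = rng.randrange(NUM_LEVELS) / (NUM_LEVELS - 1)  # 0.0 .. 1.0
            magnitude = rng.randrange(NUM_LEVELS)                # 0 .. NUM_LEVELS - 1
            sub_policy.append((op, prob, magnitude))
        policy.append(sub_policy)
    return policy

def choose_ops_for_image(policy, rng):
    """Pick one sub-policy at random, then keep each op with its own probability."""
    sub_policy = rng.choice(policy)
    return [(op, mag) for op, prob, mag in sub_policy if rng.random() < prob]

policy = sample_policy(random.Random(0))
ops_for_one_image = choose_ops_for_image(policy, random.Random(1))
print(unrolled_steps())  # 5 * 2 * 3 = 30
```

In the actual search, the controller's choices would be scored by training a detector with the sampled policy and using its validation accuracy as the reward; that loop is omitted here.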

4. The most effective augmentation operations found by the search

Upon inspection, the most commonly used operation in good policies is Rotate, which rotates the whole image and the bounding boxes. The bounding boxes end up larger after the rotation, to include all of the rotated object. Despite this effect of the Rotate operation, it seems to be very beneficial: it is the most frequently used operation in good policies. Two other operations that are commonly used are Equalize and BBox Only TranslateY. Equalize flattens the histogram of the pixel values, and does not modify the location or size of each bounding box. BBox Only TranslateY translates only the objects in bounding boxes vertically, up or down with equal probability.

The three operations: Rotate, Equalize, and BBox Only TranslateY.
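Why the boxes grow under Rotate can be seen with a little geometry. The sketch below rotates the four corners of an axis-aligned box about a point and takes the axis-aligned box that encloses them. It is a simplified illustration under that assumption; the helper names are hypothetical and the paper's own implementation may handle the details differently.

```python
import math

def rotate_box(box, angle_deg, center):
    """Rotate the 4 corners of an axis-aligned box and return the enclosing axis-aligned box."""
    x0, y0, x1, y1 = box
    cx, cy = center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    xs, ys = [], []
    for x, y in [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]:
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_t - dy * sin_t)
        ys.append(cy + dx * sin_t + dy * cos_t)
    return (min(xs), min(ys), max(xs), max(ys))

def area(box):
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)

box = (40, 40, 80, 60)                      # a 40 x 20 box
rotated = rotate_box(box, 30, (50, 50))     # rotate by 30 degrees about (50, 50)
growth = area(rotated) / area(box)          # > 1: the enclosing box is larger
```

For a box of width w and height h rotated by an angle θ between 0 and 90 degrees, the enclosing box has width w·cos θ + h·sin θ and height w·sin θ + h·cos θ, so its area is never smaller than the original.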

5. Results table

[Image 1: experimental results table from the paper; image not preserved]

6. Two interesting findings

First finding: the gains are especially clear on small objects
It is interesting to note that models trained with learned augmentation policy seem to do especially well on detecting smaller objects, especially when fewer images are present in the training dataset. For example, for small objects, applying the learned augmentation policy seems to be better than increasing the dataset size by 50%, as seen in Table 5. For small objects, training with the learned augmentation policy with 9000 examples results in better performance than the baseline when using 15000 images. In this scenario using our augmentation policy is almost as effective as doubling your dataset size.
Second finding: the gains are especially clear on AP75
Another interesting behavior of models trained with the learned augmentation policy is that they do relatively better on the harder task of AP75 (average precision IoU=0.75). In Fig. 4, we plot the percentage improvement in mAP, AP50, and AP75 for models trained with the learned augmentation policy (relative to baseline augmentation). The relative improvement of AP75 is larger than that of AP50 for all training set sizes. The learned data augmentation is particularly beneficial at AP75 indicating that the augmentation policy helps with more precisely aligning the bounding box prediction. This suggests that the augmentation policy particularly helps with learning fine spatial details in bounding box position, which is consistent with the gains observed with small objects.
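To see why AP75 is the harder metric, recall that a prediction only counts as a match if its IoU with the ground-truth box reaches the threshold. The sketch below is a plain IoU function with made-up boxes (they are not data from the paper): a loosely aligned prediction passes the 0.5 threshold but fails at 0.75, while a tightly aligned one passes both.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

# Made-up boxes for illustration only.
ground_truth = (0, 0, 100, 100)
loose_prediction = (10, 10, 110, 110)   # offset by 10 pixels in x and y
tight_prediction = (3, 3, 103, 103)     # offset by 3 pixels in x and y

for name, pred in [("loose", loose_prediction), ("tight", tight_prediction)]:
    score = iou(ground_truth, pred)
    print(name, round(score, 3), "AP50 match:", score >= 0.5, "AP75 match:", score >= 0.75)
```

A model that localizes boxes more precisely turns more of its "loose" detections into "tight" ones, which moves AP75 more than AP50. That is consistent with the paper's reading of its results.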

[Images 2 to 5: figures from the paper; images not preserved]

7. Learned data augmentation improves model regularization

[Image 6: figure from the paper; image not preserved]
