Paper Reading: Aggregated Residual Transformations for Deep Neural Networks (ResNeXt)

Table of Contents

      • 1. Paper overview
      • 2. How VGG/ResNet and the Inception family differ in design philosophy
      • 3. ResNeXt is not ensembling
      • 4. Two design rules
      • 5. The essence of split-transform-merge
      • 6. Placement of BN and ReLU

1. Paper overview

The network proposed in this paper is an upgraded version of ResNet. Its design combines the stacked building blocks of VGG/ResNet with the split-transform-merge strategy of the Inception family. The "next" in ResNeXt refers to the new dimension introduced in this paper: cardinality, which the authors regard as another dimension, besides depth and width, along which network performance can be improved. The experiments in the paper further show that increasing cardinality is more effective than increasing depth or width. Cardinality is simply the number of split branches. The authors implement the proposed block with grouped convolutions and state that this is the first work to use grouped convolutions to improve network performance; at the same model complexity, ResNeXt outperforms ResNet.
Note: depth refers to the number of layers of the network, and width refers to the number of channels of the feature maps. ResNeXt 32×4d means a cardinality of 32, with each path having a width of 4.

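As a concrete illustration of the 32×4d block, below is a minimal PyTorch sketch of its grouped-convolution form (the Fig. 3(c) form discussed in the quote that follows). The class and parameter names are my own; BN and ReLU are omitted here, since their placement is discussed in Section 6.

```python
import torch
import torch.nn as nn

# Minimal sketch of a ResNeXt 32x4d bottleneck block in its grouped-convolution
# form: one grouped 3x3 convolution with groups=32 implements the 32 parallel
# paths of width 4 (32 * 4 = 128 channels in the middle).
class ResNeXtBlock(nn.Module):
    def __init__(self, channels=256, cardinality=32, bottleneck_width=4):
        super().__init__()
        group_width = cardinality * bottleneck_width  # 128 for the 32x4d setting
        self.reduce = nn.Conv2d(channels, group_width, kernel_size=1, bias=False)
        self.transform = nn.Conv2d(group_width, group_width, kernel_size=3,
                                   padding=1, groups=cardinality, bias=False)
        self.expand = nn.Conv2d(group_width, channels, kernel_size=1, bias=False)

    def forward(self, x):
        out = self.expand(self.transform(self.reduce(x)))
        return out + x  # residual shortcut

x = torch.randn(1, 256, 56, 56)
print(ResNeXtBlock()(x).shape)  # torch.Size([1, 256, 56, 56])
```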

In this paper, we present a simple architecture which adopts VGG/ResNets' strategy of repeating layers, while exploiting the split-transform-merge strategy in an easy, extensible way. A module in our network performs a set of transformations, each on a low-dimensional embedding, whose outputs are aggregated by summation. We pursue a simple realization of this idea — the transformations to be aggregated are all of the same topology (e.g., Fig. 1 (right)). This design allows us to extend to any large number of transformations without specialized designs.

Interestingly, under this simplified situation we show that our model has two other equivalent forms (Fig. 3). The reformulation in Fig. 3(b) appears similar to the Inception-ResNet module [37] in that it concatenates multiple paths; but our module differs from all existing Inception modules in that all our paths share the same topology and thus the number of paths can be easily isolated as a factor to be investigated. In a more succinct reformulation, our module can be reshaped by Krizhevsky et al.'s grouped convolutions [24] (Fig. 3(c)), which, however, had been developed as an engineering compromise.

We empirically demonstrate that our aggregated transformations outperform the original ResNet module, even under the restricted condition of maintaining computational complexity and model size — e.g., Fig. 1(right) is designed to keep the FLOPs complexity and number of parameters of Fig. 1(left). We emphasize that while it is relatively easy to increase accuracy by increasing capacity (going deeper or wider), methods that increase accuracy while maintaining (or reducing) complexity are rare in the literature.
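The claim that Fig. 1(right) keeps the parameter count of Fig. 1(left) can be checked with the per-block parameter formulas given in the paper (ignoring BN and biases):

```python
# ResNet bottleneck of Fig. 1(left): 256 -> 64 -> 64 -> 256
resnet_params = 256 * 64 + 3 * 3 * 64 * 64 + 64 * 256            # = 69,632
# ResNeXt block of Fig. 1(right): C * (256*d + 3*3*d*d + d*256), with C = 32, d = 4
C, d = 32, 4
resnext_params = C * (256 * d + 3 * 3 * d * d + d * 256)          # = 70,144
print(resnet_params, resnext_params)  # both roughly 70k, as stated in the paper
```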

2. How VGG/ResNet and the Inception family differ in design philosophy

1. The design of VGG and ResNet: stacking blocks of the same type
The VGG-nets [36] exhibit a simple yet effective strategy of constructing very deep networks: stacking building blocks of the same shape. This strategy is inherited by ResNets [14] which stack modules of the same topology. This simple rule reduces the free choices of hyper-parameters, and depth is exposed as an essential dimension in neural networks. Moreover, we argue that the simplicity of this rule may reduce the risk of over-adapting the hyper-parameters to a specific dataset.
2. The design of the Inception family: split-transform-merge
Unlike VGG-nets, the family of Inception models [38, 17, 39, 37] have demonstrated that carefully designed topologies are able to achieve compelling accuracy with low theoretical complexity. The Inception models have evolved over time [38, 39], but an important common property is a split-transform-merge strategy. In an Inception module, the input is split into a few lower-dimensional embeddings (by 1×1 convolutions), transformed by a set of specialized filters (3×3, 5×5, etc.), and merged by concatenation. It can be shown that the solution space of this architecture is a strict subspace of the solution space of a single large layer (e.g., 5×5) operating on a high-dimensional embedding. The split-transform-merge behavior of Inception modules is expected to approach the representational power of large and dense layers, but at a considerably lower computational complexity.
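To make the pattern concrete, here is a toy PyTorch sketch of a split-transform-merge module as described above: 1×1 convolutions split the input into lower-dimensional embeddings, specialized 3×3 and 5×5 filters transform them, and concatenation merges the results. The branch widths here are arbitrary and not taken from any particular Inception variant; note how each branch has its own topology, which is exactly the hand-design that ResNeXt removes by making all paths identical.

```python
import torch
import torch.nn as nn

class SplitTransformMerge(nn.Module):
    """Toy split-transform-merge module in the Inception style (illustrative only)."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1),    # split: 1x1 reduction
            nn.Conv2d(64, 64, kernel_size=3, padding=1),  # transform: 3x3 filters
        )
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=1),    # split: 1x1 reduction
            nn.Conv2d(32, 32, kernel_size=5, padding=2),  # transform: 5x5 filters
        )

    def forward(self, x):
        # merge: concatenate the branch outputs along the channel dimension
        return torch.cat([self.branch3x3(x), self.branch5x5(x)], dim=1)

print(SplitTransformMerge()(torch.randn(1, 256, 28, 28)).shape)  # [1, 96, 28, 28]
```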
The following quote describes the drawback of the Inception family: too many hand-designed hyper-parameters, which have to be re-tuned when moving to a new dataset.
Despite good accuracy, the realization of Inception models has been accompanied with a series of complicating factors — the filter numbers and sizes are tailored for each individual transformation, and the modules are customized stage-by-stage. Although careful combinations of these components yield excellent neural network recipes, it is in general unclear how to adapt the Inception architectures to new datasets/tasks, especially when there are many factors and hyper-parameters to be designed.

Note: although the proposed ResNeXt looks somewhat similar to the Inception family, it is much simpler to design than Inception: it is just a stack of identical blocks.

3. ResNeXt is not ensembling

Averaging a set of independently trained networks is an effective solution to improving accuracy [24], widely adopted in recognition competitions [33]. Veit et al. [40] interpret a single ResNet as an ensemble of shallower networks, which results from ResNet's additive behaviors [15]. Our method harnesses additions to aggregate a set of transformations. But we argue that it is imprecise to view our method as ensembling, because the members to be aggregated are trained jointly, not independently.

The authors' explanation: the members aggregated in ResNeXt are trained jointly, not trained separately and independently.

4. Two design rules

These blocks have the same topology, and are subject to two simple rules inspired by VGG/ResNets: (i) if producing spatial maps of the same size, the blocks share the same hyper-parameters (width and filter sizes), and (ii) each time when the spatial map is downsampled by a factor of 2, the width of the blocks is multiplied by a factor of 2. The second rule ensures that the computational complexity, in terms of FLOPs (floating-point operations, in # of multiply-adds), is roughly the same for all blocks.
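Rule (ii) can be verified with a quick calculation: for a 3×3 convolution, halving the spatial resolution cuts the multiply-adds by 4×, while doubling both the input and output widths raises them by 4×, so the per-block FLOPs stay constant. The spatial sizes and widths below are illustrative.

```python
# Multiply-adds of a 3x3 convolution (ignoring the bias term).
def conv3x3_flops(h, w, c_in, c_out):
    return h * w * c_in * c_out * 3 * 3

early_stage = conv3x3_flops(56, 56, 128, 128)  # larger map, narrower block
later_stage = conv3x3_flops(28, 28, 256, 256)  # spatial /2, width x2
print(early_stage == later_stage)  # True: the two effects cancel exactly
```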


5. The essence of split-transform-merge

For an interpretation of "low-dimensional embedding", see the separate note: 深度学习中 Embedding层两大作用的个人理解 (a personal take on the two roles of the Embedding layer in deep learning).
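In the paper's formulation, the inner product computed by a simple neuron is itself a split-transform-aggregate operation, and the aggregated transformation generalizes the elementary transform w_i x_i to an arbitrary function while keeping the aggregation by summation:

```latex
% Inner product of a simple neuron, viewed as split-transform-aggregate:
\sum_{i=1}^{D} w_i x_i
% Aggregated transformations, where C is the cardinality:
\mathcal{F}(\mathbf{x}) = \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x})
% ResNeXt block: aggregated transformations plus the residual shortcut:
\mathbf{y} = \mathbf{x} + \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x})
```

All C transformations T_i share the same topology (the bottleneck-shaped path of Fig. 1(right)), which is what makes cardinality a single, easily tunable factor.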

6. Placement of BN and ReLU

Our models are realized by the form of Fig. 3(c). We perform batch normalization (BN) [17] right after the convolutions in Fig. 3(c) (see the footnote below). ReLU is performed right after each BN, except for the output of the block where ReLU is performed after adding to the shortcut, following [14].

Footnote: With BN, for the equivalent form in Fig. 3(a), BN is employed after aggregating the transformations and before adding to the shortcut.
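Putting this together with the grouped-convolution block from Section 1, the placement described above can be sketched as follows (a minimal sketch with my own layer names, not an official implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the BN/ReLU placement described above, applied to the grouped-conv
# block of Fig. 3(c): BN right after every convolution, ReLU right after every
# BN, except at the block output, where ReLU comes after the shortcut addition.
class ResNeXtBlockBN(nn.Module):
    def __init__(self, channels=256, cardinality=32, bottleneck_width=4):
        super().__init__()
        width = cardinality * bottleneck_width
        self.conv1 = nn.Conv2d(channels, width, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, padding=1,
                               groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))   # no ReLU here
        return F.relu(out + x)            # ReLU after adding to the shortcut

print(ResNeXtBlockBN()(torch.randn(1, 256, 14, 14)).shape)  # torch.Size([1, 256, 14, 14])
```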

References
1. Aggregated Residual Transformations for Deep Neural Networks, arXiv, Nov. 2016
2. 【ResNext】《Aggregated Residual Transformations for Deep Neural Networks》
