Densely Connected Convolutional Networks (Introduction Translation, Selective)



As information about the input or gradient passes through many layers, it can vanish and "wash out" by the time it reaches the end (or beginning) of the network.




ResNets [11] and Highway Networks [34] bypass signal from one layer to the next via identity connections. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow.

ResNets [11] and Highway Networks [34] use identity connections to let the signal bypass a layer and skip ahead to the next one.
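A toy sketch of the two mechanisms in this paragraph: identity shortcuts that let the signal bypass a layer, and stochastic depth [13] dropping whole residual branches during training. It assumes, following the stochastic depth paper, that a dropped block reduces to its identity shortcut and that at test time each branch is scaled by its survival probability p_l. The ReLU that follows the sum is omitted, vectors are plain lists, and `residual_stack` and `halve` are hypothetical names.

```python
import random


def residual_stack(x, branches, survival_probs, training, rng):
    """Toy residual stack y = x + F(x) with stochastic depth.

    x: a feature vector as a list of floats (toy representation).
    branches: residual functions F_l, each mapping a list to a same-length list.
    survival_probs: p_l, the probability that block l is kept during training.
    """
    for f, p in zip(branches, survival_probs):
        if training:
            # Keep the branch with probability p_l; a dropped block reduces
            # to its identity shortcut, so the signal bypasses it unchanged.
            if rng.random() < p:
                x = [xi + fi for xi, fi in zip(x, f(x))]
        else:
            # At test time every block is active, with its branch scaled by p_l.
            x = [xi + p * fi for xi, fi in zip(x, f(x))]
    return x


def halve(v):
    """A hypothetical residual branch used only for the demo."""
    return [0.5 * vi for vi in v]


print(residual_stack([1.0], [halve, halve], [1.0, 1.0], training=True, rng=random.Random(0)))  # [2.25]
```

With every survival probability set to 0.0, the stack returns its input unchanged, which is the "bypass" behaviour that identity connections make possible.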


FractalNets [17] repeatedly combine several parallel layer sequences with different numbers of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network.
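The "large nominal depth, many short paths" property can be checked by counting convolutions along every path of a fractal block. This sketch assumes the recursive expansion rule from the FractalNet paper: a one-column block is a single convolution, and a (C+1)-column block joins two stacked copies of the C-column block with one parallel convolution. The join layers are ignored because only path lengths are counted, and `fractal_path_lengths` is a hypothetical name.

```python
def fractal_path_lengths(columns):
    """Number of convolutions along every input-to-output path of a fractal block.

    Assumed rule: f_1 = conv; f_{C+1}(z) = join(f_C(f_C(z)), conv(z)).
    """
    if columns == 1:
        return [1]
    inner = fractal_path_lengths(columns - 1)
    # A path through two stacked copies of f_C: any path through the first
    # copy followed by any path through the second one.
    stacked = [a + b for a in inner for b in inner]
    # The parallel single-convolution branch adds one more path of length 1.
    return stacked + [1]


for c in (1, 2, 3, 4):
    paths = fractal_path_lengths(c)
    print(c, len(paths), max(paths), min(paths))
```

Under this rule the deepest path doubles with every extra column (2^(C-1) convolutions, so 8 for C = 4), while a path with a single convolution always remains.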


Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.



In this paper, we propose an architecture that distills this insight into a simple connectivity pattern: to ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other.



To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.


Figure 1 illustrates this layout schematically.


Crucially, in contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them. Hence, the ℓth layer has ℓ inputs, consisting of the feature-maps of all preceding convolutional blocks.

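Crucially, unlike ResNets, we never combine features by summation before they are passed into a layer; instead, we combine them by concatenation.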
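A minimal sketch contrasting the two ways of combining features, under loud simplifying assumptions: feature-maps are flat lists of per-channel values (1x1 spatial size), and each layer H_l is a hypothetical random linear map followed by ReLU rather than a real convolutional layer. All function names are illustrative only. The dense block concatenates every earlier output, so the l-th layer receives l feature-maps, while the residual block sums and therefore needs matching widths.

```python
import random


def make_layer(in_channels, out_channels, rng):
    """Hypothetical stand-in for a layer H_l: a random linear map plus ReLU."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_channels)]
               for _ in range(out_channels)]

    def layer(x):
        if len(x) != in_channels:
            raise ValueError("channel count mismatch")
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return layer


def dense_block(x0, num_layers, growth_rate, rng):
    """Layer l receives the concatenation [x0, x1, ..., x_{l-1}] as its input."""
    features = [x0]  # feature-maps produced so far, starting with the block input
    inputs_per_layer = []
    for _ in range(num_layers):
        inputs_per_layer.append(len(features))             # the l-th layer gets l inputs
        stacked = [v for fmap in features for v in fmap]   # concatenate channels
        layer = make_layer(len(stacked), growth_rate, rng)
        features.append(layer(stacked))
    return features, inputs_per_layer


def residual_block(x, rng):
    """ResNet-style combination: the branch output is summed with the input,
    so both must have the same number of channels."""
    branch = make_layer(len(x), len(x), rng)
    return [xi + hi for xi, hi in zip(x, branch(x))]


features, inputs = dense_block([1.0, 1.0, 1.0], num_layers=4, growth_rate=2, rng=random.Random(0))
print(inputs)                      # [1, 2, 3, 4]
print([len(f) for f in features])  # [3, 2, 2, 2, 2]
```

Because concatenation only appends channels, earlier feature-maps reach later layers untouched; summation would merge them into a single tensor of fixed width.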



Its own feature-maps are passed on to all L−ℓ subsequent layers. This introduces L(L+1)/2 connections in an L-layer network, instead of just L, as in traditional architectures. Because of its dense connectivity pattern, we refer to our approach as Dense Convolutional Network (DenseNet).

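The ℓth layer passes its own feature-maps on to all L−ℓ subsequent layers.

This also means that an L-layer network has L(L+1)/2 connections, rather than only L as in traditional architectures.

Because of this dense connectivity pattern, we call the approach a Dense Convolutional Network (DenseNet).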
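The L(L+1)/2 count follows from the rule that the ℓth layer has ℓ inputs: summing 1 + 2 + ... + L. The sketch below enumerates the edges explicitly, assuming the block input is counted as node 0 so that layer ℓ reads from nodes 0..ℓ−1; the function names are hypothetical.

```python
def dense_connections(num_layers):
    """(source, target) pairs in a densely connected L-layer network.

    Node 0 stands for the input; nodes 1..L are the layers. Layer l reads the
    outputs of every earlier node 0..l-1, so it contributes l incoming edges.
    """
    return [(src, dst) for dst in range(1, num_layers + 1) for src in range(dst)]


def chain_connections(num_layers):
    """Traditional feed-forward chain: each layer reads only its predecessor."""
    return [(dst - 1, dst) for dst in range(1, num_layers + 1)]


for L in (3, 5, 40):
    print(L, len(dense_connections(L)), L * (L + 1) // 2, len(chain_connections(L)))
```

For L = 40 this gives 820 connections in the dense pattern versus 40 in the chain, so the count grows quadratically with depth.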

A possibly counter-intuitive effect of this dense connectivity pattern is that it requires fewer parameters than traditional convolutional networks, as there is no need to relearn redundant feature-maps. Traditional feed-forward architectures can be viewed as algorithms with a state, which is passed on from layer to layer.




Each layer reads the state from its preceding layer and writes to the subsequent layer. It changes the state but also passes on information that needs to be preserved. ResNets [11] make this information preservation explicit through additive identity transformations. Recent variations of ResNets [13] show that many layers contribute very little and can in fact be randomly dropped during training.


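ResNets [11] preserve this information explicitly through additive identity transformations.

Recent variants of ResNets [13] show that many layers contribute very little and can in fact be randomly dropped during training.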
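Writing the state updates out makes the contrast between additive preservation and concatenation explicit. The notation is assumed here for illustration: x_ℓ is the output of layer ℓ, x_0 the input, H_ℓ the layer's transformation, and [·] channel-wise concatenation. Unrolling the ResNet recursion shows that x_0 and every earlier branch output survive as additive terms in later states:

```latex
\begin{aligned}
\text{ResNet:}\quad   & x_\ell = H_\ell(x_{\ell-1}) + x_{\ell-1} \\
\text{unrolled:}\quad & x_\ell = x_0 + \sum_{i=1}^{\ell} H_i(x_{i-1}) \\
\text{DenseNet:}\quad & x_\ell = H_\ell\big([x_0, x_1, \ldots, x_{\ell-1}]\big)
\end{aligned}
```

Because each H_i enters the ResNet state only as one summand, a layer whose branch contributes little leaves the state almost unchanged, which fits the observation [13] that such layers can be dropped. In the DenseNet form nothing is summed; earlier outputs are handed on unchanged as separate inputs.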

This makes the state of ResNets similar to (unrolled) recurrent neural networks [21], but the number of parameters of ResNets is substantially larger because each layer has its own weights.

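This makes the state of ResNets similar to that of (unrolled) recurrent neural networks [21], but ResNets have far more parameters, because each layer has its own weights.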
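A toy parameter count makes the RNN comparison concrete. It assumes every step or layer is a single fully connected width x width map with a bias; real ResNet layers are convolutions, so the absolute numbers are illustrative only, and the function names are hypothetical.

```python
def tied_params(width, steps):
    """Unrolled RNN: one width x width matrix plus bias, reused at every step.

    `steps` is accepted only to show that the count does not depend on it.
    """
    return width * width + width


def untied_params(width, num_layers):
    """ResNet-like stack: every layer owns its own width x width matrix plus bias."""
    return num_layers * (width * width + width)


for depth in (10, 50, 100):
    print(depth, tied_params(64, depth), untied_params(64, depth))
```

The tied count stays at 4160 for width 64, while the untied count grows linearly with depth, which is the sense in which per-layer weights make ResNets "substantially larger".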

Our proposed DenseNet architecture explicitly differentiates between information that is added to the network and information that is preserved. DenseNet layers are very narrow (e.g., 12 filters per layer), adding only a small set of feature-maps to the "collective knowledge" of the network and keeping the remaining feature-maps unchanged. The final classifier then makes a decision based on all feature-maps in the network.

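Our proposed DenseNet architecture explicitly distinguishes between information that is added to the network and information that is preserved.

DenseNet layers are very narrow (e.g., 12 filters per layer) and add only a small set of feature-maps to the network's "collective knowledge".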
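Because each layer only appends its own narrow set of feature-maps, the number of input channels grows linearly through a block. In the sketch below, k is the number of feature-maps each layer adds (what the full paper calls the growth rate; 12 as in the example above), and k0, the channel count entering the block, is an arbitrary example value of 16. The function names are hypothetical.

```python
def dense_block_input_widths(k0, k, num_layers):
    """Input channels of each layer in a dense block.

    k0: channels entering the block (an arbitrary example value below).
    k:  feature-maps added by every layer.
    Layer l (1-indexed) sees k0 + k * (l - 1) channels.
    """
    return [k0 + k * (l - 1) for l in range(1, num_layers + 1)]


def dense_block_output_width(k0, k, num_layers):
    """Channels available after the block: the input plus every layer's k maps."""
    return k0 + k * num_layers


print(dense_block_input_widths(16, 12, 4))  # [16, 28, 40, 52]
print(dense_block_output_width(16, 12, 4))  # 64
```

Each layer reads a growing "collective" state, but its own output stays only k channels wide, which is why the layers themselves can remain narrow.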



Besides better parameter efficiency, one big advantage of DenseNets is their improved flow of information and gradients throughout the network, which makes them easy to train. Each layer has direct access to the gradients from the loss function and the original input signal, leading to an implicit deep supervision [20]. This eases the training of deeper network architectures. Further, we also observe that dense connections have a regularizing effect, which reduces overfitting on tasks with smaller training set sizes.

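Besides better parameter efficiency, a major advantage of DenseNets is the improved flow of information and gradients throughout the network, which makes them easy to train.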
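The "direct access" claim can be illustrated at the graph level (this is not a gradient computation). The sketch assumes the loss is attached to the last layer's output and counts, by breadth-first search on the reversed graph, how many connections separate each node from it. In the dense pattern every earlier node is one hop away; in a plain chain, layer ℓ is L−ℓ hops away. `hops_to_output` is a hypothetical name.

```python
from collections import deque


def hops_to_output(edges, num_nodes, output):
    """Fewest edges from every node to `output` (BFS on the reversed graph)."""
    incoming = {n: [] for n in range(num_nodes)}
    for src, dst in edges:
        incoming[dst].append(src)
    dist = {output: 0}
    queue = deque([output])
    while queue:
        node = queue.popleft()
        for src in incoming[node]:
            if src not in dist:
                dist[src] = dist[node] + 1
                queue.append(src)
    return [dist.get(n) for n in range(num_nodes)]


L = 5  # node 0 is the input, nodes 1..L are layers, node L feeds the loss
dense = [(s, d) for d in range(1, L + 1) for s in range(d)]
chain = [(d - 1, d) for d in range(1, L + 1)]
print(hops_to_output(dense, L + 1, L))  # [1, 1, 1, 1, 1, 0]
print(hops_to_output(chain, L + 1, L))  # [5, 4, 3, 2, 1, 0]
```

A one-hop route means a gradient signal from the loss can reach each layer without first passing through every later layer, which is the intuition behind the implicit deep supervision mentioned above.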





We evaluate DenseNets on four highly competitive benchmark datasets (CIFAR-10, CIFAR-100, SVHN, and ImageNet).


Our models tend to require far fewer parameters than existing algorithms of comparable accuracy. Further, we significantly outperform the current state-of-the-art results on most of the benchmark tasks.


