This translation may not be entirely professional, so please bear with me; if anything has been translated incorrectly, criticism and corrections are welcome!
As information about the input or gradient passes through many layers, it can vanish and "wash out" by the time it reaches the end (or beginning) of the network.
ResNets [11] and Highway Networks [34] bypass signal from one layer to the next via identity connections. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow.
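For readers who want to see the bypass idea in code, here is a minimal sketch of an identity (skip) connection, assuming a PyTorch-style residual block; the BN-ReLU-Conv composition and channel counts are illustrative, not taken from [11]:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the input bypasses the convolution via an
    identity connection and is added back to the transformed signal."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut: the signal skips the layer
```

The shortcut term `x` is what lets information and gradients reach early layers directly, which is the shared trait of the architectures discussed above.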
FractalNets [17] repeatedly combine several parallel layer sequences with different number of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network.
Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.
In this paper, we propose an architecture that distills this insight into a simple connectivity pattern: to ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other.
To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.
Figure 1 illustrates this layout schematically.
Crucially, in contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them. Hence, the ℓ-th layer has ℓ inputs, consisting of the feature-maps of all preceding convolutional blocks.
Its own feature-maps are passed on to all L−ℓ subsequent layers. This introduces L(L+1)/2 connections in an L-layer network, instead of just L, as in traditional architectures. Because of its dense connectivity pattern, we refer to our approach as Dense Convolutional Network (DenseNet).
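To make the connectivity pattern concrete, here is a minimal sketch of a dense block, assuming PyTorch; the BN-ReLU-Conv composite function and the growth rate of 12 are illustrative choices rather than a definitive reproduction of the paper's layers. Each layer reads the concatenation of all preceding feature-maps and appends its own:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all preceding feature-maps
    and contributes growth_rate new feature-maps of its own."""
    def __init__(self, num_layers, in_channels, growth_rate=12):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate  # inputs grow with depth
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3,
                          padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # concatenate (rather than sum) everything produced so far
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```

For example, `DenseBlock(num_layers=4, in_channels=24)` applied to a tensor with 24 channels returns one with 24 + 4 × 12 = 72 channels: every layer's output is kept and passed forward rather than overwritten.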
A possibly counter-intuitive effect of this dense connectivity pattern is that it requires fewer parameters than traditional convolutional networks, as there is no need to relearn redundant feature-maps. Traditional feed-forward architectures can be viewed as algorithms with a state, which is passed on from layer to layer.
Each layer reads the state from its preceding layer and writes to the subsequent layer. It changes the state but also passes on information that needs to be preserved. ResNets [11] make this information preservation explicit through additive identity transformations. Recent variations of ResNets [13] show that many layers contribute very little and can in fact be randomly dropped during training.
This makes the state of ResNets similar to (unrolled) recurrent neural networks [21], but the number of parameters of ResNets is substantially larger because each layer has its own weights.
Our proposed DenseNet architecture explicitly differentiates between information that is added to the network and information that is preserved. DenseNet layers are very narrow (e.g., 12 filters per layer), adding only a small set of feature-maps to the "collective knowledge" of the network and keeping the remaining feature-maps unchanged; the final classifier then makes a decision based on all feature-maps in the network.
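As a rough illustration of this bookkeeping (a sketch in plain Python; the input width k0 = 24 is a hypothetical number, and only the 12 filters per layer come from the text above), the number of feature-maps a layer can read grows by the growth rate at every step while each layer itself stays narrow:

```python
def input_channels(layer_index, k0=24, growth_rate=12):
    """Feature-maps visible to layer `layer_index` (1-based) inside a dense
    block: the k0 maps entering the block plus the growth_rate new maps
    contributed by each of the (layer_index - 1) preceding layers."""
    return k0 + (layer_index - 1) * growth_rate

# In a 12-layer dense block every layer adds only 12 feature-maps, yet after
# the last layer the classifier can read all 24 + 12 * 12 = 168 of them.
print([input_channels(i) for i in range(1, 13)])  # [24, 36, 48, ..., 156]
```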
Besides better parameter efficiency, one big advantage of DenseNets is their improved flow of information and gradients throughout the network, which makes them easy to train. Each layer has direct access to the gradients from the loss function and the original input signal, leading to an implicit deep supervision [20]. This helps training of deeper network architectures. Further, we also observe that dense connections have a regularizing effect, which reduces overfitting on tasks with smaller training set sizes.
We evaluate DenseNets on four highly competitive benchmark datasets (CIFAR-10, CIFAR-100, SVHN, and ImageNet).
Our models tend to require far fewer parameters than existing algorithms with comparable accuracy. Further, we significantly outperform the current state-of-the-art results on most of the benchmark tasks.