Training Very Deep Networks--Highway Networks

There is a rumor online that Microsoft's deep residual learning plagiarized Highway Networks and is merely a special case of it. Highway Networks was indeed published first.

http://people.idsia.ch/~rupesh/very_deep_learning/

Open-source code is available.

Our Highway Networks take inspiration from Long Short Term Memory (LSTM) and allow training of deep, efficient networks (even with hundreds of layers) with conventional gradient-based methods. Even when large depths are not required, highway layers can be used instead of traditional neural layers to allow the network to adaptively copy or transform representations.

Our highway networks are inspired by LSTM and can be trained with conventional gradient-based methods even at great depth (hundreds of layers). Even when great depth is not required, highway layers can adaptively choose between copying their input and transforming it into a more suitable representation.

2 Highway Networks
A plain feedforward neural network typically consists of L layers, where each layer applies a nonlinear transform to its input, which can be written as:

$$
y = H(x, W_H)
$$
H is usually an affine transform followed by a nonlinear activation function, but in general it may take other forms, e.g. convolutional or recurrent.
For a highway network, we define one layer as follows:

$$
y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C)
$$
We refer to T as the transform gate and C as the carry gate.
T gates the transformed input, and C gates how much of the original input is carried through directly.
In this paper we set C = 1 - T, which gives:

$$
y = H(x, W_H) \cdot T(x, W_T) + x \cdot (1 - T(x, W_T))
$$
Note that this formula requires the dimensionalities of x, y, H(x, W_H) and T(x, W_T) to be the same.
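As a minimal NumPy sketch of the coupled layer y = H·T + x·(1 - T) (the function names are my own, and I use tanh for H purely for illustration; the paper's experiments mostly use ReLU for H):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer with the coupled gates C = 1 - T.

    H is taken as tanh(W_H @ x + b_H) here for illustration.
    """
    H = np.tanh(W_H @ x + b_H)
    T = sigmoid(W_T @ x + b_T)    # transform gate, values in (0, 1)
    return H * T + x * (1.0 - T)  # y = H*T + x*(1-T)
```

Driving b_T strongly negative makes T close to 0, so the layer reduces to y ≈ x; driving it strongly positive gives y ≈ H(x).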
We observe that for particular values of the transform gate T:

$$
y =
\begin{cases}
x, & \text{if } T(x, W_T) = 0 \\
H(x, W_H), & \text{if } T(x, W_T) = 1
\end{cases}
$$
Similarly, for the Jacobian of the layer transform:

$$
\frac{dy}{dx} =
\begin{cases}
I, & \text{if } T(x, W_T) = 0 \\
H'(x, W_H), & \text{if } T(x, W_T) = 1
\end{cases}
$$

Thus, depending on the output of the transform gates, a highway layer can smoothly vary its behavior between that of H and that of a layer which simply passes its inputs through.

2.1 Constructing Highway Networks
If the dimensionalities of x, y, H and T do not match, they can be made consistent, e.g. by zero-padding or sub-sampling x, or by using a plain (non-highway) layer to change dimensionality.
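For example, the simplest of these options, zero-padding x up to the layer's dimensionality, can be sketched as follows (a hypothetical helper, not from the paper's code):

```python
import numpy as np

def zero_pad_to(x, d_out):
    # Zero-pad a 1-D input so its dimensionality matches d_out.
    # A plain projection layer or sub-sampling would also work.
    d_in = x.shape[0]
    assert d_out >= d_in, "zero-padding can only grow the dimension"
    return np.concatenate([x, np.zeros(d_out - d_in)])
```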

2.2 Training Deep Highway Networks
We define the transform gate as:

$$
T(x) = \sigma(W_T^{T} x + b_T)
$$
where W_T is the weight matrix and b_T is the bias vector.
This suggests a simple initialization scheme which is independent of the nature of H: b_T can be initialized with a negative value (e.g. -1, -3 etc.) such that the network is initially biased towards carry behavior. This scheme is strongly inspired by the proposal [30] to initially bias the gates in an LSTM network, to help bridge long-term temporal dependencies early in learning.

  At initialization, b_T can be set to a negative value, biasing the network toward carry behavior, i.e. initially doing essentially nothing and just passing the input through. This is mainly inspired by reference [30]. Our experiments also confirmed this conjecture.
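The effect of this negative bias initialization is easy to check directly: since the gate is a logistic sigmoid, a negative b_T pushes its initial output toward 0, so each layer starts out close to the identity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# With the gate bias b_T initialized to a negative value, the transform
# gate starts near 0 and the layer initially just carries its input:
for b_T in (-1.0, -3.0):
    print(f"b_T = {b_T}: initial gate value ~ {sigmoid(b_T):.3f}")
# sigmoid(-1) ~ 0.269, sigmoid(-3) ~ 0.047
```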

