Do Deep Nets Really Need to be Deep?

I. Main Idea

A model compression [2] approach is used to train a shallow net with only one hidden layer to mimic a deep net. Shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional architectures. The paper verifies this experimentally and concludes that there probably exist better algorithms for training shallow feed-forward nets than those currently available, since today such accuracy is reached only by mimicking a deep model.


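  • If a complex model can be mimicked by a shallow model, then the function the complex model has learned is not truly complex. The complexity of a model and the complexity of the function it represents are two different things.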

II. Training Shallow Nets to Mimic Deep Nets

Step 1: train a state-of-the-art deep model (the teacher).

Step 2: train a shallow model to mimic the deep model.

1. Model Compression

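Unlabeled data is fed through the teacher model, and the resulting scores are used to train the student model: it is trained to learn the function that was learned by the larger model. The main question concerns the complexity of the learned function and the size of the representation needed to learn it best.
Trained directly on the original data, a shallow net overfits more easily than a deep net. Training it through model compression therefore acts as a form of regularization that narrows the gap between shallow and deep nets, as shown in the figure below.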

[Figure: the gap between shallow and deep nets]
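A minimal sketch of the compression pipeline described above, in plain Python. Everything in it is hypothetical: `teacher_logits` is a made-up non-linear function standing in for a trained deep teacher, the student is a tiny one-hidden-layer ReLU net trained with hand-written SGD, and the sizes, learning rate and epoch count are arbitrary. It only illustrates the data flow: unlabeled inputs go through the teacher, and the teacher's scores become the student's regression targets.

```python
import math
import random

def teacher_logits(x):
    """Hypothetical stand-in for a trained deep teacher: 2 inputs -> 2 logits."""
    return [math.sin(3 * x[0]) + x[1], x[0] * x[1] - math.cos(2 * x[1])]

def init_student(d_in, n_hidden, d_out, rng):
    """One-hidden-layer ReLU student: logits = beta . relu(W . x + b) + c."""
    W = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    beta = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(d_out)]
    c = [0.0] * d_out
    return W, b, beta, c

def forward(params, x):
    W, b, beta, c = params
    pre = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    h = [max(0.0, p) for p in pre]
    out = [sum(w * hj for w, hj in zip(row, h)) + ck for row, ck in zip(beta, c)]
    return pre, h, out

def mimic_loss(params, xs, targets):
    """Mean of 0.5 * squared L2 distance between student and teacher logits."""
    total = 0.0
    for x, z in zip(xs, targets):
        out = forward(params, x)[2]
        total += 0.5 * sum((o - zk) ** 2 for o, zk in zip(out, z))
    return total / len(xs)

def sgd_step(params, x, z, lr):
    """One SGD step on a single (input, teacher-logit) pair; updates params in place."""
    W, b, beta, c = params
    pre, h, out = forward(params, x)
    g_out = [o - zk for o, zk in zip(out, z)]  # d loss / d student logits
    # Back-propagate to the hidden layer before beta is modified.
    g_h = [sum(g_out[k] * beta[k][j] for k in range(len(beta))) for j in range(len(h))]
    g_pre = [g if p > 0 else 0.0 for g, p in zip(g_h, pre)]  # ReLU derivative
    for k in range(len(beta)):
        for j in range(len(h)):
            beta[k][j] -= lr * g_out[k] * h[j]
        c[k] -= lr * g_out[k]
    for j in range(len(W)):
        for i in range(len(x)):
            W[j][i] -= lr * g_pre[j] * x[i]
        b[j] -= lr * g_pre[j]

rng = random.Random(0)
# 1) Unlabeled transfer set: inputs only, no ground-truth labels.
unlabeled = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
# 2) The teacher scores every input; its logits become regression targets.
targets = [teacher_logits(x) for x in unlabeled]
# 3) The shallow student is trained to regress those logits.
student = init_student(d_in=2, n_hidden=64, d_out=2, rng=rng)
loss_before = mimic_loss(student, unlabeled, targets)
for _ in range(20):
    for x, z in zip(unlabeled, targets):
        sgd_step(student, x, z, lr=0.01)
loss_after = mimic_loss(student, unlabeled, targets)
print(f"mimic loss: {loss_before:.4f} -> {loss_after:.4f}")
```

No true labels appear anywhere: the student only ever sees the teacher's outputs, which is why the transfer set can be unlabeled data.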

2. Mimic Learning via Regressing Logits with L2 Loss

The shallow mimic models are trained on the teacher's logits, i.e. its outputs before the softmax layer. The logit values provide richer information to the student to mimic the exact behaviours of a teacher model.
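One reason logits carry more information than softmax probabilities is that softmax is invariant to adding the same constant to every logit, and it squashes large logit differences into probabilities close to 0 or 1. The sketch below uses two hypothetical logit vectors that differ only by a constant shift: they yield identical probabilities but a large L2 distance. `l2_logit_loss` is an illustrative per-example loss (half the squared L2 distance, a common convention assumed here rather than a formula quoted from the paper).

```python
import math

def softmax(logits):
    """Softmax; subtracting the max first keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def l2_logit_loss(student_logits, teacher_logits):
    """Illustrative per-example mimic loss: 0.5 * squared L2 distance of logits."""
    return 0.5 * sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits))

# Two hypothetical teacher outputs that differ only by a constant shift of -20.
teacher_a = [10.0, 20.0, 30.0]
teacher_b = [-10.0, 0.0, 10.0]

p_a, p_b = softmax(teacher_a), softmax(teacher_b)
print(p_a == p_b)                            # True: identical probabilities
print(l2_logit_loss(teacher_a, teacher_b))   # 600.0: the logits still differ
```

In `p_a` the two smaller classes get probabilities of roughly 2e-9 and 5e-5, so the teacher's preference between them is nearly invisible in probability space, while in logit space it is a plain difference of 10.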

3. Speeding-up Mimic Learning by Introducing a Linear Layer

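The mimic models have few layers but a very large number of hidden units, so they are slow to compute and slow to converge. A linear layer with k units is therefore inserted between the input layer and the non-linear hidden layer. Because a linear layer can be absorbed into the weight matrix (the product of the two new matrices is just another input-to-hidden matrix), the model remains a net with a single non-linear hidden layer and gains no additional expressive power.
Reparameterizing the weight matrix this way not only speeds up convergence but also greatly reduces memory usage, which in turn makes it possible to train larger shallow nets.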
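The reparameterization can be checked numerically. With a k-unit linear layer, the input-to-hidden map becomes U·V instead of a single W, and since U(Vx) = (UV)x the linear layer folds back into one (h × d) matrix of rank at most k. The sketch below verifies this on tiny random matrices and compares weight counts, d·h versus k·(d + h). The sizes d = 3072 (a raw 32×32×3 CIFAR-10 image), h = 30000 and k = 1200 are hypothetical: they borrow the unit counts quoted in the Discussion, but the actual CIFAR-10 mimic model has a conv layer in front, so these are not the paper's parameter counts.

```python
import random

def matmul(A, B):
    """Plain-Python product of an (n x m) matrix and an (m x p) matrix."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def param_counts(d, h, k):
    """Input-to-hidden weights without and with a k-unit linear bottleneck."""
    full = d * h            # W: (h x d)
    factored = k * (d + h)  # V: (k x d) plus U: (h x k)
    return full, factored

# 1) The linear layer is absorbed: U (V x) == (U V) x, so W = U V.
rng = random.Random(0)
d, k, h = 6, 2, 5  # tiny, hypothetical sizes
V = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(k)]
U = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(h)]
x = [rng.uniform(-1, 1) for _ in range(d)]
two_step = matvec(U, matvec(V, x))
absorbed = matvec(matmul(U, V), x)
print(max(abs(a - b) for a, b in zip(two_step, absorbed)))  # floating-point rounding only

# 2) Memory: hypothetical sizes d=3072, h=30000, k=1200.
full, factored = param_counts(3072, 30000, 1200)
print(full, factored, round(full / factored, 2))  # 92160000 39686400 2.32
```

The saving only appears when k < d·h / (d + h); for example `param_counts(10, 10, 10)` gives (100, 200), i.e. the factored form is larger than the plain matrix.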

4. Discussion

  1. To mimic the harder deep models, a convolutional layer and a pooling layer were added to the shallow net.
    SNN-MIMIC models for CIFAR-10 thus consist of a convolution and max pooling layer followed by fully connected 1200 linear units and 30k non-linear units.
  2. shallow models with a number of parameters comparable to deep models are likely capable of learning even more accurate functions if a more accurate teacher and/or more unlabeled data became available
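  3. Shallow nets fit current parallel computing hardware better: they run faster, need fewer computation cycles, and are better suited to real-time applications.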
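To get a feel for where the parameters of the CIFAR-10 SNN-MIMIC layout in item 1 sit, here is a rough weight count. Only the 1200 linear units, the 30k non-linear units and the 10 CIFAR-10 classes come from these notes; the conv layer (64 filters, 5×5 kernels, 'same' padding) and the 2×2 max pooling are hypothetical choices, and biases are ignored, so the totals are illustrative, not the paper's.

```python
# Hypothetical hyperparameters: only the 1200 linear units, 30k non-linear units
# and the 10 CIFAR-10 classes come from the notes; the conv settings are made up.
IMAGE_H, IMAGE_W, IMAGE_C = 32, 32, 3   # CIFAR-10 images
CONV_FILTERS, CONV_KERNEL = 64, 5       # hypothetical
POOL = 2                                # hypothetical 2x2 max pooling, stride 2
N_LINEAR, N_HIDDEN, N_CLASSES = 1200, 30000, 10

def snn_mimic_weight_counts():
    """Weight counts (biases ignored) of conv+pool -> linear -> non-linear -> logits."""
    # A 'same'-padded conv keeps 32x32; non-overlapping pooling divides each side by POOL.
    flat = (IMAGE_H // POOL) * (IMAGE_W // POOL) * CONV_FILTERS
    return {
        "conv": CONV_KERNEL * CONV_KERNEL * IMAGE_C * CONV_FILTERS,
        "flat_to_linear": flat * N_LINEAR,
        "linear_to_hidden": N_LINEAR * N_HIDDEN,
        "hidden_to_logits": N_HIDDEN * N_CLASSES,
    }

counts = snn_mimic_weight_counts()
for name, n in counts.items():
    print(f"{name:>17}: {n:,}")
print(f"{'total':>17}: {sum(counts.values()):,}")
```

Under these assumptions the 1200-to-30000 layer dominates (36,000,000 of roughly 56 million weights). Without the 1200-unit linear layer, connecting the 16,384 pooled features straight to 30k hidden units would take 491,520,000 weights, which is the memory saving the linear layer from section 3 provides.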

III. Summary

  1. The model compression approach in this paper allows one to adjust flexibly the trade-off between accuracy and computational cost.
  2. Developing algorithms to train shallow models of high accuracy directly from the original data without going through the intermediate teacher model would, if possible, be a significant contribution.
  3. The advantage of deep learning may come from a good match between deep architectures and current training procedures.

For a given number of parameters, depth may make learning easier, but may not always be essential.

References:
[1] Lei Jimmy Ba and Rich Caruana. Do Deep Nets Really Need to be Deep? NIPS, 2014.
[2] Cristian Buciluǎ, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. ACM SIGKDD, 2006.

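Note: [2] mainly converts complex ensemble models into a single-layer neural net, with the result that mimic neural nets are 1000 times smaller and 1000 times faster. The authors further argue that a model produced by any learning algorithm can be mimicked by a simple neural net through model compression.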