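Preface: this series starts with the history of the classic convolutional neural networks, then walks through the individual networks one by one, covering their architecture, code implementation, and directions for optimization.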
THE HISTORY OF NEURAL NETWORKS
http://dataconomy.com/2017/04/history-neural-networks/
Convolutional Neural Networks, Part 1: Historical Significance
https://ai.intel.com/convolutional-neural-networks-historical-significance/
The following is translated from the Stanford University course notes:
http://cs231n.github.io/convolutional-networks/
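Several architectures in the field of convolutional networks have names. The most common are:
LeNet. The first successful applications of convolutional neural networks were developed by Yann LeCun in the 1990s. The best known of these is the LeNet architecture, which was used to read zip codes, digits, and so on.
AlexNet. The first work that made convolutional neural networks popular in computer vision was AlexNet, developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton. AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the runner-up (top-5 error of 16%, compared with 26% for the runner-up). The network had an architecture very similar to LeNet, but it was deeper and bigger, and it featured convolutional layers stacked directly on top of each other (previously a single convolutional layer was usually followed immediately by a pooling layer).
GoogLeNet. The ILSVRC 2014 winner was a convolutional network developed by Szegedy et al. at Google. Its main contribution was the Inception module, which dramatically reduced the number of parameters in the network (4M, compared with 60M for AlexNet). In addition, the paper uses average pooling instead of fully connected layers at the top of the network, which eliminates a large number of parameters that do not seem to matter much. There are several follow-up versions of GoogLeNet, most recently Inception-v4.
VGGNet. The runner-up in ILSVRC 2014 was the network from Karen Simonyan and Andrew Zisserman that became known as VGGNet. Its main contribution was showing that network depth is a critical component of good performance. Their final best network contains 16 CONV/FC layers and, appealingly, has an extremely homogeneous architecture that performs only 3x3 convolutions and 2x2 pooling from beginning to end. Their pretrained model can be used directly in Caffe. The downside of VGGNet is that it is more expensive to evaluate and uses much more memory and many more parameters (140M). Most of these parameters are in the first fully connected layer, and it was later found that these FC layers can be removed without degrading performance, which significantly reduces the number of necessary parameters.
ResNet. The Residual Network, developed by Kaiming He et al., was the winner of ILSVRC 2015. It features special skip connections and makes heavy use of batch normalization. The architecture also has no fully connected layers at the end of the network. The reader can also refer to Kaiming He's presentation (video, slides) and to some recent experiments that reproduce these networks in Torch. ResNets are currently by far the state-of-the-art convolutional neural network models and are the default choice for using convolutional networks in practice (as of May 10, 2016). In particular, see also the more recent work that tweaks the original architecture from Kaiming He et al.: Identity Mappings in Deep Residual Networks (published March 2016).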
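The VGGNet figures quoted above (about 140M parameters, most of them in the first fully connected layer) can be checked with a short calculation. The sketch below is only an illustration: it assumes the 16-layer VGG configuration (13 convolutional layers of 3x3 kernels plus 3 fully connected layers), a 224x224 RGB input and 1000 output classes, none of which are spelled out in the text above. The names `VGG16_CONV`, `VGG16_FC` and `count_vgg16_params` are made up for this example.

```python
# Assumed layout of the 16-layer VGG network: each number is the output
# channel count of a 3x3 convolution, "M" marks a 2x2 max-pooling layer.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
# Assumed fully connected layers (the last one has one unit per class).
VGG16_FC = [4096, 4096, 1000]


def count_vgg16_params(input_size=224, in_channels=3):
    """Return ([(layer_name, parameter_count), ...], total_parameters)."""
    layers = []
    channels, size = in_channels, input_size
    conv_index = 0
    for item in VGG16_CONV:
        if item == "M":
            size //= 2  # 2x2 pooling with stride 2 halves height and width
            continue
        conv_index += 1
        # 3x3 kernel weights for every input/output channel pair, plus biases
        layers.append((f"conv{conv_index}", 3 * 3 * channels * item + item))
        channels = item
    features = channels * size * size  # flattened input of the first FC layer
    for fc_index, units in enumerate(VGG16_FC, start=1):
        layers.append((f"fc{fc_index}", features * units + units))
        features = units
    total = sum(params for _, params in layers)
    return layers, total


layers, total = count_vgg16_params()
fc1 = dict(layers)["fc1"]
print(f"total parameters: {total:,}")
print(f"first FC layer:   {fc1:,} ({fc1 / total:.0%} of the total)")
```

Under these assumptions the total comes to 138,357,544 parameters, of which 102,764,544 (roughly 74%) sit in the first fully connected layer, which is consistent with the rounded 140M figure above.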
Original English text:
There are several architectures in the field of Convolutional Networks that have a name. The most common are:
LeNet. The first successful applications of Convolutional Networks were developed by Yann LeCun in 1990’s. Of these, the best known is the LeNet architecture that was used to read zip codes, digits, etc.
AlexNet. The first work that popularized Convolutional Networks in Computer Vision was the AlexNet, developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton. The AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the second runner-up (top 5 error of 16% compared to runner-up with 26% error). The Network had a very similar architecture to LeNet, but was deeper, bigger, and featured Convolutional Layers stacked on top of each other (previously it was common to only have a single CONV layer always immediately followed by a POOL layer).
GoogLeNet. The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. from Google. Its main contribution was the development of an Inception Module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Additionally, this paper uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large amount of parameters that do not seem to matter much. There are also several followup versions to the GoogLeNet, most recently Inception-v4.
VGGNet. The runner-up in ILSVRC 2014 was the network from Karen Simonyan and Andrew Zisserman that became known as the VGGNet. Its main contribution was in showing that the depth of the network is a critical component for good performance. Their final best network contains 16 CONV/FC layers and, appealingly, features an extremely homogeneous architecture that only performs 3x3 convolutions and 2x2 pooling from the beginning to the end. Their pretrained model is available for plug and play use in Caffe. A downside of the VGGNet is that it is more expensive to evaluate and uses a lot more memory and parameters (140M). Most of these parameters are in the first fully connected layer, and it was since found that these FC layers can be removed with no performance downgrade, significantly reducing the number of necessary parameters.
ResNet. Residual Network developed by Kaiming He et al. was the winner of ILSVRC 2015. It features special skip connections and a heavy use of batch normalization. The architecture is also missing fully connected layers at the end of the network. The reader is also referred to Kaiming’s presentation (video, slides), and some recent experiments that reproduce these networks in Torch. ResNets are currently by far state of the art Convolutional Neural Network models and are the default choice for using ConvNets in practice (as of May 10, 2016). In particular, also see more recent developments that tweak the original architecture from Kaiming He et al. Identity Mappings in Deep Residual Networks (published March 2016).
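Downloads for the classic convolutional neural network papers:
arXiv:1409.1556 (VGGNet: Very Deep Convolutional Networks for Large-Scale Image Recognition)
arXiv:1512.03385 (ResNet: Deep Residual Learning for Image Recognition)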
Implementations of deep residual networks in different frameworks:
Original version by the authors
KaimingHe/deep-residual-networks
Deep Residual Learning for Image Recognition
https://github.com/KaimingHe/deep-residual-networks
TensorFlow version
ry/tensorflow-resnet
ResNet model in TensorFlow
https://github.com/ry/tensorflow-resnet
Keras version
raghakot/keras-resnet
Residual networks implementation using Keras-1.0 functional API
https://github.com/raghakot/keras-resnet
Torch version
facebook/fb.resnet.torch
Torch implementation of ResNet from http://arxiv.org/abs/1512.03385 and training scripts
https://github.com/facebook/fb.resnet.torch
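The repositories above implement complete networks. The core idea they share, the skip connection, fits in a few lines. The following is a minimal, dependency-free sketch and not the code of any of these repositories: it assumes small fully connected layers in place of convolutions and leaves out batch normalization, and the helper names (`relu`, `linear`, `residual_block`, `random_matrix`) are invented for illustration.

```python
import random


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def linear(weights, vector):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = w2 * relu(w1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    # The skip connection adds the unchanged input back onto F(x).
    return relu([f + xi for f, xi in zip(fx, x)])


def random_matrix(n, scale, rng):
    """Square matrix of small Gaussian weights (illustrative initialisation)."""
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(n)]


dim = 4
x = [1.0, 2.0, 3.0, 4.0]

# With all-zero weights F(x) is zero, so the block passes a non-negative
# input through unchanged: the layers only have to learn a correction.
zeros = [[0.0] * dim for _ in range(dim)]
identity_out = residual_block(x, zeros, zeros)
print(identity_out)

rng = random.Random(0)
out = residual_block(x, random_matrix(dim, 0.1, rng), random_matrix(dim, 0.1, rng))
print(out)
```

The zero-weight case shows why stacking many such blocks is easier to train than stacking plain layers: a block that has learned nothing behaves like the identity instead of destroying its input.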
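The next article will cover the LeNet convolutional neural network: its architecture, code implementation, and directions for optimization.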