[ImageNet]
• Over 15M labeled high-resolution images.
• Roughly 22k categories.
• Collected from the web and labeled via Amazon Mechanical Turk.
[ILSVRC (ImageNet Large-Scale Visual Recognition Challenge)]
• Annual large-scale image classification competition.
• 1.2M training images, 50k validation images, and 150k testing images.
• 1k categories.
• Resolution of each image varies.
• Classification: the model makes 5 guesses for the image label; a prediction counts as correct if the true label is among them (top-5 error); see the sketch below.
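A minimal NumPy sketch of how top-5 error can be computed from a matrix of class scores; the function name and toy data are illustrative, not the official evaluation code.

import numpy as np

def top5_error(scores, labels):
    """Fraction of examples whose true label is NOT among the 5 highest-scoring classes.
    scores: (N, C) array of class scores; labels: (N,) array of true class indices.
    Illustrative sketch only."""
    # Indices of the 5 largest scores per example (order among the 5 is irrelevant).
    top5 = np.argsort(scores, axis=1)[:, -5:]
    hit = (top5 == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

# Toy usage: 4 examples, 10 classes.
rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 10))
labels = rng.integers(0, 10, size=4)
print(top5_error(scores, labels))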
[Architectures] See Tab. 1.
AlexNet
• Deeper, bigger than LeNet.
• Featured conv layers stacked directly on top of each other (previously it was common to have a single conv layer always immediately followed by a pool layer); see the sketch after this list.
• First use of ReLU.
• Heavy data augmentation.
• Dropout.
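A rough PyTorch sketch of two of these ideas, stacked conv layers before a pool and dropout in the fc classifier. Layer sizes follow the AlexNet paper, but this is an illustration, not the original implementation, and the notes themselves do not tie these ideas to any framework.

import torch.nn as nn

# Sketch of the AlexNet conv3-conv4-conv5 stack: three conv layers back to back
# before the next pool, each followed by ReLU (filter counts from the paper).
stacked_convs = nn.Sequential(
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

# Dropout is applied in the fully-connected classifier.
classifier_head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),
)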
ZFNet
• Improvement on AlexNet by tweaking the architecture hyperparameters.
• conv1: change from (11 × 11, s4) to (7 × 7, s2); see the sketch below.
• conv3,4,5: use 512, 1024, 512 filters instead of 384, 384, 256.
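The conv1 tweak in isolation, as a PyTorch sketch (the input/output channel counts of 3 and 96 are assumed from AlexNet):

import torch.nn as nn

# AlexNet's first layer: large filters, aggressive stride.
conv1_alexnet = nn.Conv2d(3, 96, kernel_size=11, stride=4)

# ZFNet's tweak: smaller filters and stride, keeping more spatial detail
# in the first-layer feature maps.
conv1_zfnet = nn.Conv2d(3, 96, kernel_size=7, stride=2)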
GoogLeNet
• Inception module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet's 60M); see the sketch below.
• Uses global average pooling instead of fully-connected (fc) layers at the top of the network.
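A hedged PyTorch sketch of an Inception-style module (1 × 1, 1 × 1→3 × 3, 1 × 1→5 × 5, and pool→1 × 1 branches concatenated along channels) and a global-average-pooling head; the filter counts are illustrative rather than the paper's exact configuration.

import torch
import torch.nn as nn

class Inception(nn.Module):
    """Sketch of a GoogLeNet-style Inception module: four parallel branches
    whose outputs are concatenated along the channel dimension.
    Filter counts are illustrative, not the paper's exact numbers."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)                 # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, kernel_size=1),  # 1x1 "bottleneck" then 3x3
                                nn.ReLU(inplace=True),
                                nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1),  # 1x1 then 5x5
                                nn.ReLU(inplace=True),
                                nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, kernel_size=1))  # pool then 1x1

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Global average pooling collapses each feature map to a single number,
# so the classifier needs only one small linear layer instead of large fc layers.
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                     nn.Linear(64 + 128 + 32 + 32, 1000))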
VGGNet
• Depth of the network is a critical component for good performance.
• Only 3 × 3 conv (stride 1, pad 1) and 2 × 2 max pool (stride 2) layers; see the block sketch below.
• Many more parameters (138M for VGG-16).
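A PyTorch sketch of the VGG recipe: a reusable block of 3 × 3 convs followed by a 2 × 2 pool, repeated with growing filter counts. The stage configuration below mirrors VGG-16, but the helper name is ours.

import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    """Sketch of a VGG-style block: only 3x3 convs (stride 1, pad 1) with ReLU,
    finished by a 2x2 max pool with stride 2 that halves the spatial size."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# VGG-16's conv stages are this block repeated with increasing filter counts and depth.
features = nn.Sequential(
    vgg_block(3, 64, 2), vgg_block(64, 128, 2),
    vgg_block(128, 256, 3), vgg_block(256, 512, 3), vgg_block(512, 512, 3),
)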
ResNet
• Skip (residual) connections; see the block sketch below.
• Heavy use of BN.
• Xavier/2 (He) initialization.
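A PyTorch sketch of a basic residual block tying these points together: an identity skip connection, batch norm after each conv, and He ("Xavier/2") initialization. This is an illustrative sketch, not the reference implementation.

import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of a basic ResNet block: two 3x3 convs with batch norm,
    plus an identity skip connection added before the final ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        for conv in (self.conv1, self.conv2):
            nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')  # He ("Xavier/2") init

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection: add the input back before the final ReLU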