ImageNet Classification with Deep Convolutional Neural Networks


  • Train a large, deep convolutional neural network to classify the 1.2 million high-resolution images.

  • Dataset: the ImageNet LSVRC-2010

    • Test set: top-1 and top-5 error rates of 37.5% and 17.0%
  • Neural network: 60 million parameters and 650,000 neurons, with five convolutional layers

    • max-pooling layers,
    • three fully-connected layers with a final 1000-way softmax
    • non-saturating neurons
    • To reduce overfitting: the dropout method


  • labeled high-resolution images
  • Applying CNNs to high-resolution images at large scale is expensive
  • current GPUs, paired with a highly-optimized implementation of 2D convolution, are powerful enough to train such networks.


  • removing any convolutional layer (each of which contains no more than 1% of the model’s parameters) resulted in inferior performance


  • ImageNet: 15 million images
    • Roughly 22,000 categories.
    • Labeling tool: Mechanical Turk crowd-sourcing tool
    • ILSVRC dataset
      • 1.2 million training images,
      • 50,000 validation images
      • 150,000 testing images


  • eight learned layers
    • five convolutional and three fully-connected

ReLU Nonlinearity

$f(x) = \tanh(x)$

$f(x) = (1 + e^{-x})^{-1}$

$f(x) = \max(0, x)$
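
The difference between the saturating functions (tanh, sigmoid) and the non-saturating ReLU shows up in their gradients. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and returning a gradient of 0 at exactly x = 0 is an assumed convention.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which shrinks toward 0 for large |x|."""
    return 1.0 - math.tanh(x) ** 2

def sigmoid(x):
    """f(x) = (1 + e^{-x})^{-1}"""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: f(x) * (1 - f(x)); also vanishes for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    """f(x) = max(0, x)"""
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (assumed 0 at x == 0)."""
    return 1.0 if x > 0 else 0.0

# Compare gradients as the input grows: tanh and sigmoid saturate, ReLU does not.
for x in (0.5, 2.0, 10.0):
    print(x, tanh_grad(x), sigmoid_grad(x), relu_grad(x))
```

For large positive inputs the tanh and sigmoid gradients approach zero while the ReLU gradient stays at 1, which is the sense in which ReLU is "non-saturating".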

Training on Multiple GPUs

Local Response Normalization
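
The normalization named in this heading can be sketched for a single spatial position. These notes do not record the formula, so everything below is an assumption recalled from the paper: each activation is divided by (k + alpha * sum of squared activations of neighbouring kernels)^beta, with k = 2, n = 5, alpha = 1e-4, beta = 0.75. `local_response_norm` is a hypothetical helper name.

```python
def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize the activations of all kernels at one (x, y) position.

    a: list of activations, one per kernel map, at the same spatial position.
    The constants are assumed values, not taken from these notes.
    """
    num_kernels = len(a)
    half = n // 2
    out = []
    for i in range(num_kernels):
        # Neighbouring kernel maps, clipped at the first and last kernel.
        lo = max(0, i - half)
        hi = min(num_kernels - 1, i + half)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * sq_sum) ** beta)
    return out

print(local_response_norm([3.0, -4.0, 5.0, 0.5, 2.0, 1.0]))
```

Because k is greater than 1 here, the denominator always exceeds 1, so every nonzero activation is reduced in magnitude, and more strongly so when its neighbours are large.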


Overlapping Pooling
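
Pooling overlaps when the stride s is smaller than the window size z. The sketch below shows this in one dimension; the setting z = 3, s = 2 is recalled from the paper rather than recorded in these notes, so treat it as an assumption. `max_pool_1d` is a hypothetical helper.

```python
def max_pool_1d(xs, z, s):
    """Max-pool a 1-D sequence with window size z and stride s (no padding)."""
    return [max(xs[i:i + z]) for i in range(0, len(xs) - z + 1, s)]

signal = [1, 3, 2, 5, 4, 0, 6]

# Overlapping pooling: neighbouring windows share one element (s < z).
print(max_pool_1d(signal, z=3, s=2))

# Traditional non-overlapping pooling for comparison (s == z).
print(max_pool_1d(signal, z=2, s=2))
```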


  • eight layers with weights

    • the first five are convolutional
    • the remaining three are fully-connected.
    • the output of the last fully-connected layer is fed to a 1000-way softmax
    • The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.
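    • The first convolutional layer:
      • filters the 224×224×3 input image with 96 kernels of size 11×11×3 with a stride of 4 pixels
    • The second convolutional layer takes the output of the first layer as input and filters it with 256 kernels of size 5 × 5 × 48.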
    • The third convolutional layer has 384 kernels of size 3 × 3 × 256 connected to the (normalized, pooled) outputs of the second convolutional layer.
  • The fourth convolutional layer: 384 kernels of size 3 × 3 × 192

  • The fifth convolutional layer: 256 kernels of size 3 × 3 × 192

  • The fully-connected layers have 4096 neurons each.
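
The kernel shapes listed above are enough to sanity-check the parameter count. The sketch below makes two assumptions that are not in these notes: one bias per kernel or neuron, and an input of 6 × 6 × 256 values to the first fully-connected layer.

```python
# (name, number of kernels, kernel height, kernel width, kernel depth)
CONV_LAYERS = [
    ("conv1", 96, 11, 11, 3),
    ("conv2", 256, 5, 5, 48),
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),
    ("conv5", 256, 3, 3, 192),
]

# (name, inputs, outputs); the 6 * 6 * 256 input size of fc6 is an assumption.
FC_LAYERS = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]

def count_parameters():
    """Return a dict mapping layer name to its number of weights plus biases."""
    counts = {}
    for name, kernels, h, w, depth in CONV_LAYERS:
        counts[name] = kernels * (h * w * depth) + kernels
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = n_in * n_out + n_out
    return counts

counts = count_parameters()
total = sum(counts.values())
for name, c in counts.items():
    print(f"{name}: {c:,} ({100 * c / total:.2f}%)")
print(f"total: {total:,}")
```

Under these assumptions the total lands close to the 60 million parameters quoted earlier, and almost all of them sit in the fully-connected layers.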


Data Augmentation

  • artificially enlarge the dataset
  • altering the intensities of the RGB channels in training images
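
One way to alter RGB intensities is to add a random multiple of each principal component of the RGB pixel values to every pixel of an image. The sketch below follows that idea as recalled from the paper; the Gaussian standard deviation of 0.1 is an assumption, and the eigenvectors and eigenvalues are placeholder numbers, not real ImageNet statistics.

```python
import random

def color_shift(eigvecs, eigvals, rng, sigma=0.1):
    """Return one RGB offset: sum_i alpha_i * lambda_i * p_i, alpha_i ~ N(0, sigma)."""
    alphas = [rng.gauss(0.0, sigma) for _ in eigvals]
    return [
        sum(eigvecs[i][c] * alphas[i] * eigvals[i] for i in range(len(eigvals)))
        for c in range(3)
    ]

def augment_colors(image, eigvecs, eigvals, rng):
    """Add the same random RGB offset to every pixel of one image."""
    shift = color_shift(eigvecs, eigvals, rng)
    return [[[px[c] + shift[c] for c in range(3)] for px in row] for row in image]

# Placeholder principal components (one per row) and eigenvalues.
EIGVECS = [[0.577, 0.577, 0.577], [0.707, 0.0, -0.707], [0.408, -0.816, 0.408]]
EIGVALS = [0.2, 0.02, 0.005]

image = [[[0.5, 0.4, 0.3], [0.1, 0.2, 0.3]]]  # 1 x 2 pixels, RGB in [0, 1]
augmented = augment_colors(image, EIGVECS, EIGVALS, random.Random(0))
print(augmented)
```

The offset is drawn once per image, so the whole image changes in colour and brightness together while object identity is preserved.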


  • “dropped out” in this way do not contribute to the forward pass and do not participate in backpropagation
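
A minimal sketch of the idea, assuming a drop probability of 0.5 during training and scaling outputs by 0.5 at test time (both recalled from the paper, not stated in these notes). The helper names are hypothetical.

```python
import random

def dropout_train(activations, rng, p_drop=0.5):
    """Zero each activation independently with probability p_drop.

    A zeroed neuron passes nothing forward, so it also receives no gradient
    during backpropagation.
    """
    return [0.0 if rng.random() < p_drop else a for a in activations]

def dropout_test(activations, p_drop=0.5):
    """At test time use every neuron, scaled by the keep probability."""
    keep = 1.0 - p_drop
    return [a * keep for a in activations]

hidden = [0.8, 1.5, 0.3, 2.0]
train_out = dropout_train(hidden, random.Random(0))
test_out = dropout_test(hidden)
print(train_out)
print(test_out)
```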


  • stochastic gradient descent
  • a batch size of 128 examples
  • momentum of 0.9,
  • weight decay of 0.0005.
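
The hyperparameters above can be combined into a single update step. The sketch below uses the update rule as recalled from the paper (velocity = 0.9 * velocity - 0.0005 * lr * w - lr * gradient, then w = w + velocity); the learning rate of 0.01 and the scalar weight are assumptions for illustration.

```python
def sgd_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and weight decay for a single weight.

    w: current weight, v: current velocity, grad: batch-averaged gradient.
    """
    v_next = momentum * v - weight_decay * lr * w - lr * grad
    w_next = w + v_next
    return w_next, v_next

w, v = 1.0, 0.0
w, v = sgd_step(w, v, grad=0.5, lr=0.01)  # lr = 0.01 is an assumed value
print(w, v)
```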


  • interchangeably: in a way that lets one be swapped for the other

Qualitative Evaluations

  • Computing similarity by using Euclidean distance
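
A minimal sketch of this comparison: two images count as similar when their feature vectors are close in Euclidean distance. The three-element vectors below are toy stand-ins for real network activations, and `most_similar` is a hypothetical helper; `math.dist` needs Python 3.8 or later.

```python
import math

def most_similar(query, gallery):
    """Return (index, distance) of the gallery vector closest to query."""
    distances = [math.dist(query, g) for g in gallery]
    best = min(range(len(gallery)), key=distances.__getitem__)
    return best, distances[best]

# Toy feature vectors; real ones would come from a hidden layer of the network.
gallery = [[0.0, 1.0, 2.0], [3.0, 3.0, 3.0], [0.1, 0.9, 2.2]]
index, distance = most_similar([0.0, 1.0, 2.1], gallery)
print(index, distance)
```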






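Top-5 accuracy: (number of test images whose correct label is among the five highest-probability classes) divided by (total number of test images)

  • Top-5 error rate = (number of test images whose correct label is not among the five highest-probability classes) divided by (total number of test images)
  • Top-1: the predicted label is the class with the largest value in the final probability vector; the prediction is correct if that class is the true label, and wrong otherwise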
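
The definitions above translate directly into a small function. This is a sketch with made-up probability vectors over three classes, so k = 2 stands in for k = 5; `top_k_error` is a hypothetical helper name.

```python
def top_k_error(prob_vectors, true_labels, k):
    """Fraction of examples whose true label is not among the k most probable classes."""
    wrong = 0
    for probs, label in zip(prob_vectors, true_labels):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label not in ranked[:k]:
            wrong += 1
    return wrong / len(true_labels)

probs = [
    [0.1, 0.6, 0.3],  # true label 1: correct already at k = 1
    [0.5, 0.2, 0.3],  # true label 2: wrong at k = 1, correct at k = 2
    [0.7, 0.2, 0.1],  # true label 2: wrong at both k = 1 and k = 2
]
labels = [1, 2, 2]

print(top_k_error(probs, labels, k=1))
print(top_k_error(probs, labels, k=2))
```

The top-k error can only stay the same or fall as k grows, which is why the top-5 error rate reported for the network is lower than its top-1 error rate.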
