Introduction to AlexNet


1 Overview

    AlexNet was proposed by Alex Krizhevsky in 2012 and won that year's ILSVRC competition with a top-5 error rate of 16.4%, far ahead of the second-place entry. AlexNet is an 8-layer neural network with 5 convolutional layers and 3 fully connected layers (max pooling layers follow 3 of the convolutional layers), containing 630 million connections, 60 million parameters, and 650,000 neurons.

2 Innovations

    Successfully used ReLU as the activation function of the CNN, verifying that it outperforms Sigmoid in deeper networks and solving the vanishing-gradient problem that Sigmoid runs into when the network gets deep.
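
    To make the vanishing-gradient point concrete, here is a small NumPy sketch (my own illustration, not part of any referenced implementation): the sigmoid gradient is at most 0.25 and shrinks toward zero for large inputs, while the ReLU gradient stays at 1 for any positive input.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
# Multiplying many near-zero sigmoid gradients across layers makes them vanish;
# ReLU gradients are exactly 1 wherever the unit is active.
print(sigmoid(x) * (1.0 - sigmoid(x)))   # [~0.00005, 0.197, 0.25, 0.197, ~0.00005]
print((x > 0).astype(np.float32))        # [0., 0., 0., 1., 1.]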

    During training, Dropout randomly ignores a fraction of the neurons to avoid overfitting; it is usually applied to the fully connected layers. At prediction time Dropout is not used, i.e. the keep probability is set to 1.
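
    A minimal sketch of this behavior with the same TF 1.x API as the code in section 4 (the tensors here are made-up stand-ins): at training time units are dropped with probability 1 - keep_prob and the rest are scaled up, while at prediction time keep_prob is fed as 1.0 so the layer passes everything through unchanged.

import tensorflow as tf

fc = tf.ones([32, 4096])                 # stand-in for a fully connected activation
keep_prob = tf.placeholder(tf.float32)   # probability of keeping each unit
fc_drop = tf.nn.dropout(fc, keep_prob)   # zeroes random units, scales the rest by 1/keep_prob

with tf.Session() as sess:
    print(sess.run(fc_drop, feed_dict={keep_prob: 0.5})[0, :8])  # training: about half are 0
    print(sess.run(fc_drop, feed_dict={keep_prob: 1.0})[0, :8])  # prediction: all ones, unchanged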

    Used overlapping max pooling in the CNN (stride smaller than the pooling window). Previously, average pooling was common in CNNs; max pooling avoids the blurring effect of average pooling, and the overlap between windows increases the richness of the features.
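
    A short sketch of the difference, again with the TF 1.x API used in section 4 (shapes chosen to match the first pooling layer below): a 3*3 window with stride 2 makes neighbouring windows overlap, whereas a 2*2 window with stride 2 does not.

import tensorflow as tf

x = tf.random_normal([1, 56, 56, 64])
# Overlapping pooling as in AlexNet: window (3) larger than stride (2).
pool_overlap = tf.nn.max_pool(x, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
# Non-overlapping pooling: window equal to stride.
pool_plain = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
print(pool_overlap.get_shape().as_list())  # [1, 27, 27, 64]
print(pool_plain.get_shape().as_list())    # [1, 28, 28, 64]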

    Proposed the LRN (Local Response Normalization) layer, which creates a competition mechanism among the activities of neighbouring neurons: relatively large responses become even larger while neurons with smaller responses are suppressed, improving the model's generalization ability.
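
    In TensorFlow 1.x this corresponds to tf.nn.lrn, which divides each activation by a power of the summed squares of its neighbouring channels; the parameter values below are simply the ones used in the benchmark code in section 4.

import tensorflow as tf

x = tf.random_normal([1, 56, 56, 64])
# output = input / (bias + alpha * sum(input^2 over 2*depth_radius+1 neighbouring channels)) ** beta
lrn = tf.nn.lrn(x, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)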

    Used CUDA to accelerate the training of the neural network, exploiting the GPU's massive computing power.

    Data augmentation: 224*224 regions are randomly cropped out of the 256*256 images (along with their horizontal mirrors), which amounts to increasing the amount of data by a factor of (256-224)^2 * 2 = 2048. Without data augmentation the model would fall into overfitting; with it, the model generalizes better.
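
    A small sketch of this augmentation using the TF 1.x image ops (illustrative only; the benchmark code in section 4 feeds random tensors rather than real images):

import tensorflow as tf

image = tf.random_uniform([256, 256, 3])        # stand-in for a decoded 256*256 training image
patch = tf.random_crop(image, [224, 224, 3])    # random 224*224 crop
patch = tf.image.random_flip_left_right(patch)  # random horizontal mirror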

3 Network Architecture

(Figure 1: AlexNet network architecture)

    The structure of the network is shown in the figure above. The input image is 224*224*3. The first convolutional layer uses a relatively large 11*11 kernel with a stride of 4 and 96 kernels; it is followed by an LRN layer and then a 3*3 max pooling layer with a stride of 2. The convolutional layers after that all use smaller kernels, either 5*5 or 3*3, with a stride of 1, i.e. every pixel is scanned; the max pooling layers remain 3*3 with a stride of 2. Note also that although the first few convolutional layers require a large amount of computation, they have very few parameters, on the order of 1M each; most of the parameters live in the fully connected layers, which is of course a consequence of weight sharing in the convolutional layers.
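
    A back-of-the-envelope check of that last claim, using the layer sizes from the original paper (my own arithmetic, ignoring the paper's two-GPU split): the five convolutional layers contribute only a few million parameters, while the three fully connected layers account for nearly all of the roughly 60 million.

# Conv layer parameters: kernel_h * kernel_w * in_channels * out_channels + biases
conv1 = 11 * 11 * 3 * 96 + 96        # ~35K
conv2 = 5 * 5 * 96 * 256 + 256       # ~0.6M
conv3 = 3 * 3 * 256 * 384 + 384      # ~0.9M
conv4 = 3 * 3 * 384 * 384 + 384      # ~1.3M
conv5 = 3 * 3 * 384 * 256 + 256      # ~0.9M
# Fully connected layer parameters: in_units * out_units + biases
fc6 = 6 * 6 * 256 * 4096 + 4096      # ~37.8M
fc7 = 4096 * 4096 + 4096             # ~16.8M
fc8 = 4096 * 1000 + 1000             # ~4.1M
print(conv1 + conv2 + conv3 + conv4 + conv5)  # ~3.7M
print(fc6 + fc7 + fc8)                        # ~58.6M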

4 Code Implementation

    The code comes mainly from TensorFlow's open-source implementation. Here the first layer uses only 64 kernels rather than the 96 from the paper, which I also find a bit strange.

#%%
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from datetime import datetime
import math
import time
import tensorflow as tf
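# Note: this script uses the TensorFlow 1.x graph-mode API (tf.Graph, tf.Session, etc.)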
batch_size=32
num_batches=100
# Print the name and output shape of each layer
def print_activations(t):
    print(t.op.name, ' ', t.get_shape().as_list())
def inference(images):
    parameters = []
    # Use name_scope so that the Variables in this layer are named conv1/xxx
    #224*224*3
    with tf.name_scope('conv1') as scope:
        # Weight tensor: 11*11 kernels, 3 input channels, 64 output kernels
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        # Convolution with stride 4*4
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        # Output of the first layer
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        # Collect this layer's parameters
        parameters += [kernel, biases]
    # pool1
    # Using LRN cuts the forward/backward speed to roughly 1/3
    # 56*56*64
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool1')
    print_activations(pool1)
    # conv2
    # 27*27*64
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
    print_activations(conv2)
    # pool2
    # 27*27*192
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool2')
    print_activations(pool2)
    # conv3
    # 13*13*192
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv3)
    # conv4
    # 13*13*384
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv4)
    # conv5
    # 13*13*256
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256],
                                                 dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)
    # pool5
    # 13*13*256
    pool5 = tf.nn.max_pool(conv5,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID',
                           name='pool5')
    print_activations(pool5)
    # 6*6*256
    return pool5, parameters
# Measure the computation time per batch for AlexNet
def time_tensorflow_run(session, target, info_string):
#  """Run the computation to obtain the target tensor and print timing stats.
#
#  Args:
#    session: the TensorFlow session to run the computation under.
#    target: the target Tensor that is passed to the session's run() function.
#    info_string: a string summarizing this run, to be printed with the stats.
#
#  Returns:
#    None
#  """
    # Warm-up: ignore the first 10 iterations (GPU memory loading, cache misses, etc.)
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        # Run one batch and record how long it takes
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print ('%s: step %d, duration = %.3f' %
                       (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print ('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
           (datetime.now(), info_string, num_batches, mn, sd))
def run_benchmark():
#  """Run the benchmark on AlexNet."""
    with tf.Graph().as_default():
    # Generate some dummy images.
        image_size = 224
    # Note that our padding definition is slightly different from the cuda-convnet one.
    # In order to force the model to start with the same activations sizes,
    # we add 3 to the image_size and employ VALID padding above.
        # Generate the image data
        images = tf.Variable(tf.random_normal([batch_size,
                                           image_size,
                                           image_size, 3],
                                          dtype=tf.float32,
                                          stddev=1e-1))
    # Build a Graph that computes the logits predictions from the
    # inference model.
        # Build the convolutional layers and collect their parameters
        pool5, parameters = inference(images)

    # Build an initialization operation.
        init = tf.global_variables_initializer()

    # Start running operations on the Graph.
        config = tf.ConfigProto()
        config.gpu_options.allocator_type = 'BFC'
        sess = tf.Session(config=config)
        sess.run(init)

    # Run the forward benchmark.
        time_tensorflow_run(sess, pool5, "Forward")

    # Add a simple objective so we can calculate the backward pass.
        objective = tf.nn.l2_loss(pool5)
    # Compute the gradient with respect to all the parameters.
        grad = tf.gradients(objective, parameters)
    # Run the backward benchmark.
        time_tensorflow_run(sess, grad, "Forward-backward")
run_benchmark()


5 References

[1] Huang Wenjian, Tang Yuan. TensorFlow实战 (TensorFlow in Practice). Beijing: Publishing House of Electronics Industry, 2017.

[2]http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

[3]https://github.com/tensorflow/models/blob/master/tutorials/image/alexnet/alexnet_benchmark.py

