Understanding GoogLeNet and a TensorFlow Implementation

1. Introduction

GoogLeNet first appeared in the ILSVRC 2014 competition (as Inception V1) and won first place by a clear margin (top-5 error rate 6.67%, versus 7.32% for VGGNet). GoogLeNet is 22 layers deep, deeper than AlexNet's 8 layers and VGGNet's 19 layers. Yet it has only about 5 million parameters, 1/12 of AlexNet's roughly 60 million (while far surpassing AlexNet's accuracy), and VGGNet has about 3 times as many parameters as AlexNet. When memory or compute is limited, GoogLeNet is therefore a good choice.

Why reduce the parameter count

  1. The more parameters a model has, the bigger it is and the more training data it needs, and high-quality data is currently very expensive;
  2. More parameters also consume more computational resources.

Key innovations

  1. The final fully connected layers are removed and replaced with a global average pooling layer (which reduces each feature map to 1×1). Fully connected layers account for roughly 90% of the parameters in AlexNet and VGGNet and tend to cause overfitting; removing them makes the model train faster and overfit less. (This idea is borrowed from Network In Network.)
  2. Inception Modules are used to make parameter usage more efficient.
Figure 1. Structure of the Inception Module

Basic structure of the Inception Module
There are 4 branches, which use 1×1 convolutions to perform low-cost cross-channel feature transforms.

  • Branch 1: a 1×1 convolution of the input. A 1×1 convolution organizes information across channels and improves the network's expressive power, and it can raise or lower the number of output channels;
  • Branch 2: a 1×1 convolution followed by a 3×3 convolution, i.e. two successive feature transforms;
  • Branch 3: a 1×1 convolution followed by a 5×5 convolution;
  • Branch 4: a 3×3 max pooling followed by a 1×1 convolution. The 1×1 convolution is very cost-effective: for a very small amount of computation it adds a layer of feature transformation and nonlinearity.

The Inception Module lets the network grow efficiently in both depth and width, raising accuracy without overfitting.
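
As a concrete illustration, here is a minimal sketch of the four-branch structure in Figure 1, written with tf.contrib.slim in the same TF 1.x style as the implementation in Section 3; the channel counts are illustrative assumptions, not values from the paper:

import tensorflow as tf
slim = tf.contrib.slim

def naive_inception_module(net):
    # Four parallel branches over the same input, concatenated along the channel axis.
    with slim.arg_scope([slim.conv2d, slim.max_pool2d], stride=1, padding='SAME'):
        branch_0 = slim.conv2d(net, 64, [1, 1])              # branch 1: 1x1 conv
        branch_1 = slim.conv2d(net, 48, [1, 1])              # branch 2: 1x1 conv, then 3x3
        branch_1 = slim.conv2d(branch_1, 64, [3, 3])
        branch_2 = slim.conv2d(net, 48, [1, 1])              # branch 3: 1x1 conv, then 5x5
        branch_2 = slim.conv2d(branch_2, 64, [5, 5])
        branch_3 = slim.max_pool2d(net, [3, 3])              # branch 4: 3x3 max pool, then 1x1
        branch_3 = slim.conv2d(branch_3, 32, [1, 1])
    return tf.concat([branch_0, branch_1, branch_2, branch_3], 3)

Because every branch preserves the spatial size ('SAME' padding, stride 1), the four outputs can be concatenated along channels (64+64+64+32 = 224 channels in this sketch).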

The Hebbian principle

Neuronal connections in the human brain are sparse, so researchers argue that a reasonable connection structure for large neural networks should also be sparse (for very large, very deep networks, sparsity mitigates overfitting and reduces computation). The sparse structure proposed in the paper is based on the Hebbian principle.

Hebbian principle: persistent, repeated neural activity leads to a lasting increase in the stability of neuronal connections; when two neurons A and B are close together and A repeatedly and persistently takes part in exciting B, some metabolic change makes A more effective at exciting B. As shown in Figure 2, highly correlated nodes in the previous layer are clustered, and each resulting cluster is connected to a unit of the next layer.

Figure 2. Building the sparse structure

A "good" sparse structure should follow the Hebbian principle: highly correlated neuron nodes should be wired together. In image data, neighboring regions are highly correlated, so adjacent pixels are connected by the convolution operation. When there are multiple convolution kernels, the outputs at the same spatial position but in different channels are also highly correlated, and a 1×1 convolution connects exactly these highly correlated features that share a spatial position but lie in different channels.
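
One way to see this: a 1×1 convolution applies the same cross-channel linear map at every spatial position, i.e. a fully connected layer over channels shared by all pixels. A minimal sketch, reusing the imports from the sketch above (the shapes are illustrative assumptions):

# A 1x1 convolution mixes channels independently at each spatial position.
x = tf.random_uniform((8, 35, 35, 192))                # 35x35 feature maps with 192 channels
y = slim.conv2d(x, 64, [1, 1], scope='cross_channel')  # 192 -> 64 channels per position
print(y.get_shape())                                   # (8, 35, 35, 64): spatial size unchanged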

Network architecture


Figure 3. The GoogLeNet network architecture

As shown in Figure 3, GoogLeNet (Inception V1) is 22 layers deep. Besides the final output, intermediate nodes of the network also classify quite well, so InceptionNet additionally uses auxiliary classifiers: the output of an intermediate layer is used for classification and added to the final result with a small weight (0.3). This amounts to a form of model fusion; at the same time it injects extra backpropagated gradient signal into the network and provides additional regularization, both of which benefit training.
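
In training code this amounts to a second, down-weighted softmax loss. A minimal sketch, assuming one-hot labels and the logits / end_points['AuxLogits'] tensors produced by a model like the one in Section 3 (standard TF 1.x loss API):

# Combine the main classification loss with the auxiliary loss at weight 0.3.
main_loss = tf.losses.softmax_cross_entropy(onehot_labels, logits)
aux_loss = tf.losses.softmax_cross_entropy(onehot_labels, end_points['AuxLogits'])
total_loss = main_loss + 0.3 * aux_loss  # extra gradient signal from the middle of the network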

2. The Inception family

  • Inception V1: from the September 2014 paper "Going Deeper with Convolutions"; top-5 error rate 6.67%

Training used asynchronous SGD, with the learning rate lowered by 4% every 8 epochs. Inception V1 also used data-augmentation methods such as Multi-Scale and Multi-Crop, and trained 7 models on differently sampled data for ensembling.

  • Inception V2: from the February 2015 paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"; top-5 error rate 4.8%
  1. Following VGGNet, two 3×3 convolutions replace each 5×5 convolution (two 3×3 kernels have 2×9 = 18 weights per input-output channel pair versus 25 for one 5×5, with the same receptive field, so this lowers the parameter count and reduces overfitting). The paper also introduced the well-known Batch Normalization method.
  2. BN is a very effective regularization method that can speed up the training of large convolutional networks many times over, while also substantially improving classification accuracy after convergence. Applied to a layer, BN standardizes the data within each mini-batch, normalizing the output toward an N(0,1) distribution and reducing Internal Covariate Shift (changes in the distribution of internal activations); the exact transform is given after this list.
  3. The BN paper points out that when a traditional deep network is trained, the input distribution of every layer keeps shifting, which makes training difficult and forces a very small learning rate. With BN applied to every layer this problem is effectively solved: the learning rate can be increased many times over, and reaching the previous accuracy takes only 1/14 as many iterations, greatly shortening training time. (BN also acts as a regularizer in a certain sense, so Dropout can be reduced or removed, simplifying the network.)
  4. Using BN alone does not yield the full gain; some accompanying adjustments are needed:
  • increase the learning rate and speed up learning-rate decay to suit the BN-normalized data;
  • remove Dropout and lighten L2 regularization (BN already has a regularizing effect);
  • remove LRN;
  • shuffle the training samples more thoroughly;
  • reduce optical distortions in data augmentation (BN trains faster, so each sample is seen fewer times, and more realistic samples help more).
    With these measures, Inception V2 reached Inception V1's accuracy 14 times faster and converged to a higher accuracy ceiling.
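
For reference, the transform BN applies is the standard one from the paper. Over a mini-batch $B=\{x_1,\dots,x_m\}$ of a given activation, with learned scale and shift parameters $\gamma$ and $\beta$:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i-\mu_B)^2, \qquad \hat{x}_i = \frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\epsilon}}, \qquad y_i = \gamma\hat{x}_i + \beta$$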
  • Inception V3: from the December 2015 paper "Rethinking the Inception Architecture for Computer Vision"; top-5 error rate 3.5%

There are two main changes:

  • First, it introduces the idea of Factorization into small convolutions: a larger 2-D convolution is split into two smaller 1-D convolutions, for example a 7×7 convolution becomes a 1×7 convolution followed by a 7×1 convolution, and a 3×3 convolution becomes a 1×3 followed by a 3×1 convolution, as shown in Figure 4.
Figure 4. Splitting a 3×3 convolution into a 1×3 and a 3×1 convolution

On one hand this saves a large number of parameters, speeding up computation and reducing overfitting (splitting a 7×7 convolution into 1×7 and 7×1 saves even more parameters than splitting it into three 3×3 convolutions); on the other hand it adds an extra layer of nonlinearity, extending the model's expressive power. This asymmetric split also works better than a symmetric split into several identical smaller kernels: it can handle more, richer spatial features and increases feature diversity.
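
The saving is easy to quantify. For a layer with $C$ input and $C$ output channels, ignoring biases:

$$\text{params}(7\times7)=49\,C^2, \qquad \text{params}(1\times7)+\text{params}(7\times1)=14\,C^2, \qquad 3\times\text{params}(3\times3)=27\,C^2$$

so the asymmetric split uses roughly 3.5× fewer parameters than a single 7×7 kernel, and about half as many as three stacked 3×3 convolutions.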

  • Second, the structure of the Inception Module is optimized: there are now three variants, for 35×35, 17×17 and 8×8 feature maps, as shown in Figure 5.
Figure 5. The three Inception Module variants in Inception V3

These Inception Modules appear only in the later part of the network; the earlier part consists of ordinary convolution layers. Besides using branches inside an Inception Module, Inception V3 also uses branches inside branches (in the 8×8 variant), making it something of a Network In Network In Network.
The Inception V3 architecture is summarized in the following table:

| Type | Kernel size / stride (or note) | Input size |
| --- | --- | --- |
| conv | 3×3 / 2 | 299×299×3 |
| conv | 3×3 / 1 | 149×149×32 |
| conv | 3×3 / 1 | 147×147×32 |
| pool | 3×3 / 2 | 147×147×64 |
| conv | 3×3 / 1 | 73×73×64 |
| conv | 3×3 / 2 | 71×71×80 |
| conv | 3×3 / 1 | 35×35×192 |
| Inception module group | 3 Inception Modules | 35×35×288 |
| Inception module group | 5 Inception Modules | 17×17×768 |
| Inception module group | 3 Inception Modules | 8×8×1280 |
| pool | 8×8 | 8×8×2048 |
| linear | logits | 1×1×2048 |
| softmax | classification output | 1×1×1000 |
  • Inception V4: from the February 2016 paper "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning"; top-5 error rate 3.08%

Compared with Inception V3, V4 mainly incorporates residual connections from Microsoft's ResNet, as sketched below.
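
The residual idea fits in one line: each block learns a correction that is added back to its input. A minimal sketch, where inception_branch stands in for an arbitrary Inception-style branch whose output shape matches its input (a hypothetical helper, not the paper's exact module):

shortcut = net                         # identity path
residual = inception_branch(net)       # hypothetical branch: some stack of convolutions
net = tf.nn.relu(shortcut + residual)  # Inception-ResNet: add, then activate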

3. TensorFlow Implementation

This section implements Inception V3, with the architecture shown in the table above. Since Inception V3 is fairly complex, tf.contrib.slim is used to help build the network: the functions and components in contrib.slim greatly reduce the amount of code needed, and the 42-layer-deep Inception V3 can be built with relatively little code.
Implementation:

import tensorflow as tf
from datetime import datetime
import time
import math

slim = tf.contrib.slim
# trunc_normal: returns a truncated-normal initializer with the given stddev
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)

num_batches = 100

'''
inception_v3_arg_scope: generates default arguments for functions used throughout the network, such as the convolution activation function, weight initializer, and normalizer.
Defaults: L2 weight_decay 0.00004, stddev 0.1, batch_norm_var_collection 'moving_vars'.
'''
def inception_v3_arg_scope(weight_decay=0.00004,
                           stddev=0.1,
                           batch_norm_var_collection='moving_vars'):

    '''
    Parameter dictionary for batch normalization
    '''
    batch_norm_params = {
        'decay': 0.9997,    # decay coefficient for the moving averages
        'epsilon': 0.001,
        'updates_collections': tf.GraphKeys.UPDATE_OPS,
        'variables_collections': {
            'beta': None,
            'gamma': None,
            'moving_mean': [batch_norm_var_collection],
            'moving_variance': [batch_norm_var_collection],
        }
    }

    '''
    slim.arg_scope assigns default values to the arguments of the listed functions
    '''
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        weights_regularizer=slim.l2_regularizer(weight_decay)):
        with slim.arg_scope(
            [slim.conv2d],
            weights_initializer=tf.truncated_normal_initializer(stddev=stddev),
            activation_fn=tf.nn.relu,
            normalizer_fn=slim.batch_norm,
            normalizer_params=batch_norm_params
        ) as sc:
            return sc

'''
inception_v3_base: builds the convolutional part of Inception V3.
inputs is the tensor of input image data; scope is an environment carrying the functions' default arguments.
The plain convolutional stem below outputs 35*35*192; the function as a whole returns the 8*8*2048 output of the last Inception module.
'''
def inception_v3_base(inputs, scope=None):
    end_points = {}  # store key nodes for later use
    with tf.variable_scope(scope, 'InceptionV3', [inputs]):
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1, padding='VALID'):  # set defaults for these ops
            net = slim.conv2d(inputs, 32, [3, 3], stride=2, scope='Conv2d_1a_3x3')
            net = slim.conv2d(net, 32, [3, 3], scope='Conv2d_2a_3x3')
            net = slim.conv2d(net, 64, [3, 3], padding='SAME', scope='Conv2d_2b_3x3')
            net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_3a_3x3')
            net = slim.conv2d(net, 80, [1, 1], scope='Conv2d_3b_1x1')
            net = slim.conv2d(net, 192, [3, 3], scope='Conv2d_4a_3x3')
            net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_5a_3x3')

    '''
    The 1st Inception module group, containing three modules
    '''
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                        stride=1, padding='SAME'):
        # 1st Inception Module
        with tf.variable_scope('Mixed_5b'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 32, [1, 1], scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)    # output channels: 64+64+96+32=256; with 'SAME' padding, output is 35*35*256

        # 2nd Inception Module
        with tf.variable_scope('Mixed_5c'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)    # output channels: 64+64+96+64=288; with 'SAME' padding, output is 35*35*288

        # 3rd Inception Module
        with tf.variable_scope('Mixed_5d'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)    # output channels: 64+64+96+64=288; with 'SAME' padding, output is 35*35*288

        '''
        The 2nd Inception module group, five modules
        '''
        # 1st Inception Module
        with tf.variable_scope('Mixed_6a'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 384, [3, 3],
                                       stride=2, padding='VALID', scope='Conv2d_1a_1x1')  # feature map shrinks to 17x17
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_1a_1x1')
                branch_1 = slim.conv2d(branch_1, 96, [3, 3], scope='Conv2d_1b_3x3')
                branch_1 = slim.conv2d(branch_1, 96, [3, 3],
                                       stride=2, padding='VALID', scope='Conv2d_1c_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
            net = tf.concat([branch_0, branch_1, branch_2], 3)  # output channels: 384+96+288=768, output size 17*17*768

        # The remaining four modules all use 'Factorization into small convolutions'
        # 2nd Inception Module
        with tf.variable_scope('Mixed_6b'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_2a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_2a_1x1')
                branch_1 = slim.conv2d(branch_1, 128, [1, 7], scope='Conv2d_2b_1x7')
                branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_2c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_2a_1x1')
                branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_2b_7x1')
                branch_2 = slim.conv2d(branch_2, 128, [1, 7], scope='Conv2d_2c_1x7')
                branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_2d_7x1')
                branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_2e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_2a_3x3')
                branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_2b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)    # output channels: 192+192+192+192=768, output size 17*17*768

        # 3rd Inception Module
        with tf.variable_scope('Mixed_6c'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_2a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_2a_1x1')
                branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_2b_1x7')
                branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_2c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_2a_1x1')
                branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_2b_7x1')
                branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_2c_1x7')
                branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_2d_7x1')
                branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_2e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_2a_3x3')
                branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_2b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)    # output channels: 192+192+192+192=768, output size 17*17*768

        # 4th Inception Module
        with tf.variable_scope('Mixed_6d'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_2a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_2a_1x1')
                branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_2b_1x7')
                branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_2c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_2a_1x1')
                branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_2b_7x1')
                branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_2c_1x7')
                branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_2d_7x1')
                branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_2e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_2a_3x3')
                branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_2b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)  # output channels: 192+192+192+192=768, output size 17*17*768

        # 5th Inception Module
        with tf.variable_scope('Mixed_6e'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_2a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_2a_1x1')
                branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_2b_1x7')
                branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_2c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_2a_1x1')
                branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_2b_7x1')
                branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_2c_1x7')
                branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_2d_7x1')
                branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_2e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_2a_3x3')
                branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_2b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)  # output channels: 192+192+192+192=768, output size 17*17*768
        end_points['Mixed_6e'] = net

        '''
        The 3rd Inception module group, containing 3 Inception Modules
        '''
        # 1st Inception Module
        with tf.variable_scope('Mixed_7a'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_3a_1x1')
                branch_0 = slim.conv2d(branch_0, 320, [3, 3],
                                       stride=2, padding='VALID', scope='Conv2d_3b_3x3')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_3a_1x1')
                branch_1 = slim.conv2d(branch_1, 192, [1, 7], scope='Conv2d_3b_1x7')
                branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_3c_7x1')
                branch_1 = slim.conv2d(branch_1, 192, [3, 3], stride=2,
                                       padding='VALID', scope='Conv2d_3d_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.max_pool2d(net, [3, 3], stride=2,
                                           padding='VALID', scope='MaxPool_3a_3x3')
            net = tf.concat([branch_0, branch_1, branch_2], 3)  # output channels: 320+192+768=1280, output size 8*8*1280

        # 2nd Inception Module
        with tf.variable_scope('Mixed_7b'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 320, [1, 1], scope='Conv2d_3a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_3a_1x1')
                branch_1 = tf.concat([
                    slim.conv2d(branch_1, 384, [1, 3], scope='Conv2d_3b_1x3'),
                    slim.conv2d(branch_1, 384, [3, 1], scope='Conv2d_3c_3x1')
                ], 3)
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 448, [1, 1], scope='Conv2d_3a_1x1')
                branch_2 = slim.conv2d(branch_2, 384, [3, 3], scope='Conv2d_3b_3x3')
                branch_2 = tf.concat([
                    slim.conv2d(branch_2, 384, [1, 3], scope='Conv2d_3c_1x3'),
                    slim.conv2d(branch_2, 384, [3, 1], scope='Conv2d_3d_3x1')
                ], 3)
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_3a_3x3')
                branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_3b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)  # output channels: 320+768+768+192=2048, output size 8*8*2048

        # 3rd Inception Module
        with tf.variable_scope('Mixed_7c'):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net, 320, [1, 1], scope='Conv2d_3a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_3a_1x1')
                branch_1 = tf.concat([
                    slim.conv2d(branch_1, 384, [1, 3], scope='Conv2d_3b_1x3'),
                    slim.conv2d(branch_1, 384, [3, 1], scope='Conv2d_3c_3x1')
                ], 3)
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net, 448, [1, 1], scope='Conv2d_3a_1x1')
                branch_2 = slim.conv2d(branch_2, 384, [3, 3], scope='Conv2d_3b_3x3')
                branch_2 = tf.concat([
                    slim.conv2d(branch_2, 384, [1, 3], scope='Conv2d_3c_1x3'),
                    slim.conv2d(branch_2, 384, [3, 1], scope='Conv2d_3d_3x1')
                ], 3)
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_3a_3x3')
                branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_3b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)  # output channels: 320+768+768+192=2048, output size 8*8*2048
        return net, end_points

'''
inception_v3: global average pooling, Softmax and Auxiliary Logits.
num_classes: number of output classes
is_training: whether this is the training phase; Batch Normalization and Dropout are enabled only during training
dropout_keep_prob: fraction of nodes that Dropout keeps, default 0.8
prediction_fn: the function used for the final classification
spatial_squeeze: whether to squeeze the output (remove dimensions of size 1, e.g. 5x3x1 becomes 5x3)
reuse: whether the network and its Variables are reused
scope: an environment carrying the default function arguments
'''
def inception_v3(inputs,
                 num_classes=1000,
                 is_training=True,
                 dropout_keep_prob=0.8,
                 prediction_fn=slim.softmax,
                 spatial_squeeze=True,
                 reuse=None,
                 scope='InceptionV3'):
    with tf.variable_scope(scope, 'InceptionV3', [inputs, num_classes],
                           reuse=reuse) as scope:
        with slim.arg_scope([slim.batch_norm, slim.dropout],
                            is_training=is_training):
            net, end_points = inception_v3_base(inputs, scope=scope)

        '''
        Auxiliary Logits: an auxiliary classification node that helps the final prediction
        '''
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1, padding='SAME'):
            aux_logits = end_points['Mixed_6e']
            with tf.variable_scope('AuxLogits'):
                aux_logits = slim.avg_pool2d(aux_logits,
                                             [5, 5], stride=3, padding='VALID',
                                             scope='AvgPool_1a_5x5')
                aux_logits = slim.conv2d(aux_logits,
                                         128, [1, 1], scope='Conv2d_1b_1x1')
                aux_logits = slim.conv2d(aux_logits,
                                         768, [5, 5], weights_initializer=trunc_normal(0.01),
                                         padding='VALID', scope='Conv2d_1c_5x5')
                aux_logits = slim.conv2d(aux_logits,
                                         num_classes, [1, 1], activation_fn=None,
                                         normalizer_fn=None, weights_initializer=trunc_normal(0.001),
                                         scope='Conv2d_1d_1x1')
                if spatial_squeeze:
                    aux_logits = tf.squeeze(aux_logits, [1, 2],
                                            name='SpatialSqueeze')
                end_points['AuxLogits'] = aux_logits

            '''
            Logits: the normal classification-prediction path
            '''
            with tf.variable_scope('Logits'):
                net = slim.avg_pool2d(net, [8, 8],
                                      padding='VALID', scope='AvgPool_1a_8x8')
                net = slim.dropout(net, keep_prob=dropout_keep_prob,
                                   scope='Dropout_1b')
                end_points['PreLogits'] = net
                logits = slim.conv2d(net, num_classes, [1, 1],
                                     activation_fn=None, normalizer_fn=None, scope='Conv2d_1c_1x1')
                if spatial_squeeze:
                    logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
                end_points['Logits'] = logits
                end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
        return logits, end_points

'''
time_tensorflow_run: measures the computation time per batch
'''
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0

    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %(datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration

    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch'%(datetime.now(), info_string, num_batches, mn, sd))

def main():
    batch_size = 32
    height, width = 299, 299
    inputs = tf.random_uniform((batch_size, height, width, 3))
    with slim.arg_scope(inception_v3_arg_scope()):
        logits, end_points = inception_v3(inputs, is_training=False)

    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    # measure forward-pass time
    time_tensorflow_run(sess, logits, 'Forward')

if __name__ == '__main__':
    main()

Run output (forward-pass time only): screenshot omitted.

4. Summary

Inception V3 is a very complex and finely crafted model that draws on a great deal of accumulated experience and many techniques for designing large convolutional networks. As an extremely deep CNN, its overall structure and branching are intricate, and it contains many CNN design ideas and tricks worth borrowing:

  1. Factorization into small convolutions is very effective: it lowers the parameter count, reduces overfitting, and increases the network's nonlinear expressive power.
  2. From input to output, a convolutional network should gradually shrink the spatial size of its feature maps while increasing the number of channels, i.e. simplify the spatial structure while converting spatial information into higher-level abstract feature information.
  3. The Inception Module's approach of using multiple branches to extract high-level features at different levels of abstraction is very effective and enriches the network's expressive power.

References

  1. 《TensorFlow 实战》—— 黄文坚、唐源
