Inception V4 Network Architecture and Code Walkthrough

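After studying the Inception V4 convolutional neural network, I am summarizing my understanding of its architecture and main code.

GoogLeNet modified the conventional convolutional layers of a network and proposed a structure called Inception, which increases the depth and width of the network to improve the performance of deep neural networks. From Inception V1 to Inception V4 there have been four versions, each refining the previous one. This article walks through the network structure and main code of Inception V4. Inception V4 studies combinations of Inception Modules and Reduction Modules and, by stacking many convolutions and nonlinear transformations, substantially improves network performance.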

 

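1  Ordinary convolutional layers (non-Inception Module)

The function inception_v4_base starts with ordinary convolutional layers that are not Inception Modules. Its argument inputs is the tensor of image data, here 299x299x3. The first convolutional layer has 32 output channels, a 3x3 kernel, stride 2 and VALID padding, so the spatial size becomes (299-3)/2+1=149 and the tensor is 149x149x32. The second convolutional layer has 32 output channels, a 3x3 kernel, stride 1 and VALID padding, so the size becomes (149-3)/1+1=147 and the tensor is 147x147x32. The third convolutional layer has 64 output channels, a 3x3 kernel, stride 1 and the default SAME padding set by arg_scope, so the spatial size is unchanged and only the channel count changes, giving 147x147x64. These layers all use small 3x3 kernels, which combine features across channels at low cost.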

def inception_v4_base(inputs, scope=None):
    with tf.variable_scope(scope, 'InceptionV4', [inputs]):
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],stride=1, padding='SAME'):   # 299 x 299 x 3           
            net = slim.conv2d(inputs, 32, [3, 3], stride=2,padding='VALID', scope='Conv2d_1a_3x3')       # 149 x 149 x 32         
            net = slim.conv2d(net, 32, [3, 3], padding='VALID',scope='Conv2d_2a_3x3')                    # 147 x 147 x 32                 
            net = slim.conv2d(net, 64, [3, 3], scope='Conv2d_2b_3x3')                                    # 147 x 147 x 64           
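
The size arithmetic above can be checked with a few lines of plain Python. This is only a sketch: conv_output_size and stem_layers are illustrative names that do not appear in the original code, and the 299x299 input size is taken from the comments in the code above.

```python
import math

def conv_output_size(size, kernel, stride, padding):
    """Output size along one spatial dimension of a convolution or pooling layer."""
    if padding == 'VALID':
        # No padding: the kernel has to fit entirely inside the input.
        return (size - kernel) // stride + 1
    if padding == 'SAME':
        # Zero padding is added, so only the stride shrinks the output.
        return math.ceil(size / stride)
    raise ValueError('unknown padding: %r' % padding)

# (scope name, kernel, stride, padding, output channels) for the three stem layers.
stem_layers = [
    ('Conv2d_1a_3x3', 3, 2, 'VALID', 32),
    ('Conv2d_2a_3x3', 3, 1, 'VALID', 32),
    ('Conv2d_2b_3x3', 3, 1, 'SAME', 64),
]

size = 299  # assumed input height and width
stem_shapes = []
for name, kernel, stride, padding, channels in stem_layers:
    size = conv_output_size(size, kernel, stride, padding)
    stem_shapes.append((name, size, size, channels))
    print(name, (size, size, channels))
```

The same helper applies to the pooling layers later in the network, because pooling uses the same output-size rule as convolution.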

 

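2  Three Inception module groups

The ordinary convolutional layers are followed by three consecutive Inception Modules, Mixed_3a, Mixed_4a and Mixed_5a, each with several branches. In Mixed_3a the first branch is a 3x3 max pooling and the second branch is a 3x3 convolution with 96 output channels; both use stride 2 and VALID padding. tf.concat then merges the two branches along the channel axis, so the output has 64+96=160 channels and a spatial size of (147-3)/2+1=73, i.e. 73x73x160. In Mixed_4a the first branch applies a 1x1 and a 3x3 convolution, and the second branch applies 1x1, 1x7, 7x1 and 3x3 convolutions. Both branches end in a 3x3 VALID convolution with 96 output channels, so the size becomes 73-3+1=71 and the tensor is 71x71x192. In Mixed_5a the first branch is a 3x3 convolution with 192 output channels and the second branch is a 3x3 max pooling. Both use stride 2 and VALID padding, so the spatial size is roughly halved to (71-3)/2+1=35 and the tensor becomes 35x35x384.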

            with tf.variable_scope('Mixed_3a'):
                with tf.variable_scope('Branch_0'):
                     branch_0 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',scope='MaxPool_0a_3x3')
                with tf.variable_scope('Branch_1'):
                     branch_1 = slim.conv2d(net, 96, [3, 3], stride=2, padding='VALID',scope='Conv2d_0a_3x3')
                net = tf.concat(axis=3, values=[branch_0, branch_1])                          # 73 x 73 x 160
         
            with tf.variable_scope('Mixed_4a'):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
                    branch_0 = slim.conv2d(branch_0, 96, [3, 3], padding='VALID', scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, 64, [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, 64, [7, 1], scope='Conv2d_0c_7x1')
                    branch_1 = slim.conv2d(branch_1, 96, [3, 3], padding='VALID',scope='Conv2d_1a_3x3')
                net = tf.concat(axis=3, values=[branch_0, branch_1])                        # 71 x 71 x 192
           
            with tf.variable_scope('Mixed_5a'):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, 192, [3, 3], stride=2, padding='VALID',scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',scope='MaxPool_1a_3x3')
                net = tf.concat(axis=3, values=[branch_0, branch_1])                       # 35 x 35 x 384 
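
As a rough check of the shape comments in Mixed_3a, Mixed_4a and Mixed_5a, the sketch below follows the spatial size and channel count through the three modules. valid_size is a hypothetical helper, not part of the original code. It relies on two facts visible in the code above: concatenation along axis 3 adds channel counts, and pooling leaves the channel count unchanged.

```python
def valid_size(size, kernel, stride):
    """Output size along one dimension with padding='VALID'."""
    return (size - kernel) // stride + 1

# Input to Mixed_3a is the stem output: 147 x 147 x 64.
size, channels = 147, 64

# Mixed_3a: 3x3 max pool (keeps 64 channels) and 3x3 conv with 96 filters, stride 2, VALID.
size = valid_size(size, 3, 2)
channels = channels + 96
mixed_3a = (size, size, channels)

# Mixed_4a: both branches end in a 3x3 VALID conv with 96 filters (stride 1).
# The 1x1, 1x7 and 7x1 convs use SAME padding, so they do not change the size.
size = valid_size(size, 3, 1)
channels = 96 + 96
mixed_4a = (size, size, channels)

# Mixed_5a: 3x3 conv with 192 filters and 3x3 max pool (keeps 192 channels), stride 2, VALID.
size = valid_size(size, 3, 2)
channels = 192 + channels
mixed_5a = (size, size, channels)

print(mixed_3a, mixed_4a, mixed_5a)
```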

 

3  Combinations of Inception Modules and Reduction Modules

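The three Inception module groups are followed by three consecutive stages that combine Inception Modules and Reduction Modules. Each stage defines its own block_inception and block_reduction functions (block_inception_a, block_reduction_a and so on). A block_inception function keeps the spatial size unchanged and only applies nonlinear transformations, while a block_reduction function shrinks the spatial size and applies nonlinear transformations at the same time. Combining the two refines the features step by step and substantially improves network performance.

The first stage consists of the Inception-A blocks and the Reduction-A block. block_inception_a is called four times, producing the modules Mixed_5b, Mixed_5c, Mixed_5d and Mixed_5e. Each block_inception_a has 4 branches: the first is one 1x1 convolution; the second is one 1x1 convolution and one 3x3 convolution; the third is one 1x1 convolution and two 3x3 convolutions; the fourth is one 3x3 average pooling (slim.avg_pool2d in the code, not max pooling) and one 1x1 convolution. Every branch ends with 96 output channels and the padding mode is SAME, so the tensor stays at 35x35x384 (4x96=384). block_reduction_a is called once and has 3 branches: the first is one 3x3 convolution; the second is one 1x1 convolution and two 3x3 convolutions; the third is one 3x3 max pooling. The last layer of each branch uses VALID padding and stride 2, so the spatial size is roughly halved to (35-3)/2+1=17, and the channels add up to 384+256+384=1024, giving 17x17x1024.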

def block_inception_a(inputs, scope=None, reuse=None):
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockInceptionA', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 96, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 64, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 96, [3, 3], scope='Conv2d_0b_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(inputs, 64, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 96, [1, 1], scope='Conv2d_0b_1x1')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])

def block_reduction_a(inputs, scope=None, reuse=None):
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockReductionA', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 384, [3, 3], stride=2, padding='VALID',scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 224, [3, 3], scope='Conv2d_0b_3x3')
                branch_1 = slim.conv2d(branch_1, 256, [3, 3], stride=2,padding='VALID',scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID',scope='MaxPool_1a_3x3')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2])

# 4 x Inception-A blocks
for idx in range(4):
    block_scope = 'Mixed_5' + chr(ord('b') + idx)
    net = block_inception_a(net, block_scope)           # 35 x 35 x 384

# Reduction-A block
net = block_reduction_a(net, 'Mixed_6a')                # 17 x 17 x 1024
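
The channel counts in the comments above follow from how tf.concat(axis=3, ...) works: the output has as many channels as the final layers of all branches combined. The sketch below reproduces that bookkeeping for Inception-A and Reduction-A. concat_channels is an illustrative helper, not a TensorFlow function.

```python
def concat_channels(branch_channels):
    """Channel count after concatenating branches along the channel axis."""
    return sum(branch_channels)

channels = 384  # input to the first Inception-A block: 35 x 35 x 384

# block_inception_a: the last layer of each of the four branches has 96 filters,
# so stacking four blocks keeps the channel count at 384.
for _ in range(4):
    channels = concat_channels([96, 96, 96, 96])
inception_a_out = channels

# block_reduction_a: 3x3 conv (384), conv stack ending in 256 filters,
# and a max pool that passes the 384 input channels through unchanged.
reduction_a_out = concat_channels([384, 256, inception_a_out])

# Every reduction branch ends with a 3x3, stride-2, VALID layer.
reduction_a_size = (35 - 3) // 2 + 1

print(inception_a_out, (reduction_a_size, reduction_a_size, reduction_a_out))
```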

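The second stage consists of the Inception-B blocks and the Reduction-B block. block_inception_b is called seven times, producing the modules Mixed_6b, Mixed_6c, Mixed_6d, Mixed_6e, Mixed_6f, Mixed_6g and Mixed_6h. Each block_inception_b has 4 branches. The first branch is one 1x1 convolution. The second branch is one 1x1 convolution, one 1x7 convolution and one 7x1 convolution. The third branch is one 1x1 convolution followed by 7x1, 1x7, 7x1 and 1x7 convolutions. The fourth branch is one 3x3 average pooling (slim.avg_pool2d in the code, not max pooling) and one 1x1 convolution. Because the padding mode is SAME, the tensor stays at 17x17x1024 (384+256+256+128=1024). block_reduction_b is called once and has 3 branches: the first is one 1x1 convolution and one 3x3 convolution; the second is one 1x1 convolution, one 1x7 convolution, one 7x1 convolution and one 3x3 convolution; the third is one 3x3 max pooling. The last layer of each branch uses VALID padding and stride 2, so the spatial size is roughly halved to (17-3)/2+1=8, and the channels add up to 192+320+1024=1536, giving 8x8x1536.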

def block_inception_b(inputs, scope=None, reuse=None):
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockInceptionB', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 224, [1, 7], scope='Conv2d_0b_1x7')
                branch_1 = slim.conv2d(branch_1, 256, [7, 1], scope='Conv2d_0c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 192, [7, 1], scope='Conv2d_0b_7x1')
                branch_2 = slim.conv2d(branch_2, 224, [1, 7], scope='Conv2d_0c_1x7')
                branch_2 = slim.conv2d(branch_2, 224, [7, 1], scope='Conv2d_0d_7x1')
                branch_2 = slim.conv2d(branch_2, 256, [1, 7], scope='Conv2d_0e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])

def block_reduction_b(inputs, scope=None, reuse=None):
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockReductionB', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
                branch_0 = slim.conv2d(branch_0, 192, [3, 3], stride=2,padding='VALID', scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 256, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 256, [1, 7], scope='Conv2d_0b_1x7')
                branch_1 = slim.conv2d(branch_1, 320, [7, 1], scope='Conv2d_0c_7x1')
                branch_1 = slim.conv2d(branch_1, 320, [3, 3], stride=2,padding='VALID', scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID',scope='MaxPool_1a_3x3')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2])

# 7 x Inception-B blocks
for idx in range(7):
    block_scope = 'Mixed_6' + chr(ord('b') + idx)
    net = block_inception_b(net, block_scope)           # 17 x 17 x 1024

# Reduction-B block
net = block_reduction_b(net, 'Mixed_7a')                # 8 x 8 x 1536
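
The loop above builds its scope names with 'Mixed_6' + chr(ord('b') + idx). A small sketch makes the generated names explicit and repeats the channel bookkeeping for this stage. block_scopes is a hypothetical helper introduced only for illustration.

```python
def block_scopes(prefix, count):
    """Scope names built as prefix + chr(ord('b') + idx), mirroring the loop above."""
    return [prefix + chr(ord('b') + idx) for idx in range(count)]

inception_b_scopes = block_scopes('Mixed_6', 7)
print(inception_b_scopes)

# block_inception_b: final filters of the four branches.
inception_b_out = 384 + 256 + 256 + 128

# block_reduction_b: two conv branches plus a max pool that keeps the input channels.
reduction_b_out = 192 + 320 + inception_b_out

# Every reduction branch ends with a 3x3, stride-2, VALID layer.
reduction_b_size = (17 - 3) // 2 + 1

print(inception_b_out, (reduction_b_size, reduction_b_size, reduction_b_out))
```

The last generated name, Mixed_6h, is the end point that the auxiliary classifier reads from.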

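The third stage contains only Inception-C blocks, which mainly apply further nonlinear transformations. block_inception_c is called three times, producing the modules Mixed_7b, Mixed_7c and Mixed_7d. Each block_inception_c has 4 branches. The first branch is one 1x1 convolution. The second branch is one 1x1 convolution whose output is fed to a 1x3 convolution and a 3x1 convolution in parallel, and the two results are concatenated. The third branch is a 1x1, a 3x1 and a 1x3 convolution, again followed by a parallel 1x3 and 3x1 convolution whose results are concatenated. The fourth branch is one 3x3 average pooling (slim.avg_pool2d in the code, not max pooling) and one 1x1 convolution. Because the padding mode is SAME, the tensor stays at 8x8x1536 (256+512+512+256=1536).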

def block_inception_c(inputs, scope=None, reuse=None):
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockInceptionC', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 256, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = tf.concat(axis=3, values=[slim.conv2d(branch_1, 256, [1, 3], scope='Conv2d_0b_1x3'),
                                                     slim.conv2d(branch_1, 256, [3, 1], scope='Conv2d_0c_3x1')])
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 448, [3, 1], scope='Conv2d_0b_3x1')
                branch_2 = slim.conv2d(branch_2, 512, [1, 3], scope='Conv2d_0c_1x3')
                branch_2 = tf.concat(axis=3, values=[slim.conv2d(branch_2, 256, [1, 3], scope='Conv2d_0d_1x3'),
                                                     slim.conv2d(branch_2, 256, [3, 1], scope='Conv2d_0e_3x1')])
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 256, [1, 1], scope='Conv2d_0b_1x1')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])

# 3 x Inception-C blocks
for idx in range(3):
    block_scope = 'Mixed_7' + chr(ord('b') + idx)
    net = block_inception_c(net, block_scope)                   # 8 x 8 x 1536      
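
Inception-C differs from the earlier blocks in that two branches end in a nested concatenation. The sketch below models a branch as a list of layers, where a nested list stands for parallel layers whose outputs are concatenated. This representation and the name out_channels are illustrative assumptions, not part of the original code.

```python
def out_channels(step):
    """Channels produced by a step: an int is one layer, a list is parallel layers that are concatenated."""
    if isinstance(step, int):
        return step
    return sum(out_channels(s) for s in step)

# Filters of each layer, branch by branch; only the last step sets the branch output.
inception_c_branches = {
    'Branch_0': [256],
    'Branch_1': [384, [256, 256]],            # 1x1, then parallel 1x3 and 3x1
    'Branch_2': [384, 448, 512, [256, 256]],  # 1x1, 3x1, 1x3, then parallel 1x3 and 3x1
    'Branch_3': [256],                        # 1x1 conv after the 3x3 average pool
}

branch_outputs = {name: out_channels(steps[-1])
                  for name, steps in inception_c_branches.items()}
inception_c_out = sum(branch_outputs.values())

print(branch_outputs, inception_c_out)
```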

 

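4  AuxLogits, global average pooling and Softmax classification

The last part of the Inception V4 network consists of the auxiliary classifier AuxLogits, global average pooling and Softmax classification. AuxLogits comes first and serves as an auxiliary classification node. It reads the feature tensor after Mixed_6h, 17x17x1024, from end_points['Mixed_6h'], then applies a 5x5 average pooling with stride 3 and VALID padding, which changes the tensor from 17x17x1024 to 5x5x1024, since (17-5)/3+1=5. Next come a 1x1 convolution with 128 output channels and a 5x5 convolution with 768 output channels and VALID padding, giving 1x1x768. slim.flatten turns this into a two-dimensional tensor with 768 values per image, and a final fully connected layer with num_classes outputs produces 1000 values per image (for num_classes=1000). The output of the auxiliary classifier is stored in the dictionary end_points.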

    end_points = {}
    with tf.variable_scope(scope, 'InceptionV4', [inputs], reuse=reuse) as scope:
        with slim.arg_scope([slim.batch_norm, slim.dropout],is_training=is_training):
            net, end_points = inception_v4_base(inputs, scope=scope)
            with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],stride=1, padding='SAME'):
                if create_aux_logits and num_classes:
                    with tf.variable_scope('AuxLogits'):                        
                        aux_logits = end_points['Mixed_6h']                # 17 x 17 x 1024
                        aux_logits = slim.avg_pool2d(aux_logits, [5, 5], stride=3,padding='VALID',scope='AvgPool_1a_5x5')
                        aux_logits = slim.conv2d(aux_logits, 128, [1, 1], scope='Conv2d_1b_1x1')
                        aux_logits = slim.conv2d(aux_logits, 768, aux_logits.get_shape()[1:3],padding='VALID', scope='Conv2d_2a')
                        aux_logits = slim.flatten(aux_logits)
                        aux_logits = slim.fully_connected(aux_logits, num_classes,activation_fn=None,scope='Aux_logits')
                        end_points['AuxLogits'] = aux_logits
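
The shape changes in the AuxLogits head can be traced step by step. The following is a minimal sketch that assumes num_classes is 1000, as in the text. valid_size and aux_trace are illustrative names that do not exist in the original code.

```python
def valid_size(size, kernel, stride=1):
    """Output size along one dimension with padding='VALID'."""
    return (size - kernel) // stride + 1

num_classes = 1000  # assumed class count, matching the text

size, channels = 17, 1024
aux_trace = [('Mixed_6h', (size, size, channels))]

# 5x5 average pool, stride 3, VALID: pooling keeps the channel count.
size = valid_size(size, 5, 3)
aux_trace.append(('AvgPool_1a_5x5', (size, size, channels)))

# 1x1 conv with 128 filters: SAME padding and stride 1 keep the spatial size.
channels = 128
aux_trace.append(('Conv2d_1b_1x1', (size, size, channels)))

# Conv with 768 filters whose kernel equals the whole 5x5 map, VALID padding.
size, channels = valid_size(size, size), 768
aux_trace.append(('Conv2d_2a', (size, size, channels)))

# Flatten to one vector per image, then a fully connected layer.
aux_trace.append(('flatten', (size * size * channels,)))
aux_trace.append(('Aux_logits', (num_classes,)))

for name, shape in aux_trace:
    print(name, shape)
```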

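The output of the last convolutional module, Mixed_7d, goes through an 8x8 global average pooling with VALID padding, which changes the tensor from 8x8x1536 to 1x1x1536. A Dropout layer follows, and slim.flatten turns the result into a two-dimensional tensor with 1536 values per image. A final fully connected layer with num_classes outputs produces 1000 values per image, and Softmax turns these logits into the final class probabilities. The function returns the classification result logits and the dictionary end_points, which holds the feature maps of the individual layers.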

                with tf.variable_scope('Logits'):                   
                    kernel_size = net.get_shape()[1:3]    # net: 8 x 8 x 1536
                    if kernel_size.is_fully_defined():
                        net = slim.avg_pool2d(net, kernel_size, padding='VALID',scope='AvgPool_1a')
                    else:
                        net = tf.reduce_mean(net, [1, 2], keepdims=True, name='global_pool')
                    end_points['global_pool'] = net                   
                    if not num_classes:
                        return net, end_points
                                       
                    net = slim.dropout(net, dropout_keep_prob, scope='Dropout_1b')  # net: 1 x 1 x 1536
                    net = slim.flatten(net, scope='PreLogitsFlatten')
                    end_points['PreLogitsFlatten'] = net
                      
                    logits = slim.fully_connected(net, num_classes, activation_fn=None,scope='Logits') # (1 x 1000)
                    end_points['Logits'] = logits
                    end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')
            return logits, end_points
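
To illustrate what the Logits head computes, here is a toy version in plain Python: global average pooling, a fully connected layer and a softmax. It is a sketch under several assumptions. The 2x2x3 feature map stands in for the real 8x8x1536 tensor, the weights are hand-picked hypothetical values, and dropout is left out because it does nothing at inference time.

```python
import math

def global_average_pool(feature_map):
    """Average an H x W x C nested list over H and W, giving C values."""
    height, width = len(feature_map), len(feature_map[0])
    totals = [0.0] * len(feature_map[0][0])
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (height * width) for t in totals]

def dense(features, weights, biases):
    """Fully connected layer without activation; weights has shape C x num_classes."""
    return [sum(f * w[k] for f, w in zip(features, weights)) + biases[k]
            for k in range(len(biases))]

def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2 x 2 feature map with 3 channels.
feature_map = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[1.0, 4.0, 2.0], [3.0, 0.0, 2.0]],
]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # hypothetical values, 3 channels x 2 classes
biases = [0.0, 0.0]

pooled = global_average_pool(feature_map)
logits = dense(pooled, weights, biases)
probs = softmax(logits)
print(pooled, logits, probs)
```

In the real network the pooled vector has 1536 entries and the fully connected layer maps it to num_classes logits.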

 

Copyright notice: This is an original article by the author. Please credit the source when reposting. https://blog.csdn.net/fxfviolet/article/details/81666396
