Having studied the Inception V4 convolutional neural network, I summarize here my understanding of its architecture and main code.
GoogLeNet replaced the traditional convolutional layers of earlier networks with a structure called the Inception module, which increases both the depth and the width of the network and thereby improves its performance. From Inception V1 to Inception V4 there have been four versions, each improving on its predecessor. This post covers the architecture and main code of Inception V4, which explores combinations of Inception modules and Reduction modules; through repeated convolutions and non-linear transformations, these greatly improve network performance.
First we define inception_v4_base, a function for the plain (non-Inception) convolutional layers; its inputs parameter is the image tensor. The first convolutional layer has 32 output channels, a 3x3 kernel, stride 2, and VALID padding, so the tensor after it has spatial size (299-3)/2+1=149, i.e. 149x149x32. The second convolutional layer has 32 output channels, a 3x3 kernel, stride 1, and VALID padding, giving (149-3)/1+1=147, i.e. 147x147x32. The third convolutional layer has 64 output channels, a 3x3 kernel, stride 1, and the default SAME padding, so the spatial size is unchanged and only the channel count changes, giving 147x147x64. These plain layers rely on small 3x3 kernels, which combine features across channels at low cost.
def inception_v4_base(inputs, scope=None):
    with tf.variable_scope(scope, 'InceptionV4', [inputs]):
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1, padding='SAME'):
            # 299 x 299 x 3
            net = slim.conv2d(inputs, 32, [3, 3], stride=2, padding='VALID',
                              scope='Conv2d_1a_3x3')   # 149 x 149 x 32
            net = slim.conv2d(net, 32, [3, 3], padding='VALID',
                              scope='Conv2d_2a_3x3')   # 147 x 147 x 32
            net = slim.conv2d(net, 64, [3, 3], scope='Conv2d_2b_3x3')  # 147 x 147 x 64
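The size arithmetic above can be checked with a small helper. This is my own sketch, not part of the original code; the function name is made up for illustration:

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size along one dimension of a 2-D convolution."""
    if padding == 'VALID':
        return (size - kernel) // stride + 1
    if padding == 'SAME':
        return -(-size // stride)  # ceil(size / stride)
    raise ValueError('unknown padding: %s' % padding)

# Trace the three stem convolutions from a 299 x 299 input.
s = conv_out_size(299, 3, 2, 'VALID')  # Conv2d_1a_3x3 -> 149
s = conv_out_size(s, 3, 1, 'VALID')    # Conv2d_2a_3x3 -> 147
s = conv_out_size(s, 3, 1, 'SAME')     # Conv2d_2b_3x3 -> 147
```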
The plain convolutional layers are followed by three consecutive Inception modules, Mixed_3a, Mixed_4a, and Mixed_5a, each with several branches. In the first module, Mixed_3a, the first branch is a 3x3 max pool and the second branch is a 3x3 convolution with 96 output channels and stride 2; tf.concat then merges the two branch outputs, so the channel count is 64+96=160 and the spatial size becomes (147-3)/2+1=73, i.e. 73x73x160. In the second module, Mixed_4a, the first branch applies a 1x1 and then a 3x3 convolution, while the second branch applies four convolutions: 1x1, 1x7, 7x1, and 3x3; the final 3x3 convolutions use VALID padding, so the tensor becomes (73-3)/1+1=71, i.e. 71x71x192. In the third module, Mixed_5a, the first branch is a 3x3 convolution with 192 output channels and the second branch is a 3x3 max pool; both use stride 2 and VALID padding, so the spatial size is roughly halved and the tensor becomes 35x35x384.
with tf.variable_scope('Mixed_3a'):
    with tf.variable_scope('Branch_0'):
        branch_0 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
                                   scope='MaxPool_0a_3x3')
    with tf.variable_scope('Branch_1'):
        branch_1 = slim.conv2d(net, 96, [3, 3], stride=2, padding='VALID',
                               scope='Conv2d_0a_3x3')
    net = tf.concat(axis=3, values=[branch_0, branch_1])   # 73 x 73 x 160
with tf.variable_scope('Mixed_4a'):
    with tf.variable_scope('Branch_0'):
        branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
        branch_0 = slim.conv2d(branch_0, 96, [3, 3], padding='VALID',
                               scope='Conv2d_1a_3x3')
    with tf.variable_scope('Branch_1'):
        branch_1 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
        branch_1 = slim.conv2d(branch_1, 64, [1, 7], scope='Conv2d_0b_1x7')
        branch_1 = slim.conv2d(branch_1, 64, [7, 1], scope='Conv2d_0c_7x1')
        branch_1 = slim.conv2d(branch_1, 96, [3, 3], padding='VALID',
                               scope='Conv2d_1a_3x3')
    net = tf.concat(axis=3, values=[branch_0, branch_1])   # 71 x 71 x 192
with tf.variable_scope('Mixed_5a'):
    with tf.variable_scope('Branch_0'):
        branch_0 = slim.conv2d(net, 192, [3, 3], stride=2, padding='VALID',
                               scope='Conv2d_1a_3x3')
    with tf.variable_scope('Branch_1'):
        branch_1 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
                                   scope='MaxPool_1a_3x3')
    net = tf.concat(axis=3, values=[branch_0, branch_1])   # 35 x 35 x 384
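To double-check the shape bookkeeping for these three modules, here is a plain-Python trace; this is my own sketch, independent of TensorFlow, and the helper name is made up:

```python
def valid_out(size, kernel=3, stride=2):
    # Output size of a VALID convolution or pooling along one dimension.
    return (size - kernel) // stride + 1

size, channels = 147, 64                         # output of Conv2d_2b_3x3
# Mixed_3a: stride-2 max pool (64 ch kept) || stride-2 conv (96 ch)
size, channels = valid_out(size), 64 + 96        # 73 x 73 x 160
# Mixed_4a: both branches end in a stride-1 VALID 3x3 conv with 96 ch
size, channels = valid_out(size, 3, 1), 96 + 96  # 71 x 71 x 192
# Mixed_5a: stride-2 conv (192 ch) || stride-2 max pool (192 ch kept)
size, channels = valid_out(size), 192 + 192      # 35 x 35 x 384
```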
The three stem modules are followed by consecutive stages that combine Inception modules with Reduction modules. Each stage defines a block_inception_x function and (except for the last stage) a block_reduction_x function: block_inception keeps the spatial size fixed and only applies non-linear transformations, while block_reduction shrinks the spatial size while also transforming the features. Their combination extracts features more effectively and greatly improves network performance.
The first stage combines Inception-A blocks with a Reduction-A block. block_inception_a is called four times, producing the modules Mixed_5b, Mixed_5c, Mixed_5d, and Mixed_5e. Each block_inception_a has four branches: the first is a single 1x1 convolution; the second is a 1x1 followed by a 3x3 convolution; the third is a 1x1 followed by two 3x3 convolutions; the fourth is a 3x3 average pool followed by a 1x1 convolution. Because the padding mode is SAME, the tensor size is unchanged at 35x35x384. block_reduction_a is then called once; it has three branches: a 3x3 convolution, a 1x1 convolution followed by two 3x3 convolutions, and a 3x3 max pool. With VALID padding and stride 2, the spatial size is roughly halved, and the output tensor is 17x17x1024.
def block_inception_a(inputs, scope=None, reuse=None):
    # By default use stride=1 and SAME padding.
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
                        stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockInceptionA', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 96, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 64, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 96, [3, 3], scope='Conv2d_0b_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(inputs, 64, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
                branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 96, [1, 1], scope='Conv2d_0b_1x1')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])

def block_reduction_a(inputs, scope=None, reuse=None):
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
                        stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockReductionA', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 384, [3, 3], stride=2, padding='VALID',
                                       scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 224, [3, 3], scope='Conv2d_0b_3x3')
                branch_1 = slim.conv2d(branch_1, 256, [3, 3], stride=2, padding='VALID',
                                       scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID',
                                           scope='MaxPool_1a_3x3')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2])

# Inside inception_v4_base: 4 x Inception-A blocks
for idx in range(4):
    block_scope = 'Mixed_5' + chr(ord('b') + idx)
    net = block_inception_a(net, block_scope)   # 35 x 35 x 384
# Reduction-A block
net = block_reduction_a(net, 'Mixed_6a')        # 17 x 17 x 1024
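The channel counts after each concat in this stage can be verified by simple bookkeeping (a sketch of my own, not part of the network code):

```python
# block_inception_a: four branches of 96 channels each.
inception_a_out = 96 + 96 + 96 + 96
# block_reduction_a: 3x3 conv (384) + conv chain (256) + max pool,
# where the max pool passes through the 384 input channels unchanged.
reduction_a_out = 384 + 256 + 384
```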
The second stage combines Inception-B blocks with a Reduction-B block. block_inception_b is called seven times, producing the seven modules Mixed_6b through Mixed_6h. Each block_inception_b has four branches: the first is a 1x1 convolution; the second is a 1x1, a 1x7, and a 7x1 convolution; the third is a 1x1, a 7x1, a 1x7, another 7x1, and another 1x7 convolution; the fourth is a 3x3 average pool followed by a 1x1 convolution. With SAME padding the tensor size is unchanged at 17x17x1024. block_reduction_b is then called once; it has three branches: a 1x1 followed by a 3x3 convolution; a 1x1, a 1x7, a 7x1, and a 3x3 convolution; and a 3x3 max pool. With VALID padding and stride 2 the spatial size is roughly halved, and the output becomes 8x8x1536.
def block_inception_b(inputs, scope=None, reuse=None):
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
                        stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockInceptionB', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 224, [1, 7], scope='Conv2d_0b_1x7')
                branch_1 = slim.conv2d(branch_1, 256, [7, 1], scope='Conv2d_0c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 192, [7, 1], scope='Conv2d_0b_7x1')
                branch_2 = slim.conv2d(branch_2, 224, [1, 7], scope='Conv2d_0c_1x7')
                branch_2 = slim.conv2d(branch_2, 224, [7, 1], scope='Conv2d_0d_7x1')
                branch_2 = slim.conv2d(branch_2, 256, [1, 7], scope='Conv2d_0e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])

def block_reduction_b(inputs, scope=None, reuse=None):
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
                        stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockReductionB', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
                branch_0 = slim.conv2d(branch_0, 192, [3, 3], stride=2,
                                       padding='VALID', scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 256, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1, 256, [1, 7], scope='Conv2d_0b_1x7')
                branch_1 = slim.conv2d(branch_1, 320, [7, 1], scope='Conv2d_0c_7x1')
                branch_1 = slim.conv2d(branch_1, 320, [3, 3], stride=2,
                                       padding='VALID', scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID',
                                           scope='MaxPool_1a_3x3')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2])

# Inside inception_v4_base: 7 x Inception-B blocks
for idx in range(7):
    block_scope = 'Mixed_6' + chr(ord('b') + idx)
    net = block_inception_b(net, block_scope)   # 17 x 17 x 1024
# Reduction-B block
net = block_reduction_b(net, 'Mixed_7a')        # 8 x 8 x 1536
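The 1x7/7x1 pairs in block_inception_b are a factorized substitute for a full 7x7 convolution. For a fixed channel width c (a simplification; the real block widens from 192 to 256 across layers), the weight counts compare as follows:

```python
c = 192  # illustrative channel width, not taken from the block itself
full_7x7 = 7 * 7 * c * c              # one 7x7 convolution, c -> c channels
factorized = (1 * 7 + 7 * 1) * c * c  # a 1x7 followed by a 7x1
ratio = factorized / full_7x7         # 14/49, roughly 0.29x the weights
```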
The third stage consists only of Inception-C blocks, i.e. pure non-linear transformation of the features. block_inception_c is called three times, producing the modules Mixed_7b, Mixed_7c, and Mixed_7d. Each block_inception_c has four branches: the first is a 1x1 convolution; the second is a 1x1 convolution whose output feeds a 1x3 and a 3x1 convolution whose results are concatenated; the third is a 1x1, a 3x1, and a 1x3 convolution, again followed by a concatenated 1x3/3x1 pair; the fourth is a 3x3 average pool followed by a 1x1 convolution. With SAME padding the tensor size is unchanged at 8x8x1536.
def block_inception_c(inputs, scope=None, reuse=None):
    with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
                        stride=1, padding='SAME'):
        with tf.variable_scope(scope, 'BlockInceptionC', [inputs], reuse=reuse):
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(inputs, 256, [1, 1], scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
                branch_1 = tf.concat(axis=3, values=[
                    slim.conv2d(branch_1, 256, [1, 3], scope='Conv2d_0b_1x3'),
                    slim.conv2d(branch_1, 256, [3, 1], scope='Conv2d_0c_3x1')])
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2, 448, [3, 1], scope='Conv2d_0b_3x1')
                branch_2 = slim.conv2d(branch_2, 512, [1, 3], scope='Conv2d_0c_1x3')
                branch_2 = tf.concat(axis=3, values=[
                    slim.conv2d(branch_2, 256, [1, 3], scope='Conv2d_0d_1x3'),
                    slim.conv2d(branch_2, 256, [3, 1], scope='Conv2d_0e_3x1')])
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3, 256, [1, 1], scope='Conv2d_0b_1x1')
            return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])

# Inside inception_v4_base: 3 x Inception-C blocks
for idx in range(3):
    block_scope = 'Mixed_7' + chr(ord('b') + idx)
    net = block_inception_c(net, block_scope)   # 8 x 8 x 1536
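The split-and-concat branches make the channel accounting for block_inception_c slightly less obvious, so here is the bookkeeping spelled out (my own sketch):

```python
# Output channels of each block_inception_c branch before the final concat.
branch_0 = 256          # single 1x1 conv
branch_1 = 256 + 256    # 1x3 and 3x1 outputs concatenated
branch_2 = 256 + 256    # same split-concat at the end of the deeper branch
branch_3 = 256          # avg pool followed by 1x1 conv
total = branch_0 + branch_1 + branch_2 + branch_3   # 1536, size stays 8x8
```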
The last part of the Inception V4 network consists of the auxiliary classifier AuxLogits, global average pooling, and a Softmax classifier. First comes AuxLogits, the auxiliary classification head. end_points['Mixed_6h'] retrieves the 17x17x1024 feature tensor after Mixed_6h, which is passed through a 5x5 average pool with stride 3 and VALID padding, shrinking it from 17x17x1024 to 5x5x1024. This is followed by a 1x1 convolution with 128 output channels and a convolution with 768 output channels whose kernel covers the full 5x5 spatial extent, giving 1x1x768. slim.flatten then converts this into a 768-dimensional vector, and a fully connected layer with num_classes outputs produces the 1000-way auxiliary logits, which are stored in the end_points dictionary.
end_points = {}
with tf.variable_scope(scope, 'InceptionV4', [inputs], reuse=reuse) as scope:
    with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
        net, end_points = inception_v4_base(inputs, scope=scope)
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1, padding='SAME'):
            if create_aux_logits and num_classes:
                with tf.variable_scope('AuxLogits'):
                    aux_logits = end_points['Mixed_6h']   # 17 x 17 x 1024
                    aux_logits = slim.avg_pool2d(aux_logits, [5, 5], stride=3,
                                                 padding='VALID',
                                                 scope='AvgPool_1a_5x5')
                    aux_logits = slim.conv2d(aux_logits, 128, [1, 1],
                                             scope='Conv2d_1b_1x1')
                    aux_logits = slim.conv2d(aux_logits, 768,
                                             aux_logits.get_shape()[1:3],
                                             padding='VALID', scope='Conv2d_2a')
                    aux_logits = slim.flatten(aux_logits)
                    aux_logits = slim.fully_connected(aux_logits, num_classes,
                                                      activation_fn=None,
                                                      scope='Aux_logits')
                    end_points['AuxLogits'] = aux_logits
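The spatial sizes in the auxiliary head follow the same VALID-padding formula as before; checking them in plain Python (a sketch of my own):

```python
# 5x5 average pool, stride 3, VALID: 17x17 -> 5x5.
aux_size = (17 - 5) // 3 + 1
# The next conv uses the full 5x5 extent as its kernel with VALID padding,
# so the spatial size collapses to 1x1 (with 768 channels).
final_size = (aux_size - aux_size) // 1 + 1
```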
The output of the last convolutional stage, Mixed_7d, is passed through an 8x8 global average pool with VALID padding, turning the 8x8x1536 tensor into 1x1x1536. A Dropout layer follows, and slim.flatten converts the result into a 1536-dimensional vector. Finally, a fully connected layer with num_classes outputs yields the 1000-way logits, and Softmax produces the final class probabilities. The function returns the logits and the end_points dictionary of intermediate feature maps.
with tf.variable_scope('Logits'):
    kernel_size = net.get_shape()[1:3]   # net: 8 x 8 x 1536
    if kernel_size.is_fully_defined():
        net = slim.avg_pool2d(net, kernel_size, padding='VALID',
                              scope='AvgPool_1a')
    else:
        net = tf.reduce_mean(net, [1, 2], keep_dims=True, name='global_pool')
    end_points['global_pool'] = net
    if not num_classes:
        return net, end_points
    net = slim.dropout(net, dropout_keep_prob, scope='Dropout_1b')  # 1 x 1 x 1536
    net = slim.flatten(net, scope='PreLogitsFlatten')
    end_points['PreLogitsFlatten'] = net
    logits = slim.fully_connected(net, num_classes, activation_fn=None,
                                  scope='Logits')   # num_classes-way logits
    end_points['Logits'] = logits
    end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')
    return logits, end_points
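The Predictions node is simply a softmax over the logits. A minimal pure-Python equivalent (my own sketch, not the TensorFlow implementation) makes its behavior concrete:

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating,
    # which is the same trick tf.nn.softmax applies internally.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The probabilities sum to 1 and preserve the ordering of the logits.
probs = softmax([2.0, 1.0, 0.1])
```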
Copyright notice: this is an original article by the author; please credit the source when reposting. https://blog.csdn.net/fxfviolet/article/details/81666396