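References:
1. Introduction to ResNet
2. resnet50 architecture diagram
Based on the references above, there are two corrections, as shown in the figure below:
1. In the table of the introduction, the average pooling at the bottom should be 7*7; otherwise the output comes out wrong during validation.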
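To see why the window has to be 7*7 for a 224*224 input, the spatial size can be traced through the network by hand. The sketch below is plain Python and not taken from the referenced posts; the helper names `same_out` and `final_feature_size` are my own, and it assumes "same" padding everywhere, where the output size is ceil(input / stride).

```python
import math


def same_out(size, stride):
    """Output size of a conv/pool layer with "same" padding."""
    return math.ceil(size / stride)


def final_feature_size(size):
    """Spatial size of the feature map that reaches the average pooling."""
    size = same_out(size, 2)  # first convolution, stride 2
    size = same_out(size, 2)  # max pooling, stride 2
    for stage in range(4):
        # the first block of every stage except the first halves the size
        if stage != 0:
            size = same_out(size, 2)
    return size


print(final_feature_size(224))  # 7
```

The size goes 224, 112, 56, 56, 28, 14, 7, so a 7*7 average pooling reduces the last feature map to a single vector per image.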
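ResNet mainly comes in five variants: resnet18, resnet34, resnet50, resnet101, and resnet152. resnet50 is the most commonly used, but the others are worth trying too. I built the network following the two blog posts above; the code is as follows: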
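# -*- coding: utf-8 -*-
# Author : ZPH, reproducing Teacher Fang's implementation
# Creation Date : 2021/4/10 21:46
# Project Name : my_resnet.PY
# Written against the TensorFlow 1.x API (tf.layers, tf.variable_scope).
import tensorflow as tf

# The number of blocks per stage is noted after each mode.
MODE_RESNET18 = "resnet18"    # 2,2,2,2   building_block
MODE_RESNET34 = "resnet34"    # 3,4,6,3   building_block
MODE_RESNET50 = "resnet50"    # 3,4,6,3   bottleneck
MODE_RESNET101 = "resnet101"  # 3,4,23,3  bottleneck
MODE_RESNET152 = "resnet152"  # 3,8,36,3  bottleneck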


def building_block(input, init_filters, resize, name):
    """Basic residual block (two 3*3 convolutions), used by resnet18/34."""
    with tf.variable_scope(name):
        # Main branch. training=True is hard-coded, so batch normalization
        # always runs in training mode.
        with tf.variable_scope("left"):
            left = tf.layers.conv2d(input, init_filters, 3, 2 if resize else 1, "same", name="Conv1")
            left = tf.layers.batch_normalization(left, training=True)
            left = tf.nn.relu(left)
            left = tf.layers.conv2d(left, init_filters, 3, 1, "same", name="Conv2")
            left = tf.layers.batch_normalization(left, training=True)
        # Shortcut branch: project the input only when its shape differs
        # from the main branch (the paper uses a 1*1 kernel here, this code 3*3).
        with tf.variable_scope("right"):
            if resize or input.shape[3].value != init_filters:
                right = tf.layers.conv2d(input, init_filters, 3, 2 if resize else 1, "same", name="Conv1")
                right = tf.layers.batch_normalization(right, training=True)
            else:
                right = input
        return tf.nn.relu(left + right)


def bottleneck(input, init_filters, resize, name):
    """Bottleneck residual block (1*1, 3*3, 1*1), used by resnet50/101/152."""
    with tf.variable_scope(name):
        # Main branch. training=True is hard-coded, so batch normalization
        # always runs in training mode.
        with tf.variable_scope("left"):
            # 1*1
            left = tf.layers.conv2d(input, init_filters, 1, 2 if resize else 1, "same", name="Conv1")
            left = tf.layers.batch_normalization(left, training=True)
            left = tf.nn.relu(left)
            # 3*3
            left = tf.layers.conv2d(left, init_filters, 3, 1, "same", name="Conv2")
            left = tf.layers.batch_normalization(left, training=True)
            left = tf.nn.relu(left)
            # 1*1, the last layer has 4 times as many channels
            init_filters *= 4
            left = tf.layers.conv2d(left, init_filters, 1, 1, "same", name="Conv3")
            left = tf.layers.batch_normalization(left, training=True)
        # Shortcut branch: project the input only when its shape differs
        # from the main branch (the paper uses a 1*1 kernel here, this code 3*3).
        with tf.variable_scope("right"):
            if resize or input.shape[3].value != init_filters:
                right = tf.layers.conv2d(input, init_filters, 3, 2 if resize else 1, "same", name="Conv1")
                right = tf.layers.batch_normalization(right, training=True)
            else:
                right = input
        return tf.nn.relu(left + right)


# mode -> (number of blocks per stage, block function)
_setting = {
    MODE_RESNET18: ((2, 2, 2, 2), building_block),
    MODE_RESNET34: ((3, 4, 6, 3), building_block),
    MODE_RESNET50: ((3, 4, 6, 3), bottleneck),
    MODE_RESNET101: ((3, 4, 23, 3), bottleneck),
    MODE_RESNET152: ((3, 8, 36, 3), bottleneck),  # was (3,4,36,3), which gives only 140 layers
}


def _check(input):
    """Validate the input, a standard NHWC batch such as [-1, 224, 224, channel]."""
    _, height, width, _ = input.shape
    height = height.value
    if height % 32 != 0:
        raise ValueError("The height of the input must be a multiple of 32")
    width = width.value
    if width % 32 != 0:
        raise ValueError("The width of the input must be a multiple of 32")
    return height, width


_name_id = 0  # counter used to generate default scope names


def resnet(input, mode, logit_size, name=None):
    '''
    Extract a feature vector with ResNet.
    :param input: input data in the standard convolution layout [-1, 224, 224, channel]; the example uses 224*224
    :param mode: MODE_RESNET152, MODE_RESNET101, MODE_RESNET50, MODE_RESNET34 or MODE_RESNET18
    :param logit_size: length of the output vector, user-defined
    :param name: variable scope name; generated automatically when None
    :return: [-1, logit_size]
    '''
    height, width = _check(input)  # validate the input
    if name is None:
        global _name_id
        name = "resnet_%d" % _name_id
        _name_id += 1
    with tf.variable_scope(name):
        base_size = (height // 32, width // 32)  # (7, 7) for a 224*224 input
        input = tf.layers.conv2d(input, 64, base_size, 2, "same", activation=tf.nn.relu, name="conv1")
        input = tf.layers.max_pooling2d(input, 3, 2, "same")
        init_filter = 64  # initial number of channels
        module = _setting[mode][1]  # block function for the chosen variant
        for stage, repeats in enumerate(_setting[mode][0]):  # e.g. (3, 4, 6, 3)
            for i in range(repeats):  # build every block of this stage
                # downsample in the first block of every stage except the first
                resize = i == 0 and stage != 0
                input = module(input, init_filter, resize, "Conv%d_%d" % (stage, i + 1))
            init_filter *= 2  # double the channels for the next stage
        input = tf.layers.average_pooling2d(input, base_size, 1, "valid")
        semantics = tf.reshape(input, [-1, input.shape[3].value])
        logit = tf.layers.dense(semantics, logit_size, name="dense")  # [-1, logit_size]
    return logit


if __name__ == '__main__':
    a = tf.random_normal([10, 224, 224, 3])  # 10 random samples of 224*224 images with 3 channels
    b = resnet(a, MODE_RESNET50, 100)
    print(b.shape)  # the output semantic vectors should have shape [10, 100]
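The block counts and channel sizes can be sanity-checked without running TensorFlow. The sketch below replays the stage loop of `resnet` in plain Python; the names `SETTINGS` and `trace_resnet` are my own, the expansion factor (1 for the basic block, 4 for the bottleneck) is read off the block functions, and the depth counts the first convolution, the convolutions on the main branch, and the final dense layer.

```python
# mode -> (blocks per stage, channel expansion of the block type)
SETTINGS = {
    "resnet18": ((2, 2, 2, 2), 1),    # basic block keeps the channel count
    "resnet34": ((3, 4, 6, 3), 1),
    "resnet50": ((3, 4, 6, 3), 4),    # bottleneck multiplies channels by 4
    "resnet101": ((3, 4, 23, 3), 4),
    "resnet152": ((3, 8, 36, 3), 4),
}


def trace_resnet(mode, height=224, width=224):
    """Return (depth, [(h, w, channels) at the end of each stage])."""
    repeats_per_stage, expansion = SETTINGS[mode]
    convs_per_block = 2 if expansion == 1 else 3
    # conv1 (stride 2) and max pooling (stride 2); exact for multiples of 32
    h, w = height // 4, width // 4
    filters = 64
    depth = 1  # conv1
    shapes = []
    for stage, repeats in enumerate(repeats_per_stage):
        for i in range(repeats):
            if i == 0 and stage != 0:  # first block of a later stage downsamples
                h, w = h // 2, w // 2
            depth += convs_per_block
        shapes.append((h, w, filters * expansion))
        filters *= 2
    depth += 1  # final dense layer
    return depth, shapes


for mode in SETTINGS:
    depth, shapes = trace_resnet(mode)
    print(mode, depth, shapes[-1])  # e.g. resnet50 50 (7, 7, 2048)
```

Each variant's depth matches the number in its name, and resnet152 only comes out at 152 with (3, 8, 36, 3). For a 224*224 input every variant ends on a 7*7 map.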
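There is still plenty of room for optimization worth testing, for example:
1. The convolution currently downsamples with stride 2. It could instead use stride 1 everywhere so that no positions are skipped, with the downsampling moved to max pooling by raising its stride from 2 to 4. (In theory this gives the same overall downsampling as stride 2 in both the convolution and max pooling, and the result should be better, but only an experiment can confirm that.)
2. The max pooling window could be set to 2*2 or 5*5..., depending on what works in testing.
3. In theory the input only needs to be a multiple of 32, though the parameters of the final flattening step need adjusting accordingly (this is my guess and also needs verification).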
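For point 1, a rough estimate is useful before running the experiment. The sketch below assumes a 224*224*3 input, a 7*7 first convolution with 64 filters and "same" padding; `conv_cost` and `stem` are helpers I made up for this estimate, and only the multiply-accumulates of the first convolution are counted.

```python
import math


def conv_cost(size, stride, kernel=7, in_ch=3, out_ch=64):
    """Return (output size, multiply-accumulates) of a "same"-padded conv."""
    out = math.ceil(size / stride)
    macs = out * out * kernel * kernel * in_ch * out_ch
    return out, macs


def stem(size, conv_stride, pool_stride):
    """First convolution followed by max pooling: (final size, conv MACs)."""
    out, macs = conv_cost(size, conv_stride)
    return math.ceil(out / pool_stride), macs


original = stem(224, conv_stride=2, pool_stride=2)
proposed = stem(224, conv_stride=1, pool_stride=4)
print(original)  # (56, 118013952)
print(proposed)  # (56, 472055808)
```

Both variants hand a 56*56 map to the residual stages, but the first convolution costs four times as much with stride 1. A max pooling window of 3 with stride 4 would also skip pixels, so the window probably has to grow along with the stride, which ties in with point 2.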