I first came across residual networks in Andrew Ng's deep learning course, but I only really understood them after reading a few blog posts and slowly piecing things together.
Let's first look at why residual networks came about in the first place. (The explanation below is adapted from other posts; I don't think I could write it better myself, so the links are listed at the end.)
Bottlenecks in training deep networks: vanishing gradients and network degradation
When training deeper networks, vanishing and exploding gradients appear. These problems are largely addressed by proper initialization and normalization layers, which is enough to get networks of a few dozen layers to converge, but as the depth keeps increasing the vanishing/exploding gradient problem still lurks.
The real problem is network degradation. As an example, suppose the optimal network architecture happens to be 18 layers deep. When designing a network we do not know in advance how many layers the optimal architecture has, so suppose we build a 34-layer network instead. The extra 16 layers are redundant; ideally, during training the model would learn to make those 16 layers identity mappings, i.e. their output would equal their input exactly. In practice, however, the model finds it hard to learn the parameters of those 16 identity mappings correctly, so the result is bound to be no better than the optimal 18-layer network. This is the degradation that appears as network depth increases. It is not caused by overfitting, but by the redundant layers learning parameters that are not an identity mapping.
As the figure below shows, the data flows along two paths: the regular path and a shortcut that directly implements an identity mapping, a bit like a "short circuit" in an electrical circuit. Experiments show that this shortcut structure copes well with the degradation problem. If we write the input-output relation of one module of the network as y = H(x), then optimizing H(x) directly with gradient methods runs into the degradation problem described above. With the shortcut structure, the trainable part no longer has to fit H(x) itself: letting F(x) denote the part to be optimized, we have H(x) = F(x) + x, i.e. F(x) = H(x) − x. Under the identity-mapping assumption, y = x plays the role of the observed value, so F(x) corresponds to the residual, hence the name residual network. Why do this? Because the authors argue that learning the residual F(x) is easier than learning H(x) directly. Intuitively, we only need to learn the difference between output and input; the absolute quantity becomes a relative one (H(x) − x is how much the output changes relative to the input), which is much easier to optimize.
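To make the idea concrete, here is a minimal sketch (my own addition, not from the referenced posts) of a residual connection in the Keras functional API; the layer sizes are arbitrary illustrative choices:
import tensorflow as tf
from tensorflow.keras import layers

x = layers.Input(shape=(32, 32, 64))
fx = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
fx = layers.Conv2D(64, 3, padding='same')(fx)         # F(x): the part being learned
hx = layers.Activation('relu')(layers.add([fx, x]))   # H(x) = F(x) + x via the shortcut
model = tf.keras.Model(x, hx)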
Since the dimensions of x and F(x) may not match, dimension matching is required. The paper handles this in two ways (actually three, but experiments showed the third degrades performance sharply, so it is not used):
1. zero-padding: zero-pad the identity branch to fill out the missing dimensions; this adds no extra parameters.
2. projection: apply a 1×1 convolution on the identity branch to increase the dimension; this adds extra parameters. (A sketch of both options follows below.)
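A rough sketch of the two shortcut options (my own addition; the helper names are hypothetical and only for illustration), assuming an eager tensor input:
import tensorflow as tf
from tensorflow.keras import layers

def projection_shortcut(x, out_channels, stride):
    # Option 2: 1x1 convolution; learns the mapping but adds parameters.
    return layers.Conv2D(out_channels, 1, strides=stride)(x)

def zero_pad_shortcut(x, out_channels, stride):
    # Option 1: subsample spatially, then zero-pad the extra channels; no new parameters.
    x = x[:, ::stride, ::stride, :]
    extra = out_channels - x.shape[-1]
    return tf.pad(x, [[0, 0], [0, 0], [0, 0], [0, extra]])

x = tf.random.normal([2, 32, 32, 64])
print(projection_shortcut(x, 128, 2).shape)  # (2, 16, 16, 128)
print(zero_pad_shortcut(x, 128, 2).shape)    # (2, 16, 16, 128)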
The figure below shows two forms of residual module. The left one is the regular residual module, composed of two 3×3 convolutions, but as the network gets deeper this structure is not very effective in practice. The "bottleneck residual block" on the right works better there: it stacks 1×1, 3×3, and 1×1 convolutions in sequence, where the 1×1 convolutions reduce and then restore the dimensionality, so the 3×3 convolution operates on a relatively low-dimensional input, which improves computational efficiency.
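For reference, a hedged sketch of such a bottleneck block (my own addition; the filter counts and the expansion factor of 4 follow common ResNet-50-style conventions, not code from this post):
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, filters=64, expansion=4):
    shortcut = x
    out = layers.Conv2D(filters, 1, padding='same', activation='relu')(x)    # 1x1: reduce channels
    out = layers.Conv2D(filters, 3, padding='same', activation='relu')(out)  # 3x3 on the reduced representation
    out = layers.Conv2D(filters * expansion, 1, padding='same')(out)         # 1x1: restore channels
    if shortcut.shape[-1] != filters * expansion:
        shortcut = layers.Conv2D(filters * expansion, 1)(shortcut)           # projection to match dimensions
    return layers.Activation('relu')(layers.add([out, shortcut]))

x = layers.Input(shape=(56, 56, 256))
y = bottleneck_block(x)  # output shape: (None, 56, 56, 256)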
Now let's move on to the code.
Import the modules
import tensorflow as tf
from tensorflow.keras import layers,Sequential
import tensorflow.keras as keras
import os
# The Basic Block module.
class BasicBlock(layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = layers.Conv2D(filter_num, (3, 3), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        # The first conv may downsample (when stride > 1); this second conv never does,
        # so its stride is fixed to 1 and the feature-map size stays the same.
        self.conv2 = layers.Conv2D(filter_num, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()
        if stride != 1:
            # Shortcut branch: a 1x1 convolution so the identity matches the main branch's shape.
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, (1, 1), strides=stride))
        else:
            self.downsample = lambda x: x

    def call(self, inputs, training=None):
        # inputs: [b, h, w, c]
        out = self.conv1(inputs)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        identity = self.downsample(inputs)
        output = layers.add([out, identity])  # layers.add sums the main branch and the shortcut element-wise.
        output = tf.nn.relu(output)
        return output
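A quick sanity check (my addition) that the block behaves as expected:
block = BasicBlock(64, stride=2)
x = tf.random.normal([4, 32, 32, 3])
print(block(x).shape)  # (4, 16, 16, 64): stride 2 halves height/width, channels become filter_num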
class ResNet(keras.Model):
    # First argument layer_dims: e.g. [2, 2, 2, 2] means 4 Res Blocks, each containing 2 Basic Blocks.
    # Second argument num_classes: size of the fully connected output, i.e. the number of classes.
    def __init__(self, layer_dims, num_classes=6):
        super(ResNet, self).__init__()
        # Stem (preprocessing) layers; this part is flexible -- the MaxPool2D can also be left out.
        self.stem = Sequential([layers.Conv2D(64, (3, 3), strides=(1, 1)),
                                layers.BatchNormalization(),
                                layers.Activation('relu'),
                                layers.MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding='same')
                                ])
        # Create the 4 Res Blocks; the channel counts do not have to double each stage -- these are just common empirical values.
        self.layer1 = self.build_resblock(64, layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)
        self.avgpool = layers.GlobalAveragePooling2D()
        self.fc = layers.Dense(num_classes)

    def call(self, inputs, training=None):
        # Everything is set up in __init__; here we just run the forward pass.
        x = self.stem(inputs)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # Global average pooling collapses the spatial dimensions, so no reshape is needed.
        # Shape becomes [batch_size, channels].
        x = self.avgpool(x)
        # [b, num_classes]
        x = self.fc(x)
        return x

    # Build one Res Block, i.e. a stack of Basic Blocks.
    def build_resblock(self, filter_num, blocks, stride=1):
        res_blocks = Sequential()
        # Only the first Basic Block in a Res Block may downsample.
        res_blocks.add(BasicBlock(filter_num, stride))
        for _ in range(1, blocks):
            res_blocks.add(BasicBlock(filter_num, stride=1))  # stride=1 here, so only the first Basic Block downsamples.
        return res_blocks
def resnet18():
    return ResNet([2, 2, 2, 2])
model = resnet18()
model.build(input_shape=(None, 32, 32, 3))
model.summary()
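Optionally, a dummy forward pass (my addition) confirms the output shape matches the default six classes:
dummy = tf.random.normal([2, 32, 32, 3])
print(model(dummy).shape)  # (2, 6) -- raw logits, one per class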
That completes the residual network.
CBAM adds a channel attention module and a spatial attention module on top of a conventional CNN.
import tensorflow as tf
from tensorflow.keras import layers,Sequential,regularizers,optimizers
import tensorflow.keras as keras
Define a padded 3×3 convolution wrapper; kernel_initializer is "he_normal" (alternatively "glorot_normal").
def regurlarized_padded_conv(*args, **kwargs):
    # Thin wrapper around Conv2D: 'same' padding, no bias, He-normal init, L2 weight regularization.
    return layers.Conv2D(*args, **kwargs, padding="same",
                         use_bias=False,
                         kernel_initializer="he_normal",
                         kernel_regularizer=regularizers.l2(5e-4))
Channel attention
class ChannelAttention(layers.Layer):
    def __init__(self, in_planes, ration=16):
        super(ChannelAttention, self).__init__()
        # Squeeze spatial information with both global average pooling and global max pooling.
        self.avg = layers.GlobalAveragePooling2D()
        self.max = layers.GlobalMaxPooling2D()
        # Shared two-layer bottleneck implemented with 1x1 convolutions (reduce by `ration`, then restore).
        self.conv1 = layers.Conv2D(in_planes // ration, kernel_size=1, strides=1,
                                   padding="same",
                                   kernel_regularizer=regularizers.l2(1e-4),
                                   use_bias=True, activation=tf.nn.relu)
        self.conv2 = layers.Conv2D(in_planes, kernel_size=1, strides=1,
                                   padding="same",
                                   kernel_regularizer=regularizers.l2(1e-4),
                                   use_bias=True)

    def call(self, inputs):
        avg = self.avg(inputs)
        max_pool = self.max(inputs)
        # Reshape [b, c] back to [b, 1, 1, c] so the 1x1 convolutions can be applied.
        avg = layers.Reshape((1, 1, avg.shape[1]))(avg)
        max_pool = layers.Reshape((1, 1, max_pool.shape[1]))(max_pool)
        avg_out = self.conv2(self.conv1(avg))
        max_out = self.conv2(self.conv1(max_pool))
        out = avg_out + max_out
        out = tf.nn.sigmoid(out)
        return out
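A short shape check (my addition): the channel-attention module outputs one weight per channel, which is then broadcast-multiplied with the feature map:
ca = ChannelAttention(in_planes=64)
feat = tf.random.normal([2, 28, 28, 64])
print(ca(feat).shape)  # (2, 1, 1, 64)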
Spatial attention
class SpatialAttention(layers.Layer):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        # A single convolution over the channel-wise average and max maps produces a [b, h, w, 1] attention map.
        self.conv1 = regurlarized_padded_conv(1, kernel_size=kernel_size, strides=1, activation=tf.nn.sigmoid)

    def call(self, inputs):
        avg_out = tf.reduce_mean(inputs, axis=3)    # [b, h, w]
        max_out = tf.reduce_max(inputs, axis=3)     # [b, h, w]
        out = tf.stack([avg_out, max_out], axis=3)  # [b, h, w, 2]
        out = self.conv1(out)
        return out
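And the corresponding check (my addition) for spatial attention, which outputs one weight per spatial location:
sa = SpatialAttention()
feat = tf.random.normal([2, 28, 28, 64])
print(sa(feat).shape)  # (2, 28, 28, 1)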
The Basic Block module (now with CBAM attention)
class BasicBlock(layers.Layer):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = regurlarized_padded_conv(out_channels, kernel_size=3,
                                              strides=stride)
        self.bn1 = layers.BatchNormalization()
        self.conv2 = regurlarized_padded_conv(out_channels, kernel_size=3, strides=1)
        self.bn2 = layers.BatchNormalization()
        ######## attention modules ########
        self.ca = ChannelAttention(out_channels)
        self.sa = SpatialAttention()
        # If stride != 1 (downsampling) or the channel count changes, the shortcut needs a projection;
        # otherwise it is just the identity.
        if stride != 1 or in_channels != self.expansion * out_channels:
            self.shortcut = Sequential([regurlarized_padded_conv(self.expansion * out_channels,
                                                                 kernel_size=1, strides=stride),
                                        layers.BatchNormalization()])
        else:
            self.shortcut = lambda x, training=None: x

    def call(self, inputs, training=False):
        out = self.conv1(inputs)
        out = self.bn1(out, training=training)
        out = tf.nn.relu(out)
        out = self.conv2(out)
        out = self.bn2(out, training=training)
        ######## attention: rescale by channel weights, then by spatial weights ########
        out = self.ca(out) * out
        out = self.sa(out) * out
        out = out + self.shortcut(inputs, training=training)
        out = tf.nn.relu(out)
        return out
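A quick check (my addition) that a CBAM block with downsampling and a projection shortcut still produces the expected shape:
blk = BasicBlock(in_channels=64, out_channels=128, stride=2)
feat = tf.random.normal([2, 32, 32, 64])
print(blk(feat).shape)  # (2, 16, 16, 128)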
The Res Block module
Training on the CPU is too slow for me, so I commented out two of the residual stages; the full structure is still visible in the ResNet code below:
class ResNet(keras.Model):
    def __init__(self, layer_dims, num_classes=6):
        super(ResNet, self).__init__()
        self.in_channels = 64
        # Stem (preprocessing) convolution.
        self.stem = Sequential([
            regurlarized_padded_conv(64, kernel_size=3, strides=1),
            layers.BatchNormalization()
        ])
        # Create the residual stages (two are commented out to keep CPU training manageable).
        self.layer1 = self.build_resblock(32, layer_dims[0], stride=1)
        self.layer2 = self.build_resblock(64, layer_dims[1], stride=2)
        # self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        # self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)
        self.final_bn = layers.BatchNormalization()
        self.avgpool = layers.GlobalAveragePooling2D()
        self.fc = layers.Dense(num_classes, activation="softmax")

    def call(self, inputs, training=False):
        out = self.stem(inputs, training=training)
        out = tf.nn.relu(out)
        out = self.layer1(out, training=training)
        out = self.layer2(out, training=training)
        # out = self.layer3(out, training=training)
        # out = self.layer4(out, training=training)
        out = self.final_bn(out, training=training)
        out = self.avgpool(out)
        out = self.fc(out)
        return out

    # Build one residual stage: the first Basic Block uses the given stride, the rest use stride 1.
    def build_resblock(self, out_channels, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        res_blocks = Sequential()
        for stride in strides:
            res_blocks.add(BasicBlock(self.in_channels, out_channels, stride))
            self.in_channels = out_channels
        return res_blocks
Note!
Because I commented out the last two residual stages, the network is 10 layers deep rather than 18, so the block configuration I pass in is [2, 2] instead of [2, 2, 2, 2].
def ResNet18():
    return ResNet([2, 2])
Data preprocessing
import numpy as np
import matplotlib.pyplot as plt
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img
import glob, os, random

# Path of the dataset
base_path = "./data"
# Check the dataset size: 2295 images
img_list = glob.glob(os.path.join(base_path, "*/*.jpg"))
print(len(img_list))
Split the data
# Split the dataset into training and validation generators
train_datagen = ImageDataGenerator(
    rescale=1./255, shear_range=0.1, zoom_range=0.1,
    width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True,
    vertical_flip=True, validation_split=0.1)
test_data = ImageDataGenerator(rescale=1./255, validation_split=0.1)

train_generator = train_datagen.flow_from_directory(base_path, target_size=(300, 300),
                                                    batch_size=16,
                                                    class_mode="categorical",
                                                    subset="training", seed=0)
validation_generator = test_data.flow_from_directory(base_path, target_size=(300, 300),
                                                     batch_size=16,
                                                     class_mode="categorical",
                                                     subset="validation", seed=0)
labels = train_generator.class_indices
labels = dict((v, k) for k, v in labels.items())  # invert the mapping to index -> class name
print(labels)
Train the model
model = ResNet18()
model.build(input_shape=(None,300,300,3))
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.fit_generator(train_generator, epochs=100, steps_per_epoch=2068//32,
                    validation_data=validation_generator,
                    validation_steps=227//32)
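A side note (my addition, not part of the original post): fit_generator is deprecated in recent TensorFlow 2 releases; model.fit accepts the same generators directly, and the step counts are usually derived from the generator's sample count and the batch size:
model.fit(train_generator,
          epochs=100,
          steps_per_epoch=train_generator.samples // 16,
          validation_data=validation_generator,
          validation_steps=validation_generator.samples // 16)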
I won't repeat the test code here; it is already covered in the previous garbage-classification post.
And that's it. Because I commented out two of the residual stages, the training results are pretty terrible; nothing to be done about it, my poor little GPU simply can't handle more!
This post draws on the following excellent articles and mainly combines them; for the individual topics, see:
《CBAM 注意力机制详解》
https://blog.csdn.net/abc13526222160/article/details/103765484
《十分钟理解残差网络》
https://blog.csdn.net/fendouaini/article/details/82027389?depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-3&utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-3
《残差网络详解》
https://blog.csdn.net/abc13526222160/article/details/90057121?depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-9&utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-9