动手学习ResNet50

ResNet 论文

《Deep Residual Learning for Image Recognition》
论文地址:https://arxiv.org/abs/1512.03385

残差网络(ResNet)

以学习ResNet的收获、ResNet50的复现二大部分,简述ResNet50网络。

一、学习ResNet的收获

  1. ResNet网络解决了深度CNN模型难训练的问题,并指出CNN模型随深度的加深可以得到更优的解,对于 plain/residual nets 均如此。随着 nets的加深,plain nets 的收敛速度变得缓慢,导致了CNN模型的深层网络不如较浅层网络,但是 residual nets 解决了这个问题。至于 residual nets 如何解决深层网络收敛速度变得缓慢的问题,得益于 residual learning 。

  2. Residual learning 的理想公式为
    F ( x ) = H ( x ) − x . ( 1 ) F(x) = H(x)-x.(1) F(x)=H(x)x.(1)
    H ( x ) = F ( x ) + x . ( 2 ) H(x)=F(x)+x.(2) H(x)=F(x)+x.(2)
    设所需求解的底层映射为H(x),通过训练堆叠的非线性层映射F(x)=H(x)-x,因此H(x)=F(x)+x。
    但是 Residual learning 的实际使用公式为
    y = F ( x , W i ) + x . ( 3 ) y=F(x,\mathop{W}_{i})+x.(3) y=F(x,Wi)+x.(3)
    或 y = F ( x , W i ) + W s x . ( 4 ) 或y=F(x,\mathop{W}_{i})+\mathop{W}_{s}x.(4) y=F(x,Wi)+Wsx.(4)
    设x、y、F分别为输入、输出、堆叠的非线性层映射。采用标识性快捷连接(Identity Mapping by Shortcuts),把输入x与堆叠的非线性层映射F相加得到输出y。其中Wi,Ws均为线性投影,对于x和F的维数(及尺寸),相同时使用公式(3),不相同时使用公式(4)。
    (Fig 2.为 Residual learning 的结构图)
    动手学习ResNet50_第1张图片

  3. 对于 plain/residual nets 的深度探究与验证。

    1). plain nets 随着网络深度的加深,plain nets 的性能降低并非由梯度消失或梯度爆炸引起。对于 20-layer and 56-layer “plain” networks,其 training error 和 test error 随网络深度的加深而提高,但并非无法训练(如Fig 1.)。因此网络深度的加深,导致了网络收敛速度的降低。
    动手学习ResNet50_第2张图片

    2). residual nets 随着网络深度的加深,residual nets 的性能提升并非一成不变。《Deep Residual Learning for Image Recognition》指出 1202-layers ResNet 的 error 比 110-layers ResNet 的 error 高出 1.5% 左右,在CIFAR-10数据集及其他条件相同下测得(见 Table 6.)。何博士猜测数据集的规模未能匹配网络的规模,造成 residual nets 性能的降低(通俗:数据集的规模太小而网络的深度太大,造成 residual nets 性能的降低)。
    动手学习ResNet50_第3张图片

二、ResNet50的复现

通过九大部分复现ResNet50,例如数据增强、参数控制、数据集构建、网络构建、模型加载、模型训练、绘制损失图、模型验证、模型预测。

1. 数据增强

	数据增强由缩放、裁剪、旋转、亮度、对比度、颜色的变化组成。调用pytorch的	 transformer完成。
	注意transformer匹配PIL的RGB格式,而非Opencv的BGR格式。

打开图片

import matplotlib.pyplot as plt
from PIL import Image
from torchvision import transforms as tfs

img = Image.open('gift/bald3.jpg')
plt.imshow(img)
plt.show()
# print(img.size)--->(301, 301)

动手学习ResNet50_第4张图片
比例缩放

# 比例缩放
# rint('before scale, shape:{}'.format(img.size))-->(before scale, shape:(301, 301))
# 参数为元组(row,col)时,图片尺寸硬性放缩为(row,col)(宽,长)
# 参数为数字size时,图片按比例缩放,其中min(row,col)=size
new_img = tfs.Resize(256)(img)
# print('after scale, shape:{}'.format(new_img.size))-->(after scale, shape:(256, 256))
plt.imshow(new_img)
plt.show()

动手学习ResNet50_第5张图片
随机裁剪

# 随机裁剪
random_img = tfs.RandomCrop(224)(new_img)
plt.imshow(random_img)
plt.show()

动手学习ResNet50_第6张图片
随机旋转

# 随机角度旋转
for i in range(1):
    rot_img = tfs.RandomRotation(180)(new_img)
    plt.imshow(rot_img)
    plt.show()

动手学习ResNet50_第7张图片

# 在这里给出图片翻转的代码,效果自行实验
# 随机水平翻转
h_filp = tfs.RandomHorizontalFlip()(new_img)
plt.imshow(h_filp)
plt.show()
# 随机竖直翻转
v_filp = tfs.RandomVerticalFlip()(new_img)
plt.imshow(v_filp)
plt.show()

亮度变化

# 亮度
bright_img = tfs.ColorJitter(brightness=1)(new_img)# 随机从 0---2 之间亮度变化,1 表示原图
plt.imshow(bright_img)
plt.show()

动手学习ResNet50_第8张图片
对比度变化

# 对比度
contrast_img = tfs.ColorJitter(contrast=1)(new_img)# 随机从 0---2 之间对比度变化,1 表示原图
plt.imshow(contrast_img)
plt.show()

动手学习ResNet50_第9张图片
颜色变化

# 颜色
color_img = tfs.ColorJitter(hue=0.5)(new_img)# 随机从 -0.5---0.5 之间对颜色变化
plt.imshow(color_img)
plt.show()

动手学习ResNet50_第10张图片
数据增强总实验

img_aug = tfs.Compose([
    tfs.Resize(256),
    tfs.RandomRotation(180),
    tfs.RandomCrop(224),
    tfs.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5)
])
# 以九宫格的形式展示
nrows = 3
ncols = 3
figsize = (32, 32)
_, figs = plt.subplots(nrows, ncols, figsize=figsize)
for i in range(nrows):
    for j in range(ncols):
        figs[i][j].imshow(img_aug(img))
        figs[i][j].axes.get_xaxis().set_visible(False)
        figs[i][j].axes.get_yaxis().set_visible(False)
plt.show()

2. 参数控制

batch_size = 32            # 每次喂入的数据量
lr         = 0.01          # 学习率
step_size  = 1             # 每n个epoch更新一次学习率
epoch_num  = 10            # 总迭代次数
num_print  = 280           # 每n次batch打印一次
num_check  = 1             # 每n个epoch验证一次模型,若效果更优则保存模型
enhance    = False         # 是否数据增强

3. 数据集构建

数据预处理,采用标准化处理。详细见动手学习VGG16的数据预处理。

"""
train_path、verification_path、test_path 同为字典(类别kind,列表list),其中列表内存放着图片的绝对路径。
labels 也是一个字典(类别kind,序号number),序号为1~n的数字
"""

import torch
import random
from PIL import Image
from torch.autograd import Variable
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader

en_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomRotation(180),
    transforms.RandomCrop(224),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5),
    transforms.ToTensor(),
    transforms.Normalize((0.51865010496974, 0.49877608954906466, 0.5143190141916275),
                         (7.181780141103533, 8.053991771959863, 8.290017965464534))
])

no_transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.51865010496974, 0.49877608954906466, 0.5143190141916275),
                         (7.181780141103533, 8.053991771959863, 8.290017965464534))
                               ])

# -----------------ready the dataset--------------------------
def default_loader(path):
    img = Image.open(path)
    return img

class MyDataset(Dataset):
    # 构造函数
    def __init__(self, path, transform=None, target_transform=None, loader=default_loader, enhance=False):
        imgs = []
        for classification in path:
            for i in range(len(path[classification])):
                img_path  = path[classification][i]
                img_label = labels[classification]
                imgs.append((img_path,int(img_label)))#imgs中包含有图像路径和标签
        self.path             = path
        self.imgs             = imgs
        self.transform        = transform
        self.target_transform = target_transform
        self.loader           = loader
        
    # hash_map建立
    def __getitem__(self, index):
        img_path, img_label = self.imgs[index]
        # 调用 opencv 打开图片
        img = self.loader(img_path)
        if self.transform is not None:
            img = self.transform(img)
        img_label -= 1
        return img, img_label
    
    def __len__(self):
        return len(self.imgs)

train_data        = MyDataset(train_path, transform=no_transform,target_transform=en_transform,enhance=True)
verification_data = MyDataset(verification_path, transform=no_transform,target_transform=en_transform,enhance=False)
test_data         = MyDataset(test_path, transform=no_transform,target_transform=en_transform,enhance=False)

#train_data 、verification_data和test_data包含多有的训练、验证与测试数据,调用DataLoader批量加载
train_loader        = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
verification_loader = DataLoader(dataset=verification_data, batch_size=batch_size, shuffle=False)
test_loader         = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False)

4. 网络构建

我们先来领略一下何博士设计的ResNet网络的结构。
动手学习ResNet50_第11张图片

ResNet网络由数个残差块及卷积层构成,残差块的结构决定了ResNets。在这里给出何博士提到的2个结构分别为original、proposed,其中original为论文的主要实验对象,proposed为何博士于附录中提及。(其结构及性能见图8)参考于你必须要知道CNN模型:ResNet。
动手学习ResNet50_第12张图片

import torch
from torch import optim
import torchvision
import matplotlib.pyplot as plt
import numpy as np
from torchvision.utils import make_grid
import time
from torch import nn
from torchsummary import summary

残差块-original

# 残差块
class Residual(nn.Module):
    def __init__(self,input_channels, temp_channels, num_channels,
                  use_1x1conv=False, strides=1):
        super(Residual, self).__init__()
        
        self.conv1 = nn.Conv2d(input_channels, temp_channels,
                                   kernel_size=1, stride=strides)
        self.conv2 = nn.Conv2d(temp_channels, temp_channels,
                                   kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(temp_channels, num_channels,
                                   kernel_size=1)
        if use_1x1conv:
            self.conv4 = nn.Conv2d(input_channels,num_channels,
                                   kernel_size=1,stride=strides)
        else:
            self.conv4 = None
                
        self.bn1 = nn.BatchNorm2d(temp_channels)
        self.bn2 = nn.BatchNorm2d(temp_channels)
        self.bn3 = nn.BatchNorm2d(num_channels)
        
        self.rl1 = nn.ReLU(inplace=True)
        self.rl2 = nn.ReLU(inplace=True)
        self.rl3 = nn.ReLU(inplace=True)
        
    def forward(self, x):
        y = self.rl1(self.bn1(self.conv1(x)))
        y = self.rl2(self.bn2(self.conv2(y)))
        y = self.bn3(self.conv3(y))
            
        if self.conv4:
            x = self.conv4(x)
            
        return self.rl3(y + x)

残差块-proposed

# 残差块
class Residual(nn.Module):
    def __init__(self,input_channels, temp_channels, num_channels,
                  use_1x1conv=False, strides=1):
        super(Residual, self).__init__()
        
        self.conv1 = nn.Conv2d(input_channels, temp_channels,
                                   kernel_size=1, stride=strides)
        self.conv2 = nn.Conv2d(temp_channels, temp_channels,
                                   kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(temp_channels, num_channels,
                                   kernel_size=1)
        if use_1x1conv:
            self.conv4 = nn.Conv2d(input_channels,num_channels,
                                   kernel_size=1,stride=strides)
        else:
            self.conv4 = None
                
        self.bn1 = nn.BatchNorm2d(input_channels)
        self.bn2 = nn.BatchNorm2d(temp_channels)
        self.bn3 = nn.BatchNorm2d(temp_channels)
        
        self.rl1 = nn.ReLU(inplace=True)
        self.rl2 = nn.ReLU(inplace=True)
        self.rl3 = nn.ReLU(inplace=True)
        
    def forward(self, x):
        y = self.conv1(self.rl1(self.bn1(x)))
        y = self.conv2(self.rl2(self.bn2(y)))
        y = self.conv3(self.rl3(self.bn3(y)))
            
        if self.conv4:
            x = self.conv4(x)
            
        return y + x

ResNet50网络

# 残差网络

# Residual封装
def resent_block(channels, num_residuals, first_block=False):
    blk = []
    input_channels, temp_channels, num_channels = channels
    for i in range(num_residuals):
        if i == 0 and first_block:
            blk.append(Residual(channels[0], channels[1], channels[2],
                                use_1x1conv=True))
        elif i == 0 and not first_block:
            blk.append(Residual(channels[0], channels[1], channels[2],
                                use_1x1conv=True, strides=2))
        else:
            blk.append(Residual(channels[2], channels[1], channels[2]))
    return blk

# ResNet50定义
class ResNet50(nn.Module):
    def __init__(self):
        super(ResNet50, self).__init__()
        
        # 第一层,1个卷积层和1个最大池化层
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        
        # 第二层,3 * 3个卷积层
        self.layer2 = nn.Sequential(*resent_block((64, 64, 256), 3, first_block=True))
        
        # 第三层,4 * 3个卷积层
        self.layer3 = nn.Sequential(*resent_block((256, 128, 512), 4))
        
        # 第四层,6 * 3个卷积层
        self.layer4 = nn.Sequential(*resent_block((512, 256, 1024), 6))
        
        # 第五层,3 * 3个卷积层
        self.layer5 = nn.Sequential(*resent_block((1024, 512, 2048), 3))
        
        self.conv_layer = nn.Sequential(
            self.layer1,
            self.layer2,
            self.layer3,
            self.layer4,
            self.layer5
        )
        
        self.fc = nn.Sequential(
            nn.Linear(2048, 1000),
            nn.ReLU(inplace=True),
            
            nn.Linear(1000, 29)
        )
        
    def forward(self, x):
        x = self.conv_layer(x)
        # 全局平均池化层
        x = nn.functional.adaptive_avg_pool2d(x, (1,1))
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

# 验证网络是否可运行
if __name__ == "__main__":
    device       = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    resent_model = ResNet50().to(device)
    summary(resent_model, (3, 224, 224))#打印网络结构

其中original、proposed结构的ResNet50的参数均为49kw(与何博士提出的ResNet50的参数有出入),区别于卷积层、归一化层、激活函数ReLU层的顺序不同。

5. 模型加载

# ResNet50
PATH   = None

if not PATH:
    device      = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model       = ResNet50().to(device)
else:
    model  = ResNet50()
    model.load_state_dict(torch.load(PATH), strict=False)
    model.eval()
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    summary(model, (3,224,224))

# 调参
# 交叉熵
criterion = nn.CrossEntropyLoss() 
# 迭代器
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.8, weight_decay=0.001)
# 更新学习率
schedule  = optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=0.5, last_epoch=-1)

6. 模型训练

# 训练

# 损失图
loss_list         = []
correct_optimal   = 0.0

for epoch in range(epoch_num):
    model.train()
    running_loss = 0.0
    start        = time.time()
    print(1 + epoch)
    for i, (inputs, labels) in enumerate(train_loader, 0):
        # 从train_loader中取出64个数据
        inputs, labels = inputs.to(device), labels.to(device)
        # 梯度清零
        optimizer.zero_grad()
        
        # 模型训练
        outputs = model(inputs)
        #print(outputs.shape)
        
        # 反向传播
        loss = criterion(outputs,labels).to(device)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
        if (i+1) % num_print == 0:
            print('[%d epoch, %d]  loss:%.6f' %(epoch+1, i+1, running_loss/num_print))
            loss_list.append(running_loss/num_print)
            running_loss = 0.0
    
    # 打印学习率及确认学习率是否进行更新
    lr_1 = optimizer.param_groups[0]['lr']
    print("learn_rate: %.15f"%lr_1)
    schedule.step()
    
    # 验证模式
    if (epoch+1) % num_check == 0:
        # 不需要梯度更新
        model.eval()
        correct = 0.0
        total   = 0
        with torch.no_grad():
            print("=======================check=======================")
            for inputs, labels in verification_loader:
                # 从train_loader中取出batch_size个数据
                inputs, labels = inputs.to(device), labels.to(device)
                
                # 模型验证
                outputs = model(inputs)
                pred    = outputs.argmax(dim=1) #返回每一行中最大值的索引
                total   = total + inputs.size(0)
                correct = correct + torch.eq(pred, labels).sum().item()
            
        
        correct = 100 * correct/total
        print("Accuracy of the network on the 19797 verification images:%.2f %%" %correct )
        print("===================================================")
        
        # 模型保存
        if correct > correct_optimal:
            torch.save(model.state_dict(), 'ResNet_enhance/ResNet50_%03d-correct%.3f.pth' % (epoch + 1, correct))
            correct_optimal = correct
        
    end=time.time()
    print("time:{}".format(end-start))

7. 绘制损失图

以下损失图分别为original、proposed结构的ResNet50的损失图,其6个单位(由数据集的数量及自定义参数num_print决定)为一次epoch。

import matplotlib.pyplot as plt

x = [ i+1 for i in range(len(loss_list)) ]

# plot函数作图
plt.plot(x, loss_list)  

# show函数展示出这个图,如果没有这行代码,则程序完成绘图,但看不到
plt.show() 

original
动手学习ResNet50_第13张图片

proposed
动手学习ResNet50_第14张图片

8. 模型验证

# 检验模式,不需要梯度更新
model.eval()
correct = 0.0
total   = 0
with torch.no_grad():
    print("=======================check=======================")
    for inputs, labels in test_loader:
        # 从train_loader中取出batch_size个数据
        inputs, labels = inputs.to(device), labels.to(device)
                
        # 模型检验
        outputs = model(inputs)
        pred    = outputs.argmax(dim=1) #返回每一行中最大值的索引
        total   = total + inputs.size(0)
        correct = correct + torch.eq(pred, labels).sum().item()
            
        
    correct = 100 * correct/total
    print("Accuracy of the network on the 20094 test images:%.2f %%" %correct )
    print("===================================================")

9. 模型预测

数据集构建

"""
test为一个列表,存放图片的路径
其路径样式为asl_alphabet_test\\K_test.jpg
pre_test_dict为一个字典,如(种类:标号)
"""
import torch
from PIL import Image
from torch.autograd import Variable
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader

no_transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.51865010496974, 0.49877608954906466, 0.5143190141916275),
                         (7.181780141103533, 8.053991771959863, 8.290017965464534))
                               ])

# -----------------ready the dataset--------------------------
def default_loader(path):
    img = Image.open(path)
    return img

class MyDataset(Dataset):
    # 构造函数
    def __init__(self, path, transform=None, target_transform=None, loader=default_loader):
        imgs = []
        for img_path in path:
            temp      = img_path.split("\\")
            label     = temp[4][:-9]
            img_label = pre_test_dict[label]
            imgs.append((img_path,int(img_label),label))#imgs中包含有图像路径和标签
        self.path             = path
        self.imgs             = imgs
        self.transform        = transform
        self.target_transform = target_transform
        self.loader           = loader
        
    # hash_map建立
    def __getitem__(self, index):
        img_path, img_label, label = self.imgs[index]
        # 调用 opencv 打开图片
        img = self.loader(img_path)
        if self.transform is not None:
            img = self.transform(img)
        img_label -= 1
        return img, img_path, label
    
    def __len__(self):
        return len(self.imgs)


test_data         = MyDataset(test, transform=no_transform)

#test_data测试数据,调用DataLoader批量加载
test_loader         = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False)

模型预测

"""
test_dict为一个字典,如(标号:种类)
"""

# 预测模式,不需要梯度更新
model.eval()
with torch.no_grad():
    print("=======================forecast=======================")
    for inputs, img_paths, kind_labels in test_loader:
        # 从train_loader中取出batch_size个数据
        inputs = inputs.to(device)
                
        # 模型检验
        outputs      = model(inputs)
        pred         = outputs.argmax(dim=1).tolist() #返回每一行中最大值的索引
        for i in range(len(pred)):
            predict = test_dict[pred[i]+1]
            path    = img_paths[i]
            real    = kind_labels[i]
            print("path: %s, predict: %s, real: %s"%(path, predict, real))
    print("===================================================")

你可能感兴趣的:(深度学习,计算机视觉)