Building GoogLeNet with PyTorch: Study Notes

Building GoogLeNet with PyTorch

  • GoogLeNet in detail
    • The Inception structure
    • Auxiliary classifiers
    • GoogLeNet architecture and parameters
      • Architecture diagram
      • Parameter table
    • Summary
  • Building GoogLeNet with PyTorch
    • model
    • train
    • predict

These notes follow the tutorial by Bilibili uploader @霹雳吧啦Wz. Links:

GoogLeNet in detail: https://www.bilibili.com/video/BV1z7411T7ie

Building GoogLeNet with PyTorch: https://www.bilibili.com/video/BV1r7411T7M5

The uploader's CSDN blog: https://blog.csdn.net/qq_37541097

Paper: Going Deeper with Convolutions
 

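GoogLeNet in detail


GoogLeNet is a deep learning architecture proposed by a Google team in 2014. It took first place in the Classification Task of that year's ImageNet competition. (Second place went to VGG, covered in the previous post.)


Highlights of the network

  • Introduces the Inception structure, which fuses feature information at different scales
  • Uses 1×1 convolution kernels for dimensionality reduction and projection
  • Adds two auxiliary classifiers to help training (AlexNet and VGG each have a single output layer; GoogLeNet has three)
  • Drops the fully connected layers in favor of an average pooling layer, which greatly reduces the number of model parameters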
     

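The Inception structure


AlexNet and VGG are serial architectures: a chain of convolution and pooling layers. The Inception structure is parallel: the input feature map is fed into several branches of convolution and pooling layers at the same time, and the branch outputs are concatenated along the depth (channel) dimension to form the output feature map.

Note: the feature maps produced by all branches must have the same height and width.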
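
The equal-size requirement can be checked with the usual output-size formula. The sketch below assumes stride 1 and the paddings used in the model.py listing later in these notes (0, 1 and 2 for the 1×1, 3×3 and 5×5 kernels, 1 for the 3×3 pooling); `conv_out_size` is a helper name introduced here for illustration only.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel size, padding) of each Inception branch, all with stride 1
branches = {
    "1x1 conv": (1, 0),
    "3x3 conv": (3, 1),
    "5x5 conv": (5, 2),
    "3x3 max pool": (3, 1),
}

in_size = 28  # e.g. a 28 x 28 input feature map
out_sizes = {name: conv_out_size(in_size, k, padding=p)
             for name, (k, p) in branches.items()}
print(out_sizes)  # every branch keeps the 28 x 28 spatial size
```

Because every branch preserves height and width, only the channel counts differ, so the outputs can be concatenated along the channel dimension.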
 


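The original Inception structure

[Figure 1: the original Inception structure]
It contains

  • a convolution layer with 1×1 kernels
  • a convolution layer with 3×3 kernels
  • a convolution layer with 5×5 kernels
  • a pooling layer with a 3×3 window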

 
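The Inception structure with dimensionality reduction

[Figure 2: the Inception structure with dimensionality reduction]
Compared with the original version, it adds three convolution layers with 1×1 kernels, which reduce the depth of the feature maps.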
 
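How 1×1 convolutions reduce dimensionality

1. Without 1×1 dimensionality reduction

Suppose the feature map has depth 512 and is convolved with 64 kernels of size 5×5. This needs 5×5×512×64 = 819,200 parameters.
[Figure 3]

2. With 1×1 dimensionality reduction

Suppose the feature map has depth 512. First convolve it with 24 kernels of size 1×1, which yields a feature map of depth 24 (the output depth equals the number of kernels), then apply 64 kernels of size 5×5. This needs 1×1×512×24 + 5×5×24×64 = 50,688 parameters.

[Figure 4]

Comparing the two, the 1×1 convolution lowers the depth of the feature map, which cuts the number of convolution parameters and, in turn, the amount of computation.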
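
The two parameter counts above can be reproduced with a few lines of arithmetic. This is a minimal sketch that counts weights only and ignores bias terms, which matches the figures quoted above; `conv_weights` is a name introduced here for illustration.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a convolution layer, ignoring bias terms."""
    return kernel_size * kernel_size * in_channels * out_channels


# 64 kernels of size 5x5 applied directly to a depth-512 feature map
direct = conv_weights(512, 64, 5)

# 24 kernels of size 1x1 first, then 64 kernels of size 5x5 on the depth-24 result
reduced = conv_weights(512, 24, 1) + conv_weights(24, 64, 5)

print(direct, reduced)   # 819200 50688
print(direct / reduced)  # roughly a 16x reduction
```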
 

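Auxiliary classifiers


GoogLeNet has two auxiliary classifiers with the same structure, shown in the figure below.
[Figure 5: structure of the auxiliary classifier]

  • Layer 1, AveragePool: an average-pooling downsampling layer with a 5×5 window and stride 3
    • When the input comes from Inception(4a), the input feature map is 14×14×512 and the output is 4×4×512;
    • When the input comes from Inception(4d), the input feature map is 14×14×528 and the output is 4×4×528;
  • Layer 2, Conv: a convolution layer with 128 kernels of size 1×1 (followed by ReLU), used for dimensionality reduction
  • Layer 3, FC: a fully connected layer with 1024 nodes (followed by ReLU)
  • Dropout is applied between the two fully connected layers (70% of the neurons are randomly deactivated)
  • Layer 4, FC: a fully connected layer with one node per class; ImageNet has 1000 classes, so it has 1000 nodes
  • Softmax activation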
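
The shapes in the list above follow from the pooling formula. The sketch below traces them for both auxiliary classifiers, assuming no padding in the pooling layer; `pool_out_size` is a helper introduced here for illustration.

```python
def pool_out_size(size, kernel, stride):
    """Output height/width of a pooling layer without padding (floor rounding)."""
    return (size - kernel) // stride + 1


side = pool_out_size(14, kernel=5, stride=3)  # 14 x 14 feature map -> side x side
conv_channels = 128                           # the 1x1 convolution reduces the depth to 128
flat_features = conv_channels * side * side   # length of the vector fed to the first FC layer

for name, in_channels in (("aux1", 512), ("aux2", 528)):
    print(f"{name}: {in_channels} x 14 x 14 -> {in_channels} x {side} x {side} "
          f"-> {conv_channels} x {side} x {side} -> {flat_features}")
```

Both classifiers end up with 2048 flattened features. That is why the first fully connected layer can have the same input size in both, even though the incoming depths differ.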

GoogLeNet architecture and parameters



Architecture diagram


[Figure 6: GoogLeNet architecture diagram]


Parameter table


[Figure 7: GoogLeNet parameter table]

Read the table together with the figure below:

[Figure 8]

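  • #1×1: number of 1×1 kernels on branch 1
  • #3×3 reduce: number of 1×1 kernels on branch 2
  • #3×3: number of 3×3 kernels on branch 2
  • #5×5 reduce: number of 1×1 kernels on branch 3
  • #5×5: number of 5×5 kernels on branch 3
  • pool proj: number of 1×1 kernels on branch 4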
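
Only four of these six columns contribute to a module's output depth: #1×1, #3×3, #5×5 and pool proj. The sketch below sums them for each Inception module and checks that the result matches the next module's input depth. The numbers are copied from the model.py listing in these notes; `INCEPTION_CFG` and `out_channels` are names introduced here for illustration.

```python
# (in_channels, #1x1, #3x3 reduce, #3x3, #5x5 reduce, #5x5, pool proj)
INCEPTION_CFG = {
    "3a": (192, 64, 96, 128, 16, 32, 32),
    "3b": (256, 128, 128, 192, 32, 96, 64),
    "4a": (480, 192, 96, 208, 16, 48, 64),
    "4b": (512, 160, 112, 224, 24, 64, 64),
    "4c": (512, 128, 128, 256, 24, 64, 64),
    "4d": (512, 112, 144, 288, 32, 64, 64),
    "4e": (528, 256, 160, 320, 32, 128, 128),
    "5a": (832, 256, 160, 320, 32, 128, 128),
    "5b": (832, 384, 192, 384, 48, 128, 128),
}


def out_channels(cfg):
    """Depth after concatenating the four branches (the 'reduce' columns do not count)."""
    _, ch1x1, _, ch3x3, _, ch5x5, pool_proj = cfg
    return ch1x1 + ch3x3 + ch5x5 + pool_proj


names = list(INCEPTION_CFG)
for cur, nxt in zip(names, names[1:]):
    # The max-pool layers between stages keep the depth, so consecutive modules must line up
    assert out_channels(INCEPTION_CFG[cur]) == INCEPTION_CFG[nxt][0], (cur, nxt)

print({name: out_channels(cfg) for name, cfg in INCEPTION_CFG.items()})
```

The last module, 5b, outputs 1024 channels, which is the input size of the final fully connected layer in the listing.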

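Summary


The video's author built both GoogLeNet and VGGNet and counted the parameters of each model. The results are shown in the figure below.

GoogLeNet has roughly 1/20 of the parameters of VGG, so the model is small while its accuracy is high. However, because of its two auxiliary classifiers, GoogLeNet is more complicated to build and modify than VGG.

[Figure 9: parameter counts of the two models]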
 

Building GoogLeNet with PyTorch

 

model

model.py

import torch.nn as nn
import torch
import torch.nn.functional as F


class GoogLeNet(nn.Module):
    # aux_logits: whether to use the auxiliary classifiers
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits

        # (224 - 7 + 2 * 3) / 2 + 1 = 112.5, rounded down to 112
        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, ceil_mode=True)  # ceil_mode=True rounds the output size up, False rounds it down

        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)

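        if self.aux_logits:  # build the auxiliary classifiers only when requested
            self.aux1 = InceptionAux(512, num_classes)
            self.aux2 = InceptionAux(528, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive average pooling: kernel size and stride are derived from the requested output size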
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14
        if self.training and self.aux_logits:  # the auxiliary classifier is skipped in eval mode
            aux1 = self.aux1(x)

        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14

        # self.training is True in training mode
        if self.training and self.aux_logits:  # the auxiliary classifier is skipped in eval mode
            aux2 = self.aux2(x)

        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        if self.training and self.aux_logits:  # auxiliary outputs are returned only in training mode
            return x, aux2, aux1
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

# Inception structure
class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        # branch 1: 1x1 convolution
        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)
        # branch 2: 1x1 reduction followed by 3x3 convolution
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)   # padding keeps the output size equal to the input size
        )
        # branch 3: 1x1 reduction followed by 5x5 convolution
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            BasicConv2d(ch5x5red, ch5x5, kernel_size=5, padding=2)   # padding keeps the output size equal to the input size
        )
        # branch 4: 3x3 max pooling followed by 1x1 projection
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    # forward pass
    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        outputs = [branch1, branch2, branch3, branch4]  # collect the branch outputs in a list
        return torch.cat(outputs, 1)  # concatenate along dim 1 (channel); layout is [batch, channel, height, width]

# Auxiliary classifier
class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):  # depth of the input feature map and number of classes
        super(InceptionAux, self).__init__()
        self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)  # output[batch, 128, 4, 4]

        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: N x 512 x 14 x 14, aux2: N x 528 x 14 x 14
        x = self.averagePool(x)
        # aux1: N x 512 x 4 x 4, aux2: N x 528 x 4 x 4
        x = self.conv(x)
        # N x 128 x 4 x 4
        x = torch.flatten(x, 1)  # flatten everything except the batch dimension
        x = F.dropout(x, 0.5, training=self.training)
        # N x 2048
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 1024
        x = self.fc2(x)
        # N x num_classes
        return x

# Convolution + ReLU
class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x
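
The shape comments in `forward` can be reproduced by tracing only the layers that change the spatial size: `conv1` and the four max-pool layers. This is a simplified sketch. `out_size` is a helper introduced here, and it ignores PyTorch's extra boundary rule for `ceil_mode` (the last pooling window must start inside the input), which does not come into play for these sizes.

```python
import math


def out_size(size, kernel, stride, padding=0, ceil_mode=False):
    """Output size of a conv/pool layer; ceil_mode rounds up instead of down."""
    raw = (size + 2 * padding - kernel) / stride
    return (math.ceil(raw) if ceil_mode else math.floor(raw)) + 1


size = 224
trace = [size]

size = out_size(size, kernel=7, stride=2, padding=3)  # conv1
trace.append(size)

for _ in range(4):  # maxpool1 .. maxpool4, all 3x3 with stride 2 and ceil_mode=True
    size = out_size(size, kernel=3, stride=2, ceil_mode=True)
    trace.append(size)

print(trace)  # [224, 112, 56, 28, 14, 7]

# With floor rounding the first pooling layer would give 55 instead of 56
print(out_size(112, kernel=3, stride=2, ceil_mode=False))  # 55
```

With `ceil_mode=True` each pooling layer halves the size exactly, which is why the listing sets it on every max-pool layer.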

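The dropout function

x = F.dropout(x, 0.5, training=self.training)

0.5: randomly deactivates 50% of the neurons (the original paper uses 70%).

training=self.training: once a model has been instantiated, model.train() and model.eval() switch its state. In model.train() mode self.training is True; in model.eval() mode self.training is False, so dropout is applied only during training.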
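
To make the effect of the `training` flag concrete, here is a pure-Python stand-in for inverted dropout on a list of floats. It is an illustration only, not PyTorch's implementation. Like `F.dropout`, it scales the surviving values by 1/(1-p) in training mode and returns the input unchanged otherwise. The function name and the `rng` parameter are assumptions made for this sketch.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of floats (illustrative stand-in, requires 0 <= p < 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)  # eval mode: identity
    scale = 1.0 / (1.0 - p)  # keeps the expected value of each element unchanged
    return [0.0 if rng.random() < p else v * scale for v in values]


x = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(x, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(x, p=0.5, training=False)

print(train_out)  # each element is either 0.0 or twice its input value
print(eval_out)   # [1.0, 2.0, 3.0, 4.0]
```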
 

train

train.py

import os
import json

import torch
import torch.nn as nn
from torchvision import transforms, datasets
import torch.optim as optim
from tqdm import tqdm

from model import GoogLeNet


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)

    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))

    # test_data_iter = iter(validate_loader)
    # test_image, test_label = next(test_data_iter)

    # net = torchvision.models.googlenet(num_classes=5)
    # model_dict = net.state_dict()
    # pretrain_model = torch.load("googlenet.pth")
    # del_list = ["aux1.fc2.weight", "aux1.fc2.bias",
    #             "aux2.fc2.weight", "aux2.fc2.bias",
    #             "fc.weight", "fc.bias"]
    # pretrain_dict = {k: v for k, v in pretrain_model.items() if k not in del_list}
    # model_dict.update(pretrain_dict)
    # net.load_state_dict(model_dict)
    net = GoogLeNet(num_classes=5, aux_logits=True, init_weights=True)
    net.to(device)
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0003)

    epochs = 30
    best_acc = 0.0
    save_path = './googleNet.pth'
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            logits, aux_logits2, aux_logits1 = net(images.to(device))
            loss0 = loss_function(logits, labels.to(device))
            loss1 = loss_function(aux_logits1, labels.to(device))
            loss2 = loss_function(aux_logits2, labels.to(device))
            loss = loss0 + loss1 * 0.3 + loss2 * 0.3
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # number of correct predictions accumulated over this epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))  # in eval mode the model returns only the main output
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()
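
One detail of train.py matters later in predict.py: the index-to-class mapping is written to JSON, and JSON object keys are always strings. The round trip below uses the class names from the comment in the listing. It is a minimal sketch that keeps the JSON in memory rather than writing `class_indices.json`.

```python
import json

# Mapping as produced by ImageFolder.class_to_idx for the flower data set
class_to_idx = {"daisy": 0, "dandelion": 1, "roses": 2, "sunflower": 3, "tulips": 4}

# Invert it so that a predicted index can be turned back into a class name
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

json_str = json.dumps(idx_to_class, indent=4)  # integer keys are serialized as strings
restored = json.loads(json_str)

predicted_index = 4
print(restored[str(predicted_index)])  # tulips
print(predicted_index in restored)     # False: the integer key no longer exists
```

This is why the prediction script looks up the class name with `str(predict_cla)` rather than with the integer index.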

 

predict

predict.py

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import GoogLeNet


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as json_file:
        class_indict = json.load(json_file)

    # create model
    model = GoogLeNet(num_classes=5, aux_logits=False).to(device)

    # load model weights
    weights_path = "./googleNet.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    # strict=False: this model is built without auxiliary classifiers, but the saved weights include their parameters
    missing_keys, unexpected_keys = model.load_state_dict(torch.load(weights_path, map_location=device),
                                                          strict=False)

    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    print(print_res)
    plt.show()


if __name__ == '__main__':
    main()
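
The last step of predict.py turns the five raw scores into probabilities with softmax and picks the largest one. The pure-Python sketch below shows the same computation on made-up logits; the values are hypothetical and not produced by any trained model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.2, -1.0, 0.5, 0.1, 3.0]  # hypothetical scores for the 5 flower classes
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability

print(predicted)  # 4
print("prob: {:.3}".format(probs[predicted]))
```

Softmax does not change the ordering of the scores, so taking the argmax of the raw logits would give the same class. The probabilities are only needed for the value shown in the plot title.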
