Building a VGG Network with PyTorch: Study Notes

Building a VGG Network with PyTorch

  • VGG Network Explained
    • VGG network configurations
    • CNN receptive field
    • VGG network structure
  • Building the VGG network with PyTorch
    • model
    • train
    • predict

This tutorial follows the Bilibili uploader @霹雳吧啦Wz; links:
 
VGG explained and receptive field calculation: https://www.bilibili.com/video/BV1q7411T7Y6
 
Building the VGG network with PyTorch: https://www.bilibili.com/video/BV1i7411T7ZN
 
The uploader's blog: https://blog.csdn.net/qq_37541097
 
Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition
 

VGG Network Explained

 
VGGNet is a deep convolutional neural network developed jointly by the Visual Geometry Group (VGG) at the University of Oxford and researchers at Google DeepMind. It took first place in the Localization Task and second place in the Classification Task of the 2014 ImageNet competition.
 

VGG Network Configurations

 
The depth of the configurations increases from left (A) to right (E). Configuration D (16 layers), which contains 13 convolutional layers and 3 fully connected layers, is the one most commonly used in practice.
 
[Figure 1: VGG configuration table, columns A to E]
 
Highlights of the network:
 
Stacking several 3×3 convolution kernels in place of a single large kernel reduces the number of parameters while keeping the same receptive field.
 
As the paper notes, one can
 
replace a 5×5 kernel with a stack of two 3×3 kernels
 
replace a 7×7 kernel with a stack of three 3×3 kernels
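 
A minimal PyTorch sketch of this substitution (the channel count 64 is only an illustrative choice, not from the paper): a single 5×5 convolution alongside the stack of two 3×3 convolutions that covers the same 5×5 receptive field.

import torch
import torch.nn as nn

# a single 5x5 convolution
conv5 = nn.Conv2d(64, 64, kernel_size=5, padding=2)

# two stacked 3x3 convolutions covering the same 5x5 receptive field
conv3_stack = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(True),
)

x = torch.randn(1, 64, 32, 32)
print(conv5(x).shape)        # torch.Size([1, 64, 32, 32])
print(conv3_stack(x).shape)  # torch.Size([1, 64, 32, 32])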
 

CNN Receptive Field

 
In a convolutional neural network, the receptive field is the size of the region of the input layer that one element of a given layer's output corresponds to. Put plainly: one unit on an output feature map corresponds to a region of that size on the input.
 
As shown in the figure below, the bottom layer is a 9×9×1 feature matrix. Convolution layer Conv1 (kernel size 3×3, stride 2) turns it into a 4×4×1 feature matrix, and pooling layer MaxPool1 (kernel size 2×2, stride 2) turns that into a 2×2×1 feature matrix.
One unit in the third layer has a 2×2 receptive field in the second layer, and a 5×5 receptive field in the original image (the first layer).
[Figure 2: receptive field illustration]
 
The receptive field formula:

$F(i) = (F(i+1) - 1) \times Stride + Ksize$

  • $F(i)$: receptive field of layer $i$
  • $Stride$: stride of layer $i$
  • $Ksize$: size of the convolution or pooling kernel

 
Feature map: $F(3) = 1$

Pool1:       $F(2) = (1 - 1) \times 2 + 2 = 2$

Conv1:       $F(1) = (2 - 1) \times 2 + 3 = 5$
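 
This backward recursion is easy to script. A small hypothetical helper (not part of the tutorial code) that walks from the output feature map back to the input:

def receptive_field(layers):
    # layers: list of (kernel_size, stride) tuples, ordered input -> output
    f = 1                                   # one unit on the final feature map
    for ksize, stride in reversed(layers):  # walk back toward the input
        f = (f - 1) * stride + ksize        # F(i) = (F(i+1) - 1) * stride + ksize
    return f

# Conv1 (3x3, stride 2) followed by MaxPool1 (2x2, stride 2)
print(receptive_field([(3, 2), (2, 2)]))    # 5, matching F(1) above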

 

That two 3×3 kernels can replace a 5×5 kernel, and three 3×3 kernels a 7×7 kernel, follows from the same calculation (the stride in VGG defaults to 1).
Suppose a feature matrix passes through three 3×3 convolution layers to produce a feature map:

 

Feature map:  $F(4) = 1$

Conv3×3(3):   $F(3) = (1 - 1) \times 1 + 3 = 3$

Conv3×3(2):   $F(2) = (3 - 1) \times 1 + 3 = 5$

Conv3×3(1):   $F(1) = (5 - 1) \times 1 + 3 = 7$
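 
The helper sketched above confirms this case as well:

# three stacked 3x3 convolutions, each with stride 1
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7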

 

So a unit obtained after three layers of 3×3 convolution has the same receptive field as one obtained with a single 7×7 kernel.
Now compare the parameters a 7×7 kernel needs with those of three stacked 3×3 kernels (assuming both the input and output channel counts are C):

7×7 kernel: $7 \times 7 \times C \times C = 49C^2$
Three 3×3 kernels: $3 \times 3 \times C \times C + 3 \times 3 \times C \times C + 3 \times 3 \times C \times C = 27C^2$

So the two approaches yield the same receptive field, but stacking 3×3 kernels needs fewer parameters.
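 
The two counts are easy to check in PyTorch (biases excluded, and C = 64 chosen only for illustration):

import torch.nn as nn

C = 64
conv7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)
stack3 = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)
                         for _ in range(3)])

print(conv7.weight.numel())                         # 200704 = 49 * 64^2
print(sum(p.numel() for p in stack3.parameters()))  # 110592 = 27 * 64^2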
 

VGG Network Structure

 
The input is a 224×224 RGB image. It passes through two 3×3 convolution layers → one max-pooling layer → two 3×3 convolution layers → one max-pooling layer → three 3×3 convolution layers → one max-pooling layer → three 3×3 convolution layers → one max-pooling layer → three 3×3 convolution layers → one max-pooling layer → three fully connected layers → soft-max, which produces the class probability distribution.

[Figure 3: VGG-16 network structure]
 
In the table, conv layers use stride 1 and padding 1 by default;
maxpool layers use stride 2 and size 2 by default.

By the formula

$out_{size} = \frac{in_{size} - F_{size} + 2P}{S} + 1$

  • $in_{size}$: input size W×W
  • $F_{size}$: filter size (convolution or pooling kernel)
  • $P$: padding in pixels
  • $S$: stride
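 
For example, a 3×3 conv with stride 1 and padding 1 keeps the 224×224 input size, while each 2×2 max-pool with stride 2 halves it. A tiny sketch of the formula (a hypothetical helper, not tutorial code):

def out_size(in_size, f_size, padding, stride):
    # out = (in - F + 2P) / S + 1
    return (in_size - f_size + 2 * padding) // stride + 1

print(out_size(224, 3, 1, 1))  # 224: 3x3 conv, stride 1, padding 1
print(out_size(224, 2, 0, 2))  # 112: 2x2 max-pool, stride 2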

[Figure 4]

 

Building the VGG Network with PyTorch

 
We build the four configurations A, B, D, and E, splitting the VGG network into two parts: a feature-extraction part and a classification part.
 
[Figure 5: the network split into feature-extraction and classification parts]
 

model

model.py

import torch.nn as nn
import torch

# official pretrain weights
model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth'
}


class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=False):
        super(VGG, self).__init__()
        self.features = features
        
        # build the classification part of the network
        self.classifier = nn.Sequential(
            nn.Linear(512*7*7, 4096),       # first fully connected layer
            nn.ReLU(True),
            nn.Dropout(p=0.5),              # randomly drop 50% of the neurons
            nn.Linear(4096, 4096),          # second fully connected layer
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes)    # third fully connected layer
        )
        if init_weights:                    # optionally initialize the weights
            self._initialize_weights()
            
    # forward pass
    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.features(x)                # feature-extraction part
        # N x 512 x 7 x 7
        x = torch.flatten(x, start_dim=1)   # flatten from dim 1 (dim 0 is the batch)
        # N x 512*7*7
        x = self.classifier(x)              # classification part produces the output
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                # nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

# build the feature-extraction part of the network
def make_features(cfg: list):       # takes the configuration list of one model
    layers = []                     # collects the layers in order
    in_channels = 3                 # RGB input, so 3 input channels
    for v in cfg:                   # walk through the configuration list
        if v == "M":                # "M" marks a max-pooling layer
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:                       # a number marks a conv layer with v output kernels
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]  # every conv layer is followed by ReLU
            in_channels = v
    return nn.Sequential(*layers)   # *layers unpacks the list as positional arguments


cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

# instantiate the chosen configuration
def vgg(model_name="vgg16", **kwargs):
    assert model_name in cfgs, "Warning: model name {} not in cfgs dict!".format(model_name)
    cfg = cfgs[model_name]

    model = VGG(make_features(cfg), **kwargs)   # **kwargs forwards any keyword arguments to VGG
    return model

 
Here,
 
cfgs is a dictionary in which each key names one model configuration: 'vgg11' corresponds to column A of the table, 'vgg13' to B, 'vgg16' to D, and 'vgg19' to E. Each value is a list in which a number gives the number of kernels in a convolution layer and the letter 'M' marks a pooling layer.
 

cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],

 

train

train.py is the same as in the previous lecture; only a few parameters need to change.

import os
import json

import torch
import torch.nn as nn
from torchvision import transforms, datasets
import torch.optim as optim
from tqdm import tqdm

from model import vgg


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers per process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)
    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))

    # test_data_iter = iter(validate_loader)
    # test_image, test_label = next(test_data_iter)

    model_name = "vgg16"
    net = vgg(model_name=model_name, num_classes=5, init_weights=True)
    net.to(device)
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0001)

    epochs = 30
    best_acc = 0.0
    save_path = './{}Net.pth'.format(model_name)
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()
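 
model.py defines model_urls for the official pretrained weights, but this script trains from scratch. If you would rather fine-tune from those weights, a minimal sketch, assuming the model.py above (the 1000-class head is dropped because num_classes is 5 here):

import torch
from model import vgg, model_urls

net = vgg(model_name="vgg16", num_classes=5)
state_dict = torch.hub.load_state_dict_from_url(model_urls["vgg16"])
# the pretrained classifier head has 1000 outputs, which does not match num_classes=5
for key in ["classifier.6.weight", "classifier.6.bias"]:
    state_dict.pop(key)
net.load_state_dict(state_dict, strict=False)  # strict=False tolerates the missing head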

 

predict

predict.py

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import vgg


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as json_file:
        class_indict = json.load(json_file)
    
    # create model
    model = vgg(model_name="vgg16", num_classes=5).to(device)
    # load model weights
    weights_path = "./vgg16Net.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))

    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    print(print_res)
    plt.show()


if __name__ == '__main__':
    main()
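 
As a small optional extension (inside main(), after the prediction block, assuming the predict and class_indict variables defined above), the probability of every class can be printed rather than only the arg-max:

# optional: list the probability of every class
for i in range(len(predict)):
    print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                              predict[i].numpy()))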
