Deep Learning CV Study Notes (AlexNet)

Table of Contents

  • Preface
  • AlexNet
    • model
  • train.py
    • predict.py

Preface

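I had been struggling to figure out where to start with CV. After finishing the cs231n course I had a rough grasp of the overall approach and methods, but when it came to concrete code I was still pretty lost. Luckily, I came across a GitHub repository by an expert that covers classic CV papers together with their code implementations, so I decided to keep notes to track my learning.
Here is the link to the repository: https://github.com/WZMIAOMIAO/deep-learning-for-image-processing
And here is the CSDN blog of another expert who also followed this repository: https://blog.csdn.net/m0_37867091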

AlexNet

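[Figure 1: AlexNet architecture diagram]
The figure does not show the intermediate results of the max-pooling operations. For the first two convolutional stages, you can think of Conv2d and max pooling as a single step between two blocks that produces the next feature map.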

model

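The code here uses only half as many convolution kernels as the original model. Read it alongside the detailed network structure below.
[Figure 2: detailed AlexNet layer structure]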

import torch.nn as nn
import torch

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224]  output[48, 55, 55]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),           # output[128, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),          # output[128, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 6, 6]
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(128 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

1. nn.Sequential() keeps the code compact when the network is deep by packaging the individual modules into a single new module.

    Example::
        # Using Sequential to create a small model. When `model` is run,
        # input will first be passed to `Conv2d(1,20,5)`. The output of
        # `Conv2d(1,20,5)` will be used as the input to the first
        # `ReLU`; the output of the first `ReLU` will become the input
        # for `Conv2d(20,64,5)`. Finally, the output of
        # `Conv2d(20,64,5)` will be used as input to the second `ReLU`
        model = nn.Sequential(
                  nn.Conv2d(1,20,5),
                  nn.ReLU(),
                  nn.Conv2d(20,64,5),
                  nn.ReLU()
                )

        # Using Sequential with OrderedDict. This is functionally the
        # same as the above code
        model = nn.Sequential(OrderedDict([
                  ('conv1', nn.Conv2d(1,20,5)),
                  ('relu1', nn.ReLU()),
                  ('conv2', nn.Conv2d(20,64,5)),
                  ('relu2', nn.ReLU())
                ]))

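2. A note on padding. The Conv2d docstring describes it as: padding (int, tuple or str, optional): Padding added to all four sides of the input. Default: 0. For convenience this code passes a single int. For example, passing 2 pads two rows/columns of zeros on each of the four sides; passing the tuple (1, 2) pads 1 row of zeros on the top and bottom and 2 columns of zeros on the left and right.
For finer control, such as padding each side by a different amount, use nn.ZeroPad2d. Args: padding (int, tuple): the size of the padding. A 4-tuple is interpreted as (left, right, top, bottom); a 2-tuple, as in the first call below, pads only the last dimension as (left, right).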

import torch.nn as nn
import torch
F = torch.ones(3, 3)
m = nn.ZeroPad2d((1,2))
print(F)
F_padding = m(F)
print(F_padding)
four = nn.ZeroPad2d((1,1,2,0))
print(four(F))
# tensor([[1., 1., 1.],
#         [1., 1., 1.],
#         [1., 1., 1.]])
# tensor([[0., 1., 1., 1., 0., 0.],
#         [0., 1., 1., 1., 0., 0.],
#         [0., 1., 1., 1., 0., 0.]])
# tensor([[0., 0., 0., 0., 0.],
#         [0., 0., 0., 0., 0.],
#         [0., 1., 1., 1., 0.],
#         [0., 1., 1., 1., 0.],
#         [0., 1., 1., 1., 0.]])
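
To see the tuple form of Conv2d's padding argument in action, here is a minimal sketch. The 5x5 input and the single-channel layer are made up for illustration; only the output shape matters.

```python
import torch
import torch.nn as nn

# Hypothetical toy input: batch of 1, 1 channel, 5x5 image.
x = torch.ones(1, 1, 5, 5)

# padding=(1, 2): 1 row of zeros on the top and bottom (height),
# 2 columns of zeros on the left and right (width).
conv = nn.Conv2d(1, 1, kernel_size=3, padding=(1, 2))
y = conv(x)

# Height: 5 + 2*1 - 3 + 1 = 5, width: 5 + 2*2 - 3 + 1 = 7
print(y.shape)  # torch.Size([1, 1, 5, 7])
```

The width grows by more than the height because the second tuple entry applies to the width dimension.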

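3. Size of the feature map after a convolution: N = (W - F + 2P) / S + 1, where
N: size of the output feature map
W: size of the input feature map
F: size of the convolution kernel
P: padding size (the total is not always 2P, since the two sides may be padded differently)
S: stride of the convolution
If the result is not an integer, it is rounded down, and the trailing rows and columns that the kernel cannot cover are discarded. For example, with N = 55.25 and stride = 4, the last row and last column are discarded; with N = 55.8 and stride = 5, the last four rows and last four columns are discarded: 0.25 * 4 = 1 and 0.8 * 5 = 4. In other words, the number of discarded rows/columns is the fractional part times the stride.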
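
As a worked example, the sketch below applies this formula to every size-changing layer of the model above, starting from a 224x224 input. The helper name conv_output_size and the layer table are my own, written only for this illustration; the kernel, padding and stride values are copied from the model code, and max pooling is assumed to use the same rounding-down rule (its default behavior).

```python
def conv_output_size(w, f, p, s):
    """Return (output size, trailing rows/columns of the padded input left uncovered)."""
    span = w - f + 2 * p      # how far the kernel can slide
    n = span // s + 1         # floor division drops the fractional part
    dropped = span % s        # fractional part times the stride
    return n, dropped

# (name, kernel size F, padding P, stride S), taken from the AlexNet code above
layers = [
    ("conv1", 11, 2, 4),
    ("pool1", 3, 0, 2),
    ("conv2", 5, 2, 1),
    ("pool2", 3, 0, 2),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 0, 2),
]

size = 224
sizes = []
for name, f, p, s in layers:
    size, dropped = conv_output_size(size, f, p, s)
    sizes.append(size)
    print(f"{name}: {size} x {size} (dropped {dropped})")
```

The first layer is the N = 55.25 case from the text: (224 - 11 + 4) / 4 + 1 = 55.25, which rounds down to 55 with one row and column dropped. The final size of 6 matches the 128 * 6 * 6 input of the first Linear layer.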

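4. nn.ReLU(inplace=True): inplace=True is one of PyTorch's ways of reducing memory usage. The activation overwrites its input tensor instead of allocating a new one.
5. If init_weights=True is passed when building the network, the weights are initialized once the layers have been constructed.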

    def _initialize_weights(self):
        for m in self.modules():                # modules() is inherited from the parent class nn.Module
            if isinstance(m, nn.Conv2d):        # check which kind of layer this is
                # initialize m.weight with the kaiming_normal_ method
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):      # check which kind of layer this is
                nn.init.normal_(m.weight, 0, 0.01)  # initialize m.weight with mean 0, std 0.01
                nn.init.constant_(m.bias, 0)

Let's look at the definition of self.modules: it returns an iterator over all modules in the network, that is, every layer defined earlier.

    def modules(self) -> Iterator['Module']:
        r"""Returns an iterator over all modules in the network."""

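Here, if isinstance(m, nn.Conv2d) and elif isinstance(m, nn.Linear) check whether each layer is a convolutional layer or a fully connected layer, so that the matching initialization method is used.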

if isinstance(m, nn.Conv2d):
    nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

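Current PyTorch versions automatically apply a default initialization to the network's layers, so you do not actually need to initialize them yourself.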
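
To make the traversal concrete, here is a small sketch of modules() combined with the isinstance checks. The four-layer network is a hypothetical toy model, not the AlexNet above; it is only there to show what the iterator yields and that the initialization takes effect.

```python
import torch
import torch.nn as nn

# Hypothetical toy network, used only to illustrate modules() and isinstance.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 5),
)

# modules() yields the container itself first, then each submodule in order.
names = [type(m).__name__ for m in net.modules()]
print(names)  # ['Sequential', 'Conv2d', 'ReLU', 'Flatten', 'Linear']

# Same dispatch as _initialize_weights: pick the init method by layer type.
for m in net.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, 0, 0.01)
        nn.init.constant_(m.bias, 0)

out = net(torch.zeros(2, 3, 4, 4))
print(out.shape)  # torch.Size([2, 5])
```

Note that the container (Sequential) is itself one of the yielded modules, which is why the isinstance checks are needed rather than initializing everything the iterator returns.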

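6. x = torch.flatten(x, start_dim=1): start_dim=1 flattens the tensor starting from dimension 1. Our tensor is four-dimensional, [batch, channel, height, width];
batch is the number of images, which we leave untouched, so everything after it is flattened and each image's features become a one-dimensional vector.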
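
A quick shape check illustrates this. The batch size of 4 is an arbitrary choice for the example; the [128, 6, 6] feature shape is the output of the features block in the model above.

```python
import torch

# Stand-in for the output of self.features: [batch, channel, height, width]
x = torch.zeros(4, 128, 6, 6)

# Flatten everything except the batch dimension.
flat = torch.flatten(x, start_dim=1)
print(flat.shape)  # torch.Size([4, 4608])

# For comparison, start_dim=0 would also merge the batch dimension.
all_flat = torch.flatten(x, start_dim=0)
print(all_flat.shape)  # torch.Size([18432])
```

The 4608 here is 128 * 6 * 6, the input size of the first Linear layer in the classifier.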

train.py

import os
import json

import torch
import torch.nn as nn
from torchvision import transforms, datasets, utils
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
from tqdm import tqdm

from model import AlexNet

def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))
    
    # Define a dictionary data_transform: str keys mapped to Compose objects
    data_transform = {                                                    
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),  # cannot 224, must (224, 224)
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path; "../.." goes up two directory levels
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path 
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx          # mapping from class name to index
    cla_dict = dict((val, key) for key, val in flower_list.items())     # swap keys and values
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])
    # number of workers; the video uses 0, but the repository has since changed it. If this fails to run, try nw = 0
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=4, shuffle=False,
                                                  num_workers=nw)

    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))
    # test_data_iter = iter(validate_loader)
    # test_image, test_label = test_data_iter.next()
    #
    # def imshow(img):
    #     img = img / 2 + 0.5  # unnormalize
    #     npimg = img.numpy()
    #     plt.imshow(np.transpose(npimg, (1, 2, 0)))
    #     plt.show()
    #
    # print(' '.join('%5s' % cla_dict[test_label[j].item()] for j in range(4)))
    # imshow(utils.make_grid(test_image))

    net = AlexNet(num_classes=5, init_weights=True)

    net.to(device)
    loss_function = nn.CrossEntropyLoss()
    # pata = list(net.parameters())
    optimizer = optim.Adam(net.parameters(), lr=0.0002)

    epochs = 10
    save_path = './AlexNet.pth'
    best_acc = 0.0
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train() 
        '''This has any effect only on certain modules. See documentations of
        particular modules for details of their behaviors in training/evaluation
        mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
        etc.'''      
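        # Paired with net.eval(). As the docstring above explains, train() turns Dropout on
        # during training (to reduce overfitting) and eval() turns it off during validation.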
        running_loss = 0.0
        train_bar = tqdm(train_loader)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()
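
The effect of net.train() and net.eval() on Dropout can be checked in isolation. This is a minimal sketch on a standalone nn.Dropout layer with a made-up all-ones input, not on the AlexNet above.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2.
drop.train()
y_train = drop(x)

# Evaluation mode: Dropout is a no-op, the input passes through unchanged.
drop.eval()
y_eval = drop(x)

print(y_train.unique())        # only the values 0 and 2 can appear
print(torch.equal(y_eval, x))  # True
```

This is why the training loop calls net.train() before the training batches and net.eval() before validation.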

The script defines the dictionary data_transform to preprocess the training and validation sets; its keys are strings and its values are Compose objects:

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),  # cannot 224, must (224, 224)
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}
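
The Normalize step computes (x - mean) / std per channel. Since ToTensor scales pixel values into [0, 1], a mean and std of 0.5 map them into [-1, 1]. The sketch below reproduces that arithmetic with plain tensor operations instead of calling torchvision, and the random image is only a stand-in for a real ToTensor output.

```python
import torch

# Stand-in for a ToTensor() output: [channel, height, width], values in [0, 1)
img = torch.rand(3, 224, 224)

# Per-channel mean and std, shaped so they broadcast over height and width.
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

normalized = (img - mean) / std   # values now lie in [-1, 1)
restored = normalized / 2 + 0.5   # the "unnormalize" step from the commented-out imshow helper

print(normalized.min().item() >= -1.0, normalized.max().item() <= 1.0)
```

The restored line is the inverse operation, which is why the commented-out imshow helper in train.py uses img / 2 + 0.5 before displaying an image.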

predict.py

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import AlexNet

def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)

    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as json_file:
        class_indict = json.load(json_file)

    # create model
    model = AlexNet(num_classes=5).to(device)

    # load model weights
    weights_path = "./AlexNet.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))

    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()
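
One detail worth spelling out is why the prediction script looks up class_indict[str(predict_cla)] rather than using the integer index directly: JSON object keys are always strings, so the integer keys written by train.py come back as strings. The sketch below round-trips the class dictionary in memory (json.dumps and json.loads instead of a file) using the class names from the comment in train.py; the predicted index 4 is made up.

```python
import json

# Same mapping as in train.py: class name -> index
class_to_idx = {'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflower': 3, 'tulips': 4}

# Swap keys and values, then serialize as train.py does.
cla_dict = dict((val, key) for key, val in class_to_idx.items())
json_str = json.dumps(cla_dict, indent=4)

# Loading it back: the integer keys have become strings.
class_indict = json.loads(json_str)
print(list(class_indict.keys()))  # ['0', '1', '2', '3', '4']

predict_cla = 4  # hypothetical predicted class index
print(class_indict[str(predict_cla)])  # tulips
```

Indexing with the raw integer would raise a KeyError, since 4 and '4' are different keys.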
