Introduction to Computer Vision with PyTorch (6/6)

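1. Overview

This article covers implementing a CNN in PyTorch with the MobileNet architecture: where the dataset comes from, the training process, and how the model is created, saved, and loaded.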

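2. Lightweight networks and MobileNet

We have seen that complex networks require significant computational resources, such as GPUs, both for training and for fast inference. However, it turns out that in most cases a model with significantly fewer parameters can still be trained to perform reasonably well. In other words, an increase in model complexity typically yields only a small, non-proportional increase in model performance.

We observed this at the beginning of the module when training an MNIST digit classifier. The accuracy of a simple dense model was not significantly worse than that of a powerful CNN. Increasing the number of CNN layers and/or the number of neurons in the classifier gained us at most a few percent of accuracy. This suggests that we can experiment with lightweight network architectures in order to train faster models. This is especially important if we want to run our models on mobile devices.

This module relies on the Cats and Dogs dataset that we downloaded in the previous unit. First, we make sure that the dataset is available.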

!wget https://raw.githubusercontent.com/MicrosoftDocs/pytorchfundamentals/main/computer-vision-pytorch/pytorchcv.py

 

import torch
import torch.nn as nn
import torchvision
import matplotlib.pyplot as plt
from torchinfo import summary
import os

from pytorchcv import train, display_dataset, train_long, load_cats_dogs_dataset, validate, common_transform
if not os.path.exists('data/kagglecatsanddogs_5340.zip'):
    !wget -P data -q https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip

dataset, train_loader, test_loader = load_cats_dogs_dataset()
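
3. MobileNet

In the previous unit we looked at the ResNet architecture for image classification. A more lightweight analog of ResNet is MobileNet, which uses so-called inverted residual blocks. Let's load a pre-trained MobileNet and see how it works: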

model = torch.hub.load('pytorch/vision:v0.13.0', 'mobilenet_v2', weights='MobileNet_V2_Weights.DEFAULT')
model.eval()
print(model)
MobileNetV2(
  (features): Sequential(
    (0): ConvBNReLU(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
    (1): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)
          (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (3): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(144, 144, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=144, bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(144, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (4): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(144, 144, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=144, bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(144, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (5): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (6): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (7): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=192, bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (8): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (9): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (10): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (11): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (12): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (13): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (14): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(576, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (15): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (16): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (17): InvertedResidual(
      (conv): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvBNReLU(
          (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (18): ConvBNReLU(
      (0): Conv2d(320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.2, inplace=False)
    (1): Linear(in_features=1280, out_features=1000, bias=True)
  )
)
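
The printout shows the pattern that most InvertedResidual blocks follow: a 1x1 convolution that expands the number of channels, a 3x3 depthwise convolution (with `groups` equal to the channel count), and a 1x1 projection back to a narrow output with no activation after it. The block below is a minimal sketch of that pattern, not torchvision's actual implementation. The class name `InvertedResidualSketch` and its arguments are illustrative assumptions. The skip connection is added only when the stride is 1 and the input and output channel counts match.

```python
import torch
import torch.nn as nn


class InvertedResidualSketch(nn.Module):
    """Illustrative inverted residual block: expand -> depthwise -> project."""

    def __init__(self, in_ch, out_ch, stride, expand_ratio):
        super().__init__()
        hidden = in_ch * expand_ratio
        # The skip connection is only valid when input and output shapes match.
        self.use_residual = stride == 1 and in_ch == out_ch
        layers = []
        if expand_ratio != 1:
            # 1x1 "expansion" convolution widens the representation.
            layers += [nn.Conv2d(in_ch, hidden, 1, bias=False),
                       nn.BatchNorm2d(hidden),
                       nn.ReLU6(inplace=True)]
        layers += [
            # 3x3 depthwise convolution: one filter per channel (groups=hidden).
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection back to a narrow output (no activation).
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv(x)
        return x + out if self.use_residual else out


# Same configuration as block (2) in the printout: 16 -> 96 -> 24, stride 2.
block = InvertedResidualSketch(16, 24, stride=2, expand_ratio=6).eval()
x = torch.randn(1, 16, 122, 122)
with torch.no_grad():
    y = block(x)
n_params = sum(p.numel() for p in block.parameters())
print(y.shape, n_params)
```

Configured like block (2), the sketch should halve the spatial size and contain the same number of parameters as the corresponding torchvision block, since it uses the same layers.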

Let's apply the model to our dataset and make sure that it works.

sample_image = dataset[0][0].unsqueeze(0)
res = model(sample_image)
print(res[0].argmax())
tensor(281)

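The result (281) is an ImageNet class number, which we discussed in the previous unit (in the standard ImageNet-1k labeling, index 281 is "tabby cat"). Note that the number of parameters differs significantly between MobileNet and the full-scale ResNet model. In some respects MobileNet is even more compact than the VGG family of models. That compactness has a cost, however: reducing the number of parameters naturally leads to some drop in model accuracy.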
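
Much of MobileNet's parameter saving comes from replacing standard convolutions with depthwise separable ones (the `groups=` convolutions followed by 1x1 convolutions in the printout above). As a rough illustration, the sketch below compares weight counts for a single 3x3 layer. The helper names are hypothetical, and biases and batch-norm parameters are ignored.

```python
def standard_conv_params(in_ch, out_ch, k=3):
    """Weights in a standard k x k convolution (no bias)."""
    return in_ch * out_ch * k * k


def depthwise_separable_params(in_ch, out_ch, k=3):
    """Weights in a k x k depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = in_ch * k * k      # one k x k filter per input channel
    pointwise = in_ch * out_ch     # 1x1 conv mixes the channels
    return depthwise + pointwise


in_ch, out_ch = 96, 24  # channel sizes taken from block (2) of the printout
standard = standard_conv_params(in_ch, out_ch)
separable = depthwise_separable_params(in_ch, out_ch)
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {standard / separable:.1f}x")
```

The gap widens as the channel counts grow, because the standard layer scales with the product of input channels, output channels, and kernel area, while the separable version splits that product into two much smaller terms.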

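4. Transfer learning with MobileNet

Now let's perform the same transfer learning process as in the previous unit, but using MobileNet. First, we freeze all parameters of the model: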

for x in model.parameters():
    x.requires_grad = False
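
Setting `requires_grad = False` tells autograd not to compute gradients for those tensors, so an optimizer has nothing to update them with. A newly created layer has `requires_grad = True` by default, which is why a replacement classifier remains trainable. The following is a minimal sketch on a hypothetical toy network (a stand-in, not MobileNet) that shows the effect:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained network: a small "backbone" plus a head.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
head = nn.Linear(16, 2)
net = nn.Sequential(backbone, head)

# Freeze the backbone in the same way as the loop above.
for p in backbone.parameters():
    p.requires_grad = False

# One backward pass: gradients flow only into the unfrozen head.
loss = net(torch.randn(4, 8)).sum()
loss.backward()

frozen_grads = [p.grad for p in backbone.parameters()]
head_grads = [p.grad for p in head.parameters()]
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(frozen_grads, trainable)
```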

Then we replace the final classifier. We also transfer the model to the default training device (GPU or CPU):

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.classifier = nn.Linear(1280,2)
model = model.to(device)
summary(model,input_size=(1,3,244,244))
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
├─Sequential: 1-1                        [1, 1280, 8, 8]           --
|    └─ConvBNReLU: 2-1                   [1, 32, 122, 122]         --
|    |    └─Conv2d: 3-1                  [1, 32, 122, 122]         (864)
|    |    └─BatchNorm2d: 3-2             [1, 32, 122, 122]         (64)
|    |    └─ReLU6: 3-3                   [1, 32, 122, 122]         --
|    └─InvertedResidual: 2-2             [1, 16, 122, 122]         --
|    |    └─Sequential: 3-4              [1, 16, 122, 122]         (896)
|    └─InvertedResidual: 2-3             [1, 24, 61, 61]           --
|    |    └─Sequential: 3-5              [1, 24, 61, 61]           (5,136)
|    └─InvertedResidual: 2-4             [1, 24, 61, 61]           --
|    |    └─Sequential: 3-6              [1, 24, 61, 61]           (8,832)
|    └─InvertedResidual: 2-5             [1, 32, 31, 31]           --
|    |    └─Sequential: 3-7              [1, 32, 31, 31]           (10,000)
|    └─InvertedResidual: 2-6             [1, 32, 31, 31]           --
|    |    └─Sequential: 3-8              [1, 32, 31, 31]           (14,848)
|    └─InvertedResidual: 2-7             [1, 32, 31, 31]           --
|    |    └─Sequential: 3-9              [1, 32, 31, 31]           (14,848)
|    └─InvertedResidual: 2-8             [1, 64, 16, 16]           --
|    |    └─Sequential: 3-10             [1, 64, 16, 16]           (21,056)
|    └─InvertedResidual: 2-9             [1, 64, 16, 16]           --
|    |    └─Sequential: 3-11             [1, 64, 16, 16]           (54,272)
|    └─InvertedResidual: 2-10            [1, 64, 16, 16]           --
|    |    └─Sequential: 3-12             [1, 64, 16, 16]           (54,272)
|    └─InvertedResidual: 2-11            [1, 64, 16, 16]           --
|    |    └─Sequential: 3-13             [1, 64, 16, 16]           (54,272)
|    └─InvertedResidual: 2-12            [1, 96, 16, 16]           --
|    |    └─Sequential: 3-14             [1, 96, 16, 16]           (66,624)
|    └─InvertedResidual: 2-13            [1, 96, 16, 16]           --
|    |    └─Sequential: 3-15             [1, 96, 16, 16]           (118,272)
|    └─InvertedResidual: 2-14            [1, 96, 16, 16]           --
|    |    └─Sequential: 3-16             [1, 96, 16, 16]           (118,272)
|    └─InvertedResidual: 2-15            [1, 160, 8, 8]            --
|    |    └─Sequential: 3-17             [1, 160, 8, 8]            (155,264)
|    └─InvertedResidual: 2-16            [1, 160, 8, 8]            --
|    |    └─Sequential: 3-18             [1, 160, 8, 8]            (320,000)
|    └─InvertedResidual: 2-17            [1, 160, 8, 8]            --
|    |    └─Sequential: 3-19             [1, 160, 8, 8]            (320,000)
|    └─InvertedResidual: 2-18            [1, 320, 8, 8]            --
|    |    └─Sequential: 3-20             [1, 320, 8, 8]            (473,920)
|    └─ConvBNReLU: 2-19                  [1, 1280, 8, 8]           --
|    |    └─Conv2d: 3-21                 [1, 1280, 8, 8]           (409,600)
|    |    └─BatchNorm2d: 3-22            [1, 1280, 8, 8]           (2,560)
|    |    └─ReLU6: 3-23                  [1, 1280, 8, 8]           --
├─Linear: 1-2                            [1, 2]                    2,562
==========================================================================================
Total params: 2,226,434
Trainable params: 2,562
Non-trainable params: 2,223,872
Total mult-adds (M): 196.40
==========================================================================================
Input size (MB): 0.71
Forward/backward pass size (MB): 20.12
Params size (MB): 8.91
Estimated Total Size (MB): 29.74
==========================================================================================
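
Several numbers in this summary can be checked by hand. The sketch below is plain arithmetic, assuming 4-byte float32 values and megabytes of 10^6 bytes; the helper name `conv_out_size` is illustrative. It reproduces the spatial sizes after the five stride-2 convolutions, the size of the new classifier head, and the reported memory figures.

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Output width/height of a convolution (standard floor formula)."""
    return (n + 2 * padding - kernel) // stride + 1


# Spatial size after each of the five stride-2 convolutions, starting at 244.
sizes = [244]
for _ in range(5):
    sizes.append(conv_out_size(sizes[-1]))

# New classifier head: Linear(1280, 2) has a weight matrix plus a bias vector.
head_params = 1280 * 2 + 2

total_params = 2_226_434                     # "Total params" reported above
frozen_params = total_params - head_params   # compare with "Non-trainable params"
params_mb = total_params * 4 / 1e6           # float32 = 4 bytes per parameter
input_mb = 1 * 3 * 244 * 244 * 4 / 1e6       # one 3 x 244 x 244 float32 image

print(sizes, head_params, frozen_params, round(params_mb, 2), round(input_mb, 2))
```

Only the 2,562 head parameters are updated during training, which is roughly 0.1% of the network.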

Now let's do the actual training:

train_long(model,train_loader,test_loader,loss_fn=torch.nn.CrossEntropyLoss(),epochs=1,print_freq=90)
Epoch 0, minibatch 0: train acc = 0.5, train loss = 0.02309325896203518
Epoch 0, minibatch 90: train acc = 0.9443681318681318, train loss = 0.006317565729329874
Epoch 0, minibatch 180: train acc = 0.9488950276243094, train loss = 0.00590015182178982
Epoch 0, minibatch 270: train acc = 0.9492619926199262, train loss = 0.006072205810969167
Epoch 0, minibatch 360: train acc = 0.9500519390581718, train loss = 0.00641324315374908
Epoch 0, minibatch 450: train acc = 0.9494872505543237, train loss = 0.006945275943189397
Epoch 0, minibatch 540: train acc = 0.9521141404805915, train loss = 0.0067323536617257896
Epoch 0 done, validation acc = 0.98245, validation loss = 0.002347727584838867
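
`train_long` comes from the downloaded `pytorchcv.py` helper, and its implementation is not shown here. As a rough sketch of what a loop of this kind typically does, the example below trains only the parameters that still require gradients and reports running accuracy and accumulated loss divided by the number of samples seen. The function name `train_one_epoch`, the Adam optimizer, the learning rate, and the synthetic data are all assumptions for illustration, not the helper's actual code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(net, loader, loss_fn, lr=0.01, print_freq=2, device="cpu"):
    """Hypothetical single-epoch loop that updates only trainable parameters."""
    trainable = [p for p in net.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    net.train()
    total_loss, correct, count = 0.0, 0, 0
    for i, (features, labels) in enumerate(loader):
        features, labels = features.to(device), labels.to(device)
        optimizer.zero_grad()
        out = net(features)
        loss = loss_fn(out, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        correct += (out.argmax(dim=1) == labels).sum().item()
        count += len(labels)
        if i % print_freq == 0:
            print(f"minibatch {i}: train acc = {correct / count:.3f}, "
                  f"train loss = {total_loss / count:.5f}")
    return correct / count, total_loss / count


# Synthetic two-class data standing in for real image features.
torch.manual_seed(0)
X = torch.randn(64, 10)
y = (X[:, 0] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=16)

net = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 2))
for p in net[0].parameters():   # freeze the first layer, train only the head
    p.requires_grad = False
first_layer_before = net[0].weight.clone()

acc, avg_loss = train_one_epoch(net, loader, nn.CrossEntropyLoss())
```

Passing only the trainable parameters to the optimizer is a design choice: it keeps the frozen weights untouched and avoids allocating optimizer state for them.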

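5. Conclusion

Note that MobileNet reaches almost the same accuracy as VGG-16 and is only slightly less accurate than full-scale ResNet. The main advantage of small models such as MobileNet or ResNet-18 is that they can be used on mobile devices.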
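
To use a trained model elsewhere, its weights first have to be stored and later loaded back. The following is a minimal sketch of one common approach, saving and restoring a `state_dict`. The tiny stand-in network and the file name `model_weights.pth` are assumptions for illustration; the fine-tuned MobileNet would be handled the same way, provided the same architecture is re-created before loading.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the fine-tuned network.
net = nn.Sequential(nn.Linear(1280, 2))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_weights.pth")   # file name is an assumption
    torch.save(net.state_dict(), path)              # store only the weights

    # Re-create the same architecture, then load the stored weights into it.
    restored = nn.Sequential(nn.Linear(1280, 2))
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()  # switch to inference mode before making predictions
x = torch.randn(1, 1280)
with torch.no_grad():
    same = torch.equal(net(x), restored(x))
print(same)
```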
