CNN-Based Image Classification

1. Problem Description


Original article: CNN-based food classification


We build a CNN to classify images and identify the category of food in each picture.

1.1 Data Description

The image files are renamed in the [class]_[index] format.
(Figure: example of the renamed image files)
· Total data size: training set: 9866; test set: 3430
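Since each class label is encoded at the front of the filename, it can be recovered by splitting on the underscore. A minimal sketch (the filenames below are made up for illustration):

```python
def parse_label(filename):
    """Extract the class label from a '[class]_[index].jpg' filename."""
    return int(filename.split("_")[0])

# hypothetical filenames from the training folder
files = ["0_0.jpg", "0_1.jpg", "3_127.jpg", "10_42.jpg"]
labels = [parse_label(f) for f in files]
print(labels)  # -> [0, 0, 3, 10]
```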

2. Code Implementation

2.1 Required Libraries

# Import the required libraries
import os
import numpy as np
import cv2
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import pandas as pd
from torch.utils.data import DataLoader, Dataset
import time

from tensorboardX import SummaryWriter

2.2 Reading Images with cv2

The following example demonstrates basic OpenCV usage:

It creates a window named "Image", reads the file '0_0.jpg' with cv2, and displays it.

# An example of reading and displaying an image with cv2
img = cv2.imread(os.path.join('./food-11', 'training', '0_0.jpg'))
cv2.namedWindow("Image")
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Result:

(Figure: the image '0_0.jpg' displayed in the window)

2.2.1 Defining the Image-Loading Function

def readfile(path, label):
    # label is a boolean indicating whether to also return the y labels
    image_dir = sorted(os.listdir(path))
    x = np.zeros((len(image_dir), 128, 128, 3), dtype=np.uint8)
    y = np.zeros((len(image_dir)), dtype=np.uint8)
    for i, file in enumerate(image_dir):
        img = cv2.imread(os.path.join(path, file))
        # resize to 128x128 and convert BGR -> RGB
        x[i] = cv2.resize(img, (128, 128))[:, :, ::-1]
        if label:
            y[i] = int(file.split("_")[0])
    if label:
        return x, y
    else:
        return x

Notes:

1. To save memory, each image is resized to [128 × 128 × 3] when read with cv2.

2. The function separates the input (images) from the output (labels).

3. Note that cv2 reads images in [BGR] channel order; `img[:, :, ::-1]` converts them to [RGB].
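The channel reversal can be checked without OpenCV: a pixel that is pure blue in BGR order ends up as pure blue in RGB order after the `[:, :, ::-1]` slice. A sketch with a 1×1 NumPy "image":

```python
import numpy as np

# a 1x1 "image" whose single pixel is pure blue in BGR order: B=255, G=0, R=0
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# reverse the channel axis: BGR -> RGB
rgb = bgr[:, :, ::-1]

print(rgb[0, 0].tolist())  # -> [0, 0, 255]  (R=0, G=0, B=255)
```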

2.2.2 Loading the Training and Validation Sets

workspace_dir = './food-11'
print("Reading data")
train_x, train_y = readfile(os.path.join(workspace_dir, "training"), True)
print("Size of training data = {}".format(len(train_x)))
val_x, val_y = readfile(os.path.join(workspace_dir, "validation"), True)
print("Size of test data = {}".format(len(val_x)))

Result:

Reading data
Size of training data = 9866
Size of test data = 3430

2.3 Dataset

2.3.1 The ImgDataset Class

train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(p=0.45), # flip horizontally with probability 0.45
    transforms.RandomRotation(15),           # rotate randomly between -15 and 15 degrees
    transforms.ToTensor(),                   # convert to a Tensor and normalize values to [0, 1]
])
test_transform = transforms.Compose([        # no augmentation for validation/test data
    transforms.ToPILImage(),
    transforms.ToTensor(),
])

class ImgDataset(Dataset):
    def __init__(self, x, y=None, transform=None):
        self.x = x
        # label is required to be a LongTensor
        self.y = y
        if y is not None:
            self.y = torch.LongTensor(y)
        self.transform = transform
    def __len__(self):
        return len(self.x)
    def __getitem__(self, index):
        X = self.x[index]
        if self.transform is not None:
            X = self.transform(X)
        if self.y is not None:
            Y = self.y[index]
            return X, Y
        else:
            return X

Explanation:

This part subclasses Dataset with ImgDataset to give the training and validation sets a uniform interface. transforms.Compose applies the image transformations; see the official documentation for details.

2.3.2 Creating train_set and train_loader

batch_size = 128
train_set = ImgDataset(train_x, train_y, train_transform)
val_set = ImgDataset(val_x, val_y, test_transform)
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)

This groups the training set into mini-batches of 128. For details, run:

DataLoader??

to see how DataLoader is used and what its arguments mean.
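How DataLoader batches a dataset can be seen with a toy TensorDataset (the names below are illustrative, not from the original code): with 10 samples and batch_size=4, the last batch holds the remaining 2 samples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

xs = torch.arange(10, dtype=torch.float32).unsqueeze(1)  # 10 samples, 1 feature
ys = torch.arange(10)
loader = DataLoader(TensorDataset(xs, ys), batch_size=4, shuffle=False)

sizes = [xb.shape[0] for xb, yb in loader]
print(sizes)  # -> [4, 4, 2]
```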

2.4 The CNN and Fully Connected Network Model

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        # torch.nn.MaxPool2d(kernel_size, stride, padding)
        # input dimensions: [3, 128, 128]
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),  # [64, 128, 128]
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [64, 64, 64]

            nn.Conv2d(64, 128, 3, 1, 1), # [128, 64, 64]
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [128, 32, 32]

            nn.Conv2d(128, 256, 3, 1, 1), # [256, 32, 32]
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [256, 16, 16]

            nn.Conv2d(256, 512, 3, 1, 1), # [512, 16, 16]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 8, 8]
            
            nn.Conv2d(512, 512, 3, 1, 1), # [512, 8, 8]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 4, 4]
        )
        self.fc = nn.Sequential(
            nn.Linear(512*4*4, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 11)
        )

    def forward(self, x):
        out = self.cnn(x)
        out = out.view(out.size()[0], -1)
        return self.fc(out)

The code above shows how to build the network part with PyTorch.

import torchvision.models as models
nnt_model = models.AlexNet(num_classes = 11) # many other models are available besides AlexNet

In fact, with code like the above you can pick a suitable ready-made CNN model and import it, instead of designing the network yourself.

2.4.1 Visualizing the Network

First, write the network graph with tensorboardX:

net = Classifier()
dummy_input = torch.rand(13, 3, 128, 128) # a dummy batch of 13 images of size 3*128*128
with SummaryWriter(comment='food_classify') as w:
    w.add_graph(net, (dummy_input, ))

Then run the following command in a terminal (location is the log directory; by default the logs are written next to the code):

tensorboard --logdir=location
then open the following address in a browser:
http://localhost:6006/

The network structure graph is displayed:

(Figure: the network graph in TensorBoard; expanding the Classifier node shows its internal layers)
Running:

print(net)

prints the structure of Classifier:

Classifier(
  (cnn): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU()
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (8): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): ReLU()
    (11): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (12): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (14): ReLU()
    (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (16): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (18): ReLU()
    (19): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=8192, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=512, bias=True)
    (5): ReLU()
    (6): Linear(in_features=512, out_features=11, bias=True)
  )
)

Both views show that the network consists of two parts: the CNN and the fully connected network.
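The in_features=8192 of the first Linear layer follows from the pooling arithmetic: each of the five MaxPool2d(2, 2) layers halves the 128×128 spatial size, leaving 4×4, and 512 channels × 4 × 4 = 8192. Checked directly:

```python
size = 128
for _ in range(5):        # five MaxPool2d(2, 2) layers
    size //= 2            # each halves the spatial resolution
print(size)               # -> 4
print(512 * size * size)  # -> 8192, matching nn.Linear(512*4*4, 1024)
```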

2.5 Training the Network and Results

model = Classifier().cuda()
loss = nn.CrossEntropyLoss() # classification task, so use cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) # use the Adam optimizer
num_epoch = 30

for epoch in range(num_epoch):
    epoch_start_time = time.time()
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0

    model.train() # make sure the model is in training mode (enables Dropout, etc.)
    for i, data in enumerate(train_loader):
        optimizer.zero_grad() # zero the gradients of the model parameters
        train_pred = model(data[0].cuda()) # forward pass: get the predicted class distribution (calls model's forward)
        batch_loss = loss(train_pred, data[1].cuda()) # compute the loss (prediction and label must be on the same device)
        batch_loss.backward() # back-propagate to compute each parameter's gradient
        optimizer.step() # update the parameters with the gradients

        train_acc += np.sum(np.argmax(train_pred.cpu().data.numpy(), axis=1) == data[1].numpy())
        train_loss += batch_loss.item()

    model.eval() # switch to evaluation mode (disables Dropout, etc.)
    with torch.no_grad():
        for i, data in enumerate(val_loader):
            val_pred = model(data[0].cuda())
            batch_loss = loss(val_pred, data[1].cuda())

            val_acc += np.sum(np.argmax(val_pred.cpu().data.numpy(), axis=1) == data[1].numpy())
            val_loss += batch_loss.item()

        # print the results
        print('[%03d/%03d] %2.2f sec(s) Train Acc: %3.6f Loss: %3.6f | test Acc: %3.6f loss: %3.6f' % \
            (epoch + 1, num_epoch, time.time()-epoch_start_time, \
             train_acc/len(train_set), train_loss/len(train_set), val_acc/len(val_set), val_loss/len(val_set)))
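The accuracy update in the loop compares the argmax of the predicted distribution against the labels; the same computation in isolation, on a tiny made-up batch:

```python
import numpy as np

# hypothetical logits for a batch of 4 samples over 3 classes
pred = np.array([[2.0, 0.1, 0.3],
                 [0.2, 1.5, 0.1],
                 [0.3, 0.2, 0.9],
                 [1.1, 0.4, 0.2]])
labels = np.array([0, 1, 2, 1])  # the last sample is misclassified

correct = np.sum(np.argmax(pred, axis=1) == labels)
print(correct / len(labels))  # -> 0.75
```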

Results:

[001/030] 16.98 sec(s) Train Acc: 0.280559 Loss: 0.016183 | test Acc: 0.295335 loss: 0.016735
[002/030] 16.86 sec(s) Train Acc: 0.354754 Loss: 0.014479 | test Acc: 0.319825 loss: 0.017289
[003/030] 16.29 sec(s) Train Acc: 0.400973 Loss: 0.013473 | test Acc: 0.413120 loss: 0.013207
[004/030] 16.08 sec(s) Train Acc: 0.452666 Loss: 0.012537 | test Acc: 0.411953 loss: 0.013202
[005/030] 16.46 sec(s) Train Acc: 0.494121 Loss: 0.011631 | test Acc: 0.460933 loss: 0.012513
[006/030] 16.54 sec(s) Train Acc: 0.515305 Loss: 0.010921 | test Acc: 0.390671 loss: 0.014933
[007/030] 15.99 sec(s) Train Acc: 0.532435 Loss: 0.010492 | test Acc: 0.471720 loss: 0.013182
[008/030] 16.30 sec(s) Train Acc: 0.566491 Loss: 0.009825 | test Acc: 0.504082 loss: 0.011438
[009/030] 16.31 sec(s) Train Acc: 0.586965 Loss: 0.009388 | test Acc: 0.474927 loss: 0.012664
[010/030] 15.97 sec(s) Train Acc: 0.595885 Loss: 0.009042 | test Acc: 0.541108 loss: 0.011149
[011/030] 16.48 sec(s) Train Acc: 0.622846 Loss: 0.008508 | test Acc: 0.499125 loss: 0.012531
[012/030] 16.68 sec(s) Train Acc: 0.636225 Loss: 0.008209 | test Acc: 0.515160 loss: 0.011924
[013/030] 16.58 sec(s) Train Acc: 0.649503 Loss: 0.007908 | test Acc: 0.434111 loss: 0.015755
[014/030] 16.60 sec(s) Train Acc: 0.674437 Loss: 0.007412 | test Acc: 0.558892 loss: 0.011394
[015/030] 16.32 sec(s) Train Acc: 0.668660 Loss: 0.007474 | test Acc: 0.609621 loss: 0.009658
[016/030] 16.70 sec(s) Train Acc: 0.713764 Loss: 0.006494 | test Acc: 0.593294 loss: 0.009982
[017/030] 16.82 sec(s) Train Acc: 0.724204 Loss: 0.006316 | test Acc: 0.534985 loss: 0.014265
[018/030] 16.29 sec(s) Train Acc: 0.724103 Loss: 0.006184 | test Acc: 0.611370 loss: 0.010203
[019/030] 16.37 sec(s) Train Acc: 0.742347 Loss: 0.005739 | test Acc: 0.496210 loss: 0.015555
[020/030] 16.44 sec(s) Train Acc: 0.759578 Loss: 0.005440 | test Acc: 0.626822 loss: 0.010625
[021/030] 16.59 sec(s) Train Acc: 0.767180 Loss: 0.005330 | test Acc: 0.602915 loss: 0.010722
[022/030] 16.90 sec(s) Train Acc: 0.761808 Loss: 0.005411 | test Acc: 0.638484 loss: 0.010020
[023/030] 16.99 sec(s) Train Acc: 0.783093 Loss: 0.004916 | test Acc: 0.541399 loss: 0.013694
[024/030] 17.04 sec(s) Train Acc: 0.800122 Loss: 0.004536 | test Acc: 0.444023 loss: 0.020840
[025/030] 16.95 sec(s) Train Acc: 0.794952 Loss: 0.004620 | test Acc: 0.615452 loss: 0.011386
[026/030] 17.13 sec(s) Train Acc: 0.832354 Loss: 0.003756 | test Acc: 0.634694 loss: 0.010731
[027/030] 17.02 sec(s) Train Acc: 0.837523 Loss: 0.003630 | test Acc: 0.633819 loss: 0.011042
[028/030] 17.02 sec(s) Train Acc: 0.844517 Loss: 0.003431 | test Acc: 0.631195 loss: 0.012094
[029/030] 17.03 sec(s) Train Acc: 0.858200 Loss: 0.003251 | test Acc: 0.632070 loss: 0.011708
[030/030] 16.58 sec(s) Train Acc: 0.808433 Loss: 0.004465 | test Acc: 0.611953 loss: 0.011797

After training, the model ends with 80.8% accuracy on the training set and 61.2% on the test set.
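The trained weights are worth persisting so the 61.2% model does not need retraining. A minimal sketch of PyTorch's state_dict save/load round-trip, using a stand-in nn.Linear in place of the trained Classifier:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the trained Classifier
path = os.path.join(tempfile.gettempdir(), "food_classifier.pt")
torch.save(model.state_dict(), path)  # save only the parameters

clone = nn.Linear(4, 2)               # must rebuild the same architecture
clone.load_state_dict(torch.load(path))

x = torch.rand(1, 4)
print(torch.allclose(model(x), clone(x)))  # -> True
```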

3. Summary

Judging from the results, the model's accuracy is only 61.2%, which leaves a fairly large error. Some possible optimizations:

· Read the images in at [256 × 256 × 3] or larger

· Generate more training data via the transformations in Section 2.3

· Build a deeper, wider network

· Other techniques

4. Outlook

Image classification can enable many more interesting applications; below are some personal ideas.

4.1 Person-Background Segmentation

The idea is as follows:
(Figure: sketch of the person-background segmentation pipeline)
Using images as input and the segmented person as output, train on a large amount of data to obtain a network f; feeding an image into f then yields the segmented person.

Applications:

  1. ID photos:

(Figure: example ID photo with a replaced background)

Pick a color you like, such as green, as your background~

  2. Improving person recognition

Segmenting first and then running CNN face recognition may improve accuracy. For example, a network g could recognize person A and overlay the name above their face, or read out the names of the people in a picture.

(Figure: sketch of recognizing and labeling a person after segmentation)

  3. Criminal investigation or finding missing persons

First, collect videos of the target person (a video is essentially an ordered sequence of frames) or a large number of images, and train a network that can recognize the target. Then connect it to cameras to analyze the video feed in real time; when the target is detected, an alert with the match is sent.

5. References

The code and methodology mainly follow Prof. Hung-yi Lee (李宏毅) and his team.

Bilibili: https://www.bilibili.com/video/BV1JE411g7XF?p=21
Code and data for this article: https://github.com/bternity/ML2020/CNN_photo
