Project 3: Food Image Classification

Friendly Reminder

Students are encouraged to head over to the course assignment area and try it out themselves first!

Project Description

Train a simple convolutional neural network to classify food images.

Dataset Introduction

This project uses the food-11 dataset, which contains 11 classes:

Bread, Dairy product, Dessert, Egg, Fried food, Meat, Noodles/Pasta, Rice, Seafood, Soup, and Vegetable/Fruit.
Training set: 9866 images
Validation set: 3430 images
Testing set: 3347 images

Data Format
After downloading and unzipping the archive, you get three folders: training, validation, and testing.
Images in training and validation are named [class]_[index].jpg; for example, 3_100.jpg is an image of class 3 (the index itself doesn't matter).
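
Since the label is encoded in the filename, it can be recovered with a simple string split; a minimal sketch:

# Recover the class label from a filename such as "3_100.jpg".
filename = "3_100.jpg"
label = int(filename.split("_")[0])
print(label)  # 3 -- everything after the first "_" is just an image index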

Project Requirements

  • Build your model with a CNN
  • Do not use any additional dataset
  • Pre-trained models are forbidden (you must write the CNN yourself)
  • Please do not look for the labels online

Data Preparation

!unzip -d work data/data111173/food-11.zip # unzip the food-11 dataset
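
After unzipping, it's worth confirming that the three folders and image counts match the dataset introduction above; a quick sanity check, assuming the paths used in this notebook:

import os

for split in ('training', 'validation', 'testing'):
    path = os.path.join('work/food-11', split)
    print(split, len(os.listdir(path)))  # expect 9866, 3430 and 3347 entries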

Environment Setup / Installation

Defining the Dataset

In Paddle, we can use Dataset and DataLoader from paddle.io to "wrap" the data, which makes subsequent training and prediction more convenient.
A Dataset subclass must override two functions: __len__ and __getitem__.
__len__ must return the size of the dataset, while __getitem__ defines what the dataset returns when it is indexed with [idx].
We never call these two functions directly ourselves, but the DataLoader uses them while enumerating the Dataset; if they are missing, you will get an error at run time.
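
Before looking at the real FoodDataset below, here is a minimal sketch of that protocol (ToyDataset is a hypothetical example, not part of this project):

import paddle
from paddle.io import Dataset

class ToyDataset(Dataset):
    def __len__(self):
        return 4  # dataset size

    def __getitem__(self, idx):
        # return (sample, label) for index idx
        return paddle.to_tensor([float(idx)]), idx % 2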

# Import the required packages
import os
import cv2
import time
import numpy as np
import paddle
from paddle.io import Dataset, DataLoader
from paddle.nn import Sequential, Conv2D, BatchNorm2D, ReLU, MaxPool2D, Linear, Flatten
from paddle.vision.transforms import Compose, Transpose, RandomRotation, RandomHorizontalFlip, Normalize, Resize

# Assign the GPU device
place = paddle.CUDAPlace(0)
paddle.disable_static(place)
paddle.__version__
'2.0.2'
# Data preprocessing and data augmentation
class FoodDataset(Dataset):
    def __init__(self, image_path, image_size=(128, 128), mode='train'):
        self.image_path = image_path
        # os.listdir returns a list with the names of all entries under the given path, in arbitrary order, so we sort it.
        self.image_file_list = sorted(os.listdir(image_path))
        self.mode = mode

        # apply data augmentation during training
        self.train_transforms = Compose([
            Resize(size=image_size),
            RandomHorizontalFlip(), # random horizontal flip
            RandomRotation(15),     # random rotation within ±15 degrees
            Transpose(),            # change the image layout to the target format; the output is a numpy.ndarray.
                                    # Most preprocessing works on HWC images, while the network takes CHW tensors as input.
            Normalize(mean=127.5, std=127.5) # scale pixel values from [0, 255] to [-1, 1]
        ])
        # no data augmentation at test time
        self.test_transforms = Compose([
            Resize(size=image_size),
            Transpose(),
            Normalize(mean=127.5, std=127.5)
        ])
        
    def __len__(self):
        return len(self.image_file_list)

    # os.path.join(): joins path components, inserting a '/' between components as needed;
    # if a component is an absolute path, every component before it is discarded;
    # if the last component is empty, the resulting path ends with a '/'
    def __getitem__(self, idx):
        img = cv2.imread(os.path.join(self.image_path, self.image_file_list[idx]))
        if self.mode == 'train':
            img = self.train_transforms(img)
            label = int(self.image_file_list[idx].split("_")[0])
            return img, label
        else:
            img = self.test_transforms(img)
            return img
batch_size = 128
traindataset = FoodDataset('work/food-11/training')
valdataset = FoodDataset('work/food-11/validation')

# drop_last=True: when the last batch is smaller than batch_size, drop it.
# Note this also applies to val_loader below, so the validation metrics are
# computed over slightly fewer samples than len(valdataset).
train_loader = DataLoader(traindataset, places=paddle.CUDAPlace(0), batch_size=batch_size, shuffle=True, drop_last=True)
val_loader = DataLoader(valdataset, places=paddle.CUDAPlace(0), batch_size=batch_size, shuffle=False, drop_last=True)
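
As a quick sanity check (a sketch, using the loaders defined above), we can pull one batch and confirm the shapes:

# One training batch: images are NCHW, i.e. [128, 3, 128, 128].
imgs, labels = next(train_loader())
print(imgs.shape, labels.shape)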

Model Architecture

Convolutional networks often use "Conv + BN + activation + pooling" as a basic block; we stack several such blocks to extract features, then attach Linear layers at the end to perform the classification.
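
Each block below halves the spatial resolution with 2x2/stride-2 max pooling, so a 128x128 input shrinks to 4x4 after five blocks, which is exactly where the Linear(512*4*4, ...) input size comes from. A quick check of the arithmetic:

# 128 halved five times is 4, so the flattened feature size is 512 * 4 * 4 = 8192.
size = 128
for _ in range(5):
    size //= 2
print(size, 512 * size * size)  # 4 8192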

class Classifier(paddle.nn.Layer):
    def __init__(self):
        super(Classifier, self).__init__()

        # Build the CNN.
        # input shape [3, 128, 128]
        # Sequential(): builds a feed-forward network; layers are added one after another to the paddle.nn.Sequential container.
        # Conv2D(): 3 input channels, 64 output channels, 3x3 kernel, stride 1, padding 1,
        # which determines the output feature-map size. Input and output are NCHW or NHWC, where N is batch size, C channels, H height, W width.
        self.cnn = Sequential(
            Conv2D(3, 64, 3, 1, 1),  # [64, 128, 128]
            BatchNorm2D(64),         # 64 is the number of channels of the input tensor. BatchNorm2D batch-normalizes
                                     # the conv output per channel, using the current batch's mean and variance.
            ReLU(),
            MaxPool2D(2, 2, 0),      # [64, 64, 64]; 2x2 max pooling with stride 2 and padding 0

            Conv2D(64, 128, 3, 1, 1), # [128, 64, 64]
            BatchNorm2D(128),
            ReLU(),
            MaxPool2D(2, 2, 0),      # [128, 32, 32]

            Conv2D(128, 256, 3, 1, 1), # [256, 32, 32]
            BatchNorm2D(256),
            ReLU(),
            MaxPool2D(2, 2, 0),      # [256, 16, 16]

            Conv2D(256, 512, 3, 1, 1), # [512, 16, 16]
            BatchNorm2D(512),
            ReLU(),
            MaxPool2D(2, 2, 0),       # [512, 8, 8]
            
            Conv2D(512, 512, 3, 1, 1), # [512, 8, 8]
            BatchNorm2D(512),
            ReLU(),
            MaxPool2D(2, 2, 0),       # [512, 4, 4]
        )
    
        # Build the fully connected head fc.
        # Linear(512*4*4, 1024): a linear layer with 512*4*4 input features and 1024 output features.
        self.fc = Sequential(
            Linear(512*4*4, 1024),  # 8192 -> 1024
            ReLU(),
            Linear(1024, 512),      # 1024 -> 512
            ReLU(),
            Linear(512, 11)         # 512 -> 11 classes
        )

    # Forward pass
    def forward(self, x):
        x = self.cnn(x)             
        x = paddle.flatten(x, start_axis=1) # flatten [N, 512, 4, 4] into [N, 512*4*4], keeping the batch axis
        x = self.fc(x)  # run the fully connected classification head
        return x
my_model = paddle.Model(Classifier())  # wrap the model
my_model.summary((-1, 3, 128, 128))    # print a model summary; (-1, 3, 128, 128) is the input tensor shape
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Conv2D-1      [[1, 3, 128, 128]]   [1, 64, 128, 128]        1,792     
 BatchNorm2D-1  [[1, 64, 128, 128]]   [1, 64, 128, 128]         256      
    ReLU-1      [[1, 64, 128, 128]]   [1, 64, 128, 128]          0       
  MaxPool2D-1   [[1, 64, 128, 128]]    [1, 64, 64, 64]           0       
   Conv2D-2      [[1, 64, 64, 64]]     [1, 128, 64, 64]       73,856     
 BatchNorm2D-2   [[1, 128, 64, 64]]    [1, 128, 64, 64]         512      
    ReLU-2       [[1, 128, 64, 64]]    [1, 128, 64, 64]          0       
  MaxPool2D-2    [[1, 128, 64, 64]]    [1, 128, 32, 32]          0       
   Conv2D-3      [[1, 128, 32, 32]]    [1, 256, 32, 32]       295,168    
 BatchNorm2D-3   [[1, 256, 32, 32]]    [1, 256, 32, 32]        1,024     
    ReLU-3       [[1, 256, 32, 32]]    [1, 256, 32, 32]          0       
  MaxPool2D-3    [[1, 256, 32, 32]]    [1, 256, 16, 16]          0       
   Conv2D-4      [[1, 256, 16, 16]]    [1, 512, 16, 16]      1,180,160   
 BatchNorm2D-4   [[1, 512, 16, 16]]    [1, 512, 16, 16]        2,048     
    ReLU-4       [[1, 512, 16, 16]]    [1, 512, 16, 16]          0       
  MaxPool2D-4    [[1, 512, 16, 16]]     [1, 512, 8, 8]           0       
   Conv2D-5       [[1, 512, 8, 8]]      [1, 512, 8, 8]       2,359,808   
 BatchNorm2D-5    [[1, 512, 8, 8]]      [1, 512, 8, 8]         2,048     
    ReLU-5        [[1, 512, 8, 8]]      [1, 512, 8, 8]           0       
  MaxPool2D-5     [[1, 512, 8, 8]]      [1, 512, 4, 4]           0       
   Linear-1         [[1, 8192]]           [1, 1024]          8,389,632   
    ReLU-6          [[1, 1024]]           [1, 1024]              0       
   Linear-2         [[1, 1024]]            [1, 512]           524,800    
    ReLU-7           [[1, 512]]            [1, 512]              0       
   Linear-3          [[1, 512]]            [1, 11]             5,643     
===========================================================================
Total params: 12,836,747
Trainable params: 12,830,859
Non-trainable params: 5,888
---------------------------------------------------------------------------
Input size (MB): 0.19
Forward/backward pass size (MB): 49.59
Params size (MB): 48.97
Estimated Total Size (MB): 98.74
---------------------------------------------------------------------------
{'total_params': 12836747, 'trainable_params': 12830859}
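
A few of the parameter counts above are easy to verify by hand:

# Spot-check parameter counts against the summary table.
print(3 * 64 * 3 * 3 + 64)     # Conv2D-1: 1,792 (3x3 kernels plus biases)
print(64 * 128 * 3 * 3 + 128)  # Conv2D-2: 73,856
print(8192 * 1024 + 1024)      # Linear-1: 8,389,632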

Model Training

Train on the training set, and use the validation set to find good hyperparameters.

epoch_num = 30
learning_rate = 0.001

model = Classifier()
loss = paddle.nn.loss.CrossEntropyLoss() # classification task, so the loss is cross entropy (CrossEntropyLoss)
optimizer = paddle.optimizer.Adam(learning_rate=learning_rate, parameters=model.parameters()) # use Adam as the optimizer
print('start training...')
for epoch in range(epoch_num):
    epoch_start_time = time.time()
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0

    # training
    model.train()
    for img, label in train_loader():
        optimizer.clear_grad()         # clear the gradients of the parameters being optimized
        pred = model(img)
        step_loss = loss(pred, label)  # compute this step's loss
        step_loss.backward()           # backpropagate to compute every parameter's gradient
        optimizer.step()               # take one gradient-descent update step

        # np.argmax returns the index of the maximum along axis 1, i.e. the predicted class.
        train_acc += np.sum(np.argmax(pred.numpy(), axis=1) == label.numpy())
        train_loss += step_loss.numpy()[0]

    # validation
    model.eval()
    for img, label in val_loader():
        pred = model(img)
        step_loss = loss(pred, label)
        
        val_acc += np.sum(np.argmax(pred.numpy(), axis=1) == label.numpy())
        val_loss += step_loss.numpy()[0]

    # Print the results.
    # len(traindataset) is the total number of training images.
    print('[%03d/%03d] %2.2f sec(s) Train Acc: %3.6f Loss: %3.6f | Val Acc: %3.6f loss: %3.6f' % \
                (epoch + 1, epoch_num, \
                 time.time()-epoch_start_time, \
                 train_acc/len(traindataset), \
                 train_loss/len(traindataset), \
                 val_acc/len(valdataset), \
                 val_loss/len(valdataset)))
[001/030] 78.07 sec(s) Train Acc: 0.197040 Loss: 0.033973 | Val Acc: 0.239067 loss: 0.016069
[002/030] 66.40 sec(s) Train Acc: 0.283093 Loss: 0.015871 | Val Acc: 0.271720 loss: 0.015075
[003/030] 66.77 sec(s) Train Acc: 0.348571 Loss: 0.014516 | Val Acc: 0.353936 loss: 0.013829
[004/030] 67.00 sec(s) Train Acc: 0.412629 Loss: 0.013252 | Val Acc: 0.381341 loss: 0.013192
[005/030] 92.92 sec(s) Train Acc: 0.458950 Loss: 0.012151 | Val Acc: 0.421283 loss: 0.012638
[006/030] 67.13 sec(s) Train Acc: 0.486418 Loss: 0.011528 | Val Acc: 0.453644 loss: 0.011558
[007/030] 71.55 sec(s) Train Acc: 0.520576 Loss: 0.010795 | Val Acc: 0.499708 loss: 0.010775
[008/030] 66.95 sec(s) Train Acc: 0.558281 Loss: 0.009988 | Val Acc: 0.476676 loss: 0.011289
[009/030] 66.69 sec(s) Train Acc: 0.576221 Loss: 0.009365 | Val Acc: 0.512828 loss: 0.010600
[010/030] 67.44 sec(s) Train Acc: 0.600243 Loss: 0.008972 | Val Acc: 0.507289 loss: 0.010593
[011/030] 69.19 sec(s) Train Acc: 0.621630 Loss: 0.008399 | Val Acc: 0.533236 loss: 0.010174
[012/030] 66.98 sec(s) Train Acc: 0.638861 Loss: 0.008078 | Val Acc: 0.534402 loss: 0.010951
[013/030] 66.74 sec(s) Train Acc: 0.666633 Loss: 0.007525 | Val Acc: 0.553353 loss: 0.009786
[014/030] 69.09 sec(s) Train Acc: 0.686702 Loss: 0.007050 | Val Acc: 0.552187 loss: 0.009947
[015/030] 67.26 sec(s) Train Acc: 0.705554 Loss: 0.006693 | Val Acc: 0.561224 loss: 0.010083
[016/030] 66.67 sec(s) Train Acc: 0.711230 Loss: 0.006416 | Val Acc: 0.590962 loss: 0.008998
[017/030] 66.38 sec(s) Train Acc: 0.727955 Loss: 0.006109 | Val Acc: 0.609621 loss: 0.008669
[018/030] 67.56 sec(s) Train Acc: 0.746402 Loss: 0.005621 | Val Acc: 0.601166 loss: 0.009159
[019/030] 66.95 sec(s) Train Acc: 0.764849 Loss: 0.005266 | Val Acc: 0.590962 loss: 0.009681
[020/030] 67.16 sec(s) Train Acc: 0.770322 Loss: 0.005144 | Val Acc: 0.604373 loss: 0.008848
[021/030] 66.34 sec(s) Train Acc: 0.793635 Loss: 0.004687 | Val Acc: 0.630612 loss: 0.008895
[022/030] 67.93 sec(s) Train Acc: 0.808636 Loss: 0.004238 | Val Acc: 0.645773 loss: 0.008846
[023/030] 67.29 sec(s) Train Acc: 0.809953 Loss: 0.004180 | Val Acc: 0.643732 loss: 0.008936
[024/030] 67.64 sec(s) Train Acc: 0.817251 Loss: 0.004004 | Val Acc: 0.625656 loss: 0.008972
[025/030] 66.48 sec(s) Train Acc: 0.834887 Loss: 0.003617 | Val Acc: 0.618367 loss: 0.010019
[026/030] 66.29 sec(s) Train Acc: 0.858099 Loss: 0.003225 | Val Acc: 0.641108 loss: 0.009641
[027/030] 66.80 sec(s) Train Acc: 0.855869 Loss: 0.003111 | Val Acc: 0.625364 loss: 0.010371
[028/030] 67.64 sec(s) Train Acc: 0.873910 Loss: 0.002769 | Val Acc: 0.650729 loss: 0.009417
[029/030] 67.54 sec(s) Train Acc: 0.889317 Loss: 0.002453 | Val Acc: 0.646939 loss: 0.009326
[030/030] 67.47 sec(s) Train Acc: 0.903203 Loss: 0.002232 | Val Acc: 0.635860 loss: 0.010703

Once good hyperparameters have been found, we train on the training and validation data together (more data generally yields a better model; the gap above between roughly 0.90 training accuracy and 0.64 validation accuracy shows there is room to generalize better).

!mkdir 'work/food-11/train_val'
!cp work/food-11/training/* work/food-11/train_val/
!cp work/food-11/validation/* work/food-11/train_val/
traindataset = FoodDataset('work/food-11/train_val')
train_loader = DataLoader(traindataset, places=paddle.CUDAPlace(0), batch_size=batch_size, shuffle=True, drop_last=True)
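
One caveat: training and validation images share the same [class]_[index].jpg naming scheme, so if any filenames coincide, the cp commands above will silently overwrite files. A hedged, collision-safe alternative in Python (the "_v" suffix is our own convention; it keeps the leading "[class]_" prefix that the label parser relies on):

import os, shutil

os.makedirs('work/food-11/train_val', exist_ok=True)
for name in os.listdir('work/food-11/training'):
    shutil.copy(os.path.join('work/food-11/training', name),
                os.path.join('work/food-11/train_val', name))
for name in os.listdir('work/food-11/validation'):
    stem, ext = os.path.splitext(name)  # e.g. ("3_100", ".jpg")
    shutil.copy(os.path.join('work/food-11/validation', name),
                os.path.join('work/food-11/train_val', stem + '_v' + ext))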
epoch_num = 30
learning_rate = 0.001

model_best = Classifier()
loss = paddle.nn.loss.CrossEntropyLoss() # classification task, so the loss is cross entropy (CrossEntropyLoss)
optimizer = paddle.optimizer.Adam(learning_rate=learning_rate, parameters=model_best.parameters()) # use Adam as the optimizer
print('start training...')
for epoch in range(epoch_num):
    epoch_start_time = time.time()
    train_acc = 0.0
    train_loss = 0.0

    # training
    model_best.train()
    for img, label in train_loader():
        optimizer.clear_grad()
        pred = model_best(img)
        step_loss = loss(pred, label)
        step_loss.backward()
        optimizer.step()

        train_acc += np.sum(np.argmax(pred.numpy(), axis=1) == label.numpy())
        train_loss += step_loss.numpy()[0]

    # Print the results.
    print('[%03d/%03d] %2.2f sec(s) Train Acc: %3.6f Loss: %3.6f' % \
                (epoch + 1, epoch_num, \
                 time.time()-epoch_start_time, \
                 train_acc/len(traindataset), \
                 train_loss/len(traindataset)))
[001/030] 52.86 sec(s) Train Acc: 0.209710 Loss: 0.030978
[002/030] 51.75 sec(s) Train Acc: 0.309852 Loss: 0.015400
[003/030] 51.96 sec(s) Train Acc: 0.368538 Loss: 0.014062
[004/030] 51.41 sec(s) Train Acc: 0.416886 Loss: 0.012990
[005/030] 51.69 sec(s) Train Acc: 0.464119 Loss: 0.012143
[006/030] 52.95 sec(s) Train Acc: 0.508109 Loss: 0.011022
[007/030] 52.35 sec(s) Train Acc: 0.542672 Loss: 0.010315
[008/030] 52.61 sec(s) Train Acc: 0.576221 Loss: 0.009657
[009/030] 52.52 sec(s) Train Acc: 0.585242 Loss: 0.009335
[010/030] 54.86 sec(s) Train Acc: 0.621731 Loss: 0.008598
[011/030] 52.65 sec(s) Train Acc: 0.643016 Loss: 0.008111
[012/030] 53.64 sec(s) Train Acc: 0.648997 Loss: 0.007846
[013/030] 51.45 sec(s) Train Acc: 0.678289 Loss: 0.007290
[014/030] 52.33 sec(s) Train Acc: 0.690249 Loss: 0.006890
[015/030] 55.81 sec(s) Train Acc: 0.709305 Loss: 0.006557
[016/030] 51.56 sec(s) Train Acc: 0.726434 Loss: 0.006150
[017/030] 52.10 sec(s) Train Acc: 0.740523 Loss: 0.005867
[018/030] 51.66 sec(s) Train Acc: 0.750456 Loss: 0.005640
[019/030] 51.62 sec(s) Train Acc: 0.759578 Loss: 0.005316
[020/030] 51.60 sec(s) Train Acc: 0.778938 Loss: 0.004905
[021/030] 51.33 sec(s) Train Acc: 0.795865 Loss: 0.004588
[022/030] 51.84 sec(s) Train Acc: 0.806304 Loss: 0.004316
[023/030] 51.70 sec(s) Train Acc: 0.810562 Loss: 0.004183
[024/030] 52.05 sec(s) Train Acc: 0.830022 Loss: 0.003741
[025/030] 52.42 sec(s) Train Acc: 0.848672 Loss: 0.003405
[026/030] 56.51 sec(s) Train Acc: 0.854855 Loss: 0.003292
[027/030] 52.37 sec(s) Train Acc: 0.868032 Loss: 0.002951
[028/030] 51.69 sec(s) Train Acc: 0.873606 Loss: 0.002777
[029/030] 51.63 sec(s) Train Acc: 0.881005 Loss: 0.002633
[030/030] 51.57 sec(s) Train Acc: 0.899453 Loss: 0.002222

Testing

Use the model we just trained to make predictions.

batch_size = 128
testdataset = FoodDataset('work/food-11/testing', mode='test')
test_loader = DataLoader(testdataset, places=paddle.CUDAPlace(0), batch_size=batch_size, shuffle=False, drop_last=False) # drop_last=False: keep the final partial batch so every test image gets a prediction
prediction = list()
model_best.eval()
for img in test_loader():
    pred = model_best(img[0])
    test_label = np.argmax(pred.numpy(), axis=1) # predicted class index for each image in the batch
    for y in test_label:
        prediction.append(y)
# Write the results to a CSV file
with open('work/predict.csv', 'w') as f:
    f.write('Id,Category\n')
    for i, y in  enumerate(prediction):
        f.write('{},{}\n'.format(i, y))
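
To spot-check the submission file, print its first few rows:

# Show the header plus the first few predictions.
with open('work/predict.csv') as f:
    for _ in range(5):
        print(f.readline().strip())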
