Image classification training with PaddleX (Part 2): training your own classification model and getting to know the official demo

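0. Preface

Related post in this series: Image classification training with PaddleX (Part 1): splitting an image classification dataset by converting folders into the ImageNet training format

Run the code online: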

https://aistudio.baidu.com/aistudio/projectdetail/5440569

1. Official demo: 6-class vegetable classification


1.1 Downloading Baidu's 6-class vegetable dataset (200 images per class, 1,200 in total)

import paddlex as pdx
from paddlex import transforms as T

# Download and extract the vegetable classification dataset
veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
pdx.utils.download_and_decompress(veg_dataset, path='./')

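Structure of the dataset after download. The class folders are named in pinyin: bocai (spinach), changqiezi (long eggplant), hongxiancai (red amaranth), huluobo (carrot), xihongshi (tomato), xilanhua (broccoli).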

aistudio@jupyter-40397-5440569:~/work/vegetables_cls$ tree -L 1
.
├── bocai
├── changqiezi
├── hongxiancai
├── huluobo
├── labels.txt
├── test_list.txt
├── train_list.txt
├── val_list.txt
├── xihongshi
└── xilanhua
6 directories, 4 files
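
Before training, it can help to check that the split files are balanced across classes. The sketch below assumes that each line of `train_list.txt` has the form `<relative image path> <integer label id>` and that `labels.txt` holds one class name per line, with the line index serving as the label id. `count_per_class` is a hypothetical helper written for this post, not part of PaddleX.

```python
from collections import Counter
from pathlib import Path


def count_per_class(list_lines, labels):
    """Count samples per class name from ImageNet-style list lines.

    Assumes each non-empty line looks like 'bocai/100.jpg 0':
    a relative image path, whitespace, then an integer label id.
    """
    counts = Counter()
    for line in list_lines:
        line = line.strip()
        if not line:
            continue
        # Split from the right so paths containing spaces still parse.
        _, label_id = line.rsplit(maxsplit=1)
        counts[labels[int(label_id)]] += 1
    return dict(counts)


if __name__ == "__main__":
    root = Path("vegetables_cls")
    if root.exists():
        labels = [
            name.strip()
            for name in (root / "labels.txt").read_text(encoding="utf-8").splitlines()
            if name.strip()
        ]
        lines = (root / "train_list.txt").read_text(encoding="utf-8").splitlines()
        print(count_per_class(lines, labels))
```

Running the same check on `val_list.txt` and `test_list.txt` shows whether every class is represented in each split.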

1.2 Training

Original code: https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/image_classification/mobilenetv3_small.py

import paddlex as pdx
from paddlex import transforms as T

# Download and extract the vegetable classification dataset
veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
pdx.utils.download_and_decompress(veg_dataset, path='./')

# Define the transforms used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/transforms/transforms.md
train_transforms = T.Compose(
    [T.RandomCrop(crop_size=224), T.RandomHorizontalFlip(), T.Normalize()])

eval_transforms = T.Compose([
    T.ResizeByShort(short_size=256), T.CenterCrop(crop_size=224), T.Normalize()
])

# Define the datasets used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/datasets.md
train_dataset = pdx.datasets.ImageNet(
    data_dir='vegetables_cls',
    file_list='vegetables_cls/train_list.txt',
    label_list='vegetables_cls/labels.txt',
    transforms=train_transforms,
    shuffle=True)

eval_dataset = pdx.datasets.ImageNet(
    data_dir='vegetables_cls',
    file_list='vegetables_cls/val_list.txt',
    label_list='vegetables_cls/labels.txt',
    transforms=eval_transforms)

# Initialize the model and train it
# Training metrics can be viewed with VisualDL, see https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/visualdl.md
num_classes = len(train_dataset.labels)
model = pdx.cls.MobileNetV3_small(num_classes=num_classes)

# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/models/classification.md
# Parameter descriptions and tuning notes: https://github.com/PaddlePaddle/PaddleX/tree/develop/docs/parameters.md
model.train(
    num_epochs=10,
    train_dataset=train_dataset,
    train_batch_size=32,
    eval_dataset=eval_dataset,
    lr_decay_epochs=[4, 6, 8],
    learning_rate=0.01,
    save_dir='output/mobilenetv3_small',
    use_vdl=True)

Training results (run on Baidu AI Studio): [screenshot in the original post]
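
The `lr_decay_epochs=[4, 6, 8]` argument in the training call sets up a step schedule: each time one of those epochs is reached, the learning rate is multiplied by a decay factor. The plain-Python sketch below only illustrates the idea. It assumes a decay factor of 0.1 (believed to be the default of PaddleX's `lr_decay_gamma`, so check the parameter docs for your version), counts epochs from 0, and ignores any warmup. `lr_at_epoch` is a hypothetical helper, not a PaddleX function.

```python
def lr_at_epoch(epoch, base_lr=0.01, decay_epochs=(4, 6, 8), gamma=0.1):
    """Learning rate used during a given epoch (counted from 0) under step decay."""
    # Count how many decay milestones have already been reached.
    num_decays = sum(1 for milestone in decay_epochs if epoch >= milestone)
    return base_lr * gamma ** num_decays


if __name__ == "__main__":
    # Print the schedule for the 10 epochs used in the demo.
    for epoch in range(10):
        print(epoch, f"{lr_at_epoch(epoch):.0e}")
```

With 10 epochs and three milestones, the last two epochs run at a thousandth of the initial rate, so most of the learning happens in the first four epochs.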

1.3 Prediction

To check how the model handles real-world images, I downloaded two arbitrary pictures from Baidu.

# Code adapted from:
# https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/prediction.md
import paddlex as pdx
test_jpg = 'fanqie.jpg'
model = pdx.load_model('output/mobilenetv3_small/best_model/')
result = model.predict(test_jpg)
print("Predict Result: ", result)
# Predict Result:  [{'category_id': 4, 'category': 'xihongshi', 'score': 0.7541489}]

[two images in the original post]
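
The prediction result is a list of dictionaries, as the printed output in the code comment shows. A small helper can turn it into a class name plus a confidence check. This is a sketch under assumptions: `top_prediction` and the 0.5 threshold are hypothetical choices for illustration, and the result is assumed to always carry `category` and `score` keys.

```python
def top_prediction(result, min_score=0.5):
    """Return (category, score) for the best entry, or None if below min_score.

    `result` is assumed to be a list of dicts with 'category' and 'score'
    keys, like the output printed by model.predict in the demo.
    """
    if not result:
        return None
    best = max(result, key=lambda item: item["score"])
    if best["score"] < min_score:
        return None
    return best["category"], float(best["score"])


if __name__ == "__main__":
    sample = [{"category_id": 4, "category": "xihongshi", "score": 0.7541489}]
    print(top_prediction(sample))
```

Returning `None` for low scores makes it easy to flag images that probably do not belong to any of the six classes.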

2. Training your own anime classification model

2.1 Dataset
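
The training script in 2.2 expects `anime_cls_2` to follow the same layout as `vegetables_cls`: one folder per class plus `labels.txt`, `train_list.txt` and `val_list.txt`. Part 1 of this series covers the conversion. The sketch below is only a minimal stand-in using the standard library, assuming one subfolder per class and list lines of the form `<class>/<file> <label id>`. `write_imagenet_lists`, the 20% validation ratio and the suffix set are hypothetical choices, not the code from Part 1.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_imagenet_lists(data_dir, val_ratio=0.2, seed=0):
    """Write labels.txt, train_list.txt and val_list.txt under data_dir.

    Assumes data_dir contains one subfolder per class with images inside.
    Returns (class names, number of training lines, number of validation lines).
    """
    root = Path(data_dir)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    rng = random.Random(seed)
    train_lines, val_lines = [], []
    for label_id, name in enumerate(classes):
        images = sorted(
            p for p in (root / name).iterdir()
            if p.suffix.lower() in IMAGE_SUFFIXES
        )
        rng.shuffle(images)
        # The first num_val shuffled images of each class go to validation.
        num_val = int(len(images) * val_ratio)
        for i, img in enumerate(images):
            line = f"{name}/{img.name} {label_id}"
            (val_lines if i < num_val else train_lines).append(line)
    (root / "labels.txt").write_text("\n".join(classes) + "\n", encoding="utf-8")
    (root / "train_list.txt").write_text("\n".join(train_lines) + "\n", encoding="utf-8")
    (root / "val_list.txt").write_text("\n".join(val_lines) + "\n", encoding="utf-8")
    return classes, len(train_lines), len(val_lines)


if __name__ == "__main__":
    if Path("anime_cls_2").exists():
        print(write_imagenet_lists("anime_cls_2"))
```

Splitting per class keeps the class proportions roughly equal in the training and validation lists.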

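2.2 Training

Code reference: the PaddleX release note introducing PPLCNet ("Added the ultra-lightweight classification model PPLCNet: about 5 ms per image on an Intel CPU, and 80.82% Top-1 accuracy on ImageNet-1K, surpassing ResNet152").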

import paddlex as pdx
from paddlex import transforms as T



# Define the transforms used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/transforms/transforms.md
train_transforms = T.Compose(
    [T.RandomCrop(crop_size=224), T.RandomHorizontalFlip(), T.Normalize()])

eval_transforms = T.Compose([
    T.ResizeByShort(short_size=256), T.CenterCrop(crop_size=224), T.Normalize()
])

# Define the datasets used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/datasets.md
train_dataset = pdx.datasets.ImageNet(
    data_dir='anime_cls_2',
    file_list='anime_cls_2/train_list.txt',
    label_list='anime_cls_2/labels.txt',
    transforms=train_transforms,
    shuffle=True)

eval_dataset = pdx.datasets.ImageNet(
    data_dir='anime_cls_2',
    file_list='anime_cls_2/val_list.txt',
    label_list='anime_cls_2/labels.txt',
    transforms=eval_transforms)

# Initialize the model and train it
# Training metrics can be viewed with VisualDL, see https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/visualdl.md
num_classes = len(train_dataset.labels)
model = pdx.cls.PPLCNet(num_classes=num_classes, scale=1)

# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/models/classification.md
# Parameter descriptions and tuning notes: https://github.com/PaddlePaddle/PaddleX/tree/develop/docs/parameters.md
model.train(
    num_epochs=10,
    pretrain_weights='IMAGENET',
    train_dataset=train_dataset,
    train_batch_size=16,
    eval_dataset=eval_dataset,
    lr_decay_epochs=[4, 6, 8],
    learning_rate=0.1,
    save_dir='output/pplcnet',
    log_interval_steps=10,
    label_smoothing=.1,
    use_vdl=True)

Training took about 1 minute. [screenshot in the original post]
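
The `label_smoothing=.1` argument in the training call softens the one-hot training targets, which tends to help on small datasets like this one. The sketch below shows the common formulation, where a fraction epsilon of the probability mass is spread uniformly over all classes. Whether PaddleX applies exactly this formula internally is an assumption, and `smooth_one_hot` is a hypothetical helper for illustration.

```python
def smooth_one_hot(label_id, num_classes, epsilon=0.1):
    """Return the label-smoothed target distribution for one sample.

    Every class receives epsilon / num_classes, and the true class
    additionally keeps the remaining 1 - epsilon.
    """
    uniform = epsilon / num_classes
    target = [uniform] * num_classes
    target[label_id] += 1.0 - epsilon
    return target


if __name__ == "__main__":
    print(smooth_one_hot(label_id=1, num_classes=4))
```

The smoothed targets still sum to 1, but the model is no longer pushed to assign the full probability to a single class.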

2.3 Prediction

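import urllib.request

import paddlex as pdx

# model.predict expects a local image path or a decoded array rather than a URL,
# so fetch the image first (assumes the URL can be downloaded directly).
test_url = 'https://img1.baidu.com/it/u=642615975,3013253527&fm=253&fmt=auto&app=138&f=JPEG?w=501&h=500'
test_jpg = 'anime_test.jpg'
urllib.request.urlretrieve(test_url, test_jpg)
model = pdx.load_model('output/pplcnet/best_model/')
result = model.predict(test_jpg)
print("Predict Result: ", result)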

Appendix

Dataset used for online training
