Implementing Pedestrian Detection and Tracking with Swin Transformer

Reposted from AI Studio

Project link: https://aistudio.baidu.com/aistudio/projectdetail/2022805

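Introduction

  • I previously used Swin Transformer for an image classification task.
  • This time I switch to a different downstream task, object detection, and try using Swin Transformer as the backbone for object detection in the PaddleDetection toolkit.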

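Known issues

  • The backbone code is not yet stable. The following problems remain, and with my limited experience I have not found solutions for them yet:
    • paddle.rand() in the DropPath module occasionally fails with a "system error".
    • When training RCNN-type models, CUDA reports error 700 (an illegal memory access) if the input resolution or batch size is too large.
    • When training YOLO-type models, the BCE loss becomes abnormal if the input resolution is too large. It looks like vanishing gradients.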
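
For reference, DropPath (stochastic depth) randomly drops the residual branch for whole samples during training. It rescales the surviving samples by 1/keep_prob so the expected output is unchanged. The sketch below reproduces that logic in NumPy rather than Paddle. It follows the common timm-style formulation, and it is an assumption that the Paddle port used in this project is structured the same way. The comment marks where such an implementation would typically call `paddle.rand()`.

```python
import numpy as np


def drop_path(x, drop_prob=0.0, training=False, rng=None):
    """Stochastic depth: drop the residual branch for whole samples.

    NumPy sketch of the usual DropPath logic (not the Paddle code used in this project).
    """
    if drop_prob == 0.0 or not training:
        return x  # identity at inference time
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - drop_prob
    # One random draw per sample, broadcast over all remaining dimensions.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    # A Paddle implementation would typically draw these values with paddle.rand(shape).
    mask = np.floor(keep_prob + rng.random(shape))  # 1 with probability keep_prob, else 0
    return x / keep_prob * mask


x = np.ones((4, 3))
print(drop_path(x, drop_prob=0.5, training=True, rng=np.random.default_rng(0)))
```

Each row of the printed array is either all zeros or all 2.0 (that is, 1 / keep_prob).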

PaddleDetection

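PaddleDetection is PaddlePaddle's object detection development toolkit. It helps developers build, train, optimize, and deploy detection models faster and better across the whole development workflow.

PaddleDetection provides modular implementations of many mainstream object detection algorithms. It offers rich data augmentation strategies, network components (such as backbones), and loss functions, and it integrates model compression and high-performance cross-platform deployment.

Refined through long-term industrial use, PaddleDetection offers a smooth, polished user experience. It is widely used by developers in more than ten industries, including industrial quality inspection, remote sensing image detection, unmanned inspection, new retail, internet, and scientific research.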

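News

  • 2021.04.14: Released release/2.0. PaddleDetection now fully supports dynamic graph mode, covers the static-graph model algorithms, and comprehensively upgrades model performance. This release also adds PP-YOLO v2, PP-YOLO tiny, the enhanced anchor-free model PAFNet, and the rotated-box detection model S2ANet. See PaddleDetection for details.
  • 2021.02.07: Released release/2.0-rc, a trial version of PaddleDetection's dynamic graph mode. See PaddleDetection dynamic graph for details.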

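Features

  • Rich model zoo: 100+ pre-trained models covering object detection, instance segmentation, face detection, and more, including many winning solutions from international competitions.
  • Easy to use: a modular design decouples the network components, so developers can easily build and try various detection models and optimization strategies and quickly obtain high-performance, customized algorithms.
  • End to end: data augmentation, network construction, training, compression, and deployment are connected end to end, with full support for multi-architecture, multi-device deployment on cloud and edge.
  • High performance: built on PaddlePaddle's high-performance kernels, with clear advantages in training speed and GPU memory usage. Supports FP16 training and multi-machine training.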

Toolkit overview

Architectures:
  • Two-Stage Detection
    • Faster RCNN
    • FPN
    • Cascade-RCNN
    • Libra RCNN
    • Hybrid Task RCNN
    • PSS-Det
  • One-Stage Detection
    • RetinaNet
    • YOLOv3
    • YOLOv4
    • PP-YOLO
    • SSD
  • Anchor Free
    • CornerNet-Squeeze
    • FCOS
    • TTFNet
  • Instance Segmentation
    • Mask RCNN
    • SOLOv2
  • Face Detection
    • FaceBoxes
    • BlazeFace
    • BlazeFace-NAS

Backbones:
  • ResNet(&vd)
  • ResNeXt(&vd)
  • SENet
  • Res2Net
  • HRNet
  • Hourglass
  • CBNet
  • GCNet
  • DarkNet
  • CSPDarkNet
  • VGG
  • MobileNetv1/v3
  • GhostNet
  • EfficientNet

Components:
  • Common
    • Sync-BN
    • Group Norm
    • DCNv2
    • Non-local
  • FPN
    • BiFPN
    • BFP
    • HRFPN
    • ACFPN
  • Loss
    • Smooth-L1
    • GIoU/DIoU/CIoU
    • IoUAware
  • Post-processing
    • SoftNMS
    • MatrixNMS
  • Speed
    • FP16 training
    • Multi-machine training

Data Augmentation:
  • Resize
  • Flipping
  • Expand
  • Crop
  • Color Distort
  • Random Erasing
  • Mixup
  • Cutmix
  • Grid Mask
  • Auto Augment

Model performance overview

Comparison of COCO mAP and single-GPU Tesla V100 inference speed (FPS) for representative models of each architecture and backbone.

[Figure 1: COCO mAP vs. Tesla V100 inference speed (FPS) for representative PaddleDetection models]

Notes:

  • CBResNet is the Cascade-Faster-RCNN-CBResNet200vd-FPN model, which reaches 53.3% mAP on COCO.
  • Cascade-Faster-RCNN is Cascade-Faster-RCNN-ResNet50vd-DCN, which PaddleDetection optimized to run at 20 FPS with 47.8% COCO mAP.
  • PP-YOLO reaches 45.9% mAP on COCO at 72.9 FPS on a Tesla V100, beating YOLOv4 in both accuracy and speed.
  • PP-YOLO v2 further optimizes PP-YOLO, reaching 49.5% mAP on COCO at 68.9 FPS on a Tesla V100.
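
To express the FPS numbers in the notes above per image, take the reciprocal to get an average latency. This is plain arithmetic on the quoted figures. It assumes they are throughput numbers for single-image inference on one V100, and it says nothing about the measurement setup behind them.

```python
# FPS figures quoted in the notes above (single Tesla V100)
fps = {
    "Cascade-Faster-RCNN-ResNet50vd-DCN": 20.0,
    "PP-YOLO": 72.9,
    "PP-YOLO v2": 68.9,
}


def latency_ms(frames_per_second):
    """Average time per image in milliseconds implied by a throughput in FPS."""
    return 1000.0 / frames_per_second


for name, f in fps.items():
    print(f"{name}: {latency_ms(f):.1f} ms/image")
# Cascade-Faster-RCNN-ResNet50vd-DCN: 50.0 ms/image
# PP-YOLO: 13.7 ms/image
# PP-YOLO v2: 14.5 ms/image
```

On these figures, PP-YOLO v2's 3.6-point mAP gain over PP-YOLO (49.5% vs. 45.9%) costs roughly 0.8 ms per image.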

Clone the PaddleDetection code

# !git clone https://github.com.cnpmjs.org/PaddlePaddle/PaddleDetection -b release/2.0 --depth 1

Add the backbone

  • Add the model code: PaddleDetection/ppdet/modeling/backbones/swin_transformer.py
  • Modify __init__.py so the new backbone module is imported: PaddleDetection/ppdet/modeling/backbones/__init__.py

Write the config file

  • The config file used in this project is as follows:
# faster_rcnn_swin_ti.yaml
use_gpu: true
log_iter: 10
save_dir: output
snapshot_epoch: 1

epoch: 12

LearningRate:
  base_lr: 0.001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [8, 11]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2


architecture: FasterRCNN

FasterRCNN:
  backbone: SwinTransformer
  neck: FPN
  rpn_head: RPNHead
  bbox_head: BBoxHead
  # post process
  bbox_post_process: BBoxPostProcess

SwinTransformer:
  out_indices: [0,1,2,3]
  pretrained: https://bj.bcebos.com/v1/ai-studio-online/19a72dd9eb884f4581492a61fab901e60e858e34569f4805b619eceabd6a4315?responseContentDisposition=attachment%3B%20filename%3Dswin_tiny_patch4_window7_224.pdparams

FPN:
  out_channel: 256

RPNHead:
  anchor_generator:
    aspect_ratios: [0.5, 1.0, 2.0]
    anchor_sizes: [[32], [64], [128], [256], [512]]
    strides: [4, 8, 16, 32, 64]
  rpn_target_assign:
    batch_size_per_im: 256
    fg_fraction: 0.5
    negative_overlap: 0.3
    positive_overlap: 0.7
    use_random: True
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 1000
    topk_after_collect: True
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 1000
    post_nms_top_n: 1000


BBoxHead:
  head: TwoFCHead
  roi_extractor:
    resolution: 7
    sampling_ratio: 0
    aligned: True
  bbox_assigner: BBoxAssigner

BBoxAssigner:
  batch_size_per_im: 512
  bg_thresh: 0.5
  fg_thresh: 0.5
  fg_fraction: 0.25
  use_random: True

TwoFCHead:
  out_channel: 1024

BBoxPostProcess:
  decode: RCNNBox
  nms:
    name: MultiClassNMS
    keep_top_k: 100
    score_threshold: 0.05
    nms_threshold: 0.5

worker_num: 2
TrainReader:
  sample_transforms:
  - Decode: {}
  - RandomResize: {target_size: [[640, 1333]], interp: 2, keep_ratio: True}
  - RandomFlip: {prob: 0.5}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32, pad_gt: true}
  batch_size: 1
  shuffle: true
  drop_last: true


EvalReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 2, target_size: [640, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32, pad_gt: false}
  batch_size: 1
  shuffle: false
  drop_last: false
  drop_empty: false


TestReader:
  sample_transforms:
  - Decode: {}
  - Resize: {interp: 2, target_size: [640, 1333], keep_ratio: True}
  - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
  - Permute: {}
  batch_transforms:
  - PadBatch: {pad_to_stride: 32, pad_gt: false}
  batch_size: 1
  shuffle: false
  drop_last: false

metric: VOC
map_type: integral
num_classes: 4

TrainDataset:
  !VOCDataSet
    dataset_dir: dataset/roadsign_voc
    anno_path: train.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']

EvalDataset:
  !VOCDataSet
    dataset_dir: dataset/roadsign_voc
    anno_path: valid.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']

TestDataset:
  !ImageFolder
    anno_path: dataset/roadsign_voc/label_list.txt
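
The `LearningRate` section combines a linear warmup with step decay. The sketch below computes the learning rate at a given iteration under that reading. Over the first 1000 iterations the rate ramps linearly from `base_lr * start_factor` to `base_lr`. After that, it is multiplied by `gamma` at each milestone epoch (8 and 11). `steps_per_epoch` depends on the dataset size and `batch_size`, so the value 500 below is purely hypothetical. PaddleDetection's own scheduler may also differ in edge cases, for example when warmup runs past a milestone.

```python
def lr_at(step, steps_per_epoch, base_lr=0.001, gamma=0.1, milestones=(8, 11),
          warmup_steps=1000, start_factor=0.1):
    """Learning rate at a given iteration: LinearWarmup followed by PiecewiseDecay."""
    if step < warmup_steps:
        # Linear interpolation from base_lr * start_factor up to base_lr
        alpha = step / warmup_steps
        return base_lr * (start_factor * (1 - alpha) + alpha)
    # Milestones are given in epochs; count how many have been passed
    n_decays = sum(step >= m * steps_per_epoch for m in milestones)
    return base_lr * gamma ** n_decays


steps_per_epoch = 500  # hypothetical: number of training images / batch_size
for step in [0, 500, 1000, 8 * steps_per_epoch, 11 * steps_per_epoch]:
    print(step, lr_at(step, steps_per_epoch))
```

With `epoch: 12`, this schedule trains for 8 epochs at 0.001, 3 epochs at 1e-4, and the last epoch at 1e-5.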

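The `RPNHead.anchor_generator` settings assign one anchor size to each FPN level (strides 4 to 64) and use three aspect ratios, so every feature-map location gets three anchors. The NumPy sketch below generates anchors under that scheme: each anchor keeps the area `size**2`, and its height/width equals the aspect ratio. It is a simplified illustration, not PaddleDetection's `AnchorGenerator`. Details such as the center offset are assumptions, and the 640x640 input size is just an example.

```python
import math

import numpy as np


def cell_anchors(size, aspect_ratios):
    """Zero-centred (x0, y0, x1, y1) anchors of one size; ratio = height / width."""
    out = []
    for r in aspect_ratios:
        w = math.sqrt(size * size / r)  # keeps w * h == size**2
        h = r * w
        out.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(out)


def level_anchors(feat_h, feat_w, stride, size, aspect_ratios):
    """Shift the cell anchors to every feature-map location of one FPN level."""
    base = cell_anchors(size, aspect_ratios)                        # (A, 4)
    sx, sy = np.meshgrid(np.arange(feat_w) * stride, np.arange(feat_h) * stride)
    shifts = np.stack([sx, sy, sx, sy], axis=-1).reshape(-1, 1, 4)  # (H*W, 1, 4)
    return (shifts + base[None]).reshape(-1, 4)                     # (H*W*A, 4)


sizes, strides, ratios = [32, 64, 128, 256, 512], [4, 8, 16, 32, 64], [0.5, 1.0, 2.0]
H = W = 640  # example padded input size
per_level = [level_anchors(math.ceil(H / s), math.ceil(W / s), s, z, ratios).shape[0]
             for z, s in zip(sizes, strides)]
print(per_level, sum(per_level))  # [76800, 19200, 4800, 1200, 300] 102300
```

Most anchors come from the stride-4 level, which is one reason high input resolutions are expensive for the RPN.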
Model training

%cd ~/PaddleDetection

!python tools/train.py -c ~/faster_rcnn_swin_ti.yaml --eval
%cd ~/PaddleDetection

!python tools/train.py -c ~/yolov3_swin_ti.yaml --eval
%cd work/PaddleDetection/
/home/aistudio/work/PaddleDetection
!python -u tools/infer.py -c faster_rcnn_swin_ti.yaml -o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=output/000000014439_640x640.jpg
W0605 10:17:16.612674   925 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0605 10:17:16.617413   925 device_context.cc:372] device: 0, cuDNN Version: 7.6.
2021-06-05 10:17:19,274 - INFO - unique_endpoints {''}
2021-06-05 10:17:19,274 - INFO - Found /home/aistudio/.cache/paddle/hapi/weights/19a72dd9eb884f4581492a61fab901e60e858e34569f4805b619eceabd6a4315?responseContentDisposition=attachment%3B%20filename%3Dswin_tiny_patch4_window7_224.pdparams
[06/05 10:17:21] ppdet.utils.checkpoint INFO: Finish loading model weights: output/faster_rcnn_swin_ti/best_model.pdparams
[06/05 10:17:21] ppdet.engine INFO: Detection bbox results save in output/000000014439_640x640.jpg
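
The `Resize` step in `TestReader`/`EvalReader` (`target_size: [640, 1333]`, `keep_ratio: True`) scales the short side of an image to 640, unless that would push the long side past 1333. `PadBatch` then pads each side up to a multiple of 32. The sketch below reproduces that shape arithmetic under this reading. The 1920x1080 frame size is an assumed example, not a known property of the MOT images used later.

```python
import math


def resize_shape(h, w, target_size=(640, 1333)):
    """Keep-ratio resize: short side <= min(target), long side <= max(target)."""
    t_min, t_max = min(target_size), max(target_size)
    scale = min(t_min / min(h, w), t_max / max(h, w))
    return round(h * scale), round(w * scale), scale


def pad_to_stride(h, w, stride=32):
    """Round each side up to the next multiple of `stride`."""
    return math.ceil(h / stride) * stride, math.ceil(w / stride) * stride


rh, rw, s = resize_shape(1080, 1920)  # assumed 1080p frame
print((rh, rw), pad_to_stride(rh, rw))  # (640, 1138) (640, 1152)
```

A 640x640 image, such as the test image above, is left at 640x640, which is already a multiple of 32.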

import os

import numpy as np

image_path = 'mot_images/3/'
imgs = os.listdir(image_path)
# Sample 10 distinct frames; replace=False avoids picking the same frame twice
infer_imgs = np.random.choice(imgs, 10, replace=False)
infer_imgs
array(['00092.jpg', '00187.jpg', '00083.jpg', '00005.jpg', '00036.jpg',
       '00032.jpg', '00203.jpg', '00247.jpg', '00103.jpg', '00106.jpg'],
      dtype='
import os
from tqdm import tqdm

# Example code for single-GPU inference.
# `!CUDA_VISIBLE_DEVICES=0` would only set the variable in a throwaway subshell;
# setting it in os.environ makes it visible to the child processes started below.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# !python tools/infer.py -c ppyolov2.yml -o weights=output/ppyolov2/best_model.pdparams --infer_img=/home/aistudio/work/PaddleDetection/mot_imgs/0/00161.jpg
base_cmd = ("python tools/infer.py -c faster_rcnn_swin_ti.yaml "
            "-o weights=output/faster_rcnn_swin_ti/best_model.pdparams --infer_img=")
for img in tqdm(infer_imgs):
    cmd = base_cmd + os.path.join(image_path, img)
    print(cmd)
    os.system(cmd)
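
A somewhat more robust variant of the loop above builds each command as an argument list and runs it with `subprocess.run(..., check=True)`. A failing inference then raises an error instead of being silently ignored, as it is with `os.system`. This is a sketch: `build_infer_cmd` and `run_inference` are hypothetical helpers, and the `tools/infer.py` flags are exactly the ones used in the command above.

```python
import os
import subprocess
import sys

CONFIG = "faster_rcnn_swin_ti.yaml"
WEIGHTS = "output/faster_rcnn_swin_ti/best_model.pdparams"


def build_infer_cmd(img_path, config=CONFIG, weights=WEIGHTS):
    """Argument list for tools/infer.py, using the same flags as the loop above."""
    return [sys.executable, "tools/infer.py", "-c", config,
            "-o", f"weights={weights}", f"--infer_img={img_path}"]


def run_inference(img_names, image_dir="mot_images/3/", gpu="0", dry_run=False):
    """Run inference once per image; with dry_run=True only print the commands."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)  # applies to the child process only
    cmds = [build_infer_cmd(os.path.join(image_dir, name)) for name in img_names]
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if infer.py exits with an error
            subprocess.run(cmd, env=env, check=True)
    return cmds


# In the notebook this would be: run_inference(infer_imgs)
cmds = run_inference(["00005.jpg"], dry_run=True)
```

Passing a list instead of a concatenated string also avoids shell-quoting problems if a file name ever contains spaces.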
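import glob
import math
import os

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

%matplotlib inline
# tools/infer.py saves visualized results to output/, which also holds the earlier test image
imgs = sorted(glob.glob('output/*.jpg'))
cols = 2
# A fixed 5x2 grid raises an error once there are more than 10 images
rows = math.ceil(len(imgs) / cols)
plt.figure(figsize=(16, 8 * rows))
for i, path in enumerate(imgs):
    plt.subplot(rows, cols, i + 1)
    plt.imshow(mpimg.imread(path))
    plt.title(os.path.basename(path))
    plt.axis('off')
plt.show()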

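Summary

  • With this, the Swin Transformer model has been added to the PaddleDetection toolkit.
  • However, Swin Transformer is still not very stable as a detection backbone in PaddleDetection.
  • Next I will keep debugging to find the exact causes and see whether these problems can be fixed.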
