Transformer-Based Deep Learning Object Detection (the DETR Model)

Project links

GitHub

Paper: End-to-End Object Detection with Transformers

1. Preparing the Data

1.1 The COCO Dataset

COCO stands for Common Objects in COntext. It is a dataset released by the Microsoft team for image recognition. The images were collected by searching Flickr for 80 object categories and a variety of scene types, and the annotation was crowdsourced through Amazon Mechanical Turk (AMT).

COCO currently offers three annotation types: object instances (object detection and segmentation), object keypoints (keypoint detection on objects), and image captions, as shown in the figure below:

[Figure: the three COCO annotation types (instances, keypoints, captions)]

%cd work/DETR/

First, extract the dataset by running the code below. The extraction only needs to be run once.

!mkdir /home/aistudio/dataset
!unzip -q -o /home/aistudio/data/data7122/train2017.zip -d /home/aistudio/dataset
!unzip -q -o /home/aistudio/data/data7122/val2017.zip -d /home/aistudio/dataset
!unzip -q -o /home/aistudio/data/data7122/annotations_trainval2017.zip -d /home/aistudio/dataset
print('Full dataset extracted!')

After extraction, the full COCO dataset is organized as follows:

|-- coco
    |-- annotations                           # annotation files
        |-- person_keypoints_train2017.json   # keypoint detection
        |-- person_keypoints_val2017.json
        |-- captions_train2017.json           # image captioning
        |-- captions_val2017.json
        |-- instances_train2017.json          # object instances
        |-- instances_val2017.json
    |-- images                                # images
        |-- train2017
        |-- val2017

Then install pycocotools, which is used to load, parse, and visualize the COCO dataset:

!pip install pycocotools
# You can import the experiment environment with the following code
import warnings
warnings.filterwarnings('ignore')
import os
import time
import numpy as np
from PIL import Image, ImageDraw
import math
from numbers import Integral
import copy
from scipy.optimize import linear_sum_assignment


import pycocotools.mask as mask_util
import collections
from collections.abc import Sequence
import json
import sys
import cv2
import uuid
import random
import traceback
import six
import datetime

import paddle
from paddle.io import Dataset
from paddle.io import DataLoader
from paddle.io import DistributedBatchSampler

import paddle.nn as nn
import paddle.nn.functional as F
from paddle.regularizer import L2Decay
from paddle.nn.initializer import Uniform
from paddle import ParamAttr
from paddle.nn.initializer import Constant
from paddle.vision.ops import DeformConv2D
import paddle.distributed as dist
from paddle.fluid.dataloader.collate import default_collate_fn
import paddle.optimizer as optimizer
import paddle.regularizer as regularizer

from resnet import ConvNormLayer,BottleNeck,Blocks,NameAdapter
from layers import MultiHeadAttention, _convert_attention_mask
from initializer import linear_init_, conv_init_, xavier_uniform_, normal_
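
With the annotations extracted, the pycocotools COCO API can load and query them. Below is a minimal sketch that assumes the paths from the extraction step above; the 'person' category is only an example.

from pycocotools.coco import COCO

coco = COCO('/home/aistudio/dataset/annotations/instances_val2017.json')
# look up a category id and the images that contain it
cat_ids = coco.getCatIds(catNms=['person'])
img_ids = coco.getImgIds(catIds=cat_ids)
img_info = coco.loadImgs(img_ids[0])[0]
print(img_info['file_name'], img_info['height'], img_info['width'])
# load all instance annotations (boxes, segmentation) for that image
ann_ids = coco.getAnnIds(imgIds=img_info['id'], catIds=cat_ids, iscrowd=None)
anns = coco.loadAnns(ann_ids)
print(len(anns), 'annotations; first bbox (xywh):', anns[0]['bbox'])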

2. Model Zoo

2.1 Pretrained Object Detection Models on COCO

name      backbone  schedule (epochs)  inf_time (s/img)  box AP  url           size
DETR      R50       500                0.036             42.0    model | logs  159Mb
DETR-DC5  R50       500                0.083             43.3    model | logs  159Mb
DETR      R101      500                0.050             43.5    model | logs  232Mb
DETR-DC5  R101      500                0.097             44.9    model | logs  232Mb

2.2 Pretrained Panoptic Segmentation Models on COCO

name      backbone  box AP  segm AP  PQ    url       size
DETR      R50       38.8    31.1     43.4  download  165Mb
DETR-DC5  R50       40.2    31.9     44.6  download  165Mb
DETR      R101      40.1    33.0     45.1  download  237Mb

2.3 Object Detection Evaluation Results on COCO val5k

DETR COCO 2017 evaluation results

Object detection

2.3.1 DETR R50

Train command line for single-node training:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco

Eval command line:

python main.py --batch_size 2 --no_aux_loss --eval \
    --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth \
    --coco_path /path/to/coco

COCO bbox detection val5k evaluation results:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.420
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.624
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.442
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.205
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.458
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.611
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.333
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.533
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.574
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.312
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.628
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.805

2.3.2 DETR R50-DC5

Train command line for training on 8 nodes:

python run_with_submitit.py \
    --nodes 8 --timeout 3200 \
    --batch_size 1 --dilation \
    --lr_drop 400 --epochs 500 \
    --coco_path /path/to/coco

Eval command line:

python main.py --no_aux_loss --eval \
    --batch_size 1 --dilation \
    --resume https://dl.fbaipublicfiles.com/detr/detr-r50-dc5-f0fb7ef5.pth \
    --coco_path /path/to/coco

COCO bbox detection val5k evaluation results:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.433
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.631
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.459
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.225
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.473
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.611
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.342
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.551
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.594
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.344
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.646
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.814

2.3.3 DETR R101

Train command line for single-node training:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --backbone resnet101 \
    --lr_drop 400 --epochs 500 \
    --coco_path /path/to/coco

Eval command line:

python main.py --batch_size 2 --no_aux_loss --eval \
    --backbone resnet101 \
    --resume https://dl.fbaipublicfiles.com/detr/detr-r101-2c7b67e5.pth \
    --coco_path /path/to/coco

 COCO bbox detection val5k evaluation results:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.435
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.638
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.464
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.219
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.480
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.618
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.344
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.548
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.590
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.337
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.644
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.814

2.3.4 DETR R101-DC5

Train command line for training on 8 nodes:

python run_with_submitit.py \
    --nodes 8 --timeout 3200 \
    --backbone resnet101 \
    --batch_size 1 --dilation \
    --lr_drop 400 --epochs 500 \
    --coco_path /path/to/coco

Eval command line:

python main.py --no_aux_loss --eval \
    --backbone resnet101 \
    --batch_size 1 --dilation \
    --resume https://dl.fbaipublicfiles.com/detr/detr-r101-dc5-a2e86def.pth \
    --coco_path /path/to/coco

COCO bbox detection val5k evaluation results:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.647
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.477
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.495
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.350
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.561
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.604
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.348
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.662
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.810

2.4 Panoptic Segmentation

2.4.1 DETR R50

Eval command line:

python main.py \
    --batch_size 1 --no_aux_loss --eval \
    --resume https://dl.fbaipublicfiles.com/detr/detr-r50-panoptic-00ce5173.pth \
    --masks --dataset_file coco_panoptic \
    --coco_path /path/to/coco/ \
    --coco_panoptic_path /path/to/coco_panoptic

Results:

IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.311
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.541
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.313
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.116
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.346
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.507
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.264
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.395
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.411
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.190
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.467
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.604
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.388
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.599
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.400
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.173
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.428
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.591
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.314
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.485
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.510
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.253
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.561
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.738
          |    PQ     SQ     RQ     N
--------------------------------------
All       |  43.4   79.3   53.8   133
Things    |  48.2   79.8   59.5    80
Stuff     |  36.3   78.5   45.3    53

2.4.2 DETR-R50 DC5

Eval command line:

python main.py \
    --dilation \
    --batch_size 1 --no_aux_loss --eval \
    --resume https://dl.fbaipublicfiles.com/detr/detr-r50-dc5-panoptic-da08f1b1.pth \
    --masks --dataset_file coco_panoptic \
    --coco_path /path/to/coco/ \
    --coco_panoptic_path /path/to/coco_panoptic

Results:

IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.319
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.547
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.325
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.130
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.358
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.504
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.268
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.406
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.424
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.210
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.478
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.402
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.601
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.418
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.441
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.592
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.320
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.499
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.527
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.279
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.577
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.743
          |    PQ     SQ     RQ     N
--------------------------------------
All       |  44.6   79.8   55.0   133
Things    |  49.4   80.5   60.6    80
Stuff     |  37.3   78.7   46.5    53

2.4.3 DETR-R101

Eval command line:

python main.py \
    --backbone resnet101 \
    --batch_size 1 --no_aux_loss --eval \
    --resume https://dl.fbaipublicfiles.com/detr/detr-r101-panoptic-40021d53.pth \
    --masks --dataset_file coco_panoptic \
    --coco_path /path/to/coco/ \
    --coco_panoptic_path /path/to/coco_panoptic

Results:

IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.330
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.565
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.337
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.130
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.371
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.524
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.276
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.417
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.434
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.489
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.631
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.401
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.611
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.419
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.184
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.441
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.592
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.321
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.498
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.522
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.270
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.574
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.750
          |    PQ     SQ     RQ     N
--------------------------------------
All       |  45.1   79.9   55.5   133
Things    |  50.5   80.9   61.7    80
Stuff     |  37.0   78.5   46.0    53

2.5 Notebook Examples

  • DETR's hands-on Colab Notebook: shows how to load a model from the hub, generate predictions, and then visualize the model's attention (similar to the figures in the paper).
  • Standalone Colab Notebook: demonstrates how to implement a simplified version of DETR from the ground up in 50 lines of Python and then visualize the predictions. It is a good starting point if you want to gain a better understanding of the architecture and poke around before diving into the codebase.
  • Object detection with DETR - a minimal implementation:
    In this notebook we show a demo of DETR (DEtection TRansformer) that differs slightly from the baseline model in the paper. We show how to define the model, load pretrained weights, and visualize bounding-box and class predictions. Let's start with some common imports.
    from PIL import Image
    import requests
    import matplotlib.pyplot as plt
    %config InlineBackend.figure_format = 'retina'
    
    import torch
    from torch import nn
    from torchvision.models import resnet50
    import torchvision.transforms as T
    torch.set_grad_enabled(False);
    
    ## DETR
    Here is a minimal implementation of DETR:
    
    class DETRdemo(nn.Module):
        """
        Demo DETR implementation.
    
        Demo implementation of DETR in minimal number of lines, with the
        following differences wrt DETR in the paper:
        * learned positional encoding (instead of sine)
        * positional encoding is passed at input (instead of attention)
        * fc bbox predictor (instead of MLP)
        The model achieves ~40 AP on COCO val5k and runs at ~28 FPS on Tesla V100.
        Only batch size 1 supported.
        """
        def __init__(self, num_classes, hidden_dim=256, nheads=8,
                     num_encoder_layers=6, num_decoder_layers=6):
            super().__init__()
    
            # create ResNet-50 backbone
            self.backbone = resnet50()
            del self.backbone.fc
    
            # create conversion layer
            self.conv = nn.Conv2d(2048, hidden_dim, 1)
    
            # create a default PyTorch transformer
            self.transformer = nn.Transformer(
                hidden_dim, nheads, num_encoder_layers, num_decoder_layers)
    
            # prediction heads, one extra class for predicting non-empty slots
            # note that in baseline DETR linear_bbox layer is 3-layer MLP
            self.linear_class = nn.Linear(hidden_dim, num_classes + 1)
            self.linear_bbox = nn.Linear(hidden_dim, 4)
    
            # output positional encodings (object queries)
            self.query_pos = nn.Parameter(torch.rand(100, hidden_dim))
    
            # spatial positional encodings
            # note that in baseline DETR we use sine positional encodings
            self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
            self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
    
        def forward(self, inputs):
            # propagate inputs through ResNet-50 up to avg-pool layer
            x = self.backbone.conv1(inputs)
            x = self.backbone.bn1(x)
            x = self.backbone.relu(x)
            x = self.backbone.maxpool(x)
    
            x = self.backbone.layer1(x)
            x = self.backbone.layer2(x)
            x = self.backbone.layer3(x)
            x = self.backbone.layer4(x)
    
            # convert from 2048 to 256 feature planes for the transformer
            h = self.conv(x)
    
            # construct positional encodings
            H, W = h.shape[-2:]
            pos = torch.cat([
                self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1),
                self.row_embed[:H].unsqueeze(1).repeat(1, W, 1),
            ], dim=-1).flatten(0, 1).unsqueeze(1)
    
            # propagate through the transformer
            h = self.transformer(pos + 0.1 * h.flatten(2).permute(2, 0, 1),
                                 self.query_pos.unsqueeze(1)).transpose(0, 1)
            
            # finally project transformer outputs to class labels and bounding boxes
            return {'pred_logits': self.linear_class(h), 
                    'pred_boxes': self.linear_bbox(h).sigmoid()}
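    
    # Usage sketch: the companion colab loads pretrained demo weights for this
    # class. The checkpoint URL and num_classes=91 below are assumptions taken
    # from that notebook, not verified here.
    detr = DETRdemo(num_classes=91)
    state_dict = torch.hub.load_state_dict_from_url(
        url='https://dl.fbaipublicfiles.com/detr/detr_demo-da2a99e9.pth',
        map_location='cpu', check_hash=True)
    detr.load_state_dict(state_dict)
    detr.eval()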

  • Panoptic Colab Notebook: demonstrates how to use DETR for panoptic segmentation and plot the predictions.

3. Usage - Object Detection (DETR)

3.1 Environment Setup

Install PyTorch 1.5+ and torchvision 0.6+:

conda install -c pytorch pytorch torchvision

 Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

Or build from source:
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
 
python setup.py build_ext --inplace
python setup.py build_ext install

 That's it, should be good to train and evaluate detection models.

(Optional) To work with panoptic segmentation, install panopticapi:

pip install git+https://github.com/cocodataset/panopticapi.git

3.2 Preparing Your Own Dataset

Reference: reproducing DEtection TRansformers (DETR) on Windows 10 and training a custom dataset

Reference: training DETR on your own dataset

We take the COCO-format dataset as the standard. COCO currently offers three annotation types: object instances (object detection), object keypoints (keypoints on objects), and image captions, all stored as JSON files. The directory layout follows the structure shown in section 1.1, where annotations contains the JSON files for the training and validation sets.

If you already have a dataset in VOC format, you can convert it with a script; see the VOC-to-COCO dataset conversion script (a minimal sketch follows below). After converting to JSON, remember to rename the generated annotation files to the instances_train2017.json / instances_val2017.json naming pattern.
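
There is no single canonical conversion script, but the core of one is short. The following is a minimal, hedged sketch that assumes one Pascal VOC XML file per image and a fixed class list; the class names and paths are placeholders to adapt to your dataset.

import json
import xml.etree.ElementTree as ET
from pathlib import Path

CLASSES = ['cat', 'dog']  # placeholder: replace with your own class names

def voc_to_coco(xml_dir, out_json):
    images, annotations = [], []
    ann_id = 1
    for img_id, xml_file in enumerate(sorted(Path(xml_dir).glob('*.xml')), 1):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        images.append({
            'id': img_id,
            'file_name': root.findtext('filename'),
            'width': int(size.findtext('width')),
            'height': int(size.findtext('height')),
        })
        for obj in root.iter('object'):
            name = obj.findtext('name')
            if name not in CLASSES:
                continue
            bb = obj.find('bndbox')
            x1, y1 = float(bb.findtext('xmin')), float(bb.findtext('ymin'))
            x2, y2 = float(bb.findtext('xmax')), float(bb.findtext('ymax'))
            annotations.append({
                'id': ann_id,
                'image_id': img_id,
                'category_id': CLASSES.index(name) + 1,  # COCO category ids start at 1
                'bbox': [x1, y1, x2 - x1, y2 - y1],      # COCO boxes are [x, y, w, h]
                'area': (x2 - x1) * (y2 - y1),
                'iscrowd': 0,
            })
            ann_id += 1
    categories = [{'id': i + 1, 'name': n} for i, n in enumerate(CLASSES)]
    with open(out_json, 'w') as f:
        json.dump({'images': images, 'annotations': annotations,
                   'categories': categories}, f)

# example call with placeholder paths:
# voc_to_coco('VOCdevkit/VOC2007/Annotations', 'instances_train2017.json')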

3.3 Training

To train baseline DETR on a single node with 8 GPUs for 300 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco

 3.4 Evaluation

To evaluate DETR R50 on COCO val5k with a single GPU, run:

python main.py --batch_size 2 --no_aux_loss --eval --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth --coco_path /path/to/coco

We provide results for all DETR detection models above. Note that the numbers depend on the per-GPU batch size (number of images). The non-DC5 models were trained with batch size 2 and the DC5 models with batch size 1, so the AP of a DC5 model drops significantly if it is evaluated with more than one image per GPU.

3.5 Multinode training

pip install submitit

 Train baseline DETR-6-6 model on 4 nodes for 300 epochs:

python run_with_submitit.py --timeout 3000 --coco_path /path/to/coco

3.6 Downloading Pretrained Weights, Setting Class Parameters, Training, and Testing

1. Build pretrained weights for transfer learning on your own dataset

Modify the downloaded pretrained .pth file: it was trained on the full COCO dataset, while we only need to train on our own classes.

Run the code below to generate a .pth file matched to the number of object classes in your dataset; remember to change the number.

import torch
from pathlib import Path

FILE = Path(__file__).resolve()
print("FILE", FILE)
ROOT = FILE.parents[0]  # root directory
print("ROOT", ROOT)

pretrained_weights = torch.load(ROOT / 'pre_weights/detr-r50.pth')  # load the pretrained weights
num_class = 3  # number of object classes + 1, because background counts as one class
pretrained_weights["model"]["class_embed.weight"].resize_(num_class + 1, 256)
pretrained_weights["model"]["class_embed.bias"].resize_(num_class + 1)
torch.save(pretrained_weights, "detr-r50_%d.pth" % num_class)  # save weights adapted to your own dataset

After it runs, a new weights file (detr-r50_3.pth in this example) is generated.
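
As an optional sanity check (a sketch that assumes the file name generated above), you can reload the file and confirm the classification head now has num_class + 1 output slots:

import torch

reloaded = torch.load("detr-r50_3.pth")
print(reloaded["model"]["class_embed.weight"].shape)  # expected: [num_class + 1, 256]
print(reloaded["model"]["class_embed.bias"].shape)    # expected: [num_class + 1]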

2. In detr.py, change num_classes on line 305 to the number of object classes in your dataset.

3. Modify the arguments in main.py as needed:

parser.add_argument('--num_queries', default=100, type=int,
                    help="Number of query slots")  # the maximum number of boxes predicted per image

4. Run main.py to train:

python main.py --dataset_file "coco" --coco_path data/coco --epochs 100 --lr=1e-4 --batch_size=2 --num_workers=4 --output_dir="outputs" --resume="detr-r50_3.pth"

After training finishes, checkpoint files are generated in outputs, and the log file records statistics for each epoch.
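
main.py writes log.txt as one JSON dictionary per line, so training progress can be inspected with a few lines of Python. A sketch (the key names follow the upstream logger and may vary by version):

import json

with open('outputs/log.txt') as f:
    for line in f:
        stats = json.loads(line)  # one epoch's aggregated metrics per line
        print(stats.get('epoch'), stats.get('train_loss'), stats.get('test_coco_eval_bbox'))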

Finally, test with your own trained model. Change the image path to the image you want to test, and change CLASSES=[] on line 19 to your own class names! (The prediction code is based on https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/DETR_panoptic.ipynb)

That completes the whole workflow. The codebase also provides plotting utilities in util/plot_utils.py.

Append the following code at the bottom of that file (adjust the paths to your own!):

if __name__ == '__main__':
    files = list(Path('../outputs/eval').glob('*.pth'))
    plot_precision_recall(files)
    plt.show()
    plot_logs(logs=Path('D:/detr/outputs/log/'),fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0, log_name='log.txt')
    plt.show()

You may hit this error:
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.

The fix is to allow a duplicate OpenMP runtime to load:

import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'  # allow the duplicate OpenMP runtime

if __name__ == '__main__':
    files = list(Path('../outputs/eval').glob('*.pth'))
    plot_precision_recall(files)
    plt.show()
    plot_logs(logs=Path('D:/BaiduNetdiskDownload/detr/outputs/log/'),
              fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0, log_name='log.txt')
    plt.show()

The model's prediction code:

'''
predict.py (detr-main/predict.py): DETR prediction script
'''
import argparse
import datetime
from PIL import Image
import requests
import matplotlib.pyplot as plt

import ipywidgets as widgets
from IPython.display import display, clear_output
import torch
from torch import nn
from torchvision.models import resnet50
import torchvision.transforms as T
torch.set_grad_enabled(False)
from models.detr import DETR
from models import build_model

from pathlib import Path
FILE = Path(__file__).resolve()
print("FILE",FILE)
ROOT = FILE.parents[0]  # DETR root directory
print("ROOT",ROOT)

#COCO classes
CLASSES = [
    'N/A', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A',
    'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
    'toothbrush'
]

#colors for visualization
COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
          [0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]


# standard PyTorch mean-std input image normalization
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# for output bounding box post-processing
def box_cxcywh_to_xyxy(x):
    x_c, y_c, w, h = x.unbind(1)
    b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
         (x_c + 0.5 * w), (y_c + 0.5 * h)]
    return torch.stack(b, dim=1)

def rescale_bboxes(out_bbox, size):
    img_w, img_h = size
    b = box_cxcywh_to_xyxy(out_bbox)
    b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)
    return b


def plot_results(pil_img, prob, boxes):
    plt.figure(figsize=(16,10))
    plt.imshow(pil_img)
    ax = plt.gca()
    colors = COLORS * 100
    for p, (xmin, ymin, xmax, ymax), c in zip(prob, boxes.tolist(), colors):
        ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                   fill=False, color=c, linewidth=3))
        cl = p.argmax()
        text = f'{CLASSES[cl]}: {p[cl]:0.2f}'
        ax.text(xmin, ymin, text, fontsize=15,
                bbox=dict(facecolor='yellow', alpha=0.5))
    plt.axis('off')
    plt.show()


def get_args_parser():
    parser = argparse.ArgumentParser('Transformer detector Predict', add_help=False)
    parser.add_argument('--source', type=str, default=ROOT / 'data/coco/val2017/000000007108.jpg', help='file/dir/URL/glob, 0 for webcam')
    parser.add_argument('--weights', type=str, default=ROOT / 'pre_weights/detr-r50-dc5.pth', help='model path(s)')
    parser.add_argument('--out_dir', type=str, default=ROOT / 'runs/inference', help=' save predict result path')
    
    
    parser.add_argument('--lr', default=1e-4, type=float)  # learning rate
    parser.add_argument('--lr_backbone', default=1e-5, type=float)  # learning rate for the backbone feature extractor
    parser.add_argument('--batch_size', default=2, type=int)
    parser.add_argument('--weight_decay', default=1e-4, type=float)
    parser.add_argument('--epochs', default=300, type=int)
    parser.add_argument('--lr_drop', default=200, type=int)
    parser.add_argument('--clip_max_norm', default=0.1, type=float,
                        help='gradient clipping max norm')
    
    # Model parameters
    parser.add_argument('--frozen_weights', type=str, default=None,
                        help="Path to the pretrained model. If set, only the mask head will be trained")
    # * Backbone
    parser.add_argument('--backbone', default='resnet50', type=str,
                        help="Name of the convolutional backbone to use")
    parser.add_argument('--dilation', action='store_true',
                        help="If true, we replace stride with dilation in the last convolutional block (DC5)")
    parser.add_argument('--position_embedding', default='sine', type=str, choices=('sine', 'learned'),
                        help="Type of positional embedding to use on top of the image features")

    # * Transformer
    parser.add_argument('--enc_layers', default=6, type=int,
                        help="Number of encoding layers in the transformer")
    parser.add_argument('--dec_layers', default=6, type=int,
                        help="Number of decoding layers in the transformer")
    parser.add_argument('--dim_feedforward', default=2048, type=int,
                        help="Intermediate size of the feedforward layers in the transformer blocks")
    parser.add_argument('--hidden_dim', default=256, type=int,
                        help="Size of the embeddings (dimension of the transformer)")
    parser.add_argument('--dropout', default=0.1, type=float,
                        help="Dropout applied in the transformer")
    parser.add_argument('--nheads', default=8, type=int,
                        help="Number of attention heads inside the transformer's attentions")
    parser.add_argument('--num_queries', default=100, type=int,
                        help="Number of query slots")  # the maximum number of boxes predicted per image
    parser.add_argument('--pre_norm', action='store_true')

    # Loss
    parser.add_argument('--no_aux_loss', dest='aux_loss', action='store_false',
                        help="Disables auxiliary decoding losses (loss at each layer)")
    # * Matcher
    parser.add_argument('--set_cost_class', default=1, type=float,
                        help="Class coefficient in the matching cost")
    parser.add_argument('--set_cost_bbox', default=5, type=float,
                        help="L1 box coefficient in the matching cost")
    parser.add_argument('--set_cost_giou', default=2, type=float,
                        help="giou box coefficient in the matching cost")
    # * Loss coefficients
    parser.add_argument('--mask_loss_coef', default=1, type=float)
    parser.add_argument('--dice_loss_coef', default=1, type=float)
    parser.add_argument('--bbox_loss_coef', default=5, type=float)
    parser.add_argument('--giou_loss_coef', default=2, type=float)
    parser.add_argument('--eos_coef', default=0.1, type=float,
                        help="Relative classification weight of the no-object class")
    
    # * Segmentation
    parser.add_argument('--masks', action='store_true',
                        help="Train segmentation head if the flag is provided")
    
     # dataset parameters
    parser.add_argument('--dataset_file', default='coco')
    parser.add_argument('--device', default='cpu',
                        help='device to use for training / testing')
    return parser
    

def main(args):
    # Detection - using a pretrained model
    pre_model = args.weights
    model, criterion, postprocessors = build_model(args)
    checkpoints = torch.load(pre_model)
    model.load_state_dict(checkpoints['model'])
    model.eval()
    im = Image.open(args.source)
    # mean-std normalize the input image (batch size: 1)
    img = transform(im).unsqueeze(0)
    # propagate through the model
    outputs = model(img)
    # keep only predictions with 0.9+ confidence
    probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
    keep = probas.max(-1).values > 0.9
    # convert boxes from [0; 1] relative coordinates to image scale
    bboxes_scaled = rescale_bboxes(outputs['pred_boxes'][0, keep], im.size)
    plot_results(im, probas[keep], bboxes_scaled)

if __name__ == '__main__':
    parser = argparse.ArgumentParser('DETR predict script', parents=[get_args_parser()])
    args = parser.parse_args()
    # print("args",args)
    if args.out_dir:
        Path(args.out_dir).mkdir(parents=True, exist_ok=True)
    main(args)

The model's prediction results: [output figure]

4. Usage - Segmentation (DETR)

We show that it is relatively straightforward to extend DETR to predict segmentation masks. We mainly demonstrate strong panoptic segmentation results.

Data preparation

For panoptic segmentation, you need the panoptic annotations in addition to the COCO dataset (see above for the COCO dataset). You need to download and extract the annotations. We expect the directory structure to be the following:

path/to/coco_panoptic/
  annotations/  # annotation json files
  panoptic_train2017/    # train panoptic annotations
  panoptic_val2017/      # val panoptic annotations
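
panopticapi also makes it easy to inspect these annotations: each segment's id is encoded in the PNG pixel colors and recovered with rgb2id. A minimal sketch, with placeholder paths:

import json
import numpy as np
from PIL import Image
from panopticapi.utils import rgb2id

with open('path/to/coco_panoptic/annotations/panoptic_val2017.json') as f:
    panoptic = json.load(f)

ann = panoptic['annotations'][0]
png = np.asarray(Image.open(
    'path/to/coco_panoptic/panoptic_val2017/' + ann['file_name']), dtype=np.uint32)
seg_ids = rgb2id(png)  # HxW map of per-pixel segment ids
for seg in ann['segments_info']:
    mask = seg_ids == seg['id']
    print(seg['category_id'], int(mask.sum()))  # category id and pixel count per segment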

 Training

We recommend training segmentation in two stages: first train DETR to detect all the boxes, then train the segmentation head. For panoptic segmentation, DETR must learn to detect boxes for both things and stuff classes. You can train it on a single node with 8 GPUs for 300 epochs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco  --coco_panoptic_path /path/to/coco_panoptic --dataset_file coco_panoptic --output_dir /output/path/box_model

For instance segmentation, you can simply train a plain box model (or use one of the pretrained models we provide).

Once you have a box-model checkpoint, you need to freeze it and train the segmentation head in isolation. For panoptic segmentation you can train on a single node with 8 GPUs for 25 epochs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --masks --epochs 25 --lr_drop 15 --coco_path /path/to/coco  --coco_panoptic_path /path/to/coco_panoptic  --dataset_file coco_panoptic --frozen_weights /output/path/box_model/checkpoint.pth --output_dir /output/path/segm_model

 For instance segmentation only, simply remove the dataset_file and coco_panoptic_path arguments from the above command line.
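
For clarity, the instance-segmentation command derived from the panoptic one above by dropping those two arguments would look like:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --masks --epochs 25 --lr_drop 15 --coco_path /path/to/coco --frozen_weights /output/path/box_model/checkpoint.pth --output_dir /output/path/segm_model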
