Project links
GitHub
Paper: End-to-End Object Detection with Transformers
1. Preparing the data
1.1 The COCO dataset
COCO (Common Objects in COntext) is a dataset released by Microsoft that can be used for image recognition. The images were collected by searching Flickr for 80 object categories and a variety of scene types, and were annotated via Amazon Mechanical Turk (AMT).
The COCO dataset currently has 3 annotation types: object instances (object detection / instance segmentation), object keypoints (keypoint detection), and image captions (image captioning).
%cd work/DETR/
First, extract the dataset by running the code below; the extraction only needs to be run once.
!mkdir /home/aistudio/dataset
!unzip -q -o /home/aistudio/data/data7122/train2017.zip -d /home/aistudio/dataset
!unzip -q -o /home/aistudio/data/data7122/val2017.zip -d /home/aistudio/dataset
!unzip -q -o /home/aistudio/data/data7122/annotations_trainval2017.zip -d /home/aistudio/dataset
print('Full COCO dataset extracted!')
After extraction, the full COCO dataset is laid out as follows:
|-- coco
    |-- annotations: annotation files
        |-- person_keypoints_train2017.json: keypoint detection
        |-- person_keypoints_val2017.json
        |-- captions_train2017.json: image captioning
        |-- captions_val2017.json
        |-- instances_train2017.json: object instances
        |-- instances_val2017.json
    |-- images: images
        |-- train2017
        |-- val2017
Then install pycocotools, which is used to load, parse, and visualize the COCO dataset:
!pip install pycocotools
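As a quick sanity check that the data and pycocotools are set up correctly, the sketch below loads the validation annotations with the COCO API and prints a few basic statistics (the annotation path assumes the directory layout above):

```python
# Quick sanity check with the COCO API (pycocotools).
# The path assumes the directory layout shown above.
from pycocotools.coco import COCO

ann_file = '/home/aistudio/dataset/annotations/instances_val2017.json'
coco = COCO(ann_file)  # loads and indexes the annotation JSON

print('categories:', len(coco.getCatIds()))   # 80 for COCO 2017
print('images:', len(coco.getImgIds()))       # 5000 for val2017

# Inspect the annotations of one image.
img_id = coco.getImgIds()[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
print('boxes in the first image:', [a['bbox'] for a in anns])
```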
# You can import the experiment environment with the following code
import warnings
warnings.filterwarnings('ignore')
import os
import time
import numpy as np
from PIL import Image, ImageDraw
import math
from numbers import Integral
import copy
from scipy.optimize import linear_sum_assignment
import pycocotools.mask as mask_util
import collections
from collections.abc import Sequence
import json
import sys
import cv2
import uuid
import random
import traceback
import six
import datetime
import paddle
from paddle.io import Dataset
from paddle.io import DataLoader
from paddle.io import DistributedBatchSampler
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.regularizer import L2Decay
from paddle.nn.initializer import Uniform
from paddle import ParamAttr
from paddle.nn.initializer import Constant
from paddle.vision.ops import DeformConv2D
import paddle.distributed as dist
from paddle.fluid.dataloader.collate import default_collate_fn
import paddle.optimizer as optimizer
import paddle.regularizer as regularizer
from resnet import ConvNormLayer,BottleNeck,Blocks,NameAdapter
from layers import MultiHeadAttention, _convert_attention_mask
from initializer import linear_init_, conv_init_, xavier_uniform_, normal_
2. Model Zoo
2.1 COCO object detection pretrained models

| name | backbone | schedule (epochs) | inf_time (s/img) | box AP | url | size |
|---|---|---|---|---|---|---|
| DETR | R50 | 500 | 0.036 | 42.0 | model / logs | 159Mb |
| DETR-DC5 | R50 | 500 | 0.083 | 43.3 | model / logs | 159Mb |
| DETR | R101 | 500 | 0.050 | 43.5 | model / logs | 232Mb |
| DETR-DC5 | R101 | 500 | 0.097 | 44.9 | model / logs | 232Mb |
2.2 COCO panoptic segmentation pretrained models

| name | backbone | box AP | segm AP | PQ | url | size |
|---|---|---|---|---|---|---|
| DETR | R50 | 38.8 | 31.1 | 43.4 | download | 165Mb |
| DETR-DC5 | R50 | 40.2 | 31.9 | 44.6 | download | 165Mb |
| DETR | R101 | 40.1 | 33.0 | 45.1 | download | 237Mb |
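If you just want to experiment with a pretrained checkpoint, the detection models can also be pulled through torch.hub without cloning the repository. A minimal sketch, assuming the entry-point names from the hubconf.py of facebookresearch/detr (detr_resnet50, detr_resnet50_dc5, detr_resnet101, ...):

```python
import torch

# Load pretrained DETR-R50 via the official torch.hub entry point.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
model.eval()
```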
2.3 COCO val5k object detection evaluation results (DETR COCO 2017 evaluation results)
2.3.1 DETR R50
Train command line for single-node training:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
  --lr_drop 400 --epochs 500 \
  --coco_path /path/to/coco
Eval command line:
python main.py --batch_size 2 --no_aux_loss --eval \
--resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth \
--coco_path /path/to/coco
COCO bbox detection val5k evaluation results:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.420
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.624
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.442
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.205
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.458
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.611
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.533
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.574
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.312
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.805
2.3.2 DETR R50-DC5
Train command line for training on 8 nodes:
python run_with_submitit.py \
--nodes 8 --timeout 3200 \
--batch_size 1 --dilation \
--lr_drop 400 --epochs 500 \
--coco_path /path/to/coco
Eval command line:
python main.py --no_aux_loss --eval \
--batch_size 1 --dilation \
--resume https://dl.fbaipublicfiles.com/detr/detr-r50-dc5-f0fb7ef5.pth \
--coco_path /path/to/coco
COCO bbox detection val5k evaluation results:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.433
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.631
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.225
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.473
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.611
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.551
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.594
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.344
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.646
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.814
2.3.3 DETR R101
Train command line for single-node training:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
--backbone resnet101 \
--lr_drop 400 --epochs 500 \
--coco_path /path/to/coco
Eval command line:
python main.py --batch_size 2 --no_aux_loss --eval \
--backbone resnet101 \
--resume https://dl.fbaipublicfiles.com/detr/detr-r101-2c7b67e5.pth \
--coco_path /path/to/coco
COCO bbox detection val5k evaluation results:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.435
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.638
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.464
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.219
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.480
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.618
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.344
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.548
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.590
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.337
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.644
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.814
2.3.4 DETR R101-DC5
Train command line for training on 8 nodes:
python run_with_submitit.py \
--nodes 8 --timeout 3200 \
--backbone resnet101 \
--batch_size 1 --dilation \
--lr_drop 400 --epochs 500 \
--coco_path /path/to/coco
Eval command line:
python main.py --no_aux_loss --eval \
--backbone resnet101 \
--batch_size 1 --dilation \
--resume https://dl.fbaipublicfiles.com/detr/detr-r101-dc5-a2e86def.pth \
--coco_path /path/to/coco
COCO bbox detection val5k evaluation results:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.449
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.647
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.477
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.495
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.350
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.561
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.604
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.348
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.662
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.810
2.4 Panoptic segmentation
2.4.1 DETR R50
Eval command line:
python main.py \
--batch_size 1 --no_aux_loss --eval \
--resume https://dl.fbaipublicfiles.com/detr/detr-r50-panoptic-00ce5173.pth \
--masks --dataset_file coco_panoptic \
--coco_path /path/to/coco/ \
--coco_panoptic_path /path/to/coco_panoptic
Results:
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.311
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.541
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.116
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.346
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.507
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.264
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.395
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.411
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.190
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.467
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.604
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.388
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.599
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.400
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.173
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.428
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.591
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.314
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.485
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.510
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.253
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.561
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.738
| PQ SQ RQ N
--------------------------------------
All | 43.4 79.3 53.8 133
Things | 48.2 79.8 59.5 80
Stuff | 36.3 78.5 45.3 53
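For reference, the panoptic quality (PQ) reported above factorizes into segmentation quality (SQ) and recognition quality (RQ). Per class (following Kirillov et al., "Panoptic Segmentation"; the table averages over classes, so the averaged numbers need not multiply exactly):

```latex
\mathrm{PQ}
= \underbrace{\frac{\sum_{(p,g)\in \mathit{TP}} \mathrm{IoU}(p,g)}{|\mathit{TP}|}}_{\mathrm{SQ}}
  \times
  \underbrace{\frac{|\mathit{TP}|}{|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}| + \tfrac{1}{2}|\mathit{FN}|}}_{\mathrm{RQ}}
```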
2.4.2 DETR-R50 DC5
Eval command line:
python main.py \
--dilation \
--batch_size 1 --no_aux_loss --eval \
--resume https://dl.fbaipublicfiles.com/detr/detr-r50-dc5-panoptic-da08f1b1.pth \
--masks --dataset_file coco_panoptic \
--coco_path /path/to/coco/ \
--coco_panoptic_path /path/to/coco_panoptic
Results:
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.319
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.547
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.325
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.130
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.358
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.504
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.268
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.406
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.424
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.210
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.478
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.402
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.601
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.418
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.441
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.592
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.320
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.499
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.527
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.279
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.577
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.743
| PQ SQ RQ N
--------------------------------------
All | 44.6 79.8 55.0 133
Things | 49.4 80.5 60.6 80
Stuff | 37.3 78.7 46.5 53
2.4.3 DETR-R101
Eval command line:
python main.py \
--backbone resnet101 \
--batch_size 1 --no_aux_loss --eval \
--resume https://dl.fbaipublicfiles.com/detr/detr-r101-panoptic-40021d53.pth \
--masks --dataset_file coco_panoptic \
--coco_path /path/to/coco/ \
--coco_panoptic_path /path/to/coco_panoptic
Results:
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.330
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.565
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.337
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.130
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.371
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.524
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.276
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.417
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.434
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.489
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.631
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.401
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.611
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.419
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.184
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.441
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.592
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.321
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.498
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.522
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.270
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.574
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.750
| PQ SQ RQ N
--------------------------------------
All | 45.1 79.9 55.5 133
Things | 50.5 80.9 61.7 80
Stuff | 37.0 78.5 46.0 53
2.5 Notebooks Examples
Object detection with DETR - a minimal implementation
In this notebook we show a demo of DETR (DEtection TRansformer), with slight differences from the baseline model in the paper. We show how to define the model, load pretrained weights, and visualize bounding-box and class predictions. Let's start with some common imports.
from PIL import Image
import requests
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'
import torch
from torch import nn
from torchvision.models import resnet50
import torchvision.transforms as T
torch.set_grad_enabled(False);
## DETR
Here is a minimal implementation of DETR:
class DETRdemo(nn.Module):
"""
Demo DETR implementation.
Demo implementation of DETR in minimal number of lines, with the
following differences wrt DETR in the paper:
* learned positional encoding (instead of sine)
* positional encoding is passed at input (instead of attention)
* fc bbox predictor (instead of MLP)
The model achieves ~40 AP on COCO val5k and runs at ~28 FPS on Tesla V100.
Only batch size 1 supported.
"""
def __init__(self, num_classes, hidden_dim=256, nheads=8,
num_encoder_layers=6, num_decoder_layers=6):
super().__init__()
# create ResNet-50 backbone
self.backbone = resnet50()
del self.backbone.fc
# create conversion layer
self.conv = nn.Conv2d(2048, hidden_dim, 1)
# create a default PyTorch transformer
self.transformer = nn.Transformer(
hidden_dim, nheads, num_encoder_layers, num_decoder_layers)
# prediction heads, one extra class for predicting non-empty slots
# note that in baseline DETR linear_bbox layer is 3-layer MLP
self.linear_class = nn.Linear(hidden_dim, num_classes + 1)
self.linear_bbox = nn.Linear(hidden_dim, 4)
# output positional encodings (object queries)
self.query_pos = nn.Parameter(torch.rand(100, hidden_dim))
# spatial positional encodings
# note that in baseline DETR we use sine positional encodings
self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
def forward(self, inputs):
# propagate inputs through ResNet-50 up to avg-pool layer
x = self.backbone.conv1(inputs)
x = self.backbone.bn1(x)
x = self.backbone.relu(x)
x = self.backbone.maxpool(x)
x = self.backbone.layer1(x)
x = self.backbone.layer2(x)
x = self.backbone.layer3(x)
x = self.backbone.layer4(x)
# convert from 2048 to 256 feature planes for the transformer
h = self.conv(x)
# construct positional encodings
H, W = h.shape[-2:]
pos = torch.cat([
self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1),
self.row_embed[:H].unsqueeze(1).repeat(1, W, 1),
], dim=-1).flatten(0, 1).unsqueeze(1)
# propagate through the transformer
h = self.transformer(pos + 0.1 * h.flatten(2).permute(2, 0, 1),
self.query_pos.unsqueeze(1)).transpose(0, 1)
# finally project transformer outputs to class labels and bounding boxes
return {'pred_logits': self.linear_class(h),
'pred_boxes': self.linear_bbox(h).sigmoid()}
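To actually run the demo, the official DETR colab loads a purpose-trained demo checkpoint; a sketch following that notebook (the checkpoint URL below is the one distributed with the colab, treat it as an assumption if it has since moved):

```python
# Instantiate the demo model and load the demo weights
# distributed with the official DETR colab notebook.
detr = DETRdemo(num_classes=91)
state_dict = torch.hub.load_state_dict_from_url(
    url='https://dl.fbaipublicfiles.com/detr/detr_demo-da2a99e9.pth',
    map_location='cpu', check_hash=True)
detr.load_state_dict(state_dict)
detr.eval()
```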
Install PyTorch 1.5+ and torchvision 0.6+:
conda install -c pytorch pytorch torchvision
Install pycocotools (for evaluation on COCO) and scipy (for training):
conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
or
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext --inplace
python setup.py build_ext install
That's it, should be good to train and evaluate detection models.
(Optional) To work with panoptic segmentation, install panopticapi:
pip install git+https://github.com/cocodataset/panopticapi.git
Reproducing DEtection TRansformers (DETR) on Windows 10 and training on your own dataset
Training DETR on your own dataset
We take the COCO data format as the standard. COCO currently has 3 annotation types: object instances (object detection), object keypoints (keypoints on objects), and image captions (image captioning), all stored as JSON files. The directory layout is the same as shown in section 1.1 above.
The annotations folder contains the JSON files for the training and validation sets.
If you have a VOC-format dataset prepared, it can be converted with a script; see any "VOC-format dataset to COCO format" conversion script, or the minimal sketch below.
After conversion to JSON, remember to rename the generated files to the instances_train2017.json naming style.
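The core of such a conversion is small. Below is an illustrative sketch only (xml_dir, out_json, and CLASSES are placeholders you must adapt to your dataset; the fields follow the COCO instances format):

```python
# Minimal VOC-XML -> COCO-instances conversion sketch.
# xml_dir, out_json and CLASSES are placeholders for your own dataset.
import glob
import json
import os
import xml.etree.ElementTree as ET

CLASSES = ['cat', 'dog']  # your class names, without background
xml_dir, out_json = 'VOC/Annotations', 'instances_train2017.json'

coco = {'images': [], 'annotations': [],
        'categories': [{'id': i + 1, 'name': n} for i, n in enumerate(CLASSES)]}
ann_id = 1
for img_id, xml_path in enumerate(sorted(glob.glob(os.path.join(xml_dir, '*.xml')))):
    root = ET.parse(xml_path).getroot()
    size = root.find('size')
    coco['images'].append({'id': img_id,
                           'file_name': root.findtext('filename'),
                           'width': int(size.findtext('width')),
                           'height': int(size.findtext('height'))})
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        x1, y1 = float(box.findtext('xmin')), float(box.findtext('ymin'))
        x2, y2 = float(box.findtext('xmax')), float(box.findtext('ymax'))
        coco['annotations'].append({'id': ann_id, 'image_id': img_id,
                                    'category_id': CLASSES.index(obj.findtext('name')) + 1,
                                    'bbox': [x1, y1, x2 - x1, y2 - y1],  # COCO boxes are [x, y, w, h]
                                    'area': (x2 - x1) * (y2 - y1),
                                    'iscrowd': 0})
        ann_id += 1
with open(out_json, 'w') as f:
    json.dump(coco, f)
```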
To train baseline DETR on a single node with 8 GPUs for 300 epochs:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco
To evaluate DETR R50 on COCO val5k with a single GPU:
python main.py --batch_size 2 --no_aux_loss --eval --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth --coco_path /path/to/coco
On top of this, we provide results for all DETR detection models. Note that the numbers depend on the per-GPU batch size (number of images): the non-DC5 models were evaluated with batch size 2 and the DC5 models with batch size 1, so the AP of a DC5 model drops significantly if it is evaluated with more than one image per GPU.
pip install submitit
Train baseline DETR-6-6 model on 4 nodes for 300 epochs:
python run_with_submitit.py --timeout 3000 --coco_path /path/to/coco
Next, modify the downloaded pretrained .pth file (pre_weights/detr-r50.pth below): it was trained on the full set of COCO classes, while we only need to train on our own dataset.
Run the code below to generate a .pth matching the number of objects in your dataset; remember to change that number.
import torch
from pathlib import Path
FILE = Path(__file__).resolve()
print("FILE",FILE)
ROOT = FILE.parents[0] # root directory
print("ROOT",ROOT)
pretrained_weights = torch.load(ROOT / 'pre_weights/detr-r50.pth')  # load the pretrained weights
num_class = 3  # your number of object classes + 1, since background counts as one
pretrained_weights["model"]["class_embed.weight"].resize_(num_class + 1, 256)
pretrained_weights["model"]["class_embed.bias"].resize_(num_class + 1)
torch.save(pretrained_weights, "detr-r50_%d.pth" % num_class)  # save the weight file for your own dataset
After it runs, the new weight file (here detr-r50_3.pth) is generated.
parser.add_argument('--num_queries', default=100, type=int,
                    help="Number of query slots")  # number of query slots, i.e. the maximum number of boxes per image
python main.py --dataset_file "coco" --coco_path data/coco --epochs 100 --lr=1e-4 --batch_size=2 --num_workers=4 --output_dir="outputs" --resume="detr-r50_3.pth"
After training finishes, output files are generated under outputs; the log file records information for each epoch.
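Each line of that log.txt is one JSON dict per epoch (this is also the format util/plot_utils.py consumes; the key names below are the ones written by the reference main.py):

```python
# Inspect DETR's log.txt: one JSON dict per line, one line per epoch.
import json

with open('outputs/log.txt') as f:
    epochs = [json.loads(line) for line in f]
for e in epochs:
    # 'test_coco_eval_bbox' holds the 12 COCO metrics; index 0 is AP@[0.50:0.95]
    print(e['epoch'], e['train_loss'], e['test_coco_eval_bbox'][0])
```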
Finally, test with your own trained model: change the image path here to the image you want to test, and remember to change CLASSES=[] on line 19 to your own classes! (The prediction code is adapted from https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/DETR_panoptic.ipynb)
That completes the whole pipeline. The codebase also provides plotting utilities in util/plot_utils.py.
Add the following code at the bottom of that file (change the paths to your own):
if __name__ == '__main__':
files = list(Path('../outputs/eval').glob('*.pth'))
plot_precision_recall(files)
plt.show()
plot_logs(logs=Path('D:/detr/outputs/log/'),fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0, log_name='log.txt')
plt.show()
If you hit this error:
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
the fix is to set the following environment variable:
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
if __name__ == '__main__':
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
files = list(Path('../outputs/eval').glob('*.pth'))
plot_precision_recall(files)
plt.show()
plot_logs(logs=Path('D:/BaiduNetdiskDownload/detr/outputs/log/'),fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0, log_name='log.txt')
plt.show()
Model prediction code (predict.py):
import argparse
import datetime
from PIL import Image
import requests
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display, clear_output
import torch
from torch import nn
from torchvision.models import resnet50
import torchvision.transforms as T
torch.set_grad_enabled(False)
from models.detr import DETR
from models import build_model
from pathlib import Path
FILE = Path(__file__).resolve()
print("FILE",FILE)
ROOT = FILE.parents[0]  # project root directory
print("ROOT",ROOT)
#COCO classes
CLASSES = [
'N/A', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A',
'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
'toothbrush'
]
#colors for visualization
COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
[0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]
# standard PyTorch mean-std input image normalization
transform = T.Compose([
T.Resize(800),
T.ToTensor(),
T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
# for output bounding box post-processing
def box_cxcywh_to_xyxy(x):
x_c, y_c, w, h = x.unbind(1)
b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
(x_c + 0.5 * w), (y_c + 0.5 * h)]
return torch.stack(b, dim=1)
def rescale_bboxes(out_bbox, size):
img_w, img_h = size
b = box_cxcywh_to_xyxy(out_bbox)
b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)
return b
def plot_results(pil_img, prob, boxes):
plt.figure(figsize=(16,10))
plt.imshow(pil_img)
ax = plt.gca()
colors = COLORS * 100
for p, (xmin, ymin, xmax, ymax), c in zip(prob, boxes.tolist(), colors):
ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
fill=False, color=c, linewidth=3))
cl = p.argmax()
text = f'{CLASSES[cl]}: {p[cl]:0.2f}'
ax.text(xmin, ymin, text, fontsize=15,
bbox=dict(facecolor='yellow', alpha=0.5))
plt.axis('off')
plt.show()
def get_args_parser():
parser = argparse.ArgumentParser('Transformer detector Predict', add_help=False)
    parser.add_argument('--source', type=str, default=ROOT / 'data/coco/val2017/000000007108.jpg', help='path of the image to test')
parser.add_argument('--weights', type=str, default=ROOT / 'pre_weights/detr-r50-dc5.pth', help='model path(s)')
parser.add_argument('--out_dir', type=str, default=ROOT / 'runs/inference', help=' save predict result path')
    parser.add_argument('--lr', default=1e-4, type=float)  # learning rate
    parser.add_argument('--lr_backbone', default=1e-5, type=float)  # learning rate for the backbone feature extractor
parser.add_argument('--batch_size', default=2, type=int)
parser.add_argument('--weight_decay', default=1e-4, type=float)
parser.add_argument('--epochs', default=300, type=int)
parser.add_argument('--lr_drop', default=200, type=int)
parser.add_argument('--clip_max_norm', default=0.1, type=float,
help='gradient clipping max norm')
# Model parameters
    parser.add_argument('--frozen_weights', type=str, default=None,
                        help="Path to the pretrained model. If set, only the mask head will be trained")
# * Backbone
    parser.add_argument('--backbone', default='resnet50', type=str,
                        help="Name of the convolutional backbone to use")
    parser.add_argument('--dilation', action='store_true',
                        help="If true, we replace stride with dilation in the last convolutional block (DC5)")
    parser.add_argument('--position_embedding', default='sine', type=str, choices=('sine', 'learned'),
                        help="Type of positional embedding to use on top of the image features")
    # * Transformer
    parser.add_argument('--enc_layers', default=6, type=int,
                        help="Number of encoding layers in the transformer")
    parser.add_argument('--dec_layers', default=6, type=int,
                        help="Number of decoding layers in the transformer")
    parser.add_argument('--dim_feedforward', default=2048, type=int,
                        help="Intermediate size of the feedforward layers in the transformer blocks")
    parser.add_argument('--hidden_dim', default=256, type=int,
                        help="Size of the embeddings (dimension of the transformer)")
    parser.add_argument('--dropout', default=0.1, type=float,
                        help="Dropout applied in the transformer")
    parser.add_argument('--nheads', default=8, type=int,
                        help="Number of attention heads inside the transformer's attentions")
    parser.add_argument('--num_queries', default=100, type=int,
                        help="Number of query slots")
parser.add_argument('--pre_norm', action='store_true')
# Loss
    parser.add_argument('--no_aux_loss', dest='aux_loss', action='store_false',
                        help="Disables auxiliary decoding losses (loss at each layer)")
# * Matcher
    parser.add_argument('--set_cost_class', default=1, type=float,
                        help="Class coefficient in the matching cost")
    parser.add_argument('--set_cost_bbox', default=5, type=float,
                        help="L1 box coefficient in the matching cost")
parser.add_argument('--set_cost_giou', default=2, type=float,
help="giou box coefficient in the matching cost")
# * Loss coefficients
parser.add_argument('--mask_loss_coef', default=1, type=float)
parser.add_argument('--dice_loss_coef', default=1, type=float)
parser.add_argument('--bbox_loss_coef', default=5, type=float)
parser.add_argument('--giou_loss_coef', default=2, type=float)
    parser.add_argument('--eos_coef', default=0.1, type=float,
                        help="Relative classification weight of the no-object class")
# * Segmentation
parser.add_argument('--masks', action='store_true',
help="Train segmentation head if the flag is provided")
# dataset parameters
parser.add_argument('--dataset_file', default='coco')
parser.add_argument('--device', default='cpu',
help='device to use for training / testing')
return parser
def main(args):
# Detection - using a pre-trained model
pre_model=args.weights
model, criterion, postprocessors = build_model(args)
checkpoints = torch.load(pre_model)
model.load_state_dict(checkpoints['model'])
model.eval()
im = Image.open(args.source)
    # mean-std normalize the input image (batch-size: 1)
    img = transform(im).unsqueeze(0)
    # propagate through the model
    outputs = model(img)
    # keep only predictions with confidence above 0.9
    probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
    keep = probas.max(-1).values > 0.9
    # convert boxes from [0; 1] to image scales
    bboxes_scaled = rescale_bboxes(outputs['pred_boxes'][0, keep], im.size)
plot_results(im, probas[keep], bboxes_scaled)
if __name__ == '__main__':
parser = argparse.ArgumentParser('DETR predict script', parents=[get_args_parser()])
args = parser.parse_args()
# print("args",args)
if args.out_dir:
Path(args.out_dir).mkdir(parents=True, exist_ok=True)
main(args)
Model prediction results
We show that DETR can be extended in a relatively straightforward way to predict segmentation masks. We mainly demonstrate strong panoptic segmentation results.
For panoptic segmentation, you need the panoptic annotations additionally to the coco dataset (see above for the coco dataset). You need to download and extract the annotations. We expect the directory structure to be the following:
path/to/coco_panoptic/
annotations/ # annotation json files
panoptic_train2017/ # train panoptic annotations
panoptic_val2017/ # val panoptic annotations
We recommend splitting the segmentation training into two stages: first train DETR to detect all the boxes, then train the segmentation head. For panoptic segmentation, DETR must learn to detect boxes for both things and stuff classes. You can train it on a single node with 8 GPUs for 300 epochs:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --coco_panoptic_path /path/to/coco_panoptic --dataset_file coco_panoptic --output_dir /output/path/box_model
For instance segmentation, you can simply train a plain box model (or use one of the pretrained models we provide).
Once you have a box model checkpoint, you need to freeze it and train the segmentation head in isolation. For panoptic segmentation, you can train on a single node with 8 GPUs for 25 epochs:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --masks --epochs 25 --lr_drop 15 --coco_path /path/to/coco --coco_panoptic_path /path/to/coco_panoptic --dataset_file coco_panoptic --frozen_weights /output/path/box_model/checkpoint.pth --output_dir /output/path/segm_model
For instance segmentation only, simply remove the dataset_file and coco_panoptic_path arguments from the above command line.
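For example, the panoptic command above then becomes:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --masks --epochs 25 --lr_drop 15 --coco_path /path/to/coco --frozen_weights /output/path/box_model/checkpoint.pth --output_dir /output/path/segm_model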
Contents
1. Preparing the data
1.1 The COCO dataset
2. Model Zoo
2.1 COCO object detection pretrained models
2.2 COCO panoptic segmentation pretrained models
2.3 COCO val5k object detection evaluation results (DETR COCO 2017 evaluation results)
2.3.1 DETR R50
2.3.2 DETR R50-DC5
2.3.3 DETR R101
2.3.4 DETR R101-DC5
2.4 Panoptic segmentation
2.4.1 DETR R50
2.4.2 DETR-R50 DC5
2.4.3 DETR-R101
2.5 Notebooks Examples