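DETR (DEtection TRansformer) is an end-to-end object detection model proposed by Facebook AI Research (FAIR) in 2020. It is built on the Transformer architecture and removes the hand-designed anchor boxes and non-maximum suppression (NMS) post-processing that detectors such as Faster R-CNN and YOLO rely on, which makes the detection pipeline considerably simpler.
Paper: End-to-End Object Detection with Transformers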
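DETR consists of three parts: a CNN backbone that extracts image features, a Transformer encoder-decoder that processes those features, and prediction heads that output a class and a bounding box for each object query.
DETR architecture overview
Input image -> CNN feature extraction -> Transformer processes the features -> predicted object classes + bounding boxes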
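To make the pipeline above more concrete, the following sketch traces approximate tensor shapes for a single image. The function name detr_shapes is hypothetical, and the default values (backbone stride 32, 2048 backbone channels, hidden size 256, 100 object queries, 91 classes plus one "no object" slot) are assumptions taken from the published detr_resnet50 configuration; other DETR variants differ.

```python
import math

def detr_shapes(height, width, stride=32, hidden_dim=256,
                num_queries=100, num_classes=91):
    """Trace per-image tensor shapes through a DETR-style pipeline (illustrative)."""
    # A stride-32 backbone halves the resolution five times, rounding up each time.
    h, w = height, width
    for _ in range(int(math.log2(stride))):
        h, w = math.ceil(h / 2), math.ceil(w / 2)
    return {
        "backbone_features": (2048, h, w),              # assumed ResNet-50 final stage
        "encoder_tokens": (h * w, hidden_dim),          # one token per feature-map cell
        "decoder_queries": (num_queries, hidden_dim),   # learned object queries
        "pred_logits": (num_queries, num_classes + 1),  # +1 for "no object"
        "pred_boxes": (num_queries, 4),                 # (cx, cy, w, h), normalized
    }

shapes = detr_shapes(800, 1200)
print(shapes["encoder_tokens"])  # (950, 256)
print(shapes["pred_logits"])     # (100, 92)
```

The number of predictions is fixed by the number of queries, not by the image size, which is why the model always returns the same number of candidate boxes.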
Object detection with DETR in PyTorch
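import torch
import torchvision.transforms as T
from PIL import Image

# Load the pretrained DETR model (ResNet-50 backbone) from torch.hub
detr = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True, trust_repo=True)
detr.eval()

# Load the image and preprocess it
image_path = r"D:\Pictures\test.jpeg"
image = Image.open(image_path).convert("RGB")  # force 3 channels
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    # ImageNet mean/std, which the pretrained backbone expects
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img_tensor = transform(image).unsqueeze(0)  # add a batch dimension

# Run detection without tracking gradients
with torch.no_grad():
    outputs = detr(img_tensor)

# Print the raw outputs
print(outputs)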
Output (truncated; the exact values depend on the input image and preprocessing)
{'pred_logits': tensor([[[-17.4480, -1.4711, -6.0746, ..., -10.0646, -7.2832, 11.1362],
[-17.7877, -1.7454, -5.9165, ..., -11.6356, -8.4581, 10.7261],
[-18.3903, -1.3194, -7.6447, ..., -11.3595, -6.6635, 11.2573],
...,
[-18.0295, -1.6913, -6.6354, ..., -11.4836, -7.7729, 10.9814],
[-14.4323, 1.3790, -4.2558, ..., -11.5297, -7.8083, 8.1644],
[-17.6349, -1.6041, -6.4100, ..., -11.2120, -7.4216, 10.7064]]]), 'pred_boxes': tensor([[[0.4990, 0.5690, 0.4764, 0.7080],
[0.5039, 0.5219, 0.4657, 0.6124],
[0.3920, 0.5463, 0.2963, 0.6085],
[0.5231, 0.5180, 0.4489, 0.6110],
[0.4986, 0.5346, 0.4989, 0.5883],
[0.5145, 0.5258, 0.5162, 0.6123],
[0.4251, 0.5273, 0.3235, 0.5911],
[0.4012, 0.5339, 0.2816, 0.5804],
[0.4025, 0.5263, 0.2526, 0.5638],
[0.5153, 0.5249, 0.4807, 0.6065],
[0.6775, 0.8235, 0.0436, 0.0436],
[0.4380, 0.5365, 0.3368, 0.5919],
[0.5044, 0.5242, 0.4791, 0.6314],
[0.7352, 0.8131, 0.0248, 0.0464],
[0.4567, 0.8361, 0.0448, 0.0530],
[0.4981, 0.5287, 0.4715, 0.6199],
[0.5047, 0.5239, 0.4570, 0.6045],
[0.6295, 0.5182, 0.2367, 0.6062],
[0.5980, 0.5261, 0.2878, 0.6313],
[0.5106, 0.5218,
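Code walkthrough
detr_resnet50 is the DETR variant with a ResNet-50 backbone; torch.hub.load downloads it together with its pretrained weights. The model returns pred_logits (class scores for each object query, with the last entry reserved for "no object") and pred_boxes (normalized center x, center y, width, height for each query).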
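The raw pred_logits and pred_boxes printed above are not yet usable detections. The sketch below shows one plausible way to decode them in plain Python: apply a softmax per query, drop the trailing "no object" entry, keep queries above a confidence threshold, and convert the normalized (cx, cy, w, h) boxes to pixel corner coordinates. The helper names and the 0.7 threshold are illustrative assumptions, not part of the DETR API; the inputs are assumed to be nested lists for one image, for example outputs['pred_logits'][0].tolist().

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)

def decode(pred_logits, pred_boxes, img_w, img_h, threshold=0.7):
    """Return (class_index, score, xyxy_box) for each confident query."""
    detections = []
    for logits, box in zip(pred_logits, pred_boxes):
        probs = softmax(logits)[:-1]  # last entry is the "no object" class
        score = max(probs)
        if score >= threshold:
            detections.append((probs.index(score), score,
                               cxcywh_to_xyxy(box, img_w, img_h)))
    return detections

# Toy input: two queries, two real classes plus "no object"
toy_logits = [[0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
toy_boxes = [[0.5, 0.5, 0.2, 0.4], [0.1, 0.1, 0.1, 0.1]]
print(decode(toy_logits, toy_boxes, img_w=100, img_h=200))
```

In the toy input only the first query survives: the second one puts most of its probability on "no object", which is how DETR discards unused queries without NMS.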
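Model | Type | Detection approach | Speed (FPS) | AP (COCO) | Notes |
---|---|---|---|---|---|
Faster R-CNN | Two-stage | RPN + RoI pooling | ~26 (R50-FPN, V100, as reported in the DETR paper) | 40.2 to 42.0 (R50-FPN) | High accuracy; slower than single-stage detectors |
YOLOv8 | Single-stage | Predicts classes + boxes directly | 60+ (GPU, depends on model size) | 37.3 to 53.9 (n to x variants) | Fast; suited to real-time detection |
DETR | End-to-end | Detection with a Transformer | ~28 (R50, V100, as reported in the DETR paper) | 42.0 (R50) | No anchors / NMS |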