Deformable-DETR: Deployment and Hands-On

With the rise of the transformer in NLP and CV, object detection gained a transformer-based implementation, DETR, which demonstrated the approach's feasibility. Notably, it drops several detection components, such as anchors and NMS, making it a truly end-to-end detector while still delivering good accuracy.
However, due to the limitations of the Transformer attention module when processing image feature maps, DETR converges slowly and offers limited feature spatial resolution. To mitigate these issues, Deformable-DETR was proposed: its attention modules attend only to a small set of key sampling points around a reference point. Deformable-DETR achieves better performance than DETR (especially on small objects) with 10x less training time, and extensive experiments on the COCO benchmark demonstrate the method's effectiveness.

Open-source repository:
https://github.com/fundamentalvision/deformable-detr

Environment Setup

The Docker development environment is set up much like DETR's, but Deformable-DETR additionally needs its CUDA operators compiled, so use the devel variant of the base image, which ships with the build toolchain; everything else mirrors DETR.

FROM pytorch/pytorch:1.5-cuda10.1-cudnn7-devel

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update -qq && \
    apt-get install -y git vim libgtk2.0-dev && \
    rm -rf /var/lib/apt/lists/*
 
RUN pip --default-timeout=100 install Cython -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

RUN git clone https://github.com/philferriere/cocoapi.git

RUN cd cocoapi/PythonAPI && \
    make

RUN pip --no-cache-dir install pycocotools -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

RUN git clone https://gitee.com/qiaodl/panopticapi.git && \
    cd panopticapi && \
    python setup.py install


# strongly recommended: adjust this for your own project
COPY requirements.txt /workspace

RUN pip install  --no-cache-dir -r  /workspace/requirements.txt  -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

Build the image:
docker build -t deformable-detr:v0.1 .
After the image is built, start a container and compile the CUDA operators inside it.
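A container can be started along these lines (a sketch: the --gpus flag needs Docker 19.03+, and the mount path, shm size, and image tag are illustrative):

docker run --gpus all -it --shm-size=8g \
    -v $(pwd)/Deformable-DETR:/workspace/Deformable-DETR \
    deformable-detr:v0.1 /bin/bash

Then, inside the container: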
cd models
cd ops
sh make.sh
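For reference, make.sh is essentially a thin wrapper that builds and installs the MultiScaleDeformableAttention CUDA extension; it boils down to roughly:

# models/ops/make.sh (approximate content)
python setup.py build install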
After it compiles successfully, you can verify the ops with: python test.py

Data preparation and conversion can follow the DETR write-up:
https://www.jianshu.com/p/f54f473a1143
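After conversion, the directory passed via --coco_path should follow the COCO-style layout that the repo's data loader expects, roughly:

data/
├── annotations/
│   ├── instances_train2017.json
│   └── instances_val2017.json
├── train2017/    # training images
└── val2017/      # validation images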

Configuration Changes

Once everything compiles, training can be launched with the reference command:
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_deformable_detr.sh
However, that command targets the COCO dataset and exposes more options, such as multi-GPU and multi-node runs and the one-stage vs. two-stage variants. Since we use the VOC dataset here, the config and code need changes.

In general we need to adjust the configuration so the network matches our own number of classes, and we usually start from pretrained weights. Transformer-based detectors such as DETR and Deformable-DETR are mostly pretrained on the 80-class COCO dataset, and they extract features with a convolutional backbone such as ResNet-50 or VGG. Of course, given enough compute and time, you can also train from scratch.

For our VOC experiment the changes are analogous. Deformable-DETR also needs a background class, so the classification head has num_class + 1 outputs.

Accordingly, modify models/deformable_detr.py:

def build(args):
    num_class = 20  # number of classes in your dataset (20 for VOC)
    # reserve one extra slot for the background class
    num_classes = (num_class + 1) if args.dataset_file != 'coco' else 91

Since we only have a single GPU, we run training directly via main.py.
One pitfall here: when resuming from a checkpoint, Deformable-DETR runs an evaluation pass before the first epoch to sanity-check the loaded weights, so even without the --eval flag it starts a test run, and when that test fails it exits with an error.



Modify the code so this initial evaluation is skipped and training starts right away:

# main.py
# around line 229
# if args.resume:
if args.resume and args.start_epoch != 0:
# start_epoch now controls whether the loaded weights are evaluated before the first epoch

Alternatively, simply comment out the evaluation code to skip this step.
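The block to comment out looks roughly like this (paraphrased from main.py; the exact code varies by version):

# check the resumed model
# if not args.eval:
#     test_stats, coco_evaluator = evaluate(
#         model, criterion, postprocessors, data_loader_val, base_ds, device, args.output_dir)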



Next, prepare the pretrained weights. Download the checkpoint trained on COCO with the ResNet-50 backbone:
r50_deformable_detr-checkpoint.pth
It is linked from the GitHub repo; a proxy helps if the download is slow.



The classification heads in the checkpoint then need to be reshaped to our class count:

import torch

# load the official pretrained checkpoint
pretrained_weights = torch.load('r50_deformable_detr-checkpoint.pth')

# reshape the classification heads to our own class count
num_class = 20  # number of classes in your dataset
for i in range(6):  # one classification head per decoder layer
    pretrained_weights['model']['class_embed.%d.weight' % i].resize_(num_class + 1, 256)
    pretrained_weights['model']['class_embed.%d.bias' % i].resize_(num_class + 1)
# 50 is the number of object queries; keep it in sync with --num_queries in main.py
pretrained_weights['model']['query_embed.weight'].resize_(50, 512)

torch.save(pretrained_weights, 'deformable_detr_r50_%d.pth' % num_class)
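Running this writes out the new checkpoint. A quick optional sanity check on it (illustrative):

import torch
w = torch.load('deformable_detr_r50_20.pth')['model']
print(w['class_embed.0.weight'].shape)  # expect torch.Size([21, 256])
print(w['query_embed.weight'].shape)    # expect torch.Size([50, 512])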
 

With the reshaped checkpoint in place, training can begin:

python main.py --coco_path "data"  --batch_size=2 --num_workers=4 --output_dir="outputs_1"    --start_epoch 0  --resume=deformable_detr_r50_20.pth
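After training, the same entry point can evaluate a saved checkpoint via the --eval flag (the checkpoint name here is illustrative):

python main.py --coco_path "data" --batch_size=2 --eval --resume=outputs_1/checkpoint.pth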

Inference Code
Run inference on every image in a folder, draw the predicted boxes, and save the annotated images to a new folder.

import cv2
from PIL import Image
import numpy as np
import os
import time

import torch
from torch import nn
# from torchvision.models import resnet50
import torchvision.transforms as T
from main import get_args_parser as get_main_args_parser
from models import build_model

torch.set_grad_enabled(False)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("[INFO] 当前使用{}做推断".format(device))

# image preprocessing
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])


# convert (cx, cy, w, h) boxes to (x1, y1, x2, y2)
def box_cxcywh_to_xyxy(x):
    x_c, y_c, w, h = x.unbind(1)
    b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
         (x_c + 0.5 * w), (y_c + 0.5 * h)]
    return torch.stack(b, dim=1)


# map normalized [0, 1] boxes to absolute pixel coordinates
def rescale_bboxes(out_bbox, size):
    img_w, img_h = size
    b = box_cxcywh_to_xyxy(out_bbox)
    b = b.cpu().numpy()
    b = b * np.array([img_w, img_h, img_w, img_h], dtype=np.float32)
    return b


# plot box by opencv
def plot_result(pil_img, prob, boxes, save_name=None, imshow=False, imwrite=False):
    # class names for the author's own dataset; replace with your own list
    # (e.g. the 20 VOC classes); the background column is dropped in detect()
    LABEL = ['all','hat', 'person', 'groundrod', 'vest', 'workclothes_clothes', 'workclothes_trousers', 'winter_clothes',
             'winter_trousers', 'noworkclothes_clothes', 'noworkclothes_trousers', 'height', 'safteybelt', 'smoking',
             'noheight', 'fire', 'extinguisher', 'roll_workclothes', 'roll_noworkclothes', 'insulating_gloves', 'car',
             'fence', 'bottle', 'shorts', 'holes', 'single_ladder', 'down', 'double_ladder', 'oxygen_horizontally',
             'oxygen_vertically', 'acetylene_vertically', 'acetylene_horizontally']

    opencvImage = cv2.cvtColor(np.array(pil_img), cv2.COLOR_RGB2BGR)


    if len(prob) == 0:
        print("[INFO] NO box detect !!! ")
        if imwrite:
            if not os.path.exists("./result/pred_no"):
                os.makedirs("./result/pred_no")
            cv2.imwrite(os.path.join("./result/pred_no", save_name), opencvImage)
        return

    for p, (xmin, ymin, xmax, ymax) in zip(prob, boxes):
        cl = p.argmax()
        label_text = '{}: {}%'.format(LABEL[cl], round(p[cl] * 100, 2))

        cv2.rectangle(opencvImage, (int(xmin), int(ymin)), (int(xmax), int(ymax)), (255, 255, 0), 2)
        cv2.putText(opencvImage, label_text, (int(xmin) + 10, int(ymin) + 30), cv2.FONT_HERSHEY_SIMPLEX, 1,
                    (255, 255, 0), 2)

    if imshow:
        cv2.imshow('detect', opencvImage)
        cv2.waitKey(0)

    if imwrite:
        if not os.path.exists("./result/pred"):
            os.makedirs('./result/pred')
        cv2.imwrite('./result/pred/{}'.format(save_name), opencvImage)

def load_model(model_path, args):

    model, _, _ = build_model(args)
    state_dict = torch.load(model_path, map_location=device)  # <-- path to your checkpoint
    model.load_state_dict(state_dict["model"])
    model.to(device)
    model.eval()
    print("model loaded successfully")
    return model

# inference on a single image
def detect(im, model, transform, prob_threshold=0.7):
    # mean-std normalize the input image (batch-size: 1)
    img = transform(im).unsqueeze(0)

    # demo model only support by default images with aspect ratio between 0.5 and 2
    # if you want to use images with an aspect ratio outside this range
    # rescale your image so that the maximum size is at most 1333 for best results
    
    #assert img.shape[-2] <= 1600 and img.shape[
    #                                     -1] <= 1600, 'demo model only supports images up to 1600 pixels on each side'

    # propagate through the model
    img = img.to(device)
    start = time.time()
    outputs = model(img)
    #end = time.time()
    # keep only predictions with 0.7+ confidence
    # print(outputs['pred_logits'].softmax(-1)[0, :, :-1])
    probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
    keep = probas.max(-1).values > prob_threshold
    #end = time.time()

    probas = probas.cpu().detach().numpy()
    keep = keep.cpu().detach().numpy()

    # convert boxes from [0; 1] to image scales
    bboxes_scaled = rescale_bboxes(outputs['pred_boxes'][0, keep], im.size)
    end = time.time()
    return probas[keep], bboxes_scaled, end - start


if __name__ == "__main__":

    main_args = get_main_args_parser().parse_args()
    # load the model
    dfdetr = load_model('exps/r50_deformable_detr/checkpoint0049.pth', main_args)

    files = os.listdir("coco/testdata/test2017")

    cn = 0
    waste = 0
    for file in files:
        img_path = os.path.join("coco/testdata/test2017", file)
        im = Image.open(img_path).convert('RGB')  # ensure 3-channel RGB input

        scores, boxes, waste_time = detect(im, dfdetr, transform)
        plot_result(im, scores, boxes, save_name=file, imshow=False, imwrite=True)
        print("{} [INFO] {} time: {} done!!!".format(cn,file, waste_time))

        cn+=1
        waste+=waste_time
    waste_avg = waste / cn
    print("average inference time: {:.3f}s".format(waste_avg))
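Save the script (e.g. as inference.py) in the repository root so the main and models imports resolve, then run it with python inference.py. The prob_threshold argument of detect() (0.7 here) can be lowered to trade precision for recall.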



References
https://blog.csdn.net/Q1u1NG/article/details/109160318
https://blog.csdn.net/u010826850/article/details/117325848
