Object Detection with MxNet: EfficientDet (with partial source code and model)

Table of Contents

  • Preface
  • A brief history of object detection and why it matters
  • I. Preparing the dataset
    • 1. Installing the annotation tool
    • 2. Organizing the dataset
    • 3. Annotating the data
    • 4. Understanding the xml annotation files
  • II. Network architecture
  • III. Implementation
    • 0. Project directory layout
    • 1. Importing libraries
    • 2. Configuring the GPU/CPU environment
    • 3. Data loader
    • 4. Building the model
    • 5. Training the model
      • 1. Learning-rate schedule
      • 2. Optimizer setup
      • 3. Loss setup
      • 4. Training loop
    • 6. Model prediction
  • IV. Main entry point
  • V. Training results


Preface

  This article walks through implementing object detection with the mxnet deep learning framework; the model implemented is EfficientDet.

Environment:
      python 3.8
      mxnet 1.7.0
      cuda 10.1


A brief history of object detection and why it matters

  Image classification tells us roughly what kinds of objects an image contains, but not where they are in the image or any further detail about them. For applications such as license-plate recognition, traffic-violation detection, face recognition, or motion capture, classification alone cannot fully meet our needs.

  This is where another core task of the image domain comes in: object detection and recognition. A classic approach from traditional machine learning uses HOG (Histogram of Oriented Gradients) features to build a "filter" for each object class. A HOG filter records an object's edge and contour information; sliding this filter over different positions of different images and checking where the response magnitude exceeds a threshold indicates a strong match between the filter and an object in the image, which completes the detection.


I. Preparing the dataset

  I used the pill images from the halcon dataset: the first 100 images are annotated and the remaining 300 are kept for testing. Of the 100 annotated images, 90 go into the training set and 10 into the validation set.

1. Installing the annotation tool

pip install labelimg

Open a cmd window and run labelimg; the annotation tool shown below will appear:
(screenshot omitted)

2. Organizing the dataset

First create three folders:
(screenshot omitted)
DataImage: the 100 images to be annotated
DataLabel: an empty folder that will hold the annotation files generated by labelimg
test: the remaining 300 images, which need no annotation
The contents of the DataImage and test directories look like this (DataImage shown):
(screenshot omitted)

3. Annotating the data

  First set the image directory and the label output directory in labelimg:
(screenshot omitted)
  Remember three shortcut keys: w starts a new box, a goes to the previous image, d goes to the next. These three keys are all the tool needs.
  To annotate, press w to enter box-drawing mode, draw a box on the image, and enter its label (the class the box belongs to); that completes one object. A single image can hold multiple boxes and multiple classes, but be consistent: if an object is annotated in one image, the same kind of object must be annotated wherever else it appears, and it must always receive the same label. Do not mark object A as label A in one image and as label B in the next. The finished result looks like this:
(screenshot omitted)
When annotation is complete, the label files appear in DataLabel in Pascal VOC xml format:
(screenshot omitted)

4. Understanding the xml annotation files

(screenshot omitted)
An xml label file is shown above; the only part we use is the object elements, which just need to be parsed.
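Since only the object elements matter, a few lines of standard-library code extract everything the loader needs. The snippet below parses a hypothetical VOC-style annotation; the field values are made up for illustration:

```python
# Minimal sketch: pull class names and boxes out of a Pascal VOC-style
# annotation with xml.etree.ElementTree. The XML string is a hypothetical
# example; real labelimg output carries the same fields.
import xml.etree.ElementTree as ET

xml_text = """
<annotation>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>pill</name>
        <difficult>0</difficult>
        <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>180</xmax><ymax>200</ymax></bndbox>
    </object>
</annotation>
"""

root = ET.fromstring(xml_text)
boxes = []
for obj in root.iter('object'):
    name = obj.find('name').text
    bb = obj.find('bndbox')
    coords = [float(bb.find(k).text) for k in ('xmin', 'ymin', 'xmax', 'ymax')]
    boxes.append((name, coords))

print(boxes)  # [('pill', [100.0, 120.0, 180.0, 200.0])]
```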


II. Network architecture

Paper: https://arxiv.org/pdf/1911.09070.pdf
Network architecture:
(figure: EfficientDet architecture omitted)

  EfficientDet is an object-detection model built on top of EfficientNet. It combines an EfficientNet backbone, a stacked bi-directional feature pyramid network (BiFPN), and a compound scaling method, achieving fast, efficient detection with high accuracy while using far fewer parameters than mainstream detectors and running considerably faster; it is among the most advanced detection algorithms available.

  EfficientDet extends EfficientNet's compound-scaling idea into an explicitly scalable architecture family, providing eight models, D0 through D7, for different use cases. Users can choose a model according to the cost-performance trade-off of their hardware and software environment and their actual accuracy and throughput requirements. From D0 to D7 the network grows deeper and the input resolution larger; accuracy rises, and so does the compute cost.

  The overall architecture (Figure 1) is an end-to-end network: EfficientNet serves as the backbone, the BiFPN acts as the feature network that receives the backbone features and fuses them bidirectionally, and the fused features are fed to the classification and box-regression heads, which output each object's class and location.


III. Implementation

0. Project directory layout

(figure: project directory omitted)

core: loss computation and other core routines
data: data-loading functions and classes
nets: the backbone networks and the standard EfficientDet structure
utils: data-preprocessing helpers
Ctu_EfficientDet.py: the EfficientDet training and inference classes, and the main entry point of the project


1. Importing libraries

import os, sys, time, json, copy
sys.path.append('.')
import numpy as np
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.contrib import amp
from PIL import ImageFont, ImageDraw  # used by predict() to draw results
from data.data_loader import VOCDetection, VOC07MApMetric
from data.batchify_fn import Tuple, Stack, Pad
from data.data_transform import EfficientdetDefaultTrainTransform, EfficientdetDefaultValTransform
from core.lr_scheduler import LRScheduler, LRSequential
from core.loss import EfficientDetLoss
from nets.efficientdet import get_efficientdet

2. Configuring the GPU/CPU environment

self.ctx = [mx.gpu(int(i)) for i in USEGPU.split(',') if i.strip()]
self.ctx = self.ctx if self.ctx else [mx.cpu()]
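The fallback above can be sketched in plain Python, with tuples standing in for mx.gpu()/mx.cpu() so the logic is testable without mxnet: an empty or whitespace-only USEGPU string selects the CPU.

```python
# Plain-Python sketch of the device-selection logic above. Tuples stand in
# for mx.gpu(i) / mx.cpu() so the fallback behaviour can be shown anywhere.
def pick_devices(usegpu):
    gpus = [int(i) for i in usegpu.split(',') if i.strip()]
    return [('gpu', i) for i in gpus] if gpus else [('cpu', 0)]

print(pick_devices('0,1'))  # [('gpu', 0), ('gpu', 1)]
print(pick_devices(''))     # [('cpu', 0)]
```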

3. Data loader

The dataset class below supplies the samples; the training iterator is built on top of it later.

# data/data_loader.py -- also requires:
#   from mxnet.gluon.data import dataset
#   import xml.etree.ElementTree as ET
class VOCDetection(dataset.Dataset):
    def CreateDataList(self,IMGDir,XMLDir):
        ImgList = os.listdir(IMGDir)
        XmlList = os.listdir(XMLDir)
        classes = []
        dataList=[]
        for each_jpg in ImgList:
            each_xml = os.path.splitext(each_jpg)[0] + '.xml'  # robust to dots in filenames
            if each_xml in XmlList:
                dataList.append([os.path.join(IMGDir,each_jpg),os.path.join(XMLDir,each_xml)])
                with open(os.path.join(XMLDir,each_xml), "r", encoding="utf-8") as in_file:
                    tree = ET.parse(in_file)
                    root = tree.getroot()
                    for obj in root.iter('object'):
                        cls = obj.find('name').text
                        if cls not in classes:
                            classes.append(cls)
        return dataList,classes

    def __init__(self, ImageDir, XMLDir,transform=None):
        self.datalist,self.classes_names = self.CreateDataList(ImageDir,XMLDir)
        self._transform = transform
        self.index_map = dict(zip(self.classes_names, range(len(self.classes_names))))
        # self._label_cache = self._preload_labels()

    @property
    def classes(self):
        return self.classes_names

    def __len__(self):
        return len(self.datalist)

    def __getitem__(self, idx):
        img_path = self.datalist[idx][0]
        # label = self._label_cache[idx] if self._label_cache else self._load_label(idx)
        label = self._load_label(idx)
        img = mx.image.imread(img_path, 1)
        if self._transform is not None:
            return self._transform(img, label)
        return img, label.copy()

    def _preload_labels(self):
        return [self._load_label(idx) for idx in range(len(self))]

    def _load_label(self, idx):
        anno_path = self.datalist[idx][1]
        root = ET.parse(anno_path).getroot()
        size = root.find('size')
        width = float(size.find('width').text)
        height = float(size.find('height').text)
        label = []
        for obj in root.iter('object'):
            try:
                difficult = int(obj.find('difficult').text)
            except (ValueError, AttributeError):  # tag missing or malformed
                difficult = 0
            cls_name = obj.find('name').text.strip().lower()
            if cls_name not in self.classes:
                continue
            cls_id = self.index_map[cls_name]
            xml_box = obj.find('bndbox')
            xmin = (float(xml_box.find('xmin').text) - 1)
            ymin = (float(xml_box.find('ymin').text) - 1)
            xmax = (float(xml_box.find('xmax').text) - 1)
            ymax = (float(xml_box.find('ymax').text) - 1)
            try:
                self._validate_label(xmin, ymin, xmax, ymax, width, height)
                label.append([xmin, ymin, xmax, ymax, cls_id, difficult])
            except AssertionError:
                pass  # skip boxes with invalid coordinates
        return np.array(label)

    def _validate_label(self, xmin, ymin, xmax, ymax, width, height):
        assert 0 <= xmin < width, "xmin must in [0, {}), given {}".format(width, xmin)
        assert 0 <= ymin < height, "ymin must in [0, {}), given {}".format(height, ymin)
        assert xmin < xmax <= width, "xmax must in (xmin, {}], given {}".format(width, xmax)
        assert ymin < ymax <= height, "ymax must in (ymin, {}], given {}".format(height, ymax)
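Images carry different numbers of objects, so the per-image label arrays returned above have different lengths and must be padded to a common shape before they can be stacked into a batch; that is the job of a Pad-style batchify function such as the Pad imported earlier. A minimal numpy sketch of the idea (not the project's implementation):

```python
# Illustrative sketch of Pad-style batching: stack per-image label arrays
# of shape [N_i, 6] (xmin, ymin, xmax, ymax, cls_id, difficult) into one
# [B, max_N, 6] array, filling the missing rows with -1.
import numpy as np

def pad_labels(labels, pad_val=-1):
    max_n = max(l.shape[0] for l in labels)
    out = np.full((len(labels), max_n, labels[0].shape[1]), pad_val, dtype=np.float32)
    for i, l in enumerate(labels):
        out[i, :l.shape[0]] = l
    return out

a = np.zeros((2, 6), dtype=np.float32)   # image with 2 objects
b = np.ones((3, 6), dtype=np.float32)    # image with 3 objects
batch = pad_labels([a, b])
print(batch.shape)  # (2, 3, 6)
```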


4. Building the model

This project includes all eight EfficientDet variants (D0 through D7).

def efficientdet_params(model_name):
    params_dict = {
        'efficientdet-d0': ['efficientnet-b0', 512,  64,  3, 3, 4.0],
        'efficientdet-d1': ['efficientnet-b1', 640,  88,  4, 3, 4.0],
        'efficientdet-d2': ['efficientnet-b2', 768,  112, 5, 3, 4.0],
        'efficientdet-d3': ['efficientnet-b3', 896,  160, 5, 3, 4.0],
        'efficientdet-d4': ['efficientnet-b4', 1024, 224, 7, 4, 4.0],
        'efficientdet-d5': ['efficientnet-b5', 1280, 288, 7, 4, 4.0],
        'efficientdet-d6': ['efficientnet-b6', 1280, 384, 8, 5, 4.0],
        'efficientdet-d7': ['efficientnet-b7', 1536, 384, 8, 5, 5.0]
    }
    if model_name not in params_dict:
        raise NotImplementedError('%s does not exist.' % model_name)

    return params_dict[model_name]

class EfficientDet(nn.HybridBlock):
    def __init__(self, base_size, stages, ratios, scales, steps, classes, fpn_channel=64, fpn_repeat=3, box_cls_repeat=3, act_type='swish', stds=(0.1, 0.1, 0.2, 0.2), nms_thresh=0.45, nms_topk=400, post_nms=100, anchor_alloc_size=128, ctx=mx.cpu(), norm_layer=nn.BatchNorm, norm_kwargs=None, **kwargs):
        super(EfficientDet, self).__init__(**kwargs)

        self.num_stages = len(steps)
        self.classes = classes
        self.nms_thresh = nms_thresh
        self.nms_topk = nms_topk
        self.post_nms = post_nms
        num_anchors = len(ratios)*len(scales)
        norm_kwargs = {} if norm_kwargs is None else norm_kwargs
    
        im_size = (base_size, base_size)
        asz = anchor_alloc_size
        with self.name_scope():
            self.stages     = nn.HybridSequential()
            self.proj_convs = nn.HybridSequential()
            self.fpns       = nn.HybridSequential()
            self.anchor_generators = nn.HybridSequential()
            for stage in stages:
                self.stages.add(stage)
            for i in range(self.num_stages):
                block = nn.HybridSequential()
                _add_conv(block, channels=fpn_channel, act_type=act_type, norm_layer=norm_layer, norm_kwargs=norm_kwargs)
                self.proj_convs.add(block)
                anchor_generator = AnchorGenerator(i, im_size, ratios, scales, steps[i], (asz, asz))
                self.anchor_generators.add(anchor_generator)
                asz = max(asz//2, 16)

            for i in range(fpn_repeat):
                self.fpns.add(BiFPN(fpn_channel, num_features=self.num_stages, act_type=act_type, norm_layer=norm_layer, norm_kwargs=norm_kwargs))
            self.cls_net = OutputSubnet(fpn_channel, box_cls_repeat, self.num_classes+1, num_anchors, act_type=act_type, norm_layer=norm_layer, norm_kwargs=norm_kwargs, prefix='class_net')
            self.box_net = OutputSubnet(fpn_channel, box_cls_repeat, 4, num_anchors, act_type=act_type, norm_layer=norm_layer, norm_kwargs=norm_kwargs, prefix='box_net')
            self.bbox_decoder = NormalizedBoxCenterDecoder(stds)
            self.cls_decoder = MultiPerClassDecoder(self.num_classes+1, thresh=0.01)

    @property
    def num_classes(self):
        return len(self.classes)

    def set_nms(self, nms_thresh=0.45, nms_topk=400, post_nms=100):
        self._clear_cached_op()
        self.nms_thresh = nms_thresh
        self.nms_topk = nms_topk
        self.post_nms = post_nms

    def hybrid_forward(self, F, x):
        feats = []
        # backbone forward
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        # additional stages
        for i in range(self.num_stages-len(feats)):
            x = F.Pooling(x, pool_type='max', kernel=(2, 2), stride=(2, 2), pooling_convention='full')
            feats.append(x)
        # The channel of feature project to the input channel of BiFPN 
        for i, block in enumerate(self.proj_convs):
            feats[i] = block(feats[i])
        # Binfpn forward
        for block in self.fpns:
            feats = block(*feats)
        
        cls_preds = []
        box_preds = []
        anchors   = []
        for feat, ag in zip(feats, self.anchor_generators):
            box_pred = self.box_net(feat)
            cls_pred = self.cls_net(feat)
            anchor   = ag(feat)
            # (b, c*a, h, w) -> (b, c, a*h*w)
            box_pred = F.reshape(F.transpose(box_pred, axes=(0, 2, 3, 1)), shape=(0, -1, 4))
            cls_pred = F.reshape(F.transpose(cls_pred, axes=(0, 2, 3, 1)), shape=(0, -1, self.num_classes+1))
            cls_preds.append(cls_pred)
            box_preds.append(box_pred)
            anchors.append(anchor)
        
        cls_preds = F.concat(*cls_preds, dim=1)
        box_preds = F.concat(*box_preds, dim=1)
        anchors   = F.concat(*anchors, dim=1)
        if mx.autograd.is_training():
            return [cls_preds, box_preds, anchors]
        
        bboxes = self.bbox_decoder(box_preds, anchors)
        cls_ids, scores = self.cls_decoder(F.softmax(cls_preds, axis=-1))
        results = []
        for i in range(self.num_classes):
            cls_id = cls_ids.slice_axis(axis=-1, begin=i, end=i+1)
            score  = scores.slice_axis(axis=-1, begin=i, end=i+1)
            # per class results
            per_result = F.concat(*[cls_id, score, bboxes], dim=-1)
            results.append(per_result)
        result = F.concat(*results, dim=1)
        if self.nms_thresh > 0 and self.nms_thresh < 1:
            result = F.contrib.box_nms(result, overlap_thresh=self.nms_thresh, topk=self.nms_topk, valid_thresh=0.01, id_index=0, score_index=1, coord_start=2, force_suppress=False)
            if self.post_nms > 0:
                result = result.slice_axis(axis=1, begin=0, end=self.post_nms)
        ids = F.slice_axis(result, axis=2, begin=0, end=1)
        scores = F.slice_axis(result, axis=2, begin=1, end=2)
        bboxes = F.slice_axis(result, axis=2, begin=2, end=6)
        return ids, scores, bboxes
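As a sanity check on the shapes hybrid_forward concatenates: with a 512x512 input, five prediction levels, and len(ratios)*len(scales) anchors per position, the total anchor count works out as below. The strides 8 through 128 (P3 to P7) and 9 anchors per position are typical EfficientDet settings, assumed here for illustration:

```python
# Back-of-the-envelope sketch of how many anchors hybrid_forward
# concatenates along dim 1. Strides and anchors-per-position are
# typical EfficientDet values, assumed for illustration.
def total_anchors(base_size, steps, num_anchors_per_pos):
    total = 0
    for s in steps:
        fsize = base_size // s          # feature-map side length at this level
        total += fsize * fsize * num_anchors_per_pos
    return total

n = total_anchors(512, [8, 16, 32, 64, 128], 9)
print(n)  # 49104
```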


5. Training the model

1. Learning-rate schedule

lr_decay_epoch = sorted([int(ls) for ls in lr_decay_epoch.split(',') if ls.strip()])

lr_scheduler = LRSequential([
    LRScheduler('linear', base_lr=0, target_lr=learning_rate,
                nepochs=0, iters_per_epoch=self.num_samples // self.batch_size),
    LRScheduler(lr_mode, base_lr=learning_rate,
                nepochs=TrainNum,
                iters_per_epoch=self.num_samples // self.batch_size,
                step_epoch=lr_decay_epoch,
                step_factor=lr_decay, power=2),
])
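With lr_mode='step', the schedule multiplies the base rate by step_factor once for each decay epoch that has been passed. An illustrative reimplementation of that rule (not the project's LRScheduler):

```python
# Illustrative step-decay rule matching the 'step' lr_mode above: the base
# rate is multiplied by step_factor once per decay epoch already passed.
def step_lr(base_lr, epoch, step_epochs, step_factor):
    passed = sum(1 for e in step_epochs if epoch >= e)
    return base_lr * (step_factor ** passed)

print(step_lr(1e-5, 0, [50, 100, 150], 0.9))    # 1e-05 (no decay yet)
print(step_lr(1e-5, 120, [50, 100, 150], 0.9))  # about 8.1e-06 (two decays)
```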

2. Optimizer setup

if optim == 1:
    trainer = gluon.Trainer(self.model.collect_params(), 'sgd', {'learning_rate': learning_rate, 'wd': 0.0005, 'momentum': 0.9, 'lr_scheduler': lr_scheduler})
elif optim == 2:
    trainer = gluon.Trainer(self.model.collect_params(), 'adagrad', {'learning_rate': learning_rate, 'lr_scheduler': lr_scheduler})
else:
    trainer = gluon.Trainer(self.model.collect_params(), 'adam', {'learning_rate': learning_rate, 'lr_scheduler': lr_scheduler})

3. Loss setup

cls_box_loss = EfficientDetLoss(len(self.classes_names)+1, rho=0.1, lambd=50.0)
ce_metric = mx.metric.Loss('FocalLoss')
smoothl1_metric = mx.metric.Loss('SmoothL1')
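EfficientDetLoss combines a focal loss for classification with a smooth-L1 loss for box regression. The numpy sketch below shows one common parameterization of each term; the alpha and gamma defaults are assumptions, while rho=0.1 matches the call above:

```python
# Numpy sketch of the two loss terms the EfficientDetLoss above combines.
# alpha/gamma are the usual focal-loss defaults, assumed here; rho matches
# the rho=0.1 passed to EfficientDetLoss.
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """p: predicted probability of the positive class, y: 0/1 target."""
    pt = np.where(y == 1, p, 1.0 - p)
    a = np.where(y == 1, alpha, 1.0 - alpha)
    return -a * (1.0 - pt) ** gamma * np.log(pt + eps)

def smooth_l1(x, rho=0.1):
    """Quadratic below rho, linear above; one common parameterization."""
    ax = np.abs(x)
    return np.where(ax < rho, 0.5 * ax ** 2 / rho, ax - 0.5 * rho)

# A confident correct prediction contributes almost nothing:
print(focal_loss(np.array([0.99]), np.array([1]))[0] < 1e-4)  # True
```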

4. Training loop

for i, batch in enumerate(self.train_loader):
    data = gluon.utils.split_and_load(batch[0], ctx_list=self.ctx, batch_axis=0)
    cls_targets = gluon.utils.split_and_load(batch[1], ctx_list=self.ctx, batch_axis=0)
    box_targets = gluon.utils.split_and_load(batch[2], ctx_list=self.ctx, batch_axis=0)

    with autograd.record():
        cls_preds = []
        box_preds = []
        for x in data:
            cls_pred, box_pred, _ = self.model(x)
            cls_preds.append(cls_pred)
            box_preds.append(box_pred)
        sum_loss, cls_loss, box_loss = cls_box_loss(cls_preds, box_preds, cls_targets, box_targets)
        if self.ampFlag:
            with amp.scale_loss(sum_loss, trainer) as scaled_loss:
                autograd.backward(scaled_loss)
        else:
            autograd.backward(sum_loss)
    trainer.step(self.batch_size)

    ce_metric.update(0, [l * self.batch_size for l in cls_loss])
    smoothl1_metric.update(0, [l * self.batch_size for l in box_loss])

    name1, loss1 = ce_metric.get()
    name2, loss2 = smoothl1_metric.get()
    print('[Epoch {}][Batch {}], Speed: {:.3f} samples/sec, {}={:.3f}, {}={:.3f}'.format(epoch, i, self.batch_size/(time.time()-btic), name1, loss1, name2, loss2))
    btic = time.time()


6. Model prediction

def predict(self, image, confidence=0.5, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    start_time = time.time()
    origin_img = copy.deepcopy(image)
    base_imageSize = origin_img.shape
    image = self.resize_image(image,(self.image_size,self.image_size))
    # print(resize_imageSize,base_imageSize)   # (512, 512, 3) (780, 1248, 3)
    img = nd.array(image)

    # img = resize_short_within(img, self.image_size, max_size)
    img = mx.nd.image.to_tensor(img)
    img = mx.nd.image.normalize(img, mean=mean, std=std)
    x = img.expand_dims(0)

    x = x.as_in_context(self.ctx[0])
    labels, scores, bboxes = [xx[0].asnumpy() for xx in self.model(x)]

    origin_img_pillow = self.cv2_pillow(origin_img)
    font = ImageFont.truetype(font='./model_data/simhei.ttf', size=np.floor(3e-2 * np.shape(origin_img_pillow)[1] + 0.5).astype('int32'))
    thickness = max((np.shape(origin_img_pillow)[0] + np.shape(origin_img_pillow)[1]) // self.image_size, 1)

    imgbox = []
    for i, bbox in enumerate(bboxes):
        if (scores is not None and scores.flat[i] < confidence) or (labels is not None and labels.flat[i] < 0):
            continue
        cls_id = int(labels.flat[i]) if labels is not None else -1

        xmin, ymin, xmax, ymax = [int(x) for x in bbox]
        xmin, ymin, xmax, ymax = xmin/self.image_size, ymin/self.image_size, xmax/self.image_size, ymax/self.image_size
        box_xy, box_wh = np.array([(xmin+xmax)/2,(ymin+ymax)/2]).astype('float32'), np.array([xmax-xmin,ymax-ymin]).astype('float32')
        image_shape = np.array((base_imageSize[0],base_imageSize[1]))
        input_shape = np.array((self.image_size,self.image_size))
        result = self.correct_boxes(box_xy, box_wh, input_shape, image_shape,True)
        ymin, xmin, ymax, xmax = result

        xmin, ymin, xmax, ymax = int(xmin), int(ymin), int(xmax), int(ymax)
        class_name = self.classes_names[cls_id]
        score = '{:d}%'.format(int(scores.flat[i] * 100)) if scores is not None else ''
        imgbox.append([(xmin, ymin, xmax, ymax), cls_id, class_name, score])
        top, left, bottom, right = ymin, xmin, ymax, xmax


        # cv2.rectangle(origin_img, (xmin, ymin), (xmax, ymax), self.colors[cls_id], 2)
        # if class_name or score:
        #     y = ymin - 15 if ymin - 15 > 15 else ymin + 15
        #     cv2.putText(origin_img, '{:s} {:s}'.format(class_name, score),
        #                 (xmin, y), cv2.FONT_HERSHEY_SIMPLEX, min(1.0 / 2, 2),
        #                 self.colors[cls_id], min(int(1.0), 5), lineType=cv2.LINE_AA)
        label = '{}-{}'.format(class_name, score)
        draw = ImageDraw.Draw(origin_img_pillow)
        label_size = draw.textsize(label, font)
        label = label.encode('utf-8')

        if top - label_size[1] >= 0:
            text_origin = np.array([left, top - label_size[1]])
        else:
            text_origin = np.array([left, top + 1])

        for t in range(thickness):  # 't', not 'i': avoid shadowing the enumerate index
            draw.rectangle([left + t, top + t, right - t, bottom - t], outline=self.colors[cls_id])
        draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[cls_id])
        draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font)
        del draw

    result_value = {
        "image_result": self.pillow_cv2(origin_img_pillow),
        "bbox": imgbox,
        "time": (time.time() - start_time) * 1000
    }

    return result_value
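The coordinate mapping that correct_boxes performs can be illustrated as follows. This is a hedged reimplementation assuming a letterbox resize (uniform scale plus centered padding); the project's own helper may differ in detail:

```python
# Hedged sketch of a correct_boxes-style mapping: undo the uniform scale
# and the padding offset of a letterbox resize. Illustrative only, not the
# project's actual correct_boxes implementation.
import numpy as np

def unletterbox(box_xy, box_wh, input_shape, image_shape):
    """box_xy/box_wh normalized to [0,1] in input space; shapes are (h, w).
    Returns (ymin, xmin, ymax, xmax) in original-image pixels."""
    input_shape = np.array(input_shape, dtype=np.float32)
    image_shape = np.array(image_shape, dtype=np.float32)
    scale = np.min(input_shape / image_shape)        # uniform resize factor
    new_shape = image_shape * scale                  # content area inside the input
    offset = (input_shape - new_shape) / 2.0 / input_shape
    # switch to (y, x) order, remove padding, rescale to original pixels
    xy = (box_xy[::-1] - offset) * input_shape / new_shape * image_shape
    wh = box_wh[::-1] * input_shape / new_shape * image_shape
    ymin, xmin = xy - wh / 2.0
    ymax, xmax = xy + wh / 2.0
    return ymin, xmin, ymax, xmax

# Square image, no padding: the mapping reduces to a pure rescale.
r = unletterbox(np.array([0.5, 0.5]), np.array([0.25, 0.25]),
                (512, 512), (1000, 1000))
print([int(round(v)) for v in r])  # [375, 375, 625, 625]
```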

IV. Main entry point


if __name__ == '__main__':
    ctu = Ctu_Efficientdet(USEGPU='0',image_size=512, ampFlag = False)
    ctu.InitModel(DataDir=r'D:/Ctu/Ctu_Project_DL/DataSet/DataSet_Detection_Color',batch_size=1,Pre_Model = './Model_efficientdet/best_model.dat',num_workers=0,phi=0)
    ctu.train(TrainNum=150,learning_rate=0.00001,lr_decay_epoch='50,100,150,200',lr_decay = 0.9,ModelPath='./Model2',optim=2,lr_mode='step')

V. Training results

Note: I did not keep this project's trained model, so training results will be provided later.
(figure: training results omitted)
