This article explains how to implement object detection with the MXNet deep learning framework; the model implemented is EfficientDet.
Environment:
python 3.8
mxnet 1.7.0
cuda 10.1
Image classification tells us roughly what kinds of objects an image contains, but not where they are in the image or anything more specific about them. In applications such as license-plate recognition, traffic-violation detection, face recognition, and motion capture, plain image classification is therefore not enough.
This is where another core task of computer vision comes in: object detection and recognition. A classic example from traditional machine learning is using HOG (Histogram of Oriented Gradients) features to build a "filter" for each object class. A HOG filter captures an object's edge and contour information; sliding it over different positions of different images and thresholding the response magnitude tells us where the filter matches an object well, which completes the detection.
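As a sketch of the idea behind HOG, the snippet below computes per-cell gradient-orientation histograms in plain NumPy. This is only the core step: the real descriptor also adds block normalization and other refinements, and the function name and parameters here are illustrative, not from any library.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Per-cell gradient-orientation histograms (the core of HOG).

    img: 2-D grayscale array; cell: cell side in pixels; bins: orientation
    bins over [0, 180) degrees (unsigned gradients).
    """
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)                              # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0        # unsigned orientation
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for r in range(ch):
        for c in range(cw):
            sl = np.s_[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            # accumulate gradient magnitude into the orientation bins
            np.add.at(hist[r, c], bin_idx[sl].ravel(), mag[sl].ravel())
    return hist

# a vertical step edge: all gradient energy falls into the 0-degree bin
img = np.zeros((16, 16))
img[:, 8:] = 1.0
feat = hog_cell_histograms(img)
```

Stacking and normalizing these histograms gives the feature vector that the per-class "filter" is matched against.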
For data I used the pill images from the HALCON example dataset: the first 100 images were annotated and the remaining 300 kept for testing; of the 100 annotated images, 90 went into the training set and 10 into the validation set.
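The article does not show how the 90/10 split was made; one reproducible way, as a sketch (the file names are hypothetical):

```python
import random

def split_train_val(names, val_ratio=0.1, seed=0):
    """Shuffle file names reproducibly and split off a validation set."""
    names = sorted(names)              # fixed order before shuffling
    rng = random.Random(seed)          # seeded RNG -> same split every run
    rng.shuffle(names)
    n_val = int(len(names) * val_ratio)
    return names[n_val:], names[:n_val]  # (train, val)

train, val = split_train_val(['img%03d.jpg' % i for i in range(100)])
```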
pip install labelimg
Open a cmd window and run labelimg; the annotation tool shown in the figure will start:
First, create three folders, as shown:
DataImage: the 100 images to be annotated
DataLabel: an empty folder for the label files that labelimg will generate
test: the remaining 300 images, which need no annotation
The contents of the DataImage and test directories look like this (DataImage shown as an example):
In labelimg, first set the image directory and the label save directory, as shown:
Then remember three shortcuts: w starts a new box, a goes to the previous image, d goes to the next. These three keys are all you need for the whole job.
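The three working folders can be created in one go; a tiny sketch (a temporary base directory is used here just to keep the example self-contained):

```python
import os
import tempfile

base = tempfile.mkdtemp()  # stand-in for your project directory
# create the three directories described above; no error if they already exist
for d in ('DataImage', 'DataLabel', 'test'):
    os.makedirs(os.path.join(base, d), exist_ok=True)
```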
To start annotating, press w to enter box-drawing mode, draw a box on the image, and type the label (the class the box belongs to); that completes one object. An image may contain several boxes and several classes, but be consistent, never ambiguous: if a kind of object is annotated in one image, the same kind of object must be annotated wherever else it appears, and the same object must always receive the same label (do not mark object A as label A in one image and as label B in the next). The final result looks like this:
When annotation is finished you will find the label files in DataLabel, in Pascal VOC XML format:
An XML label file looks like the figure; we only need the object elements, which we parse ourselves.
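Parsing the object elements needs nothing beyond the standard library; a minimal sketch on a hand-written annotation (the class name and coordinates are made-up example values):

```python
import xml.etree.ElementTree as ET

# a minimal hand-written Pascal VOC annotation (hypothetical values)
xml_text = """<annotation>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>pill</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>200</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(xml_text)
boxes = []
for obj in root.iter('object'):          # one <object> per annotated box
    name = obj.find('name').text
    bb = obj.find('bndbox')
    boxes.append((name,) + tuple(int(bb.find(k).text)
                                 for k in ('xmin', 'ymin', 'xmax', 'ymax')))
```

The VOCDetection dataset class later in this article does exactly this, plus coordinate validation.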
Paper: https://arxiv.org/pdf/1911.09070.pdf
Network architecture:
EfficientDet is an object-detection model built on top of EfficientNet. It combines an EfficientNet backbone, a stacked bi-directional feature pyramid network (BiFPN), and a compound scaling method, which lets it detect objects quickly and accurately while using far fewer parameters than mainstream detectors. Extending EfficientNet's compound-scaling idea, EfficientDet turns the architecture decisions into a scalable framework and provides eight models, D0-D7, for different usage scenarios; users can choose a model according to the cost of their hardware and their actual accuracy and efficiency requirements. From D0 to D7 the networks grow deeper and the input resolution larger, so accuracy rises along with the computational cost. The overall architecture, shown in Figure 1, is end-to-end: EfficientNet serves as the backbone, the BiFPN takes the backbone features and fuses them bidirectionally, and the fused features are fed to the classification and box-regression heads, which output each target's class and location.
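The BiFPN combines its inputs with what the paper calls "fast normalized fusion": a learned, non-negative weighted sum whose weights are normalized without a softmax. A NumPy sketch of that formula (the function name is mine; in the real network the weights are learned parameters):

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN fast normalized fusion: O = sum_i w_i * I_i / (eps + sum_j w_j).

    features: list of equally shaped arrays; weights: raw learnable scalars.
    """
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU keeps w_i >= 0
    w = w / (w.sum() + eps)                                     # normalize to ~1
    return sum(wi * f for wi, f in zip(w, features))

a, b = np.ones((2, 2)), 3 * np.ones((2, 2))
fused = fast_normalized_fusion([a, b], [1.0, 1.0])  # equal weights -> average
```

Compared with a softmax over the weights, this normalization is cheaper on hardware while behaving almost identically, which is why the paper adopts it.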
core: loss computation and other core routines
data: data-loading functions and classes
nets: the backbone and the standard EfficientDet network structure
utils: data pre-processing helpers
Ctu_EfficientDet.py: the EfficientDet training and inference classes, and the main entry point of the project
import os, sys, time, json
sys.path.append('.')
import numpy as np
import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.contrib import amp
from data.data_loader import VOCDetection, VOC07MApMetric
from data.batchify_fn import Tuple, Stack, Pad
from data.data_transform import EfficientdetDefaultTrainTransform,EfficientdetDefaultValTransform
from core.lr_scheduler import LRScheduler,LRSequential
from core.loss import EfficientDetLoss
from nets.efficientdet import get_efficientdet
self.ctx = [mx.gpu(int(i)) for i in USEGPU.split(',') if i.strip()]
self.ctx = self.ctx if self.ctx else [mx.cpu()]
The class below is the dataset iterator; the training data loader is built on top of it later.
import os
import numpy as np
import mxnet as mx
import xml.etree.ElementTree as ET
from mxnet.gluon.data import dataset

class VOCDetection(dataset.Dataset):
    def CreateDataList(self, IMGDir, XMLDir):
        ImgList = os.listdir(IMGDir)
        XmlList = os.listdir(XMLDir)
        classes = []
        dataList = []
        for each_jpg in ImgList:
            each_xml = each_jpg.split('.')[0] + '.xml'
            if each_xml in XmlList:
                dataList.append([os.path.join(IMGDir, each_jpg), os.path.join(XMLDir, each_xml)])
                with open(os.path.join(XMLDir, each_xml), "r", encoding="utf-8") as in_file:
                    tree = ET.parse(in_file)
                    root = tree.getroot()
                    for obj in root.iter('object'):
                        cls = obj.find('name').text
                        if cls not in classes:
                            classes.append(cls)
        return dataList, classes

    def __init__(self, ImageDir, XMLDir, transform=None):
        self.datalist, self.classes_names = self.CreateDataList(ImageDir, XMLDir)
        self._transform = transform
        self.index_map = dict(zip(self.classes_names, range(len(self.classes_names))))
        # self._label_cache = self._preload_labels()

    @property
    def classes(self):
        return self.classes_names

    def __len__(self):
        return len(self.datalist)

    def __getitem__(self, idx):
        img_path = self.datalist[idx][0]
        # label = self._label_cache[idx] if self._label_cache else self._load_label(idx)
        label = self._load_label(idx)
        img = mx.image.imread(img_path, 1)
        if self._transform is not None:
            return self._transform(img, label)
        return img, label.copy()

    def _preload_labels(self):
        return [self._load_label(idx) for idx in range(len(self))]

    def _load_label(self, idx):
        anno_path = self.datalist[idx][1]
        root = ET.parse(anno_path).getroot()
        size = root.find('size')
        width = float(size.find('width').text)
        height = float(size.find('height').text)
        label = []
        for obj in root.iter('object'):
            try:
                difficult = int(obj.find('difficult').text)
            except (ValueError, AttributeError):  # missing or malformed <difficult>
                difficult = 0
            # strip whitespace only: CreateDataList stores names as-is, so
            # lower-casing here would fail to match mixed-case class names
            cls_name = obj.find('name').text.strip()
            if cls_name not in self.classes:
                continue
            cls_id = self.index_map[cls_name]
            xml_box = obj.find('bndbox')
            xmin = float(xml_box.find('xmin').text) - 1
            ymin = float(xml_box.find('ymin').text) - 1
            xmax = float(xml_box.find('xmax').text) - 1
            ymax = float(xml_box.find('ymax').text) - 1
            try:
                self._validate_label(xmin, ymin, xmax, ymax, width, height)
                label.append([xmin, ymin, xmax, ymax, cls_id, difficult])
            except AssertionError:
                pass  # skip boxes that fall outside the image
        return np.array(label)

    def _validate_label(self, xmin, ymin, xmax, ymax, width, height):
        assert 0 <= xmin < width, "xmin must in [0, {}), given {}".format(width, xmin)
        assert 0 <= ymin < height, "ymin must in [0, {}), given {}".format(height, ymin)
        assert xmin < xmax <= width, "xmax must in (xmin, {}], given {}".format(width, xmax)
        assert ymin < ymax <= height, "ymax must in (ymin, {}], given {}".format(height, ymax)
The project provides eight EfficientDet variants:
def efficientdet_params(model_name):
    # [backbone, input size, BiFPN channels, BiFPN repeats, head repeats, anchor scale]
    params_dict = {
        'efficientdet-d0': ['efficientnet-b0', 512, 64, 3, 3, 4.0],
        'efficientdet-d1': ['efficientnet-b1', 640, 88, 4, 3, 4.0],
        'efficientdet-d2': ['efficientnet-b2', 768, 112, 5, 3, 4.0],
        'efficientdet-d3': ['efficientnet-b3', 896, 160, 5, 3, 4.0],
        'efficientdet-d4': ['efficientnet-b4', 1024, 224, 7, 4, 4.0],
        'efficientdet-d5': ['efficientnet-b5', 1280, 288, 7, 4, 4.0],
        'efficientdet-d6': ['efficientnet-b6', 1280, 384, 8, 5, 4.0],
        'efficientdet-d7': ['efficientnet-b7', 1536, 384, 8, 5, 5.0]
    }
    if model_name not in params_dict:
        raise NotImplementedError('%s does not exist.' % model_name)
    return params_dict[model_name]
class EfficientDet(nn.HybridBlock):
    def __init__(self, base_size, stages, ratios, scales, steps, classes,
                 fpn_channel=64, fpn_repeat=3, box_cls_repeat=3, act_type='swish',
                 stds=(0.1, 0.1, 0.2, 0.2), nms_thresh=0.45, nms_topk=400,
                 post_nms=100, anchor_alloc_size=128, ctx=mx.cpu(),
                 norm_layer=nn.BatchNorm, norm_kwargs=None, **kwargs):
        super(EfficientDet, self).__init__(**kwargs)
        self.num_stages = len(steps)
        self.classes = classes
        self.nms_thresh = nms_thresh
        self.nms_topk = nms_topk
        self.post_nms = post_nms
        num_anchors = len(ratios) * len(scales)
        norm_kwargs = {} if norm_kwargs is None else norm_kwargs
        im_size = (base_size, base_size)
        asz = anchor_alloc_size
        with self.name_scope():
            self.stages = nn.HybridSequential()
            self.proj_convs = nn.HybridSequential()
            self.fpns = nn.HybridSequential()
            self.anchor_generators = nn.HybridSequential()
            for stage in stages:
                self.stages.add(stage)
            for i in range(self.num_stages):
                block = nn.HybridSequential()
                _add_conv(block, channels=fpn_channel, act_type=act_type,
                          norm_layer=norm_layer, norm_kwargs=norm_kwargs)
                self.proj_convs.add(block)
                anchor_generator = AnchorGenerator(i, im_size, ratios, scales, steps[i], (asz, asz))
                self.anchor_generators.add(anchor_generator)
                asz = max(asz // 2, 16)
            for i in range(fpn_repeat):
                self.fpns.add(BiFPN(fpn_channel, num_features=self.num_stages, act_type=act_type,
                                    norm_layer=norm_layer, norm_kwargs=norm_kwargs))
            self.cls_net = OutputSubnet(fpn_channel, box_cls_repeat, self.num_classes + 1, num_anchors,
                                        act_type=act_type, norm_layer=norm_layer,
                                        norm_kwargs=norm_kwargs, prefix='class_net')
            self.box_net = OutputSubnet(fpn_channel, box_cls_repeat, 4, num_anchors,
                                        act_type=act_type, norm_layer=norm_layer,
                                        norm_kwargs=norm_kwargs, prefix='box_net')
            self.bbox_decoder = NormalizedBoxCenterDecoder(stds)
            self.cls_decoder = MultiPerClassDecoder(self.num_classes + 1, thresh=0.01)

    @property
    def num_classes(self):
        return len(self.classes)

    def set_nms(self, nms_thresh=0.45, nms_topk=400, post_nms=100):
        self._clear_cached_op()
        self.nms_thresh = nms_thresh
        self.nms_topk = nms_topk
        self.post_nms = post_nms

    def hybrid_forward(self, F, x):
        feats = []
        # backbone forward
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        # extra downsampling stages beyond what the backbone provides
        for i in range(self.num_stages - len(feats)):
            x = F.Pooling(x, pool_type='max', kernel=(2, 2), stride=(2, 2), pooling_convention='full')
            feats.append(x)
        # project each feature map to the BiFPN input channel count
        for i, block in enumerate(self.proj_convs):
            feats[i] = block(feats[i])
        # BiFPN forward
        for block in self.fpns:
            feats = block(*feats)
        cls_preds = []
        box_preds = []
        anchors = []
        for feat, ag in zip(feats, self.anchor_generators):
            box_pred = self.box_net(feat)
            cls_pred = self.cls_net(feat)
            anchor = ag(feat)
            # (b, a*c, h, w) -> (b, h*w*a, c)
            box_pred = F.reshape(F.transpose(box_pred, axes=(0, 2, 3, 1)), shape=(0, -1, 4))
            cls_pred = F.reshape(F.transpose(cls_pred, axes=(0, 2, 3, 1)), shape=(0, -1, self.num_classes + 1))
            cls_preds.append(cls_pred)
            box_preds.append(box_pred)
            anchors.append(anchor)
        cls_preds = F.concat(*cls_preds, dim=1)
        box_preds = F.concat(*box_preds, dim=1)
        anchors = F.concat(*anchors, dim=1)
        if mx.autograd.is_training():
            return [cls_preds, box_preds, anchors]
        bboxes = self.bbox_decoder(box_preds, anchors)
        cls_ids, scores = self.cls_decoder(F.softmax(cls_preds, axis=-1))
        results = []
        for i in range(self.num_classes):
            cls_id = cls_ids.slice_axis(axis=-1, begin=i, end=i + 1)
            score = scores.slice_axis(axis=-1, begin=i, end=i + 1)
            # per-class results
            per_result = F.concat(*[cls_id, score, bboxes], dim=-1)
            results.append(per_result)
        result = F.concat(*results, dim=1)
        if 0 < self.nms_thresh < 1:
            result = F.contrib.box_nms(result, overlap_thresh=self.nms_thresh, topk=self.nms_topk,
                                       valid_thresh=0.01, id_index=0, score_index=1,
                                       coord_start=2, force_suppress=False)
            if self.post_nms > 0:
                result = result.slice_axis(axis=1, begin=0, end=self.post_nms)
        ids = F.slice_axis(result, axis=2, begin=0, end=1)
        scores = F.slice_axis(result, axis=2, begin=1, end=2)
        bboxes = F.slice_axis(result, axis=2, begin=2, end=6)
        return ids, scores, bboxes
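The head reshape in hybrid_forward is easy to get wrong, so here is the same transpose-then-reshape in NumPy (the shapes are arbitrary illustrative values, not the real layer sizes):

```python
import numpy as np

# the prediction heads emit (batch, anchors*channels, H, W); the detector
# flattens each level to (batch, H*W*anchors, channels) before concatenating
b, a, c, h, w = 2, 9, 4, 3, 3
pred = np.arange(b * a * c * h * w, dtype=np.float32).reshape(b, a * c, h, w)
flat = pred.transpose(0, 2, 3, 1).reshape(b, -1, c)
```

After this, predictions from every pyramid level share the same last axis and can simply be concatenated along axis 1, one row per anchor.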
lr_steps = sorted([int(ls) for ls in lr_decay_epoch.split(',') if ls.strip()])
lr_decay_epoch = lr_steps
lr_scheduler = LRSequential([
    LRScheduler('linear', base_lr=0, target_lr=learning_rate,
                nepochs=0, iters_per_epoch=self.num_samples // self.batch_size),
    LRScheduler(lr_mode, base_lr=learning_rate,
                nepochs=TrainNum,
                iters_per_epoch=self.num_samples // self.batch_size,
                step_epoch=lr_decay_epoch,
                step_factor=lr_decay, power=2),
])
if optim == 1:
    trainer = gluon.Trainer(self.model.collect_params(), 'sgd',
                            {'learning_rate': learning_rate, 'wd': 0.0005, 'momentum': 0.9,
                             'lr_scheduler': lr_scheduler})
elif optim == 2:
    trainer = gluon.Trainer(self.model.collect_params(), 'adagrad',
                            {'learning_rate': learning_rate, 'lr_scheduler': lr_scheduler})
else:
    trainer = gluon.Trainer(self.model.collect_params(), 'adam',
                            {'learning_rate': learning_rate, 'lr_scheduler': lr_scheduler})
cls_box_loss = EfficientDetLoss(len(self.classes_names) + 1, rho=0.1, lambd=50.0)
ce_metric = mx.metric.Loss('FocalLoss')
smoothl1_metric = mx.metric.Loss('SmoothL1')
btic = time.time()
for i, batch in enumerate(self.train_loader):
    data = gluon.utils.split_and_load(batch[0], ctx_list=self.ctx, batch_axis=0)
    cls_targets = gluon.utils.split_and_load(batch[1], ctx_list=self.ctx, batch_axis=0)
    box_targets = gluon.utils.split_and_load(batch[2], ctx_list=self.ctx, batch_axis=0)
    with autograd.record():
        cls_preds = []
        box_preds = []
        for x in data:
            cls_pred, box_pred, _ = self.model(x)
            cls_preds.append(cls_pred)
            box_preds.append(box_pred)
        sum_loss, cls_loss, box_loss = cls_box_loss(cls_preds, box_preds, cls_targets, box_targets)
        if self.ampFlag:
            with amp.scale_loss(sum_loss, trainer) as scaled_loss:
                autograd.backward(scaled_loss)
        else:
            autograd.backward(sum_loss)
    trainer.step(self.batch_size)
    ce_metric.update(0, [l * self.batch_size for l in cls_loss])
    smoothl1_metric.update(0, [l * self.batch_size for l in box_loss])
    name1, loss1 = ce_metric.get()
    name2, loss2 = smoothl1_metric.get()
    print('[Epoch {}][Batch {}], Speed: {:.3f} samples/sec, {}={:.3f}, {}={:.3f}'.format(
        epoch, i, self.batch_size / (time.time() - btic), name1, loss1, name2, loss2))
    btic = time.time()
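In 'step' mode, the scheduler above multiplies the learning rate by lr_decay each time a milestone epoch is passed. A standalone sketch of that rule (my own helper, not the LRScheduler API; the milestone epochs here are illustrative):

```python
def step_lr(epoch, base_lr, decay_epochs, factor):
    """Step schedule: multiply base_lr by `factor` once per milestone passed."""
    passed = sum(1 for e in decay_epochs if epoch >= e)
    return base_lr * factor ** passed

# with factor 0.9 and milestones at 50/100/150, the rate drops in three steps
lrs = [step_lr(e, 1e-3, [50, 100, 150], 0.9) for e in (0, 49, 50, 100, 150)]
```

The linear component in LRSequential is configured with nepochs=0 here, so effectively there is no warmup phase and training starts directly at the base rate.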
def predict(self, image, confidence=0.5, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    start_time = time.time()
    origin_img = copy.deepcopy(image)
    base_imageSize = origin_img.shape
    image = self.resize_image(image, (self.image_size, self.image_size))
    # e.g. resized to (512, 512, 3) from an original of (780, 1248, 3)
    img = nd.array(image)
    img = mx.nd.image.to_tensor(img)
    img = mx.nd.image.normalize(img, mean=mean, std=std)
    x = img.expand_dims(0)
    x = x.as_in_context(self.ctx[0])
    labels, scores, bboxes = [xx[0].asnumpy() for xx in self.model(x)]
    origin_img_pillow = self.cv2_pillow(origin_img)
    font = ImageFont.truetype(font='./model_data/simhei.ttf',
                              size=np.floor(3e-2 * np.shape(origin_img_pillow)[1] + 0.5).astype('int32'))
    thickness = max((np.shape(origin_img_pillow)[0] + np.shape(origin_img_pillow)[1]) // self.image_size, 1)
    imgbox = []
    for i, bbox in enumerate(bboxes):
        # skip low-confidence detections and the padded (-1) entries left by NMS
        if (scores is not None and scores.flat[i] < confidence) or \
           (labels is not None and labels.flat[i] < 0):
            continue
        cls_id = int(labels.flat[i]) if labels is not None else -1
        xmin, ymin, xmax, ymax = [int(v) for v in bbox]
        # normalize to [0, 1] relative to the network input size
        xmin, ymin, xmax, ymax = (xmin / self.image_size, ymin / self.image_size,
                                  xmax / self.image_size, ymax / self.image_size)
        box_xy = np.array([(xmin + xmax) / 2, (ymin + ymax) / 2]).astype('float32')
        box_wh = np.array([xmax - xmin, ymax - ymin]).astype('float32')
        image_shape = np.array((base_imageSize[0], base_imageSize[1]))
        input_shape = np.array((self.image_size, self.image_size))
        # map the box back onto the original image
        ymin, xmin, ymax, xmax = self.correct_boxes(box_xy, box_wh, input_shape, image_shape, True)
        xmin, ymin, xmax, ymax = int(xmin), int(ymin), int(xmax), int(ymax)
        class_name = self.classes_names[cls_id]
        score = '{:d}%'.format(int(scores.flat[i] * 100)) if scores is not None else ''
        imgbox.append([(xmin, ymin, xmax, ymax), cls_id, class_name, score])
        top, left, bottom, right = ymin, xmin, ymax, xmax
        label = '{}-{}'.format(class_name, score)
        draw = ImageDraw.Draw(origin_img_pillow)
        label_size = draw.textsize(label, font)
        label = label.encode('utf-8')
        if top - label_size[1] >= 0:
            text_origin = np.array([left, top - label_size[1]])
        else:
            text_origin = np.array([left, top + 1])
        for t in range(thickness):  # 't' avoids shadowing the detection index 'i'
            draw.rectangle([left + t, top + t, right - t, bottom - t], outline=self.colors[cls_id])
        draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[cls_id])
        draw.text(text_origin, str(label, 'UTF-8'), fill=(0, 0, 0), font=font)
        del draw
    result_value = {
        "image_result": self.pillow_cv2(origin_img_pillow),
        "bbox": imgbox,
        "time": (time.time() - start_time) * 1000
    }
    return result_value
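The correct_boxes helper used above is not shown in the article. Assuming a plain resize with no letterbox padding, mapping a normalized corner box back to original-image pixels reduces to a scale; the sketch below is only an illustrative stand-in for it (function name and values are mine):

```python
import numpy as np

def boxes_to_original(xyxy_norm, orig_h, orig_w):
    """Map a normalized [xmin, ymin, xmax, ymax] box back to pixel coordinates.

    Assumes a plain resize (no letterbox padding); a real correct_boxes must
    also undo any padding offsets if the resize preserved aspect ratio.
    """
    scale = np.array([orig_w, orig_h, orig_w, orig_h], dtype=np.float64)
    return np.asarray(xyxy_norm, dtype=np.float64) * scale

# a box covering the lower-right quadrant-ish of a 780x1248 image
box = boxes_to_original([0.25, 0.5, 0.75, 1.0], orig_h=780, orig_w=1248)
```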
if __name__ == '__main__':
    ctu = Ctu_Efficientdet(USEGPU='0', image_size=512, ampFlag=False)
    ctu.InitModel(DataDir=r'D:/Ctu/Ctu_Project_DL/DataSet/DataSet_Detection_Color',
                  batch_size=1, Pre_Model='./Model_efficientdet/best_model.dat',
                  num_workers=0, phi=0)
    ctu.train(TrainNum=150, learning_rate=0.00001, lr_decay_epoch='50,100,150,200',
              lr_decay=0.9, ModelPath='./Model2', optim=2, lr_mode='step')