项目链接【https://aistudio.baidu.com/aistudio/projectdetail/4647849?contributionType=1】
水下目标检测旨在对水下场景中的物体进行定位和识别。这项研究由于在海洋学、水下导航等领域的广泛应用而引起了持续的关注。但是,由于复杂的水下环境和光照条件,这仍然是一项艰巨的任务。
基于深度学习的物体检测系统已在各种应用中表现出较好的性能,但在处理水下目标检测方面仍然感到不足,主要有原因是:可用的水下目标检测数据集稀少,实际应用中的水下场景的图像杂乱无章,并且水下环境中的目标物体通常很小,而当前基于深度学习的目标检测器通常无法有效地检测小物体,或者对小目标物体的检测性能较差。同时,在水下场景中,与波长有关的吸收和散射问题大大降低了水下图像的质量,从而导致了可见度损失,弱对比度和颜色变化等问题。
Al+水下勘探是一个新兴领域,目前专门用于水下研究工作的解决方案不多,高质量的数据集更是弥足珍贵。使用 PP-YOLOE+ 来推进水下目标检测的进步,从而使得水下机器人等设备能够更加智能化,提高海底资源勘探等方面的效率。而水下机器人又反哺出高质量的水下目标检测数据集,推动 Al+水下勘探的发展。
深度学习中,数据往往决定了性能的上限,算法只是不断地逼近上限。尽管基于深度学习的方法在标准的目标检测中取得了可喜的性能。水下目标检测仍具有以下几点挑战:
(1)水下场景的实际应用中目标通常很小,含有大量的小目标;
(2)水下数据集和实际应用中的图像通常是模糊的,图像中具有异构的噪声。
因此,针对以上所述的背景和水中目标检测所遇到的挑战,本项目将选用 PP-YOLOE+ 这一基于飞桨云边一体高精度模型PP-YOLOE迭代优化升级的版本。针对性的解决以上问题。
PP-YOLOE+ 具有如下特点:
超强性能
训练收敛加速
下游任务泛化性显著提升
高性能部署能力
PaddlePaddle >= 2.3.2
Python == 3.7
%cd /home/aistudio/work/
# gitee 国内下载比较快
!git clone https://gitee.com/paddlepaddle/PaddleDetection.git -b develop
# github
# !git clone https://github.com/PaddlePaddle/PaddleDetection.git -b develop
# 环境安装
%cd /home/aistudio/work/
# gitee 国内下载比较快
# !git clone https://gitee.com/paddlepaddle/PaddleDetection.git -b develop
%cd PaddleDetection/
!pip install -r requirements.txt > /dev/null
/home/aistudio/work
/home/aistudio/work/PaddleDetection
本项目选用数据集来源于水下目标检测算法赛(注:数据由鹏城实验室提供),训练集是5543张 jpg 格式的水下光学图像与对应标注结果构成,其中主要有海参、海胆、扇贝、海星四种目标。
supercategory | id | name |
---|---|---|
component | 1 | echinus |
component | 2 | holothurian |
component | 3 | scallop |
component | 4 | starfish |
component | 5 | waterweeds |
4.1.2 图像分辨率
长度 | 宽度 | 图片数量 |
---|---|---|
704 | 576 | 38 |
1920 | 1080 | 596 |
3840 | 2160 | 1712 |
720 | 405 | 3153 |
586 | Text | 44 |
# 1.数据集解压
!unzip data/data172711/fish.zip > /dev/null
!mv ./fish ./work/PaddleDetection/dataset
%cd work/PaddleDetection/
/home/aistudio/work/PaddleDetection
# 2.voc2coco
import argparse
import glob
import json
import os
import os.path as osp
import sys
import shutil
import numpy as np
import PIL.ImageDraw
import xml.dom.minidom as xmldom
import cv2
label_to_num = {}
categories_list = []
labels_list = []
class MyEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, np.integer):
return int(obj)
elif isinstance(obj, np.floating):
return float(obj)
elif isinstance(obj, np.ndarray):
return obj.tolist()
else:
return super(MyEncoder, self).default(obj)
def getbbox(self, points):
polygons = points
mask = self.polygons_to_mask([self.height, self.width], polygons)
return self.mask2box(mask)
def images_labelme(data, num):
image = {}
image['height'] = data['imageHeight']
image['width'] = data['imageWidth']
image['id'] = num + 1
image['file_name'] = data['imagePath'].split('/')[-1]
return image
def images_cityscape(num, img_file, w, h):
image = {}
image['height'] = h
image['width'] = w
image['id'] = num + 1
image['file_name'] = img_file
return image
def categories(label, labels_list):
category = {}
category['supercategory'] = 'component'
category['id'] = len(labels_list) + 1
category['name'] = label
return category
def annotations_rectangle(points, label, image_num, object_num, label_to_num):
annotation = {}
seg_points = np.asarray(points).copy()
annotation['segmentation'] = [list(seg_points.flatten())]
annotation['iscrowd'] = 0
annotation['image_id'] = image_num + 1
annotation['bbox'] = list(
map(float, [
points[0][0], points[0][1], points[1][0] - points[0][0], points[1][
1] - points[0][1]
]))
annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
annotation['category_id'] = label_to_num[label]
annotation['id'] = object_num + 1
return annotation
def annotations_polygon(height, width, points, label, image_num, object_num,
label_to_num):
annotation = {}
annotation['segmentation'] = [list(np.asarray(points).flatten())]
annotation['iscrowd'] = 0
annotation['image_id'] = image_num + 1
annotation['bbox'] = list(map(float, get_bbox(height, width, points)))
annotation['area'] = annotation['bbox'][2] * annotation['bbox'][3]
annotation['category_id'] = label_to_num[label]
annotation['id'] = object_num + 1
return annotation
def get_bbox(height, width, points):
polygons = points
mask = np.zeros([height, width], dtype=np.uint8)
mask = PIL.Image.fromarray(mask)
xy = list(map(tuple, polygons))
PIL.ImageDraw.Draw(mask).polygon(xy=xy, outline=1, fill=1)
mask = np.array(mask, dtype=bool)
index = np.argwhere(mask == 1)
rows = index[:, 0]
clos = index[:, 1]
left_top_r = np.min(rows)
left_top_c = np.min(clos)
right_bottom_r = np.max(rows)
right_bottom_c = np.max(clos)
return [
left_top_c, left_top_r, right_bottom_c - left_top_c,
right_bottom_r - left_top_r
]
def deal_json(output_dir, input_dir, img_list):
data_coco = {}
images_list = []
annotations_list = []
image_num = -1
object_num = -1
# labels_list =[]
for img_file in img_list:
img = cv2.imread(osp.join('dataset/fish/Images', img_file))
w = img.shape[1]
h = img.shape[0]
img_label = img_file.split('.')[0]
label_file = osp.join('dataset/fish/Annotations', img_label + '.xml')
# print('Generating dataset from:', label_file)
image_num = image_num + 1
xml_file = xmldom.parse(label_file)
eles = xml_file.documentElement
images_list.append(images_cityscape(image_num, img_file, w, h))
for i in range(len(eles.getElementsByTagName('name'))):
label = eles.getElementsByTagName('name')[i].firstChild.data
# if label == 'starfish':
# continue
object_num = object_num + 1
# print(label)
if label not in labels_list:
# print(label)
categories_list.append(categories(label, labels_list))
labels_list.append(label)
label_to_num[label] = len(labels_list)
points = []
xmin = int(eles.getElementsByTagName('xmin')[i].firstChild.data)
ymin = int(eles.getElementsByTagName('ymin')[i].firstChild.data)
xmax = int(eles.getElementsByTagName('xmax')[i].firstChild.data)
ymax = int(eles.getElementsByTagName('ymax')[i].firstChild.data)
# print(xmin,ymin,xmax,ymax)
# if xmin > 2000:
# print(label_file)
points.append([xmin, ymin])
points.append([xmax, ymax])
annotations_list.append(
annotations_rectangle(points, label, image_num,
object_num, label_to_num))
data_coco['images'] = images_list
data_coco['categories'] = categories_list
data_coco['annotations'] = annotations_list
# print(labels_list)
return data_coco
import os.path as osp
import glob
import os
import shutil
train_img = 'dataset/fish/Images'
train_box = 'dataset/fish/Annotations'
# Allocate the dataset.
total_num = len(glob.glob(osp.join(train_img, '*.jpg')))
train_num = int(total_num * 0.8)
os.makedirs('data/cocome' + '/train')
val_num = total_num - train_num
os.makedirs('data/cocome' + '/val')
count = 1
train_list = []
val_list = []
for img_name in os.listdir(train_img):
if count <= train_num:
if osp.exists('data/cocome' + '/train/'):
shutil.copyfile(
osp.join(train_img, img_name),
osp.join('data/cocome' + '/train/', img_name))
train_list.append(img_name)
else:
if count <= train_num + val_num:
if osp.exists('data/cocome' + '/val/'):
shutil.copyfile(
osp.join(train_img, img_name),
osp.join('data/cocome' + '/val/', img_name))
val_list.append(img_name)
count = count + 1
if not os.path.exists('data/cocome' + '/annotations'):
os.makedirs('data/cocome' + '/annotations')
train_data_coco = deal_json(
'data/cocome' + '/train', train_img ,train_list)
train_json_path = osp.join('data/cocome' + '/annotations', 'instance_train.json')
json.dump(
train_data_coco,
open(train_json_path, 'w'),
indent=4,
cls=MyEncoder)
val_data_coco = deal_json('data/cocome' + '/val', train_img, val_list)
val_json_path = osp.join('data/cocome' + '/annotations', 'instance_val.json')
json.dump(val_data_coco, open(val_json_path, 'w'), indent=4, cls=MyEncoder)
# 3.将转换后的数据集移动至dataset/fish
!mv data/cocome/* dataset/fish/
metric: COCO
num_classes: 5
TrainDataset:
!COCODataSet
image_dir: train
anno_path: annotations/instance_train.json
dataset_dir: dataset/fish
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
!COCODataSet
image_dir: val
anno_path: annotations/instance_val.json
dataset_dir: dataset/fish
TestDataset:
!ImageFolder
anno_path: label_list.txt # also support txt (like VOC's label_list.txt)
dataset_dir: dataset/fish # if set, anno_path will be 'dataset_dir/anno_path'
使用 PP-YOLOE+ 进行训练:
在./configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml中提供了基于PP-YOLOE+训练该场景的配置,训练脚本如下:
!python tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o LearningRate.base_lr=0.000125 -o snapshot_epoch=1 -o worker_num=1 --eval
在训练模型以后,我们可以通过运行评估命令来得到模型的精度,以确认训练的效果。评估可以参考以下命令执行。
这里使用了我们已经训练好的模型。如希望使用自己训练的模型,请对应将weights=后的值更改为对应模型.pdparams文件的存储路径。
# 模型评估
!python tools/eval.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=output/ppyoloe_plus_crn_s_80e_coco/best_model.pdparams
Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
W1016 22:54:41.591436 22455 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1016 22:54:41.598824 22455 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
[10/16 22:54:43] ppdet.utils.checkpoint INFO: Finish loading model weights: output/ppyoloe_plus_crn_s_80e_coco/best_model.pdparams
[10/16 22:54:44] ppdet.engine INFO: Eval iter: 0
[10/16 22:54:55] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[10/16 22:54:55] ppdet.metrics.coco_utils INFO: Start evaluate...
Loading and preparing results...
DONE (t=0.50s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=3.09s).
Accumulating evaluation results...
DONE (t=0.43s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.278
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.541
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.267
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.275
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.304
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.160
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.436
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.514
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.509
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.572
[10/16 22:54:59] ppdet.engine INFO: Total sample number: 200, averge FPS: 19.422192298181745
这里我们将训练好的模型对着整个数据集的验证集来一番批量预测,这些预测结果会记录在VisualDL中,可以很方便地与原图对比,观察预测效果。
!python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=output/ppyoloe_plus_crn_s_80e_coco/best_model --infer_dir ./dataset/fish/val --use_vdl=True --vdl_log_dir=./output/image --output_dir ./output/results
.pdparams只包括了模型的参数数据,实际部署还需要执行导出步骤。导出步骤可以参考下面列举的步骤:
注意,这里使用了我们已经训练好的模型。如希望使用自己训练的模型,请对应将weights=后的值更改为对应模型.pdparams文件的存储路径。如果没有指定–output_dir,那么导出的模型将默认存储在 output_inference/ 路径下。
!python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=output/ppyoloe_plus_crn_s_80e_coco/best_model.pdparams
至此,我们就完成了水底生物目标检测模型的从训练到导出的过程。接下来,看看该模型使用Paddle Inference部署时的具体性能表现。
!python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_s_80e_coco --image_file=dataset/fish/val/c000418.jpg --run_mode=paddle --device=gpu
----------- Running Arguments -----------
action_file: None
batch_size: 1
camera_id: -1
combine_method: nms
cpu_threads: 1
device: gpu
enable_mkldnn: False
enable_mkldnn_bfloat16: False
image_dir: None
image_file: dataset/fish/val/c000418.jpg
match_metric: ios
match_threshold: 0.6
model_dir: output_inference/ppyoloe_plus_crn_s_80e_coco
output_dir: output
overlap_ratio: [0.25, 0.25]
random_pad: False
reid_batch_size: 50
reid_model_dir: None
run_benchmark: False
run_mode: paddle
save_images: True
save_mot_txt_per_img: False
save_mot_txts: False
save_results: False
scaled: False
slice_infer: False
slice_size: [640, 640]
threshold: 0.5
tracker_config: None
trt_calib_mode: False
trt_max_shape: 1280
trt_min_shape: 1
trt_opt_shape: 640
use_coco_category: False
use_dark: True
use_gpu: False
video_file: None
window_size: 50
------------------------------------------
----------- Model Configuration -----------
Model Arch: YOLO
Transform Order:
--transform op: Resize
--transform op: NormalizeImage
--transform op: Permute
--------------------------------------------
class_id:0, confidence:0.7396, left_top:[299.31,0.93],right_bottom:[331.32,32.59]
class_id:0, confidence:0.7280, left_top:[293.39,100.21],right_bottom:[332.52,138.38]
class_id:0, confidence:0.6212, left_top:[528.09,-1.20],right_bottom:[566.73,20.30]
class_id:0, confidence:0.5986, left_top:[71.87,-0.63],right_bottom:[111.20,26.91]
class_id:0, confidence:0.5766, left_top:[120.02,101.65],right_bottom:[181.98,152.03]
class_id:0, confidence:0.5554, left_top:[405.12,215.59],right_bottom:[458.40,272.60]
class_id:0, confidence:0.5486, left_top:[476.20,49.54],right_bottom:[511.81,85.39]
class_id:1, confidence:0.6370, left_top:[446.48,281.86],right_bottom:[504.50,342.79]
class_id:1, confidence:0.5986, left_top:[450.84,89.30],right_bottom:[494.58,123.99]
class_id:1, confidence:0.5715, left_top:[510.54,89.15],right_bottom:[570.02,138.44]
class_id:1, confidence:0.5151, left_top:[512.79,94.07],right_bottom:[562.67,133.47]
save result to: output/c000418.jpg
Test iter 0
------------------ Inference Time Info ----------------------
total_time(ms): 1072.8, img_num: 1
average latency time(ms): 1072.80, QPS: 0.932140
preprocess_time(ms): 18.20, inference_time(ms): 1054.50, postprocess_time(ms): 0.10
本项目用EasyEdge端与边缘AI服务平台对水底生物目标检测模型进行部署。EasyEdge是基于百度飞桨轻量化推理框架Paddle Lite研发的端与边缘AI服务平台,能够帮助深度学习开发者将自建模型快速部署到设备端。只需上传模型,最快2分种即可获得适配终端硬件/芯片的模型。
至此,本项目的一个完整流程就结束啦!如果对此感兴趣,不妨动起来呀!