VisDrone Object Detection Experiments (with Data Format Conversion)

The initial plan is to tackle the dataset with SSD, Faster R-CNN, or the relatively new YOLOv4.

GitHub repository: https://github.com/cmFighting/visdrone_detection

Dataset introduction

Homepage: http://aiskyeye.com/challenge/object-detection/

  • We are pleased to announce the VisDrone2020 Object Detection in Images Challenge (Task 1). This competition is designed to push forward the state of the art in object detection on drone platforms. Teams are required to predict the bounding boxes of objects of ten predefined classes (i.e., pedestrian, person, car, van, bus, truck, motor, bicycle, awning-tricycle, and tricycle) with real-valued confidences. Some rarely occurring special vehicles (e.g., machineshop truck, forklift truck, and tanker) are ignored in evaluation.
  • The challenge contains 10,209 static images (6,471 for training, 548 for validation and 3,190 for testing) captured by drone platforms in different places at different heights; they are available on the download page. We manually annotate the bounding boxes of different categories of objects in each image. In addition, we also provide two kinds of useful annotations, occlusion ratio and truncation ratio. Specifically, we use the fraction of an object being occluded to define the occlusion ratio. The truncation ratio indicates the degree to which object parts appear outside a frame. If an object is not fully captured within a frame, we annotate the bounding box across the frame boundary and estimate the truncation ratio based on the region outside the image. It is worth mentioning that a target is skipped during evaluation if its truncation ratio is larger than 50%. Annotations on the training and validation sets are publicly available.

The original dataset provides the following 11 classes. When training with the SSD code an extra background class has to be added, so the SSD code works with 12 classes; this article uses SSD300.

'pedestrian', 'people', 'bicycle', 'car', 'van','truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others'
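
For illustration, in the ssd.pytorch convention the class tuple lists only the foreground classes and the background is accounted for when counting classes. A minimal sketch (the variable names are mine, not from the repository):

# Sketch: VisDrone class tuple for ssd.pytorch (variable names are illustrative)
VISDRONE_CLASSES = ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck',
                    'tricycle', 'awning-tricycle', 'bus', 'motor', 'others')
num_classes = len(VISDRONE_CLASSES) + 1  # +1 for the background class -> 12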

In the officially released data the training set has 6,471 images, the validation set 548, and the test set used here 1,610 (the test-dev split; the full challenge test set has 3,190). The training set trains the model, the validation set decides which checkpoints to keep, and the test set evaluates the model.

VisDrone annotation format

VisDrone ships its annotations as plain txt files rather than VOC-style XML: one comma-separated line per object. The format is documented at the link below; a few example lines:

http://aiskyeye.com/evaluate/results-format/
684,8,273,116,0,0,0,0
406,119,265,70,0,0,0,0
255,22,119,128,0,0,0,0
1,3,209,78,0,0,0,0
From left to right the fields are: <bbox_left>, <bbox_top>, <bbox_width>, <bbox_height>, <score>, <object_category>, <truncation>, <occlusion>.

Note that when converting to VOC format the bottom-right corner has to be computed first, from the top-left corner plus the width and height. Of the trailing fields, score, truncation and occlusion can simply be left at their defaults during conversion; only the object category actually matters.
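
A minimal sketch of this step, assuming the standard VisDrone field layout (the function name and class list below are mine, not from the repo):

# Sketch: one VisDrone txt line -> class name + VOC corner coordinates.
# VisDrone category ids: 0 = ignored regions, 1-10 = the ten object classes, 11 = others.
VISDRONE_NAMES = ['ignored-regions', 'pedestrian', 'people', 'bicycle', 'car', 'van',
                  'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others']

def visdrone_line_to_voc(line):
    # <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>
    left, top, bw, bh, score, cat, trunc, occ = [int(v) for v in line.strip().split(',')[:8]]
    xmin, ymin = left, top
    xmax, ymax = left + bw, top + bh  # bottom-right corner = top-left + width/height
    return VISDRONE_NAMES[cat], (xmin, ymin, xmax, ymax)

print(visdrone_line_to_voc('684,8,273,116,0,0,0,0'))  # -> ('ignored-regions', (684, 8, 957, 124))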

VOC annotation format

VOC official site: http://host.robots.ox.ac.uk/pascal/VOC/

VOC dataset overview: https://blog.csdn.net/mzpmzk/article/details/88065416

  • Directory structure

    .
    ├── Annotations          labels for the detection task, in XML form; file names match the image names one-to-one
    ├── ImageSets            contains three subfolders Layout, Main, Segmentation; Main holds the split files for classification and detection
    ├── JPEGImages           the .jpg image files
    ├── SegmentationClass    images segmented by class
    └── SegmentationObject   images segmented by object instance
    
    ├── Main
    │   ├── train.txt        image names used for training, 2,501 in total
    │   ├── val.txt          image names used for validation, 2,510 in total
    │   ├── trainval.txt     union of train and val, 5,011 in total
    │   ├── test.txt         image names used for testing, 4,952 in total
    
  • XML parsing (object detection example)

    
    <annotation>
    	<folder>VOC2007</folder>
    	<filename>000001.jpg</filename>  <!-- file name -->
    	<source>
    		<database>The VOC2007 Database</database>
    		<annotation>PASCAL VOC2007</annotation>
    		<image>flickr</image>
    		<flickrid>341012865</flickrid>
    	</source>
    	<owner>
    		<flickrid>Fried Camels</flickrid>
    		<name>Jinky the Fruit Bat</name>
    	</owner>
    	<size>  <!-- image size, used to normalize the top-left/bottom-right bbox coordinates -->
    		<width>353</width>
    		<height>500</height>
    		<depth>3</depth>
    	</size>
    	<segmented>0</segmented>  <!-- whether used for segmentation -->
    	<object>
    		<name>dog</name>  <!-- object class -->
    		<pose>Left</pose>  <!-- viewpoint: front, rear, left, right, unspecified -->
    		<truncated>1</truncated>  <!-- whether the object is truncated (e.g. extends outside the image) or occluded (more than 15%) -->
    		<difficult>0</difficult>  <!-- detection difficulty, judged from object size, lighting and image quality -->
    		<bndbox>
    			<xmin>48</xmin>
    			<ymin>240</ymin>
    			<xmax>195</xmax>
    			<ymax>371</ymax>
    		</bndbox>
    	</object>
    	<object>
    		<name>person</name>
    		<pose>Left</pose>
    		<truncated>1</truncated>
    		<difficult>0</difficult>
    		<bndbox>
    			<xmin>8</xmin>
    			<ymin>12</ymin>
    			<xmax>352</xmax>
    			<ymax>498</ymax>
    		</bndbox>
    	</object>
    </annotation>
    
  • Fields that matter here

    For the method in this article we only need object detection: a single XML file can annotate several object entries, name gives the object class, and the bounding box is specified by its top-left and bottom-right corners, as the parsing sketch right after this list shows.
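
A minimal parsing sketch with ElementTree (the file name is illustrative), pulling out exactly the fields used here:

import xml.etree.ElementTree as ET

# Sketch: extract (name, xmin, ymin, xmax, ymax) from one VOC annotation file
root = ET.parse('000001.xml').getroot()
for obj in root.iter('object'):
    name = obj.find('name').text
    box = obj.find('bndbox')
    xmin, ymin, xmax, ymax = (int(box.find(tag).text)
                              for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
    print(name, (xmin, ymin), (xmax, ymax))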

YOLO annotation format

YOLO dataset txt annotation format:

0 0.160938 0.541667 0.120312 0.386111
The fields are: object class index, normalized center x, normalized center y, normalized box width w, and normalized box height h (normalized meaning divided by the image width and height).
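
Going the other way, a YOLO line decodes back to pixel corners as below; a sketch, with an arbitrary image size plugged in for the example line above:

# Sketch: decode one YOLO label line back to pixel-space corner coordinates
def yolo_to_corners(line, img_w, img_h):
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w   # de-normalize by image width
    yc, h = float(yc) * img_h, float(h) * img_h   # de-normalize by image height
    return int(cls), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

# the example line above, with an arbitrary 1360x765 image size
print(yolo_to_corners('0 0.160938 0.541667 0.120312 0.386111', 1360, 765))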

Key point: the VOC-to-YOLO conversion (the convert function below implements the normalization formula)

import os
import sys
import xml.etree.ElementTree as ET

# (year, image_set) pairs to convert; the matching ImageSets/Main/<set>.txt files must exist
sets = [('2018', 'train'), ('2018', 'val')]

# class names must match the <name> fields in the XML annotations
classes = ["a", "b", "c", "d"]

# soft link your VOC2018 under here
root_dir = sys.argv[1]


def convert(size, box):
    # VOC box (xmin, xmax, ymin, ymax) -> YOLO (x_center, y_center, w, h), normalized by image size
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0 - 1  # -1 because VOC pixel coordinates are 1-based
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


def convert_annotation(year, image_id):
    # parse one VOC xml file and write the matching YOLO txt label
    in_file = open(os.path.join(root_dir, 'VOC%s/Annotations/%s.xml' % (year, image_id)))
    out_file = open(os.path.join(root_dir, 'VOC%s/labels/%s.txt' % (year, image_id)), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        # skip classes we do not train on, and objects flagged as difficult
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    in_file.close()
    out_file.close()


for year, image_set in sets:
    labels_target = os.path.join(root_dir, 'VOC%s/labels/' % year)
    print('labels dir to save: {}'.format(labels_target))
    if not os.path.exists(labels_target):
        os.makedirs(labels_target)
    image_ids = open(os.path.join(root_dir, 'VOC{}/ImageSets/Main/{}.txt'.format(year, image_set))).read().strip().split()
    # one list file per split, recording the absolute path of every image
    list_file = open(os.path.join(root_dir, '%s_%s.txt' % (year, image_set)), 'w')
    for image_id in image_ids:
        img_f = os.path.join(root_dir, 'VOC%s/JPEGImages/%s.jpg\n' % (year, image_id))
        list_file.write(os.path.abspath(img_f))
        convert_annotation(year, image_id)
    list_file.close()

print('done.')
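
To run it, pass the dataset root as the only argument, e.g. python voc2yolo.py /path/to/dataset_root (the script name is illustrative). The script fills VOC2018/labels/ with one txt label per image and writes 2018_train.txt / 2018_val.txt lists of absolute image paths under the given root.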

COCO annotation format

See this article: https://blog.csdn.net/qq_41375609/article/details/94737915

COCO represents bounding boxes as [x, y, width, height], i.e. the top-left corner plus the box size.
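
For reference, a single COCO annotation record looks roughly like the sketch below (trimmed to the essential fields; the ids and values are made up):

# Sketch of one COCO-style annotation record (values are made up)
annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": 4,            # e.g. 'car'
    "bbox": [684, 8, 273, 116],  # [x, y, width, height]: top-left corner + size
    "area": 273 * 116,
    "iscrowd": 0,
}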

YOLO experiment

Reference code: https://github.com/eriklindernoren/PyTorch-YOLOv3

Server path: /mnt/data/scm/2020/ai/yolo/PyTorch-YOLOv3

Environment: PyTorch 1.0, TensorFlow 1.14

Training command:

CUDA_VISIBLE_DEVICES=1 python train.py --model_def config/yolov3-custom.cfg --data_config config/custom.data --pretrained_weights weights/darknet53.conv.74

Process info:
(ai-yolo) supermicro@supermicro:/mnt/data/scm/2020/ai/yolo/PyTorch-YOLOv3$ CUDA_VISIBLE_DEVICES=1 nohup python train.py --model_def config/yolov3-custom.cfg --data_config config/custom.data --pretrained_weights weights/darknet53.conv.74 &
[1] 10816

Results

Evaluation on the 1,610 test images at epoch 98:
Detecting objects: 100%|██████████| 202/202 [01:35<00:00,  2.12it/s]
Computing AP:   0%|          | 0/11 [00:00

SSD experiment

Reference code: https://github.com/amdegroot/ssd.pytorch

Server path: /data1/scm/ssd/code/SSD0414/

Results

Evaluation on the 1,610 test images:
image_sets_file:/data1/scm/ssd/data/voc/VisDrone_ROOT/DET2019/ImageSets/Main/test.txt
2020-11-04 14:56:01,130 SSD.inference INFO: Evaluating VisDrone_2019__test dataset(1610 images):
100%|██████████████████████████████████████████████████████████████████| 161/161 [00:17<00:00,  9.22it/s]
2020-11-04 14:56:20,698 SSD.inference INFO: mAP: 0.1524
pedestrian      : 0.1170
people          : 0.0909
bicycle         : 0.0909
car             : 0.4377
van             : 0.1740
truck           : 0.2258
tricycle        : 0.1048
awning-tricycle : 0.0413
bus             : 0.3754
motor           : 0.0800
others          : 0.0909
Total inference time: 24.98157835006714 s over the 1,610 test images
FPS: 64.44748916337679
Test result images saved at:
E:\datas\ai\voc\VisDrone_ROOT\VisDrone2019\Images_split\ssd_result
