COCO Dataset

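Contents

I. COCO2017 dataset format

II. Existing annotation format

III. Format conversion

1. Create the directories

2. Generate the train and val image-name text files

3. Move the images into the corresponding directories

4. Generate the JSON file

4. VisDrone

1. Convert the txt labels in annotations to XML files

2. XML to JSON

Training notes:

Reference: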


I. COCO2017 dataset format

COCO_ROOT     # root directory
    ├── annotations        # annotations in JSON format
    │     ├── instances_train2017.json   
    │     └── instances_val2017.json
    ├── train2017         # image files
    │     ├── 000000000001.jpg 
    │     ├── 000000000002.jpg 
    │     └── 000000000003.jpg 
    └── val2017        
          ├── 000000000004.jpg 
          └── 000000000005.jpg

All COCO bounding-box annotations are stored in JSON files. Parsing such a file yields a dictionary with the following structure:

{
  "info": info, 
  "images": [image], 
  "annotations": [annotation], 
  "categories": [categories],
  "licenses": [license],
}
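As a minimal sketch of this top-level layout, the snippet below builds a tiny COCO-style dictionary in memory, round-trips it through JSON, and lists its keys. The file name, image size, box and the single `pos` category are made-up values for illustration only.

```python
import json

# A tiny COCO-style dictionary; every value here is a hypothetical example.
coco = {
    "info": {},
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 416, "height": 416}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [10.0, 20.0, 60.0, 40.0], "area": 2400.0,
         "iscrowd": 0, "segmentation": [[]]}
    ],
    "categories": [{"id": 1, "name": "pos", "supercategory": "none"}],
    "licenses": [],
}

# Round-trip through a JSON string, as if the file had been written and read back.
loaded = json.loads(json.dumps(coco))
top_level_keys = sorted(loaded.keys())
print(top_level_keys)
```

Each annotation points to its image through `image_id` and to its class through `category_id`; those two links are what tie the three main lists together.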

info records basic information about the dataset:

"info":{
	"description":"This is stable 1.0 version of the 2014 MS COCO dataset.",
	"url":"http:\/\/mscoco.org",
	"version":"1.0",
	"year":2017,
	"contributor":"Microsoft COCO group",
	"date_created":"2017-01-27 09:11:52.357475"
}

licenses lists the licenses the dataset is released under. It is a list whose entries look like this:

"licenses":{
	"url":"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/2.0\/",
	"id":1,
	"name":"Attribution-NonCommercial-ShareAlike License"
}

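When building your own dataset, info and licenses are not needed; the images, annotations and categories fields are enough.

images is a list of dictionaries that stores each image's file name, height, width and id. The id is the image's number; it is also referenced from annotations and must be unique. The list holds one dictionary per image.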

# json['images'][0]
{
  'file_name': '000000397133.jpg',
  'height': 427,
  'width': 640,
  'id': 397133
}

"images":{
    "coco_url": "", 
    "date_captured": "", 
    "file_name": "000001.jpg", 
    "flickr_url": "", 
    "id": 1, 
    "license": 0, 
    "width": 416, 
    "height": 416
}

categories lists all classes; define one entry per class. Category ids start from 1, with 0 reserved for background. The format is as follows:

"categories":{
    "id": int,
    "name": str,
    "supercategory": str,
}

[
  {'supercategory': 'person', 'id': 1, 'name': 'person'},
  {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'},
  {'supercategory': 'vehicle', 'id': 3, 'name': 'car'},
  {'supercategory': 'vehicle', 'id': 4, 'name': 'motorcycle'},
  {'supercategory': 'vehicle', 'id': 5, 'name': 'airplane'},
  {'supercategory': 'vehicle', 'id': 6, 'name': 'bus'},
  {'supercategory': 'vehicle', 'id': 7, 'name': 'train'},
  {'supercategory': 'vehicle', 'id': 8, 'name': 'truck'},
  {'supercategory': 'vehicle', 'id': 9, 'name': 'boat'}
  # ....
]
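In practice the categories list is usually turned into lookup tables between class names and ids. The following is a small sketch of that step; `build_category_maps` is a hypothetical helper name, and the three entries are copied from the list above.

```python
categories = [
    {'supercategory': 'person', 'id': 1, 'name': 'person'},
    {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'},
    {'supercategory': 'vehicle', 'id': 3, 'name': 'car'},
]


def build_category_maps(categories):
    """Return (name -> id, id -> name) dictionaries for a COCO categories list."""
    name_to_id = {c['name']: c['id'] for c in categories}
    id_to_name = {c['id']: c['name'] for c in categories}
    return name_to_id, id_to_name


name_to_id, id_to_name = build_category_maps(categories)
print(name_to_id['car'], id_to_name[1])
```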

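annotations holds the instance annotations (including masks) in the dataset; there is one entry per bounding box. The segmentation format depends on whether the instance is a single object (iscrowd=0, stored as polygons, i.e. lists of polygon vertices) or a group of objects (iscrowd=1, stored as an RLE-encoded mask).

For detection, annotations holds the box labels. A single bounding box looks like this: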

{'segmentation': [[]],
 'area': 240.000,
 'iscrowd': 0,
 'image_id': 289343,
 'bbox': [0., 0., 60., 40.],
 'category_id': 1,
 'id': 1768}

"annotations":{
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1
}

# An instance represented by polygon vertices:
"annotations":{
	"segmentation": [[510.66,423.01,511.72,420.03,510.45......]],
	"area": 702.1057499999998,
	"iscrowd": 0,
	"image_id": 289343,
	"bbox": [473.07,395.93,38.65,28.67],
	"category_id": 18,
	"id": 1768
}

# Parse the category ids and image ids (requires pycocotools):
from pycocotools.coco import COCO

coco = COCO('annotation_file.json')
catIds = coco.getCatIds()
imgIds = coco.getImgIds()

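Here segmentation is the segmentation polygon. I don't fully understand this key, and the annotations I use contain only bboxes, so I simply set it to [[]]; note that it must be two nested lists. area is the area of the segmentation. bbox is the box as [x, y, w, h], where (x, y) is the top-left corner. category_id is the class id and matches an entry in categories. image_id is the id of the image. id is the id of the bbox and is unique for every box. There are as many dictionaries in annotations as there are bboxes.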
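To make the bbox convention and the image_id link concrete, here is a short sketch that converts a [x, y, w, h] box to corner form and groups annotations by image. The helper names `xywh_to_xyxy` and `group_by_image` and the sample boxes are made up for illustration.

```python
from collections import defaultdict


def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, w, h] box to [x1, y1, x2, y2] corners."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def group_by_image(annotations):
    """Map each image_id to the list of annotations that belong to it."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann['image_id']].append(ann)
    return dict(grouped)


# Hypothetical annotations: two boxes on image 0, one on image 1.
annotations = [
    {'id': 0, 'image_id': 0, 'category_id': 1, 'bbox': [0.0, 0.0, 60.0, 40.0]},
    {'id': 1, 'image_id': 0, 'category_id': 1, 'bbox': [10.0, 5.0, 20.0, 30.0]},
    {'id': 2, 'image_id': 1, 'category_id': 1, 'bbox': [473.0, 395.0, 38.0, 28.0]},
]
grouped = group_by_image(annotations)
corners = xywh_to_xyxy(annotations[1]['bbox'])
```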

II. Existing annotation format

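The data comes from the Alibaba Tianchi cervical cancer risk detection competition. After preprocessing, each image has a corresponding JSON annotation file, as shown below:

[Figure 1: a sample image and its JSON annotation]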

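III. Format conversion

1. Create the directories

Create the directories following the COCO dataset layout. This step is straightforward: COCO_ROOT needs the three subdirectories annotations, train2017 and val2017.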
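The directory layout can also be created from a script. The sketch below uses a hypothetical helper `make_coco_dirs` and runs against a temporary directory so that it is safe to try; point it at your own COCO_ROOT instead.

```python
import os
import tempfile


def make_coco_dirs(root):
    """Create annotations/, train2017/ and val2017/ under root and return their paths."""
    paths = {}
    for name in ('annotations', 'train2017', 'val2017'):
        paths[name] = os.path.join(root, name)
        # exist_ok=True makes the call safe to repeat
        os.makedirs(paths[name], exist_ok=True)
    return paths


# Demonstrated in a temporary directory; replace with your own COCO_ROOT.
demo_root = os.path.join(tempfile.mkdtemp(), 'COCO_ROOT')
created = make_coco_dirs(demo_root)
print(sorted(os.listdir(demo_root)))
```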

2. Generate the train and val image-name text files

import os
import random
from glob import glob


# Directory that stores the image data
patch_fn_list = glob('D:/data/TianChi/Train/roi_train_total/*.jpg')
# Keep only the image names, without directory or extension
patch_fn_list = [os.path.splitext(os.path.basename(fn))[0] for fn in patch_fn_list]
# Shuffle the images
random.shuffle(patch_fn_list)

# Split into train and val at a 7:3 ratio
train_num = int(0.7 * len(patch_fn_list))
train_patch_list = patch_fn_list[:train_num]
valid_patch_list = patch_fn_list[train_num:]

# Produce the train/val/trainval txt files.
# Each split name is mapped to the list of image names it should contain.
splits = {
    'train_total': train_patch_list,
    'val_total': valid_patch_list,
    'trainval_total': patch_fn_list,
}

for s, fn_list in splits.items():
    # Path of the text file to write
    save_path = 'D:/data/TianChi/Train/' + s + '.txt'
    with open(save_path, 'w') as f:
        for fn in fn_list:
            # Write one image name per line
            f.write('%s\n' % fn)
    print('Finish Producing %s txt file to %s' % (s, save_path))
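Because the split above is shuffled randomly, it changes on every run. If a reproducible split is wanted, one option is to shuffle with a fixed seed. This is a sketch under that assumption; `split_names` and its parameters are hypothetical and not part of the original script.

```python
import random


def split_names(names, train_ratio=0.7, seed=0):
    """Shuffle a copy of names with a fixed seed and split it into (train, val)."""
    # Sort first so the result does not depend on the order glob returned.
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    train_num = int(train_ratio * len(shuffled))
    return shuffled[:train_num], shuffled[train_num:]


names = ['img_%03d' % i for i in range(10)]   # made-up image names
train, val = split_names(names)
print(len(train), len(val))
```

Calling `split_names` again with the same names and seed returns the same two lists, so the train and val files can be regenerated later without changing which images belong to which split.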

3. Move the images into the corresponding directories

import os
import shutil


def move_listed(datadir, listfile, dstdir):
    # Read the image names listed in the txt file, one per line
    with open(listfile, 'r') as fopen:
        file_names = [line.strip() for line in fopen if line.strip()]
    for file_name in file_names:
        # Path of the image
        src = os.path.join(datadir, file_name + '.jpg')
        # Move the image into dstdir
        # To copy instead, change move to copy
        shutil.move(src, dstdir)


def my_move(datadir, trainlistdir, vallistdir, traindir, valdir):
    # Training images first, then validation images
    move_listed(datadir, trainlistdir, traindir)
    move_listed(datadir, vallistdir, valdir)


# Directory where the images are stored
datadir = r'D:\data\TianChi\Train\roi_uniform_hue'
# txt file that lists the training image names
trainlistdir = r'D:\data\TianChi\Train\ImageSets\Main\train.txt'
# txt file that lists the validation image names
vallistdir = r'D:\data\TianChi\Train\ImageSets\Main\val.txt'
# train2017 directory of the COCO-format dataset
traindir = r'D:\data\TianChi\Train\COCO_ROOT\train2017'
# val2017 directory of the COCO-format dataset
valdir = r'D:\data\TianChi\Train\COCO_ROOT\val2017'
my_move(datadir, trainlistdir, vallistdir, traindir, valdir)

4. Generate the JSON file

import json
import glob
import cv2 as cv
import os


class tococo(object):
    def __init__(self, jpg_paths, label_path, save_path):
        self.images = []
        self.categories = []
        self.annotations = []
        # Paths of all images
        self.jpgpaths = jpg_paths
        self.save_path = save_path
        self.label_path = label_path
        # Set the classes as needed; only one class is used here
        self.class_ids = {'pos': 1}
        self.coco = {}

    def npz_to_coco(self):
        annid = 0
        for num, jpg_path in enumerate(self.jpgpaths):
            # Image name without directory or extension
            imgname = os.path.splitext(os.path.basename(jpg_path))[0]
            img = cv.imread(jpg_path)
            # Read the JSON labels of this image
            with open(os.path.join(self.label_path, imgname + '.json')) as f:
                labels = json.load(f)
            h, w = img.shape[:2]
            self.images.append(self.get_images(imgname, h, w, num))
            for label in labels:
                px, py, pw, ph = label['x'], label['y'], label['w'], label['h']
                box = [px, py, pw, ph]
                self.annotations.append(self.get_annotations(box, num, annid, label['class']))
                annid = annid + 1

        # One category entry per class, regardless of which labels were seen
        for name, class_id in self.class_ids.items():
            self.categories.append(self.get_categories(name, class_id))
        self.coco["images"] = self.images
        self.coco["categories"] = self.categories
        self.coco["annotations"] = self.annotations

    def get_images(self, filename, height, width, image_id):
        image = {}
        image["height"] = height
        image['width'] = width
        image["id"] = image_id
        # File name including the extension
        image["file_name"] = filename + '.jpg'
        return image

    def get_categories(self, name, class_id):
        category = {}
        category["supercategory"] = "Positive Cell"
        category['id'] = class_id
        category['name'] = name
        return category

    def get_annotations(self, box, image_id, ann_id, class_name):
        annotation = {}
        w, h = box[2], box[3]
        area = w * h
        annotation['segmentation'] = [[]]
        annotation['iscrowd'] = 0
        # Index of the image, starting from 0
        annotation['image_id'] = image_id
        annotation['bbox'] = box
        annotation['area'] = float(area)
        annotation['category_id'] = self.class_ids[class_name]
        # Index of the annotation, starting from 0
        annotation['id'] = ann_id
        return annotation

    def save_json(self):
        self.npz_to_coco()
        # Can be changed to instances_train2017.json
        out_file = os.path.join(self.save_path, 'instances_val2017.json')
        with open(out_file, 'w') as f:
            json.dump(self.coco, f)


# Can be changed to train2017; must match the file name used in save_json
jpg_paths = glob.glob(r'D:\data\TianChi\Train\COCO_ROOT\val2017\*.jpg')
# Directory of the existing annotation files
label_path = r'D:\data\TianChi\Train\roi_label'
# Directory where the result is saved
save_path = r'D:\data\TianChi\Train\COCO_ROOT\annotations'
c = tococo(jpg_paths, label_path, save_path)
c.save_json()

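This completes the conversion to the COCO data format, and the result can be used to train a model. The program above only fits the Alibaba Tianchi cervical cancer risk detection competition data and has to be adapted to your own data.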
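Before training, it can help to sanity-check the generated dictionary. The sketch below is one possible check, not part of the original workflow: `check_coco` is a hypothetical helper that verifies ids are unique, that every annotation refers to an existing image and category, and that box sizes are positive. The sample data is made up.

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    image_ids = [img['id'] for img in coco['images']]
    category_ids = {cat['id'] for cat in coco['categories']}
    ann_ids = [ann['id'] for ann in coco['annotations']]
    if len(set(image_ids)) != len(image_ids):
        problems.append('duplicate image ids')
    if len(set(ann_ids)) != len(ann_ids):
        problems.append('duplicate annotation ids')
    known_images = set(image_ids)
    for ann in coco['annotations']:
        if ann['image_id'] not in known_images:
            problems.append('annotation %s: unknown image_id' % ann['id'])
        if ann['category_id'] not in category_ids:
            problems.append('annotation %s: unknown category_id' % ann['id'])
        if ann['bbox'][2] <= 0 or ann['bbox'][3] <= 0:
            problems.append('annotation %s: non-positive box size' % ann['id'])
    return problems


# Hypothetical data: one consistent dict and one with two broken references.
good = {
    'images': [{'id': 0, 'file_name': 'a.jpg', 'height': 416, 'width': 416}],
    'categories': [{'id': 1, 'name': 'pos', 'supercategory': 'none'}],
    'annotations': [{'id': 0, 'image_id': 0, 'category_id': 1,
                     'bbox': [10, 20, 60, 40], 'area': 2400.0,
                     'iscrowd': 0, 'segmentation': [[]]}],
}
bad = {
    'images': good['images'],
    'categories': good['categories'],
    'annotations': [dict(good['annotations'][0], image_id=7, category_id=3)],
}
good_problems = check_coco(good)
bad_problems = check_coco(bad)
```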

4. VisDrone

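VisDrone is a drone-captured object detection dataset that appears in many object detection papers.
Labels 0 to 11 are 'ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van',
'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others'.

To train on this dataset with mmdetection, it first has to be converted to the COCO dataset format.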
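For orientation, here is a minimal sketch of how one line of a VisDrone txt label file can be parsed. It assumes the comma-separated field order left, top, width, height, score, category, truncation, occlusion, which matches how the conversion script in the next step indexes the fields (0 to 3 for the box, 5 for the class, 6 and 7 for the flags). `parse_visdrone_line` is a hypothetical helper and the sample line is made up.

```python
CLASS_NAMES = ['ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van',
               'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others']


def parse_visdrone_line(line):
    """Parse one comma-separated VisDrone-DET label line into a dict."""
    # Drop empty fields in case a line ends with a trailing comma.
    fields = [f for f in line.strip().split(',') if f]
    left, top, width, height, score, category, truncation, occlusion = (
        int(f) for f in fields[:8])
    return {
        'bbox': [left, top, width, height],   # top-left corner plus size, in pixels
        'score': score,
        'category': category,
        'name': CLASS_NAMES[category],
        'truncation': truncation,
        'occlusion': occlusion,
    }


sample = '100,50,30,20,1,4,0,0'   # made-up line: a car at (100, 50), 30x20 pixels
parsed = parse_visdrone_line(sample)
print(parsed['name'], parsed['bbox'])
```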

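This takes two steps:

1. Convert the txt labels in annotations to XML files

The places that need changing are commented; only a few paths have to be adjusted.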

import os
from PIL import Image

# Change root_dir below to your own path
root_dir = r"D:\object_detection_data\datacovert\VisDrone2019-DET-val/"
annotations_dir = root_dir+"annotations/"
image_dir = root_dir + "images/"
xml_dir = root_dir+"Annotations_XML/"   # the XML files are saved in an Annotations_XML folder
os.makedirs(xml_dir, exist_ok=True)     # create the folder if it does not exist yet

# Replace the classes below with your own to convert other datasets
class_name = ['ignored regions','pedestrian','people','bicycle','car','van',
    'truck','tricycle','awning-tricycle','bus','motor','others']

# The tag names written below follow the standard Pascal VOC annotation layout
for filename in os.listdir(annotations_dir):
    image_name = filename.split('.')[0]
    img = Image.open(image_dir+image_name+".jpg") # for png images, change ".jpg" to ".png"
    xml_name = xml_dir+image_name+'.xml'
    with open(annotations_dir+filename, 'r') as fin, open(xml_name, 'w') as fout:
        fout.write('<annotation>'+'\n')

        fout.write('\t'+'<folder>VOC2007</folder>'+'\n')
        fout.write('\t'+'<filename>'+image_name+'.jpg'+'</filename>'+'\n')

        fout.write('\t'+'<source>'+'\n')
        fout.write('\t\t'+'<database>'+'VisDrone2019-DET'+'</database>'+'\n')
        fout.write('\t\t'+'<annotation>'+'VisDrone2019-DET'+'</annotation>'+'\n')
        fout.write('\t\t'+'<image>'+'flickr'+'</image>'+'\n')
        fout.write('\t\t'+'<flickrid>'+'Unspecified'+'</flickrid>'+'\n')
        fout.write('\t'+'</source>'+'\n')

        fout.write('\t'+'<owner>'+'\n')
        fout.write('\t\t'+'<flickrid>'+'LJ'+'</flickrid>'+'\n')
        fout.write('\t\t'+'<name>'+'LJ'+'</name>'+'\n')
        fout.write('\t'+'</owner>'+'\n')

        fout.write('\t'+'<size>'+'\n')
        fout.write('\t\t'+'<width>'+str(img.size[0])+'</width>'+'\n')
        fout.write('\t\t'+'<height>'+str(img.size[1])+'</height>'+'\n')
        fout.write('\t\t'+'<depth>'+'3'+'</depth>'+'\n')
        fout.write('\t'+'</size>'+'\n')

        fout.write('\t'+'<segmented>'+'0'+'</segmented>'+'\n')

        for line in fin.readlines():
            line = line.strip().split(',')
            if len(line) < 8:
                continue  # skip empty or incomplete lines
            fout.write('\t'+'<object>'+'\n')
            fout.write('\t\t'+'<name>'+class_name[int(line[5])]+'</name>'+'\n')
            fout.write('\t\t'+'<pose>'+'Unspecified'+'</pose>'+'\n')
            fout.write('\t\t'+'<truncated>'+line[6]+'</truncated>'+'\n')
            fout.write('\t\t'+'<difficult>'+str(int(line[7]))+'</difficult>'+'\n')
            fout.write('\t\t'+'<bndbox>'+'\n')
            fout.write('\t\t\t'+'<xmin>'+line[0]+'</xmin>'+'\n')
            fout.write('\t\t\t'+'<ymin>'+line[1]+'</ymin>'+'\n')
            # pay attention to this point! (0-based): xmax = xmin + width - 1
            fout.write('\t\t\t'+'<xmax>'+str(int(line[0])+int(line[2])-1)+'</xmax>'+'\n')
            fout.write('\t\t\t'+'<ymax>'+str(int(line[1])+int(line[3])-1)+'</ymax>'+'\n')
            fout.write('\t\t'+'</bndbox>'+'\n')
            fout.write('\t'+'</object>'+'\n')

        fout.write('</annotation>')
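Writing XML by concatenating strings is easy to get wrong, for example when a tag is left unclosed. As an alternative sketch, not the method used above, the same kind of VOC-style file can be built with the standard library's `xml.etree.ElementTree`. The helper name `build_voc_xml`, the file name and the object values below are made up; only the fields that the XML-to-JSON step reads are included.

```python
import xml.etree.ElementTree as ET


def build_voc_xml(filename, width, height, objects):
    """Build a minimal Pascal VOC style annotation tree."""
    root = ET.Element('annotation')
    ET.SubElement(root, 'filename').text = filename
    size = ET.SubElement(root, 'size')
    ET.SubElement(size, 'width').text = str(width)
    ET.SubElement(size, 'height').text = str(height)
    ET.SubElement(size, 'depth').text = '3'
    for obj in objects:
        node = ET.SubElement(root, 'object')
        ET.SubElement(node, 'name').text = obj['name']
        box = ET.SubElement(node, 'bndbox')
        for key in ('xmin', 'ymin', 'xmax', 'ymax'):
            ET.SubElement(box, key).text = str(obj[key])
    return root


# Hypothetical image with a single car box.
root = build_voc_xml('0000001.jpg', 1360, 765,
                     [{'name': 'car', 'xmin': 100, 'ymin': 50, 'xmax': 129, 'ymax': 69}])
xml_text = ET.tostring(root, encoding='unicode')

# Parse the text back to confirm that it is well-formed.
reparsed = ET.fromstring(xml_text)
print(reparsed.find('object/name').text)
```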

2. XML to JSON

#!/usr/bin/python
# The XML files are in VOC format
# The JSON file is in COCO format
import sys, os, json, glob
import xml.etree.ElementTree as ET

INITIAL_BBOXIds = 1
# PREDEF_CLASSE = {}
PREDEF_CLASSE = { 'pedestrian': 1, 'people': 2,
    'bicycle': 3, 'car': 4, 'van': 5, 'truck': 6, 'tricycle': 7,
    'awning-tricycle': 8, 'bus': 9, 'motor': 10}
    # Only these ten classes are detected here; labels 0 and 11 are not converted.

# function
def get(root, name):
    return root.findall(name)

def get_and_check(root, name, length):
    # Find all children called name and verify how many were found
    elems = root.findall(name)
    if len(elems) == 0:
        raise NotImplementedError('Can not find %s in %s.'%(name, root.tag))
    if length > 0 and len(elems) != length:
        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.'%(name, length, len(elems)))
    if length == 1:
        elems = elems[0]
    return elems

def convert(xml_paths, out_json):
    json_dict = {'images': [], 'type': 'instances', 
        'categories': [], 'annotations': []}
    categories = PREDEF_CLASSE
    bbox_id = INITIAL_BBOXIds
    for image_id, xml_f in enumerate(xml_paths):

        # Progress output
        sys.stdout.write('\r>> Converting image %d/%d' % (
            image_id + 1, len(xml_paths)))
        sys.stdout.flush()

        tree = ET.parse(xml_f)
        root = tree.getroot()
        filename = get_and_check(root, 'filename', 1).text
        size = get_and_check(root, 'size', 1)
        width = int(get_and_check(size, 'width', 1).text)
        height = int(get_and_check(size, 'height', 1).text)
        image = {'file_name': filename, 'height': height, 
                'width': width, 'id': image_id + 1}
        json_dict['images'].append(image)
        ## Currently we do not support segmentation
        #segmented = get_and_check(root, 'segmented', 1).text
        #assert segmented == '0'

        for obj in get(root, 'object'):
            category = get_and_check(obj, 'name', 1).text
            # Skip classes outside PREDEF_CLASSE (labels 0 and 11) so that
            # only the ten predefined classes end up in the JSON
            if category not in categories:
                continue
            category_id = categories[category]
            bbox = get_and_check(obj, 'bndbox', 1)
            xmin = int(get_and_check(bbox, 'xmin', 1).text) - 1
            ymin = int(get_and_check(bbox, 'ymin', 1).text) - 1
            xmax = int(get_and_check(bbox, 'xmax', 1).text)
            ymax = int(get_and_check(bbox, 'ymax', 1).text)
            if xmax <= xmin or ymax <= ymin:
                continue
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {'area': o_width * o_height, 'iscrowd': 0, 'image_id': image_id + 1,
                'bbox': [xmin, ymin, o_width, o_height], 'category_id': category_id, 
                'id': bbox_id, 'ignore': 0, 'segmentation': []}
            json_dict['annotations'].append(ann)
            bbox_id = bbox_id + 1

    for cate, cid in categories.items():
        cat = {'supercategory': 'none', 'id': cid, 'name': cate}
        json_dict['categories'].append(cat)
        
    # indent=4 makes the file easier to read but slower to write; drop it for speed
    with open(out_json, 'w') as json_file:
        json.dump(json_dict, json_file, indent=4)

if __name__ == '__main__':
    xml_path = r'D:\object_detection_data\datacovert\VisDrone2019-DET-val/Annotations_XML/'   # change to the directory that holds your XML files
    xml_file = glob.glob(os.path.join(xml_path, '*.xml'))
    convert(xml_file, r'D:\object_detection_data\datacovert\VisDrone2019-DET-val/NEW_val.json')  # change to where the generated JSON should be saved
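It is worth tracing one box through both scripts to see what the coordinate arithmetic does. The sketch below mirrors the xmax/ymax computation of the txt-to-XML script and the xmin/ymin adjustment in convert(); the function names and the sample box are made up for illustration.

```python
def txt_to_voc(left, top, width, height):
    """Corner coordinates as written by the txt-to-XML script above."""
    return left, top, left + width - 1, top + height - 1


def voc_to_coco(xmin, ymin, xmax, ymax):
    """[x, y, w, h] as computed by convert() above (xmin and ymin are shifted by -1)."""
    x, y = xmin - 1, ymin - 1
    return [x, y, xmax - x, ymax - y]


# Hypothetical VisDrone box: left=100, top=50, width=30, height=20.
voc_box = txt_to_voc(100, 50, 30, 20)
coco_box = voc_to_coco(*voc_box)
print(voc_box, coco_box)
```

With these two steps combined, the width and height survive unchanged, while the top-left corner ends up one pixel smaller than in the txt file. Whether that offset matters depends on whether the txt coordinates are treated as 0-based or 1-based, so it is worth checking the boxes visually before training.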

As shown in the figure:

[Figure 2]

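Training notes:

The model used here is configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py.
First download the corresponding weights, then change the number of neurons in the final fully connected layers of the checkpoint.
The script below works for two-stage detectors in general; change the name of the loaded weights and of the saved weights, then run it.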

import torch

# Load the COCO-pretrained checkpoint
pretrained_weights  = torch.load('checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth')
# Number of classes in the new dataset
num_class = 10
# Classification head: one output per class plus one for background
pretrained_weights['state_dict']['roi_head.bbox_head.fc_cls.weight'].resize_(num_class+1, 1024)
pretrained_weights['state_dict']['roi_head.bbox_head.fc_cls.bias'].resize_(num_class+1)
# Regression head: four box coordinates per class
pretrained_weights['state_dict']['roi_head.bbox_head.fc_reg.weight'].resize_(num_class*4, 1024)
pretrained_weights['state_dict']['roi_head.bbox_head.fc_reg.bias'].resize_(num_class*4)
# Save the resized checkpoint under a new name
torch.save(pretrained_weights, "faster_rcnn_r50_fpn_1x_%d.pth"%num_class)
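The sizes passed to resize_ follow directly from the number of classes. As a small sketch of that arithmetic (the helper `head_shapes` is hypothetical, and the feature width 1024 is taken from the script above), the expected shapes can be computed without loading any checkpoint:

```python
def head_shapes(num_class, feat_dim=1024):
    """Expected shapes of the box-head layers for a given number of classes."""
    return {
        # one score per class, plus one for background
        'roi_head.bbox_head.fc_cls.weight': (num_class + 1, feat_dim),
        'roi_head.bbox_head.fc_cls.bias': (num_class + 1,),
        # four box coordinates per class
        'roi_head.bbox_head.fc_reg.weight': (num_class * 4, feat_dim),
        'roi_head.bbox_head.fc_reg.bias': (num_class * 4,),
    }


shapes = head_shapes(10)
for key, shape in shapes.items():
    print(key, shape)
```

For ten classes this gives 11 classification outputs and 40 regression outputs, which can be compared against the tensors in the saved checkpoint.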

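Afterwards, simply load these modified weights.
Only ten classes are detected here; the classes for labels 0 and 11 are left out.
Next, three class-related places need to be changed: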

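  1. In base/faster_rcnn_r50_fpn_coco.py, set num_classes=10
  2. In mmdet/core/evaluation/class_names.py, change the class names to the VisDrone classes to be detected
  3. In mmdet/datasets/coco.py, change the class names in the same way

[Figure 3: the edited class list in mmdet/datasets/coco.py]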
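The ten class names that go into both files can be derived from the full VisDrone label list by dropping labels 0 and 11. The sketch below only builds that tuple; the name `VISDRONE_CLASSES` is made up, and where exactly the tuple is pasted (to my knowledge, the class tuple of the COCO dataset class and the COCO name list in class_names.py in mmdetection 2.x) should be checked against your installed version.

```python
# Full VisDrone label list, indexed by label id (0 to 11).
ALL_LABELS = ['ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van',
              'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others']

# Drop label 0 ('ignored regions') and label 11 ('others').
VISDRONE_CLASSES = tuple(ALL_LABELS[1:11])
print(VISDRONE_CLASSES)
```

The order matters: it has to match the category ids 1 to 10 used when the JSON file was generated.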

After changing the classes, run the command below to check that the labels line up with the boxes; if they do, training can start.

python tools/misc/browse_dataset.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py

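This dataset has fairly complex scenes, and the mAP on small objects is very low. It also splits people into pedestrian and people; in my view these two classes are too similar and easily confused, so their mAP is especially low. Overall the class split feels a bit too fine-grained.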

The converted JSON files are available on Baidu Cloud.
Link: https://pan.baidu.com/s/1BnpYSsViBnuT7FJq-nzxWw
Extraction code: 1111

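Reference:

Object detection: parsing the VOC and COCO formats and building your own dataset, X.YU (xyu.ink)

Converting the VisDrone dataset to COCO format and training it on mmdetection, with the converted JSON files attached, CSDN blog

VisDrone2019 (to yolo / voc / coco): MMDetection data notes, CSDN blog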
