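Official VOC website
Also known as the PASCAL VOC dataset. It has two main versions, VOC_2007 and VOC_2012; both cover 20 classes grouped into 4 top-level categories.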
['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',
'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
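As a small sketch of how this list is typically used, the snippet below maps each class name to an integer index and groups the 20 classes under the four top-level categories (person, animal, vehicle, indoor). The names VOC_CLASSES, CLASS_TO_INDEX, SUPER_CATEGORIES and super_category_of are illustrative choices for this post, not part of any official toolkit.

```python
VOC_CLASSES = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
    'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa',
    'train', 'tvmonitor',
]

# Numbering the classes in list order is a common convention, not a requirement.
CLASS_TO_INDEX = {name: idx for idx, name in enumerate(VOC_CLASSES)}

# The four top-level categories and the classes that belong to each.
SUPER_CATEGORIES = {
    'person': ['person'],
    'animal': ['bird', 'cat', 'cow', 'dog', 'horse', 'sheep'],
    'vehicle': ['aeroplane', 'bicycle', 'boat', 'bus', 'car', 'motorbike', 'train'],
    'indoor': ['bottle', 'chair', 'diningtable', 'pottedplant', 'sofa', 'tvmonitor'],
}


def super_category_of(name):
    """Return the top-level category that a class name belongs to."""
    for group, members in SUPER_CATEGORIES.items():
        if name in members:
            return group
    raise KeyError(name)


print(CLASS_TO_INDEX['dog'], super_category_of('dog'))
```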
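VOC_2007 contains about 9,900 images, and its size and class coverage are enough for training and benchmarking many detection models. VOC_2012 builds on the VOC 2008 dataset (which was created from scratch and is distinct from VOC_2007) and was expanded year by year: it reached about 11,000 images in 2011, and in 2012 its annotations were refined, giving today's VOC_2012. The JPEGImages folder of VOC_2012 stores 17,125 images, but only 11,540 of them are used for the detection task (train: 5,717; val: 5,823).
The figure below shows how the VOC dataset grew over the years; different colors denote the content of the dataset:
Labels: XML files; images: JPG
Note that the annotations of the VOC2012 test set have not been released, so for training you can either re-split the available data yourself or submit your results to the PASCAL VOC Evaluation Server for evaluation once training is finished.
If you need a particular dataset, feel free to message the author.
The training, validation, and test statistics for the VOC2007 and VOC2012 detection tasks are shown in the table below:
The combinations commonly used in papers are the following: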
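JPEGImages: holds all images, for both training and testing;
Annotations: holds the XML file produced by labeling each image; each XML file has the same name as its image (apart from the extension);
ImageSets/Main: inside ImageSets, the Main folder holds four text files, test.txt, train.txt, trainval.txt, and val.txt, which list the file names of the test images, the training images, the combined training and validation images, and the validation images respectively. Each line holds one image name; in the per-class files the name is followed by a label such as 1 or -1 that marks positive and negative samples;
SegmentationClass and SegmentationObject: both hold images, namely segmentation result maps, and are not needed for object detection. SegmentationClass labels the class of every pixel (semantic segmentation); SegmentationObject labels which object every pixel belongs to (instance segmentation).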
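Directory and file hierarchy:
VOC2012
- JPEGImages                 // all source images
    - name1.jpg
    - name2.jpg
    - ...
- Annotations
    - name1.xml
    - name2.xml
    - ...
- ImageSets
    - Main
        - test.txt           // names of the test images (4952 entries in VOC2007)
        - train.txt          // names of the training images (2501 entries in VOC2007)
        - trainval.txt       // union of train and val (5011 entries in VOC2007)
        - val.txt            // names of the validation images (2510 entries in VOC2007)
    - Action                 // txt lists of the image files that have Action annotations
    - Layout                 // txt lists of the image files that have Layout annotations
    - Segmentation           // lists of the image files that have segmentation annotations
- SegmentationClass          // semantic segmentation maps (per class)
- SegmentationObject         // instance segmentation maps (per object)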
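To make the relationship between the three detection-related folders concrete, here is a minimal sketch that reads one split file and reports the ids whose image or annotation file is missing. The function name check_voc_layout is hypothetical, and the sketch assumes images use the .jpg extension as described above.

```python
import os


def check_voc_layout(root, split='trainval'):
    """Return (ids, missing): all ids listed in the split file, and those
    whose .jpg or .xml file cannot be found under `root`."""
    split_file = os.path.join(root, 'ImageSets', 'Main', split + '.txt')
    with open(split_file) as f:
        # Keep only the first token so per-class files ("name label") also work.
        ids = [line.split()[0] for line in f if line.strip()]

    missing = []
    for image_id in ids:
        has_image = os.path.isfile(os.path.join(root, 'JPEGImages', image_id + '.jpg'))
        has_xml = os.path.isfile(os.path.join(root, 'Annotations', image_id + '.xml'))
        if not (has_image and has_xml):
            missing.append(image_id)
    return ids, missing
```

Running it on a freshly prepared dataset before training is a cheap way to catch images that were renamed without their XML files.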
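In Main, every class (for example, a class named jinx) has three files: jinx_train.txt, jinx_val.txt, and jinx_trainval.txt.
Each line of a per-class txt file has the format: image file name + space + label. The three lines below are the entries for images jinx1, jinx2, and jinx3 of the jinx class: jinx* is the image file name; 0 means image jinx1 contains a jinx object that is hard to recognize (a difficult sample); 1 means image jinx2 contains a jinx object; -1 means image jinx3 contains no jinx object.
jinx1 0
jinx2 1
jinx3 -1
Note: in VOC_2007 the split files (train.txt, val.txt, trainval.txt, test.txt) carry no 0, 1, or -1 labels and contain only file names.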
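The per-class format described above can be read with a few lines of Python. This is only a sketch: parse_class_split is a hypothetical helper, and the jinx entries are the illustrative ones from this section.

```python
def parse_class_split(lines):
    """Group image names by label: 1 (positive), 0 (difficult), -1 (negative)."""
    groups = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and lines without a label
        image_id, label = parts[0], int(parts[1])
        groups.setdefault(label, []).append(image_id)
    return groups


sample = ['jinx1 0', 'jinx2 1', 'jinx3 -1']
groups = parse_class_split(sample)
print(groups)
# {0: ['jinx1'], 1: ['jinx2'], -1: ['jinx3']}
```

With a real file, pass the open file object instead of the sample list, for example parse_class_split(open('jinx_train.txt')).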
Structure of the label XML file:
<annotation>
    <folder>VOC2007</folder>
    <filename>000001.jpg</filename>  <!-- file name -->
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>341012865</flickrid>
    </source>
    <owner>
        <flickrid>Fried Camels</flickrid>
        <name>Jinky the Fruit Bat</name>
    </owner>
    <size>  <!-- image size; used to normalize the top-left and bottom-right bbox coordinates -->
        <width>353</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>  <!-- 1: the image has segmentation annotations, 0: it has none -->
    <object>
        <name>dog</name>  <!-- object class -->
        <pose>Left</pose>  <!-- viewpoint: front, rear, left, right, unspecified -->
        <truncated>1</truncated>  <!-- whether the object is truncated (e.g. partly outside the image) or occluded (by more than 15%) -->
        <difficult>0</difficult>  <!-- detection difficulty, judged mainly from object size, lighting, and image quality: 1 means hard to recognize, 0 means not -->
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <pose>Left</pose>
        <truncated>1</truncated>  <!-- truncated flag: 0 means no, 1 means yes -->
        <difficult>0</difficult>
        <bndbox>
            <xmin>8</xmin>
            <ymin>12</ymin>
            <xmax>352</xmax>
            <ymax>498</ymax>
        </bndbox>
    </object>
</annotation>
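size: the width and height of the image.
object/name: the class name of the object;
object/bndbox: the top-left and bottom-right coordinates of the bounding box;
object/truncated: whether the object is partially occluded (>15%); 0 means not occluded, 1 means partially occluded.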
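The fields listed above can be pulled out with the standard library's xml.etree.ElementTree. The sketch below parses a shortened copy of the sample annotation; read_voc_annotation and SAMPLE_XML are names made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample annotation, keeping only the fields discussed here.
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name><truncated>1</truncated><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name><truncated>1</truncated><difficult>0</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_annotation(xml_text):
    """Return file name, image size, and the list of objects of one annotation."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    record = {
        'filename': root.findtext('filename'),
        'width': int(size.findtext('width')),
        'height': int(size.findtext('height')),
        'objects': [],
    }
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        record['objects'].append({
            'name': obj.findtext('name'),
            # Missing flags are treated as 0 in this sketch.
            'truncated': int(obj.findtext('truncated', default='0')),
            'difficult': int(obj.findtext('difficult', default='0')),
            # Order: xmin, ymin, xmax, ymax (top-left and bottom-right corners).
            'bbox': [int(box.findtext(tag)) for tag in ('xmin', 'ymin', 'xmax', 'ymax')],
        })
    return record


record = read_voc_annotation(SAMPLE_XML)
print(record['filename'], [o['name'] for o in record['objects']])
```

For a file on disk, ET.parse(path).getroot() gives the same root element.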
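Building a custom dataset takes the following 2 steps.
For example, to build a custom dataset named Jinx, its directory should contain three folders (JPEGImages, Annotations, and ImageSets), and a Main folder is then created inside ImageSets.
The annotation tool used here is labelImg; there are plenty of tutorials online on how to label with it, so pick whichever you prefer.
Regarding class names: as section 1.1.1 shows, VOC uses lowercase letters throughout, so to be rigorous, follow VOC and use lowercase names as well.
After labeling, save the XML annotation files to the Annotations folder.
JPEGImages: holds all source images (JPG format).
Annotations: holds the XML label files, whose names correspond one-to-one to the training image names, as also noted in 1.2.1; see 1.2.3 for the content format.
ImageSets/Main: holds four txt files: train.txt lists the names of the training images, val.txt lists the names of the validation images, trainval.txt holds the union of train.txt and val.txt, and test.txt lists the names of the test images; see 1.2.2 for the content format.
After copying the source images into the JPEGImages folder, all of them still have to be renamed to VOC's naming pattern, such as "000005.jpg".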
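The folder layout just described can be created in one go. This is a minimal sketch; make_voc_skeleton is a hypothetical helper and the root path is whatever you choose for your own dataset.

```python
import os


def make_voc_skeleton(root):
    """Create JPEGImages, Annotations and ImageSets/Main under `root`."""
    created = []
    for sub in ('JPEGImages', 'Annotations', os.path.join('ImageSets', 'Main')):
        path = os.path.join(root, sub)
        os.makedirs(path, exist_ok=True)  # also creates ImageSets on the way to Main
        created.append(path)
    return created
```

Calling it twice is harmless because exist_ok=True leaves existing folders untouched.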
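import os


def voc_rename(path):
    """Rename every file in `path` to a zero-padded six-digit name (000000.jpg, ...)."""
    count = 0
    for file in sorted(os.listdir(path)):  # sorted, so the numbering is reproducible
        old_path = os.path.join(path, file)  # original file path
        if os.path.isdir(old_path):  # skip sub-folders
            continue
        filetype = os.path.splitext(file)[1]  # file extension, e.g. '.jpg'
        new_path = os.path.join(path, str(count).zfill(6) + filetype)  # zfill pads with leading zeros
        if new_path != old_path:
            if os.path.exists(new_path):  # never overwrite a file that is already there
                raise FileExistsError(new_path)
            os.rename(old_path, new_path)  # rename
        count += 1


if __name__ == '__main__':
    filepath = r'D:\jinxData'  # raw string, so the backslash is not read as an escape
    voc_rename(filepath)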
The screenshot shows the images under D:\jinxData after this renaming step.
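Before touching any files it can help to preview the new names. The sketch below only computes the mapping from old names to six-digit names and renames nothing; plan_voc_names and its start parameter are hypothetical additions for illustration.

```python
import os


def plan_voc_names(file_names, start=0):
    """Map each original file name to a six-digit name, keeping its extension."""
    plan = {}
    for offset, name in enumerate(sorted(file_names)):
        ext = os.path.splitext(name)[1]
        plan[name] = str(start + offset).zfill(6) + ext
    return plan


print(plan_voc_names(['cat.jpg', 'apple.jpg'], start=5))
# {'apple.jpg': '000005.jpg', 'cat.jpg': '000006.jpg'}
```

If the images are already labeled, the same mapping should be applied to the XML files (and to the filename field inside them) so that image and annotation names stay in sync.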
import os
import random


def voc_proportion_divide(xmlfilepath, txtsavepath, trainval_percent, train_percent):
    r'''
    Split a VOC dataset into train/val/test lists with custom proportions.
    Args:
        xmlfilepath: folder that holds the xml files, usually Annotations, e.g. 'D:\jinx\Annotations'
        txtsavepath: output folder, usually ImageSets/Main of your own dataset, e.g. 'D:\jinx\ImageSets\Main'
        trainval_percent: fraction of all samples that goes to train + val
        train_percent: fraction of the trainval subset that goes to train
            (e.g. trainval_percent=0.8, train_percent=0.7 gives about 0.56 train, 0.24 val, 0.2 test)
    '''
    # Keep only annotation files, so stray files do not end up in the lists
    total_xml = sorted(f for f in os.listdir(xmlfilepath) if f.endswith('.xml'))
    os.makedirs(txtsavepath, exist_ok=True)
    num = len(total_xml)
    list_index = range(num)
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    trainval_list = random.sample(list_index, tv)
    # Sets make the membership tests below fast
    train = set(random.sample(trainval_list, tr))
    trainval = set(trainval_list)
    with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as file_trainval, \
            open(os.path.join(txtsavepath, 'test.txt'), 'w') as file_test, \
            open(os.path.join(txtsavepath, 'train.txt'), 'w') as file_train, \
            open(os.path.join(txtsavepath, 'val.txt'), 'w') as file_val:
        for i in list_index:
            name = total_xml[i][:-4] + '\n'  # strip the '.xml' suffix
            if i in trainval:
                file_trainval.write(name)
                if i in train:
                    file_train.write(name)
                else:
                    file_val.write(name)
            else:
                file_test.write(name)
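The arithmetic behind the two percentages is easy to misread: in the code above, train_percent is applied to the trainval subset, not to the whole dataset. The following sketch reproduces only the counting step; split_sizes is a hypothetical helper that mirrors the tv and tr computation.

```python
def split_sizes(num, trainval_percent, train_percent):
    """Return how many samples land in train, val and test."""
    tv = int(num * trainval_percent)   # size of train + val
    tr = int(tv * train_percent)       # train is a share of trainval, not of num
    return {'train': tr, 'val': tv - tr, 'test': num - tv}


print(split_sizes(100, 0.8, 0.75))
# {'train': 60, 'val': 20, 'test': 20}
```

So to get 70% of all images into train with a 20% test set, train_percent would have to be 0.7 / 0.8 = 0.875.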
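This step is not required for building a custom dataset, but it may be needed when converting between dataset formats.
Reorganize the source images according to the split: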
import os
import shutil


def voc_image_redivide(source, target, images_source='D:/jinxData/voctest/JPEGImages/', suffix='.jpg'):
    r'''
    Copy the files listed in one split file into a sub-folder named after that split.
    Args:
        source: path of the split index file, e.g. 'D:\jinx\ImageSets\Main\trainval.txt'
        target: root folder that receives the copies, e.g. 'D:\jinx\data'
        images_source: folder that holds the source images
        suffix: extension of the files to copy; pass '.xml' to process annotation files
    '''
    # Use the exact file stem (train / val / trainval / test) as the sub-folder name.
    # A substring test such as `'train' in source` would also match 'trainval.txt'
    # and any folder name in the path that happens to contain 'train'.
    split = os.path.splitext(os.path.basename(source))[0]
    split_dir = os.path.join(target, split)
    os.makedirs(split_dir, exist_ok=True)
    with open(source) as context:
        for line in context:
            if not line.strip():
                continue
            file_name = line.split()[0] + suffix  # first token is the image name
            shutil.copyfile(os.path.join(images_source, file_name),
                            os.path.join(split_dir, file_name))
    # If there is no test.txt, one option is to remove the train and val images
    # from a copy of the source folder; the remaining images form the test set.
The same applies to the .xml files in Annotations: just change .jpg to .xml.
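The script below converts VOC-format XML annotations into a COCO-format JSON file:
import xml.etree.ElementTree as ET
import os
import json
from datetime import datetime
import sys
import argparse

# Global containers that accumulate the COCO structure
coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []

category_set = dict()   # class name -> category id
image_set = set()       # file names that have already been added
category_item_id = -1   # incremented before use, so category ids start at 0
image_id = 0            # incremented before use, so image ids start at 1
annotation_id = 0       # incremented before use, so annotation ids start at 1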
def addCatItem(name):
    """Register a new category and return its id."""
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id


def addImgItem(file_name, size):
    """Register a new image and return its id."""
    global image_id
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')
    image_id += 1
    image_item = dict()
    image_item['id'] = image_id
    image_item['file_name'] = file_name
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    image_item['license'] = None
    image_item['flickr_url'] = None
    image_item['coco_url'] = None
    image_item['date_captured'] = str(datetime.today())
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_id
def addAnnoItem(object_name, image_id, category_id, bbox):
    """Register one bounding-box annotation; bbox is [x, y, w, h]."""
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    # No polygon is available in VOC detection labels, so the box outline
    # is stored as a four-point polygon.
    seg = []
    # left_top
    seg.append(bbox[0])
    seg.append(bbox[1])
    # left_bottom
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    # right_bottom
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    # right_top
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])
    annotation_item['segmentation'].append(seg)
    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id
    coco['annotations'].append(annotation_item)


def read_image_ids(image_sets_file):
    """Read the image ids listed in an ImageSets/Main split file."""
    ids = []
    with open(image_sets_file, 'r') as f:
        for line in f:
            if line.strip():
                # Keep only the first token, so "name label" lines also work
                ids.append(line.split()[0])
    return ids
def parseXmlFilse(data_dir, json_save_path, split='train'):
    assert os.path.exists(data_dir), "data path:{} does not exist".format(data_dir)
    labelfile = split + ".txt"
    image_sets_file = os.path.join(data_dir, "ImageSets", "Main", labelfile)
    xml_files_list = []
    if os.path.isfile(image_sets_file):
        # data_dir is a VOC root: use the ids listed in ImageSets/Main/<split>.txt
        ids = read_image_ids(image_sets_file)
        xml_files_list = [os.path.join(data_dir, "Annotations", f"{i}.xml") for i in ids]
    elif os.path.isdir(data_dir):
        # data_dir directly holds the xml files; change the xml path here if needed
        # xml_dir = os.path.join(data_dir,"labels/voc")
        xml_dir = data_dir
        xml_list = os.listdir(xml_dir)
        xml_files_list = [os.path.join(xml_dir, i) for i in xml_list]

    for xml_file in xml_files_list:
        if not xml_file.endswith('.xml'):
            continue
        tree = ET.parse(xml_file)
        root = tree.getroot()
        # Reset the per-file state
        size = dict()
        size['width'] = None
        size['height'] = None
        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))
        # Image file name
        file_name = root.findtext('filename')
        assert file_name is not None, "filename is not in the file"
        # Image size {width, height, depth}
        size_info = root.findall('size')
        # findall returns a list (never None), so test its length
        assert len(size_info) > 0, "size is not in the file"
        for subelem in size_info[0]:
            size[subelem.tag] = int(subelem.text)
        if file_name is not None and size['width'] is not None and file_name not in image_set:
            # Append to coco['images'] and get the id of the current image
            current_image_id = addImgItem(file_name, size)
            print('add image with name: {}\tand\tsize: {}'.format(file_name, size))
        elif file_name in image_set:
            raise Exception('file_name duplicated')
        else:
            raise Exception("file name:{}\t size:{}".format(file_name, size))

        # All object annotations of this image
        object_info = root.findall('object')
        if len(object_info) == 0:
            continue
        # Walk through the annotation of every object
        for obj in object_info:
            # Object class name
            object_name = obj.findtext('name')
            if object_name not in category_set:
                # Create a category index for a class seen for the first time
                current_category_id = addCatItem(object_name)
            else:
                current_category_id = category_set[object_name]
            # Initialize the box fields
            bndbox = dict()
            bndbox['xmin'] = None
            bndbox['xmax'] = None
            bndbox['ymin'] = None
            bndbox['ymax'] = None
            # Read the box: [xmin, ymin, xmax, ymax]
            bndbox_info = obj.findall('bndbox')
            for box in bndbox_info[0]:
                # int(float(...)) also accepts coordinates written as "48.0"
                bndbox[box.tag] = int(float(box.text))
            if bndbox['xmin'] is not None:
                if object_name is None:
                    raise Exception('xml structure broken at bndbox tag')
                if current_image_id is None:
                    raise Exception('xml structure broken at bndbox tag')
                if current_category_id is None:
                    raise Exception('xml structure broken at bndbox tag')
                bbox = []
                # x
                bbox.append(bndbox['xmin'])
                # y
                bbox.append(bndbox['ymin'])
                # w
                bbox.append(bndbox['xmax'] - bndbox['xmin'])
                # h
                bbox.append(bndbox['ymax'] - bndbox['ymin'])
                print('add annotation with object_name:{}\timage_id:{}\tcat_id:{}\tbbox:{}'.format(
                    object_name, current_image_id, current_category_id, bbox))
                addAnnoItem(object_name, current_image_id, current_category_id, bbox)

    json_parent_dir = os.path.dirname(json_save_path)
    # dirname is '' when the json is saved to the current folder
    if json_parent_dir and not os.path.exists(json_parent_dir):
        os.makedirs(json_parent_dir)
    with open(json_save_path, 'w') as f:
        json.dump(coco, f)
    print("class nums:{}".format(len(coco['categories'])))
    print("image nums:{}".format(len(coco['images'])))
    print("bbox nums:{}".format(len(coco['annotations'])))
if __name__ == '__main__':
    """
    Script description:
        Converts VOC-format .xml annotation files into a COCO-format .json annotation file.
    Arguments:
        voc_data_dir: two layouts are accepted
            1. path of a VOC2012 folder; ImageSets/Main/xx.txt inside it is located automatically
            2. a folder that directly holds the xml label files
        json_save_path: path of the output json file
        split: only used to locate xx.txt (e.g. train.txt) for layout 1; ignored for layout 2
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--voc-dir', type=str, default='./data/labels/voc', help='voc path')
    parser.add_argument('-s', '--save-path', type=str, default='./data/convert/coco/train.json', help='json save path')
    parser.add_argument('-t', '--type', type=str, default='train', help='only use in voc2012/2007')
    opt = parser.parse_args()
    if len(sys.argv) > 1:
        print(opt)
        parseXmlFilse(opt.voc_dir, opt.save_path, opt.type)
    else:
        # voc_data_dir = r'D:\jinx\VOC2012'
        voc_data_dir = './data/labels/voc'
        json_save_path = './data/convert/coco/train.json'
        split = 'train'
        parseXmlFilse(data_dir=voc_data_dir, json_save_path=json_save_path, split=split)
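The core of the conversion is the change of box convention: VOC stores two corners (xmin, ymin, xmax, ymax), while the COCO bbox field stores [x, y, width, height]. A minimal sketch of that step, using the dog box from the sample annotation (voc_box_to_coco is a hypothetical name):

```python
def voc_box_to_coco(xmin, ymin, xmax, ymax):
    """Convert a VOC corner box to a COCO-style [x, y, w, h] box plus its area."""
    width = xmax - xmin
    height = ymax - ymin
    return {'bbox': [xmin, ymin, width, height], 'area': width * height}


print(voc_box_to_coco(48, 240, 195, 371))
# {'bbox': [48, 240, 147, 131], 'area': 19257}
```

The area here is the box area, which is what the script above stores because VOC detection labels carry no segmentation polygon.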
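The script below converts VOC XML annotations into YOLO-format txt label files:
import os
import json
import argparse
import sys
import shutil
from lxml import etree
from tqdm import tqdm

category_set = set()   # all class names seen so far
image_set = set()      # all image file names seen so far
bbox_nums = 0          # total number of boxes written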
def parse_xml_to_dict(xml):
    """
    Parse an xml tree into a dict, following tensorflow's recursive_parse_xml_to_dict.
    Args:
        xml: xml tree obtained by parsing XML file contents using lxml.etree
    Returns:
        Python dictionary holding XML contents.
    """
    if len(xml) == 0:  # leaf node: return the text of this tag directly
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child)  # recurse into the child tag
        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            if child.tag not in result:  # there may be several objects, so collect them in a list
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}


def write_classIndices(category_set):
    """Write an {index: class name} mapping to class_indices.json."""
    # sorted() gives a stable order; iterating a set directly does not
    class_indices = dict((k, v) for v, k in enumerate(sorted(category_set)))
    json_str = json.dumps(dict((val, key) for key, val in class_indices.items()), indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)


def xyxy2xywhn(bbox, size):
    """Convert (xmin, ymin, xmax, ymax) in pixels to normalized (xc, yc, w, h)."""
    bbox = list(map(float, bbox))
    size = list(map(float, size))  # (width, height)
    xc = (bbox[0] + (bbox[2] - bbox[0]) / 2.) / size[0]
    yc = (bbox[1] + (bbox[3] - bbox[1]) / 2.) / size[1]
    wn = (bbox[2] - bbox[0]) / size[0]
    hn = (bbox[3] - bbox[1]) / size[1]
    return (xc, yc, wn, hn)
def parser_info(info: dict, only_cat=True, class_indices=None):
    """
    Collect the class names of one parsed annotation and, unless only_cat is set,
    return its boxes as [class_index, (xc, yc, w, h)] with normalized coordinates.
    """
    filename = info['annotation']['filename']
    image_set.add(filename)
    objects = []
    width = int(info['annotation']['size']['width'])
    height = int(info['annotation']['size']['height'])
    # .get() keeps images without any <object> from raising KeyError
    for obj in info['annotation'].get('object', []):
        obj_name = obj['name']
        category_set.add(obj_name)
        if only_cat:
            continue
        xmin = int(obj['bndbox']['xmin'])
        ymin = int(obj['bndbox']['ymin'])
        xmax = int(obj['bndbox']['xmax'])
        ymax = int(obj['bndbox']['ymax'])
        bbox = xyxy2xywhn((xmin, ymin, xmax, ymax), (width, height))
        if class_indices is not None:
            obj_category = class_indices[obj_name]
            objects.append([obj_category, bbox])
    return filename, objects
def parseXmlFilse(voc_dir, save_dir):
    global bbox_nums
    assert os.path.exists(voc_dir), "ERROR {} does not exist".format(voc_dir)
    # Caution: an existing save_dir is deleted and recreated
    if os.path.exists(save_dir):
        shutil.rmtree(save_dir)
    os.makedirs(save_dir)
    xml_files = [os.path.join(voc_dir, i) for i in os.listdir(voc_dir) if os.path.splitext(i)[-1] == '.xml']

    # First pass: collect all class names
    for xml_file in xml_files:
        # Read bytes, so lxml also accepts files that carry an encoding declaration
        with open(xml_file, 'rb') as fid:
            xml_str = fid.read()
        xml = etree.fromstring(xml_str)
        info_dict = parse_xml_to_dict(xml)
        parser_info(info_dict, only_cat=True)

    with open(save_dir + "/classes.txt", 'w') as classes_file:
        for cat in sorted(category_set):
            classes_file.write("{}\n".format(cat))
    class_indices = dict((v, k) for k, v in enumerate(sorted(category_set)))

    # Second pass: write one txt label file per image
    xml_files = tqdm(xml_files)
    for xml_file in xml_files:
        with open(xml_file, 'rb') as fid:
            xml_str = fid.read()
        xml = etree.fromstring(xml_str)
        info_dict = parse_xml_to_dict(xml)
        filename, objects = parser_info(info_dict, only_cat=False, class_indices=class_indices)
        if len(objects) != 0:
            bbox_nums += len(objects)
            # splitext keeps file names that contain extra dots intact
            with open(save_dir + "/" + os.path.splitext(filename)[0] + ".txt", 'w') as f:
                for obj in objects:
                    f.write(
                        "{} {:.5f} {:.5f} {:.5f} {:.5f}\n".format(obj[0], obj[1][0], obj[1][1], obj[1][2], obj[1][3]))
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--voc-dir', type=str, default='./data/labels/voc')
    parser.add_argument('--save-dir', type=str, default='./data/convert/yolo')
    opt = parser.parse_args()
    if len(sys.argv) > 1:
        print(opt)
    # Without command-line arguments the defaults above are used
    parseXmlFilse(**vars(opt))
    print("image nums: {}".format(len(image_set)))
    print("category nums: {}".format(len(category_set)))
    print("bbox nums: {}".format(bbox_nums))
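To see what one output line looks like, the sketch below applies the same normalization to the dog box of the sample annotation (image size 353 x 500) and then converts it back to pixel corners as a sanity check. The helper names xyxy_to_yolo, yolo_to_xyxy and format_yolo_line are hypothetical, and class index 0 is an arbitrary choice for the example.

```python
def xyxy_to_yolo(box, size):
    """(xmin, ymin, xmax, ymax) in pixels -> normalized (xc, yc, w, h)."""
    xmin, ymin, xmax, ymax = box
    img_w, img_h = size
    return ((xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w, (ymax - ymin) / img_h)


def yolo_to_xyxy(box, size):
    """Normalized (xc, yc, w, h) -> (xmin, ymin, xmax, ymax) rounded to pixels."""
    xc, yc, wn, hn = box
    img_w, img_h = size
    return (round((xc - wn / 2) * img_w), round((yc - hn / 2) * img_h),
            round((xc + wn / 2) * img_w), round((yc + hn / 2) * img_h))


def format_yolo_line(class_index, box):
    """One label line: class index followed by four values with 5 decimals."""
    return "{} {:.5f} {:.5f} {:.5f} {:.5f}".format(class_index, *box)


yolo_box = xyxy_to_yolo((48, 240, 195, 371), (353, 500))
print(format_yolo_line(0, yolo_box))
# 0 0.34419 0.61100 0.41643 0.26200
print(yolo_to_xyxy(yolo_box, (353, 500)))
# (48, 240, 195, 371)
```

Because the label file keeps only five decimals, converting a stored line back to pixels can be off by a pixel on large images.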
For the conversion code, see the referenced conversion-code blog post, which explains it very clearly.