Pascal Voc2012数据集中segmention数据转化为pkl文件

    Pascal Voc2012数据集在object detection 和 instance segmention都具有较大的作用,并且相比于COCO来说,它比较小,更加适合像我这种穷学生操作。COCO实在是太大了,在本地还是在服务器上操作都太难受了,之前在服务器上下载COCO都花了几天的时间,在本地使用的linux和window双系统,给linux就分配了200G,这要是下载一个数据集就把我磁盘容量用完。。

    Pascal Voc2012数据集的结构:

    把VOC2012下载下来后,打开VOC2012这个文件夹,里面有如下几个文件夹:

Pascal Voc2012数据集中segmention数据转化为pkl文件_第1张图片

    分别简单的说下各个文件夹里面的内容吧。

    Annotations中是各个图的标注,包括了2007-2012年所有的标注,总共有17125项。

Pascal Voc2012数据集中segmention数据转化为pkl文件_第2张图片


	VOC2012
	2007_000323.jpg
	
		The VOC2007 Database
		PASCAL VOC2007
		flickr
	
	
		500
		375
		3
	
	1
	
		person
		Frontal
		1
		0
		
			277
			3
			500
			375
		
	
	
		person
		Frontal
		1
		0
		
			12
			3
			305
			375
		
	

    在2007_000323.jpg这张图里面有两个object。每个object下包括了它的bndbox,第一个是xmin=277,ymin=3,xmax=500,ymax=375;第二个是xmin=12,ymin=3,xmax=305,ymax=375。

这个是很一般的一个标注,在VOC2012中也存在人体姿态的标注,在使用时需要注意:


	VOC2012
	2007_000272.jpg
	
		The VOC2007 Database
		PASCAL VOC2007
		flickr
	
	
		333
		500
		3
	
	0
	
		person
		Right
		1
		0
		
			head
			
				142
				93
				256
				229
			
		
		
			hand
			
				233
				372
				304
				425
			
		
		
			25
			71
			304
			500
		
	

可以看出加入了人的head 和 hand的姿态标注。之前编程查找bounding box的时候还被这个坑了。

还需要注意的是,在2007-2008年的标注中bounding box的顺序是

			25
			71
			304
			500

在2009-2012年的标注中bounding box的顺序是:

			476
			52
			332
			71

ImageSets这个文件夹暂时用不上,所以我也没有取管他具体意义,我就略过了。

接下里就是JPEGImages文件夹,里面是所有的原图像,故它里面也是17125项。随便贴一张

Pascal Voc2012数据集中segmention数据转化为pkl文件_第3张图片


   SegmentationClass中存的是用于语义分割的标注图,也瞎贴一张,总共2913项。

Pascal Voc2012数据集中segmention数据转化为pkl文件_第4张图片

    最后一个文件夹是实例分割用到的标注图,瞎贴一张,总共2913项。

Pascal Voc2012数据集中segmention数据转化为pkl文件_第5张图片

    做过计算机视觉这一块的人应该很清楚语义分割和实例分割的区别,图中也给出来了。


因为我需要用它来训练Mask R-CNN,所以我需要的是segmentation的图片,用到的文件夹就包括了三个:Annotation、JPEGImages、SegmentationObject。

接下来就实现如何把segmentation图片和原图片以及标准整合到一个pkl文件中。

代码实现整合:

1.读取必要的库和必要的参数定义:

import cv2
import glob
import os
import numpy as np
import scipy.misc
import scipy.ndimage
import matplotlib.pyplot as plt
import pickle
import sys
from xml.etree import ElementTree
base_path = os.getcwd()
# 调整图片大小
IMAGE_MIN_DIM = 512
IMAGE_MAX_DIM = 512
year = "2012"
Object_path = glob.glob( './VOCdevkit_ %s /VOC %s /SegmentationObject/*.png' % (year,year))
np.random.shuffle(Object_path) # 数据随机打乱

Image_path = './VOCdevkit_ %s /VOC %s /JPEGImages' % (year,year)
Annotations_path = './VOCdevkit_ %s /VOC %s /Annotations' % (year,year)

2.解析xml文件函数定义:

#从标注中得到图片中每个实例对应的类别及其每个实例的bounding box
df analyze_xml( file_name = None ):
class_name = []
rectangle_position = []
tree = ElementTree.parse(file_name)
root = tree.getroot()
for object_tree in root.findall( "object" ):
for bounding_box in object_tree.iter( "bndbox" ):
xmin = int (bounding_box.find( "xmin" ).text)
ymin = int (bounding_box.find( "ymin" ).text)
xmax = int (bounding_box.find( "xmax" ).text)
ymax = int (bounding_box.find( "ymax" ).text)
postion = [xmin, ymin, xmax, ymax]
rectangle_position.append(postion)
one_class = object_tree.find( "name" ).text
class_name.append(one_class)
return class_name, rectangle_position

这个没有包括人体的姿态部分的bounding box,稍微修改一下就可以实现解析出人体姿态的bounding box

def analyze_xml ( file_name = None ):
class_name = []
rectangle_position = []
part_class = []
part_position = []
tree = ElementTree.parse(file_name)
root = tree.getroot()
for object_tree in root.findall( "object" ):
#寻找object_tree下一级的bndbox
for bounding_box in object_tree.iter( "bndbox" ):
xmin = int (bounding_box.find( "xmin" ).text)
ymin = int (bounding_box.find( "ymin" ).text)
xmax = int (bounding_box.find( "xmax" ).text)
ymax = int (bounding_box.find( "ymax" ).text)
for part in object_tree.findall( "part" ):
for part_box in part.iter( "bndbox" ):
xmin = int (part_box.find( "xmin" ).text)
ymin = int (part_box.find( "ymin" ).text)
xmax = int (part_box.find( "xmax" ).text)
ymax = int (part_box.find( "ymax" ).text)
part_ps = [xmin, ymin, xmax, ymax]
part_position.append(part_ps)
part_cls = part.find( "name" ).text
part_class.append(part_cls)
postion = [xmin, ymin, xmax, ymax]
rectangle_position.append(postion)
one_class = object_tree.find( "name" ).text
class_name.append(one_class)
return class_name, rectangle_position, part_class, part_position


获得object 的全部class:

###得到全部的类别
def analyze_xml_class ( file_names , class_name = []):
'''解析xml的所有类别'''
for file_name in file_names:
with open (file_name) as fp:
for p in fp:
if '' in p:
class_name.append( next (fp).split( '>' )[ 1 ].split( '<' )[ 0 ])

3.对image进行resize,以及相应的对mask进行resize

def resize_image ( image , min_dim = None , max_dim = None , padding = False ):
"""
Resizes an image keeping the aspect ratio.
min_dim: if provided, resizes the image such that it's smaller
dimension == min_dim
max_dim: if provided, ensures that the image longest side doesn't
exceed this value.
padding: If true, pads image with zeros so it's size is max_dim x max_dim
Returns:
image: the resized image
window: (y1, x1, y2, x2). If max_dim is provided, padding might
be inserted in the returned image. If so, this window is the
coordinates of the image part of the full image (excluding
the padding). The x2, y2 pixels are not included.
scale: The scale factor used to resize the image
padding: Padding added to the image [(top, bottom), (left, right), (0, 0)]
"""
# Default window (y1, x1, y2, x2) and default scale == 1.
h, w = image.shape[: 2 ]
window = ( 0 , 0 , h, w)
scale = 1
if min_dim:
# Scale up but not down
scale = max ( 1 , min_dim / min (h, w))
# Does it exceed max dim?
if max_dim:
image_max = max (h, w)
if round (image_max * scale) > max_dim:
scale = max_dim / image_max
# Resize image and mask
if scale != 1 :
image = scipy.misc.imresize(
image, ( round (h * scale), round (w * scale)))
if padding:
# Get new height and width
h, w = image.shape[: 2 ]
top_pad = (max_dim - h) // 2
bottom_pad = max_dim - h - top_pad
left_pad = (max_dim - w) // 2
right_pad = max_dim - w - left_pad
padding = [(top_pad, bottom_pad), (left_pad, right_pad), ( 0 , 0 )]
image = np.pad(image, padding, mode = 'constant' , constant_values = 0 )
window = (top_pad, left_pad, h + top_pad, w + left_pad)
return image, window, scale, padding


def resize_mask ( mask , scale , padding ):
"""Resizes a mask using the given scale and padding.
Typically, you get the scale and padding from resize_image() to
ensure both, the image and the mask, are resized consistently.
scale: mask scaling factor
padding: Padding to add to the mask in the form
[(top, bottom), (left, right), (0, 0)]
"""
h, w = mask.shape[: 2 ]
mask = scipy.ndimage.zoom(mask, zoom = [scale, scale, 1 ], order = 0 )
mask = np.pad(mask, padding, mode = 'constant' , constant_values = 0 )
return mask

4.对数据进行操作,得到数据的所有类别,并放在一个dict中:

class_all_name = []
analyze_xml_class(glob.glob(os.path.join(Annotations_path, '*.xml' )),class_all_name)
class_set = set (class_all_name) # 转成set去除重复的
class_all_name = None
class_list = sorted ( list (class_set)) # 对所有的类别排序
class_set = None
##每个类别对应于一个id,dict = {name:id,.....}
class_dict = dict ( zip (class_list, range ( 1 , len (class_list) + 1 ))) # 类别从1开始,0 默认为背景
# key 和 value 交换一下,class_name_dict = {id:name,....}
class_name_dict = dict ( zip (class_dict.values(),class_dict.keys()))

5.写入到pkl文件中:

object_data = []
object_data.append([class_dict])

# 生成的pickle数据存放在data文件下
if not os.path.exists( './data' ):
os.mkdir( './data' )

for num,path in enumerate (Object_path):
# 进度输出
sys.stdout.write( ' \r >> Converting image %d / %d ' % (
num + 1 , len (Object_path)))
sys.stdout.flush()

file_name = path.split( '/' )[ - 1 ].split( '.' )[ 0 ]
Annotations_path_ = os.path.join(Annotations_path,file_name + '.xml' ) # 对应的xml文件
class_name,rectangle_position = analyze_xml(Annotations_path_)
# 解析对象的mask[h,w,m] m为对象的个数,0、1组成的单通道图像
mask_1 = cv2.imread(path, 0 )
print ( type (mask_1))

masks = []
for rectangle in rectangle_position:
pixel_list = []
mask = np.zeros_like(mask_1,np.uint8)
print (rectangle[ 1 ], rectangle[ 3 ],rectangle[ 0 ], rectangle[ 2 ])
mask[rectangle[ 1 ]:rectangle[ 3 ], rectangle[ 0 ]:rectangle[ 2 ]] = mask_1[rectangle[ 1 ]:rectangle[ 3 ],rectangle[ 0 ]:rectangle[ 2 ]]

mean_x = (rectangle[ 0 ] + rectangle[ 2 ]) // 2
mean_y = (rectangle[ 1 ] + rectangle[ 3 ]) // 2
end = min ((mask.shape[ 1 ], int (rectangle[ 2 ]) + 1 ))
start = max (( 0 , int (rectangle[ 0 ]) - 1 ))

flag = True
for i in range (mean_x,end):
x_ = i;y_ = mean_y
pixels = mask_1[y_, x_]
pixel_list.append(pixels)
if pixels != 0 and pixels != 220 : # 0 对应背景 220对应边界线
temp = mask == pixels
mask = (mask == pixels).astype(np.uint8)

flag = False
break

if flag:
for i in range (mean_x - 1 , start, - 1 ):
x_ = i;y_ = mean_y
pixels = mask_1[y_, x_]
if pixels != 0 and pixels != 220 :
mask = (mask == pixels).astype(np.uint8)
break

masks.append(mask)
# mask转成[h,w,m]格式
masks = np.asarray(masks,np.uint8).transpose([ 1 , 2 , 0 ]) # [h,w,m]
# class name 与class id 对应
class_id = []
[class_id.append(class_dict[i]) for i in class_name]
class_id = np.asarray(class_id,np.uint8) # [m,]

mask_1 = None

# images 原图像
image = cv2.imread(os.path.join(Image_path, file_name + '.jpg' ))
# image = cv2.resize(image, (h, w)) # /255. # 不需要转到0.~1. 程序内部会自动进行归一化处理

# 图像与mask都进行缩放
image, _, scale, padding = resize_image(image, min_dim = IMAGE_MIN_DIM, max_dim = IMAGE_MAX_DIM, padding = True )
masks = resize_mask(masks, scale, padding)


object_data.append([image,masks,class_id])
if num > 0 and num % 700 == 0 :
with open ( './data/data_' + str (num) + '.pkl' , 'wb' ) as fp:
pickle.dump(object_data,fp)
object_data = []
#object_data.append([class_dict])

if num == len (Object_path) - 1 and object_data != None :
with open ( './data/data_' + str (num) + '.pkl' , 'wb' ) as fp:
pickle.dump(object_data, fp)
object_data = None

sys.stdout.write( ' \n ' )
sys.stdout.flush()


6.测试一下保存的pkl值:

from matplotlib import pyplot as plt 
import pickle
#读取保存的pkl文件
pkl_path='./data/data_700.pkl'
with open(pkl_path,'rb') as fp:
    data=pickle.load(fp)

读取序号为14的信息。data[0][0]存放的是class_name和class_ids;data[num][1][:,:,1]存放的第二个mask的值

num = 14
plt.imshow(data[num][1][:,:,1])
plt.show()
class_dict = data[0][0]
class_dict = {v:k for k, v in class_dict.items()}
print(data[num][2])
for i in data[num][2]:
    print(class_dict[i])       

输出为:

[ 5 15 15 15 15]
bottle
person
person
person
person

看一下原图:

plt.imshow(data[num][0])
plt.show()
data[num][0].shape
输出为:

(512, 512, 3)





你可能感兴趣的:(Object,Detection,deeplearning)