How to Preprocess the Visual Genome Dataset

Visual Genome

Most of the code in this post comes from image generation from scene graph; it is best read alongside the original source code.

Create download_vg.sh to download the dataset (optional)

Bash script:

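# Target directory; the default below is an assumption, change it as needed
VG_DIR=${VG_DIR:-datasets/vg}
mkdir -p $VG_DIR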

wget https://visualgenome.org/static/data/dataset/objects.json.zip -O $VG_DIR/objects.json.zip
wget https://visualgenome.org/static/data/dataset/attributes.json.zip -O $VG_DIR/attributes.json.zip
wget https://visualgenome.org/static/data/dataset/relationships.json.zip -O $VG_DIR/relationships.json.zip
wget https://visualgenome.org/static/data/dataset/object_alias.txt -O $VG_DIR/object_alias.txt
wget https://visualgenome.org/static/data/dataset/relationship_alias.txt -O $VG_DIR/relationship_alias.txt
wget https://visualgenome.org/static/data/dataset/image_data.json.zip -O $VG_DIR/image_data.json.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip -O $VG_DIR/images.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip -O $VG_DIR/images2.zip

unzip $VG_DIR/objects.json.zip -d $VG_DIR
unzip $VG_DIR/attributes.json.zip -d $VG_DIR
unzip $VG_DIR/relationships.json.zip -d $VG_DIR
unzip $VG_DIR/image_data.json.zip -d $VG_DIR
unzip $VG_DIR/images.zip -d $VG_DIR/images
unzip $VG_DIR/images2.zip -d $VG_DIR/images


bash download_vg.sh
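
Before moving on, it can be worth checking that the download and unzip steps produced the files the rest of this post reads. The following is a minimal sketch, not part of the original code: the helper name missing_vg_files and the directory "datasets/vg" are assumptions, and the file names are simply those used by the script above.

```python
from pathlib import Path

# File names taken from the download script above; the flat directory layout is an assumption.
EXPECTED_FILES = [
    "objects.json",
    "attributes.json",
    "relationships.json",
    "image_data.json",
    "object_alias.txt",
    "relationship_alias.txt",
]


def missing_vg_files(vg_dir):
    """Return the expected files that are not present in vg_dir."""
    root = Path(vg_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# "datasets/vg" is a placeholder; pass whatever VG_DIR you actually used.
missing = missing_vg_files("datasets/vg")
print("Missing files:", missing if missing else "none")
```

If the list is not empty, rerun the corresponding wget or unzip line before continuing.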


The image_data.json file holds information about every image in the VG dataset. It describes each of the 108,077 images, including (but not limited to) the width, height, url, and id.

Python example for reading it:

import json

with open('image_data.json', 'r') as f:
    ## images is a list of length 108,077
    images = json.load(f)

## Build a dict that maps each image id to its image info
image_id_to_image = {i['image_id']: i for i in images}

## Print one entry to see what the image info looks like
print('Information of a single image: \n', images[0])


Information of a single image: 
 {'width': 800, 'url': 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg', 'height': 600, 'image_id': 1, 'coco_id': None, 'flickr_id': None}

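Create vg_splits.json to split the dataset into train, val, and test

Python code:

import json, random

## There are 108077 images in total: train (about 80%) 86463, validation (10%) 10807, test (10%) 10807
nums, train_num, val_num = 108077, 86463, 10807

## This assumes image ids run from 1 to nums; if they do not, take the ids from image_id_to_image instead
id_store = set(range(1, nums + 1))

## Pick the image ids used for train from id_store
## (random.sample requires a sequence; sets are rejected since Python 3.11, hence sorted())
train_ids = random.sample(sorted(id_store), train_num)

## Work out the remaining ids
id_remain = id_store.difference(train_ids)

## Pick the image ids used for validation from id_remain
val_ids = random.sample(sorted(id_remain), val_num)

## The remaining image ids are used for test
id_remain = id_remain.difference(val_ids)
test_ids = list(id_remain)

## Put all the ids into a dict
split_dict = {"train": train_ids, "val": val_ids, "test": test_ids}

## Convert the dict to a str; passing the dict itself to f.write would raise an error
split_str = json.dumps(split_dict)

## Save as a json file; the default path is './data/vg_splits.json' (the data directory must exist), adjust as needed
with open('data/vg_splits.json', 'w') as f:
    f.write(split_str)

Check that the json file was stored correctly: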

splits_json = './data/vg_splits.json'
with open(splits_json, 'r') as f:
	splits = json.load(f)
for split_name, split_list in splits.items():
	print(split_name, type(split_list), len(split_list))


train <class 'list'> 86463
val <class 'list'> 10807
test <class 'list'> 10807
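
The split above uses the global random state, so it changes on every run. As a hedged alternative, the sketch below shows one way to make the split reproducible and to verify that the three parts are disjoint and cover every id. The function names make_splits and check_splits, the seed, and the toy ids are assumptions for illustration; the fractions are rounded down, so the counts can differ slightly from the hard-coded numbers used above.

```python
import random


def make_splits(image_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle ids with a fixed seed and cut them into train/val/test lists."""
    ids = sorted(image_ids)            # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)   # local RNG, leaves the global random state alone
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def check_splits(splits, image_ids):
    """Raise AssertionError if the splits overlap or do not cover image_ids."""
    seen = set()
    for name, ids in splits.items():
        id_set = set(ids)
        assert len(id_set) == len(ids), "duplicate ids inside " + name
        assert not (seen & id_set), name + " overlaps another split"
        seen |= id_set
    assert seen == set(image_ids), "splits do not cover all ids"


toy_ids = range(1, 101)  # 100 made-up ids
toy_splits = make_splits(toy_ids)
check_splits(toy_splits, toy_ids)
print({name: len(ids) for name, ids in toy_splits.items()})
```

Calling make_splits twice with the same seed returns the same lists, which makes later results easier to compare across runs.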

Remove small images from train, val, and test

Python code:

def remove_small_images(min_image_size, image_id_to_image, splits):
    new_splits = {}
    for split_name, image_ids in splits.items():
        new_image_ids = []
        num_skipped = 0
        for image_id in image_ids:
            image = image_id_to_image[image_id]
            height, width = image['height'], image['width']
            ## Skip the image if its shorter side is below the threshold
            if min(height, width) < min_image_size:
                num_skipped += 1
                continue
            new_image_ids.append(image_id)
        new_splits[split_name] = new_image_ids
        print('Removed %d images from split "%s" for being too small' %
              (num_skipped, split_name))
    return new_splits

min_image_size = 200

## Read the json file that stores the image ids
splits_json = './data/vg_splits.json'
with open(splits_json, 'r') as f:
    splits = json.load(f)

## Remove the ids of images that are too small
splits = remove_small_images(min_image_size, image_id_to_image, splits)


Removed 335 images from split "train" for being too small
Removed 46 images from split "val" for being too small
Removed 45 images from split "test" for being too small

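Handle object and relationship aliases (object_alias.txt and relationship_alias.txt)

The same object category can go by different names (for example, singular and plural forms that still refer to the same kind of object). The same holds for relationship descriptions: in and inside of, for instance, denote the same kind of relationship.

Python code:

def load_aliases(alias_path):
    aliases = {}
    with open(alias_path, 'r') as f:
        for line in f:
            ## strip() removes whitespace at the beginning and end of each name
            line = [s.strip() for s in line.split(',')]
            ## Map every name on the line to the first name on that line
            for s in line:
                aliases[s] = line[0]
    return aliases

## The default paths are 'object_alias.txt' and 'relationship_alias.txt'; adjust as needed
obj_aliases = load_aliases('object_alias.txt')
rel_aliases = load_aliases('relationship_alias.txt')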
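
To make the shape of the resulting mapping concrete, here is a small sketch that applies the same parsing logic to two made-up alias lines held in memory. The helper name parse_aliases and the sample lines are assumptions for illustration; the real alias files are much longer.

```python
import io


def parse_aliases(lines):
    """Map every name on a comma-separated line to the first name on that line."""
    aliases = {}
    for line in lines:
        names = [s.strip() for s in line.split(",")]
        for s in names:
            aliases[s] = names[0]
    return aliases


# Made-up alias lines, only to show the shape of the mapping.
toy_file = io.StringIO("tree,trees\nin,inside of,inside\n")
toy_aliases = parse_aliases(toy_file)
print(toy_aliases)

# Names without an alias entry fall back to themselves.
print(toy_aliases.get("trees", "trees"), toy_aliases.get("clock", "clock"))
```

The lookup pattern aliases.get(name, name) on the last line is the one the vocabulary code in this post relies on.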


Read the objects.json file and build the dict object_name_to_idx, which maps object names to indices. A filter can be applied first to drop objects that occur only rarely, which tends to help the trained model perform better.

Python code:

from collections import Counter

def create_object_vocab(min_object_instances, image_ids, objects, aliases, vocab):
    image_ids = set(image_ids)

    print('Making object vocab from %d training images' % len(image_ids))

    ## Use a Counter to count how often each object name occurs, which makes filtering easy
    object_name_counter = Counter()
    for image in objects:
        if image['image_id'] not in image_ids:
            continue
        for obj in image['objects']:
            names = set()
            ## Note that an object's names are stored in a list, but in the few examples
            ## I looked at, the list never held more than one name.
            for name in obj['names']:
                names.add(aliases.get(name, name))
            object_name_counter.update(names)

    object_names = ['__image__']
    for name, count in object_name_counter.most_common():
        if count >= min_object_instances:
            object_names.append(name)
    print('Found %d object categories with >= %d training instances' %
          (len(object_names), min_object_instances))

    ## object_idx_to_name is a list rather than a dict; the only explanation I can think of
    ## is that the author did not want dict keys of type int.
    object_name_to_idx = {}
    object_idx_to_name = []
    for idx, name in enumerate(object_names):
        object_name_to_idx[name] = idx
        object_idx_to_name.append(name)

    vocab['object_name_to_idx'] = object_name_to_idx
    vocab['object_idx_to_name'] = object_idx_to_name

## The default path of objects.json is './objects.json'; adjust as needed.
with open('objects.json', 'r') as f:
    objects = json.load(f)
print('type and length of objects json', type(objects), len(objects))
## Print one entry as an example
print(objects[0])

min_object_instances = 2000
vocab = {}
## splits and obj_aliases were both created above
train_ids = splits['train']
create_object_vocab(min_object_instances, train_ids, objects, obj_aliases, vocab)


type and length of objects json <class 'list'> 108077

{'image_id': 1, 'objects': [{
     'synsets': ['tree.n.01'], 'h': 557, 'object_id': 1058549, 'merged_object_ids': [], 'names': ['trees'], 'w': 799, 'y': 0, 'x': 0}, {
     'synsets': ['sidewalk.n.01'], 'h': 290, 'object_id': 1058534, 'merged_object_ids': [5046], 'names': ['sidewalk'], 'w': 722, 'y': 308, 'x': 78}, {
     'synsets': ['building.n.01'], 'h': 538, 'object_id': 1058508, 'merged_object_ids': [], 'names': ['building'], 'w': 222, 'y': 0, 'x': 1}, {
     'synsets': ['street.n.01'], 'h': 258, 'object_id': 1058539, 'merged_object_ids': [3798578], 'names': ['street'], 'w': 359, 'y': 283, 'x': 439}, {
     'synsets': ['wall.n.01'], 'h': 535, 'object_id': 1058543, 'merged_object_ids': [], 'names': ['wall'], 'w': 135, 'y': 1, 'x': 0}, {
     'synsets': ['tree.n.01'], 'h': 360, 'object_id': 1058545, 'merged_object_ids': [], 'names': ['tree'], 'w': 476, 'y': 0, 'x': 178}, {
     'synsets': ['shade.n.01'], 'h': 189, 'object_id': 5045, 'merged_object_ids': [], 'names': ['shade'], 'w': 274, 'y': 344, 'x': 116}, {
     'synsets': ['van.n.05'], 'h': 176, 'object_id': 1058542, 'merged_object_ids': [1058536], 'names': ['van'], 'w': 241, 'y': 278, 'x': 533}, {
     'synsets': ['trunk.n.01'], 'h': 348, 'object_id': 5055, 'merged_object_ids': [], 'names': ['tree trunk'], 'w': 78, 'y': 213, 'x': 623}, {
     'synsets': ['clock.n.01'], 'h': 363, 'object_id': 1058498, 'merged_object_ids': [], 'names': ['clock'], 'w': 77, 'y': 63, 'x': 422}, {
     'synsets': ['window.n.01'], 'h': 147, 'object_id': 3798579, 'merged_object_ids': [], 'names': ['windows'], 'w': 198, 'y': 1, 'x': 602}, {
     'synsets': ['man.n.01'], 'h': 248, 'object_id': 3798576, 'merged_object_ids': [1058540], 'names': ['man'], 'w': 82, 'y': 264, 'x': 367}, {
     'synsets': ['man.n.01'], 'h': 259, 'object_id': 3798577, 'merged_object_ids': [], 'names': ['man'], 'w': 57, 'y': 254, 'x': 238}, {
     'synsets': [], 'h': 430, 'object_id': 1058548, 'merged_object_ids': [], 'names': ['lamp post'], 'w': 43, 'y': 63, 'x': 537}, {
     'synsets': ['sign.n.02'], 'h': 179, 'object_id': 1058507, 'merged_object_ids': [], 'names': ['sign'], 'w': 78, 'y': 13, 'x': 123}, {
     'synsets': ['car.n.01'], 'h': 164, 'object_id': 1058515, 'merged_object_ids': [], 'names': ['car'], 'w': 80, 'y': 342, 'x': 719}, {
     'synsets': ['back.n.01'], 'h': 164, 'object_id': 5060, 'merged_object_ids': [], 'names': ['back'], 'w': 70, 'y': 345, 'x': 716}, {
     'synsets': ['jacket.n.01'], 'h': 98, 'object_id': 1058530, 'merged_object_ids': [], 'names': ['jacket'], 'w': 82, 'y': 296, 'x': 367}, {
     'synsets': ['car.n.01'], 'h': 95, 'object_id': 5049, 'merged_object_ids': [], 'names': ['car'], 'w': 78, 'y': 319, 'x': 478}, {
     'synsets': ['trouser.n.01'], 'h': 128, 'object_id': 1058531, 'merged_object_ids': [], 'names': ['pants'], 'w': 48, 'y': 369, 'x': 388}, {
     'synsets': ['shirt.n.01'], 'h': 103, 'object_id': 1058511, 'merged_object_ids': [], 'names': ['shirt'], 'w': 54, 'y': 287, 'x': 241}, {
     'synsets': ['parking_meter.n.01'], 'h': 143, 'object_id': 1058519, 'merged_object_ids': [], 'names': ['parking meter'], 'w': 26, 'y': 325, 'x': 577}, {
     'synsets': ['trouser.n.01'], 'h': 118, 'object_id': 1058528, 'merged_object_ids': [], 'names': ['pants'], 'w': 44, 'y': 384, 'x': 245}, {
     'synsets': ['shirt.n.01'], 'h': 102, 'object_id': 1058547, 'merged_object_ids': [], 'names': ['shirt'], 'w': 82, 'y': 295, 'x': 368}, {
     'synsets': ['shoe.n.01'], 'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'names': ['shoes'], 'w': 48, 'y': 485, 'x': 388}, {
     'synsets': ['arm.n.01'], 'h': 41, 'object_id': 1058546, 'merged_object_ids': [], 'names': ['arm'], 'w': 30, 'y': 285, 'x': 370}, {
     'synsets': ['bicycle.n.01'], 'h': 36, 'object_id': 1058535, 'merged_object_ids': [], 'names': ['bike'], 'w': 27, 'y': 319, 'x': 337}, {
     'synsets': ['bicycle.n.01'], 'h': 41, 'object_id': 5051, 'merged_object_ids': [], 'names': ['bike'], 'w': 27, 'y': 311, 'x': 321}, {
     'synsets': ['headlight.n.01'], 'h': 9, 'object_id': 5050, 'merged_object_ids': [], 'names': ['headlight'], 'w': 18, 'y': 370, 'x': 517}, {
     'synsets': ['spectacles.n.01'], 'h': 23, 'object_id': 1058518, 'merged_object_ids': [], 'names': ['glasses'], 'w': 43, 'y': 317, 'x': 448}, {
     'synsets': ['chin.n.01'], 'h': 8, 'object_id': 1058541, 'merged_object_ids': [], 'names': ['chin'], 'w': 9, 'y': 288, 'x': 401}], 'image_url': 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg'}

Making object vocab from 86128 training images
Found 179 object categories with >= 2000 training instances
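
The counting-and-thresholding idea is easier to follow on a handful of entries. The sketch below runs the same steps on made-up annotations shaped like objects.json; the toy data, the alias table, and the threshold of 2 are assumptions chosen only so that the filter has a visible effect.

```python
from collections import Counter

# Made-up annotations in the same shape as objects.json entries.
toy_objects = [
    {"image_id": 1, "objects": [{"names": ["tree"]}, {"names": ["trees"]}, {"names": ["man"]}]},
    {"image_id": 2, "objects": [{"names": ["tree"]}, {"names": ["chin"]}]},
]
toy_aliases = {"trees": "tree"}  # hypothetical alias table
min_instances = 2                # tiny threshold for the toy data

counter = Counter()
for image in toy_objects:
    for obj in image["objects"]:
        # A set ensures each object contributes at most once per canonical name.
        counter.update({toy_aliases.get(n, n) for n in obj["names"]})

# Index 0 is reserved for the special '__image__' entry, as in the code above.
names = ["__image__"] + [n for n, c in counter.most_common() if c >= min_instances]
name_to_idx = {name: idx for idx, name in enumerate(names)}
print(counter)
print(name_to_idx)
```

Here "tree" and "trees" are merged through the alias table and counted three times, while "man" and "chin" fall below the threshold and are dropped from the vocabulary.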


In the image_data.json step above, we built a mapping from image id to image. We have now also built object_name_to_idx (having already filtered out some objects by how often they occur). Next, we filter the objects further by size and build a mapping from object id to object (its name, idx, and box). Looking at how the code does this, I can't help feeling sorry for the CPU for a moment.

Python code:

def filter_objects(min_object_size, objects, aliases, vocab, splits):
    all_image_ids = set()
    for image_ids in splits.values():
        all_image_ids |= set(image_ids)

    object_name_to_idx = vocab['object_name_to_idx']
    object_id_to_obj = {}

    num_too_small = 0
    for image in objects:
        image_id = image['image_id']
        if image_id not in all_image_ids:
            continue
        for obj in image['objects']:
            object_id = obj['object_id']
            final_name = None
            final_name_idx = None
            ## Use the first name (after alias lookup) that is in the vocabulary
            for name in obj['names']:
                name = aliases.get(name, name)
                if name in object_name_to_idx:
                    final_name = name
                    final_name_idx = object_name_to_idx[final_name]
                    break
            w, h = obj['w'], obj['h']
            too_small = (w < min_object_size) or (h < min_object_size)
            if too_small:
                num_too_small += 1
            if final_name is not None and not too_small:
                object_id_to_obj[object_id] = {
                    'name': final_name,
                    'name_idx': final_name_idx,
                    'box': [obj['x'], obj['y'], obj['w'], obj['h']],
                }
    print('Skipped %d objects with size < %d' % (num_too_small, min_object_size))
    return object_id_to_obj

min_object_size = 32
object_id_to_obj = filter_objects(min_object_size, objects, obj_aliases, vocab, splits)


Skipped 997213 objects with size < 32
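
Once the object id to object mapping exists, a quick inspection helps confirm that it looks reasonable. The sketch below is an illustration on made-up entries shaped like the values of object_id_to_obj; the ids and the helper box_area are hypothetical, and the boxes are read as [x, y, w, h] as in the code above.

```python
from collections import Counter

# Made-up entries shaped like the values of object_id_to_obj.
toy_object_id_to_obj = {
    101: {"name": "tree", "name_idx": 1, "box": [0, 0, 799, 557]},
    102: {"name": "man", "name_idx": 2, "box": [367, 264, 82, 248]},
    103: {"name": "tree", "name_idx": 1, "box": [178, 0, 476, 360]},
}


def box_area(obj):
    """Area of a box stored as [x, y, w, h]."""
    _, _, w, h = obj["box"]
    return w * h


# How many objects survived per category.
per_category = Counter(obj["name"] for obj in toy_object_id_to_obj.values())
# Id of the object with the largest box.
largest_id = max(toy_object_id_to_obj, key=lambda oid: box_area(toy_object_id_to_obj[oid]))
print(per_category.most_common())
print("largest object id:", largest_id)
```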


Read the attributes.json file and build the dict attribute_name_to_idx, which maps attribute names to indices. As before, a filter can be applied first to drop attributes that occur only rarely, which tends to help the trained model perform better. The procedure is similar to the one for objects.json.

Python code:

def create_attribute_vocab(min_attribute_instances, image_ids, attributes, vocab):
    image_ids = set(image_ids)
    print('Making attribute vocab from %d training images' % len(image_ids))
    attribute_name_counter = Counter()
    for image in attributes:
        if image['image_id'] not in image_ids:
            continue
        for attribute in image['attributes']:
            names = set()
            ## try ... except is used here because some objects in an image have no attributes;
            ## without it, a KeyError would be raised.
            try:
                for name in attribute['attributes']:
                    names.add(name)
                attribute_name_counter.update(names)
            except KeyError:
                pass
    attribute_names = []
    for name, count in attribute_name_counter.most_common():
        if count >= min_attribute_instances:
            attribute_names.append(name)
    print('Found %d attribute categories with >= %d training instances' %
          (len(attribute_names), min_attribute_instances))

    attribute_name_to_idx = {}
    attribute_idx_to_name = []
    for idx, name in enumerate(attribute_names):
        attribute_name_to_idx[name] = idx
        attribute_idx_to_name.append(name)
    vocab['attribute_name_to_idx'] = attribute_name_to_idx
    vocab['attribute_idx_to_name'] = attribute_idx_to_name

## The default path of attributes.json is './attributes.json'; adjust as needed.
with open('attributes.json', 'r') as f:
    attributes = json.load(f)
print('type of attributes json', type(attributes), len(attributes))
## Print one entry as an example
print(attributes[0])

min_attribute_instances = 2000
create_attribute_vocab(min_attribute_instances, train_ids, attributes, vocab)


type of attributes json <class 'list'> 108077

{'image_id': 1, 'attributes': [{
     'synsets': ['clock.n.01'], 'h': 339, 'object_id': 1058498, 'names': ['clock'], 'w': 79, 'attributes': ['green', 'tall'], 'y': 91, 'x': 421}, {
     'synsets': ['street.n.01'], 'h': 262, 'object_id': 5046, 'names': ['street'], 'w': 714, 'attributes': ['sidewalk'], 'y': 328, 'x': 77}, {
     'synsets': ['shade.n.01'], 'h': 192, 'object_id': 5045, 'names': ['shade'], 'w': 274, 'y': 338, 'x': 119}, {
     'synsets': ['man.n.01'], 'h': 262, 'object_id': 1058529, 'names': ['man'], 'w': 60, 'y': 249, 'x': 238}, {
     'synsets': ['gym_shoe.n.01'], 'h': 26, 'object_id': 5048, 'names': ['sneakers'], 'w': 52, 'attributes': ['grey'], 'y': 489, 'x': 243}, {
     'synsets': ['headlight.n.01'], 'h': 15, 'object_id': 5050, 'names': ['headlight'], 'w': 23, 'attributes': ['off'], 'y': 366, 'x': 514}, {
     'synsets': ['car.n.01'], 'h': 98, 'object_id': 5049, 'names': ['car'], 'w': 74, 'y': 315, 'x': 479}, {
     'synsets': ['bicycle.n.01'], 'h': 34, 'object_id': 5051, 'names': ['bike'], 'w': 28, 'attributes': ['parked', 'far away'], 'y': 319, 'x': 318}, {
     'synsets': ['bicycle.n.01'], 'h': 35, 'object_id': 1058535, 'names': ['bike'], 'w': 29, 'attributes': ['parked', 'far away', 'chained'], 'y': 319, 'x': 334}, {
     'synsets': ['sign.n.02'], 'h': 182, 'object_id': 1058507, 'names': ['sign'], 'w': 88, 'attributes': ['black'], 'y': 13, 'x': 118}, {
     'synsets': ['building.n.01'], 'h': 536, 'object_id': 1058508, 'names': ['building'], 'w': 218, 'attributes': ['tall', 'brick', 'made of bricks'], 'y': 2, 'x': 1}, {
     'synsets': ['trunk.n.01'], 'h': 327, 'object_id': 5055, 'names': ['tree trunk'], 'w': 87, 'y': 234, 'x': 622}, {
     'synsets': ['sidewalk.n.01'], 'h': 266, 'object_id': 1058534, 'names': ['sidewalk'], 'w': 722, 'attributes': ['brick'], 'y': 331, 'x': 77}, {
     'synsets': ['shirt.n.01'], 'h': 101, 'object_id': 1058511, 'names': ['shirt'], 'w': 59, 'attributes': ['red', 'orange'], 'y': 289, 'x': 241}, {
     'synsets': ['street.n.01'], 'h': 233, 'object_id': 1058539, 'names': ['street'], 'w': 440, 'attributes': ['clean'], 'y': 283, 'x': 358}, {
     'synsets': ['car.n.01'], 'h': 174, 'object_id': 1058515, 'names': ['car'], 'w': 91, 'attributes': ['white', 'parked'], 'y': 342, 'x': 708}, {
     'synsets': ['back.n.01'], 'h': 170, 'object_id': 5060, 'names': ['back'], 'w': 67, 'y': 339, 'x': 721}, {
     'synsets': ['spectacles.n.01'], 'h': 12, 'object_id': 1058518, 'names': ['glasses'], 'w': 20, 'y': 268, 'x': 271}, {
     'synsets': ['parking_meter.n.01'], 'h': 143, 'object_id': 1058519, 'names': ['parking meter'], 'w': 32, 'attributes': ['orange'], 'y': 327, 'x': 574}, {
     'synsets': ['shoe.n.01'], 'h': 34, 'object_id': 1058525, 'names': ['shoes'], 'w': 46, 'attributes': ['brown'], 'y': 481, 'x': 391}, {
     'synsets': ['man.n.01'], 'h': 251, 'object_id': 1058532, 'names': ['man'], 'w': 75, 'y': 264, 'x': 372}, {
     'synsets': ['trouser.n.01'], 'h': 118, 'object_id': 1058528, 'names': ['pants'], 'w': 38, 'attributes': ['black'], 'y': 384, 'x': 245}, {
     'synsets': ['jacket.n.01'], 'h': 97, 'object_id': 1058530, 'names': ['jacket'], 'w': 89, 'attributes': ['gray', 'grey'], 'y': 296, 'x': 356}, {
     'synsets': ['trouser.n.01'], 'h': 128, 'object_id': 1058531, 'names': ['pants'], 'w': 54, 'attributes': ['gray', 'grey'], 'y': 369, 'x': 382}, {
     'synsets': [], 'h': 185, 'object_id': 1058536, 'names': ['work truck'], 'w': 265, 'attributes': ['white'], 'y': 271, 'x': 521}, {
     'synsets': ['sidewalk.n.01'], 'h': 189, 'object_id': 3798575, 'names': ['sidewalk'], 'w': 50, 'y': 318, 'x': 343}, {
     'synsets': ['chin.n.01'], 'h': 9, 'object_id': 1058541, 'names': ['chin'], 'w': 11, 'attributes': ['raised'], 'y': 288, 'x': 399}, {
     'synsets': ['guy.n.01'], 'h': 250, 'object_id': 1058540, 'names': ['guy'], 'w': 82, 'y': 264, 'x': 369}, {
     'synsets': ['van.n.05'], 'h': 134, 'object_id': 1058542, 'names': ['van'], 'w': 233, 'attributes': ['parked', 'white'], 'y': 298, 'x': 529}, {
     'synsets': ['wall.n.01'], 'h': 533, 'object_id': 1058543, 'names': ['wall'], 'w': 134, 'attributes': ['grey'], 'y': 1, 'x': 0}, {
     'synsets': ['tree.n.01'], 'h': 360, 'object_id': 1058545, 'names': ['tree'], 'w': 176, 'y': 0, 'x': 249}, {
     'synsets': ['bicycle.n.01'], 'h': 35, 'object_id': 1058544, 'names': ['bikes'], 'w': 40, 'y': 319, 'x': 321}, {
     'synsets': ['arm.n.01'], 'h': 43, 'object_id': 1058546, 'names': ['arm'], 'w': 32, 'attributes': ['raised'], 'y': 283, 'x': 368}, {
     'synsets': ['shirt.n.01'], 'h': 66, 'object_id': 1058547, 'names': ['shirt'], 'w': 37, 'attributes': ['grey'], 'y': 306, 'x': 384}, {
     'synsets': ['man.n.01'], 'h': 248, 'object_id': 3798576, 'names': ['man'], 'w': 97, 'y': 264, 'x': 362}, {
     'synsets': ['man.n.01'], 'h': 264, 'object_id': 3798577, 'names': ['man'], 'w': 72, 'y': 251, 'x': 230}, {
     'synsets': ['road.n.01'], 'h': 218, 'object_id': 3798578, 'names': ['road'], 'w': 340, 'y': 295, 'x': 435}, {
     'synsets': [], 'h': 430, 'object_id': 1058548, 'names': ['lamp post'], 'w': 41, 'y': 63, 'x': 537}, {
     'synsets': ['tree.n.01'], 'h': 557, 'object_id': 1058549, 'names': ['trees'], 'w': 606, 'attributes': ['sparse'], 'y': 0, 'x': 190}, {
     'synsets': ['window.n.01'], 'h': 148, 'object_id': 3798579, 'names': ['windows'], 'w': 173, 'y': 4, 'x': 602}]}

Making attribute vocab from 86128 training images
Found 80 attribute categories with >= 2000 training instances
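
As the sample entry shows, some objects carry no 'attributes' key at all. The code above handles this with try ... except; an alternative, shown here only as a sketch on made-up entries, is dict.get with an empty list as the default. This is a stylistic variation, not the approach of the original code.

```python
from collections import Counter

# Made-up entries; the second one has no 'attributes' key, as happens in attributes.json.
toy_entries = [
    {"names": ["clock"], "attributes": ["green", "tall"]},
    {"names": ["shade"]},
    {"names": ["building"], "attributes": ["tall", "brick"]},
]

attr_counter = Counter()
for entry in toy_entries:
    # dict.get returns the default (an empty list) when the key is missing,
    # so no exception handling is needed here.
    attr_counter.update(set(entry.get("attributes", [])))

print(attr_counter.most_common())
```

Both variants count the same attributes; the dict.get form only avoids catching an exception for an ordinary case.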


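Read the relationships.json file and build the dict pred_name_to_idx, which maps relationship names to indices. As before, a filter can be applied first to drop relationships that occur only rarely, which tends to help the trained model perform better.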
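
Predicates in the raw data come in mixed case (for example 'ON' and 'WEARING' next to 'wears'), so they need to be normalized before counting. The sketch below illustrates that idea on made-up data: each predicate is lowercased, stripped, mapped through an alias table, and counted, and only predicates that reach the threshold are kept. The predicate list, the alias table, and the threshold are assumptions for illustration.

```python
from collections import defaultdict

# Made-up predicates and alias table, for illustration only.
raw_predicates = ["ON", "on ", "WEARING", "wears", "parked on", "On"]
toy_rel_aliases = {"wearing": "wears"}
min_instances = 2

pred_counter = defaultdict(int)
for raw in raw_predicates:
    pred = raw.lower().strip()              # 'ON' and 'on ' both become 'on'
    pred = toy_rel_aliases.get(pred, pred)  # 'wearing' is folded into 'wears'
    pred_counter[pred] += 1

# Index 0 is reserved for the special '__in_image__' predicate.
pred_names = ["__in_image__"] + [p for p, c in pred_counter.items() if c >= min_instances]
print(dict(pred_counter))
print(pred_names)
```

Without the normalization step, 'ON', 'on ' and 'On' would be counted as three different predicates and each might fall below the threshold.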

Python code:

from collections import defaultdict

def create_rel_vocab(min_relationship_instances, image_ids, relationships,
                     object_id_to_obj, rel_aliases, vocab):
    pred_counter = defaultdict(int)
    image_ids_set = set(image_ids)
    for image in relationships:
        image_id = image['image_id']
        if image_id not in image_ids_set:
            continue
        for rel in image['relationships']:
            sid = rel['subject']['object_id']
            oid = rel['object']['object_id']
            found_subject = sid in object_id_to_obj
            found_object = oid in object_id_to_obj
            ## Skip relationships whose subject or object was filtered out earlier
            if not found_subject or not found_object:
                continue
            pred = rel['predicate'].lower().strip()
            pred = rel_aliases.get(pred, pred)
            rel['predicate'] = pred
            pred_counter[pred] += 1

    pred_names = ['__in_image__']
    for pred, count in pred_counter.items():
        if count >= min_relationship_instances:
            pred_names.append(pred)
    print('Found %d relationship types with >= %d training instances'
          % (len(pred_names), min_relationship_instances))

    pred_name_to_idx = {}
    pred_idx_to_name = []
    for idx, name in enumerate(pred_names):
        pred_name_to_idx[name] = idx
        pred_idx_to_name.append(name)

    vocab['pred_name_to_idx'] = pred_name_to_idx
    vocab['pred_idx_to_name'] = pred_idx_to_name

## The default path of relationships.json is './relationships.json'; adjust as needed.
with open('relationships.json', 'r') as f:
    relationships = json.load(f)
print('type of relationships json', type(relationships), len(relationships))
## Print one entry as an example
print(relationships[0])

min_relationship_instances = 500
create_rel_vocab(min_relationship_instances, train_ids, relationships,
                 object_id_to_obj, rel_aliases, vocab)


type of relationships json <class 'list'> 108077

{'relationships': [{
     'predicate': 'ON', 'object': {
     'h': 290, 'object_id': 1058534, 'merged_object_ids': [5046], 'synsets': ['sidewalk.n.01'], 'w': 722, 'y': 308, 'x': 78, 'names': ['sidewalk']}, 'relationship_id': 15927, 'synsets': ['along.r.01'], 'subject': {
     'name': 'shade', 'h': 192, 'synsets': ['shade.n.01'], 'object_id': 5045, 'w': 274, 'y': 338, 'x': 119}}, {
     'predicate': 'wears', 'object': {
     'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'synsets': ['shoe.n.01'], 'w': 48, 'y': 485, 'x': 388, 'names': ['shoes']}, 'relationship_id': 15928, 'synsets': ['wear.v.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'has', 'object': {
     'name': 'headlight', 'h': 15, 'synsets': ['headlight.n.01'], 'object_id': 5050, 'w': 23, 'y': 366, 'x': 514}, 'relationship_id': 15929, 'synsets': ['have.v.01'], 'subject': {
     'name': 'car', 'h': 98, 'synsets': ['car.n.01'], 'object_id': 5049, 'w': 74, 'y': 315, 'x': 479}}, {
     'predicate': 'ON', 'object': {
     'name': 'building', 'h': 536, 'synsets': ['building.n.01'], 'object_id': 1058508, 'w': 218, 'y': 2, 'x': 1}, 'relationship_id': 15930, 'synsets': ['along.r.01'], 'subject': {
     'name': 'sign', 'h': 182, 'synsets': ['sign.n.02'], 'object_id': 1058507, 'w': 88, 'y': 13, 'x': 118}}, {
     'predicate': 'ON', 'object': {
     'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15931, 'synsets': ['along.r.01'], 'subject': {
     'name': 'tree trunk', 'h': 327, 'synsets': ['trunk.n.01'], 'object_id': 5055, 'w': 87, 'y': 234, 'x': 622}}, {
     'predicate': 'has', 'object': {
     'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 15932, 'synsets': ['have.v.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'next to', 'object': {
     'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15933, 'synsets': ['next.r.01'], 'subject': {
     'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}}, {
     'predicate': 'has', 'object': {
     'name': 'back', 'h': 170, 'synsets': ['back.n.01'], 'object_id': 5060, 'w': 67, 'y': 339, 'x': 721}, 'relationship_id': 15934, 'synsets': ['have.v.01'], 'subject': {
     'name': 'car', 'h': 174, 'synsets': ['car.n.01'], 'object_id': 1058515, 'w': 91, 'y': 342, 'x': 708}}, {
     'predicate': 'has', 'object': {
     'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 15935, 'synsets': ['have.v.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'ON', 'object': {
     'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15936, 'synsets': ['along.r.01'], 'subject': {
     'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {
     'predicate': 'wears', 'object': {
     'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'synsets': ['shoe.n.01'], 'w': 48, 'y': 485, 'x': 388, 'names': ['shoes']}, 'relationship_id': 15937, 'synsets': ['wear.v.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'has', 'object': {
     'name': 'shoes', 'h': 34, 'synsets': ['shoe.n.01'], 'object_id': 1058525, 'w': 46, 'y': 481, 'x': 391}, 'relationship_id': 15938, 'synsets': ['have.v.01'], 'subject': {
     'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
     'predicate': 'has', 'object': {
     'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 15939, 'synsets': ['have.v.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'wears', 'object': {
     'name': 'pants', 'h': 118, 'synsets': ['trouser.n.01'], 'object_id': 1058528, 'w': 38, 'y': 384, 'x': 245}, 'relationship_id': 15940, 'synsets': ['wear.v.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'has', 'object': {
     'name': 'jacket', 'h': 97, 'synsets': ['jacket.n.01'], 'object_id': 1058530, 'w': 89, 'y': 296, 'x': 356}, 'relationship_id': 15941, 'synsets': ['have.v.01'], 'subject': {
     'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
     'predicate': 'has', 'object': {
     'name': 'pants', 'h': 128, 'synsets': ['trouser.n.01'], 'object_id': 1058531, 'w': 54, 'y': 369, 'x': 382}, 'relationship_id': 15942, 'synsets': ['have.v.01'], 'subject': {
     'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
     'predicate': 'parked on', 'object': {
     'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15943, 'synsets': ['along.r.01'], 'subject': {
     'name': 'bike', 'h': 34, 'synsets': ['bicycle.n.01'], 'object_id': 5051, 'w': 28, 'y': 319, 'x': 318}}, {
     'predicate': 'parked on', 'object': {
     'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15944, 'synsets': ['along.r.01'], 'subject': {
     'name': 'bike', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058535, 'w': 29, 'y': 319, 'x': 334}}, {
     'predicate': 'parked on', 'object': {
     'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15945, 'synsets': ['along.r.01'], 'subject': {
     'h': 176, 'object_id': 1058542, 'merged_object_ids': [1058536], 'synsets': ['van.n.05'], 'w': 241, 'y': 278, 'x': 533, 'names': ['van']}}, {
     'predicate': 'parked on', 'object': {
     'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15946, 'synsets': ['along.r.01'], 'subject': {
     'name': 'car', 'h': 174, 'synsets': ['car.n.01'], 'object_id': 1058515, 'w': 91, 'y': 342, 'x': 708}}, {
     'predicate': 'ON', 'object': {
     'name': 'sidewalk', 'h': 189, 'synsets': ['sidewalk.n.01'], 'object_id': 3798575, 'w': 50, 'y': 318, 'x': 343}, 'relationship_id': 4265923, 'synsets': ['along.r.01'], 'subject': {
     'name': 'bike', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058535, 'w': 29, 'y': 319, 'x': 334}}, {
     'predicate': 'behind', 'object': {
     'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}, 'relationship_id': 3186256, 'synsets': ['behind.r.01'], 'subject': {
     'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {
     'predicate': 'holding', 'object': {
     'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 3186257, 'synsets': ['have.v.01'], 'subject': {
     'h': 248, 'object_id': 3798576, 'merged_object_ids': [1058540], 'synsets': ['man.n.01'], 'w': 82, 'y': 264, 'x': 367, 'names': ['man']}}, {
     'predicate': 'WEARING', 'object': {
     'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 3186258, 'synsets': ['wear.v.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'holding', 'object': {
     'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 3186259, 'synsets': ['have.v.01'], 'subject': {
     'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
     'predicate': 'near', 'object': {
     'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}, 'relationship_id': 3186260, 'synsets': ['about.r.07'], 'subject': {
     'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {
     'predicate': 'WEARING', 'object': {
     'name': 'shoes', 'h': 34, 'synsets': ['shoe.n.01'], 'object_id': 1058525, 'w': 46, 'y': 481, 'x': 391}, 'relationship_id': 3186261, 'synsets': ['wear.v.01'], 'subject': {
     'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
     'predicate': 'near', 'object': {
     'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}, 'relationship_id': 3186262, 'synsets': ['about.r.07'], 'subject': {
     'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {
     'predicate': 'ON', 'object': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}, 'relationship_id': 3186263, 'synsets': ['along.r.01'], 'subject': {
     'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}}, {
     'predicate': 'holding', 'object': {
     'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 4265924, 'synsets': ['have.v.01'], 'subject': {
     'name': 'man', 'h': 248, 'synsets': ['man.n.01'], 'object_id': 3798576, 'w': 97, 'y': 264, 'x': 362}}, {
     'predicate': 'WEARING', 'object': {
     'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 4265925, 'synsets': ['wear.v.01'], 'subject': {
     'name': 'man', 'h': 264, 'synsets': ['man.n.01'], 'object_id': 3798577, 'w': 72, 'y': 251, 'x': 230}}, {
     'predicate': 'along', 'object': {
     'h': 258, 'object_id': 1058539, 'merged_object_ids': [3798578], 'synsets': ['street.n.01'], 'w': 359, 'y': 283, 'x': 439, 'names': ['street']}, 'relationship_id': 4265926, 'synsets': ['along.r.01'], 'subject': {
     'name': 'lamp post', 'h': 430, 'synsets': [], 'object_id': 1058548, 'w': 41, 'y': 63, 'x': 537}}, {
     'predicate': 'IN', 'object': {
     'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 3186264, 'synsets': ['in.r.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'WEARING', 'object': {
     'name': 'pants', 'h': 118, 'synsets': ['trouser.n.01'], 'object_id': 1058528, 'w': 38, 'y': 384, 'x': 245}, 'relationship_id': 3186265, 'synsets': ['wear.v.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'on top of', 'object': {
     'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 3186266, 'synsets': ['along.r.01'], 'subject': {
     'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {
     'predicate': 'next to', 'object': {
     'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 3186267, 'synsets': ['next.r.01'], 'subject': {
     'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}}, {
     'predicate': 'WEARING', 'object': {
     'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 3186268, 'synsets': ['wear.v.01'], 'subject': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
     'predicate': 'behind', 'object': {
     'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}, 'relationship_id': 3186269, 'synsets': ['behind.r.01'], 'subject': {
     'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {
     'predicate': 'by', 'object': {
     'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 3186270, 'synsets': ['by.r.01'], 'subject': {
     'name': 'trees', 'h': 557, 'synsets': ['tree.n.01'], 'object_id': 1058549, 'w': 606, 'y': 0, 'x': 190}}, {
     'predicate': 'WEARING', 'object': {
     'name': 'jacket', 'h': 97, 'synsets': ['jacket.n.01'], 'object_id': 1058530, 'w': 89, 'y': 296, 'x': 356}, 'relationship_id': 3186271, 'synsets': ['wear.v.01'], 'subject': {
     'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
     'predicate': 'with', 'object': {
     'name': 'windows', 'h': 148, 'synsets': ['window.n.01'], 'object_id': 3798579, 'w': 173, 'y': 4, 'x': 602}, 'relationship_id': 4265927, 'synsets': [], 'subject': {
     'name': 'building', 'h': 536, 'synsets': ['building.n.01'], 'object_id': 1058508, 'w': 218, 'y': 2, 'x': 1}}], 'image_id': 1}

Found 46 relationship types with >= 500 training instances


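So far we have built one-to-one mappings from image id to image info (image_id_to_image), from object name to index (object_name_to_idx), from attribute name to index (attribute_name_to_idx), and from relationship name to index (pred_name_to_idx). At some cost in CPU time, we also built a mapping from object id to object info ('name', 'name_idx', 'box') (object_id_to_obj). Along the way we applied various filters so that only entries meeting our criteria were kept. Now comes the exciting part (for some reason Kennen's voice lines popped into my head): we combine everything collected so far into a standardized encoding for every image. This step deserves another two seconds of sympathy for the CPU.

Python code: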

import argparse, json, os
from collections import Counter, defaultdict
import numpy as np
parser = argparse.ArgumentParser()

parser.add_argument('--min_objects_per_image', default=3, type=int)
parser.add_argument('--max_objects_per_image', default=30, type=int)
parser.add_argument('--max_attributes_per_image', default=30, type=int)
parser.add_argument('--min_relationships_per_image', default=1, type=int)
parser.add_argument('--max_relationships_per_image', default=30, type=int)

def encode_graphs(args, splits, objects, relationships, vocab,
                  object_id_to_obj, attributes):

	# Index the raw annotations by image id
	image_id_to_objects = {}
	for image in objects:
		image_id = image['image_id']
		image_id_to_objects[image_id] = image['objects']
	image_id_to_relationships = {}
	for image in relationships:
		image_id = image['image_id']
		image_id_to_relationships[image_id] = image['relationships']
	image_id_to_attributes = {}
	for image in attributes:
		image_id = image['image_id']
		image_id_to_attributes[image_id] = image['attributes']

	numpy_arrays = {}
	for split, image_ids in splits.items():
		skip_stats = defaultdict(int)
		# We need to filter *again* based on number of objects and relationships
		final_image_ids = []
		object_ids = []
		object_names = []
		object_boxes = []
		objects_per_image = []
		relationship_ids = []
		relationship_subjects = []
		relationship_predicates = []
		relationship_objects = []
		relationships_per_image = []
		attribute_ids = []
		attributes_per_object = []
		object_attributes = []
		for image_id in image_ids:
			image_object_ids = []
			image_object_names = []
			image_object_boxes = []
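			# Maps a global object id to its position within this image
			object_id_to_idx = {}
			for obj in image_id_to_objects[image_id]:
				object_id = obj['object_id']
				# Skip objects that were removed by the earlier filters
				if object_id not in object_id_to_obj:
					continue
				obj = object_id_to_obj[object_id]
				object_id_to_idx[object_id] = len(image_object_ids)
				image_object_ids.append(object_id)
				image_object_names.append(obj['name_idx'])
				image_object_boxes.append(obj['box'])
			num_objects = len(image_object_ids)
			too_few = num_objects < args.min_objects_per_image
			too_many = num_objects > args.max_objects_per_image
			if too_few:
				skip_stats['too_few_objects'] += 1
				continue
			if too_many:
				skip_stats['too_many_objects'] += 1
				continue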
			image_rel_ids = []
			image_rel_subs = []
			image_rel_preds = []
			image_rel_objs = []
			for rel in image_id_to_relationships[image_id]:
				relationship_id = rel['relationship_id']
				pred = rel['predicate']
				pred_idx = vocab['pred_name_to_idx'].get(pred, None)
				# Skip predicates that are not in the vocabulary
				if pred_idx is None:
					continue
				sid = rel['subject']['object_id']
				sidx = object_id_to_idx.get(sid, None)
				oid = rel['object']['object_id']
				oidx = object_id_to_idx.get(oid, None)
				# Skip relationships whose subject or object was filtered out
				if sidx is None or oidx is None:
					continue
				image_rel_ids.append(relationship_id)
				image_rel_subs.append(sidx)
				image_rel_preds.append(pred_idx)
				image_rel_objs.append(oidx)
			num_relationships = len(image_rel_ids)
			too_few = num_relationships < args.min_relationships_per_image
			too_many = num_relationships > args.max_relationships_per_image
			if too_few:
				skip_stats['too_few_relationships'] += 1
				continue
			if too_many:
				skip_stats['too_many_relationships'] += 1
				continue

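			obj_id_to_attributes = {}
			num_attributes = []
			for obj_attribute in image_id_to_attributes[image_id]:
				obj_id_to_attributes[obj_attribute['object_id']] = obj_attribute.get('attributes', None)
			for object_id in image_object_ids:
				attributes = obj_id_to_attributes.get(object_id, None)
				if attributes is None:
					# No attributes recorded: store an all-padding row
					object_attributes.append([-1] * args.max_attributes_per_image)
					num_attributes.append(0)
				else:
					attribute_ids = []
					for attribute in attributes:
						if attribute in vocab['attribute_name_to_idx']:
							attribute_ids.append(vocab['attribute_name_to_idx'][attribute])
						if len(attribute_ids) >= args.max_attributes_per_image:
							break
					num_attributes.append(len(attribute_ids))
					pad_len = args.max_attributes_per_image - len(attribute_ids)
					attribute_ids = attribute_ids + [-1] * pad_len
					object_attributes.append(attribute_ids)

			# Pad object info out to max_objects_per_image
			while len(image_object_ids) < args.max_objects_per_image:
				image_object_ids.append(-1)
				image_object_names.append(-1)
				image_object_boxes.append([-1, -1, -1, -1])
				num_attributes.append(-1)

			# Pad relationship info out to max_relationships_per_image
			while len(image_rel_ids) < args.max_relationships_per_image:
				image_rel_ids.append(-1)
				image_rel_subs.append(-1)
				image_rel_preds.append(-1)
				image_rel_objs.append(-1)

			# Record the padded per-image lists for this split
			final_image_ids.append(image_id)
			object_ids.append(image_object_ids)
			object_names.append(image_object_names)
			object_boxes.append(image_object_boxes)
			objects_per_image.append(num_objects)
			relationship_ids.append(image_rel_ids)
			relationship_subjects.append(image_rel_subs)
			relationship_predicates.append(image_rel_preds)
			relationship_objects.append(image_rel_objs)
			relationships_per_image.append(num_relationships)
			attributes_per_object.append(num_attributes)
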
		print('Skip stats for split "%s"' % split)
		for stat, count in skip_stats.items():
			print(stat, count)
		numpy_arrays[split] = {
			'image_ids': np.asarray(final_image_ids),
			'object_ids': np.asarray(object_ids),
			'object_names': np.asarray(object_names),
			'object_boxes': np.asarray(object_boxes),
			'objects_per_image': np.asarray(objects_per_image),
			'relationship_ids': np.asarray(relationship_ids),
			'relationship_subjects': np.asarray(relationship_subjects),
			'relationship_predicates': np.asarray(relationship_predicates),
			'relationship_objects': np.asarray(relationship_objects),
			'relationships_per_image': np.asarray(relationships_per_image),
			'attributes_per_object': np.asarray(attributes_per_object),
			'object_attributes': np.asarray(object_attributes),
		}
		for k, v in numpy_arrays[split].items():
			if v.dtype == np.int64:
				numpy_arrays[split][k] = v.astype(np.int32)
	return numpy_arrays

args = parser.parse_args()
numpy_arrays = encode_graphs(args, splits, objects, relationships, vocab,
                               object_id_to_obj, attributes)

## Inspect the combined information for the train split
for key, value in numpy_arrays['train'].items():
	## Print the type and length of each value
	print(key, type(value), len(value))
	## Print the first element of each value
	print(value[0])


Skip stats for split "train"
too_few_relationships 16402
too_few_objects 6794
too_many_objects 187
too_many_relationships 180

Skip stats for split "test"
too_few_relationships 4803
too_few_objects 837
too_many_objects 26

Skip stats for split "val"
too_few_objects 853
too_few_relationships 4815
too_many_objects 27
too_many_relationships 4

image_ids <class 'numpy.ndarray'> 62565

object_ids <class 'numpy.ndarray'> 62565
[1058549 1058534 1058508 1058539 1058543 1058545 1058498 3798579 3798576
 3798577 1058507 1058515    5060 1058530    5049 1058531 1058511 1058528
 1058547      -1      -1      -1      -1      -1      -1      -1      -1
      -1      -1      -1]

object_names <class 'numpy.ndarray'> 62565
[  2  52   7  60   5   2  95   1   3   3   9  19 134  44  19  32   4  32
   4  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1]

object_boxes <class 'numpy.ndarray'> 62565
[[  0   0 799 557]
 [ 78 308 722 290]
 [  1   0 222 538]
 [439 283 359 258]
 [  0   1 135 535]
 [178   0 476 360]
 [422  63  77 363]
 [602   1 198 147]
 [367 264  82 248]
 [238 254  57 259]
 [123  13  78 179]
 [719 342  80 164]
 [716 345  70 164]
 [367 296  82  98]
 [478 319  78  95]
 [388 369  48 128]
 [241 287  54 103]
 [245 384  44 118]
 [368 295  82 102]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [ -1  -1  -1  -1]]

objects_per_image <class 'numpy.ndarray'> 62565

relationship_ids <class 'numpy.ndarray'> 62565
[  15930   15933   15934   15946 3186267 3186270 4265927      -1      -1
      -1      -1      -1      -1      -1      -1      -1      -1      -1
      -1      -1      -1      -1      -1      -1      -1      -1      -1
      -1      -1      -1]

relationship_subjects <class 'numpy.ndarray'> 62565
[10  1 11 11  5  0  2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]

relationship_predicates <class 'numpy.ndarray'> 62565
[ 1  2  3  4  2  5  6 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]

relationship_objects <class 'numpy.ndarray'> 62565
[ 2  3 12  3  3  1  7 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]

relationships_per_image <class 'numpy.ndarray'> 62565

attributes_per_object <class 'numpy.ndarray'> 62565
[ 0  1  2  0  1  0  2  0  0  0  1  2  0  2  0  2  2  1  1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]

object_attributes <class 'numpy.ndarray'> 606319
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]
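
To make the printed rows above easier to read, here is a minimal sketch of how one image's relationship triples could be recovered from the padded arrays. The values are copied from the first-row output above; the helper name decode_triples is hypothetical and not part of the original code. It assumes, as the padding loops in encode_graphs suggest, that the -1 padding always sits at the end of each row.

```python
# First-row values copied from the printed output above, padded to length 30.
subjects = [10, 1, 11, 11, 5, 0, 2] + [-1] * 23
predicates = [1, 2, 3, 4, 2, 5, 6] + [-1] * 23
objects = [2, 3, 12, 3, 3, 1, 7] + [-1] * 23


def decode_triples(subjects, predicates, objects, pad=-1):
    """Return (subject_idx, predicate_idx, object_idx) for non-padded slots.

    Hypothetical helper: subject and object indices are positions within the
    same image's object arrays, predicate indices refer to pred_name_to_idx.
    """
    triples = []
    for s, p, o in zip(subjects, predicates, objects):
        if p == pad:
            break  # padding is appended at the end, so stop at the first pad
        triples.append((s, p, o))
    return triples


triples = decode_triples(subjects, predicates, objects)
print(len(triples), triples[0])
```

The number of triples recovered this way should match the corresponding entry of relationships_per_image, so in practice that count can be used to slice the row directly instead of scanning for -1.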



Python code:

import h5py
def get_image_paths(image_id_to_image, image_ids):
	# Build a relative path ("<last directory>/<file name>") from each image URL
	paths = []
	for image_id in image_ids:
		image = image_id_to_image[image_id]
		base, filename = os.path.split(image['url'])
		path = os.path.join(os.path.basename(base), filename)
		paths.append(path)
	return paths

output_h5_dir = './'
output_vocab_json = 'vocab.json'
print('Writing HDF5 output files')
for split_name, split_arrays in numpy_arrays.items():
	image_ids = list(split_arrays['image_ids'].astype(int))
	h5_path = os.path.join(output_h5_dir, '%s.h5' % split_name)
	print('Writing file "%s"' % h5_path)
	with h5py.File(h5_path, 'w') as h5_file:
		for name, ary in split_arrays.items():
			print('Creating datset: ', name, ary.shape, ary.dtype)
			h5_file.create_dataset(name, data=ary)
		print('Writing image paths')
		image_paths = get_image_paths(image_id_to_image, image_ids)
		path_dtype = h5py.special_dtype(vlen=str)
		path_shape = (len(image_paths),)
		path_dset = h5_file.create_dataset('image_paths', path_shape, dtype=path_dtype)
		for i, p in enumerate(image_paths):
			path_dset[i] = p

print('Writing vocab to "%s"' % output_vocab_json)
with open(output_vocab_json, 'w') as f:
	json.dump(vocab, f)


Writing HDF5 output files
Writing file "./train.h5"
Creating datset:  image_ids (62565,) int32
Creating datset:  object_ids (62565, 30) int32
Creating datset:  object_names (62565, 30) int32
Creating datset:  object_boxes (62565, 30, 4) int32
Creating datset:  objects_per_image (62565,) int32
Creating datset:  relationship_ids (62565, 30) int32
Creating datset:  relationship_subjects (62565, 30) int32
Creating datset:  relationship_predicates (62565, 30) int32
Creating datset:  relationship_objects (62565, 30) int32
Creating datset:  relationships_per_image (62565,) int32
Creating datset:  attributes_per_object (62565, 30) int32
Creating datset:  object_attributes (606319, 30) int32
Writing image paths

Writing file "./test.h5"
Creating datset:  image_ids (5096,) int32
Creating datset:  object_ids (5096, 30) int32
Creating datset:  object_names (5096, 30) int32
Creating datset:  object_boxes (5096, 30, 4) int32
Creating datset:  objects_per_image (5096,) int32
Creating datset:  relationship_ids (5096, 30) int32
Creating datset:  relationship_subjects (5096, 30) int32
Creating datset:  relationship_predicates (5096, 30) int32
Creating datset:  relationship_objects (5096, 30) int32
Creating datset:  relationships_per_image (5096,) int32
Creating datset:  attributes_per_object (5096, 30) int32
Creating datset:  object_attributes (51626, 30) int32
Writing image paths

Writing file "./val.h5"
Creating datset:  image_ids (5062,) int32
Creating datset:  object_ids (5062, 30) int32
Creating datset:  object_names (5062, 30) int32
Creating datset:  object_boxes (5062, 30, 4) int32
Creating datset:  objects_per_image (5062,) int32
Creating datset:  relationship_ids (5062, 30) int32
Creating datset:  relationship_subjects (5062, 30) int32
Creating datset:  relationship_predicates (5062, 30) int32
Creating datset:  relationship_objects (5062, 30) int32
Creating datset:  relationships_per_image (5062,) int32
Creating datset:  attributes_per_object (5062, 30) int32
Creating datset:  object_attributes (51090, 30) int32
Writing image paths

Writing vocab to "vocab.json"
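
As a small illustration of what ends up in the image_paths dataset, the sketch below applies the same path logic as get_image_paths to the sample URL shown earlier for image 1. The function name relative_image_path is hypothetical and only used here for demonstration.

```python
import os


def relative_image_path(url):
    """Keep only the last directory and the file name of an image URL."""
    base, filename = os.path.split(url)
    return os.path.join(os.path.basename(base), filename)


url = 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg'
rel_path = relative_image_path(url)
print(rel_path)
```

Because os.path uses the separator of the current platform, the stored paths follow the conventions of the machine that ran the preprocessing. If the files need to be portable across operating systems, using posixpath instead of os.path would be one option.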
