Most of the code in this post comes from Image Generation from Scene Graphs (sg2im); it is best read alongside the original source code.
Bash script for downloading the Visual Genome dataset:
VG_DIR=datasets/vg
mkdir -p $VG_DIR
wget https://visualgenome.org/static/data/dataset/objects.json.zip -O $VG_DIR/objects.json.zip
wget https://visualgenome.org/static/data/dataset/attributes.json.zip -O $VG_DIR/attributes.json.zip
wget https://visualgenome.org/static/data/dataset/relationships.json.zip -O $VG_DIR/relationships.json.zip
wget https://visualgenome.org/static/data/dataset/object_alias.txt -O $VG_DIR/object_alias.txt
wget https://visualgenome.org/static/data/dataset/relationship_alias.txt -O $VG_DIR/relationship_alias.txt
wget https://visualgenome.org/static/data/dataset/image_data.json.zip -O $VG_DIR/image_data.json.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip -O $VG_DIR/images.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip -O $VG_DIR/images2.zip
unzip $VG_DIR/objects.json.zip -d $VG_DIR
unzip $VG_DIR/attributes.json.zip -d $VG_DIR
unzip $VG_DIR/relationships.json.zip -d $VG_DIR
unzip $VG_DIR/image_data.json.zip -d $VG_DIR
unzip $VG_DIR/images.zip -d $VG_DIR/images
unzip $VG_DIR/images2.zip -d $VG_DIR/images
Save the script as download_vg.sh and run it:
bash download_vg.sh
The image_data.json file contains metadata for every image in the VG dataset: for each of the 108,077 images it records, among other things, the width, height, url, and image id.
Reading it in Python:
import json

## The default path is 'image_data.json'; adjust to your setup
with open('image_data.json', 'r') as f:
    ## images is a list of length 108,077
    images = json.load(f)
## Build a dict mapping each image_id to its image metadata
image_id_to_image = {i['image_id']: i for i in images}
## Print one example to inspect the image metadata
for each in images:
    print('Information of a single image: \n', each)
    break
Output:
Information of a single image:
{'width': 800, 'url': 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg', 'height': 600, 'image_id': 1, 'coco_id': None, 'flickr_id': None}
Python code to randomly split the images into train, val, and test sets:
import json, random

## 108,077 images in total: train (80%) 86,463, validation (10%) 10,807, test (10%) 10,807
nums, train_num, val_num = 108077, 86463, 10807
## Put all image ids into a set
id_store = set(range(1, nums + 1))
## Draw the train ids (random.sample needs a sequence, so sort the set first)
train_ids = random.sample(sorted(id_store), train_num)
## Work out the remaining ids
id_remain = id_store.difference(train_ids)
## Draw the validation ids from the remainder
val_ids = random.sample(sorted(id_remain), val_num)
## The leftover ids are used for test
id_remain = id_remain.difference(val_ids)
test_ids = list(id_remain)
## Gather the three id lists in a dict
split_dict = {"train": train_ids, "val": val_ids, "test": test_ids}
## Serialize the dict to a JSON string (json.dump(split_dict, f) would also work)
split_str = json.dumps(split_dict)
## Save as a json file; the default path is './data/vg_splits.json', adjust as needed
with open('data/vg_splits.json', 'w') as f:
    f.write(split_str)
Check that the splits were stored correctly as a json file:
splits_json = './data/vg_splits.json'
with open(splits_json, 'r') as f:
    splits = json.load(f)
for split_name, split_list in splits.items():
    print(split_name, type(split_list), len(split_list))
Output:
train <class 'list'> 86463
val <class 'list'> 10807
test <class 'list'> 10807
Python code to drop images that are too small:
def remove_small_images(min_image_size, image_id_to_image, splits):
    new_splits = {}
    for split_name, image_ids in splits.items():
        new_image_ids = []
        num_skipped = 0
        for image_id in image_ids:
            image = image_id_to_image[image_id]
            height, width = image['height'], image['width']
            if min(height, width) < min_image_size:
                num_skipped += 1
                continue
            new_image_ids.append(image_id)
        new_splits[split_name] = new_image_ids
        print('Removed %d images from split "%s" for being too small' %
              (num_skipped, split_name))
    return new_splits
## Minimum image size; adjust as needed
min_image_size = 200
## Load the json file holding the split image ids
splits_json = './data/vg_splits.json'
with open(splits_json, 'r') as f:
    splits = json.load(f)
## Drop the ids of images that are too small
splits = remove_small_images(min_image_size, image_id_to_image, splits)
Output:
## Since the split is random, your numbers may differ
Removed 335 images from split "train" for being too small
Removed 46 images from split "val" for being too small
Removed 45 images from split "test" for being too small
The same object category can go by different names (a singular and a plural form, say, still denote the same kind of object), and the same holds for relationship descriptions: in and inside of express the same relation. The alias files map such variants to a canonical name.
Python code for loading the alias files:
def load_aliases(alias_path):
    aliases = {}
    with open(alias_path, 'r') as f:
        for line in f:
            ## strip() removes whitespace at both ends of each token;
            ## the first entry on each line is the canonical name
            line = [s.strip() for s in line.split(',')]
            for s in line:
                aliases[s] = line[0]
    return aliases

## Default paths are 'object_alias.txt' and 'relationship_alias.txt'; adjust as needed
obj_aliases = load_aliases('object_alias.txt')
rel_aliases = load_aliases('relationship_alias.txt')
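As a quick sanity check, you can peek at a few alias entries; each variant name maps to its canonical form (the exact entries depend on the contents of the alias files, so this is only illustrative):
## Show a handful of alias mappings
for i, (name, canonical) in enumerate(obj_aliases.items()):
    print('%s -> %s' % (name, canonical))
    if i >= 4:
        break
## Later, aliases.get(name, name) falls back to the name itself when no alias exists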
Read objects.json and build the dictionary object_name_to_idx, which maps each object name to an index. It pays to filter out objects that occur only rarely beforehand; this tends to make the trained model perform better.
Python code:
from collections import Counter, defaultdict

def create_object_vocab(min_object_instances, image_ids, objects, aliases, vocab):
    image_ids = set(image_ids)
    print('Making object vocab from %d training images' % len(image_ids))
    ## Tally object occurrences with a Counter, which makes filtering easy
    object_name_counter = Counter()
    for image in objects:
        if image['image_id'] not in image_ids:
            continue
        for obj in image['objects']:
            names = set()
            ## Note that an object's names are stored in a list; in every example
            ## I inspected, though, the list held only a single name.
            for name in obj['names']:
                names.add(aliases.get(name, name))
            object_name_counter.update(names)
    object_names = ['__image__']
    for name, count in object_name_counter.most_common():
        ## Keep a name only if the object occurs at least min_object_instances times
        if count >= min_object_instances:
            object_names.append(name)
    print('Found %d object categories with >= %d training instances' %
          (len(object_names), min_object_instances))
    ## Curiously, name-to-index is stored as a dict while index-to-name is a list;
    ## my best guess is that the author wanted to avoid int keys in the dict.
    object_name_to_idx = {}
    object_idx_to_name = []
    for idx, name in enumerate(object_names):
        object_name_to_idx[name] = idx
        object_idx_to_name.append(name)
    vocab['object_name_to_idx'] = object_name_to_idx
    vocab['object_idx_to_name'] = object_idx_to_name
## The default path for objects.json is './objects.json'; adjust as needed
with open('objects.json', 'r') as f:
    objects = json.load(f)
print('type and length of objects json', type(objects), len(objects))
## Print one example to inspect
print(objects[0])
min_object_instances = 2000
vocab = {}
## splits and obj_aliases were both built in the steps above
train_ids = splits['train']
create_object_vocab(min_object_instances, train_ids, objects, obj_aliases, vocab)
Output:
type and length of objects json <class 'list'> 108077
## Frankly I was surprised that a single image lists this many objects
{
'image_id': 1, 'objects': [{
'synsets': ['tree.n.01'], 'h': 557, 'object_id': 1058549, 'merged_object_ids': [], 'names': ['trees'], 'w': 799, 'y': 0, 'x': 0}, {
'synsets': ['sidewalk.n.01'], 'h': 290, 'object_id': 1058534, 'merged_object_ids': [5046], 'names': ['sidewalk'], 'w': 722, 'y': 308, 'x': 78}, {
'synsets': ['building.n.01'], 'h': 538, 'object_id': 1058508, 'merged_object_ids': [], 'names': ['building'], 'w': 222, 'y': 0, 'x': 1}, {
'synsets': ['street.n.01'], 'h': 258, 'object_id': 1058539, 'merged_object_ids': [3798578], 'names': ['street'], 'w': 359, 'y': 283, 'x': 439}, {
'synsets': ['wall.n.01'], 'h': 535, 'object_id': 1058543, 'merged_object_ids': [], 'names': ['wall'], 'w': 135, 'y': 1, 'x': 0}, {
'synsets': ['tree.n.01'], 'h': 360, 'object_id': 1058545, 'merged_object_ids': [], 'names': ['tree'], 'w': 476, 'y': 0, 'x': 178}, {
'synsets': ['shade.n.01'], 'h': 189, 'object_id': 5045, 'merged_object_ids': [], 'names': ['shade'], 'w': 274, 'y': 344, 'x': 116}, {
'synsets': ['van.n.05'], 'h': 176, 'object_id': 1058542, 'merged_object_ids': [1058536], 'names': ['van'], 'w': 241, 'y': 278, 'x': 533}, {
'synsets': ['trunk.n.01'], 'h': 348, 'object_id': 5055, 'merged_object_ids': [], 'names': ['tree trunk'], 'w': 78, 'y': 213, 'x': 623}, {
'synsets': ['clock.n.01'], 'h': 363, 'object_id': 1058498, 'merged_object_ids': [], 'names': ['clock'], 'w': 77, 'y': 63, 'x': 422}, {
'synsets': ['window.n.01'], 'h': 147, 'object_id': 3798579, 'merged_object_ids': [], 'names': ['windows'], 'w': 198, 'y': 1, 'x': 602}, {
'synsets': ['man.n.01'], 'h': 248, 'object_id': 3798576, 'merged_object_ids': [1058540], 'names': ['man'], 'w': 82, 'y': 264, 'x': 367}, {
'synsets': ['man.n.01'], 'h': 259, 'object_id': 3798577, 'merged_object_ids': [], 'names': ['man'], 'w': 57, 'y': 254, 'x': 238}, {
'synsets': [], 'h': 430, 'object_id': 1058548, 'merged_object_ids': [], 'names': ['lamp post'], 'w': 43, 'y': 63, 'x': 537}, {
'synsets': ['sign.n.02'], 'h': 179, 'object_id': 1058507, 'merged_object_ids': [], 'names': ['sign'], 'w': 78, 'y': 13, 'x': 123}, {
'synsets': ['car.n.01'], 'h': 164, 'object_id': 1058515, 'merged_object_ids': [], 'names': ['car'], 'w': 80, 'y': 342, 'x': 719}, {
'synsets': ['back.n.01'], 'h': 164, 'object_id': 5060, 'merged_object_ids': [], 'names': ['back'], 'w': 70, 'y': 345, 'x': 716}, {
'synsets': ['jacket.n.01'], 'h': 98, 'object_id': 1058530, 'merged_object_ids': [], 'names': ['jacket'], 'w': 82, 'y': 296, 'x': 367}, {
'synsets': ['car.n.01'], 'h': 95, 'object_id': 5049, 'merged_object_ids': [], 'names': ['car'], 'w': 78, 'y': 319, 'x': 478}, {
'synsets': ['trouser.n.01'], 'h': 128, 'object_id': 1058531, 'merged_object_ids': [], 'names': ['pants'], 'w': 48, 'y': 369, 'x': 388}, {
'synsets': ['shirt.n.01'], 'h': 103, 'object_id': 1058511, 'merged_object_ids': [], 'names': ['shirt'], 'w': 54, 'y': 287, 'x': 241}, {
'synsets': ['parking_meter.n.01'], 'h': 143, 'object_id': 1058519, 'merged_object_ids': [], 'names': ['parking meter'], 'w': 26, 'y': 325, 'x': 577}, {
'synsets': ['trouser.n.01'], 'h': 118, 'object_id': 1058528, 'merged_object_ids': [], 'names': ['pants'], 'w': 44, 'y': 384, 'x': 245}, {
'synsets': ['shirt.n.01'], 'h': 102, 'object_id': 1058547, 'merged_object_ids': [], 'names': ['shirt'], 'w': 82, 'y': 295, 'x': 368}, {
'synsets': ['shoe.n.01'], 'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'names': ['shoes'], 'w': 48, 'y': 485, 'x': 388}, {
'synsets': ['arm.n.01'], 'h': 41, 'object_id': 1058546, 'merged_object_ids': [], 'names': ['arm'], 'w': 30, 'y': 285, 'x': 370}, {
'synsets': ['bicycle.n.01'], 'h': 36, 'object_id': 1058535, 'merged_object_ids': [], 'names': ['bike'], 'w': 27, 'y': 319, 'x': 337}, {
'synsets': ['bicycle.n.01'], 'h': 41, 'object_id': 5051, 'merged_object_ids': [], 'names': ['bike'], 'w': 27, 'y': 311, 'x': 321}, {
'synsets': ['headlight.n.01'], 'h': 9, 'object_id': 5050, 'merged_object_ids': [], 'names': ['headlight'], 'w': 18, 'y': 370, 'x': 517}, {
'synsets': ['spectacles.n.01'], 'h': 23, 'object_id': 1058518, 'merged_object_ids': [], 'names': ['glasses'], 'w': 43, 'y': 317, 'x': 448}, {
'synsets': ['chin.n.01'], 'h': 8, 'object_id': 1058541, 'merged_object_ids': [], 'names': ['chin'], 'w': 9, 'y': 288, 'x': 401}], 'image_url': 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg'}
Making object vocab from 86128 training images
Found 179 object categories with >= 2000 training instances
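The resulting vocab can be inspected directly. Index 0 is the special __image__ category; which names come after it depends on your random split, so the snippet below is only illustrative:
print(len(vocab['object_idx_to_name']))          # 179, including '__image__'
print(vocab['object_idx_to_name'][0])            # '__image__'
print(vocab['object_name_to_idx']['__image__'])  # 0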
In the image_data.json step above we built the mapping from image id to image. Now that object_name_to_idx is in place (with rarely occurring objects already filtered out along the way), the next step is to filter objects further by size and build the mapping from object id to object information (the object's name, idx, and box). Watching this implementation run, spare a second of sympathy for your CPU.
Python code:
def filter_objects(min_object_size, objects, aliases, vocab, splits):
    object_id_to_objects = {}
    all_image_ids = set()
    for image_ids in splits.values():
        all_image_ids |= set(image_ids)
    object_name_to_idx = vocab['object_name_to_idx']
    object_id_to_obj = {}
    num_too_small = 0
    for image in objects:
        image_id = image['image_id']
        if image_id not in all_image_ids:
            continue
        for obj in image['objects']:
            object_id = obj['object_id']
            final_name = None
            final_name_idx = None
            for name in obj['names']:
                name = aliases.get(name, name)
                if name in object_name_to_idx:
                    final_name = name
                    final_name_idx = object_name_to_idx[final_name]
                    break
            w, h = obj['w'], obj['h']
            too_small = (w < min_object_size) or (h < min_object_size)
            if too_small:
                num_too_small += 1
            if final_name is not None and not too_small:
                object_id_to_obj[object_id] = {
                    'name': final_name,
                    'name_idx': final_name_idx,
                    'box': [obj['x'], obj['y'], obj['w'], obj['h']],
                }
    print('Skipped %d objects with size < %d' % (num_too_small, min_object_size))
    return object_id_to_obj
min_object_size = 32
object_id_to_obj = filter_objects(min_object_size, objects, obj_aliases, vocab, splits)
Output:
Skipped 997213 objects with size < 32
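Each surviving entry of object_id_to_obj now carries the canonical name, its vocab index, and the [x, y, w, h] box. A minimal lookup sketch; the id 1058549 is the 'trees' object from the sample image above, but whether it survives depends on your random split and the vocab threshold:
## Hypothetical lookup; adjust the id to one that exists in your run
obj = object_id_to_obj.get(1058549)
if obj is not None:
    print(obj['name'], obj['name_idx'], obj['box'])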
Read attributes.json and build the dictionary attribute_name_to_idx, which maps each attribute name to an index. As before, you can filter out attributes that occur only rarely, which tends to help the trained model. The procedure mirrors the handling of objects.json.
Python code:
def create_attribute_vocab(min_attribute_instances, image_ids, attributes, vocab):
    image_ids = set(image_ids)
    print('Making attribute vocab from %d training images' % len(image_ids))
    attribute_name_counter = Counter()
    for image in attributes:
        if image['image_id'] not in image_ids:
            continue
        for attribute in image['attributes']:
            names = set()
            ## try ... except is needed because some objects in an image carry no
            ## 'attributes' key; without it a KeyError would be raised.
            try:
                for name in attribute['attributes']:
                    names.add(name)
                attribute_name_counter.update(names)
            except KeyError:
                pass
    attribute_names = []
    for name, count in attribute_name_counter.most_common():
        if count >= min_attribute_instances:
            attribute_names.append(name)
    print('Found %d attribute categories with >= %d training instances' %
          (len(attribute_names), min_attribute_instances))
    attribute_name_to_idx = {}
    attribute_idx_to_name = []
    for idx, name in enumerate(attribute_names):
        attribute_name_to_idx[name] = idx
        attribute_idx_to_name.append(name)
    vocab['attribute_name_to_idx'] = attribute_name_to_idx
    vocab['attribute_idx_to_name'] = attribute_idx_to_name
## The default path for attributes.json is './attributes.json'; adjust as needed
with open('attributes.json', 'r') as f:
    attributes = json.load(f)
print('type of attributes json', type(attributes), len(attributes))
## Print one example to inspect
print(attributes[0])
min_attribute_instances = 2000
create_attribute_vocab(min_attribute_instances, train_ids, attributes, vocab)
Output:
type of attributes json <class 'list'> 108077
## This example is very, very long...
{
'image_id': 1, 'attributes': [{
'synsets': ['clock.n.01'], 'h': 339, 'object_id': 1058498, 'names': ['clock'], 'w': 79, 'attributes': ['green', 'tall'], 'y': 91, 'x': 421}, {
'synsets': ['street.n.01'], 'h': 262, 'object_id': 5046, 'names': ['street'], 'w': 714, 'attributes': ['sidewalk'], 'y': 328, 'x': 77}, {
'synsets': ['shade.n.01'], 'h': 192, 'object_id': 5045, 'names': ['shade'], 'w': 274, 'y': 338, 'x': 119}, {
'synsets': ['man.n.01'], 'h': 262, 'object_id': 1058529, 'names': ['man'], 'w': 60, 'y': 249, 'x': 238}, {
'synsets': ['gym_shoe.n.01'], 'h': 26, 'object_id': 5048, 'names': ['sneakers'], 'w': 52, 'attributes': ['grey'], 'y': 489, 'x': 243}, {
'synsets': ['headlight.n.01'], 'h': 15, 'object_id': 5050, 'names': ['headlight'], 'w': 23, 'attributes': ['off'], 'y': 366, 'x': 514}, {
'synsets': ['car.n.01'], 'h': 98, 'object_id': 5049, 'names': ['car'], 'w': 74, 'y': 315, 'x': 479}, {
'synsets': ['bicycle.n.01'], 'h': 34, 'object_id': 5051, 'names': ['bike'], 'w': 28, 'attributes': ['parked', 'far away'], 'y': 319, 'x': 318}, {
'synsets': ['bicycle.n.01'], 'h': 35, 'object_id': 1058535, 'names': ['bike'], 'w': 29, 'attributes': ['parked', 'far away', 'chained'], 'y': 319, 'x': 334}, {
'synsets': ['sign.n.02'], 'h': 182, 'object_id': 1058507, 'names': ['sign'], 'w': 88, 'attributes': ['black'], 'y': 13, 'x': 118}, {
'synsets': ['building.n.01'], 'h': 536, 'object_id': 1058508, 'names': ['building'], 'w': 218, 'attributes': ['tall', 'brick', 'made of bricks'], 'y': 2, 'x': 1}, {
'synsets': ['trunk.n.01'], 'h': 327, 'object_id': 5055, 'names': ['tree trunk'], 'w': 87, 'y': 234, 'x': 622}, {
'synsets': ['sidewalk.n.01'], 'h': 266, 'object_id': 1058534, 'names': ['sidewalk'], 'w': 722, 'attributes': ['brick'], 'y': 331, 'x': 77}, {
'synsets': ['shirt.n.01'], 'h': 101, 'object_id': 1058511, 'names': ['shirt'], 'w': 59, 'attributes': ['red', 'orange'], 'y': 289, 'x': 241}, {
'synsets': ['street.n.01'], 'h': 233, 'object_id': 1058539, 'names': ['street'], 'w': 440, 'attributes': ['clean'], 'y': 283, 'x': 358}, {
'synsets': ['car.n.01'], 'h': 174, 'object_id': 1058515, 'names': ['car'], 'w': 91, 'attributes': ['white', 'parked'], 'y': 342, 'x': 708}, {
'synsets': ['back.n.01'], 'h': 170, 'object_id': 5060, 'names': ['back'], 'w': 67, 'y': 339, 'x': 721}, {
'synsets': ['spectacles.n.01'], 'h': 12, 'object_id': 1058518, 'names': ['glasses'], 'w': 20, 'y': 268, 'x': 271}, {
'synsets': ['parking_meter.n.01'], 'h': 143, 'object_id': 1058519, 'names': ['parking meter'], 'w': 32, 'attributes': ['orange'], 'y': 327, 'x': 574}, {
'synsets': ['shoe.n.01'], 'h': 34, 'object_id': 1058525, 'names': ['shoes'], 'w': 46, 'attributes': ['brown'], 'y': 481, 'x': 391}, {
'synsets': ['man.n.01'], 'h': 251, 'object_id': 1058532, 'names': ['man'], 'w': 75, 'y': 264, 'x': 372}, {
'synsets': ['trouser.n.01'], 'h': 118, 'object_id': 1058528, 'names': ['pants'], 'w': 38, 'attributes': ['black'], 'y': 384, 'x': 245}, {
'synsets': ['jacket.n.01'], 'h': 97, 'object_id': 1058530, 'names': ['jacket'], 'w': 89, 'attributes': ['gray', 'grey'], 'y': 296, 'x': 356}, {
'synsets': ['trouser.n.01'], 'h': 128, 'object_id': 1058531, 'names': ['pants'], 'w': 54, 'attributes': ['gray', 'grey'], 'y': 369, 'x': 382}, {
'synsets': [], 'h': 185, 'object_id': 1058536, 'names': ['work truck'], 'w': 265, 'attributes': ['white'], 'y': 271, 'x': 521}, {
'synsets': ['sidewalk.n.01'], 'h': 189, 'object_id': 3798575, 'names': ['sidewalk'], 'w': 50, 'y': 318, 'x': 343}, {
'synsets': ['chin.n.01'], 'h': 9, 'object_id': 1058541, 'names': ['chin'], 'w': 11, 'attributes': ['raised'], 'y': 288, 'x': 399}, {
'synsets': ['guy.n.01'], 'h': 250, 'object_id': 1058540, 'names': ['guy'], 'w': 82, 'y': 264, 'x': 369}, {
'synsets': ['van.n.05'], 'h': 134, 'object_id': 1058542, 'names': ['van'], 'w': 233, 'attributes': ['parked', 'white'], 'y': 298, 'x': 529}, {
'synsets': ['wall.n.01'], 'h': 533, 'object_id': 1058543, 'names': ['wall'], 'w': 134, 'attributes': ['grey'], 'y': 1, 'x': 0}, {
'synsets': ['tree.n.01'], 'h': 360, 'object_id': 1058545, 'names': ['tree'], 'w': 176, 'y': 0, 'x': 249}, {
'synsets': ['bicycle.n.01'], 'h': 35, 'object_id': 1058544, 'names': ['bikes'], 'w': 40, 'y': 319, 'x': 321}, {
'synsets': ['arm.n.01'], 'h': 43, 'object_id': 1058546, 'names': ['arm'], 'w': 32, 'attributes': ['raised'], 'y': 283, 'x': 368}, {
'synsets': ['shirt.n.01'], 'h': 66, 'object_id': 1058547, 'names': ['shirt'], 'w': 37, 'attributes': ['grey'], 'y': 306, 'x': 384}, {
'synsets': ['man.n.01'], 'h': 248, 'object_id': 3798576, 'names': ['man'], 'w': 97, 'y': 264, 'x': 362}, {
'synsets': ['man.n.01'], 'h': 264, 'object_id': 3798577, 'names': ['man'], 'w': 72, 'y': 251, 'x': 230}, {
'synsets': ['road.n.01'], 'h': 218, 'object_id': 3798578, 'names': ['road'], 'w': 340, 'y': 295, 'x': 435}, {
'synsets': [], 'h': 430, 'object_id': 1058548, 'names': ['lamp post'], 'w': 41, 'y': 63, 'x': 537}, {
'synsets': ['tree.n.01'], 'h': 557, 'object_id': 1058549, 'names': ['trees'], 'w': 606, 'attributes': ['sparse'], 'y': 0, 'x': 190}, {
'synsets': ['window.n.01'], 'h': 148, 'object_id': 3798579, 'names': ['windows'], 'w': 173, 'y': 4, 'x': 602}]}
Making attribute vocab from 86128 training images
Found 80 attribute categories with >= 2000 training instances
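As with objects, the attribute vocab is easy to eyeball; since most_common() sorts by frequency, the first entries are the most common attributes:
print(len(vocab['attribute_idx_to_name']))   # 80 for this run
print(vocab['attribute_idx_to_name'][:10])   # the ten most frequent attribute names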
Read relationships.json and build the dictionary pred_name_to_idx, which maps each relationship (predicate) name to an index. Again, you can filter out relationships that occur only rarely, which tends to help the trained model.
Python code:
def create_rel_vocab(min_relationship_instances, image_ids, relationships,
                     object_id_to_obj, rel_aliases, vocab):
    pred_counter = defaultdict(int)
    image_ids_set = set(image_ids)
    for image in relationships:
        image_id = image['image_id']
        if image_id not in image_ids_set:
            continue
        for rel in image['relationships']:
            sid = rel['subject']['object_id']
            oid = rel['object']['object_id']
            found_subject = sid in object_id_to_obj
            found_object = oid in object_id_to_obj
            if not found_subject or not found_object:
                continue
            pred = rel['predicate'].lower().strip()
            pred = rel_aliases.get(pred, pred)
            rel['predicate'] = pred
            pred_counter[pred] += 1
    pred_names = ['__in_image__']
    for pred, count in pred_counter.items():
        if count >= min_relationship_instances:
            pred_names.append(pred)
    print('Found %d relationship types with >= %d training instances'
          % (len(pred_names), min_relationship_instances))
    pred_name_to_idx = {}
    pred_idx_to_name = []
    for idx, name in enumerate(pred_names):
        pred_name_to_idx[name] = idx
        pred_idx_to_name.append(name)
    vocab['pred_name_to_idx'] = pred_name_to_idx
    vocab['pred_idx_to_name'] = pred_idx_to_name
## The default path for relationships.json is './relationships.json'; adjust as needed
with open('relationships.json', 'r') as f:
    relationships = json.load(f)
print('type of relationships json', type(relationships), len(relationships))
## Print one example to inspect
print(relationships[0])
## Note: every argument of the call below was produced in the earlier steps.
min_relationship_instances = 500
create_rel_vocab(min_relationship_instances, train_ids, relationships,
                 object_id_to_obj, rel_aliases, vocab)
Output:
type of relationships json <class 'list'> 108077
## Yet another very, very long example...
{
'relationships': [{
'predicate': 'ON', 'object': {
'h': 290, 'object_id': 1058534, 'merged_object_ids': [5046], 'synsets': ['sidewalk.n.01'], 'w': 722, 'y': 308, 'x': 78, 'names': ['sidewalk']}, 'relationship_id': 15927, 'synsets': ['along.r.01'], 'subject': {
'name': 'shade', 'h': 192, 'synsets': ['shade.n.01'], 'object_id': 5045, 'w': 274, 'y': 338, 'x': 119}}, {
'predicate': 'wears', 'object': {
'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'synsets': ['shoe.n.01'], 'w': 48, 'y': 485, 'x': 388, 'names': ['shoes']}, 'relationship_id': 15928, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'has', 'object': {
'name': 'headlight', 'h': 15, 'synsets': ['headlight.n.01'], 'object_id': 5050, 'w': 23, 'y': 366, 'x': 514}, 'relationship_id': 15929, 'synsets': ['have.v.01'], 'subject': {
'name': 'car', 'h': 98, 'synsets': ['car.n.01'], 'object_id': 5049, 'w': 74, 'y': 315, 'x': 479}}, {
'predicate': 'ON', 'object': {
'name': 'building', 'h': 536, 'synsets': ['building.n.01'], 'object_id': 1058508, 'w': 218, 'y': 2, 'x': 1}, 'relationship_id': 15930, 'synsets': ['along.r.01'], 'subject': {
'name': 'sign', 'h': 182, 'synsets': ['sign.n.02'], 'object_id': 1058507, 'w': 88, 'y': 13, 'x': 118}}, {
'predicate': 'ON', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15931, 'synsets': ['along.r.01'], 'subject': {
'name': 'tree trunk', 'h': 327, 'synsets': ['trunk.n.01'], 'object_id': 5055, 'w': 87, 'y': 234, 'x': 622}}, {
'predicate': 'has', 'object': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 15932, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'next to', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15933, 'synsets': ['next.r.01'], 'subject': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}}, {
'predicate': 'has', 'object': {
'name': 'back', 'h': 170, 'synsets': ['back.n.01'], 'object_id': 5060, 'w': 67, 'y': 339, 'x': 721}, 'relationship_id': 15934, 'synsets': ['have.v.01'], 'subject': {
'name': 'car', 'h': 174, 'synsets': ['car.n.01'], 'object_id': 1058515, 'w': 91, 'y': 342, 'x': 708}}, {
'predicate': 'has', 'object': {
'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 15935, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'ON', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15936, 'synsets': ['along.r.01'], 'subject': {
'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {
'predicate': 'wears', 'object': {
'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'synsets': ['shoe.n.01'], 'w': 48, 'y': 485, 'x': 388, 'names': ['shoes']}, 'relationship_id': 15937, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'has', 'object': {
'name': 'shoes', 'h': 34, 'synsets': ['shoe.n.01'], 'object_id': 1058525, 'w': 46, 'y': 481, 'x': 391}, 'relationship_id': 15938, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'has', 'object': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 15939, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'wears', 'object': {
'name': 'pants', 'h': 118, 'synsets': ['trouser.n.01'], 'object_id': 1058528, 'w': 38, 'y': 384, 'x': 245}, 'relationship_id': 15940, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'has', 'object': {
'name': 'jacket', 'h': 97, 'synsets': ['jacket.n.01'], 'object_id': 1058530, 'w': 89, 'y': 296, 'x': 356}, 'relationship_id': 15941, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'has', 'object': {
'name': 'pants', 'h': 128, 'synsets': ['trouser.n.01'], 'object_id': 1058531, 'w': 54, 'y': 369, 'x': 382}, 'relationship_id': 15942, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'parked on', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15943, 'synsets': ['along.r.01'], 'subject': {
'name': 'bike', 'h': 34, 'synsets': ['bicycle.n.01'], 'object_id': 5051, 'w': 28, 'y': 319, 'x': 318}}, {
'predicate': 'parked on', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15944, 'synsets': ['along.r.01'], 'subject': {
'name': 'bike', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058535, 'w': 29, 'y': 319, 'x': 334}}, {
'predicate': 'parked on', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15945, 'synsets': ['along.r.01'], 'subject': {
'h': 176, 'object_id': 1058542, 'merged_object_ids': [1058536], 'synsets': ['van.n.05'], 'w': 241, 'y': 278, 'x': 533, 'names': ['van']}}, {
'predicate': 'parked on', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15946, 'synsets': ['along.r.01'], 'subject': {
'name': 'car', 'h': 174, 'synsets': ['car.n.01'], 'object_id': 1058515, 'w': 91, 'y': 342, 'x': 708}}, {
'predicate': 'ON', 'object': {
'name': 'sidewalk', 'h': 189, 'synsets': ['sidewalk.n.01'], 'object_id': 3798575, 'w': 50, 'y': 318, 'x': 343}, 'relationship_id': 4265923, 'synsets': ['along.r.01'], 'subject': {
'name': 'bike', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058535, 'w': 29, 'y': 319, 'x': 334}}, {
'predicate': 'behind', 'object': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}, 'relationship_id': 3186256, 'synsets': ['behind.r.01'], 'subject': {
'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {
'predicate': 'holding', 'object': {
'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 3186257, 'synsets': ['have.v.01'], 'subject': {
'h': 248, 'object_id': 3798576, 'merged_object_ids': [1058540], 'synsets': ['man.n.01'], 'w': 82, 'y': 264, 'x': 367, 'names': ['man']}}, {
'predicate': 'WEARING', 'object': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 3186258, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'holding', 'object': {
'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 3186259, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'near', 'object': {
'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}, 'relationship_id': 3186260, 'synsets': ['about.r.07'], 'subject': {
'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {
'predicate': 'WEARING', 'object': {
'name': 'shoes', 'h': 34, 'synsets': ['shoe.n.01'], 'object_id': 1058525, 'w': 46, 'y': 481, 'x': 391}, 'relationship_id': 3186261, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'near', 'object': {
'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}, 'relationship_id': 3186262, 'synsets': ['about.r.07'], 'subject': {
'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {
'predicate': 'ON', 'object': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}, 'relationship_id': 3186263, 'synsets': ['along.r.01'], 'subject': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}}, {
'predicate': 'holding', 'object': {
'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 4265924, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 248, 'synsets': ['man.n.01'], 'object_id': 3798576, 'w': 97, 'y': 264, 'x': 362}}, {
'predicate': 'WEARING', 'object': {
'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 4265925, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 264, 'synsets': ['man.n.01'], 'object_id': 3798577, 'w': 72, 'y': 251, 'x': 230}}, {
'predicate': 'along', 'object': {
'h': 258, 'object_id': 1058539, 'merged_object_ids': [3798578], 'synsets': ['street.n.01'], 'w': 359, 'y': 283, 'x': 439, 'names': ['street']}, 'relationship_id': 4265926, 'synsets': ['along.r.01'], 'subject': {
'name': 'lamp post', 'h': 430, 'synsets': [], 'object_id': 1058548, 'w': 41, 'y': 63, 'x': 537}}, {
'predicate': 'IN', 'object': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 3186264, 'synsets': ['in.r.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'WEARING', 'object': {
'name': 'pants', 'h': 118, 'synsets': ['trouser.n.01'], 'object_id': 1058528, 'w': 38, 'y': 384, 'x': 245}, 'relationship_id': 3186265, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'on top of', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 3186266, 'synsets': ['along.r.01'], 'subject': {
'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {
'predicate': 'next to', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 3186267, 'synsets': ['next.r.01'], 'subject': {
'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}}, {
'predicate': 'WEARING', 'object': {
'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 3186268, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'behind', 'object': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}, 'relationship_id': 3186269, 'synsets': ['behind.r.01'], 'subject': {
'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {
'predicate': 'by', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 3186270, 'synsets': ['by.r.01'], 'subject': {
'name': 'trees', 'h': 557, 'synsets': ['tree.n.01'], 'object_id': 1058549, 'w': 606, 'y': 0, 'x': 190}}, {
'predicate': 'WEARING', 'object': {
'name': 'jacket', 'h': 97, 'synsets': ['jacket.n.01'], 'object_id': 1058530, 'w': 89, 'y': 296, 'x': 356}, 'relationship_id': 3186271, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'with', 'object': {
'name': 'windows', 'h': 148, 'synsets': ['window.n.01'], 'object_id': 3798579, 'w': 173, 'y': 4, 'x': 602}, 'relationship_id': 4265927, 'synsets': [], 'subject': {
'name': 'building', 'h': 536, 'synsets': ['building.n.01'], 'object_id': 1058508, 'w': 218, 'y': 2, 'x': 1}}], 'image_id': 1}
Found 46 relationship types with >= 500 training instances
So far we have built the mapping from image id to image (image_id_to_image), from object name to index (object_name_to_idx), from attribute name to index (attribute_name_to_idx), and from relationship name to index (pred_name_to_idx), and, while pitying the CPU, also the mapping from object id to object information ('name', 'name_idx', 'box') (object_id_to_obj). Along the way we applied all sorts of filters so that only data meeting our criteria got through. Now the exciting moment arrives (why is a Kennen quote popping into my head?): we combine everything we have collected and encode each image's information into a standard form. This time, spare your CPU two seconds of sympathy.
Python code:
import argparse, json, os
from collections import Counter, defaultdict
import numpy as np
parser = argparse.ArgumentParser()
parser.add_argument('--min_objects_per_image', default=3, type=int)
parser.add_argument('--max_objects_per_image', default=30, type=int)
parser.add_argument('--max_attributes_per_image', default=30, type=int)
parser.add_argument('--min_relationships_per_image', default=1, type=int)
parser.add_argument('--max_relationships_per_image', default=30, type=int)
def encode_graphs(args, splits, objects, relationships, vocab,
                  object_id_to_obj, attributes):
    image_id_to_objects = {}
    for image in objects:
        image_id = image['image_id']
        image_id_to_objects[image_id] = image['objects']
    image_id_to_relationships = {}
    for image in relationships:
        image_id = image['image_id']
        image_id_to_relationships[image_id] = image['relationships']
    image_id_to_attributes = {}
    for image in attributes:
        image_id = image['image_id']
        image_id_to_attributes[image_id] = image['attributes']

    numpy_arrays = {}
    for split, image_ids in splits.items():
        skip_stats = defaultdict(int)
        # We need to filter *again* based on number of objects and relationships
        final_image_ids = []
        object_ids = []
        object_names = []
        object_boxes = []
        objects_per_image = []
        relationship_ids = []
        relationship_subjects = []
        relationship_predicates = []
        relationship_objects = []
        relationships_per_image = []
        attribute_ids = []
        attributes_per_object = []
        object_attributes = []
        for image_id in image_ids:
            image_object_ids = []
            image_object_names = []
            image_object_boxes = []
            object_id_to_idx = {}
            for obj in image_id_to_objects[image_id]:
                object_id = obj['object_id']
                if object_id not in object_id_to_obj:
                    continue
                obj = object_id_to_obj[object_id]
                object_id_to_idx[object_id] = len(image_object_ids)
                image_object_ids.append(object_id)
                image_object_names.append(obj['name_idx'])
                image_object_boxes.append(obj['box'])
            num_objects = len(image_object_ids)
            too_few = num_objects < args.min_objects_per_image
            too_many = num_objects > args.max_objects_per_image
            if too_few:
                skip_stats['too_few_objects'] += 1
                continue
            if too_many:
                skip_stats['too_many_objects'] += 1
                continue
            image_rel_ids = []
            image_rel_subs = []
            image_rel_preds = []
            image_rel_objs = []
            for rel in image_id_to_relationships[image_id]:
                relationship_id = rel['relationship_id']
                pred = rel['predicate']
                pred_idx = vocab['pred_name_to_idx'].get(pred, None)
                if pred_idx is None:
                    continue
                sid = rel['subject']['object_id']
                sidx = object_id_to_idx.get(sid, None)
                oid = rel['object']['object_id']
                oidx = object_id_to_idx.get(oid, None)
                if sidx is None or oidx is None:
                    continue
                image_rel_ids.append(relationship_id)
                image_rel_subs.append(sidx)
                image_rel_preds.append(pred_idx)
                image_rel_objs.append(oidx)
            num_relationships = len(image_rel_ids)
            too_few = num_relationships < args.min_relationships_per_image
            too_many = num_relationships > args.max_relationships_per_image
            if too_few:
                skip_stats['too_few_relationships'] += 1
                continue
            if too_many:
                skip_stats['too_many_relationships'] += 1
                continue
            obj_id_to_attributes = {}
            num_attributes = []
            for obj_attribute in image_id_to_attributes[image_id]:
                obj_id_to_attributes[obj_attribute['object_id']] = obj_attribute.get('attributes', None)
            for object_id in image_object_ids:
                attributes = obj_id_to_attributes.get(object_id, None)
                if attributes is None:
                    object_attributes.append([-1] * args.max_attributes_per_image)
                    num_attributes.append(0)
                else:
                    attribute_ids = []
                    for attribute in attributes:
                        if attribute in vocab['attribute_name_to_idx']:
                            attribute_ids.append(vocab['attribute_name_to_idx'][attribute])
                        if len(attribute_ids) >= args.max_attributes_per_image:
                            break
                    num_attributes.append(len(attribute_ids))
                    pad_len = args.max_attributes_per_image - len(attribute_ids)
                    attribute_ids = attribute_ids + [-1] * pad_len
                    object_attributes.append(attribute_ids)
            # Pad object info out to max_objects_per_image
            while len(image_object_ids) < args.max_objects_per_image:
                image_object_ids.append(-1)
                image_object_names.append(-1)
                image_object_boxes.append([-1, -1, -1, -1])
                num_attributes.append(-1)
            # Pad relationship info out to max_relationships_per_image
            while len(image_rel_ids) < args.max_relationships_per_image:
                image_rel_ids.append(-1)
                image_rel_subs.append(-1)
                image_rel_preds.append(-1)
                image_rel_objs.append(-1)
            final_image_ids.append(image_id)
            object_ids.append(image_object_ids)
            object_names.append(image_object_names)
            object_boxes.append(image_object_boxes)
            objects_per_image.append(num_objects)
            relationship_ids.append(image_rel_ids)
            relationship_subjects.append(image_rel_subs)
            relationship_predicates.append(image_rel_preds)
            relationship_objects.append(image_rel_objs)
            relationships_per_image.append(num_relationships)
            attributes_per_object.append(num_attributes)
        print('Skip stats for split "%s"' % split)
        for stat, count in skip_stats.items():
            print(stat, count)
        print()
        numpy_arrays[split] = {
            'image_ids': np.asarray(final_image_ids),
            'object_ids': np.asarray(object_ids),
            'object_names': np.asarray(object_names),
            'object_boxes': np.asarray(object_boxes),
            'objects_per_image': np.asarray(objects_per_image),
            'relationship_ids': np.asarray(relationship_ids),
            'relationship_subjects': np.asarray(relationship_subjects),
            'relationship_predicates': np.asarray(relationship_predicates),
            'relationship_objects': np.asarray(relationship_objects),
            'relationships_per_image': np.asarray(relationships_per_image),
            'attributes_per_object': np.asarray(attributes_per_object),
            'object_attributes': np.asarray(object_attributes),
        }
        for k, v in numpy_arrays[split].items():
            if v.dtype == np.int64:
                numpy_arrays[split][k] = v.astype(np.int32)
    return numpy_arrays
args = parser.parse_args()
numpy_arrays = encode_graphs(args, splits, objects, relationships, vocab,
                             object_id_to_obj, attributes)
## Inspect the consolidated arrays for the train split
for key, value in numpy_arrays['train'].items():
    ## Print each value's type and length
    print(key, type(value), len(value))
    ## Print the first element of each value
    print(value[0])
Output:
Skip stats for split "train"
too_few_relationships 16402
too_few_objects 6794
too_many_objects 187
too_many_relationships 180
Skip stats for split "test"
too_few_relationships 4803
too_few_objects 837
too_many_objects 26
Skip stats for split "val"
too_few_objects 853
too_few_relationships 4815
too_many_objects 27
too_many_relationships 4
image_ids <class 'numpy.ndarray'> 62565
1
object_ids <class 'numpy.ndarray'> 62565
[1058549 1058534 1058508 1058539 1058543 1058545 1058498 3798579 3798576
3798577 1058507 1058515 5060 1058530 5049 1058531 1058511 1058528
1058547 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1]
object_names <class 'numpy.ndarray'> 62565
[ 2 52 7 60 5 2 95 1 3 3 9 19 134 44 19 32 4 32
4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
object_boxes <class 'numpy.ndarray'> 62565
[[ 0 0 799 557]
[ 78 308 722 290]
[ 1 0 222 538]
[439 283 359 258]
[ 0 1 135 535]
[178 0 476 360]
[422 63 77 363]
[602 1 198 147]
[367 264 82 248]
[238 254 57 259]
[123 13 78 179]
[719 342 80 164]
[716 345 70 164]
[367 296 82 98]
[478 319 78 95]
[388 369 48 128]
[241 287 54 103]
[245 384 44 118]
[368 295 82 102]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]]
objects_per_image <class 'numpy.ndarray'> 62565
19
relationship_ids <class 'numpy.ndarray'> 62565
[ 15930 15933 15934 15946 3186267 3186270 4265927 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1]
relationship_subjects <class 'numpy.ndarray'> 62565
[10 1 11 11 5 0 2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
relationship_predicates <class 'numpy.ndarray'> 62565
[ 1 2 3 4 2 5 6 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
relationship_objects <class 'numpy.ndarray'> 62565
[ 2 3 12 3 3 1 7 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
relationships_per_image <class 'numpy.ndarray'> 62565
7
attributes_per_object <class 'numpy.ndarray'> 62565
[ 0 1 2 0 1 0 2 0 0 0 1 2 0 2 0 2 2 1 1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
object_attributes <class 'numpy.ndarray'> 606319
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
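To make the encoding concrete, here is a small sketch, using the arrays and vocab built above, that decodes the first training example back into readable (subject, predicate, object) triples. The entries of relationship_subjects and relationship_objects index into that image's object slots, whose vocab ids sit in object_names:
arrs = numpy_arrays['train']
idx_to_name = vocab['object_idx_to_name']
idx_to_pred = vocab['pred_idx_to_name']
n_rel = arrs['relationships_per_image'][0]
for s, p, o in zip(arrs['relationship_subjects'][0][:n_rel],
                   arrs['relationship_predicates'][0][:n_rel],
                   arrs['relationship_objects'][0][:n_rel]):
    subj = idx_to_name[arrs['object_names'][0][s]]
    obj = idx_to_name[arrs['object_names'][0][o]]
    print(subj, idx_to_pred[p], obj)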
After this dizzying, at times baffling consolidation, we can finally write the results out to files. (One detail worth noticing in the output above: object_attributes has length 606,319 rather than 62,565, because it is collected per object, not per image.)
Python code:
import h5py

def get_image_paths(image_id_to_image, image_ids):
    paths = []
    for image_id in image_ids:
        image = image_id_to_image[image_id]
        base, filename = os.path.split(image['url'])
        path = os.path.join(os.path.basename(base), filename)
        paths.append(path)
    return paths

output_h5_dir = './'
output_vocab_json = 'vocab.json'
print('Writing HDF5 output files')
for split_name, split_arrays in numpy_arrays.items():
    image_ids = list(split_arrays['image_ids'].astype(int))
    h5_path = os.path.join(output_h5_dir, '%s.h5' % split_name)
    print('Writing file "%s"' % h5_path)
    with h5py.File(h5_path, 'w') as h5_file:
        for name, ary in split_arrays.items():
            print('Creating datset: ', name, ary.shape, ary.dtype)
            h5_file.create_dataset(name, data=ary)
        print('Writing image paths')
        image_paths = get_image_paths(image_id_to_image, image_ids)
        path_dtype = h5py.special_dtype(vlen=str)
        path_shape = (len(image_paths),)
        path_dset = h5_file.create_dataset('image_paths', path_shape,
                                           dtype=path_dtype)
        for i, p in enumerate(image_paths):
            path_dset[i] = p
    print()
print('Writing vocab to "%s"' % output_vocab_json)
with open(output_vocab_json, 'w') as f:
    json.dump(vocab, f)
Output:
Writing HDF5 output files
Writing file "./train.h5"
Creating datset: image_ids (62565,) int32
Creating datset: object_ids (62565, 30) int32
Creating datset: object_names (62565, 30) int32
Creating datset: object_boxes (62565, 30, 4) int32
Creating datset: objects_per_image (62565,) int32
Creating datset: relationship_ids (62565, 30) int32
Creating datset: relationship_subjects (62565, 30) int32
Creating datset: relationship_predicates (62565, 30) int32
Creating datset: relationship_objects (62565, 30) int32
Creating datset: relationships_per_image (62565,) int32
Creating datset: attributes_per_object (62565, 30) int32
Creating datset: object_attributes (606319, 30) int32
Writing image paths
Writing file "./test.h5"
Creating datset: image_ids (5096,) int32
Creating datset: object_ids (5096, 30) int32
Creating datset: object_names (5096, 30) int32
Creating datset: object_boxes (5096, 30, 4) int32
Creating datset: objects_per_image (5096,) int32
Creating datset: relationship_ids (5096, 30) int32
Creating datset: relationship_subjects (5096, 30) int32
Creating datset: relationship_predicates (5096, 30) int32
Creating datset: relationship_objects (5096, 30) int32
Creating datset: relationships_per_image (5096,) int32
Creating datset: attributes_per_object (5096, 30) int32
Creating datset: object_attributes (51626, 30) int32
Writing image paths
Writing file "./val.h5"
Creating datset: image_ids (5062,) int32
Creating datset: object_ids (5062, 30) int32
Creating datset: object_names (5062, 30) int32
Creating datset: object_boxes (5062, 30, 4) int32
Creating datset: objects_per_image (5062,) int32
Creating datset: relationship_ids (5062, 30) int32
Creating datset: relationship_subjects (5062, 30) int32
Creating datset: relationship_predicates (5062, 30) int32
Creating datset: relationship_objects (5062, 30) int32
Creating datset: relationships_per_image (5062,) int32
Creating datset: attributes_per_object (5062, 30) int32
Creating datset: object_attributes (51090, 30) int32
Writing image paths
Writing vocab to "vocab.json"
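Finally, a minimal sketch to check that the files can be read back, assuming train.h5 and vocab.json sit in the current directory:
import json
import h5py

with open('vocab.json', 'r') as f:
    vocab = json.load(f)
with h5py.File('train.h5', 'r') as h5_file:
    ## List every dataset with its shape and dtype
    for name in h5_file.keys():
        print(name, h5_file[name].shape, h5_file[name].dtype)
    ## The variable-length strings come back as bytes under h5py 3.x
    print(h5_file['image_paths'][0])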