Most of the code in this post comes from Image Generation from Scene Graphs (sg2im); it is best read alongside the original source code.
Bash script for downloading the Visual Genome dataset:
VG_DIR=datasets/vg
mkdir -p $VG_DIR
wget https://visualgenome.org/static/data/dataset/objects.json.zip -O $VG_DIR/objects.json.zip
wget https://visualgenome.org/static/data/dataset/attributes.json.zip -O $VG_DIR/attributes.json.zip
wget https://visualgenome.org/static/data/dataset/relationships.json.zip -O $VG_DIR/relationships.json.zip
wget https://visualgenome.org/static/data/dataset/object_alias.txt -O $VG_DIR/object_alias.txt
wget https://visualgenome.org/static/data/dataset/relationship_alias.txt -O $VG_DIR/relationship_alias.txt
wget https://visualgenome.org/static/data/dataset/image_data.json.zip -O $VG_DIR/image_data.json.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip -O $VG_DIR/images.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip -O $VG_DIR/images2.zip
unzip $VG_DIR/objects.json.zip -d $VG_DIR
unzip $VG_DIR/attributes.json.zip -d $VG_DIR
unzip $VG_DIR/relationships.json.zip -d $VG_DIR
unzip $VG_DIR/image_data.json.zip -d $VG_DIR
unzip $VG_DIR/images.zip -d $VG_DIR/images
unzip $VG_DIR/images2.zip -d $VG_DIR/images
Save the script as download_vg.sh and run it:
bash download_vg.sh
The image_data.json file contains metadata for every image in the VG dataset: for each of the 108,077 images it records, among other things, the width, height, url, and image id.
Reading it in Python:
import json

## The default path is 'image_data.json'; adjust to your setup
with open('image_data.json', 'r') as f:
    ## images is a list of length 108,077
    images = json.load(f)
## Build a dict mapping each image_id to its image metadata
image_id_to_image = {i['image_id']: i for i in images}
## Print one example to inspect the image metadata
for each in images:
    print('Information of a single image: \n', each)
    break
Output:
Information of a single image:
{'width': 800, 'url': 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg', 'height': 600, 'image_id': 1, 'coco_id': None, 'flickr_id': None}
Python code to randomly split the images into train, val, and test sets:
import json, random

## 108,077 images in total: train (80%) 86,463, validation (10%) 10,807, test (10%) 10,807
nums, train_num, val_num = 108077, 86463, 10807
## Put all image ids into a set
id_store = set(range(1, nums + 1))
## Draw the train ids (random.sample needs a sequence, so sort the set first)
train_ids = random.sample(sorted(id_store), train_num)
## Work out the remaining ids
id_remain = id_store.difference(train_ids)
## Draw the validation ids from the remainder
val_ids = random.sample(sorted(id_remain), val_num)
## The leftover ids are used for test
id_remain = id_remain.difference(val_ids)
test_ids = list(id_remain)
## Gather the three id lists in a dict
split_dict = {"train": train_ids, "val": val_ids, "test": test_ids}
## Serialize the dict to a JSON string (json.dump(split_dict, f) would also work)
split_str = json.dumps(split_dict)
## Save as a json file; the default path is './data/vg_splits.json', adjust as needed
with open('data/vg_splits.json', 'w') as f:
    f.write(split_str)
Check that the splits were stored correctly as a json file:
splits_json = './data/vg_splits.json'
with open(splits_json, 'r') as f:
    splits = json.load(f)
for split_name, split_list in splits.items():
    print(split_name, type(split_list), len(split_list))
Output:
train <class 'list'> 86463
val <class 'list'> 10807
test <class 'list'> 10807
Python code to drop images that are too small:
def remove_small_images(min_image_size, image_id_to_image, splits):
    new_splits = {}
    for split_name, image_ids in splits.items():
        new_image_ids = []
        num_skipped = 0
        for image_id in image_ids:
            image = image_id_to_image[image_id]
            height, width = image['height'], image['width']
            if min(height, width) < min_image_size:
                num_skipped += 1
                continue
            new_image_ids.append(image_id)
        new_splits[split_name] = new_image_ids
        print('Removed %d images from split "%s" for being too small' %
              (num_skipped, split_name))
    return new_splits
## Minimum image size; adjust as needed
min_image_size = 200
## Load the json file holding the split image ids
splits_json = './data/vg_splits.json'
with open(splits_json, 'r') as f:
    splits = json.load(f)
## Drop the ids of images that are too small
splits = remove_small_images(min_image_size, image_id_to_image, splits)
Output:
## Since the split is random, your numbers may differ
Removed 335 images from split "train" for being too small
Removed 46 images from split "val" for being too small
Removed 45 images from split "test" for being too small
The same object category can go by different names (a singular and a plural form, say, still denote the same kind of object), and the same holds for relationship descriptions: in and inside of express the same relation. The alias files map such variants to a canonical name.
Python code for loading the alias files:
def load_aliases(alias_path):
    aliases = {}
    with open(alias_path, 'r') as f:
        for line in f:
            ## strip() removes whitespace at both ends of each token;
            ## the first entry on each line is the canonical name
            line = [s.strip() for s in line.split(',')]
            for s in line:
                aliases[s] = line[0]
    return aliases

## Default paths are 'object_alias.txt' and 'relationship_alias.txt'; adjust as needed
obj_aliases = load_aliases('object_alias.txt')
rel_aliases = load_aliases('relationship_alias.txt')
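As a quick sanity check, you can peek at a few alias entries; each variant name maps to its canonical form (the exact entries depend on the contents of the alias files, so this is only illustrative):
## Show a handful of alias mappings
for i, (name, canonical) in enumerate(obj_aliases.items()):
    print('%s -> %s' % (name, canonical))
    if i >= 4:
        break
## Later, aliases.get(name, name) falls back to the name itself when no alias exists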
Read objects.json and build the dictionary object_name_to_idx, which maps each object name to an index. It pays to filter out objects that occur only rarely beforehand; this tends to make the trained model perform better.
Python code:
from collections import Counter, defaultdict

def create_object_vocab(min_object_instances, image_ids, objects, aliases, vocab):
    image_ids = set(image_ids)
    print('Making object vocab from %d training images' % len(image_ids))
    ## Tally object occurrences with a Counter, which makes filtering easy
    object_name_counter = Counter()
    for image in objects:
        if image['image_id'] not in image_ids:
            continue
        for obj in image['objects']:
            names = set()
            ## Note that an object's names are stored in a list; in every example
            ## I inspected, though, the list held only a single name.
            for name in obj['names']:
                names.add(aliases.get(name, name))
            object_name_counter.update(names)
    object_names = ['__image__']
    for name, count in object_name_counter.most_common():
        ## Keep a name only if the object occurs at least min_object_instances times
        if count >= min_object_instances:
            object_names.append(name)
    print('Found %d object categories with >= %d training instances' %
          (len(object_names), min_object_instances))
    ## Curiously, name-to-index is stored as a dict while index-to-name is a list;
    ## my best guess is that the author wanted to avoid int keys in the dict.
    object_name_to_idx = {}
    object_idx_to_name = []
    for idx, name in enumerate(object_names):
        object_name_to_idx[name] = idx
        object_idx_to_name.append(name)
    vocab['object_name_to_idx'] = object_name_to_idx
    vocab['object_idx_to_name'] = object_idx_to_name
## The default path for objects.json is './objects.json'; adjust as needed
with open('objects.json', 'r') as f:
    objects = json.load(f)
print('type and length of objects json', type(objects), len(objects))
## Print one example to inspect
print(objects[0])
min_object_instances = 2000
vocab = {}
## splits and obj_aliases were both built in the steps above
train_ids = splits['train']
create_object_vocab(min_object_instances, train_ids, objects, obj_aliases, vocab)
Output:
type and length of objects json <class 'list'> 108077
## Frankly I was surprised that a single image lists this many objects
{
'image_id': 1, 'objects': [{
'synsets': ['tree.n.01'], 'h': 557, 'object_id': 1058549, 'merged_object_ids': [], 'names': ['trees'], 'w': 799, 'y': 0, 'x': 0}, {
'synsets': ['sidewalk.n.01'], 'h': 290, 'object_id': 1058534, 'merged_object_ids': [5046], 'names': ['sidewalk'], 'w': 722, 'y': 308, 'x': 78}, {
'synsets': ['building.n.01'], 'h': 538, 'object_id': 1058508, 'merged_object_ids': [], 'names': ['building'], 'w': 222, 'y': 0, 'x': 1}, {
'synsets': ['street.n.01'], 'h': 258, 'object_id': 1058539, 'merged_object_ids': [3798578], 'names': ['street'], 'w': 359, 'y': 283, 'x': 439}, {
'synsets': ['wall.n.01'], 'h': 535, 'object_id': 1058543, 'merged_object_ids': [], 'names': ['wall'], 'w': 135, 'y': 1, 'x': 0}, {
'synsets': ['tree.n.01'], 'h': 360, 'object_id': 1058545, 'merged_object_ids': [], 'names': ['tree'], 'w': 476, 'y': 0, 'x': 178}, {
'synsets': ['shade.n.01'], 'h': 189, 'object_id': 5045, 'merged_object_ids': [], 'names': ['shade'], 'w': 274, 'y': 344, 'x': 116}, {
'synsets': ['van.n.05'], 'h': 176, 'object_id': 1058542, 'merged_object_ids': [1058536], 'names': ['van'], 'w': 241, 'y': 278, 'x': 533}, {
'synsets': ['trunk.n.01'], 'h': 348, 'object_id': 5055, 'merged_object_ids': [], 'names': ['tree trunk'], 'w': 78, 'y': 213, 'x': 623}, {
'synsets': ['clock.n.01'], 'h': 363, 'object_id': 1058498, 'merged_object_ids': [], 'names': ['clock'], 'w': 77, 'y': 63, 'x': 422}, {
'synsets': ['window.n.01'], 'h': 147, 'object_id': 3798579, 'merged_object_ids': [], 'names': ['windows'], 'w': 198, 'y': 1, 'x': 602}, {
'synsets': ['man.n.01'], 'h': 248, 'object_id': 3798576, 'merged_object_ids': [1058540], 'names': ['man'], 'w': 82, 'y': 264, 'x': 367}, {
'synsets': ['man.n.01'], 'h': 259, 'object_id': 3798577, 'merged_object_ids': [], 'names': ['man'], 'w': 57, 'y': 254, 'x': 238}, {
'synsets': [], 'h': 430, 'object_id': 1058548, 'merged_object_ids': [], 'names': ['lamp post'], 'w': 43, 'y': 63, 'x': 537}, {
'synsets': ['sign.n.02'], 'h': 179, 'object_id': 1058507, 'merged_object_ids': [], 'names': ['sign'], 'w': 78, 'y': 13, 'x': 123}, {
'synsets': ['car.n.01'], 'h': 164, 'object_id': 1058515, 'merged_object_ids': [], 'names': ['car'], 'w': 80, 'y': 342, 'x': 719}, {
'synsets': ['back.n.01'], 'h': 164, 'object_id': 5060, 'merged_object_ids': [], 'names': ['back'], 'w': 70, 'y': 345, 'x': 716}, {
'synsets': ['jacket.n.01'], 'h': 98, 'object_id': 1058530, 'merged_object_ids': [], 'names': ['jacket'], 'w': 82, 'y': 296, 'x': 367}, {
'synsets': ['car.n.01'], 'h': 95, 'object_id': 5049, 'merged_object_ids': [], 'names': ['car'], 'w': 78, 'y': 319, 'x': 478}, {
'synsets': ['trouser.n.01'], 'h': 128, 'object_id': 1058531, 'merged_object_ids': [], 'names': ['pants'], 'w': 48, 'y': 369, 'x': 388}, {
'synsets': ['shirt.n.01'], 'h': 103, 'object_id': 1058511, 'merged_object_ids': [], 'names': ['shirt'], 'w': 54, 'y': 287, 'x': 241}, {
'synsets': ['parking_meter.n.01'], 'h': 143, 'object_id': 1058519, 'merged_object_ids': [], 'names': ['parking meter'], 'w': 26, 'y': 325, 'x': 577}, {
'synsets': ['trouser.n.01'], 'h': 118, 'object_id': 1058528, 'merged_object_ids': [], 'names': ['pants'], 'w': 44, 'y': 384, 'x': 245}, {
'synsets': ['shirt.n.01'], 'h': 102, 'object_id': 1058547, 'merged_object_ids': [], 'names': ['shirt'], 'w': 82, 'y': 295, 'x': 368}, {
'synsets': ['shoe.n.01'], 'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'names': ['shoes'], 'w': 48, 'y': 485, 'x': 388}, {
'synsets': ['arm.n.01'], 'h': 41, 'object_id': 1058546, 'merged_object_ids': [], 'names': ['arm'], 'w': 30, 'y': 285, 'x': 370}, {
'synsets': ['bicycle.n.01'], 'h': 36, 'object_id': 1058535, 'merged_object_ids': [], 'names': ['bike'], 'w': 27, 'y': 319, 'x': 337}, {
'synsets': ['bicycle.n.01'], 'h': 41, 'object_id': 5051, 'merged_object_ids': [], 'names': ['bike'], 'w': 27, 'y': 311, 'x': 321}, {
'synsets': ['headlight.n.01'], 'h': 9, 'object_id': 5050, 'merged_object_ids': [], 'names': ['headlight'], 'w': 18, 'y': 370, 'x': 517}, {
'synsets': ['spectacles.n.01'], 'h': 23, 'object_id': 1058518, 'merged_object_ids': [], 'names': ['glasses'], 'w': 43, 'y': 317, 'x': 448}, {
'synsets': ['chin.n.01'], 'h': 8, 'object_id': 1058541, 'merged_object_ids': [], 'names': ['chin'], 'w': 9, 'y': 288, 'x': 401}], 'image_url': 'https://cs.stanford.edu/people/rak248/VG_100K_2/1.jpg'}
Making object vocab from 86128 training images
Found 179 object categories with >= 2000 training instances
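The resulting vocab can be inspected directly. Index 0 is the special __image__ category; which names come after it depends on your random split, so the snippet below is only illustrative:
print(len(vocab['object_idx_to_name']))          # 179, including '__image__'
print(vocab['object_idx_to_name'][0])            # '__image__'
print(vocab['object_name_to_idx']['__image__'])  # 0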
In the image_data.json step above we built the mapping from image id to image. Now that object_name_to_idx is in place (with rarely occurring objects already filtered out along the way), the next step is to filter objects further by size and build the mapping from object id to object information (the object's name, idx, and box). Watching this implementation run, spare a second of sympathy for your CPU.
Python code:
def filter_objects(min_object_size, objects, aliases, vocab, splits):
    object_id_to_objects = {}
    all_image_ids = set()
    for image_ids in splits.values():
        all_image_ids |= set(image_ids)
    object_name_to_idx = vocab['object_name_to_idx']
    object_id_to_obj = {}
    num_too_small = 0
    for image in objects:
        image_id = image['image_id']
        if image_id not in all_image_ids:
            continue
        for obj in image['objects']:
            object_id = obj['object_id']
            final_name = None
            final_name_idx = None
            for name in obj['names']:
                name = aliases.get(name, name)
                if name in object_name_to_idx:
                    final_name = name
                    final_name_idx = object_name_to_idx[final_name]
                    break
            w, h = obj['w'], obj['h']
            too_small = (w < min_object_size) or (h < min_object_size)
            if too_small:
                num_too_small += 1
            if final_name is not None and not too_small:
                object_id_to_obj[object_id] = {
                    'name': final_name,
                    'name_idx': final_name_idx,
                    'box': [obj['x'], obj['y'], obj['w'], obj['h']],
                }
    print('Skipped %d objects with size < %d' % (num_too_small, min_object_size))
    return object_id_to_obj
min_object_size = 32
object_id_to_obj = filter_objects(min_object_size, objects, obj_aliases, vocab, splits)
Output:
Skipped 997213 objects with size < 32
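Each surviving entry of object_id_to_obj now carries the canonical name, its vocab index, and the [x, y, w, h] box. A minimal lookup sketch; the id 1058549 is the 'trees' object from the sample image above, but whether it survives depends on your random split and the vocab threshold:
## Hypothetical lookup; adjust the id to one that exists in your run
obj = object_id_to_obj.get(1058549)
if obj is not None:
    print(obj['name'], obj['name_idx'], obj['box'])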
Read attributes.json and build the dictionary attribute_name_to_idx, which maps each attribute name to an index. As before, you can filter out attributes that occur only rarely, which tends to help the trained model. The procedure mirrors the handling of objects.json.
Python code:
def create_attribute_vocab(min_attribute_instances, image_ids, attributes, vocab):
    image_ids = set(image_ids)
    print('Making attribute vocab from %d training images' % len(image_ids))
    attribute_name_counter = Counter()
    for image in attributes:
        if image['image_id'] not in image_ids:
            continue
        for attribute in image['attributes']:
            names = set()
            ## try ... except is needed because some objects in an image carry no
            ## 'attributes' key; without it a KeyError would be raised.
            try:
                for name in attribute['attributes']:
                    names.add(name)
                attribute_name_counter.update(names)
            except KeyError:
                pass
    attribute_names = []
    for name, count in attribute_name_counter.most_common():
        if count >= min_attribute_instances:
            attribute_names.append(name)
    print('Found %d attribute categories with >= %d training instances' %
          (len(attribute_names), min_attribute_instances))
    attribute_name_to_idx = {}
    attribute_idx_to_name = []
    for idx, name in enumerate(attribute_names):
        attribute_name_to_idx[name] = idx
        attribute_idx_to_name.append(name)
    vocab['attribute_name_to_idx'] = attribute_name_to_idx
    vocab['attribute_idx_to_name'] = attribute_idx_to_name
## The default path for attributes.json is './attributes.json'; adjust as needed
with open('attributes.json', 'r') as f:
    attributes = json.load(f)
print('type of attributes json', type(attributes), len(attributes))
## Print one example to inspect
print(attributes[0])
min_attribute_instances = 2000
create_attribute_vocab(min_attribute_instances, train_ids, attributes, vocab)
Output:
type of attributes json <class 'list'> 108077
## This example is very, very long...
{
'image_id': 1, 'attributes': [{
'synsets': ['clock.n.01'], 'h': 339, 'object_id': 1058498, 'names': ['clock'], 'w': 79, 'attributes': ['green', 'tall'], 'y': 91, 'x': 421}, {
'synsets': ['street.n.01'], 'h': 262, 'object_id': 5046, 'names': ['street'], 'w': 714, 'attributes': ['sidewalk'], 'y': 328, 'x': 77}, {
'synsets': ['shade.n.01'], 'h': 192, 'object_id': 5045, 'names': ['shade'], 'w': 274, 'y': 338, 'x': 119}, {
'synsets': ['man.n.01'], 'h': 262, 'object_id': 1058529, 'names': ['man'], 'w': 60, 'y': 249, 'x': 238}, {
'synsets': ['gym_shoe.n.01'], 'h': 26, 'object_id': 5048, 'names': ['sneakers'], 'w': 52, 'attributes': ['grey'], 'y': 489, 'x': 243}, {
'synsets': ['headlight.n.01'], 'h': 15, 'object_id': 5050, 'names': ['headlight'], 'w': 23, 'attributes': ['off'], 'y': 366, 'x': 514}, {
'synsets': ['car.n.01'], 'h': 98, 'object_id': 5049, 'names': ['car'], 'w': 74, 'y': 315, 'x': 479}, {
'synsets': ['bicycle.n.01'], 'h': 34, 'object_id': 5051, 'names': ['bike'], 'w': 28, 'attributes': ['parked', 'far away'], 'y': 319, 'x': 318}, {
'synsets': ['bicycle.n.01'], 'h': 35, 'object_id': 1058535, 'names': ['bike'], 'w': 29, 'attributes': ['parked', 'far away', 'chained'], 'y': 319, 'x': 334}, {
'synsets': ['sign.n.02'], 'h': 182, 'object_id': 1058507, 'names': ['sign'], 'w': 88, 'attributes': ['black'], 'y': 13, 'x': 118}, {
'synsets': ['building.n.01'], 'h': 536, 'object_id': 1058508, 'names': ['building'], 'w': 218, 'attributes': ['tall', 'brick', 'made of bricks'], 'y': 2, 'x': 1}, {
'synsets': ['trunk.n.01'], 'h': 327, 'object_id': 5055, 'names': ['tree trunk'], 'w': 87, 'y': 234, 'x': 622}, {
'synsets': ['sidewalk.n.01'], 'h': 266, 'object_id': 1058534, 'names': ['sidewalk'], 'w': 722, 'attributes': ['brick'], 'y': 331, 'x': 77}, {
'synsets': ['shirt.n.01'], 'h': 101, 'object_id': 1058511, 'names': ['shirt'], 'w': 59, 'attributes': ['red', 'orange'], 'y': 289, 'x': 241}, {
'synsets': ['street.n.01'], 'h': 233, 'object_id': 1058539, 'names': ['street'], 'w': 440, 'attributes': ['clean'], 'y': 283, 'x': 358}, {
'synsets': ['car.n.01'], 'h': 174, 'object_id': 1058515, 'names': ['car'], 'w': 91, 'attributes': ['white', 'parked'], 'y': 342, 'x': 708}, {
'synsets': ['back.n.01'], 'h': 170, 'object_id': 5060, 'names': ['back'], 'w': 67, 'y': 339, 'x': 721}, {
'synsets': ['spectacles.n.01'], 'h': 12, 'object_id': 1058518, 'names': ['glasses'], 'w': 20, 'y': 268, 'x': 271}, {
'synsets': ['parking_meter.n.01'], 'h': 143, 'object_id': 1058519, 'names': ['parking meter'], 'w': 32, 'attributes': ['orange'], 'y': 327, 'x': 574}, {
'synsets': ['shoe.n.01'], 'h': 34, 'object_id': 1058525, 'names': ['shoes'], 'w': 46, 'attributes': ['brown'], 'y': 481, 'x': 391}, {
'synsets': ['man.n.01'], 'h': 251, 'object_id': 1058532, 'names': ['man'], 'w': 75, 'y': 264, 'x': 372}, {
'synsets': ['trouser.n.01'], 'h': 118, 'object_id': 1058528, 'names': ['pants'], 'w': 38, 'attributes': ['black'], 'y': 384, 'x': 245}, {
'synsets': ['jacket.n.01'], 'h': 97, 'object_id': 1058530, 'names': ['jacket'], 'w': 89, 'attributes': ['gray', 'grey'], 'y': 296, 'x': 356}, {
'synsets': ['trouser.n.01'], 'h': 128, 'object_id': 1058531, 'names': ['pants'], 'w': 54, 'attributes': ['gray', 'grey'], 'y': 369, 'x': 382}, {
'synsets': [], 'h': 185, 'object_id': 1058536, 'names': ['work truck'], 'w': 265, 'attributes': ['white'], 'y': 271, 'x': 521}, {
'synsets': ['sidewalk.n.01'], 'h': 189, 'object_id': 3798575, 'names': ['sidewalk'], 'w': 50, 'y': 318, 'x': 343}, {
'synsets': ['chin.n.01'], 'h': 9, 'object_id': 1058541, 'names': ['chin'], 'w': 11, 'attributes': ['raised'], 'y': 288, 'x': 399}, {
'synsets': ['guy.n.01'], 'h': 250, 'object_id': 1058540, 'names': ['guy'], 'w': 82, 'y': 264, 'x': 369}, {
'synsets': ['van.n.05'], 'h': 134, 'object_id': 1058542, 'names': ['van'], 'w': 233, 'attributes': ['parked', 'white'], 'y': 298, 'x': 529}, {
'synsets': ['wall.n.01'], 'h': 533, 'object_id': 1058543, 'names': ['wall'], 'w': 134, 'attributes': ['grey'], 'y': 1, 'x': 0}, {
'synsets': ['tree.n.01'], 'h': 360, 'object_id': 1058545, 'names': ['tree'], 'w': 176, 'y': 0, 'x': 249}, {
'synsets': ['bicycle.n.01'], 'h': 35, 'object_id': 1058544, 'names': ['bikes'], 'w': 40, 'y': 319, 'x': 321}, {
'synsets': ['arm.n.01'], 'h': 43, 'object_id': 1058546, 'names': ['arm'], 'w': 32, 'attributes': ['raised'], 'y': 283, 'x': 368}, {
'synsets': ['shirt.n.01'], 'h': 66, 'object_id': 1058547, 'names': ['shirt'], 'w': 37, 'attributes': ['grey'], 'y': 306, 'x': 384}, {
'synsets': ['man.n.01'], 'h': 248, 'object_id': 3798576, 'names': ['man'], 'w': 97, 'y': 264, 'x': 362}, {
'synsets': ['man.n.01'], 'h': 264, 'object_id': 3798577, 'names': ['man'], 'w': 72, 'y': 251, 'x': 230}, {
'synsets': ['road.n.01'], 'h': 218, 'object_id': 3798578, 'names': ['road'], 'w': 340, 'y': 295, 'x': 435}, {
'synsets': [], 'h': 430, 'object_id': 1058548, 'names': ['lamp post'], 'w': 41, 'y': 63, 'x': 537}, {
'synsets': ['tree.n.01'], 'h': 557, 'object_id': 1058549, 'names': ['trees'], 'w': 606, 'attributes': ['sparse'], 'y': 0, 'x': 190}, {
'synsets': ['window.n.01'], 'h': 148, 'object_id': 3798579, 'names': ['windows'], 'w': 173, 'y': 4, 'x': 602}]}
Making attribute vocab from 86128 training images
Found 80 attribute categories with >= 2000 training instances
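As with objects, the attribute vocab is easy to eyeball; since most_common() sorts by frequency, the first entries are the most common attributes:
print(len(vocab['attribute_idx_to_name']))   # 80 for this run
print(vocab['attribute_idx_to_name'][:10])   # the ten most frequent attribute names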
Read relationships.json and build the dictionary pred_name_to_idx, which maps each relationship (predicate) name to an index. Again, you can filter out relationships that occur only rarely, which tends to help the trained model.
Python code:
def create_rel_vocab(min_relationship_instances, image_ids, relationships,
                     object_id_to_obj, rel_aliases, vocab):
    pred_counter = defaultdict(int)
    image_ids_set = set(image_ids)
    for image in relationships:
        image_id = image['image_id']
        if image_id not in image_ids_set:
            continue
        for rel in image['relationships']:
            sid = rel['subject']['object_id']
            oid = rel['object']['object_id']
            found_subject = sid in object_id_to_obj
            found_object = oid in object_id_to_obj
            if not found_subject or not found_object:
                continue
            pred = rel['predicate'].lower().strip()
            pred = rel_aliases.get(pred, pred)
            rel['predicate'] = pred
            pred_counter[pred] += 1
    pred_names = ['__in_image__']
    for pred, count in pred_counter.items():
        if count >= min_relationship_instances:
            pred_names.append(pred)
    print('Found %d relationship types with >= %d training instances'
          % (len(pred_names), min_relationship_instances))
    pred_name_to_idx = {}
    pred_idx_to_name = []
    for idx, name in enumerate(pred_names):
        pred_name_to_idx[name] = idx
        pred_idx_to_name.append(name)
    vocab['pred_name_to_idx'] = pred_name_to_idx
    vocab['pred_idx_to_name'] = pred_idx_to_name
## The default path for relationships.json is './relationships.json'; adjust as needed
with open('relationships.json', 'r') as f:
    relationships = json.load(f)
print('type of relationships json', type(relationships), len(relationships))
## Print one example to inspect
print(relationships[0])
## Note: every argument of the call below was produced in the earlier steps.
min_relationship_instances = 500
create_rel_vocab(min_relationship_instances, train_ids, relationships,
                 object_id_to_obj, rel_aliases, vocab)
Output:
type of relationships json <class 'list'> 108077
## Yet another very, very long example...
{
'relationships': [{
'predicate': 'ON', 'object': {
'h': 290, 'object_id': 1058534, 'merged_object_ids': [5046], 'synsets': ['sidewalk.n.01'], 'w': 722, 'y': 308, 'x': 78, 'names': ['sidewalk']}, 'relationship_id': 15927, 'synsets': ['along.r.01'], 'subject': {
'name': 'shade', 'h': 192, 'synsets': ['shade.n.01'], 'object_id': 5045, 'w': 274, 'y': 338, 'x': 119}}, {
'predicate': 'wears', 'object': {
'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'synsets': ['shoe.n.01'], 'w': 48, 'y': 485, 'x': 388, 'names': ['shoes']}, 'relationship_id': 15928, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'has', 'object': {
'name': 'headlight', 'h': 15, 'synsets': ['headlight.n.01'], 'object_id': 5050, 'w': 23, 'y': 366, 'x': 514}, 'relationship_id': 15929, 'synsets': ['have.v.01'], 'subject': {
'name': 'car', 'h': 98, 'synsets': ['car.n.01'], 'object_id': 5049, 'w': 74, 'y': 315, 'x': 479}}, {
'predicate': 'ON', 'object': {
'name': 'building', 'h': 536, 'synsets': ['building.n.01'], 'object_id': 1058508, 'w': 218, 'y': 2, 'x': 1}, 'relationship_id': 15930, 'synsets': ['along.r.01'], 'subject': {
'name': 'sign', 'h': 182, 'synsets': ['sign.n.02'], 'object_id': 1058507, 'w': 88, 'y': 13, 'x': 118}}, {
'predicate': 'ON', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15931, 'synsets': ['along.r.01'], 'subject': {
'name': 'tree trunk', 'h': 327, 'synsets': ['trunk.n.01'], 'object_id': 5055, 'w': 87, 'y': 234, 'x': 622}}, {
'predicate': 'has', 'object': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 15932, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'next to', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15933, 'synsets': ['next.r.01'], 'subject': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}}, {
'predicate': 'has', 'object': {
'name': 'back', 'h': 170, 'synsets': ['back.n.01'], 'object_id': 5060, 'w': 67, 'y': 339, 'x': 721}, 'relationship_id': 15934, 'synsets': ['have.v.01'], 'subject': {
'name': 'car', 'h': 174, 'synsets': ['car.n.01'], 'object_id': 1058515, 'w': 91, 'y': 342, 'x': 708}}, {
'predicate': 'has', 'object': {
'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 15935, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'ON', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15936, 'synsets': ['along.r.01'], 'subject': {
'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {
'predicate': 'wears', 'object': {
'h': 28, 'object_id': 1058525, 'merged_object_ids': [5048], 'synsets': ['shoe.n.01'], 'w': 48, 'y': 485, 'x': 388, 'names': ['shoes']}, 'relationship_id': 15937, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'has', 'object': {
'name': 'shoes', 'h': 34, 'synsets': ['shoe.n.01'], 'object_id': 1058525, 'w': 46, 'y': 481, 'x': 391}, 'relationship_id': 15938, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'has', 'object': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 15939, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'wears', 'object': {
'name': 'pants', 'h': 118, 'synsets': ['trouser.n.01'], 'object_id': 1058528, 'w': 38, 'y': 384, 'x': 245}, 'relationship_id': 15940, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'has', 'object': {
'name': 'jacket', 'h': 97, 'synsets': ['jacket.n.01'], 'object_id': 1058530, 'w': 89, 'y': 296, 'x': 356}, 'relationship_id': 15941, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'has', 'object': {
'name': 'pants', 'h': 128, 'synsets': ['trouser.n.01'], 'object_id': 1058531, 'w': 54, 'y': 369, 'x': 382}, 'relationship_id': 15942, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'parked on', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15943, 'synsets': ['along.r.01'], 'subject': {
'name': 'bike', 'h': 34, 'synsets': ['bicycle.n.01'], 'object_id': 5051, 'w': 28, 'y': 319, 'x': 318}}, {
'predicate': 'parked on', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 15944, 'synsets': ['along.r.01'], 'subject': {
'name': 'bike', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058535, 'w': 29, 'y': 319, 'x': 334}}, {
'predicate': 'parked on', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15945, 'synsets': ['along.r.01'], 'subject': {
'h': 176, 'object_id': 1058542, 'merged_object_ids': [1058536], 'synsets': ['van.n.05'], 'w': 241, 'y': 278, 'x': 533, 'names': ['van']}}, {
'predicate': 'parked on', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 15946, 'synsets': ['along.r.01'], 'subject': {
'name': 'car', 'h': 174, 'synsets': ['car.n.01'], 'object_id': 1058515, 'w': 91, 'y': 342, 'x': 708}}, {
'predicate': 'ON', 'object': {
'name': 'sidewalk', 'h': 189, 'synsets': ['sidewalk.n.01'], 'object_id': 3798575, 'w': 50, 'y': 318, 'x': 343}, 'relationship_id': 4265923, 'synsets': ['along.r.01'], 'subject': {
'name': 'bike', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058535, 'w': 29, 'y': 319, 'x': 334}}, {
'predicate': 'behind', 'object': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}, 'relationship_id': 3186256, 'synsets': ['behind.r.01'], 'subject': {
'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {
'predicate': 'holding', 'object': {
'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 3186257, 'synsets': ['have.v.01'], 'subject': {
'h': 248, 'object_id': 3798576, 'merged_object_ids': [1058540], 'synsets': ['man.n.01'], 'w': 82, 'y': 264, 'x': 367, 'names': ['man']}}, {
'predicate': 'WEARING', 'object': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 3186258, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'holding', 'object': {
'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 3186259, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'near', 'object': {
'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}, 'relationship_id': 3186260, 'synsets': ['about.r.07'], 'subject': {
'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {
'predicate': 'WEARING', 'object': {
'name': 'shoes', 'h': 34, 'synsets': ['shoe.n.01'], 'object_id': 1058525, 'w': 46, 'y': 481, 'x': 391}, 'relationship_id': 3186261, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'near', 'object': {
'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}, 'relationship_id': 3186262, 'synsets': ['about.r.07'], 'subject': {
'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {
'predicate': 'ON', 'object': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}, 'relationship_id': 3186263, 'synsets': ['along.r.01'], 'subject': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}}, {
'predicate': 'holding', 'object': {
'name': 'chin', 'h': 9, 'synsets': ['chin.n.01'], 'object_id': 1058541, 'w': 11, 'y': 288, 'x': 399}, 'relationship_id': 4265924, 'synsets': ['have.v.01'], 'subject': {
'name': 'man', 'h': 248, 'synsets': ['man.n.01'], 'object_id': 3798576, 'w': 97, 'y': 264, 'x': 362}}, {
'predicate': 'WEARING', 'object': {
'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 4265925, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 264, 'synsets': ['man.n.01'], 'object_id': 3798577, 'w': 72, 'y': 251, 'x': 230}}, {
'predicate': 'along', 'object': {
'h': 258, 'object_id': 1058539, 'merged_object_ids': [3798578], 'synsets': ['street.n.01'], 'w': 359, 'y': 283, 'x': 439, 'names': ['street']}, 'relationship_id': 4265926, 'synsets': ['along.r.01'], 'subject': {
'name': 'lamp post', 'h': 430, 'synsets': [], 'object_id': 1058548, 'w': 41, 'y': 63, 'x': 537}}, {
'predicate': 'IN', 'object': {
'name': 'shirt', 'h': 101, 'synsets': ['shirt.n.01'], 'object_id': 1058511, 'w': 59, 'y': 289, 'x': 241}, 'relationship_id': 3186264, 'synsets': ['in.r.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'WEARING', 'object': {
'name': 'pants', 'h': 118, 'synsets': ['trouser.n.01'], 'object_id': 1058528, 'w': 38, 'y': 384, 'x': 245}, 'relationship_id': 3186265, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'on top of', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 3186266, 'synsets': ['along.r.01'], 'subject': {
'name': 'parking meter', 'h': 143, 'synsets': ['parking_meter.n.01'], 'object_id': 1058519, 'w': 32, 'y': 327, 'x': 574}}, {
'predicate': 'next to', 'object': {
'name': 'street', 'h': 233, 'synsets': ['street.n.01'], 'object_id': 1058539, 'w': 440, 'y': 283, 'x': 358}, 'relationship_id': 3186267, 'synsets': ['next.r.01'], 'subject': {
'name': 'tree', 'h': 360, 'synsets': ['tree.n.01'], 'object_id': 1058545, 'w': 176, 'y': 0, 'x': 249}}, {
'predicate': 'WEARING', 'object': {
'name': 'glasses', 'h': 12, 'synsets': ['spectacles.n.01'], 'object_id': 1058518, 'w': 20, 'y': 268, 'x': 271}, 'relationship_id': 3186268, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}}, {
'predicate': 'behind', 'object': {
'name': 'man', 'h': 262, 'synsets': ['man.n.01'], 'object_id': 1058529, 'w': 60, 'y': 249, 'x': 238}, 'relationship_id': 3186269, 'synsets': ['behind.r.01'], 'subject': {
'name': 'bikes', 'h': 35, 'synsets': ['bicycle.n.01'], 'object_id': 1058544, 'w': 40, 'y': 319, 'x': 321}}, {
'predicate': 'by', 'object': {
'name': 'sidewalk', 'h': 266, 'synsets': ['sidewalk.n.01'], 'object_id': 1058534, 'w': 722, 'y': 331, 'x': 77}, 'relationship_id': 3186270, 'synsets': ['by.r.01'], 'subject': {
'name': 'trees', 'h': 557, 'synsets': ['tree.n.01'], 'object_id': 1058549, 'w': 606, 'y': 0, 'x': 190}}, {
'predicate': 'WEARING', 'object': {
'name': 'jacket', 'h': 97, 'synsets': ['jacket.n.01'], 'object_id': 1058530, 'w': 89, 'y': 296, 'x': 356}, 'relationship_id': 3186271, 'synsets': ['wear.v.01'], 'subject': {
'name': 'man', 'h': 251, 'synsets': ['man.n.01'], 'object_id': 1058532, 'w': 75, 'y': 264, 'x': 372}}, {
'predicate': 'with', 'object': {
'name': 'windows', 'h': 148, 'synsets': ['window.n.01'], 'object_id': 3798579, 'w': 173, 'y': 4, 'x': 602}, 'relationship_id': 4265927, 'synsets': [], 'subject': {
'name': 'building', 'h': 536, 'synsets': ['building.n.01'], 'object_id': 1058508, 'w': 218, 'y': 2, 'x': 1}}], 'image_id': 1}
Found 46 relationship types with >= 500 training instances
So far we have built the mapping from image id to image (image_id_to_image), from object name to index (object_name_to_idx), from attribute name to index (attribute_name_to_idx), and from relationship name to index (pred_name_to_idx), and, while pitying the CPU, also the mapping from object id to object information ('name', 'name_idx', 'box') (object_id_to_obj). Along the way we applied all sorts of filters so that only data meeting our criteria got through. Now the exciting moment arrives (why is a Kennen quote popping into my head?): we combine everything we have collected and encode each image's information into a standard form. This time, spare your CPU two seconds of sympathy.
Python code:
import argparse, json, os
from collections import Counter, defaultdict
import numpy as np
parser = argparse.ArgumentParser()
parser.add_argument('--min_objects_per_image', default=3, type=int)
parser.add_argument('--max_objects_per_image', default=30, type=int)
parser.add_argument('--max_attributes_per_image', default=30, type=int)
parser.add_argument('--min_relationships_per_image', default=1, type=int)
parser.add_argument('--max_relationships_per_image', default=30, type=int)
def encode_graphs(args, splits, objects, relationships, vocab,
                  object_id_to_obj, attributes):
    image_id_to_objects = {}
    for image in objects:
        image_id = image['image_id']
        image_id_to_objects[image_id] = image['objects']
    image_id_to_relationships = {}
    for image in relationships:
        image_id = image['image_id']
        image_id_to_relationships[image_id] = image['relationships']
    image_id_to_attributes = {}
    for image in attributes:
        image_id = image['image_id']
        image_id_to_attributes[image_id] = image['attributes']

    numpy_arrays = {}
    for split, image_ids in splits.items():
        skip_stats = defaultdict(int)
        # We need to filter *again* based on number of objects and relationships
        final_image_ids = []
        object_ids = []
        object_names = []
        object_boxes = []
        objects_per_image = []
        relationship_ids = []
        relationship_subjects = []
        relationship_predicates = []
        relationship_objects = []
        relationships_per_image = []
        attribute_ids = []
        attributes_per_object = []
        object_attributes = []
        for image_id in image_ids:
            image_object_ids = []
            image_object_names = []
            image_object_boxes = []
            object_id_to_idx = {}
            for obj in image_id_to_objects[image_id]:
                object_id = obj['object_id']
                if object_id not in object_id_to_obj:
                    continue
                obj = object_id_to_obj[object_id]
                object_id_to_idx[object_id] = len(image_object_ids)
                image_object_ids.append(object_id)
                image_object_names.append(obj['name_idx'])
                image_object_boxes.append(obj['box'])
            num_objects = len(image_object_ids)
            too_few = num_objects < args.min_objects_per_image
            too_many = num_objects > args.max_objects_per_image
            if too_few:
                skip_stats['too_few_objects'] += 1
                continue
            if too_many:
                skip_stats['too_many_objects'] += 1
                continue
            image_rel_ids = []
            image_rel_subs = []
            image_rel_preds = []
            image_rel_objs = []
            for rel in image_id_to_relationships[image_id]:
                relationship_id = rel['relationship_id']
                pred = rel['predicate']
                pred_idx = vocab['pred_name_to_idx'].get(pred, None)
                if pred_idx is None:
                    continue
                sid = rel['subject']['object_id']
                sidx = object_id_to_idx.get(sid, None)
                oid = rel['object']['object_id']
                oidx = object_id_to_idx.get(oid, None)
                if sidx is None or oidx is None:
                    continue
                image_rel_ids.append(relationship_id)
                image_rel_subs.append(sidx)
                image_rel_preds.append(pred_idx)
                image_rel_objs.append(oidx)
            num_relationships = len(image_rel_ids)
            too_few = num_relationships < args.min_relationships_per_image
            too_many = num_relationships > args.max_relationships_per_image
            if too_few:
                skip_stats['too_few_relationships'] += 1
                continue
            if too_many:
                skip_stats['too_many_relationships'] += 1
                continue
            obj_id_to_attributes = {}
            num_attributes = []
            for obj_attribute in image_id_to_attributes[image_id]:
                obj_id_to_attributes[obj_attribute['object_id']] = obj_attribute.get('attributes', None)
            for object_id in image_object_ids:
                attributes = obj_id_to_attributes.get(object_id, None)
                if attributes is None:
                    object_attributes.append([-1] * args.max_attributes_per_image)
                    num_attributes.append(0)
                else:
                    attribute_ids = []
                    for attribute in attributes:
                        if attribute in vocab['attribute_name_to_idx']:
                            attribute_ids.append(vocab['attribute_name_to_idx'][attribute])
                        if len(attribute_ids) >= args.max_attributes_per_image:
                            break
                    num_attributes.append(len(attribute_ids))
                    pad_len = args.max_attributes_per_image - len(attribute_ids)
                    attribute_ids = attribute_ids + [-1] * pad_len
                    object_attributes.append(attribute_ids)
            # Pad object info out to max_objects_per_image
            while len(image_object_ids) < args.max_objects_per_image:
                image_object_ids.append(-1)
                image_object_names.append(-1)
                image_object_boxes.append([-1, -1, -1, -1])
                num_attributes.append(-1)
            # Pad relationship info out to max_relationships_per_image
            while len(image_rel_ids) < args.max_relationships_per_image:
                image_rel_ids.append(-1)
                image_rel_subs.append(-1)
                image_rel_preds.append(-1)
                image_rel_objs.append(-1)
            final_image_ids.append(image_id)
            object_ids.append(image_object_ids)
            object_names.append(image_object_names)
            object_boxes.append(image_object_boxes)
            objects_per_image.append(num_objects)
            relationship_ids.append(image_rel_ids)
            relationship_subjects.append(image_rel_subs)
            relationship_predicates.append(image_rel_preds)
            relationship_objects.append(image_rel_objs)
            relationships_per_image.append(num_relationships)
            attributes_per_object.append(num_attributes)
        print('Skip stats for split "%s"' % split)
        for stat, count in skip_stats.items():
            print(stat, count)
        print()
        numpy_arrays[split] = {
            'image_ids': np.asarray(final_image_ids),
            'object_ids': np.asarray(object_ids),
            'object_names': np.asarray(object_names),
            'object_boxes': np.asarray(object_boxes),
            'objects_per_image': np.asarray(objects_per_image),
            'relationship_ids': np.asarray(relationship_ids),
            'relationship_subjects': np.asarray(relationship_subjects),
            'relationship_predicates': np.asarray(relationship_predicates),
            'relationship_objects': np.asarray(relationship_objects),
            'relationships_per_image': np.asarray(relationships_per_image),
            'attributes_per_object': np.asarray(attributes_per_object),
            'object_attributes': np.asarray(object_attributes),
        }
        for k, v in numpy_arrays[split].items():
            if v.dtype == np.int64:
                numpy_arrays[split][k] = v.astype(np.int32)
    return numpy_arrays
args = parser.parse_args()
numpy_arrays = encode_graphs(args, splits, objects, relationships, vocab,
                             object_id_to_obj, attributes)
## Inspect the consolidated arrays for the train split
for key, value in numpy_arrays['train'].items():
    ## Print each value's type and length
    print(key, type(value), len(value))
    ## Print the first element of each value
    print(value[0])
Output:
Skip stats for split "train"
too_few_relationships 16402
too_few_objects 6794
too_many_objects 187
too_many_relationships 180
Skip stats for split "test"
too_few_relationships 4803
too_few_objects 837
too_many_objects 26
Skip stats for split "val"
too_few_objects 853
too_few_relationships 4815
too_many_objects 27
too_many_relationships 4
image_ids <class 'numpy.ndarray'> 62565
1
object_ids <class 'numpy.ndarray'> 62565
[1058549 1058534 1058508 1058539 1058543 1058545 1058498 3798579 3798576
3798577 1058507 1058515 5060 1058530 5049 1058531 1058511 1058528
1058547 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1]
object_names <class 'numpy.ndarray'> 62565
[ 2 52 7 60 5 2 95 1 3 3 9 19 134 44 19 32 4 32
4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
object_boxes <class 'numpy.ndarray'> 62565
[[ 0 0 799 557]
[ 78 308 722 290]
[ 1 0 222 538]
[439 283 359 258]
[ 0 1 135 535]
[178 0 476 360]
[422 63 77 363]
[602 1 198 147]
[367 264 82 248]
[238 254 57 259]
[123 13 78 179]
[719 342 80 164]
[716 345 70 164]
[367 296 82 98]
[478 319 78 95]
[388 369 48 128]
[241 287 54 103]
[245 384 44 118]
[368 295 82 102]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]
[ -1 -1 -1 -1]]
objects_per_image <class 'numpy.ndarray'> 62565
19
relationship_ids <class 'numpy.ndarray'> 62565
[ 15930 15933 15934 15946 3186267 3186270 4265927 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1]
relationship_subjects <class 'numpy.ndarray'> 62565
[10 1 11 11 5 0 2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
relationship_predicates <class 'numpy.ndarray'> 62565
[ 1 2 3 4 2 5 6 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
relationship_objects <class 'numpy.ndarray'> 62565
[ 2 3 12 3 3 1 7 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
relationships_per_image <class 'numpy.ndarray'> 62565
7
attributes_per_object <class 'numpy.ndarray'> 62565
[ 0 1 2 0 1 0 2 0 0 0 1 2 0 2 0 2 2 1 1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
object_attributes <class 'numpy.ndarray'> 606319
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1]
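To make the encoding concrete, here is a small sketch, using the arrays and vocab built above, that decodes the first training example back into readable (subject, predicate, object) triples. The entries of relationship_subjects and relationship_objects index into that image's object slots, whose vocab ids sit in object_names:
arrs = numpy_arrays['train']
idx_to_name = vocab['object_idx_to_name']
idx_to_pred = vocab['pred_idx_to_name']
n_rel = arrs['relationships_per_image'][0]
for s, p, o in zip(arrs['relationship_subjects'][0][:n_rel],
                   arrs['relationship_predicates'][0][:n_rel],
                   arrs['relationship_objects'][0][:n_rel]):
    subj = idx_to_name[arrs['object_names'][0][s]]
    obj = idx_to_name[arrs['object_names'][0][o]]
    print(subj, idx_to_pred[p], obj)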
After this dizzying, at times baffling consolidation, we can finally write the results out to files. (One detail worth noticing in the output above: object_attributes has length 606,319 rather than 62,565, because it is collected per object, not per image.)
Python code:
import h5py

def get_image_paths(image_id_to_image, image_ids):
    paths = []
    for image_id in image_ids:
        image = image_id_to_image[image_id]
        base, filename = os.path.split(image['url'])
        path = os.path.join(os.path.basename(base), filename)
        paths.append(path)
    return paths

output_h5_dir = './'
output_vocab_json = 'vocab.json'
print('Writing HDF5 output files')
for split_name, split_arrays in numpy_arrays.items():
    image_ids = list(split_arrays['image_ids'].astype(int))
    h5_path = os.path.join(output_h5_dir, '%s.h5' % split_name)
    print('Writing file "%s"' % h5_path)
    with h5py.File(h5_path, 'w') as h5_file:
        for name, ary in split_arrays.items():
            print('Creating datset: ', name, ary.shape, ary.dtype)
            h5_file.create_dataset(name, data=ary)
        print('Writing image paths')
        image_paths = get_image_paths(image_id_to_image, image_ids)
        path_dtype = h5py.special_dtype(vlen=str)
        path_shape = (len(image_paths),)
        path_dset = h5_file.create_dataset('image_paths', path_shape,
                                           dtype=path_dtype)
        for i, p in enumerate(image_paths):
            path_dset[i] = p
    print()
print('Writing vocab to "%s"' % output_vocab_json)
with open(output_vocab_json, 'w') as f:
    json.dump(vocab, f)
Output:
Writing HDF5 output files
Writing file "./train.h5"
Creating datset: image_ids (62565,) int32
Creating datset: object_ids (62565, 30) int32
Creating datset: object_names (62565, 30) int32
Creating datset: object_boxes (62565, 30, 4) int32
Creating datset: objects_per_image (62565,) int32
Creating datset: relationship_ids (62565, 30) int32
Creating datset: relationship_subjects (62565, 30) int32
Creating datset: relationship_predicates (62565, 30) int32
Creating datset: relationship_objects (62565, 30) int32
Creating datset: relationships_per_image (62565,) int32
Creating datset: attributes_per_object (62565, 30) int32
Creating datset: object_attributes (606319, 30) int32
Writing image paths
Writing file "./test.h5"
Creating datset: image_ids (5096,) int32
Creating datset: object_ids (5096, 30) int32
Creating datset: object_names (5096, 30) int32
Creating datset: object_boxes (5096, 30, 4) int32
Creating datset: objects_per_image (5096,) int32
Creating datset: relationship_ids (5096, 30) int32
Creating datset: relationship_subjects (5096, 30) int32
Creating datset: relationship_predicates (5096, 30) int32
Creating datset: relationship_objects (5096, 30) int32
Creating datset: relationships_per_image (5096,) int32
Creating datset: attributes_per_object (5096, 30) int32
Creating datset: object_attributes (51626, 30) int32
Writing image paths
Writing file "./val.h5"
Creating datset: image_ids (5062,) int32
Creating datset: object_ids (5062, 30) int32
Creating datset: object_names (5062, 30) int32
Creating datset: object_boxes (5062, 30, 4) int32
Creating datset: objects_per_image (5062,) int32
Creating datset: relationship_ids (5062, 30) int32
Creating datset: relationship_subjects (5062, 30) int32
Creating datset: relationship_predicates (5062, 30) int32
Creating datset: relationship_objects (5062, 30) int32
Creating datset: relationships_per_image (5062,) int32
Creating datset: attributes_per_object (5062, 30) int32
Creating datset: object_attributes (51090, 30) int32
Writing image paths
Writing vocab to "vocab.json"
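Finally, a minimal sketch to check that the files can be read back, assuming train.h5 and vocab.json sit in the current directory:
import json
import h5py

with open('vocab.json', 'r') as f:
    vocab = json.load(f)
with h5py.File('train.h5', 'r') as h5_file:
    ## List every dataset with its shape and dtype
    for name in h5_file.keys():
        print(name, h5_file[name].shape, h5_file[name].dtype)
    ## The variable-length strings come back as bytes under h5py 3.x
    print(h5_file['image_paths'][0])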