Visual Genome Readme
File image part1, image part2
All images, in JPG format.
Each file is named IMAGE_ID.jpg.
File image_data.json.zip
Metadata for all images. Format:
Name | Type | Description |
---|---|---|
image_id | int | ID of the image |
url | hyperlink string | URL of the image |
width | int | width of the image, in pixels |
height | int | height of the image, in pixels |
coco_id | int | ID of the image in the COCO dataset |
flickr_id | int | ID of the image in the Flickr dataset |
Example:
[...
{
"image_id": 2412112,
"url": "https://cs.stanford.edu/people/rak248/VG_100K/2370463.jpg",
"width": 500,
"height": 281,
"coco_id": 547168,
"flickr_id": 8505158818
}
...]
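As a minimal sketch of how these metadata records might be used, the snippet below indexes them by `image_id` and derives an aspect ratio. The helper name `index_by_image_id` is hypothetical, and the inline sample stands in for the parsed contents of the unzipped file (assumed here to be a JSON list of records as shown above).

```python
import json

# Inline sample mirroring the record shown above. In practice the list would
# come from json.load() on the unzipped metadata file (an assumption).
sample = json.loads("""
[
  {"image_id": 2412112,
   "url": "https://cs.stanford.edu/people/rak248/VG_100K/2370463.jpg",
   "width": 500, "height": 281,
   "coco_id": 547168, "flickr_id": 8505158818}
]
""")

def index_by_image_id(records):
    """Map image_id -> metadata record (hypothetical helper)."""
    return {rec["image_id"]: rec for rec in records}

images = index_by_image_id(sample)
meta = images[2412112]
aspect_ratio = meta["width"] / meta["height"]
print(meta["url"], round(aspect_ratio, 2))
```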
File region_descriptions.json.zip
All region descriptions.
Name | Type | Description |
---|---|---|
image_id | int | ID of the image containing the regions |
regions | object array | array of region descriptions for this image |
----.region_id | int | ID of the region description |
----.x | int | x-coordinate of the region bounding box |
----.y | int | y-coordinate of the region bounding box |
----.width | int | width of the region bounding box |
----.height | int | height of the region bounding box |
----.phrase | str | region description phrase |
----.synsets | object array | synsets in the description |
--------.synset_name | str | synset name |
--------.entity_name | str | string from the phrase |
--------.entity_idx_start | int | index where the synset starts in the phrase |
--------.entity_idx_end | int | index where the synset ends in the phrase |
Example:
```
[...
{
"image_id": 2407890,
"regions": [...
{
"region_id": 1353,
"x": 117,
"y": 79,
"width": 249,
"height": 107,
"phrase": "a cat sitting on a table.",
"synsets": [...
{
"synset_name": "cat.n.01",
"entity_name": "cat",
"entity_idx_start": 2,
"entity_idx_end": 5
},
...]
},
{
"region_id": 1354,
"x": 116,
"y": 29,
"width": 239,
"height": 135,
"phrase": "a white cat with a tan tail and face markings",
"synsets": [...
...]
},
...]
},
{
"image_id": 2407890,
"regions": [...
...]
},
...]
```
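The index fields and bounding box can be illustrated with a short sketch. It assumes that `entity_idx_end` is exclusive (consistent with the `cat` example, where indices 2 and 5 bracket the word) and that `(x, y)` is the top-left corner of the box; neither is stated explicitly here. The helper names are hypothetical.

```python
# One region copied from the example above, with the elisions dropped.
region = {
    "region_id": 1353,
    "x": 117, "y": 79, "width": 249, "height": 107,
    "phrase": "a cat sitting on a table.",
    "synsets": [
        {"synset_name": "cat.n.01", "entity_name": "cat",
         "entity_idx_start": 2, "entity_idx_end": 5},
    ],
}

def entity_span(phrase, synset):
    """Slice the phrase with the synset's indices (end assumed exclusive)."""
    return phrase[synset["entity_idx_start"]:synset["entity_idx_end"]]

def box_corners(region):
    """Return (x1, y1, x2, y2), assuming (x, y) is the top-left corner."""
    x1, y1 = region["x"], region["y"]
    return (x1, y1, x1 + region["width"], y1 + region["height"])

spans = [entity_span(region["phrase"], s) for s in region["synsets"]]
corners = box_corners(region)
print(spans, corners)
```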
File question_answers.json.zip
All questions and answers (QAs).
Name | Type | Description |
---|---|---|
image_id | int | ID of the image |
qas | object array | array of QAs for this image |
----.qa_id | int | ID of the QA |
----.question | str | question |
----.answer | str | answer |
----.question_synsets | object array | synsets in the question |
--------.synset_name | str | synset name |
--------.entity_name | str | string from the question |
--------.entity_idx_start | int | index where the synset starts in the question |
--------.entity_idx_end | int | index where the synset ends in the question |
----.answer_synsets | object array | synsets in the answer |
--------.synset_name | str | synset name |
--------.entity_name | str | string from the answer |
--------.entity_idx_start | int | index where the synset starts in the answer |
--------.entity_idx_end | int | index where the synset ends in the answer |
Example:
[...
{
"image_id": 2317993,
"qas": [...
{
"qa_id": 912402,
"question": "Where are the clouds?",
"answer": "sky",
"question_synsets": [...
{
"synset_name": "cloud.n.01",
"entity_name": "cloud",
"entity_idx_start": 14,
"entity_idx_end": 20
},
...],
"answer_synsets": [...
{
"synset_name": "sky.n.01",
"entity_name": "sky",
"entity_idx_start": 0,
"entity_idx_end": 3
},
...]
},
...]
},
...]
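A possible way to work with this nested layout is to flatten it into one row per QA pair. The sketch below does that on an inline copy of the example; `flatten_qas` is a hypothetical helper, and slicing the question with the synset indices assumes the end index is exclusive.

```python
# Inline copy of the example above, with the elisions dropped.
entries = [
    {"image_id": 2317993,
     "qas": [
         {"qa_id": 912402,
          "question": "Where are the clouds?",
          "answer": "sky",
          "question_synsets": [
              {"synset_name": "cloud.n.01", "entity_name": "cloud",
               "entity_idx_start": 14, "entity_idx_end": 20}],
          "answer_synsets": [
              {"synset_name": "sky.n.01", "entity_name": "sky",
               "entity_idx_start": 0, "entity_idx_end": 3}]},
     ]},
]

def flatten_qas(entries):
    """Yield one (image_id, qa_id, question, answer) tuple per QA pair."""
    rows = []
    for entry in entries:
        for qa in entry["qas"]:
            rows.append((entry["image_id"], qa["qa_id"],
                         qa["question"], qa["answer"]))
    return rows

rows = flatten_qas(entries)

# Text covered by the first question synset (end index assumed exclusive).
qa = entries[0]["qas"][0]
syn = qa["question_synsets"][0]
matched = qa["question"][syn["entity_idx_start"]:syn["entity_idx_end"]]
print(rows, matched)
```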
File objects.json.zip
All object instances.
Name | Type | Description |
---|---|---|
image_id | int | ID of the image |
objects | object array | array of object instances for this image |
----.object_id | int | ID of the object |
----.x | int | x-coordinate of the object bounding box |
----.y | int | y-coordinate of the object bounding box |
----.w | int | width of the object bounding box |
----.h | int | height of the object bounding box |
----.name | str | name of the object |
----.synsets | str array | synset names associated with this object |
Example:
[...
{
"image_id": 2,
"objects": [...
{
"object_id": 1023847,
"x": 405,
"y": 34,
"w": 78,
"h": 438,
"name": "pole",
"synsets": ["pole.n.01"]
},
{
"object_id": 1023836,
"x": 239,
"y": 347,
"w": 136,
"h": 126,
"name": "car",
"synsets": ["car.n.01"]
},
...]
},
...]
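As a small illustration of the object fields, the following sketch computes bounding-box areas from `w` and `h` and picks the largest object in an image. The helper names are hypothetical and the data is an inline copy of the example.

```python
# Inline copy of the example above, with the elisions dropped.
entry = {
    "image_id": 2,
    "objects": [
        {"object_id": 1023847, "x": 405, "y": 34, "w": 78, "h": 438,
         "name": "pole", "synsets": ["pole.n.01"]},
        {"object_id": 1023836, "x": 239, "y": 347, "w": 136, "h": 126,
         "name": "car", "synsets": ["car.n.01"]},
    ],
}

def box_area(obj):
    """Area of the object's bounding box in square pixels."""
    return obj["w"] * obj["h"]

def largest_object(entry):
    """Return the object with the largest bounding box in one image entry."""
    return max(entry["objects"], key=box_area)

largest = largest_object(entry)
print(largest["name"], box_area(largest))
```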
File attributes.json.zip
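All attributes in the dataset.
Name | Type | Description |
---|---|---|
image_id | int | ID of the image |
attributes | object array | array of object instances, with their attributes, for this image |
----.object_id | int | ID of the object |
----.x | int | x-coordinate of the object bounding box |
----.y | int | y-coordinate of the object bounding box |
----.w | int | width of the object bounding box |
----.h | int | height of the object bounding box |
----.name | str | name of the object |
----.synsets | str array | synset names associated with this object |
----.attributes | str array | attributes of this object |
Example: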
[...
{
"image_id": 2,
"attributes": [...
{
"object_id": 1023847,
"x": 405,
"y": 34,
"w": 78,
"h": 438,
"name": "pole",
"synsets": ["pole.n.01"],
"attributes": ["brown"]
},
{
"object_id": 1023836,
"x": 239,
"y": 347,
"w": 136,
"h": 126,
"name": "car",
"synsets": ["car.n.01"],
"attributes": ["red", "broken"]
},
...]
},
...]
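One use of this file is an inverted index from attribute to object names. The sketch below builds one from an inline copy of the example; `attribute_index` is a hypothetical helper, and the use of `.get` reflects an assumption that some objects may carry no inner `attributes` list.

```python
from collections import defaultdict

# Inline copy of the example above, with the elisions dropped.
entry = {
    "image_id": 2,
    "attributes": [
        {"object_id": 1023847, "x": 405, "y": 34, "w": 78, "h": 438,
         "name": "pole", "synsets": ["pole.n.01"], "attributes": ["brown"]},
        {"object_id": 1023836, "x": 239, "y": 347, "w": 136, "h": 126,
         "name": "car", "synsets": ["car.n.01"],
         "attributes": ["red", "broken"]},
    ],
}

def attribute_index(entry):
    """Map each attribute string to the names of objects carrying it."""
    index = defaultdict(list)
    for obj in entry["attributes"]:
        # Assumption: the inner "attributes" key may be absent for some objects.
        for attr in obj.get("attributes", []):
            index[attr].append(obj["name"])
    return dict(index)

index = attribute_index(entry)
print(index)
```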
File relationships.json.zip
All relationships.
Name | Type | Description |
---|---|---|
image_id | int | ID of the image |
relationships | object array | array of relationships in the image |
----.relationship_id | int | ID of the relationship |
----.predicate | str | predicate of the relationship |
----.synsets | str array | synset names associated with the predicate |
----.subject | object | subject of the relationship |
--------.object_id | int | ID of the object |
--------.x | int | x-coordinate of the object bounding box |
--------.y | int | y-coordinate of the object bounding box |
--------.w | int | width of the object bounding box |
--------.h | int | height of the object bounding box |
--------.name | str | name of the object |
--------.synsets | str array | synset names associated with this object |
----.object | object | object of the relationship |
--------.object_id | int | ID of the object |
--------.x | int | x-coordinate of the object bounding box |
--------.y | int | y-coordinate of the object bounding box |
--------.w | int | width of the object bounding box |
--------.h | int | height of the object bounding box |
--------.name | str | name of the object |
--------.synsets | str array | synset names associated with this object |
Example:
[...
{
"image_id": 2,
"relationships": [...
{
"relationship_id": 15947,
"predicate": "wears",
"synsets": ["wear.v.01"],
"subject": {
"object_id": 1023838,
"x": 324,
"y": 320,
"w": 142,
"h": 255,
"name": "man",
"synsets": ["man.n.01"]
},
"object": {
"object_id": 5071,
"x": 359,
"y": 362,
"w": 72,
"h": 81,
"name": "backpack",
"synsets": ["backpack.n.01"]
}
},
...]
},
...]
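Because `subject` and `object` are embedded as full object records here, each relationship can be reduced to a (subject, predicate, object) triple directly. A minimal sketch on an inline copy of the example, with the hypothetical helper `to_triples`:

```python
# Inline copy of the example above, with the elisions dropped.
entry = {
    "image_id": 2,
    "relationships": [
        {"relationship_id": 15947,
         "predicate": "wears",
         "synsets": ["wear.v.01"],
         "subject": {"object_id": 1023838, "x": 324, "y": 320,
                     "w": 142, "h": 255, "name": "man",
                     "synsets": ["man.n.01"]},
         "object": {"object_id": 5071, "x": 359, "y": 362,
                    "w": 72, "h": 81, "name": "backpack",
                    "synsets": ["backpack.n.01"]}},
    ],
}

def to_triples(entry):
    """Reduce each relationship to (subject name, predicate, object name)."""
    return [(rel["subject"]["name"], rel["predicate"], rel["object"]["name"])
            for rel in entry["relationships"]]

triples = to_triples(entry)
print(triples)
```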
File synsets.json.zip
All synsets and their definitions.
Name | Type | Description |
---|---|---|
synset_name | str | unique synset name |
synset_definition | str | definition of the synset according to WordNet |
Example:
[...
{
"synset_name": "phonograph_record.n.01",
"synset_definition": "sound recording consisting of a disk with a continuous groove; used to reproduce music by rotating while a phonograph needle tracks in the groove"
},
{
"synset_name": "truck.n.01",
"synset_definition": "an automotive vehicle suitable for hauling"
}
...]
File region_graphs.json.zip
All region graphs.
Name | Type | Description |
---|---|---|
image_id | int | ID of the image containing the regions |
regions | object array | array of region descriptions for this image |
----.region_id | int | ID of the region description |
----.x | int | x-coordinate of the region bounding box |
----.y | int | y-coordinate of the region bounding box |
----.width | int | width of the region bounding box |
----.height | int | height of the region bounding box |
----.phrase | str | region description phrase |
----.synsets | object array | synsets in the description |
--------.synset_name | str | synset name |
--------.entity_name | str | string from the phrase |
--------.entity_idx_start | int | index where the synset starts in the phrase |
--------.entity_idx_end | int | index where the synset ends in the phrase |
----.objects | object array | array of object instances |
--------.object_id | int | ID of the object |
--------.x | int | x-coordinate of the object bounding box |
--------.y | int | y-coordinate of the object bounding box |
--------.w | int | width of the object bounding box |
--------.h | int | height of the object bounding box |
--------.name | str | name of the object |
--------.synsets | str array | synset names associated with this object |
----.relationships | object array | array of relationships |
--------.relationship_id | int | ID of the relationship |
--------.predicate | str | predicate of the relationship |
--------.synsets | str array | synset names associated with the predicate |
--------.subject_id | int | ID of the subject (found in the objects list) |
--------.object_id | int | ID of the object (found in the objects list) |
Example:
[...
{
"image_id": 2407890,
"regions": [...
{
"region_id": 1353,
"x": 117,
"y": 79,
"width": 249,
"height": 107,
"phrase": "a cat sitting on a table.",
"synsets": [...
{
"synset_name": "cat.n.01",
"entity_name": "cat",
"entity_idx_start": 2,
"entity_idx_end": 5
},
...],
"objects": [...
{
"object_id": 1023838,
"x": 324,
"y": 320,
"w": 142,
"h": 255,
"name": "cat",
"synsets": ["cat.n.01"]
},
{
"object_id": 5071,
"x": 359,
"y": 362,
"w": 72,
"h": 81,
"name": "table",
"synsets": ["table.n.01"]
},
...],
"relationships": [...
{
"relationship_id": 15947,
"predicate": "wears",
"synsets": ["wear.v.01"],
"subject_id": 1023838,
"object_id": 5071
}
...]
},
...]
},
...]
File scene_graphs.json.zip
All scene graphs.
Name | Type | Description |
---|---|---|
image_id | int | ID of the image |
objects | object array | array of object instances for this image |
----.object_id | int | ID of the object |
----.x | int | x-coordinate of the object bounding box |
----.y | int | y-coordinate of the object bounding box |
----.w | int | width of the object bounding box |
----.h | int | height of the object bounding box |
----.name | str | name of the object |
----.synsets | str array | synset names associated with this object |
relationships | object array | array of relationships in the image |
----.relationship_id | int | ID of the relationship |
----.predicate | str | predicate of the relationship |
----.synsets | str array | synset names associated with the predicate |
----.subject_id | int | ID of the subject (found in the objects list) |
----.object_id | int | ID of the object (found in the objects list) |
Example:
[...
{
"image_id": 2407890,
"objects": [...
{
"object_id": 1023838,
"x": 324,
"y": 320,
"w": 142,
"h": 255,
"name": "cat",
"synsets": ["cat.n.01"]
},
{
"object_id": 5071,
"x": 359,
"y": 362,
"w": 72,
"h": 81,
"name": "table",
"synsets": ["table.n.01"]
},
...],
"relationships": [...
{
"relationship_id": 15947,
"predicate": "wears",
"synsets": ["wear.v.01"],
"subject_id": 1023838,
"object_id": 5071
}
...]
},
...]
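In this layout relationships refer to objects only by ID, so reading them requires a lookup against the `objects` list. The following is a minimal sketch of that join on an inline copy of the example; `resolve_relationships` is a hypothetical helper, and skipping relationships whose IDs are missing from the list is a design choice made here, not documented behavior.

```python
# Inline copy of the example above, with the elisions dropped.
graph = {
    "image_id": 2407890,
    "objects": [
        {"object_id": 1023838, "x": 324, "y": 320, "w": 142, "h": 255,
         "name": "cat", "synsets": ["cat.n.01"]},
        {"object_id": 5071, "x": 359, "y": 362, "w": 72, "h": 81,
         "name": "table", "synsets": ["table.n.01"]},
    ],
    "relationships": [
        {"relationship_id": 15947, "predicate": "wears",
         "synsets": ["wear.v.01"],
         "subject_id": 1023838, "object_id": 5071},
    ],
}

def resolve_relationships(graph):
    """Join subject_id/object_id against the objects list of one graph."""
    by_id = {obj["object_id"]: obj for obj in graph["objects"]}
    triples = []
    for rel in graph["relationships"]:
        subj = by_id.get(rel["subject_id"])
        obj = by_id.get(rel["object_id"])
        if subj is None or obj is None:
            continue  # ID not present in the objects list; skip it
        triples.append((subj["name"], rel["predicate"], obj["name"]))
    return triples

resolved = resolve_relationships(graph)
print(resolved)
```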
File qa_to_region_mapping.json.zip
Maps each QA to its corresponding region description.
Example:
{...
QA_ID: REGION_DESCRIPTION_ID,
"1885736": "2072251"
...}
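A lookup against this mapping might look like the sketch below. JSON object keys are always strings, so the QA ID is converted with `str()` before the lookup; the value is converted with `int()` so the result can be compared with the integer `region_id` fields. The helper `region_for_qa` is hypothetical, and returning `None` for an unmapped QA is an assumption about how gaps should be handled.

```python
import json

# Inline sample mirroring the entry shown above.
mapping = json.loads('{"1885736": "2072251"}')

def region_for_qa(mapping, qa_id):
    """Return the region description ID for a QA ID, or None if unmapped."""
    value = mapping.get(str(qa_id))  # JSON keys are strings
    return int(value) if value is not None else None

region_id = region_for_qa(mapping, 1885736)
print(region_id)
```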