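Visual Genome is a dataset and knowledge base: an ongoing effort to connect structured image concepts to language.
It was built through crowdsourcing, an approach proposed by Michael Bernstein, a colleague of Fei-Fei Li.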
5.4 million region descriptions
1.7 million visual question answers
3.8 million object instances
2.8 million attributes
Everything is mapped to WordNet synsets
Return all images in JPG format.
Return metadata about all images (basic information for each image)
Name       Type              Description
image_id   int               ID of image
url        hyperlink string  Visual Genome-hosted image URL
width      int               width of image in px
height     int               height of image in px
coco_id    int               ID of the image in the COCO dataset
flickr_id  int               ID of the image in the Flickr dataset
{
    "image_id": 2412112,
    "url": "https://cs.stanford.edu/people/rak248/VG_100K/2370463.jpg",
    "width": 500,
    "height": 281,
    "coco_id": 547168,
    "flickr_id": 8505158818
}
Return all region descriptions
Name                              Type          Description
image_id                          int           ID of image containing region
regions                           object array  Array of region descriptions for this image
regions.region_id                 int           ID of region description
regions.x                         int           x-coordinate of region bounding box
regions.y                         int           y-coordinate of region bounding box
regions.width                     int           width of region bounding box
regions.height                    int           height of region bounding box
regions.phrase                    str           region description phrase
regions.synsets                   object array  synsets in the description
regions.synsets.synset_name       str           synset name
regions.synsets.entity_name       str           string from phrase
regions.synsets.entity_idx_start  int           index where synset starts in the phrase
regions.synsets.entity_idx_end    int           index where synset ends in the phrase
{
    "image_id": 2407890,
    "regions": [...
        {
            "region_id": 1353,
            "x": 117,
            "y": 79,
            "width": 249,
            "height": 107,
            "phrase": "a cat sitting on a table.",
            "synsets": [...
                {
                    "synset_name": "cat.n.01",
                    "entity_name": "cat",
                    "entity_idx_start": 2,
                    "entity_idx_end": 5
                }
            ]
        },
        {
            "region_id": 1354,
            "x": 116,
            "y": 29,
            "width": 239,
            "height": 135,
            "phrase": "a white cat with a tan tail and face markings",
            "synsets": [...]
        }
    ]
},
{
    "image_id": 2407890,
    "regions": [...]
}
All visual question answers
Name                                   Type          Description
image_id                               int           ID of image
qas                                    object array  list of QAs for the image
qas.qa_id                              int           ID of question answer
qas.question                           str           question
qas.answer                             str           answer
qas.question_synsets                   object array  array of synsets in the question
qas.question_synsets.synset_name       str           synset name
qas.question_synsets.entity_name       str           string from question
qas.question_synsets.entity_idx_start  int           index where synset starts in the question
qas.question_synsets.entity_idx_end    int           index where synset ends in the question
qas.answer_synsets                     object array  array of synsets in the answer
qas.answer_synsets.synset_name         str           synset name
qas.answer_synsets.entity_name         str           string from answer
qas.answer_synsets.entity_idx_start    int           index where synset starts in the answer
qas.answer_synsets.entity_idx_end      int           index where synset ends in the answer
{
    "image_id": 2317993,
    "qas": [...
        {
            "qa_id": 912402,
            "question": "Where are the clouds?",
            "answer": "sky",
            "question_synsets": [...
                {
                    "synset_name": "cloud.n.01",
                    "entity_name": "cloud",
                    "entity_idx_start": 14,
                    "entity_idx_end": 20
                }
            ],
            "answer_synsets": [...
                {
                    "synset_name": "sky.n.01",
                    "entity_name": "sky",
                    "entity_idx_start": 0,
                    "entity_idx_end": 3
                }
            ]
        }
    ]
}
All object instances
Name               Type          Description
image_id           int           ID of image
objects            object array  Array of object instances for this image
objects.object_id  int           ID of object
objects.x          int           x-coordinate of object bounding box
objects.y          int           y-coordinate of object bounding box
objects.w          int           width of object bounding box
objects.h          int           height of object bounding box
objects.name       str           name of object
objects.synsets    str array     synset names associated with this object
{
    "image_id": 2,
    "objects": [...
        {
            "object_id": 1023847,
            "x": 405,
            "y": 34,
            "w": 78,
            "h": 438,
            "name": "pole",
            "synsets": ["pole.n.01"]
        },
        {
            "object_id": 1023836,
            "x": 239,
            "y": 347,
            "w": 136,
            "h": 126,
            "name": "car",
            "synsets": ["car.n.01"]
        }
    ]
}
All attributes in the dataset
Name                   Type          Description
image_id               int           ID of image
attributes             object array  Array of attributes with object instances for this image
attributes.object_id   int           ID of object
attributes.x           int           x-coordinate of object bounding box
attributes.y           int           y-coordinate of object bounding box
attributes.w           int           width of object bounding box
attributes.h           int           height of object bounding box
attributes.name        str           name of object
attributes.synsets     str array     synset names associated with this object
attributes.attributes  str array     list of attributes associated with this object
{
    "image_id": 2,
    "attributes": [...
        {
            "object_id": 1023847,
            "x": 405,
            "y": 34,
            "w": 78,
            "h": 438,
            "name": "pole",
            "synsets": ["pole.n.01"],
            "attributes": ["brown"]
        },
        {
            "object_id": 1023836,
            "x": 239,
            "y": 347,
            "w": 136,
            "h": 126,
            "name": "car",
            "synsets": ["car.n.01"],
            "attributes": ["red", "broken"]
        }
    ]
}
All relationships
Name                             Type          Description
image_id                         int           ID of image
relationships                    object array  array of relationships in the image
relationships.relationship_id    int           ID of relationship
relationships.predicate          str           predicate of the relationship
relationships.synsets            str array     synset names associated with the predicate
relationships.subject            object        subject of the relationship
relationships.subject.object_id  int           ID of object
relationships.subject.x          int           x-coordinate of object bounding box
relationships.subject.y          int           y-coordinate of object bounding box
relationships.subject.w          int           width of object bounding box
relationships.subject.h          int           height of object bounding box
relationships.subject.name       str           name of object
relationships.subject.synsets    str array     synset names associated with this object
relationships.object             object        object of the relationship
relationships.object.object_id   int           ID of object
relationships.object.x           int           x-coordinate of object bounding box
relationships.object.y           int           y-coordinate of object bounding box
relationships.object.w           int           width of object bounding box
relationships.object.h           int           height of object bounding box
relationships.object.name        str           name of object
relationships.object.synsets     str array     synset names associated with this object
{
    "image_id": 2,
    "relationships": [...
        {
            "relationship_id": 15947,
            "predicate": "wears",
            "synsets": ["wear.v.01"],
            "subject": {
                "object_id": 1023838,
                "x": 324,
                "y": 320,
                "w": 142,
                "h": 255,
                "name": "man",
                "synsets": ["man.n.01"]
            },
            "object": {
                "object_id": 5071,
                "x": 359,
                "y": 362,
                "w": 72,
                "h": 81,
                "name": "backpack",
                "synsets": ["backpack.n.01"]
            }
        }
    ]
}
All the synsets and their descriptions
Name               Type  Description
synset_name        str   unique synset name
synset_definition  str   definition of synset according to WordNet
{
    "synset_name": "phonograph_record.n.01",
    "synset_definition": "sound recording consisting of a disk with a continuous groove; used to reproduce music by rotating while a phonograph needle tracks in the groove"
},
{
    "synset_name": "truck.n.01",
    "synset_definition": "an automotive vehicle suitable for hauling"
}
All the region graphs
Name                                   Type          Description
image_id                               int           ID of image containing region
regions                                object array  Array of region descriptions for this image
regions.region_id                      int           ID of region description
regions.x                              int           x-coordinate of region bounding box
regions.y                              int           y-coordinate of region bounding box
regions.width                          int           width of region bounding box
regions.height                         int           height of region bounding box
regions.phrase                         str           region description phrase
regions.synsets                        object array  synsets in the description
regions.synsets.synset_name            str           synset name
regions.synsets.entity_name            str           string from phrase
regions.synsets.entity_idx_start       int           index where synset starts in the phrase
regions.synsets.entity_idx_end         int           index where synset ends in the phrase
regions.objects                        object array  Array of object instances for this region
regions.objects.object_id              int           ID of object
regions.objects.x                      int           x-coordinate of object bounding box
regions.objects.y                      int           y-coordinate of object bounding box
regions.objects.w                      int           width of object bounding box
regions.objects.h                      int           height of object bounding box
regions.objects.name                   str           name of object
regions.objects.synsets                str array     synset names associated with this object
regions.relationships                  object array  array of relationships in the region
regions.relationships.relationship_id  int           ID of relationship
regions.relationships.predicate        str           predicate of the relationship
regions.relationships.synsets          str array     synset names associated with the predicate
regions.relationships.subject_id       int           ID of subject (found in objects list)
regions.relationships.object_id        int           ID of object (found in objects list)
{
    "image_id": 2407890,
    "regions": [...
        {
            "region_id": 1353,
            "x": 117,
            "y": 79,
            "width": 249,
            "height": 107,
            "phrase": "a cat sitting on a table.",
            "synsets": [...
                {
                    "synset_name": "cat.n.01",
                    "entity_name": "cat",
                    "entity_idx_start": 2,
                    "entity_idx_end": 5
                }
            ],
            "objects": [...
                {
                    "object_id": 1023838,
                    "x": 324,
                    "y": 320,
                    "w": 142,
                    "h": 255,
                    "name": "cat",
                    "synsets": ["cat.n.01"]
                },
                {
                    "object_id": 5071,
                    "x": 359,
                    "y": 362,
                    "w": 72,
                    "h": 81,
                    "name": "table",
                    "synsets": ["table.n.01"]
                }
            ],
            "relationships": [...
                {
                    "relationship_id": 15947,
                    "predicate": "wears",
                    "synsets": ["wear.v.01"],
                    "subject_id": 1023838,
                    "object_id": 5071
                }
            ]
        }
    ]
}
All the scene graphs
Name                           Type          Description
image_id                       int           ID of image
objects                        object array  Array of object instances for this image
objects.object_id              int           ID of object
objects.x                      int           x-coordinate of object bounding box
objects.y                      int           y-coordinate of object bounding box
objects.w                      int           width of object bounding box
objects.h                      int           height of object bounding box
objects.name                   str           name of object
objects.synsets                str array     synset names associated with this object
relationships                  object array  array of relationships in the image
relationships.relationship_id  int           ID of relationship
relationships.predicate        str           predicate of the relationship
relationships.synsets          str array     synset names associated with the predicate
relationships.subject_id       int           ID of subject (found in objects list)
relationships.object_id        int           ID of object (found in objects list)
{
    "image_id": 2407890,
    "objects": [...
        {
            "object_id": 1023838,
            "x": 324,
            "y": 320,
            "w": 142,
            "h": 255,
            "name": "cat",
            "synsets": ["cat.n.01"]
        },
        {
            "object_id": 5071,
            "x": 359,
            "y": 362,
            "w": 72,
            "h": 81,
            "name": "table",
            "synsets": ["table.n.01"]
        }
    ],
    "relationships": [...
        {
            "relationship_id": 15947,
            "predicate": "wears",
            "synsets": ["wear.v.01"],
            "subject_id": 1023838,
            "object_id": 5071
        }
    ]
}
Mapping from QAs to their corresponding region descriptions
{
    "1885736": "2072251"
}