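I am evaluating Visual Genome as a reference for a VQA project, so while the GPU is running at full load, here are my notes on the dataset.
The official website describes it as follows:
Visual Genome is a dataset, a knowledge base, an ongoing effort to connect structured image concepts to language.
It was built through crowdsourcing, an approach proposed by Michael Bernstein, a colleague of Fei-Fei Li.
As of today (2016/12/08) it contains:
108,077 images
5.4 million region descriptions
1.7 million visual question answers
3.8 million object instances
2.8 million attributes
2.3 million relationships
Everything is mapped to WordNet synsets
A more detailed description of the dataset will be added later.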
References
http://visualgenome.org/
The official website, which hosts the original paper
http://www.ccf.org.cn/sites/ccf/nry.jsp?contentId=2912552761248
This site hosts a Chinese translation of the paper
Each section below lists, in order, the file name, what it returns, and its format.
images[2].zip
Returns all images in JPG format.
IMAGE_ID.jpg,
image_data.json.zip
Returns metadata about all images (basic information about each image)
Name       Type              Description
image_id   int               ID of image
url        hyperlink string  Visual Genome-hosted image URL
width      int               width of image in px
height     int               height of image in px
coco_id    int               ID of the image in the COCO dataset
flickr_id  int               ID of the image in the Flickr dataset
This shows that the dataset draws on images from the COCO and Flickr datasets.
[...
{
"image_id": 2412112,
"url": "https://cs.stanford.edu/people/rak248/VG_100K/2370463.jpg",
"width": 500,
"height": 281,
"coco_id": 547168,
"flickr_id": 8505158818
}
...]
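To make the metadata format concrete, here is a minimal sketch that indexes the records by image_id. It parses an inline copy of the example above. With the real download you would presumably json.load() the unzipped image_data.json instead. The helper names index_by_image_id and aspect_ratio are my own, not part of any official tooling.

```python
import json

# Inline sample copied from the image_data.json example above. With the real
# download you would json.load() the unzipped file instead (path is up to you).
SAMPLE = """
[
  {
    "image_id": 2412112,
    "url": "https://cs.stanford.edu/people/rak248/VG_100K/2370463.jpg",
    "width": 500,
    "height": 281,
    "coco_id": 547168,
    "flickr_id": 8505158818
  }
]
"""

def index_by_image_id(records):
    """Map image_id -> metadata record for constant-time lookup."""
    return {rec["image_id"]: rec for rec in records}

def aspect_ratio(rec):
    """Width divided by height of one image record."""
    return rec["width"] / rec["height"]

images = index_by_image_id(json.loads(SAMPLE))
meta = images[2412112]
print(meta["url"], meta["width"], meta["height"], round(aspect_ratio(meta), 3))
```

Keying by image_id is convenient because every other file in the dataset refers to images through that field.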
region_descriptions.json.zip
Returns all region descriptions
Name                   Type          Description
image_id               int           ID of image containing region
regions                object array  Array of region descriptions for this image
  .region_id           int           ID of region description
  .x                   int           x-coordinate of region bounding box
  .y                   int           y-coordinate of region bounding box
  .width               int           width of region bounding box
  .height              int           height of region bounding box
  .phrase              str           region description phrase
  .synsets             object array  synsets (synonym sets) in the description
    .synset_name       str           synset name
    .entity_name       str           string from phrase
    .entity_idx_start  int           index where synset starts in the phrase
    .entity_idx_end    int           index where synset ends in the phrase
[...
{
"image_id": 2407890,
"regions": [...
{
"region_id": 1353,
"x": 117,
"y": 79,
"width": 249,
"height": 107,
"phrase": "a cat sitting on a table.",
"synsets": [...
{
"synset_name": "cat.n.01",
"entity_name": "cat",
"entity_idx_start": 2,
"entity_idx_end": 5
},
...]
},
{
"region_id": 1354,
"x": 116,
"y": 29,
"width": 239,
"height": 135,
"phrase": "a white cat with a tan tail and face markings",
"synsets": [...
...]
},
...]
},
{
"image_id": 2407890,
"regions": [...
...]
},
...]
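The entity_idx_start and entity_idx_end fields are easiest to understand by slicing the phrase with them. The sketch below does that for the first region of the example above and also converts the bounding box to corner coordinates. Two assumptions on my part: the end index is exclusive (which matches the "cat" example), and (x, y) is the top-left corner of the box. The function names are hypothetical.

```python
# One region copied from the region_descriptions.json example above.
region = {
    "region_id": 1353,
    "x": 117, "y": 79, "width": 249, "height": 107,
    "phrase": "a cat sitting on a table.",
    "synsets": [
        {"synset_name": "cat.n.01", "entity_name": "cat",
         "entity_idx_start": 2, "entity_idx_end": 5},
    ],
}

def entity_spans(region):
    """Return (synset_name, substring) pairs by slicing phrase[start:end].

    Assumes entity_idx_end is exclusive, as the example suggests.
    """
    phrase = region["phrase"]
    return [
        (s["synset_name"], phrase[s["entity_idx_start"]:s["entity_idx_end"]])
        for s in region["synsets"]
    ]

def region_box(region):
    """Convert x, y, width, height to (left, top, right, bottom).

    Assumes (x, y) is the top-left corner in pixels.
    """
    return (
        region["x"],
        region["y"],
        region["x"] + region["width"],
        region["y"] + region["height"],
    )

spans = entity_spans(region)
box = region_box(region)
print(spans)  # [('cat.n.01', 'cat')]
print(box)    # (117, 79, 366, 186)
```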
question_answers.json.zip
All visual question answers
Name                   Type          Description
image_id               int           ID of image
qas                    object array  list of qas for the image
  .qa_id               str           ID of question answer
  .question            str           question
  .answer              str           answer
  .question_synsets    object array  array of synsets in the question
    .synset_name       str           synset name
    .entity_name       str           string from question
    .entity_idx_start  int           index where synset starts in the question
    .entity_idx_end    int           index where synset ends in the question
  .answer_synsets      object array  array of synsets in the answer
    .synset_name       str           synset name
    .entity_name       str           string from answer
    .entity_idx_start  int           index where synset starts in the answer
    .entity_idx_end    int           index where synset ends in the answer
[...
{
"image_id": 2317993,
"qas": [...
{
"qa_id": 912402,
"question": "Where are the clouds?",
"answer": "sky",
"question_synsets": [...
{
"synset_name": "cloud.n.01",
"entity_name": "cloud",
"entity_idx_start": 14,
"entity_idx_end": 20
},
...],
"answer_synsets": [...
{
"synset_name": "sky.n.01",
"entity_name": "sky",
"entity_idx_start": 0,
"entity_idx_end": 3
},
...]
},
...]
},
...]
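For VQA work the nested structure usually needs to be flattened into one row per question. A rough sketch, using the example above as inline data. The helper names and the "first word" heuristic for question type are my own assumptions, not something the dataset defines.

```python
from collections import Counter

# Inline sample copied from the question_answers.json example above.
entries = [
    {
        "image_id": 2317993,
        "qas": [
            {
                "qa_id": 912402,
                "question": "Where are the clouds?",
                "answer": "sky",
                "question_synsets": [
                    {"synset_name": "cloud.n.01", "entity_name": "cloud",
                     "entity_idx_start": 14, "entity_idx_end": 20},
                ],
                "answer_synsets": [
                    {"synset_name": "sky.n.01", "entity_name": "sky",
                     "entity_idx_start": 0, "entity_idx_end": 3},
                ],
            },
        ],
    },
]

def iter_qa_pairs(entries):
    """Yield one flat (image_id, qa_id, question, answer) tuple per QA."""
    for entry in entries:
        for qa in entry["qas"]:
            yield entry["image_id"], qa["qa_id"], qa["question"], qa["answer"]

def question_type_counts(entries):
    """Count questions by lower-cased first word (a crude type heuristic)."""
    counts = Counter()
    for _, _, question, _ in iter_qa_pairs(entries):
        words = question.split()
        if words:
            counts[words[0].lower()] += 1
    return counts

pairs = list(iter_qa_pairs(entries))
type_counts = question_type_counts(entries)
print(pairs)
print(type_counts)
```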
objects.json.zip
All object instances
Name          Type          Description
image_id      int           ID of image
objects       object array  Array of object instances for this image
  .object_id  int           ID of object
  .x          int           x-coordinate of object bounding box
  .y          int           y-coordinate of object bounding box
  .w          int           width of object bounding box
  .h          int           height of object bounding box
  .name       str           name of object
  .synsets    str array     synset names associated with this object
[...
{
"image_id": 2,
"objects": [...
{
"object_id": 1023847,
"x": 405,
"y": 34,
"w": 78,
"h": 438,
"name": "pole",
"synsets": ["pole.n.01"]
},
{
"object_id": 1023836,
"x": 239,
"y": 347,
"w": 136,
"h": 126,
"name": "car",
"synsets": ["car.n.01"]
},
...]
},
...]
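Note that objects use the short keys w and h, while regions use width and height. A common operation on such boxes is intersection over union (IoU), sketched below on the two objects from the example. I am assuming (x, y) is the top-left corner in pixels. The function names are hypothetical.

```python
# The two objects copied from the objects.json example above.
pole = {"object_id": 1023847, "x": 405, "y": 34, "w": 78, "h": 438,
        "name": "pole", "synsets": ["pole.n.01"]}
car = {"object_id": 1023836, "x": 239, "y": 347, "w": 136, "h": 126,
       "name": "car", "synsets": ["car.n.01"]}

def to_corners(obj):
    """Convert x, y, w, h to (left, top, right, bottom).

    Assumes (x, y) is the top-left corner in pixels.
    """
    return obj["x"], obj["y"], obj["x"] + obj["w"], obj["y"] + obj["h"]

def iou(a, b):
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    a_left, a_top, a_right, a_bottom = to_corners(a)
    b_left, b_top, b_right, b_bottom = to_corners(b)
    inter_w = max(0, min(a_right, b_right) - max(a_left, b_left))
    inter_h = max(0, min(a_bottom, b_bottom) - max(a_top, b_top))
    inter = inter_w * inter_h
    union = a["w"] * a["h"] + b["w"] * b["h"] - inter
    return inter / union if union > 0 else 0.0

print(to_corners(pole))  # (405, 34, 483, 472)
print(iou(pole, car))    # 0.0, the boxes do not overlap horizontally
print(iou(pole, pole))   # 1.0
```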
attributes.json.zip
All attributes in the dataset
Name           Type          Description
image_id       int           ID of image
attributes     object array  Array of attributes with object instances for this image
  .object_id   int           ID of object
  .x           int           x-coordinate of object bounding box
  .y           int           y-coordinate of object bounding box
  .w           int           width of object bounding box
  .h           int           height of object bounding box
  .name        str           name of object
  .synsets     str array     synset names associated with this object
  .attributes  str array     list of attributes associated with this object
[...
{
"image_id": 2,
"attributes": [...
{
"object_id": 1023847,
"x": 405,
"y": 34,
"w": 78,
"h": 438,
"name": "pole",
"synsets": ["pole.n.01"],
"attributes": ["brown"]
},
{
"object_id": 1023836,
"x": 239,
"y": 347,
"w": 136,
"h": 126,
"name": "car",
"synsets": ["car.n.01"],
"attributes": ["red", "broken"]
},
...]
},
...]
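A quick way to get a feel for this file is to tally the attribute strings. The sketch below counts attributes and groups them by object name, using the example above as inline data. The use of .get() reflects my assumption that some objects may lack the inner attributes key. The helper names are my own.

```python
from collections import Counter, defaultdict

# Inline sample copied from the attributes.json example above.
entries = [
    {
        "image_id": 2,
        "attributes": [
            {"object_id": 1023847, "x": 405, "y": 34, "w": 78, "h": 438,
             "name": "pole", "synsets": ["pole.n.01"],
             "attributes": ["brown"]},
            {"object_id": 1023836, "x": 239, "y": 347, "w": 136, "h": 126,
             "name": "car", "synsets": ["car.n.01"],
             "attributes": ["red", "broken"]},
        ],
    },
]

def attribute_counts(entries):
    """Count how often each attribute string occurs across all objects."""
    counts = Counter()
    for entry in entries:
        for obj in entry["attributes"]:
            # Assumption: tolerate objects without an "attributes" key.
            counts.update(obj.get("attributes", []))
    return counts

def attributes_by_name(entries):
    """Map object name -> set of attributes seen for objects with that name."""
    by_name = defaultdict(set)
    for entry in entries:
        for obj in entry["attributes"]:
            by_name[obj["name"]].update(obj.get("attributes", []))
    return dict(by_name)

counts = attribute_counts(entries)
grouped = attributes_by_name(entries)
print(counts)
print(grouped)
```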
relationships.json.zip
All relationships
Name                Type          Description
image_id            int           ID of image
relationships       object array  array of relationships in the image
  .relationship_id  int           ID of relationship
  .predicate        str           predicate of the relationship (e.g. "wears")
  .synsets          str array     synset names associated with the predicate
  .subject          object        subject of the relationship (an object instance)
    .object_id      int           ID of object
    .x              int           x-coordinate of object bounding box
    .y              int           y-coordinate of object bounding box
    .w              int           width of object bounding box
    .h              int           height of object bounding box
    .name           str           name of object
    .synsets        str array     synset names associated with this object
  .object           object        object of the relationship (an object instance)
    .object_id      int           ID of object
    .x              int           x-coordinate of object bounding box
    .y              int           y-coordinate of object bounding box
    .w              int           width of object bounding box
    .h              int           height of object bounding box
    .name           str           name of object
    .synsets        str array     synset names associated with this object
[...
{
"image_id": 2,
"relationships": [...
{
"relationship_id": 15947,
"predicate": "wears",
"synsets": ["wear.v.01"],
"subject": {
"object_id": 1023838,
"x": 324,
"y": 320,
"w": 142,
"h": 255,
"name": "man",
"synsets": ["man.n.01"]
},
"object": {
"object_id": 5071,
"x": 359,
"y": 362,
"w": 72,
"h": 81,
"name": "backpack",
"synsets": ["backpack.n.01"]
}
},
...]
},
...]
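Each relationship embeds full subject and object instances, so it can be reduced to a (subject, predicate, object) triple directly. A minimal sketch on the example above. The to_triples name is my own.

```python
# Inline sample copied from the relationships.json example above.
entries = [
    {
        "image_id": 2,
        "relationships": [
            {
                "relationship_id": 15947,
                "predicate": "wears",
                "synsets": ["wear.v.01"],
                "subject": {"object_id": 1023838, "x": 324, "y": 320,
                            "w": 142, "h": 255, "name": "man",
                            "synsets": ["man.n.01"]},
                "object": {"object_id": 5071, "x": 359, "y": 362,
                           "w": 72, "h": 81, "name": "backpack",
                           "synsets": ["backpack.n.01"]},
            },
        ],
    },
]

def to_triples(entries):
    """Flatten into (image_id, subject name, predicate, object name) tuples."""
    return [
        (entry["image_id"], rel["subject"]["name"],
         rel["predicate"], rel["object"]["name"])
        for entry in entries
        for rel in entry["relationships"]
    ]

triples = to_triples(entries)
print(triples)  # [(2, 'man', 'wears', 'backpack')]
```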
synsets.json.zip
All the synsets and their descriptions
Name               Type  Description
synset_name        str   unique synset name
synset_definition  str   definition of synset according to WordNet
[...
{
"synset_name": "phonograph_record.n.01",
"synset_definition": "sound recording consisting of a disk with a continuous groove; used to reproduce music by rotating while a phonograph needle tracks in the groove"
},
{
"synset_name": "truck.n.01",
"synset_definition": "an automotive vehicle suitable for hauling",
}
...]
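Since every other file refers to synsets by name, this file is naturally used as a lookup table. The sketch below builds one from the two example records and splits a synset name into its parts. Reading the name as lemma.pos.sense follows the WordNet naming convention. The function names are hypothetical.

```python
# Inline sample copied from the synsets.json example above.
records = [
    {
        "synset_name": "phonograph_record.n.01",
        "synset_definition": (
            "sound recording consisting of a disk with a continuous groove; "
            "used to reproduce music by rotating while a phonograph needle "
            "tracks in the groove"
        ),
    },
    {
        "synset_name": "truck.n.01",
        "synset_definition": "an automotive vehicle suitable for hauling",
    },
]

def build_definitions(records):
    """Map synset_name -> synset_definition."""
    return {rec["synset_name"]: rec["synset_definition"] for rec in records}

def split_synset_name(name):
    """Split 'lemma.pos.NN' into (lemma, pos, sense number).

    rsplit from the right so the lemma may itself contain dots.
    """
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)

definitions = build_definitions(records)
print(definitions["truck.n.01"])
print(split_synset_name("phonograph_record.n.01"))
```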
region_graphs.json.zip
All the region graphs
Name                   Type          Description
image_id               int           ID of image containing region
regions                object array  Array of region descriptions for this image
  .region_id           int           ID of region description
  .x                   int           x-coordinate of region bounding box
  .y                   int           y-coordinate of region bounding box
  .width               int           width of region bounding box
  .height              int           height of region bounding box
  .phrase              str           region description phrase
  .synsets             object array  synsets in the description
    .synset_name       str           synset name
    .entity_name       str           string from phrase
    .entity_idx_start  int           index where synset starts in the phrase
    .entity_idx_end    int           index where synset ends in the phrase
  .objects             object array  Array of object instances for this region
    .object_id         int           ID of object
    .x                 int           x-coordinate of object bounding box
    .y                 int           y-coordinate of object bounding box
    .w                 int           width of object bounding box
    .h                 int           height of object bounding box
    .name              str           name of object
    .synsets           str array     synset names associated with this object
  .relationships       object array  array of relationships in the region
    .relationship_id   int           ID of relationship
    .predicate         str           predicate of the relationship (e.g. "wears")
    .synsets           str array     synset names associated with the predicate
    .subject_id        int           ID of subject (found in objects list)
    .object_id         int           ID of object (found in objects list)
[...
{
"image_id": 2407890,
"regions": [...
{
"region_id": 1353,
"x": 117,
"y": 79,
"width": 249,
"height": 107,
"phrase": "a cat sitting on a table.",
"synsets": [...
{
"synset_name": "cat.n.01",
"entity_name": "cat",
"entity_idx_start": 2,
"entity_idx_end": 5
},
...],
"objects": [...
{
"object_id": 1023838,
"x": 324,
"y": 320,
"w": 142,
"h": 255,
"name": "cat",
"synsets": ["cat.n.01"]
},
{
"object_id": 5071,
"x": 359,
"y": 362,
"w": 72,
"h": 81,
"name": "table",
"synsets": ["table.n.01"]
},
...],
"relationships": [...
{
"relationship_id": 15947,
"predicate": "wears",
"synsets": ["wear.v.01"],
"subject_id": 1023838,
"object_id": 5071
}
...]
},
...]
},
...]
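Unlike relationships.json, a region graph stores only subject_id and object_id, so the names have to be resolved through the region's own objects list. A minimal sketch on the example above, with values copied verbatim from it (including its "wears" predicate). The region_triples name is my own.

```python
# One region copied from the region_graphs.json example above.
region = {
    "region_id": 1353,
    "x": 117, "y": 79, "width": 249, "height": 107,
    "phrase": "a cat sitting on a table.",
    "synsets": [
        {"synset_name": "cat.n.01", "entity_name": "cat",
         "entity_idx_start": 2, "entity_idx_end": 5},
    ],
    "objects": [
        {"object_id": 1023838, "x": 324, "y": 320, "w": 142, "h": 255,
         "name": "cat", "synsets": ["cat.n.01"]},
        {"object_id": 5071, "x": 359, "y": 362, "w": 72, "h": 81,
         "name": "table", "synsets": ["table.n.01"]},
    ],
    "relationships": [
        {"relationship_id": 15947, "predicate": "wears",
         "synsets": ["wear.v.01"],
         "subject_id": 1023838, "object_id": 5071},
    ],
}

def region_triples(region):
    """Resolve subject_id/object_id to names via the region's objects list."""
    names = {obj["object_id"]: obj["name"] for obj in region["objects"]}
    return [
        (names[rel["subject_id"]], rel["predicate"], names[rel["object_id"]])
        for rel in region["relationships"]
    ]

resolved = region_triples(region)
print(resolved)  # [('cat', 'wears', 'table')]
```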
scene_graphs.json.zip
All the scene graphs
Name                Type          Description
image_id            int           ID of image
objects             object array  Array of object instances for this image
  .object_id        int           ID of object
  .x                int           x-coordinate of object bounding box
  .y                int           y-coordinate of object bounding box
  .w                int           width of object bounding box
  .h                int           height of object bounding box
  .name             str           name of object
  .synsets          str array     synset names associated with this object
relationships       object array  array of relationships in the image
  .relationship_id  int           ID of relationship
  .predicate        str           predicate of the relationship (e.g. "wears")
  .synsets          str array     synset names associated with the predicate
  .subject_id       int           ID of subject (found in objects list)
  .object_id        int           ID of object (found in objects list)
[...
{
"image_id": 2407890,
"objects": [...
{
"object_id": 1023838,
"x": 324,
"y": 320,
"w": 142,
"h": 255,
"name": "cat",
"synsets": ["cat.n.01"]
},
{
"object_id": 5071,
"x": 359,
"y": 362,
"w": 72,
"h": 81,
"name": "table",
"synsets": ["table.n.01"]
},
...],
"relationships": [...
{
"relationship_id": 15947,
"predicate": "wears",
"synsets": ["wear.v.01"],
"subject_id": 1023838,
"object_id": 5071
}
...]
},
...]
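A scene graph is a directed graph over the whole image: objects are nodes and relationships are labelled edges. The sketch below builds a plain adjacency map from the example above and lists objects that take part in no relationship. Skipping edges whose IDs are missing from the objects list is a defensive assumption of mine. The function names are hypothetical.

```python
from collections import defaultdict

# Inline sample copied from the scene_graphs.json example above.
scene = {
    "image_id": 2407890,
    "objects": [
        {"object_id": 1023838, "x": 324, "y": 320, "w": 142, "h": 255,
         "name": "cat", "synsets": ["cat.n.01"]},
        {"object_id": 5071, "x": 359, "y": 362, "w": 72, "h": 81,
         "name": "table", "synsets": ["table.n.01"]},
    ],
    "relationships": [
        {"relationship_id": 15947, "predicate": "wears",
         "synsets": ["wear.v.01"],
         "subject_id": 1023838, "object_id": 5071},
    ],
}

def build_adjacency(scene):
    """Map subject object_id -> list of (predicate, target object_id)."""
    known = {obj["object_id"] for obj in scene["objects"]}
    adjacency = defaultdict(list)
    for rel in scene["relationships"]:
        # Assumption: ignore edges that point at IDs not in the objects list.
        if rel["subject_id"] in known and rel["object_id"] in known:
            adjacency[rel["subject_id"]].append(
                (rel["predicate"], rel["object_id"])
            )
    return dict(adjacency)

def isolated_objects(scene):
    """IDs of objects that appear in no relationship at all."""
    used = set()
    for rel in scene["relationships"]:
        used.update((rel["subject_id"], rel["object_id"]))
    return [o["object_id"] for o in scene["objects"] if o["object_id"] not in used]

adjacency = build_adjacency(scene)
print(adjacency)                # {1023838: [('wears', 5071)]}
print(isolated_objects(scene))  # []
```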
qa_to_region_mapping.json.zip
Mapping from each QA to its corresponding region description
{...
QA_ID: REGION_DESCRIPTION_ID,
"1885736": "2072251"
...}
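In this file both the key and the value are strings, whereas qa_id and region_id appear as integers in the examples of the other files. A minimal sketch that normalises both sides to int so the mapping can be joined against them. The conversion is my assumption based on those examples. The helper names are my own.

```python
import json

# Inline sample copied from the qa_to_region_mapping.json example above.
RAW = '{"1885736": "2072251"}'

def load_mapping(text):
    """Parse the mapping and convert both string IDs to int."""
    return {
        int(qa_id): int(region_id)
        for qa_id, region_id in json.loads(text).items()
    }

def region_for_qa(mapping, qa_id):
    """Return the region ID for a QA, or None if the QA has no region."""
    return mapping.get(int(qa_id))

mapping = load_mapping(RAW)
print(mapping)                            # {1885736: 2072251}
print(region_for_qa(mapping, "1885736"))  # 2072251
print(region_for_qa(mapping, 1))          # None
```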