TensorFlow Image Captioning Implementation (1): Introducing the flickr30k Dataset

What is the flickr30k dataset

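The dataset boils down to two things: images, and the natural-language descriptions that go with each image.
Here is an example first:
(Figure 1: a sample image from the dataset)
Its annotations in the token file: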
667626.jpg#0 A girl wearing a red and multicolored bikini is laying on her back in shallow water .
667626.jpg#1 Girl wearing a bikini lying on her back in a shallow pool of clear blue water .
667626.jpg#2 A young girl is lying in the sand , while ocean water is surrounding her .
667626.jpg#3 A little girl in a red swimsuit is laying on her back in shallow water .
667626.jpg#4 A girl is stretched out in shallow water

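As you can see, each image comes with 5 captions, and the five captions all mean roughly the same thing.
Our goal is to train a model that takes an image as input and outputs one reasonably accurate description of it; in other words, a model that can "look at a picture and talk about it".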
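To make the annotation format concrete, here is a minimal sketch that splits one line of the token file into its parts. It assumes the `image.jpg#n` key and the caption are separated by a tab character; `Annotation` and `parse_annotation` are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    image: str    # image file name, e.g. "667626.jpg"
    index: int    # caption number for that image (0 to 4 in the sample above)
    caption: str  # the caption text itself


def parse_annotation(line: str) -> Annotation:
    """Split one token-file line into image name, caption index and caption."""
    # Assumption: key and caption are separated by a single tab.
    key, caption = line.rstrip("\n").split("\t", 1)
    # The key looks like "667626.jpg#4"; split on the last '#'.
    image, _, index = key.rpartition("#")
    return Annotation(image, int(index), caption.strip())


sample = "667626.jpg#4\tA girl is stretched out in shallow water"
print(parse_annotation(sample))
# Annotation(image='667626.jpg', index=4, caption='A girl is stretched out in shallow water')
```

Splitting on the last `#` keeps the parser working even if a file name happened to contain that character.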

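Downloading the dataset

Official website: click here

Fill in the form at the bottom of that page and you can download the data. Be warned, though: the download is quite likely to be slow and unstable.
Baidu Cloud link: https://pan.baidu.com/s/1nQ_t-OzuFkxJmfbzRH2vPA extraction code: md6z (leave a comment if the link has expired)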

Dataset file structure

Two tar archives

  1. flickr30k-images.tar
  2. flickr30k.tar.gz

The first archive holds the images; the second holds the annotations (the caption sentences that belong to each image).
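Before unpacking anything, it can be handy to peek inside the archives. The following is a rough sketch using Python's standard `tarfile` module. The helper names `list_members` and `read_member_text` are made up for this example, and the UTF-8 encoding is an assumption about the annotation file.

```python
import tarfile


def list_members(archive_path):
    """Return the names of all entries in a .tar or .tar.gz archive."""
    # Mode "r:*" lets tarfile detect the compression automatically.
    with tarfile.open(archive_path, mode="r:*") as archive:
        return archive.getnames()


def read_member_text(archive_path, suffix, encoding="utf-8"):
    """Return the decoded text of the first regular file whose name ends with `suffix`."""
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                # Read the member straight from the archive, without unpacking to disk.
                with archive.extractfile(member) as handle:
                    return handle.read().decode(encoding)
    raise FileNotFoundError(f"no file ending with {suffix!r} in {archive_path}")
```

For example, `list_members('flickr30k.tar.gz')` should show where the annotation file sits inside the archive, and `read_member_text('flickr30k.tar.gz', '.token')` would return its contents as one string. Matching by suffix avoids guessing the exact directory layout inside the archive.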
Inside flickr30k.tar.gz there is a file named results_20130124.token; you can load that file to take a look at it.

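import pandas as pd

# Each line of the token file holds an "<image>#<n>" key and a caption, separated by a tab.
# The file has no header row, so the column names are supplied explicitly.
annotations = pd.read_table('results_20130124.token', sep='\t', header=None,
                            names=['image', 'caption'])
print(annotations)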

Output:

                   image                                            caption
0       1000092795.jpg#0  Two young guys with shaggy hair look at their ...
1       1000092795.jpg#1  Two young , White males are outside near many ...
2       1000092795.jpg#2   Two men in green shirts are standing in a yard .
3       1000092795.jpg#3       A man in a blue shirt standing in a garden .
4       1000092795.jpg#4            Two friends enjoy time spent together .
5         10002456.jpg#0  Several men in hard hats are operating a giant...
6         10002456.jpg#1  Workers look down from up above on a piece of ...
7         10002456.jpg#2   Two men working on a machine wearing hard hats .
8         10002456.jpg#3              Four men on top of a tall structure .
9         10002456.jpg#4                         Three men on a large rig .
...                  ...                                                ...
[158915 rows x 2 columns]

As you can see, each image has 5 captions, and there are 158,915 captions in total, which works out to 158,915 / 5 = 31,783 images.
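The "5 captions per image" observation can be checked directly. Here is a small sketch that uses only the standard library and assumes tab-separated lines with an `image.jpg#n` key; `group_captions` and `caption_count_summary` are hypothetical helper names, not part of any library.

```python
from collections import defaultdict


def group_captions(lines):
    """Map each image file name to the list of its captions."""
    captions = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumption: "<image>#<n>" and the caption are separated by a tab.
        key, caption = line.split("\t", 1)
        image = key.rpartition("#")[0]  # drop the "#n" suffix
        captions[image].append(caption.strip())
    return dict(captions)


def caption_count_summary(captions_by_image):
    """Return (number of images, total captions, set of distinct per-image counts)."""
    counts = [len(caps) for caps in captions_by_image.values()]
    return len(counts), sum(counts), set(counts)


demo = [
    "10002456.jpg#3\tFour men on top of a tall structure .\n",
    "10002456.jpg#4\tThree men on a large rig .\n",
]
print(caption_count_summary(group_captions(demo)))
# (1, 2, {2})
```

To run this on the real data, open results_20130124.token (the encoding is presumably UTF-8) and pass the file object to `group_captions`. If every image really does have five captions, the summary should come out as 31783 images, 158915 captions, and `{5}` as the only per-image count.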
