MXNet Data Format Conversion

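I have recently been learning MXNet. train_mnist.py runs fine, but the server has no Internet access, so the ImageNet dataset could not be downloaded automatically. I downloaded the JPEG version of the dataset myself and, following the official MXNet documentation, learned how to convert it into MXNet's .rec file format.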

The following is taken from the official documentation [mxnet/example/image-classification/README.md]:

Prepare Datasets

The recommended data format is RecordIO, which concatenates multiple examples into seekable binary files for better read efficiency. We provide a tool im2rec.py located in tools/ to convert individual images into .rec files.

In other words, MXNet ships a tool, im2rec.py (located under tools/), that converts raw images into .rec files.
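To make the "seekable binary file" idea concrete, here is a toy sketch in Python. It uses a made-up layout (a 4-byte length prefix per record plus an in-memory list of byte offsets) purely for illustration; MXNet's real .rec layout is different, so this cannot read actual .rec files. The function names `pack_records` and `read_record` are hypothetical.

```python
import io
import struct

# Toy illustration of the RecordIO idea: records are concatenated into one
# binary stream, each prefixed with its length, and an index of byte offsets
# lets a reader jump straight to record i. This is NOT MXNet's byte format.

def pack_records(records):
    """Concatenate byte strings into one buffer; return (blob, offsets)."""
    buf = io.BytesIO()
    offsets = []
    for rec in records:
        offsets.append(buf.tell())              # byte position where this record starts
        buf.write(struct.pack("<I", len(rec)))  # 4-byte little-endian length prefix
        buf.write(rec)                          # raw payload (e.g. encoded JPEG bytes)
    return buf.getvalue(), offsets

def read_record(blob, offsets, i):
    """Seek to record i via the offset index and return its payload."""
    stream = io.BytesIO(blob)
    stream.seek(offsets[i])
    (length,) = struct.unpack("<I", stream.read(4))
    return stream.read(length)

blob, offsets = pack_records([b"cat.jpg bytes", b"dog", b""])
print(read_record(blob, offsets, 1))  # b'dog'
```

Because every record's start offset is known, a reader can fetch any single example without scanning the file from the beginning, which is the read-efficiency benefit the documentation refers to.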

For a simple tutorial, assume all images are stored as individual image files such as .png or .jpg, and images belonging to the same class are placed in the same directory. All these class directories are then in the same root img_data directory. Our goal is to generate two files, mydata_train.rec for training and mydata_val.rec for validation, where the former contains 95% of the images.

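In short: the .png or .jpg images live under a single root directory, img_data, with one subdirectory per class. The goal is to produce two files, mydata_train.rec for training and mydata_val.rec for validation, with 95% of all images going to the training file.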
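The directory layout above implies one integer label per class folder. The Python sketch below shows one way to derive (path, label) pairs from such a tree, assuming class folders are numbered in sorted-name order and only files directly inside each class folder count. `collect_images` is a hypothetical helper, not part of im2rec.py, and im2rec.py's own labeling rule may differ in details (for example with nested folders).

```python
import os
import tempfile

IMAGE_EXTS = (".jpg", ".jpeg", ".png")

def collect_images(root):
    """Return (relative_path, label) pairs for root/<class_dir>/<image>.

    Class directories are sorted by name and numbered 0, 1, 2, ...
    Only files directly inside each class directory are considered.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    items = []
    for label, cls in enumerate(classes):
        for name in sorted(os.listdir(os.path.join(root, cls))):
            if name.lower().endswith(IMAGE_EXTS):  # skip non-image files
                items.append((os.path.join(cls, name), label))
    return items

# Demo: build a throwaway img_data-style tree with two classes.
with tempfile.TemporaryDirectory() as img_data:
    for rel in ("cat/a.jpg", "dog/b.png", "dog/notes.txt"):
        path = os.path.join(img_data, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()  # empty placeholder file
    demo = collect_images(img_data)

print(demo)  # [('cat/a.jpg', 0), ('dog/b.png', 1)] on POSIX systems
```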

We first prepare two .lst files, which consist of the labels and image paths and can be used for generating rec files.

That is, first prepare two .lst files; each one lists image labels and image paths and is used to generate the corresponding .rec file.

python tools/im2rec.py --list True --recursive True --train-ratio 0.95 mydata img_data
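To show what the list step produces, here is a Python sketch that shuffles (path, label) pairs, splits them 95/5, and writes mydata_train.lst and mydata_val.lst. It assumes the tab-separated layout I understand im2rec.py to use (integer index, label written as a float, relative path); `write_lst` and its fixed `seed` are hypothetical, and im2rec.py's own shuffle will give a different order.

```python
import os
import random
import tempfile

def write_lst(items, prefix, train_ratio=0.95, seed=100):
    """Shuffle (relative_path, label) pairs and write <prefix>_train.lst and
    <prefix>_val.lst. Each line is: index<TAB>label<TAB>relative_path.
    Returns the number of lines written to each file."""
    rows = [(i, path, label) for i, (path, label) in enumerate(items)]
    random.Random(seed).shuffle(rows)   # deterministic shuffle
    sep = int(len(rows) * train_ratio)  # first 95% -> train, rest -> val
    splits = {"train": rows[:sep], "val": rows[sep:]}
    for name, part in splits.items():
        with open(f"{prefix}_{name}.lst", "w") as f:
            for idx, path, label in part:
                # label formatted as a float, e.g. 1.000000
                f.write(f"{idx}\t{float(label):f}\t{path}\n")
    return {name: len(part) for name, part in splits.items()}

with tempfile.TemporaryDirectory() as tmp:
    items = [(f"class{i % 2}/img{i}.jpg", i % 2) for i in range(100)]
    counts = write_lst(items, os.path.join(tmp, "mydata"))
    with open(os.path.join(tmp, "mydata_val.lst")) as f:
        val_fields = f.readline().rstrip("\n").split("\t")

print(counts)  # {'train': 95, 'val': 5}
```

Keeping the original index in the first column means each record can still be traced back to its position in the full list after shuffling.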

Then we generate the .rec files. We resize the images such that the short edge is at least 480px and save them with 95/100 quality. We also use 16 threads to accelerate the packing.

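Next, generate the .rec files. Here the images are resized so that the short edge is 480px, saved at JPEG quality 95, and 16 threads are used to speed up packing.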

python tools/im2rec.py --resize 480 --quality 95 --num-thread 16 mydata img_data
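The `--resize 480` option scales each image by its shorter side. A minimal Python sketch of that arithmetic, assuming the aspect ratio is preserved and the longer side is computed with integer floor division (im2rec.py may round slightly differently); `short_edge_resize` is a hypothetical name:

```python
def short_edge_resize(width, height, short=480):
    """Return (new_w, new_h) so the shorter side equals `short`,
    scaling the longer side by the same factor (floor division)."""
    if width < height:
        return short, height * short // width
    return width * short // height, short

print(short_edge_resize(1000, 500))  # (960, 480)
print(short_edge_resize(375, 500))   # (480, 640)
```

With this rule the shorter side becomes exactly 480 and the longer side is never below 480, which matches the documentation's "at least 480px" for the short edge.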
