Reference: https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.recordio.MXRecordIO
MXRecordIO reads and writes data in the RecordIO format and supports sequential reads and writes.
>>> import mxnet as mx
>>> record = mx.recordio.MXRecordIO('tmp.rec', 'w')
>>> for i in range(5):
...     record.write('record_%d'%i)
>>> record.close()
>>> record = mx.recordio.MXRecordIO('tmp.rec', 'r')
>>> for i in range(5):
...     item = record.read()
...     print(item)
record_0
record_1
record_2
record_3
record_4
>>> record.close()
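The loop above hard-codes the number of records. A minimal sketch of reading until the file is exhausted follows, assuming `read()` returns `None` once no records remain (which is how I understand `MXRecordIO.read` to behave). `iter_records` and `FakeReader` are hypothetical names introduced only for illustration; `FakeReader` stands in for an opened reader so the sketch runs without MXNet.

```python
def iter_records(record):
    """Yield raw records from any object whose read() returns None at end of file."""
    while True:
        item = record.read()
        if item is None:
            break
        yield item


class FakeReader:
    """Hypothetical stand-in for an opened reader (not part of MXNet)."""

    def __init__(self, items):
        self._items = list(items)

    def read(self):
        # Return the next record, or None when nothing is left.
        return self._items.pop(0) if self._items else None


reader = FakeReader([b"record_0", b"record_1", b"record_2"])
records = [item.decode() for item in iter_records(reader)]
print(records)
```

With a real file, the same generator could be given the object returned by `mx.recordio.MXRecordIO('tmp.rec', 'r')`.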
Note: in Python 2, write takes a str, whereas in Python 3 it takes bytes.
In Python 3, the write and read calls should be written as:
record.write(b"record_%d" % i)
record.read().decode()
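A small sketch of the str/bytes round trip described in the note, for Python 3 only. The helper names `encode_record` and `decode_record` are hypothetical, and UTF-8 is an assumed encoding; the records here are plain ASCII, so any ASCII-compatible encoding would give the same bytes.

```python
def encode_record(i):
    """Build the payload as text, then encode it to bytes for write()."""
    return ("record_%d" % i).encode("utf-8")


def decode_record(raw):
    """Turn the bytes returned by read() back into a str."""
    return raw.decode("utf-8")


payload = encode_record(3)
# Formatting a bytes literal directly gives the same payload (Python 3.5+).
same_payload = b"record_%d" % 3
text = decode_record(payload)
print(payload, text)
```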
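Random-access reads and writes (MXIndexedRecordIO)
>>> # Python 2
>>> record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
>>> for i in range(5):
...     record.write_idx(i, 'record_%d'%i)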
>>> record.close()
>>> record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'r')
>>> record.read_idx(3)
record_3
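To illustrate why an index makes random access possible, here is a toy sketch that maps each key to the byte offset of its record. It is not MXNet's on-disk format: the length-prefixed layout, the in-memory buffer, and the names `write_indexed` and `read_indexed` are all assumptions made for illustration.

```python
import io
import struct


def write_indexed(buf, index, key, payload):
    """Append a length-prefixed record and remember where it starts."""
    buf.seek(0, io.SEEK_END)           # always append at the end
    index[key] = buf.tell()            # offset of this record
    buf.write(struct.pack("<I", len(payload)))
    buf.write(payload)


def read_indexed(buf, index, key):
    """Jump straight to the record for `key` using the stored offset."""
    buf.seek(index[key])
    (length,) = struct.unpack("<I", buf.read(4))
    return buf.read(length)


buf = io.BytesIO()   # stands in for the .rec file
index = {}           # stands in for the .idx file
for i in range(5):
    write_indexed(buf, index, i, b"record_%d" % i)

third = read_indexed(buf, index, 3)
print(third)
```

Reading key 3 does not touch records 0 to 2, which is the property that `read_idx` offers over sequential `read`.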
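IRHeader is a header that stores metadata. It is used together with mxnet.recordio.pack and mxnet.recordio.pack_img to pack a label and an image into a binary .rec file.
flag (int) - can be set to any value
label (float) - the label, either a single float or a 1-D float array
id (int) - a unique id
id2 (int) - usually 0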
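The four fields can be pictured as a fixed-size binary header. The sketch below uses only the standard library and assumes a layout of one unsigned 32-bit int, one 32-bit float, and two unsigned 64-bit ints (`"IfQQ"`), which I believe mirrors MXNet's implementation but should be treated as an assumption. `Header`, `HEADER_FORMAT`, and `HEADER_SIZE` are illustrative names, not the library's.

```python
import struct
from collections import namedtuple

# Illustrative stand-in for the header; field names follow the list above.
Header = namedtuple("Header", ["flag", "label", "id", "id2"])

HEADER_FORMAT = "IfQQ"  # assumed layout: uint32, float32, uint64, uint64
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

header = Header(flag=0, label=1.0, id=1, id2=0)
raw = struct.pack(HEADER_FORMAT, *header)             # header -> bytes
restored = Header(*struct.unpack(HEADER_FORMAT, raw)) # bytes -> header
print(HEADER_SIZE, restored)
```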
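mxnet.recordio.pack packs a string (bytes in Python 3) into an MXImageRecord.
mxnet.recordio.unpack unpacks an MXImageRecord back into a header and a string.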
import mxnet as mx
# pack
data = b'data'  # in Python 3 the input must be bytes
label1 = 1.0
header1 = mx.recordio.IRHeader(flag=0, label=label1, id=1, id2=0)
s1 = mx.recordio.pack(header1, data)
label2 = [1.0, 2.0, 3.0]
header2 = mx.recordio.IRHeader(flag=3, label=label2, id=2, id2=0)
s2 = mx.recordio.pack(header2, data)
# unpack
print(mx.recordio.unpack(s1))
print(mx.recordio.unpack(s2))
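The two headers above differ in how the label is stored. As a rough sketch of the idea, assuming a scalar label lives in the header itself while an array label is prepended to the payload as float32 values with `flag` recording their count: `pack_sketch` and `unpack_sketch` are hypothetical pure-Python stand-ins, not the library functions, and the `"IfQQ"` layout is an assumption.

```python
import struct
from collections import namedtuple

Header = namedtuple("Header", ["flag", "label", "id", "id2"])
HEADER_FORMAT = "IfQQ"  # assumed layout: uint32, float32, uint64, uint64
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_sketch(header, payload):
    """Serialize header + payload; array labels go in front of the payload."""
    if isinstance(header.label, (int, float)):
        header = header._replace(flag=0)
    else:
        labels = list(header.label)
        payload = struct.pack("%df" % len(labels), *labels) + payload
        header = header._replace(flag=len(labels), label=0.0)
    return struct.pack(HEADER_FORMAT, *header) + payload


def unpack_sketch(raw):
    """Inverse of pack_sketch: recover the header and the payload."""
    header = Header(*struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE]))
    body = raw[HEADER_SIZE:]
    if header.flag > 0:
        fmt = "%df" % header.flag
        size = struct.calcsize(fmt)
        header = header._replace(label=list(struct.unpack(fmt, body[:size])))
        body = body[size:]
    return header, body


h1, d1 = unpack_sketch(pack_sketch(Header(0, 1.0, 1, 0), b"data"))
h2, d2 = unpack_sketch(pack_sketch(Header(3, [1.0, 2.0, 3.0], 2, 0), b"data"))
print(h1, d1)
print(h2, d2)
```

Under these assumptions the `flag` passed in by the caller is overwritten during packing, so the value chosen for `flag` in the headers above would not affect the round trip.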
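mxnet.recordio.pack_img packs an image into an MXImageRecord.
img - the image to pack, as a numpy array
quality - JPEG quality (1-100) or PNG compression level (1-9)
img_fmt - encoding format, '.jpg' or '.png'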
import cv2
import mxnet as mx
label = 4  # label can also be a 1-D array, for example: label = [1,2,3]
record_id = 2574  # renamed from id to avoid shadowing the builtin
header = mx.recordio.IRHeader(0, label, record_id, 0)
img = cv2.imread('test.jpg')
packed_s = mx.recordio.pack_img(header, img)
mxnet.recordio.unpack_img unpacks an MXImageRecord into an image.
>>> record = mx.recordio.MXRecordIO('test.rec', 'r')
>>> item = record.read()
>>> header, img = mx.recordio.unpack_img(item)
>>> header
HEADER(flag=0, label=14.0, id=20129312, id2=0)
>>> img
array([[[ 23,  27,  45],
        [ 28,  32,  50],
        ...,
        [ 36,  40,  59],
        [ 35,  39,  58]],
       ...,
       [[ 91,  92, 113],
        [ 97,  98, 119],
        ...,
        [168, 169, 167],
        [166, 167, 165]]], dtype=uint8)