When training on the MS1M face dataset, I initially used PyTorch's ImageFolder to read the raw images. Face crops are small but extremely numerous, so the GPU gets through each batch quickly while disk IO cannot keep up, and IO ends up dominating the overall training time.
The root cause is that PyTorch has no packed data format of its own, whereas TensorFlow has TFRecord, MXNet has .rec files, and Caffe uses LMDB. The workaround is to use another framework's format for packing and reading the data, and keep PyTorch for the training itself.
First, packing the data: download the MXNet source from its GitHub repository; tools/im2rec.py is the official packing script.
The image folder is organized like this, one sub-folder per identity:
imgs
- id1 ----> images
- id2 ----> images
First generate the .lst file, which records the path of every image. Run:

python im2rec.py train_data imgs --list --recursive --num-thread=10

- --list: generate the .lst file
- --recursive: walk all sub-folders under the root
- --num-thread: number of worker threads; be sure to set this, otherwise the default of 1 is extremely slow

Each line of the resulting .lst file is tab-separated; a quick way to inspect it is sketched below.
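A minimal sketch for peeking at the first few lines (the .lst path here is an assumption matching the .rec path used later, and the exact label formatting can vary across MXNet versions):

# Peek at the first lines of the generated .lst file (path is an assumption).
# Each line is tab-separated: integer index, label, relative image path.
with open('F:/MXnet/train_data.lst') as f:
    for _, line in zip(range(3), f):
        fields = line.rstrip('\n').split('\t')
        print(fields)   # e.g. ['0', '0.000000', 'id1/xxx.jpg']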
Then generate the .rec file from the .lst file. Run:

python im2rec.py train_data imgs --num-thread=10

This produces train_data.rec and train_data.idx, which are the two files we need.
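As a quick sanity check (my own addition, not part of the packing step), the .rec/.idx pair can be opened with gluon and inspected; the path is again an assumption:

import mxnet as mx
from mxnet.gluon.data.vision import ImageRecordDataset

# Open the packed dataset and look at one decoded sample.
data = ImageRecordDataset('F:/MXnet/train_data.rec')   # the matching .idx must sit next to it
print(len(data))          # number of packed images
img, label = data[0]      # img: HWC uint8 NDArray, label: the class id stored in the record
print(img.shape, label)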
Now for the actual code. Reading is done with MXNet's gluon module, which already wraps rec-file loading; PyTorch only ever sees the converted tensors.

import mxnet as mx
from mxnet.gluon.data.vision import ImageRecordDataset
from mxnet.gluon.data import DataLoader
import torch
import numpy as np
from PIL import Image
from torchvision import transforms

def load_mx_rec():
    # ImageRecordDataset reads the packed .rec file directly (the .idx file must sit alongside it)
    data = ImageRecordDataset('F:/MXnet/train_data.rec')
    train_loader = DataLoader(data, batch_size=4, shuffle=False)
    train_transform = transforms.Compose([
        transforms.Resize([int(128 * 128 / 112), int(128 * 128 / 112)]),
        transforms.RandomCrop([128, 128]),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor()])
    for input, label in iter(train_loader):
        inputs = input.asnumpy()                 # MXNet NDArray -> numpy, shape (4, H, W, C), uint8
        nB = torch.rand(4, 3, 128, 128)          # buffer for the transformed PyTorch batch
        for i in range(4):
            image = Image.fromarray(inputs[i, :, :, :])   # numpy HWC -> PIL image
            image = train_transform(image)
            nB[i, :, :, :] = image
        labels = label.asnumpy()
        labels = torch.from_numpy(labels).long()

# load_mx_rec()
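The function above stops at building the batch. For completeness, here is a minimal, hypothetical sketch of how (nB, labels) would be consumed on the PyTorch side; the model, loss and optimizer are placeholders of my own, not part of the original code:

import torch
import torch.nn as nn

num_classes = 1000   # placeholder: set this to the number of identities in the dataset
# Tiny stand-in network, only to show that nB/labels plug into a normal PyTorch training step.
model = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1),
                      nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(),
                      nn.Linear(16, num_classes))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(nB, labels):
    optimizer.zero_grad()
    loss = criterion(model(nB), labels)
    loss.backward()
    optimizer.step()
    return loss.item()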
The same pipeline can also be written without PIL/torchvision, using only OpenCV and NumPy for the augmentation:

import mxnet as mx
from mxnet.gluon.data.vision import ImageRecordDataset
from mxnet.gluon.data import DataLoader
import torch
import numpy as np
import cv2
import random

def load_mx_rec_2():
    data = ImageRecordDataset('F:/MXnet/train_data.rec')
    # data1 = datasets.ImageFolder('F:/MXnet/images')   # ImageFolder version, kept only for comparison
    train_loader = DataLoader(data, batch_size=4, shuffle=False)
    # the torchvision transform pipeline from load_mx_rec is replaced by the cv2 code below
    for input, label in iter(train_loader):
        inputs = input.asnumpy()                 # shape (4, H, W, C), uint8, RGB
        nB = torch.rand(4, 3, 128, 128)          # buffer for the transformed PyTorch batch
        for i in range(4):
            image = cv2.cvtColor(inputs[i, :, :, :], cv2.COLOR_RGB2BGR)   # to BGR for OpenCV
            # resize to 146x146, then take a random 128x128 crop
            size = (int(128 * 128 / 112), int(128 * 128 / 112))
            image = cv2.resize(image, size)
            x = np.random.randint(0, int(128 * 128 / 112) - 128)
            y = np.random.randint(0, int(128 * 128 / 112) - 128)
            image = image[x:x + 128, y:y + 128]
            # random horizontal flip
            if random.choice([0, 1]) > 0:
                image = cv2.flip(image, 1)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            # HWC -> CHW, scale to [0, 1], then normalize each channel to [-1, 1]
            image = image.transpose(2, 0, 1).astype(np.float32) / 255
            image[0, :, :] = (image[0, :, :] - 0.5) / 0.5
            image[1, :, :] = (image[1, :, :] - 0.5) / 0.5
            image[2, :, :] = (image[2, :, :] - 0.5) / 0.5
            image = torch.from_numpy(image)
            nB[i, :, :, :] = image
        labels = label.asnumpy()
        labels = torch.from_numpy(labels).long()

load_mx_rec_2()
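One difference between the two versions: load_mx_rec_2 normalizes each channel to [-1, 1] with (x - 0.5) / 0.5, while the torchvision pipeline in load_mx_rec stops at ToTensor(). If the two paths are meant to produce identical tensors, the pipeline in load_mx_rec can be extended with the equivalent Normalize step (a suggestion of mine, not part of the original code):

from torchvision import transforms

# torchvision equivalent of the per-channel (x - 0.5) / 0.5 done in load_mx_rec_2
train_transform = transforms.Compose([
    transforms.Resize([int(128 * 128 / 112), int(128 * 128 / 112)]),
    transforms.RandomCrop([128, 128]),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])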