Error message:
Traceback (most recent call last):
File "/home/user1/main_arc_face.py", line 534, in <module>
main()
File "/home/user1/main_arc_face.py", line 315, in main
val_loss, prec1 = validate(val_loader, model, criterion)
File "/home/user1/main_arc_face.py", line 455, in validate
output = model(input)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/user1/arc_face/model.py", line 173, in forward
x = self.output_layer(x)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 92, in forward
return F.linear(input, self.weight, self.bias)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1406, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [16 x 86016], m2: [25088 x 512] at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/generic/THCTensorMathBlas.cu:268
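Cause: I changed the image input size for training but did not change it for validation, so training ran fine and validation raised the error. ArcFace uses 112x112 inputs for both training and testing (it may not strictly have to be 112, but the two must match), so whenever the training transform changes, the test transform has to change too. Here that means adding transforms.Resize((112, 112)) to the validation transform.
Fix: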
val_loader = torch.utils.data.DataLoader(
    CelebA(args.data,
           'val_40_att_list.txt',
           transforms.Compose([
               transforms.Resize((112, 112)),  # match the training input size
               transforms.ToTensor(),
               normalize,
           ])),
    batch_size=args.test_batch, shuffle=False,
    num_workers=args.workers, pin_memory=True)
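To make this kind of mismatch fail early with a readable message instead of deep inside a matrix multiply, a small guard can be called at the top of the validation loop. This is only a sketch: the helper name check_spatial_size and the default of 112x112 are assumptions for illustration, not part of the original script.

```python
def check_spatial_size(shape, expected=(112, 112)):
    """Raise a readable error if an (N, C, H, W) shape has the wrong H x W."""
    height_width = tuple(shape[-2:])
    if height_width != tuple(expected):
        raise ValueError(
            f"expected {expected[0]}x{expected[1]} inputs, got "
            f"{height_width[0]}x{height_width[1]}; check the Resize transform"
        )
    return True


# A plain tuple is used here; a tensor's .shape can be passed the same way.
check_spatial_size((16, 3, 112, 112))  # passes silently
```

Inside validate() it could be called as check_spatial_size(input.shape) just before output = model(input).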
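How did I figure this out?
Look at the shapes in the error: m1: [16 x 86016], m2: [25088 x 512]
86016 = 512 x 14 x 12, while 25088 = 112 x 112 x 2, which also equals 512 x 7 x 7. If the tensor entering the layer has 512 channels, then the layer was built for a 7 x 7 feature map and validation is feeding it a 14 x 12 one, so the input images must be a different size.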
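The arithmetic can be checked without running the model. The sketch below rests on assumptions the post does not show: that the backbone halves the spatial size four times with 3x3, stride-2, padding-1 convolutions and ends with 512 channels, and that the un-resized validation images are CelebA's aligned 178 x 218 crops. The helper names are hypothetical.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length of one conv layer along a single spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1


def flat_features(height, width, channels=512, stages=4):
    """Flattened feature count after `stages` stride-2 convs (assumed layout)."""
    for _ in range(stages):
        height, width = conv_out(height), conv_out(width)
    return channels * height * width, (height, width)


# 112 x 112 input: the size the Linear layer was built for.
print(flat_features(112, 112))  # (25088, (7, 7))
# 218 x 178 input (assumed un-resized CelebA): what validation produced.
print(flat_features(218, 178))  # (86016, (14, 12))
```

Under these assumptions both numbers in the error message fall out exactly: 25088 columns for the resized input and 86016 for the un-resized one.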
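Set up a small dataset for testing, and in the forward function of the model definition print the shape of the data at the line that fails (x = self.output_layer(x)). Comparing how the shapes differ between training and validation reveals the problem.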
def forward(self, x):
    x = self.input_layer(x)
    x = self.body(x)
    # print(x.shape)
    x = self.output_layer(x)
    # print(x.shape)
    y = []
    for i in range(self.num_attributes):
        classifier = getattr(self, 'classifier' + str(i).zfill(2))
        y.append(classifier(x))
    return y
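The same shapes can be captured without editing forward, by attaching a forward pre-hook to the layer that fails. The sketch below uses a tiny toy network as a stand-in for the real model (its layers and sizes are assumptions for illustration); a pre-hook is used rather than a regular forward hook because it runs before the layer's forward call, so the shape is still recorded when that call raises.

```python
import torch
import torch.nn as nn

# Toy stand-in for the real network (hypothetical, much smaller).
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1),
    nn.Flatten(),
    nn.Linear(8 * 7 * 7, 16),  # sized for 112 x 112 inputs
)

seen_shapes = []


def record_input_shape(module, inputs):
    # `inputs` is the tuple of positional arguments passed to the module.
    seen_shapes.append(tuple(inputs[0].shape))


handle = toy[-1].register_forward_pre_hook(record_input_shape)

with torch.no_grad():
    toy(torch.randn(2, 3, 112, 112))      # training-sized batch: works
    try:
        toy(torch.randn(2, 3, 218, 178))  # un-resized batch: size mismatch
    except RuntimeError as err:
        print("validation-sized batch failed:", err)

handle.remove()
print(seen_shapes)  # [(2, 392), (2, 1344)]
```

On the real model the hook would be registered on the Linear layer inside output_layer, and the two recorded shapes would correspond to the training and validation batches.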
A similar problem is described here: pytorch RuntimeError size mismatch, m1: [64 x 100], m2: [784 x 128] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:2070