Traceback (most recent call last):
  File "/opt/yyl/yolov5/train.py", line 543, in <module>
    train(hyp, opt, device, tb_writer)
  File "/opt/yyl/yolov5/train.py", line 355, in train
    results, maps, times = test.test(data_dict,
  File "/opt/yyl/yolov5/test.py", line 122, in test
    out = non_max_suppression(out, conf_thres, iou_thres, labels=lb, multi_label=True, agnostic=single_cls)
  File "/opt/yyl/yolov5/utils/general.py", line 560, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres) # NMS
  File "/root/anaconda3/envs/py38_torch16/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 42, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
RuntimeError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. 'torchvision::nms' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].
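Cause analysis:
I was working through yolov5 when this error appeared: training in the first epoch ran normally, but the test step that follows it failed.
When I first set up the environment I had switched pip to the Douban mirror and used the command from the PyTorch website: pip install torch==1.7.0+cu101 torchvision==0.8.0+cu101 torchaudio==0.7.0
However, torchvision0.8.0+cu101 could not be found, so I simply installed torchvision0.8.0 instead. The torchvision version was therefore 0.8, but the build did not match the CUDA 10.1 environment, which is why the error message does not list CUDA among the backends available for 'torchvision::nms'.
Solution:
Since torchvision0.8.0+cu101 does not exist, I followed the hint and tried torchvision0.8.1+cu101 instead. Once it installed successfully, I ran the training command again and training proceeded normally.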
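A quick way to catch this kind of mismatch before training is to compare the build tags of the two packages. The sketch below is only a heuristic: the helper names cuda_tag and builds_match are made up for illustration, and it assumes the wheels carry a local tag such as +cu101 in their version string. Wheels installed without a tag compare as equal here, so a match is not proof that the builds are compatible.

```python
from typing import Optional


def cuda_tag(version: str) -> Optional[str]:
    """Return the local build tag of a version string, e.g. 'cu101', or None."""
    _, sep, local = version.partition("+")
    return local if sep else None


def builds_match(torch_version: str, torchvision_version: str) -> bool:
    """True when both version strings carry the same build tag (or neither has one)."""
    return cuda_tag(torch_version) == cuda_tag(torchvision_version)


if __name__ == "__main__":
    try:
        import torch
        import torchvision
    except ImportError:
        print("torch/torchvision are not installed; nothing to check")
    else:
        print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
        print("torchvision:", torchvision.__version__)
        print("CUDA available:", torch.cuda.is_available())
        if not builds_match(torch.__version__, torchvision.__version__):
            print("Warning: torch and torchvision build tags differ")
```

With the versions from this post, builds_match("1.7.0+cu101", "0.8.0") is False and builds_match("1.7.0+cu101", "0.8.1+cu101") is True.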
usage: train.py [-h] [-a ARCH] [-j N] [--epochs N] [--start-epoch N] [-b N]
[--lr LR] [--momentum M] [--wd W] [-p N] [--resume PATH] [-e]
[--pretrained] [--world-size WORLD_SIZE] [--rank RANK]
[--dist-url DIST_URL] [--dist-backend DIST_BACKEND]
[--seed SEED] [--gpu GPU] [--multiprocessing-distributed]
DIR
train.py: error: the following arguments are required: DIR
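I hit this when running someone else's code. The parser already defined a default for the argument, but it had no effect:
'data', default='/opt/yyl/data/plant2018/',
This is a minor problem. Here is how argparse treats argument definitions: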
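argparse decides whether an argument is positional or optional by whether its name has a prefix (- or --). A positional argument is written without a prefix and, by default, must be supplied on the command line. The error above occurs because data has no prefix and is therefore required. Adding a prefix to data makes it optional, so when it is omitted its value is taken from default.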
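The difference can be reproduced with a small standalone parser. This is a sketch rather than the original train.py: build_parser is a hypothetical helper, and metavar="DIR" is an assumption inferred from the usage message above.

```python
import argparse


def build_parser(optional: bool) -> argparse.ArgumentParser:
    """Build a parser whose data argument is either optional or positional."""
    parser = argparse.ArgumentParser(prog="train.py")
    if optional:
        # Prefixed name: optional argument, default is used when it is omitted.
        parser.add_argument("-data", metavar="DIR", default="/opt/yyl/data/plant2018/")
    else:
        # Bare name: positional argument, required, so default never applies.
        parser.add_argument("data", metavar="DIR", default="/opt/yyl/data/plant2018/")
    return parser


if __name__ == "__main__":
    # Optional form: an empty command line falls back to the default path.
    print(build_parser(optional=True).parse_args([]).data)

    # Positional form: an empty command line makes argparse print
    # "the following arguments are required: DIR" and exit.
    try:
        build_parser(optional=False).parse_args([])
    except SystemExit as exc:
        print("argparse exited with code", exc.code)
```

By convention a multi-letter option is usually written with two dashes (--data); the single-dash form used in this post also works, and argparse stores the value as args.data in both cases.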
The modified definition:
'-data', default='/opt/yyl/data/plant2018/',
RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean",...
Unexpected key(s) in state_dict: "module.conv1.weight", "module.bn1.weight", ...
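This error appeared when loading a model with model.load_state_dict. The cause is that the parameter names of the newly created model do not match those in the trained checkpoint best.pth: every key in the checkpoint carries a module. prefix, which is what saving a model wrapped in torch.nn.DataParallel produces.
Solution: add strict=False, so that the new model takes whichever checkpoint parameters match its own names and ignores the rest.
Use model.load_state_dict(ckpt['state_dict'], strict=False)
Note that when every key differs by the module. prefix, as in the message above, no names match at all, so strict=False silences the error without loading any of the trained weights. In that case the prefix has to be removed from the checkpoint keys, or the new model wrapped the same way, before loading.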
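When the only difference between the two sets of keys is the module. prefix, renaming the checkpoint keys lets the weights load with the default strict check. The following is a minimal sketch under that assumption; strip_module_prefix is a hypothetical helper, and plain integers stand in for the tensors so the example runs without PyTorch.

```python
from collections import OrderedDict
from typing import Any, Dict, Mapping


def strip_module_prefix(state_dict: Mapping[str, Any], prefix: str = "module.") -> Dict[str, Any]:
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned: Dict[str, Any] = OrderedDict()
    for key, value in state_dict.items():
        # Only strip the prefix at the start of the key; other keys pass through.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


if __name__ == "__main__":
    # Stand-in for ckpt['state_dict']; real values would be tensors.
    saved = OrderedDict([("module.conv1.weight", 1), ("module.bn1.weight", 2)])
    print(list(strip_module_prefix(saved)))  # ['conv1.weight', 'bn1.weight']
    # Hypothetical use with the names from this post:
    #   model.load_state_dict(strip_module_prefix(ckpt['state_dict']))
```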
Full error:
Traceback (most recent call last):
  File "D:\screen_service\model_class.py", line 123, in predict
    img = transformer(img)
  File "D:\ProgramFile\Anaconda3\envs\py36\lib\site-packages\torchvision\transforms\transforms.py", line 60, in __call__
    img = t(img)
  File "D:\ProgramFile\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\ProgramFile\Anaconda3\envs\py36\lib\site-packages\torchvision\transforms\transforms.py", line 221, in forward
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "D:\ProgramFile\Anaconda3\envs\py36\lib\site-packages\torchvision\transforms\functional.py", line 336, in normalize
    tensor.sub_(mean).div_(std)
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0
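Cause: the image was read in the wrong mode. The tensor has 4 channels (an RGBA image, for example) while the mean and std passed to Normalize have 3, so every image should be converted to 'RGB' mode first.
In my code, the base64-encoded image sent with the request is decoded into an image before it reaches transformer(img), so the problem lies in the step that creates the image.
Changing Image.open(img_bytes) to Image.open(img_bytes).convert('RGB') fixes it.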
img_color = Image.open(img_bytes).convert('RGB')
# img_color = Image.open(img_bytes)
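For context, the whole path from the request payload to an RGB image might look like the sketch below. The function names and the assumption that the payload is a plain base64 string are illustrative and not taken from the original service. Pillow is imported inside load_rgb_image so that the decoding helper can be used on its own.

```python
import base64
import io


def decode_base64_image(b64_string: str) -> io.BytesIO:
    """Decode a base64 string into an in-memory file object."""
    return io.BytesIO(base64.b64decode(b64_string))


def load_rgb_image(b64_string: str):
    """Open the decoded image with Pillow and force 3-channel RGB mode."""
    from PIL import Image  # requires Pillow

    img_bytes = decode_base64_image(b64_string)
    # convert('RGB') discards the alpha channel of RGBA input and expands
    # grayscale input to 3 channels, so a 3-element mean/std in Normalize fits.
    return Image.open(img_bytes).convert("RGB")
```

Converting at load time keeps the fix in one place, so the transform pipeline never sees anything other than 3-channel images.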