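For a vegetation classification experiment, I trained the PyTorch version of DeepLabV3 on my own dataset, following this blog post for the training procedure:
https://blog.csdn.net/qq_39056987/article/details/106455828
After adjusting the configuration, I first installed the missing packages that the IDE flagged in red, and started training once the code showed no obvious errors.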
(1)RuntimeError: unexpected EOF, expected 24460 more bytes. The file might be corrupted.
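The network was unstable while the pretrained model was downloading, so the file on disk is incomplete; every later run tries to load that partial file and fails with this error.
Fix: delete the corrupted file from C:\Users\<username>\.cache\torch\hub\checkpoints, then run again so it is downloaded from scratch.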
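The cleanup can also be scripted. This is a minimal sketch, assuming the default torch hub cache location under the home directory; the function name `remove_checkpoint` and the file name `model.pth` are placeholders of mine, so substitute the file name shown in your own download log.

```python
from pathlib import Path


def remove_checkpoint(filename, cache_dir=None):
    """Delete a (possibly truncated) checkpoint file so it can be re-downloaded.

    Returns True if a file was deleted, False if nothing was found.
    """
    if cache_dir is None:
        # Assumption: the default torch hub cache location is in use.
        cache_dir = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"
    target = Path(cache_dir) / filename
    if target.is_file():
        target.unlink()
        return True
    return False


if __name__ == "__main__":
    # "model.pth" is a placeholder, not the real checkpoint name.
    print(remove_checkpoint("model.pth"))
```

After the file is gone, the next training run should trigger a fresh download.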
(2)UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead
This warning only says that these arguments will be deprecated in future versions of PyTorch. Nothing is broken.
So it can be ignored; just continue training.
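If you would rather silence the warning than ignore it, the loss has to be built with `reduction=...` instead of the two legacy flags. The sketch below shows how the old pair of flags corresponds to the new string, as I understand the mapping; `legacy_to_reduction` is a helper name of mine, not a PyTorch function.

```python
def legacy_to_reduction(size_average=None, reduce=None):
    """Translate the legacy (size_average, reduce) flags into a reduction string."""
    # None means "not given"; both legacy flags defaulted to True.
    size_average = True if size_average is None else size_average
    reduce = True if reduce is None else reduce
    if not reduce:
        return "none"  # keep the per-element losses
    return "mean" if size_average else "sum"


if __name__ == "__main__":
    print(legacy_to_reduction())                    # both flags left at their defaults
    print(legacy_to_reduction(size_average=False))  # summed instead of averaged
```

For example, a loss created with `size_average=True` would become one created with `reduction='mean'`, which is exactly what the warning suggests.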
(3)Target 4 is out of bounds.
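Cause: NUM_CLASSES was set incorrectly. Label values must lie in the range 0 to NUM_CLASSES - 1, so a target value of 4 needs at least 5 classes.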
(4)
File "train.py", line 118, in training
self.summary.visualize_image(self.writer, self.args.dataset, image, target, output, global_step)
File "F:\2021_1(4-2)\ProjectNet\VegetationClassifyNet\utils\summaries.py", line 18, in visualize_image
grid_image = make_grid(decode_seg_map_sequence(torch.max(output[:3], 1)[1].detach().cpu().numpy(),
File "F:\2021_1(4-2)\ProjectNet\VegetationClassifyNet\dataloaders\utils.py", line 8, in decode_seg_map_sequence
rgb_mask = decode_segmap(label_mask, dataset)
File "F:\2021_1(4-2)\ProjectNet\VegetationClassifyNet\dataloaders\utils.py", line 43, in decode_segmap
r[label_mask == ll] = label_colours[ll, 0]
IndexError: index 4 is out of bounds for axis 0 with size 4
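Same as above: NUM_CLASSES is defined inconsistently. Here the traceback shows that the colour table label_colours has only 4 rows while the loop reaches class index 4.
All three settings must agree with each other and match your own model.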
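Errors (3) and (4) can both be caught before training with a quick consistency check. This is only a sketch: `check_class_config` and its parameters are names I made up, the palette stands in for the rows of label_colours, and the label values would come from your own mask files.

```python
def check_class_config(num_classes, palette, label_values, ignore=()):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    # The colour table needs exactly one row per class.
    if len(palette) != num_classes:
        problems.append(
            f"palette has {len(palette)} rows but num_classes is {num_classes}"
        )
    # Every label in the masks must be a valid class index (or an ignored value).
    bad = sorted(
        v for v in set(label_values)
        if v not in ignore and not 0 <= v < num_classes
    )
    if bad:
        problems.append(f"labels outside 0..{num_classes - 1}: {bad}")
    return problems


if __name__ == "__main__":
    four_colours = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [0, 0, 128]]
    # Reproduces the situation in (4): 5 classes but only 4 colours.
    print(check_class_config(5, four_colours, [0, 1, 2, 3, 4]))
```

An empty result means the class count, the colour table, and the labels in the masks all line up.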
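(5) In training: if i % (num_img_tr // 10) == 0
This is a division-by-zero error. I did not fully work out what num_img_tr stands for; it appears to be the number of training batches per epoch, since it grows with the amount of data. Whenever it is below 10, num_img_tr // 10 evaluates to 0 and the modulo fails. With enough data the error no longer occurs.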
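If adding more data is not an option, the interval can be clamped so it never reaches zero. A minimal sketch, assuming num_img_tr is the batch count per epoch; `log_interval` is a hypothetical helper, not part of the training code.

```python
def log_interval(num_batches, parts=10):
    """Interval between log steps, never smaller than 1."""
    return max(1, num_batches // parts)


# With only 3 batches, 3 // 10 would be 0; the clamp turns it into 1.
num_img_tr = 3
interval = log_interval(num_img_tr)
logged = [i for i in range(num_img_tr) if i % interval == 0]
print(interval, logged)
```

In train.py the condition would then read `if i % max(1, num_img_tr // 10) == 0`, which logs on every batch for tiny datasets and about ten times per epoch otherwise.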
(6) Error raised after the first round of training completes
File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 947, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DeepLab' object has no attribute 'module'
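At first I assumed this was an environment problem, but changing one thing after another made no difference. It is also not
the more common "'module' object has no attribute 'XXX'" problem. I finally found the fix, with tears of relief:
change 'state_dict': self.model.module.state_dict() to 'state_dict': self.model.state_dict(). The .module attribute only exists when the model is wrapped in torch.nn.DataParallel; here the model is a plain DeepLab object, so the attribute lookup fails.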
(1)Error(s) in loading state_dict
Traceback (most recent call last):
File "demo.py", line 101, in <module>
main()
File "demo.py", line 66, in main
model.load_state_dict(ckpt['state_dict'])
File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DeepLab:
size mismatch for decoder.last_conv.8.weight: copying a param with shape torch.Size([5, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([2, 256, 1, 1]).
size mismatch for decoder.last_conv.8.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([2]).
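Yet another class-count mismatch?! The checkpoint was trained with 5 classes, but the test script builds a model with 2.
Change the default of --num_classes in the model test script (demo.py) to your own number of classes: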
parser.add_argument('--num_classes', type=int, default=5,
                    help='number of classes')
(2)AssertionError: Torch not compiled with CUDA enabled
Traceback (most recent call last):
File "demo.py", line 102, in <module>
main()
File "demo.py", line 68, in main
model = model.cuda()
File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 491, in cuda
return self._apply(lambda t: t.cuda(device))
File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 387, in _apply
module._apply(fn)
File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 387, in _apply
module._apply(fn)
File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 387, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 409, in _apply
param_applied = fn(param)
File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 491, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "E:\Python\lib\site-packages\torch\cuda\__init__.py", line 164, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
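I searched around, and many people attribute this to incompatible PyTorch and CUDA versions, or to having installed a torch build without CUDA support. I did not try reconfiguring the versions.
The simplest change that solved it for me: in model=model.cuda(),
delete the .cuda() call. The model then runs on the CPU.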
END---------------------------------------------------------
Have you debugged today?