pytorch-deeplab V3+ Training on a Custom Dataset: Problem Summary

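I was running a vegetation classification experiment, training the PyTorch version of DeepLabV3+ on my own dataset. For the training procedure I followed this blogger's post: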
https://blog.csdn.net/qq_39056987/article/details/106455828

I. Installing packages

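After adjusting the various configuration settings, I first installed every missing package that the IDE flagged in red, and started training once the code showed no obvious errors.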

II. Training

(1) RuntimeError: unexpected EOF, expected 24460 more bytes. The file might be corrupted.
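The network was unstable while the pretrained model was downloading, so the file was only partially saved, and later runs fail with this error.
Fix: delete the corrupted file from C:\Users\<username>\.cache\torch\hub\checkpoints and download it again.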
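
If you would rather do this from a script, here is a minimal sketch. It assumes you have not redirected the hub directory with torch.hub.set_dir(), and it locates the cache the way torch.hub does by default: $TORCH_HOME, else $XDG_CACHE_HOME/torch, else ~/.cache/torch, each followed by hub/checkpoints. It then deletes cached .pth files so the next run downloads them again. The function names are hypothetical. If in doubt, print torch.hub.get_dir() and pass its checkpoints subfolder explicitly.

```python
import os
from pathlib import Path


def default_checkpoint_dir() -> Path:
    """Default torch.hub weight cache (does not account for torch.hub.set_dir())."""
    torch_home = os.environ.get("TORCH_HOME")
    if torch_home is None:
        cache_home = os.environ.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
        torch_home = os.path.join(cache_home, "torch")
    return Path(os.path.expanduser(torch_home)) / "hub" / "checkpoints"


def remove_cached_checkpoints(checkpoint_dir=None, names=None):
    """Delete cached weight files so the next run downloads them again.

    names: specific file names to delete (e.g. the file that failed to load);
    if omitted, every *.pth file in the directory is deleted.
    Returns the list of paths that were removed.
    """
    directory = Path(checkpoint_dir) if checkpoint_dir else default_checkpoint_dir()
    if not directory.is_dir():
        return []
    if names:
        candidates = [directory / name for name in names]
    else:
        candidates = sorted(directory.glob("*.pth"))
    removed = []
    for path in candidates:
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Calling remove_cached_checkpoints() with no arguments clears every cached .pth file, which forces all pretrained weights to be downloaded again. Passing names=[...] restricts the deletion to the broken file.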

(2)UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead

This warning only tells you that these arguments will be deprecated in a future version of PyTorch, so it is safe to ignore for now. Just keep training.
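
To get rid of the warning instead of ignoring it, replace the deprecated flags with the equivalent reduction string. Below is a hypothetical helper that sketches the mapping as PyTorch documents it for the legacy arguments. Both flags default to True. reduce=False means per-element losses, and otherwise size_average chooses between 'mean' and 'sum'.

```python
def legacy_to_reduction(size_average=None, reduce=None):
    """Translate deprecated size_average/reduce flags into a reduction= string."""
    if size_average is None:
        size_average = True
    if reduce is None:
        reduce = True
    if not reduce:
        return "none"  # one loss value per element, no averaging or summing
    return "mean" if size_average else "sum"


print(legacy_to_reduction(size_average=True))  # mean
```

For example, nn.CrossEntropyLoss(weight=w, size_average=True) becomes nn.CrossEntropyLoss(weight=w, reduction="mean"). Where the loss is constructed depends on your copy of the code.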

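(3) Target 4 is out of bounds.
Cause: NUM_CLASSES was defined incorrectly. A label mask contains class id 4, but fewer than 5 classes were declared.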
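
The loss expects every target value to lie in 0 .. NUM_CLASSES - 1, apart from an ignore index. You can confirm this before training by collecting the distinct values in your label masks and comparing them with NUM_CLASSES. The sketch below works on any iterable of pixel values. The function name is hypothetical, ignore_index=255 is an assumed convention for "void" pixels, and reading the mask files is left out.

```python
from collections.abc import Iterable


def out_of_range_labels(pixels: Iterable[int], num_classes: int, ignore_index: int = 255) -> list[int]:
    """Return the distinct label values that CrossEntropyLoss would reject.

    Valid targets are 0 .. num_classes - 1; ignore_index is skipped.
    """
    seen = set(pixels)
    seen.discard(ignore_index)
    return sorted(v for v in seen if v < 0 or v >= num_classes)


# Hypothetical mask flattened to a list, containing labels 0-4 and one void pixel.
mask_pixels = [0, 0, 1, 2, 3, 4, 255]
print(out_of_range_labels(mask_pixels, num_classes=4))  # [4] -> NUM_CLASSES should be 5
print(out_of_range_labels(mask_pixels, num_classes=5))  # []
```

To run it on a real mask file, something like out_of_range_labels(np.asarray(Image.open(path)).ravel().tolist(), NUM_CLASSES) would supply the pixel values, assuming NumPy and PIL are installed.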

(4) IndexError: index 4 is out of bounds for axis 0 with size 4

File "train.py", line 118, in training
    self.summary.visualize_image(self.writer, self.args.dataset, image, target, output, global_step)
  File "F:\2021_1(4-2)\ProjectNet\VegetationClassifyNet\utils\summaries.py", line 18, in visualize_image
    grid_image = make_grid(decode_seg_map_sequence(torch.max(output[:3], 1)[1].detach().cpu().numpy(),
  File "F:\2021_1(4-2)\ProjectNet\VegetationClassifyNet\dataloaders\utils.py", line 8, in decode_seg_map_sequence
    rgb_mask = decode_segmap(label_mask, dataset)
  File "F:\2021_1(4-2)\ProjectNet\VegetationClassifyNet\dataloaders\utils.py", line 43, in decode_segmap
    r[label_mask == ll] = label_colours[ll, 0]
IndexError: index 4 is out of bounds for axis 0 with size 4

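Same cause as above: NUM_CLASSES is inconsistent. The class count is set in three places:

  1. NUM_CLASSES in dataloaders/datasets/<your dataset>.py
  2. the number of colours (one per class) defined in get_invoice_lables() in dataloaders/utils.py
  3. the number of classes used in decode_segmap() in dataloaders/utils.py

All three must agree with each other and with the number of classes your model is built for.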
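
The traceback shows decode_segmap() looking up class id 4 in a colour table with only 4 rows (ids 0 to 3). A small guard run once at start-up turns this mismatch into a clear message. In the sketch below, NUM_CLASSES, PALETTE and the function names are hypothetical placeholders for the values in your dataset file and in get_invoice_lables(). The RGB values are made up.

```python
# Hypothetical values: substitute NUM_CLASSES from dataloaders/datasets/<your dataset>.py
# and the colour table from get_invoice_lables(); these RGB triples are made up.
NUM_CLASSES = 5
PALETTE = [
    (0, 0, 0),      # class 0
    (0, 128, 0),    # class 1
    (128, 128, 0),  # class 2
    (0, 0, 128),    # class 3
    (128, 0, 128),  # class 4
]


def check_class_config(num_classes, palette, decode_n_classes):
    """Raise ValueError unless all three class counts agree."""
    if len(palette) != num_classes:
        raise ValueError(f"{len(palette)} colours defined but NUM_CLASSES = {num_classes}")
    if decode_n_classes != num_classes:
        raise ValueError(f"decode_segmap() uses {decode_n_classes} classes but NUM_CLASSES = {num_classes}")


def decode_mask(mask, palette):
    """Pure-Python stand-in for decode_segmap(): map each class id to its RGB colour.

    A class id with no colour raises IndexError, analogous to the traceback above.
    """
    return [[palette[label] for label in row] for row in mask]


check_class_config(NUM_CLASSES, PALETTE, decode_n_classes=5)
print(decode_mask([[0, 4], [1, 2]], PALETTE))
# [[(0, 0, 0), (128, 0, 128)], [(0, 128, 0), (128, 128, 0)]]
```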

(5) ZeroDivisionError in training, at: if i % (num_img_tr // 10) == 0
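This is a division-by-zero error. I did not work out what num_img_tr represents, but it seems to grow with the amount of data, and once there is enough data the error no longer occurs.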
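
For context: in the upstream train.py this project appears to be based on, num_img_tr is set to len(self.train_loader), the number of batches per epoch. The condition is meant to visualize predictions about 10 times per epoch. With fewer than 10 batches, num_img_tr // 10 is 0 and i % 0 raises ZeroDivisionError. That explains why more data, or a smaller batch size, makes the error disappear. Clamping the interval to at least 1 avoids it regardless of dataset size. The helper names below are hypothetical.

```python
def visualize_interval(num_batches: int, times_per_epoch: int = 10) -> int:
    """Batches between two visualizations; never 0, so i % interval is safe."""
    return max(1, num_batches // times_per_epoch)


def visualized_batches(num_batches: int, times_per_epoch: int = 10) -> list[int]:
    """Batch indices i for which the training loop would call visualize_image()."""
    interval = visualize_interval(num_batches, times_per_epoch)
    return [i for i in range(num_batches) if i % interval == 0]


# In train.py the condition would become:
#     if i % max(1, num_img_tr // 10) == 0:
print(visualized_batches(4))   # [0, 1, 2, 3]: a tiny dataset no longer crashes
print(visualized_batches(25))  # every 2nd batch: [0, 2, 4, ..., 24]
```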

(6) Error after training completes once

File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DeepLab' object has no attribute 'module'

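At first I thought it was an environment configuration problem and changed all sorts of things, none of which helped. It was not the more common "'module' object has no attribute 'XXX'" error either. I finally found the fix, tears of joy: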
Change 'state_dict': self.model.module.state_dict() to 'state_dict': self.model.state_dict(). The .module attribute only exists when the model is wrapped in torch.nn.DataParallel.
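
If the same train.py is sometimes run with a DataParallel wrapper and sometimes without one, a helper that unwraps only when needed keeps the checkpoint-saving line working in both cases. This is a sketch with a hypothetical name. So that it runs without torch, it uses a duck-typed getattr check instead of isinstance(model, nn.DataParallel). It would only misfire if your own network had a submodule literally named module.

```python
def unwrap_model(model):
    """Return the underlying network.

    DataParallel-style wrappers keep the real model in `.module`;
    a bare model has no such attribute, so it is returned unchanged.
    """
    return getattr(model, "module", model)


# Hypothetical use in the checkpoint-saving code of train.py:
#     'state_dict': unwrap_model(self.model).state_dict(),
```

Saving the unwrapped model also keeps the "module." prefix out of the state_dict keys, so the checkpoint loads directly into a bare model at test time.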

III. Model testing

(1) Error(s) in loading state_dict

Traceback (most recent call last):
  File "demo.py", line 101, in <module>
    main()
  File "demo.py", line 66, in main
    model.load_state_dict(ckpt['state_dict'])
  File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DeepLab:
        size mismatch for decoder.last_conv.8.weight: copying a param with shape torch.Size([5, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([2, 256, 1, 1]).
        size mismatch for decoder.last_conv.8.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([2]).

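A parameter mismatch yet again?! The checkpoint was trained with 5 classes, but the test script builds the model with 2.
Fix: in the model-testing script, set the default of --num_classes to your own number of classes: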

    parser.add_argument('--num_classes', type=int, default=5,
                        help='number of classes')
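
The error message already tells you how many classes the checkpoint was trained with: the first dimension of decoder.last_conv.8.weight (5 here). Rather than keeping the default in sync by hand, you could read that number from the checkpoint. The helper below is hypothetical. It accepts any state_dict whose values have a .shape, such as the tensors in ckpt['state_dict']. It also checks for a "module."-prefixed key in case the checkpoint was saved from a DataParallel model.

```python
CLASSIFIER_KEY = "decoder.last_conv.8.weight"  # final 1x1 conv, name taken from the error message


def num_classes_from_state_dict(state_dict, key=CLASSIFIER_KEY):
    """Return the number of output channels (= classes) of the final classifier conv."""
    for name in (key, "module." + key):
        if name in state_dict:
            return int(state_dict[name].shape[0])
    raise KeyError(f"{key!r} not found in state_dict")


# Hypothetical use in demo.py, before the model is built
# (checkpoint_path stands for however your script names the checkpoint file):
#     ckpt = torch.load(checkpoint_path, map_location="cpu")
#     num_classes = num_classes_from_state_dict(ckpt["state_dict"])
#     ...then use num_classes wherever args.num_classes was used.
```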

(2) AssertionError: Torch not compiled with CUDA enabled

Traceback (most recent call last):
  File "demo.py", line 102, in <module>
    main()
  File "demo.py", line 68, in main
    model = model.cuda()
  File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 491, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 387, in _apply
    module._apply(fn)
  File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 387, in _apply
    module._apply(fn)
  File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 387, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 409, in _apply
    param_applied = fn(param)
  File "E:\Python\lib\site-packages\torch\nn\modules\module.py", line 491, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "E:\Python\lib\site-packages\torch\cuda\__init__.py", line 164, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

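After some searching, I found that many people attribute this to mismatched PyTorch and CUDA versions, or to an installed torch build without CUDA support. I did not try reconfiguring the versions. The simplest fix, and the one that solved my problem, was to delete the .cuda() from model = model.cuda(), so that the model runs on the CPU.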
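
A slightly more general variant picks the device once and moves the model and its inputs with .to(device). It assumes you are happy to fall back to the CPU whenever CUDA is unavailable. Also note that a checkpoint saved during GPU training generally needs map_location when it is loaded on a CPU-only install. Without it, torch.load fails while trying to restore CUDA tensors. The helper name is hypothetical. torch.cuda.is_available(), torch.device, .to() and torch.load(..., map_location=...) are standard PyTorch calls.

```python
def choose_device(cuda_available: bool, force_cpu: bool = False) -> str:
    """Return "cuda" only when CUDA is usable and not overridden, else "cpu"."""
    return "cuda" if cuda_available and not force_cpu else "cpu"


# Hypothetical use in demo.py:
#     device = torch.device(choose_device(torch.cuda.is_available()))
#     ckpt = torch.load(checkpoint_path, map_location=device)
#     model.load_state_dict(ckpt["state_dict"])
#     model = model.to(device)   # replaces model = model.cuda()
#     image = image.to(device)   # inputs must live on the same device as the model
print(choose_device(False))  # cpu
```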

END---------------------------------------------------------
Have you debugged today?