A Summary of Problems Encountered When Training with MMDetection

label = self.cat2label[name] raises KeyError

  • The error message is as follows:

KeyError: Traceback (most recent call last):
  File "/home/dreamtech/.conda/envs/AugFPN2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/dreamtech/.conda/envs/AugFPN2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/dreamtech/.conda/envs/AugFPN2/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 159, in __getitem__
    data = self.prepare_train_img(idx)
  File "/home/dreamtech/.conda/envs/AugFPN2/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 187, in prepare_train_img
    ann = self.get_ann_info(idx)
  File "/home/dreamtech/.conda/envs/AugFPN2/lib/python3.7/site-packages/mmdet/datasets/xml_style.py", line 44, in get_ann_info
    label = self.cat2label[name]
KeyError: 'sly'

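  • This problem occurs in older versions of MMDetection when you modify the config file to train on your own VOC-format dataset. The cause is not the training-set labels. The traceback shows that the code being executed is the copy of mmdet installed in site-packages, so edits to VOC.py, class_names.py, and the config file take effect only after you re-run pip install . or python setup.py install.
  • This problem does not occur in newer versions of MMDetection.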
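
To make the failure mode concrete, here is a minimal sketch of the lookup that fails and of two checks that can help narrow it down. It is not MMDetection's own code: the CLASSES tuple, the helper names, and the sample annotation are all hypothetical, and the exact label indexing used by MMDetection varies between versions.

```python
import importlib.util
import xml.etree.ElementTree as ET

# Hypothetical class tuple; replace it with the classes of your own dataset.
CLASSES = ("sly", "cat")


def build_cat2label(classes):
    """Map each class name to an integer index (illustrative indexing only)."""
    return {name: i for i, name in enumerate(classes)}


def unknown_names(xml_text, cat2label):
    """Return object names in one VOC-style annotation that cat2label lacks.

    Any name returned here would raise KeyError in cat2label[name].
    """
    root = ET.fromstring(xml_text)
    names = [obj.findtext("name") for obj in root.iter("object")]
    return sorted({n for n in names if n not in cat2label})


def installed_location(package="mmdet"):
    """Return the file Python would import the package from, or None."""
    spec = importlib.util.find_spec(package)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    sample = (
        "<annotation>"
        "<object><name>sly</name></object>"
        "<object><name>dog</name></object>"
        "</annotation>"
    )
    print(unknown_names(sample, build_cat2label(CLASSES)))
    # If this path is under site-packages, edits made in the source tree
    # are not picked up until the package is reinstalled.
    print(installed_location("mmdet"))
```

If installed_location points into site-packages while your edits live in a cloned source tree, the class list being used at training time is the old one, which matches the situation described above.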

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

  • The error message is as follows:

Traceback (most recent call last):
  File "tools/train.py", line 196, in <module>
    main()
  File "tools/train.py", line 192, in main
    meta=meta)
  File "/clusters/data_1080Ti_0/liudong/mmdetection/mmdet/apis/train.py", line 209, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/liudong/.conda/envs/liudong-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/liudong/.conda/envs/liudong-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/home/liudong/.conda/envs/liudong-mmlab/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/liudong/.conda/envs/liudong-mmlab/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 56, in after_train_iter
    runner.outputs['loss'].backward()
  File "/home/liudong/.conda/envs/liudong-mmlab/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/liudong/.conda/envs/liudong-mmlab/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

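  • The versions in use when the error occurred were mmdet=2.20.0 and mmcv-full=1.4.4.
  • According to discussions on GitHub, downgrading mmcv-full to 1.4.2 resolves the problem.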
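
Before downgrading, it can help to confirm which versions are actually installed in the active environment. The following is a small sketch using only the standard library; the helper names are hypothetical, and importlib.metadata requires Python 3.8 or later (on Python 3.7, as in the traceback above, the importlib_metadata backport offers the same functions).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def as_tuple(ver):
    """Turn a version string such as '1.4.4' into (1, 4, 4) for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", ver)[:3])


def needs_downgrade(current, target="1.4.2"):
    """True when an installed version is newer than the target version."""
    return current is not None and as_tuple(current) > as_tuple(target)


if __name__ == "__main__":
    for name in ("mmdet", "mmcv-full"):
        print(name, installed_version(name))
    print(needs_downgrade(installed_version("mmcv-full")))
```

The target of 1.4.2 is simply the version reported above as working with mmdet 2.20.0; other combinations may need a different target.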
