Notes on errors hit when swapping the YOLOv5 backbone for a ShuffleBlock network and running transfer training, with fixes

Preface: I have been learning YOLOv5 recently, and these notes record some of the errors I ran into.

1. Tensor size mismatch

Sizes of tensors must match except in dimension 1. Expected size 16 but got size 8 for tensor number 1 in the list.

The full traceback:

Traceback (most recent call last):
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('I:/GraduationProject/yolov5-5.0-sniperitf798/train.py', wdir='I:/GraduationProject/yolov5-5.0-sniperitf798')
  File "I:\DevSoftware\Python\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "I:\DevSoftware\Python\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "I:/GraduationProject/yolov5-5.0-sniperitf798/train.py", line 543, in <module>
    train(hyp, opt, device, tb_writer)
  File "I:/GraduationProject/yolov5-5.0-sniperitf798/train.py", line 88, in train
    model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\models\yolo.py", line 93, in __init__
    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])  # forward
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\models\yolo.py", line 123, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\models\yolo.py", line 139, in forward_once
    x = m(x)  # run
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\models\common.py", line 210, in forward
    return torch.cat(x, self.d)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 8 for tensor number 1 in the list.

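My initial guess was that the network structure in the model config was at fault. The failing call is torch.cat inside Concat (models/common.py, line 210), which joins feature maps along dimension 1 (channels), so every other dimension, including height and width, must be equal. Here one input has size 16 and the other size 8.
The configuration that raised the error: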

backbone:
  # [from, number, module, args]
  # Shuffle_Block: [out, stride]
  [[ -1, 1, conv_bn_relu_maxpool, [ 32 ] ], # 0-P2/4
   [ -1, 1, Shuffle_Block, [ 128, 2 ] ],  # 1-P3/8
   [ -1, 3, Shuffle_Block, [ 128, 1 ] ],  # 2
   [ -1, 1, Shuffle_Block, [ 256, 2 ] ],  # 3-P4/16
   [ -1, 7, Shuffle_Block, [ 256, 1 ] ],  # 4
   [ -1, 1, Shuffle_Block, [ 512, 2 ] ],  # 5-P5/32
   [ -1, 3, Shuffle_Block, [ 512, 1 ] ],  # 6
   [ -1, 1, Shuffle_Block, [ 1024, 2 ] ],  # 7-P6/64
#   [ -1, 3, Shuffle_Block, [ 1024, 1 ] ],  # 8
  ]

# YOLOv5 v5.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]], # 7
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],# 8
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4 # 9
   [-1, 3, C3, [512, False]],  # 10

   [-1, 1, Conv, [256, 1, 1]], # 11
   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 12
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3 # 13
   [-1, 3, C3, [256, False]],  # 14 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]], # 15
   [[-1, 14], 1, Concat, [1]],  # cat head P4 # 16
   [-1, 3, C3, [512, False]],  # 17(P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],# 18
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 20 (P5/32-large)

   [[15, 18, 21], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

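Solution: append Conv, SPP, and C3 layers to the end of the backbone. This gives the backbone 10 layers (0-9), the same count as the stock YOLOv5 v5.0 backbone, so the absolute layer indices in the head's Concat entries presumably point at feature maps of matching resolution again.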

# YOLOv5  backbone
backbone:
  # [from, number, module, args]
  # Shuffle_Block: [out, stride]
  [[ -1, 1, conv_bn_relu_maxpool, [ 32 ] ], # 0-P2/4
   [ -1, 1, Shuffle_Block, [ 128, 2 ] ],  # 1-P3/8
   [ -1, 3, Shuffle_Block, [ 128, 1 ] ],  # 2
   [ -1, 1, Shuffle_Block, [ 256, 2 ] ],  # 3-P4/16
   [ -1, 7, Shuffle_Block, [ 256, 1 ] ],  # 4
   [ -1, 1, Shuffle_Block, [ 512, 2 ] ],  # 5-P5/32
   [ -1, 3, Shuffle_Block, [ 512, 1 ] ],  # 6
   [ -1, 1, Conv, [ 1024, 3, 2 ] ],  # 7-P6/64
   [ -1, 1, SPP, [ 1024, [ 5, 9, 13 ] ] ],# 8
   [ -1, 3, C3, [ 1024, False ] ],  # 9
  ]
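
To check a layout like this before launching training, the spatial size of each layer can be traced by hand or with a few lines of Python. The sketch below is not part of YOLOv5: `trace_sizes` and the op labels are hypothetical names. It assumes that the stem reduces the input by 4 and every stride-2 module halves it (as the P2/4 ... comments suggest), that the head is the one posted above and is unchanged, and that the dummy input is 256 pixels wide.

```python
def trace_sizes(layers, input_size=256):
    """Trace feature-map width through a list of (from, op) layer specs.

    op is one of 'down4', 'down2', 'up2', 'same', 'concat' (hypothetical labels).
    Returns (sizes, mismatches); mismatches holds (layer_index, input_sizes).
    """
    ops = {
        "down4": lambda s: s // 4,
        "down2": lambda s: s // 2,
        "up2": lambda s: s * 2,
        "same": lambda s: s,
    }
    sizes, mismatches = [], []
    for i, (src, op) in enumerate(layers):
        if op == "concat":
            # -1 means the previous layer; other values are absolute indices
            inputs = [sizes[i - 1] if j == -1 else sizes[j] for j in src]
            if len(set(inputs)) > 1:
                mismatches.append((i, inputs))
            sizes.append(inputs[0])
        else:
            if i == 0:
                prev = input_size
            else:
                prev = sizes[i - 1] if src == -1 else sizes[src]
            sizes.append(ops[op](prev))
    return sizes, mismatches


# Fixed backbone (layers 0-9), transcribed under the assumptions above
backbone = [
    (-1, "down4"),  # 0 conv_bn_relu_maxpool
    (-1, "down2"),  # 1 Shuffle_Block, stride 2
    (-1, "same"),   # 2
    (-1, "down2"),  # 3
    (-1, "same"),   # 4
    (-1, "down2"),  # 5
    (-1, "same"),   # 6
    (-1, "down2"),  # 7 Conv, stride 2
    (-1, "same"),   # 8 SPP
    (-1, "same"),   # 9 C3
]
# Head as posted, renumbered from 10
head = [
    (-1, "same"),          # 10 Conv 1x1
    (-1, "up2"),           # 11 Upsample
    ([-1, 6], "concat"),   # 12
    (-1, "same"),          # 13 C3
    (-1, "same"),          # 14 Conv 1x1
    (-1, "up2"),           # 15 Upsample
    ([-1, 4], "concat"),   # 16
    (-1, "same"),          # 17 C3
    (-1, "down2"),         # 18 Conv, stride 2
    ([-1, 14], "concat"),  # 19
    (-1, "same"),          # 20 C3
    (-1, "down2"),         # 21 Conv, stride 2
    ([-1, 10], "concat"),  # 22
    (-1, "same"),          # 23 C3
]

sizes, mismatches = trace_sizes(backbone + head)
print(mismatches)  # an empty list means every Concat receives equal sizes
```

Under these assumptions no Concat is flagged for the fixed layout. Any layout that is flagged would fail in torch.cat the same way as the traceback in this section.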

2. [WinError 1455] The paging file is too small (页面文件太小)

Once training started, however, a new error appeared:

train: Scanning 'VOC\labels\train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:11<00:00, 1491.29it/s]
train: New cache created: VOC\labels\train.cache
Traceback (most recent call last):
  File "", line 1, in <module>
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\train.py", line 12, in <module>
    import torch.distributed as dist
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\__init__.py", line 124, in <module>
    raise err
OSError: [WinError 1455] 页面文件太小,无法完成操作。 Error loading "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
Traceback (most recent call last):
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('I:/GraduationProject/yolov5-5.0-sniperitf798/train.py', wdir='I:/GraduationProject/yolov5-5.0-sniperitf798')
  File "I:\DevSoftware\Python\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "I:\DevSoftware\Python\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "I:/GraduationProject/yolov5-5.0-sniperitf798/train.py", line 545, in <module>
    train(hyp, opt, device, tb_writer)
  File "I:/GraduationProject/yolov5-5.0-sniperitf798/train.py", line 194, in train
    image_weights=opt.image_weights, quad=opt.quad, prefix=colorstr('train: '))
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\utils\datasets.py", line 84, in create_dataloader
    collate_fn=LoadImagesAndLabels.collate_fn4 if quad else LoadImagesAndLabels.collate_fn)
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\utils\datasets.py", line 97, in __init__
    self.iterator = super().__iter__()
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

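Cause

By default, Windows had not allocated any virtual memory (page file) on drive I:, where Python is installed, and allocating one there made the error go away. The page file is a system-wide resource, so the likely underlying cause is that the total amount of memory Windows could commit was too small: as the first traceback shows, each DataLoader worker process is spawned separately, imports torch again, and fails while loading one of its DLLs. If Python is installed on C:, increase the virtual memory size on C: instead.

Solution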

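  1. Search for and open "View advanced system settings".
    [Figure 1: screenshot not preserved]
  2. Follow the steps in the screenshot. In the standard Windows dialog this is typically: Advanced tab, Performance > Settings, Advanced tab, Virtual memory > Change, clear "Automatically manage paging file size for all drives", select drive I:, choose a custom size, click Set, then OK.
    [Figure 2: screenshot not preserved]
  3. Restart the computer and run the program again.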
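
After the restart, one way to confirm that the larger page file took effect is to query the system commit limit from Python. The sketch below is an illustration, not something from the original post: `commit_limit_gib` is a hypothetical helper that calls the Win32 function GlobalMemoryStatusEx through ctypes and returns None on any other platform.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    """Mirror of the Win32 MEMORYSTATUSEX structure."""
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),
        ("ullAvailPageFile", ctypes.c_ulonglong),
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]


def commit_limit_gib():
    """Return (commit limit, still available) in GiB, or None off Windows."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    gib = 1024 ** 3
    # ullTotalPageFile is the commit limit (RAM plus page files), not the file size alone
    return status.ullTotalPageFile / gib, status.ullAvailPageFile / gib


print(commit_limit_gib())
```

If the first number did not grow after changing the setting, the new page file size was probably not applied.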

3. OMP: Error #15: multiple copies of the OpenMP runtime have been linked into the program

With problems 1 and 2 fixed, a new problem appeared. The output was:

train: Scanning 'VOC\labels\train.cache' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:00<?, ?it/s]
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
[The same OMP Error #15 / Hint pair is printed six more times.]
Scanning images:   0%|          | 0/4952 [00:00<?, ?it/s]OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
val: Scanning 'VOC\labels\val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 4952/4952 [00:05<00:00, 846.03it/s]
val: New cache created: VOC\labels\val.cache
Plotting labels... 
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

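Solution:
Add the following code at the very top of train.py, before torch is imported. As the hint itself says, this is an unsafe, unsupported workaround that may cause crashes or silently produce incorrect results, so treat it as a stopgap.

# Work around the OMP error
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'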
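
The hint recommends ensuring that only a single OpenMP runtime is loaded. A first step in that direction is to see how many copies of libiomp5md.dll the environment contains. The sketch below is an assumption-laden illustration rather than the original author's method: `find_openmp_runtimes` is a hypothetical helper that only lists files and does not change anything.

```python
import sys
from pathlib import Path


def find_openmp_runtimes(root, name="libiomp5md.dll"):
    """Return the sorted paths of every file called `name` below `root`."""
    return sorted(p for p in Path(root).rglob(name) if p.is_file())


if __name__ == "__main__":
    # sys.prefix is the root of the active Python environment
    for path in find_openmp_runtimes(sys.prefix):
        print(path)
```

If more than one path is printed, the packages that ship those copies are the likely source of the conflict. Which copy to keep depends on the environment, so nothing is deleted here.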

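Reference:
"Completely solving OSError: [WinError 1455] The paging file is too small for this operation to complete in PyCharm (personally tested)" (original title: 彻底解决pycharm中: OSError: [WinError 1455] 页面文件太小,无法完成操作的问题–亲测)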
