Problems encountered while debugging YOLOv3/YOLOv5

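Assorted errors hit while coding

This follows my previous post, a record of learning the PyTorch version of YOLOv3 and training it on my own data.

Problems I ran into along the way (I downloaded quite a few different versions of the project...)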

  • Encountered in the .cfg-file version
    • 1. OSError: The paging file is too small for this operation to complete; BrokenPipeError; Error loading caffe2_detectron_ops_gpu.dll
    • 2. RuntimeError: CUDA out of memory.
    • 3. Still unsolved: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
  • Encountered in the .yaml-file version
    • 1. The yaml file raises AttributeError: 'str' object has no attribute 'get'
    • 2. UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position -: illegal multibyte sequence
    • 3. TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
    • 4. RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
    • 5. detect.py detects no objects at all in images, and the original yolov3.pt gives no results either
    • Recorded on 2021/3/24
    • 6. OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
    • Recorded on 2021/4/1


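Encountered in the .cfg-file version

1. OSError: The paging file is too small for this operation to complete; BrokenPipeError; Error loading caffe2_detectron_ops_gpu.dll

OSError: The paging file is too small for this operation to complete. (Original message on my Chinese-language Windows: 页面文件太小,无法完成操作。)
BrokenPipeError: [Errno 32] Broken pipe
Error loading "D:\Anaconda3\envs\py36\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.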

Set num_workers to 0.
Change it where train.py takes its arguments. If there is no such argument, change it earlier in the code, where the dataloader is created.
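
As a rough illustration of the dataloader change, here is a minimal sketch using torch.utils.data.DataLoader. The dummy tensors, shapes and batch size are made up for the example and do not come from any of the YOLO repositories; the only point is the num_workers=0 argument.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for the real dataset (hypothetical shapes).
images = torch.zeros(8, 3, 32, 32)
labels = torch.zeros(8, dtype=torch.long)
dataset = TensorDataset(images, labels)

# num_workers=0 loads every batch in the main process,
# so no worker subprocesses are started.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

num_batches = sum(1 for _ in loader)
print(num_batches)
```

Loading in the main process is slower than using workers, but it avoids starting extra processes, which is where these three errors showed up for me.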

2. RuntimeError: CUDA out of memory.

For example: RuntimeError: CUDA out of memory. Tried to allocate 1.04 GiB (GPU 0; 4.00 GiB total capacity; 86.63 MiB already allocated; 2.52 GiB free; 94.00 MiB reserved in total by PyTorch)
There is not enough GPU memory. Reduce the training batch-size, close some other processes, or restart the computer.

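3. Still unsolved: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Training works on the CPU, but running with --device 0 raises this error. I searched around and could not fix it T T. Luckily the yaml version works for me.
Leaving this here for now.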
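
This is not a fix for the .cfg project, which remains unsolved here. For reference only: in general the message means that one operation received a tensor on the GPU and another on the CPU. The sketch below shows the usual pattern of putting the model, the inputs and any tensor created during training on one device. The tiny nn.Linear model and the tensor shapes are hypothetical.

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)      # parameters now live on `device`
inputs = torch.randn(3, 4).to(device)   # inputs have to be moved as well

outputs = model(inputs)

# Tensors created inside the loop (targets, masks, ...) need the same
# device; here the tensor is created directly on it.
targets = torch.zeros(3, 2, device=device)
loss = ((outputs - targets) ** 2).mean()
```

If a tensor in the failing line was created without a device argument, that is one place to look, though I cannot confirm this is the cause in the .cfg version.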

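Encountered in the .yaml-file version

1. The yaml file raises AttributeError: 'str' object has no attribute 'get'

In my case the error came from the .yaml file of my own dataset. After I corrected the paths in it and confirmed they were right, the error went away.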
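
A small check like the one below can help narrow this down before training starts. It is only a sketch under assumptions: it expects the dataset config to be already loaded into Python, check_dataset_config is a hypothetical helper, and the key names train and val are assumed, so adjust them to whatever your .yaml actually uses.

```python
from pathlib import Path


def check_dataset_config(cfg, keys=("train", "val")):
    """Return a list of problems found in a parsed dataset config."""
    if not isinstance(cfg, dict):
        # Calling .get() on a str is what raises
        # "'str' object has no attribute 'get'".
        return [f"config parsed as {type(cfg).__name__}, expected a mapping"]
    problems = []
    for key in keys:
        value = cfg.get(key)
        if value is None:
            problems.append(f"missing key: {key}")
        elif not Path(str(value)).exists():
            problems.append(f"{key} path does not exist: {value}")
    return problems


# A config that was read as one plain string instead of key/value pairs.
print(check_dataset_config("train:../data/images"))
# A mapping whose second path does not exist.
print(check_dataset_config({"train": ".", "val": "no/such/dir"}))
```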

2. UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position -: illegal multibyte sequence

UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 42: illegal multibyte sequence
Find the line the traceback points to, check whether it uses with open(), and add encoding='utf-8' to that call.
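
A minimal, self-contained sketch of the open() change. The file name and its contents are made up for the example; the relevant part is the explicit encoding argument on the read.

```python
import os
import tempfile

# Write a small UTF-8 file containing non-ASCII text (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), "classes.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("行人\n车辆\n")

# Without encoding=..., open() falls back to the locale's default encoding
# (gbk on a Chinese-language Windows install), which can fail on UTF-8 bytes.
with open(path, encoding="utf-8") as f:
    names = f.read().splitlines()

print(names)
```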

3. TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

File "D:\Anaconda3\envs\py38\lib\site-packages\torch\tensor.py", line 621, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
In the line of your own code that triggers the error, add .cpu() before .numpy().
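
A minimal sketch of the pattern. The tensor name boxes and its values are hypothetical; the device falls back to the CPU so the snippet also runs on a machine without a GPU.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
boxes = torch.tensor([[10.0, 20.0, 30.0, 40.0]], device=device)

# boxes.numpy() raises the TypeError above when boxes lives on the GPU.
# .cpu() copies it to host memory first; for a CPU tensor it changes nothing.
boxes_np = boxes.cpu().numpy()

print(boxes_np.shape)
```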

4. RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

File "D:\cxy\PyTorch_YOLOv3-master\PyTorch_YOLOv3-master\models\yolo_layer.py", line 103, in forward
    return pred.view(batchsize, -1, n_ch).data
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

From someone else's blog: this happens because view() needs the tensor's elements to be laid out contiguously in memory, but a tensor can end up non-contiguous, so first call .contiguous() to make its memory layout contiguous.

Add .contiguous() before .view().
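
The behaviour can be reproduced with a small tensor. This is a sketch, not the code from yolo_layer.py: the sizes are hypothetical, and permute() is used here only as one common way to obtain a non-contiguous tensor.

```python
import torch

batchsize, n_anchors, n_ch, n_cells = 2, 3, 5, 4   # hypothetical sizes
raw = torch.arange(batchsize * n_anchors * n_ch * n_cells, dtype=torch.float32)
raw = raw.reshape(batchsize, n_anchors, n_ch, n_cells)

# permute() only changes the strides, so the result is not contiguous.
pred = raw.permute(0, 1, 3, 2)

try:
    pred.view(batchsize, -1, n_ch)
    view_failed = False
except RuntimeError:
    view_failed = True   # the error from this section

# Copy into contiguous memory first, then view() works.
fixed = pred.contiguous().view(batchsize, -1, n_ch)

print(view_failed, fixed.shape)
```

As the error message itself suggests, pred.reshape(batchsize, -1, n_ch) gives the same result here, since reshape() copies the data when a view is not possible.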

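5. detect.py detects no objects at all in images, and the original yolov3.pt gives no results either

Like me, you may need to change one place in detect.py.
[Screenshot of the modified section of detect.py]
Add the line I marked in the screenshot; it is the same as the 4 lines above it.

Recorded on 2021/3/24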

6. OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.

video 1/1 (2/129) d:\testvideo\test01.mp4:
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'  # with this line added, the error no longer appears

Recorded on 2021/4/1
