Troubleshooting a YOLOv8 custom-dataset training error (CUDA error: an illegal memory access was encountered)

When training on my own dataset, I got the following error:
RuntimeError: numel: integer multiplication overflow
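The same error is discussed in the GitHub issue "RuntimeError: numel: integer multiplication overflow · Issue #596 · ultralytics/ultralytics (github.com)".
Someone there attributed it to problems with the dataset labels, but after going through my data again I found no such problems.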
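To double-check the labels, one option is a small scan over the label files. The sketch below assumes the standard YOLO detection label layout (one object per line: `class x_center y_center width height`, coordinates normalized to [0, 1]) and that `num_classes` matches the `names` list in your data YAML. The helper `check_yolo_labels` is hypothetical and not part of ultralytics.

```python
import sys
from pathlib import Path


def check_yolo_labels(label_dir, num_classes):
    """Scan YOLO-format label files; return (file, line_no, problem) tuples.

    Hypothetical helper: assumes one object per line as
    "class x_center y_center width height", coordinates normalized to [0, 1].
    """
    problems = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        for line_no, line in enumerate(path.read_text().splitlines(), start=1):
            fields = line.split()
            if not fields:
                continue  # blank lines are harmless
            if len(fields) != 5:
                problems.append((path.name, line_no, f"expected 5 fields, got {len(fields)}"))
                continue
            try:
                cls = int(fields[0])
                coords = [float(v) for v in fields[1:]]
            except ValueError:
                problems.append((path.name, line_no, "could not parse class/coordinates"))
                continue
            # A class index outside [0, num_classes) can index past the class dimension.
            if not 0 <= cls < num_classes:
                problems.append((path.name, line_no, f"class {cls} out of range"))
            # Normalized coordinates must lie in [0, 1]; width and height must be positive.
            if any(not 0.0 <= v <= 1.0 for v in coords):
                problems.append((path.name, line_no, "coordinate outside [0, 1]"))
            elif coords[2] <= 0 or coords[3] <= 0:
                problems.append((path.name, line_no, "non-positive width/height"))
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_labels.py path/to/labels/train 3
    for issue in check_yolo_labels(sys.argv[1], int(sys.argv[2])):
        print(*issue)
```

An empty result means every label line passed these basic checks. It does not rule out other data problems.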
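Looking more closely at where the error occurred, it did not happen during training but on the validation set, right after the first epoch finished. So I tried using the training set as the validation set as well. The first epoch again trained normally and the error still appeared in the validation phase after it, but the message changed to: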

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [22,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [23,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [24,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [25,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [26,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [27,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [28,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [29,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [30,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

This was followed by: CUDA error: an illegal memory access was encountered
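The train-as-validation experiment described earlier amounts to pointing the `val` entry of the dataset YAML at the training images. This minimal sketch builds such a YAML as text. It assumes the ultralytics dataset YAML keys `path`, `train`, `val` and an index-to-name `names` mapping, and simple class names with no YAML special characters. The helper `make_debug_data_yaml` and the example paths are hypothetical.

```python
def make_debug_data_yaml(dataset_root, train_images, class_names):
    """Return a data YAML string whose val split reuses the train split.

    Hypothetical helper for debugging only: assumes class names need no YAML quoting.
    """
    lines = [
        f"path: {dataset_root}",
        f"train: {train_images}",
        f"val: {train_images}  # deliberately the same as train, for debugging",
        "names:",
    ]
    # ultralytics-style names mapping: "  <index>: <name>"
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_debug_data_yaml("datasets/mydata", "images/train", ["cat", "dog"]))
```

Saved to a file, the result is passed as the `data` argument when training. If validation still fails on data that trains cleanly, the labels themselves are unlikely to be the cause.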
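I then tried other PyTorch versions. In my tests, PyTorch 1.11.0 and 2.0.1 both trained normally; the problematic version was PyTorch 1.13.1. If you run into a similar problem, it may be worth switching to a different PyTorch version.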
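Before switching, it helps to confirm which PyTorch build is actually in use. The sketch below flags the one version reported as failing in this post (1.13.1). Any other version printed as fine only means it was not tested here, not that it is known to work. It also sets `CUDA_LAUNCH_BLOCKING=1` before importing torch. Because CUDA kernel launches are asynchronous, errors like the one above are otherwise often reported at a later, unrelated call. The helper names are hypothetical.

```python
import os
import re

# Must be set before torch initializes CUDA: makes kernel launches synchronous so a
# failing kernel is reported at the call that launched it (slower; debugging only).
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

# From this post: 1.13.1 failed, while 1.11.0 and 2.0.1 trained normally.
KNOWN_BAD = {(1, 13, 1)}


def parse_torch_version(version):
    """Turn a string like '1.13.1+cu117' into (1, 13, 1); None if unparseable."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    return tuple(int(g) for g in m.groups()) if m else None


def is_reported_bad(version):
    """True if the version matches one that hit the error in this post."""
    return parse_torch_version(version) in KNOWN_BAD


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        v = torch.__version__
        if is_reported_bad(v):
            print(f"PyTorch {v} hit this error in the post; 1.11.0 and 2.0.1 worked there")
        else:
            print(f"PyTorch {v} (not the version reported as failing here)")
```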
