(TL;DR: switch to an older version) Fixing P, R and other metrics being 0 when training a dataset with yolov5, 2022.2.24

 (py 3.6, cuda 11.3, torch 1.10.2 -> py 3.9, cuda 10.2, torch 1.9.0)

I originally installed PyTorch with the command generated on the "Start Locally | PyTorch" page, selecting version 1.10.2 with CUDA 11.3, as shown in the figure below.

[Figure 1: screenshot of the install command selected on the Start Locally page]

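 In the trained model's results.png, charts such as precision and mAP were all either 0 or nan, and the confusion matrix was entirely FN. Training also printed this warning: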

UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
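
The all-zero or nan curves described above can be caught without opening results.png. Here is a rough sketch, assuming the run directory contains a results.csv whose header includes `metrics/precision` and `metrics/recall` columns. Recent yolov5 versions write such a file, but the exact column names here are an assumption, and `find_bad_epochs` is a hypothetical helper, not part of yolov5.

```python
import csv
import io
import math


def find_bad_epochs(csv_text, columns=("metrics/precision", "metrics/recall")):
    """Return the 0-based data-row indices where any watched metric is 0 or NaN.

    The column names are an assumption about the results.csv layout;
    adjust them to match the header of your own file.
    """
    reader = csv.reader(io.StringIO(csv_text))
    # Header cells may be padded with spaces, so strip them before matching.
    header = [name.strip() for name in next(reader)]
    indices = [header.index(column) for column in columns]

    bad_rows = []
    row_index = 0
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = [float(row[i]) for i in indices]  # float() accepts " nan" and padded numbers
        if any(math.isnan(value) or value == 0.0 for value in values):
            bad_rows.append(row_index)
        row_index += 1
    return bad_rows
```

Usage would look something like `find_bad_epochs(open("runs/train/exp/results.csv").read())`, where the path is only an example. If every row comes back as bad after the first few epochs, the environment is worth checking before the dataset.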

I then went to the "Previous PyTorch Versions | PyTorch" page and chose the older release 1.9.0. The pip install command is:

pip install torch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0

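 (This is actually the command listed for OSX.) The metrics went back to normal. However, print(torch.version.cuda) reported 10.2 while nvcc -V reported 11.3, and there was no 10.2 directory under /usr/local, yet everything still worked, which was strange. The next day I got this error: CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. After that it stopped working altogether.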
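
The version mismatch above is probably not an error in itself. PyTorch pip wheels ship their own CUDA runtime libraries, so `torch.version.cuda` reports the CUDA version the wheel was built with, while `nvcc -V` reports the toolkit installed on the system. Below is a minimal sketch for printing both side by side. The helper names (`parse_nvcc_release`, `system_toolkit_version`, `bundled_cuda_version`) are made up for this example and are not part of PyTorch or CUDA.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def system_toolkit_version() -> Optional[str]:
    """Return the CUDA toolkit version reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


def bundled_cuda_version() -> Optional[str]:
    """Return the CUDA version PyTorch was built with (None if torch is missing or CPU-only)."""
    try:
        import torch  # imported lazily so the nvcc check still works without torch
    except (ImportError, OSError):
        return None
    return torch.version.cuda


if __name__ == "__main__":
    print("nvcc toolkit:", system_toolkit_version())
    print("torch build :", bundled_cuda_version())
```

If the two values differ, that alone does not mean the install is broken. What matters more is whether the NVIDIA driver is new enough for the CUDA version the wheel was built with.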

Next I installed PyTorch 1.9.0 with CUDA 11.3 via pip, which produced this warning: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0

At this point the metrics were normal, but gpu_mem showed 0, which suggests training had fallen back to the CPU.

Finally I installed CUDA 10.2 + PyTorch 1.9.0. After that, import torch raised an error, and switching Python to 3.9 fixed it. Problem solved!

The check I use myself:

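import torch
import torchvision

print(torch.__version__)               # PyTorch version
print(torchvision.__version__)         # torchvision version
print(torch.version.cuda)              # CUDA version PyTorch was built with (None on CPU-only builds)
print(torch.backends.cudnn.version())  # cuDNN version PyTorch is using
print(torch.cuda.is_available())       # False means the GPU cannot be used
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # GPU model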

"P and R values are 0 when training yolov5" (m0_59080342's blog on CSDN): the same situation

"YOLOv5 Object Detection" (迷途小书童的Note): the tutorial I followed for setup
