Around March 20, 2021, I downloaded yolov5 from GitHub and wanted to try it out on the bundled coco128 dataset first, but every run failed with this error:
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
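I went through a lot of material and tried reinstalling CUDA and cuDNN, reinstalling Anaconda3, and so on, but nothing worked. Then I found a Stack Overflow question posted 6 days earlier by someone with the same problem. Their error output matched mine: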
Logging results to runs/train/exp66
Starting training for 5 epochs...
Epoch gpu_mem box obj cls total targets img_size
0%| | 0/22 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 533, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 298, in train
pred = model(imgs) # forward
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/yolov5/models/yolo.py", line 121, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "/home/ubuntu/yolov5/models/yolo.py", line 137, in forward_once
x = m(x) # run
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/yolov5/models/common.py", line 113, in forward
return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/yolov5/models/common.py", line 38, in forward
return self.act(self.bn(self.conv(x)))
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Here is the reply from the person who answered:
I don’t know why but it seems as torch 1.8 is built on older version of cuda. Also as pytorch has its own cuda it seems to doesn’t care what version you have on your machine. Changing the torch version (and matching compatible tochvision) solved my problem.
In my case I did as follows:
Changed two lines in “requirements.txt”:
torch==1.7.1
torchvision==0.8.2
Created fresh conda environment with python=3.8
Activated the environment
Installed requirements from modified file:
$ pip install -r requirements.txt
Hope it’ll help to someone
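In short: yolov5's own dependency install step pulls in torch 1.8, which, according to the answer, is built against an older version of CUDA. PyTorch also ships with its own CUDA and does not use the CUDA installed on your machine. So requirements.txt needs to be edited before running the dependency install step:
$ pip install -r requirements.txt
First, change
torch>=1.7.0
to
torch==1.7.1
Then change
torchvision>=0.8.1
to
torchvision==0.8.2
After editing these two lines, save the file and run the install again: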
$ pip install -r requirements.txt
pip will now install torch 1.7.1 and torchvision 0.8.2.
Once they were installed on my machine, training ran successfully.
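If you would rather not edit the file by hand, the two changes to requirements.txt described above can be scripted. This is only a rough sketch: the function name pin_requirements and the PINS table are my own invention for illustration, and the regex assumes simple one-package-per-line entries like the ones in yolov5's requirements.txt. Any trailing comment on a pinned line is dropped.

```python
import re

# Versions taken from the Stack Overflow answer quoted above.
PINS = {"torch": "1.7.1", "torchvision": "0.8.2"}


def pin_requirements(text, pins=PINS):
    """Return requirements text with the listed packages pinned to exact versions."""
    out = []
    for line in text.splitlines():
        # A package name, optionally followed by a version specifier such as >=1.7.0.
        # Comment lines and blank lines do not match and are kept unchanged.
        m = re.match(r"^\s*([A-Za-z0-9_.\-]+)\s*([<>=!~].*)?$", line)
        if m and m.group(1).lower() in pins:
            line = f"{m.group(1)}=={pins[m.group(1).lower()]}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "numpy>=1.18.5\ntorch>=1.7.0\ntorchvision>=0.8.1\n# logging\n"
    print(pin_requirements(sample), end="")
```

To apply it to the real file, read requirements.txt, pass its contents through pin_requirements, and write the result back before running pip install -r requirements.txt.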
My machine has an RTX 2080S, CUDA 11.1, and cuDNN 8.0.5.39.
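Since PyTorch brings its own CUDA, it can help to compare the system versions above with what the installed torch package itself reports. Here is a small sketch for that. The helper name torch_build_info is made up for this post, and it simply returns a placeholder if torch is not installed.

```python
def torch_build_info():
    """Collect version info for the installed PyTorch build, if there is one."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        # CUDA version this torch build was compiled against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        # cuDNN version torch reports, or None if cuDNN is unavailable.
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_build_info().items():
        print(f"{key}: {value}")
```

Running this before and after changing the torch version shows whether the bundled CUDA and cuDNN versions actually changed.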
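If you hit a similar problem, it could also be the wrong Python version (3.8 or later is required) or some other dependency. This version of yolov5 may have assorted problems on some machines, but this fix worked on mine, and later versions may well fix it. Two days of fiddling, all because of this, haha.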