RuntimeError: CUDA error: no kernel image is available for execution on the device

First, my setup: Ubuntu 18.04 + RTX 3060 + driver 470 + CUDA 11.1 + cuDNN 8.1.1 + torch 1.8

python3 collect_env.py 


Collecting environment information...
PyTorch version: 1.8.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.9

Python version: 3.7.0 (default, Jun 28 2018, 13:15:42)  [GCC 7.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-92-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060
Nvidia driver version: 470.86
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.4
HIP runtime version: N/A
MIOpen runtime version: N/A
 

nvidia-smi

+---------------------------------------------------------------------------------------------------+
| NVIDIA-SMI 470.86        Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+-------------------------------------------+

 

NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

......

2022-01-09 11:27:55 | ERROR    | yolox.core.launch:98 - An error has been caught in function 'launch', process 'MainProcess' (18082), thread 'MainThread' (140270498518848):
Traceback (most recent call last):
......

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

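First, some self-criticism: I fell back on habit again. Faced with a wall of log output, I reflexively read only the last few lines and ignored the important information further up, which cost me three days. I hope this post saves some fellow reader six hours.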

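A [GitHub issue](https://github.com/google/jax/issues/5723) (google/jax #5723, "No kernel image is available for execution on the device") attributes this error to a mismatch between CUDA and the driver: "We've confirmed this issue is due to using too new a version of CUDA with too old a driver version. If you see this issue, the workaround is either to use an older CUDA release or a newer NVidia driver." Several other answers suggested it could instead be a mismatch between the CUDA and torch versions. So I reinstalled CUDA, cuDNN, torch and friends several times over; by now I am a seasoned installer of all of them (sigh).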
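Before reinstalling anything, the driver hypothesis can be tested directly: the `CUDA Version` field in the `nvidia-smi` header is the newest CUDA version the installed driver supports, and it only needs to be at least the CUDA version PyTorch was built with (`torch.version.cuda`). The sketch below does that comparison; the helper names and the regex-based parsing of the `nvidia-smi` banner are my own assumptions, not part of any official tool.

```python
import re
from typing import Tuple


def driver_cuda_version(nvidia_smi_header: str) -> Tuple[int, int]:
    """Extract 'CUDA Version: X.Y' from the nvidia-smi banner (hypothetical helper).

    This is the newest CUDA version the driver supports, not the toolkit
    that happens to be installed on the machine.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_header)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(match.group(1)), int(match.group(2))


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a string such as '11.1' (e.g. torch.version.cuda) into (11, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def driver_is_new_enough(nvidia_smi_header: str, runtime_cuda: str) -> bool:
    """True if the driver supports at least the CUDA version used by PyTorch."""
    return driver_cuda_version(nvidia_smi_header) >= parse_version(runtime_cuda)


if __name__ == "__main__":
    # Header line and runtime version copied from the logs in this post.
    header = "| NVIDIA-SMI 470.86        Driver Version: 470.86       CUDA Version: 11.4     |"
    print(driver_is_new_enough(header, "11.1"))  # True: 11.4 >= 11.1
```

On a live machine you would pass `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` and `torch.version.cuda` instead of the copied strings. For the setup in this post the check passes (11.4 ≥ 11.1), so the driver was not the problem here.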

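But the problem persisted. Only when I went back through the whole log did I notice the warning at the very top. A quick search showed the real cause: the installed torch build did not include kernels for my GPU's compute capability (sm_86).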

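[Another GitHub issue](https://github.com/pytorch/pytorch/issues/45021) suggested setting `TORCH_CUDA_ARCH_LIST="8.6"` on the command line and then reinstalling torch and the related packages. After that, everything worked well.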
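As far as I know, `TORCH_CUDA_ARCH_LIST` is only consulted when something is compiled locally (a source build of PyTorch, or a CUDA extension built through `torch.utils.cpp_extension`), where it decides which architectures `nvcc` generates code for. The sketch below is a simplified re-implementation of that translation, not PyTorch's actual function: the name is hypothetical, and named architectures such as "Ampere" are deliberately left out.

```python
import re
from typing import List


def arch_list_to_gencode_flags(arch_list: str) -> List[str]:
    """Simplified sketch: map a TORCH_CUDA_ARCH_LIST value to nvcc -gencode flags.

    Entries are separated by ';' or whitespace, e.g. "8.6", "7.5;8.6", "8.6+PTX".
    """
    flags = set()
    for entry in re.split(r"[;\s]+", arch_list.strip()):
        if not entry:
            continue
        match = re.fullmatch(r"(\d+)\.(\d+)(\+PTX)?", entry)
        if match is None:
            raise ValueError(f"unsupported arch entry: {entry!r}")
        num = match.group(1) + match.group(2)
        # Machine code (cubin) for exactly this architecture.
        flags.add(f"-gencode=arch=compute_{num},code=sm_{num}")
        if match.group(3):
            # "+PTX" also embeds PTX that the driver can JIT-compile for newer GPUs.
            flags.add(f"-gencode=arch=compute_{num},code=compute_{num}")
    return sorted(flags)


if __name__ == "__main__":
    print(arch_list_to_gencode_flags("8.6"))
    # ['-gencode=arch=compute_86,code=sm_86']
```

So `TORCH_CUDA_ARCH_LIST="8.6"` asks for sm_86 machine code, which is what the RTX 3060 needs. The variable has to be set in the same shell (environment) that runs the build for it to take effect.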

For the record, here is the procedure for upgrading CUDA:

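1. On NVIDIA's website, check which CUDA versions your GPU supports (an RTX 3060, compute capability 8.6, needs CUDA 11.1 or newer).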

2. Check whether the driver needs to be upgraded.

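3. Download the CUDA runfile from the official site, make it executable with `chmod +x`, then run it with `sudo ./xx.run`. In the installer, deselect the driver component so it does not replace the driver you already have.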

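4. Download the matching cuDNN Library for Linux (an archive, not an installer script), extract it, and copy its headers and libraries into the CUDA installation directory.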

5. Look up on the PyTorch website which torch build matches your CUDA version.

6. Verify the installation: `import torch`, then `torch.cuda.is_available()`.
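Step 6 on its own can be misleading: the `collect_env.py` output above already reported `Is CUDA available: True` while the error persisted, because `is_available()` only confirms that a driver and a device are present, not that the binary contains kernels for that device. A somewhat stronger check, sketched below under the assumption that PyTorch is installed, also prints the compiled architectures and launches a tiny kernel on each GPU; on an unsupported GPU, that launch is where "no kernel image is available" would surface. The function name is hypothetical.

```python
def verify_cuda_install() -> bool:
    """Return True only if PyTorch can actually run a kernel on every visible GPU."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return False

    print("torch", torch.__version__, "built with CUDA", torch.version.cuda)
    if not torch.cuda.is_available():
        print("torch.cuda.is_available() returned False")
        return False

    # Architectures compiled into this binary, e.g. sm_37 ... sm_86.
    print("compiled for:", " ".join(torch.cuda.get_arch_list()))
    all_ok = True
    for idx in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(idx)
        name = torch.cuda.get_device_name(idx)
        try:
            # A real kernel launch; .item() waits for it, so errors surface here.
            value = (torch.ones(4, device=f"cuda:{idx}") * 2).sum().item()
            print(f"{name} (sm_{major}{minor}): OK, result {value}")
        except RuntimeError as err:
            print(f"{name} (sm_{major}{minor}): FAILED -> {err}")
            all_ok = False
    return all_ok


if __name__ == "__main__":
    verify_cuda_install()
```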
