UserWarning: CUDA initialization: CUDA unknown error

CUDA unavailable after suspend

Problem description

After the machine woke up from sleep, CUDA was no longer available:

/home/your-computer/pytorch/lib/python3.8/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
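  • First attempt

    • export PATH=/usr/local/cuda-11/bin:$PATH
    • export LD_LIBRARY_PATH=/usr/local/cuda-11/lib64:$LD_LIBRARY_PATH
    • This made no difference, so the cause was not missing environment variables
  • Reference [1] suggests the problem may be related to the machine suspending, which led to [2]

  • This error is reported when the system cannot communicate with the GPU

    • Cause 1: the driver was updated without a reboot, or there is some other installation problem
    • Cause 2: the machine has gone through a suspend; a reboot makes CUDA work again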
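Before changing anything, it can help to look at the two things discussed above: the CUDA-related environment variables and whether the nvidia_uvm kernel module is loaded at all. The following is only a rough sketch; the helper names `cuda_env_report` and `module_loaded` are made up for this post, and it assumes a Linux system where `/proc/modules` lists one loaded module per line with the module name first.

```python
import os
from typing import Dict, Mapping, Optional

# Variables mentioned in the warning text and in the attempt above.
CUDA_ENV_VARS = ("CUDA_VISIBLE_DEVICES", "PATH", "LD_LIBRARY_PATH")


def cuda_env_report(environ: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Return the current value (or None if unset) of each CUDA-related variable."""
    return {name: environ.get(name) for name in CUDA_ENV_VARS}


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Check whether a kernel module appears in the text of /proc/modules.

    Each non-empty line of /proc/modules starts with the module name.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False
```

On the affected machine, `module_loaded("nvidia_uvm", open("/proc/modules").read())` reports whether the module is present. Note that a module that is still loaded after suspend may nevertheless be in a bad state, so a `True` result here does not rule out cause 2.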

Solution

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
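
If this happens after every suspend, the two commands can be wrapped in a small script. This is a minimal sketch, not a tested tool: `reload_commands` and `reload_module` are hypothetical helper names, and it assumes `sudo`, `rmmod` and `modprobe` are on the PATH and that no running process is still using the GPU (otherwise `rmmod` will typically refuse because the module is in use).

```python
import subprocess
from typing import Callable, List


def reload_commands(module: str = "nvidia_uvm") -> List[List[str]]:
    """Build the unload/load command pair shown above for the given module."""
    return [["sudo", "rmmod", module], ["sudo", "modprobe", module]]


def reload_module(module: str = "nvidia_uvm", runner: Callable = subprocess.run) -> None:
    """Run rmmod and then modprobe; stop at the first command that fails.

    `runner` is injectable so the function can be dry-run without root.
    """
    for command in reload_commands(module):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so modprobe is not attempted if rmmod failed.
        runner(command, check=True)
```

Calling `reload_module()` runs exactly the two commands above; passing `runner=print` instead prints them without touching the system.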

  • Quick check that CUDA is available again
import torch
torch.cuda.is_available()
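
One caveat, stated as an assumption rather than a guarantee: a Python process that already printed the warning may keep reporting the failed result, so it seems safer to run the check in a new interpreter after reloading the module. A minimal sketch of that idea follows; `run_in_fresh_process` and `CHECK_CODE` are names invented here, and it assumes `torch` is installed in the interpreter that `sys.executable` points to.

```python
import subprocess
import sys

# The same check as above, as a one-liner for `python -c`.
CHECK_CODE = "import torch; print(torch.cuda.is_available())"


def run_in_fresh_process(code: str = CHECK_CODE) -> str:
    """Run `code` in a new Python interpreter and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str rather than bytes
        check=True,           # raise if the child interpreter exits with an error
    )
    return result.stdout.strip()
```

With the default argument, `run_in_fresh_process()` returns the string `"True"` or `"False"` as printed by the child process.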

For an introduction to rmmod and modprobe, see [3].

References

[1] https://blog.csdn.net/weixin_48319333/article/details/128214617
[2] https://discuss.pytorch.org/t/userwarning-cuda-initialization-cuda-unknown-error-this-may-be-due-to-an-incorrectly-set-up-environment-e-g-changing-env-variable-cuda-visible-devices-after-program-start-setting-the-available-devices-to-be-zero/129335/2
[3] https://blog.csdn.net/Ternence_zq/article/details/131068125
