RuntimeError: The NVIDIA driver on your system is too old.

[Error] While reproducing an experiment on AutoDL, PyTorch raised the following error: RuntimeError: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
The GPU is an RTX 3090 (24 GB), and the software environment follows the environment.yaml from instruct-pix2pix.
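The number in "found version 11070" is the CUDA version that the installed driver supports, packed into one integer as major * 1000 + minor * 10. The following is a minimal sketch for decoding it. The function name `decode_cuda_version` is hypothetical and not part of PyTorch.

```python
from __future__ import annotations


def decode_cuda_version(encoded: int) -> tuple[int, int]:
    """Decode a CUDA version integer of the form major * 1000 + minor * 10."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


if __name__ == "__main__":
    # 11070 is the value reported in the error message above.
    major, minor = decode_cuda_version(11070)
    print(f"The driver supports CUDA up to {major}.{minor}")
```

Read this way, the installed driver supports CUDA 11.7 at most, so a PyTorch build compiled against a newer CUDA version would refuse to initialize.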

[Cause] Running nvidia-smi prints information about the GPU, including the driver version, the CUDA version, and other device details.
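The same information can be read from a script. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` and `--format` options. The helper names `parse_driver_query` and `query_driver_versions` are hypothetical.

```python
import subprocess


def parse_driver_query(csv_text):
    """Parse 'driver_version, name' CSV rows into (version, gpu_name) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        version, name = [field.strip() for field in line.split(",", 1)]
        rows.append((version, name))
    return rows


def query_driver_versions():
    """Ask nvidia-smi for the driver version and name of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_driver_query(result.stdout)


if __name__ == "__main__":
    try:
        for version, name in query_driver_versions():
            print(f"{name}: driver {version}")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        # No NVIDIA driver or no nvidia-smi binary on this machine.
        print(f"nvidia-smi is not available or failed: {exc}")
```

Keeping the parsing separate from the subprocess call makes it possible to check the parser on a machine without a GPU.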

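Following the error message, I looked up a suitable GPU driver version at http://www.nvidia.com/Download/index.aspx and confirmed that the installed driver is indeed too old [1][2]: at least version 535.146.02 is required, while the server only has 515.76.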
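The comparison between the installed and the required driver can be sketched as follows. It assumes that driver versions order component by component as integers, which is an assumption of this sketch rather than something the post states. Both function names are hypothetical.

```python
def parse_driver_version(version):
    """Turn a dotted driver version such as '535.146.02' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def driver_meets_minimum(installed, required):
    """Return True if the installed driver is at least the required version."""
    return parse_driver_version(installed) >= parse_driver_version(required)


if __name__ == "__main__":
    installed, required = "515.76", "535.146.02"  # values quoted in this post
    status = "OK" if driver_meets_minimum(installed, required) else "too old"
    print(f"installed {installed}, required {required}: {status}")
```

With the numbers from this post, 515.76 fails the check. Under the same component-wise comparison, 535.98 also sorts below 535.146.02, so it may be worth confirming which release is actually required before downloading an installer.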

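[Solution] I followed "AutoDL私有云 | GPU驱动" (AutoDL Private Cloud | GPU Driver) to update the driver, but its first step, uninstalling the current driver, could not be executed. The driver can instead be uninstalled as described in "How can I uninstall a nvidia driver completely ?".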

After uninstalling the old driver, download and install the new one: wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.98/NVIDIA-Linux-x86_64-535.98.run
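To target a different release, the download URL can be derived from the version string. This sketch assumes that other releases follow the same path pattern as the single URL above, which is inferred from that one example and not guaranteed for every version. The names `BASE_URL`, `installer_filename`, and `installer_url` are hypothetical.

```python
# Path pattern inferred from the one URL quoted in this post (an assumption).
BASE_URL = "https://us.download.nvidia.com/XFree86/Linux-x86_64"


def installer_filename(version):
    """File name of the .run installer for a given driver version."""
    return f"NVIDIA-Linux-x86_64-{version}.run"


def installer_url(version):
    """Download URL for a given driver version, following the assumed pattern."""
    return f"{BASE_URL}/{version}/{installer_filename(version)}"


if __name__ == "__main__":
    version = "535.98"
    # Print the commands instead of running them, so nothing is changed by accident.
    print(f"wget {installer_url(version)}")
    print(f"sudo sh {installer_filename(version)}")
```

The script only prints the commands. Running the installer needs root privileges and no process using the GPU.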

The final step failed with: ERROR: An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occurred that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
Extensive searching did not turn up a fix [3].
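One way to see what the installer is complaining about is to list the NVIDIA kernel modules that are currently loaded, together with their use counts, from /proc/modules. The sketch below is a diagnostic only and does not fix the error. The function name `loaded_nvidia_modules` is hypothetical.

```python
from pathlib import Path


def loaded_nvidia_modules(proc_modules_text):
    """Return {module_name: use_count} for loaded modules named 'nvidia*'.

    Each /proc/modules line is: name, size, use count, dependents, state, address.
    A non-numeric use count is stored as None.
    """
    modules = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("nvidia"):
            count = fields[2]
            modules[fields[0]] = int(count) if count.isdigit() else None
    return modules


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        for name, count in loaded_nvidia_modules(path.read_text()).items():
            print(f"{name}: use count {count}")
    else:
        print("/proc/modules not found (not a Linux system?)")
```

A non-zero use count for nvidia_uvm suggests that some process still holds the GPU, which matches the first cause listed in the error message.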

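Because this is a remote server, the driver cannot be installed locally on the machine, so I recommend switching to an instance with a newer driver version.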


  1. UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010)

  2. NVIDIA driver too old error #4546

  3. How to solve 'ERROR: An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel'?
