A simple fix for a failed NVIDIA driver: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.

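Everything had been working fine, but today the GPU stopped working, and after a reboot I got this error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. My head was buzzing, and my first thought was that I would have to reinstall the GPU driver.
Then I wondered whether I could avoid reinstalling. There are plenty of experts online!
Step 1: open a terminal and run nvidia-smi. It reports the following error: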

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. 
Make sure that the latest NVIDIA driver is installed and running.
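
This error usually means the nvidia kernel module is not loaded. The following is a minimal sketch for checking that, not a step from the original walkthrough. The helper name nvidia_module_loaded is hypothetical; it only reads lsmod output from stdin.

```shell
# Hypothetical helper: reads `lsmod` output on stdin and succeeds
# only if the core "nvidia" module appears in the first column.
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Usage on a real machine:
#   lsmod | nvidia_module_loaded && echo "module loaded" || echo "module NOT loaded"
#   uname -r    # the running kernel, which the module must be built for
```

If the module is missing, the driver files may still be on disk and only need to be rebuilt for the running kernel, which is what the remaining steps attempt.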

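Step 2: run nvcc -V to check the CUDA toolkit. nvcc reports only the toolkit version, so it still works even though the driver cannot be reached.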

(base) [jianming_ge@localhost national_fire_level]$ nvidia-smi -l 1
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

(base) [jianming_ge@localhost national_fire_level]$
(base) [jianming_ge@localhost national_fire_level]$
(base) [jianming_ge@localhost national_fire_level]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0

Step 3: check the version of the installed driver:

ls /usr/src | grep nvidia
(base) [jianming_ge@localhost national_fire_level]$ ls /usr/src | grep nvidia
nvidia-470.63.01
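
To avoid retyping the version by hand in the next step, the lookup can be wrapped in a small function. This is a sketch under the assumption that the driver source lives in a directory named nvidia-<version>, as in the listing above. The function name and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: print the version part of every nvidia-<version>
# directory under a source root (default /usr/src).
nvidia_src_versions() {
    src_root="${1:-/usr/src}"
    for dir in "$src_root"/nvidia-*; do
        [ -d "$dir" ] || continue        # skip when the glob matched nothing
        name="${dir##*/}"                # strip the leading path
        printf '%s\n' "${name#nvidia-}"  # strip the "nvidia-" prefix
    done
}

# Usage:
#   nvidia_src_versions          # prints e.g. 470.63.01
```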

Step 4: run the following commands in order (Ubuntu syntax; replace 450.57 with the version found in step 3):

sudo apt-get install dkms

sudo dkms install -m nvidia -v 450.57

My machine runs CentOS, so the commands should be:

sudo yum install epel-release
sudo yum install dkms

sudo dkms install -m nvidia -v 470.63.01
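
The apt-get versus yum split above can be sketched as a lookup on the ID field of /etc/os-release. This is only an illustration: the function name dkms_pkg_command is hypothetical, it covers just the distributions mentioned here, and it prints the command instead of running it.

```shell
# Hypothetical helper: print (do not run) the command that installs dkms,
# chosen from the ID field of an os-release file.
dkms_pkg_command() {
    os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak out.
    id="$( . "$os_release" 2>/dev/null && printf '%s' "$ID" )"
    case "$id" in
        ubuntu|debian) echo "sudo apt-get install dkms" ;;
        centos)        echo "sudo yum install epel-release && sudo yum install dkms" ;;
        *)             echo "unknown distribution: $id" >&2; return 1 ;;
    esac
}

# Usage:
#   dkms_pkg_command             # on CentOS prints the two yum commands
```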

(base) [jianming_ge@localhost national_fire_level]$ sudo dkms install -m nvidia  -v 470.63.01
Sign command: /lib/modules/3.10.0-1160.90.1.el7.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module:
Cleaning build area...
'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=3.10.0-1160.90.1.el7.x86_64 IGNORE_CC_MISMATCH='' modules...(bad exit status: 2)
Error! Bad return status for module build on kernel: 3.10.0-1160.90.1.el7.x86_64 (x86_64)
Consult /var/lib/dkms/nvidia/470.63.01/build/make.log for more information.

The build fails, and the message points to the log: Consult /var/lib/dkms/nvidia/470.63.01/build/make.log for more information.


(base) [jianming_ge@localhost national_fire_level]$ cat /var/lib/dkms/nvidia/470.63.01/build/make.log
DKMS make.log for nvidia-470.63.01 for kernel 3.10.0-1160.90.1.el7.x86_64 (x86_64)
Thu Jul 13 11:36:22 CST 2023
make[1]: Entering directory '/usr/src/kernels/3.10.0-1160.90.1.el7.x86_64'
  SYMLINK /var/lib/dkms/nvidia/470.63.01/build/nvidia/nv-kernel.o
  SYMLINK /var/lib/dkms/nvidia/470.63.01/build/nvidia-modeset/nv-modeset-kernel.o

Compiler version check failed:

The major and minor number of the compiler used to
compile the kernel:

gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

does not match the compiler used here:

cc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


It is recommended to set the CC environment variable
to the compiler that was used to compile the kernel.

The compiler version check can be disabled by setting
the IGNORE_CC_MISMATCH environment variable to "1".
However, mixing compiler versions between the kernel
and kernel modules can result in subtle bugs that are
difficult to diagnose.

*** Failed CC version check. Bailing out! ***

make[3]: *** [cc_version_check] Error 1
make[2]: *** [_module_/var/lib/dkms/nvidia/470.63.01/build] Error 2
make[1]: *** [sub-make] Error 2
make[1]: Leaving directory '/usr/src/kernels/3.10.0-1160.90.1.el7.x86_64'
make: *** [modules] Error 2
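
The key information in this log is the pair of compiler versions. A rough sketch for pulling just those two lines out of a make.log follows. It assumes the log has the same layout as the one above. The function name cc_mismatch_summary is hypothetical.

```shell
# Hypothetical helper: print the two compiler lines that the version check
# compares. Argument: path to a DKMS make.log laid out like the one above.
cc_mismatch_summary() {
    log="$1"
    kernel_cc="$(grep -m1 '^gcc version ' "$log")"   # compiler that built the kernel
    build_cc="$(grep -m1 '^cc (GCC) ' "$log")"       # compiler used for the module
    printf 'kernel built with: %s\n' "$kernel_cc"
    printf 'module built with: %s\n' "$build_cc"
}

# Usage:
#   cc_mismatch_summary /var/lib/dkms/nvidia/470.63.01/build/make.log
#   cat /proc/version    # also shows the compiler used for the running kernel
```

Here the kernel was built with gcc 4.8.5 while the default cc on the machine is 8.3.1, which is why the check bails out.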

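I then started blindly upgrading gcc, which did not help. Many people online have hit the same problem. I stumbled on the fix by accident, just as I was about to give up and reinstall the driver.
Then I looked at dkms.conf: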

(base) [jianming_ge@localhost national_fire_level]$ cat /usr/src/nvidia-470.63.01/dkms.conf
PACKAGE_NAME="nvidia"
PACKAGE_VERSION="470.63.01"
AUTOINSTALL="yes"

# By default, DKMS will add KERNELRELEASE to the make command line; however,
# this will cause the kernel module build to infer that it was invoked via
# Kbuild directly instead of DKMS. The dkms(8) manual page recommends quoting
# the 'make' command name to suppress this behavior.
MAKE[0]="'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=${kernelver} IGNORE_CC_MISMATCH='' modules"

# The list of kernel modules was generated by nvidia-installer at runtime.
BUILT_MODULE_NAME[0]="nvidia"
DEST_MODULE_LOCATION[0]="/kernel/drivers/video"
BUILT_MODULE_NAME[1]="nvidia-uvm"
DEST_MODULE_LOCATION[1]="/kernel/drivers/video"
BUILT_MODULE_NAME[2]="nvidia-modeset"
DEST_MODULE_LOCATION[2]="/kernel/drivers/video"
BUILT_MODULE_NAME[3]="nvidia-drm"
DEST_MODULE_LOCATION[3]="/kernel/drivers/video"
BUILT_MODULE_NAME[4]="nvidia-peermem"
DEST_MODULE_LOCATION[4]="/kernel/drivers/video"

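That is the driver's DKMS configuration file. Open it in an editor:
vim /usr/src/nvidia-470.63.01/dkms.conf
and find this line:
MAKE[0]="'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=${kernelver} IGNORE_CC_MISMATCH='' modules"

In that line, change IGNORE_CC_MISMATCH='' to:
IGNORE_CC_MISMATCH='yes'
(The log above suggests the value "1"; 'yes' is what I used.) Run the dkms install command again, and it works.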
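
The same edit can be made without an editor. This is a minimal sketch, assuming the MAKE[0] line has the form shown in the dkms.conf listing. The function name set_ignore_cc_mismatch is hypothetical, and it keeps a .bak copy of the file so the change can be undone.

```shell
# Hypothetical helper: set IGNORE_CC_MISMATCH in a dkms.conf, keeping a backup.
# Arguments: path to dkms.conf, optional value (default "yes", as used here).
set_ignore_cc_mismatch() {
    conf="$1"
    value="${2:-yes}"
    cp "$conf" "$conf.bak" &&
    sed "s/IGNORE_CC_MISMATCH='[^']*'/IGNORE_CC_MISMATCH='$value'/" "$conf.bak" > "$conf"
}

# Usage (needs root, because the file lives under /usr/src):
#   set_ignore_cc_mismatch /usr/src/nvidia-470.63.01/dkms.conf
#   dkms install -m nvidia -v 470.63.01
#   nvidia-smi
```

Keep in mind the warning in the log: mixing compiler versions between the kernel and its modules can cause subtle bugs, so the backup is worth keeping.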