A Few Pitfalls with the NVIDIA Driver and CUDA 9.0 on Ubuntu

1 References

[1] NVIDIA official CUDA installation guide: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

[2] NVIDIA driver installation instructions (XFree86 README): http://us.download.nvidia.com/XFree86/Linux-x86/319.12/README/installdriver.html

[3] Official Ubuntu guide to building your own kernel: https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel

[4] Secure Boot: https://askubuntu.com/questions/755238/why-disabling-secure-boot-is-enforced-policy-when-installing-3rd-party-modules

2 A Few Pitfalls

2.1 Error log: /var/log/nvidia-installer.log

ERROR: The kernel module failed to load, because it was not signed by a key
                that is trusted by the kernel. Please try installing the driver again,
                and sign the kernel module when prompted to do so.

ERROR:  Unable to load the kernel module 'nvidia.ko'. This happens most
                frequently when this kernel module was built against the wrong or
                improperly configured kernel sources, with a version of gcc that
                differs from the one used to build the target kernel(1), or if a driver
                such as rivafb, nvidiafb, or nouveau is present and prevents the
                NVIDIA kernel module from obtaining ownership of the NVIDIA
                graphics device(s), or no NVIDIA GPU installed in this system is
                supported by this NVIDIA Linux graphics driver release.


Kernel module compilation complete.

The target kernel has CONFIG_MODULE_SIG set, which means that it supports
cryptographic signatures on kernel modules. On some systems, the kernel may refuse
to load modules without a valid signature from a trusted key. This system also has
UEFI Secure Boot enabled; many distributions enforce module signature verification
on UEFI systems when Secure Boot is enabled(2). Would you like to sign the NVIDIA kernel
module? (Answer: Install without signing)

Kernel module load error: Required key not available

2.2 Error analysis

    The key parts of the errors above are marked (1) and (2).

2.2.1 Ubuntu kernel version vs. gcc version

    Check the Ubuntu kernel version and the gcc version it was built with:

$ cat /proc/version
Linux version 4.4.0-116-generic (buildd@lgw01-amd64-021) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) ) #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018

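The output above is from Ubuntu 16.04. The kernel was built with gcc 5.4.0, whereas the requirements table in NVIDIA's official CUDA installation guide [1] asks for gcc 5.3.1. (The original post showed a cropped screenshot of that table here.)

On a system that is kept up to date, the installed gcc will be 5.4.0 rather than the 5.3.1 that NVIDIA requires. In my experience, this mismatch does not cause any problems.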
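The comparison in this section can also be scripted. The following is a minimal sketch, assuming the older /proc/version format shown above (containing the literal text "gcc version X.Y.Z") and that gcc is on the PATH. The function name kernel_gcc_version is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read a /proc/version-style line on stdin and print
# the gcc version the kernel was built with (e.g. 5.4.0).
# Assumes the line contains the literal text "gcc version X.Y.Z";
# prints nothing if it does not (newer kernels use a different format).
kernel_gcc_version() {
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

# Only run the comparison when both sources of information are available.
if [ -r /proc/version ] && command -v gcc >/dev/null 2>&1; then
    built_with=$(kernel_gcc_version < /proc/version)
    # Note: some newer gcc packages print only the major version here.
    installed=$(gcc -dumpversion)
    echo "kernel built with gcc: ${built_with:-unknown}"
    echo "installed gcc:         $installed"
    if [ -n "$built_with" ] && [ "$built_with" != "$installed" ]; then
        echo "note: versions differ; the NVIDIA installer may warn about this"
    fi
fi
```

A difference between the two versions does not by itself explain a failed install; as noted above, the 5.4.0 versus 5.3.1 mismatch was harmless here.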

2.2.2 Secure Boot

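Error (2) briefly describes the problem: the Ubuntu 16.04 kernel is built with CONFIG_MODULE_SIG enabled by default, and Secure Boot is turned on. See references [2][3] for a more detailed description. In short, on UEFI machines with Secure Boot enabled, Ubuntu 16.04 is more conservative about which modules may be loaded into the kernel: a module must be signed before it can be loaded. The graphics driver is a kernel module, so it needs a signature too. During installation, the NVIDIA installer asks whether to generate a signature. If signing succeeds, there is no problem. If it fails, enter the BIOS and disable Secure Boot.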
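Before rebooting into the BIOS, it can help to confirm that module signing and Secure Boot are actually in effect. The following is a minimal sketch, assuming the kernel config is available at /boot/config-$(uname -r) (as it normally is on Ubuntu) and that the mokutil package may or may not be installed. The function name module_sig_enabled is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel config file given as $1
# enables module signature support (CONFIG_MODULE_SIG=y).
module_sig_enabled() {
    grep -q '^CONFIG_MODULE_SIG=y' "$1"
}

# Assumed location of the running kernel's config on Ubuntu.
config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    if module_sig_enabled "$config"; then
        echo "CONFIG_MODULE_SIG is set for the running kernel"
    else
        echo "CONFIG_MODULE_SIG is not set for the running kernel"
    fi
fi

# Report the Secure Boot state if mokutil is installed.
# On non-UEFI systems this prints an error instead of a state.
if command -v mokutil >/dev/null 2>&1; then
    mokutil --sb-state || true
fi

# If an nvidia module is installed, show who signed it (empty if unsigned).
if command -v modinfo >/dev/null 2>&1; then
    modinfo -F signer nvidia 2>/dev/null || true
fi
```

If CONFIG_MODULE_SIG is set and Secure Boot is reported as enabled, an unsigned nvidia.ko is expected to be rejected with the "Required key not available" error shown in the log.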


-------------------------------------------------------

These notes come from hands-on experience. Discussion and criticism are welcome.
