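To get GPU acceleration for TensorFlow, I spent two days wrestling with my dual-GPU laptop before CUDA was finally installed. Along the way the machine failed to reach the desktop several times, and honestly I was close to giving up.
I tried both the .run installer and a plain apt-get install; the method that worked in the end was apt. I am writing the process down here so that beginners can avoid the same pitfalls.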
Operating system:
Ubuntu 16.04 LTS desktop amd64
GPU and CPU:
Intel i7-7500, NVIDIA GeForce 940MX
1: Check whether your GPU supports CUDA. Reference:
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#verify-you-have-cuda-enabled-system
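The first part of that check is simply looking for an NVIDIA device in the lspci output. Below is a rough sketch of that step. The helper name has_nvidia_gpu is my own invention, and the sample text is this laptop's lspci output shown later in this post. Finding an NVIDIA device only tells you a card is present; you still have to look the model up in NVIDIA's list of CUDA-capable GPUs.

```shell
#!/bin/sh
# has_nvidia_gpu: hypothetical helper (not part of any NVIDIA tool).
# Reads lspci-style text on stdin and succeeds if a VGA/3D controller
# line mentions NVIDIA.
has_nvidia_gpu() {
    grep -E 'VGA|3D' | grep -qi 'nvidia'
}

# Sample input: the lspci output from this laptop.
sample='00:02.0 VGA compatible controller: Intel Corporation Device 5916 (rev 02)
01:00.0 3D controller: NVIDIA Corporation Device 179c (rev a2)'

if printf '%s\n' "$sample" | has_nvidia_gpu; then
    echo "NVIDIA GPU found"
else
    echo "no NVIDIA GPU found"
fi

# On a real machine you would feed it live data instead:
#   lspci | has_nvidia_gpu && echo "NVIDIA GPU found"
```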
2: This tutorial uses the apt network installation method (run the commands below as root, or prefix them with sudo)
apt-get update && apt-get upgrade
dpkg -i virtualgl*.deb
apt-get install linux-headers-$(uname -r)
apt-get install freeglut3-dev libxmu-dev libpcap-dev
export PATH=$PATH:/opt/VirtualGL/bin
export PATH=$PATH:/usr/local/cuda/bin
apt-get install bumblebee-nvidia primus
Bumblebee settings in /etc/bumblebee/bumblebee.conf:
[bumblebeed]
ServerGroup=bumblebee
TurnCardOffAtExit=false
NoEcoModeOverride=false
Driver=nvidia
XorgConfDir=/etc/bumblebee/xorg.conf.d
[optirun]
Bridge=auto
PrimusLibraryPath=/usr/lib/x86_64-linux-gnu/primus:/usr/lib/i386-linux-gnu/primus
AllowFallbackToIGC=false
# The section below is only parsed if Driver=nvidia
[driver-nvidia]
KernelDriver=nvidia
PMMethod=auto
LibraryPath=/usr/lib/nvidia-367:/usr/lib32/nvidia-367
XorgModulePath=/usr/lib/xorg,/usr/lib/xorg/modules
XorgConfFile=/etc/bumblebee/xorg.conf.nvidia
# The section below is only parsed if Driver=nouveau
[driver-nouveau]
KernelDriver=nouveau
PMMethod=auto
XorgConfFile=/etc/bumblebee/xorg.conf.nouveau
$ lspci | egrep 'VGA|3D'
00:02.0 VGA compatible controller: Intel Corporation Device 5916 (rev 02)
01:00.0 3D controller: NVIDIA Corporation Device 179c (rev a2)
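sudo nano /etc/bumblebee/xorg.conf.nvidia
Add:
Section "ServerLayout"
    Identifier "Layout0"
    Option "AutoAddDevices" "false"
    Option "AutoAddGPU" "false"
EndSection
and, inside the Section "Device" block, the bus ID of the NVIDIA card taken from the lspci output above (xorg.conf expects the form PCI:bus:device:function):
    BusID "PCI:01:00:0"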
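One detail that is easy to get wrong here: lspci prints the address as hexadecimal bus:device.function, while the xorg.conf man page describes BusID as PCI:bus:device:function in decimal. A small sketch of the conversion follows. The function name lspci_to_busid is hypothetical, and it assumes the address has no PCI domain prefix (that is, 01:00.0 rather than 0000:01:00.0).

```shell
#!/bin/sh
# lspci_to_busid: hypothetical helper. Converts an lspci address such as
# "01:00.0" (hexadecimal bus:device.function, no domain prefix) into the
# decimal "PCI:bus:device:function" form used by the BusID option.
# The body runs in a subshell so its variables do not leak.
lspci_to_busid() (
    addr=$1
    bus=${addr%%:*}    # text before the first ':'
    rest=${addr#*:}    # text after the first ':'
    dev=${rest%%.*}    # text before the '.'
    fn=${rest#*.}      # text after the '.'
    # printf %d accepts C-style hex constants, so 0x0a becomes 10.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
)

lspci_to_busid 01:00.0    # prints PCI:1:0:0
lspci_to_busid 0a:00.0    # prints PCI:10:0:0
```

For the 940MX in this laptop the two notations happen to look the same apart from leading zeros, but on a machine whose card sits on bus 0a the decimal conversion matters.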
sudo shutdown -r now
3: Post-installation steps
Set the environment variables
Edit ~/.bashrc and add:
#set cuda environment
export PATH=$PATH:/opt/VirtualGL/bin
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
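The ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}} part looks cryptic, so here is a small demonstration of what it does. It uses a throwaway variable DEMO_PATH and a made-up directory /opt/extra/lib purely for illustration. The expansion adds ":old value" only when the variable is already set and non-empty, which avoids a stray trailing colon.

```shell
#!/bin/sh
# DEMO_PATH is a throwaway variable used only for this demonstration.
unset DEMO_PATH

# First use: DEMO_PATH is unset, so ${DEMO_PATH:+...} expands to nothing.
DEMO_PATH=/usr/local/cuda-8.0/lib64${DEMO_PATH:+:${DEMO_PATH}}
echo "$DEMO_PATH"    # /usr/local/cuda-8.0/lib64

# Second use: DEMO_PATH is set, so the old value is appended after a colon.
# /opt/extra/lib is a made-up directory for illustration.
DEMO_PATH=/opt/extra/lib${DEMO_PATH:+:${DEMO_PATH}}
echo "$DEMO_PATH"    # /opt/extra/lib:/usr/local/cuda-8.0/lib64
```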
Verify the CUDA installation
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
$ nvidia-smi
Tue Apr  4 21:26:01 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 940MX       Off  | 0000:01:00.0     Off |                  N/A |
| N/A   41C    P0    N/A /  N/A |    325MiB /  2002MiB |     24%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1258    G   /usr/lib/xorg/Xorg                             191MiB |
|    0      1977    G   compiz                                         125MiB |
|    0      2295    G   fcitx-qimpanel                                   8MiB |
+-----------------------------------------------------------------------------+
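If you want to check the toolkit version from a script rather than by eye, the release number can be pulled out of the nvcc --version text. This is only a sketch: cuda_release is a hypothetical helper name, and the sample text is the nvcc output from this post.

```shell
#!/bin/sh
# cuda_release: hypothetical helper. Reads `nvcc --version` text on stdin
# and prints the number that follows "release" (for example 8.0).
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Sample input: the nvcc output shown in this post.
sample='nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61'

printf '%s\n' "$sample" | cuda_release    # prints 8.0

# On a real machine:
#   nvcc --version | cuda_release
```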
Compile the sample code:
See the official guide:
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
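$ cuda-install-samples-8.0.61.sh <directory to copy the samples into>
Change into the samples directory and run make.
When the build finishes, run deviceQuery and bandwidthTest from the bin directory. If both report Result = PASS, the installation is working.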
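Both sample programs end their output with a Result line, so the check can be scripted. A minimal sketch follows: check_result is a hypothetical helper, and the printf lines are stand-ins for real program output, not actual logs.

```shell
#!/bin/sh
# check_result: hypothetical helper. Reads a sample program's output on
# stdin and prints PASS if it contains "Result = PASS", FAIL otherwise.
check_result() {
    if grep -q 'Result = PASS'; then
        echo PASS
    else
        echo FAIL
    fi
}

# Stand-in input for demonstration only (not real program output).
printf 'Result = PASS\n' | check_result    # prints PASS
printf 'Result = FAIL\n' | check_result    # prints FAIL

# In the samples' bin directory you would run something like:
#   for t in deviceQuery bandwidthTest; do
#       echo "$t: $(./"$t" | check_result)"
#   done
```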
4: Troubleshooting
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.39 Tue Jan 31 20:47:00 PST 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
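Go to the directory where the build error occurred and find the file findgllib.mk.
Open it and you will see the entry
UBUNTU_PKG_NAME = "nvidia-367"
The library directory is hard-coded there. Change it to the version of the NVIDIA driver you actually installed, which is 375 here (see the NVRM version line above).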
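The same fix can be scripted: read the driver's major version from the NVRM line and rewrite the package name. This is a sketch under my own assumptions. driver_major and retarget_pkg_name are hypothetical helper names, and both work on stdin/stdout so nothing on disk is touched until you redirect the result yourself.

```shell
#!/bin/sh
# driver_major: hypothetical helper. Reads /proc/driver/nvidia/version
# text on stdin and prints the driver's major version (for example 375).
driver_major() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9]*\)\..*/\1/p'
}

# retarget_pkg_name: hypothetical helper. Rewrites the UBUNTU_PKG_NAME
# line of findgllib.mk (read on stdin) to use driver version $1.
retarget_pkg_name() {
    sed "s/\(UBUNTU_PKG_NAME *= *\)\"nvidia-[0-9][0-9]*\"/\1\"nvidia-$1\"/"
}

# Sample input: the NVRM line from this post.
sample='NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.39 Tue Jan 31 20:47:00 PST 2017'
major=$(printf '%s\n' "$sample" | driver_major)
echo "$major"    # prints 375

printf 'UBUNTU_PKG_NAME = "nvidia-367"\n' | retarget_pkg_name "$major"
# prints UBUNTU_PKG_NAME = "nvidia-375"

# On a real machine, roughly:
#   major=$(driver_major < /proc/driver/nvidia/version)
#   retarget_pkg_name "$major" < findgllib.mk > findgllib.mk.new
```

Writing to a .new file first leaves the original makefile intact until you have inspected the result.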
sudo apt-get remove --purge 'nvidia-*'
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup
http://askubuntu.com/questions/799184/how-can-i-install-cuda-on-ubuntu-16-04
https://devtalk.nvidia.com/default/topic/769578/cuda-6-5-cannot-find-lnvcuvid/