Installing and configuring tensorflow_gpu 1.2.0 on Arch Linux (and other generic Linux distributions)

This article does not cover installing the graphics driver and the CUDA environment in depth; here are two links, one for the driver and one for CUDA:
http://www.nvidia.cn/download/driverResults.aspx/117766/cn
https://developer.nvidia.com/cuda-downloads
Unless you are running an ancient kernel, the driver installation is entirely point-and-click, so I won't dwell on it.
CUDA is slightly trickier: NVIDIA only provides official packages for a few major distributions. Smaller ones such as Arch are not covered, but the distribution itself maintains a CUDA package, so simply installing cuda with pacman is enough.
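On Arch, for example, that boils down to something like the following sketch (package name as found in the official repos at the time of writing; verify with pacman before relying on it):

```shell
# Install the distribution-maintained CUDA toolkit from the Arch repos
sudo pacman -S cuda
# Confirm the toolkit is on the PATH
nvcc --version
```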

Now to the main topic:

On Linux, Google only publishes an official deb package for Ubuntu, and the install guide goes as far as to warn: “don't build a TensorFlow binary yourself unless you are very comfortable building complex packages from source and dealing with the inevitable aftermath should things not go exactly as documented.”

Disheartening. Running TensorFlow on other Linux distributions is a bit more trouble than you might expect.

Besides building from source, there are two viable ways to use TensorFlow on a non-Ubuntu distribution.
1. Docker
I first tried deploying with docker.
Whether because my proxy had stopped working or because the official registry was having problems, the image simply would not download.
On top of that, nvidia-docker is quite unstable; even its bundled self-test occasionally failed with “nvidia-docker-plugin exits with ‘Error: nvml: Unknown Error’”. Time is precious, so I stopped fiddling with it and will revisit once a later release stabilizes.
If you are interested, here are the instructions for deploying with docker:
https://www.tensorflow.org/install/install_linux#InstallingDocker

2. Anaconda
The main advantage of anaconda is that it saves time.
It automatically installs most of the libraries that scientific computing and data processing depend on, and takes care of environment variables and related configuration on its own.
a. Install anaconda first. Download the Linux installer from here: https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh .

b. Run bash Anaconda3-4.4.0-Linux-x86_64.sh to install. Installing as root is not recommended, since it may conflict with the system's Python. It is better to run the installer as a regular user and install under your home directory.

c. Create a virtual env dedicated to running tensorflow.

$ conda create -n tensorflow
$ source activate tensorflow 
$ conda install anaconda
$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl

Note that a freshly created conda virtual env defaults to Python 3.6, so the matching tensorflow wheel must be installed. If you are on 2.x or 3.5, pick the corresponding version from the table below:
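If you would rather pin the interpreter explicitly than rely on conda's default, the env can be created with a version spec; a sketch for Python 3.5 (the wheel URL comes from the table below):

```shell
# Create the env with an explicit Python version, then install the matching wheel
conda create -n tensorflow python=3.5
source activate tensorflow
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp35-cp35m-linux_x86_64.whl
```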

The URL of the TensorFlow Python package

A few installation mechanisms require the URL of the TensorFlow Python package. The value you specify depends on three factors:

operating system
Python version
CPU only vs. GPU support

This section documents the relevant values for Linux installations. Note that GPU support requires the NVIDIA hardware and software described in NVIDIA requirements to run TensorFlow with GPU support.

Python 2.7
  CPU only: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.0-cp27-none-linux_x86_64.whl
  GPU support: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp27-none-linux_x86_64.whl

Python 3.4
  CPU only: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.0-cp34-cp34m-linux_x86_64.whl
  GPU support: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp34-cp34m-linux_x86_64.whl

Python 3.5
  CPU only: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.0-cp35-cp35m-linux_x86_64.whl
  GPU support: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp35-cp35m-linux_x86_64.whl

Python 3.6
  CPU only: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.0-cp36-cp36m-linux_x86_64.whl
  GPU support: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl
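The URL pattern in the table above is regular enough to encode in a small helper; this is just a convenience sketch (the function name is mine, and the pattern is taken verbatim from the table, so it only covers the 1.2.0 Linux wheels):

```python
import sys

def tf_120_wheel_url(gpu=True, py=sys.version_info[:2]):
    # Build the TensorFlow 1.2.0 Linux wheel URL for a given Python version,
    # following the URL pattern documented in the table above.
    base = "https://storage.googleapis.com/tensorflow/linux"
    subdir, name = ("gpu", "tensorflow_gpu") if gpu else ("cpu", "tensorflow")
    # Python 2.7 wheels carry the "none" ABI tag; 3.x wheels use cpXYm
    abi = "none" if py[0] == 2 else "cp%d%dm" % py
    tag = "cp%d%d" % py
    return "%s/%s/%s-1.2.0-%s-%s-linux_x86_64.whl" % (base, subdir, name, tag, abi)

print(tf_120_wheel_url(gpu=True, py=(3, 6)))
# → https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl
```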

d. Install cudnn. Download it from https://developer.nvidia.com/cudnn . Be very careful here: the tensorflow 1.2 we just installed uses cuDNN version 5. Do not download the wrong one.

$ tar zxvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp -r cuda /opt/
$ cd /opt/cuda/lib64
$ ls *libcudnn*
$ sudo ldconfig

A quick explanation: the first shell command above unpacks the downloaded cudnn archive. Extraction produces a “cuda” directory containing the cudnn files in the current directory. Don't be confused: the directory is named “cuda” simply so that you can merge it into your CUDA installation directory. Copy the whole directory into wherever CUDA is installed. (On my server, CUDA lives under /opt/.)
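After running ldconfig, it is worth confirming that the dynamic linker can actually see the library (this assumes /opt/cuda/lib64 is in the linker's search path, e.g. via a file under /etc/ld.so.conf.d/):

```shell
# The cudnn shared objects should appear in the linker cache
ldconfig -p | grep libcudnn
```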

e. Verify that tensorflow was installed successfully:

$ source activate tensorflow
$ python
>>> import tensorflow as tf
>>> print(tf.__version__)

If this raises no error (and the version reads 1.2.0), the installation succeeded.

f. Install keras: $ pip install keras
With that, the installation of tensorflow and keras is complete.

Finally, use the hello-world of the keras examples (mnist_cnn.py) as a benchmark:

CPU:

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 132s - loss: 0.3531 - acc: 0.8916 - val_loss: 0.0888 - val_acc: 0.9726
Epoch 2/12
60000/60000 [==============================] - 131s - loss: 0.1191 - acc: 0.9653 - val_loss: 0.0540 - val_acc: 0.9827
Epoch 3/12
60000/60000 [==============================] - 133s - loss: 0.0878 - acc: 0.9743 - val_loss: 0.0433 - val_acc: 0.9863
Epoch 4/12
60000/60000 [==============================] - 129s - loss: 0.0731 - acc: 0.9783 - val_loss: 0.0387 - val_acc: 0.9866
Epoch 5/12
60000/60000 [==============================] - 124s - loss: 0.0639 - acc: 0.9806 - val_loss: 0.0358 - val_acc: 0.9879
Epoch 6/12
60000/60000 [==============================] - 124s - loss: 0.0550 - acc: 0.9842 - val_loss: 0.0342 - val_acc: 0.9883
Epoch 7/12
60000/60000 [==============================] - 123s - loss: 0.0529 - acc: 0.9845 - val_loss: 0.0307 - val_acc: 0.9895
Epoch 8/12
60000/60000 [==============================] - 123s - loss: 0.0484 - acc: 0.9855 - val_loss: 0.0303 - val_acc: 0.9893
Epoch 9/12
60000/60000 [==============================] - 124s - loss: 0.0432 - acc: 0.9872 - val_loss: 0.0296 - val_acc: 0.9902
Epoch 10/12
60000/60000 [==============================] - 124s - loss: 0.0424 - acc: 0.9871 - val_loss: 0.0270 - val_acc: 0.9911
Epoch 11/12
60000/60000 [==============================] - 122s - loss: 0.0399 - acc: 0.9874 - val_loss: 0.0276 - val_acc: 0.9909
Epoch 12/12
60000/60000 [==============================] - 123s - loss: 0.0362 - acc: 0.9892 - val_loss: 0.0264 - val_acc: 0.9913
Test loss: 0.0264107687845
Test accuracy: 0.9913

GPU:


2017-06-21 17:44:40.639353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: Graphics Device
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.91GiB
Free memory: 10.74GiB
2017-06-21 17:44:40.639367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-06-21 17:44:40.639371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-06-21 17:44:40.639379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:01:00.0)
60000/60000 [==============================] - 4s - loss: 0.3362 - acc: 0.8979 - val_loss: 0.0779 - val_acc: 0.9756
Epoch 2/12
60000/60000 [==============================] - 3s - loss: 0.1136 - acc: 0.9670 - val_loss: 0.0504 - val_acc: 0.9833
Epoch 3/12
60000/60000 [==============================] - 3s - loss: 0.0869 - acc: 0.9745 - val_loss: 0.0437 - val_acc: 0.9858
Epoch 4/12
60000/60000 [==============================] - 3s - loss: 0.0729 - acc: 0.9786 - val_loss: 0.0369 - val_acc: 0.9874
Epoch 5/12
60000/60000 [==============================] - 3s - loss: 0.0630 - acc: 0.9807 - val_loss: 0.0360 - val_acc: 0.9872
Epoch 6/12
60000/60000 [==============================] - 3s - loss: 0.0567 - acc: 0.9831 - val_loss: 0.0337 - val_acc: 0.9884
Epoch 7/12
60000/60000 [==============================] - 3s - loss: 0.0508 - acc: 0.9851 - val_loss: 0.0293 - val_acc: 0.9901
Epoch 8/12
60000/60000 [==============================] - 3s - loss: 0.0475 - acc: 0.9859 - val_loss: 0.0326 - val_acc: 0.9879
Epoch 9/12
60000/60000 [==============================] - 3s - loss: 0.0451 - acc: 0.9867 - val_loss: 0.0315 - val_acc: 0.9896
Epoch 10/12
60000/60000 [==============================] - 3s - loss: 0.0420 - acc: 0.9872 - val_loss: 0.0283 - val_acc: 0.9905
Epoch 11/12
60000/60000 [==============================] - 3s - loss: 0.0394 - acc: 0.9885 - val_loss: 0.0300 - val_acc: 0.9906
Epoch 12/12
60000/60000 [==============================] - 3s - loss: 0.0377 - acc: 0.9884 - val_loss: 0.0289 - val_acc: 0.9905
Test loss: 0.0288597793579
Test accuracy: 0.9905

It truly feels like trading an ox cart for a rocket.
