Prerequisite: Amazon Cloud Tutorial 6: Creating and Launching an AMI, Setting Up CloudWatch
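GPUs were originally built to speed up graphics rendering and were mostly found in computers used for 3D games. In 2009, Andrew Ng and a team at Stanford University realized that GPUs could be used for the parallel computation in neural networks (REF1). For this kind of parallel workload a GPU can be many times faster than a CPU, so GPUs are now widely used in AI computing. TensorFlow has a corresponding GPU version as well.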
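- In the EC2 console, find the snapshot under "AMIs" in the "IMAGES" section of the left-hand menu.
- Right-click the snapshot and choose "Launch".
- At "Step 2: Choose an Instance Type", select the type you need, for example "p2.xlarge".
- At "Step 6: Configure Security Group", click "Add Rule" and add the ports you need, such as port 9999, which we used earlier for jupyter notebook.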
A GPU instance takes a little longer to start. Once the Public IP appears, we can try logging in to the server.
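Rather than retrying the login by hand, the wait can be scripted. The following is a minimal sketch, assuming the server accepts SSH on the default port 22 (the tutorial does not state the port); `port_is_open` and `wait_for_port` are hypothetical helper names, not part of any AWS tool.

```python
import socket
import time


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def wait_for_port(host, port, attempts=30, delay=10.0):
    """Poll host:port until it accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_port(public_ip, 22)` with the Public IP shown in the EC2 console blocks until the instance is reachable or roughly five minutes have passed. The same check against port 9999 only succeeds once jupyter notebook is actually running.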
Path 1. Install tensorflow-gpu with conda
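conda create -n tensorflow-gpu-conda python=3.6 ipykernel
Create a new virtual environment named tensorflow-gpu-conda.
source activate tensorflow-gpu-conda
Enter the new environment.
python -m ipykernel install --user --name tensorflow-gpu-conda
Register this environment with ipykernel so that it can be used from jupyter notebook.
conda install -c anaconda tensorflow-gpu=1.1.0
Install the GPU-enabled TensorFlow. I found this method by searching Google for "conda install tensorflow gpu" and following the link on the official conda site. This is the advantage of conda: a single command is enough, and conda takes care of installing the dependencies.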
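The `python -m ipykernel install --user` step writes a small `kernel.json` file that jupyter notebook reads when it builds the kernel menu. The sketch below lists the registered kernels; it assumes the usual per-user location on Linux, `~/.local/share/jupyter/kernels`, and `list_kernel_specs` is a hypothetical helper name.

```python
import json
import os


def list_kernel_specs(kernels_dir):
    """Map kernel directory name -> display name for each kernel.json found."""
    specs = {}
    if not os.path.isdir(kernels_dir):
        return specs
    for name in sorted(os.listdir(kernels_dir)):
        spec_file = os.path.join(kernels_dir, name, "kernel.json")
        if os.path.isfile(spec_file):
            with open(spec_file) as handle:
                specs[name] = json.load(handle).get("display_name", name)
    return specs


# Assumed default location for --user installs on Linux.
user_kernels = os.path.expanduser("~/.local/share/jupyter/kernels")
print(list_kernel_specs(user_kernels))
```

The command `jupyter kernelspec list` reports the same information from the shell.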
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-06-14 23:31:08.287078: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-14 23:31:08.287117: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-14 23:31:08.287130: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-14 23:31:08.287144: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-14 23:31:08.287158: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-14 23:31:08.366924: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-14 23:31:08.367192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1e.0
Total memory: 11.17GiB
Free memory: 2.86GiB
2017-06-14 23:31:08.367220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-06-14 23:31:08.367233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-06-14 23:31:08.367251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
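The line that matters in this output is the last `I` entry, "Creating TensorFlow device (/gpu:0)", which shows that TensorFlow actually picked up the GPU. As a rough sketch, a captured log can be checked for that line with a regular expression; the pattern is written against the log format shown here and may not match other TensorFlow versions, and `find_gpu_devices` is a hypothetical helper name.

```python
import re

# Pattern for the log line that announces a usable GPU device
# (format taken from the TensorFlow 1.1 output shown in this tutorial).
DEVICE_LINE = re.compile(
    r"Creating TensorFlow device \((?P<tf_name>/gpu:\d+)\) -> "
    r"\(device: (?P<index>\d+), name: (?P<name>[^,]+), "
    r"pci bus id: (?P<pci>[0-9a-fA-F:.]+)\)"
)


def find_gpu_devices(log_text):
    """Return one dict per GPU device announced in the log text."""
    return [match.groupdict() for match in DEVICE_LINE.finditer(log_text)]


sample = (
    "2017-06-14 23:31:08.367251: I tensorflow/core/common_runtime/gpu/"
    "gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> "
    "(device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0)"
)
devices = find_gpu_devices(sample)
print(devices)
```

An empty list means the log never reached device creation, which usually points to a CPU-only build or a missing driver or library.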
Path 2. Install the dependencies listed on Google's official site, then install TensorFlow
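According to the installation requirements on the TensorFlow website, TensorFlow can only use the server's GPU if three supporting components are installed at the system level: CUDA, cuDNN, and the libcupti-dev package. The original text is quoted below (taken from Google's official site on 2017-06-14). It lists other requirements too, but a setup that follows this tutorial meets them by default.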
NVIDIA requirements to run TensorFlow with GPU support
If you are installing TensorFlow with GPU support using one of the mechanisms described in this guide, then the following NVIDIA software must be installed on your system:
- CUDA® Toolkit 8.0. For details, see NVIDIA's documentation. Ensure that you append the relevant Cuda pathnames to the LD_LIBRARY_PATH environment variable as described in the NVIDIA documentation.
- The NVIDIA drivers associated with CUDA Toolkit 8.0.
- cuDNN v5.1. For details, see NVIDIA's documentation. Ensure that you create the CUDA_HOME environment variable as described in the NVIDIA documentation.
- GPU card with CUDA Compute Capability 3.0 or higher. See NVIDIA documentation for a list of supported GPU cards.
- The libcupti-dev library, which is the NVIDIA CUDA Profile Tools Interface. This library provides advanced profiling support. To install this library, issue the following command:
$ sudo apt-get install libcupti-dev
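These requirements amount to a small version checklist. The sketch below encodes the three numbers quoted above (CUDA 8.0, cuDNN 5.1, compute capability 3.0 or higher) and compares them with whatever versions you report. `check_requirements` is a hypothetical helper, and the numbers apply to the TensorFlow release discussed in this tutorial only.

```python
# Versions quoted from the TensorFlow requirements above.
REQUIRED_CUDA = (8, 0)
REQUIRED_CUDNN = (5, 1)
MIN_COMPUTE_CAPABILITY = (3, 0)


def parse_version(text):
    """Turn a dotted version string such as '8.0.61' into (8, 0, 61)."""
    return tuple(int(part) for part in text.split("."))


def check_requirements(cuda, cudnn, compute_capability):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if parse_version(cuda)[:2] != REQUIRED_CUDA:
        problems.append("CUDA Toolkit %s found, 8.0 required" % cuda)
    if parse_version(cudnn)[:2] != REQUIRED_CUDNN:
        problems.append("cuDNN %s found, 5.1 required" % cudnn)
    if parse_version(compute_capability)[:2] < MIN_COMPUTE_CAPABILITY:
        problems.append("compute capability %s is below 3.0" % compute_capability)
    return problems


# The Tesla K80 reports "major: 3 minor: 7" in the TensorFlow log.
print(check_requirements("8.0.61", "5.1", "3.7"))
```

CUDA and cuDNN are compared for an exact major.minor match because the TensorFlow binary is linked against those specific versions, while the compute capability is only a lower bound.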
2.1 Install CUDA
roden@ip-172-31-19-170:~$ lspci | grep -i nvidia
00:1e.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
# The output shows one Tesla K80.
roden@ip-172-31-19-170:~$ uname -r
4.4.0-1018-aws
roden@ip-172-31-19-170:~$ sudo apt-get install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree
Reading state information... Done
linux-headers-4.4.0-1018-aws is already the newest version (4.4.0-1018.27).
linux-headers-4.4.0-1018-aws set to manually installed.
The following packages were automatically installed and are no longer required:
linux-aws-headers-4.4.0-1013 linux-aws-headers-4.4.0-1016 linux-headers-4.4.0-1013-aws linux-headers-4.4.0-1016-aws linux-image-4.4.0-1013-aws linux-image-4.4.0-1016-aws
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 37 not upgraded.
# The headers are already at the newest version, so nothing needs to be installed.
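Download the deb package. On the CUDA Toolkit download page, select "Linux, x86_64, Ubuntu, 16.04, deb (network)" in turn. Then copy the address behind the download button that appears below (right-click, "Copy link address"). The address is "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb".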
roden@ip-172-31-19-170:~/download$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb # download with wget
...
cuda-repo-ubuntu1604_ 100%[=========================>] 2.63K --.-KB/s in 0s
...
roden@ip-172-31-19-170:~/download$ ls
cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
# The download succeeded.
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda # This step asks for confirmation; type Y and press Enter.
done.
done.
roden@ip-172-31-19-170:~/download$
roden@ip-172-31-19-170:~/download$ nvidia-smi
# This command only works once the CUDA installation has finished.
Tue Jun 13 04:47:12 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   28C    P0    70W / 149W |      0MiB / 11439MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
# The table shows one GPU, numbered 0, with no processes using it yet.
# We can run this command again while training a model on the GPU to see how the output changes.
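The table is meant for people; for scripts, nvidia-smi can print the same values as CSV through its `--query-gpu` and `--format` options. The sketch below parses that CSV. The field selection and the helper names `parse_gpu_csv` and `query_gpus` are choices made for this example, and the sample line is typed by hand from the table above, not captured output.

```python
import shutil
import subprocess

QUERY_FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi if it is installed; return [] when it is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    output = subprocess.check_output(command, universal_newlines=True)
    return parse_gpu_csv(output)


# Hand-typed sample mirroring the table above: GPU 0, 0 of 11439 MiB used.
print(parse_gpu_csv("0, Tesla K80, 0, 11439, 100"))
```

Calling `query_gpus()` before and during training gives a simple way to see memory usage change once TensorFlow claims the GPU.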
2.2 Install cuDNN
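cuDNN is CUDA's deep neural network library. It is available from the official download page, but you must register as a user before you can download it. Click "Download", then "Join now" to complete the registration, and download after logging in.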
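Choose "Download cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0". There are two reasons. First, the CUDA we installed earlier is version 8.0, and the two must match. Second, Google's documentation requires "cuDNN v5.1", so v5.1 is the only option. When I first wrote this tutorial I picked the newest release, v6.0, and TensorFlow later failed at run time because it could not find the file "libcudnn.so.5". The error message was "ImportError: libcudnn.so.5: cannot open shared object file: No such file or directory". So if you follow the official guide, install every dependency exactly as specified, version by version, and pay close attention to matching versions when you upgrade later.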
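The ImportError above means the dynamic linker could not find a file named exactly `libcudnn.so.5`. As a rough pre-flight check, the sketch below looks for a file name in the directories listed in `LD_LIBRARY_PATH`. It is only an approximation, since the real linker also consults the system library cache and default directories; `find_shared_library` and `missing_libraries` are hypothetical helper names.

```python
import os


def find_shared_library(filename, search_path):
    """Return the first directory in search_path that contains filename."""
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


def missing_libraries(filenames, search_path):
    """Return the filenames that are not found anywhere in search_path."""
    return [name for name in filenames
            if find_shared_library(name, search_path) is None]


# libcudnn.so.5 is the file named in the ImportError above.
needed = ["libcudnn.so.5"]
print(missing_libraries(needed, os.environ.get("LD_LIBRARY_PATH", "")))
```

If the name printed here differs from the files actually present in the cuDNN `lib64` folder, the installed cuDNN version is not the one this TensorFlow build expects.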
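Click "cuDNN Install Guide" to read the installation guide. It turns out we only need to download the file, put it in a directory, and add environment variables. No compilation or similar steps are required.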
Click "cuDNN v5.1 Library for Linux" to download it; you need to be logged in. We can download it locally first and then upload it to the server.
➜ Downloads scp cudnn-8.0-linux-x64-v5.1.tgz [email protected]:download/ # upload with scp
cudnn-8.0-linux-x64-v5.1.tgz 100% 98MB 12.3MB/s 00:08
➜ Downloads
roden@ip-172-31-19-170:~/download$ ls
cudnn-8.0-linux-x64-v5.1.tgz
# The file is there.
roden@ip-172-31-19-170:~/download$ tar -xf cudnn-8.0-linux-x64-v5.1.tgz # extract
roden@ip-172-31-19-170:~/download$ ls
cuda cudnn-8.0-linux-x64-v5.1.tgz
roden@ip-172-31-19-170:~/download$ ls cuda/ # The extracted cuda folder is cuDNN itself; let's look at its contents.
include lib64
roden@ip-172-31-19-170:~/download$
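Because the archive unpacks into a folder called `cuda`, which is also the name of the CUDA install folder, it helps to see what an archive will create before extracting it next to anything important. The sketch below lists an archive's top-level names and reports which of them already exist in a target directory. Both function names are hypothetical helpers written for this example.

```python
import os
import tarfile


def top_level_entries(archive_path):
    """Return the sorted top-level names stored in a tar archive."""
    with tarfile.open(archive_path) as archive:
        names = archive.getnames()
    tops = set()
    for name in names:
        first = os.path.normpath(name).split(os.sep)[0]
        if first not in ("", "."):
            tops.add(first)
    return sorted(tops)


def would_overwrite(archive_path, target_dir):
    """Return the top-level names that already exist inside target_dir."""
    return [name for name in top_level_entries(archive_path)
            if os.path.exists(os.path.join(target_dir, name))]
```

For example, `would_overwrite("cudnn-8.0-linux-x64-v5.1.tgz", "/usr/local")` would be expected to report `cuda` on this server, which is why the folder is renamed before it is moved.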
Next, move the folder into the /usr/local/ directory, because the CUDA from step one was installed there by default.
roden@ip-172-31-19-170:~/download$ mv cuda cuDNN # Rename the folder from cuda to cuDNN.
roden@ip-172-31-19-170:~/download$ ls /usr/local/ # Confirm the target has no folder with the same name, so nothing important gets overwritten.
bin cuda cuda-8.0 etc games include lib man sbin share src
roden@ip-172-31-19-170:~/download$ sudo mv cuDNN/ /usr/local/ # Move cuDNN over. /usr/local/ is a system folder, so writing to it requires sudo.
[sudo] password for roden: # enter the password
roden@ip-172-31-19-170:~/download$ ls /usr/local/ # Check that the folder was moved.
bin cuda cuda-8.0 cuDNN etc games include lib man sbin share src
roden@ip-172-31-19-170:~/download$ ls /usr/local/cuDNN/
include lib64
roden@ip-172-31-19-170:~/download$
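Add the following environment variables to the ~/.bashrc file. Any line will do; I usually put them at the top of the file.
roden@ip-172-31-19-170:~/download$ vi ~/.bashrc # edit the file
roden@ip-172-31-19-170:~/download$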
PATH=$PATH:/usr/local/cuda-8.0/bin
LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuDNN/lib64
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuDNN/include
source ~/.bashrc
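After sourcing the file, it is worth confirming that the variables really reached the environment, since a typo here only shows up later as an import error. The sketch below checks for the directories used in the lines above; `check_environment` is a hypothetical helper, and the list of expected directories is copied from this tutorial's layout, not from any official source.

```python
import os

# Directories taken from the ~/.bashrc lines in this tutorial.
EXPECTED_LIBRARY_DIRS = [
    "/usr/local/cuda/lib64",
    "/usr/local/cuda/extras/CUPTI/lib64",
    "/usr/local/cuDNN/lib64",
]


def check_environment(environ=None):
    """Return a list of problems found in the CUDA-related variables."""
    environ = os.environ if environ is None else environ
    problems = []
    if not environ.get("CUDA_HOME"):
        problems.append("CUDA_HOME is not set")
    entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    for directory in EXPECTED_LIBRARY_DIRS:
        if directory not in entries:
            problems.append("LD_LIBRARY_PATH is missing " + directory)
    return problems


print(check_environment())
```

Run it from a fresh login shell: an empty list means every expected entry is present, and anything else names the variable to fix.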
2.3 Install libcupti-dev
sudo apt-get install libcupti-dev
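That's it: CUDA, cuDNN, and libcupti-dev are now installed, which completes the prerequisites. Next we install the GPU-enabled TensorFlow itself. The steps are similar to those in Tutorial 5, except that we create a new Python virtual environment and use a different TensorFlow package link.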
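conda create -n tensorflowGPU python=3.6 ipykernel
Create the virtual environment.
source activate tensorflowGPU
Enter the tensorflowGPU virtual environment.
python -m ipykernel install --user --name tensorflowGPU
Register this environment with ipykernel so that it can be used from jupyter notebook.
On the TensorFlow website, find the install link for the package that targets Ubuntu, Python 3.6, and GPU support.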
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp36-cp36m-linux_x86_64.whl
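The file name at the end of this link encodes which Python the package was built for, which is how to tell the Python 3.6 GPU build apart from the others. The sketch below splits a wheel file name into its parts following the naming scheme of PEP 427; `parse_wheel_filename` is a hypothetical helper, not part of pip.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename: " + filename)
    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


url = ("https://storage.googleapis.com/tensorflow/linux/gpu/"
       "tensorflow_gpu-1.1.0-cp36-cp36m-linux_x86_64.whl")
info = parse_wheel_filename(url.rsplit("/", 1)[-1])
print(info)
```

Here `cp36` marks CPython 3.6, matching the `python=3.6` used when creating the environment, and `linux_x86_64` matches the Ubuntu server.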
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-06-15 02:55:33.982353: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-15 02:55:33.982396: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-15 02:55:33.982410: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-15 02:55:33.982426: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-15 02:55:33.982437: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-15 02:55:37.038419: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-15 02:55:37.038938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1e.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-06-15 02:55:37.038970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-06-15 02:55:37.038986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-06-15 02:55:37.039014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
First, following the method from Tutorial 5, we start jupyter notebook and open the MNIST_demo notebook.
roden@ip-172-31-19-170:~$ cd tf_notebook/
roden@ip-172-31-19-170:~/tf_notebook$ nohup jupyter notebook &
[1] 2219
roden@ip-172-31-19-170:~/tf_notebook$ nohup: ignoring input and appending output to 'nohup.out'
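Then open jupyter notebook in the browser and open the "MNIST_demo.ipynb" notebook. In the toolbar, choose "Kernel -> Change kernel" and pick one of the GPU-enabled TensorFlow kernels, for example the "tensorflow-gpu-conda" kernel we installed with conda. Then click "Cell -> Run All". You should see results similar to the following.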
step 0, training accuracy 0.08
step 100, training accuracy 0.9
step 200, training accuracy 0.88
step 300, training accuracy 0.94
step 400, training accuracy 0.86
step 500, training accuracy 1
step 600, training accuracy 0.96
step 700, training accuracy 0.96
step 800, training accuracy 0.96
step 900, training accuracy 0.9
step 1000, training accuracy 0.94
step 19000, training accuracy 1
step 19100, training accuracy 0.98
step 19200, training accuracy 1
step 19300, training accuracy 1
step 19400, training accuracy 1
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 0.992
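The final accuracy on the test set is 0.992. This run feels much faster than before, and the whole program finishes fairly quickly. You can switch back to the earlier kernel without GPU support via "Kernel -> Change kernel -> tensorflow" and run the notebook again; it is much slower because it uses the CPU.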
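To put a number on "much faster", the training cell can be wrapped in a simple timer and run once under each kernel. The sketch below is one way to do it with the standard library; `stopwatch` is a hypothetical helper, and the summation loop is only a stand-in for the notebook's training loop.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Record the wall-clock seconds spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with stopwatch("demo", timings):
    # Stand-in workload; in the notebook this would be the training loop.
    sum(i * i for i in range(100000))
print(timings)
```

Using the kernel name as the label in each run makes the two recorded times easy to compare side by side.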
- REF1: Kevin Kelly, The Inevitable, p. 38.
- NVIDIA CUDA Installation Guide for Linux (in English).
- wget with authentication: how to use wget when authentication is required.
- AMI marketplace link.
- Distributed computing: AWS blog link, TensorFlow documentation.
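- Path 1: install TensorFlow-GPU with conda
conda create -n tensorflow-gpu-conda python=3.6 ipykernel
Create a new virtual environment named `tensorflow-gpu-conda`.
source activate tensorflow-gpu-conda
Enter the new environment.
python -m ipykernel install --user --name tensorflow-gpu-conda
Register this environment with ipykernel so that it can be used from jupyter notebook.
conda install -c anaconda tensorflow-gpu=1.1.0
- Path 2: install the dependencies listed on Google's official site, then install TensorFlow. I recommend reading the main text first, then checking the latest official documentation and requirements, and following the official guide to complete the installation.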