To save yourself detours, check the version compatibility first. Compatibility! Compatibility! Compatibility!
Official table
https://tensorflow.google.cn/install/source_windows#gpu
Version                 Python version   Compiler               Build tool     cuDNN   CUDA
tensorflow_gpu-1.12.0   3.5-3.6          MSVC 2015 update 3     Bazel 0.15.0   7       9
tensorflow_gpu-1.11.0   3.5-3.6          MSVC 2015 update 3     Bazel 0.15.0   7       9
tensorflow_gpu-1.10.0   3.5-3.6          MSVC 2015 update 3     Cmake v3.6.3   7       9
tensorflow_gpu-1.9.0    3.5-3.6          MSVC 2015 update 3     Cmake v3.6.3   7       9
tensorflow_gpu-1.8.0    3.5-3.6          MSVC 2015 update 3     Cmake v3.6.3   7       9
tensorflow_gpu-1.7.0    3.5-3.6          MSVC 2015 update 3     Cmake v3.6.3   7       9
tensorflow_gpu-1.6.0    3.5-3.6          MSVC 2015 update 3     Cmake v3.6.3   7       9
tensorflow_gpu-1.5.0    3.5-3.6          MSVC 2015 update 3     Cmake v3.6.3   7       9
tensorflow_gpu-1.4.0    3.5-3.6          MSVC 2015 update 3     Cmake v3.6.3   6       8
tensorflow_gpu-1.3.0    3.5-3.6          MSVC 2015 update 3     Cmake v3.6.3   6       8
tensorflow_gpu-1.2.0    3.5-3.6          MSVC 2015 update 3     Cmake v3.6.3   5.1     8
tensorflow_gpu-1.1.0    3.5              MSVC 2015 update 3     Cmake v3.6.3   5.1     8
tensorflow_gpu-1.0.0    3.5              MSVC 2015 update 3     Cmake v3.6.3   5.1     8
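To make the table easier to query, here is a minimal sketch that stores the rows above in a dictionary and looks up the requirements for one release. The names TESTED_CONFIGS and lookup are made up for this illustration; the values are copied from the table and nothing else is implied about versions that are not listed.

```python
# Rows copied from the table above, grouped by identical requirements:
# (tensorflow_gpu versions, Python versions, cuDNN, CUDA)
TESTED_CONFIGS = [
    (("1.12.0", "1.11.0", "1.10.0", "1.9.0", "1.8.0", "1.7.0", "1.6.0", "1.5.0"),
     "3.5-3.6", "7", "9"),
    (("1.4.0", "1.3.0"), "3.5-3.6", "6", "8"),
    (("1.2.0",), "3.5-3.6", "5.1", "8"),
    (("1.1.0", "1.0.0"), "3.5", "5.1", "8"),
]


def lookup(tf_version):
    """Return the Python/cuDNN/CUDA requirements for a tensorflow_gpu
    version, or None if the version is not in the table."""
    for versions, python, cudnn, cuda in TESTED_CONFIGS:
        if tf_version in versions:
            return {"python": python, "cudnn": cudnn, "cuda": cuda}
    return None


if __name__ == "__main__":
    print(lookup("1.12.0"))
    print(lookup("1.2.0"))
```

A version that is missing from the table returns None, which is a hint to go back to the official page rather than guess.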
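These tables matter: if the versions do not match when you set up the environment, you will run into a colorful assortment of errors.
Even problems like "ImportError: DLL load failed: 找不到指定的模块" (on an English system: "The specified module could not be found") can show up.
Find the table for your operating system, then pick the row according to what your graphics driver supports.
Check whether your GPU supports CUDA here:
https://developer.nvidia.com/cuda-gpus
Check your graphics driver version here:
https://www.geforce.cn/drivers
For example: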
GeForce Game Ready Driver - WHQL
Version: 419.35 - Release date: 2019-3-5
From the driver version 419.35 you can determine the matching CUDA version.
Official documentation
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#installcuda
Quote
4. Installing cuDNN on Windows
4.1. Prerequisites
Ensure you meet the following requirements before you install cuDNN.
A GPU of compute capability 3.0 or higher. To understand the compute capability of the GPU on your system, see: CUDA GPUs. Also see the cuDNN Support Matrix.
One of the following supported platforms:
Windows 7
Windows 10
Windows Server 2012
One of the following supported CUDA versions and NVIDIA graphics driver:
NVIDIA graphics driver R410 or newer for CUDA 10.0
NVIDIA graphics driver R396 or newer for CUDA 9.2
NVIDIA graphics driver R384 or newer for CUDA 9
NVIDIA graphics driver R377 or newer for CUDA 8
Driver 419 satisfies "R410 or newer", so the choice for it is CUDA 10.0.
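The rule in the quoted list can be written down as a small lookup. This is only a sketch: the function name newest_cuda_for_driver is invented here, and the thresholds are just the four lines quoted above, so CUDA releases outside that list are not covered.

```python
# (minimum driver branch, CUDA version), newest first, taken from the
# quoted cuDNN prerequisites list above.
MIN_DRIVER_FOR_CUDA = [
    (410, "10.0"),
    (396, "9.2"),
    (384, "9"),
    (377, "8"),
]


def newest_cuda_for_driver(driver_version):
    """Return the newest CUDA version from the quoted list that the given
    driver version (e.g. "419.35") satisfies, or None if it is too old."""
    branch = int(driver_version.split(".")[0])
    for min_branch, cuda in MIN_DRIVER_FOR_CUDA:
        if branch >= min_branch:
            return cuda
    return None


if __name__ == "__main__":
    print(newest_cuda_for_driver("419.35"))
```

For 419.35 the first threshold (410) already matches, which is why CUDA 10.0 is picked.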
This post has a chart showing the correspondence:
https://blog.csdn.net/qq_27158179/article/details/82952021
This write-up is a classic and I like its title; it happens to target CUDA 10.0:
https://www.cnblogs.com/sorex/p/7615185.html
Win10 x64 + CUDA 10.0 + cuDNN v7.5 + TensorFlow GPU 1.13 Installation Guide
Python 3.6.x x64
Install a specific version of tensorflow-gpu:
pip install tensorflow-gpu==1.13.1
NVIDIA 419.35 driver
CUDA 10.0
Download: https://developer.nvidia.com/cuda-10.0-download-archive
Other versions are here:
https://developer.nvidia.com/cuda-toolkit-archive
Latest Release
CUDA Toolkit 10.1 (Feb 2019), Versioned Online Documentation
Archived Releases
CUDA Toolkit 10.0 (Sept 2018), Online Documentation
CUDA Toolkit 9.2 (May 2018), Online Documentation
CUDA Toolkit 9.1 (Dec 2017), Online Documentation
CUDA Toolkit 9.0 (Sept 2017), Online Documentation
CUDA Toolkit 8.0 GA2 (Feb 2017), Online Documentation
CUDA Toolkit 8.0 GA1 (Sept 2016), Online Documentation
CUDA Toolkit 7.5 (Sept 2015)
CUDA Toolkit 7.0 (March 2015)
CUDA Toolkit 6.5 (August 2014)
CUDA Toolkit 6.0 (April 2014)
CUDA Toolkit 5.5 (July 2013)
CUDA Toolkit 5.0 (Oct 2012)
CUDA Toolkit 4.2 (April 2012)
CUDA Toolkit 4.1 (Jan 2012)
CUDA Toolkit 4.0 (May 2011)
CUDA Toolkit 3.2 (Nov 2010)
CUDA Toolkit 3.1 (June 2010)
CUDA Toolkit 3.0 (March 2010)
OpenCL 1.0 Release (Sept 2009)
CUDA Toolkit 2.3 (June 2009)
CUDA Toolkit 2.2 (May 2009)
CUDA Toolkit 2.1 (Jan 2009)
CUDA Toolkit 2.0 (Aug 2008)
CUDA Toolkit 1.1 (Dec 2007)
CUDA Toolkit 1.0 (June 2007)
cuDNN v7.5 for CUDA 10.0
Download: https://developer.nvidia.com/rdp/cudnn-download
After extracting the archive, copy its contents over the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0 directory.
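If you would rather script the copy than drag folders around, something like the sketch below should do. It assumes the extracted archive contains a top-level folder (commonly named cuda) with bin, include and lib\x64 inside; the function name overlay_cudnn and the download path in the usage comment are hypothetical. Writing into Program Files usually needs an elevated prompt.

```python
import shutil
from pathlib import Path


def overlay_cudnn(extracted_dir, cuda_dir):
    """Copy every file under extracted_dir into cuda_dir, keeping the
    relative layout (bin, include, lib\\x64, ...). Existing files in
    cuda_dir are left alone unless a file of the same name is copied."""
    src, dst = Path(extracted_dir), Path(cuda_dir)
    copied = []
    for item in sorted(src.rglob("*")):
        if item.is_file():
            target = dst / item.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(str(item), str(target))
            copied.append(target)
    return copied


# Usage (paths are examples only, replace them with your own):
# overlay_cudnn(r"D:\Downloads\cudnn\cuda",
#               r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0")
```

The function returns the list of files it wrote, which is handy for checking that the cuDNN DLL really landed in the bin folder.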
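On Win10, nvidia-smi lives in the C:\Program Files\NVIDIA Corporation\NVSMI directory.
Add that directory to the Path environment variable.
About environment variables
This post is fairly detailed: https://blog.csdn.net/qilixuening/article/details/77503631
Quoting the original:
CUDA_SDK_PATH = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0 (this is the default installation location; after customizing the path, mine is D:\NVIDIA\CUDA Samples)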
CUDA_LIB_PATH = %CUDA_PATH%\lib\x64
CUDA_BIN_PATH = %CUDA_PATH%\bin
CUDA_SDK_BIN_PATH = %CUDA_SDK_PATH%\bin\win64
CUDA_SDK_LIB_PATH = %CUDA_SDK_PATH%\common\lib\x64
Then:
Append the following to the end of the system variable PATH:
%CUDA_LIB_PATH%;%CUDA_BIN_PATH%;%CUDA_SDK_LIB_PATH%;%CUDA_SDK_BIN_PATH%;
Then add the following 4 entries (default installation paths):
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin;
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\common\lib\x64;
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\bin\win64;
Do not copy and paste these as-is; replace every path in the variable values with your own installation path.
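A quick way to see whether the directories actually made it into PATH is to compare them against the current value. The sketch below is one way to do that; missing_from_path is a made-up helper, and the comparison is a simple case-insensitive string match that ignores a trailing slash (it does not expand %VAR% references itself).

```python
import os


def missing_from_path(required_dirs, path_value=None, sep=";"):
    """Return the entries of required_dirs that are not listed in
    path_value (defaults to the current PATH). Windows uses ';' as the
    separator, which is the default here."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(p):
        # Ignore case, surrounding spaces and a trailing slash.
        return p.strip().rstrip("\\/").lower()

    present = {norm(p) for p in path_value.split(sep) if p.strip()}
    return [d for d in required_dirs if norm(d) not in present]


# Usage (replace with your own installation paths):
# print(missing_from_path([
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin",
#     r"C:\Program Files\NVIDIA Corporation\NVSMI",
# ]))
```

An empty list means every directory was found; open a new cmd window after editing the variables, since already running consoles keep the old PATH.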
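Verifying the installation
Quoting the original:
Once configuration is done, we can verify that it worked, mainly with deviceQuery.exe and bandwidthTest.exe, which ship with CUDA:
First press Win+R and start cmd, cd to ...\extras\demo_suite under the installation directory, then run bandwidthTest.exe and deviceQuery.exe one after the other; the output should look like the following:
If both steps return Result = PASS, the setup succeeded.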
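The same check can be automated by running each tool and looking for the Result = PASS line. This is a rough sketch: passed and run_and_check are names invented for this example, and the only thing it relies on is the "Result = PASS" line that both tools print.

```python
import re
import subprocess


def passed(output):
    """True if the text contains a line of the form 'Result = PASS'."""
    return re.search(r"^Result\s*=\s*PASS\s*$", output, re.MULTILINE) is not None


def run_and_check(exe_path):
    """Run one of the demo_suite tools and report whether it passed."""
    done = subprocess.run(
        [exe_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output and normalize line endings
    )
    return passed(done.stdout)


# Usage (the directory is an example, point it at your own demo_suite):
# for exe in ("bandwidthTest.exe", "deviceQuery.exe"):
#     print(exe, run_and_check(r"C:\path\to\demo_suite" + "\\" + exe))
```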
C:\Users\hasee>nvidia-smi
Sun Mar 10 22:01:06 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 419.35 Driver Version: 419.35 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... WDDM | 00000000:01:00.0 On | N/A |
| N/A 43C P8 N/A / N/A | 244MiB / 4096MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1148 C+G Insufficient Permissions N/A |
| 0 12644 C+G ...hell.Experiences.TextInput.InputApp.exe N/A |
+-----------------------------------------------------------------------------+
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\extras\demo_suite>bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 1050 Ti
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6351.0
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6453.6
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 94710.5
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\extras\demo_suite>deviceQuery.exe
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1050 Ti"
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 4096 MBytes (4294967296 bytes)
( 6) Multiprocessors, (128) CUDA Cores/MP: 768 CUDA Cores
GPU Max Clock rate: 1620 MHz (1.62 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: zu bytes
Total amount of shared memory per block: zu bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: zu bytes
Texture alignment: zu bytes
Concurrent copy and kernel execution: Yes with 5 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1, Device0 = GeForce GTX 1050 Ti
Result = PASS
Once the installation is complete, start Python and enter:
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
The output looks something like this:
2019-03-11 23:14:31.677862: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-03-11 23:14:32.091092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2019-03-11 23:14:32.129912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-11 23:14:34.344642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-11 23:14:34.358376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-11 23:14:34.367380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-11 23:14:34.409061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3004 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
b'Hello, TensorFlow!'
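Instead of reading the log by eye, you can also ask TensorFlow whether it sees a GPU. The sketch below uses tf.test.is_gpu_available(), which is part of the TensorFlow 1.x API used in this post (later releases mark it as deprecated). The wrapper tensorflow_gpu_status is a made-up name; it returns None when tensorflow cannot be imported at all, for example because of the DLL load error mentioned earlier.

```python
def tensorflow_gpu_status():
    """Return True/False depending on whether TensorFlow can use a GPU,
    or None if tensorflow is not installed or fails to import."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return bool(tf.test.is_gpu_available())


if __name__ == "__main__":
    status = tensorflow_gpu_status()
    if status is None:
        print("tensorflow could not be imported")
    elif status:
        print("TensorFlow can use the GPU")
    else:
        print("TensorFlow is running on CPU only")
```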
At this point you can start enjoying the speed of tensorflow-gpu.
Other references
https://blog.csdn.net/weixin_39290638/article/details/80045236
A record of pitfalls when installing Tensorflow (GPU version) on Win10
https://blog.csdn.net/qq_36124802/article/details/79675485
The countless pitfalls of tensflow-gpu! (a big summary of tf pitfalls)
https://keras-cn.readthedocs.io/en/latest/for_beginners/keras_windows/
Keras Chinese documentation: Docs » For beginners » Keras windows
https://blog.csdn.net/qq_33186949/article/details/79104659
Tensorflow: GPU and CPU