A failed attempt at installing the GPU version of TensorFlow 2.1.0 on Windows 10

Contents

    • My setup
    • Official installation instructions
    • Installing TensorFlow
    • Testing
      • The program crashed…

My setup

  • OS: Windows 10 laptop
  • GPU: GeForce 940MX, computeCapability: 5.0
  • Python: 3.6
  • CUDA version: 10.1
  • cuDNN version: 7.6.5

Official installation instructions

  1. Official hardware and software requirements (GPU version)

The following NVIDIA® software must be installed on your system:

  • NVIDIA® GPU drivers —CUDA 10.1 requires 418.x or higher.
  • CUDA® Toolkit —TensorFlow supports CUDA 10.1 (TensorFlow >= 2.1.0)
  • CUPTI ships with the CUDA Toolkit.
  • cuDNN SDK (>= 7.6)
  • (Optional) TensorRT 6.0 to improve latency and throughput for inference on some models.
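  2. The mapping between GPU models and CUDA versions
  3. For installing cuDNN, see the cuDNN installation guide. Downloading cuDNN requires registering and logging in.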
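
The version requirements quoted above can be checked mechanically. What follows is only a minimal sketch, not an official tool: the helper names `parse_version` and `meets_minimum` are made up for illustration, and treating CUDA as an exact match rather than a minimum is my assumption, based on the prebuilt binary loading `cudart64_101.dll` in the log further down.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.6.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# Versions taken from the configuration list at the top of this post.
installed = {"CUDA": "10.1", "cuDNN": "7.6.5"}

# cuDNN: the official list asks for >= 7.6.
cudnn_ok = meets_minimum(installed["cuDNN"], "7.6")

# CUDA: assumed here to need an exact 10.1 match for the prebuilt 2.1.0 package.
cuda_ok = parse_version(installed["CUDA"]) == parse_version("10.1")

print("cuDNN OK:", cudnn_ok)
print("CUDA OK:", cuda_ok)
```

The same `meets_minimum` helper could be applied to the driver requirement (418.x or higher), using whatever driver version `nvidia-smi` reports on the machine.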

Installing TensorFlow

Install it inside the corresponding conda environment:
pip install tensorflow  # installs 2.1.0, which includes GPU support by default
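
Before testing the GPU itself, it can help to confirm which package pip actually installed. This is a small sketch under my own assumptions: `describe_install` is a hypothetical helper name, while `tf.__version__` and `tf.test.is_built_with_cuda()` are existing TensorFlow attributes.

```python
def describe_install():
    """Report the installed TensorFlow version and whether the build has CUDA support."""
    import tensorflow as tf  # imported lazily so the helper can be defined anywhere

    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
    }
```

Calling `print(describe_install())` in the conda environment should show the version string and a boolean. A `built_with_cuda` value of `False` would suggest a CPU-only package ended up in the environment.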

Testing

Following the official GPU guide:

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
output:
2020-02-26 13:52:57.152740: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-26 13:53:13.930344: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-02-26 13:53:14.241621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 1.189GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 37.33GiB/s
2020-02-26 13:53:14.251583: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-26 13:53:14.267335: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-26 13:53:14.281962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-26 13:53:14.292789: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-26 13:53:14.308111: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-02-26 13:53:14.322738: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-26 13:53:14.341372: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-26 13:53:14.349375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
Num GPUs Available:  1

At this point I thought I was done! Quietly pleased with myself, I went on testing:

In [4]: a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
   ...: b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
   ...: c = tf.matmul(a, b)
   ...:
   ...: print(c)
   ...:
output:
2020-02-26 14:00:13.049637: I tensorflow/core/platform/cpu_feature_guard.cc:142]
 Your CPU supports instructions that this TensorFlow binary was 
 not compiled to use: AVX2
2020-02-26 14:00:13.056240: F tensorflow/stream_executor/lib/statusor.cc:34] 
Attempting to fetch value instead of handling error Internal:
 failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

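The program crashed…

I searched online for a while and still could not fix it. Some people ask: how do we know these few lines run on the GPU at all? Because the official documentation says so: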

If a TensorFlow operation has both CPU and GPU implementations, by default the GPU devices will be given priority when the operation is assigned to a device. For example, tf.matmul has both CPU and GPU kernels. On a system with devices CPU:0 and GPU:0, the GPU:0 device will be selected to run tf.matmul unless you explicitly request running it on another device.

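In other words, if an operation has both a CPU and a GPU implementation, TensorFlow prefers the GPU, and the framework chooses the device automatically.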
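
To see the placement rather than take it on trust, TensorFlow can log the device each operation lands on, and `tf.device` can pin an operation explicitly. The sketch below reuses the matrices from the test above. `matmul_on` is a hypothetical helper name of mine. I also assume that pinning to the CPU does not by itself avoid the crash shown earlier, since TensorFlow may still initialize the visible GPU when the first operation runs.

```python
def matmul_on(device_name):
    """Run the 2x3 by 3x2 matmul from this post on the given device, e.g. '/CPU:0'."""
    import tensorflow as tf  # imported lazily so the helper can be defined anywhere

    # Ask TensorFlow to log which device each operation is assigned to.
    tf.debugging.set_log_device_placement(True)

    with tf.device(device_name):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c = tf.matmul(a, b)
    return c
```

For example, `c = matmul_on('/CPU:0')` followed by `print(c.device)` shows where the result lives. On a machine with a working GPU, passing `'/GPU:0'` or leaving the operation unpinned should report the GPU instead.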

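As for the error itself, some people say the GPU model is just not up to the job and is too weak. If that is really the cause, it is a hardware limitation and there is not much hope, so I switched back to the CPU version.
After installing the CPU version, the code ran. I plan to try again some other time.
If anyone has solved this, I would be glad to learn from it. Thanks!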
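
A note on the CPU fallback: instead of reinstalling, it may be possible to keep the GPU-enabled package and hide the GPU from it with the `CUDA_VISIBLE_DEVICES` environment variable. I did not try this on the machine described here, so treat it as an untested sketch. `hide_gpus` and `visible_gpu_count` are hypothetical helper names.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries loaded later in this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def visible_gpu_count():
    """Import TensorFlow after hiding the GPUs and count what it can still see."""
    hide_gpus()
    import tensorflow as tf  # to be safe, import only after the variable is set

    return len(tf.config.experimental.list_physical_devices("GPU"))
```

If `visible_gpu_count()` returns 0 in a fresh Python session, later operations such as `tf.matmul` should fall back to the CPU kernels.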
2020-02-26
