TensorFlow Notes

Contents

  • Installing TensorFlow
        • Example using a CPU-only image
        • GPU support
  • Problems encountered
      • Insufficient GPU memory
          • Error 1

Installing TensorFlow

Installing the Docker version of TensorFlow is straightforward; following the official guide, setting up the GPU-enabled Docker version takes only three steps:

  • Install Docker on the local host.
  • To enable GPU support on Linux, install nvidia-docker (a sketch of the install commands follows this list).
  • Start a TensorFlow Docker container.
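
A minimal sketch of step 2, based on the nvidia-docker project's published instructions at the time and assuming an Ubuntu host (adjust the repository setup for your distribution):

# Add NVIDIA's package repository (Ubuntu assumed; adapt for other distros)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
   sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-docker2 and restart the Docker daemon so the nvidia runtime is registered
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker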

Example using a CPU-only image

docker run -it --rm tensorflow/tensorflow \
   python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"

GPU support

Check whether a GPU is available:

lspci | grep -i nvidia

Verify the nvidia-docker installation:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Using a GPU-enabled image

Download and run a GPU-enabled TensorFlow image:

docker run --runtime=nvidia -it --rm tensorflow/tensorflow:latest-gpu \
   python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"

This may take a few minutes, after which the installation is complete.

Problems encountered

Insufficient GPU memory

Error 1:
2019-03-12 07:19:15.563496: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-12 07:19:17.012829: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-03-12 07:19:18.953407: W tensorflow/compiler/xla/service/platform_util.cc:256] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 12788498432
2019-03-12 07:19:18.953905: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x510ff40 executing computations on platform CUDA. Devices:
2019-03-12 07:19:18.954126: I tensorflow/compiler/xla/service/service.cc:169]   StreamExecutor device (0): TITAN Xp, Compute Capability 6.1
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 245, in constant
    allow_broadcast=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 253, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 99, in convert_to_eager_tensor
    handle = ctx._handle  # pylint: disable=protected-access
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/eager/context.py", line 447, in _handle
    self._initialize_handle_and_devices()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/eager/context.py", line 364, in _initialize_handle_and_devices
    self._context_handle = pywrap_tensorflow.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
        while setting up XLA_GPU_JIT device number 0
root@bba2178d455a:/# nvidia-smi
Tue Mar 12 07:19:53 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:02:00.0 Off |                  N/A |
| 50%   79C    P2   206W / 250W |  12146MiB / 12196MiB |     57%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:83:00.0 Off |                  N/A |
| 23%   35C    P8    11W / 250W |    304MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

The log and the nvidia-smi output show that GPU 0 has almost no free memory, so switch to GPU 1 via CUDA_VISIBLE_DEVICES and run again:

root@bba2178d455a:/# CUDA_VISIBLE_DEVICES="1" python
Python 2.7.12 (default, Nov 12 2018, 14:36:49) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
2019-03-12 07:20:13.982541: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-12 07:20:14.010166: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-03-12 07:20:15.343635: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x520f190 executing computations on platform CUDA. Devices:
2019-03-12 07:20:15.343733: I tensorflow/compiler/xla/service/service.cc:169]   StreamExecutor device (0): TITAN Xp, Compute Capability 6.1
2019-03-12 07:20:15.478958: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2397445000 Hz
2019-03-12 07:20:15.481017: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x53236a0 executing computations on platform Host. Devices:
2019-03-12 07:20:15.481071: I tensorflow/compiler/xla/service/service.cc:169]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-12 07:20:15.505691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1467] Found device 0 with properties: 
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:83:00.0
totalMemory: 11.91GiB freeMemory: 11.47GiB
2019-03-12 07:20:15.505741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1546] Adding visible gpu devices: 0
2019-03-12 07:20:15.505958: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-03-12 07:20:15.513226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1015] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-12 07:20:15.513284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021]      0 
2019-03-12 07:20:15.514078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1034] 0:   N 
2019-03-12 07:20:15.528644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1149] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11157 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:83:00.0, compute capability: 6.1)
>>>
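
Switching devices solves the problem here, but when the busy card has to be shared, TensorFlow 1.x can also be told not to grab nearly all of a GPU's memory up front. A minimal sketch (not from the session above) using the standard tf.ConfigProto options:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving almost the whole card at start-up,
# and optionally cap this process at a fraction of total GPU memory.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # illustrative value
sess = tf.Session(config=config)
print(sess.run(tf.constant('Hello, TensorFlow!')))

Note that these options only limit how much memory this process claims; they cannot free memory already held by another process, as on GPU 0 above.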
