Installing the Docker build of TensorFlow is straightforward; following the official guide, setting up the GPU-enabled version takes only three steps. First, verify that the plain CPU image runs:
docker run -it --rm tensorflow/tensorflow \
python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
Check that a GPU is visible on the host:
lspci | grep -i nvidia
Verify the nvidia-docker installation:
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Using a GPU-enabled image
Download and run a GPU-enabled TensorFlow image:
docker run --runtime=nvidia -it --rm tensorflow/tensorflow:latest-gpu \
python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
Pulling the image may take a few minutes, after which the installation is complete. In my case, however, the test command then failed:
2019-03-12 07:19:15.563496: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-12 07:19:17.012829: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-03-12 07:19:18.953407: W tensorflow/compiler/xla/service/platform_util.cc:256] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 12788498432
2019-03-12 07:19:18.953905: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x510ff40 executing computations on platform CUDA. Devices:
2019-03-12 07:19:18.954126: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): TITAN Xp, Compute Capability 6.1
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 245, in constant
allow_broadcast=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 253, in _constant_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 99, in convert_to_eager_tensor
handle = ctx._handle # pylint: disable=protected-access
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/eager/context.py", line 447, in _handle
self._initialize_handle_and_devices()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/eager/context.py", line 364, in _initialize_handle_and_devices
self._context_handle = pywrap_tensorflow.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
root@bba2178d455a:/# nvidia-smi
Tue Mar 12 07:19:53 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:02:00.0 Off |                  N/A |
| 50%   79C    P2   206W / 250W |  12146MiB / 12196MiB |     57%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:83:00.0 Off |                  N/A |
| 23%   35C    P8    11W / 250W |    304MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
The log shows CUDA_ERROR_OUT_OF_MEMORY on GPU 0, and nvidia-smi confirms it: GPU 0 is almost full (12146MiB / 12196MiB in use) while GPU 1 is nearly idle. So switch to GPU 1:
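Rather than eyeballing the table, the per-GPU free memory can be queried in CSV form and parsed. A minimal sketch, assuming nvidia-smi's CSV query mode; the helper name and the canned sample string are mine, with numbers mirroring the table above:

```python
import subprocess

def pick_freest_gpu(csv_text):
    """Return the index of the GPU with the most free memory, given the
    output of: nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits
    """
    best_idx, best_free = None, -1
    for line in csv_text.strip().splitlines():
        idx, free = (int(x.strip()) for x in line.split(","))
        if free > best_free:
            best_idx, best_free = idx, free
    return best_idx

# On a real machine the text would come from nvidia-smi itself:
# csv_text = subprocess.check_output(
#     ["nvidia-smi", "--query-gpu=index,memory.free",
#      "--format=csv,noheader,nounits"]).decode()
# Canned sample mirroring the table above (50 MiB free vs 11892 MiB free):
csv_text = "0, 50\n1, 11892"
print(pick_freest_gpu(csv_text))  # -> 1
```

The returned index can then be fed into CUDA_VISIBLE_DEVICES, as done manually below.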
root@bba2178d455a:/# CUDA_VISIBLE_DEVICES="1" python
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
2019-03-12 07:20:13.982541: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-12 07:20:14.010166: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-03-12 07:20:15.343635: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x520f190 executing computations on platform CUDA. Devices:
2019-03-12 07:20:15.343733: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): TITAN Xp, Compute Capability 6.1
2019-03-12 07:20:15.478958: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2397445000 Hz
2019-03-12 07:20:15.481017: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x53236a0 executing computations on platform Host. Devices:
2019-03-12 07:20:15.481071: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): <undefined>, <undefined>
2019-03-12 07:20:15.505691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1467] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:83:00.0
totalMemory: 11.91GiB freeMemory: 11.47GiB
2019-03-12 07:20:15.505741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1546] Adding visible gpu devices: 0
2019-03-12 07:20:15.505958: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-03-12 07:20:15.513226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1015] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-12 07:20:15.513284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0
2019-03-12 07:20:15.514078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1034] 0: N
2019-03-12 07:20:15.528644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1149] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11157 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:83:00.0, compute capability: 6.1)
>>>
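The same restriction can be applied from inside Python instead of on the shell command line, as long as the variable is set before TensorFlow initializes. A minimal sketch:

```python
import os

# Restrict this process to the second physical GPU. This must happen
# before `import tensorflow`, because the runtime enumerates devices
# when it initializes; setting the variable afterwards has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import tensorflow as tf   # TensorFlow would now see GPU 1 as device GPU:0
print(os.environ["CUDA_VISIBLE_DEVICES"])  # -> 1
```

Note that inside the restricted process the remaining GPU is renumbered, which is why the session above reports the TITAN Xp at bus 0000:83:00.0 as device 0.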