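While benchmarking TensorFlow I wanted the best possible performance. According to the TensorFlow documentation, building from source gives better performance: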
Building and installing from source
The default TensorFlow binaries target the broadest range of hardware to make TensorFlow accessible to everyone. If using CPUs for training or inference, it is recommended to compile TensorFlow with all of the optimizations available for the CPU in use. Speedups for training and inference on CPU are documented below in Comparing compiler optimizations.
To install the most optimized version of TensorFlow, build and install from source. If there is a need to build TensorFlow on a platform that has different hardware than the target, then cross-compile with the highest optimizations for the target platform. The following command is an example of using bazel to compile for a specific platform:
# This command optimizes for Intel’s Broadwell processor
bazel build -c opt --copt=-march="broadwell" --config=cuda //tensorflow/tools/pip_package:build_pip_package
Environment, build, and install tips
- ./configure asks which compute capability to include in the build. This does not impact overall performance but does impact initial startup. After running TensorFlow once, the compiled kernels are cached by CUDA. If using a docker container, the data is not cached and the penalty is paid each time TensorFlow starts. The best practice is to include the compute capabilities of the GPUs that will be used, e.g. P100: 6.0, Titan X (Pascal): 6.1, Titan X (Maxwell): 5.2, and K80: 3.7.
- Use a version of gcc that supports all of the optimizations of the target CPU. The recommended minimum gcc version is 4.8.3. On OS X, upgrade to the latest Xcode version and use the version of clang that comes with Xcode.
- Install the latest stable CUDA platform and cuDNN libraries supported by TensorFlow.
(https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/performance/performance_guide.md#building-and-installing-from-source)
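To illustrate the compute-capability tip, here is a minimal Python sketch that turns a list of GPU models into the comma-separated string that ./configure asks for. The table contains only the four values quoted in the guide above, and the function name capability_list is made up for this example.

```python
# Compute capabilities quoted in the performance guide above.
COMPUTE_CAPABILITY = {
    "P100": "6.0",
    "Titan X (Pascal)": "6.1",
    "Titan X (Maxwell)": "5.2",
    "K80": "3.7",
}


def capability_list(gpu_models):
    """Return a sorted, de-duplicated, comma-separated capability string."""
    caps = {COMPUTE_CAPABILITY[model] for model in gpu_models}
    # Sort numerically by (major, minor) rather than as plain strings.
    ordered = sorted(caps, key=lambda c: tuple(int(p) for p in c.split(".")))
    return ",".join(ordered)


if __name__ == "__main__":
    # Eight identical cards need only one entry in the list.
    print(capability_list(["P100"] * 8))
```

Listing each distinct capability once is enough; repeating the same value for every card adds nothing to the build.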
So I decided to install from source.
========================================================================
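Below is my installation process. It is not written up very neatly, and there is some grumbling along the way.
1. Install bazel
See the official site: https://docs.bazel.build/versions/master/install-ubuntu.html
I first wanted to use the "Installing using binary installer" method, but downloading the ***.sh file was far too slow, so I switched to the "Using Bazel custom APT repository" method.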
bazel-install.sh
#!/bin/bash
echo "Install JDK 8"
./jdk8-install.sh
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | apt-key add -
echo "Install and update Bazel"
apt-get update && apt-get install bazel
echo "bazel version:"
bazel version
jdk8-install.sh
#!/bin/bash
apt-get install openjdk-8-jdk
java -version
2. TF dependencies
See https://www.jianshu.com/p/636c6477250a
check.sh
#!/bin/bash
apt-get update && apt-get install -y \
build-essential \
curl \
libcurl3-dev \
git \
libfreetype6-dev \
libpng12-dev \
libzmq3-dev \
pkg-config \
python-dev \
python-numpy \
python-pip \
software-properties-common \
swig \
zip \
zlib1g-dev
Add this environment variable to ~/.bashrc:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/extras/CUPTI/lib64"
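I honestly don't know what this one is for, but many of the blogs I consulted include it, so I added it too. (CUPTI is the CUDA Profiling Tools Interface, and this entry puts its libraries on the loader path.)
For reference, the environment variables I already had were: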
export PATH="$PATH:/usr/local/cuda-8.0/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/lib"
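A quick way to confirm that the exports above took effect is to check the variable from a new shell. The Python sketch below is only an illustration: the helper name missing_dirs is made up, and the two directories are the CUDA 8.0 paths used in this post.

```python
import os


def missing_dirs(path_value, required):
    """Return the entries of `required` that are absent from a ':'-separated path string."""
    present = {os.path.normpath(p) for p in path_value.split(":") if p}
    return [d for d in required if os.path.normpath(d) not in present]


if __name__ == "__main__":
    # Paths taken from this post; adjust them to your own CUDA install.
    required = [
        "/usr/local/cuda-8.0/lib64",
        "/usr/local/cuda-8.0/extras/CUPTI/lib64",
    ]
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for d in missing_dirs(current, required):
        print("not on LD_LIBRARY_PATH:", d)
```

If nothing is printed, both directories are on LD_LIBRARY_PATH in the current environment.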
3. Install TensorFlow from source
I consulted quite a few blog posts, too many to list here, but the official TensorFlow instructions are already fairly detailed:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/install/install_sources.md
(1) Download the source
git clone --recurse-submodules http://github.com/tensorflow/tensorflow
The --recurse-submodules flag is required; it fetches the protobuf library that TensorFlow depends on.
(2) Configure TensorFlow
cd ~/tensorflow
./configure
One of the prompts further down asks about the CPU type:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]
This question refers to a later phase in which you'll use bazel to build the pip package. We recommend accepting the default (-march=native), which will optimize the generated code for your local machine's CPU type. However, if you are building TensorFlow on one CPU type but will run TensorFlow on a different CPU type, then consider specifying a more specific optimization flag as described in the gcc documentation.
(https://github.com/tensorflow/tensorflow/blob/master/tensorflow/docs_src/install/install_sources.md#configure-the-installation)
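As I understand the documentation, if the package will run on the same type of CPU it was built on, the default is fine; if it will run on a different CPU, the CPU type has to be specified. So I accepted the default here. Corrections are welcome.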
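When the build machine and the target machine differ, it helps to know which instruction-set extensions the target CPU actually reports. The following is a minimal Python sketch that reads the flags line of /proc/cpuinfo on Linux. The function names and the list of extensions checked (sse4_2, avx, avx2, fma) are my own choices for illustration, not something the TensorFlow docs prescribe.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(flags, wanted=("sse4_2", "avx", "avx2", "fma")):
    """List the wanted instruction-set extensions that the CPU does not report."""
    return [ext for ext in wanted if ext not in flags]


if __name__ == "__main__":
    # /proc/cpuinfo only exists on Linux; skip quietly elsewhere.
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            flags = cpu_flags(f.read())
        print("missing:", missing_extensions(flags))
```

Running this on both machines and comparing the results shows whether a binary built with -march=native on one could use instructions the other lacks.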
These are the choices I made:
##############################################################################
root@iZhp31vdzy8zu7m6eor9djZ:~/tensorflow# ./configure
You have bazel 0.9.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: y
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: y
GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: y
VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-8.0
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 6
Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-8.0]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0]
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
(I changed this to y at one point, but that caused an error so I set it back to n; see below)
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Add "--config=mkl" to your bazel command to build with MKL support.
(I did not actually add mkl when building.)
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Configuration finished
####################################################################
(3) Build the pip package
bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
-D_GLIBCXX_USE_CXX11_ABI=0 is the flag to specify with GCC 5 and later:
NOTE on gcc 5 or later: the binary pip packages available on the TensorFlow website are built with gcc 4, which uses the older ABI. To make your build compatible with the older ABI, you need to add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to your bazel build command. ABI compatibility allows custom ops built against the TensorFlow pip package to continue to work against your built package.
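The rule in the note can be written down as a small check on the compiler version. This Python sketch is only an illustration of that rule: abi_copts is a made-up helper name, and it reads the version with gcc -dumpversion when gcc is on the PATH.

```python
import shutil
import subprocess


def abi_copts(gcc_version):
    """Return the extra bazel option suggested by the note for a gcc version string."""
    major = int(gcc_version.strip().split(".")[0])
    if major >= 5:
        # gcc 5+ defaults to the new libstdc++ ABI; this selects the older one.
        return ["--cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0"]
    return []


if __name__ == "__main__":
    if shutil.which("gcc"):
        version = subprocess.check_output(["gcc", "-dumpversion"], text=True)
        print(version.strip(), abi_copts(version))
```

The flag only matters for compatibility with custom ops built against the official pip packages, which is why a build without it can still succeed.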
==========================================
Update, 2018-01-19: I rebuilt TensorFlow today with the following command:
bazel build -c opt --copt=-march="haswell" --config=cuda //tensorflow/tools/pip_package:build_pip_package
This time I left out the -D_GLIBCXX_USE_CXX11_ABI=0 option and the build did not fail.
===========================================
Then came one error after another (these are from the first build).
1)
########################################################################################
WARNING: /root/tensorflow/tensorflow/core/BUILD:1808:1: in includes attribute of cc_library rule //tensorflow/core:framework_headers_lib: '../../external/nsync/public' resolves to 'external/nsync/public' not below the relative path of its package 'tensorflow/core'. This will be an error in the future. Since this rule was created by the macro 'cc_header_only_library', the error might have been caused by the macro implementation in /root/tensorflow/tensorflow/tensorflow.bzl:1152:30
ERROR: /root/tensorflow/tensorflow/tools/pip_package/BUILD:103:1: no such package '@llvm//': java.io.IOException: Error downloading [https://mirror.bazel.build/github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz, https://github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz] to /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/llvm/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz: Checksum was 43b4ca39395aece21d5755ada9008eedaca434ccf0b38c2ed6709d45803813fb but wanted 9478274a10d7f487e7ad878c8eec30398a54e07eb148867711cd9c6fe7ff5f59 and referenced by '//tensorflow/tools/pip_package:licenses'
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: no such package '@llvm//': java.io.IOException: Error downloading [https://mirror.bazel.build/github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz, https://github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz] to /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/llvm/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz: Checksum was 43b4ca39395aece21d5755ada9008eedaca434ccf0b38c2ed6709d45803813fb but wanted 9478274a10d7f487e7ad878c8eec30398a54e07eb148867711cd9c6fe7ff5f59
INFO: Elapsed time: 608.452s
########################################################################################
I tried modifying tensorflow/workspace.bzl, following
https://www.kaijia.me/2017/09/sha256-checksum-error-while-compiling-tensorflow-1-3-temporary-fixs/
tf_http_archive(
name = "llvm",
urls = [
"https://mirror.bazel.build/github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz",
"https://github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz",
],
sha256 = "43b4ca39395aece21d5755ada9008eedaca434ccf0b38c2ed6709d45803813fb",
#sha256 = "9478274a10d7f487e7ad878c8eec30398a54e07eb148867711cd9c6fe7ff5f59", # original value, changed
strip_prefix = "llvm-7e6fcc775f56cdeeae061f6f8071f5c103087330",
build_file = str(Label("//third_party/llvm:llvm.BUILD")),
)
########################################################################################
But this did not actually help; the build still failed:
########################################################################################
root@iZhp31vdzy8zu7m6eor9djZ:~/tensorflow# bazel build --config=opt --config=cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/tools/pip_package:build_pip_package
WARNING: /root/tensorflow/tensorflow/core/BUILD:1808:1: in includes attribute of cc_library rule //tensorflow/core:framework_headers_lib: '../../external/nsync/public' resolves to 'external/nsync/public' not below the relative path of its package 'tensorflow/core'. This will be an error in the future. Since this rule was created by the macro 'cc_header_only_library', the error might have been caused by the macro implementation in /root/tensorflow/tensorflow/tensorflow.bzl:1152:30
ERROR: /root/tensorflow/tensorflow/tools/pip_package/BUILD:103:1: no such package '@llvm//': java.io.IOException: Error downloading [https://mirror.bazel.build/github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz, https://github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz] to /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/llvm/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz: Premature EOF and referenced by '//tensorflow/tools/pip_package:licenses'
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: no such package '@llvm//': java.io.IOException: Error downloading [https://mirror.bazel.build/github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz, https://github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz] to /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/llvm/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz: Premature EOF
INFO: Elapsed time: 461.853s
########################################################################################
Later, after bazel clean and a fresh bazel build, the error flipped around:
########################################################################################
WARNING: /root/tensorflow/tensorflow/core/BUILD:1808:1: in includes attribute of cc_library rule //tensorflow/core:framework_headers_lib: '../../external/nsync/public' resolves to 'external/nsync/public' not below the relative path of its package 'tensorflow/core'. This will be an error in the future. Since this rule was created by the macro 'cc_header_only_library', the error might have been caused by the macro implementation in /root/tensorflow/tensorflow/tensorflow.bzl:1152:30
ERROR: /root/tensorflow/tensorflow/tools/pip_package/BUILD:103:1: no such package '@llvm//': java.io.IOException: Error downloading [https://mirror.bazel.build/github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz, https://github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz] to /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/llvm/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz: Checksum was 9478274a10d7f487e7ad878c8eec30398a54e07eb148867711cd9c6fe7ff5f59 but wanted 43b4ca39395aece21d5755ada9008eedaca434ccf0b38c2ed6709d45803813fb and referenced by '//tensorflow/tools/pip_package:licenses'
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: no such package '@llvm//': java.io.IOException: Error downloading [https://mirror.bazel.build/github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz, https://github.com/llvm-mirror/llvm/archive/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz] to /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/llvm/7e6fcc775f56cdeeae061f6f8071f5c103087330.tar.gz: Checksum was 9478274a10d7f487e7ad878c8eec30398a54e07eb148867711cd9c6fe7ff5f59 but wanted 43b4ca39395aece21d5755ada9008eedaca434ccf0b38c2ed6709d45803813fb
########################################################################################
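So I reverted the change in tensorflow/workspace.bzl, and then it inexplicably worked. Presumably the two download URLs were serving archives with different checksums at the time, so the reported mismatch depended on which one the download came from.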
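Before editing the sha256 in workspace.bzl, it can be worth downloading the archive by hand and checking what its checksum really is. Here is a minimal Python sketch using hashlib. The helper names are made up, and the script expects a local file path and the expected hash as command-line arguments.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA-256 equals the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_sha.py <downloaded-archive> <sha256 from workspace.bzl>
    archive, expected = sys.argv[1], sys.argv[2]
    print("actual:  ", sha256_of(archive))
    print("expected:", expected.lower())
    print("match" if matches(archive, expected) else "MISMATCH")
```

Checking each URL's download separately would show whether the sources disagree or whether one download was simply cut short.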
########################################################################################
2) ERROR: /root/tensorflow/tensorflow/contrib/lite/toco/BUILD:328:1: Linking of rule '//tensorflow/contrib/lite/toco:toco' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
See https://github.com/tensorflow/tensorflow/issues/13481
Fix (replace '/usr/local/cuda-8.0/lib64' with the CUDA directory on your own system):
sh -c "echo '/usr/local/cuda-8.0/lib64' >> /etc/ld.so.conf.d/nvidia.conf"
ldconfig
3) mpi.h not found
cd third_party/mpi
ll
This showed that the mpicxx.h, mpi.h and mpio.h links all pointed to the wrong place.
rm -rf mpicxx.h
rm -rf mpi.h
rm -rf mpio.h
ln -s /usr/include/mpi/openmpi/ompi/mpi/cxx/mpicxx.h mpicxx.h
ln -s /usr/include/mpi/mpi.h mpi.h
But at that point I found that mpio.h does not exist anywhere on the system.
So I simply re-ran ./configure and turned MPI support off.
Then I ran bazel build again.
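Instead of reading the ll output by eye, the dangling links can be listed programmatically. The following Python sketch is an illustration only: broken_symlinks is a made-up helper, and third_party/mpi is the directory from this post, relative to the TensorFlow checkout.

```python
import os


def broken_symlinks(directory):
    """Return names in `directory` that are symlinks whose target does not exist."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # os.path.exists follows symlinks, so it is False for a dangling link.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
    return broken


if __name__ == "__main__":
    target = "third_party/mpi"  # run from the root of the TensorFlow checkout
    if os.path.isdir(target):
        print(broken_symlinks(target))
```

An empty list means every header link resolves to a real file.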
4) gdr_memory_manager.cc cannot find rdma/rdma_cma.h
This article has the fix: https://www.cnblogs.com/dyufei/p/8027517.html
The build fails with the following error:
ERROR: /home/duanyufei/source/TensorFlow/tensorflow/tensorflow/contrib/gdr/BUILD:52:1: C++ compilation of rule '//tensorflow/contrib/gdr:gdr_memory_manager' failed (Exit 1)
tensorflow/contrib/gdr/gdr_memory_manager.cc:28:27: fatal error: rdma/rdma_cma.h: No such file or directory
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 323.279s, Critical Path: 33.69s
FAILED: Build did NOT complete successfully
Fix:
sudo apt-get install librdmacm-dev
==========================
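There seem to have been a few other errors along the way that I no longer remember.
Once the build succeeds, run the following step to generate the pip package: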
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
(4) Install the pip package
pip install /tmp/tensorflow_pkg/tensorflow-1.4.0-cp27-cp27mu-linux_x86_64.whl
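Why is the source pulled from GitHub version 1.4.0, when TensorFlow 1.5.0-rc0 has already been released?
Next I checked whether the installation had worked.
But at this point import tensorflow as tf fails.
Adding export PYTHONPATH="$PYTHONPATH:/usr/local/lib/python2.7/dist-packages/tensorflow" to the environment variables fixed it.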
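When an import fails right after installing a wheel, one thing worth checking is where Python would load the module from. The sketch below is a hypothetical helper built on importlib.util.find_spec. It is written for Python 3, whereas the build in this post used Python 2.7, so treat it as an illustration of the idea.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Prints None when no tensorflow module is visible on sys.path.
    print("tensorflow resolves to:", module_location("tensorflow"))
```

If the reported path is not under the dist-packages directory where pip installed the wheel, Python is picking up something else first.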
Test:
git clone http://github.com/tensorflow/benchmarks
cd benchmarks/scripts/tf_cnn_benchmarks/
python tf_cnn_benchmarks.py
TensorFlow: 1.4
Model: trivial
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 32 global
32 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
Data format: NCHW
Layout optimizer: False
Optimizer: sgd
Variables: parameter_server
==========
Generating model
WARNING:tensorflow:From /root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:1260: __init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-01-04 20:50:08.345697: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-04 20:50:08.346369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:09.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-04 20:50:08.461438: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-04 20:50:08.462105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 1 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:0a.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-04 20:50:08.582272: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-04 20:50:08.582941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 2 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:0b.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-04 20:50:08.707436: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-04 20:50:08.708132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 3 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:0c.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-04 20:50:08.838647: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-04 20:50:08.839324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 4 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:0d.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-04 20:50:09.003227: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-04 20:50:09.003897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 5 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:0e.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-04 20:50:09.143724: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-04 20:50:09.144392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 6 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:0f.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-04 20:50:09.295404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-04 20:50:09.296093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 7 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:10.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-01-04 20:50:09.309124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1221] Device peer to peer matrix
2018-01-04 20:50:09.309382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1227] DMA: 0 1 2 3 4 5 6 7
2018-01-04 20:50:09.309399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] 0: Y Y Y Y N N N N
2018-01-04 20:50:09.309412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] 1: Y Y Y Y N N N N
2018-01-04 20:50:09.309426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] 2: Y Y Y Y N N N N
2018-01-04 20:50:09.309436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] 3: Y Y Y Y N N N N
2018-01-04 20:50:09.309450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] 4: N N N N Y Y Y Y
2018-01-04 20:50:09.309468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] 5: N N N N Y Y Y Y
2018-01-04 20:50:09.309478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] 6: N N N N Y Y Y Y
2018-01-04 20:50:09.309485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1237] 7: N N N N Y Y Y Y
2018-01-04 20:50:09.309506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 0
2018-01-04 20:50:09.309518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 1
2018-01-04 20:50:09.309525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 2
2018-01-04 20:50:09.309540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 3
2018-01-04 20:50:09.309545: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 4
2018-01-04 20:50:09.309554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 5
2018-01-04 20:50:09.309559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 6
2018-01-04 20:50:09.309568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 7
2018-01-04 20:50:11.400919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15132 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:09.0, compute capability: 6.0)
2018-01-04 20:50:11.568485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15132 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0a.0, compute capability: 6.0)
2018-01-04 20:50:11.736890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15132 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0b.0, compute capability: 6.0)
2018-01-04 20:50:11.904241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15132 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0c.0, compute capability: 6.0)
2018-01-04 20:50:12.071824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 15132 MB memory) -> physical GPU (device: 4, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0d.0, compute capability: 6.0)
2018-01-04 20:50:12.238944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 15132 MB memory) -> physical GPU (device: 5, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0e.0, compute capability: 6.0)
2018-01-04 20:50:12.406061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 15132 MB memory) -> physical GPU (device: 6, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0f.0, compute capability: 6.0)
2018-01-04 20:50:12.573243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 15132 MB memory) -> physical GPU (device: 7, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:10.0, compute capability: 6.0)
Running warm up
Done warm up
Step Img/sec loss
1 images/sec: 6762.6 +/- 0.0 (jitter = 0.0) 7.052
10 images/sec: 6646.3 +/- 28.4 (jitter = 110.7) 7.052
20 images/sec: 6736.3 +/- 35.7 (jitter = 178.8) 7.052
30 images/sec: 6792.7 +/- 29.8 (jitter = 167.5) 7.052
40 images/sec: 6838.3 +/- 26.4 (jitter = 170.0) 7.052
50 images/sec: 6854.3 +/- 22.3 (jitter = 146.0) 7.052
60 images/sec: 6876.7 +/- 20.1 (jitter = 150.5) 7.052
70 images/sec: 6885.0 +/- 17.9 (jitter = 142.9) 7.052
80 images/sec: 6888.5 +/- 16.5 (jitter = 141.8) 7.052
90 images/sec: 6882.8 +/- 15.5 (jitter = 147.1) 7.052
100 images/sec: 6886.2 +/- 14.4 (jitter = 137.2) 7.052
----------------------------------------------------------------
total images/sec: 6758.69
----------------------------------------------------------------
Success.
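To compare runs (for example the stock pip package against a source build), the per-step throughput can be pulled out of the log. The Python sketch below assumes the step-line layout shown in the output above. The regular expression and the function name parse_steps are my own, not part of tf_cnn_benchmarks.

```python
import re

# Matches lines such as: "10 images/sec: 6646.3 +/- 28.4 (jitter = 110.7) 7.052"
STEP_LINE = re.compile(r"^\s*(\d+)\s+images/sec:\s+([\d.]+)\s+\+/-\s+([\d.]+)")


def parse_steps(log_text):
    """Return (step, images_per_sec) pairs from the benchmark's step lines."""
    rows = []
    for line in log_text.splitlines():
        m = STEP_LINE.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2))))
    return rows


if __name__ == "__main__":
    # Three lines copied from the run above; the "total" line is ignored on purpose.
    sample = (
        "1 images/sec: 6762.6 +/- 0.0 (jitter = 0.0) 7.052\n"
        "100 images/sec: 6886.2 +/- 14.4 (jitter = 137.2) 7.052\n"
        "total images/sec: 6758.69\n"
    )
    for step, rate in parse_steps(sample):
        print(step, rate)
```

Saving the output of python tf_cnn_benchmarks.py to a file and passing its contents to parse_steps gives the numbers in a form that is easy to compare across builds.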