I recently got into the deepin operating system. It is based on Debian 9 and feels just like Ubuntu to use, but since it is developed by a team in Wuhan it is a step ahead in everyday usability: QQ, WeChat and the Sogou input method are readily available (recent Ubuntu 18.04 apparently supports Chinese input out of the box, so no more manual fiddling), and the desktop looks nicer.
The first pitfall during setup is the NVIDIA driver. The cuda_10.1.168 installer from the NVIDIA site bundles the 390 driver; I skipped that and downloaded the latest 430 driver instead. When installing the driver, remember to disable the open-source nouveau driver first and do the installation from a text console (tty2, Ctrl+Alt+F2).
Download NVIDIA-Linux-x86_64-430.26.run from https://www.geforce.cn/drivers/results/148530; by default it ends up in ~/Downloads.
sudo vim /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
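## put these two lines into the file, then save and quit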
blacklist nouveau
options nouveau modeset=0
sudo apt purge nvidia-*
sudo reboot
After the reboot, switch to tty2 (Ctrl+Alt+F2) and stop the display manager (only one of the following services will actually exist; on deepin it is lightdm):
sudo service lightdm stop
sudo service gdm stop
sudo service gdm3 stop
chmod +x ./NVIDIA-Linux-x86_64-430.26.run
sudo ./NVIDIA-Linux-x86_64-430.26.run
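Once the installer finishes, a quick sanity check before going back to the desktop (a minimal sketch, assuming the new kernel module loaded cleanly):
nvidia-smi                    ## should list the GPU and driver version 430.26
sudo service lightdm start    ## or simply reboot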
Installing CUDA 10.1 on deepin is the big pitfall: following the official NVIDIA instructions, the installation fails. Searching around led me to https://www.findhao.net/easycoding/2562.html: when you install with sudo sh ~/Downloads/cuda_10.1.168_418.67_linux.run, the installer inexplicably starts uninstalling every component listed in /var/log/nvidia/.uninstallManifests. So an obvious workaround is to deny the installer read/write access to that directory, as sketched below.
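A minimal sketch of that idea (untested here; it assumes the rollback happens only because the installer can read the manifest directory mentioned above):
sudo chmod 000 /var/log/nvidia/.uninstallManifests    ## hypothetical: hide the manifests from the installer
sudo sh ~/Downloads/cuda_10.1.168_418.67_linux.run
sudo chmod 755 /var/log/nvidia/.uninstallManifests    ## restore permissions afterwards
In the end I did not go that route; the non-root install below worked instead.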
On the forums I found a thread, "Failing to install 10.1 via .run file on RHEL7 as non-root", about installing the toolkit without sudo. Following the method from that thread installs the CUDA toolkit on deepin just fine:
mkdir -p ~/opt/cuda10.1
cd ~/Downloads/
sh cuda_10.1.168_418.67_linux.run --silent --toolkit --toolkitpath=$HOME/opt/cuda10.1 --defaultroot=$HOME/opt/cuda10.1 --samples --samplespath=$HOME/
If no errors are reported, the installation succeeded. Now create a link to the toolkit under the system's /usr/local/; note that the symlink must be named /usr/local/cuda, otherwise a later TensorFlow source build will not find the CUDA library files.
sudo ln -s ~/opt/cuda10.1 /usr/local/cuda
Test that the installation works:
cd ~/NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery
make
./deviceQuery
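## if the toolkit and driver are set up correctly, deviceQuery prints the GPU properties and ends with
## Result = PASS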
Then set the environment variables: edit ~/.bashrc or /etc/profile and append the lines below (remember to reboot, or run source /etc/profile / source ~/.bashrc):
$ vim ~/.bashrc    ## open the file
## append the following at the end of ~/.bashrc
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64
## check that nvcc works
$ nvcc -V
## output like the following means success
##nvcc: NVIDIA (R) Cuda compiler driver
##Copyright (c) 2005-2019 NVIDIA Corporation
##Built on Wed_Apr_24_19:10:27_PDT_2019
##Cuda compilation tools, release 10.1, V10.1.168
References
## Deepin 15.10 安装cuda toolkit 10.1 (installing the CUDA toolkit 10.1 on Deepin 15.10)
https://www.findhao.net/easycoding/2562.html
## NVIDIA CUDA Installation Guide for Linux
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
## Deepin搭建深度学习环境 (setting up a deep-learning environment on Deepin: GPU driver, CUDA, cuDNN)
https://www.jianshu.com/p/faece1be1c87
## Failing to install 10.1 via .run file on RHEL7 as non-root
https://devtalk.nvidia.com/default/topic/1049111/cuda-setup-and-installation/failing-to-install-10-1-via-run-file-on-rhel7-as-non-root/post/5324336/#5324336
cuDNN is a C++ library that NVIDIA developed specifically for CNNs and RNNs; it works on top of CUDA, and with it a GeForce RTX 2070 trains cifar10 more than 20x faster than my AMD 2700X does. cuDNN installs as an add-on to the existing CUDA toolkit and is easy to set up; just follow the official installation guide.
Download cudnn-10.1-linux-x64-v7.6.0.64.tgz from NVIDIA at https://developer.nvidia.com/rdp/cudnn-archive and extract it anywhere:
$ tar -xvzf ~/Downloads/cudnn-10.1-linux-x64-v7.6.0.64.tgz
## extraction produces a cuda/ folder; copy its contents into the matching folders under /usr/local/cuda. The environment variables were already added above, so nothing else is needed
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
## make the files readable
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
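An optional quick check that the copied header reports the expected version (7.6.0):
$ grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h
## should show CUDNN_MAJOR 7, CUDNN_MINOR 6, CUDNN_PATCHLEVEL 0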
## verify the installation with the cuDNN samples (optional)
$ cp -r /usr/src/cudnn_samples_v7/ $HOME
## go into the mnistCUDNN folder
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
## build
$ make clean && make
## run
$ ./mnistCUDNN
### the following output means the installation succeeded
Test passed!
At first I just installed tensorflow with pip or conda, but the resulting package could never load the CUDA libraries: the GPU was never used while the CPU ran at full load. The problem is a version mismatch: the machine has CUDA 10.1 with cuDNN 7.6.0 installed, while the pip/conda tensorflow wheels were built against CUDA 10.0, so at runtime they look for libcudart.so.10.0 and other 10.0 dynamic libraries, which simply do not exist under cuda's lib64. The first idea that comes to mind is to manually symlink the required files, as sketched below.
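A rough sketch of that symlink workaround (it assumes the 10.1 libraries live under /usr/local/cuda/lib64 as set up above; libcudart is just the example named in the error, and any other missing names would need the same treatment):
cd /usr/local/cuda/lib64
sudo ln -s libcudart.so.10.1 libcudart.so.10.0    ## give the 10.0 name tensorflow asks for to the installed 10.1 library
After patching up the missing names this way, the libraries did load: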
2019-07-09 08:31:13.153121: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-07-09 08:31:13.154096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-07-09 08:31:13.155139: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-07-09 08:31:13.155290: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-07-09 08:31:13.156334: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-07-09 08:31:13.156926: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-07-09 08:31:13.159269: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Then a new problem appeared:
2019-07-09 08:34:16.727477: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-07-09 08:34:16.738390: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
For this error, a useful reference is https://github.com/tensorflow/tensorflow/issues/6698.
Adding these three lines
import tensorflow as tf
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
sess = tf.Session(config=config)
makes the error go away, but watching the GPU with watch -n 0.5 nvidia-smi still showed no GPU utilization even though GPU memory was being consumed; I do not know why.
After giving up on that patching route I dropped deepin for the moment and flashed Ubuntu 18.04 instead (much less troublesome than deepin), reinstalled the driver, installed cuda 10.1.168 and cudnn 7.6.1, pip-installed tensorflow_gpu 1.4.0, and ran into exactly the same problem.
So I reinstalled deepin 15.10 and thought about using docker to sidestep all of this trouble, but docker only provides packages for debian 8, and installing nvidia-docker2 fails with "E: Unable to locate package nvidia-docker2" or other errors about mismatched docker packages, so docker was out as well.
That left only one road: build tensorflow from source by hand. The tensorflow site has a guide, which I copy here: https://tensorflow.google.cn/install/source#bazel_build
## install the Python and TensorFlow package dependencies
$ sudo apt install python-dev python-pip # or python3-dev python3-pip
## install the TensorFlow pip package dependencies (if you are using a virtual environment, omit the --user argument):
$ pip install -U --user pip six numpy wheel mock
$ pip install -U --user keras_applications==1.0.6 --no-deps
$ pip install -U --user keras_preprocessing==1.0.5 --no-deps
## these dependencies are listed under REQUIRED_PACKAGES in setup.py
## install Bazel, the build tool used to compile TensorFlow
## instructions: https://docs.bazel.build/versions/master/install-ubuntu.html
########################################################################################################################
########################################################################################################################
Step 1: install the required dependencies
The following packages are needed: pkg-config, zip, g++, zlib1g-dev, unzip, and python3.
$ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python3
Step 2: download Bazel
https://github.com/bazelbuild/bazel/releases
and download bazel-0.25.2-installer-linux-x86_64.sh from there.
Step 3: Run the installer
Make the installer executable and run it:
$ chmod +x /home/andrew/Downloads/bazel-0.25.2-installer-linux-x86_64.sh
$ /home/andrew/Downloads/bazel-0.25.2-installer-linux-x86_64.sh --user
Step 4: set up the environment by adding the location of the Bazel executable to the PATH variable
$ vim ~/.bashrc
and add
export PATH="$PATH:$HOME/bin"
## save and quit
$ source ~/.bashrc
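A quick way to confirm Bazel is on the PATH (the configure step later also reports the version it finds):
$ bazel version    ## should report Build label: 0.25.2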
#################################################################################################
## building the GPU-enabled version (optional, Linux only)
## clone the TensorFlow repository with Git:
$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
## the repository defaults to the master development branch; you can also check out a release branch to build:
$ git checkout branch_name # r1.9, r1.10, etc.
##########################################################
In my case, building from a release branch fails with an encoding error:
Traceback (most recent call last):
  File "third_party/gpus/find_cuda_config.py", line 505, in <module>
    main()
  File "third_party/gpus/find_cuda_config.py", line 497, in main
    for key, value in sorted(find_cuda_config().items()):
  File "third_party/gpus/find_cuda_config.py", line 456, in find_cuda_config
    _get_default_cuda_paths(cuda_version))
  File "third_party/gpus/find_cuda_config.py", line 172, in _get_default_cuda_paths
    ] + _get_ld_config_paths()
  File "third_party/gpus/find_cuda_config.py", line 152, in _get_ld_config_paths
    match = pattern.match(line.decode("ascii"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
#########################################################
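The byte 0xe5 points to Chinese text: find_cuda_config.py decodes the output of ldconfig as ASCII, and under a Chinese locale that output contains non-ASCII characters. One possible workaround, which I did not verify on the release branch, is to force a plain C locale before running ./configure below:
$ export LC_ALL=C LANG=C    ## assumption: this makes ldconfig emit ASCII-only output for the configure script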
Configure the build
Configure the build system by running the following at the root of the TensorFlow source tree:
$ ./configure
## this script asks you for the locations of TensorFlow's dependencies and for additional build options (compiler flags, for example). Below is a sample ./configure session (yours may differ):
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.25.2 installed.
Please specify the location of python. [Default is /home/andrew/anaconda3/envs/deep/bin/python]: /home/andrew/anaconda3/bin/python
Found possible Python library paths:
/home/andrew/anaconda3/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/home/andrew/anaconda3/lib/python3.7/site-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Found CUDA 10.1 in:
/usr/local/cuda/lib64
/usr/local/cuda/include
Found cuDNN 7 in:
/usr/local/cuda/lib64
/usr/local/cuda/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 7.5]:
Do you want to use clang as CUDA compiler? [y/N]: y
Clang will be used as CUDA compiler.
Do you wish to download a fresh release of clang? (Experimental) [y/N]: y
Clang will be downloaded and used to compile tensorflow.
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl             # Build with MKL support.
    --config=monolithic      # Config for mostly static monolithic build.
    --config=gdr             # Build with GDR support.
    --config=verbs           # Build with libverbs support.
    --config=ngraph          # Build with Intel nGraph support.
    --config=numa            # Build with NUMA support.
    --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
    --config=v2              # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws           # Disable AWS S3 filesystem support.
    --config=nogcp           # Disable GCP support.
    --config=nohdfs          # Disable HDFS support.
    --config=noignite        # Disable Apache Ignite support.
    --config=nokafka         # Disable Apache Kafka support.
    --config=nonccl          # Disable NVIDIA NCCL support.
Configuration finished
## Configuration options
## For GPU support, specify the versions of CUDA and cuDNN. If your system has several CUDA or cuDNN versions installed, set the versions explicitly instead of relying on the defaults. ./configure creates symbolic links to your system's CUDA libraries, so if you update the CUDA library paths you must run this configuration step again before building.
## For the compilation optimization flags, the default (-march=native) optimizes the generated code for your machine's CPU type. If you are building TensorFlow for a different CPU type, consider a more specific optimization flag; see the GCC manual for examples.
## Some preconfigured build configs can be added to the bazel build command, for example:
--config=mkl            # support for the Intel MKL-DNN
--config=monolithic     # config for a mostly static, monolithic build
## Note: starting with TensorFlow 1.6, the binaries use AVX instructions, which may not run on older CPUs.
## Build the pip package
#######################################################################################################
Bazel build
CPU-only
Use bazel to make the TensorFlow package builder with CPU-only support:
$ bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
##########################################################################################################
##########################################################################################################
#### GPU support
## to make the TensorFlow package builder with GPU support, run:
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
## Bazel build options
## Building TensorFlow from source can use a lot of RAM. If your system is memory-constrained, limit Bazel's RAM usage with: --local_resources 2048,.5,1.0.
## The official TensorFlow packages are built with GCC 4 and use the older ABI. For GCC 5 and later, make your build compatible with the older ABI using --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0". ABI compatibility ensures that custom ops built against the official TensorFlow pip package continue to work with packages built with GCC 5.
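Putting those options together, a memory-limited GPU build would look roughly like this (only --config=opt --config=cuda are required; the other two flags are the optional ones described above):
$ bazel build --config=opt --config=cuda --local_resources 2048,.5,1.0 --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/tools/pip_package:build_pip_package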
##### Build the package
########################################################################################################
##########################################################################################################
The bazel build command creates an executable named build_pip_package; this is the program that builds the pip package. Run it as shown below to build a .whl package in the /tmp/tensorflow_pkg directory.
To build from a release branch:
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
To build from master, use --nightly_flag to get the right dependencies:
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package --nightly_flag /tmp/tensorflow_pkg
## Although it is possible to build both CUDA and non-CUDA configurations under the same source tree, it is recommended to run bazel clean when switching between these two configurations in the same tree.
####################################################################################################
####################################################################################################
Install the package
## The filename of the generated .whl file depends on the TensorFlow version and your platform. Install the package with pip install, for example:
$ pip install /tmp/tensorflow_pkg/tensorflow-version-tags.whl
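If you do not want to type out the exact filename, a glob works as well (assuming only one wheel was produced):
$ pip install /tmp/tensorflow_pkg/tensorflow-*.whl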
## Test tensorflow_gpu
$ git clone https://github.com/tensorflow/models
$ cd ./models/tutorials/image/cifar10
$ python ./cifar10_train.py    ## the keras and tensorflow_datasets packages must be installed beforehand
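Before kicking off the full training run, a quick way to check that the self-built wheel actually sees the GPU (a minimal check; tf.test.is_gpu_available may print deprecation warnings on newer versions):
$ python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"    ## should print True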
### My output:
$ python cifar10_train.py
WARNING: Logging before flag parsing goes to stderr.
W0708 14:47:51.155282 139714882098944 deprecation_wrapper.py:118] From cifar10_train.py:44: The name tf.GPUOptions is deprecated. Please use tf.compat.v1.GPUOptions instead.
2019-07-08 14:47:51.156776: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-07-08 14:47:51.162123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:51.162466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725
pciBusID: 0000:1c:00.0
2019-07-08 14:47:51.162564: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-07-08 14:47:51.163519: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-07-08 14:47:51.164624: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-07-08 14:47:51.164775: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-07-08 14:47:51.165788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-07-08 14:47:51.166329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-07-08 14:47:51.168777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-07-08 14:47:51.168901: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:51.169296: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:51.169617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-07-08 14:47:51.169648: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-07-08 14:47:51.235868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-08 14:47:51.235893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-07-08 14:47:51.235900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-07-08 14:47:51.236021: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:51.236396: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:51.236765: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:51.237103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2657 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:1c:00.0, compute capability: 7.5)
W0708 14:47:51.357703 139714882098944 deprecation_wrapper.py:118] From cifar10_train.py:141: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
W0708 14:47:51.358014 139714882098944 deprecation_wrapper.py:118] From cifar10_train.py:134: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.
W0708 14:47:51.359087 139714882098944 deprecation_wrapper.py:118] From cifar10_train.py:77: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
I0708 14:47:51.362449 139714882098944 dataset_builder.py:174] Overwrite dataset info from restored data version.
I0708 14:47:51.363972 139714882098944 dataset_builder.py:216] Reusing dataset cifar10 (/home/andrew/tensorflow_datasets/cifar10/1.0.2)
2019-07-08 14:47:51.415065: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:51.415417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725
pciBusID: 0000:1c:00.0
2019-07-08 14:47:51.415442: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-07-08 14:47:51.415451: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-07-08 14:47:51.415458: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-07-08 14:47:51.415465: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-07-08 14:47:51.415475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-07-08 14:47:51.415482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-07-08 14:47:51.415489: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-07-08 14:47:51.415527: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:51.415858: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:51.416158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
W0708 14:47:51.670379 139714882098944 deprecation_wrapper.py:118] From /home/andrew/anaconda3/envs/test_tensorflow_gpu_gamma/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.image.resize_image_with_crop_or_pad is deprecated. Please use tf.image.resize_with_crop_or_pad instead.
W0708 14:47:51.826221 139714882098944 deprecation.py:323] From /home/andrew/anaconda3/envs/test_tensorflow_gpu_gamma/lib/python3.7/site-packages/tensorflow_core/python/ops/image_ops_impl.py:1526: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
W0708 14:47:51.833786 139714882098944 deprecation.py:323] From /home/andrew/Downloads/models-master/tutorials/image/cifar10/cifar10_input.py:45: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
W0708 14:47:51.846651 139714882098944 deprecation_wrapper.py:118] From /home/andrew/Downloads/models-master/tutorials/image/cifar10/cifar10_input.py:48: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.
W0708 14:47:51.848567 139714882098944 deprecation.py:506] From /home/andrew/Downloads/models-master/tutorials/image/cifar10/cifar10.py:126: calling TruncatedNormal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0708 14:47:51.869967 139714882098944 deprecation_wrapper.py:118] From /home/andrew/Downloads/models-master/tutorials/image/cifar10/cifar10.py:190: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
I0708 14:47:51.990282 139714882098944 summary_op_util.py:66] Summary name local3/weight_loss (raw) is illegal; using local3/weight_loss__raw_ instead.
I0708 14:47:51.991768 139714882098944 summary_op_util.py:66] Summary name local4/weight_loss (raw) is illegal; using local4/weight_loss__raw_ instead.
I0708 14:47:51.993656 139714882098944 summary_op_util.py:66] Summary name cross_entropy (raw) is illegal; using cross_entropy__raw_ instead.
I0708 14:47:51.995055 139714882098944 summary_op_util.py:66] Summary name total_loss (raw) is illegal; using total_loss__raw_ instead.
W0708 14:47:52.088509 139714882098944 deprecation.py:323] From /home/andrew/anaconda3/envs/test_tensorflow_gpu_gamma/lib/python3.7/site-packages/tensorflow_core/python/training/moving_averages.py:433: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
I0708 14:47:52.242822 139714882098944 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
W0708 14:47:52.373944 139714882098944 deprecation.py:323] From /home/andrew/anaconda3/envs/test_tensorflow_gpu_gamma/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1354: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
I0708 14:47:52.431926 139714882098944 monitored_session.py:240] Graph was finalized.
2019-07-08 14:47:52.432215: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:52.432602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725
pciBusID: 0000:1c:00.0
2019-07-08 14:47:52.432634: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-07-08 14:47:52.432644: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-07-08 14:47:52.432654: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-07-08 14:47:52.432662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-07-08 14:47:52.432671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-07-08 14:47:52.432679: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-07-08 14:47:52.432689: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-07-08 14:47:52.432730: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:52.433084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:52.433403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-07-08 14:47:52.433425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-08 14:47:52.433431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-07-08 14:47:52.433437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-07-08 14:47:52.433499: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:52.433855: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-08 14:47:52.434182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2657 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:1c:00.0, compute capability: 7.5)
I0708 14:47:52.939148 139714882098944 session_manager.py:500] Running local_init_op.
I0708 14:47:52.959544 139714882098944 session_manager.py:502] Done running local_init_op.
I0708 14:47:53.314375 139714882098944 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /tmp/cifar10_train/model.ckpt.
2019-07-08 14:47:53.678739: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-07-08 14:47:53.981668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
## depending on GPU performance, cifar10_train.py takes roughly 1-2 hours or more to finish
After that, run:
$ python ./cifar10_eval.py
A full run reaches roughly 86% accuracy.