Huawei MateBook 13 (GeForce MX250) + Ubuntu Kylin 18.04 + TensorFlow (GPU) + CUDA 9.1: Notes from the Offline Source-Build Pit

  • 1 Environment
  • 2 Installing the dependencies
  • 3 Building and installing TensorFlow
    • 3.1 ./configure, like an old friend
    • 3.2 bazel build, a total stranger
    • 3.3 The long-awaited pip3 install
  • 4 Testing and verification
  • 5 Tips for avoiding the pits

Word came that, after 15 days of quarantine, I would have to stay isolated another 3 days waiting for a nucleic-acid test, then another 24 hours for the result, before regaining my freedom. This unplanned 3+1 suddenly gave middle-aged me the illusion that the rest of life is still long. So, treasuring the time, let me fill in the pit I jumped into earlier: the offline installation of TensorFlow-GPU. They say a thousand readers see a thousand Hamlets; that is exactly what offline source installation on Linux is like, and for TensorFlow-GPU there must be at least three thousand. Skipping ten thousand lines of tears, let me write down the pit-dodging notes first. Just one thing: a source build and an offline source build are not pits of the same order. You can floor the accelerator a few more times and power through an ordinary source build, but if you are pressed for time, steer around the offline source build if you possibly can. And don't apply these notes recklessly, or you will go astray and regret it too late. It seems I have already said three things.

1 Environment

Computer: Huawei MateBook 13
CPU: Core i7, 8th Gen, 8 cores, AVX2 instruction set supported
Memory: 8 GB
Graphics card: NVIDIA GeForce MX250, 2 GB of video memory, 384 CUDA cores, compute capability 6.1 (as the TensorFlow logs below confirm; does a card like this even deserve the big guns?)
Operating system: Ubuntu Kylin 18.04, kernel 5.3.0
GCC: 7.5.0
Python: 3.6.9
Graphics driver: xserver-xorg-video-nvidia-440
CUDA: 9.1 (do not underestimate that ".1"; it is the root of all evil)
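If you want to double-check these numbers on your own machine before diving in, the usual commands are enough (a quick sketch; nvcc comes from the nvidia-cuda-toolkit package listed in the next section):

gcc --version                      # host compiler, 7.5.0 here
python3 --version                  # 3.6.9 here
nvcc --version                     # CUDA toolkit, 9.1 here
cat /proc/driver/nvidia/version    # kernel driver, 440.xx here
grep -m1 -o avx2 /proc/cpuinfo     # confirm AVX2 support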

2 Installing the dependencies

The pits all come from the sheer number of dependencies and their strict version requirements. Among them, the riskiest thing to change is the graphics driver, the undisputed king of pits, so the NVIDIA 440 driver is taken as the version baseline and TensorFlow is made to fit it. The main version dependencies involve the CPU instruction set, the graphics driver, GCC, Python, CUDA, cuDNN, NCCL and Bazel. The recipe is as follows (an installation sketch follows the list):
(★ marks packages that have to be downloaded; everything else installs straight from the Ubuntu Kylin 18.04 software repository. A few items are omitted; install whatever turns out to be missing and you will be cured. What, no offline repository? Then please just browse and move along.)

TensorFlow★: 1.12.0
GCC: 4.8.5; tears and time proved how important this one is, and whatever you do, don't forget its big brother g++-4.8
cuDNN★: 7.1.3.16, including the dev package
NCCL★: 2.1.15, including the dev package; install it following NVIDIA's official instructions, which is much like a repository install
CUDA-related: libnvidia-compute-440:amd64, libnvidia-common-440, nvidia-cuda-toolkit, libcudart9.1:amd64, libcupti-dev:amd64, libcupti9.1:amd64
Bazel★: 0.18.0
numpy★: 1.18.2, for Python 3.6
absl-py★: 0.2.0, for Python 3.6
astor★: 0.7.0, for Python 3.6
gast★: 0.2.0, for Python 3.6
grpcio★: 1.8.6, for Python 3.6
Keras_Applications★: 1.0.6, for Python 3
Keras_Preprocessing★: 1.0.5, for Python 3
protobuf★: 3.7.0, for Python 3.6
tensorboard★: 1.12.0, for Python 3
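As a rough sketch of what the installation itself looks like (the .deb file names are illustrative; substitute the exact files you downloaded, and install NCCL per NVIDIA's instructions):

# From the offline Ubuntu Kylin 18.04 repository:
sudo apt install gcc-4.8 g++-4.8 nvidia-cuda-toolkit \
    libcudart9.1 libcupti9.1 libcupti-dev \
    libnvidia-compute-440 libnvidia-common-440

# Downloaded cuDNN runtime + dev .deb packages (illustrative names):
sudo dpkg -i libcudnn7_7.1.3.16-1+cuda9.1_amd64.deb \
             libcudnn7-dev_7.1.3.16-1+cuda9.1_amd64.deb

# Downloaded Python packages, installed without touching the network:
pip3 install --no-index --find-links=/path/to/wheels \
    numpy absl-py astor gast grpcio Keras_Applications \
    Keras_Preprocessing protobuf tensorboard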

3 Building and installing TensorFlow

3.1 ./configure, like an old friend

Unpack the source, enter the source directory, and run ./configure. The options go like this:

You have bazel 0.18.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]
Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: n
No Apache Ignite support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr]: /usr/lib/x86_64-linux-gnu
Do you wish to build TensorFlow with TensorRT support? [y/N]: 
No TensorRT support will be enabled for TensorFlow.
Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 2.1.15
NCCL libraries found in /usr/lib/x86_64-linux-gnu/libnccl.so
This looks like a system path.
Assuming NCCL header path is /usr/include
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]: 
Do you want to use clang as CUDA compiler? [y/N]: 
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/x86_64-linux-gnu-gcc-7]: /usr/bin/x86_64-linux-gnu-gcc-4.8
Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=gdr         	# Build with GDR support.
	--config=verbs       	# Build with libverbs support.
	--config=ngraph      	# Build with Intel nGraph support.
Configuration finished

3.2 bazel build, a total stranger

Don't rush into bazel build just yet. Everything so far has been within reason, but call it naive or call it innocent, it will beat you all the same. Compiling TensorFlow needs the source archives of a great many third-party libraries. How many? See the list:

https://github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz
https://github.com/abseil/abseil-cpp/archive/48cd2c3f351ff188bc85684b84a91b6e6d17d896.tar.gz
https://bitbucket.org/eigen/eigen/get/fd6845384b86.tar.gz
https://github.com/hfp/libxsmm/archive/1.9.tar.gz
https://github.com/google/or-tools/archive/v6.7.2.tar.gz
https://github.com/google/re2/archive/2018-07-01.tar.gz
https://github.com/GoogleCloudPlatform/google-cloud-cpp/archive/14760a86c4ffab9943b476305c4fe927ad95db1c.tar.gz
https://github.com/googleapis/googleapis/archive/f81082ea1e2f85c43649bee26e0d9871d4b41cdb.zip
https://github.com/google/gemmlowp/archive/38ebac7b059e84692f53e5938f97a9943c120d98.zip
https://github.com/google/farmhash/archive/816a4ae622e964763ca0862d9dbd19324a1eaf45.tar.gz
https://github.com/google/highwayhash/archive/fd3d9af80465e4383162e4a7c5e2f406e82dd968.tar.gz
http://www.nasm.us/pub/nasm/releasebuilds/2.13.03/nasm-2.13.03.tar.bz2
https://github.com/libjpeg-turbo/libjpeg-turbo/archive/2.0.0.tar.gz
https://github.com/glennrp/libpng/archive/v1.6.34.tar.gz
https://www.sqlite.org/2018/sqlite-amalgamation-3240000.zip
http://pilotfiber.dl.sourceforge.net/project/giflib/giflib-5.1.4.tar.gz
https://pypi.python.org/packages/source/s/six/six-1.10.0.tar.gz
https://pypi.python.org/packages/d8/be/c4276b3199ec3feee2a88bc64810fbea8f26d961e0a4cd9c68387a9f35de/astor-0.6.2.tar.gz
https://pypi.python.org/packages/5c/78/ff794fcae2ce8aa6323e789d1f8b3b7765f601e7702726f430e814822b96/gast-0.2.0.tar.gz
https://pypi.python.org/packages/8a/48/a76be51647d0eb9f10e2a4511bf3ffb8cc1e6b14e9e4fab46173aa79f981/termcolor-1.1.0.tar.gz
https://github.com/abseil/abseil-py/archive/pypi-v0.2.2.tar.gz
https://pypi.python.org/packages/bc/cc/3cdb0a02e7e96f6c70bd971bc8a90b8463fda83e264fa9c5c1c98ceabd81/backports.weakref-1.0rc1.tar.gz
https://docs.python.org/2.7/_sources/license.txt
https://github.com/google/protobuf/archive/v3.6.0.tar.gz
https://github.com/google/nsync/archive/1.20.1.tar.gz
https://github.com/google/googletest/archive/997d343dd680e541ef96ce71ee54a91daf2577a0.zip
https://github.com/gflags/gflags/archive/v2.2.1.tar.gz
http://ftp.exim.org/pub/pcre/pcre-8.42.tar.gz
http://ufpr.dl.sourceforge.net/project/swig/swig/swig-3.0.8/swig-3.0.8.tar.gz
https://curl.haxx.se/download/curl-7.60.0.tar.gz
https://github.com/grpc/grpc/archive/v1.13.0.tar.gz
https://github.com/antirez/linenoise/archive/c894b9e59f02203dbe4e2be657572cf88c4230c3.tar.gz
https://github.com/LMDB/lmdb/archive/LMDB_0.9.22.tar.gz
https://github.com/open-source-parsers/jsoncpp/archive/1.8.4.tar.gz
https://github.com/google/boringssl/archive/7f634429a04abc48e2eb041c81c5235816c96514.tar.gz
https://zlib.net/zlib-1.2.11.tar.gz
http://www.kurims.kyoto-u.ac.jp/~ooura/fft.tgz
https://github.com/google/snappy/archive/1.1.7.tar.gz
https://github.com/nvidia/nccl/archive/03d856977ecbaac87e598c0c4bafca96761b9ac7.tar.gz
https://github.com/edenhill/librdkafka/archive/v0.11.5.tar.gz
https://github.com/aws/aws-sdk-cpp/archive/1.3.15.tar.gz
# The following 6 http URLs fail to connect; change them to https
https://repo1.maven.org/maven2/junit/junit/4.12/junit-4.12.jar
https://repo1.maven.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar
https://repo1.maven.org/maven2/com/google/testing/compile/compile-testing/0.11/compile-testing-0.11.jar
https://repo1.maven.org/maven2/com/google/truth/truth/0.32/truth-0.32.jar
https://repo1.maven.org/maven2/org/checkerframework/checker-qual/2.4.0/checker-qual-2.4.0.jar
https://repo1.maven.org/maven2/com/squareup/javapoet/1.9.0/javapoet-1.9.0.jar
https://github.com/google/pprof/archive/c0fb62ec88c411cc91194465e54db2632845b650.tar.gz
https://github.com/NVlabs/cub/archive/1.8.0.zip
https://github.com/cython/cython/archive/0.28.4.tar.gz
https://github.com/bazelbuild/bazel-toolchains/archive/9a111bd82161c1fbe8ed17a593ca1023fd941c70.tar.gz
https://github.com/intel/ARM_NEON_2_x86_SSE/archive/0f77d9d182265259b135dad949230ecbf1a2633d.tar.gz
https://github.com/google/double-conversion/archive/3992066a95b823efc8ccc1baf82a1cfc73f6e9b8.zip
https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_224_android_quant_2017_11_08.zip
https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_ssd_tflite_v1.zip
https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_0.75_quant_2018_06_29.zip
http://storage.googleapis.com/download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18.tar.gz
https://storage.googleapis.com/download.tensorflow.org/models/tflite/conv_actions_tflite.zip
https://storage.googleapis.com/download.tensorflow.org/models/tflite/smartreply_1.0_2017_11_01.zip
https://storage.googleapis.com/download.tensorflow.org/data/ovic.zip
https://github.com/bazelbuild/rules_android/archive/v0.1.1.zip
https://github.com/01org/tbb/archive/tbb_2018.zip
https://github.com/NervanaSystems/ngraph/archive/v0.8.1.tar.gz
https://github.com/nlohmann/json/archive/v3.1.1.tar.gz
https://github.com/NervanaSystems/ngraph-tf/archive/v0.6.1.tar.gz
https://github.com/google/flatbuffers/archive/1f5eae5d6a135ff6811724f6c57f911d1f46bb15.tar.gz
https://github.com/unicode-org/icu/archive/release-62-1.tar.gz

I dug the list above out of four files in the source tree, WORKSPACE, ./tensorflow/workspace.bzl, ./third_party/flatbuffers/workspace.bzl and ./third_party/icu/workspace.bzl, with cat plus grep (a one-liner sketch follows the error message below). Without these archives, the build fails with errors like:

Error downloading [https://mirror.bazel.build/github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz, https://github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz] to /home/wj/.cache/bazel/_bazel_wj/8f17aafcb0460916e0d9ac951575e712/external/io_bazel_rules_closure/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz: All mirrors are down: [Unknown host: github.com, Unknown host: mirror.bazel.build]
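For reference, the extraction can be done along these lines (a sketch; the regex may need adjusting for other TensorFlow versions):

cat WORKSPACE tensorflow/workspace.bzl \
    third_party/flatbuffers/workspace.bzl third_party/icu/workspace.bzl \
  | grep -oE '"https?://[^"]+"' | tr -d '"' | sort -u > deps.list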

This was my first time using bazel, and there was no time to learn how it works or how to use it. When the error appeared, many posts online blamed the bazel version; only after trying all of that did I realize those people were almost certainly not doing an offline source build. The fix: save the list above as deps.list and write a script get-deps.sh:

#!/bin/bash

for i in `cat deps.list`
do
  case "$i" in http*) ;; *) continue ;; esac  # skip comment lines
  wget -nc "$i"                               # -nc: don't re-download existing files
  echo "$i"
done

Copy deps.list and get-deps.sh to a networked Linux machine that has wget (a batch-download tool on Windows also works, but not Thunder/Xunlei), run get-deps.sh to fetch all the dependency source archives, and copy them to the offline machine. Assuming they live in $deps_path, edit the four files WORKSPACE, ./tensorflow/workspace.bzl, ./third_party/flatbuffers/workspace.bzl and ./third_party/icu/workspace.bzl, adding the local file path in places like this:

http_archive(
    name = "io_bazel_rules_closure",
    sha256 = "a38539c5b5c358548e75b44141b4ab637bba7c4dc02b46b1f62a96d6433f56ae",
    strip_prefix = "rules_closure-dbb96841cc0a5fb2664c37822803b06dab20c7d1",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz",
        "https://github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz",  # 2018-04-13
        "file:$deps_path/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz",
    ],
)

Of course, replace $deps_path with the real path. Please don't ask me how to edit that many dependencies, and don't ask me when you will finish editing them. All I can say is edit carefully; get one wrong and you will get to savor the pit at your leisure. (A rough helper script is sketched below; check its output by hand.)
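The sketch below only works under the assumptions that every archive keeps its original file name in $deps_path and that each URL appears in the workspace files as a quoted, comma-terminated list element; it appends a local file: mirror right after each matching URL. Treat it as a starting point, not a guarantee:

#!/bin/bash
DEPS_PATH=/home/wj/deps    # hypothetical path; use your real $deps_path
FILES="WORKSPACE tensorflow/workspace.bzl \
       third_party/flatbuffers/workspace.bzl third_party/icu/workspace.bzl"

for url in $(grep '^http' deps.list); do
  name=$(basename "$url")
  for f in $FILES; do
    grep -q "\"$url\"" "$f" && \
      sed -i "s|\"$url\",|\"$url\",\n        \"file:$DEPS_PATH/$name\",|" "$f"
  done
done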
With the source dependencies in place, run:

$bazel build --config=opt --verbose_failures //tensorflow/tools/pip_package:build_pip_package
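On an 8 GB machine it may also be worth reining in Bazel's parallelism so the compiler does not get killed by the out-of-memory killer (see tip 2 in section 5). Something along these lines; the numbers are a guess, and the --local_resources RAM,CPU,IO syntax is the one used by Bazel 0.18:

$bazel build --config=opt --verbose_failures \
    --jobs=4 --local_resources=6144,4.0,1.0 \
    //tensorflow/tools/pip_package:build_pip_package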

Good luck.

3.3 The long-awaited pip3 install

If everything stays quiet for the next hour and a half or so, you can generate the whl package. Run:

$bazel-bin/tensorflow/tools/pip_package/build_pip_package ../out

where ../out can be replaced with any real path. On success you get the wheel package tensorflow-1.12.0-cp36-cp36m-linux_x86_64.whl. From here a pip3 install is all that is left, and if nothing goes wrong, you have at least earned a sip of water and a short break.
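For completeness, the last step might look like this (--no-deps because the Python dependencies were already installed offline in section 2; tf.test.is_gpu_available() is a TensorFlow 1.x API):

pip3 install --no-deps ../out/tensorflow-1.12.0-cp36-cp36m-linux_x86_64.whl
python3 -c "import tensorflow as tf; print(tf.__version__, tf.test.is_gpu_available())"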

4 Testing and verification

If you have no TensorFlow application code at hand, the sample code found all over the web will do for testing:

import tensorflow as tf
with tf.device('/cpu:0'):
    a=tf.constant([1., 2, 3], shape=[3], name='a')
    b=tf.constant([1., 2, 3], shape=[3], name='b')
with tf.device('/gpu:0'):
    c=a+b
sess=tf.Session(config=tf.ConfigProto(allow_soft_placement=True, 
log_device_placement=True))
sess.run(tf.global_variables_initializer())
print(sess.run(c))

The console output should look something like:

2020-04-21 11:29:59.242611: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce MX250 major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 195.25MiB
2020-04-21 11:29:59.242630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2020-04-21 11:29:59.813910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-21 11:29:59.813953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2020-04-21 11:29:59.813960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2020-04-21 11:29:59.814070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 133 MB memory) -> physical GPU (device: 0, name: GeForce MX250, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce MX250, pci bus id: 0000:01:00.0, compute capability: 6.1
2020-04-21 11:29:59.814449: I tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce MX250, pci bus id: 0000:01:00.0, compute capability: 6.1

add: (Add): /job:localhost/replica:0/task:0/device:GPU:0
2020-04-21 11:29:59.815606: I tensorflow/core/common_runtime/placer.cc:927] add: (Add)/job:localhost/replica:0/task:0/device:GPU:0
init: (NoOp): /job:localhost/replica:0/task:0/device:GPU:0
2020-04-21 11:29:59.815634: I tensorflow/core/common_runtime/placer.cc:927] init: (NoOp)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2020-04-21 11:29:59.815649: I tensorflow/core/common_runtime/placer.cc:927] a: (Const)/job:localhost/replica:0/task:0/device:CPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2020-04-21 11:29:59.815660: I tensorflow/core/common_runtime/placer.cc:927] b: (Const)/job:localhost/replica:0/task:0/device:CPU:0

And there it is, the GPU you have been pining for, finally doing some work. Life always has its surprises, of course, but as long as the dependency versions above match exactly and there is nothing "extra" lurking in the system, success should not be a problem.
I also have a Keras + TensorFlow example that recognizes MNIST handwritten digits with a 2-D convolutional network: 28x28 images, 50,000 training images, 10,000 validation images, 3 Conv2D layers, 2 Dense layers, 93,322 trainable parameters. On the CPU it runs like this:

Train on 50000 samples, validate on 10000 samples
2020-04-20 21:28:56.517508: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-04-20 21:28:56.517906: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:150] kernel driver does not appear to be running on this host (wj-WRT-WX9): /proc/driver/nvidia/version does not exist
Epoch 1/20
50000/50000 [==============================] - 23s 467us/step - loss: 0.2066 - acc: 0.9359 - val_loss: 0.0829 - val_acc: 0.9762
Epoch 2/20
50000/50000 [==============================] - 23s 461us/step - loss: 0.0524 - acc: 0.9837 - val_loss: 0.0513 - val_acc: 0.9852
Epoch 3/20
50000/50000 [==============================] - 30s 609us/step - loss: 0.0343 - acc: 0.9891 - val_loss: 0.0460 - val_acc: 0.9875

On the CPU each epoch takes a little over 23 s. With the GPU:

Train on 50000 samples, validate on 10000 samples
2020-04-20 21:17:43.752353: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-20 21:17:43.752600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce MX250 major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.68GiB
2020-04-20 21:17:43.752622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2020-04-20 21:17:44.202421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-20 21:17:44.202447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2020-04-20 21:17:44.202452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2020-04-20 21:17:44.202538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1432 MB memory) -> physical GPU (device: 0, name: GeForce MX250, pci bus id: 0000:01:00.0, compute capability: 6.1)
Epoch 1/20
50000/50000 [==============================] - 13s 268us/step - loss: 0.1848 - acc: 0.9430 - val_loss: 0.0557 - val_acc: 0.9843
Epoch 2/20
50000/50000 [==============================] - 13s 252us/step - loss: 0.0506 - acc: 0.9843 - val_loss: 0.0547 - val_acc: 0.9841
Epoch 3/20
50000/50000 [==============================] - 13s 257us/step - loss: 0.0342 - acc: 0.9894 - val_loss: 0.0396 - val_acc: 0.9880

On the GPU each epoch takes only about 13 s, still a good deal faster than the CPU's 23 s. So even a GeForce MX250 is worth the trouble, and under the circumstances it even leaves a faint feeling that the future holds promise.

5 Tips for avoiding the pits

  1. Versions must follow the official recommendations at https://tensorflow.google.cn/install/source; if possible, match a known successful setup exactly. For CUDA 9.1, though, TensorFlow has neither an official release nor a clear version recommendation, so the recipe above is itself the product of time and tears. And remember: CUDA 9.1 and CUDA 9.0 are absolutely not the same thing.
  2. Even something as mighty as GCC makes mistakes and crashes. Trying again (plus a reboot) sometimes works; the "internal compiler error: Killed" below usually means the compiler was killed for running out of memory, so capping Bazel's parallelism as sketched in section 3.2 is also worth a try.
x86_64-linux-gnu-gcc-4.8: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.8/README.Bugs> for instructions.
ERROR: /home/wj/software/tf/gpu/tensorflow-1.12.0/tensorflow/core/kernels/BUILD:107:1: output 'tensorflow/core/kernels/_objs/strided_slice_op_gpu/strided_slice_op_gpu.cu.pic.o' was not created
ERROR: /home/wj/software/tf/gpu/tensorflow-1.12.0/tensorflow/core/kernels/BUILD:107:1: not all outputs were created or valid
Target //tensorflow/tools/pip_package:build_pip_package failed to build
  3. Be careful when several versions of the same library coexist. After building and installing, I still hit an error during testing and thought all was lost; it turned out that libnvidia-compute-390:amd64 had been installed at some unknown point (see the check sketched after this list).
2020-04-20 20:42:50.228503: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_INVALID_VALUE: invalid argument
2020-04-20 20:42:50.228987: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: wj-WRT-WX9
2020-04-20 20:42:50.229005: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: wj-WRT-WX9
2020-04-20 20:42:50.229067: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 390.132.0
2020-04-20 20:42:50.229102: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 440.59.0
2020-04-20 20:42:50.229115: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:308] kernel version 440.59.0 does not match DSO version 390.132.0 -- cannot find working devices in this configuration
  4. Thanks to https://blog.csdn.net/s_sunnyy/article/details/86074114, one of the very few original write-ups on offline source installation; I largely followed its approach. But, again, a thousand readers see a thousand Hamlets, so it cannot be copied wholesale. Concrete problems are like that; there is nothing advanced or basic about them.

  5. A pit is a pit: don't step into it if you can help it. If you really have no choice, first check whether the work the experts have done at https://github.com/mind/wheels/releases can help you.
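For the version mismatch in tip 3, a quick way to spot and remove the stray library (standard dpkg/apt commands; the package name is the one from my machine):

dpkg -l | grep -E 'nvidia|cuda'        # list every installed NVIDIA/CUDA package
cat /proc/driver/nvidia/version        # kernel driver version (440.xx here)
sudo apt purge libnvidia-compute-390   # remove the stray 390 userspace library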

Finally, please don't ask me why it had to be offline and from source. May the epidemic end soon.
