MacBook + eGPU: building and installing PyTorch and TensorFlow from source (macOS 10.13.3, Python 3.6, CUDA 9.1, cuDNN 7) (unfinished)

This post records how I set up a deep learning environment on a 13-inch MacBook (Early 2015) with a Gigabyte GTX 1080 Gaming Box.

Installing the eGPU (to be added)

1. 

Installing PyTorch (to be added)

1. 

Installing TensorFlow

  1. Download the tensorflow-1.5.0 source code
  2. (File edit) Delete the "__align__ (sizeof(T))" qualifier from the following three files
    1. tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
    2. tensorflow/core/kernels/split_lib_gpu.cu.cc
    3. tensorflow/core/kernels/concat_lib_gpu.impl.cu.cc
    4. (References)
      1. https://github.com/tensorflow/tensorflow/issues/14174
      2. https://gist.github.com/smitshilu/53cf9ff0fd6cdb64cca69a7e2827ed0f
  3. Preparation before building TensorFlow
    1. sudo pip install six numpy wheel (already included in Anaconda)
    2. brew install coreutils
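    3. The latest Xcode 9 was already installed, and its Command Line Tools (CLT) are too new for CUDA 9. Reinstall Command_Line_Tools_macOS_10.12_for_Xcode_8.2.dmg and run sudo xcode-select --switch /Library/Developer/CommandLineTools; afterwards, clang --version shows which Command Line Tools version is active
    4. (References)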
      1. https://github.com/pytorch/pytorch/issues/3047
  4. Configure the TensorFlow build
    1. $ cd tensorflow  # cd to the top-level directory created
    2. $ ./configure
  5. Extra step (not sure whether it is necessary; my last build attempt, which succeeded, used part of the commands below)
    1. bazel clean --expunge 
    2. The reference lists five commands in full (I only used one of them):
      1. bazel clean --expunge
      2. sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
      3. sudo xcodebuild -license
      4. bazel clean --expunge 
      5. bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package (starts the build)
    3. (References)
      1. https://stackoverflow.com/questions/45276830/xcode-version-must-be-specified-to-use-an-apple-crosstool
  6. Build and install TensorFlow 1.5
    1. $ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
    2. $ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
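    3. $ sudo pip install /tmp/tensorflow_pkg/tensorflow-1.5.0-py2-none-any.whl (the actual wheel filename depends on the Python version and platform; use the file generated in /tmp/tensorflow_pkg)
    4. (References)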
      1. https://www.tensorflow.org/install/install_sources
  7. Errors encountered along the way
    1. Problem: Linking of rule '//tensorflow/python:_pywrap_tensorflow_internal.so' failed
      1. Fix: (file edit) delete linkopts = ["-lgomp"], from tensorflow/third_party/gpus/cuda/BUILD.tpl
      2. (Reference) https://github.com/tensorflow/tensorflow/issues/15172
    2. Problem: running a TensorFlow test program fails with an OOM (out of memory) error
      1. Fix:
        1. gpu_options = tf.GPUOptions(allow_growth=True) 
        2. with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
      2. (Reference) https://stackoverflow.com/questions/39465503/cuda-error-out-of-memory-in-tensorflow
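
The file edit in step 2 can also be scripted instead of done by hand. The following is only a minimal sketch: the names `FILES`, `ALIGN_RE`, `strip_align` and `patch_tree` are made up for illustration, and it assumes the qualifier appears literally as `__align__(sizeof(T))`, possibly with extra whitespace, in the three files listed in step 2.

```python
import re
from pathlib import Path

# The three files named in step 2, relative to the TensorFlow source root.
FILES = [
    "tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc",
    "tensorflow/core/kernels/split_lib_gpu.cu.cc",
    "tensorflow/core/kernels/concat_lib_gpu.impl.cu.cc",
]

# Matches "__align__(sizeof(T))" with optional whitespace between tokens,
# plus any spaces or tabs that directly follow it.
ALIGN_RE = re.compile(r"__align__\s*\(\s*sizeof\s*\(\s*T\s*\)\s*\)[ \t]*")


def strip_align(source: str) -> str:
    """Return the source text with every __align__(sizeof(T)) removed."""
    return ALIGN_RE.sub("", source)


def patch_tree(root: str) -> None:
    """Rewrite the three files under root in place (hypothetical helper)."""
    for rel in FILES:
        path = Path(root) / rel
        text = path.read_text()
        patched = strip_align(text)
        if patched != text:
            path.write_text(patched)
            print("patched", rel)
```

Calling `patch_tree("tensorflow")` from the directory that contains the source checkout would apply the edit; it is worth checking the result with `git diff` before running the build.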
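
The two lines given as the fix for the OOM error in step 7.2 are only a fragment. A fuller sketch might look like the following, assuming the TensorFlow 1.x API of the build above (in TensorFlow 2.x, `tf.Session` is no longer available under that name). The function name `run_smoke_test` and the tiny graph it runs are illustrative, not part of the original notes.

```python
def run_smoke_test():
    """Run a tiny graph in a session that allocates GPU memory on demand."""
    # Imported lazily so this file can be loaded on machines without TensorFlow.
    import tensorflow as tf  # assumes TensorFlow 1.x, e.g. the 1.5.0 build above

    # allow_growth=True makes TensorFlow start with a small GPU allocation
    # and grow it as needed, instead of reserving most of the memory up front.
    gpu_options = tf.GPUOptions(allow_growth=True)
    config = tf.ConfigProto(gpu_options=gpu_options)

    with tf.Session(config=config) as sess:
        a = tf.constant([1.0, 2.0, 3.0])
        b = tf.constant([4.0, 5.0, 6.0])
        # Evaluate the element-wise sum and return it as a NumPy array.
        return sess.run(a + b)
```

Running `print(run_smoke_test())` on the machine with the freshly installed wheel is a quick way to check that a session can be created without the out-of-memory error.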
