Building PyTorch from source
Clone the PyTorch source:
(base) xueruini@nico2:~/onion_rain/pytorch/code$ git clone --recursive https://github.com/pytorch/pytorch
Cloning into 'pytorch'...
remote: Enumerating objects: 37, done.
remote: Counting objects: 100% (37/37), done.
...
Resolving deltas: 100% (151/151), done.
Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8'
(base) xueruini@nico2:~/onion_rain/pytorch/code$ cd pytorch
(base) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ ls
android aten.bzl binaries c10 CITATION CMakeLists.txt CODEOWNERS docker docker.Makefile ios Makefile mypy.ini README.md scripts submodules third_party torch version.txt
aten benchmarks BUILD.bazel caffe2 cmake CODE_OF_CONDUCT.md CONTRIBUTING.md Dockerfile docs LICENSE modules NOTICE requirements.txt setup.py test tools ubsan.supp WORKSPACE
Switch to a release version
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ git checkout v1.5.1
Checking out files: 100% (3139/3139), done.
Note: checking out 'v1.5.1'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 3c31d73c87 [ONNX] Fix pow op export [1.5.1] (#39791)
Update the third-party submodules
(base) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ git submodule sync
(base) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ git submodule update --init --recursive
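To confirm that the update above left no submodule uninitialized, the output of `git submodule status` can be scanned. This is a minimal sketch, assuming git's documented status prefixes (`-` marks an uninitialized submodule, `+` a checked-out commit that differs from the recorded one). The helper names `uninitialized_submodules` and `check_repo` are hypothetical.

```python
import subprocess


def uninitialized_submodules(status_text):
    """Return the paths of submodules whose status line starts with '-'."""
    paths = []
    for line in status_text.splitlines():
        if line.startswith("-"):
            # An uninitialized entry looks like "-<sha1> <path>".
            parts = line[1:].split()
            if len(parts) >= 2:
                paths.append(parts[1])
    return paths


def check_repo(repo_dir="."):
    """Run `git submodule status --recursive` in repo_dir and scan its output."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return uninitialized_submodules(result.stdout)
```

Calling `check_repo()` from the pytorch checkout should return an empty list once every submodule has been initialized.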
Create a new virtual environment
(base) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ conda create -n pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/xueruini/anaconda3/envs/pytorch
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate pytorch
#
# To deactivate an active environment, use
#
# $ conda deactivate
(base) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ conda activate pytorch
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$
Install dependencies
Common dependencies
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi
Collecting package metadata (current_repodata.json): done
Solving environment: done
...
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Check the CUDA version
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
Install the CUDA dependency (the magma package must match the CUDA version, here 10.2)
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ conda install -c pytorch magma-cuda102
Collecting package metadata (current_repodata.json): done
Solving environment: done
...
Verifying transaction: done
Executing transaction: done
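The magma package name above follows from the CUDA release reported by `nvcc -V` (10.2 gives `magma-cuda102`). A small sketch of that mapping is shown here. The naming pattern is inferred from this single example, so whether a package exists for another CUDA version has to be checked on the pytorch conda channel. The function names are hypothetical.

```python
import re


def cuda_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def magma_package(nvcc_output):
    """Build a magma package name such as 'magma-cuda102' (assumed pattern)."""
    release = cuda_release(nvcc_output)
    if release is None:
        raise ValueError("no CUDA release found in nvcc output")
    major, minor = release
    return f"magma-cuda{major}{minor}"
```

Passing the last line of the `nvcc -V` output shown earlier yields the package name used in the install command.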
Install cudatoolkit
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ conda install cudatoolkit
Collecting package metadata (current_repodata.json): done
Solving environment: done
...
Verifying transaction: done
Executing transaction: done
Without cudatoolkit, import torch fails after the build with the following error:
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ python
Python 3.8.3 (default, Jul 2 2020, 16:21:59)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xueruini/onion_rain/pytorch/code/pytorch/torch/__init__.py", line 135, in <module>
    _load_global_deps()
  File "/home/xueruini/onion_rain/pytorch/code/pytorch/torch/__init__.py", line 93, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/xueruini/anaconda3/envs/pytorch/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcublasLt.so.10: cannot open shared object file: No such file or directory
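The traceback shows that the failure comes from a plain `ctypes.CDLL` call, so the same check can be made before starting a long build. The sketch below only tests whether the dynamic loader can find a library by name. The helper name `can_load` is hypothetical.

```python
import ctypes


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name, False otherwise."""
    try:
        ctypes.CDLL(lib_name, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        # Raised when the shared object (or one of its dependencies) is missing.
        return False
    return True
```

For the error above, `can_load("libcublasLt.so.10")` would be the relevant call, run inside the activated conda environment.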
Build
Since I want to debug the PyTorch source, I set DEBUG to 1 (it shows up as -DCMAKE_BUILD_TYPE=Debug in the cmake command below).
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ DEBUG=1 python setup.py install
Building wheel torch-1.5.0a0+3c31d73
-- Building version 1.5.0a0+3c31d73
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/home/xueruini/onion_rain/pytorch/code/pytorch/torch -DCMAKE_PREFIX_PATH=/home/xueruini/anaconda3/envs/pytorch -DNUMPY_INCLUDE_DIR=/home/xueruini/anaconda3/envs/pytorch/lib/python3.8/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/home/xueruini/anaconda3/envs/pytorch/bin/python -DPYTHON_INCLUDE_DIR=/home/xueruini/anaconda3/envs/pytorch/include/python3.8 -DPYTHON_LIBRARY=/home/xueruini/anaconda3/envs/pytorch/lib/libpython3.8.so.1.0 -DTORCH_BUILD_VERSION=1.5.0a0+3c31d73 -DUSE_NUMPY=True /home/xueruini/onion_rain/pytorch/code/pytorch
...
Copying torch.egg-info to /home/xueruini/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch-1.5.0a0+3c31d73-py3.8.egg-info
running install_scripts
Installing convert-caffe2-to-onnx script to /home/xueruini/anaconda3/envs/pytorch/bin
Installing convert-onnx-to-caffe2 script to /home/xueruini/anaconda3/envs/pytorch/bin
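The `export CMAKE_PREFIX_PATH=...` line used before the build relies on the shell's `${VAR:-default}` expansion: it takes `CONDA_PREFIX` if that is set and non-empty, and otherwise falls back to the parent of the directory containing the `conda` executable. The sketch below restates that logic in Python for illustration only. The function name and its injectable `env` and `which` parameters are hypothetical.

```python
import os
import shutil


def cmake_prefix_path(env=None, which=shutil.which):
    """Mirror ${CONDA_PREFIX:-"$(dirname $(which conda))/../"}."""
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if prefix:
        # ':-' uses the default when the variable is unset OR empty.
        return prefix
    conda = which("conda")
    if conda is None:
        raise RuntimeError("conda not found on PATH")
    return os.path.dirname(conda) + "/../"
```

In the transcript the environment was active, so the value was the environment path that appears as -DCMAKE_PREFIX_PATH in the cmake command.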
Verify
First leave the pytorch directory
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ cd ..
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code$ python
Python 3.8.3 (default, Jul 2 2020, 16:21:59)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.5.0a0+3c31d73'
>>>
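In this transcript the part of the version string after `+` (3c31d73) is the start of the commit hash that `git checkout v1.5.1` reported as HEAD (3c31d73c87), which gives a quick way to confirm that the imported torch is the build just made. The sketch below assumes that `<version>+<short hash>` layout, which other PyTorch versions may not use. Both function names are hypothetical.

```python
def local_version_hash(version):
    """Return the part after '+' in a version string, or None if absent."""
    _base, sep, local = version.partition("+")
    return local if sep else None


def built_from(version, head_commit):
    """True if the hash embedded in version is a prefix of head_commit."""
    short = local_version_hash(version)
    return short is not None and head_commit.startswith(short)
```

With torch importable, `built_from(torch.__version__, head)` can be called with `head` taken from `git rev-parse HEAD`.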
If you stay inside the pytorch directory, the following error appears:
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ python
Python 3.8.3 (default, Jul 2 2020, 16:21:59)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xueruini/onion_rain/pytorch/code/pytorch/torch/__init__.py", line 136, in <module>
    from torch._C import *
ModuleNotFoundError: No module named 'torch._C'
>>>
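A likely explanation, which the transcript itself does not state: when Python starts inside the checkout, the current directory comes first on the import path, so the source-tree `torch/` folder is imported instead of the installed package, and that folder does not hold the compiled `torch._C` extension. The traceback path (`.../code/pytorch/torch/__init__.py`) is consistent with this. The sketch below detects that situation. The helper name is hypothetical.

```python
import os


def shadows_installed_package(name, directory):
    """True if directory contains a package folder called name."""
    init_file = os.path.join(directory, name, "__init__.py")
    return os.path.isfile(init_file)
```

Calling `shadows_installed_package("torch", os.getcwd())` before `import torch` tells whether the interpreter would pick up the source tree rather than the installed build.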
Uninstall
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ pip uninstall torch
Found existing installation: torch 1.7.0a0+6095808
Uninstalling torch-1.7.0a0+6095808:
Would remove:
/home/xueruini/anaconda3/envs/pytorch/bin/convert-caffe2-to-onnx
/home/xueruini/anaconda3/envs/pytorch/bin/convert-onnx-to-caffe2
/home/xueruini/anaconda3/envs/pytorch/lib/python3.8/site-packages/caffe2
/home/xueruini/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch
/home/xueruini/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch-1.7.0a0+6095808-py3.8.egg-info
Proceed (y/n)? Y
Successfully uninstalled torch-1.7.0a0+6095808
(pytorch) xueruini@nico2:~/onion_rain/pytorch/code/pytorch$ python setup.py clean
Building wheel torch-1.7.0a0+6095808
running clean
Miscellaneous
When debugging with F5 in VS Code, import torch raises an error, although running the same script from the terminal works fine.
Fix:
conda install openmpi
This is strange: if a library were simply missing, running from the terminal should not have worked either.
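One way to investigate the puzzle is to compare the environment variables seen by the terminal run and by the VS Code debug run, since the two may start Python with different search paths. That the environments differ is only a hypothesis here, not something the transcript confirms. The sketch records a few variables and reports the ones that differ. The key list and function names are hypothetical choices.

```python
import os

# Variables that commonly influence which shared libraries are found (assumed list).
KEYS = ("PATH", "LD_LIBRARY_PATH", "CONDA_PREFIX")


def snapshot(env=None, keys=KEYS):
    """Record the selected variables, using '' for any that are unset."""
    env = os.environ if env is None else env
    return {key: env.get(key, "") for key in keys}


def diff_snapshots(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    changed = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key, ""), b.get(key, "")
        if left != right:
            changed[key] = (left, right)
    return changed
```

Printing `snapshot()` once from the terminal and once from the debugger, then feeding both results to `diff_snapshots`, shows which variables differ between the two launches.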