Due to limited time and energy, only the parts useful to me have been translated, mostly with Google Translate; where the translation is inaccurate, please refer to the original text.
Developing PyTorch
A full set of instructions on installing PyTorch from source is here:
https://github.com/pytorch/pytorch#from-source
To develop PyTorch on your machine, here are some tips:
Prerequisites: you will need CMake installed (for example, via pip install cmake).

Instructions:
Uninstall all existing PyTorch installations. You may need to run pip uninstall torch multiple times. You'll know torch is fully uninstalled when you see WARNING: Skipping torch as it is not installed. (You should only have to pip uninstall a few times, but you can always uninstall with a timeout or in a loop if you're feeling lazy.)

conda uninstall pytorch -y
yes | pip uninstall torch
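If you'd rather loop, here is a minimal sketch; it assumes pip show exits non-zero once torch is no longer installed, which is its usual behavior:

# keep uninstalling until pip no longer finds torch
while pip show torch > /dev/null 2>&1; do
    pip uninstall -y torch
done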
Clone a copy of PyTorch from source:
git clone https://github.com/pytorch/pytorch
cd pytorch
If you already have PyTorch from source, update it:
git pull --rebase
git submodule sync --recursive
git submodule update --init --recursive --jobs 0
If you want to have no-op incremental rebuilds (which are fast), see the section below titled “Make no-op build fast.”
Follow the instructions for installing PyTorch from source, except when it's time to install PyTorch: instead of invoking setup.py install, you'll want to call setup.py develop.
Specifically, the change you have to make is to replace
python setup.py install
with
python setup.py develop
This mode will symlink the Python files from the current local source
tree into the Python install. This way when you modify a Python file, you
won’t need to reinstall PyTorch again and again. This is especially
useful if you are only changing Python files.
For example:
- Install local PyTorch in develop mode
- modify your Python file torch/__init__.py (for example)
- test functionality
You do not need to repeatedly install after modifying Python files (.py). However, you would need to reinstall if you modify the Python interface (.pyi, .pyi.in) or non-Python files (.cpp, .cc, .cu, .h, …).
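A quick way to confirm that develop mode is active is to check where Python resolves torch from; it should point into your source tree, not site-packages (the path below is illustrative):

python -c "import torch; print(torch.__file__)"
# expected: something like /path/to/pytorch/torch/__init__.py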
In case you want to reinstall, make sure that you uninstall PyTorch first by running pip uninstall torch until you see WARNING: Skipping torch as it is not installed; next run python setup.py clean. After that, you can install in develop mode again.
If you run into errors when running python setup.py develop, here are some debugging steps:

1. Run printf '#include <stdio.h>\nint main() { printf("Hello World");}'|clang -x c -; ./a.out to make sure your CMake works and can compile this simple Hello World program without errors.
2. Nuke your build directory. The setup.py script compiles binaries into the build folder and caches many details along the way, which saves time the next time you build. If you're running into issues, you can always rm -rf build from the toplevel pytorch directory and start over.
3. If you have made edits to the repo, commit any changes you want to keep and clean the repo with the following commands (note that git clean really removes all untracked files and changes):

git submodule deinit -f .
git clean -xdf
python setup.py clean
git submodule update --init --recursive --jobs 0 # very important to sync the submodules
python setup.py develop # then try running the command again

4. The main step within python setup.py develop is running make from the build directory. If you want to experiment with some environment variables, you can pass them into the command:

ENV_KEY1=ENV_VAL1[, ENV_KEY2=ENV_VAL2]* python setup.py develop

Codebase structure
c10 - Core library files that work everywhere, both server and mobile. We are slowly moving pieces from ATen/core here. This library is intended only to contain essential functionality, and is appropriate to use in settings where binary size matters. (But you'll have a lot of missing functionality if you try to use it directly.)
aten - C++ tensor library for PyTorch (no autograd support)
torch - The actual PyTorch library. Everything that is not in csrc is a Python module, following the PyTorch Python frontend module structure.
torch/csrc - C++ files composing the PyTorch library. Files in this directory tree are a mix of Python binding code and C++ heavy lifting. Consult setup.py for the canonical list of Python binding files; conventionally, they are often prefixed with python_. See the README in this directory for more details.
tools - Code generation scripts for the PyTorch library.
test - Python unit tests for the PyTorch Python frontend.
caffe2 - The Caffe2 library.
Profiling with py-spy
Evaluating the performance impact of code changes in PyTorch can be complicated,
particularly if code changes happen in compiled code. One simple way to profile
both Python and C++ code in PyTorch is to use
py-spy
, a sampling profiler for Python
that has the ability to profile native code and Python code in the same session.
py-spy can be installed via pip:
pip install py-spy
To use py-spy
, first write a Python test script that exercises the
functionality you would like to profile. For example, this script profiles
torch.add:
import torch
t1 = torch.tensor([[1, 1], [1, 1.]])
t2 = torch.tensor([[0, 0], [0, 0.]])
for _ in range(1000000):
    torch.add(t1, t2)
Since the torch.add
operation happens in microseconds, we repeat it a large
number of times to get good statistics. The most straightforward way to use
py-spy
with such a script is to generate a flame
graph:
py-spy record -o profile.svg --native -- python test_tensor_tensor_add.py
This will output a file named profile.svg
containing a flame graph you can
view in a web browser or SVG viewer. Individual stack frame entries in the graph
can be selected interactively with your mouse to zoom in on a particular part of
the program execution timeline. The --native
command-line option tells
py-spy
to record stack frame entries for PyTorch C++ code. To get line numbers
for C++ code it may be necessary to compile PyTorch in debug mode by prepending
your setup.py develop call with DEBUG=1. Depending on
your operating system it may also be necessary to run py-spy
with root
privileges.
py-spy
can also work in an htop
-like “live profiling” mode and can be
tweaked to adjust the stack sampling rate, see the py-spy
readme for more
details.
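For example, a live view of an already running process can be attached by PID (12345 below is a placeholder; as noted above, --native may require root privileges on some systems):

py-spy top --native --pid 12345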
One downside to using python setup.py develop
is that your development
version of PyTorch will be installed globally on your account (e.g., if
you run import torch
anywhere else, the development version will be
used.)
If you want to manage multiple builds of PyTorch, you can make use of
conda environments to maintain
separate Python package environments, each of which can be tied to a
specific build of PyTorch. To set one up:
conda create -n pytorch-myfeature
source activate pytorch-myfeature
# if you run python now, torch will NOT be installed
python setup.py develop
If you are working on the C++ code, there are a few important things that you
will want to keep in mind:
Build only what you need
python setup.py build will build everything by default, but sometimes you are only interested in a specific component.

- Working on a test binary? Run (cd build && ninja bin/test_binary_name) to rebuild only that test binary (without rerunning cmake). (Replace ninja with make if you don't have ninja installed.)
- Don't need Caffe2? Pass BUILD_CAFFE2=0 to disable the Caffe2 build.

On the initial build, you can also speed things up with the environment variables DEBUG, USE_DISTRIBUTED, USE_MKLDNN, USE_CUDA, BUILD_TEST, USE_FBGEMM, USE_NNPACK and USE_QNNPACK.

- DEBUG=1 will enable debug builds (-g -O0)
- REL_WITH_DEB_INFO=1 will enable debug symbols with optimizations (-g -O3)
- USE_DISTRIBUTED=0 will disable the distributed (c10d, gloo, mpi, etc.) build.
- USE_MKLDNN=0 will disable using MKL-DNN.
- USE_CUDA=0 will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
- BUILD_TEST=0 will disable building C++ test binaries.
- USE_FBGEMM=0 will disable using FBGEMM (quantized 8-bit server operators).
- USE_NNPACK=0 will disable compiling with NNPACK.
- USE_QNNPACK=0 will disable the QNNPACK build (quantized 8-bit operators).
- USE_XNNPACK=0 will disable compiling with XNNPACK.

For example:
DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 python setup.py develop
For subsequent builds (i.e., when build/CMakeCache.txt
exists), the build
options passed for the first time will persist; please run ccmake build/
, run
cmake-gui build/
, or directly edit build/CMakeCache.txt
to adapt build
options.
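For example, before deciding whether to edit an option, you can check what value it was cached with (a plain grep; any of the three editing methods above then applies):

grep USE_CUDA build/CMakeCache.txt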
When using python setup.py develop
, PyTorch will generate
a compile_commands.json
file that can be used by many editors
to provide command completion and error highlighting for PyTorch’s
C++ code. You need to pip install ninja
to generate accurate
information for the code in torch/csrc
. More information at:
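For example, clangd-based editors typically just need to find the file near the sources; a common convention (an editor-setup detail, not a PyTorch requirement) is to symlink it into the repo root:

ln -s build/compile_commands.json compile_commands.json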
By default, cmake will use its Makefile generator to generate your build
system. You can get faster builds if you install the ninja build system
with pip install ninja
. If PyTorch was already built, you will need
to run python setup.py clean
once after installing ninja for builds to
succeed.
Even when dependencies are tracked with file modification, there are many
situations where files get rebuilt when a previous compilation was exactly the
same. Using ccache in a situation like this is a real time-saver.
Before building pytorch, install ccache from your package manager of choice:
conda install ccache -c conda-forge
sudo apt install ccache
sudo yum install ccache
brew install ccache
You may also find the default cache size in ccache is too small to be useful.
The cache sizes can be increased from the command line:
# config: cache dir is ~/.ccache, conf file ~/.ccache/ccache.conf
# max size of cache
ccache -M 25Gi # -M 0 for unlimited
# unlimited number of files
ccache -F 0
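To verify the cache is actually being hit, you can compare ccache's statistics before and after a rebuild; the hit counters should grow on the second build:

ccache -s # shows cache hits/misses and current cache size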
To check this is working, do two clean builds of pytorch in a row. The second
build should be substantially and noticeably faster than the first build. If
this doesn't seem to be the case, check the CMAKE_<LANG>_COMPILER_LAUNCHER rules in
build/CMakeCache.txt, where <LANG> is C, CXX and CUDA.
Each of these 3 variables should contain ccache, e.g.
//CXX compiler launcher
CMAKE_CXX_COMPILER_LAUNCHER:STRING=/usr/bin/ccache
If not, you can define these variables on the command line before invoking setup.py.
export CMAKE_C_COMPILER_LAUNCHER=ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
export CMAKE_CUDA_COMPILER_LAUNCHER=ccache
python setup.py develop
If you are debugging pytorch inside GDB, you might be interested in
pytorch-gdb. This script introduces some
pytorch-specific commands which you can use from the GDB prompt. In
particular, torch-tensor-repr
prints a human-readable repr of an at::Tensor
object. Example of usage:
$ gdb python
GNU gdb (GDB) 9.2
[...]
(gdb) # insert a breakpoint when we call .neg()
(gdb) break at::Tensor::neg
Function "at::Tensor::neg" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (at::Tensor::neg) pending.
(gdb) run
[...]
>>> import torch
>>> t = torch.tensor([1, 2, 3, 4], dtype=torch.float64)
>>> t
tensor([1., 2., 3., 4.], dtype=torch.float64)
>>> t.neg()
Thread 1 "python" hit Breakpoint 1, at::Tensor::neg (this=0x7ffb118a9c88) at aten/src/ATen/core/TensorBody.h:3295
3295 inline at::Tensor Tensor::neg() const {
(gdb) # the default repr of 'this' is not very useful
(gdb) p this
$1 = (const at::Tensor * const) 0x7ffb118a9c88
(gdb) p *this
$2 = {impl_ = {target_ = 0x55629b5cd330}}
(gdb) torch-tensor-repr *this
Python-level repr of *this:
tensor([1., 2., 3., 4.], dtype=torch.float64)
GDB tries to automatically load pytorch-gdb
thanks to the
.gdbinit at the root of the pytorch repo. However, auto-loading is disabled by default, because of security reasons:
$ gdb
warning: File "/path/to/pytorch/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
add-auto-load-safe-path /path/to/pytorch/.gdbinit
line to your configuration file "/home/YOUR-USERNAME/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/home/YOUR-USERNAME/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
(gdb)
As gdb itself suggests, the best way to enable auto-loading of pytorch-gdb
is to add the following line to your ~/.gdbinit
(i.e., the .gdbinit
file
which is in your home directory, not /path/to/pytorch/.gdbinit
):
add-auto-load-safe-path /path/to/pytorch/.gdbinit
Set TORCH_SHOW_CPP_STACKTRACES=1
to get the C++ stacktrace when an error occurs in Python.
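For example (my_buggy_script.py is a placeholder for whatever reproduces your error):

TORCH_SHOW_CPP_STACKTRACES=1 python my_buggy_script.py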
If you are working on the CUDA code, here are some useful CUDA debugging tips:
- CUDA_DEVICE_DEBUG=1 will enable CUDA device function debug symbols (-g -G). This is particularly helpful for debugging device code. However, it will slow down the build process by about 50% (compared to only DEBUG=1), so use wisely.
- cuda-gdb and cuda-memcheck are your best CUDA debugging friends. Unlike gdb, cuda-gdb can display actual values in a CUDA tensor (rather than all zeros).
- CUDA supports many C++14 features such as std::numeric_limits, std::nextafter, std::tuple etc. in device code. Many of such features are possible because of the --expt-relaxed-constexpr nvcc flag. There is a known issue where ROCm errors out on device code that uses such stl functions.
- A good performance metric for a CUDA kernel is its effective memory bandwidth. The following script shows how to measure the effective bandwidth of the uniform_ kernel:
import torch
from torch.utils.benchmark import Timer
size = 128*512
nrep = 100
nbytes_read_write = 4 # this is number of bytes read + written by a kernel. Change this to fit your kernel.
for i in range(10):
    a = torch.empty(size).cuda().uniform_()
    torch.cuda.synchronize()
    out = a.uniform_()
    torch.cuda.synchronize()
    t = Timer(stmt="a.uniform_()", globals=globals())
    res = t.blocked_autorange()
    timec = res.median
    print("uniform, size, elements", size, "forward", timec, "bandwidth (GB/s)", size*(nbytes_read_write)*1e-9/timec)
    size *= 2
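Note that nbytes_read_write is 4 here presumably because uniform_ only writes one 4-byte float per element and reads nothing; a kernel that both reads and writes each 4-byte element would use 8, as the comment in the script suggests.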
See more cuda development tips here.

The following is excerpted from CUDA basics:
Common gotchas for writing CUDA code
- If you are writing your own kernel, try to use existing utilities to calculate the number of blocks, to perform atomic operations in the kernel, and to perform reductions in the block. Additionally, cub also provides block-wide primitives that can be useful.
- Avoid using raw cuda APIs; pytorch typically provides wrappers for those. NEVER allocate memory with cudaMalloc/cudaFree; use only the caching allocator.
- Avoid host-device synchronizations (these can happen if you are copying data from cpu to gpu and back, or call .item() on a tensor).
- In pytorch core, codegen takes care of making sure that the current device is the same as the device on which the tensors are located, and that all arguments are on the same device. If you are writing out-of-core operations, you will need to take care of this yourself.
Debugging and profiling tips
- Cuda execution is asynchronous, so the backtrace you get from a cuda error is likely pointing to the wrong place. The error message will typically suggest running with CUDA_LAUNCH_BLOCKING=1; do that!
- Use cuda-memcheck and cuda-gdb to get more detailed information.
- You can use torch.cuda.set_sync_debug_mode to warn or error out on cuda synchronizations, if you are trying to understand where synchronizations are coming from in your workload or if you are accidentally synchronizing in your operations.
- Use the pytorch built-in profiler (kineto) or nsys to get information on GPU utilization and the most time-consuming kernels.
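As a minimal sketch of the sync-debugging workflow (requires a CUDA machine; .item() forces a device-to-host copy, so it should trigger the warning):

python -c "import torch; torch.cuda.set_sync_debug_mode('warn'); x = torch.ones(1, device='cuda'); x.item()"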
In 2018, we merged Caffe2 into the PyTorch source repository. While the
steady state aspiration is that Caffe2 and PyTorch share code freely,
in the meantime there will be some separation.
There are a few “unusual” directories which, for historical reasons,
are Caffe2/PyTorch specific. Here they are:
CMakeLists.txt, Makefile, binaries, cmake, conda, modules, scripts are Caffe2-specific. Don't put PyTorch code in them without extra coordination.

mypy*, requirements.txt, setup.py, test, tools are PyTorch-specific. Don't put Caffe2 code in them without extra coordination.