pytorch/CONTRIBUTING.md (machine translation)

Since my time is limited, I have only translated the parts useful to me, mostly with Google Translate; where the translation is inaccurate, please refer to the original.

Table of Contents

    • Developing PyTorch
      • Prerequisites
      • Instructions
      • Tips and Debugging
    • Codebase structure
    • Profiling with `py-spy`
    • Managing multiple build trees
    • C++ development tips
      • Build only what you need
      • Code completion and IDE support
      • Make no-op build fast
        • Use Ninja
        • Use CCache
      • GDB integration
      • C++ stacktraces
    • CUDA development tips
    • Common gotchas for writing CUDA code
    • Debugging and profiling tips
    • Caffe2 notes

Developing PyTorch


A full set of instructions on installing PyTorch from source is here:
https://github.com/pytorch/pytorch#from-source


To develop PyTorch on your machine, here are some tips:


Prerequisites

  • CMake. You can install it via pip install cmake
  • Python >= 3.7 (3.7.6+ recommended)


Instructions


  1. Uninstall all existing PyTorch installs. You may need to run pip uninstall torch multiple times. You’ll know torch is fully
    uninstalled when you see WARNING: Skipping torch as it is not installed. (You should only have to pip uninstall a few times, but
    you can always uninstall with timeout or in a loop if you’re feeling
    lazy.)


conda uninstall pytorch -y
yes | pip uninstall torch
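
For instance, a lazy version of the repeated uninstall (a convenience sketch, not an official recipe; extra runs are harmless no-ops):

for i in 1 2 3; do pip uninstall -y torch; done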
  2. Clone a copy of PyTorch from source:


git clone https://github.com/pytorch/pytorch
cd pytorch

If you already have PyTorch from source, update it:


git pull --rebase
git submodule sync --recursive
git submodule update --init --recursive --jobs 0

If you want to have no-op incremental rebuilds (which are fast), see the section below titled “Make no-op build fast.”

  3. Follow the instructions for installing PyTorch from source, except that when it’s time to install PyTorch, you’ll want to call setup.py develop instead of invoking setup.py install:


Specifically, the change you have to make is to replace


python setup.py install

with

python setup.py develop

This mode will symlink the Python files from the current local source
tree into the Python install. This way when you modify a Python file, you
won’t need to reinstall PyTorch again and again. This is especially
useful if you are only changing Python files.


For example:

  • Install local PyTorch in develop mode
  • modify your Python file torch/__init__.py (for example)
  • test functionality
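
A sketch of that loop in shell form (the final import is just a placeholder smoke test):

python setup.py develop       # one-time install in develop mode
# ... edit torch/__init__.py ...
python -c "import torch"      # picks up the edit immediately, no reinstall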


You do not need to repeatedly install after modifying Python files (.py). However, you would need to reinstall
if you modify Python interface (.pyi, .pyi.in) or non-Python files (.cpp, .cc, .cu, .h, …).


In case you want to reinstall, make sure that you uninstall PyTorch
first by running pip uninstall torch until you see WARNING: Skipping torch as it is not installed; next run python setup.py clean. After
that, you can install in develop mode again.


Tips and Debugging

  • If you run into errors when running python setup.py develop, here are some debugging steps:
    1. Run printf '#include <stdio.h>\nint main() { printf("Hello World");}'|clang -x c -; ./a.out to make sure
      your CMake works and can compile this simple Hello World program without errors.
    2. Nuke your build directory. The setup.py script compiles binaries into the build folder and caches many
      details along the way, which saves time the next time you build. If you’re running into issues, you can always
      rm -rf build from the toplevel pytorch directory and start over.
    3. If you have made edits to the PyTorch repo, commit any change you’d like to keep and clean the repo with the
      following commands (note that clean really removes all untracked files and changes.):
    git submodule deinit -f .
    git clean -xdf
    python setup.py clean
    git submodule update --init --recursive --jobs 0 # very important to sync the submodules
    python setup.py develop                          # then try running the command again
    
    4. The main step within python setup.py develop is running make from the build directory. If you want to
      experiment with some environment variables, you can pass them into the command:
    ENV_KEY1=ENV_VAL1[, ENV_KEY2=ENV_VAL2]* python setup.py develop
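    
    For instance, combining two of the variables documented under “Build only what you need” below:
    
    DEBUG=1 USE_CUDA=0 python setup.py develop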
    


Codebase structure


  • c10 - Core library files that work everywhere, both server
    and mobile. We are slowly moving pieces from ATen/core
    here. This library is intended only to contain essential functionality,
    and appropriate to use in settings where binary size matters. (But
    you’ll have a lot of missing functionality if you try to use it
    directly.)


  • aten - C++ tensor library for PyTorch (no autograd support)
    • src - README
      • ATen
        • core - Core functionality of ATen. This
          is migrating to top-level c10 folder.
        • native - Modern implementations of
          operators. If you want to write a new operator, here is where
          it should go. Most CPU operators go in the top level directory,
          except for operators which need to be compiled specially; see
          cpu below.
          • cpu - Not actually CPU
            implementations of operators, but specifically implementations
            which are compiled with processor-specific instructions, like
            AVX. See the README for more
            details.
          • cuda - CUDA implementations of
            operators.
          • sparse - CPU and CUDA
            implementations of COO sparse tensor operations
          • mkl, mkldnn, miopen, cudnn - implementations of operators
            which simply bind to some backend library.
          • quantized - Quantized tensor (i.e. QTensor) operation implementations. README contains details including how to implement native quantized operations.

  • torch - The actual PyTorch library. Everything that is not
    in csrc is a Python module, following the PyTorch Python
    frontend module structure.
    • csrc - C++ files composing the PyTorch library. Files
      in this directory tree are a mix of Python binding code, and C++
      heavy lifting. Consult setup.py for the canonical list of Python
      binding files; conventionally, they are often prefixed with
      python_. README
      • jit - Compiler and frontend for the TorchScript JIT. README
      • autograd - Implementation of reverse-mode automatic differentiation. README
      • api - The PyTorch C++ frontend.
      • distributed - Distributed training
        support for PyTorch.

  • tools - Code generation scripts for the PyTorch library.
    See README of this directory for more details.


  • test - Python unit tests for PyTorch Python frontend.
    • test_torch.py - Basic tests for PyTorch
      functionality.
    • test_autograd.py - Tests for non-NN
      automatic differentiation support.
    • test_nn.py - Tests for NN operators and
      their automatic differentiation.
    • test_jit.py - Tests for the JIT compiler
      and TorchScript.
    • cpp - C++ unit tests for PyTorch C++ frontend.
      • api - README
      • jit - README
      • tensorexpr - README
    • expect - Automatically generated “expect” files
      which are used to compare against expected output.
    • onnx - Tests for ONNX export functionality,
      using both PyTorch and Caffe2.

  • caffe2 - The Caffe2 library.
    • core - Core files of Caffe2, e.g., tensor, workspace,
      blobs, etc.
    • operators - Operators of Caffe2.
    • python - Python bindings to Caffe2.

  • …
  • .circleci - CircleCI configuration management. README

Profiling with py-spy

Evaluating the performance impact of code changes in PyTorch can be complicated,
particularly if code changes happen in compiled code. One simple way to profile
both Python and C++ code in PyTorch is to use
py-spy, a sampling profiler for Python
that has the ability to profile native code and Python code in the same session.


py-spy can be installed via pip:

pip install py-spy

To use py-spy, first write a Python test script that exercises the
functionality you would like to profile. For example, this script profiles
torch.add:


import torch

t1 = torch.tensor([[1, 1], [1, 1.]])
t2 = torch.tensor([[0, 0], [0, 0.]])

for _ in range(1000000):
    torch.add(t1, t2)

Since the torch.add operation happens in microseconds, we repeat it a large
number of times to get good statistics. The most straightforward way to use
py-spy with such a script is to generate a flame
graph:


py-spy record -o profile.svg --native -- python test_tensor_tensor_add.py

This will output a file named profile.svg containing a flame graph you can
view in a web browser or SVG viewer. Individual stack frame entries in the graph
can be selected interactively with your mouse to zoom in on a particular part of
the program execution timeline. The --native command-line option tells
py-spy to record stack frame entries for PyTorch C++ code. To get line numbers
for C++ code it may be necessary to compile PyTorch in debug mode by prepending
DEBUG=1 to your setup.py develop call. Depending on your operating system it may
also be necessary to run py-spy with root privileges.
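
For example (prepend sudo only if your OS requires root for py-spy, per the note above):

DEBUG=1 python setup.py develop
sudo py-spy record -o profile.svg --native -- python test_tensor_tensor_add.py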


py-spy can also work in an htop-like “live profiling” mode and can be
tweaked to adjust the stack sampling rate; see the py-spy readme for more
details.
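
A sketch of the live mode against the same script (check the py-spy readme for the exact flags):

py-spy top --native -- python test_tensor_tensor_add.py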


Managing multiple build trees

One downside to using python setup.py develop is that your development
version of PyTorch will be installed globally on your account (e.g., if
you run import torch anywhere else, the development version will be
used).


If you want to manage multiple builds of PyTorch, you can make use of
conda environments to maintain
separate Python package environments, each of which can be tied to a
specific build of PyTorch. To set one up:


conda create -n pytorch-myfeature
source activate pytorch-myfeature
# if you run python now, torch will NOT be installed
python setup.py develop
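
To confirm which build a given environment picks up, a quick sanity check:

python -c "import torch; print(torch.__version__, torch.__file__)"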

C++ development tips

If you are working on the C++ code, there are a few important things that you
will want to keep in mind:

  1. How to rebuild only the code you are working on.
  2. How to make rebuilds in the absence of changes go faster.


Build only what you need


python setup.py build will build everything by default, but sometimes you are
only interested in a specific component.

  • Working on a test binary? Run (cd build && ninja bin/test_binary_name) to
    rebuild only that test binary (without rerunning cmake). (Replace ninja with
    make if you don’t have ninja installed).
  • Don’t need Caffe2? Pass BUILD_CAFFE2=0 to disable Caffe2 build.


On the initial build, you can also speed things up with the environment
variables DEBUG, USE_DISTRIBUTED, USE_MKLDNN, USE_CUDA, BUILD_TEST, USE_FBGEMM, USE_NNPACK and USE_QNNPACK.

  • DEBUG=1 will enable debug builds (-g -O0)
  • REL_WITH_DEB_INFO=1 will enable debug symbols with optimizations (-g -O3)
  • USE_DISTRIBUTED=0 will disable distributed (c10d, gloo, mpi, etc.) build.
  • USE_MKLDNN=0 will disable using MKL-DNN.
  • USE_CUDA=0 will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
  • BUILD_TEST=0 will disable building C++ test binaries.
  • USE_FBGEMM=0 will disable using FBGEMM (quantized 8-bit server operators).
  • USE_NNPACK=0 will disable compiling with NNPACK.
  • USE_QNNPACK=0 will disable QNNPACK build (quantized 8-bit operators).
  • USE_XNNPACK=0 will disable compiling with XNNPACK.


For example:

DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 python setup.py develop

For subsequent builds (i.e., when build/CMakeCache.txt exists), the build
options passed the first time will persist; run ccmake build/ or
cmake-gui build/, or directly edit build/CMakeCache.txt, to adjust the build
options.


Code completion and IDE support

When using python setup.py develop, PyTorch will generate
a compile_commands.json file that can be used by many editors
to provide command completion and error highlighting for PyTorch’s
C++ code. You need to pip install ninja to generate accurate
information for the code in torch/csrc. More information at:

  • https://sarcasm.github.io/notes/dev/compilation-database.html


Make no-op build fast

Use Ninja

By default, cmake will use its Makefile generator to generate your build
system. You can get faster builds if you install the ninja build system
with pip install ninja. If PyTorch was already built, you will need
to run python setup.py clean once after installing ninja for builds to
succeed.
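
Putting that together (the clean step is only needed if PyTorch was already built):

pip install ninja
python setup.py clean
python setup.py develop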


Use CCache

Even when dependencies are tracked with file modification, there are many
situations where files get rebuilt when a previous compilation was exactly the
same. Using ccache in a situation like this is a real time-saver.


Before building pytorch, install ccache from your package manager of choice:


conda install ccache -c conda-forge
sudo apt install ccache
sudo yum install ccache
brew install ccache

You may also find the default cache size in ccache is too small to be useful.
The cache sizes can be increased from the command line:


# config: cache dir is ~/.ccache, conf file ~/.ccache/ccache.conf
# max size of cache
ccache -M 25Gi  # -M 0 for unlimited
# unlimited number of files
ccache -F 0

To check this is working, do two clean builds of pytorch in a row. The second
build should be substantially and noticeably faster than the first build. If
this doesn’t seem to be the case, check the CMAKE_<LANG>_COMPILER_LAUNCHER
rules in build/CMakeCache.txt, where <LANG> is C, CXX and CUDA.
Each of these 3 variables should contain ccache, e.g.


//CXX compiler launcher
CMAKE_CXX_COMPILER_LAUNCHER:STRING=/usr/bin/ccache
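
A quick way to inspect all three variables at once:

grep COMPILER_LAUNCHER build/CMakeCache.txt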

If not, you can define these variables on the command line before invoking setup.py.


export CMAKE_C_COMPILER_LAUNCHER=ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
export CMAKE_CUDA_COMPILER_LAUNCHER=ccache
python setup.py develop

GDB integration

If you are debugging pytorch inside GDB, you might be interested in
pytorch-gdb. This script introduces some
pytorch-specific commands which you can use from the GDB prompt. In
particular, torch-tensor-repr prints a human-readable repr of an at::Tensor
object. Example of usage:


$ gdb python
GNU gdb (GDB) 9.2
[...]
(gdb) # insert a breakpoint when we call .neg()
(gdb) break at::Tensor::neg
Function "at::Tensor::neg" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (at::Tensor::neg) pending.

(gdb) run
[...]
>>> import torch
>>> t = torch.tensor([1, 2, 3, 4], dtype=torch.float64)
>>> t
tensor([1., 2., 3., 4.], dtype=torch.float64)
>>> t.neg()

Thread 1 "python" hit Breakpoint 1, at::Tensor::neg (this=0x7ffb118a9c88) at aten/src/ATen/core/TensorBody.h:3295
3295    inline at::Tensor Tensor::neg() const {
(gdb) # the default repr of 'this' is not very useful
(gdb) p this
$1 = (const at::Tensor * const) 0x7ffb118a9c88
(gdb) p *this
$2 = {impl_ = {target_ = 0x55629b5cd330}}
(gdb) torch-tensor-repr *this
Python-level repr of *this:
tensor([1., 2., 3., 4.], dtype=torch.float64)

GDB tries to automatically load pytorch-gdb thanks to the
.gdbinit at the root of the pytorch repo. However, auto-loading is disabled by default for security reasons:


$ gdb
warning: File "/path/to/pytorch/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /path/to/pytorch/.gdbinit
line to your configuration file "/home/YOUR-USERNAME/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/YOUR-USERNAME/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
(gdb)

As gdb itself suggests, the best way to enable auto-loading of pytorch-gdb
is to add the following line to your ~/.gdbinit (i.e., the .gdbinit file
which is in your home directory, not /path/to/pytorch/.gdbinit):


add-auto-load-safe-path /path/to/pytorch/.gdbinit

C++ stacktraces

Set TORCH_SHOW_CPP_STACKTRACES=1 to get the C++ stacktrace when an error occurs in Python.

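For example (your_script.py stands in for whatever Python entry point you are debugging):

TORCH_SHOW_CPP_STACKTRACES=1 python your_script.py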

CUDA development tips

If you are working on the CUDA code, here are some useful CUDA debugging tips:

  1. CUDA_DEVICE_DEBUG=1 will enable CUDA device function debug symbols (-g -G).
    This will be particularly helpful in debugging device code. However, it will
    slow down the build process by about 50% (compared to only DEBUG=1), so use wisely
    (see the sketch after this list).
  2. cuda-gdb and cuda-memcheck are your best CUDA debugging friends. Unlike gdb,
    cuda-gdb can display actual values in a CUDA tensor (rather than all zeros).
  3. CUDA supports a lot of C++11/14 features, such as std::numeric_limits, std::nextafter,
    std::tuple etc., in device code. Many of these features are possible because of the
    --expt-relaxed-constexpr
    nvcc flag. There is a known issue
    that ROCm errors out on device code which uses such stl functions.
  4. A good performance metric for a CUDA kernel is the
    Effective Memory Bandwidth.
    It is useful to measure this metric whenever you are writing/optimizing a CUDA
    kernel. The following script shows how to measure the effective bandwidth of the
    CUDA uniform_ kernel.
    import torch
    from torch.utils.benchmark import Timer

    size = 128 * 512
    nbytes_read_write = 4  # number of bytes read + written by the kernel; change this to fit your kernel

    for i in range(10):
        a = torch.empty(size).cuda().uniform_()
        torch.cuda.synchronize()
        out = a.uniform_()
        torch.cuda.synchronize()
        t = Timer(stmt="a.uniform_()", globals=globals())
        res = t.blocked_autorange()
        timec = res.median
        print("uniform, size, elements", size, "forward", timec,
              "bandwidth (GB/s)", size * nbytes_read_write * 1e-9 / timec)
        size *= 2
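
As referenced in tip 1, a debug-build-plus-debugger session might look like this (my_cuda_test.py is a placeholder for your own repro script):

CUDA_DEVICE_DEBUG=1 DEBUG=1 python setup.py develop
cuda-gdb --args python my_cuda_test.py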
    


See more cuda development tips here

The following content is excerpted from CUDA basics.

Common gotchas for writing CUDA code

  • If you are writing your own kernel, try to use existing utilities to calculate the number of blocks, to perform atomic operations in the
    kernel, and to perform reductions in the block. Additionally, cub also
    provides block-wide primitives that can be useful.
  • Avoid using raw cuda APIs; pytorch typically provides wrappers for those. NEVER allocate memory with cudaMalloc/cudaFree; use only the
    caching allocator.
  • Avoid host-device synchronizations (these can happen if you are copying data from cpu to gpu and back, or if you call .item() on a tensor).
  • In pytorch core, codegen takes care of making sure that the current device is the same as the device on which the tensors are located, and
    that all arguments are on the same device. If you are writing
    out-of-core operations, you will need to take care of this yourself.



Debugging and profiling tips

  • CUDA execution is asynchronous, so the backtrace you get from a cuda error likely points to the wrong place. The error message will
    typically suggest running with CUDA_LAUNCH_BLOCKING=1; do that (see the sketch after this list)!
  • Use cuda-memcheck and cuda-gdb to get more detailed information
  • You can use torch.cuda.set_sync_debug_mode to warn or error out on cuda synchronizations, if you are trying to understand where
    synchronizations are coming from in your workload or if you are
    accidentally synchronizing in your operations
  • Use the pytorch built-in profiler (kineto) or nsys to get information on GPU utilization and the most time-consuming kernels
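
A couple of the suggestions above as concrete commands (my_training_script.py is a placeholder):

CUDA_LAUNCH_BLOCKING=1 python my_training_script.py          # synchronous launches, trustworthy backtraces
nsys profile -o torch_profile python my_training_script.py   # GPU utilization and kernel timings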


Caffe2 notes

In 2018, we merged Caffe2 into the PyTorch source repository. While the
steady state aspiration is that Caffe2 and PyTorch share code freely,
in the meantime there will be some separation.


There are a few “unusual” directories which, for historical reasons,
are Caffe2/PyTorch specific. Here they are:

  • CMakeLists.txt, Makefile, binaries, cmake, conda, modules,
    scripts are Caffe2-specific. Don’t put PyTorch code in them without
    extra coordination.

  • mypy*, requirements.txt, setup.py, test, tools are
    PyTorch-specific. Don’t put Caffe2 code in them without extra
    coordination.

