诸神缄默不语

PyTorch Python API: A Detailed Reference (continuously updated...)

诸神缄默不语: index of my CSDN blog posts

To discuss this post with the author, search for the WeChat ID "PolarisRisingWar" and send a friend request.

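Where this post and the official documentation differ, the official documentation is authoritative.
First published: 2021.4.23
Last updated: 2021.6.4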

Table of Contents

0. Common attributes and functions
1. torch
- 1.1 Tensors
  - 1.1.1 Creation Ops
  - 1.1.2 Indexing, Slicing, Joining, Mutating Ops
- 1.2 Generators
- 1.3 Random Sampling
  - 1.3.1 torch.default_generator
- 1.4 Serialization
- 1.5 Parallelism
- 1.6 Locally disabling gradient computation
- 1.7 Math operations
  - 1.7.1 Pointwise Ops
  - 1.7.2 Reduction Ops
  - 1.7.3 Comparison Ops
  - 1.7.4 Spectral Ops
  - 1.7.5 Other Operations
  - 1.7.6 BLAS and LAPACK Operations
- 1.8 Utilities
2. torch.nn
- 2.1 Containers
- 2.2 Convolution Layers
- 2.3 Pooling layers
- 2.4 Padding Layers
- 2.5 Non-linear Activations (weighted sum, nonlinearity)
- 2.6 Non-linear Activations (other)
- 2.7 Normalization Layers
- 2.8 Recurrent Layers
- 2.9 Transformer Layers
- 2.10 Linear Layers
- 2.11 Dropout Layers
- 2.12 Sparse Layers
3. torch.nn.functional
- 3.1 Convolution functions
- 3.2 Pooling functions
- 3.3 Non-linear activation functions
4. torch.Tensor
5. Tensor Attributes
- 5.1 `torch.dtype`
- 5.2 `torch.device`
- 5.3 `torch.layout`
- 5.4 `torch.memory_format`
- 5.5 Not listed in the official docs for some reason, but it does exist
6. Tensor Views
7. torch.autograd
- 7.1 Functional higher level API
- 7.2 Locally disabling gradient computation
- 7.3 Default gradient layouts
- 7.4 In-place operations on Tensors
- 7.5 Variable (deprecated)
- 7.6 Tensor autograd functions
8. torch.cuda
- 8.1 Random Number Generator
9. torch.cuda.amp
10. torch.backends
- 10.1 torch.backends.cuda
- 10.2 torch.backends.cudnn
11. torch.distributed
12. torch.distributions
13. torch.fft
14. torch.futures
15. torch.fx
16. torch.hub
17. torch.jit
18. torch.linalg
19. torch.overrides
20. torch.profiler
21. torch.nn.init
22. torch.onnx
22. torch.optim
- 22.1 How to use an optimizer
- 22.2 Algorithms
23. Complex Numbers
24. DDP Communication Hooks
25. Pipeline Parallelism
26. Quantization
27. Distributed RPC Framework
28. torch.random
29. torch.sparse
30. torch.Storage
31. torch.utils.benchmark
32. torch.utils.bottleneck
33. torch.utils.checkpoint
34. torch.utils.cpp_extension
35. torch.utils.data

0. Common attributes and functions

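input: a Tensor.
requires_grad: a bool; whether autograd should record operations on this Tensor.
size: the dimensions of the result, given either as a series of integers or as a collection such as a list or tuple.
A trailing underscore (_) on a method name marks an in-place operation.
Parameters can be passed positionally, in order; Keyword Arguments must be passed by name (a * in the signature separates the two groups).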
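
The two conventions above (the trailing underscore and keyword-only arguments) can be sketched as follows. This is a minimal illustration; the variable names are chosen only for this example.

```python
import torch

x = torch.ones(3)

# Out-of-place: add() returns a new Tensor and leaves x untouched.
y = x.add(1)
x_after_add = x.clone()   # snapshot of x, still all ones

# In-place: add_() (trailing underscore) modifies x itself.
x.add_(1)

# dtype and requires_grad appear after the * in the signature of zeros(),
# so they are passed by name.
z = torch.zeros(2, 3, dtype=torch.float64, requires_grad=True)
```

After this runs, y and x both hold twos, while the snapshot taken before add_() still holds ones.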

1. torch

is_tensor(obj) returns True if obj is a Tensor.
Note: the official documentation recommends isinstance(obj, Tensor) instead.

1.1 Tensors

1.1.1 Creation Ops

Note: functions that create a Tensor by random sampling are listed under Random Sampling.

tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False)
Converts data into a Tensor. data can be any array-like value: a list, a tuple, a NumPy ndarray, a scalar, and so on.
from_numpy(ndarray)
Converts a numpy.ndarray into a Tensor. Note that the two objects share the same memory, so a change to one shows up in the other.
zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Returns a Tensor of shape size with every element set to 0.
ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Returns a Tensor of shape size with every element set to 1.
ones_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)
Returns a Tensor with the same shape as input and every element set to 1.
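
A short sketch of the creation functions above, including the memory sharing of from_numpy. The variable names are illustrative only.

```python
import numpy as np
import torch

# tensor() copies the data; integer lists are inferred as int64.
a = torch.tensor([[1, 2], [3, 4]])

# from_numpy() shares memory with the source array.
arr = np.array([1.0, 2.0, 3.0])
b = torch.from_numpy(arr)
arr[0] = 10.0            # the change is visible through b as well

z = torch.zeros(2, 3)    # shape given as a series of integers
o = torch.ones_like(a)   # same shape and dtype as a
```

If an independent copy of a NumPy array is needed, torch.tensor(arr) copies the data instead of sharing it.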

1.1.2 Indexing, Slicing, Joining, Mutating Ops

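cat(tensors, dim=0, *, out=None)
Concatenates tensors (a sequence of Tensors; non-empty Tensors must have the same shape in every dimension except dim) along dimension dim and returns the result.
squeeze(input, dim=None, *, out=None)
Removes the dimensions of input (a Tensor) that have size 1 and returns the resulting Tensor. If dim is given, only that dimension is squeezed.
The returned Tensor shares storage with input.
Example: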

>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])

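stack(tensors, dim=0, *, out=None)
Joins tensors (a sequence of Tensors that all have the same shape) along a new dimension and returns the result.
t(input)
Returns 0-D and 1-D input unchanged and transposes 2-D input.
transpose(input, dim0, dim1)
Returns a transposed version of input in which dim0 and dim1 are swapped.
The returned Tensor shares storage with input.
Example: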

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 1.0028, -0.9893,  0.5809],
        [-0.1669,  0.7299,  0.4942]])
>>> torch.transpose(x, 0, 1)
tensor([[ 1.0028, -0.1669],
        [-0.9893,  0.7299],
        [ 0.5809,  0.4942]])

unsqueeze(input, dim)
Returns a Tensor with a dimension of size 1 inserted into input at position dim.
Example:

>>> x = torch.tensor([1, 2, 3, 4])
>>> torch.unsqueeze(x, 0)
tensor([[ 1,  2,  3,  4]])
>>> torch.unsqueeze(x, 1)
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])

nonzero(input, *, out=None, as_tuple=False)
input is a Tensor.
1. as_tuple=False: returns a 2-D Tensor in which each row is the index of one non-zero element of input.
  Example:

>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]))
tensor([[ 0],
        [ 1],
        [ 2],
        [ 4]])
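
For cat and stack, described earlier in this section, the difference is easiest to see from the result shapes: cat extends an existing dimension, while stack creates a new one. A minimal sketch with illustrative variable names:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

# cat joins along an existing dimension, so that dimension grows.
c0 = torch.cat([a, b], dim=0)    # shape (4, 3)
c1 = torch.cat([a, b], dim=1)    # shape (2, 6)

# stack inserts a new dimension whose size is the number of inputs.
s0 = torch.stack([a, b], dim=0)  # shape (2, 2, 3)
s2 = torch.stack([a, b], dim=2)  # shape (2, 3, 2)
```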

1.2 Generators

1.3 Random Sampling

manual_seed(seed)

1.3.1 torch.default_generator

Returns the default CPU torch.Generator.

rand(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Returns a Tensor of shape size whose elements are sampled from the uniform distribution on [0,1).
rand_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)
Returns a Tensor with the same shape as input whose elements are sampled from the uniform distribution on [0,1).
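
A small sketch of how manual_seed interacts with rand and rand_like: resetting the seed makes the next draw repeat. The variable names are illustrative.

```python
import torch

torch.manual_seed(0)
a = torch.rand(2, 3)

torch.manual_seed(0)     # same seed again
b = torch.rand(2, 3)     # reproduces a exactly

c = torch.rand_like(a)   # same shape as a, new samples from [0, 1)
```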

1.4 Serialization

save(obj, f, pickle_module=pickle, pickle_protocol=2, _use_new_zipfile_serialization=True)
load(f, map_location=None, pickle_module=pickle, **pickle_load_args)
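
A minimal round trip through save and load. To avoid touching the file system, this sketch uses an in-memory io.BytesIO buffer as f; a file path works the same way. The contents of the saved dict are made up for the example.

```python
import io
import torch

state = {"weights": torch.arange(6.0).view(2, 3), "step": 3}

buffer = io.BytesIO()
torch.save(state, buffer)   # serialize into the buffer
buffer.seek(0)              # rewind before reading

# map_location controls which device the tensors are loaded onto.
restored = torch.load(buffer, map_location="cpu")
```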

1.5 Parallelism

1.6 Locally disabling gradient computation

1.7 Math operations

1.7.1 Pointwise Ops

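add() returns the resulting Tensor.
1. add(input, other, *, out=None)
  other is a scalar: adds other to every element of input.
2. add(input, other, *, alpha=1, out=None)
  other is a Tensor: each element of other is multiplied by the scalar alpha and then added element-wise to input.
mul(input, other, *, out=None)
If other is a scalar: multiplies every element of input by other.
If other is a Tensor: multiplies input and other element-wise.
Returns the resulting Tensor.
tanh(input, *, out=None)
Applies tanh to input element-wise. Returns a Tensor.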
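
The scalar and Tensor forms of add and mul can be sketched as follows; the values are chosen so that the results are easy to check by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([10.0, 20.0, 30.0])

a = torch.add(x, 5)            # scalar other: [6, 7, 8]
b = torch.add(x, y, alpha=2)   # x + 2 * y:    [21, 42, 63]
c = torch.mul(x, 3)            # scalar other: [3, 6, 9]
d = torch.mul(x, y)            # element-wise: [10, 40, 90]
t = torch.tanh(torch.zeros(3)) # tanh(0) = 0 for every element
```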

1.7.2 Reduction Ops

max()
1. max(input)
2. max(input, dim, keepdim=False, *, out=None)
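3. max(input, other, *, out=None) see maximum() in 1.7.3
sum(input, *, dtype=None)
Returns the sum of all elements of input (a Tensor), as a Tensor.
dtype is the desired dtype of the returned Tensor.
mean(input)
Returns the mean of all elements of input (a Tensor), as a Tensor.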
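
A sketch of the three forms of max together with sum and mean. With a dim argument, max returns both the maximum values and their indices along that dimension.

```python
import torch

x = torch.tensor([[1.0, 5.0, 3.0],
                  [4.0, 2.0, 6.0]])

overall = torch.max(x)                  # maximum over all elements
values, indices = torch.max(x, dim=1)   # per-row maximum and its position
threshold = torch.full_like(x, 3.0)
pairwise = torch.max(x, threshold)      # element-wise, same as maximum()

total = torch.sum(x)                            # sum of all elements
total_f64 = torch.sum(x, dtype=torch.float64)   # result cast to float64
avg = torch.mean(x)                             # mean of all elements
```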

1.7.3 Comparison Ops

maximum(input, other, *, out=None)
Computes the element-wise maximum of input and other.

1.7.4 Spectral Ops

1.7.5 Other Operations

1.7.6 BLAS and LAPACK Operations

Introduction to BLAS
LAPACK

matmul(input, other, *, out=None)
Computes the matrix product of the two Tensors input and other.
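
A short sketch of matmul on 2-D, 1-D, and batched inputs. The behavior depends on the dimensionality of the arguments; the batched case below multiplies four pairs of matrices at once.

```python
import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[5.0, 6.0], [7.0, 8.0]])
v = torch.tensor([1.0, 1.0])

AB = torch.matmul(A, B)   # matrix-matrix product
Av = torch.matmul(A, v)   # matrix-vector product, result is 1-D

# Leading dimensions are treated as batch dimensions.
batched = torch.matmul(torch.rand(4, 2, 3), torch.rand(4, 3, 5))
```

The @ operator on Tensors computes the same product as matmul.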

1.8 Utilities

2. torch.nn

2.1 Containers

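Module
Base class for all neural network modules; a neural network model should subclass Module. A Module can contain other Modules (stored as a tree): assign the submodules as attributes in __init__.
1. parameters(recurse=True)
  Returns an iterator over the Module's parameters (Tensors); typically passed to an optimizer.
2. zero_grad(set_to_none=False)
  Sets the gradients of all model parameters to zero, like the optimizer's zero_grad() (see 22.2).
Sequential(*args)
A sequential container. Modules are added in the order in which they are passed to the constructor. An OrderedDict can be passed instead.
Example: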

from collections import OrderedDict
from torch import nn

# Example of using Sequential
model = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )

# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))
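
Besides Sequential, a model can be written as a Module subclass with submodules assigned in __init__. The sketch below uses a hypothetical class name (TinyNet) and arbitrary layer sizes to show parameters() and zero_grad().

```python
import torch
from torch import nn


class TinyNet(nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()
        # Submodules assigned as attributes are registered automatically.
        self.fc1 = nn.Linear(4, 8)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


model = TinyNet()

# parameters() walks the whole module tree: 4*8 + 8 + 8*2 + 2 = 58 values.
n_params = sum(p.numel() for p in model.parameters())

out = model(torch.rand(5, 4))
out.sum().backward()
had_grad = all(p.grad is not None for p in model.parameters())

# Depending on the PyTorch version, this leaves the gradients as zeros or as None.
model.zero_grad()
```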

2.2 Convolution Layers

class Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
Applies a 2D convolution over an input signal composed of several input planes.
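
A sketch of how Conv2d changes the shape of its input. The batch size and image size are arbitrary; with the defaults (stride 1, no padding), each spatial dimension shrinks by kernel_size - 1.

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5)
x = torch.rand(8, 1, 28, 28)   # (batch, channels, height, width)
y = conv(x)                    # 28 - 5 + 1 = 24 per spatial dimension

# padding=2 compensates for the 5x5 kernel and keeps the 28x28 size.
same_size = nn.Conv2d(1, 20, 5, padding=2)(x)
```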

2.3 Pooling layers

2.4 Padding Layers

2.5 Non-linear Activations (weighted sum, nonlinearity)

class ReLU(inplace=False)
Applies the rectified linear unit element-wise (ReLU: $\mathrm{ReLU}(x) = (x)^+ = \max(0, x)$)

2.6 Non-linear Activations (other)

class LogSoftmax(dim=None)

2.7 Normalization Layers

class BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
Batch Normalization

2.8 Recurrent Layers

2.9 Transformer Layers

2.10 Linear Layers

class Linear(in_features, out_features, bias=True)
Applies a linear transformation to the input: $y = xA^T + b$
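
The formula can be checked directly against the layer's weight (A, of shape (out_features, in_features)) and bias (b). A minimal sketch with arbitrary sizes:

```python
import torch
from torch import nn

layer = nn.Linear(in_features=3, out_features=2)
x = torch.rand(4, 3)   # a batch of 4 inputs

y = layer(x)                                   # shape (4, 2)
manual = x @ layer.weight.T + layer.bias       # y = x A^T + b by hand
```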

2.11 Dropout Layers

2.12 Sparse Layers

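class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)
An embedding lookup table. It works like one large matrix in which each row stores the embedding of one word. Embedding.weight holds this matrix (a Tensor); its values can be changed through weight.data.
The input is a Tensor of indices (IntTensor or LongTensor); the output is the corresponding embeddings, with shape (input shape, embedding_dim).
num_embeddings is the size of the dictionary (int).
embedding_dim is the dimension of each embedding vector (int).
1. weight: shape (num_embeddings, embedding_dim), initialized from $\mathcal{N}(0,1)$.

Example: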

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         [-0.6431,  0.0748,  0.6969],
         [ 1.4970,  1.3448, -0.9685],
         [-0.3677, -2.7265, -0.1685]],

        [[ 1.4970,  1.3448, -0.9685],
         [ 0.4362, -0.4004,  0.9400],
         [-0.6431,  0.0748,  0.6969],
         [ 0.9124, -2.3616,  1.1151]]])
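
Because the layer is a lookup into the weight matrix, its output matches plain indexing of Embedding.weight with the same indices. A minimal sketch of that equivalence:

```python
import torch
from torch import nn

embedding = nn.Embedding(10, 3)
idx = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])

out = embedding(idx)                 # shape (2, 4, 3)
looked_up = embedding.weight[idx]    # row lookup by hand
same = torch.equal(out, looked_up)
```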

3. torch.nn.functional

3.1 Convolution functions

3.2 Pooling functions

max_pool2d()
Applies 2D max pooling over an input signal composed of several input planes.
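
A small worked example of max_pool2d on a 4x4 input with a 2x2 window. The stride defaults to the kernel size, so the windows do not overlap.

```python
import torch
import torch.nn.functional as F

# One image, one channel, values 0..15 laid out row by row.
x = torch.arange(16.0).view(1, 1, 4, 4)

# Each output element is the maximum of one 2x2 block.
pooled = F.max_pool2d(x, kernel_size=2)
```

Here pooled has shape (1, 1, 2, 2) and holds the block maxima 5, 7, 13 and 15.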

3.3 Non-linear activation functions

relu(input, inplace=False)
Applies the rectified linear unit element-wise (ReLU: $\mathrm{ReLU}(x) = (x)^+ = \max(0, x)$)
log_softmax(input, dim=None, _stacklevel=3, dtype=None)
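
A sketch of the two functions above. log_softmax computes the logarithm of softmax along dim, so exponentiating its output gives probabilities that sum to 1.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.0, 2.0]])

r = F.relu(x)                    # negatives become 0: [[0, 0, 2]]
logp = F.log_softmax(x, dim=1)   # log-probabilities along dim 1
probs = logp.exp()               # back to probabilities
```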

4. torch.Tensor

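A Tensor is a multi-dimensional array whose elements all have a single data type.
add(other, *, alpha=1) see add() in 1.7.1
add_(other, *, alpha=1) in-place version of add(other, *, alpha=1)
copy_(src, non_blocking=False)
Copies the elements of src into self and returns self
matmul(tensor2), also available as the @ operator: see matmul() in 1.7.6
mean(dim=None, keepdim=False)
Returns a Tensor or (Tensor, Tensor), see mean() in 1.7.2
mul(value), also available as the * operator: see mul() in 1.7.1
numpy() returns the Tensor as a numpy.ndarray. Note that the two objects share the same memory, so a change to one shows up in the other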
size()
Returns the size of self (a torch.Size, which is a subclass of tuple)
squeeze(dim=None)
Returns a Tensor, see squeeze() in 1.1.2
sum(dim=None, keepdim=False, dtype=None)
Returns a Tensor, see sum() in 1.7.2
t() see t() in 1.1.2
t_() in-place version of t()
to(other, non_blocking=False, copy=False) returns a Tensor with the same torch.dtype and torch.device as other (a Tensor)
A simple example, moving a Tensor from the CPU to the GPU: tensor = tensor.to('cuda')
transpose(dim0, dim1)
Returns a Tensor. See transpose() in 1.1.2
tanh()
Returns a Tensor. See tanh() in 1.7.1
unsqueeze(dim)
Returns a Tensor, see unsqueeze() in 1.1.2
view(*shape)
Returns a Tensor with the same data as the original Tensor but with shape shape
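
A sketch of a few of the methods above, focusing on which ones share memory with the original Tensor. The device selection is an assumption made so that the example also runs on machines without a GPU.

```python
import torch

t = torch.arange(6.0)

v = t.view(2, 3)     # same data, new shape
v[0, 0] = 100.0      # also changes t[0]

arr = t.numpy()      # shares memory with t (CPU Tensors only)
arr[1] = -1.0        # also changes t[1]

# Fall back to the CPU when CUDA is not available.
device = "cuda" if torch.cuda.is_available() else "cpu"
moved = t.to(device)

# to(other) adopts the dtype and device of another Tensor.
like = torch.ones(2, dtype=torch.float64)
converted = t.to(like)
```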

5. Tensor Attributes

5.1 `torch.dtype`

The data type of the elements in a Tensor
torch.float32 or torch.float

5.2 `torch.device`

5.3 `torch.layout`

5.4 `torch.memory_format`

5.5 Not listed in the official docs for some reason, but it does exist

shape (a torch.Size)

6. Tensor Views

7. torch.autograd

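The automatic differentiation package. It can differentiate any scalar-valued function (a neural network, or an expression built from a single matrix, for example)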

7.1 Functional higher level API

7.2 Locally disabling gradient computation

7.3 Default gradient layouts

7.4 In-place operations on Tensors

7.5 Variable (deprecated)

7.6 Tensor autograd functions

CLASS torch.Tensor

backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)
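Computes the gradient of the current Tensor with respect to the leaf nodes of the graph.
The graph is differentiated using the chain rule.
If the current Tensor is not a scalar and requires gradient, the gradient argument must be given. It is a Tensor of the same shape as the current Tensor that holds the gradient with respect to the current Tensor
detach()
Returns a Tensor detached from the current graph
Used to cut off backpropagation¹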
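
A sketch of backward() on a scalar and on a non-scalar Tensor, followed by detach(). Note that gradients accumulate in .grad across backward() calls, which is why the second call below adds to the first result.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: no gradient argument needed. d(sum(x^2))/dx = 2x.
loss = (x ** 2).sum()
loss.backward()
grad_after_first = x.grad.clone()   # [2, 4, 6]

# Non-scalar output: pass a gradient of the same shape as y.
y = x * 3
y.backward(gradient=torch.ones_like(y))   # adds 3 to every entry of x.grad

# detach() gives a Tensor that shares data with x but is outside the graph.
d = x.detach()
```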

8. torch.cuda

Imported automatically along with torch
is_available() checks whether CUDA can be used and returns a bool

8.1 Random Number Generator

manual_seed(seed)
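Sets the random seed for the current GPU; silently ignored if CUDA is not available.
manual_seed_all(seed)
Sets the random seed for all GPUs; silently ignored if CUDA is not available.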

9. torch.cuda.amp

10. torch.backends

10.1 torch.backends.cuda

10.2 torch.backends.cudnn

torch.backends.cudnn.deterministic
A bool; if True, causes cuDNN to only use deterministic convolution algorithms.
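
The seed functions from section 8.1 and this flag are often set together when a run should be repeatable. The helper below is a hypothetical convenience function (its name is not part of PyTorch), and it is only a sketch: full reproducibility can depend on further settings.

```python
import torch


def seed_everything(seed):  # hypothetical helper, not a PyTorch API
    torch.manual_seed(seed)             # CPU generator
    torch.cuda.manual_seed_all(seed)    # all GPUs; ignored without CUDA
    torch.backends.cudnn.deterministic = True


seed_everything(42)
a = torch.rand(3)

seed_everything(42)
b = torch.rand(3)   # same values as a
```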

11. torch.distributed

12. torch.distributions

13. torch.fft

14. torch.futures

15. torch.fx

16. torch.hub

17. torch.jit

18. torch.linalg

19. torch.overrides

20. torch.profiler

21. torch.nn.init

22. torch.onnx

22. torch.optim

22.1 How to use an optimizer