TORCH.NN.FUNCTIONAL

Convolution functions

conv1d

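torch.nn.functional.conv1d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor

Applies a 1D convolution over an input signal composed of several input planes.

input – input tensor of shape (minibatch, in_channels, iW)
weight – filters of shape (out_channels, in_channels/groups, kW)
bias – optional bias of shape (out_channels). Default: None
stride – the stride of the convolving kernel. Can be a single number or a one-element tuple (sW,). Default: 1
padding – implicit padding on both sides of the input. Can be a single number or a one-element tuple (padW,). Default: 0
dilation – the spacing between kernel elements. Can be a single number or a one-element tuple (dW,). Default: 1
groups – split input into groups; in_channels should be divisible by the number of groups. Default: 1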

>>> import torch
>>> import torch.nn.functional as F
>>> filters = torch.randn(33, 16, 3)
>>> inputs = torch.randn(20, 16, 50)
>>> F.conv1d(inputs, filters)

conv2d

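torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor

Applies a 2D convolution over an input image composed of several input planes.

input – input tensor of shape (minibatch, in_channels, iH, iW)
weight – filters of shape (out_channels, in_channels/groups, kH, kW)
bias – optional bias of shape (out_channels). Default: None
stride – the stride of the convolving kernel. Can be a single number or a tuple (sH, sW). Default: 1
padding – implicit padding on both sides of the input. Can be a single number or a tuple (padH, padW). Default: 0
dilation – the spacing between kernel elements. Can be a single number or a tuple (dH, dW). Default: 1
groups – split input into groups; in_channels should be divisible by the number of groups. Default: 1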

>>> # With square kernels and equal stride
>>> filters = torch.randn(8,4,3,3)
>>> inputs = torch.randn(1,4,5,5)
>>> F.conv2d(inputs, filters, padding=1)

conv3d

torch.nn.functional.conv3d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor

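Applies a 3D convolution over an input image composed of several input planes.

input – input tensor of shape (minibatch, in_channels, iT, iH, iW)
weight – filters of shape (out_channels, in_channels/groups, kT, kH, kW)
bias – optional bias of shape (out_channels). Default: None
stride – the stride of the convolving kernel. Can be a single number or a tuple (sT, sH, sW). Default: 1
padding – implicit padding on both sides of the input. Can be a single number or a tuple (padT, padH, padW). Default: 0
dilation – the spacing between kernel elements. Can be a single number or a tuple (dT, dH, dW). Default: 1
groups – split input into groups; in_channels should be divisible by the number of groups. Default: 1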

>>> filters = torch.randn(33, 16, 3, 3, 3)
>>> inputs = torch.randn(20, 16, 50, 10, 20)
>>> F.conv3d(inputs, filters)

conv_transpose1d

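torch.nn.functional.conv_transpose1d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor

Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".

input – input tensor of shape (minibatch, in_channels, iW)
weight – filters of shape (in_channels, out_channels/groups, kW)
bias – optional bias of shape (out_channels). Default: None
stride – the stride of the convolving kernel. Can be a single number or a one-element tuple (sW,). Default: 1
padding – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Can be a single number or a one-element tuple (padW,). Default: 0
output_padding – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple (out_padW,). Default: 0
groups – split input into groups; in_channels should be divisible by the number of groups. Default: 1
dilation – the spacing between kernel elements. Can be a single number or a one-element tuple (dW,). Default: 1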

>>> inputs = torch.randn(20, 16, 50)
>>> weights = torch.randn(16, 33, 5)
>>> F.conv_transpose1d(inputs, weights)
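
As a rough check of the shapes involved, the sketch below compares the output length of conv_transpose1d with the usual transposed-convolution length formula L_out = (L_in − 1) × stride − 2 × padding + dilation × (kernel_size − 1) + output_padding + 1. The helper expected_length and the stride and padding values are illustrative assumptions, not part of the API.

```python
import torch
import torch.nn.functional as F

def expected_length(l_in, kernel_size, stride=1, padding=0,
                    output_padding=0, dilation=1):
    # Hypothetical helper: output length of a 1D transposed convolution.
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

inputs = torch.randn(20, 16, 50)   # (minibatch, in_channels, iW)
weights = torch.randn(16, 33, 5)   # (in_channels, out_channels/groups, kW)

out_default = F.conv_transpose1d(inputs, weights)
out_strided = F.conv_transpose1d(inputs, weights, stride=2, padding=1,
                                 output_padding=1)

print(out_default.shape)  # torch.Size([20, 33, 54])
print(out_strided.shape)  # torch.Size([20, 33, 102])
```

Note that the channel dimension of the result comes from the second dimension of weights (33 here), not the first.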

conv_transpose2d

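torch.nn.functional.conv_transpose2d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor

Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".

input – input tensor of shape (minibatch, in_channels, iH, iW)
weight – filters of shape (in_channels, out_channels/groups, kH, kW)
bias – optional bias of shape (out_channels). Default: None
stride – the stride of the convolving kernel. Can be a single number or a tuple (sH, sW). Default: 1
padding – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple (padH, padW). Default: 0
output_padding – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple (out_padH, out_padW). Default: 0
groups – split input into groups; in_channels should be divisible by the number of groups. Default: 1
dilation – the spacing between kernel elements. Can be a single number or a tuple (dH, dW). Default: 1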

>>> # With square kernels and equal stride
>>> inputs = torch.randn(1, 4, 5, 5)
>>> weights = torch.randn(4, 8, 3, 3)
>>> F.conv_transpose2d(inputs, weights, padding=1)

conv_transpose3d

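torch.nn.functional.conv_transpose3d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor

Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".

input – input tensor of shape (minibatch, in_channels, iT, iH, iW)
weight – filters of shape (in_channels, out_channels/groups, kT, kH, kW)
bias – optional bias of shape (out_channels). Default: None
stride – the stride of the convolving kernel. Can be a single number or a tuple (sT, sH, sW). Default: 1
padding – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple (padT, padH, padW). Default: 0
output_padding – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple (out_padT, out_padH, out_padW). Default: 0
groups – split input into groups; in_channels should be divisible by the number of groups. Default: 1
dilation – the spacing between kernel elements. Can be a single number or a tuple (dT, dH, dW). Default: 1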

>>> inputs = torch.randn(20, 16, 50, 10, 20)
>>> weights = torch.randn(16, 33, 3, 3, 3)
>>> F.conv_transpose3d(inputs, weights)

unfold

torch.nn.functional.unfold(input, kernel_size, dilation=1, padding=0, stride=1)

Extracts sliding local blocks from a batched input tensor.

fold

torch.nn.functional.fold(input, output_size, kernel_size, dilation=1, padding=0, stride=1)

Combines an array of sliding local blocks into a large containing tensor.
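
One way to see what unfold and fold do is to express a convolution as a matrix multiplication over the extracted blocks. The sketch below does this for arbitrarily chosen tensor sizes and compares the result with conv2d; the variable names are illustrative only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
inp = torch.randn(1, 3, 10, 12)   # (N, C, H, W)
w = torch.randn(2, 3, 4, 5)       # (out_channels, in_channels, kH, kW)

# Each column of inp_unf is one flattened 3x4x5 sliding block: shape (1, 60, 56).
inp_unf = F.unfold(inp, kernel_size=(4, 5))

# Multiply every block by the flattened filters: a convolution as a matmul.
out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)

# With a 1x1 kernel, fold just arranges the 56 columns into a 7x8 map.
out = F.fold(out_unf, output_size=(7, 8), kernel_size=(1, 1))

print(out.shape)
print(torch.allclose(out, F.conv2d(inp, w), atol=1e-4))
```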

Pooling functions

avg_pool1d

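torch.nn.functional.avg_pool1d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True) → Tensor

Applies a 1D average pooling over an input signal composed of several input planes.

input – input tensor of shape (minibatch, in_channels, iW)
kernel_size – the size of the window. Can be a single number or a tuple (kW,)
stride – the stride of the window. Can be a single number or a tuple (sW,). Default: kernel_size
padding – implicit zero padding on both sides of the input. Can be a single number or a tuple (padW,). Default: 0
ceil_mode – when True, will use ceil instead of floor to compute the output shape. Default: False
count_include_pad – when True, will include the zero-padding in the averaging calculation. Default: True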

>>> # pool of square window of size=3, stride=2
>>> input = torch.tensor([[[1, 2, 3, 4, 5, 6, 7]]], dtype=torch.float32)
>>> F.avg_pool1d(input, kernel_size=3, stride=2)
tensor([[[ 2.,  4.,  6.]]])

avg_pool2d

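torch.nn.functional.avg_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None) → Tensor

Applies a 2D average-pooling operation in kH × kW regions with step size sH × sW. The number of output features is equal to the number of input planes.

input – input tensor of shape (minibatch, in_channels, iH, iW)
kernel_size – the size of the window. Can be a single number or a tuple (kH, kW)
stride – the stride of the window. Can be a single number or a tuple (sH, sW). Default: kernel_size
padding – implicit zero padding on both sides of the input. Can be a single number or a tuple (padH, padW). Default: 0
ceil_mode – when True, will use ceil instead of floor to compute the output shape. Default: False
count_include_pad – when True, will include the zero-padding in the averaging calculation. Default: True
divisor_override – if specified, it will be used as the divisor; otherwise the size of the pooling region will be used. Default: None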
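
The effect of count_include_pad and divisor_override is easiest to see on a tiny input. In the sketch below (input values and window sizes are arbitrary choices), every 2 × 2 window covers exactly one real element and three zero-padded positions.

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 2, 2)

# Padding counted in the average: 1 / 4 = 0.25 per window.
with_pad = F.avg_pool2d(x, kernel_size=2, stride=2, padding=1)

# Padding excluded from the average: 1 / 1 = 1.0 per window.
without_pad = F.avg_pool2d(x, kernel_size=2, stride=2, padding=1,
                           count_include_pad=False)

# Divisor forced to 1, so each output is the plain window sum.
window_sum = F.avg_pool2d(x, kernel_size=2, stride=2, padding=1,
                          divisor_override=1)

print(with_pad)
print(without_pad)
print(window_sum)
```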

avg_pool3d

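torch.nn.functional.avg_pool3d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None) → Tensor

Applies a 3D average-pooling operation in kT × kH × kW regions with step size sT × sH × sW. The number of output features is equal to input planes/sT.

input – input tensor of shape (minibatch, in_channels, iT, iH, iW)
kernel_size – the size of the window. Can be a single number or a tuple (kT, kH, kW)
stride – the stride of the window. Can be a single number or a tuple (sT, sH, sW). Default: kernel_size
padding – implicit zero padding on both sides of the input. Can be a single number or a tuple (padT, padH, padW). Default: 0
ceil_mode – when True, will use ceil instead of floor to compute the output shape. Default: False
count_include_pad – when True, will include the zero-padding in the averaging calculation. Default: True
divisor_override – if specified, it will be used as the divisor; otherwise the size of the pooling region will be used. Default: None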

max_pool1d

torch.nn.functional.max_pool1d(*args, **kwargs)

Applies a 1D max pooling over an input signal composed of several input planes.

max_pool2d

torch.nn.functional.max_pool2d(*args, **kwargs)

Applies a 2D max pooling over an input signal composed of several input planes.

max_pool3d

torch.nn.functional.max_pool3d(*args, **kwargs)

Applies a 3D max pooling over an input signal composed of several input planes.

max_unpool1d

torch.nn.functional.max_unpool1d(input, indices, kernel_size, stride=None, padding=0, output_size=None)

Computes a partial inverse of MaxPool1d.

max_unpool2d

torch.nn.functional.max_unpool2d(input, indices, kernel_size, stride=None, padding=0, output_size=None)

Computes a partial inverse of MaxPool2d.

max_unpool3d

torch.nn.functional.max_unpool3d(input, indices, kernel_size, stride=None, padding=0, output_size=None)

Computes a partial inverse of MaxPool3d.
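
The inverse is only partial because the non-maximal values are lost during pooling. A minimal sketch, using an arbitrary 4 × 4 input, shows max_pool2d returning the indices of the maxima and max_unpool2d placing the pooled values back at those positions, with zeros everywhere else.

```python
import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)

# return_indices=True also returns the flat position of each maximum.
pooled, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)

# Maxima go back to their original positions; all other entries are zero.
unpooled = F.max_unpool2d(pooled, indices, kernel_size=2)

print(pooled)
print(unpooled)
```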

lp_pool1d

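torch.nn.functional.lp_pool1d(input, norm_type, kernel_size, stride=None, ceil_mode=False)

Applies a 1D power-average pooling over an input signal composed of several input planes. If the sum of all inputs to the power of p is zero, the gradient is set to zero as well.

lp_pool2d

torch.nn.functional.lp_pool2d(input, norm_type, kernel_size, stride=None, ceil_mode=False)

Applies a 2D power-average pooling over an input signal composed of several input planes. If the sum of all inputs to the power of p is zero, the gradient is set to zero as well.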

adaptive_max_pool1d

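torch.nn.functional.adaptive_max_pool1d(*args, **kwargs)

Applies a 1D adaptive max pooling over an input signal composed of several input planes.

output_size – the target output size (single integer)
return_indices – whether to return pooling indices. Default: False

adaptive_max_pool2d

torch.nn.functional.adaptive_max_pool2d(*args, **kwargs)

Applies a 2D adaptive max pooling over an input signal composed of several input planes.

output_size – the target output size (single integer or double-integer tuple)
return_indices – whether to return pooling indices. Default: False

adaptive_max_pool3d

torch.nn.functional.adaptive_max_pool3d(*args, **kwargs)

Applies a 3D adaptive max pooling over an input signal composed of several input planes.

output_size – the target output size (single integer or triple-integer tuple)
return_indices – whether to return pooling indices. Default: False

adaptive_avg_pool1d

torch.nn.functional.adaptive_avg_pool1d(input, output_size) → Tensor

Applies a 1D adaptive average pooling over an input signal composed of several input planes.

output_size – the target output size (single integer)

adaptive_avg_pool2d

torch.nn.functional.adaptive_avg_pool2d(input, output_size)

Applies a 2D adaptive average pooling over an input signal composed of several input planes.

output_size – the target output size (single integer or double-integer tuple)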
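
With adaptive pooling the caller fixes the output size and the window layout is derived from it. The sketch below, with arbitrarily chosen input sizes, shows two common cases: a specific target size, and output_size=1, which amounts to a global average over the spatial dimensions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 10, 7)   # (minibatch, channels, H, W)

# The target spatial size is given directly; no kernel size or stride is needed.
pooled = F.adaptive_avg_pool2d(x, (5, 7))

# output_size=1 averages each channel over all spatial positions.
global_avg = F.adaptive_avg_pool2d(x, 1)

print(pooled.shape)      # torch.Size([1, 3, 5, 7])
print(global_avg.shape)  # torch.Size([1, 3, 1, 1])
```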

adaptive_avg_pool3d

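torch.nn.functional.adaptive_avg_pool3d(input, output_size)

Applies a 3D adaptive average pooling over an input signal composed of several input planes.

output_size – the target output size (single integer or triple-integer tuple)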

Non-linear activation functions

threshold

torch.nn.functional.threshold(input, threshold, value, inplace=False)

Thresholds each element of the input tensor.

torch.nn.functional.threshold_(input, threshold, value) → Tensor

In-place version of threshold().

relu

torch.nn.functional.relu(input, inplace=False) → Tensor

Applies the rectified linear unit function element-wise. See ReLU for more details.

torch.nn.functional.relu_(input) → Tensor

In-place version of relu().

hardtanh

torch.nn.functional.hardtanh(input, min_val=-1., max_val=1., inplace=False) → Tensor

Applies the HardTanh function element-wise. See Hardtanh for more details.

torch.nn.functional.hardtanh_(input, min_val=-1., max_val=1.) → Tensor

In-place version of hardtanh().

relu6

torch.nn.functional.relu6(input, inplace=False) → Tensor

Applies the element-wise function ReLU6(x) = min(max(0, x), 6).

elu

torch.nn.functional.elu(input, alpha=1.0, inplace=False)

Applies element-wise, ELU(x) = max(0, x) + min(0, α ∗ (exp(x) − 1)).

torch.nn.functional.elu_(input, alpha=1.) → Tensor

In-place version of elu().

selu

torch.nn.functional.selu(input, inplace=False) → Tensor

Applies element-wise, SELU(x) = scale ∗ (max(0, x) + min(0, α ∗ (exp(x) − 1))), with α = 1.6732632423543772848170429916717 and scale = 1.0507009873554804934193349852946.

celu

torch.nn.functional.celu(input, alpha=1., inplace=False) → Tensor

Applies element-wise, CELU(x) = max(0, x) + min(0, α ∗ (exp(x/α) − 1)).

leaky_relu

torch.nn.functional.leaky_relu(input, negative_slope=0.01, inplace=False) → Tensor

Applies element-wise, LeakyReLU(x) = max(0, x) + negative_slope ∗ min(0, x).

torch.nn.functional.leaky_relu_(input, negative_slope=0.01) → Tensor

In-place version of leaky_relu().

prelu

torch.nn.functional.prelu(input, weight) → Tensor

Applies element-wise the function PReLU(x) = max(0, x) + weight ∗ min(0, x), where weight is a learnable parameter.

rrelu

torch.nn.functional.rrelu(input, lower=1./8, upper=1./3, training=False, inplace=False) → Tensor

Randomized leaky ReLU.

torch.nn.functional.rrelu_(input, lower=1./8, upper=1./3, training=False) → Tensor

In-place version of rrelu().

glu

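torch.nn.functional.glu(input, dim=-1) → Tensor

The gated linear unit. Computes GLU(a, b) = a ⊗ σ(b), where input is split in half along dim to form a and b, σ is the sigmoid function and ⊗ is the element-wise product between matrices.

input (Tensor) – input tensor
dim (int) – dimension on which to split the input. Default: -1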
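
As an illustration of the gating, the sketch below splits an arbitrary input in half by hand and checks that a ⊗ σ(b) matches the output of glu. The tensor sizes are illustrative; the only requirement is that the split dimension has even size.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 6)           # last dimension must be even

# First half is the value a, second half is the gate input b.
a, b = x.chunk(2, dim=-1)
manual = a * torch.sigmoid(b)

out = F.glu(x, dim=-1)          # output has half the size along dim

print(out.shape)                # torch.Size([4, 3])
print(torch.allclose(out, manual))
```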

gelu

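torch.nn.functional.gelu(input) → Tensor

Applies element-wise the function GELU(x) = x ∗ Φ(x), where Φ(x) is the cumulative distribution function of the standard Gaussian distribution.

logsigmoid

torch.nn.functional.logsigmoid(input) → Tensor

Applies element-wise, LogSigmoid(x_i) = log(1 / (1 + exp(−x_i))).

hardshrink

torch.nn.functional.hardshrink(input, lambd=0.5) → Tensor

Applies the hard shrinkage function element-wise.

tanhshrink

torch.nn.functional.tanhshrink(input) → Tensor

Applies element-wise, Tanhshrink(x) = x − Tanh(x).

softsign

torch.nn.functional.softsign(input) → Tensor

Applies element-wise, SoftSign(x) = x / (1 + |x|).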

softplus

torch.nn.functional.softplus(input, beta=1, threshold=20) → Tensor

softmin

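torch.nn.functional.softmin(input, dim=None, _stacklevel=3, dtype=None)

Applies a softmin function. Note that Softmin(x) = Softmax(−x). See the softmax definition for the mathematical formula.

input (Tensor) – input
dim (int) – a dimension along which softmin will be computed (so every slice along dim will sum to 1).
dtype (torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. This is useful for preventing data type overflows. Default: None.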

softmax

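torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)

Applies a softmax function. Softmax is defined as:

Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)

It is applied to all slices along dim, and rescales them so that the elements lie in the range [0, 1] and sum to 1.

input (Tensor) – input
dim (int) – a dimension along which softmax will be computed.
dtype (torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. This is useful for preventing data type overflows. Default: None.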
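
A small sketch of these properties, using arbitrary input values: the result matches the explicit exponential formula, each slice along dim sums to 1, and softmin agrees with softmax applied to the negated input.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 1.0, 1.0]])

probs = F.softmax(x, dim=1)

# Direct evaluation of exp(x_i) / sum_j exp(x_j) along dim=1.
manual = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)

softmin = F.softmin(x, dim=1)

print(probs)
print(probs.sum(dim=1))   # each row sums to 1
```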

softshrink

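torch.nn.functional.softshrink(input, lambd=0.5) → Tensor

Applies the soft shrinkage function element-wise.

gumbel_softmax

torch.nn.functional.gumbel_softmax(logits, tau=1, hard=False, eps=1e-10, dim=-1)

Samples from the Gumbel-Softmax distribution and optionally discretizes.

logits – […, num_features] unnormalized log probabilities
tau – non-negative scalar temperature
hard – if True, the returned samples will be discretized as one-hot vectors, but will be differentiated as if they were the soft samples in autograd
dim (int) – a dimension along which softmax will be computed. Default: -1.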

>>> logits = torch.randn(20, 32)
>>> # Sample soft categorical using reparametrization trick:
>>> F.gumbel_softmax(logits, tau=1, hard=False)
>>> # Sample hard categorical using "Straight-through" trick:
>>> F.gumbel_softmax(logits, tau=1, hard=True)

log_softmax

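torch.nn.functional.log_softmax(input, dim=None, _stacklevel=3, dtype=None)

Applies a softmax followed by a logarithm.

While mathematically equivalent to log(softmax(x)), doing these two operations separately is slower and numerically unstable. This function uses an alternative formulation to compute the output and gradient correctly.

input (Tensor) – input
dim (int) – a dimension along which log_softmax will be computed.
dtype (torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. This is useful for preventing data type overflows. Default: None.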
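
The numerical issue can be reproduced with a deliberately extreme input. In the sketch below (values chosen only for illustration), the smaller probability underflows to zero in float32, so taking the logarithm afterwards yields negative infinity, while log_softmax returns the finite value.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1000.0, 0.0]])

# Two separate steps: softmax underflows to [1, 0], then log(0) = -inf.
naive = torch.log(F.softmax(x, dim=1))

# Fused computation keeps the result finite.
stable = F.log_softmax(x, dim=1)

print(naive)
print(stable)
```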

tanh

torch.nn.functional.tanh(input) → Tensor

sigmoid

torch.nn.functional.sigmoid(input) → Tensor

Normalization functions

batch_norm

torch.nn.functional.batch_norm(input, running_mean, running_var, weight=None, bias=None, training=False, momentum=0.1, eps=1e-05)

Applies Batch Normalization for each channel across a batch of data.

instance_norm

torch.nn.functional.instance_norm(input, running_mean=None, running_var=None, weight=None, bias=None, use_input_stats=True, momentum=0.1, eps=1e-05)

Applies Instance Normalization for each channel in each data sample in a batch.

layer_norm

torch.nn.functional.layer_norm(input, normalized_shape, weight=None, bias=None, eps=1e-05)

Applies Layer Normalization over the last certain number of dimensions.
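
As a sketch of what layer_norm computes when no weight or bias is given, the example below normalizes over the last dimension by hand, using the biased variance and the default eps of 1e-05, and compares the result. The input size is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 5)

# Normalize over the last dimension, whose size is 5.
out = F.layer_norm(x, normalized_shape=(5,))

# Manual version: subtract the mean, divide by sqrt(biased variance + eps).
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + 1e-5)

print(torch.allclose(out, manual, atol=1e-5))
```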

local_response_norm

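torch.nn.functional.local_response_norm(input, size, alpha=0.0001, beta=0.75, k=1.0)

Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. Applies normalization across channels.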

normalize

torch.nn.functional.normalize(input, p=2, dim=1, eps=1e-12, out=None)

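Performs L_p normalization of inputs over the specified dimension.

For a tensor input of sizes (n_0, ..., n_dim, ..., n_k), each n_dim-element vector v along dimension dim is transformed as

v = v / max(‖v‖_p, ϵ)

With the default arguments it uses the Euclidean norm over vectors along dimension 1 for normalization.

input – input tensor of any shape
p (float) – the exponent value in the norm formulation. Default: 2
dim (int) – the dimension to reduce. Default: 1
eps (float) – small value to avoid division by zero. Default: 1e-12
out (Tensor, optional) – the output tensor. If out is used, this operation won't be differentiable.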
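
A minimal sketch with hand-picked values: the first row has Euclidean norm 5 and is scaled to unit length, while the all-zero second row stays zero because eps keeps the divisor away from zero.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[3.0, 4.0],
                  [0.0, 0.0]])

# Default arguments: p=2 (Euclidean norm) along dim=1.
out = F.normalize(x, p=2, dim=1)

print(out)              # first row becomes [0.6, 0.8]
print(out.norm(dim=1))  # row norms after normalization
```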

Linear functions

linear

torch.nn.functional.linear(input, weight, bias=None)

Applies a linear transformation to the incoming data: y = xA^T + b.
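
In terms of tensor operations, this corresponds to multiplying the input by the transposed weight and adding the bias. The sketch below checks that on arbitrarily sized tensors; note that weight has shape (out_features, in_features).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3)        # (batch, in_features)
weight = torch.randn(2, 3)   # (out_features, in_features)
bias = torch.randn(2)        # (out_features,)

y = F.linear(x, weight, bias)

# The same transformation written out explicitly.
manual = x @ weight.t() + bias

print(y.shape)  # torch.Size([4, 2])
print(torch.allclose(y, manual, atol=1e-5))
```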

bilinear

torch.nn.functional.bilinear(input1, input2, weight, bias=None)

Applies a bilinear transformation to the incoming data: y = x_1^T A x_2 + b.

Dropout functions

dropout

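torch.nn.functional.dropout(input, p=0.5, training=True, inplace=False)

During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.

p – probability of an element to be zeroed. Default: 0.5
training – apply dropout if True. Default: True
inplace – if set to True, will do this operation in-place. Default: False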
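
A short sketch of the training flag, with an arbitrary seed and input: when training is False the input passes through unchanged, and when it is True the surviving elements are scaled by 1 / (1 − p) so that the expected value is preserved.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# Evaluation mode: dropout is a no-op.
y_eval = F.dropout(x, p=0.5, training=False)

# Training mode: each element is either zeroed or scaled by 1 / (1 - 0.5) = 2.
y_train = F.dropout(x, p=0.5, training=True)

print(y_train.unique())
print(y_train.mean())   # close to 1 on average
```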

alpha_dropout

torch.nn.functional.alpha_dropout(input, p=0.5, training=False, inplace=False)

dropout2d

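torch.nn.functional.dropout2d(input, p=0.5, training=True, inplace=False)

Randomly zeroes out entire channels of the input tensor (a channel is a 2D feature map, e.g., the j-th channel of the i-th sample in the batched input is a 2D tensor input[i, j]). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

p – probability of a channel to be zeroed. Default: 0.5
training – apply dropout if True. Default: True
inplace – if set to True, will do this operation in-place. Default: False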

dropout3d

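torch.nn.functional.dropout3d(input, p=0.5, training=True, inplace=False)

Randomly zeroes out entire channels of the input tensor (a channel is a 3D feature map, e.g., the j-th channel of the i-th sample in the batched input is a 3D tensor input[i, j]). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

p – probability of a channel to be zeroed. Default: 0.5
training – apply dropout if True. Default: True
inplace – if set to True, will do this operation in-place. Default: False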

Sparse functions

embedding

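torch.nn.functional.embedding(input, weight, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False)

A simple lookup table that looks up embeddings in a fixed dictionary and size.

This module is often used to retrieve word embeddings using indices. The input to the module is a list of indices and the embedding matrix, and the output is the corresponding word embeddings.

input (LongTensor) – tensor containing indices into the embedding matrix
weight (Tensor) – the embedding matrix, with number of rows equal to the maximum possible index + 1 and number of columns equal to the embedding size
padding_idx (int, optional) – if given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index.
max_norm (float, optional) – if given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. Note: this will modify weight in-place.
norm_type (float, optional) – the p of the p-norm to compute for the max_norm option. Default: 2.
scale_grad_by_freq (boolean, optional) – if given, this will scale gradients by the inverse of the frequency of the words in the mini-batch. Default: False.
sparse (bool, optional) – if True, the gradient w.r.t. weight will be a sparse tensor. See the notes under torch.nn.Embedding for more details regarding sparse gradients.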

>>> # a batch of 2 samples of 4 indices each
>>> input = torch.tensor([[1,2,4,5],[4,3,2,9]])
>>> # an embedding matrix containing 10 tensors of size 3
>>> embedding_matrix = torch.rand(10, 3)
>>> F.embedding(input, embedding_matrix)
tensor([[[ 0.8490,  0.9625,  0.6753],
         [ 0.9666,  0.7761,  0.6108],
         [ 0.6246,  0.9751,  0.3618],
         [ 0.4161,  0.2419,  0.7383]],

        [[ 0.6246,  0.9751,  0.3618],
         [ 0.0237,  0.7794,  0.0528],
         [ 0.9666,  0.7761,  0.6108],
         [ 0.3385,  0.8612,  0.1867]]])

>>> # example with padding_idx
>>> weights = torch.rand(10, 3)
>>> weights[0, :].zero_()
>>> embedding_matrix = weights
>>> input = torch.tensor([[0,2,0,5]])
>>> F.embedding(input, embedding_matrix, padding_idx=0)
tensor([[[ 0.0000,  0.0000,  0.0000],
         [ 0.5609,  0.5384,  0.8720],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.6262,  0.2438,  0.7471]]])

embedding_bag

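torch.nn.functional.embedding_bag(input, weight, offsets=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, mode='mean', sparse=False, per_sample_weights=None)

Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings.

input (LongTensor) – tensor containing bags of indices into the embedding matrix
weight (Tensor) – the embedding matrix, with number of rows equal to the maximum possible index + 1 and number of columns equal to the embedding size
offsets (LongTensor, optional) – only used when input is 1D. offsets determines the starting index position of each bag (sequence) in input.
max_norm (float, optional) – if given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. Note: this will modify weight in-place.
norm_type (float, optional) – the p of the p-norm to compute for the max_norm option. Default: 2.
scale_grad_by_freq (boolean, optional) – if given, this will scale gradients by the inverse of the frequency of the words in the mini-batch. Default: False. Note: this option is not supported when mode="max".
mode (string, optional) – "sum", "mean" or "max". Specifies the way to reduce the bag. Default: "mean"
sparse (bool, optional) – if True, the gradient w.r.t. weight will be a sparse tensor. See the notes under torch.nn.Embedding for more details regarding sparse gradients. Note: this option is not supported when mode="max".
per_sample_weights (Tensor, optional) – a tensor of float / double weights, or None to indicate that all weights should be taken to be 1. If specified, per_sample_weights must have exactly the same shape as input and is treated as having the same offsets, if those are not None.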

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding_matrix = torch.rand(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.tensor([1,2,4,5,4,3,2,9])
>>> offsets = torch.tensor([0,4])
>>> F.embedding_bag(input, embedding_matrix, offsets)
tensor([[ 0.3397,  0.3552,  0.5545],
        [ 0.5893,  0.4386,  0.5882]])

one_hot

torch.nn.functional.one_hot(tensor, num_classes=-1) → LongTensor

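Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1.

tensor (LongTensor) – class values of any shape.
num_classes (int) – total number of classes. If set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor.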

>>> F.one_hot(torch.arange(0, 5) % 3)
tensor([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 0, 0],
        [0, 1, 0]])
>>> F.one_hot(torch.arange(0, 5) % 3, num_classes=5)
tensor([[1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0]])
>>> F.one_hot(torch.arange(0, 6).view(3,2) % 3)
tensor([[[1, 0, 0],
         [0, 1, 0]],
        [[0, 0, 1],
         [1, 0, 0]],
        [[0, 1, 0],
         [0, 0, 1]]])

Distance functions

pairwise_distance

torch.nn.functional.pairwise_distance(x1, x2, p=2.0, eps=1e-06, keepdim=False)

cosine_similarity

torch.nn.functional.cosine_similarity(x1, x2, dim=1, eps=1e-8) → Tensor

Returns the cosine similarity between x1 and x2, computed along dim.

x1 (Tensor) – first input.
x2 (Tensor) – second input (of size matching x1).
dim (int, optional) – dimension of vectors. Default: 1
eps (float, optional) – small value to avoid division by zero. Default: 1e-8

>>> input1 = torch.randn(100, 128)
>>> input2 = torch.randn(100, 128)
>>> output = F.cosine_similarity(input1, input2)
>>> print(output)

pdist

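torch.nn.functional.pdist(input, p=2) → Tensor

Computes the p-norm distance between every pair of row vectors in the input. This is identical to the upper triangular portion, excluding the diagonal, of torch.norm(input[:, None] - input, dim=2, p=p). This function will be faster if the rows are contiguous.

If input has shape N × M then the output will have shape ½ N (N − 1).

This function is equivalent to scipy.spatial.distance.pdist(input, 'minkowski', p=p) if p ∈ (0, ∞). When p = 0 it is equivalent to scipy.spatial.distance.pdist(input, 'hamming') * M. When p = ∞, the closest scipy function is scipy.spatial.distance.pdist(xn, lambda x, y: np.abs(x - y).max()).

input – input tensor of shape N × M.
p – p value for the p-norm distance to calculate between each vector pair, p ∈ [0, ∞].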
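
The relationship with the full pairwise distance matrix can be checked directly. In the sketch below, with an arbitrary 4 × 3 input, the output has 4 × 3 / 2 = 6 entries, and they match the strict upper triangle of the full matrix read row by row.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3)            # N = 4 row vectors of dimension M = 3

d = F.pdist(x, p=2)              # condensed distances, N * (N - 1) / 2 entries

# Full N x N distance matrix via broadcasting.
full = (x[:, None] - x).norm(p=2, dim=2)

# Indices of the strict upper triangle, in row-major order.
rows, cols = torch.triu_indices(4, 4, offset=1)

print(d.shape)   # torch.Size([6])
print(torch.allclose(d, full[rows, cols], atol=1e-5))
```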

Loss functions

binary_cross_entropy

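torch.nn.functional.binary_cross_entropy(input, target, weight=None, size_average=None, reduce=None, reduction='mean')

Function that measures the binary cross entropy between the target and the output.

input – tensor of arbitrary shape
target – tensor of the same shape as input
weight (Tensor, optional) – a manual rescaling weight; if provided, it is repeated to match the input tensor shape
size_average (bool, optional) – deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'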

>>> input = torch.randn((3, 2), requires_grad=True)
>>> target = torch.rand((3, 2), requires_grad=False)
>>> loss = F.binary_cross_entropy(torch.sigmoid(input), target)
>>> loss.backward()

binary_cross_entropy_with_logits

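torch.nn.functional.binary_cross_entropy_with_logits(input, target, weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)

Function that measures the binary cross entropy between the target and the output logits.

input – tensor of arbitrary shape
target – tensor of the same shape as input
weight (Tensor, optional) – a manual rescaling weight; if provided, it is repeated to match the input tensor shape
size_average (bool, optional) – deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes.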

>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)
>>> loss = F.binary_cross_entropy_with_logits(input, target)
>>> loss.backward()

poisson_nll_loss

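torch.nn.functional.poisson_nll_loss(input, target, log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')

Poisson negative log likelihood loss.

input – expectation of the underlying Poisson distribution.
target – random sample target ∼ Poisson(input).
log_input – if True the loss is computed as exp(input) − target ∗ input, if False the loss is input − target ∗ log(input + eps). Default: True
full – whether to compute the full loss, i.e. to add the Stirling approximation term target ∗ log(target) − target + 0.5 ∗ log(2 ∗ π ∗ target). Default: False
size_average (bool, optional) – deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
eps (float, optional) – small value to avoid evaluation of log(0) when log_input=False. Default: 1e-8
reduce (bool, optional) – deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'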

cosine_embedding_loss

torch.nn.functional.cosine_embedding_loss(input1, input2, target, margin=0, size_average=None, reduce=None, reduction='mean') → Tensor

cross_entropy

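torch.nn.functional.cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

This criterion combines log_softmax and nll_loss in a single function.

input (Tensor) – (N, C) where C = number of classes, or (N, C, H, W) in the case of 2D loss, or (N, C, d_1, d_2, ..., d_K) where K ≥ 1 in the case of K-dimensional loss.
target (Tensor) – (N) where each value is 0 ≤ targets[i] ≤ C − 1, or (N, d_1, d_2, ..., d_K) where K ≥ 1 for K-dimensional loss.
weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a tensor of size C
size_average (bool, optional) – deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
ignore_index (int, optional) – specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Default: -100
reduce (bool, optional) – deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'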

>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randint(5, (3,), dtype=torch.int64)
>>> loss = F.cross_entropy(input, target)
>>> loss.backward()
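
Since cross_entropy is described as the combination of log_softmax and nll_loss, the two routes should give the same value. The sketch below checks this on arbitrary logits and targets, applying log_softmax along the class dimension.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)            # (N, C) unnormalized scores
target = torch.tensor([1, 0, 4])      # class indices in [0, C - 1]

combined = F.cross_entropy(logits, target)

# Same loss in two explicit steps.
two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(combined.item(), two_step.item())
```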

ctc_loss

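torch.nn.functional.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0, reduction='mean', zero_infinity=False)

The Connectionist Temporal Classification loss.

log_probs – (T, N, C) where C = number of characters in the alphabet including blank, T = input length, and N = batch size. The logarithmized probabilities of the outputs (e.g. obtained with torch.nn.functional.log_softmax()).
targets – (N, S) or (sum(target_lengths)). Targets cannot be blank. In the second form, the targets are assumed to be concatenated.
input_lengths – (N). Lengths of the inputs (must each be ≤ T)
target_lengths – (N). Lengths of the targets
blank (int, optional) – blank label. Default: 0.
reduction (string, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken, 'sum': the output will be summed. Default: 'mean'
zero_infinity (bool, optional) – whether to zero infinite losses and the associated gradients. Default: False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.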

>>> log_probs = torch.randn(50, 16, 20).log_softmax(2).detach().requires_grad_()
>>> targets = torch.randint(1, 20, (16, 30), dtype=torch.long)
>>> input_lengths = torch.full((16,), 50, dtype=torch.long)
>>> target_lengths = torch.randint(10,30,(16,), dtype=torch.long)
>>> loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
>>> loss.backward()

hinge_embedding_loss

torch.nn.functional.hinge_embedding_loss(input, target, margin=1.0, size_average=None, reduce=None, reduction='mean') → Tensor

l1_loss

torch.nn.functional.l1_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor

Function that takes the mean element-wise absolute value difference.

mse_loss

torch.nn.functional.mse_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor

Measures the element-wise mean squared error.

margin_ranking_loss

torch.nn.functional.margin_ranking_loss(input1, input2, target, margin=0, size_average=None, reduce=None, reduction='mean') → Tensor

multilabel_margin_loss

torch.nn.functional.multilabel_margin_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor

multilabel_soft_margin_loss

torch.nn.functional.multilabel_soft_margin_loss(input, target, weight=None, size_average=None) → Tensor

multi_margin_loss

torch.nn.functional.multi_margin_loss(input, target, p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean') → Tensor

nll_loss

torch.nn.functional.nll_loss(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

The negative log likelihood loss.

>>> # input is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = F.nll_loss(F.log_softmax(input, dim=1), target)
>>> output.backward()

smooth_l1_loss

torch.nn.functional.smooth_l1_loss(input, target, size_average=None, reduce=None, reduction='mean')

Function that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise.
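
To illustrate the two regimes, the sketch below evaluates the loss per element on hand-picked values. It assumes the standard form of the loss with default settings, 0.5 ∗ d² when the absolute error d is below 1 and d − 0.5 otherwise.

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([0.0, 0.0])
target = torch.tensor([0.5, 2.0])   # absolute errors: 0.5 and 2.0

# reduction='none' keeps one loss value per element.
per_element = F.smooth_l1_loss(pred, target, reduction='none')

# Squared regime: 0.5 * 0.5**2 = 0.125; linear regime: 2.0 - 0.5 = 1.5.
print(per_element)
```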

soft_margin_loss

torch.nn.functional.soft_margin_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor

triplet_margin_loss

torch.nn.functional.triplet_margin_loss(anchor, positive, negative, margin=1.0, p=2, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')

Vision functions

pixel_shuffle

torch.nn.functional.pixel_shuffle()

Rearranges elements in a tensor of shape (∗, C × r^2, H, W) to a tensor of shape (∗, C, H × r, W × r).

>>> input = torch.randn(1, 9, 4, 4)
>>> output = torch.nn.functional.pixel_shuffle(input, 3)
>>> print(output.size())
torch.Size([1, 1, 12, 12])

pad

torch.nn.functional.pad(input, pad, mode='constant', value=0)

Pads a tensor.

>>> t4d = torch.empty(3, 3, 4, 2)
>>> p1d = (1, 1) # pad last dim by 1 on each side
>>> out = F.pad(t4d, p1d, "constant", 0)  # effectively zero padding
>>> print(out.data.size())
torch.Size([3, 3, 4, 4])
>>> p2d = (1, 1, 2, 2) # pad last dim by (1, 1) and 2nd to last by (2, 2)
>>> out = F.pad(t4d, p2d, "constant", 0)
>>> print(out.data.size())
torch.Size([3, 3, 8, 4])
>>> t4d = torch.empty(3, 3, 4, 2)
>>> p3d = (0, 1, 2, 1, 3, 3) # pad by (0, 1), (2, 1), and (3, 3)
>>> out = F.pad(t4d, p3d, "constant", 0)
>>> print(out.data.size())
torch.Size([3, 9, 7, 3])

interpolate

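torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)

Down/up samples the input to either the given size or the given scale_factor.

The algorithm used for interpolation is determined by mode. Currently temporal, spatial and volumetric sampling are supported, i.e. expected inputs are 3-D, 4-D or 5-D in shape. The input dimensions are interpreted in the form: mini-batch x channels x [optional depth] x [optional height] x width.

The modes available for resizing are: nearest, linear (3D-only), bilinear, bicubic (4D-only), trilinear (5D-only), area.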
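
A small sketch of two of these modes on a 2 × 2 input with hand-picked values: nearest-neighbour upsampling by a scale factor repeats each pixel, while bilinear resizing to an explicit size with align_corners=True keeps the corner values and interpolates in between.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[0.0, 1.0],
                    [2.0, 3.0]]]])   # (mini-batch, channels, height, width)

# Nearest neighbour, scale factor 2: every pixel becomes a 2x2 block.
nearest = F.interpolate(x, scale_factor=2, mode='nearest')

# Bilinear to a fixed 3x3 size; corners of input and output are aligned.
bilinear = F.interpolate(x, size=(3, 3), mode='bilinear', align_corners=True)

print(nearest.shape)   # torch.Size([1, 1, 4, 4])
print(bilinear)
```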

upsample

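torch.nn.functional.upsample(input, size=None, scale_factor=None, mode='nearest', align_corners=None)

Upsamples the input to either the given size or the given scale_factor. This function is deprecated in favor of torch.nn.functional.interpolate().

upsample_nearest

torch.nn.functional.upsample_nearest(input, size=None, scale_factor=None)

Upsamples the input, using nearest neighbours' pixel values. This function is deprecated in favor of torch.nn.functional.interpolate().

upsample_bilinear

torch.nn.functional.upsample_bilinear(input, size=None, scale_factor=None)

Upsamples the input, using bilinear upsampling. This function is deprecated in favor of torch.nn.functional.interpolate().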

grid_sample

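torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros', align_corners=None)

Given an input and a flow-field grid, computes the output using input values and pixel locations from grid. Currently, only spatial (4-D) and volumetric (5-D) inputs are supported.

In the spatial (4-D) case, for input with shape (N, C, H_in, W_in) and grid with shape (N, H_out, W_out, 2), the output will have shape (N, C, H_out, W_out).

For each output location output[n, :, h, w], the size-2 vector grid[n, h, w] specifies input pixel locations x and y, which are used to interpolate the output value output[n, :, h, w]. In the case of 5D inputs, grid[n, d, h, w] specifies the x, y, z pixel locations for interpolating output[n, :, d, h, w]. The mode argument specifies the nearest or bilinear interpolation method used to sample the input pixels.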

affine_grid

torch.nn.functional.affine_grid(theta, size, align_corners=None)

Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta.
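
affine_grid and grid_sample are typically used together. As a minimal sketch, the identity affine matrix below produces a grid that samples every pixel at its own location, so grid_sample returns the input unchanged; the tensor sizes are arbitrary and align_corners is set to the same value in both calls.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 4, 4)                  # (N, C, H, W)

# One 2x3 affine matrix per batch element; this one is the identity map.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)
out = F.grid_sample(x, grid, mode='bilinear', padding_mode='zeros',
                    align_corners=False)

print(grid.shape)   # torch.Size([1, 4, 4, 2])
print(torch.allclose(out, x, atol=1e-5))
```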

DataParallel functions (multi-GPU, distributed)

data_parallel

torch.nn.parallel.data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None)

Evaluates module(input) in parallel across the GPUs given in device_ids. This is the functional version of the DataParallel module.