class torch.nn.RNN(*args, **kwargs)
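Parameters:
input_size – the number of features in the input x.
hidden_size – the number of features in the hidden state.
num_layers – the number of recurrent layers.
bidirectional – if True, becomes a bidirectional RNN. Default: False.
Inputs: (input, h_0)
input (seq_len, batch, input_size): tensor containing the features of the input sequence.
h_0 (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state. num_directions is 2 if bidirectional is True, otherwise 1.
Outputs: (output, h_n)
output (seq_len, batch, hidden_size * num_directions): tensor containing the output features from the last layer of the RNN, for each time step.
h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state at the last time step.
Example: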
import torch
import torch.nn as nn

# input feature size is 10, hidden size is 20, and the RNN has 2 layers
rnn = nn.RNN(10, 20, 2)
# (seq_len, batch, input_size)
input = torch.randn(5, 3, 10)
# (num_layers * num_directions, batch, hidden_size)
h0 = torch.randn(2, 3, 20)
output, hn = rnn(input, h0)
print(output.shape) # (seq_len, batch, hidden_size * num_directions)
print(hn.shape) # (num_layers * num_directions, batch, hidden_size)
torch.Size([5, 3, 20])
torch.Size([2, 3, 20])
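The shapes above use num_directions = 1. As a minimal sketch of the bidirectional case (the sizes are chosen here only for illustration), setting bidirectional=True doubles the last dimension of output and the first dimension of h_n:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 2-layer bidirectional RNN, so num_directions = 2
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

x = torch.randn(5, 3, 10)        # (seq_len, batch, input_size)
h0 = torch.zeros(2 * 2, 3, 20)   # (num_layers * num_directions, batch, hidden_size)

output, hn = rnn(x, h0)
print(output.shape)  # torch.Size([5, 3, 40])
print(hn.shape)      # torch.Size([4, 3, 20])

# For the last layer, the forward direction finishes at the last time step
# and the backward direction finishes at the first time step.
fwd_match = torch.allclose(output[-1, :, :20], hn[-2])
bwd_match = torch.allclose(output[0, :, 20:], hn[-1])
print(fwd_match, bwd_match)
```

The last two checks illustrate how h_n relates to output: the forward half of output at the final step and the backward half at the first step are the last layer's final hidden states.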
Similarly:
class torch.nn.GRU(*args, **kwargs)
Another kind of module:
class torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')
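RNNCell processes a single time step, so the loop over the sequence is written by hand. A minimal sketch, assuming a zero initial hidden state and the same illustrative sizes as the RNN example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.RNNCell(input_size=10, hidden_size=20)

x = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h = torch.zeros(3, 20)      # (batch, hidden_size)

outputs = []
for x_t in x:               # x_t has shape (batch, input_size)
    h = cell(x_t, h)        # one recurrence step
    outputs.append(h)

outputs = torch.stack(outputs)  # (seq_len, batch, hidden_size)
print(outputs.shape)  # torch.Size([5, 3, 20])
```

Writing the loop explicitly is useful when something has to happen between time steps, at the cost of losing the fused multi-layer implementation that nn.RNN provides.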
class torch.nn.Linear(in_features, out_features, bias=True)
Applies a linear transformation to the incoming data: y=xA^T+b
Example:
# map 3-dimensional features to 2-dimensional features
m = nn.Linear(3, 2)
input = torch.randn(10, 3)
output = m(input)
print(output.size())
torch.Size([10, 2])
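To make the formula y = xA^T + b concrete, the following sketch recomputes the layer output by hand from m.weight (A, of shape (out_features, in_features)) and m.bias (b). The variable names are chosen here for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

m = nn.Linear(3, 2)
x = torch.randn(10, 3)

print(m.weight.shape)  # torch.Size([2, 3])
print(m.bias.shape)    # torch.Size([2])

# Manual version of y = x A^T + b
manual = x @ m.weight.t() + m.bias
matches = torch.allclose(m(x), manual, atol=1e-5)
print(matches)
```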
class torch.nn.Dropout(p=0.5, inplace=False)
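Parameters:
p - probability of an element being zeroed. Default: 0.5
inplace - if set to True, performs the operation in place. Default: False
Shape:
Input: any. The input can be of any shape.
Output: same. The output has the same shape as the input.
Example:
m = nn.Dropout(p=0.5)
input = torch.randn(2, 2)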
output = m(input)
output
tensor([[-0.0000, -2.9296],
[ 0.0924, 0.0000]])
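Note that the surviving elements in the output above are not the raw inputs: in training mode, kept values are scaled by 1 / (1 - p), and in evaluation mode the module is the identity. A small sketch using an all-ones input so the scaling is easy to see:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

m = nn.Dropout(p=0.5)
x = torch.ones(4, 4)

m.train()
y_train = m(x)   # each element is either 0 or 1 / (1 - 0.5) = 2
print(y_train)

m.eval()
y_eval = m(x)    # dropout is disabled in eval mode
print(torch.equal(y_eval, x))  # True
```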
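class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, _weight=None)
Parameters:
num_embeddings (int) - size of the embedding dictionary
embedding_dim (int) - size of each embedding vector
padding_idx (int, optional) - if given, the embedding vector at this index is initialized to zeros and is not updated during training, so the output is zero wherever this index appears
max_norm (float, optional) - if given, each embedding vector whose norm is larger than max_norm is renormalized to have norm max_norm
norm_type (float, optional) - the p of the p-norm computed for the max_norm option
scale_grad_by_freq (boolean, optional) - if True, gradients are scaled by the inverse of the frequency of the words in the mini-batch
Variables:
weight (Tensor) - the learnable weights of the module, of shape (num_embeddings, embedding_dim)
Shape:
Input: LongTensor (N, W), N = mini-batch size, W = number of indices to extract per mini-batch
Output: (N, W, embedding_dim)
Example: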
import torch
import torch.nn as nn

# an Embedding module containing 10 tensors of size 3
embedding = nn.Embedding(10, 3)
# a batch of 2 samples of 4 indices each
input = torch.LongTensor([[1,2,4,5],[5,4,2,1]])
embedding(input)
tensor([[[-0.4031, 1.8008, 1.4954],
[ 0.3768, -0.2439, 0.9262],
[ 0.8444, -0.1265, 2.0801],
[ 1.0576, -0.9705, -0.1841]],
[[ 1.0576, -0.9705, -0.1841],
[ 0.8444, -0.1265, 2.0801],
[ 0.3768, -0.2439, 0.9262],
[-0.4031, 1.8008, 1.4954]]])
embedding.weight
Parameter containing:
tensor([[-0.6084, 0.0402, -1.5447],
[-0.4031, 1.8008, 1.4954],
[ 0.3768, -0.2439, 0.9262],
[ 0.4351, -1.6146, 0.7603],
[ 0.8444, -0.1265, 2.0801],
[ 1.0576, -0.9705, -0.1841],
[ 0.6502, -0.1189, 0.0794],
[-0.9843, -0.1582, -0.0912],
[ 0.1690, -0.0980, -0.1338],
[-0.9448, -1.9642, -0.1723]])
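Comparing the two printouts above, each output row is simply the row of embedding.weight selected by the corresponding index. A minimal sketch that checks this directly (the name idx is chosen here for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embedding = nn.Embedding(10, 3)
idx = torch.tensor([[1, 2, 4, 5], [5, 4, 2, 1]])  # int64 indices, shape (N, W)

out = embedding(idx)
print(out.shape)  # torch.Size([2, 4, 3])

# The lookup is equivalent to indexing the weight matrix by row.
same = torch.equal(out, embedding.weight[idx])
print(same)  # True
```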
Example with padding_idx:
embedding = nn.Embedding(10, 3, padding_idx=1)
input = torch.LongTensor([[0,1,0,5]])
embedding(input)
tensor([[[-1.1790, 1.2073, -1.0174],
[ 0.0000, 0.0000, 0.0000],
[-1.1790, 1.2073, -1.0174],
[-0.2278, 1.1332, -0.2259]]])
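Besides producing a zero vector, the padding_idx row also receives no gradient, so it stays at zero during training. A short sketch, assuming a simple sum over the output as a stand-in loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embedding = nn.Embedding(10, 3, padding_idx=1)
idx = torch.tensor([[0, 1, 0, 5]])

out = embedding(idx)
out.sum().backward()   # stand-in loss, for illustration only

grad = embedding.weight.grad
print(out[0, 1])   # the padding row: all zeros
print(grad[1])     # no gradient for padding_idx: all zeros
print(grad[0])     # index 0 appears twice, so its gradient is 2 per entry
print(grad[5])     # index 5 appears once, so its gradient is 1 per entry
```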