The GRU, LSTM, and RNN model classes in PyTorch are all defined in torch/nn/modules/rnn.py.
GRU, RNN, and LSTM all inherit from the common parent class RNNBase.
The definition of the RNNBase class (its __init__) is as follows:
# Excerpt from torch/nn/modules/rnn.py: RNNBase.__init__
def __init__(self, mode, input_size, hidden_size,
             num_layers=1, bias=True, batch_first=False,
             dropout=0., bidirectional=False):
    super(RNNBase, self).__init__()
    self.mode = mode
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.num_layers = num_layers
    self.bias = bias
    self.batch_first = batch_first
    self.dropout = float(dropout)
    self.bidirectional = bidirectional
    num_directions = 2 if bidirectional else 1

    if not isinstance(dropout, numbers.Number) or not 0 <= dropout <= 1 or \
            isinstance(dropout, bool):
        raise ValueError("dropout should be a number in range [0, 1] "
                         "representing the probability of an element being "
                         "zeroed")
    if dropout > 0 and num_layers == 1:
        warnings.warn("dropout option adds dropout after all but last "
                      "recurrent layer, so non-zero dropout expects "
                      "num_layers greater than 1, but got dropout={} and "
                      "num_layers={}".format(dropout, num_layers))

    # The mode determines how many gates (and hence weight rows) each layer needs.
    if mode == 'LSTM':
        gate_size = 4 * hidden_size   # input, forget, cell and output gates
    elif mode == 'GRU':
        gate_size = 3 * hidden_size   # reset, update and new gates
    elif mode == 'RNN_TANH':
        gate_size = hidden_size
    elif mode == 'RNN_RELU':
        gate_size = hidden_size
    else:
        raise ValueError("Unrecognized RNN mode: " + mode)

    self._all_weights = []
    for layer in range(num_layers):
        for direction in range(num_directions):
            # Layers after the first take the (possibly bidirectional) output of
            # the previous layer as input.
            layer_input_size = input_size if layer == 0 else hidden_size * num_directions

            w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
            w_hh = Parameter(torch.Tensor(gate_size, hidden_size))
            b_ih = Parameter(torch.Tensor(gate_size))
            # Second bias vector included for CuDNN compatibility. Only one
            # bias vector is needed in standard definition.
            b_hh = Parameter(torch.Tensor(gate_size))
            layer_params = (w_ih, w_hh, b_ih, b_hh)

            suffix = '_reverse' if direction == 1 else ''
            param_names = ['weight_ih_l{}{}', 'weight_hh_l{}{}']
            if bias:
                param_names += ['bias_ih_l{}{}', 'bias_hh_l{}{}']
            param_names = [x.format(layer, suffix) for x in param_names]

            # Register the parameters on the module, e.g. weight_ih_l0, bias_hh_l1_reverse.
            for name, param in zip(param_names, layer_params):
                setattr(self, name, param)
            self._all_weights.append(param_names)

    self.flatten_parameters()
    self.reset_parameters()
The mode argument determines which model is being built: GRU, LSTM, RNN_TANH, or RNN_RELU.
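To make the gate_size and parameter-naming logic concrete, here is a quick check of my own (not from the original source file); the shapes follow directly from the constructor above:
>>> import torch.nn as nn
>>> gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)
>>> gru.weight_ih_l0.size()            # gate_size = 3 * hidden_size = 60
torch.Size([60, 10])
>>> gru.weight_hh_l0.size()
torch.Size([60, 20])
>>> gru.weight_ih_l0_reverse.size()    # direction == 1 adds the '_reverse' suffix
torch.Size([60, 10])
>>> gru.weight_ih_l1.size()            # layer > 0: input is hidden_size * num_directions = 40
torch.Size([60, 40])
>>> lstm = nn.LSTM(input_size=10, hidden_size=20)
>>> lstm.weight_ih_l0.size()           # LSTM: gate_size = 4 * hidden_size = 80
torch.Size([80, 10])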
The inputs and outputs of nn.GRU take the following form (from the docstring):
Inputs: input, h_0
    - input of shape (seq_len, batch, input_size): tensor containing the features
      of the input sequence. The input can also be a packed variable length
      sequence. See torch.nn.utils.rnn.pack_padded_sequence or
      torch.nn.utils.rnn.pack_sequence for details.
    - h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor
      containing the initial hidden state for each element in the batch.
      Defaults to zero if not provided. If the RNN is bidirectional,
      num_directions should be 2, else it should be 1.

Outputs: output, h_n
    - output of shape (seq_len, batch, num_directions * hidden_size): tensor
      containing the output features h_t from the last layer of the GRU,
      for each t. If a torch.nn.utils.rnn.PackedSequence has been given as
      the input, the output will also be a packed sequence.
      For the unpacked case, the directions can be separated using
      output.view(seq_len, batch, num_directions, hidden_size),
      with forward and backward being direction 0 and 1 respectively.
      Similarly, the directions can be separated in the packed case.
    - h_n of shape (num_layers * num_directions, batch, hidden_size): tensor
      containing the hidden state for t = seq_len.
      Like output, the layers can be separated using
      h_n.view(num_layers, num_directions, batch, hidden_size).
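The docstring above mentions packed variable-length sequences. Here is a minimal sketch (my own illustrative example; the sizes and lengths are arbitrary) of feeding a PackedSequence through a GRU:
>>> import torch
>>> import torch.nn as nn
>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> gru = nn.GRU(input_size=4, hidden_size=6, batch_first=True)
>>> x = torch.randn(2, 5, 4)              # 2 padded sequences, max length 5
>>> lengths = [5, 3]                      # true lengths, sorted in decreasing order
>>> packed = pack_padded_sequence(x, lengths, batch_first=True)
>>> packed_out, h_n = gru(packed)         # output is also a PackedSequence
>>> out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
>>> out.size()
torch.Size([2, 5, 6])
>>> h_n.size()
torch.Size([1, 2, 6])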
The inputs and outputs of nn.RNN have the same form. nn.LSTM is similar, except that the hidden state is a pair (h_0, c_0) on input and (h_n, c_n) on output. If the network is bidirectional, num_directions = 2; otherwise it is 1. For example, with the default single-direction, single-layer settings and batch_first=True:
>>> import torch
>>> import torch.nn as nn
>>> gru = nn.GRU(input_size=50, hidden_size=50, batch_first=True)
>>> embed = nn.Embedding(3, 50)
>>> x = torch.LongTensor([[0, 1, 2]])
>>> x_embed = embed(x)
>>> x.size()
torch.Size([1, 3])
>>> x_embed.size()
torch.Size([1, 3, 50])
>>> out, hidden = gru(x_embed)
>>> out.size()
torch.Size([1, 3, 50])
>>> hidden.size()
torch.Size([1, 1, 50])
When the module is constructed with batch_first=True, the GRU input/output dimensions are as follows:
# input:  (batch_size, seq_length, input_size)
# hidden: (num_layers * num_directions, batch_size, hidden_size)
# output: (batch_size, seq_length, num_directions * hidden_size)
# h_n:    (num_layers * num_directions, batch_size, hidden_size)
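As a further sketch (my own example, not from the original post), a bidirectional GRU with batch_first=True, showing the num_directions * hidden_size output dimension and how the two directions can be separated with view:
>>> import torch
>>> import torch.nn as nn
>>> bigru = nn.GRU(input_size=50, hidden_size=50, batch_first=True, bidirectional=True)
>>> x = torch.randn(1, 3, 50)
>>> out, h_n = bigru(x)
>>> out.size()                        # (batch, seq_len, num_directions * hidden_size)
torch.Size([1, 3, 100])
>>> h_n.size()                        # (num_layers * num_directions, batch, hidden_size)
torch.Size([2, 1, 50])
>>> out.view(1, 3, 2, 50).size()      # separate the forward and backward directions
torch.Size([1, 3, 2, 50])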