For example, suppose the input is 3 sentences, each made up of 5 words, and each word is represented by a 10-dimensional word vector. Then seq_len=5, batch=3, input_size=10.
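As a small sketch of the layout described above, the snippet below builds the (seq_len, batch, input_size) tensor from three made-up sentences. The random word vectors and the names `sentences` and `x` are assumptions for illustration only.

```python
import torch

seq_len, batch, input_size = 5, 3, 10

# Three hypothetical sentences, each a (seq_len, input_size) matrix of word vectors.
sentences = [torch.randn(seq_len, input_size) for _ in range(batch)]

# Stack along dim=1 so the batch axis sits in the middle:
# the result has shape (seq_len, batch, input_size).
x = torch.stack(sentences, dim=1)
print(x.shape)  # torch.Size([5, 3, 10])

# x[t, b] is the vector of word t in sentence b.
word_2_of_sentence_0 = x[2, 0]
```

With this layout, slicing `x[:, b]` gives back sentence `b` as a (5, 10) matrix.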
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
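num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results.
One point of confusion: num_layers=2 means two LSTMs chained together, with the output of the first LSTM serving as the input of the second. If the input dimension is 10 and the hidden dimension is 20, is the hidden dimension of the first LSTM 10 or 20?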
for layer in range(num_layers):
    for direction in range(num_directions):
        layer_input_size = input_size if layer == 0 else hidden_size * num_directions
        w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
        w_hh = Parameter(torch.Tensor(gate_size, hidden_size))
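As the code above shows, the hidden dimension is the same for every LSTM in the stack. Only the input dimension changes: the input dimension of the next LSTM is the hidden dimension of the previous one (multiplied by num_directions if the LSTM is bidirectional). Written as (input_size, hidden_size): LSTM1(10, 20) -> LSTM2(20, 20).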
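The layer sizes can be checked directly on a module's parameters. This is a minimal sketch assuming the 10/20/2 configuration discussed above; `weight_ih_l0`, `weight_hh_l0` and so on are the per-layer parameter names that nn.LSTM exposes.

```python
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# Each weight stacks the four LSTM gates, so the first dimension
# is 4 * hidden_size = 80 in both layers.
print(rnn.weight_ih_l0.shape)  # torch.Size([80, 10])  layer 0 reads the 10-dim input
print(rnn.weight_hh_l0.shape)  # torch.Size([80, 20])
print(rnn.weight_ih_l1.shape)  # torch.Size([80, 20])  layer 1 reads layer 0's 20-dim output
print(rnn.weight_hh_l1.shape)  # torch.Size([80, 20])
```

Only the second dimension of `weight_ih` differs between the layers, which matches LSTM1(10, 20) -> LSTM2(20, 20).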
Inputs: input, (h_0, c_0)
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence.
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
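To illustrate the default, the sketch below runs the same module once without an initial state and once with explicit zero tensors, then compares the results. The sizes are the ones used elsewhere in these notes; the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

# Call without (h_0, c_0): both default to zeros.
out_default, (h_default, c_default) = rnn(x)

# Call with explicit zero states of shape (num_layers * num_directions, batch, hidden_size).
h0 = torch.zeros(2, 3, 20)
c0 = torch.zeros(2, 3, 20)
out_zero, (h_zero, c_zero) = rnn(x, (h0, c0))

same_output = torch.allclose(out_default, out_zero)
print(same_output)
```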
Outputs: output, (h_n, c_n)
output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the LSTM, for each t.
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len.
c_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len.
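The num_directions factor in these shapes only shows up when the LSTM is bidirectional. The following sketch, assuming the same 10/20/2 sizes with `bidirectional=True`, prints the resulting shapes.

```python
import torch
import torch.nn as nn

# Two layers, two directions: num_layers * num_directions = 4.
rnn = nn.LSTM(10, 20, num_layers=2, bidirectional=True)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

output, (h_n, c_n) = rnn(x)

print(output.shape)  # torch.Size([5, 3, 40])  hidden_size * num_directions
print(h_n.shape)     # torch.Size([4, 3, 20])  num_layers * num_directions
print(c_n.shape)     # torch.Size([4, 3, 20])
```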
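import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)        # input_size=10, hidden_size=20, num_layers=2
input = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)      # (num_layers * num_directions, batch, hidden_size)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))
print(output.shape, hn.shape, cn.shape)
torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])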
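The shapes also hint at how output and h_n relate: output holds the last layer's hidden state at every time step, while h_n holds every layer's hidden state at the final step. For a unidirectional LSTM the two should therefore agree at the last step of the last layer, which this sketch checks (variable names are illustrative).

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)
output, (hn, cn) = rnn(x)

last_step = output[-1]  # (batch, hidden_size): last layer at t = seq_len
top_layer = hn[-1]      # (batch, hidden_size): final hidden state of the last layer

matches = torch.allclose(last_step, top_layer)
print(matches)
```

This holds only in the unidirectional case; with `bidirectional=True` the backward direction finishes at t = 0, so its final state does not sit in `output[-1]`.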
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
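lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.zeros(1, 3) for _ in range(5)]  # a sequence of length 5, batch=1, word-vector dimension 3
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))  # (h_0, c_0): 1 LSTM layer, batch of 1, hidden dimension 3
for i in inputs:
    # Feed one time step at a time, carrying the hidden state forward.
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print(out)  # print out at every step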
tensor([[[-0.0916, -0.0248, -0.0481]]], grad_fn=<StackBackward>)
tensor([[[-0.1536, -0.0358, -0.0787]]], grad_fn=<StackBackward>)
tensor([[[-0.1923, -0.0398, -0.0976]]], grad_fn=<StackBackward>)
tensor([[[-0.2158, -0.0405, -0.1092]]], grad_fn=<StackBackward>)
tensor([[[-0.2301, -0.0398, -0.1162]]], grad_fn=<StackBackward>)
# inputs shape: (5, 1, 3), i.e. seq_len=5, batch=1, feature=3
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
tensor([[[-0.0916, -0.0248, -0.0481]],
[[-0.1536, -0.0358, -0.0787]],
[[-0.1923, -0.0398, -0.0976]],
[[-0.2158, -0.0405, -0.1092]],
[[-0.2301, -0.0398, -0.1162]]], grad_fn=<StackBackward>)
(tensor([[[-0.2301, -0.0398, -0.1162]]], grad_fn=<StackBackward>),
tensor([[[-0.6446, -0.0941, -0.2772]]], grad_fn=<StackBackward>))
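The two runs above give the same numbers, and that can be checked programmatically. The sketch below is self-contained and uses random inputs instead of the zeros above (an assumption for illustration). It feeds the sequence one step at a time, then all at once, and compares the results.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(3, 3)
inputs = [torch.randn(1, 3) for _ in range(5)]

# Step by step: carry (h, c) from one time step to the next.
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
step_outs = []
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    step_outs.append(out)
step_out = torch.cat(step_outs)  # (5, 1, 3)

# Whole sequence at once, starting from the same zero state.
seq = torch.cat(inputs).view(len(inputs), 1, -1)
hidden0 = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
full_out, full_hidden = lstm(seq, hidden0)

outputs_match = torch.allclose(step_out, full_out, atol=1e-6)
states_match = (torch.allclose(hidden[0], full_hidden[0], atol=1e-6)
                and torch.allclose(hidden[1], full_hidden[1], atol=1e-6))
print(outputs_match, states_match)
```

Passing the whole sequence in one call is usually the more convenient form. The step-by-step loop is useful when each step's input depends on the previous output.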