Introductory blog post on LSTMs: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Official API docs: https://pytorch.org/docs/stable/nn.html?#torch.nn.LSTM
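An example: suppose the input is 3 sentences, each made of 5 words, and each word is represented by a 10-dimensional word vector. Then seq_len=5, batch=3, input_size=10.
Core constructor parameters: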
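A minimal sketch of how this example maps onto the tensor that nn.LSTM expects. It assumes the word vectors already exist; random numbers stand in for real embeddings, and the names `sentences` and `x` are illustrative only.

```python
import torch

seq_len, batch, input_size = 5, 3, 10  # 5 words per sentence, 3 sentences, 10-dim word vectors

# Hypothetical embeddings: one (seq_len, input_size) matrix per sentence
sentences = [torch.randn(seq_len, input_size) for _ in range(batch)]

# With the default batch_first=False, nn.LSTM expects (seq_len, batch, input_size),
# so the sentences are stacked along dimension 1
x = torch.stack(sentences, dim=1)
print(x.shape)  # torch.Size([5, 3, 10])

# x[t, b] is the vector of word t in sentence b
print(torch.equal(x[2, 1], sentences[1][2]))  # True
```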
input_size – The number of expected features in the input x
# dimensionality of each word vector, e.g. input_size=10
hidden_size – The number of features in the hidden state h
# dimensionality of the hidden state
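num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
# number of stacked LSTM layers: num_layers=2 chains 2 LSTMs together, and the output of one LSTM becomes the input of the next
One point of confusion: with num_layers=2, two LSTMs are chained and the output of the first is the input of the second. If the input size is 10 and the hidden size is 20, is the first LSTM's hidden size 10 or 20?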
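# adapted from the constructor of nn.RNNBase, the base class of nn.LSTM
import torch
from torch.nn import Parameter
input_size, hidden_size, num_layers, num_directions = 10, 20, 2, 1  # num_directions=2 if bidirectional=True
gate_size = 4 * hidden_size  # an LSTM stacks the weights of its 4 gates
for layer in range(num_layers):
    for direction in range(num_directions):
        # layer 0 reads the raw input; every later layer reads the previous layer's output
        layer_input_size = input_size if layer == 0 else hidden_size * num_directions
        w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
        w_hh = Parameter(torch.Tensor(gate_size, hidden_size))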
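The code shows that every layer has the same hidden size; only the input size changes, because each subsequent LSTM takes the previous LSTM's hidden state (of size hidden_size) as its input. So the answer is 20: LSTM1(10, 20) -> LSTM2(20, 20), written as (input_size, hidden_size).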
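The same answer can be checked without reading the source, by inspecting the weights that nn.LSTM registers for each layer. PyTorch names them weight_ih_l{k} and weight_hh_l{k}; their first dimension is 4 * hidden_size = 80 because the four gates are stacked. A short sketch using the dimensions from the question:

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# input-to-hidden weights: layer 0 reads the 10-dim input, layer 1 reads layer 0's 20-dim output
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10])
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 20])

# hidden-to-hidden weights are the same size in every layer
print(lstm.weight_hh_l0.shape, lstm.weight_hh_l1.shape)  # torch.Size([80, 20]) torch.Size([80, 20])
```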
Inputs: input, (h_0, c_0)
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence.
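# see the example above. Note that the dimension order differs from Keras: batch comes second (unless the LSTM is created with batch_first=True)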
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
# initial hidden state
c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.
# initial cell state
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
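A quick sketch that checks the zero default: passing explicit zero states and omitting (h_0, c_0) should give the same results. The seed and the random input are arbitrary choices for the illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

# Explicit zero states: shape (num_layers * num_directions, batch, hidden_size)
h0 = torch.zeros(2, 3, 20)
c0 = torch.zeros(2, 3, 20)

out_explicit, (hn_explicit, cn_explicit) = lstm(x, (h0, c0))
out_default, (hn_default, cn_default) = lstm(x)  # (h_0, c_0) omitted

print(torch.allclose(out_explicit, out_default))  # True
```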
Outputs: output, (h_n, c_n)
output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the LSTM, for each t.
# analogous to input; see the examples below
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
# final hidden state
c_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len
# final cell state
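The num_directions factor in these shapes is 1 by default and becomes 2 only when the LSTM is built with bidirectional=True. A sketch of the resulting shapes, reusing the dimensions of the earlier example (the random input is arbitrary):

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)  # num_directions = 2

x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)  # initial states default to zeros

print(output.shape)  # torch.Size([5, 3, 40]): forward and backward outputs concatenated
print(h_n.shape)     # torch.Size([4, 3, 20]): num_layers * num_directions = 4
print(c_n.shape)     # torch.Size([4, 3, 20])

# Last layer: the forward final state sits at the last step, the backward one at step 0
print(torch.allclose(h_n[-2], output[-1, :, :hidden_size]))  # True
print(torch.allclose(h_n[-1], output[0, :, hidden_size:]))   # True
```

In the bidirectional case h_n[-1] is therefore not the same as output[-1]; the simple "last output equals final hidden state" relationship only holds per direction.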
Example from the official API docs
import torch
import torch.nn as nn
# word-vector size 10, hidden size 20, 2 stacked LSTM layers
rnn = nn.LSTM(10, 20, 2)
# sequence length seq_len=5, batch_size=3, word-vector size 10
input = torch.randn(5, 3, 10)
# initial hidden and cell states; they normally have the same shape
# 2 LSTM layers, batch_size=3, hidden size 20
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
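# With 2 LSTM layers, output holds the last layer's hidden state for every time step (word); its shape depends on the sequence length, not on the number of layers
# hn and cn hold the final hidden and cell states (at the last time step) of every layer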
output, (hn,cn) = rnn(input, (h0, c0))
print(output.size(),hn.size(),cn.size())
torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])
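The claim in the comments above can be checked directly: the last time step of output is the top layer's final hidden state, hn[-1]. hn[0], by contrast, is the first layer's final state, which is fed to the second layer but never appears in output. A self-contained sketch that rebuilds the same example (the input is renamed x to avoid shadowing Python's built-in input):

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(x, (h0, c0))

# output's last time step equals the top (second) layer's final hidden state
print(torch.allclose(output[-1], hn[-1]))  # True
# hn[0] is the first layer's final hidden state; output only exposes the top layer
```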
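Example from the official tutorials
(Compare the results of the two approaches below; in this code the initialization of h0 and c0 has been changed to zeros.)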
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
torch.manual_seed(1)
lstm = nn.LSTM(3, 3) # Input dim is 3, output dim is 3
inputs = [torch.zeros(1, 3) for _ in range(5)]  # sequence of length 5; each step is (batch=1, word-vector size 3)
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))  # 1 LSTM layer, batch of 1, hidden size 3
# feed the word vectors through the LSTM one at a time
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print(out)  # print out at each step
tensor([[[-0.0916, -0.0248, -0.0481]]], grad_fn=<StackBackward>)
tensor([[[-0.1536, -0.0358, -0.0787]]], grad_fn=<StackBackward>)
tensor([[[-0.1923, -0.0398, -0.0976]]], grad_fn=<StackBackward>)
tensor([[[-0.2158, -0.0405, -0.1092]]], grad_fn=<StackBackward>)
tensor([[[-0.2301, -0.0398, -0.1162]]], grad_fn=<StackBackward>)
Alternative: feed the whole sequence at once
# inputs shape: (5, 1, 3), i.e. seq_len=5, batch=1, feature=3
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
#out
tensor([[[-0.0916, -0.0248, -0.0481]],
[[-0.1536, -0.0358, -0.0787]],
[[-0.1923, -0.0398, -0.0976]],
[[-0.2158, -0.0405, -0.1092]],
[[-0.2301, -0.0398, -0.1162]]], grad_fn=<StackBackward>)
# hidden = (hidden_state, cell_state):
(tensor([[[-0.2301, -0.0398, -0.1162]]], grad_fn=<StackBackward>),
tensor([[[-0.6446, -0.0941, -0.2772]]], grad_fn=<StackBackward>))
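Both approaches produce the same output. In the second one, hidden[0] equals out[-1], which confirms that output contains the last LSTM layer's hidden states for all time steps, while hn and cn hold every layer's hidden and cell states at the final time step.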
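The tutorial compares the two approaches for a single-layer LSTM. The sketch below repeats the comparison with num_layers=2 to show that the step-by-step loop still matches a single call, as long as the (h, c) tuple carrying the state of both layers is passed back in at each step. The seed, the random inputs, and the tolerance are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
num_layers, hidden_size = 2, 3
lstm = nn.LSTM(3, hidden_size, num_layers)
inputs = [torch.randn(1, 3) for _ in range(5)]  # 5 time steps, each (batch=1, feature=3)

# Step by step: hidden carries the states of BOTH layers between calls
hidden = (torch.zeros(num_layers, 1, hidden_size), torch.zeros(num_layers, 1, hidden_size))
step_outs = []
for x_t in inputs:
    out_t, hidden = lstm(x_t.view(1, 1, -1), hidden)
    step_outs.append(out_t)
step_out = torch.cat(step_outs)  # (5, 1, 3)

# Whole sequence in one call (initial states default to zeros)
seq_out, (h_n, c_n) = lstm(torch.cat(inputs).view(len(inputs), 1, -1))

print(torch.allclose(step_out, seq_out, atol=1e-6))  # True
print(torch.allclose(hidden[0], h_n, atol=1e-6))     # True: final states of both layers match
print(torch.allclose(seq_out[-1], h_n[-1]))          # True: output only exposes the top layer
```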