PyTorch notes: 07) LSTM

An introductory blog post on LSTMs: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Official API: https://pytorch.org/docs/stable/nn.html?#torch.nn.LSTM

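An example: suppose the input consists of 3 sentences, each sentence has 5 words, and each word is represented by a 10-dimensional word vector. Then seq_len=5, batch=3, input_size=10.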
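The shapes in this example can be sketched as follows. This is a minimal illustration with random data standing in for real word vectors; the variable names `sentences` and `lstm_input` are made up for the sketch.

```python
import torch

batch, seq_len, input_size = 3, 5, 10

# Hypothetical data: 3 sentences x 5 words x 10-dimensional word vectors
sentences = torch.randn(batch, seq_len, input_size)

# nn.LSTM expects (seq_len, batch, input_size) by default,
# so swap the first two dimensions
lstm_input = sentences.transpose(0, 1)
print(lstm_input.shape)  # torch.Size([5, 3, 10])
```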

Core constructor parameters

input_size – The number of expected features in the input x
# Dimension of each word vector, e.g. input_size=10
hidden_size – The number of features in the hidden state h
# Dimension of the hidden state
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results.
# Number of LSTM layers; e.g. num_layers=2 stacks 2 LSTMs, and the output of one LSTM is the input of the next

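One point of confusion: with num_layers=2, two LSTMs are stacked and the output of the first is the input of the second. If the input dimension is 10 and the hidden dimension is 20, is the hidden dimension of the first LSTM 10 or 20?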

# Excerpt from the constructor in PyTorch's RNN source (torch/nn/modules/rnn.py)
for layer in range(num_layers):
    for direction in range(num_directions):
        # num_directions = 1 for a unidirectional LSTM
        layer_input_size = input_size if layer == 0 else hidden_size * num_directions
        w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
        w_hh = Parameter(torch.Tensor(gate_size, hidden_size))

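As the code above shows, the hidden dimension is the same for every LSTM layer. Only the input dimension changes: the input dimension of each later layer is the hidden dimension of the layer before it. Written as (input_size, hidden_size): LSTM1(10, 20) -> LSTM2(20, 20).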
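One way to check this is to inspect the parameter shapes of a 2-layer `nn.LSTM`. The sketch below uses the sizes from the text (10 and 20). The first dimension of each weight is 4 * hidden_size = 80 because the four gates are stacked into one matrix.

```python
import torch.nn as nn

# Same sizes as in the text: input_size=10, hidden_size=20, num_layers=2
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# Collect the shape of every parameter, keyed by its name
shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}

print(shapes["weight_ih_l0"])  # (80, 10): layer 0 reads the 10-dim input
print(shapes["weight_hh_l0"])  # (80, 20): hidden size is 20
print(shapes["weight_ih_l1"])  # (80, 20): layer 1 reads layer 0's 20-dim output
print(shapes["weight_hh_l1"])  # (80, 20): hidden size is still 20
```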

Inputs: input, (h_0, c_0)

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. 
# See the example above; note that the dimension order differs from Keras: batch is the second dimension (unless batch_first=True is set)
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
# Initial hidden state
c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.
# Initial cell (memory) state
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
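The default described above can be checked with a small sketch: calling the module without an initial state should match calling it with explicit zero tensors. The sizes are taken from the running example; the seed and variable names are arbitrary choices for the illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

# No initial state given: h_0 and c_0 default to zeros
out_default, (hn_default, cn_default) = lstm(x)

# Explicit zero initial state of shape (num_layers * num_directions, batch, hidden_size)
zeros = torch.zeros(2, 3, 20)
out_zeros, (hn_zeros, cn_zeros) = lstm(x, (zeros, zeros))

same = torch.allclose(out_default, out_zeros)
print(same)
```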

Outputs: output, (h_n, c_n)

output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. 
# Analogous to input; see the example below for details
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
# Final hidden state
c_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len
# Final cell (memory) state
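To see where num_directions enters these shapes, here is a minimal sketch with `bidirectional=True` (so num_directions=2), reusing the sizes from the running example. The bidirectional setting is added here only for illustration; the rest of these notes uses the unidirectional default.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 40]): hidden_size * num_directions
print(h_n.shape)     # torch.Size([4, 3, 20]): num_layers * num_directions
print(c_n.shape)     # torch.Size([4, 3, 20])
```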

Example from the official API docs

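import torch
import torch.nn as nn

# Word-vector dimension 10, hidden dimension 20, 2 stacked LSTM layers
rnn = nn.LSTM(10, 20, 2)
# Sequence length seq_len=5, batch_size=3, word-vector dimension 10
input = torch.randn(5, 3, 10)
# Initial hidden and cell states; they usually have the same shape
# 2 LSTM layers, batch_size=3, hidden dimension 20
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
# There are 2 LSTM layers here. output holds the last layer's hidden state for every
# time step (word vector); its shape does not depend on the number of layers, only on the sequence length
# hn and cn are the hidden and cell states of every layer at the final time step
output, (hn, cn) = rnn(input, (h0, c0))

print(output.size(), hn.size(), cn.size())
# Output:
torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])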
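The relationship between output and hn in a multi-layer LSTM can be sketched as follows: for a unidirectional model, the last time step of output should equal the last layer's entry in hn. The seed and variable names are arbitrary choices for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)

output, (hn, cn) = rnn(x)

# output[-1]: last layer's hidden state at the final time step, shape (3, 20)
# hn[-1]:     final hidden state of the last layer, shape (3, 20)
last_layer_matches = torch.allclose(output[-1], hn[-1])
print(last_layer_matches)

# hn[0] belongs to the first layer, so it is generally not the same as output[-1]
print(torch.allclose(output[-1], hn[0]))
```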

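Example from the official tutorials
(Comparing the results of two ways of writing it; in this code the initial h0 and c0 have been changed to zeros.)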

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.zeros(1, 3) for _ in range(5)]  # Sequence of length 5, batch=1, word-vector dimension 3
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))  # 1 LSTM layer, batch of 1, hidden dimension 3
# Feed each word vector through the LSTM one at a time
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print(out)  # Print out at every step
	
tensor([[[-0.0916, -0.0248, -0.0481]]], grad_fn=<StackBackward>)
tensor([[[-0.1536, -0.0358, -0.0787]]], grad_fn=<StackBackward>)
tensor([[[-0.1923, -0.0398, -0.0976]]], grad_fn=<StackBackward>)
tensor([[[-0.2158, -0.0405, -0.1092]]], grad_fn=<StackBackward>)
tensor([[[-0.2301, -0.0398, -0.1162]]], grad_fn=<StackBackward>)

The other way: treat the sequence as a whole

# inputs size: (5, 1, 3), i.e. seq_len=5, batch=1, feature=3
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3)) 
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)

#out
tensor([[[-0.0916, -0.0248, -0.0481]],
        [[-0.1536, -0.0358, -0.0787]],
        [[-0.1923, -0.0398, -0.0976]],
        [[-0.2158, -0.0405, -0.1092]],
        [[-0.2301, -0.0398, -0.1162]]], grad_fn=<StackBackward>)
#hidden(hidden_state,cell_state):
(tensor([[[-0.2301, -0.0398, -0.1162]]], grad_fn=<StackBackward>),
 tensor([[[-0.6446, -0.0941, -0.2772]]], grad_fn=<StackBackward>))

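Both approaches produce the same output. In the second approach, hidden[0] has the same values as out[-1], which confirms that output holds the hidden states of the last LSTM layer at every time step, while hn and cn are the hidden and cell states of all layers at the final time step.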
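The same comparison can be done programmatically instead of by reading the printed tensors. The sketch below uses random inputs rather than the zeros above, and a small tolerance in case the two code paths differ by floating-point rounding; both choices are assumptions of this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
lstm = nn.LSTM(3, 3)
inputs = torch.randn(5, 1, 3)  # (seq_len, batch, input_size)

# Approach 1: feed one time step at a time, carrying the state forward
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
step_outs = []
for x_t in inputs:
    out_t, hidden = lstm(x_t.view(1, 1, -1), hidden)
    step_outs.append(out_t)
stepwise = torch.cat(step_outs)  # shape (5, 1, 3)

# Approach 2: feed the whole sequence at once
init = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
whole, (h_n, c_n) = lstm(inputs, init)

print(torch.allclose(stepwise, whole, atol=1e-6))     # outputs agree
print(torch.allclose(h_n[0], whole[-1]))              # h_n is the last output step
print(torch.allclose(hidden[0], h_n, atol=1e-6))      # final states agree
```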
