PyTorch official tutorial study notes 11: LSTM

Link

1. The input must be a 3D tensor:

Pytorch’s LSTM expects all of its inputs to be 3D tensors. The semantics of the axes of these tensors is important. The first axis is the sequence itself, the second indexes instances in the mini-batch, and the third indexes elements of the input.

The first dimension is the sequence length (how many time steps are fed through at once), the second indexes samples in the mini-batch, and the third is the input feature dimension.
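For example, a minimal sketch (the sizes are chosen arbitrarily) of an input laid out as (seq_len, batch, input_size):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=3, hidden_size=3)
seq = torch.randn(5, 1, 3)        # (seq_len=5, batch=1, input_size=3)
out, (h_n, c_n) = lstm(seq)       # initial states default to zeros if not given
print(out.shape)                  # torch.Size([5, 1, 3])
print(h_n.shape)                  # torch.Size([1, 1, 3])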

2. The LSTM expects two hidden states (actually one h and one c):

lstm expects two hidden states

Why? It was not obvious at first, but it is because the next time step uses both c_{t-1} and h_{t-1}.

It presumably follows from the LSTM update formulas.
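For reference, these are the standard single-layer LSTM update equations (as listed in the PyTorch docs); both h_{t-1} and c_{t-1} enter the computation at step t, which is why both must be carried along:

$$
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$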

They are passed in as a tuple of this form:

hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
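A minimal sketch of how this (h, c) tuple is threaded through the LSTM one time step at a time, in the spirit of the loop from the official tutorial:

import torch
import torch.nn as nn

lstm = nn.LSTM(3, 3)                                   # input dim 3, hidden dim 3
inputs = [torch.randn(1, 3) for _ in range(5)]         # a sequence of length 5
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # (h_0, c_0)
for x in inputs:
    # each step consumes the previous (h, c) and returns the updated pair
    out, hidden = lstm(x.view(1, 1, -1), hidden)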

3. Code: making the relationship between hidden and out clearer.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument  to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)

[Screenshots: printed values of out and hidden]
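To verify the statement in the comments (the last slice of out equals the final hidden state h_n), a quick check can be appended after the print statements, for instance:

# hidden[0] is h_n with shape (1, 1, 3); out[-1] is the output of the last time step
print(torch.allclose(out[-1], hidden[0].squeeze(0)))  # True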

Reading the explanation in the code comments carefully, that is indeed how it works, which makes the relationship clearer.
The same applies to GRU.
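As a quick side check (a minimal sketch, not from the tutorial), nn.GRU behaves the same way regarding out and the final hidden state, except that it carries no cell state:

import torch
import torch.nn as nn

gru = nn.GRU(3, 3)
inputs = torch.randn(5, 1, 3)       # (seq_len, batch, input_size)
out, h_n = gru(inputs)              # a GRU returns a single hidden state, no cell state
print(torch.allclose(out[-1], h_n.squeeze(0)))  # True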

4. If the LSTM is bidirectional, the sizes are no longer in a 1:1 relation but 1:2: the output's last dimension becomes 2 * hidden_size (the two directions are concatenated), and the hidden-state tensors get num_layers * 2 in their first dimension.

Reference
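A minimal sketch of the shape change with bidirectional=True (the sizes are illustrative):

import torch
import torch.nn as nn

bi_lstm = nn.LSTM(input_size=3, hidden_size=3, bidirectional=True)
inputs = torch.randn(5, 1, 3)
out, (h_n, c_n) = bi_lstm(inputs)
print(out.shape)   # torch.Size([5, 1, 6]) -> last dim is 2 * hidden_size
print(h_n.shape)   # torch.Size([2, 1, 3]) -> num_layers * num_directions = 2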

5. Official documentation:

Link
Parameter descriptions:
[Screenshot: nn.LSTM parameter table from the docs]
num_layers: the number of stacked LSTM layers. The explanation also describes how a multi-layer LSTM is computed:
**setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results.**
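A minimal sketch of a stacked LSTM with num_layers=2 (the sizes are illustrative):

import torch
import torch.nn as nn

stacked = nn.LSTM(input_size=3, hidden_size=3, num_layers=2)
inputs = torch.randn(5, 1, 3)
out, (h_n, c_n) = stacked(inputs)
print(out.shape)   # torch.Size([5, 1, 3]) -> out holds only the top layer's hidden states
print(h_n.shape)   # torch.Size([2, 1, 3]) -> one final hidden state per layer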
