input_size: The number of expected features in the input `x`
hidden_size: The number of features in the hidden state `h`
num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
    would mean stacking two LSTMs together to form a `stacked LSTM`,
    with the second LSTM taking in outputs of the first LSTM and
    computing the final results. Default: 1
bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
    Default: ``True``
batch_first: If ``True``, then the input and output tensors are provided
    as (batch, seq, feature). Default: ``False``
dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
    LSTM layer except the last layer, with dropout probability equal to
    :attr:`dropout`. Default: 0
bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``
nn.LSTM takes seven constructor arguments in total; only the first two (input_size and hidden_size) are required, and the rest have the defaults listed above.
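As a quick, illustrative sketch of how batch_first and bidirectional change the tensor shapes (the variable names below are my own, not from the example that follows):

import torch
import torch.nn as nn

# Bidirectional, batch-first LSTM: input_size=10, hidden_size=20, num_layers=2
rnn = nn.LSTM(10, 20, num_layers=2, batch_first=True, bidirectional=True)
# With batch_first=True the input is (batch, seq, feature): batch_size=3, seq_len=5, feature=10
x = torch.randn(3, 5, 10)
# h0 and c0 default to zeros when omitted
output, (hn, cn) = rnn(x)
print(output.size())  # torch.Size([3, 5, 40])  -> last dim is 2 * hidden_size (both directions)
print(hn.size())      # torch.Size([4, 3, 20])  -> first dim is num_layers * num_directions = 4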
# First import the modules the LSTM needs
import torch
import torch.nn as nn  # neural network module

# Input feature size 10, hidden size 20, 2 stacked LSTM layers (if 1, it can be omitted; the default is 1)
rnn = nn.LSTM(10, 20, 2)
# Sequence length seq_len=5, batch_size=3, input feature size=10
input = torch.randn(5, 3, 10)
# Initial hidden state and cell state; they normally have the same shape:
# 2 LSTM layers, batch_size=3, hidden size 20
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
# With 2 stacked layers, output holds the last layer's hidden state at every time step,
# so its first dimension depends on the sequence length, not on the number of layers.
# hn and cn are the hidden state and cell state of every layer at the final time step.
output, (hn, cn) = rnn(input, (h0, c0))
print(output.size(), hn.size(), cn.size())
torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])
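One way to see how output relates to hn in this unidirectional case: the last time step of output is exactly the last layer's entry in hn. A small check, reusing the variables from the snippet above:

# The last time step of `output` is the final hidden state of the topmost layer
print(torch.allclose(output[-1], hn[-1]))  # True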
import torch as t
from torch import nn

t.manual_seed(1000)  # fix the random seed so the results below are reproducible
# seq_len=2, batch_size=3, input feature size=4
input = t.randn(2, 3, 4)
# An LSTMCell is a single cell, so it always corresponds to exactly one layer
lstm = nn.LSTMCell(4, 3)
hx = t.randn(3, 3)
cx = t.randn(3, 3)
out = []
# Step through the sequence manually, one time step at a time
for i in input:
    hx, cx = lstm(i, (hx, cx))
    out.append(hx)
print(t.stack(out))
Output:
tensor([[[-0.3610, -0.1643,  0.1631],
         [-0.0613, -0.4937, -0.1642],
         [ 0.5080, -0.4175,  0.2502]],

        [[-0.0703, -0.0393, -0.0429],
         [ 0.2085, -0.3005, -0.2686],
         [ 0.1482, -0.4728,  0.1425]]], grad_fn=<StackBackward>)
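For comparison, the same single-layer recurrence can be run with nn.LSTM instead of a manual loop over time steps. A minimal sketch with the same shapes (feature size 4, hidden size 3; the variable names are illustrative), mainly to show that the initial states gain a leading num_layers * num_directions dimension:

import torch as t
from torch import nn

# Same shapes as the LSTMCell example: seq_len=2, batch_size=3, feature size=4, hidden size=3
lstm_full = nn.LSTM(4, 3)      # num_layers defaults to 1
x = t.randn(2, 3, 4)
h0 = t.randn(1, 3, 3)          # (num_layers * num_directions, batch, hidden_size)
c0 = t.randn(1, 3, 3)
output, (hn, cn) = lstm_full(x, (h0, c0))
print(output.size())           # torch.Size([2, 3, 3]) -- one hidden state per time step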