This post covers the num_layers and bidirectional parameters of torch.nn.LSTM. The tensor dimensions involved are easy to get tangled up in, and reading the source code alone may not make them clear, so we will work through these two parameters by first reasoning about what they should do and then verifying it with code.
Throughout, we use batch_first=False, which is the default.
Set up a batch with sentence length 50, batch_size=3, and embedding_size=10,
and an LSTM with input_size=10 and hidden_size=20.
The simplest case:
num_layers=1, bidirectional=False. We know nn.LSTM returns two values: outputs, and a tuple (h, c), where h is the hidden state and c is the cell state.
outputs has shape (word_len, batch_size, hidden_size):
import torch
import torch.nn as nn

def shp(_):
    print(_.shape)

lstm=nn.LSTM(10,20,1,bidirectional=False)  # input_size=10, hidden_size=20, num_layers=1
batch1=torch.randn(50,3,10)                # (word_len, batch_size, embedding_size)
outputs,(h,c)=lstm(batch1)
shp(outputs) # word_len*batch_size*hidden_size
shp(h)
shp(c)
Output:
torch.Size([50, 3, 20])
torch.Size([1, 3, 20])
torch.Size([1, 3, 20])
Also, the output of the last word is the same as the hidden state. Let's take the first sentence and verify:
print(outputs[-1][0])
print(h[0][0])
Output:
tensor([ 0.1749, -0.3162,  0.0034,  0.0481,  0.1030,  0.1106, -0.2225, -0.0347,
         0.1339,  0.0229,  0.2953, -0.0891, -0.0491,  0.2034, -0.1530,  0.1405,
         0.1547,  0.0420, -0.1418,  0.1041], grad_fn=<SelectBackward>)
tensor([ 0.1749, -0.3162,  0.0034,  0.0481,  0.1030,  0.1106, -0.2225, -0.0347,
         0.1339,  0.0229,  0.2953, -0.0891, -0.0491,  0.2034, -0.1530,  0.1405,
         0.1547,  0.0420, -0.1418,  0.1041], grad_fn=<SelectBackward>)
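The check above only looks at the first sentence. As an extra sanity check of my own (not in the original post), torch.allclose can compare the last time step of outputs against h for the whole batch at once:

# single layer, unidirectional: the output at the last time step
# should equal the (only) hidden state for every sentence in the batch
print(torch.allclose(outputs[-1], h[0]))  # expected: True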
num_layers:
number of recurrent layers. E.g., setting ``num_layers=2`` would mean stacking two LSTMs together to form a `stacked LSTM`, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
So it is simply the number of stacked LSTM layers. With more than one layer, the first layer's input is our embedding, and every subsequent layer's input is the previous layer's output, i.e. each word's hidden state from the layer below; a small sketch of this stacking follows.
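For intuition, here is a minimal sketch of that stacking (my own illustration, with my own variable names lstm_layer1/lstm_layer2, not code from the original post). The weights of a real 2-layer nn.LSTM are of course different, so only the shapes match, not the values:

# manually stacking two single-layer LSTMs:
# layer 2 consumes layer 1's per-word hidden states (its outputs)
lstm_layer1=nn.LSTM(10,20,1)  # input_size=10 -> hidden_size=20
lstm_layer2=nn.LSTM(20,20,1)  # input_size=20, i.e. the previous layer's hidden_size
out1,_=lstm_layer1(batch1)    # (50, 3, 20)
out2,_=lstm_layer2(out1)      # (50, 3, 20), same shape as a 2-layer nn.LSTM's outputs
shp(out2)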
The shapes come out like this:
lstm=nn.LSTM(10,20,2,bidirectional=False)
batch1=torch.randn(50,3,10)
outputs,(h,c)=lstm(batch1)
shp(outputs) # word_len*batch_size*hidden_size
shp(h)
shp(c)
Output:
torch.Size([50, 3, 20])
torch.Size([2, 3, 20])
torch.Size([2, 3, 20])
As you can see, even with more layers, outputs still holds the hidden states of the last (top) layer at every time step, so we expect outputs[-1][0] to equal h[-1][0]. Let's print and check:
print(outputs[-1][0])
print(h[-1][0])
Output:
tensor([ 0.1147, -0.0166, -0.0147, -0.1080, -0.0085,  0.0010,  0.1063,  0.0561,
        -0.0021,  0.0810, -0.0339, -0.0336,  0.0826,  0.0264,  0.0284,  0.1243,
         0.0279,  0.0075,  0.0842, -0.1104], grad_fn=<SelectBackward>)
tensor([ 0.1147, -0.0166, -0.0147, -0.1080, -0.0085,  0.0010,  0.1063,  0.0561,
        -0.0021,  0.0810, -0.0339, -0.0336,  0.0826,  0.0264,  0.0284,  0.1243,
         0.0279,  0.0075,  0.0842, -0.1104], grad_fn=<SelectBackward>)
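Again, as a batch-wide sanity check of my own (not in the original post): with num_layers=2, the last time step of outputs should equal the top layer's hidden state h[-1] for every sentence:

# stacked (num_layers=2), unidirectional: outputs holds the top layer's
# hidden states, so its last time step equals h[-1]
print(torch.allclose(outputs[-1], h[-1]))  # expected: True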
Bidirectional:
This one is a bit trickier. If we also stack layers, each layer has one LSTM running left-to-right and one running right-to-left, and outputs is the concatenation of the last layer's forward and backward hidden states at each time step. Let's look at the dimensions:
lstm=nn.LSTM(10,20,5,bidirectional=True)
batch1=torch.randn(50,3,10)
outputs,(h,c)=lstm(batch1)
shp(outputs) # word_len*batch_size*(2*hidden_size)
shp(h)
shp(c)
Output:
torch.Size([50, 3, 40])
torch.Size([10, 3, 20])
torch.Size([10, 3, 20])
The 10 in the hidden state's shape is 5*2: 5 because there are 5 layers, and *2 because of the two directions. outputs is now 40-dimensional, i.e. 20*2, the left-to-right and right-to-left hidden states concatenated together.
Therefore the first half of the last word's output should equal the second-to-last hidden state (the last layer's forward direction), and the second half of the first word's output should equal the last hidden state (the last layer's backward direction).
That is, outputs[-1][0][:20] should equal h[-2][0], and outputs[0][0][20:] should equal h[-1][0]:
print(outputs[-1][0][:20])
print(h[-2][0])
print(outputs[0][0][20:])
print(h[-1][0])
Output:
tensor([ 0.0664,  0.1122, -0.0704,  0.0698, -0.1094, -0.0060, -0.0375,  0.0151,
         0.1732,  0.0121,  0.1653,  0.0120, -0.1547, -0.0314, -0.1088, -0.0457,
         0.0638,  0.1276,  0.0372, -0.0486], grad_fn=<SliceBackward>)
tensor([ 0.0664,  0.1122, -0.0704,  0.0698, -0.1094, -0.0060, -0.0375,  0.0151,
         0.1732,  0.0121,  0.1653,  0.0120, -0.1547, -0.0314, -0.1088, -0.0457,
         0.0638,  0.1276,  0.0372, -0.0486], grad_fn=<SelectBackward>)
tensor([ 0.0587,  0.0290,  0.0425, -0.0261,  0.0600,  0.0741, -0.0365, -0.1388,
        -0.1384,  0.0442,  0.0273, -0.1147, -0.1305, -0.0457,  0.0475, -0.0961,
        -0.0711, -0.0542,  0.0624,  0.1075], grad_fn=<SliceBackward>)
tensor([ 0.0587,  0.0290,  0.0425, -0.0261,  0.0600,  0.0741, -0.0365, -0.1388,
        -0.1384,  0.0442,  0.0273, -0.1147, -0.1305, -0.0457,  0.0475, -0.0961,
        -0.0711, -0.0542,  0.0624,  0.1075], grad_fn=<SelectBackward>)
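To make the layout of h fully explicit, here is one last sketch of my own (not from the original post). Per the PyTorch docs, h can be viewed as (num_layers, num_directions, batch_size, hidden_size); after that, the last layer's forward and backward states can be compared against the two halves of outputs for the whole batch, not just the first sentence:

# h: (num_layers*num_directions, batch, hidden) -> (num_layers, num_directions, batch, hidden)
h_view=h.view(5,2,3,20)
forward_last=h_view[-1,0]   # last layer, left-to-right direction, shape (3, 20); same as h[-2]
backward_last=h_view[-1,1]  # last layer, right-to-left direction, shape (3, 20); same as h[-1]
print(torch.allclose(outputs[-1,:,:20], forward_last))  # expected: True
print(torch.allclose(outputs[0,:,20:], backward_last))  # expected: True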