torch.nn.LSTM Explained

This post focuses on the num_layers and bidirectional parameters of torch.nn.LSTM. The tensor dimensions involved are easy to get tangled up in, so reading the source or docs alone may not be enough; here we combine explanation with hands-on verification to learn how these two parameters behave.

We use batch_first=False throughout, which is the default.

Set up a batch: sequence length 50, batch_size=3, embedding_size=10.

Set up an LSTM: input_size=10, hidden_size=20.
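(As a quick aside, here is a minimal sketch, for reference only and not used in the rest of the post, of what batch_first=True would change: only the input and output tensors swap their first two dimensions, while h and c keep the same layout.)

import torch
import torch.nn as nn

lstm_bf = nn.LSTM(10, 20, 1, batch_first=True)
x_bf = torch.randn(3, 50, 10)    # (batch_size, seq_len, embedding_size)
out_bf, (h_bf, c_bf) = lstm_bf(x_bf)
print(out_bf.shape)              # torch.Size([3, 50, 20]) -- batch dimension first
print(h_bf.shape)                # torch.Size([1, 3, 20])  -- h/c are not affected by batch_first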

The simplest case:

num_layers=1, bidirectional=False. As we know, nn.LSTM returns two values: outputs, and a tuple (h, c), where h is the hidden state and c is the cell state.

outputs has shape (seq_len, batch_size, hidden_size):

import torch
import torch.nn as nn

def shp(t):
    print(t.shape)


lstm = nn.LSTM(10, 20, 1, bidirectional=False)  # input_size=10, hidden_size=20, num_layers=1
batch1 = torch.randn(50, 3, 10)                 # (seq_len, batch_size, embedding_size)
outputs, (h, c) = lstm(batch1)
shp(outputs)  # (seq_len, batch_size, hidden_size)
shp(h)        # (num_layers * num_directions, batch_size, hidden_size)
shp(c)        # same shape as h

Output:

torch.Size([50, 3, 20])
torch.Size([1, 3, 20])
torch.Size([1, 3, 20])

Also, the output at the last time step should equal the final hidden state. Let's verify with the first sequence in the batch:

print(outputs[-1][0])  # output at the last time step, first sequence
print(h[0][0])         # final hidden state of the (only) layer, first sequence

Output:

tensor([ 0.1749, -0.3162,  0.0034,  0.0481,  0.1030,  0.1106, -0.2225, -0.0347,
         0.1339,  0.0229,  0.2953, -0.0891, -0.0491,  0.2034, -0.1530,  0.1405,
         0.1547,  0.0420, -0.1418,  0.1041], grad_fn=<SelectBackward>)
tensor([ 0.1749, -0.3162,  0.0034,  0.0481,  0.1030,  0.1106, -0.2225, -0.0347,
         0.1339,  0.0229,  0.2953, -0.0891, -0.0491,  0.2034, -0.1530,  0.1405,
         0.1547,  0.0420, -0.1418,  0.1041], grad_fn=<SelectBackward>)
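Rather than just eyeballing the printed numbers, we can also compare the whole batch programmatically (a small sketch reusing the lstm, outputs and h from the snippet above):

# outputs[-1] is the last time step for every sequence in the batch;
# h[0] is the final hidden state of the (only) layer for every sequence.
print(torch.allclose(outputs[-1], h[0]))  # True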

num_layers:

number of recurrent layers. E.g., setting ``num_layers=2`` would mean stacking two LSTMs together to form a `stacked LSTM`, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1

In short, it is simply the number of stacked LSTM layers. With multiple layers, the first layer's input is our embedding, and every later layer takes the previous layer's output (i.e. each time step's hidden state) as its input (a hand-rolled version of this stacking is sketched after the verification below).

The shapes look like this:

lstm = nn.LSTM(10, 20, 2, bidirectional=False)  # num_layers=2
batch1 = torch.randn(50, 3, 10)
outputs, (h, c) = lstm(batch1)
shp(outputs)  # (seq_len, batch_size, hidden_size) -- only the top layer's hidden states
shp(h)        # (num_layers, batch_size, hidden_size)
shp(c)

Output:

torch.Size([50, 3, 20])
torch.Size([2, 3, 20])
torch.Size([2, 3, 20])

As you can see, with more layers, outputs is still the sequence of hidden states from the last (top) layer, so we would expect outputs[-1][0] to equal h[-1][0]. Let's print and check:

print(outputs[-1][0])  # top layer's output at the last time step, first sequence
print(h[-1][0])        # final hidden state of the top (last) layer, first sequence

Output:

tensor([ 0.1147, -0.0166, -0.0147, -0.1080, -0.0085,  0.0010,  0.1063,  0.0561,
        -0.0021,  0.0810, -0.0339, -0.0336,  0.0826,  0.0264,  0.0284,  0.1243,
         0.0279,  0.0075,  0.0842, -0.1104], grad_fn=<SelectBackward>)
tensor([ 0.1147, -0.0166, -0.0147, -0.1080, -0.0085,  0.0010,  0.1063,  0.0561,
        -0.0021,  0.0810, -0.0339, -0.0336,  0.0826,  0.0264,  0.0284,  0.1243,
         0.0279,  0.0075,  0.0842, -0.1104], grad_fn=<SelectBackward>)
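To make the stacking concrete, here is a minimal sketch (my own illustration, not from the original post, reusing shp from above) that chains two single-layer LSTMs by hand. The shapes match the 2-layer case above; the values will differ because the weights are initialized independently.

# "layer 1": embedding (size 10) -> hidden (size 20)
lstm1 = nn.LSTM(10, 20, 1)
# "layer 2": consumes layer 1's per-step hidden states (size 20)
lstm2 = nn.LSTM(20, 20, 1)

x = torch.randn(50, 3, 10)        # (seq_len, batch_size, embedding_size)
out1, (h1, c1) = lstm1(x)
out2, (h2, c2) = lstm2(out1)      # feed layer 1's outputs into layer 2

shp(out2)                         # torch.Size([50, 3, 20]) -- same as the 2-layer outputs
shp(torch.cat([h1, h2], dim=0))   # torch.Size([2, 3, 20])  -- same layout as the 2-layer h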

Bidirectional:

This one is a bit more involved. If we also stack layers, each layer has one LSTM running left to right and one running right to left, and outputs is the concatenation of the top layer's forward and backward hidden states at each time step. Check the dimensions:

lstm = nn.LSTM(10, 20, 5, bidirectional=True)  # num_layers=5, two directions per layer
batch1 = torch.randn(50, 3, 10)
outputs, (h, c) = lstm(batch1)
shp(outputs)  # (seq_len, batch_size, num_directions * hidden_size)
shp(h)        # (num_layers * num_directions, batch_size, hidden_size)
shp(c)

Output:

torch.Size([50, 3, 40])
torch.Size([10, 3, 20])
torch.Size([10, 3, 20])

The 10 in the hidden state shape is 5 * 2: 5 because there are 5 layers, and * 2 because of the two directions. outputs is now 40-dimensional, i.e. 20 * 2, the concatenation of the left-to-right and right-to-left hidden states.

Therefore the first half of the last time step's output should equal the second-to-last hidden state (the top layer's forward direction), and the second half of the first time step's output should equal the last hidden state (the top layer's backward direction).

That is, outputs[-1][0][:20] should equal h[-2][0], and outputs[0][0][20:] should equal h[-1][0]:

print(outputs[-1][0][:20])  # forward half of the last time step's output, first sequence
print(h[-2][0])             # top layer's final forward hidden state, first sequence

print(outputs[0][0][20:])   # backward half of the first time step's output, first sequence
print(h[-1][0])             # top layer's final backward hidden state, first sequence

Output:

tensor([ 0.0664,  0.1122, -0.0704,  0.0698, -0.1094, -0.0060, -0.0375,  0.0151,
         0.1732,  0.0121,  0.1653,  0.0120, -0.1547, -0.0314, -0.1088, -0.0457,
         0.0638,  0.1276,  0.0372, -0.0486], grad_fn=<SliceBackward>)
tensor([ 0.0664,  0.1122, -0.0704,  0.0698, -0.1094, -0.0060, -0.0375,  0.0151,
         0.1732,  0.0121,  0.1653,  0.0120, -0.1547, -0.0314, -0.1088, -0.0457,
         0.0638,  0.1276,  0.0372, -0.0486], grad_fn=<SelectBackward>)
tensor([ 0.0587,  0.0290,  0.0425, -0.0261,  0.0600,  0.0741, -0.0365, -0.1388,
        -0.1384,  0.0442,  0.0273, -0.1147, -0.1305, -0.0457,  0.0475, -0.0961,
        -0.0711, -0.0542,  0.0624,  0.1075], grad_fn=<SliceBackward>)
tensor([ 0.0587,  0.0290,  0.0425, -0.0261,  0.0600,  0.0741, -0.0365, -0.1388,
        -0.1384,  0.0442,  0.0273, -0.1147, -0.1305, -0.0457,  0.0475, -0.0961,
        -0.0711, -0.0542,  0.0624,  0.1075], grad_fn=<SelectBackward>)
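Two idioms follow from this layout (a sketch under the same setup as above, not part of the original post): outputs can be viewed as (seq_len, batch_size, num_directions, hidden_size) to split the two directions, and a common sentence representation is the concatenation of the top layer's final forward and backward states.

# split outputs into forward / backward halves
out_dir = outputs.view(50, 3, 2, 20)              # (seq_len, batch, num_directions, hidden_size)
print(torch.allclose(out_dir[-1, :, 0], h[-2]))   # forward half of last step == h[-2] -> True
print(torch.allclose(out_dir[0, :, 1], h[-1]))    # backward half of first step == h[-1] -> True

# common sentence vector: concat the top layer's final forward and backward states
sent_vec = torch.cat([h[-2], h[-1]], dim=1)       # (batch_size, 2 * hidden_size)
shp(sent_vec)                                     # torch.Size([3, 40])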
