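Reference article: Zhihu, "tf.nn.dynamic_rnn explained"
In short, implementing any RNN-family architecture in TensorFlow comes down to defining a cell and passing it to an RNN function, which returns the outputs. Whatever type of cell you define is, in essence, the type of RNN you get. (All code in this post uses the TensorFlow 1.x API.)
1. How TensorFlow defines the RNN function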
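tf.nn.dynamic_rnn(
    cell, # the RNN cell (memory unit)
    inputs, # the input sequences
    sequence_length=None, # actual length of each sequence in the batch, shape [batch_size]
    initial_state=None, # initial state of the RNN
    dtype=None, # data type of the initial state and outputs (needed when initial_state is not given)
    parallel_iterations=None, # number of loop iterations to run in parallel
    swap_memory=False, # swap forward-pass tensors needed for backprop from GPU to CPU to save GPU memory
    time_major=False, # layout (shape) of inputs and outputs
    scope=None # variable scope, defaults to "rnn"
)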
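Here, cell determines the type of RNN: a basic RNN cell gives the vanilla RNN structure, an LSTM cell gives an LSTM, and a GRU cell gives a GRU.
inputs is the actual input data. sequence_length gives the true (unpadded) length of each sequence in the batch, i.e. the number of time steps that actually carry data.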
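When the padding is all zeros, as in the examples later in this post, the lengths can be recovered from the data itself. The sketch below is a NumPy illustration only: `infer_lengths` is a hypothetical helper name, not a TensorFlow function, and it assumes that padding appears only at the end of a sequence and that no real time step is entirely zero.

```python
import numpy as np

def infer_lengths(batch):
    """Count, per sample, the time steps that contain at least one non-zero feature.

    Hypothetical helper: assumes trailing all-zero rows are padding.
    """
    used = np.any(batch != 0, axis=2)   # [batch_size, max_time] booleans
    return used.sum(axis=1)             # [batch_size] integer lengths

X = np.random.randn(3, 6, 4)            # [batch_size, max_time, input_size]
X[1, 4:] = 0                            # pad the second sample after step 4
X_lengths = infer_lengths(X)
print(X_lengths)
```

The resulting array can be passed as sequence_length in place of a hand-written list.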
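time_major determines the shape of the inputs and outputs. What does that mean?
Usually the inputs (inputs here) have shape `[batch_size, max_time, embedding_size]` and the outputs (the output discussed below) have shape `[batch_size, max_time, cell.output_size]`. If time_major is True, the function treats the first two dimensions of both as `[max_time, batch_size, ...]`; if it is False, it treats them as `[batch_size, max_time, ...]`.
The default value of this parameter is False.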
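To make the two layouts concrete, here is a small NumPy sketch (not TensorFlow code) that converts a batch-major array to time-major by swapping the first two axes. The array sizes are just the ones used in the examples of this post.

```python
import numpy as np

batch_major = np.random.randn(3, 6, 4)             # [batch_size, max_time, input_size]
time_major = np.transpose(batch_major, (1, 0, 2))  # [max_time, batch_size, input_size]

print(batch_major.shape, time_major.shape)

# The same element is addressed with the two leading indices swapped.
same = time_major[4, 1, 2] == batch_major[1, 4, 2]
print(same)
```

If the data is already stored time-major, passing time_major=True avoids this transpose.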
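The function returns two values: the RNN outputs at every time step, and the final state of the RNN.
The second value needs some care, though, because its structure depends on the type of cell.
Why? In an LSTM the cell state and the output are different things, whereas in a GRU they are the same value, so the returned state differs slightly in shape.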
2. A simple vanilla RNN model
Without further ado, here is the code:
import tensorflow as tf
import numpy as np
# Input: 3 samples, 6 time steps, 4 features per step
X = np.random.randn(3, 6, 4)
# Zero-pad the second sample after its 4th time step
X[1, 4:] = 0
# Actual length of each sequence
X_lengths = [6, 4, 6]
# Number of hidden units; determines the last dimension of the output
rnn_hidden_size = 5
# Define the RNN cell
cell = tf.nn.rnn_cell.BasicRNNCell(num_units=rnn_hidden_size)
# Run the RNN
o, s = tf.nn.dynamic_rnn(cell=cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run([tf.shape(o), o]))
print(sess.run([tf.shape(s), s]))
The code is quite simple. As noted at the start, you define a cell, call an RNN function, and you are done.
Here is the output:
[array([3, 6, 5]), array([[[ 0.17310878, 0.82633802, -0.43270981, 0.06905287,
-0.44548788],
[ 0.76207901, 0.14140098, -0.95680809, 0.72005081,
-0.15649403],
[ 0.79388277, -0.97675279, -0.33644186, 0.95320421,
-0.60705681],
[ 0.00563942, -0.02027826, 0.89590012, -0.22456675,
-0.40984772],
[-0.02721506, -0.87714997, -0.43034662, -0.93520363,
0.94834008],
[ 0.25531945, 0.93336422, -0.92178408, 0.30199629,
-0.92172056]],
[[ 0.28433064, -0.89897588, 0.4130407 , 0.55888719,
-0.40204589],
[ 0.41459254, 0.3597689 , 0.9548185 , -0.00866829,
0.50680063],
[-0.06959048, -0.4649923 , 0.94124415, 0.08926017,
0.33270379],
[ 0.69817465, 0.95005181, 0.70850335, 0.5241701 ,
-0.53791173],
[ 0. , 0. , 0. , 0. ,
0. ],
[ 0. , 0. , 0. , 0. ,
0. ]],
[[-0.12966264, -0.32701574, -0.74199627, -0.49359511,
-0.32056881],
[-0.44894617, -0.6809439 , 0.63751225, 0.11421618,
0.12798053],
[-0.13901253, 0.86462562, -0.49524682, -0.77128572,
-0.71333543],
[ 0.16060172, -0.47568445, -0.54749102, 0.39206036,
0.46851311],
[-0.94127998, -0.37428214, 0.9176711 , -0.75276436,
0.52876751],
[ 0.60046028, -0.76555278, 0.69193852, 0.76096789,
-0.72530337]]])]
[array([3, 5]), array([[ 0.25531945, 0.93336422, -0.92178408, 0.30199629, -0.92172056],
[ 0.69817465, 0.95005181, 0.70850335, 0.5241701 , -0.53791173],
[ 0.60046028, -0.76555278, 0.69193852, 0.76096789, -0.72530337]])]
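As you can see, the RNN returns two items. The first is o (the output), the hidden-state output at every time step; this is the same for RNN, LSTM and GRU. The one that differs is s (the state), which is the final state. Nothing unusual shows up here yet; we will come back to it in the LSTM section.
Let's look at the model's output in more detail:
The output has shape [3, 6, 5].
The 3 is the batch_size;
the 6 is the maximum number of time steps, i.e. the longest sequence length. Notice the two rows of zeros in the second sample: sequence_length says the second sample is only 4 steps long, so the outputs past step 4 are zeroed;
the 5 is the dimension of the hidden state. It is 5 because we passed num_units=5 when defining the RNN cell, and that value sets the output dimension.
The state has shape [3, 5].
Look closely and you will see that the state is simply the last valid output of each sample. For the second sample, whose sequence length is 4, the final state is the output at time step 4.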
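The masking behaviour described above can be reproduced with a few lines of NumPy. This is only a sketch of the idea, not TensorFlow's implementation: `simple_rnn` is a hypothetical function, the weights are random, and the tanh update with separate input and recurrent matrices is the textbook vanilla RNN formulation.

```python
import numpy as np

def simple_rnn(x, lengths, w_x, w_h, b):
    """Vanilla RNN over a padded batch, honouring per-sample lengths."""
    batch_size, max_time, _ = x.shape
    hidden = w_h.shape[0]
    lengths = np.asarray(lengths)
    h = np.zeros((batch_size, hidden))
    outputs = np.zeros((batch_size, max_time, hidden))
    for t in range(max_time):
        h_new = np.tanh(x[:, t] @ w_x + h @ w_h + b)
        active = (t < lengths)[:, None]               # True while the sample still has data
        outputs[:, t] = np.where(active, h_new, 0.0)  # finished samples emit zeros
        h = np.where(active, h_new, h)                # finished samples keep their last state
    return outputs, h

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 6, 4))
x[1, 4:] = 0
w_x = rng.standard_normal((4, 5))
w_h = rng.standard_normal((5, 5))
b = np.zeros(5)

outputs, state = simple_rnn(x, [6, 4, 6], w_x, w_h, b)
print(outputs.shape, state.shape)
print(np.allclose(state[1], outputs[1, 3]))  # state of sample 2 is its output at step 4
```

Freezing the state once a sample is finished is what makes the final state equal the last valid output rather than whatever the cell would compute on padding.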
3. Implementing an LSTM model
Again, code first:
import tensorflow as tf
import numpy as np
# Input: 3 samples, 6 time steps, 4 features per step
X = np.random.randn(3, 6, 4)
# Zero-pad the second sample after its 4th time step
X[1, 4:] = 0
# Actual length of each sequence
X_lengths = [6, 4, 6]
# Number of hidden units; determines the last dimension of the output
rnn_hidden_size = 5
# Define the LSTM cell
cell = tf.nn.rnn_cell.LSTMCell(num_units=rnn_hidden_size)
# Run the RNN
o, s = tf.nn.dynamic_rnn(cell=cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run([tf.shape(o), o]))
print(sess.run([tf.shape(s), s]))
Look closely and you will see that, compared with the RNN model above, the only change is the definition of the cell.
Let's see how the output differs:
[array([3, 6, 5]), array([[[ 0.0970524 , 0.04119286, -0.08474181, -0.18085976,
-0.07741971],
[-0.02223423, -0.10350601, -0.08190117, -0.09554744,
0.00527808],
[ 0.09914153, -0.04312684, -0.06642732, -0.14581045,
-0.08470683],
[ 0.0531387 , -0.05552092, -0.12808912, -0.15655323,
-0.02803473],
[ 0.18570773, 0.00828065, 0.02929196, -0.02115647,
-0.20298142],
[ 0.097577 , 0.00784962, -0.18960612, -0.1939976 ,
-0.02686524]],
[[ 0.05894009, 0.0371513 , -0.147196 , -0.12490672,
0.00890823],
[-0.13846855, 0.0048299 , -0.27920325, -0.10103866,
0.19092917],
[-0.0160551 , 0.04513064, -0.28024583, -0.06436632,
0.12552706],
[-0.06964844, -0.08109376, -0.04003272, 0.13113396,
0.0881404 ],
[ 0. , 0. , 0. , 0. ,
0. ],
[ 0. , 0. , 0. , 0. ,
0. ]],
[[-0.07211149, 0.01887877, -0.12521735, -0.05033309,
0.10820924],
[-0.08311677, -0.00736387, -0.17241497, -0.09276841,
0.11316439],
[ 0.10335096, 0.00665376, -0.02476282, -0.21087149,
-0.11191987],
[ 0.09959326, 0.06205221, -0.01933063, -0.07346646,
-0.11060355],
[-0.04239206, -0.20163064, -0.02140169, -0.02752217,
0.04012931],
[-0.10801504, -0.04490684, -0.02402958, 0.0872379 ,
0.08214646]]])]
[array([2, 3, 5]), LSTMStateTuple(c=array([[ 0.18660455, 0.01673364, -0.43402739, -0.31572953, -0.05726592],
[-0.18487706, -0.12826426, -0.07106543, 0.46722466, 0.19727486],
[-0.17872193, -0.10419721, -0.04430988, 0.18415518, 0.12666562]]), h=array([[ 0.097577 , 0.00784962, -0.18960612, -0.1939976 , -0.02686524],
[-0.06964844, -0.08109376, -0.04003272, 0.13113396, 0.0881404 ],
[-0.10801504, -0.04490684, -0.02402958, 0.0872379 , 0.08214646]]))]
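The first item is the output, still with shape [3, 6, 5]; as before, it is the LSTM's hidden-layer output at every time step.
The state, however, differs from the RNN case. Here tf.shape reports [2, 3, 5], because the state is an LSTMStateTuple holding two [3, 5] tensors, which tf.shape packs into a single tensor.
Compared with the RNN there is one extra [3, 5] tensor. The reason is that, unlike the basic RNN and the GRU, an LSTM's cell state and its output are not the same thing.
Specifically, an output gate converts the cell state into the hidden state (i.e. the output).
So the state has two parts: c (the cell state) holds each sample's cell state at its last time step, and h (the hidden state) holds each sample's hidden state at its last time step.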
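The relation between c and h can be sketched as a single LSTM step in NumPy. This follows the standard textbook LSTM equations; the gate ordering inside the weight matrix, the missing forget bias, and the name `lstm_step` are assumptions made for illustration and do not claim to match TensorFlow's LSTMCell internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One LSTM time step; returns the new hidden state h and cell state c."""
    z = np.concatenate([x_t, h_prev], axis=1) @ w + b   # [batch, 4 * hidden]
    i, g, f, o = np.split(z, 4, axis=1)                 # input gate, candidate, forget gate, output gate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)   # cell state
    h = sigmoid(o) * np.tanh(c)                         # hidden state: cell state filtered by the output gate
    return h, c

rng = np.random.default_rng(0)
batch_size, input_size, hidden = 3, 4, 5
x_t = rng.standard_normal((batch_size, input_size))
h_prev = np.zeros((batch_size, hidden))
c_prev = np.zeros((batch_size, hidden))
w = rng.standard_normal((input_size + hidden, 4 * hidden))
b = np.zeros(4 * hidden)

h, c = lstm_step(x_t, h_prev, c_prev, w, b)
print(h.shape, c.shape)
```

Because the output gate lies strictly between 0 and 1, h is always smaller in magnitude than tanh(c), which is why the c and h arrays in the printed state differ.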
4. Implementing a GRU model (with Dropout)
Code first:
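import tensorflow as tf
import numpy as np
# Input: 3 samples, 6 time steps, 4 features per step
X = np.random.randn(3, 6, 4)
# Zero-pad the second sample after its 4th time step
X[1, 4:] = 0
# Actual length of each sequence
X_lengths = [6, 4, 6]
# Number of hidden units; determines the last dimension of the output
rnn_hidden_size = 5
# Define the GRU cell
cell = tf.nn.rnn_cell.GRUCell(num_units=rnn_hidden_size)
# Wrap the cell with dropout: keep every input, keep each output value with probability 0.8
cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=0.8)
# Run the RNN
o, s = tf.nn.dynamic_rnn(cell=cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run([tf.shape(o), o]))
print(sess.run([tf.shape(s), s]))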
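Compared with the earlier code, there are two changes. One is the cell definition, which is familiar by now;
the other is the dropout wrapper added around the cell.
The wrapper takes three arguments here: the cell to wrap, the probability of keeping each input value (input_keep_prob), and the probability of keeping each output value (output_keep_prob). Note that these are keep probabilities, not drop probabilities. For RNN-type structures, dropout is applied to the cell's inputs and outputs, not to the recurrent connections within the same layer.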
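What a keep probability means can be shown with a short NumPy sketch of inverted dropout: each value is kept with probability keep_prob and, if kept, scaled by 1/keep_prob so the expected value is unchanged. The function name `dropout` and the random data are illustrative assumptions; this is not the wrapper's actual code.

```python
import numpy as np

def dropout(values, keep_prob, rng):
    """Keep each element with probability keep_prob and rescale the survivors."""
    mask = rng.random(values.shape) < keep_prob
    return np.where(mask, values / keep_prob, 0.0)

rng = np.random.default_rng(0)
h = rng.standard_normal((3, 5))

dropped = dropout(h, keep_prob=0.8, rng=rng)    # roughly 20% of the entries become 0
unchanged = dropout(h, keep_prob=1.0, rng=rng)  # keep_prob=1.0 disables dropout

print(dropped)
print(np.allclose(unchanged, h))
```

This is why input_keep_prob=1.0 in the code above leaves the inputs untouched, while output_keep_prob=0.8 zeroes about one output value in five.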
Here is the output:
[array([3, 6, 5]), array([[[-0.00384183, -0.2193063 , 0.0603394 , -0.09774706,
0.1135665 ],
[-0.07668639, -0.39645098, 0.05858838, 0.15644397,
0.18005827],
[-0.06259512, -0.63629247, 0.26543947, -0.10871268,
0.22657906],
[ 0. , -0.57908255, 0.1493446 , 0.07591829,
0.27544994],
[ 0.57788612, 0.29002321, -0.14544183, 0.45116937,
-0. ],
[ 0.53739782, 0.21176507, -0.0245097 , 0.08893773,
-0. ]],
[[-0.21511515, -0.11745295, 0.14412874, -0.06509311,
0.0728543 ],
[-0.12518276, -0.41010908, 0.22555463, -0.26639758,
0. ],
[-0.24114207, -0. , 0.3797578 , -0. ,
0.17446607],
[-0. , -0.83807384, 0.57518991, -0.65002891,
0. ],
[ 0. , 0. , 0. , 0. ,
0. ],
[ 0. , 0. , 0. , 0. ,
0. ]],
[[-0.10412225, -0.19353175, 0. , -0.06415311,
0.12416278],
[-0.00095912, -0.08306708, 0.29790358, -0.24395395,
-0.04457892],
[ 0.10402355, -0.06345214, 0.09992852, -0.04574931,
0.1022426 ],
[ 0.0832184 , -0.37382437, 0.57320086, -0. ,
0. ],
[ 0. , 0.22174214, 0.15993414, 0. ,
0. ],
[ 0.28724911, 0.07586199, 0.09271125, 0.29748913,
0.21664836]]])]
[array([3, 5]), array([[ 0.42991826, 0.16941206, -0.01960776, 0.07115018, -0.32511025],
[-0.09532494, -0.67045908, 0.46015193, -0.52002314, 0.08326451],
[ 0.22979929, 0.06068959, 0.074169 , 0.23799131, 0.17331869]])]
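As before, output is the hidden-layer output at every time step.
The state is back to shape [3, 5], because a GRU has no output gate and no separate cell state: the cell state and the hidden state are one and the same value.
You can also see that some of the output values have been zeroed by dropout. The surviving values are scaled by 1/0.8, while the state is returned without dropout, so the last output of each sample no longer matches the state exactly.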
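The printed numbers bear this out. The check below copies the first sample's state and its output at the last time step from the output above, and compares state / 0.8 with the output; the variable names are made up for this check.

```python
keep_prob = 0.8

# First sample: final state and output at the last time step, copied from the printout above.
state_first = [0.42991826, 0.16941206, -0.01960776, 0.07115018, -0.32511025]
last_output_first = [0.53739782, 0.21176507, -0.0245097, 0.08893773, 0.0]

max_error = 0.0
for s, o in zip(state_first, last_output_first):
    if o == 0.0:
        print("dropped:", s)                  # this unit was zeroed by dropout
    else:
        rescaled = s / keep_prob              # what inverted dropout emits for a kept unit
        max_error = max(max_error, abs(rescaled - o))
        print("kept:", round(rescaled, 6), "vs", o)

print(max_error < 1e-6)
```

The four kept values agree with the state divided by 0.8 to within rounding, and the fifth unit is the one that dropout removed.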
5. Implementing a bidirectional LSTM
The code:
import tensorflow as tf
import numpy as np
# Input: 3 samples, 6 time steps, 4 features per step
X = np.random.randn(3, 6, 4)
# Zero-pad the second sample after its 4th time step
X[1, 4:] = 0
# Actual length of each sequence
X_lengths = [6, 4, 6]
# Number of hidden units; determines the last dimension of the output
rnn_hidden_size = 5
# Define the forward and backward cells
f_cell = tf.nn.rnn_cell.LSTMCell(num_units=rnn_hidden_size)
b_cell = tf.nn.rnn_cell.LSTMCell(num_units=rnn_hidden_size)
# Call the bidirectional RNN function
o, s = tf.nn.bidirectional_dynamic_rnn(cell_fw=f_cell, cell_bw=b_cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run([tf.shape(o), o]))
print(sess.run([tf.shape(s), s]))
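A bidirectional LSTM adds a second LSTM that runs over the sequence in reverse, to capture cases where the value at a given time step depends strongly on later steps, information that a forward-only model would miss.
That is why there are two RNN cells here. They are LSTM cells in this example; swap in GRU cells and you get a bidirectional GRU.
No dropout is added here. To add it, wrap the forward and backward cells separately with DropoutWrapper, as shown in the GRU section above.
Here is the output: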
[array([2, 3, 6, 5]), (array([[[ 0.00263425, -0.10848654, -0.02888396, -0.14557681,
-0.01867951],
[ 0.16308206, -0.0863213 , -0.058109 , -0.34334372,
-0.06744579],
[ 0.16662425, -0.23343837, -0.04667022, -0.21088134,
-0.05725204],
[-0.00078778, 0.054576 , -0.11021806, -0.1319716 ,
0.11758325],
[ 0.06200103, -0.07955454, -0.10945861, -0.17705271,
0.06469144],
[ 0.03694782, 0.06173401, -0.07308133, -0.13233934,
0.00837825]],
[[ 0.02614167, -0.27913968, -0.01494904, 0.01519353,
0.00592987],
[-0.11470788, -0.08984734, -0.04153365, -0.05076508,
0.02336249],
[ 0.00544636, 0.01089489, -0.07369071, -0.0196202 ,
0.0541166 ],
[-0.06484562, -0.10282789, -0.02678329, -0.12910339,
-0.07545008],
[ 0. , 0. , 0. , 0. ,
0. ],
[ 0. , 0. , 0. , 0. ,
0. ]],
[[-0.18624838, 0.11949917, 0.02117578, -0.00978313,
0.02452981],
[-0.15181061, 0.24886629, 0.0115148 , -0.01600807,
0.07623213],
[-0.0149377 , 0.29534634, -0.02139397, -0.19715693,
0.02354277],
[-0.22288016, 0.20719902, -0.01278846, -0.24238076,
-0.01606708],
[-0.47796464, 0.24820449, -0.02491049, -0.27709986,
0.02486796],
[-0.26716896, 0.10445698, 0.19896813, -0.19703691,
-0.24232671]]]), array([[[ 4.10128322e-02, -1.51443969e-01, 1.40390125e-01,
3.07225130e-02, -1.31468052e-01],
[ 4.95621797e-02, -1.38678298e-02, 1.02490178e-01,
2.87854200e-03, -7.47962853e-02],
[ 1.33340940e-01, -2.09927857e-01, -1.20134344e-01,
1.28247271e-01, -1.12649095e-01],
[ 1.45445855e-01, -1.04384322e-01, -5.10252886e-02,
1.41260135e-01, -1.05015442e-01],
[-1.79240752e-02, -6.74667093e-02, 7.90996834e-02,
-4.50249934e-02, -3.64879099e-04],
[-1.86599105e-01, 9.32626292e-02, 3.97538513e-02,
-9.79307484e-02, 2.85643007e-02]],
[[ 6.20878725e-02, -3.94152368e-01, 2.82636679e-02,
7.36741134e-02, -6.22825883e-02],
[-2.06465763e-02, -1.15915828e-01, 9.68041335e-02,
8.66617924e-03, -5.96928433e-02],
[-5.18831623e-02, -1.39275191e-01, 1.05362934e-01,
1.36670387e-02, -6.52792768e-02],
[ 2.48354288e-02, -2.72565766e-01, 1.62997637e-01,
-1.63169607e-03, 2.63353057e-02],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00]],
[[-4.73881441e-01, 1.60115004e-01, 1.14253624e-01,
-8.33266211e-02, -3.14134248e-02],
[-3.02000780e-01, 8.38983607e-02, 9.43078470e-02,
-1.07013195e-02, -8.76889852e-02],
[-4.71119928e-02, -3.33859912e-02, 1.43407943e-01,
-3.39898348e-02, -2.57795278e-02],
[-2.29723652e-02, -6.25440340e-02, 1.13618231e-01,
-1.08651666e-01, 8.59181685e-02],
[-4.48911677e-02, -1.13751661e-02, 5.12618004e-02,
-1.04067697e-01, 5.27669741e-02],
[-8.21400131e-02, 3.94837680e-02, 1.43507504e-01,
-1.51388314e-01, 1.09737389e-01]]]))]
[array([2, 2, 3, 5]), (LSTMStateTuple(c=array([[ 0.06742202, 0.17478356, -0.12198078, -0.23708507, 0.01693249],
[-0.10441755, -0.17200567, -0.15135097, -0.32268767, -0.16683725],
[-0.66716078, 0.32491265, 0.39393026, -0.45806153, -0.41935517]]), h=array([[ 0.03694782, 0.06173401, -0.07308133, -0.13233934, 0.00837825],
[-0.06484562, -0.10282789, -0.02678329, -0.12910339, -0.07545008],
[-0.26716896, 0.10445698, 0.19896813, -0.19703691, -0.24232671]])), LSTMStateTuple(c=array([[ 0.12004201, -0.29230004, 0.34609941, 0.05830209, -0.29338321],
[ 0.41384662, -0.85269306, 0.05049701, 0.0979963 , -0.08568259],
[-0.65500626, 0.32725095, 0.23971899, -0.28581693, -0.09397309]]), h=array([[ 0.04101283, -0.15144397, 0.14039013, 0.03072251, -0.13146805],
[ 0.06208787, -0.39415237, 0.02826367, 0.07367411, -0.06228259],
[-0.47388144, 0.160115 , 0.11425362, -0.08332662, -0.03141342]])))]
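tf.shape reports [2, 3, 6, 5] for output. Compared with the single LSTM above, the outputs of two LSTMs (each [3, 6, 5]) are returned together as a tuple. The meaning is unchanged: output[0] is the forward LSTM's hidden-layer output at every time step, and output[1] is the backward LSTM's hidden-layer output at every time step.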
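In practice the two directions are often merged into one feature vector per time step. The NumPy sketch below uses random arrays as stand-ins for the forward and backward outputs; concatenating along the last axis is one common choice, not something this post's code does.

```python
import numpy as np

# Stand-ins for output[0] and output[1], each [batch_size, max_time, hidden]
output_fw = np.random.randn(3, 6, 5)
output_bw = np.random.randn(3, 6, 5)

stacked = np.stack([output_fw, output_bw])               # the packed view: (2, 3, 6, 5)
merged = np.concatenate([output_fw, output_bw], axis=2)  # one vector per step: (3, 6, 10)

print(stacked.shape, merged.shape)
```

In TensorFlow 1.x the equivalent of the second line would be tf.concat(o, 2) applied to the returned tuple.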
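tf.shape reports [2, 2, 3, 5] for state. Compared with the single LSTM above, the states of two LSTMs (each [2, 3, 5]) are returned together as a tuple. As with output, state[0] is the forward state and state[1] is the backward state.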
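One detail in the printout is worth noting: the h of the backward state equals the backward output at time step 0, not at the last step. A plausible way to picture this is that the backward cell reads each sequence reversed over its valid length only, and its outputs are then flipped back. The NumPy sketch below illustrates that reversal; `reverse_valid` is a hypothetical helper written for this post, not a TensorFlow function.

```python
import numpy as np

def reverse_valid(x, lengths):
    """Reverse each sample along time over its valid steps; padding stays in place."""
    out = x.copy()
    for i, n in enumerate(lengths):
        out[i, :n] = x[i, :n][::-1]
    return out

X = np.random.randn(3, 6, 4)
X[1, 4:] = 0
X_lengths = [6, 4, 6]

X_rev = reverse_valid(X, X_lengths)

print(np.array_equal(X_rev[1, 0], X[1, 3]))   # the backward pass starts at the last valid step
print(np.all(X_rev[1, 4:] == 0))              # padding is still at the end
```

Under this picture the backward LSTM finishes at original time step 0, so its final state lines up with the first row of output[1] for each sample.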