Implementing RNNs in TensorFlow

Reference article: Zhihu - "tf.nn.dynamic_rnn 详解" (a detailed explanation of tf.nn.dynamic_rnn)

 

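In short, implementing any of the RNN family in TensorFlow comes down to defining a cell and calling an RNN function, which returns the outputs. Whatever type of cell you define largely determines what type of RNN you get.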

 

1. TensorFlow's RNN function

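tf.nn.dynamic_rnn(
    cell,                      # the RNN cell (memory unit)
    inputs,                    # input sequences
    sequence_length=None,      # actual length of each sequence, shape [batch_size]
    initial_state=None,        # initial RNN state
    dtype=None,                # dtype of the initial state and outputs
    parallel_iterations=None,  # number of iterations run in parallel (default 32)
    swap_memory=False,         # swap forward-pass tensors from GPU to CPU memory to save GPU memory
    time_major=False,          # layout (shape) of inputs and outputs
    scope=None                 # variable scope, default "rnn"
)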

Here, cell determines the type of RNN: a basic RNN cell gives the original RNN structure, an LSTM cell gives an LSTM, and a GRU cell gives a GRU.

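inputs is the actual input data. sequence_length gives the actual length (number of time steps) of each sequence in the batch, as a vector of shape [batch_size]; for time steps past a sequence's length, dynamic_rnn outputs zeros and copies the state through unchanged.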
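When a batch is zero-padded, as in the examples below (`X[1, 4:] = 0`), the lengths can often be recovered from the data itself. A minimal NumPy sketch, assuming padding steps are all-zero vectors placed after the real steps and that no real step is exactly all zeros; `infer_lengths` is a hypothetical helper, not a TensorFlow API:

```python
import numpy as np

def infer_lengths(X):
    """Hypothetical helper: count the non-padding time steps of each sample.

    Assumes X has shape [batch_size, max_time, input_size] and that padding
    steps are all-zero vectors appended after the real steps.
    """
    # A time step counts as real if any of its features is non-zero.
    is_real_step = np.any(X != 0, axis=2)              # [batch_size, max_time]
    return is_real_step.sum(axis=1).astype(np.int32)   # [batch_size]

np.random.seed(0)
X = np.random.randn(3, 6, 4)
X[1, 4:] = 0  # pad the second sample after 4 steps

X_lengths = infer_lengths(X)
print(X_lengths)  # [6 4 6]
```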

time_major determines the shape of the inputs and outputs. What does that mean?

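By default, inputs has shape `[batch_size, max_time, input_size]` (input_size is, for example, the embedding size), and the output (discussed below) has shape `[batch_size, max_time, output_size]`, where output_size is the cell's number of units. If time_major is True, the function instead expects and returns `[max_time, batch_size, ...]`, i.e. the first two axes are swapped; if it is False, the batch-major layout `[batch_size, max_time, ...]` is used.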

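The default is False. time_major=True is slightly more efficient, because it avoids transposes at the start and end of the RNN computation.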
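To make the two layouts concrete, converting between them is just a swap of the first two axes. A NumPy sketch (in TensorFlow the equivalent would be `tf.transpose(X, [1, 0, 2])`):

```python
import numpy as np

batch_size, max_time, input_size = 3, 6, 4
X_batch_major = np.random.randn(batch_size, max_time, input_size)

# time_major=True expects [max_time, batch_size, input_size]:
# swap axis 0 (batch) and axis 1 (time), keep the feature axis last.
X_time_major = np.transpose(X_batch_major, (1, 0, 2))

print(X_batch_major.shape)  # (3, 6, 4)
print(X_time_major.shape)   # (6, 3, 4)

# The same element is reached in both layouts with the indices swapped.
assert X_time_major[2, 1, 0] == X_batch_major[1, 2, 0]
```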

 

The function returns two values: outputs, the RNN output at every time step, and state, the final state.

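The returned state needs a bit more care, though: its structure depends on the cell type.

Why? In an LSTM the cell state and the output are different quantities, whereas in a GRU (and in a basic RNN) they are one and the same, so the shape of the returned state differs slightly.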

 

2. Implementing a basic RNN model

Without further ado, here is the code:

import tensorflow as tf
import numpy as np

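# Input: 3 sequences, 6 time steps, 4 features each
X = np.random.randn(3, 6, 4)
X[1, 4:] = 0  # the second sequence has only 4 real steps; the rest is zero padding

# Actual length of each sequence
X_lengths = [6, 4, 6]

# Number of hidden units; determines the size of the output's last dimension
rnn_hidden_size = 5

# Define the RNN cell
cell = tf.nn.rnn_cell.BasicRNNCell(num_units=rnn_hidden_size)

# Run the RNN
o, s = tf.nn.dynamic_rnn(cell=cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)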

sess = tf.Session()
sess.run(tf.global_variables_initializer())

print(sess.run([tf.shape(o), o]))
print(sess.run([tf.shape(s), s]))

The code is quite simple: as said at the start, define a cell, call an RNN function, and you are done.

Here is the output:

[array([3, 6, 5]), array([[[ 0.17310878,  0.82633802, -0.43270981,  0.06905287,
         -0.44548788],
        [ 0.76207901,  0.14140098, -0.95680809,  0.72005081,
         -0.15649403],
        [ 0.79388277, -0.97675279, -0.33644186,  0.95320421,
         -0.60705681],
        [ 0.00563942, -0.02027826,  0.89590012, -0.22456675,
         -0.40984772],
        [-0.02721506, -0.87714997, -0.43034662, -0.93520363,
          0.94834008],
        [ 0.25531945,  0.93336422, -0.92178408,  0.30199629,
         -0.92172056]],

       [[ 0.28433064, -0.89897588,  0.4130407 ,  0.55888719,
         -0.40204589],
        [ 0.41459254,  0.3597689 ,  0.9548185 , -0.00866829,
          0.50680063],
        [-0.06959048, -0.4649923 ,  0.94124415,  0.08926017,
          0.33270379],
        [ 0.69817465,  0.95005181,  0.70850335,  0.5241701 ,
         -0.53791173],
        [ 0.        ,  0.        ,  0.        ,  0.        ,
          0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,
          0.        ]],

       [[-0.12966264, -0.32701574, -0.74199627, -0.49359511,
         -0.32056881],
        [-0.44894617, -0.6809439 ,  0.63751225,  0.11421618,
          0.12798053],
        [-0.13901253,  0.86462562, -0.49524682, -0.77128572,
         -0.71333543],
        [ 0.16060172, -0.47568445, -0.54749102,  0.39206036,
          0.46851311],
        [-0.94127998, -0.37428214,  0.9176711 , -0.75276436,
          0.52876751],
        [ 0.60046028, -0.76555278,  0.69193852,  0.76096789,
         -0.72530337]]])]
[array([3, 5]), array([[ 0.25531945,  0.93336422, -0.92178408,  0.30199629, -0.92172056],
       [ 0.69817465,  0.95005181,  0.70850335,  0.5241701 , -0.53791173],
       [ 0.60046028, -0.76555278,  0.69193852,  0.76096789, -0.72530337]])]

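As you can see, the RNN returns two items. The first, o (output), is the hidden-state output at every time step; this is the same for RNN, LSTM and GRU. What differs is s (state), the final state. Nothing unusual shows up here yet; we will come back to it with the LSTM.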

Let's analyze the model's outputs:

For output, the shape is [3, 6, 5].

3 is the batch_size;

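6 is the maximum number of time steps, i.e. the longest sequence length. The second sample's output ends with two rows of zeros because sequence_length limits that sample to 4 steps, and dynamic_rnn zeroes the outputs past a sequence's length;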

5 is the hidden size rather than an embedding size: it comes from the num_units argument given when defining the RNN cell, which sets the output dimension.

For state, the shape is [3, 5].

Look closely and you will see that state is simply each sample's last valid output. Since the second sample has length 4, its final state equals its output at the 4th time step.
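The behaviour described above (zero outputs past each sequence's end, state equal to the last valid output) can be reproduced with a plain NumPy loop. This is a sketch of the basic RNN recurrence h_t = tanh(x_t W + h_{t-1} U + b) with sequence_length-style masking; the weights are random stand-ins rather than the variables TensorFlow creates, and `basic_rnn` is a hypothetical name:

```python
import numpy as np

def basic_rnn(X, lengths, W, U, b):
    """Sketch of a basic RNN with dynamic_rnn-style length masking.

    X: [batch_size, max_time, input_size]; lengths: [batch_size].
    Returns (outputs [batch, time, units], final state [batch, units]).
    """
    batch_size, max_time, _ = X.shape
    num_units = U.shape[0]
    h = np.zeros((batch_size, num_units))             # zero initial state
    outputs = np.zeros((batch_size, max_time, num_units))
    lengths = np.asarray(lengths)
    for t in range(max_time):
        h_new = np.tanh(X[:, t] @ W + h @ U + b)
        valid = (t < lengths)[:, None]                # [batch, 1] mask
        # Past a sequence's end: output is 0 and the state is copied through.
        outputs[:, t] = np.where(valid, h_new, 0.0)
        h = np.where(valid, h_new, h)
    return outputs, h

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6, 4))
X[1, 4:] = 0
X_lengths = [6, 4, 6]
num_units = 5
W = rng.standard_normal((4, num_units)) * 0.5
U = rng.standard_normal((num_units, num_units)) * 0.5
b = np.zeros(num_units)

o, s = basic_rnn(X, X_lengths, W, U, b)
print(o.shape, s.shape)  # (3, 6, 5) (3, 5)

# The state is each sample's output at its last valid step.
for i, n in enumerate(X_lengths):
    assert np.allclose(s[i], o[i, n - 1])
assert np.all(o[1, 4:] == 0)
```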

 

3. Implementing an LSTM model

Code first again:

import tensorflow as tf
import numpy as np

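# Input: 3 sequences, 6 time steps, 4 features each
X = np.random.randn(3, 6, 4)
X[1, 4:] = 0  # the second sequence has only 4 real steps; the rest is zero padding

# Actual length of each sequence
X_lengths = [6, 4, 6]

# Number of hidden units; determines the size of the output's last dimension
rnn_hidden_size = 5

# Define the LSTM cell
cell = tf.nn.rnn_cell.LSTMCell(num_units=rnn_hidden_size)

# Run the RNN
o, s = tf.nn.dynamic_rnn(cell=cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)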

sess = tf.Session()
sess.run(tf.global_variables_initializer())

print(sess.run([tf.shape(o), o]))
print(sess.run([tf.shape(s), s]))

Look closely and you will find that, compared with the basic RNN model above, the only change is the cell definition.

Let's see how the output differs:

[array([3, 6, 5]), array([[[ 0.0970524 ,  0.04119286, -0.08474181, -0.18085976,
         -0.07741971],
        [-0.02223423, -0.10350601, -0.08190117, -0.09554744,
          0.00527808],
        [ 0.09914153, -0.04312684, -0.06642732, -0.14581045,
         -0.08470683],
        [ 0.0531387 , -0.05552092, -0.12808912, -0.15655323,
         -0.02803473],
        [ 0.18570773,  0.00828065,  0.02929196, -0.02115647,
         -0.20298142],
        [ 0.097577  ,  0.00784962, -0.18960612, -0.1939976 ,
         -0.02686524]],

       [[ 0.05894009,  0.0371513 , -0.147196  , -0.12490672,
          0.00890823],
        [-0.13846855,  0.0048299 , -0.27920325, -0.10103866,
          0.19092917],
        [-0.0160551 ,  0.04513064, -0.28024583, -0.06436632,
          0.12552706],
        [-0.06964844, -0.08109376, -0.04003272,  0.13113396,
          0.0881404 ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,
          0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,
          0.        ]],

       [[-0.07211149,  0.01887877, -0.12521735, -0.05033309,
          0.10820924],
        [-0.08311677, -0.00736387, -0.17241497, -0.09276841,
          0.11316439],
        [ 0.10335096,  0.00665376, -0.02476282, -0.21087149,
         -0.11191987],
        [ 0.09959326,  0.06205221, -0.01933063, -0.07346646,
         -0.11060355],
        [-0.04239206, -0.20163064, -0.02140169, -0.02752217,
          0.04012931],
        [-0.10801504, -0.04490684, -0.02402958,  0.0872379 ,
          0.08214646]]])]
[array([2, 3, 5]), LSTMStateTuple(c=array([[ 0.18660455,  0.01673364, -0.43402739, -0.31572953, -0.05726592],
       [-0.18487706, -0.12826426, -0.07106543,  0.46722466,  0.19727486],
       [-0.17872193, -0.10419721, -0.04430988,  0.18415518,  0.12666562]]), h=array([[ 0.097577  ,  0.00784962, -0.18960612, -0.1939976 , -0.02686524],
       [-0.06964844, -0.08109376, -0.04003272,  0.13113396,  0.0881404 ],
       [-0.10801504, -0.04490684, -0.02402958,  0.0872379 ,  0.08214646]]))]

The first item is output; its shape is still [3, 6, 5], and as before it holds the LSTM's hidden-layer output at every time step.

The state, however, differs from the basic RNN: its shape is [2, 3, 5].

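Compared with the basic RNN there is an extra [3, 5]. The reason is that, unlike the other RNN variants, an LSTM's cell state and its output are not the same thing.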

Specifically, an output gate turns the cell state into the hidden state (i.e. the output).

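So state has two parts: c (cell state) holds each sample's cell state at its last valid time step, and h (hidden state) holds each sample's hidden state at its last valid time step. Each row of h matches that sample's last valid row of output.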
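For reference, a NumPy sketch of the textbook LSTM step shows why c and h differ: the output gate scales tanh(c) to produce h. The gate layout, zero biases and the `lstm_step` name are illustrative assumptions, not TensorFlow's exact LSTMCell internals (which, for example, add a forget bias of 1.0 by default):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, c_prev, h_prev, W, b):
    """One textbook LSTM step (sketch).

    x: [batch, input_size]; c_prev, h_prev: [batch, units].
    W: [input_size + units, 4 * units]; b: [4 * units].
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    i, f, g, o = np.split(z, 4, axis=1)                 # input, forget, candidate, output
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)   # new cell state
    h = sigmoid(o) * np.tanh(c)                         # output gate turns c into h
    return c, h

rng = np.random.default_rng(0)
batch, input_size, units = 3, 4, 5
W = rng.standard_normal((input_size + units, 4 * units)) * 0.1
b = np.zeros(4 * units)
c = np.zeros((batch, units))
h = np.zeros((batch, units))
for t in range(6):
    x_t = rng.standard_normal((batch, input_size))
    c, h = lstm_step(x_t, c, h, W, b)

# Stacking (c, h) gives the same [2, 3, 5] shape that tf.shape(s) reported.
state = np.stack([c, h])
print(state.shape)  # (2, 3, 5)
```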

 

4. Implementing a GRU model (with Dropout)

Code first:

import tensorflow as tf
import numpy as np

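# Input: 3 sequences, 6 time steps, 4 features each
X = np.random.randn(3, 6, 4)
X[1, 4:] = 0  # the second sequence has only 4 real steps; the rest is zero padding

# Actual length of each sequence
X_lengths = [6, 4, 6]

# Number of hidden units; determines the size of the output's last dimension
rnn_hidden_size = 5

# Define the GRU cell, then wrap it with dropout (keep all inputs, keep 80% of outputs)
cell = tf.nn.rnn_cell.GRUCell(num_units=rnn_hidden_size)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=0.8)

# Run the RNN
o, s = tf.nn.dynamic_rnn(cell=cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)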

sess = tf.Session()
sess.run(tf.global_variables_initializer())

print(sess.run([tf.shape(o), o]))
print(sess.run([tf.shape(s), s]))

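Compared with the previous code yet again, there are two main changes. The first is the cell definition, which should be familiar by now;

the second is that the cell is wrapped in a dropout wrapper.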

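The dropout wrapper takes three main arguments here: the cell being wrapped, input_keep_prob for the inputs, and output_keep_prob for the outputs. Note that these are keep probabilities, not drop probabilities: output_keep_prob=0.8 keeps each output element with probability 0.8 and drops 20%. For RNN-type structures, dropout is applied to the cell's inputs and outputs, not to the recurrent connection between time steps within the same layer (DropoutWrapper's state_keep_prob defaults to 1.0, so the state is left untouched).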
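The dropout used here is inverted dropout: dropped elements become 0 and kept elements are divided by the keep probability, so the expected value is unchanged. A NumPy sketch of what output_keep_prob=0.8 does to one output tensor; this is a simplification in which the random mask is drawn by NumPy rather than TensorFlow, and `inverted_dropout` is a hypothetical name:

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng):
    """Zero each element with probability 1 - keep_prob; scale the rest by 1 / keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0), mask

rng = np.random.default_rng(1)
h = rng.standard_normal((3, 5))   # stand-in for a GRU output at one time step
dropped, mask = inverted_dropout(h, keep_prob=0.8, rng=rng)

assert np.all(dropped[~mask] == 0)                 # dropped entries are zero
assert np.allclose(dropped[mask], h[mask] / 0.8)   # kept entries are rescaled
```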

Let's look at the output:

[array([3, 6, 5]), array([[[-0.00384183, -0.2193063 ,  0.0603394 , -0.09774706,
          0.1135665 ],
        [-0.07668639, -0.39645098,  0.05858838,  0.15644397,
          0.18005827],
        [-0.06259512, -0.63629247,  0.26543947, -0.10871268,
          0.22657906],
        [ 0.        , -0.57908255,  0.1493446 ,  0.07591829,
          0.27544994],
        [ 0.57788612,  0.29002321, -0.14544183,  0.45116937,
         -0.        ],
        [ 0.53739782,  0.21176507, -0.0245097 ,  0.08893773,
         -0.        ]],

       [[-0.21511515, -0.11745295,  0.14412874, -0.06509311,
          0.0728543 ],
        [-0.12518276, -0.41010908,  0.22555463, -0.26639758,
          0.        ],
        [-0.24114207, -0.        ,  0.3797578 , -0.        ,
          0.17446607],
        [-0.        , -0.83807384,  0.57518991, -0.65002891,
          0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,
          0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,
          0.        ]],

       [[-0.10412225, -0.19353175,  0.        , -0.06415311,
          0.12416278],
        [-0.00095912, -0.08306708,  0.29790358, -0.24395395,
         -0.04457892],
        [ 0.10402355, -0.06345214,  0.09992852, -0.04574931,
          0.1022426 ],
        [ 0.0832184 , -0.37382437,  0.57320086, -0.        ,
          0.        ],
        [ 0.        ,  0.22174214,  0.15993414,  0.        ,
          0.        ],
        [ 0.28724911,  0.07586199,  0.09271125,  0.29748913,
          0.21664836]]])]
[array([3, 5]), array([[ 0.42991826,  0.16941206, -0.01960776,  0.07115018, -0.32511025],
       [-0.09532494, -0.67045908,  0.46015193, -0.52002314,  0.08326451],
       [ 0.22979929,  0.06068959,  0.074169  ,  0.23799131,  0.17331869]])]

output, as before, holds the hidden-layer output at every time step.

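state is back to shape [3, 5], because the GRU has no output gate and no separate cell state: the cell state and the hidden state are one and the same value.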

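You can also see that some values in the output have been set to 0; this is the effect of output_keep_prob=0.8. The surviving values are scaled by 1/0.8, which is why the last valid output no longer equals state exactly: for the first sample, 0.42991826 / 0.8 ≈ 0.53739782.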
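On the GRU state itself, a single GRU step sketched in NumPy with the standard update and reset gates shows that there is only one vector h, which serves as both the output and the state. The gate layout, zero biases and the `gru_step` name are illustrative assumptions, not TensorFlow's GRUCell internals:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W_zr, b_zr, W_h, b_h):
    """One GRU step (sketch). Returns (output, new_state), which are the same array."""
    xh = np.concatenate([x, h_prev], axis=1)
    z, r = np.split(sigmoid(xh @ W_zr + b_zr), 2, axis=1)   # update gate, reset gate
    h_tilde = np.tanh(np.concatenate([x, r * h_prev], axis=1) @ W_h + b_h)
    h = z * h_prev + (1.0 - z) * h_tilde                     # no separate cell state
    return h, h

rng = np.random.default_rng(0)
batch, input_size, units = 3, 4, 5
W_zr = rng.standard_normal((input_size + units, 2 * units)) * 0.1
b_zr = np.zeros(2 * units)
W_h = rng.standard_normal((input_size + units, units)) * 0.1
b_h = np.zeros(units)

h = np.zeros((batch, units))
for t in range(6):
    x_t = rng.standard_normal((batch, input_size))
    out, h = gru_step(x_t, h, W_zr, b_zr, W_h, b_h)

print(h.shape)  # (3, 5): a single tensor, like the [3, 5] GRU state
assert out is h
```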

 

5. Implementing a bidirectional LSTM

The code:

import tensorflow as tf
import numpy as np

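# Input: 3 sequences, 6 time steps, 4 features each
X = np.random.randn(3, 6, 4)
X[1, 4:] = 0  # the second sequence has only 4 real steps; the rest is zero padding

# Actual length of each sequence
X_lengths = [6, 4, 6]

# Number of hidden units; determines the size of the output's last dimension
rnn_hidden_size = 5

# Define the forward and backward cells
f_cell = tf.nn.rnn_cell.LSTMCell(num_units=rnn_hidden_size)
b_cell = tf.nn.rnn_cell.LSTMCell(num_units=rnn_hidden_size)

# Call the bidirectional RNN function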
o, s = tf.nn.bidirectional_dynamic_rnn(cell_fw=f_cell, cell_bw=b_cell, inputs=X, sequence_length=X_lengths, dtype=tf.float64)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

print(sess.run([tf.shape(o), o]))
print(sess.run([tf.shape(s), s]))

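A bidirectional LSTM adds a second LSTM that reads each sequence in reverse, so that the representation at each time step can also draw on information from later steps, which a forward-only LSTM never sees.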

That is why there are two RNN cells here. They are LSTMCells; as you can imagine, swapping in GRUCells would give a bidirectional GRU.

No dropout is used here. To add it, wrap the forward and backward cells each in a DropoutWrapper, as in the GRU section above.
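The backward direction does not simply flip the whole padded time axis: each sequence is reversed only within its own length, so padding stays at the end (TensorFlow provides `tf.reverse_sequence` for this, and the backward outputs are reversed back into the original time order afterwards). A NumPy sketch of that per-sequence reversal, with `reverse_sequence_np` as a hypothetical helper:

```python
import numpy as np

def reverse_sequence_np(X, lengths):
    """Hypothetical helper: reverse X[i, :lengths[i]] along time, leave padding in place."""
    out = X.copy()
    for i, n in enumerate(lengths):
        out[i, :n] = X[i, :n][::-1]
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6, 4))
X[1, 4:] = 0
X_lengths = [6, 4, 6]

X_rev = reverse_sequence_np(X, X_lengths)
assert np.array_equal(X_rev[1, :4], X[1, 3::-1])                  # first 4 steps reversed
assert np.all(X_rev[1, 4:] == 0)                                  # padding still at the end
assert np.array_equal(reverse_sequence_np(X_rev, X_lengths), X)   # reversing twice restores X
```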

Let's look at the output:

[array([2, 3, 6, 5]), (array([[[ 0.00263425, -0.10848654, -0.02888396, -0.14557681,
         -0.01867951],
        [ 0.16308206, -0.0863213 , -0.058109  , -0.34334372,
         -0.06744579],
        [ 0.16662425, -0.23343837, -0.04667022, -0.21088134,
         -0.05725204],
        [-0.00078778,  0.054576  , -0.11021806, -0.1319716 ,
          0.11758325],
        [ 0.06200103, -0.07955454, -0.10945861, -0.17705271,
          0.06469144],
        [ 0.03694782,  0.06173401, -0.07308133, -0.13233934,
          0.00837825]],

       [[ 0.02614167, -0.27913968, -0.01494904,  0.01519353,
          0.00592987],
        [-0.11470788, -0.08984734, -0.04153365, -0.05076508,
          0.02336249],
        [ 0.00544636,  0.01089489, -0.07369071, -0.0196202 ,
          0.0541166 ],
        [-0.06484562, -0.10282789, -0.02678329, -0.12910339,
         -0.07545008],
        [ 0.        ,  0.        ,  0.        ,  0.        ,
          0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,
          0.        ]],

       [[-0.18624838,  0.11949917,  0.02117578, -0.00978313,
          0.02452981],
        [-0.15181061,  0.24886629,  0.0115148 , -0.01600807,
          0.07623213],
        [-0.0149377 ,  0.29534634, -0.02139397, -0.19715693,
          0.02354277],
        [-0.22288016,  0.20719902, -0.01278846, -0.24238076,
         -0.01606708],
        [-0.47796464,  0.24820449, -0.02491049, -0.27709986,
          0.02486796],
        [-0.26716896,  0.10445698,  0.19896813, -0.19703691,
         -0.24232671]]]), array([[[ 4.10128322e-02, -1.51443969e-01,  1.40390125e-01,
          3.07225130e-02, -1.31468052e-01],
        [ 4.95621797e-02, -1.38678298e-02,  1.02490178e-01,
          2.87854200e-03, -7.47962853e-02],
        [ 1.33340940e-01, -2.09927857e-01, -1.20134344e-01,
          1.28247271e-01, -1.12649095e-01],
        [ 1.45445855e-01, -1.04384322e-01, -5.10252886e-02,
          1.41260135e-01, -1.05015442e-01],
        [-1.79240752e-02, -6.74667093e-02,  7.90996834e-02,
         -4.50249934e-02, -3.64879099e-04],
        [-1.86599105e-01,  9.32626292e-02,  3.97538513e-02,
         -9.79307484e-02,  2.85643007e-02]],

       [[ 6.20878725e-02, -3.94152368e-01,  2.82636679e-02,
          7.36741134e-02, -6.22825883e-02],
        [-2.06465763e-02, -1.15915828e-01,  9.68041335e-02,
          8.66617924e-03, -5.96928433e-02],
        [-5.18831623e-02, -1.39275191e-01,  1.05362934e-01,
          1.36670387e-02, -6.52792768e-02],
        [ 2.48354288e-02, -2.72565766e-01,  1.62997637e-01,
         -1.63169607e-03,  2.63353057e-02],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00]],

       [[-4.73881441e-01,  1.60115004e-01,  1.14253624e-01,
         -8.33266211e-02, -3.14134248e-02],
        [-3.02000780e-01,  8.38983607e-02,  9.43078470e-02,
         -1.07013195e-02, -8.76889852e-02],
        [-4.71119928e-02, -3.33859912e-02,  1.43407943e-01,
         -3.39898348e-02, -2.57795278e-02],
        [-2.29723652e-02, -6.25440340e-02,  1.13618231e-01,
         -1.08651666e-01,  8.59181685e-02],
        [-4.48911677e-02, -1.13751661e-02,  5.12618004e-02,
         -1.04067697e-01,  5.27669741e-02],
        [-8.21400131e-02,  3.94837680e-02,  1.43507504e-01,
         -1.51388314e-01,  1.09737389e-01]]]))]
[array([2, 2, 3, 5]), (LSTMStateTuple(c=array([[ 0.06742202,  0.17478356, -0.12198078, -0.23708507,  0.01693249],
       [-0.10441755, -0.17200567, -0.15135097, -0.32268767, -0.16683725],
       [-0.66716078,  0.32491265,  0.39393026, -0.45806153, -0.41935517]]), h=array([[ 0.03694782,  0.06173401, -0.07308133, -0.13233934,  0.00837825],
       [-0.06484562, -0.10282789, -0.02678329, -0.12910339, -0.07545008],
       [-0.26716896,  0.10445698,  0.19896813, -0.19703691, -0.24232671]])), LSTMStateTuple(c=array([[ 0.12004201, -0.29230004,  0.34609941,  0.05830209, -0.29338321],
       [ 0.41384662, -0.85269306,  0.05049701,  0.0979963 , -0.08568259],
       [-0.65500626,  0.32725095,  0.23971899, -0.28581693, -0.09397309]]), h=array([[ 0.04101283, -0.15144397,  0.14039013,  0.03072251, -0.13146805],
       [ 0.06208787, -0.39415237,  0.02826367,  0.07367411, -0.06228259],
       [-0.47388144,  0.160115  ,  0.11425362, -0.08332662, -0.03141342]])))]

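The output shape is now [2, 3, 6, 5]. o is actually a tuple (output_fw, output_bw) of two [3, 6, 5] arrays, one per LSTM, and tf.shape packs the tuple into a single tensor. The meaning is unchanged: output[0] holds the forward LSTM's hidden-layer output at every time step, and output[1] holds the backward LSTM's.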

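The state shape is now [2, 2, 3, 5]. Likewise, state is a tuple (state_fw, state_bw) of two LSTM states, each an LSTMStateTuple(c, h) of shape [2, 3, 5]: state[0] is the forward state and state[1] the backward state. Note that the forward h equals each sample's forward output at its last valid step, while the backward h equals output[1] at time step 0 (for example, 0.04101283 for the first sample), because the backward LSTM finishes reading each sequence at its first step.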
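In practice the two directions are usually concatenated along the last axis, giving one [batch_size, max_time, 2 * num_units] tensor for downstream layers; in TensorFlow this is commonly written `tf.concat(o, axis=-1)`. The shape arithmetic, sketched with NumPy stand-ins for the two arrays in o:

```python
import numpy as np

batch_size, max_time, num_units = 3, 6, 5
output_fw = np.random.randn(batch_size, max_time, num_units)  # stand-in for o[0]
output_bw = np.random.randn(batch_size, max_time, num_units)  # stand-in for o[1]

# Concatenate forward and backward features at each time step.
merged = np.concatenate([output_fw, output_bw], axis=-1)
print(merged.shape)  # (3, 6, 10)

# The first num_units features come from the forward direction.
assert np.array_equal(merged[..., :num_units], output_fw)
```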
