Table of Contents
1. Usage
2. Without sequence_length
2.1 How the TensorFlow LSTM cell works
2.2 Forward propagation of TensorFlow's LSTM
2.3 Verifying the forward pass of tf.nn.rnn_cell.LSTMCell with numpy
3. With sequence_length
4. Dynamic bidirectional RNN
Reference: https://blog.csdn.net/u010223750/article/details/71079036
Introduction
TensorFlow is easy to pick up, but it is the many small tricks that really level up your TensorFlow skills. Having covered TensorFlow's reading tricks before, this post looks at its tricks for RNNs, which turn out to be extremely convenient. [All experiments below are based on TensorFlow 1.0.]
A brief introduction to RNNs in TensorFlow
RNNs in TensorFlow have come up in several earlier posts; for instance, my article on text classification with TensorFlow used BasicLSTMCell. Normally, when using an RNN we have to specify num_step, the number of steps TensorFlow unrolls the network for. For variable-length text, fixing num_step inevitably forces padding. The earlier post on TensorFlow's advanced reading pipeline used dynamic padding to pad automatically, but that alone is not enough: after running the RNN/LSTM we still have to discard the padded part of the output, what I call "un-padding", which in turn requires a mask matrix, and that is not elegant. TensorFlow provides a much cleaner solution that lets us bury the mask for good: dynamic_rnn.
tf.nn.dynamic_rnn
Let's illustrate tf.nn.dynamic_rnn with a small example. Suppose the RNN input is [2, 20, 128], where 2 is batch_size, 20 is the maximum text length, and 128 is embedding_size. There are two examples; assume the second text has an actual length of only 13, with the remaining 7 steps filled by zero padding. dynamic_rnn returns two values, outputs and last_states: outputs has shape [2, 20, 128] (with a hidden size of 128) and holds the hidden-state output at every time step, while last_states is a (c, h) tuple, each of shape [batch, 128].
So far nothing is different. But dynamic_rnn takes a parameter, sequence_length, that specifies the valid length of each example. In the example above we set sequence_length to [20, 13], meaning the first example has a valid length of 20 and the second of 13. When this parameter is passed, TensorFlow stops computing the second example after step 13: its last_states simply carries the step-13 state through to step 20, and the entries of outputs beyond step 13 are set to zero.
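A minimal sketch of the call (TF 1.x style; the placeholder shapes below are just the illustrative sizes from this example):
import tensorflow as tf

# [batch, max_len, embedding] = [2, 20, 128], illustrative sizes
inputs = tf.placeholder(tf.float32, shape=[2, 20, 128])
seq_len = tf.placeholder(tf.int32, shape=[2])   # e.g. [20, 13]

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
outputs, last_states = tf.nn.dynamic_rnn(cell=cell,
                                         inputs=inputs,
                                         sequence_length=seq_len,
                                         dtype=tf.float32)
# outputs       : [2, 20, 128]; rows after each example's valid length are zeros
# last_states.c : [2, 128], cell state at each example's last valid step
# last_states.h : [2, 128], hidden state at each example's last valid step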
Simulating the forward pass of the TensorFlow LSTM with numpy:
Reference: https://www.cnblogs.com/yuetz/p/6563377.html
class BasicLSTMCell(RNNCell):
  """Basic LSTM recurrent network cell.
  The implementation is based on: http://arxiv.org/abs/1409.2329.
  We add forget_bias (default: 1) to the biases of the forget gate in order to
  reduce the scale of forgetting in the beginning of the training.
  It does not allow cell clipping, a projection layer, and does not
  use peep-hole connections: it is the basic baseline.
  For advanced models, please use the full LSTMCell that follows.
  """

  def __init__(self, num_units, forget_bias=1.0, input_size=None,
               state_is_tuple=True, activation=tanh):
    """Initialize the basic LSTM cell.
    Args:
      num_units: int, The number of units in the LSTM cell.
      forget_bias: float, The bias added to forget gates (see above).
      input_size: Deprecated and unused.
      state_is_tuple: If True, accepted and returned states are 2-tuples of
        the `c_state` and `m_state`. If False, they are concatenated
        along the column axis. The latter behavior will soon be deprecated.
      activation: Activation function of the inner states.
    """
    if not state_is_tuple:
      logging.warn("%s: Using a concatenated state is slower and will soon be "
                   "deprecated. Use state_is_tuple=True.", self)
    if input_size is not None:
      logging.warn("%s: The input_size parameter is deprecated.", self)
    self._num_units = num_units
    self._forget_bias = forget_bias
    self._state_is_tuple = state_is_tuple
    self._activation = activation

  @property
  def state_size(self):
    return (LSTMStateTuple(self._num_units, self._num_units)
            if self._state_is_tuple else 2 * self._num_units)

  @property
  def output_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Long short-term memory cell (LSTM)."""
    with vs.variable_scope(scope or "basic_lstm_cell"):
      # Parameters of gates are concatenated into one multiply for efficiency.
      if self._state_is_tuple:
        c, h = state
      else:
        c, h = array_ops.split(value=state, num_or_size_splits=2, axis=1)
      # Linear computation: concat = [inputs, h] W + b.
      # W has shape (input_size + num_units, 4*num_units) and b has shape
      # (4*num_units,); the parameters of all four gates share one matmul,
      # so concat has shape (batch_size, 4*num_units).
      concat = _linear([inputs, h], 4 * self._num_units, True, scope=scope)
      # i = input_gate, j = new_input, f = forget_gate, o = output_gate
      i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)
      new_c = (c * sigmoid(f + self._forget_bias) + sigmoid(i) *
               self._activation(j))
      new_h = self._activation(new_c) * sigmoid(o)
      if self._state_is_tuple:
        new_state = LSTMStateTuple(new_c, new_h)
      else:
        new_state = array_ops.concat([new_c, new_h], 1)
      return new_h, new_state
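Per time step, the __call__ above therefore computes the following. As a compact restatement (my own numpy sketch; W and b stand for the concatenated kernel and bias, and the function name is only illustrative):
import numpy as np

def basic_lstm_step(x, c, h, W, b, forget_bias=1.0):
    """One step of the BasicLSTMCell above, rewritten with numpy.

    x: [batch, input_size], c/h: [batch, num_units],
    W: [input_size + num_units, 4*num_units], b: [4*num_units].
    """
    concat = np.concatenate([x, h], axis=1) @ W + b
    i, j, f, o = np.split(concat, 4, axis=1)   # same i, j, f, o order as above
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, new_c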
import tensorflow as tf
import numpy as np
import random
random.seed(10)
# Create the input data
# batch=2; the first sequence has time_step=4,
# the second has time_step=2 and is padded with 0.0
data = np.array([[[1, 2, 3],
                  [4, 5, 6],
                  [5, 6, 4],
                  [1, 2, 1]],
                 [[3, 2, 4],
                  [2, 2, 2],
                  [0, 0, 0],
                  [0, 0, 0]]
                ]) / 10.0
# Placeholder for the input data
data_placeholder = tf.placeholder(dtype=tf.float32, shape=[2, 4, 3])
# RNN hidden size: 4
cell_f = tf.nn.rnn_cell.LSTMCell(num_units=4, forget_bias=0.0)
outputs, last_states = tf.nn.dynamic_rnn(cell=cell_f, dtype=tf.float32, inputs=data_placeholder)
def get_rnn_variables_to_restore():
    return [v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if 'lstm_cell' in v.name]
t = get_rnn_variables_to_restore()
print('LSTM_cell parameter info:')
print(t[0])
print(t[1])
# We cannot see how the LSTM parameters are initialized, so we cannot choose their initializer directly,
# but we can locate the LSTM variables as above and then set them with tf.assign().
# Use tf.assign to copy `parameter` into the LSTM kernel
parameter = [i + 1 for i in range(112)]
parameter = np.array(parameter)
parameter = np.reshape(parameter, (7, 16)) / 100.0
print('parameter: ')
print(parameter)
parameter_placeholder = tf.placeholder(dtype=tf.float32, shape=[7, 16])
# Assign the values fed through parameter_placeholder to the LSTM_cell kernel
assign = tf.assign(t[0], parameter_placeholder)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(assign, feed_dict={parameter_placeholder: parameter}))
    print('Hidden states at each time step:')
    print(sess.run(outputs, feed_dict={data_placeholder: data}))
Output:
LSTM_cell parameter info:
parameter:
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
0.15 0.16]
[0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
0.31 0.32]
[0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46
0.47 0.48]
[0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62
0.63 0.64]
[0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
0.79 0.8 ]
[0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94
0.95 0.96]
[0.97 0.98 0.99 1. 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
1.11 1.12]]
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
0.15 0.16]
[0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
0.31 0.32]
[0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46
0.47 0.48]
[0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62
0.63 0.64]
[0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
0.79 0.8 ]
[0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94
0.95 0.96]
[0.97 0.98 0.99 1. 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
1.11 1.12]]
Hidden states at each time step:
[[[0.04597805 0.04794675 0.04993038 0.05192876]
[0.21005629 0.2185854 0.22713308 0.23569286]
[0.5031241 0.51528114 0.5271909 0.5388504 ]
[0.77954775 0.78653383 0.7932657 0.79975235]]
[[0.06209857 0.06523891 0.06840938 0.07160922]
[0.14686525 0.15213361 0.1574358 0.1627702 ]
[0.2550022 0.2604864 0.26597795 0.27147514]
[0.45640406 0.4631802 0.4699056 0.4765787 ]]]
import math
import numpy as np

# Mimic tf.nn.sigmoid(x) (note: modifies x in place and also returns it)
def sig(x):
    for i in range(len(x)):
        for j in range(len(x[0])):
            x[i][j] = 1.0 / (1.0 + math.exp(-x[i][j]))
    return x

# Mimic tf.nn.tanh(x) (note: modifies x in place and also returns it)
def tanh(x):
    for i in range(len(x)):
        for j in range(len(x[0])):
            x[i][j] = (math.exp(x[i][j]) - math.exp(-x[i][j])) / (math.exp(x[i][j]) + math.exp(-x[i][j]))
    return x
# Create the input data
# batch=2; the first sequence has time_step=4,
# the second has time_step=2 and is padded with 0.0
data = np.array([[[1, 2, 3],
                  [4, 5, 6],
                  [5, 6, 4],
                  [1, 2, 1]],
                 [[3, 2, 4],
                  [2, 2, 2],
                  [0, 0, 0],
                  [0, 0, 0]]
                ]) / 10.0
"""
input = input/10.0
input1 = np.array([[[1,2,1],[5,6,4],[4,5,6],[1,2,3]],
[[0,0,0],[0,0,0],[2,2,2],[3,2,4]]])
input1 = input1/10.0
"""
print("data : ")
print(data)
w = [i + 1 for i in range(112)]
w = np.array(w)
w = np.reshape(w, (7, 16)) / 100.0
print('LSTM_cell parameters :')
print(w)
print('Initial h of the LSTM_cell:')
h = np.zeros((2, 4))
h = h / 10.0
print('Initial c of the LSTM_cell:')
c = np.zeros((2, 4))
c = c / 10.0
# Maximum number of time steps is 4:
for t in range(4):
    current = data[:, t, :]
    # Concatenate the current input with h
    current = np.concatenate((current, h), axis=1)
    mid = np.matmul(current, w)
    print("mid : ")
    print(mid)
    # i: pre-activation input gate
    i = mid[:, 0:4]
    # j: pre-activation candidate input
    j = mid[:, 4:8]
    # f: pre-activation forget gate
    f = mid[:, 8:12]
    # o: pre-activation output gate
    o = mid[:, 12:16]
    # New cell state (forget_bias is 0.0 here, matching the TF cell above)
    c = c * sig(f) + sig(i) * tanh(j)
    # Caution: tanh() above modifies its argument in place, so the next line
    # overwrites c with tanh(c); the corrupted c is then carried into the
    # following time step (see the discussion below the output).
    h = sig(o) * tanh(c)
    print('h at time step ' + str(t + 1) + ' : ')
    print(h)
Output:
data :
[[[0.1 0.2 0.3]
[0.4 0.5 0.6]
[0.5 0.6 0.4]
[0.1 0.2 0.1]]
[[0.3 0.2 0.4]
[0.2 0.2 0.2]
[0. 0. 0. ]
[0. 0. 0. ]]]
LSTM_cell parameters :
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
0.15 0.16]
[0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
0.31 0.32]
[0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46
0.47 0.48]
[0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62
0.63 0.64]
[0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
0.79 0.8 ]
[0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94
0.95 0.96]
[0.97 0.98 0.99 1. 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
1.11 1.12]]
Initial h of the LSTM_cell:
Initial c of the LSTM_cell:
mid :
[[0.134 0.14 0.146 0.152 0.158 0.164 0.17 0.176 0.182 0.188 0.194 0.2
0.206 0.212 0.218 0.224]
[0.169 0.178 0.187 0.196 0.205 0.214 0.223 0.232 0.241 0.25 0.259 0.268
0.277 0.286 0.295 0.304]]
h at time step 1 :
[[0.04597805 0.04794676 0.04993039 0.05192877]
[0.06209857 0.06523893 0.06840938 0.07160925]]
mid :
[[0.43150916 0.448467 0.46542484 0.48238268 0.49934052 0.51629836
0.5332562 0.55021404 0.56717188 0.58412972 0.60108756 0.6180454
0.63500324 0.65196108 0.66891892 0.68587676]
[0.29970617 0.30837974 0.3170533 0.32572686 0.33440042 0.34307398
0.35174754 0.3604211 0.36909466 0.37776823 0.38644179 0.39511535
0.40378891 0.41246247 0.42113603 0.42980959]]
h at time step 2 :
[[0.20998371 0.21850343 0.22704085 0.23558962]
[0.1467197 0.15196619 0.15724435 0.16255241]]
mid :
[[0.89634427 0.92025544 0.94416662 0.96807779 0.99198897 1.01590015
1.03981132 1.0637225 1.08763367 1.11154485 1.13545603 1.1593672
1.18327838 1.20718955 1.23110073 1.25501191]
[0.45571443 0.46189926 0.46808409 0.47426891 0.48045374 0.48663857
0.49282339 0.49900822 0.50519304 0.51137787 0.5175627 0.52374752
0.52993235 0.53611718 0.542302 0.54848683]]
h at time step 3 :
[[0.49912617 0.51091546 0.52244355 0.53370864]
[0.25299429 0.25829698 0.26359548 0.26888789]]
mid :
[[1.58554353 1.61020546 1.6348674 1.65952934 1.68419128 1.70885322
1.73351515 1.75817709 1.78283903 1.80750097 1.83216291 1.85682485
1.88148678 1.90614872 1.93081066 1.9554726 ]
[0.76619382 0.77663157 0.78706932 0.79750706 0.80794481 0.81838256
0.8288203 0.83925805 0.8496958 0.86013354 0.87057129 0.88100903
0.89144678 0.90188453 0.91232227 0.92276002]]
h at time step 4 :
[[0.75492218 0.76107768 0.7670033 0.77270935]
[0.44528038 0.45158054 0.45781786 0.46399112]]
The hidden states computed by the TensorFlow LSTM_cell forward pass and by the numpy simulation agree at time step 1, but they drift apart more and more at later steps (the gap at step 4 is clearly visible). The cause is the in-place tanh() noted in the code: the line h = sig(o) * tanh(c) overwrites c with tanh(c), so the next time step starts from tanh(c) instead of c and the error compounds. With non-mutating sigmoid/tanh functions the numpy results should match TensorFlow at every step, up to float32 precision.
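For reference, here is a minimal non-mutating version of the step loop (my own sketch, not from the original post), which keeps c intact between steps:
import numpy as np

def sigmoid(x):
    # element-wise sigmoid that does not modify its input
    return 1.0 / (1.0 + np.exp(-x))

data = np.array([[[1, 2, 3], [4, 5, 6], [5, 6, 4], [1, 2, 1]],
                 [[3, 2, 4], [2, 2, 2], [0, 0, 0], [0, 0, 0]]]) / 10.0
w = np.reshape(np.arange(1, 113), (7, 16)) / 100.0   # same parameters as above

h = np.zeros((2, 4))
c = np.zeros((2, 4))
for t in range(4):
    current = np.concatenate((data[:, t, :], h), axis=1)
    mid = np.matmul(current, w)
    i, j, f, o = mid[:, 0:4], mid[:, 4:8], mid[:, 8:12], mid[:, 12:16]
    c = c * sigmoid(f) + sigmoid(i) * np.tanh(j)   # new cell state, forget_bias = 0.0
    h = sigmoid(o) * np.tanh(c)                    # c itself stays untouched
    print('h at time step ' + str(t + 1) + ' :')
    print(h)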
import tensorflow as tf
import numpy as np
import random
random.seed(10)
# Create the input data
# batch=2; the first sequence has time_step=4,
# the second has time_step=2 and is padded with 0.0
data = np.array([[[1, 2, 3],
                  [4, 5, 6],
                  [5, 6, 4],
                  [1, 2, 1]],
                 [[3, 2, 4],
                  [2, 2, 2],
                  [0, 0, 0],
                  [0, 0, 0]]
                ]) / 10.0
# Placeholder for the input data
data_placeholder = tf.placeholder(dtype=tf.float32, shape=[2, 4, 3])
# RNN hidden size: 4
cell_f = tf.nn.rnn_cell.LSTMCell(num_units=4, forget_bias=0.0)
outputs, last_states = tf.nn.dynamic_rnn(cell=cell_f, dtype=tf.float32, inputs=data_placeholder, sequence_length=[4, 2])
def get_rnn_variables_to_restore():
    return [v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if 'lstm_cell' in v.name]
t = get_rnn_variables_to_restore()
print('LSTM_cell parameter info:')
print(t[0])
print(t[1])
# We cannot see how the LSTM parameters are initialized, so we cannot choose their initializer directly,
# but we can locate the LSTM variables as above and then set them with tf.assign().
# Use tf.assign to copy `parameter` into the LSTM kernel
parameter = [i + 1 for i in range(112)]
parameter = np.array(parameter)
parameter = np.reshape(parameter, (7, 16)) / 100.0
print('parameter: ')
print(parameter)
parameter_placeholder = tf.placeholder(dtype=tf.float32, shape=[7, 16])
assign = tf.assign(t[0], parameter_placeholder)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(assign, feed_dict={parameter_placeholder: parameter}))
    print('Hidden states at each time step:')
    print(sess.run(outputs, feed_dict={data_placeholder: data}))
Output:
LSTM_cell parameter info:
parameter:
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
0.15 0.16]
[0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
0.31 0.32]
[0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46
0.47 0.48]
[0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62
0.63 0.64]
[0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
0.79 0.8 ]
[0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94
0.95 0.96]
[0.97 0.98 0.99 1. 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
1.11 1.12]]
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
0.15 0.16]
[0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
0.31 0.32]
[0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46
0.47 0.48]
[0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62
0.63 0.64]
[0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
0.79 0.8 ]
[0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94
0.95 0.96]
[0.97 0.98 0.99 1. 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
1.11 1.12]]
Hidden states at each time step:
[[[0.04597805 0.04794675 0.04993038 0.05192876]
[0.21005629 0.2185854 0.22713308 0.23569286]
[0.5031241 0.51528114 0.5271909 0.5388504 ]
[0.77954775 0.78653383 0.7932657 0.79975235]]
[[0.06209857 0.06523891 0.06840938 0.07160922]
[0.14686525 0.15213361 0.1574358 0.1627702 ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]]]
We can see that:
the hidden states at time steps 3 and 4 of the second sequence are now zero, matching sequence_length=[4, 2].
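This is exactly why no explicit mask is needed: with sequence_length set, last_states already holds the state from each example's last valid step. As a small sketch (two extra lines that would go inside the with tf.Session() block above, reusing the tensors defined there):
# inside the same tf.Session() block as above
states = sess.run(last_states, feed_dict={data_placeholder: data})
print('Final h of each example (taken at its last valid step):')
print(states.h)
# For the first example this should equal outputs[0, 3, :] (step 4);
# for the second it should equal outputs[1, 1, :] (step 2), not the zero rows.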
To be continued