I recently wanted to convert a model that used TensorFlow's LSTM to Keras. It nearly drove me crazy, but the problem got solved in the end, so here is a little note.

The goal: make the output of a Keras LSTM match that of a TensorFlow LSTM built from LSTMCell and dynamic_rnn.

First, fix the random seeds and build a simple TF LSTM model, as below. Its output will serve as the reference answer for the Keras configuration, and its weights will serve as the Keras layer's initial weights; we can then check whether the Keras output matches the reference.

forget_bias is set to 0 here because Keras provides no equivalent: in TF it is a fixed, non-trainable constant added to the forget gate at every step. (Zeroing it does not affect how the trainable bias is learned.)
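For context, TF's forget_bias and Keras's unit_forget_bias are not the same thing: the former is a constant added inside the forget-gate sigmoid at every step, while the latter only initializes the forget-gate slice of the trainable bias to 1. A minimal sketch of the Keras side (the layer here is purely illustrative):

from keras.layers import LSTM

# unit_forget_bias=True (the default) would initialize the forget-gate
# bias to 1; it does not add a constant at every step like TF's forget_bias.
layer = LSTM(units=1, unit_forget_bias=False)  # keep the bias init all-zero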
import tensorflow as tf
from tensorflow.contrib.rnn import LSTMCell
import numpy as np

np.random.seed(0)
tf.set_random_seed(0)

batch_size = 1
seq_length = 5

inputs = tf.placeholder(shape=[None, seq_length, 1], dtype=tf.float32)
cell = LSTMCell(num_units=1,
                state_is_tuple=True,
                forget_bias=0.0,
                initializer=None)
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float32,
    sequence_length=[seq_length] * batch_size,
    inputs=inputs)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Print the randomly initialized variables; these will be copied into Keras.
for variable in tf.trainable_variables():
    print("---- ", variable, " ----")
    print(repr(sess.run(variable)))
print("===========================================")

rnn_outputs_, rnn_states_ = sess.run([rnn_outputs, rnn_states],
                                     feed_dict={inputs: np.array([
                                         [[1.],
                                          [0.],
                                          [0.],
                                          [0.],
                                          [0.]]
                                     ])})
print(rnn_outputs_)
"""
---- ----
array([[ 0.51967764, 0.54025245, -0.7382488 , 0.2612908 ],
[ 0.552927 , -0.77836156, 0.8382869 , -0.6932683 ]],
dtype=float32)
---- ----
array([0., 0., 0., 0.], dtype=float32)
"""
"""
[[[0.16935204]
[0.04550969]
[0.0154935 ]
[0.00487053]
[0.00150192]]]
"""
Next, looking at the TensorFlow source code, the cell's single-step computation can be reproduced in pure Python roughly like this:
import numpy as np
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

A = np.array([1., 0])  # concat([input, h_prev]) at the first timestep
B = np.array([[0.51967764, 0.54025245, -0.7382488, 0.2612908],
              [0.552927, -0.77836156, 0.8382869, -0.6932683]])
bias = 0

# TF's LSTMCell splits the projection into i (input gate), j (candidate),
# f (forget gate), and o (output gate), in that column order.
i, j, f, o = tuple(np.dot(A, B) + bias)
print(i, j, f, o)

c_prev = 0
_forget_bias = 0
_activation = np.tanh
c = (sigmoid(f + _forget_bias) * c_prev + sigmoid(i) * _activation(j))
m = sigmoid(o) * _activation(c)
print(c, m)
"""
0.30925895553474303 0.16935205977884268
"""
The value 0.16935205977884268 above matches the first timestep of the TensorFlow output.
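To double-check beyond the first timestep, here is a small sketch (my own, using the same weights) that rolls this single-step computation over all five timesteps; the printed values should line up with the rnn_outputs_ block above.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

W = np.array([[0.51967764, 0.54025245, -0.7382488, 0.2612908],   # input row
              [0.552927, -0.77836156, 0.8382869, -0.6932683]])   # recurrent row

c, m = 0.0, 0.0
for x in [1., 0., 0., 0., 0.]:
    # TF concatenates [input, h_prev] before the matmul, so the second
    # entry of the left operand is the previous output m.
    i, j, f, o = np.dot([x, m], W)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)
    m = sigmoid(o) * np.tanh(c)
    print(m)  # expect 0.16935..., 0.04550..., 0.01549..., 0.00487..., 0.00150...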
Referring to the Keras LSTM source code, the same step can also be written in pure Python:
import numpy as np
import math

def bias_add(x, y):
    return x + y

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

inputs = np.array([1])
# Weights taken from the TensorFlow run above. Their columns are in TF's
# gate order (i, j, f, o), so the f and c columns must be swapped below to
# obtain Keras's gate order (i, f, c, o).
kernel = np.array([[0.51967764, 0.54025245, -0.7382488, 0.2612908]])
recurrent_kernel = np.array([[0.552927, -0.77836156, 0.8382869, -0.6932683]])
bias = np.array([0, 0, 0, 0])
states = [0, 0]

recurrent_activation = sigmoid
activation = np.tanh

kernel_i = kernel[:, 0]
kernel_f = kernel[:, 2]
kernel_c = kernel[:, 1]
kernel_o = kernel[:, 3]
recurrent_kernel_i = recurrent_kernel[:, 0]
recurrent_kernel_f = recurrent_kernel[:, 2]
recurrent_kernel_c = recurrent_kernel[:, 1]
recurrent_kernel_o = recurrent_kernel[:, 3]
bias_i = bias[0]
bias_f = bias[2]
bias_c = bias[1]
bias_o = bias[3]

h_tm1, c_tm1 = states[0], states[1]

inputs_i = inputs
inputs_f = inputs
inputs_c = inputs
inputs_o = inputs

x_i = np.dot(inputs_i, kernel_i)
x_f = np.dot(inputs_f, kernel_f)
x_c = np.dot(inputs_c, kernel_c)
x_o = np.dot(inputs_o, kernel_o)
x_i = bias_add(x_i, bias_i)
x_f = bias_add(x_f, bias_f)
x_c = bias_add(x_c, bias_c)
x_o = bias_add(x_o, bias_o)

h_tm1_i = h_tm1
h_tm1_f = h_tm1
h_tm1_c = h_tm1
h_tm1_o = h_tm1

i = recurrent_activation(x_i + np.dot(h_tm1_i, recurrent_kernel_i))
f = recurrent_activation(x_f + np.dot(h_tm1_f, recurrent_kernel_f))
c = f * c_tm1 + i * activation(x_c + np.dot(h_tm1_c, recurrent_kernel_c))
o = recurrent_activation(x_o + np.dot(h_tm1_o, recurrent_kernel_o))
h = o * activation(c)
print(c, h)
"""
[0.30925896] [0.16935206]
"""
Again 0.16935206 matches, which means the configuration is correct!
In short, lining up the two pure-Python versions against the reference values, the correspondence should be:
[Tensorflow] i ==> [Keras] i
[Tensorflow] j ==> [Keras] c
[Tensorflow] f ==> [Keras] f
[Tensorflow] o ==> [Keras] o
[Tensorflow] h ==> [Keras] m
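This gate mapping is exactly what fixes the column permutation used when copying the weights; a quick sanity check (the gate names here are just labels):

tf_order = ['i', 'j', 'f', 'o']     # TensorFlow's column order
keras_order = ['i', 'f', 'j', 'o']  # Keras expects (i, f, c, o); j plays c's role
perm = [tf_order.index(g) for g in keras_order]
print(perm)  # [0, 2, 1, 3] -- the index order used in the final script below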
Then, from Keras's

o = recurrent_activation(x_o + np.dot(h_tm1_o, recurrent_kernel_o))
h = o * activation(c)

and TensorFlow's

m = sigmoid(o) * _activation(c)

we can tell that

recurrent_activation = sigmoid
activation = _activation = np.tanh
One very important point: TensorFlow's LSTM keeps a single weight matrix of shape [input_dim + n_units, 4 * n_units] (here [2, 4]), where the last n_units rows are the weights applied to the state passed in from the previous step. Keras splits this matrix into two parts: kernel, with shape [input_dim, 4 * n_units] (here [1, 4]), and recurrent_kernel, with shape [n_units, 4 * n_units] (here [1, 4]).
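So converting the TF matrix to the Keras layout is a row split plus the column permutation from above; schematically (shapes assumed as in this example, with input_dim = n_units = 1):

import numpy as np

input_dim, n_units = 1, 1
tf_matrix = np.random.rand(input_dim + n_units, 4 * n_units)  # TF layout

perm = [0, 2, 1, 3]                             # (i, j, f, o) -> (i, f, c, o)
kernel = tf_matrix[:input_dim, perm]            # rows that multiply the input
recurrent_kernel = tf_matrix[input_dim:, perm]  # rows that multiply h_prev
print(kernel.shape, recurrent_kernel.shape)     # (1, 4) (1, 4)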
from keras.layers import LSTM
from keras.models import Sequential
import numpy as np

model = Sequential()
# Keras 2.x defaults to recurrent_activation='hard_sigmoid', so it must be
# set to 'sigmoid' explicitly to match TensorFlow.
model.add(LSTM(units=1, recurrent_activation='sigmoid', return_sequences=True,
               activation='tanh', input_shape=(5, 1)))

# From the TensorFlow result above.
tensorflow_weights = np.array([[0.51967764, 0.54025245, -0.7382488, 0.2612908],
                               [0.552927, -0.77836156, 0.8382869, -0.6932683]])
tensorflow_bias = np.array([0., 0., 0., 0.], dtype=np.float32)

# Reorder the gate columns from TF's (i, j, f, o) to Keras's (i, f, c, o).
keras_indices_order = np.argsort([0, 2, 1, 3])
# All rows except the last multiply the input; the last row multiplies h_prev.
keras_weights = tensorflow_weights[:-1, :][:, keras_indices_order]
keras_recurrent_weights = np.array([tensorflow_weights[-1, :]])[:, keras_indices_order]
keras_bias = tensorflow_bias[keras_indices_order]
model.layers[0].set_weights([keras_weights, keras_recurrent_weights, keras_bias])

_ = np.array([
    [[1.],
     [0.],
     [0.],
     [0.],
     [0.]]
])
prediction = model.predict(_)
print(prediction)
"""
[[[0.16935204]
[0.04550969]
[0.0154935 ]
[0.00487053]
[0.00150192]]]
"""