Reconciling LSTM between Keras and TensorFlow

Recently I wanted to port a model that used TensorFlow's LSTM over to Keras. It was maddening, but the problem did get solved in the end, so here is a short note.

Goal

Make the output of Keras's LSTM match the result of a TensorFlow LSTM built from LSTMCell and dynamic_rnn.

First, fix the random seed and build a simple TensorFlow LSTM model, as shown below.

The maddening process

TensorFlow reference example

Treat this model's output as the ground-truth answer for the Keras configuration.
Then use its weights as Keras's initial weights and check whether Keras's output matches that answer.

forget_bias is set to 0 here because Keras does not provide an equivalent constant. (This does not affect training of the bias itself.)
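As an aside, here is a minimal sketch (my own illustration, not code from either library) of what that constant does: TensorFlow's LSTMCell adds forget_bias to the forget-gate logit at every step, whereas Keras only offers unit_forget_bias, which merely initializes the trainable forget-gate bias to 1.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# TensorFlow's LSTMCell adds a constant inside the cell at every step:
def tf_forget_gate(f_logit, forget_bias=1.0):
    return sigmoid(f_logit + forget_bias)

# Keras has no such constant; unit_forget_bias only *initializes* the
# trainable forget-gate bias to 1, so at inference the gate is simply:
def keras_forget_gate(f_logit):
    return sigmoid(f_logit)

# With forget_bias=0.0 the two coincide, hence the setting below:
print(tf_forget_gate(0.3, forget_bias=0.0) == keras_forget_gate(0.3))  # True

With that settled, the TensorFlow reference model: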

import tensorflow as tf
from tensorflow.contrib.rnn import LSTMCell

import numpy as np

np.random.seed(0)
tf.set_random_seed(0)
batch_size = 1
seq_length = 5
inputs = tf.placeholder(shape=[None, seq_length, 1], dtype=tf.float32)

# forget_bias=0.0 so the cell matches Keras (see the note above)
cell = LSTMCell(num_units=1,
                state_is_tuple=True,
                forget_bias=0.0,
                initializer=None)

rnn_outputs, rnn_states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float32,
    sequence_length=[seq_length] * batch_size,
    inputs=inputs)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Dump the initialized weights; these become Keras's initial weights below
for variable in tf.trainable_variables():
    print("---- ", variable, " ----")
    print(repr(sess.run(variable)))
print("===========================================")

rnn_outputs_, rnn_states_ = sess.run([rnn_outputs, rnn_states], 
                                     feed_dict={inputs: np.array([
                                         [[1.],
                                          [0.],
                                          [0.],
                                          [0.],
                                          [0.]]
                                     ])})

print(rnn_outputs_)

"""
----    ----
array([[ 0.51967764,  0.54025245, -0.7382488 ,  0.2612908 ],
       [ 0.552927  , -0.77836156,  0.8382869 , -0.6932683 ]],
      dtype=float32)
----    ----
array([0., 0., 0., 0.], dtype=float32)
"""

"""
[[[0.16935204]
  [0.04550969]
  [0.0154935 ]
  [0.00487053]
  [0.00150192]]]
"""

Then, reading TensorFlow's source code, this step can be reproduced in pure Python roughly as follows.

Pure-Python version

import numpy as np
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


# A = [x_t, h_prev]: the input at t=0 concatenated with the previous hidden state
A = np.array([1., 0])
# B is the TF kernel printed above; its gate columns are ordered i, j, f, o
B = np.array([[0.51967764, 0.54025245, -0.7382488, 0.2612908],
              [0.552927, -0.77836156, 0.8382869, -0.6932683]])
bias = 0
# TF gate order: i (input), j (candidate cell), f (forget), o (output)
i, j, f, o = tuple(np.dot(A, B) + bias)

print(i, j, f, o)
c_prev = 0          # previous cell state
_forget_bias = 0    # matches forget_bias=0.0 in the model above
_activation = np.tanh
# cell update and output, following LSTMCell's call():
c = (sigmoid(f + _forget_bias) * c_prev + sigmoid(i) * _activation(j))
m = sigmoid(o) * _activation(c)
print(c, m)
"""
0.30925895553474303 0.16935205977884268
"""

The value 0.16935205977884268 above matches TensorFlow's (float32) result of 0.16935204.
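As a sanity check (my own extension, not part of the original snippet), looping the same cell over all five timesteps reproduces TensorFlow's entire output sequence:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

W = np.array([[0.51967764, 0.54025245, -0.7382488, 0.2612908],
              [0.552927, -0.77836156, 0.8382869, -0.6932683]])
c, m = 0.0, 0.0
for x in [1., 0., 0., 0., 0.]:
    i, j, f, o = np.dot(np.array([x, m]), W)  # TF gate order: i, j, f, o
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(j)
    m = sigmoid(o) * np.tanh(c)
    print(m)
# Expected: ~0.169352, 0.045510, 0.015494, 0.004871, 0.001502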

Keras version

Referring to Keras's LSTM source code, it can likewise be written in pure Python:

import numpy as np
import math

def bias_add(x, y):
    return x + y
def sigmoid(x):
    return 1 / (1 + math.exp(-x))


inputs = np.array([1])
# The same TF weights as above, kept in TF's gate-column order (i, c, f, o),
# so the f and c splits below are swapped relative to Keras's native
# (i, f, c, o) layout
kernel = np.array([[0.51967764, 0.54025245, -0.7382488, 0.2612908]])
recurrent_kernel = np.array([[0.552927, -0.77836156, 0.8382869, -0.6932683]])
bias = np.array([0, 0, 0, 0])
states = [0, 0]  # [h_tm1, c_tm1]

recurrent_activation = sigmoid
activation = np.tanh

kernel_i = kernel[:, 0]
kernel_f = kernel[:, 2]
kernel_c = kernel[:, 1]
kernel_o = kernel[:, 3]

recurrent_kernel_i = recurrent_kernel[:, 0]
recurrent_kernel_f = recurrent_kernel[:, 2]
recurrent_kernel_c = recurrent_kernel[:, 1]
recurrent_kernel_o = recurrent_kernel[:, 3]

bias_i = bias[0]
bias_f = bias[2]
bias_c = bias[1]
bias_o = bias[3]

h_tm1, c_tm1 = states[0], states[1]  # previous hidden state, previous cell state

inputs_i = inputs
inputs_f = inputs
inputs_c = inputs
inputs_o = inputs

x_i = np.dot(inputs_i, kernel_i)
x_f = np.dot(inputs_f, kernel_f)
x_c = np.dot(inputs_c, kernel_c)
x_o = np.dot(inputs_o, kernel_o)

x_i = bias_add(x_i, bias_i)
x_f = bias_add(x_f, bias_f)
x_c = bias_add(x_c, bias_c)
x_o = bias_add(x_o, bias_o)

h_tm1_i = h_tm1
h_tm1_f = h_tm1
h_tm1_c = h_tm1
h_tm1_o = h_tm1
i = recurrent_activation(x_i + np.dot(h_tm1_i, recurrent_kernel_i))
f = recurrent_activation(x_f + np.dot(h_tm1_f, recurrent_kernel_f))
c = f * c_tm1 + i * activation(x_c + np.dot(h_tm1_c, recurrent_kernel_c))
o = recurrent_activation(x_o + np.dot(h_tm1_o, recurrent_kernel_o))
h = o * activation(c)
print(c, h)

"""
[0.30925896] [0.16935206]
"""

The value 0.16935206 also matches, so the configuration is right!

Conclusion

Symbol correspondence

In short, comparing the two pure-Python versions against the reference values, the symbols line up as:

[Tensorflow] i  ==> [Keras] i
[Tensorflow] j  ==> [Keras] c
[Tensorflow] f  ==> [Keras] f
[Tensorflow] o  ==> [Keras] o
[Tensorflow] c  ==> [Keras] c
[Tensorflow] m  ==> [Keras] h

Choice of activation

Then, from the Keras version:

o = recurrent_activation(x_o + np.dot(h_tm1_o, recurrent_kernel_o))
h = o * activation(c)

and from the TensorFlow version:

m = sigmoid(o) * _activation(c)

we can see that

recurrent_activation = sigmoid
activation = _activation = np.tanh
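One caveat (my own note): older Keras releases default recurrent_activation to 'hard_sigmoid', a piecewise-linear approximation of the sigmoid, which is why the configuration below passes recurrent_activation='sigmoid' explicitly. A quick comparison:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Keras's hard_sigmoid: a piecewise-linear approximation of the sigmoid
def hard_sigmoid(x):
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

print(sigmoid(0.5), hard_sigmoid(0.5))  # ~0.6225 vs 0.6: close, but not equal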

Weight correspondence

A very important point:
TensorFlow's LSTM weight matrix has shape [input_dim + n_units, 4 * n_units], where the extra n_units rows hold the weights applied to the previous state. Keras splits this matrix into two pieces: kernel (shape [input_dim, 4 * n_units]) and recurrent_kernel (shape [n_units, 4 * n_units]). In this example input_dim = n_units = 1, so the TF matrix is [2, 4] and both Keras pieces are [1, 4]. On top of the row split, the four gate blocks along the last axis must be permuted from TF's (i, c, f, o) order to Keras's (i, f, c, o) order.
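A small illustrative sketch of this conversion for arbitrary sizes (hypothetical shapes, not the post's example):

import numpy as np

input_dim, n_units = 3, 2
tf_kernel = np.random.randn(input_dim + n_units, 4 * n_units)

# Permute the four gate blocks along the last axis: (i, c, f, o) -> (i, f, c, o)
i, c, f, o = np.split(tf_kernel, 4, axis=1)
reordered = np.concatenate([i, f, c, o], axis=1)

# Split the rows: input weights -> kernel, state weights -> recurrent_kernel
keras_kernel = reordered[:input_dim, :]            # shape (input_dim, 4*n_units)
keras_recurrent_kernel = reordered[input_dim:, :]  # shape (n_units, 4*n_units)
print(keras_kernel.shape, keras_recurrent_kernel.shape)  # (3, 8) (2, 8)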

Keras configuration

from keras.layers import LSTM
from keras.models import Sequential
import numpy as np

model = Sequential()
model.add(LSTM(units=1, recurrent_activation='sigmoid', return_sequences=True, 
          activation='tanh', input_shape=(5, 1)))

# weights and bias taken from the TensorFlow run above
tensorflow_weights = np.array([[0.51967764, 0.54025245, -0.7382488, 0.2612908],
                               [0.552927, -0.77836156, 0.8382869, -0.6932683]])
tensorflow_bias = np.array([0., 0., 0., 0.], dtype=np.float32)

# Map TF's gate-column order (i, c, f, o) to Keras's (i, f, c, o);
# np.argsort([0, 2, 1, 3]) yields the permutation [0, 2, 1, 3]
keras_indices_order = np.argsort([0, 2, 1, 3])
# First rows (input weights) -> kernel; last row (state weights) -> recurrent_kernel
keras_weights = tensorflow_weights[:-1, :][:, keras_indices_order]
keras_recurrent_weights = np.array([tensorflow_weights[-1, :]])[:, keras_indices_order]
keras_bias = tensorflow_bias[keras_indices_order]
model.layers[0].set_weights([keras_weights, keras_recurrent_weights, keras_bias])

_ = np.array([
    [[1.],
     [0.],
     [0.],
     [0.],
     [0.]]
])
prediction = model.predict(_)
print(prediction)

"""
[[[0.16935204]
  [0.04550969]
  [0.0154935 ]
  [0.00487053]
  [0.00150192]]]
"""
