For my own reference only; no responsibility taken if anything here is wrong.
Figure source: https://www.zhihu.com/question/41949741/answer/309529532
When using tf.nn.rnn_cell.BasicLSTMCell there is one parameter we have to set ourselves, num_units. Let's first talk about what it actually is.
Those four little yellow blocks: anyone with some familiarity knows that after [ht-1, Xt] is fed in, it goes through the four yellow blocks together with St-1 and comes out as ht and St. So [ht-1, Xt], after passing through the yellow blocks, must end up with the same dimension as the original ht-1. This num_units is exactly the dimension of ht-1, and those little yellow blocks are nothing more than linear maps followed by activation functions.
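To make "linear map followed by activation" concrete, here is a rough sketch of what one step computes, paraphrased from memory of BasicLSTMCell's call method (treat it only as a sketch; the names x_t, h_prev, c_prev, kernel, bias are mine, not the library's):
gates = tf.matmul(tf.concat([x_t, h_prev], 1), kernel) + bias    # [batch_size, 4*num_units]
i, j, f, o = tf.split(gates, 4, axis=1)                          # the four "yellow blocks", each [batch_size, num_units]
c_t = c_prev * tf.sigmoid(f + 1.0) + tf.sigmoid(i) * tf.tanh(j)  # 1.0 is the default forget_bias
h_t = tf.tanh(c_t) * tf.sigmoid(o)                               # same dimension as h_prev, i.e. num_units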
Looking at the BasicLSTMCell source code:
def build(self, inputs_shape):
  if inputs_shape[-1] is None:
    raise ValueError("Expected inputs.shape[-1] to be known, saw shape: %s"
                     % inputs_shape)
  input_depth = inputs_shape[-1]
  h_depth = self._num_units
  self._kernel = self.add_variable(
      _WEIGHTS_VARIABLE_NAME,
      shape=[input_depth + h_depth, 4 * self._num_units])
  self._bias = self.add_variable(
      _BIAS_VARIABLE_NAME,
      shape=[4 * self._num_units],
      initializer=init_ops.zeros_initializer(dtype=self.dtype))
The build function creates a variable of shape [input_depth + h_depth, 4 * self._num_units].
Input: input_depth is the dimension of the input Xt; h_depth, i.e. _num_units, is the dimension of ht-1.
Output: 4 * self._num_units is the dimension of the weights W for the four little yellow blocks.
Also, the source code defines no parameter tied to the number of time steps, which means the cell's parameters are shared across all time_steps.
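A minimal check of the above, assuming a TF 1.x environment (the numbers are just an example): a cell owns exactly one kernel of shape [input_depth + num_units, 4*num_units] and one bias of shape [4*num_units], no matter how many time steps it is later unrolled over.
import tensorflow as tf

depth, num_units = 30, 20
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
x = tf.zeros([1, depth])                                  # one sample, one time step
state = cell.zero_state(batch_size=1, dtype=tf.float32)   # (Ct, ht), each [1, num_units]
h, new_state = cell(x, state)                             # calling the cell builds its variables
for v in cell.trainable_variables:
    print(v.name, v.shape)                                # kernel: (50, 80), bias: (80,)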
Now that we know what tf.nn.rnn_cell.BasicLSTMCell is, let's move on: once a cell is created there are at least two ways to build an RNN.
tf.nn.dynamic_rnn
batch_size = 5
time_step = 7
depth = 30
num_units = 20
inputs = tf.Variable(tf.random_normal([batch_size, time_step, depth]))
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
outputs, output_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
# outputs [batch_size, time_step, num_units]
# output_state [2, batch_size, num_units]
- inputs has shape [batch_size, time_step, depth]
- outputs has shape [batch_size, time_step, num_units]: the h of every time_step gathered together. To take the last time_step's information, use tf.transpose(outputs, [1,0,2])[-1]
- output_state has shape [2, batch_size, num_units]. Looking at the LSTM diagram, the 2 covers Ct (the St of part 1; I prefer calling it Ct) and ht (i.e. the information at the final time step t)
If the cell is a GRUCell instead, output_state is obviously just [batch_size, num_units]. And yes, num_units is still the same parameter as in part 1.
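A quick sketch to verify the relation claimed above, continuing the snippet (TF 1.x session style assumed): the ht half of output_state is exactly the last time_step slice of outputs.
last_output = tf.transpose(outputs, [1, 0, 2])[-1]   # [batch_size, num_units]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    o_last, state = sess.run([last_output, output_state])
    print(o_last.shape)                              # (5, 20)
    print((o_last == state.h).all())                 # True: ht is just the last step's output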
tf.nn.static_rnn
inputs = tf.unstack(inputs, axis=1)
cell = tf.nn.rnn_cell.BasicLSTMCell(20)
outputs, output_state_fw = tf.nn.static_rnn(cell, inputs, dtype=tf.float32)
- inputs has shape [time_step, batch_size, depth] (after tf.unstack it is really a Python list of time_step tensors, each [batch_size, depth])
- outputs has shape [time_step, batch_size, num_units]: the h of every time_step gathered together. To take the last time_step's information, just use outputs[-1]
- output_state_fw has shape [2, batch_size, num_units]; the 2 is the same as above
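One thing the shape notation hides (a small sketch continuing the snippet above): tf.nn.static_rnn also returns outputs as a Python list of time_step tensors, so stacking is needed if you want a single batch-major tensor again.
last_output = outputs[-1]                   # [batch_size, num_units]
outputs_tensor = tf.stack(outputs, axis=1)  # [batch_size, time_step, num_units], same layout as dynamic_rnn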
The most essential difference between tf.nn.dynamic_rnn and tf.nn.static_rnn is whether time_step may vary from batch to batch: tf.nn.dynamic_rnn allows it to vary, while tf.nn.static_rnn requires every batch to use the same time_step.
https://www.zhihu.com/question/52200883/answer/153694449
tf.nn.dynamic_rnn is the recommended one here (see the sketch below).
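A small sketch of the "time_step can vary" point, assuming TF 1.x placeholders (the names here are mine): with tf.nn.dynamic_rnn the time axis can be left unknown when the graph is built, so each fed batch may have a different number of steps, whereas tf.nn.static_rnn unrolls the graph to a fixed length up front.
x = tf.placeholder(tf.float32, [None, None, depth])   # [batch_size, time_step, depth], both left unknown
var_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
var_outputs, var_state = tf.nn.dynamic_rnn(var_cell, x, dtype=tf.float32)
# one sess.run can now feed a [5, 7, 30] batch and the next a [5, 12, 30] batch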
These are the RNNs that tf wraps for you. Unrolling the cell yourself can be useful in some situations, e.g. when time_step is very small a for loop is enough; since that is rarely used I won't go into detail.
Of course there are at least tf.nn.static_bidirectional_rnn and tf.nn.bidirectional_dynamic_rnn() as well.
tf.nn.bidirectional_dynamic_rnn()
batch_size = 5
time_step = 7
depth = 30
num_units = 20
inputs = tf.Variable(tf.random_normal([batch_size, time_step, depth]))
fw_cells = tf.nn.rnn_cell.BasicLSTMCell(num_units)  # forward LSTM layer
bw_cells = tf.nn.rnn_cell.GRUCell(num_units)  # backward layer (a GRU here)
outputs, (output_state_fw, output_state_bw) = tf.nn.bidirectional_dynamic_rnn(fw_cells, bw_cells, inputs, dtype=tf.float32)
# outputs [2, batch_size, time_step, num_units]
# output_state_fw [2, batch_size, num_units]
# output_state_bw [batch_size, num_units]
outputs and (output_state_fw, output_state_bw) are returned as a tuple.
- inputs has shape [batch_size, time_step, depth]
- outputs has shape [2, batch_size, time_step, num_units]; the 2 stands for the fw and bw outputs, which come back combined as a tuple. To use them you need outputs = tf.concat(outputs, -1) to get [batch_size, time_step, 2*num_units]; to take the last time_step's information, use tf.transpose(outputs, [1,0,2])[-1]
- output_state_fw has shape [2, batch_size, num_units]; looking at the LSTM diagram, the 2 is fw's Ct and ht
- output_state_bw has shape [batch_size, num_units]; note that unlike the fw side it only contains bw's ht (the bw cell here is a GRU, which has no separate Ct)
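Putting the above together, a sketch continuing the snippet: merging the two directions and unpacking the final states.
merged = tf.concat(outputs, -1)                    # [batch_size, time_step, 2*num_units]
last_step = tf.transpose(merged, [1, 0, 2])[-1]    # [batch_size, 2*num_units]
c_fw, h_fw = output_state_fw                       # fw LSTM final state: Ct and ht
h_bw = output_state_bw                             # bw GRU final state: only ht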
tf.nn.static_bidirectional_rnn
inputs = tf.unstack(inputs, axis=1)
fw_cells = tf.nn.rnn_cell.BasicLSTMCell(20)  # forward LSTM layer
bw_cells = tf.nn.rnn_cell.GRUCell(20)  # backward layer (a GRU here)
outputs, output_state_fw, output_state_bw = tf.nn.static_bidirectional_rnn(fw_cells, bw_cells, inputs, dtype=tf.float32)
# outputs [time_step, batch_size, 2*num_units]
# output_state_fw [2, batch_size, num_units]
# output_state_bw [batch_size, num_units]
- inputs has shape [time_step, batch_size, depth]
- outputs has shape [time_step, batch_size, 2*num_units]: the h of every time_step gathered together. To take the last time_step's information, just use outputs[-1]
- output_state_fw has shape [2, batch_size, num_units]; looking at the LSTM diagram, the 2 is fw's Ct and ht
- output_state_bw has shape [batch_size, num_units]; unlike the fw side, note that it only contains bw's ht
tf.nn.dynamic_rnn
units = [10,20,30]
fw_cells = [tf.nn.rnn_cell.BasicLSTMCell(unit) for unit in units]  # LSTM layers
cell = tf.nn.rnn_cell.MultiRNNCell(fw_cells, state_is_tuple=True)
outputs, output_state_fw = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
- outputs has shape [batch_size, time_step, units[-1]]; the last dimension is the top layer's num_units
- output_state_fw has shape [lstm_layer_nums, 2, batch_size, num_units] (the 2 is the same as above; each layer's slice uses that layer's own num_units)
Everything else works like the plain tf.nn.dynamic_rnn: you still need tf.transpose(outputs, [1,0,2])[-1] to get the last time_step. The only difference is that tf.nn.rnn_cell.MultiRNNCell stacks several layers.
Note: when the layers passed in via fw_cells, bw_cells have different num_units, trying to print this state as a single array raises an error (obviously), since fw_cells is a list made up of several cells.
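What the [lstm_layer_nums, 2, batch_size, num_units] notation really stands for, as a sketch continuing the snippet: the final state is a tuple with one (Ct, ht) pair per layer, each sized by that layer's own num_units, which is also why it cannot be printed as one array when the layers differ.
first_c, first_h = output_state_fw[0]    # bottom layer, 10 units: each [batch_size, 10]
top_c, top_h = output_state_fw[-1]       # top layer, 30 units: each [batch_size, 30]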
tf.nn.static_rnn works the same way with a MultiRNNCell. For the multi-layer bidirectional case the (contrib) version below is the recommended route, though tf.nn.bidirectional_dynamic_rnn certainly works too.
tf.nn.bidirectional_dynamic_rnn
batch_size = 5
time_step = 7
depth = 64
inputs = tf.Variable(tf.random_normal([batch_size, time_step, depth]))
units = [20,20,20]
fw_cells = [tf.nn.rnn_cell.BasicLSTMCell(unit) for unit in units]  # forward LSTM layers
bw_cells = [tf.nn.rnn_cell.GRUCell(unit) for unit in units]  # backward layers (GRU)
fw_cell = tf.nn.rnn_cell.MultiRNNCell(fw_cells, state_is_tuple=True)
bw_cell = tf.nn.rnn_cell.MultiRNNCell(bw_cells, state_is_tuple=True)
outputs, (output_state_fw, output_state_bw) = tf.nn.bidirectional_dynamic_rnn(fw_cell, bw_cell, inputs, dtype=tf.float32)
# outputs [2, batch_size, time_step, units[-1]]
# output_state_fw [2, batch_size, num_units]
# output_state_bw [batch_size, num_units]
This really just stacks everything described before; the last dimension of outputs is the top lstm layer's num_units.
Note: as before, when the layers passed in via fw_cells, bw_cells have different num_units, trying to print this state as a single array raises an error (obviously).
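Same idea for the states here (a sketch continuing the snippet): each direction's final state is again a per-layer tuple; the fw entries are (Ct, ht) pairs from the LSTM layers, the bw entries are single ht tensors from the GRU layers.
fw_top_c, fw_top_h = output_state_fw[-1]   # top fw LSTM layer: each [batch_size, units[-1]]
bw_top_h = output_state_bw[-1]             # top bw GRU layer: [batch_size, units[-1]]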
tf.nn.static_bidirectional_rnn
inputs = tf.unstack(inputs, axis=1)  # unstacked inputs cannot be fed to dynamic_rnn, annoyingly...
units = [20,20,20]
fw_cells = [tf.nn.rnn_cell.BasicLSTMCell(unit) for unit in units]  # forward LSTM layers
bw_cells = [tf.nn.rnn_cell.GRUCell(unit) for unit in units]  # backward layers (GRU)
fw_cell = tf.nn.rnn_cell.MultiRNNCell(fw_cells, state_is_tuple=True)
bw_cell = tf.nn.rnn_cell.MultiRNNCell(bw_cells, state_is_tuple=True)
outputs, output_state_fw, output_state_bw = tf.nn.static_bidirectional_rnn(fw_cell, bw_cell, inputs, dtype=tf.float32)
# outputs [time_step, batch_size, fw_units[-1]+bw_units[-1]]
# output_state_fw [2, batch_size, fw_num_units]
# output_state_bw [batch_size, bw_num_units]
The last dimension of outputs is the forward top layer's size plus the backward top layer's size. Everything else is the same as described earlier.
Note: as before, when the layers passed in via fw_cells, bw_cells have different num_units, trying to print this state as a single array raises an error (obviously).
tf.contrib.rnn.stack_bidirectional_rnn
inputs = tf.unstack(inputs, axis=1)  # unstacked inputs cannot be fed to dynamic_rnn, annoyingly...
fw_units = [20,20,20]
bw_units = [20,20,20]
fw_cells = [tf.nn.rnn_cell.LSTMCell(unit) for unit in fw_units]  # forward LSTM layers
bw_cells = [tf.nn.rnn_cell.GRUCell(unit) for unit in bw_units]  # backward layers (GRU)
outputs, output_state_fw, output_state_bw = tf.contrib.rnn.stack_bidirectional_rnn(fw_cells, bw_cells, inputs, dtype=tf.float32)
# outputs [time_step, batch_size, fw_units[-1]+bw_units[-1]]
# output_state_fw [2, batch_size, fw_num_units]
# output_state_bw [batch_size, bw_num_units]
- inputs has shape [time_step, batch_size, depth]
- outputs has shape [time_step, batch_size, fw_num_units + bw_num_units]
- output_state_fw has shape [bi-lstm_layer_nums, 2, batch_size, fw_num_units] (the 2 is the same as above)
- output_state_bw has shape [bi-lstm_layer_nums, batch_size, bw_num_units]
Note: as before, when the layers passed in via fw_cells, bw_cells have different num_units, trying to print this state as a single array raises an error (obviously). fw_cells is a list made up of several cells, and the same goes for bw_cells.
The outputs of both the static and the contrib variants are time-major, i.e. [time_step, batch_size, ...]. Be very careful when using them: rearrange them back into [batch_size, time_step, num_units] before feeding them to the rest of the network!
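A closing sketch of that rearrangement, assuming outputs is the Python list returned by the static/contrib call above:
outputs_tensor = tf.stack(outputs, axis=0)                     # [time_step, batch_size, fw + bw units]
outputs_batch_major = tf.transpose(outputs_tensor, [1, 0, 2])  # [batch_size, time_step, fw + bw units]
# or simply tf.stack(outputs, axis=1) to get the batch-major layout in one step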