Class tf.contrib.rnn.LSTMCell
Inherits From: LayerRNNCell
Aliases: tf.contrib.rnn.LSTMCell, tf.nn.rnn_cell.LSTMCell
Long short-term memory (LSTM) recurrent network cell. The default, non-peephole implementation is based on
http://www.bioinf.jku.at/publications/older/2604.pdf
S. Hochreiter and J. Schmidhuber. "Long Short-Term Memory". Neural Computation, 9(8):1735-1780, 1997.
The peephole implementation is based on
https://research.google.com/pubs/archive/43905.pdf
Hasim Sak, Andrew Senior, and Francoise Beaufays. "Long short-term memory recurrent neural network architectures for large scale acoustic modeling." INTERSPEECH, 2014.
The class supports optional peephole connections, optional cell clipping, and an optional projection layer; these three features are exactly what distinguish LSTMCell from BasicLSTMCell.
LSTMCell constructor:

```python
__init__(
    num_units,
    use_peepholes=False,
    cell_clip=None,
    initializer=None,
    num_proj=None,
    proj_clip=None,
    num_unit_shards=None,
    num_proj_shards=None,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None,
    name=None,
    dtype=None
)
```
Parameters:

- num_units: int, the number of units in the LSTM cell.
- use_peepholes: bool; if True, enable diagonal (peephole) connections.
- cell_clip: optional float; if provided, the cell state is clipped to [-cell_clip, cell_clip] before the cell output activation.
- initializer: optional initializer for the weight and projection matrices.
- num_proj: optional int; output dimensionality of the projection matrices. If None, no projection is performed.
- proj_clip: optional float; if num_proj > 0 and proj_clip is provided, the projected values are clipped to [-proj_clip, proj_clip].
- num_unit_shards / num_proj_shards: deprecated; use a variable_scope partitioner instead.
- forget_bias: float added to the forget-gate bias (default 1.0) to reduce the scale of forgetting at the start of training.
- state_is_tuple: if True (default), states are 2-tuples of (c, h); if False, they are concatenated along the column axis (deprecated).
- activation: activation function of the inner states; defaults to tanh.
- reuse / name / dtype: standard layer arguments.
BasicLSTMCell constructor:

```python
__init__(
    num_units,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None,
    name=None,
    dtype=None
)
```
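The extra options also change the parameter count. As a back-of-the-envelope sketch (assuming the standard TF layout of one fused kernel of shape `[input_dim + recurrent_dim, 4 * num_units]` plus a bias of `4 * num_units`; the helper names below are illustrative, not TF API):

```python
def basic_lstm_params(input_dim, num_units):
    # BasicLSTMCell: one fused kernel for the i, j, f, o blocks, plus bias.
    kernel = (input_dim + num_units) * 4 * num_units
    bias = 4 * num_units
    return kernel + bias

def lstm_params(input_dim, num_units, use_peepholes=False, num_proj=None):
    # With a projection layer, the recurrent input is the projected output m
    # (size num_proj); otherwise it is the raw output (size num_units).
    recurrent = num_proj if num_proj is not None else num_units
    kernel = (input_dim + recurrent) * 4 * num_units
    bias = 4 * num_units
    peep = 3 * num_units if use_peepholes else 0   # w_i_diag, w_f_diag, w_o_diag
    proj = num_units * num_proj if num_proj is not None else 0  # projection has no bias
    return kernel + bias + peep + proj

print(basic_lstm_params(300, 128))               # plain cell
print(lstm_params(300, 128, use_peepholes=True)) # + 3 peephole vectors
print(lstm_params(300, 128, num_proj=64))        # smaller recurrence, + projection matrix
```

Peepholes add only `3 * num_units` scalars, while a projection both adds a `num_units x num_proj` matrix and shrinks the recurrent part of the fused kernel.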
LSTMCell source (the core of its `call` method):
```python
# i = input_gate, j = new_input, f = forget_gate, o = output_gate
lstm_matrix = self._linear1([inputs, m_prev])
i, j, f, o = array_ops.split(
    value=lstm_matrix, num_or_size_splits=4, axis=1)

# Diagonal connections
if self._use_peepholes and not self._w_f_diag:
  scope = vs.get_variable_scope()
  with vs.variable_scope(
      scope, initializer=self._initializer) as unit_scope:
    with vs.variable_scope(unit_scope):
      self._w_f_diag = vs.get_variable(
          "w_f_diag", shape=[self._num_units], dtype=dtype)
      self._w_i_diag = vs.get_variable(
          "w_i_diag", shape=[self._num_units], dtype=dtype)
      self._w_o_diag = vs.get_variable(
          "w_o_diag", shape=[self._num_units], dtype=dtype)

if self._use_peepholes:
  c = (sigmoid(f + self._forget_bias + self._w_f_diag * c_prev) * c_prev +
       sigmoid(i + self._w_i_diag * c_prev) * self._activation(j))
else:
  c = (sigmoid(f + self._forget_bias) * c_prev +
       sigmoid(i) * self._activation(j))

if self._cell_clip is not None:
  # pylint: disable=invalid-unary-operand-type
  c = clip_ops.clip_by_value(c, -self._cell_clip, self._cell_clip)
  # pylint: enable=invalid-unary-operand-type
if self._use_peepholes:
  m = sigmoid(o + self._w_o_diag * c) * self._activation(c)
else:
  m = sigmoid(o) * self._activation(c)

if self._num_proj is not None:
  if self._linear2 is None:
    scope = vs.get_variable_scope()
    with vs.variable_scope(scope, initializer=self._initializer):
      with vs.variable_scope("projection") as proj_scope:
        if self._num_proj_shards is not None:
          proj_scope.set_partitioner(
              partitioned_variables.fixed_size_partitioner(
                  self._num_proj_shards))
        self._linear2 = _Linear(m, self._num_proj, False)
  m = self._linear2(m)

  if self._proj_clip is not None:
    # pylint: disable=invalid-unary-operand-type
    m = clip_ops.clip_by_value(m, -self._proj_clip, self._proj_clip)
    # pylint: enable=invalid-unary-operand-type

new_state = (LSTMStateTuple(c, m) if self._state_is_tuple else
             array_ops.concat([c, m], 1))
return m, new_state
```
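The arithmetic in that `call` body can be traced with a small NumPy sketch (illustration only: random weights, assumed shapes batch=2, input_dim=3, num_units=4, num_proj=2, and made-up clip values; names like `w_f_diag` mirror the source):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, input_dim, num_units, num_proj = 2, 3, 4, 2
forget_bias, cell_clip, proj_clip = 1.0, 3.0, 1.0
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

inputs = rng.standard_normal((batch, input_dim))
c_prev = rng.standard_normal((batch, num_units))
m_prev = rng.standard_normal((batch, num_proj))  # recurrent input is the projected output

# _linear1: one fused matmul producing the four gate pre-activations
kernel = rng.standard_normal((input_dim + num_proj, 4 * num_units))
bias = np.zeros(4 * num_units)
lstm_matrix = np.concatenate([inputs, m_prev], axis=1) @ kernel + bias
i, j, f, o = np.split(lstm_matrix, 4, axis=1)

# peephole (diagonal) weights: per-unit vectors, not full matrices
w_i_diag, w_f_diag, w_o_diag = (rng.standard_normal(num_units) for _ in range(3))

c = (sigmoid(f + forget_bias + w_f_diag * c_prev) * c_prev
     + sigmoid(i + w_i_diag * c_prev) * np.tanh(j))
c = np.clip(c, -cell_clip, cell_clip)            # cell clipping
m = sigmoid(o + w_o_diag * c) * np.tanh(c)

proj_kernel = rng.standard_normal((num_units, num_proj))
m = m @ proj_kernel                              # projection (no bias)
m = np.clip(m, -proj_clip, proj_clip)            # projection clipping

print(m.shape, c.shape)  # the output m shrinks to num_proj; c keeps num_units
```

Note how, with `num_proj` set, the output/hidden state `m` has size `num_proj` while the cell state `c` keeps size `num_units`.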
The constructors and source above make the LSTMCell additions over BasicLSTMCell concrete:

- peephole connections: the `w_i_diag`, `w_f_diag`, and `w_o_diag` terms added inside the gate activations;
- cell clipping: `c = clip_ops.clip_by_value(c, -self._cell_clip, self._cell_clip)`;
- an output projection: `m = self._linear2(m)`, optionally followed by projection clipping `m = clip_ops.clip_by_value(m, -self._proj_clip, self._proj_clip)`.
Code example (note that an LSTM state tuple is ordered `(c, h)`, not `(h, c)`, so the returned state unpacks as `(c_state, h_state)`; `outputs` is the same tensor as `h_state`):

```python
import tensorflow as tf

batch_size = 10
embedding_dim = 300
inputs = tf.Variable(tf.random_normal([batch_size, embedding_dim]))
previous_state = tf.nn.rnn_cell.LSTMStateTuple(
    tf.Variable(tf.random_normal([batch_size, 128])),   # c: cell state
    tf.Variable(tf.random_normal([batch_size, 128])))   # h: hidden state
lstmcell = tf.nn.rnn_cell.LSTMCell(128)
outputs, (c_state, h_state) = lstmcell(inputs, previous_state)
print(outputs.shape)  # (10, 128)
print(c_state.shape)  # (10, 128)
print(h_state.shape)  # (10, 128)
```
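The `(c, h)` ordering is a common pitfall. `LSTMStateTuple` is just a namedtuple whose first field is the cell state; a minimal pure-Python stand-in (the real class lives in TensorFlow, this only illustrates the field order) makes the contract explicit:

```python
from collections import namedtuple

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple: fields are (c, h), in that order.
LSTMStateTuple = namedtuple("LSTMStateTuple", ("c", "h"))

state = LSTMStateTuple(c="cell_state", h="hidden_state")

# Positional unpacking therefore yields c first, then h;
# the hidden/output tensor is the SECOND element.
c_state, h_state = state
print(c_state, h_state)
```

Unpacking the returned state as `(h_state, c_state)` silently swaps the two tensors; using the field names (`state.c`, `state.h`) avoids the mistake entirely.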