You may have noticed that several Keras recurrent layers take two parameters, return_state and return_sequences. In this post, I am going to show you what they mean and when to use them in real-life cases.
To understand what they mean, we first need to crack open a recurrent layer a little bit, such as the most often used LSTM and GRU.
The most primitive recurrent layer implemented in Keras, SimpleRNN, suffers from the vanishing gradients problem, which makes it challenging to capture long-range dependencies. LSTM and GRU, by contrast, are each equipped with their own "gates" that keep long-term information from "vanishing" away.
In the graph above we can see that, given an input sequence to an RNN layer, the RNN cell at each time step generates an output known as the hidden state, a<t>.
Depending on which RNN you use, a<t> is computed differently.
c<t> denotes the cell state.
Return sequences refers to returning the hidden state a<t> at every time step. By default, return_sequences is set to False in Keras RNN layers, which means the RNN layer will only return the last hidden state output, a<T>.
In other cases, we need the full sequence as the output, and setting return_sequences to True is necessary.
Let's define a Keras model that consists of only an LSTM layer. We use constant initializers so that the output is reproducible for the demo.
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
import keras
# constant initializers so that the layer's weights are the same on every run
k_init = keras.initializers.Constant(value=0.1)
b_init = keras.initializers.Constant(value=0)
r_init = keras.initializers.Constant(value=0.1)
# LSTM units
units = 1
# define model
inputs1 = Input(shape=(3, 2))
lstm1 = LSTM(units, return_sequences=True, kernel_initializer=k_init, bias_initializer=b_init, recurrent_initializer=r_init)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)
# define input data
data = array([0.1, 0.2, 0.3, 0.1, 0.2, 0.3]).reshape((1,3,2))
# make and show prediction
output = model.predict(data)
print(output, output.shape)
output:
[[[0.00767819]
[0.01597687]
[0.02480672]]] (1, 3, 1)
We can see that the output array of the LSTM layer has shape (1, 3, 1), which stands for (#Samples, #Time steps, #LSTM units). When return_sequences is set to False, by comparison, the shape is (#Samples, #LSTM units), because only the hidden state of the last time step is returned.
# define model
inputs1 = Input(shape=(3, 2))
lstm1 = LSTM(units, kernel_initializer=k_init, bias_initializer=b_init, recurrent_initializer=r_init)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)
# define input data
data = array([0.1, 0.2, 0.3, 0.1, 0.2, 0.3]).reshape((1,3,2))
# make and show prediction
preds = model.predict(data)
print(preds, preds.shape)
output:
[[0.02480672]] (1, 1)
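As a rough illustration of why the full sequence matters, the sketch below stacks two LSTM layers and puts a per-time-step Dense head on top. The layer sizes (4, 8, 5) and the softmax head are made-up values for the demo, not part of the models above.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed

# Hypothetical sizes chosen only for this illustration
stack_inputs = Input(shape=(3, 2))
# The first LSTM must return the full sequence so the next LSTM gets a 3D input
x = LSTM(4, return_sequences=True)(stack_inputs)
# The second LSTM also returns the full sequence so each time step can be classified
x = LSTM(8, return_sequences=True)(x)
# One softmax prediction over 5 classes per time step
stack_outputs = TimeDistributed(Dense(5, activation="softmax"))(x)
stacked = Model(inputs=stack_inputs, outputs=stack_outputs)

stacked_out = stacked.predict(np.zeros((1, 3, 2)), verbose=0)
print(stacked_out.shape)  # (#Samples, #Time steps, #Classes)
```

If the first LSTM left return_sequences at its default of False, the second LSTM would receive a 2D tensor and Keras would refuse to build the model.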
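There are two primary situations in which you can set return_sequences to True to return the full sequence. The first is stacking RNN layers: the earlier RNN layer or layers should set return_sequences to True so that the following RNN layer or layers can have the full sequence as input. The second is when the model needs to produce an output at every time step rather than a single output for the whole sequence.
Return state refers to returning the cell state c<T>.
In Keras we can output an RNN's last cell state in addition to its hidden states by setting return_state to True.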
# define model
inputs1 = Input(shape=(3, 2))
lstm1, state_h, state_c = LSTM(units, return_state=True, kernel_initializer=k_init, bias_initializer=b_init, recurrent_initializer=r_init)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])
# define input data
data = array([0.1, 0.2, 0.3, 0.1, 0.2, 0.3]).reshape((1,3,2))
# make and show prediction
output = model.predict(data)
print(output)
for a in output:
    print(a.shape)
Output:
[array([[0.02480672]], dtype=float32), array([[0.02480672]], dtype=float32), array([[0.04864851]], dtype=float32)]
(1, 1)
(1, 1)
(1, 1)
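The output of the LSTM layer has three components, (a<T>, a<T>, c<T>), where T stands for the last time step. The first two are the same last hidden state, and the third is the last cell state.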
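To see where these numbers could come from, the recurrence can be stepped through by hand in NumPy. This is only a sketch under two assumptions that are not stated in the post: that the gates use the hard sigmoid (the default recurrent activation in older standalone Keras releases), and that Keras adds 1 to the forget gate bias (its unit_forget_bias default). The helper names hard_sigmoid and lstm_by_hand are made up for this example.

```python
import numpy as np

def hard_sigmoid(x):
    # Assumed gate activation: clip(0.2 * x + 0.5, 0, 1), as in older Keras releases
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def lstm_by_hand(sequence, w=0.1, u=0.1, forget_bias=1.0):
    """Step a single-unit LSTM whose kernel and recurrent weights are all constant."""
    h, c = 0.0, 0.0
    hidden_states = []
    for x in sequence:
        # With identical weights and zero bias, every gate sees the same pre-activation
        z = w * x.sum() + u * h
        i = hard_sigmoid(z)                # input gate
        f = hard_sigmoid(z + forget_bias)  # forget gate (assumed bias of 1)
        o = hard_sigmoid(z)                # output gate
        c = f * c + i * np.tanh(z)         # new cell state
        h = o * np.tanh(c)                 # new hidden state
        hidden_states.append(h)
    return np.array(hidden_states), h, c

steps = np.array([0.1, 0.2, 0.3, 0.1, 0.2, 0.3]).reshape(3, 2)
all_h, last_h, last_c = lstm_by_hand(steps)
print(all_h)           # hidden state at each time step
print(last_h, last_c)  # last hidden state and last cell state
```

Under those assumptions the hidden states and the last cell state should land within rounding of the values printed above. With a plain sigmoid for the gates, as in newer Keras versions, the numbers come out slightly different.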
The major reason to set return_state to True is that an RNN may need to have its cell state initialized with the state from the previous time step while the weights are shared, such as in an encoder-decoder model. A snippet of the code from an encoder-decoder model is shown below.
# Snippet only: latent_dim, max_decoder_seq_length, inputs, states,
# decoder_dense and all_outputs are defined elsewhere in the full model
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
for _ in range(max_decoder_seq_length):
    # Run the decoder on one timestep
    outputs, state_h, state_c = decoder_lstm(inputs,
                                             initial_state=states)
    outputs = decoder_dense(outputs)
    # Store the current prediction (we will concatenate all predictions later)
    all_outputs.append(outputs)
    # Reinject the outputs as inputs for the next loop iteration
    # as well as update the states
    inputs = outputs
    states = [state_h, state_c]
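The snippet leaves out where states first comes from. A minimal sketch of the usual wiring, assuming hypothetical sizes (latent_dim of 4, an encoder input of 3 steps and a decoder input of 5 steps), takes the encoder's last hidden and cell states and passes them to the decoder as initial_state:

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM

latent_dim = 4  # hypothetical size for the demo

# Encoder: discard the output, keep the last hidden state and cell state
encoder_inputs = Input(shape=(3, 2))
_, enc_h, enc_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: starts from the encoder's states instead of zeros
decoder_inputs = Input(shape=(5, 2))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, dec_h, dec_c = decoder_lstm(decoder_inputs,
                                             initial_state=[enc_h, enc_c])

seq2seq = Model([encoder_inputs, decoder_inputs],
                [decoder_outputs, dec_h, dec_c])
dec_seq, last_h, last_c = seq2seq.predict(
    [np.zeros((1, 3, 2)), np.zeros((1, 5, 2))], verbose=0)
print(dec_seq.shape, last_h.shape, last_c.shape)
```

The decoder's own states are returned as well so that, at inference time, they can be fed back in for the next step.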
You may have noticed that in the above encoder-decoder model both return_sequences and return_state are set to True. In that case, the output of the LSTM will have three components, (a<1...T>, a<T>, c<T>).
# define model
inputs1 = Input(shape=(3, 2))
lstm1, state_h, state_c = LSTM(units, return_sequences=True, return_state=True, kernel_initializer=k_init, bias_initializer=b_init, recurrent_initializer=r_init)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])
# define input data
data = array([0.1, 0.2, 0.3, 0.1, 0.2, 0.3]).reshape((1,3,2))
# make and show prediction
output = model.predict(data)
print(output)
for a in output:
    print(a.shape)
Output:
[array([[[0.00767819],
[0.01597687],
[0.02480672]]], dtype=float32), array([[0.02480672]], dtype=float32), array([[0.04864851]], dtype=float32)]
(1, 3, 1)
(1, 1)
(1, 1)
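One thing worth mentioning is that if we replace LSTM with GRU, the output will have only two components, (a<1...T>, c<T>), because a GRU keeps a single state, so its last hidden state and its cell state are the same.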
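A quick way to check this is to swap in a GRU and unpack two values instead of three. The sketch below uses the layer's default initializers, so the actual numbers will vary from run to run, while the shapes and the equality check should not.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, GRU

gru_inputs = Input(shape=(3, 2))
# A GRU returns the sequence plus a single state (there is no separate cell state)
gru_seq, gru_state = GRU(1, return_sequences=True, return_state=True)(gru_inputs)
gru_model = Model(inputs=gru_inputs, outputs=[gru_seq, gru_state])

gru_data = np.array([0.1, 0.2, 0.3, 0.1, 0.2, 0.3]).reshape((1, 3, 2))
seq, state = gru_model.predict(gru_data, verbose=0)
print(seq.shape, state.shape)
# The returned state is the hidden state of the last time step
print(np.allclose(seq[:, -1, :], state))
```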
To understand how to use return_sequences and return_state, we started off with a short introduction to two commonly used recurrent layers, LSTM and GRU, and how their cell state and hidden state are derived. Next, we dived into some cases of applying each of the two arguments, as well as tips on when you can consider using them in your next model.
You can find the source code for this post on my GitHub repo.