复现论文 A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction 用于时间序列预测

星期二, 03. 十二月 2019 09:34下午

注：论文实现代码和论文都在我的git账号下，欢迎交流讨论

论文题目:

A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

论文基本框架

1.png

1、主要内容

该论文主要通过在编码层和解码层都引入双向注意力机制实现时间序列更好的预测

2、输入数据

输入数据地址来自于：https://cseweb.ucsd.edu/~yaq007/NASDAQ100_stock_data.html
数据名称为 nasdaq100/small/nasdaq100_padding.csv
直接下载该csv文件，会得到一个82列的df，最后一列是标签，训练集和验证集通过如下代码生成:

def get_data(data_path,T, n):
    input_X = []
    input_Y = []
    label_Y = []

    df = pd.read_csv(data_path)
    row_length = len(df)
    column_length = df.columns.size
    for i in range(row_length-T+1):
        X_data = df.iloc[i:i+T, 0:column_length-1]
        Y_data = df.iloc[i:i+T-1,column_length-1]
        label_data = df.iloc[i+T-1,column_length-1]
        input_X.append(np.array(X_data))
        input_Y.append(np.array(Y_data))
        label_Y.append(np.array(label_data))
    input_X = np.array(input_X).reshape(-1,T,n)
    input_Y = np.array(input_Y).reshape(-1,T-1,1)
    label_Y = np.array(label_Y).reshape(-1,1)
    return input_X,input_Y,label_Y

其中，为窗口长度，为传感器数目，即81(82-1)，算法通过第个窗口中前个传感器数据值和个标签值，预测第个标签值。

模型框架

模型一共分为两层，即编码阶段Attention和解码阶段Attention，两者实现的思路是一致的，这里只说论文核心的两部分

Attention程序实现

    @staticmethod
    def _attention_layer(_h_t_1 = None,     # [-1, cell_dim]
                         _s_t_1 = None,     # [-1, cell_dim]
                         _x_k   = None,     # input attention layer   : [-1, input_dim, time_step]
                                            # temporal attention layer: [-1, time_step, input_dim];
                         _We    = None,     # [2*cell_dim, time_step]
                         _Ue    = None,     # [time_step, time_step]
                         _Ve    = None, ):  # [time_step, 1]
        _input = tf.concat([_h_t_1, _s_t_1], axis=1)  # [-1, 2*cell_dim]
        _output = tf.reshape(tf.transpose(tf.tensordot(_We, _input, axes=[0, 1])), [-1, _We.get_shape().as_list()[-1], 1]) + \
                  tf.transpose(tf.tensordot(_Ue, _x_k, axes=[0, 2]), perm=[1, 0, 2])   # [-1, time_step, input_dim]
        return tf.reshape(tf.tensordot(_Ve, tf.tanh(_output), axes=[0, 1]), \
                          [-1, _x_k.get_shape().as_list()[1]])    # input attention layer   : [-1, input_dim]
                                                                  # temporal attention layer: [-1, time_step]

Attention的实现是很简单的，这里不详细叙说。

论文中主要解决的问题

第一个问题：LSTM中如何实现Attention?
第二个问题：如何将历史标签用于解码阶段?

关于第一个问题，论文中用了如下公式：
论文中的公式(8):

论文中的公式(9):

式中，为第个传感器窗口为T的样本；和分别为编码器LSTM在时刻的隐层状态和细胞状态，为需要学习的参数。

关于第二个问题，论文中用了如下公式：
论文中的公式(15):

式中，为时刻的历史标签，为解码器LSTM在时刻的细胞状态，得到的为解码器LSTM在时刻的输入。

论文中的编码阶段实现:

    def _Encode(self, encode_input=None):  # encode_input: [-1, time_step, input_dim]
        x_k = tf.transpose(encode_input, perm=[0, 2, 1], name='Series_of_length_TIME_STEP')#[-1,input_dim,time_step]
        encode_time_step_hidden = []
        for t in range(encode_input.get_shape()[1]):  # [t < time_step]
            e_t = self._attention_layer(_h_t_1 = self.encode_hidden,
                                        _s_t_1 = self.encode_cell,
                                        _x_k   = x_k,
                                        _We    = self._weights['Input_attention_layer_We'],
                                        _Ue    = self._weights['Input_attention_layer_Ue'],
                                        _Ve    = self._weights['Input_attention_layer_Ve'], )
            a_t = tf.nn.softmax(e_t)  # [-1, input_dim]
            tmp = tf.reshape(encode_input[:, t, :], shape=[-1, encode_input.get_shape().as_list()[-1]])
            x_t = tf.multiply(a_t, tmp)
            (self.encode_cell, self.encode_hidden) = self._LSTMCell(h_t_1 = self.encode_hidden,
                                                                    s_t_1 = self.encode_cell,
                                                                    x_t   = x_t,
                                                                    name  = 'Encode_LSTMCell')
            encode_time_step_hidden.append(self.encode_hidden)
        return tf.reshape(tf.stack(encode_time_step_hidden), [-1, self.TIME_STEP, self.DECODE_CELL])

注：程序用了for循环，因为每一个LSTM单元都用了Attention

论文中的解码阶段实现：

    def _Decode(self, decode_input=None, y_t=None):
        for t in range(decode_input.get_shape()[1]-1):
            l_t = self._attention_layer(_h_t_1 = self.decode_hidden,
                                        _s_t_1 = self.decode_cell,
                                        _x_k   = decode_input,
                                        _We    = self._weights['Temporal_attention_layer_Wd'],
                                        _Ue    = self._weights['Temporal_attention_layer_Ud'],
                                        _Ve    = self._weights['Temporal_attention_layer_Vd'], )
            b_t = tf.reshape(tf.nn.softmax(l_t), shape=[-1, decode_input.get_shape().as_list()[1], 1])  # [-1, time_step, 1]
            c_t = tf.reduce_sum(tf.multiply(b_t, decode_input), axis=1)  # [-1, time_step, 1]*[-1, time_step, cell_dim]
                                                                         # ---> [-1, time_step, cell_dim]-->[-1, cell_dim]
            y_t_ = self._Dense(_input       = tf.concat([c_t, tf.reshape(y_t[:, t], [-1, 1])], axis=1),
                               _weights     = self._weights['Decode_layer_yt_weights'],
                               _bias        = self._weights['Decode_layer_yt_bias'],
                               _activation  = None,
                               _dtype       = tf.float32,
                               _is_bias     = True, )
            (self.decode_cell, self.decode_hidden) = self._LSTMCell(h_t_1 = self.decode_hidden,
                                                                    s_t_1 = self.decode_cell,
                                                                    x_t   = y_t_,
                                                                    name  = 'Decode_LSTMCell')
        pre_y_ = self._Dense(_input       = tf.concat([self.decode_hidden, self.decode_cell], axis=1),
                             _weights     = self._weights['Decode_layer_output_1_weights'],
                             _bias        = self._weights['Decode_layer_output_1_bias'],
                             _activation  = None,
                             _dtype       = tf.float32,
                             _is_bias     = True, )
        pre_y = self._Dense(_input       = pre_y_,
                            _weights     = self._weights['Decode_layer_output_2_weights'],
                            _bias        = self._weights['Decode_layer_output_2_bias'],
                            _activation  = None,
                            _dtype       = tf.float32,
                            _is_bias     = True, )
        return pre_y

注：编码和解码的程序是相似的，但是加入了历史标签
简而言之，论文就是在编码阶段对每一个LSTM单元引入了Attention，解码阶段也是这样

程序实现

程序的具体实现都在我的git账户上，对比其他git上的实现，亲测我的实现mse是更好的，关于具体的程序请看git或者留言与我讨论，关于程序中可能出现的问题欢迎打扰。