Reproducing the Time-Series Prediction Models from Two Papers

Tuesday, 08 October 2019, 03:27 PM
Note: the implementation code and the paper PDFs are both available under my git account. Discussion is welcome.

Paper titles:

Learning to Monitor Machine Health with Convolutional Bi-Directional LSTM Networks
Machine health monitoring using local feature-based gated recurrent unit networks

1. Basic framework of the first paper

fig_1.png

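Main content: the model proposed in the paper is used for fault diagnosis (it supports both classification and regression).
The framework has four parts:
1. Time- and frequency-domain features are extracted from the raw data (which may come from several sensors) with a sliding window. With m sensors, k windows, and n features per sensor, the input after feature extraction has shape [-1, k, m*n], where -1 stands for the batch size.
Note: this part does not appear in the block diagram or in the code, but it is evident from the paper. Readers who want to reuse the model need to implement it themselves. If feeding the raw data directly (without feature extraction) already gives good results, this part can be omitted.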
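
Since the feature-extraction step is left to the reader, here is a minimal sketch of what it could look like. It is written in plain Python so that it runs without TensorFlow. The choice of features (mean, RMS, peak), the non-overlapping windows, and the names `window_features` and `extract_features` are all assumptions made for illustration; the paper's own feature set is not reproduced here.

```python
import math


def window_features(signal):
    """Return three time-domain features of one window: mean, RMS, peak."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    return [mean, rms, peak]


def extract_features(sensors, window_len):
    """Turn m raw sensor sequences into a k x (m * n) feature matrix.

    sensors: list of m equal-length sequences, one per sensor.
    Each sequence is cut into k non-overlapping windows (an assumption),
    and n = 3 features are computed per sensor per window.
    """
    length = len(sensors[0])
    k = length // window_len  # an incomplete trailing window is dropped
    rows = []
    for w in range(k):
        start = w * window_len
        row = []
        for seq in sensors:
            row.extend(window_features(seq[start:start + window_len]))
        rows.append(row)
    return rows


# Two hypothetical sensors, 8 samples each, window length 4 -> k = 2 windows
sensors = [
    [1.0, -1.0, 1.0, -1.0, 2.0, 2.0, 2.0, 2.0],
    [0.0, 0.0, 0.0, 0.0, 3.0, -3.0, 3.0, -3.0],
]
features = extract_features(sensors, window_len=4)
print(len(features), len(features[0]))  # 2 6
```

Each call produces one sample of shape [k, m*n]; stacking many such samples along a leading axis gives the [-1, k, m*n] input described above.
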
2. The convolutional part extracts spatial features while preserving the temporal information. The code is as follows:

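    @staticmethod
    def cnn_layer(cnn_input=None, k=None, m=None, s=None, d=None):
        # k: number of filters, m: filter height (time axis),
        # s: pooling size, d: width of each input row.
        # The kernel spans the full width d, so it slides along time only.
        cnn1 = tf.contrib.layers.conv2d(cnn_input,
                                        num_outputs=k,
                                        kernel_size=[m, d],
                                        stride=[1, d],
                                        padding='VALID', )
        # Max-pool along the time axis only.
        cnn1_pool = tf.nn.max_pool(cnn1,
                                   ksize=[1, s, 1, 1],
                                   strides=[1, s, 1, 1],
                                   padding='SAME',
                                   name='cnn1_max_pool')
        # Drop the width-1 axis: [batch, steps, 1, k] -> [batch, steps, k].
        cnn1_shape = cnn1_pool.get_shape()
        cnn_out = tf.reshape(cnn1_pool, shape=[-1, cnn1_shape[1], cnn1_shape[-1]])
        return cnn_out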

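Read the code together with Figure 3, paying attention to the convolution layer's kernel_size and the pooling layer's ksize. Here d is the length of each input row, i.e., the value of m*n with m and n as defined in part 1 (the m argument in the code is the filter height, not the sensor count):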

2.png
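
To make the effect of kernel_size and ksize concrete, the shape arithmetic can be sketched in plain Python. This is only bookkeeping, not the model itself. It assumes a 4-D input of shape [batch, T, d, 1] and the usual TensorFlow padding rules ('VALID' gives in - kernel + 1 at stride 1, 'SAME' gives ceil(in / stride)); the function name `cnn_output_shape` and the example sizes are hypothetical.

```python
import math


def cnn_output_shape(T, d, k, m, s):
    """Shapes after the conv and pool/reshape steps, batch axis omitted.

    T: number of time steps in the input, d: features per time step,
    k: number of filters, m: filter height along time, s: pooling size.
    """
    # conv2d with kernel [m, d], stride [1, d], padding 'VALID'
    conv_steps = T - m + 1          # the filter slides along time only
    conv_width = (d - d) // d + 1   # kernel spans the full width -> 1
    conv_shape = (conv_steps, conv_width, k)
    # max_pool with ksize and stride s along time, padding 'SAME';
    # the reshape then drops the width-1 axis
    pooled_shape = (math.ceil(conv_steps / s), k)
    return conv_shape, pooled_shape


conv, pooled = cnn_output_shape(T=20, d=30, k=8, m=3, s=2)
print(conv)    # (18, 1, 8)
print(pooled)  # (9, 8)
```

Because the kernel covers the whole feature axis, the width collapses to 1 and the time axis survives, which is why the reshape to [batch, steps, k] is valid and why the result can be fed to a recurrent layer.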

3. Two stacked bidirectional LSTM layers extract temporal information from the CNN output. The code is as follows:

    @staticmethod
    def bilstm_layer(bilstm_input=None, num_units=None):
        # first bi-lstm cell
        with tf.variable_scope('1st-bi-lstm-layer', reuse=tf.AUTO_REUSE):
            cell_fw_1 = tf.nn.rnn_cell.LSTMCell(num_units=num_units[0], state_is_tuple=True)
            cell_bw_1 = tf.nn.rnn_cell.LSTMCell(num_units=num_units[0], state_is_tuple=True)
            outputs_1, states_1 = tf.nn.bidirectional_dynamic_rnn(cell_fw_1, cell_bw_1, inputs=bilstm_input,
                                                                  dtype=tf.float32)

        # second bi-lstm cell
        with tf.variable_scope('2nd-bi-lstm-layer', reuse=tf.AUTO_REUSE):
            # input_2 = tf.add(outputs_1[0], outputs_1[1])
            input_2 = tf.concat([outputs_1[0], outputs_1[1]], axis=2)
            cell_fw_2 = tf.nn.rnn_cell.LSTMCell(num_units=num_units[1], state_is_tuple=True)
            cell_bw_2 = tf.nn.rnn_cell.LSTMCell(num_units=num_units[1], state_is_tuple=True)
            outputs_2, states_2 = tf.nn.bidirectional_dynamic_rnn(cell_fw_2, cell_bw_2, inputs=input_2,
                                                                  dtype=tf.float32)

        # bilstm output
        with tf.variable_scope('bi-lstm-layer-output', reuse=tf.AUTO_REUSE):
            bilstm_out = tf.concat([states_2[0].h, states_2[1].h], axis=1)
        return bilstm_out

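Read the code together with Figure c, paying attention to how the two bidirectional LSTM layers are joined (this part follows the original paper; if you spot a problem, please get in touch):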

3.png
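
The two concatenations in bilstm_layer are easier to follow with the shapes written out. The sketch below only traces shapes in plain Python; the function name `bilstm_shapes` and the example sizes are hypothetical, and no recurrent computation is performed.

```python
def bilstm_shapes(batch, steps, num_units):
    """Trace tensor shapes through two stacked bidirectional layers.

    num_units: [u1, u2], hidden sizes of the first and second layer.
    """
    u1, u2 = num_units
    # Layer 1 returns one output sequence per direction.
    fw1 = bw1 = (batch, steps, u1)
    # Concatenating along the feature axis (axis=2) gives the layer-2 input.
    input_2 = (batch, steps, fw1[2] + bw1[2])
    # Layer 2: only the final hidden state h of each direction is kept.
    h_fw = h_bw = (batch, u2)
    # Concatenating along axis=1 gives the vector passed to the dense layers.
    bilstm_out = (batch, h_fw[1] + h_bw[1])
    return input_2, bilstm_out


input_2, bilstm_out = bilstm_shapes(batch=32, steps=9, num_units=[64, 32])
print(input_2)     # (32, 9, 128)
print(bilstm_out)  # (32, 64)
```

The first concatenation keeps the time axis so the second layer still sees a sequence. The second one discards it, leaving one fixed-length vector per sample.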

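4. Fully connected layers produce the final output. This part is relatively simple: it takes the hidden features that the previous layer outputs at the last time step and maps them to the final result. The code is as follows: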

    @staticmethod
    def fc_layer(fc_input=None, num_units=None, keep_prob=None):
        fc_input_ = tf.nn.dropout(fc_input, keep_prob=keep_prob)
        fc1 = tf.layers.dense(fc_input_, num_units[0], activation=tf.nn.relu,
                              kernel_initializer=tf.glorot_uniform_initializer())
        fc1_ = tf.nn.dropout(fc1, keep_prob=keep_prob)
        fc_out = tf.layers.dense(fc1_, num_units[1], activation=tf.nn.relu,
                                 kernel_initializer=tf.glorot_uniform_initializer())
        # fc_out = tf.layers.dense(fc_out, 1, activation=None, use_bias=False,
        #                          kernel_initializer=tf.glorot_normal_initializer())
        return fc_out

For the complete code, see my git.

2. Basic framework of the second paper

4.png

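Main content: the model proposed in the paper is used for fault diagnosis (it supports both classification and regression).
The framework has four parts:
1. Time- and frequency-domain features are extracted from the raw data (which may come from several sensors) with a sliding window. With m sensors, k windows, and n features per sensor, the input after feature extraction has shape [-1, k, m*n], where -1 stands for the batch size. (This is the same as in the first paper, as the figure shows.) Readers likewise need to implement this part themselves.
2. A bidirectional GRU extracts temporal features. This part is also fairly simple. The code is as follows: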

    @staticmethod
    def bigru_layer(bilstm_input=None, num_units=None):
        cell_fw = tf.nn.rnn_cell.GRUCell(num_units=num_units, name='fw')
        cell_bw = tf.nn.rnn_cell.GRUCell(num_units=num_units, name='bw')
        outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs=bilstm_input, dtype=tf.float32)
        bigru_out = tf.concat([states[0], states[1]], axis=1)
        return bigru_out

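3. The weighted-averaging part is implemented mainly through a fully connected layer, but the data fed to the GRU has to be processed first. See the formulas in the paper and the code here; because the formulas are long, only the code is pasted. The main idea is to average the feature vectors of the different windows within a sample using exponential-style weights. The reference code is as follows: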

    @staticmethod
    def get_weight_average_layer(weight_average_input=None):
        # Requires `import numpy as np` at module level.
        _arr_weight_average_input = np.array(weight_average_input)
        _, T, _ = _arr_weight_average_input.shape
        # The weights depend only on T, so compute them once.
        qk = np.array([np.exp(np.min([k - 1, T - k])) for k in range(1, T+1)])
        sigma_qk = np.sum(qk, dtype=np.float32)
        wk = np.array([qj * 1.0 / sigma_qk for qj in qk])
        _arr = []
        for ck in _arr_weight_average_input:    # every sample in the batch
            c = np.array([wk[k]*ck[k] for k in range(T)]).sum(axis=0)
            _arr.append(c)
        return np.array(_arr)
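
As a small worked example of the weighting in this function, the sketch below recomputes the weights in plain Python for T = 4. The formula is read off the code above rather than quoted from the paper, and the helper names `averaging_weights` and `weighted_average` are hypothetical.

```python
import math


def averaging_weights(T):
    """w_k = q_k / sum(q), with q_k = exp(min(k - 1, T - k)) for k = 1..T."""
    q = [math.exp(min(k - 1, T - k)) for k in range(1, T + 1)]
    total = sum(q)
    return [qk / total for qk in q]


def weighted_average(sample):
    """sample: T rows of equal length; returns their weighted average row."""
    w = averaging_weights(len(sample))
    width = len(sample[0])
    return [sum(w[t] * sample[t][j] for t in range(len(sample)))
            for j in range(width)]


w = averaging_weights(4)
print([round(x, 3) for x in w])  # [0.134, 0.366, 0.366, 0.134]
```

The weights are symmetric and sum to 1, with the windows in the middle of the sequence weighted most heavily and the two ends least.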

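4. This part is relatively simple: the results of parts 2 and 3 are concatenated and passed through a fully connected layer. The code is as follows: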

    @staticmethod
    def fc_layer_2(fc_input=None, num_units=None, keep_prob=None):
        fc_input_ = tf.nn.dropout(fc_input, keep_prob=keep_prob)
        fc_out = tf.layers.dense(fc_input_, num_units, activation=tf.nn.relu,
                                 kernel_initializer=tf.glorot_uniform_initializer())
        # fc_out = tf.nn.dropout(fc, keep_prob=keep_prob)
        return fc_out

For the complete code, see my git.

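Notes:

If you find any problem in the code, feel free to contact me;
The code requires tensorflow-gpu (>=1.10.0) to run;
The git repository contains the reference papers and some of the experimental data; readers can run the main_test file to check the models;
The data in the data folder of the git repository is a regression problem; as training proceeds, the MSE settles at around 100;
All parameters in the code are documented in the classes.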
