EA-LSTM: Evolutionary attention-based LSTM for time series prediction


1. Introduction

Approach: Based on the idea of evolutionary computation [21], the authors propose a competitive random search (CRS), instead of a gradient-based method, to solve for the attention layer weights.

Genetic Algorithms and the Optimal Allocation of Trials

Why introduce CRS: it changes the search direction to avoid falling into a local optimum.

[22] X. Zhang, J. Clune, K.O. Stanley, On the relationship between the OpenAI evolution strategy and stochastic gradient descent, 2017, arXiv:1712.06564.
[23] E. Conti, V. Madhavan, F. Petroski Such, J. Lehman, K.O. Stanley, J. Clune, Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents, 2017, arXiv:1712.06560.
[24] J. Lehman, J. Chen, J. Clune, K.O. Stanley, Safe mutations for deep and recurrent neural networks through output gradients, 2017, arXiv:1712.06563.

Changes to the GA operators

  1. In particular, the improved crossover operator integrates more stochastic mechanisms to maintain the differences between the progeny individuals.
    Purpose: avoiding premature convergence of the algorithm and being trapped in a local optimum.

  2. The basic bit mutation operator performs mutation by randomly inverting one or several gene values at loci of a single encoded string, according to the mutation rate.
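
The two operators above can be illustrated with a short Python sketch. This is only an illustration under assumptions: the notes do not spell out the paper's improved crossover, so a plain uniform crossover (each gene drawn at random from one of the two parents) stands in for it here, and the function names `bit_mutation` and `uniform_crossover` are hypothetical.

```python
import random


def bit_mutation(bits, rate, rng):
    """Basic bit mutation: invert each gene independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def uniform_crossover(parent_a, parent_b, rng):
    """Assumed stand-in for the improved crossover: each gene of the child
    is copied from a randomly chosen parent, which keeps offspring diverse."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


rng = random.Random(0)
child = uniform_crossover([0] * 8, [1] * 8, rng)
mutant = bit_mutation(child, rate=0.1, rng=rng)
unchanged = bit_mutation(child, rate=0.0, rng=rng)   # rate 0 never flips
inverted = bit_mutation(child, rate=1.0, rng=rng)    # rate 1 flips every bit
```

Drawing every gene independently is one simple way to add randomness to crossover; a single-point crossover would produce children that differ from their parents far less.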

Background knowledge for the introduction:

  1. Genetic algorithms

2. Preliminaries

Time series data are used for classification and regression.

The dataset is $X=(X_1, X_2, ..., X_T)$, where each $X_t=(x_t^1, x_t^2, ..., x_t^L)$ covers $L$ timestamps. The output value corresponding to each time step is denoted $y$.

Whether the task is classification or regression depends on whether the values of $y$ are discrete or continuous.

Goal: from the historical input and output data, find a mapping function $f$ that gives the prediction $\widetilde{y}_T$. Mathematically:
$\widetilde{y}_T=f(X, y)$

3. Methodology

3.1 Overview

Structure of this section:

  1. First, the authors give an overview of the proposed model.
  2. Next, they detail the evolutionary attention-based LSTM.
  3. Finally, they present the competitive random search and a collaborative training mechanism.

The workflow is as follows:
[Figure: EA-LSTM workflow]

3.2 Overall algorithm flow

  1. Define the attention layer weights as:
    $W=(W^1, W^2, ..., W^L)$
    Here $L$ is the number of timestamps. The attention weights are applied element-wise to the input of the LSTM layer:
    $\widetilde{X}_t=(x_t^1W^1, x_t^2W^2, ..., x_t^LW^L)$
  2. Then $\widetilde{X}_t$ is fed into the LSTM layer. The LSTM is computed as:
    [Figure: LSTM equations]
  3. The authors take $h^{t-1}$ as the output $\widetilde{y}_t$ and then stack the outputs into a matrix: $\widetilde{y}_T=(\widetilde{y}^1, \widetilde{y}^2, ..., \widetilde{y}^T)$
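
The three steps above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the LSTM figure is not reproduced in these notes, so the standard LSTM gate equations are assumed, the hidden state at each step is collected as the output, and the names `lstm_step`, `attention_lstm_forward`, and the packed parameter layout are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell (assumed formulation)."""
    Wx, Wh, b = params                  # shapes: (4H, L), (4H, H), (4H,)
    H = h_prev.shape[0]
    z = Wx @ x + Wh @ h_prev + b        # all four gate pre-activations at once
    i = sigmoid(z[0:H])                 # input gate
    f = sigmoid(z[H:2 * H])             # forget gate
    o = sigmoid(z[2 * H:3 * H])         # output gate
    g = np.tanh(z[3 * H:])              # candidate cell state
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


def attention_lstm_forward(X, W, params, hidden_size):
    """Weight the inputs with W, then run the LSTM over the sequence."""
    X_tilde = X * W                     # broadcasts (T, L) * (L,): x_t^l * W^l
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    outputs = []
    for x_t in X_tilde:
        h, c = lstm_step(x_t, h, c, params)
        outputs.append(h)               # hidden state used as the step output
    return X_tilde, np.stack(outputs)


rng = np.random.default_rng(0)
T, L, H = 5, 3, 4
X = rng.normal(size=(T, L))
W = np.array([1.0, 0.0, 0.5])           # a weight of 0 removes that input entirely
params = (0.1 * rng.normal(size=(4 * H, L)),
          0.1 * rng.normal(size=(4 * H, H)),
          np.zeros(4 * H))
X_tilde, Y = attention_lstm_forward(X, W, params, H)
```

Because the weights multiply the inputs before the recurrence, changing $W$ changes what the LSTM sees without touching the LSTM's own parameters, which is what allows the two to be optimized by different methods.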

3.3 Competitive random search

[Figure: competitive random search]

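  1. The weights in part a are binary-encoded. The weights corresponding to each individual $W_i$ are passed to part b, and a genetic algorithm is used to select the most suitable weight combination.

Not all of the weights are used here; only the most suitable ones are picked. This differs from what I expected: I had assumed the attention weights themselves were trained by the genetic algorithm, but the genetic algorithm is only used to find which weights are suitable, so it is really a selection step. The selected weights are then fed into the LSTM network, which is trained on the error.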

  2. Then repeat step c.
  3. Finally, construct the new population.
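
The encode, select, and rebuild cycle described in these steps can be sketched as a small loop. This is a simplified illustration under assumptions, not the paper's exact procedure: the code length `BITS_PER_WEIGHT`, the population sizes, and the toy `fitness` function are all hypothetical. In the paper the fitness of an individual comes from the prediction error of the LSTM trained with the decoded weights, which a cheap squared-error target replaces here.

```python
import random

BITS_PER_WEIGHT = 4  # assumed code length per attention weight


def decode(bits, n_weights):
    """Map a binary string to n_weights values in [0, 1]."""
    weights = []
    for k in range(n_weights):
        chunk = bits[k * BITS_PER_WEIGHT:(k + 1) * BITS_PER_WEIGHT]
        value = int("".join(map(str, chunk)), 2)
        weights.append(value / (2 ** BITS_PER_WEIGHT - 1))
    return weights


def competitive_random_search(fitness, n_weights, pop_size=20, n_winners=5,
                              generations=30, mutation_rate=0.05, seed=0):
    """Keep the lowest-error individuals and rebuild the population from them."""
    rng = random.Random(seed)
    n_bits = n_weights * BITS_PER_WEIGHT
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best error seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: fitness(decode(ind, n_weights)))
        history.append(fitness(decode(ranked[0], n_weights)))
        winners = ranked[:n_winners]            # competition: lowest error wins
        children = []
        while len(children) < pop_size - n_winners:
            a, b = rng.sample(winners, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = winners + children         # new population
    best = min(population, key=lambda ind: fitness(decode(ind, n_weights)))
    return decode(best, n_weights), history


# Toy stand-in for the LSTM error: squared distance to a made-up target.
target = [1.0, 0.0, 1.0]


def fitness(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, target))


best_weights, history = competitive_random_search(fitness, n_weights=3)
```

Since the winners are carried over unchanged, the best error in `history` can never get worse from one generation to the next in this sketch.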

