Time Series Forecasting with 2D Convolutions

In this article, I will show you a time series forecasting method I haven’t seen documented elsewhere. I doubt it is a new method, but since I haven’t seen a great article on it, here it is.


The Dataset


The data I used for this project is the data from the Global Energy Forecasting competition, put on by my hometown university, UNC Charlotte. You can find more about it here: http://www.drhongtao.com/gefcom/2017


What you need to know is that the data consists of various readings from an energy grid. Our target is to forecast real-time energy demand for the grid using these data points. The data points also include dew point and dry bulb temperature, since air conditioning is a huge driver of energy consumption.


Our target variable is RTDemand: Real Time energy demand for the energy grid we are working with. The data has clear daily cycles. Here are three days of our data:


In the middle of the night when the sun is down and everyone is asleep, our power consumption reaches a minimum. We wake up in the morning, head off to work, and our power consumption reaches its maximum as the sun reaches peak intensity. I think the daily dips correspond to commuting times.


If we zoom out a little more, we can see clear autocorrelation and day-to-day trends, just as you see in weather. Here's about 3 weeks of data:


3 Weeks of the target variable

We can also notice a larger seasonal trend if we zoom out even more and look at the data over an entire year:


This is a very good dataset for time series forecasting.


Single variable pure time series models


For time series forecasting, we will need to make time sequences leading up to a target outcome. In our examples, I am choosing 72 hours as the length of the time sequence. What this means is the input to our model is 72 individual numbers representing the last 72 hours of data, and the target output we want from our model is its forecast for the 73rd hour. I thought 72 hours was a good length because it can capture local trends and the day/night cycle well.


Let me show you what I mean by time sequences. Here is our first 3 days of X, our input to the model:


array([[12055., 11430., 10966., 10725., 10672., 10852., 11255., 11583.,
12238., 12877., 13349., 13510., 13492., 13314., 13156., 13364.,
14632., 15653., 15504., 15088., 14579., 13882., 12931., 11883.,
10978., 10406., 10089., 9982., 10031., 10289., 10818., 11444.,
12346., 13274., 13816., 14103., 14228., 14154., 14055., 14197.,
15453., 16531., 16410., 15954., 15337., 14347., 13178., 12106.,
11400., 11059., 10959., 11073., 11485., 12645., 14725., 15863.,
16076., 16222., 16358., 16362., 16229., 16123., 15976., 16127.,
17359., 18818., 18724., 18269., 17559., 16383., 14881., 13520.],
[11430., 10966., 10725., 10672., 10852., 11255., 11583., 12238.,
12877., 13349., 13510., 13492., 13314., 13156., 13364., 14632.,
15653., 15504., 15088., 14579., 13882., 12931., 11883., 10978.,
10406., 10089., 9982., 10031., 10289., 10818., 11444., 12346.,
13274., 13816., 14103., 14228., 14154., 14055., 14197., 15453.,
16531., 16410., 15954., 15337., 14347., 13178., 12106., 11400.,
11059., 10959., 11073., 11485., 12645., 14725., 15863., 16076.,
16222., 16358., 16362., 16229., 16123., 15976., 16127., 17359.,
18818., 18724., 18269., 17559., 16383., 14881., 13520., 12630.],
[10966., 10725., 10672., 10852., 11255., 11583., 12238., 12877.,
13349., 13510., 13492., 13314., 13156., 13364., 14632., 15653.,
15504., 15088., 14579., 13882., 12931., 11883., 10978., 10406.,
10089., 9982., 10031., 10289., 10818., 11444., 12346., 13274.,
13816., 14103., 14228., 14154., 14055., 14197., 15453., 16531.,
16410., 15954., 15337., 14347., 13178., 12106., 11400., 11059.,
10959., 11073., 11485., 12645., 14725., 15863., 16076., 16222.,
16358., 16362., 16229., 16123., 15976., 16127., 17359., 18818.,
18724., 18269., 17559., 16383., 14881., 13520., 12630., 12223.]])

Each number in the array is a reading of RTDemand: how many kilowatts of power were required for that hour from this particular power station. Each of the three big arrays has 72 hours' worth of data in it. If you look carefully at the first 8 or so readings in each of these 3 arrays of 72, you'll notice that each new set of 72 is the same series shifted forward by 1 hour. So each one of these 72-length input arrays represents the last 72 hours of readings of real-time demand for this energy grid.


We then want to forecast the 73rd hour, so our y array will look like this:


array([[12630.],
[12223.],
[12070.]])

Notice that the final entry in the second X array above is also the first entry in our Y, and the final entry in the third X array is the second entry in our Y. With the first X array, then, we're trying to predict the first value in the Y series: the reading for the hour immediately after that window.

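For reference, here is a minimal sketch of how this windowing can be done. The helper name make_windows, the DataFrame gc_df, and the column access are illustrative assumptions, not the article's exact code (the article's own window_data helper appears later):

import numpy as np

def make_windows(series, window_size=72):
    """Slide a window over a 1-D series and return (X, y):
    each row of X holds `window_size` consecutive readings and the
    matching y is the reading that comes immediately after them."""
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])
        y.append(series[i + window_size])
    return np.array(X), np.array(y).reshape(-1, 1)

window_size = 72  # the 72-hour window described above
# Hypothetical usage: X, y = make_windows(gc_df["RTDemand"].values, window_size)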

Data transformations


Once we have our data loaded and windowed, we next need to transform it into a proper set for training machine learning models. First, I'll be scaling all input variables. Later we will look at using all 12 inputs of the dataset, but for now I'll introduce the idea with just 1 variable. I will not be scaling my target variable, my Y, because I feel it makes monitoring the progress of the model much easier for minimal cost. Next, we'll split the data into train and test sets:


from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X)

split = int(0.8 * len(X))
X_train = X[: split - 1]
X_test = X[split:]
y_train = y[: split - 1]
y_test = y[split:]

Finally, our shapes are a little off. The input to the models we’ll be working with is (Samples, Timesteps, Features). In this first model, we’re only using the time-windowed target variable as our input. So, we only have 1 feature. Our X_train.shape is currently (Samples, Timesteps). We’ll reshape it now, after the train/test split above:


X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
X_train.shape
(61875, 72, 1)

That is 61,875 samples, each made of 72 hourly readings, and 1 feature. Ready to rock.


Benchmark model:


First, a benchmark. Our optimization metric will be mean squared error / root mean squared error, so we first need to know what a good vs. bad reading of that number looks like. We'll also track R², though mean squared error remains our loss function and, if the two metrics ever disagree, our optimization target.


For the benchmark model, we'll see what readings we get for mean squared error and R². The benchmark model here simply guesses the previous value in our time series. Here's some code to get a quick read on this model:


# Benchmark model: guess the previous hour's value
prev_val = y_test[0]
sse = 0
for n in range(0, len(y_test) - 1):
    err = y_test[n] - prev_val
    sq_err = err ** 2
    sse = sse + sq_err
    prev_val = y_test[n]
mse = sse / n
mse

With our test dataset, this produces an answer of 411,577.17, and the square root of that is 641.54. One way to interpret this is that, on average, this benchmark model is off by 641.54 megawatts for a given hour. Here is a graph of the benchmark model vs. the real results.


This won’t be an easy model to beat, even though it is quite simple.


1-Variable LSTM model:


Now that we have our dataset set up, we can start working with machine learning models to make our forecast.


One common way to forecast time series is with LSTM models. This will give us a good learned baseline to compare against our convolutional models. Here's some code for setting up and predicting with an LSTM model:


def basic_LSTM(window_size=5, n_features=1):
    new_model = keras.Sequential()
    new_model.add(tf.keras.layers.LSTM(100,
                  input_shape=(window_size, n_features),
                  return_sequences=True,
                  activation='relu'))
    new_model.add(tf.keras.layers.Flatten())
    new_model.add(tf.keras.layers.Dense(1500, activation='relu'))
    new_model.add(tf.keras.layers.Dense(100, activation='linear'))
    new_model.add(tf.keras.layers.Dense(1))
    new_model.compile(optimizer="adam", loss="mean_squared_error")
    return new_model

ls_model = basic_LSTM(window_size=window_size, n_features=X_train.shape[2])
ls_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 72, 100)           40800
_________________________________________________________________
flatten (Flatten)            (None, 7200)              0
_________________________________________________________________
dense (Dense)                (None, 1500)              10801500
_________________________________________________________________
dense_1 (Dense)              (None, 100)               150100
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 101
=================================================================
Total params: 10,992,501
Trainable params: 10,992,501
Non-trainable params: 0
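The training call itself isn't shown in the article; something like this would be typical (the epoch count, batch size, and validation split here are assumptions, not the article's settings):

# Hypothetical training call; epochs, batch_size, and validation_split are assumptions.
ls_model.fit(X_train, y_train, epochs=10, batch_size=128, validation_split=0.1)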

After training, we can evaluate the model:


ls_model.evaluate(X_test, y_test, verbose=0)
1174830.0587427279

from sklearn.metrics import r2_score
predictions = ls_model.predict(X_test)
test_r2 = r2_score(y_test, predictions)
test_r2
0.8451637094740732

The results we get are Ok, but not stellar. Specifically, we wind up with a higher error than our previous benchmark model. Here’s the graph to get an idea of what it was predicting:


As you can see, it predicts a decent amount of the variability, but ultimately is not that great an outcome. The issue seems to be relatively large errors during the ‘dip’ in the morning. I also found this model to be very unreliable, frequently reducing its loss function to nan and producing no output.


Introducing a 1D convolution method


Another method for forecasting time series is using a 1D convolution model. A 1D convolution slides a filter window across your data to produce a new output. Depending on the learned parameters of the convolution windows, they can act like moving averages, direction indicators, or detectors of patterns across time. Let me explain the method with some images.


Step 1

Here we have a dataset with 8 elements and a filter size of 4. The four numbers in the filter are the parameters learned by a Conv1D layer. In the first step, we multiply the elements of the filter by the corresponding elements of the input data and sum the results to produce a convolved output.


Step 2

In the second step of a convolution, the window is moved over by one and the same process is repeated to produce a second output.


Last step in 1D convolution

This process continues until your window hits the end of your input data. In our case, one input data sequence is the 72 hours of data we've set up previously. If we add the option padding="same", our input data will be padded with zeros at the beginning and end so that the output length equals the input length. The demonstration above uses a linear activation, meaning that last multi-colored array is our output. However, you can apply any of a whole host of activation functions here, which run that output through one additional step. So, in our example below, a ReLU activation function is applied to this last output to produce the final result.

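To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch of the steps illustrated above. The series and filter values are made up for illustration; in a real Conv1D layer the filter weights are learned:

import numpy as np

# An 8-element series and a 4-element filter, as in the pictures above.
series = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
filt = np.array([0.25, 0.25, 0.25, 0.25])  # this particular filter acts like a 4-point moving average

# Slide the window one step at a time, multiply element-wise, and sum.
out = np.array([np.sum(series[i:i + len(filt)] * filt)
                for i in range(len(series) - len(filt) + 1)])
print(out)  # [2.5 3.5 4.5 5.5 6.5]: 5 outputs from 8 inputs with no padding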

Here is the code for setting up and running a 1D convolution, given the data setup we previously described:


def basic_conv1D(n_filters=10, fsize=5, window_size=5, n_features=2):
    new_model = keras.Sequential()
    new_model.add(tf.keras.layers.Conv1D(n_filters, fsize, padding="same", activation="relu",
                                         input_shape=(window_size, n_features)))
    # Flatten will take our convolution filters and lay them out end to end,
    # so our dense layer can predict based on the outcomes of each
    new_model.add(tf.keras.layers.Flatten())
    new_model.add(tf.keras.layers.Dense(1800, activation='relu'))
    new_model.add(tf.keras.layers.Dense(100))
    new_model.add(tf.keras.layers.Dense(1))
    new_model.compile(optimizer="adam", loss="mean_squared_error")
    return new_model

Here’s what it looks like with our dataset:


univar_model = basic_conv1D(n_filters=24, fsize=8, window_size=window_size, n_features=X_train.shape[2])
univar_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d (Conv1D)              (None, 72, 24)            216
_________________________________________________________________
flatten_1 (Flatten)          (None, 1728)              0
_________________________________________________________________
dense_3 (Dense)              (None, 1800)              3112200
_________________________________________________________________
dense_4 (Dense)              (None, 100)               180100
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 101
=================================================================
Total params: 3,292,617
Trainable params: 3,292,617
Non-trainable params: 0

Notice here I have 24 convolution filters, each with a filter size of 8. So, in our case, the input data is 72 hours long, each filter slides a window of size 8 over it, and there are 24 of those filters. Because I used padding="same", the output of each of those filters will be 72 wide, just like our input data, and the output will be 24 convolved arrays. Flatten then lays these out into a single array of length 72 * 24 = 1,728. To continue our sample convolution from above, it would look like this:


How flatten works

So, how does this method perform compared to the LSTM and the dumb benchmark model?


Ok, so it worked a bit better than the LSTM, but it still isn’t up to par with the original benchmark model of ‘just guess the previous value’. When we look at the graph, we can see a clear bias in this model:


1-variable Conv1D

Adding more data


In our case, we’re only using the feature we want to predict as our input variable. However, our dataset came with 12 possible input variables. We can stack up all of our input variables and use them together to make a prediction. Since many of the input variables are moderately to strongly correlated to our output variable, it should be possible to make a better prediction with more data, right? Well, there’s a bit of a problem with that in 1D convolutions.


Multivariate Conv1D

I believe the figure above shows what is going on when a 1D convolution window is applied to a multi-series input: at every time step, the filter spans all of the stacked series at once. If I am right, then adding more data series would tend to 'blur out' the impact of any one particular input changing, and should instead produce a less accurate model.

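One way to see this, assuming a standard Keras Conv1D layer: a filter's kernel has shape (kernel_size, n_input_features, n_filters), so a single filter covers all 12 series at each step rather than treating them separately:

import tensorflow as tf

# A Conv1D kernel spans every input feature at every time step.
layer = tf.keras.layers.Conv1D(filters=1, kernel_size=8)
layer.build(input_shape=(None, 72, 12))
print(layer.kernel.shape)  # (8, 12, 1): one filter mixes all 12 features at once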

If I want to stack a different data series into my model, I first have to run it through the same windowing process to produce a set of observations which each contain the last 72 readings of the variable. So, for instance, if I wanted to add in the column 1 variable, DADemand (day ahead demand, the demand from the previous day at this time), I would do the following to it:


(DADemand, _) = window_data(gc_df, window_size, 1, 1)

scaler = StandardScaler()
DADemand = scaler.fit_transform(DADemand)

split = int(0.8 * len(X))
DADemand_train = DADemand[: split - 1]
DADemand_test = DADemand[split:]
DADemand_test.shape
(61875, 72, 1)

Then, I can continue that process for all 12 of my variables, and stack them up into a single set like this:


data_train = np.concatenate((X_train, db_train, dew_train, DADemand_train, DALMP_train, DAEC_train, DACC_train, DAMLC_train, RTLMP_train, RTEC_train, RTCC_train, RTMLC_train), axis=2)

data_test = np.concatenate((X_test, db_test, dew_test, DADemand_test, DALMP_test, DAEC_test, DACC_test, DAMLC_test, RTLMP_test, RTEC_test, RTCC_test, RTMLC_test), axis=2)

data_train.shape
(61875, 72, 12)

So that is 61,875 examples, each containing 72 hours of readings for each of 12 different time series. We'll now run this through a Conv1D net and see what results we get. If you look back at our function for creating these models, you'll notice that the number of features is one of its parameters, so running this (and an LSTM) on the new stacked data was as straightforward as creating the model with the existing code, then fitting, evaluating, and graphing; a sketch of that step follows.

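The hyperparameters below simply mirror the single-variable Conv1D and are assumptions, not the article's exact values:

# Hypothetical multivariate run; hyperparameters mirror the 1-variable model.
multivar_model = basic_conv1D(n_filters=24, fsize=8, window_size=window_size,
                              n_features=data_train.shape[2])
multivar_model.fit(data_train, y_train, epochs=10, batch_size=128, validation_split=0.1)
multivar_model.evaluate(data_test, y_test, verbose=0)

Here's how it all worked out: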

Yowza!

As expected, our performance here actually declined with the additional variables, I believe because of the blurring effect. As I sat upon a midnight dreary and pondered weak and weary, I found a solution to this problem.


2D Convolutions


Ok, so what we need here is a convolution window that looks through our features and figures out which features are the good ones. It basically needs to look like this:


What I need

After doing some research, I found this shape can be achieved with a 2D convolution window shaped as (1, filter_size); in the image above, filter_size=3. Back in our energy forecasting problem, we have 12 features. To feed the data into a 2D convolution layer, it actually needs to have 4 dimensions. We can add the extra dimension with:


data_train_wide = data_train.reshape((data_train.shape[0], data_train.shape[1], data_train.shape[2], 1))
data_test_wide = data_test.reshape((data_test.shape[0], data_test.shape[1], data_test.shape[2], 1))

data_train_wide.shape
(61875, 72, 12, 1)

I did some testing of various shapes for this 2D window and found that doing 2 features at a time worked best:


def basic_conv2D(n_filters=10, fsize=5, window_size=5, n_features=2):
    new_model = keras.Sequential()
    new_model.add(tf.keras.layers.Conv2D(n_filters, (1, fsize), padding="same", activation="relu",
                                         input_shape=(window_size, n_features, 1)))
    new_model.add(tf.keras.layers.Flatten())
    new_model.add(tf.keras.layers.Dense(1000, activation='relu'))
    new_model.add(tf.keras.layers.Dense(100))
    new_model.add(tf.keras.layers.Dense(1))
    new_model.compile(optimizer="adam", loss="mean_squared_error")
    return new_model

m2 = basic_conv2D(n_filters=24, fsize=2, window_size=window_size, n_features=data_train_wide.shape[2])
m2.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 72, 12, 24)        72
_________________________________________________________________
flatten_4 (Flatten)          (None, 20736)             0
_________________________________________________________________
dense_12 (Dense)             (None, 1000)              20737000
_________________________________________________________________
dense_13 (Dense)             (None, 100)               100100
_________________________________________________________________
dense_14 (Dense)             (None, 1)                 101
=================================================================
Total params: 20,837,273
Trainable params: 20,837,273
Non-trainable params: 0

Ok, so the model is quite huge. It took about 4 minutes per epoch to train on my regular CPU. When it was done, though, I evaluated it and graphed it, and, my friend, it paid out like the lottery:


YOWZA

And how does this model compare to our previous models?


The proof in the pudding

So this model far outperformed all previous models and beat our benchmark ‘dumb model’ by a giant margin.


But wait, there’s more!


Ok, here's a bonus model you didn't think you'd get: what if we used a similar idea, but also convolved over 8 hours at a time with a filter shape of (8, 1)?


Here’s the code for that next deeper layer:


def deeper_conv2D(n_filters=10, fsize=5, window_size=5, n_features=2, hour_filter=8):
    new_model = keras.Sequential()
    new_model.add(tf.keras.layers.Conv2D(n_filters, (1, fsize), padding="same", activation="linear",
                                         input_shape=(window_size, n_features, 1)))
    new_model.add(tf.keras.layers.Conv2D(n_filters, (hour_filter, 1), padding="same", activation="relu"))
    new_model.add(tf.keras.layers.Flatten())
    new_model.add(tf.keras.layers.Dense(1000, activation='relu'))
    new_model.add(tf.keras.layers.Dense(100))
    new_model.add(tf.keras.layers.Dense(1))
    new_model.compile(optimizer="adam", loss="mean_squared_error")
    return new_model
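Instantiating it follows the same pattern as before; the hyperparameter values here are assumptions that mirror the earlier 2D model:

# Hypothetical instantiation; filter counts and sizes are assumptions.
m3 = deeper_conv2D(n_filters=24, fsize=2, window_size=window_size,
                   n_features=data_train_wide.shape[2], hour_filter=8)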

This model performed very well too:


Based on our loss metric/optimization target, this model performed better than any other:


Our final results!

You can find all the code for this, as well as the sample dataset, at this location: https://github.com/walesdata/2Dconv_pub


You can find out more about me and what I’m up to here: https://www.linkedin.com/in/john-wales-62832b5/


Translated from: https://towardsdatascience.com/time-series-forecasting-with-2d-convolutions-4f1a0f33dff6

