在本节中,我们可以更新普通的LSTM以使用编解码器模型。这意味着模型不会直接输出向量序列。相反,该模型将由两个子模型组成,用于读取和编码输入序列的编码器,以及读取编码的输入序列并对输出序列中的每个元素进行一步预测的解码器。这种差别很细微,因为实际上这两种方法都可以预测序列输出。重要的不同之处在于,解码器使用了LSTM模型,这使得解码器既可以知道前一天在序列中预测了什么,又可以在输出序列时积累内部状态。让我们仔细看看这个模型是如何定义的。和前面一样,我们定义了一个包含200个单元的LSTM隐藏层。这是解码器模型,它将读取输入序列并输出一个200个元素向量(每个单元一个输出),该元素向量从输入序列捕获特性。
我们将使用14天的总功耗作为输入。
# define model
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
我们将使用一个简单的编码器-解码器架构,易于在Keras中实现,这与LSTM自动编码器的架构有很多相似之处。首先,对输入序列的内部表示进行多次重复,对于输出序列中的每个时间步长重复一次。这个向量序列将被呈现给LSTM解码器。
model.add(RepeatVector(7))
然后我们将解码器定义为一个包含200个单元的LSTM隐藏层。重要的是,解码器将输出整个序列,而不仅仅是序列末尾的输出,就像我们对编码器所做的那样。这意味着200个单元中的每一个单元都将为7天中的每一天输出一个值,表示输出序列中每天预测的内容的基础。
model.add(LSTM(200, activation='relu', return_sequences=True))
然后,我们将使用一个完全连接的层来解释最终输出层之前输出序列中的每个时间步长。重要的是,输出层预测输出序列中的一个步骤,不是一次七天,这意味着我们将对输出序列中的每个步骤使用相同的层。它的意思是相同的完全连接层和输出层将用于处理解码器提供的每个时间步长。为了实现这一点,我们将解释层和输出层封装在一个TimeDistributed包装器中,该包装器允许从解码器每次执行步骤时都使用所封装的层。模型。添加(TimeDistributed(密度(100年,激活= ' relu ')))model.add (TimeDistributed(密度(1))):
model.add(TimeDistributed(Dense(100, activation='relu')))
model.add(TimeDistributed(Dense(1)))
这允许LSTM解码器计算出输出序列中每个步骤所需的上下文,以及用于单独解释每个时间步骤的被包装的密集层,同时重用相同的权重来执行解释。另一种方法是将LSTM解码器创建的所有结构压平,并直接输出矢量。您可以尝试将其作为一个扩展,以查看它是如何进行比较的。因此,网络输出与输入结构相同的三维向量,具有维数[样本、时间步长、特征]。它只有一个功能,即每天消耗的总电量,而且总是有7个功能。因此,一个单一的一周预测将有大小:[1,7,1]。因此,在对模型进行训练时,我们必须对输出数据(y)进行重构,使其具有三维结构,而不是上一节所使用的【sample, features】的二维结构。
# reshape output into [samples, timesteps, features]
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
# train the model
def build_model(train, n_input):
# prepare data
train_x, train_y = to_supervised(train, n_input)
# define parameters
verbose, epochs, batch_size = 0, 20, 16
n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
# reshape output into [samples, timesteps, features]
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
# define model
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
model.add(RepeatVector(n_outputs))
model.add(LSTM(200, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(100, activation='relu')))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam')
# fit network
model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
return model
# univariate multi-step encoder-decoder lstm
from math import sqrt
from numpy import split
from numpy import array
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import LSTM
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
# split a univariate dataset into train/test sets
def split_dataset(data):
# split into standard weeks
train, test = data[1:-328], data[-328:-6]
# restructure into windows of weekly data
train = array(split(train, len(train)/7))
test = array(split(test, len(test)/7))
return train, test
# evaluate one or more weekly forecasts against expected values
def evaluate_forecasts(actual, predicted):
scores = list()
# calculate an RMSE score for each day
for i in range(actual.shape[1]):
# calculate mse
mse = mean_squared_error(actual[:, i], predicted[:, i])
# calculate rmse
rmse = sqrt(mse)
# store
scores.append(rmse)
# calculate overall RMSE
s = 0
for row in range(actual.shape[0]):
for col in range(actual.shape[1]):
s += (actual[row, col] - predicted[row, col])**2
score = sqrt(s / (actual.shape[0] * actual.shape[1]))
return score, scores
# summarize scores
def summarize_scores(name, score, scores):
s_scores = ', '.join(['%.1f' % s for s in scores])
print('%s: [%.3f] %s' % (name, score, s_scores))
# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=7):
# flatten data
data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
X, y = list(), list()
in_start = 0
# step over the entire history one time step at a time
for _ in range(len(data)):
# define the end of the input sequence
in_end = in_start + n_input
out_end = in_end + n_out
# ensure we have enough data for this instance
if out_end < len(data):
x_input = data[in_start:in_end, 0]
x_input = x_input.reshape((len(x_input), 1))
X.append(x_input)
y.append(data[in_end:out_end, 0])
# move along one time step
in_start += 1
return array(X), array(y)
# train the model
def build_model(train, n_input):
# prepare data
train_x, train_y = to_supervised(train, n_input)
# define parameters
verbose, epochs, batch_size = 0, 20, 16
n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
# reshape output into [samples, timesteps, features]
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
# define model
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
model.add(RepeatVector(n_outputs))
model.add(LSTM(200, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(100, activation='relu')))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam')
# fit network
model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
return model
# make a forecast
def forecast(model, history, n_input):
# flatten data
data = array(history)
data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
# retrieve last observations for input data
input_x = data[-n_input:, 0]
# reshape into [1, n_input, 1]
input_x = input_x.reshape((1, len(input_x), 1))
# forecast the next week
yhat = model.predict(input_x, verbose=0)
# we only want the vector forecast
yhat = yhat[0]
return yhat
# evaluate a single model
def evaluate_model(train, test, n_input):
# fit model
model = build_model(train, n_input)
# history is a list of weekly data
history = [x for x in train]
# walk-forward validation over each week
predictions = list()
for i in range(len(test)):
# predict the week
yhat_sequence = forecast(model, history, n_input)
# store the predictions
predictions.append(yhat_sequence)
# get real observation and add to history for predicting the next week
history.append(test[i, :])
# evaluate predictions days for each week
predictions = array(predictions)
score, scores = evaluate_forecasts(test[:, :, 0], predictions)
return score, scores
# load the new file
dataset = read_csv('household_power_consumption_days.csv', header=0, infer_datetime_format=True, parse_dates=['datetime'], index_col=['datetime'])
# split into train and test
train, test = split_dataset(dataset.values)
# evaluate model and get scores
n_input = 14
score, scores = evaluate_model(train, test, n_input)
# summarize scores
summarize_scores('lstm', score, scores)
# plot scores
days = ['sun', 'mon', 'tue', 'wed', 'thr', 'fri', 'sat']
pyplot.plot(days, scores, marker='o', label='lstm')
pyplot.show()