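num is the number of bikes, weekday is the day of the week, and hour is the hour of the day. There are 45,949 records in total, ordered in time at one-minute intervals.
For prediction with an RNN, the num field alone is actually enough. The other two fields serve as extra reference information, and readers may want to use them to build a more complex model and improve prediction accuracy.
Next we use a multi-layer LSTM recurrent network to predict values in this sequence. Put simply: given 9 consecutive num values, how do we predict the 10th? (Knowing num for the previous nine minutes, predict num for the next minute.) The code below uses a window of 20 values, so 19 values are used to predict the 20th.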
# Load dependencies
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
import csv
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, LSTM
np.random.seed(2020)
Using TensorFlow backend.
data = pd.read_csv('/Users/liming/Downloads/bike_rnn.csv')
print(data.shape)
data.head()
(45949, 3)
| | num | weekday | hour |
|---|---|---|---|
| 0 | 0 | 5 | 17 |
| 1 | 1 | 5 | 17 |
| 2 | 1 | 5 | 17 |
| 3 | 2 | 5 | 17 |
| 4 | 4 | 5 | 17 |
bike = data['num']
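sequence_length = 20  # length of each time-series window (19 inputs + 1 target)
result = []
# Slide a window of sequence_length values over the series, one step at a time
for index in range(len(bike) - sequence_length):
    result.append(bike[index: index + sequence_length])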
result = np.array(result)
result
array([[0, 1, 1, ..., 2, 1, 2],
[1, 1, 2, ..., 1, 2, 1],
[1, 2, 4, ..., 2, 1, 1],
...,
[8, 8, 8, ..., 6, 6, 6],
[8, 8, 6, ..., 6, 6, 6],
[8, 6, 5, ..., 6, 6, 6]])
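The window-building loop above can also be written without an explicit Python loop. The following is a minimal sketch, assuming NumPy 1.20 or later (which provides sliding_window_view); the helper name make_windows and the toy series are illustrative and not part of the original notebook. The last window is dropped so that the row count matches range(len(bike) - sequence_length).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, sequence_length):
    """Return overlapping windows of `sequence_length` values, one per row.

    Hypothetical helper: reproduces the row count of the loop
    `for index in range(len(series) - sequence_length)`.
    """
    values = np.asarray(series)
    windows = sliding_window_view(values, sequence_length)
    # sliding_window_view yields len - sequence_length + 1 windows;
    # drop the last one to match the original loop, and copy so the
    # result is a normal writable array rather than a read-only view.
    return windows[:-1].copy()


toy = np.arange(10)            # stand-in for the num column
toy_windows = make_windows(toy, 4)
print(toy_windows.shape)
print(toy_windows[0], toy_windows[-1])
```

With the real data, make_windows(bike, sequence_length) should give the same (45929, 20) array as the loop, under the assumption that bike is a one-dimensional numeric series.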
result_mean = result.mean()
result_mean
6.997708419517081
result = result - result_mean
print("Shift : ", result_mean)
print("Data : ", result.shape)
Shift : 6.997708419517081
Data : (45929, 20)
row = int(round(0.9 * result.shape[0]))
train = result[:row, :]
np.random.shuffle(train)
X_train = train[:, :-1]
y_train = train[:, -1]
X_test = result[row:, :-1]
y_test = result[row:, -1]
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
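To make the shapes in this split easier to follow, here is a small sketch on toy data. The array windows is a made-up stand-in for result, and the random generator is an assumption of the sketch (the notebook itself uses np.random.shuffle). The rows are shuffled before the inputs and targets are separated, so each target stays attached to its own window, and the trailing axis of size 1 is the single feature per time step that an LSTM layer expects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for `result`: 10 windows of 5 consecutive values each
windows = np.arange(50, dtype=float).reshape(10, 5)

row = int(round(0.9 * windows.shape[0]))   # first 90% of windows for training
train = windows[:row].copy()               # copy so the toy source stays untouched
rng.shuffle(train)                         # shuffle whole rows (axis 0)

# Last column is the target, everything before it is the input window
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = windows[row:, :-1], windows[row:, -1]

# Add a feature axis: (samples, time steps, features)
X_train = X_train[..., np.newaxis]
X_test = X_test[..., np.newaxis]

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

In the notebook the same steps give inputs with 19 time steps per sample, because each window holds 20 values and the last one is used as the target.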
# Build the model
model = Sequential()
# Each sample is a window of sequence_length - 1 time steps with one feature
model.add(LSTM(40, input_shape=(sequence_length - 1, 1), return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(50, return_sequences=False))
# One output unit: the predicted (mean-centered) num for the next minute
model.add(Dense(1))
model.add(Activation("linear"))
# Regression task, so only the mean squared error loss is tracked
model.compile(loss="mse", optimizer="rmsprop")
batch_size = 128
model.fit(X_train, y_train, batch_size=batch_size, epochs=30)
Training is started by calling the model's fit method. After training, the method to focus on is predict.
model.evaluate(X_test, y_test, batch_size = batch_size)
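Because the mean was subtracted from the data before training, both the loss reported by evaluate and the values returned by predict are on the centered scale. The sketch below shows one way to bring predictions back to bike counts and compute an RMSE. The helper names restore_scale and rmse and the small arrays are made up for illustration; in the notebook the centered predictions would come from model.predict(X_test), which is assumed here to return an array of shape (n, 1).

```python
import numpy as np


def restore_scale(centered, shift):
    """Undo the mean-centering by adding the shift back (hypothetical helper)."""
    return np.asarray(centered, dtype=float) + shift


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


shift = 6.997708419517081                           # result_mean printed earlier
y_test_centered = np.array([-1.0, 0.0, 2.0])        # made-up targets
pred_centered = np.array([[-0.5], [0.5], [1.0]])    # made-up predictions, shape (n, 1)

y_true = restore_scale(y_test_centered, shift)
y_pred = restore_scale(pred_centered, shift).ravel()
error = rmse(y_true, y_pred)
print(y_pred)
print(error)
```

The shift cancels out in the error itself, so the RMSE is the same on either scale. Restoring the scale matters when the predicted counts are plotted or reported directly.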