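Stock price prediction is a popular topic in finance and matters to investors, financial institutions, and researchers alike. Gaussian Process Regression (GPR) is a non-parametric Bayesian regression method that can model complex nonlinear relationships and also provides an uncertainty estimate for each prediction, which makes it a reasonable candidate for stock price prediction.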
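To make the "uncertainty estimate" concrete, here is a minimal sketch on synthetic data (a noisy sine curve, not stock prices). The names `X_demo`, `y_demo`, and `gpr_demo` are illustrative assumptions. It shows that `predict(..., return_std=True)` returns a standard deviation alongside the mean, and that the standard deviation grows for inputs far from the training data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic training data: a noisy sine curve observed on [0, 10]
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 10, 40).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, 40)

# RBF captures the smooth signal; WhiteKernel captures observation noise
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr_demo = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr_demo.fit(X_demo, y_demo)

# One query point inside the observed range, one far outside it
X_new = np.array([[5.0], [15.0]])
mean, std = gpr_demo.predict(X_new, return_std=True)

# Approximate 95% interval under the Gaussian predictive distribution
lower, upper = mean - 1.96 * std, mean + 1.96 * std
for x, m, lo, hi in zip(X_new.ravel(), mean, lower, upper):
    print(f"x={x:4.1f}  mean={m:6.3f}  95% interval=[{lo:6.3f}, {hi:6.3f}]")
```

The interval at x = 15 should be much wider than at x = 5, because the model has seen no data near 15. For stock prices this is the useful part: forecasts further from the training period come with visibly larger uncertainty.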
Project goals:
1. Data collection and preprocessing
2. Model training
3. Prediction and evaluation
4. Presenting and reporting results
Tech stack:
scikit-learn, including its preprocessing and evaluation tools.
First, install the required libraries:
pip install pandas numpy matplotlib scikit-learn
Next, let's write the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel as C
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import joblib  # For saving the model; sklearn.externals.joblib was removed from scikit-learn
# Load the data
def load_data(file_path):
    data = pd.read_csv(file_path)
    return data
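# Preprocess the data
def preprocess_data(data):
    # Forward-fill missing values (fillna(method='ffill') is deprecated in recent pandas)
    data = data.ffill()
    # Convert dates to the number of days since the first date
    data['Date'] = pd.to_datetime(data['Date'])
    data['Date'] = (data['Date'] - data['Date'].min()) / np.timedelta64(1, 'D')
    # Additional features: log-scaled volume and the previous day's return
    data['Volume'] = np.log1p(data['Volume'])
    # shift(1) uses only past information; shift(-1) would leak the next day's close
    data['Return'] = data['Close'].pct_change().shift(1)
    data = data.dropna()
    X = data[['Date', 'Volume', 'Return']].values
    y = data['Close'].values
    return X, y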
# Train the model
def train_model(X_train, y_train):
    # Constant * RBF models the smooth trend; WhiteKernel models observation noise
    kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2)) + WhiteKernel(noise_level=1, noise_level_bounds=(1e-5, 1e1))
    gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
    gpr.fit(X_train, y_train)
    return gpr
# Optimize the model hyperparameters
def optimize_hyperparameters(X_train, y_train):
    # Candidate kernels differ only in the initial RBF length scale;
    # alpha is the noise term added to the diagonal of the kernel matrix
    param_grid = {
        "kernel": [C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2)),
                   C(1.0, (1e-3, 1e3)) * RBF(5, (1e-2, 1e2)),
                   C(1.0, (1e-3, 1e3)) * RBF(15, (1e-2, 1e2))],
        "alpha": np.logspace(-2, 0, 10),
        "n_restarts_optimizer": [0, 1, 2, 5, 9]
    }
    # Time-ordered folds: each validation fold comes after its training data
    cv = TimeSeriesSplit(n_splits=5)
    grid_search = GridSearchCV(GaussianProcessRegressor(), param_grid, cv=cv, scoring='neg_mean_squared_error')
    grid_search.fit(X_train, y_train)
    print("Best Parameters:", grid_search.best_params_)
    # GridSearchCV refits the best configuration on all of X_train by default (refit=True)
    return grid_search.best_estimator_
# Evaluate the model
def evaluate_model(gpr, X_test, y_test):
    # sigma is the predictive standard deviation for each test point
    y_pred, sigma = gpr.predict(X_test, return_std=True)
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print(f"Mean Squared Error: {mse}")
    print(f"Mean Absolute Error: {mae}")
    print(f"R^2 Score: {r2}")
    return y_pred
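# Rolling forecast with an expanding training window
def rolling_window_forecast(model, X, y, window_size=10, horizon=5):
    from sklearn.base import clone
    predictions = []
    # Step by `horizon` so that each point in X[window_size:] is predicted exactly once
    for i in range(window_size, len(X), horizon):
        # Refit a fresh copy on all data before position i; the passed-in model is left untouched
        step_model = clone(model)
        step_model.fit(X[:i], y[:i])
        # predict() returns a single array unless return_std=True is passed
        y_pred = step_model.predict(X[i:i + horizon])
        predictions.extend(y_pred)
    return np.array(predictions)
# Visualize the results
def plot_results(X_train, y_train, X_test, y_test, y_pred, y_rolling=None):
    plt.figure(figsize=(12, 6))
    plt.scatter(X_train[:, 0], y_train, c='k', label='data')
    plt.plot(X_test[:, 0], y_pred, c='r', label='prediction')
    if y_rolling is not None:
        # Rolling predictions cover the last len(y_rolling) points of the full series
        dates = np.concatenate([X_train[:, 0], X_test[:, 0]])
        plt.plot(dates[-len(y_rolling):], y_rolling, c='g', linestyle='--', label='rolling prediction')
    plt.plot(X_test[:, 0], y_test, 'b:', label='ground truth')
    plt.legend()
    plt.show()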
# Save the model
def save_model(model, filename):
    joblib.dump(model, filename)
# Load the model
def load_model(filename):
    return joblib.load(filename)
# Main entry point
if __name__ == '__main__':
    file_path = 'stock_data.csv'  # Assumed path to the stock data file
    data = load_data(file_path)
    X, y = preprocess_data(data)
    # Split into training and test sets, keeping chronological order
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    # Hyperparameter optimization
    gpr = optimize_hyperparameters(X_train, y_train)
    # Evaluate the model
    y_pred = evaluate_model(gpr, X_test, y_test)
    # Rolling-window forecast
    window_size = 10
    horizon = 5
    y_rolling = rolling_window_forecast(gpr, X, y, window_size=window_size, horizon=horizon)
    # Visualize the results
    plot_results(X_train, y_train, X_test, y_test, y_pred, y_rolling)
    # Save the model
    save_model(gpr, 'gpr_model.pkl')
    # Load the model and check that it reproduces the evaluation
    loaded_gpr = load_model('gpr_model.pkl')
    print("Prediction using loaded model:")
    evaluate_model(loaded_gpr, X_test, y_test)
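The script above reads `stock_data.csv` and uses its `Date`, `Close`, and `Volume` columns. For readers without a data file at hand, the following sketch generates a synthetic series with those columns. The helper name `make_synthetic_stock_data` and all of its parameters are hypothetical, and the random-walk prices are illustrative only, not real market data.

```python
import numpy as np
import pandas as pd

def make_synthetic_stock_data(n_days=200, start="2023-01-02", seed=0):
    """Build a toy price series with the columns the script expects (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Business-day dates, so weekends are skipped as in real trading calendars
    dates = pd.bdate_range(start=start, periods=n_days)
    # Geometric random walk: daily log returns with about 1% volatility
    log_returns = rng.normal(loc=0.0, scale=0.01, size=n_days)
    close = 100.0 * np.exp(np.cumsum(log_returns))
    # Arbitrary positive integer volumes
    volume = rng.integers(100_000, 1_000_000, size=n_days)
    return pd.DataFrame({"Date": dates, "Close": close, "Volume": volume})

demo_df = make_synthetic_stock_data()
print(demo_df.head())
```

Calling `demo_df.to_csv('stock_data.csv', index=False)` writes a file the script can load. Because the series is a pure random walk, poor forecast scores on it are expected and say nothing about the model's behaviour on real data.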
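The code uses GridSearchCV for hyperparameter optimization, with TimeSeriesSplit as the cross-validation scheme because it keeps every validation fold later in time than its training data and is therefore better suited to time series.
The rolling_window_forecast function predicts prices at several future time points per step, rather than only the next one.
r2_score is included as an additional evaluation metric, which helps measure how much of the variance the model explains.
The joblib library saves the trained model and loads it back for prediction.
The code above provides a basic framework for predicting stock prices with Gaussian process regression. Note that the stock market is highly complex, and a single model can rarely predict its future movements accurately. In practice, several models and techniques are usually combined, and a large amount of historical data plus market analysis is needed to obtain reasonably reliable results.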
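To illustrate why TimeSeriesSplit suits ordered data, this small sketch prints the folds it produces for twelve consecutive observations. The array `X_idx` is a stand-in for real features and is an assumption made for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered observations; the values are just their positions
X_idx = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X_idx))

for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    # Training indices always end before the test indices begin
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Each fold trains on an expanding prefix of the series and validates on the block that follows it, so no future observation is used to predict the past. A shuffled K-fold split would not give that guarantee.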