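After a crawler fetches the data, MATLAB's BP neural network and radial basis network functions are used to predict the next day's closing price from the previous five days of stock data, such as the opening price.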
Install the dependencies with pip3: pip3 install baostock pandas xlwt
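The script fetches the daily opening price, closing price, and other fields of a stock over the selected periods,
and generates the training set and the test set.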
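import baostock as bs
import pandas as pd
import sys

# Take the stock code from the first command-line argument,
# falling back to a default when none is given.
try:
    stock = sys.argv[1]
    print(stock)
except IndexError:
    stock = 'sh.000001'  # default stock code; change as needed
# Unused alternative: build the code from an exchange prefix,
# i.e. 'sh.' + str(stock2) or 'sz.' + str(stock2).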
# Log in to baostock
lg = bs.login()
# Show the login response
print('login respond error_code:' + lg.error_code)
print('login respond error_msg:' + lg.error_msg)
# Fetch the daily K-line data for the training period
# rs = bs.query_hs300_stocks()
# rs = bs.query_all_stock('2020-12-04')
rs = bs.query_history_k_data_plus(
    stock,
    "date,code,open,high,low,close,preclose,volume,amount,adjustflag,turn,tradestatus,pctChg",
    start_date='2019-01-01', end_date='2019-12-20',
    frequency="d", adjustflag="3")
print('query_history_k_data_plus error_code:' + rs.error_code)
print('query_history_k_data_plus error_msg:' + rs.error_msg)
# Collect the result set one record at a time
hs300_stocks = []
while rs.error_code == '0' and rs.next():
    hs300_stocks.append(rs.get_row_data())
result = pd.DataFrame(hs300_stocks, columns=rs.fields)
# Write the training set to CSV and Excel
result.to_csv("./stock_info_train.csv", encoding="gbk", index=False)
result.to_excel("./stock_info_train.xls")  # .xls output relies on xlwt
print(result)
# Fetch the daily K-line data for the test period
# rs = bs.query_hs300_stocks()
# rs = bs.query_all_stock('2020-12-04')
# 601857
rs = bs.query_history_k_data_plus(
    stock,
    "date,code,open,high,low,close,preclose,volume,amount,adjustflag,turn,tradestatus,pctChg",
    start_date='2020-06-01', end_date='2029-07-20',
    frequency="d", adjustflag="3")  # a future end date is cut off at the latest available date
print('query_history_k_data_plus error_code:' + rs.error_code)
print('query_history_k_data_plus error_msg:' + rs.error_msg)
# Collect the result set one record at a time
hs300_stocks = []
while rs.error_code == '0' and rs.next():
    hs300_stocks.append(rs.get_row_data())
result = pd.DataFrame(hs300_stocks, columns=rs.fields)
# Write the test set to CSV and Excel
result.to_csv("./stock_info_test.csv", encoding="gbk", index=False)
result.to_excel("./stock_info_test.xls")  # .xls output relies on xlwt
print(result)
# Log out of baostock
bs.logout()
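The MATLAB script that follows turns the downloaded rows into samples by concatenating five consecutive days of features and pairing each window with the close of the day after it. A minimal Python sketch of that windowing is given here for illustration only. The function name `make_windows`, its parameters, and the toy data are assumptions made for this example and are not part of the original scripts.

```python
def make_windows(rows, window=5, target_col=3):
    """Flatten each run of `window` consecutive rows into one input vector and
    pair it with the value in `target_col` of the row that follows the run."""
    inputs, targets = [], []
    for start in range(len(rows) - window):
        block = rows[start:start + window]
        # Concatenate the per-day feature lists: day 1 fields, day 2 fields, ...
        inputs.append([value for day in block for value in day])
        targets.append(rows[start + window][target_col])
    return inputs, targets


# Toy data (made up): 7 days x 4 fields, where the last field stands in for the close.
days = [[d * 10 + f for f in range(4)] for d in range(7)]
X, y = make_windows(days)
print(len(X), len(X[0]), y)  # 2 20 [53, 63]
```

With 199 rows and 8 fields per day, the same loop would give 194 windows of 40 values; the MATLAB script keeps the first 193.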
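clear all;
%stock = 'sh.002415';
%stock=inputdlg('input the code of stock');
% Each system() call starts its own shell, so activate the conda
% environment and run the download script in a single command.
system('conda activate tushare && python get_stock_info.py sz.608891');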
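%[~, numdata]=xlsread('stock_info_train.xls',1,'D2:N200'); % read the history data
% csvread offsets are zero-based: columns 2-6 are open, high, low, close, preclose
numdata1=csvread('stock_info_train.csv',1,2,[1 2 199 6]);
%numdata2=csvread('stock_info_train.csv',1,7,[1 7 199 8]); % volume, amount (unused)
% columns 10-12 are turn, tradestatus, pctChg
numdata3=csvread('stock_info_train.csv',1,10,[1 10 199 12]);
numdata=[numdata1 numdata3]; % 199 rows x 8 features
[~, date]=xlsread('stock_info_train.xls',1,'B2:B200'); % read the dates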
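% Targets: the closing price (4th feature column) of the day after each window
for i=1:1:193
    P(i,:)=numdata(i+5,4);
end
% Inputs: five consecutive days x 8 features = 40 values per sample
T = 1:1:40; % placeholder first row, dropped below
for i=1:1:194
    ram = [];
    for j=1:1:5
        ram = [ram numdata(i+j-1,:)];
    end
    T = [T ; ram];
end
T=T((2:1:194),:); % drop the placeholder and keep windows 1..193
T=T'; % one sample per column
P=P';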
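[Tn,minT,maxT,Pn,minP,maxP] = premnmx(T,P); % normalize inputs and targets to [-1,1]
% Build the BP neural network
net=newff(minmax(Tn),[150,1],{'purelin','purelin'},'trainlm');
net.trainParam.show=50; % progress display interval
net.trainParam.lr=0.005; % learning rate
net.trainParam.epochs=3000; % maximum number of epochs
net.trainParam.min_grad=1e-14; % minimum gradient
net.trainParam.goal=1e-12; % target training error
net.trainParam.mc=0; % momentum factor
[net,tr]=train(net ,Tn,Pn); % train the BP network
% Build the radial basis network. This overwrites the BP network trained above,
% so comment out the next line to predict with the BP network instead.
net = newrb(Tn,Pn,0,1,100);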
% Read the test data
numdata_test1=csvread('stock_info_test.csv',1,2,[1 2 199 6]); % open, high, low, close, preclose
numdata_test2=csvread('stock_info_test.csv',1,10,[1 10 199 12]); % turn, tradestatus, pctChg
numdata_test=[numdata_test1 numdata_test2];
[~, date_test]=xlsread('stock_info_test.xls',1,'B2:B200'); % read the dates
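% Targets: the closing price of the day after each window
for i=1:1:193
    P_test(i,:)=numdata_test(i+5,4);
end
% Inputs: five consecutive days x 8 features = 40 values per sample
T_test = 1:1:40; % placeholder first row, dropped below
for i=1:1:194
    ram = [];
    for j=1:1:5
        ram = [ram numdata_test(i+j-1,:)];
    end
    T_test = [T_test ; ram];
end
T_test=T_test((2:1:194),:); % drop the placeholder and keep windows 1..193
T_test=T_test'; % one sample per column
P_test=P_test';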
% numdataT_test = numdata_test';
% P_test = numdataT_test(4,:); % target values
% T_test = numdataT_test;
% P_test = P_test(2:99);
% T_test = T_test(:,(1:1:98));
% Scale the test inputs with the training-set bounds so both sets share one scale
Tn_test = tramnmx(T_test,minT,maxT);
Out2 = sim(net, Tn_test);
%a1 = (1:1:93);
a1 = (1:1:193);
a2=postmnmx(Out2,minP,maxP); % map the predictions back to prices
plot(a1,P_test,'b-',a1,a2,'r-');
%plot(a1,P_test,'.');
title('Predicted share price','FontSize',12);
xlabel('Trading day index (test set from 2020-06-01)','FontSize',10);
ylabel('Share price','FontSize',10);
%hold on
%plot(a1,a2,'r--');
legend('Actual','Predicted');
clc
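The plot compares actual and predicted prices by eye. A numeric error measure can make the comparison between the two networks easier to state. The Python sketch below computes the mean absolute error and the root mean squared error. The helper names and the sample numbers are illustrative assumptions, not results produced by the scripts above.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up prices, only to show the calls.
actual = [10, 12, 11]
predicted = [11, 12, 9]
print(mae(actual, predicted))
print(rmse(actual, predicted))
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a rough idea of whether the error comes from a few bad days or from a steady offset.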
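Normalizing the data is well worth doing: it maps features measured on different scales into a common range, which helps the model generalize, makes exploding gradients less likely, and reduces the amount of computation, so the model converges faster.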
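As a small illustration of this point, the sketch below scales each column to [-1, 1], the range that `premnmx` uses in the MATLAB script, and applies the bounds learned from the training rows to the test rows. The function names and the numbers are assumptions made for this example.

```python
def fit_minmax(rows):
    """Return the per-column minimum and maximum of a list of rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_minmax(rows, mins, maxs):
    """Scale each column to [-1, 1] using the given bounds.
    A constant column (max == min) is mapped to 0."""
    return [
        [0.0 if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Made-up data: a price-like column and a volume-like column on very different scales.
train = [[10.0, 1000.0], [20.0, 3000.0], [30.0, 2000.0]]
test = [[25.0, 4000.0]]

mins, maxs = fit_minmax(train)           # bounds come from the training set only
train_n = apply_minmax(train, mins, maxs)
test_n = apply_minmax(test, mins, maxs)  # the same bounds are reused for the test set
print(train_n)  # [[-1.0, -1.0], [0.0, 1.0], [1.0, 0.0]]
print(test_n)   # [[0.5, 2.0]]
```

The second test value lands outside [-1, 1] because it exceeds the training maximum; reusing the training bounds keeps both sets on one scale at the cost of such out-of-range values.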
12. Compared with the BP network, the radial basis network is faster to create, but the BP network has the advantage in subsequent prediction.
13. The stock market carries risk; invest with caution.
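One reason a radial basis network can be quick to create, as noted in the comparison above, is that its output weights can come from a linear solve instead of iterative gradient training. The Python sketch below shows this with the simplest possible design: one Gaussian unit centred on every training point. It is not the incremental algorithm that `newrb` uses, and the width convention, function names, and data are assumptions made for this example.

```python
import math


def gaussian(x, c, spread=1.0):
    """Gaussian basis value for input x and centre c (both lists of floats)."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-dist2 / (2.0 * spread ** 2))


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def fit_rbf(X, y, spread=1.0):
    """Place one basis function on each training point and solve for the weights."""
    Phi = [[gaussian(xi, cj, spread) for cj in X] for xi in X]
    return solve(Phi, y)


def predict_rbf(centres, weights, x, spread=1.0):
    """Weighted sum of the basis functions evaluated at x."""
    return sum(w * gaussian(x, c, spread) for w, c in zip(weights, centres))


# Made-up one-dimensional data.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
w = fit_rbf(X, y)
print([predict_rbf(X, w, x) for x in X])
```

The fit reproduces the training targets up to rounding error, which says nothing about how well it predicts unseen days; that is the part the comparison above attributes to the BP network.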