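Most studies of investor sentiment toward stocks obtain it by running sentiment analysis on the text of stock comments. I once read a paper that instead builds an investor sentiment index from three indicators: overnight return, growth in the individual stock's turnover rate, and order imbalance. This approach is much more convenient than sentiment analysis, so I decided to try implementing it in Python. Data for all three indicators can be obtained through the tushare API. tushare ID: 425652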
The three indicators to compute are:
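Overnight return: the trading day's opening price minus the previous trading day's closing price, divided by the previous trading day's closing price (unit: %)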
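Turnover growth: the stock's turnover rate on the trading day minus its turnover rate on the previous trading day (turnover rate: the ratio of the stock's daily trading value to its daily circulating market capitalization, unit: %)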
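Order imbalance: the difference between the stock's total buying and total selling on the trading day, measured here by the net inflow amount (unit: 10 million yuan)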
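A quick worked example of the three formulas, using made-up prices for two consecutive trading days (none of these numbers are real market data). It also shows where the division by 1000 in the code below comes from: tushare's moneyflow interface reports net_mf_amount in units of 10,000 yuan, and 10,000 times 1,000 is 10 million.

```python
# Made-up inputs for two consecutive trading days (illustrative only)
pre_close = 10.00            # previous trading day's close (yuan)
open_price = 10.20           # today's open (yuan)
turnover_prev = 1.5          # previous day's turnover rate (%)
turnover_today = 2.1         # today's turnover rate (%)
net_mf_amount_wan = 2500.0   # net inflow, in units of 10,000 yuan

# Overnight return (%): (open - previous close) / previous close * 100
overnight_return = 100 * (open_price - pre_close) / pre_close

# Turnover growth (percentage points): today's rate minus yesterday's
turnover_growth = turnover_today - turnover_prev

# Order imbalance in units of 10 million yuan (10,000 yuan * 1,000)
order_imbalance = net_mf_amount_wan / 1000

print(round(overnight_return, 4), round(turnover_growth, 4), order_imbalance)
# prints: 2.0 0.6 2.5
```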
The data provided by the tushare API, with descriptions, are documented at: https://waditu.com/document/2
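The data to retrieve are:
the stock's opening price open; the previous trading day's closing price pre_close; the turnover rate turnover_rate; and the net inflow amount net_mf_amount.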
1. The opening price and previous close are obtained through the daily interface
daily = pro.daily(ts_code='002149.SZ', start_date='20161101', end_date='20200831', fields='trade_date,open,pre_close')
2. The turnover rate is obtained through the daily_basic interface
daily_basic = pro.daily_basic(ts_code='002149.SZ', start_date='20161101', end_date='20200831', fields='trade_date,turnover_rate')
3. The net inflow data are obtained through the moneyflow interface
moneyflow = pro.moneyflow(ts_code='002149.SZ', start_date='20161101', end_date='20200831', fields='trade_date,net_mf_amount')
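Because the three requests share the same stock code and date range, they can be wrapped in one helper so that switching to another stock or period only requires changing the arguments. This is a minimal sketch: the name `fetch_inputs` is hypothetical, and it assumes `pro` is the object returned by `ts.pro_api(token)`. It uses only the interfaces and field lists already shown.

```python
def fetch_inputs(pro, ts_code, start_date, end_date):
    """Fetch the three raw inputs for one stock (hypothetical helper).

    `pro` is assumed to be a tushare pro API object, e.g. ts.pro_api(token).
    Returns the daily, daily_basic and moneyflow DataFrames in that order.
    """
    # Query parameters shared by all three interfaces
    common = {'ts_code': ts_code, 'start_date': start_date, 'end_date': end_date}
    daily = pro.daily(fields='trade_date,open,pre_close', **common)
    daily_basic = pro.daily_basic(fields='trade_date,turnover_rate', **common)
    moneyflow = pro.moneyflow(fields='trade_date,net_mf_amount', **common)
    return daily, daily_basic, moneyflow
```

For example: `daily, daily_basic, moneyflow = fetch_inputs(pro, '002149.SZ', '20161101', '20200831')`.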
4. Merge the DataFrames above and compute the indicators
data = pd.merge(daily, daily_basic, on='trade_date', how='left')
data = pd.merge(data, moneyflow, on='trade_date', how='left')
data['date'] = pd.to_datetime(data['trade_date'], format='%Y%m%d')
data['date'] = data.date.dt.date
data = data.set_index('date')
data = data.sort_index(ascending=True)
data['inter_return'] = 100*(data['open'] - data['pre_close'])/data['pre_close']  # overnight return, %
data['turn_growth'] = data['turnover_rate'].diff()  # today's turnover rate minus the previous day's
data['net_mf_amount'] = data['net_mf_amount']/1000  # 10,000 yuan -> 10 million yuan
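To check the merge-and-compute step without a tushare token, it can be run on small hand-made frames with the same columns. The numbers below are invented; the rows are deliberately out of date order to show why sorting matters before differencing, and one date is missing from the money-flow frame to show what the left merge does. This sketch keeps a DatetimeIndex instead of converting to Python dates. It also checks that `diff()` gives the same values as a `rolling(2)` window written with `raw=True`.

```python
import numpy as np
import pandas as pd

# Hand-made inputs with the same columns as the tushare outputs (values invented)
daily = pd.DataFrame({'trade_date': ['20200106', '20200103', '20200102'],
                      'open': [10.5, 10.2, 9.9],
                      'pre_close': [10.3, 10.0, 10.0]})
daily_basic = pd.DataFrame({'trade_date': ['20200106', '20200103', '20200102'],
                            'turnover_rate': [2.0, 1.5, 1.2]})
# No money-flow row for 20200102: the left merge leaves NaN there
moneyflow = pd.DataFrame({'trade_date': ['20200106', '20200103'],
                          'net_mf_amount': [3000.0, -1200.0]})

data = daily.merge(daily_basic, on='trade_date', how='left')
data = data.merge(moneyflow, on='trade_date', how='left')
data.index = pd.to_datetime(data['trade_date'], format='%Y%m%d')
data = data.sort_index()  # oldest first, so diff() compares each day with the one before

data['inter_return'] = 100 * (data['open'] - data['pre_close']) / data['pre_close']
data['turn_growth'] = data['turnover_rate'].diff()
data['net_mf_amount'] = data['net_mf_amount'] / 1000

# Rolling-window form for comparison; raw=True passes a NumPy array, so x[1] is positional
rolled = data['turnover_rate'].rolling(2).apply(lambda x: x[1] - x[0], raw=True)
assert np.allclose(rolled, data['turn_growth'], equal_nan=True)

# The first day (no previous turnover) and the day without money-flow data are dropped
features = data[['inter_return', 'turn_growth', 'net_mf_amount']].dropna()
print(features)
```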
Import the principal component analysis class PCA from the sklearn package:
from sklearn.decomposition import PCA
Set the number of principal components to 1 (n_components=1) and leave the other parameters at their defaults:
pca = PCA(n_components=1)
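Fit the model on the three indicator columns only, after dropping rows with missing values (for example the first day, which has no turnover growth), to obtain the investor sentiment variable:
data = data[['inter_return', 'turn_growth', 'net_mf_amount']].dropna().copy()
sentiment = pca.fit_transform(data)
data['investor_sentiment'] = sentiment[:, 0]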
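For readers who want to see what `PCA(n_components=1)` does to the three indicators, the sketch below reproduces it with NumPy: centre each column, take the leading right singular vector, and project each day onto it. Two parts are assumptions added here, not part of the original method. The first is the optional `standardize` flag: the three indicators are in different units, so without scaling the component tends to be dominated by whichever column has the largest variance. The second is the sign convention: the sign of a principal component is arbitrary, and scikit-learn may return either orientation. The function name `first_pc_scores` is hypothetical.

```python
import numpy as np

def first_pc_scores(X, standardize=False):
    """Project the rows of X onto the first principal component (hypothetical helper).

    Returns (scores, loadings, explained_variance_ratio). Without standardize,
    the scores equal PCA(n_components=1).fit_transform(X)[:, 0] up to sign.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each column, as PCA does
    if standardize:
        # Optional variant: give each indicator unit variance (ddof=0, like StandardScaler)
        Xc = Xc / X.std(axis=0, ddof=0)
    # Rows of Vt are the principal directions, ordered by singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    w = Vt[0]
    # Sign is arbitrary; orient it so the first column (overnight return) loads positively
    if w[0] < 0:
        w = -w
    explained = s[0] ** 2 / np.sum(s ** 2)  # share of total variance on the first component
    return Xc @ w, w, explained
```

On the three indicator columns, the unstandardized scores should match the sklearn result up to sign, and the returned ratio should match `pca.explained_variance_ratio_[0]`.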
The complete code is as follows:
import pandas as pd
import tushare as ts
from sklearn.decomposition import PCA
# Fetch data from tushare
token = '**********************************'  # replace with your own tushare token
pro = ts.pro_api(token)
daily = pro.daily(ts_code='002149.SZ', start_date='20161101', end_date='20200831', fields='trade_date,open,pre_close')
daily_basic = pro.daily_basic(ts_code='002149.SZ', start_date='20161101', end_date='20200831', fields='trade_date,turnover_rate')
moneyflow = pro.moneyflow(ts_code='002149.SZ', start_date='20161101', end_date='20200831', fields='trade_date,net_mf_amount')
# Compute the indicators
data = pd.merge(daily, daily_basic, on='trade_date', how='left')
data = pd.merge(data, moneyflow, on='trade_date', how='left')
data['date'] = pd.to_datetime(data['trade_date'], format='%Y%m%d')
data['date'] = data.date.dt.date
data = data.set_index('date')
data = data.sort_index(ascending=True)
data['inter_return'] = 100*(data['open'] - data['pre_close'])/data['pre_close']
data['turn_growth'] = data['turnover_rate'].diff()  # today's turnover rate minus the previous day's
data['net_mf_amount'] = data['net_mf_amount']/1000  # 10,000 yuan -> 10 million yuan
data = data.dropna(axis=0, how='any')
data = data[['inter_return', 'turn_growth', 'net_mf_amount']].copy()
data.columns = ['overnight_return', 'turnover_growth', 'order_imbalance']
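# Build the investor sentiment index with principal component analysis
pca = PCA(n_components=1)
sentiment = pca.fit_transform(data)
print('Explained variance ratio of each component: {}'.format(pca.explained_variance_ratio_))
data['investor_sentiment'] = sentiment[:, 0]
data.to_excel('./investor_sentiment.xlsx')  # writing .xlsx requires openpyxl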