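First, a brief introduction to Eugene Fama:
Eugene F. Fama is a renowned economist and a leading thinker in financial economics, one of the representative figures of the Chicago school of economics, a professor at the University of Chicago, and a winner of the 2013 Nobel Prize in Economics.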
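In the early 1990s, Fama worked with Kenneth French (who later taught at MIT) to test whether several alternative financial variables could improve the predictability of stock returns and economic activity. They tested these variables at the macroeconomic level and at the firm level (for example, dividend payouts), and they examined the relationship between stock returns and business fluctuations. Beyond the movement of a stock's price relative to the index, they also accounted for firm size and sorted firms by price-to-book ratio, constructing a three-factor model made up of a market factor, a size factor, and a value factor. The three-factor model explains "anomalies" that the CAPM cannot; it can also be used to measure fund performance and so assess a fund manager's investment skill.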
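The model holds that the excess return of a portfolio (including a single stock) can be explained by its exposure to three factors: the market portfolio (Rm − Rf), the size factor (SMB), and the book-to-market factor (HML). This multi-factor equilibrium pricing model can be written as:
E(R_it) − R_ft = β_i [E(R_mt) − R_ft] + s_i E(SMB_t) + h_i E(HML_t)
where R_ft is the risk-free rate at time t; R_mt is the market return at time t; R_it is the return of asset i at time t; E(R_mt) − R_ft is the market risk premium; SMB_t is the size factor return at time t (small-company return minus big-company return); and HML_t is the book-to-market factor return at time t (high-BM return minus low-BM return).
β_i, s_i, and h_i are the loadings on the three factors. The corresponding regression model is:
R_it − R_ft = α_i + β_i (R_mt − R_ft) + s_i SMB_t + h_i HML_t + ε_it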
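To make the regression equation concrete, here is a minimal sketch that estimates α, β, s, and h by ordinary least squares. It runs on synthetic series with known loadings, not on market data; the function name `fit_three_factor` and all the numbers are made up for illustration.

```python
import numpy as np

def fit_three_factor(excess_ret, mkt_excess, smb, hml):
    """OLS estimate of (alpha, beta, s, h) from equal-length monthly series."""
    X = np.column_stack([np.ones_like(mkt_excess), mkt_excess, smb, hml])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    return coef

# Synthetic factors and returns with known loadings (illustrative only)
rng = np.random.default_rng(0)
n_months = 240
mkt_excess = rng.normal(0.005, 0.05, n_months)
smb = rng.normal(0.002, 0.03, n_months)
hml = rng.normal(0.003, 0.03, n_months)
true_alpha, true_beta, true_s, true_h = 0.001, 1.1, 0.4, -0.3
noise = rng.normal(0.0, 0.01, n_months)
excess_ret = true_alpha + true_beta * mkt_excess + true_s * smb + true_h * hml + noise

alpha, beta, s, h = fit_three_factor(excess_ret, mkt_excess, smb, hml)
print(alpha, beta, s, h)
```

The estimates should land close to the loadings used to generate the data. With real data, α is the part of the average excess return that the three factors leave unexplained.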
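1. Theoretical assumptions
Applications of the Fama-French three-factor model rest on the theoretical assumption of "bounded rationality", from which several basic assumptions follow:
(1) there are a large number of investors;
(2) all investors plan their portfolios over the same holding period;
(3) investors invest only in assets traded on public financial markets;
(4) there are no transaction costs (commissions, service fees, and so on) and no taxes;
(5) investors hold the same expectations for the means, variances, and covariances of security returns;
(6) all investors agree in their valuation of securities and their view of the economic outlook.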
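2. Statistical assumptions
As its expression shows, the FF model is a multiple linear regression. Its basic assumptions are:
(1) (Rm − Rf), SMB, and HML are uncorrelated with the random error term ε;
(2) zero mean: E(ε_i) = 0;
(3) homoskedasticity, i.e. the variance of ε is a constant: Var(ε_i) = σ²;
(4) no autocorrelation: cov(ε_i, ε_j) = 0 for i ≠ j;
(5) no exact linear relationship among the explanatory variables (no perfect multicollinearity);
(6) the random error term is normally distributed with mean zero and variance σ², i.e. ε_i ~ N(0, σ²).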
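Some of these assumptions can be given a rough check once a regression has been fitted. The sketch below is only an illustration on synthetic data. It computes the residual mean (zero by construction when an intercept is included), the Durbin-Watson statistic as a hint about first-order autocorrelation, and the largest pairwise correlation among the regressors as a hint about collinearity. It does not test homoskedasticity or normality, and `residual_diagnostics` is a hypothetical helper name.

```python
import numpy as np

def residual_diagnostics(X, y):
    """Fit OLS with an intercept and return simple checks on the fit."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ coef
    # Durbin-Watson: values near 2 suggest no first-order autocorrelation
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
    # Largest absolute off-diagonal correlation among the regressors
    corr = np.corrcoef(X, rowvar=False)
    max_corr = np.max(np.abs(corr - np.eye(corr.shape[0])))
    return {"resid_mean": resid.mean(), "durbin_watson": dw, "max_abs_corr": max_corr}

# Synthetic example with independent regressors and i.i.d. noise
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
y_demo = 0.5 + X_demo @ np.array([1.0, 0.3, -0.2]) + rng.normal(scale=0.5, size=200)
checks = residual_diagnostics(X_demo, y_demo)
print(checks)
```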
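As an experiment with Python's statistical tooling, I want to use Ricequant's platform and data to reproduce the Fama model, check how well it holds up, extend it (for example to the five-factor model Fama himself later proposed), and try to derive workable trading strategies from it. In this first notebook I mainly prepare the data and run a simple regression as a check. I have only just started using Python's statsmodels module, so the statistics at the end hit a bug; corrections from more experienced users are welcome.
Now let's build the data the model needs. According to the model, we need two items from the fundamentals data: market capitalization and book value per share. We also need individual stock closing prices and the market closing price to compute returns. Let's begin~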
import numpy as np
import pandas as pd
import scipy
import math
import statsmodels
dp = get_fundamentals(query(fundamentals.eod_derivative_indicator.market_cap,fundamentals.financial_indicator.book_value_per_share),'2010-11-01', '60m')
market = all_instruments()
price_data = get_price(list(market['order_book_id']),start_date='2005-11-01', end_date='2010-10-30',frequency='1d',fields='ClosingPx')
ERROR: cannot find day bar file for instrument: 603861.XSHG, most likely there’s no country data for it.
ERROR: cannot find day bar file for instrument: 300484.XSHE, most likely there’s no country data for it.
price_data = get_price(list(market['order_book_id']),start_date='2005-11-01', end_date='2010-11-05',frequency='1d',fields='ClosingPx')
ERROR: cannot find day bar file for instrument: 603861.XSHG, most likely there’s no country data for it.
ERROR: cannot find day bar file for instrument: 300484.XSHE, most likely there’s no country data for it.
OK, we now have the data from 2005-11-01 to 2010-11-01 at a monthly frequency: 61 monthly dates, which give 60 monthly returns, enough for a basic OLS regression.
index_date = dp['book_value_per_share'].index
stocks = dp['book_value_per_share'].columns
Index(['2010-11-01', '2010-09-30', '2010-09-01', '2010-07-30', '2010-07-01',
'2010-06-01', '2010-04-30', '2010-04-01', '2010-03-01', '2010-02-01',
'2009-12-31', '2009-12-01', '2009-10-30', '2009-09-30', '2009-09-01',
'2009-07-31', '2009-07-01', '2009-06-01', '2009-04-30', '2009-04-01',
'2009-02-27', '2009-01-23', '2008-12-31', '2008-12-01', '2008-10-31',
'2008-09-26', '2008-09-01', '2008-08-01', '2008-07-01', '2008-05-30',
'2008-04-30', '2008-04-01', '2008-02-29', '2008-02-01', '2007-12-28',
'2007-11-30', '2007-11-01', '2007-09-28', '2007-08-31', '2007-08-01',
'2007-06-29', '2007-06-01', '2007-04-30', '2007-03-30', '2007-03-01',
'2007-02-01', '2006-12-29', '2006-12-01', '2006-11-01', '2006-09-29',
'2006-09-01', '2006-08-01', '2006-06-30', '2006-06-01', '2006-04-28',
'2006-03-31', '2006-03-01', '2006-01-25', '2005-12-30', '2005-12-01',
'2005-11-01'],
dtype='object')
# replace NaN values with zero
where_are_NaNs = np.isnan(dp['book_value_per_share'])
dp['book_value_per_share'][where_are_NaNs] = 0
where_are_NaNs = np.isnan(dp['market_cap'])
dp['market_cap'][where_are_NaNs] = 0
where_are_NaNs = np.isnan(price_data)
price_data[where_are_NaNs] = 0
Next we use book value per share and the market price to compute the book-to-market ratio. Stocks with a higher ratio tend to earn higher returns, the so-called BM effect.
##btm stores the index book_to_market ratio
btm = pd.DataFrame(index=index_date,columns=stocks)
for stk in stocks:
    for date in index_date:
        if price_data[stk][date]!=0:
            btm[stk][date] = dp['book_value_per_share'][stk][date]/price_data[stk][date]
        else:
            btm[stk][date] = 0
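The double loop above can also be written as one element-wise division. The following is a sketch under assumptions, not the notebook's own code: it assumes book value per share and price come as two DataFrames with identical date rows and stock columns, and the tickers `AAA` and `BBB` are made up. As in the loop, a missing or zero price gives a ratio of 0.

```python
import numpy as np
import pandas as pd

def book_to_market(bvps, price):
    """Element-wise BVPS / price; 0 where the price is missing or zero."""
    valid = price.notna() & (price != 0)
    ratio = bvps.where(valid, 0.0) / price.where(valid, 1.0)
    return ratio.fillna(0.0).astype(float)

dates = ["2010-11-01", "2010-09-30"]
bvps_demo = pd.DataFrame({"AAA": [2.2, 2.0], "BBB": [5.5, 5.0]}, index=dates)
price_demo = pd.DataFrame({"AAA": [11.0, 10.0], "BBB": [20.0, 0.0]}, index=dates)
btm_demo = book_to_market(bvps_demo, price_demo)
print(btm_demo)
```

A side effect of this form is that the result has a float dtype, whereas a DataFrame created empty and filled cell by cell usually keeps an object dtype.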
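SMB and HML (the code calls the latter hmi) are computed as follows:
1. First, split stocks evenly into two groups by market capitalization (Small and Big), using the median market capitalization of all listed companies at that date as the breakpoint;
2. Sort by BM from low to high into three groups: the bottom 30% (Growth), the middle 40% (Neutral), and the top 30% (Value);
3. Each group's monthly return is the market-cap-weighted average of that month's returns of all its member stocks; from these, compute SMB and HML for each month:
SMB = Small size return - Big size return
HML = Value company return - Growth company return
4. Subtracting the risk-free return from the market return gives the excess market return (Rm-Rf). (Because the Ricequant platform cannot retrieve government bond yields, I could only average all IBO1M values over this period and use that as the risk-free rate.)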
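Steps 1 to 3 can be sketched for a single month as follows. This is an illustrative simplification: the helper names and the six made-up stocks are hypothetical, the size and BM sorts are done independently as described above, and each group's return is divided by that group's own market capitalization. The notebook's loop divides by total market capitalization instead, so its numbers are scaled differently.

```python
import numpy as np
import pandas as pd

def value_weighted_return(ret, cap, mask):
    """Market-cap-weighted average return of the stocks selected by mask."""
    w = cap[mask]
    return float((ret[mask] * w).sum() / w.sum()) if w.sum() > 0 else 0.0

def smb_hml_one_month(ret, cap, btm):
    """SMB and HML for one month from per-stock Series sharing one index."""
    small = cap < cap.median()
    low, high = btm.quantile(0.3), btm.quantile(0.7)
    growth, value = btm < low, btm > high
    smb = value_weighted_return(ret, cap, small) - value_weighted_return(ret, cap, ~small)
    hml = value_weighted_return(ret, cap, value) - value_weighted_return(ret, cap, growth)
    return smb, hml

# Six made-up stocks: three small with high BM, three big with low BM
names = ["S1", "S2", "S3", "B1", "B2", "B3"]
cap = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], index=names)
ret = pd.Series([0.06, 0.03, 0.03, 0.01, 0.02, 0.01], index=names)
btm = pd.Series([1.0, 0.9, 0.5, 0.4, 0.2, 0.1], index=names)
smb_demo, hml_demo = smb_hml_one_month(ret, cap, btm)
print(smb_demo, hml_demo)
```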
##calculate market size medians
median_size = {}
for date in index_date:
    median_size[date] = np.median(dp['market_cap'].loc[date])
##calculate the 30%, 70% quantile of book to market ratio each month
smark = {}
hmark = {}
for date in index_date:
    smark[date] = np.percentile(list(btm.loc[date]),30)
    hmark[date] = np.percentile(list(btm.loc[date]),70)
for date in index_date:
    if smark[date] == 0:
        smark[date]=hmark[date]/2
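Because missing values were replaced with zeros earlier, the 30% breakpoint can collapse to 0, which is what the fallback above works around. One possible alternative, offered only as a sketch, is to compute the breakpoints from positive, finite BM values alone. Dropping non-positive BM is my assumption here, and `bm_breakpoints` is a hypothetical helper.

```python
import numpy as np

def bm_breakpoints(bm_values, low_pct=30, high_pct=70):
    """Percentile breakpoints of BM using only positive, finite observations."""
    bm = np.asarray(bm_values, dtype=float)
    valid = bm[np.isfinite(bm) & (bm > 0)]
    if valid.size == 0:
        return np.nan, np.nan
    return np.percentile(valid, low_pct), np.percentile(valid, high_pct)

# Three zero-filled entries and one NaN are ignored; six real values remain
low_bp, high_bp = bm_breakpoints([0, 0, 0, 0.1, 0.2, 0.4, 0.5, 0.9, 1.0, np.nan])
print(low_bp, high_bp)
```

Stocks without a valid BM would then also have to be left out of the Growth and Value groups, since a zero-filled ratio would otherwise count as Growth.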
#return of each stock
return_data = pd.DataFrame(index=index_date,columns=stocks)
for stk in stocks:
    i = 0
    for date in index_date:
        if date=='2005-11-01' or price_data[stk][index_date[i+1]]==0:
            return_data[stk][date] = 0
        else:
            return_data[stk][date] = price_data[stk][index_date[i]]/price_data[stk][index_date[i+1]] - 1
        i = i + 1
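The same period-over-period return can be computed with a shift. The sketch below assumes a price table whose rows run from newest to oldest, like index_date above, with zeros standing for missing prices. It differs from the loop in one way: it also returns 0 when the current price is zero-filled, where the loop would produce a return of -1. The helper name and tickers are made up.

```python
import numpy as np
import pandas as pd

def period_returns_desc(prices):
    """Simple returns for a price table ordered from newest to oldest row."""
    cur = prices.where(prices != 0)   # treat zero-filled prices as missing
    prev = cur.shift(-1)              # the next row down is the previous period
    return (cur / prev - 1).fillna(0.0)

dates = ["2010-11-01", "2010-09-30", "2010-09-01"]
prices_demo = pd.DataFrame({"AAA": [11.0, 10.0, 8.0], "BBB": [0.0, 5.0, 0.0]}, index=dates)
returns_demo = period_returns_desc(prices_demo)
print(returns_demo)
```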
##calculate SMB and HMI
smb = pd.Series(index=index_date, dtype=float)
hmi = pd.Series(index=index_date, dtype=float)
first_row = True
for date in index_date:
    if first_row:
        smb[date]=0
        hmi[date]=0
        first_row=False
        continue
    small_size=0.0
    big_size=0.0
    value_btm=0.0
    growth_btm=0.0
    for stk in stocks:
        # size sort: below the median market cap counts as Small
        if dp['market_cap'][stk][date] < median_size[date]:
            small_size = small_size + return_data[stk][date]*dp['market_cap'][stk][date]
        else:
            big_size = big_size + return_data[stk][date]*dp['market_cap'][stk][date]
        # BM sort: below the 30% breakpoint is Growth, above the 70% breakpoint is Value
        if btm[stk][date] < smark[date]:
            growth_btm = growth_btm + return_data[stk][date]*dp['market_cap'][stk][date]
        elif btm[stk][date]>hmark[date]:
            value_btm = value_btm + return_data[stk][date]*dp['market_cap'][stk][date]
    # note: both factors are normalized by total market cap, not by each group's own cap
    mktcap = np.sum(dp['market_cap'].loc[date])
    smb[date] = (small_size - big_size)/mktcap
    hmi[date] = (value_btm - growth_btm)/mktcap
#market (HuShen300)
Rm_data = get_price(['000300.XSHG'],start_date='2005-11-01', end_date='2010-11-05',frequency='1d',fields='ClosingPx')
#market return(HuShen300) and risk free return(0.375724091% 1M, for I can't get this rate from Ricequant nor can I upload my own data)
Rm = pd.Series(index = index_date)
Rf = pd.Series(0.00375724091,index = index_date)
for i in range(61):
    if i==60:
        Rm[index_date[i]] = 0
        Rf[index_date[i]] = 0
        continue
    Rm[index_date[i]] = Rm_data.loc[index_date[i]]/Rm_data.loc[index_date[i+1]] - 1
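OK, now we finally have the whole data set! (Yes, putting together a data set that can actually be regressed wore me out =。= so many stocks have NaN data =。=) From here we can run the regression. I use the statsmodels module, whose OLS needs the data arranged as matrices. What I still can't figure out is why I can extract the list of estimated coefficients but cannot print the summary of the results: printing it raises "No loop matching the specified signature and casting was found for ufunc add" =.= I googled for two days and still don't understand what this message says is wrong, so any pointers would be appreciated! (Side gripe: the statsmodels documentation explains very little... =.=)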
###### Now we can start our regression
import numpy as np
from statsmodels import regression
import statsmodels.api as sm
#we first choose one stock to try the model
#Build the regression matrix
#cast to float64: return_data was created empty, so its columns hold object dtype
y = np.array(return_data['600000.XSHG'] - Rf, dtype=float)
y = y[1:len(y)-1]
Y = y.T
#x = pd.DataFrame(index=index_date,columns=['Rm-Rf','SMB','HMI'])
Rm_Rf = np.array(Rm-Rf, dtype=float)
Rm_Rf = Rm_Rf[1:60]
SMB = np.array(smb, dtype=float)
SMB = SMB[1:60]
HMI = np.array(hmi, dtype=float)
HMI = HMI[1:60]
X = np.column_stack((Rm_Rf,SMB,HMI))
X = sm.add_constant(X)
mod = regression.linear_model.OLS(Y, X).fit()
a = mod.params
print(a)
[-0.0022684101268798463 0.16417514447020232 -0.97619263840653125
0.2312581654106953]
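Printing the summary is what triggers the error. I checked that the data has no missing or abnormal values and that the matrices are built correctly. Pointers welcome, please leave comments!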
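One likely cause, offered as a guess rather than a confirmed diagnosis, is the dtype of the arrays. A DataFrame created with only an index and columns starts out with object dtype, and filling it one cell at a time usually leaves it that way. The coefficient array printed above shows full-precision Python floats, which is how NumPy prints an object array. Numeric routines that expect float64 can then fail with a "No loop matching the specified signature" error. The sketch below uses made-up numbers and a hypothetical helper, `as_float_matrix`, to show how to inspect the dtype and cast to float before regressing.

```python
import numpy as np
import pandas as pd

def as_float_matrix(*columns):
    """Stack 1-D inputs column-wise and force a float64 dtype."""
    return np.column_stack([np.asarray(c, dtype=float) for c in columns])

dates = ["2010-11-01", "2010-09-30", "2010-09-01", "2010-07-30"]
frame = pd.DataFrame(index=dates, columns=["ret"])   # no data given at creation
empty_dtype = frame["ret"].dtype
for d, v in zip(dates, [0.02, -0.01, 0.03, 0.00]):
    frame.loc[d, "ret"] = v
print(empty_dtype, np.array(frame["ret"]).dtype)     # inspect before regressing

# Explicit casts give arrays that numeric routines accept
y_clean = np.asarray(frame["ret"], dtype=float)
X_clean = as_float_matrix(np.ones(len(y_clean)), [0.01, -0.02, 0.02, 0.00])
coef, *_ = np.linalg.lstsq(X_clean, y_clean, rcond=None)
print(coef)
```

In the notebook this would mean checking `Y.dtype` and `X.dtype` before calling OLS and converting with `astype(float)` or `np.asarray(..., dtype=float)` if either reports `object`.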
OK, going forward I hope to build on this to try the five-factor model, and then keep adding other factors to refine it~