Empirical asset pricing (EAP) has been released on GitHub. The author will introduce the package's usage (documentation) in detail on CSDN in upcoming posts.
GitHub: whyecofiliter/EAP (empirical asset pricing)
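Since Fama and French (2015) introduced it, the investment factor has become increasingly popular. It is also part of the Hou, Xue, and Zhang (HXZ) (2015) model, which Zhang (2017) extended to the ICAPM. Before it became popular, Titman et al. (2014) were among the early researchers of this factor, using abnormal capital investment as the proxy. Most of the subsequent literature, however, uses the asset growth rate as the proxy, including Fama and French (2015) and Hou et al. (2015). In developed markets, the investment factor is negatively correlated with future returns, while in most developing countries this relationship is closer. In the Chinese market, most studies find no significant investment effect (Guo et al., 2017; Qiao, 2019; Liu et al., 2019).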
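In this demo, the annual asset growth rate is used as the proxy for the investment factor. It is computed from financial data and derived financial ratios. The dataset starts in January 2004 and is collected from the CSMAR database. Warning: do not use the dataset in this demo for any commercial purpose.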
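# %% import package
import os
import sys

import pandas as pd

# make modules in the parent directory (e.g. portfolio_analysis) importable
sys.path.append(os.path.abspath(".."))
# %% import data
# Monthly return of stocks in China security market
# forward slashes avoid invalid escape sequences such as '\d' in the path
month_return = pd.read_hdf('./data/month_return.h5', key='month_return')
# company-level valuation data (EV1, MarketValue, PBV1A are used below)
company_data = pd.read_hdf('./data/last_filter_pe.h5', key='data')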
Preprocess the data.
# %% preprocessing data
# shift each stock's monthly return forward by one month
# emrwd is next month's return including dividends
month_return['emrwd'] = month_return.groupby(['Stkcd'])['Mretwd'].shift(-1)
# emrnd is next month's return excluding dividends
month_return['emrnd'] = month_return.groupby(['Stkcd'])['Mretnd'].shift(-1)
# select the A share stock
month_return = month_return[month_return['Markettype'].isin([1, 4, 16])]
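# %% flag the stocks whose size is at or above the 30th percentile in each month
# (i.e. every stock except the smallest 30%)
def percentile(stocks):
    return stocks >= stocks.quantile(q=.3)
# transform keeps the original row index, so the flags align on assignment
month_return['cap'] = month_return.groupby(['Trdmnt'])['Msmvttl'].transform(percentile)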
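To make the two preprocessing steps concrete, here is a minimal sketch on a made-up panel of two stocks and three months. The column names follow the demo, but every number is invented for illustration; it only shows how the forward shift and the monthly size flag behave.

```python
import pandas as pd

# Toy panel: two stocks, three months (all values are made up).
toy = pd.DataFrame({
    'Stkcd':   [1, 1, 1, 2, 2, 2],
    'Trdmnt':  ['2004-01', '2004-02', '2004-03'] * 2,
    'Mretwd':  [0.01, 0.02, 0.03, -0.01, 0.00, 0.05],
    'Msmvttl': [100.0, 110.0, 120.0, 900.0, 950.0, 990.0],
})

# Next month's return, aligned with this month's characteristics.
# The last month of each stock has no next-month return and becomes NaN.
toy['emrwd'] = toy.groupby('Stkcd')['Mretwd'].shift(-1)

# True when a stock's market cap is at or above that month's 30th percentile.
toy['cap'] = toy.groupby('Trdmnt')['Msmvttl'].transform(
    lambda s: s >= s.quantile(0.3))

print(toy)
```

With only two stocks per month, the smaller one always falls below the 30th percentile, so it is the one flagged `False`.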
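The annual asset growth rate is used as the proxy for the investment factor. It is computed from financial data and derived financial ratios.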
# %% calculate the total asset
# asset = debt + equity
# debt = company_value - market_value
# equity = market_value / PB
company_data['debt'] = company_data['EV1'] - company_data['MarketValue']
company_data['equity'] = company_data['MarketValue']/company_data['PBV1A']
company_data['asset'] = company_data['debt'] + company_data['equity']
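# asset growth rate: 12-month change in assets within each firm,
# scaled by current assets (the conventional definition scales by lagged assets)
company_data['asset_growth_rate'] = company_data.groupby('Symbol')['asset'].diff(12)/company_data['asset']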
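The arithmetic behind the proxy can be checked on a small example. The sketch below uses one hypothetical firm with 13 monthly rows and invented values, and follows the same steps as the cell above. The `asset_growth_lagged` column is an addition for comparison only; it is not part of the demo.

```python
import pandas as pd

# Toy monthly panel for one firm over 13 months (values are made up).
dates = pd.date_range('2004-01-01', periods=13, freq='MS')
toy = pd.DataFrame({
    'Symbol': [1] * 13,
    'TradingDate': dates,
    'EV1': [150.0] * 12 + [180.0],          # enterprise value
    'MarketValue': [100.0] * 12 + [120.0],  # market value of equity
    'PBV1A': [2.0] * 13,                    # price-to-book ratio
})

toy['debt'] = toy['EV1'] - toy['MarketValue']       # 50 in the first 12 rows, then 60
toy['equity'] = toy['MarketValue'] / toy['PBV1A']   # 50 in the first 12 rows, then 60
toy['asset'] = toy['debt'] + toy['equity']          # 100 in the first 12 rows, then 120

# 12-row change within each firm, scaled by current assets as in the demo.
change = toy.groupby('Symbol')['asset'].diff(12)
toy['asset_growth_rate'] = change / toy['asset']

# Conventional alternative (not used in the demo): scale by assets 12 rows earlier.
toy['asset_growth_lagged'] = change / toy.groupby('Symbol')['asset'].shift(12)

print(toy[['TradingDate', 'asset', 'asset_growth_rate', 'asset_growth_lagged']].tail(2))
```

Note that `diff(12)` counts rows, not calendar months, so it measures a one-year change only if each firm has one row per month with no gaps.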
Further data preprocessing.
# %% prepare merge data
from pandas.tseries.offsets import MonthBegin, MonthEnd
month_return['Stkcd_merge'] = month_return['Stkcd'].astype(dtype='string')
month_return['Date_merge'] = pd.to_datetime(month_return['Trdmnt'])
#month_return['Yearmonth'] = month_return['Date_merge'].map(lambda x : 1000*x.year + x.month)
#month_return['Date_merge'] += MonthEnd()
company_data['Stkcd_merge'] = company_data['Symbol'].dropna().astype(dtype='int').astype(dtype='string')
company_data['Date_merge'] = pd.to_datetime(company_data['TradingDate'])
#company_data['Yearmonth'] = company_data['Date_merge'].map(lambda x : 1000*x.year + x.month)
company_data['Date_merge'] += MonthBegin()
# %% dataset starts from '2000-01'
company_data = company_data[company_data['Date_merge'] >= '2000-01']
month_return = month_return[month_return['Date_merge'] >= '2000-01']
return_company = pd.merge(company_data, month_return, on=['Stkcd_merge', 'Date_merge'])
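The merge only works if both tables carry identical keys: the stock code as a string and a date on the first day of a month. Here is a small sketch of that alignment on invented rows. It assumes `TradingDate` holds a date inside the previous month, which is a guess about the raw data rather than something the demo states.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Made-up rows: returns are keyed by a month string, company data by a daily date.
returns = pd.DataFrame({
    'Stkcd': [1, 2],
    'Trdmnt': ['2004-02', '2004-02'],
    'emrwd': [0.02, -0.01],
})
company = pd.DataFrame({
    'Symbol': [1.0, 2.0],                       # codes stored as floats
    'TradingDate': ['2004-01-30', '2004-01-30'],
    'asset_growth_rate': [0.10, 0.25],
})

returns['Stkcd_merge'] = returns['Stkcd'].astype('string')
returns['Date_merge'] = pd.to_datetime(returns['Trdmnt'])   # parsed as 2004-02-01

# float -> int -> string, so 1.0 becomes '1' rather than '1.0'
company['Stkcd_merge'] = company['Symbol'].astype('int').astype('string')
# MonthBegin() rolls each date forward to the next first-of-month
company['Date_merge'] = pd.to_datetime(company['TradingDate']) + MonthBegin()

merged = pd.merge(company, returns, on=['Stkcd_merge', 'Date_merge'])
print(merged[['Stkcd_merge', 'Date_merge', 'asset_growth_rate', 'emrwd']])
```

Because `pd.merge` defaults to an inner join, any row whose key is missing on the other side is dropped without warning, so it is worth comparing row counts before and after.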
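Two datasets are constructed: one excludes the smallest 30% of stocks (the tail stocks) and the other includes them. A univariate and a bivariate analysis are run on each.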
# %% construct test_data for bivariate analysis
# dataset 1 : no tail stocks & asset growth rate Bivariate
from portfolio_analysis import Bivariate, Univariate
import numpy as np
# select stocks whose size is at or above the 30th percentile in each month and
# whose trading days are more than or equal to 10 days
test_data_1 = return_company[(return_company['cap']==True) & (return_company['Ndaytrd']>=10)]
test_data_1 = test_data_1[['emrwd', 'Msmvttl', 'asset_growth_rate', 'Date_merge']].dropna()
test_data_1 = test_data_1[(test_data_1['Date_merge'] >= '2004-01-01') & (test_data_1['Date_merge'] <= '2019-12-01')]
# Univariate analysis (number=9 breakpoints give the 10 groups in the output)
uni_1 = Univariate(np.array(test_data_1[['emrwd', 'asset_growth_rate', 'Date_merge']]), number=9)
uni_1.summary_and_test()
uni_1.print_summary_by_time()
uni_1.print_summary()
====================================================================================================
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Diff |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Average | 0.011 | 0.012 | 0.013 | 0.013 | 0.015 | 0.014 | 0.013 | 0.015 | 0.015 | 0.016 | 0.005 |
| T-Test | 1.393 | 1.655 | 1.783 | 1.879 | 2.054 | 1.985 | 1.955 | 2.162 | 2.064 | 2.152 | 1.907 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
====================================================================================================
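For readers who want to see what a univariate portfolio sort does, here is a minimal pandas sketch on synthetic data. It is not the EAP implementation: `univariate_sort` is a hypothetical helper, the portfolios are assumed to be equal-weighted, and the random data carries no real signal.

```python
import numpy as np
import pandas as pd

def univariate_sort(df, ret='emrwd', var='asset_growth_rate',
                    date='Date_merge', groups=10):
    """Average next-month return per characteristic group, month by month."""
    df = df.copy()
    # Assign each stock to a group within its month (0 = lowest characteristic).
    df['group'] = df.groupby(date)[var].transform(
        lambda s: pd.qcut(s, groups, labels=False, duplicates='drop'))
    # Equal-weighted portfolio return for every (month, group) pair.
    by_time = df.groupby([date, 'group'])[ret].mean().unstack('group')
    # High-minus-low return in each month.
    by_time['Diff'] = by_time[groups - 1] - by_time[0]
    return by_time

# Synthetic data: 24 months x 200 stocks, returns unrelated to the characteristic.
rng = np.random.default_rng(0)
months = pd.date_range('2004-01-01', periods=24, freq='MS')
demo = pd.DataFrame({
    'Date_merge': months.repeat(200),
    'asset_growth_rate': rng.normal(size=24 * 200),
    'emrwd': rng.normal(0.01, 0.1, size=24 * 200),
})

by_time = univariate_sort(demo)
print(by_time.mean().round(3))   # time-series average per group and for Diff
```

The `Average` row in the table above corresponds to the time-series mean of each column, and the `Diff` column to the mean of the monthly high-minus-low series.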
# Bivariate analysis (number=4 breakpoints give the 5 x 5 groups in the output)
bi_1 = Bivariate(np.array(test_data_1), number=4)
bi_1.average_by_time()
bi_1.summary_and_test()
bi_1.print_summary_by_time()
bi_1.print_summary()
==============================================================
+-------+--------+--------+--------+--------+--------+-------+
| Group | 1 | 2 | 3 | 4 | 5 | Diff |
+-------+--------+--------+--------+--------+--------+-------+
| 1 | 0.015 | 0.017 | 0.018 | 0.018 | 0.02 | 0.005 |
| | 1.848 | 2.119 | 2.336 | 2.404 | 2.482 | 1.985 |
| 2 | 0.012 | 0.014 | 0.017 | 0.015 | 0.019 | 0.007 |
| | 1.509 | 1.784 | 2.301 | 1.984 | 2.434 | 2.8 |
| 3 | 0.01 | 0.012 | 0.015 | 0.014 | 0.014 | 0.004 |
| | 1.314 | 1.695 | 2.026 | 1.884 | 1.912 | 1.862 |
| 4 | 0.009 | 0.01 | 0.011 | 0.013 | 0.015 | 0.006 |
| | 1.194 | 1.507 | 1.579 | 1.831 | 2.009 | 2.45 |
| 5 | 0.007 | 0.01 | 0.011 | 0.014 | 0.012 | 0.005 |
| | 1.03 | 1.517 | 1.685 | 2.106 | 1.749 | 1.7 |
| Diff | -0.008 | -0.007 | -0.008 | -0.005 | -0.007 | 0.0 |
| | -1.902 | -1.646 | -1.897 | -1.213 | -1.771 | 0.088 |
+-------+--------+--------+--------+--------+--------+-------+
==============================================================
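The sketch below shows one way such a 5 x 5 table can be produced. It assumes an independent double sort with equal-weighted cells. Whether EAP's `Bivariate` sorts independently or conditionally is not stated here, so treat this as an illustration of the idea rather than a description of the package. `independent_double_sort` is a hypothetical helper and the data are synthetic.

```python
import numpy as np
import pandas as pd

def independent_double_sort(df, ret, var1, var2, date, groups=5):
    """Mean return for each (var1 group, var2 group) cell, averaged over months."""
    df = df.copy()
    for var, name in [(var1, 'g1'), (var2, 'g2')]:
        # Each variable is ranked on its own within the month (independent sort).
        df[name] = df.groupby(date)[var].transform(
            lambda s: pd.qcut(s, groups, labels=False, duplicates='drop'))
    # Equal-weighted return of every cell in every month.
    cell = df.groupby([date, 'g1', 'g2'])[ret].mean()
    # Time-series average per cell: rows are var1 groups, columns are var2 groups.
    table = cell.groupby(level=['g1', 'g2']).mean().unstack('g2')
    # High-minus-low var2 spread within each var1 group.
    table['Diff'] = table[groups - 1] - table[0]
    return table

# Synthetic data: 12 months x 500 stocks with no built-in size or growth effect.
rng = np.random.default_rng(1)
months = pd.date_range('2004-01-01', periods=12, freq='MS')
n = 500
demo = pd.DataFrame({
    'Date_merge': months.repeat(n),
    'Msmvttl': rng.lognormal(size=12 * n),
    'asset_growth_rate': rng.normal(size=12 * n),
    'emrwd': rng.normal(0.01, 0.1, size=12 * n),
})

table = independent_double_sort(
    demo, 'emrwd', 'Msmvttl', 'asset_growth_rate', 'Date_merge')
print(table.round(3))
```

In this layout the rows are groups of the first sort variable (size) and the columns are groups of the second (asset growth), so the `Diff` column is the asset growth spread inside each size group.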
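The results for dataset #1 are consistent with the literature. In the univariate analysis the difference return (0.005, t = 1.907) is not significant, because its t-value is below 2.3. In the bivariate analysis the difference returns are largely insignificant: three of the five t-values in the Diff column and all five in the Diff row are below 2.3 in absolute value. This suggests that the investment factor does not provide excess returns.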
# %% construct test_data for bivariate analysis
# dataset 2 : tail stocks & asset growth rate Bivariate
from portfolio_analysis import Bivariate, Univariate
import numpy as np
# keep all stocks, tail stocks included, whose trading days are more than or
# equal to 10 days
test_data_2 = return_company[return_company['Ndaytrd']>=10]
test_data_2 = test_data_2[['emrwd', 'Msmvttl', 'asset_growth_rate', 'Date_merge']].dropna()
test_data_2 = test_data_2[(test_data_2['Date_merge'] >= '2004-01-01') & (test_data_2['Date_merge'] <= '2019-12-01')]
# Univariate analysis
uni_2 = Univariate(np.array(test_data_2[['emrwd', 'asset_growth_rate', 'Date_merge']]), number=9)
uni_2.summary_and_test()
uni_2.print_summary_by_time()
uni_2.print_summary()
====================================================================================================
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Diff |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Average | 0.017 | 0.017 | 0.017 | 0.017 | 0.017 | 0.017 | 0.016 | 0.017 | 0.017 | 0.018 | 0.001 |
| T-Test | 2.052 | 2.204 | 2.301 | 2.303 | 2.323 | 2.33 | 2.249 | 2.392 | 2.283 | 2.411 | 0.313 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
====================================================================================================
# Bivariate analysis
bi_2 = Bivariate(np.array(test_data_2), number=4)
bi_2.average_by_time()
bi_2.summary_and_test()
bi_2.print_summary_by_time()
bi_2.print_summary()
===============================================================
+-------+--------+--------+--------+--------+--------+-------+
| Group | 1 | 2 | 3 | 4 | 5 | Diff |
+-------+--------+--------+--------+--------+--------+-------+
| 1 | 0.027 | 0.026 | 0.027 | 0.027 | 0.027 | 0.0 |
| | 3.113 | 3.25 | 3.257 | 3.312 | 3.372 | 0.079 |
| 2 | 0.015 | 0.019 | 0.02 | 0.021 | 0.021 | 0.006 |
| | 1.885 | 2.331 | 2.482 | 2.706 | 2.674 | 2.551 |
| 3 | 0.012 | 0.014 | 0.017 | 0.015 | 0.017 | 0.005 |
| | 1.561 | 1.788 | 2.198 | 2.067 | 2.286 | 2.264 |
| 4 | 0.009 | 0.01 | 0.013 | 0.013 | 0.015 | 0.005 |
| | 1.271 | 1.475 | 1.745 | 1.888 | 1.999 | 2.397 |
| 5 | 0.007 | 0.011 | 0.01 | 0.012 | 0.013 | 0.006 |
| | 0.987 | 1.729 | 1.582 | 1.882 | 1.83 | 2.2 |
| Diff | -0.02 | -0.015 | -0.017 | -0.014 | -0.014 | 0.006 |
| | -4.431 | -3.522 | -3.695 | -3.205 | -3.197 | 1.813 |
+-------+--------+--------+--------+--------+--------+-------+
===============================================================
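The results for dataset #2 are likewise consistent with the literature. In the univariate analysis the difference return (0.001, t = 0.313) is not significant, because its t-value is below 2.3. In the bivariate analysis, reading the rows as size groups (as the order of the input columns suggests), the asset growth difference returns in the Diff column are largely insignificant, with three of the five t-values below 2.3. The Diff row, which under that reading is the size spread, is a separate effect and is significantly negative in every column. This suggests that the investment factor does not provide excess returns.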
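The significance statements above compare a t-value against the 2.3 cutoff. As a reference point, the sketch below computes a plain one-sample t-statistic for a series of monthly difference returns. This is an assumption about the kind of test involved: EAP's `summary_and_test` may apply an adjustment (for example for autocorrelation) that this version leaves out. `t_stat` is a hypothetical helper and the input numbers are invented.

```python
import numpy as np

def t_stat(series):
    """Plain one-sample t-statistic for H0: mean == 0 (no autocorrelation adjustment)."""
    x = np.asarray(series, dtype=float)
    x = x[~np.isnan(x)]                      # ignore months with no observation
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

# Made-up monthly high-minus-low returns.
diff = np.array([0.01, -0.02, 0.03, 0.00, 0.02, -0.01, 0.015, 0.005])
t = t_stat(diff)

# Compare against the 2.3 cutoff used in the discussion above.
print(round(t, 3), abs(t) >= 2.3)
```

A larger average spread, a less volatile spread, or a longer sample all raise the t-value, which is why a spread of similar size can clear the cutoff in one size group and miss it in another.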