4. 马科维茨资产组合模型+Fama-French五因子优化方案(理论+Python实战)

目录

    • 0. 承前
    • 1. Fama-French五因子优化的现代投资组合理论
      • 1.1 What is Fama-French五因子优化的现代投资组合理论
      • 1.2 Why is Fama-French五因子优化的现代投资组合理论
      • 1.3 How to Fama-French五因子优化的现代投资组合理论
    • 2. 数据要素&计算流程
      • 2.1 参数集设置
      • 2.2 数据获取&预处理
      • 2.3 收益率计算
      • 2.4 因子构建与预期收益率计算
      • 2.5 协方差矩阵计算
      • 2.6 投资组合优化
      • 2.7 持仓筛选
    • 3. 汇总代码
    • 4. 反思
      • 4.1 不足之处
      • 4.2 提升思路
    • 5. 启后

0. 承前

本篇博文是对上一篇文章,链接:
3. 马科维茨资产组合模型+Fama-French三因子优化方案(理论+Python实战)
预期收益计算方式进行反思与优化。其实使用Fama-French三因子,计算预期收益,已不存在较大的问题,主要引导大家关于因子对预期收益率的思考。
如果想更加全面清晰地了解金融资产组合模型进化论的体系架构,可参考:
0. 金融资产组合模型进化全图鉴

1. Fama-French五因子优化的现代投资组合理论

1.1 What is Fama-French五因子优化的现代投资组合理论

这是一个在现代投资组合理论(MPT)框架下,使用Fama-French五因子模型优化预期收益率估计的投资组合模型。MPT通过分散投资来平衡风险和收益,而FF5模型则通过考虑:

  • 市场因子(Beta)资本资产定价模型(CAPM)中所选用的唯一因子,表示个股收益与市场走势的关系;
  • 规模因子(SMB):小市值公司(Small Cap)与大市值公司(Big Cap)之间的收益差异,与市场走势的关系;
  • 价值因子(HML):高账面市值比(High Book-to-Market Ratio,即高B/M或价值股)与低账面市值比(Low B/M,即成长股)之间的收益差异,与市场走势的关系。
  • 盈利能力因子(RMW):高盈利能力公司(Robust Profitability)与低盈利能力公司(Weak Profitability)之间的收益差异,与市场走势的关系。
  • 投资风格因子(CMA):投资保守的公司(Conservative Investment)与投资激进的公司(Aggressive Investment)之间的收益差异,与市场走势的关系。

一共五个因子来提供更全面的预期收益率估计。该模型在MPT的投资组合优化框架基础上,引入多因子模型来提高收益预测的准确性。

1.2 Why is Fama-French五因子优化的现代投资组合理论

五因子模型相较于三因子模型多了以下两个因子

  • 盈利能力因子(RMW):高盈利能力的公司通常具有更强的财务稳定性和更高的未来收益预期。盈利能力因子(RMW)通过区分高盈利能力公司和低盈利能力公司,解释了盈利能力对股票收益的影响。这有助于投资者更好地理解不同盈利能力公司的风险和收益特征。

  • 投资风格因子(CMA):投资风格因子(CMA)区分了投资保守的公司和投资激进的公司。投资保守的公司通常财务状况更稳健,风险较低,而投资激进的公司可能面临更高的财务风险。通过引入CMA因子,模型能够更好地解释不同投资风格公司的收益差异。

  • MPT框架优势

    1. 分散投资:通过资产组合降低非系统性风险
    2. 量化优化:使用夏普比率进行组合权重优化
    3. 风险平衡:在收益与风险之间寻找最优平衡点
    4. 科学配置:基于数学模型的客观资产配置方法
  • FF5改进

    1. 多维度因子:同时考虑市场、规模、价值、盈利和投资五个因子
    2. 盈利能力:通过RMW因子捕捉高盈利公司的溢价
    3. 投资风格:通过CMA因子识别保守投资策略的价值
    4. 解释能力:相比FF3模型具有更强的收益解释能力

1.3 How to Fama-French五因子优化的现代投资组合理论

  • 参数集设置

    1. ts.set_token:设置Tushare的API访问令牌
    2. industry:选择目标行业,如"银行"
    3. end_date:回测结束日期,格式为’YYYYMMDD’
    4. years:回测年限,默认5年
    5. risk_free_rate:无风险利率,默认0.03
    6. top_holdings:投资组合持仓数量,默认10只股票
    7. index_code:市场指数代码,默认’000300.SH’
  • 数据准备

    1. 股票行业数据:通过tushare获取指定行业的股票列表
    2. 历史价格数据:获取指定时间段内的股票日线数据
    3. 市场指数数据:获取指定时间段内的市场指数数据
    4. 因子数据:获取市值(Size)和账面市值比(B/M)数据
    5. 财务数据:获取ROE和资产增长率数据
    6. 无风险利率:设定无风险利率参数
  • 计算流程

    1. 数据获取:获取股票、市场指数和因子数据
    2. 收益率计算:计算月度对数收益率
    3. 因子构建:构建SMB、HML、RMW和CMA因子
    4. 因子载荷计算:计算每只股票对五个因子的敏感度
    5. FF5预期收益:使用五因子模型计算预期收益率
    6. 组合优化:最大化夏普比率得到最优权重
    7. 持仓筛选:选取权重最大的N只股票并归一化

2. 数据要素&计算流程

2.1 参数集设置

设置模型所需的基本参数,包括数据获取、回测区间和优化约束等。

# 参数集
ts.set_token('token')
pro = ts.pro_api()
industry = '银行'
end_date = '20240101'
years = 5   # 数据时长
risk_free_rate = 0.03  # 无风险利率参数
top_holdings = 10      # 持仓数量参数
index_code = '000300.SH'  # 市场指数代码参数

2.2 数据获取&预处理

获取股票、市场指数、因子数据和财务数据,并进行必要的数据清洗和格式转换。

def get_industry_stocks(industry):
    """获取指定行业的股票列表"""
    df = pro.stock_basic(fields=["ts_code", "name", "industry"])
    industry_stocks = df[df["industry"]==industry].copy()
    industry_stocks.sort_values(by='ts_code', inplace=True)
    industry_stocks.reset_index(drop=True, inplace=True)
    return industry_stocks['ts_code'].tolist()

def get_data(code_list, end_date, years):
    """获取指定行业名称的历史收盘价数据"""
    ts_code_list = code_list
    end_date_dt = datetime.strptime(end_date, '%Y%m%d')
    start_date_dt = end_date_dt - timedelta(days=years*365)
    start_date = start_date_dt.strftime('%Y%m%d')
    
    all_data = []
    for stock in ts_code_list:
        df = pro.daily(ts_code=stock, start_date=start_date, end_date=end_date)
        all_data.append(df)
    
    combined_df = pd.concat(all_data).sort_values(by=['ts_code', 'trade_date'])
    combined_df.reset_index(drop=True, inplace=True)
    combined_df.rename(columns={'trade_date': 'date'}, inplace=True)
    
    return combined_df

def get_market_data(index_code='000300.SH', start_date=None, end_date=None):
    """获取市场指数数据用于计算贝塔"""
    df_market = pro.index_daily(ts_code=index_code, 
                              start_date=start_date, 
                              end_date=end_date,
                              fields=['trade_date', 'close'])
    df_market['date'] = pd.to_datetime(df_market['trade_date'])
    df_market.set_index('date', inplace=True)
    df_market = df_market.sort_index()
    
    monthly_last_close = df_market['close'].resample('M').last()
    monthly_log_returns = np.log(monthly_last_close).diff().dropna()
    return monthly_log_returns

def get_factor_data(stock_codes, start_date=None, end_date=None):
    """获取指定股票的因子数据(市值和PB)"""
    all_factor_data = []
    for stock in stock_codes:
        try:
            df = pro.daily_basic(
                ts_code=stock,
                start_date=start_date,
                end_date=end_date,
                fields=['ts_code', 'trade_date', 'total_mv', 'pb']
            )
            all_factor_data.append(df)
        except Exception as e:
            print(f"获取股票 {stock} 的因子数据失败: {str(e)}")
            continue
    
    factor_data = pd.concat(all_factor_data, ignore_index=True)
    factor_data['trade_date'] = pd.to_datetime(factor_data['trade_date'])
    return factor_data

def get_fina_data(stock_codes, start_date=None, end_date=None):
    """获取指定股票的财务指标数据(ROE和资产增长率)"""
    all_fina_data = []
    for stock in stock_codes:
        try:
            df = pro.fina_indicator(
                ts_code=stock,
                start_date=start_date,
                end_date=end_date,
                fields=['ts_code', 'end_date', 'roe_dt', 'assets_yoy', 'update_flag']
            )
            all_fina_data.append(df)
        except Exception as e:
            print(f"获取股票 {stock} 的财务数据失败: {str(e)}")
            continue
    
    # 合并数据
    fina_data = pd.concat(all_fina_data, ignore_index=True)
    
    # 处理update_flag,保留最新数据
    fina_data = (fina_data.groupby(['ts_code', 'end_date'])
                         .agg({'roe_dt': 'first', 
                              'assets_yoy': 'first',
                              'update_flag': 'max'})
                         .reset_index())
    
    # 将end_date转换为datetime
    fina_data['end_date'] = pd.to_datetime(fina_data['end_date'])
    
    # 创建季度到月度的映射
    monthly_data = []
    for _, row in fina_data.iterrows():
        quarter_end = row['end_date']
        if quarter_end.month == 3:  # Q1
            months = [quarter_end + pd.DateOffset(months=i) for i in range(1, 4)]
        elif quarter_end.month == 6:  # Q2
            months = [quarter_end + pd.DateOffset(months=i) for i in range(1, 4)]
        elif quarter_end.month == 9:  # Q3
            months = [quarter_end + pd.DateOffset(months=i) for i in range(1, 4)]
        else:  # Q4
            months = [quarter_end + pd.DateOffset(months=i) for i in range(1, 4)]
        
        for month in months:
            monthly_data.append({
                'ts_code': row['ts_code'],
                'trade_date': month,
                'roe_dt': row['roe_dt'],
                'assets_yoy': row['assets_yoy']
            })
    
    monthly_df = pd.DataFrame(monthly_data)
    return monthly_df

2.3 收益率计算

计算月度对数收益率,为后续的因子构建和优化计算做准备。

def calculate_monthly_log_returns(df):
    """计算每月的对数收益率"""
    df['date'] = pd.to_datetime(df['date'])
    monthly_last_close = df.groupby(['ts_code', pd.Grouper(key='date', freq='M')])['close'].last().unstack(level=-1)
    monthly_log_returns = np.log(monthly_last_close).diff().dropna()
    return monthly_log_returns.T

2.4 因子构建与预期收益率计算

构建SMB、HML、RMW和CMA因子,并使用五因子模型计算预期收益率。

def calculate_expected_returns(monthly_log_returns):
    """使用Fama-French五因子模型计算各股票的预期收益率"""
    start_date = monthly_log_returns.index.min().strftime('%Y%m%d')
    end_date = monthly_log_returns.index.max().strftime('%Y%m%d')
    
    # 获取财务数据时,将start_date往前推一个季度,以确保有完整的季度数据
    fina_start_date = (datetime.strptime(start_date, '%Y%m%d') - timedelta(days=90)).strftime('%Y%m%d')
    
    # 获取市场收益率
    market_returns = get_market_data(index_code, start_date, end_date)
    
    # 获取股票的市值和PB数据
    stock_data = get_factor_data(
        monthly_log_returns.columns.tolist(),
        start_date,
        end_date
    )
    
    # 获取财务指标数据,使用提前的start_date
    fina_data = get_fina_data(
        monthly_log_returns.columns.tolist(),
        fina_start_date,
        end_date
    )
    
    # 确保所有数据的日期对齐
    aligned_dates = monthly_log_returns.index.intersection(market_returns.index)
    market_returns = market_returns[aligned_dates]
    stock_returns = monthly_log_returns.loc[aligned_dates].copy()  # 使用copy()避免SettingWithCopyWarning
    
    def calculate_size_factor(date):
        date_data = stock_data[stock_data['trade_date'].dt.to_period('M') == date.to_period('M')]
        median_mv = date_data['total_mv'].median()
        small_returns = stock_returns.loc[date, date_data[date_data['total_mv'] <= median_mv]['ts_code']]
        big_returns = stock_returns.loc[date, date_data[date_data['total_mv'] > median_mv]['ts_code']]
        return small_returns.mean() - big_returns.mean()
    
    def calculate_value_factor(date):
        date_data = stock_data[stock_data['trade_date'].dt.to_period('M') == date.to_period('M')]
        # 创建date_data的副本并计算bm_ratio
        date_data = date_data.copy()
        date_data.loc[:, 'bm_ratio'] = 1 / date_data['pb']
        
        median_bm = date_data['bm_ratio'].median()
        high_returns = stock_returns.loc[date, date_data[date_data['bm_ratio'] > median_bm]['ts_code']]
        low_returns = stock_returns.loc[date, date_data[date_data['bm_ratio'] <= median_bm]['ts_code']]
        return high_returns.mean() - low_returns.mean()
    
    def calculate_profitability_factor(date):
        date_data = fina_data[fina_data['trade_date'].dt.to_period('M') == date.to_period('M')]
        
        median_roe = date_data['roe_dt'].median()
        robust_returns = stock_returns.loc[date, date_data[date_data['roe_dt'] > median_roe]['ts_code']]
        weak_returns = stock_returns.loc[date, date_data[date_data['roe_dt'] <= median_roe]['ts_code']]
        return robust_returns.mean() - weak_returns.mean()
    
    def calculate_investment_factor(date):
        date_data = fina_data[fina_data['trade_date'].dt.to_period('M') == date.to_period('M')]
        
        median_growth = date_data['assets_yoy'].median()
        conservative_returns = stock_returns.loc[date, date_data[date_data['assets_yoy'] <= median_growth]['ts_code']]
        aggressive_returns = stock_returns.loc[date, date_data[date_data['assets_yoy'] > median_growth]['ts_code']]
        return conservative_returns.mean() - aggressive_returns.mean()
    
    # 计算每个月的因子收益
    smb_factor = pd.Series([calculate_size_factor(date) for date in aligned_dates], index=aligned_dates)
    hml_factor = pd.Series([calculate_value_factor(date) for date in aligned_dates], index=aligned_dates)
    rmw_factor = pd.Series([calculate_profitability_factor(date) for date in aligned_dates], index=aligned_dates)
    cma_factor = pd.Series([calculate_investment_factor(date) for date in aligned_dates], index=aligned_dates)
    
    # 使用OLS回归计算每个股票的因子载荷
    factor_loadings = {}
    for stock in stock_returns.columns:
        X = sm.add_constant(pd.concat([
            market_returns - risk_free_rate,
            smb_factor,
            hml_factor,
            rmw_factor,
            cma_factor
        ], axis=1))
        y = stock_returns[stock] - risk_free_rate
        
        model = sm.OLS(y, X).fit()
        factor_loadings[stock] = model.params[1:]
    
    # 计算因子风险溢价
    market_premium = market_returns.mean() - risk_free_rate
    smb_premium = smb_factor.mean()
    hml_premium = hml_factor.mean()
    rmw_premium = rmw_factor.mean()
    cma_premium = cma_factor.mean()
    
    # 使用FF5模型计算预期收益率
    expected_returns = pd.Series({
        stock: (risk_free_rate + 
                loadings.iloc[0] * market_premium +
                loadings.iloc[1] * smb_premium + 
                loadings.iloc[2] * hml_premium +
                loadings.iloc[3] * rmw_premium +
                loadings.iloc[4] * cma_premium)
        for stock, loadings in factor_loadings.items()
    })
    
    return expected_returns

2.5 协方差矩阵计算

计算收益率的协方差矩阵,用于评估资产间的相关性和波动性。

def calculate_covariance_matrix(monthly_log_returns):
    """计算收益率协方差矩阵"""
    return monthly_log_returns.cov()

2.6 投资组合优化

通过最大化夏普比率来寻找最优权重配置。

def max_sharpe_ratio(mean_returns, cov_matrix, risk_free_rate):
    """计算最大夏普比率的投资组合权重"""
    num_assets = len(mean_returns)
    args = (mean_returns, cov_matrix, risk_free_rate)
    constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
    bounds = tuple((0, 1) for asset in range(num_assets))
    result = minimize(negative_sharpe_ratio, num_assets*[1./num_assets], args=args,
                      method='SLSQP', bounds=bounds, constraints=constraints)
    return result.x

2.7 持仓筛选

选取权重最大的N只股票并重新归一化权重。

def calculate_top_holdings_weights(optimal_weights, monthly_log_returns_columns, top_n):
    """计算前N大持仓的权重占比"""
    result_dict = {asset: weight for asset, weight in zip(monthly_log_returns_columns, optimal_weights)}
    top_n_holdings = sorted(result_dict.items(), key=lambda item: item[1], reverse=True)[:top_n]
    top_n_sum = sum(value for _, value in top_n_holdings)
    updated_result = {key: value / top_n_sum for key, value in top_n_holdings}
    return updated_result

3. 汇总代码

以下即为全量代码,修改参数集中内容即可跑出个性化数据。

import tushare as ts
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from scipy.optimize import minimize
import backtrader as bt
import statsmodels.api as sm

# 参数集##############################################################################
ts.set_token('177890330a381e4992e5a83ba21970f268985dcdd50b45d6464718c5')
pro = ts.pro_api()
industry = '银行'
end_date = '20240101'
years = 5   # 数据时长
risk_free_rate = 0.03  # 无风险利率参数
top_holdings = 10      # 持仓数量参数
index_code = '000300.SH'  # 市场指数代码参数
# 参数集##############################################################################

def get_industry_stocks(industry):
    """获取指定行业的股票列表"""
    df = pro.stock_basic(fields=["ts_code", "name", "industry"])
    industry_stocks = df[df["industry"]==industry].copy()
    industry_stocks.sort_values(by='ts_code', inplace=True)
    industry_stocks.reset_index(drop=True, inplace=True)
    return industry_stocks['ts_code'].tolist()

def get_data(code_list, end_date, years):
    """获取指定行业名称的历史收盘价数据"""
    ts_code_list = code_list
    end_date_dt = datetime.strptime(end_date, '%Y%m%d')
    start_date_dt = end_date_dt - timedelta(days=years*365)
    start_date = start_date_dt.strftime('%Y%m%d')
    
    all_data = []
    for stock in ts_code_list:
        df = pro.daily(ts_code=stock, start_date=start_date, end_date=end_date)
        all_data.append(df)
    
    combined_df = pd.concat(all_data).sort_values(by=['ts_code', 'trade_date'])
    combined_df.reset_index(drop=True, inplace=True)
    combined_df.rename(columns={'trade_date': 'date'}, inplace=True)
    
    return combined_df

def get_market_data(index_code='000300.SH', start_date=None, end_date=None):
    """获取市场指数数据用于计算贝塔"""
    df_market = pro.index_daily(ts_code=index_code, 
                              start_date=start_date, 
                              end_date=end_date,
                              fields=['trade_date', 'close'])
    df_market['date'] = pd.to_datetime(df_market['trade_date'])
    df_market.set_index('date', inplace=True)
    df_market = df_market.sort_index()
    
    monthly_last_close = df_market['close'].resample('M').last()
    monthly_log_returns = np.log(monthly_last_close).diff().dropna()
    return monthly_log_returns

def get_factor_data(stock_codes, start_date=None, end_date=None):
    """获取指定股票的因子数据(市值和PB)"""
    all_factor_data = []
    for stock in stock_codes:
        try:
            df = pro.daily_basic(
                ts_code=stock,
                start_date=start_date,
                end_date=end_date,
                fields=['ts_code', 'trade_date', 'total_mv', 'pb']
            )
            all_factor_data.append(df)
        except Exception as e:
            print(f"获取股票 {stock} 的因子数据失败: {str(e)}")
            continue
    
    factor_data = pd.concat(all_factor_data, ignore_index=True)
    factor_data['trade_date'] = pd.to_datetime(factor_data['trade_date'])
    return factor_data

def get_fina_data(stock_codes, start_date=None, end_date=None):
    """获取指定股票的财务指标数据(ROE和资产增长率)"""
    all_fina_data = []
    for stock in stock_codes:
        try:
            df = pro.fina_indicator(
                ts_code=stock,
                start_date=start_date,
                end_date=end_date,
                fields=['ts_code', 'end_date', 'roe_dt', 'assets_yoy', 'update_flag']
            )
            all_fina_data.append(df)
        except Exception as e:
            print(f"获取股票 {stock} 的财务数据失败: {str(e)}")
            continue
    
    # 合并数据
    fina_data = pd.concat(all_fina_data, ignore_index=True)
    
    # 处理update_flag,保留最新数据
    fina_data = (fina_data.groupby(['ts_code', 'end_date'])
                         .agg({'roe_dt': 'first', 
                              'assets_yoy': 'first',
                              'update_flag': 'max'})
                         .reset_index())
    
    # 将end_date转换为datetime
    fina_data['end_date'] = pd.to_datetime(fina_data['end_date'])
    
    # 创建季度到月度的映射
    monthly_data = []
    for _, row in fina_data.iterrows():
        quarter_end = row['end_date']
        if quarter_end.month == 3:  # Q1
            months = [quarter_end + pd.DateOffset(months=i) for i in range(1, 4)]
        elif quarter_end.month == 6:  # Q2
            months = [quarter_end + pd.DateOffset(months=i) for i in range(1, 4)]
        elif quarter_end.month == 9:  # Q3
            months = [quarter_end + pd.DateOffset(months=i) for i in range(1, 4)]
        else:  # Q4
            months = [quarter_end + pd.DateOffset(months=i) for i in range(1, 4)]
        
        for month in months:
            monthly_data.append({
                'ts_code': row['ts_code'],
                'trade_date': month,
                'roe_dt': row['roe_dt'],
                'assets_yoy': row['assets_yoy']
            })
    
    monthly_df = pd.DataFrame(monthly_data)
    return monthly_df

def calculate_monthly_log_returns(df):
    """计算每月的对数收益率"""
    df['date'] = pd.to_datetime(df['date'])
    monthly_last_close = df.groupby(['ts_code', pd.Grouper(key='date', freq='M')])['close'].last().unstack(level=-1)
    monthly_log_returns = np.log(monthly_last_close).diff().dropna()
    return monthly_log_returns.T

def calculate_expected_returns(monthly_log_returns):
    """使用Fama-French五因子模型计算各股票的预期收益率"""
    start_date = monthly_log_returns.index.min().strftime('%Y%m%d')
    end_date = monthly_log_returns.index.max().strftime('%Y%m%d')
    
    # 获取财务数据时,将start_date往前推一个季度,以确保有完整的季度数据
    fina_start_date = (datetime.strptime(start_date, '%Y%m%d') - timedelta(days=90)).strftime('%Y%m%d')
    
    # 获取市场收益率
    market_returns = get_market_data(index_code, start_date, end_date)
    
    # 获取股票的市值和PB数据
    stock_data = get_factor_data(
        monthly_log_returns.columns.tolist(),
        start_date,
        end_date
    )
    
    # 获取财务指标数据,使用提前的start_date
    fina_data = get_fina_data(
        monthly_log_returns.columns.tolist(),
        fina_start_date,
        end_date
    )
    
    # 确保所有数据的日期对齐
    aligned_dates = monthly_log_returns.index.intersection(market_returns.index)
    market_returns = market_returns[aligned_dates]
    stock_returns = monthly_log_returns.loc[aligned_dates].copy()  # 使用copy()避免SettingWithCopyWarning
    
    def calculate_size_factor(date):
        date_data = stock_data[stock_data['trade_date'].dt.to_period('M') == date.to_period('M')]
        median_mv = date_data['total_mv'].median()
        small_returns = stock_returns.loc[date, date_data[date_data['total_mv'] <= median_mv]['ts_code']]
        big_returns = stock_returns.loc[date, date_data[date_data['total_mv'] > median_mv]['ts_code']]
        return small_returns.mean() - big_returns.mean()
    
    def calculate_value_factor(date):
        date_data = stock_data[stock_data['trade_date'].dt.to_period('M') == date.to_period('M')]
        # 创建date_data的副本并计算bm_ratio
        date_data = date_data.copy()
        date_data.loc[:, 'bm_ratio'] = 1 / date_data['pb']
        
        median_bm = date_data['bm_ratio'].median()
        high_returns = stock_returns.loc[date, date_data[date_data['bm_ratio'] > median_bm]['ts_code']]
        low_returns = stock_returns.loc[date, date_data[date_data['bm_ratio'] <= median_bm]['ts_code']]
        return high_returns.mean() - low_returns.mean()
    
    def calculate_profitability_factor(date):
        date_data = fina_data[fina_data['trade_date'].dt.to_period('M') == date.to_period('M')]
        
        median_roe = date_data['roe_dt'].median()
        robust_returns = stock_returns.loc[date, date_data[date_data['roe_dt'] > median_roe]['ts_code']]
        weak_returns = stock_returns.loc[date, date_data[date_data['roe_dt'] <= median_roe]['ts_code']]
        return robust_returns.mean() - weak_returns.mean()
    
    def calculate_investment_factor(date):
        date_data = fina_data[fina_data['trade_date'].dt.to_period('M') == date.to_period('M')]
        
        median_growth = date_data['assets_yoy'].median()
        conservative_returns = stock_returns.loc[date, date_data[date_data['assets_yoy'] <= median_growth]['ts_code']]
        aggressive_returns = stock_returns.loc[date, date_data[date_data['assets_yoy'] > median_growth]['ts_code']]
        return conservative_returns.mean() - aggressive_returns.mean()
    
    # 计算每个月的因子收益
    smb_factor = pd.Series([calculate_size_factor(date) for date in aligned_dates], index=aligned_dates)
    hml_factor = pd.Series([calculate_value_factor(date) for date in aligned_dates], index=aligned_dates)
    rmw_factor = pd.Series([calculate_profitability_factor(date) for date in aligned_dates], index=aligned_dates)
    cma_factor = pd.Series([calculate_investment_factor(date) for date in aligned_dates], index=aligned_dates)
    
    # 使用OLS回归计算每个股票的因子载荷
    factor_loadings = {}
    for stock in stock_returns.columns:
        X = sm.add_constant(pd.concat([
            market_returns - risk_free_rate,
            smb_factor,
            hml_factor,
            rmw_factor,
            cma_factor
        ], axis=1))
        y = stock_returns[stock] - risk_free_rate
        
        model = sm.OLS(y, X).fit()
        factor_loadings[stock] = model.params[1:]
    
    # 计算因子风险溢价
    market_premium = market_returns.mean() - risk_free_rate
    smb_premium = smb_factor.mean()
    hml_premium = hml_factor.mean()
    rmw_premium = rmw_factor.mean()
    cma_premium = cma_factor.mean()
    
    # 使用FF5模型计算预期收益率
    expected_returns = pd.Series({
        stock: (risk_free_rate + 
                loadings.iloc[0] * market_premium +
                loadings.iloc[1] * smb_premium + 
                loadings.iloc[2] * hml_premium +
                loadings.iloc[3] * rmw_premium +
                loadings.iloc[4] * cma_premium)
        for stock, loadings in factor_loadings.items()
    })
    
    return expected_returns

def calculate_covariance_matrix(monthly_log_returns):
    """计算收益率协方差矩阵"""
    return monthly_log_returns.cov()

def portfolio_performance(weights, mean_returns, cov_matrix):
    """计算投资组合的表现"""
    returns = np.sum(mean_returns * weights) 
    std_dev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    return returns, std_dev

def negative_sharpe_ratio(weights, mean_returns, cov_matrix, risk_free_rate):
    """计算负夏普比率"""
    p_ret, p_std = portfolio_performance(weights, mean_returns, cov_matrix)
    sharpe_ratio = (p_ret - risk_free_rate) / p_std
    return -sharpe_ratio

def max_sharpe_ratio(mean_returns, cov_matrix, risk_free_rate):
    """计算最大夏普比率的投资组合权重"""
    num_assets = len(mean_returns)
    args = (mean_returns, cov_matrix, risk_free_rate)
    constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
    bounds = tuple((0, 1) for asset in range(num_assets))
    result = minimize(negative_sharpe_ratio, num_assets*[1./num_assets], args=args,
                      method='SLSQP', bounds=bounds, constraints=constraints)
    return result.x

def calculate_top_holdings_weights(optimal_weights, monthly_log_returns_columns, top_n):
    """计算前N大持仓的权重占比"""
    result_dict = {asset: weight for asset, weight in zip(monthly_log_returns_columns, optimal_weights)}
    top_n_holdings = sorted(result_dict.items(), key=lambda item: item[1], reverse=True)[:top_n]
    top_n_sum = sum(value for _, value in top_n_holdings)
    updated_result = {key: value / top_n_sum for key, value in top_n_holdings}
    return updated_result

def main():
    # 获取数据
    code_list = get_industry_stocks(industry)
    df = get_data(code_list, end_date, years)

    # 计算每月的对数收益率
    monthly_log_returns = calculate_monthly_log_returns(df)
    
    # 使用FF5模型计算预期收益率
    mean_returns = calculate_expected_returns(monthly_log_returns)
    
    # 计算收益率协方差矩阵
    cov_matrix = calculate_covariance_matrix(monthly_log_returns)

    # 优化权重
    optimal_weights = max_sharpe_ratio(mean_returns, cov_matrix, risk_free_rate)
    
    # 计算前N大持仓权重
    updated_result = calculate_top_holdings_weights(
        optimal_weights, 
        monthly_log_returns.columns, 
        top_holdings
    )
    
    # 打印更新后的资产占比
    print(f"\n{end_date}最优资产前{top_holdings}占比:")
    print(updated_result)

if __name__ == "__main__":
    main()

运行结果:
根据参数设置,程序会输出指定行业前N只最优配置的股票及其权重。
在这里插入图片描述

股票代码 占比
601398.SH 0.16318772568631026
601328.SH 0.16177476392789242
600919.SH 0.12936894301756055
600036.SH 0.10747174637443846
601169.SH 0.0958427702817229
600016.SH 0.09012906474680284
601166.SH 0.08768928377548085
601288.SH 0.06538512327994642
600908.SH 0.05559150377594274
600926.SH 0.043559075133902475

4. 反思

4.1 不足之处

  1. 因子复杂性:五个因子的相互作用可能带来多重共线性
  2. 数据质量:财务数据的及时性和准确性要求更高
  3. 行业适用性:某些行业可能对特定因子不敏感

4.2 提升思路

  1. 因子筛选:根据行业特性选择最相关的因子
  2. 动态权重:引入因子权重的动态调整机制
  3. 机器学习:使用机器学习优化因子构建和选择

5. 启后

  • 优化,预期收益率的优化方案:,可参考下一篇文章:
    pass

  • 量化回测实现,可参考下一篇文章:
    【金融资产组合模型进化论】4.1 对MPT+Fama-French五因子优化方案实现Backtrader量化回测(理论+Python实战)

你可能感兴趣的:(金融资产组合模型进化论,python,java,前端,金融,数据库,机器学习,大数据)