Before we get to causal inference, note that most of what we discuss in AI is correlation, which can be defined as follows:
After correlation had been the workhorse of AI for many years, the field turned its attention to causality, i.e. causal inference (Causal Inference), which likewise has a definition:
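For reference, the standard textbook forms of the two notions can be written as follows (my own summary, not necessarily the exact wording of the definitions above): correlation measures how strongly two variables co-vary, while a causal effect compares outcomes under interventions, e.g. via Pearl's do-operator:

\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}

\mathrm{ATE} = E[\, Y \mid do(T=1) \,] - E[\, Y \mid do(T=0) \,]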
Rather than arguing for the necessity of causal inference in the abstract, it is more instructive to start from where today's AI keeps hitting a wall, which can be analyzed from the following two angles:
This time we focus on Causal Effect Estimation, the middle block of the framework in the figure above. It relies on different estimators, which fall into the following eight major groups, each with its own algorithmic logic and ideas.
The most mainstream estimators at the moment are the Meta Learners, which come in the following three strategies:
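To make the three strategies concrete, here is a minimal, library-free sketch of the S-Learner, T-Learner and X-Learner logic on a synthetic binary-treatment dataset, using plain scikit-learn regressors. It only illustrates the idea and is not YLearn's own implementation (YLearn ships these as SLearner, TLearner and XLearner, imported in the code further below); the dataset and model choices here are my own assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2022)
n = 2000
X = rng.normal(size=(n, 5))                    # covariates
t = rng.integers(0, 2, size=n)                 # binary treatment
y = X[:, 0] + 2.0 * t * (X[:, 1] > 0) + rng.normal(scale=0.1, size=n)  # outcome, true ATE ~ 1.0

# S-Learner: one model over (X, t); the effect is the difference of its two predictions.
s_model = GradientBoostingRegressor().fit(np.c_[X, t], y)
tau_s = s_model.predict(np.c_[X, np.ones(n)]) - s_model.predict(np.c_[X, np.zeros(n)])

# T-Learner: two separate models, one per treatment arm.
m1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
m0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])
tau_t = m1.predict(X) - m0.predict(X)

# X-Learner: impute each unit's missing potential outcome with the opposite arm's model,
# fit effect models on the imputed effects, then blend (0.5/0.5 here; in practice the
# weight is usually a propensity-score estimate).
d1 = y[t == 1] - m0.predict(X[t == 1])         # imputed effects for the treated
d0 = m1.predict(X[t == 0]) - y[t == 0]         # imputed effects for the controls
g1 = GradientBoostingRegressor().fit(X[t == 1], d1)
g0 = GradientBoostingRegressor().fit(X[t == 0], d0)
tau_x = 0.5 * g1.predict(X) + 0.5 * g0.predict(X)

print(f'estimated ATE: S={tau_s.mean():.2f}, T={tau_t.mean():.2f}, X={tau_x.mean():.2f}')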
Feed the company's historical data into the Estimator model and specify the Treatment and Outcome:
from ylearn import Why
import numpy as np
import pandas as pd
import warnings
import matplotlib.pyplot as plt
import seaborn as sns
from imblearn.over_sampling import SMOTE
import xgboost as xgb
from ylearn.estimator_model.meta_learner import SLearner, TLearner, XLearner
warnings.filterwarnings("ignore")

# `data` is the company's historical dataset (a pandas DataFrame) loaded in an earlier step.
treatment = ['Tech Support', 'Discount']     # the two interventions we can control
outcome = 'Revenue'                          # the business metric we care about
instrument = 'Size'                          # instrumental variable: company size
adjustment = [i for i in data.columns[:-2]]  # candidate adjustment variables (all but the last two columns)

why = Why(random_state=2022)
why.fit(data, outcome, treatment=treatment, instrument=instrument)
The Estimator is trained on the historical data and produces a causal effect (Causal Effect) value for each Treatment.
The output below compares against the no-intervention baseline and shows how much extra effect would be gained by offering Tech Support / Discount:
# Causal effect of each treatment relative to the control case (no Tech Support, no Discount)
effect = pd.DataFrame(why.causal_effect(control=[0, 0]))
effect
Next we apply counterfactual inference to the 994 samples that never received Tech Support, assuming they all had received Tech Support:
import matplotlib.pyplot as plt

# Samples that did NOT receive Tech Support
whatif_data = data[data['Tech Support'] == 0]
out_orig = whatif_data[outcome]
# Counterfactual assignment: everyone gets Tech Support
value_1 = whatif_data['Tech Support'].map(lambda _: 1)
out_whatif = why.whatif(whatif_data, value_1, treatment='Tech Support')

print('Selected sample:', len(whatif_data))
print(f'Mean {outcome} if Tech Support is 0:\t{out_orig.mean():.3f}')
print(f'Mean {outcome} if Tech Support is 1:\t{out_whatif.mean():.3f}')

plt.figure(figsize=(8, 5))
out_orig.hist(label='Without tech support', bins=30, alpha=0.7)
out_whatif.hist(label='With tech support', bins=30, alpha=0.7)
plt.xlabel(outcome)
plt.title('What if we offered tech support')
plt.legend()
plt.show()
Similarly, we apply counterfactual inference to the 979 samples that never received a Discount, assuming they all had received a Discount:
import matplotlib.pyplot as plt

# Samples that did NOT receive a Discount
whatif_data = data[data['Discount'] == 0]
out_orig = whatif_data[outcome]
# Counterfactual assignment: everyone gets a Discount
value_1 = whatif_data['Discount'].map(lambda _: 1)
out_whatif = why.whatif(whatif_data, value_1, treatment='Discount')

print('Selected sample:', len(whatif_data))
print(f'Mean {outcome} if Discount is 0:\t{out_orig.mean():.3f}')
print(f'Mean {outcome} if Discount is 1:\t{out_whatif.mean():.3f}')

plt.figure(figsize=(8, 5))
out_orig.hist(label='Without Discount', bins=30, alpha=0.7)
out_whatif.hist(label='With Discount', bins=30, alpha=0.7)
plt.xlabel(outcome)
plt.title('What if we offered a Discount')
plt.legend()
plt.show()
Finally, causal inference lets us derive the optimal decision path for Tech Support and Discount.
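As one simple way to turn the estimates above into a concrete policy, here is a sketch that reuses only the why.whatif call demonstrated in the previous cells: for every customer it predicts the Revenue gain from each intervention and recommends whichever single intervention has the larger positive gain. This greedy rule is my own illustration, not YLearn's built-in policy tooling, and it assumes whatif accepts any constant treatment assignment and returns per-row predicted outcomes, as in the cells above.

import numpy as np
import pandas as pd

def predicted_gain(df, treatment_col):
    # Predicted Revenue if everyone received this treatment ...
    treated = why.whatif(df, df[treatment_col].map(lambda _: 1), treatment=treatment_col)
    # ... versus if nobody did
    untreated = why.whatif(df, df[treatment_col].map(lambda _: 0), treatment=treatment_col)
    return np.asarray(treated) - np.asarray(untreated)

gains = pd.DataFrame({
    'Tech Support': predicted_gain(data, 'Tech Support'),
    'Discount': predicted_gain(data, 'Discount'),
}, index=data.index)

# Per customer: recommend the intervention with the larger predicted Revenue gain,
# or no action when neither gain is positive.
decision = gains.idxmax(axis=1).where(gains.max(axis=1) > 0, 'No action')
print(decision.value_counts())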