Table of Contents
1 Multi-task Learning
1.1 Problem Description
1.2 Dataset
1.3 Network Structure
1.4 Results
2 Causal Inference via Multi-task Learning
2.1 DRNet
2.2 Dragonnet
2.3 Deep Counterfactual Networks with Propensity-Dropout
2.4 VCNet
3 Thoughts
keras-mmoe/census_income_demo.py at master · drawbridge/keras-mmoe · GitHub
Recommender Systems (16) Multi-task Learning: Principles and Practice of Google's MMoE - Zhihu
In recent years, deep neural networks have been applied ever more widely, for example in recommender systems. A recommender system usually needs to optimize several objectives at once: in movie recommendation it must predict not only whether a user will buy a ticket but also what rating the user will give the movie, and in e-commerce it must predict both an item's click-through rate (CTR) and its conversion rate (CVR). Multi-task learning models have therefore become a hot research topic.
The idea here is to learn causal effects with a multi-task setup; in particular, to survey more of the multi-task learning patterns used in recommender systems and fold them into the causal models as appropriate.
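As a concrete reference point, here is a minimal PyTorch sketch of the MMoE (Multi-gate Mixture-of-Experts) architecture behind the two links above. It is an illustration under my own assumptions (layer sizes, number of experts, and the CTR/CVR task pair), not the keras-mmoe implementation itself.

import torch
import torch.nn as nn

class MMoE(nn.Module):
    """Minimal MMoE: shared experts, one softmax gate per task."""
    def __init__(self, input_dim, num_experts=4, expert_dim=32, num_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
             for _ in range(num_experts)])
        self.gates = nn.ModuleList(
            [nn.Linear(input_dim, num_experts) for _ in range(num_tasks)])
        self.towers = nn.ModuleList(
            [nn.Linear(expert_dim, 1) for _ in range(num_tasks)])

    def forward(self, x):
        # expert_out: (batch, num_experts, expert_dim)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            # Each task mixes the shared experts with its own softmax gate.
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)  # (batch, E, 1)
            mixed = (w * expert_out).sum(dim=1)               # (batch, expert_dim)
            outputs.append(torch.sigmoid(tower(mixed)))       # e.g. CTR / CVR head
        return outputs

# e.g. two heads for CTR and CVR on an assumed 100-dim feature vector
model = MMoE(input_dim=100)
ctr, cvr = model(torch.randn(8, 100))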
Learning Counterfactual Representations for Estimating Individual Dose-Response Curves
Adapting Neural Networks for the Estimation of Treatment Effects
If the propensity-score head is dropped, the network structure becomes identical to TARNet, and the paper's experiments compare against exactly that variant. Because the loss contains a propensity-score term, the shared layers automatically down-weight features that are weakly predictive of g(x), which amounts to a form of feature selection. Targeted regularization is then introduced to improve the loss.
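To make that loss structure concrete, here is a sketch of the Dragonnet objective as I read the paper: a factual-outcome term over the two outcome heads Q(0, x) and Q(1, x), a cross-entropy term for the propensity head g(x) (drop it and TARNet remains), and the targeted-regularization term built from a scalar perturbation eps (learnable in the paper). The function name and the weights alpha, beta are illustrative.

import torch
import torch.nn.functional as F

def dragonnet_loss(y, t, q0, q1, g, eps, alpha=1.0, beta=1.0):
    """Sketch of the Dragonnet objective.
    y: factual outcome; t: treatment indicator in {0, 1} (float tensor);
    q0, q1: outcome-head predictions; g: propensity head output in (0, 1);
    eps: scalar perturbation parameter (learnable in the paper)."""
    q = t * q1 + (1 - t) * q0                       # factual prediction Q(t, x)
    outcome_loss = ((y - q) ** 2).mean()
    g = g.clamp(0.01, 0.99)                         # truncate extreme propensities
    propensity_loss = F.binary_cross_entropy(g, t)  # the head that makes it Dragonnet
    # Targeted regularization: a one-step fluctuation of Q along the
    # inverse-propensity direction.
    q_tilde = q + eps * (t / g - (1 - t) / (1 - g))
    targeted_reg = ((y - q_tilde) ** 2).mean()
    return outcome_loss + alpha * propensity_loss + beta * targeted_reg

# toy usage with random tensors
y = torch.randn(16); t = torch.randint(0, 2, (16,)).float()
q0, q1, g = torch.randn(16), torch.randn(16), torch.rand(16)
eps = torch.tensor(0.0, requires_grad=True)
print(dragonnet_loss(y, t, q0, q1, g, eps))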
Abstract: We propose a novel approach for inferring the individualized causal effects of a treatment (intervention) from observational data. Our approach conceptualizes causal inference as a multi-task learning problem; we model a subject's potential outcomes using a deep multi-task network with a set of layers shared between the factual and counterfactual outcomes, plus a set of outcome-specific layers. The impact of selection bias in the observational data is alleviated via a propensity-dropout regularization scheme, in which the network is thinned for every training example with a dropout probability that depends on the associated propensity score. The network is trained in alternating phases; in each phase we use the training examples of one of the two potential outcomes (the treated or the control population) to update the weights of the shared layers and the respective outcome-specific layers. Experiments conducted on data based on a real-world observational study show that our algorithm outperforms the state-of-the-art.
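A small sketch of how I read the propensity-dropout schedule: each training example is thinned with a dropout probability that is low when its propensity score is uninformative (near 0.5) and high when it is extreme. The exact formula 1 - gamma/2 - H(p)/2 and the offset gamma follow my reading of the paper; base-2 entropy is used here so H lies in [0, 1].

import torch

def propensity_dropout_prob(ps_score, gamma=1.0):
    """Per-example dropout probability from the propensity score.
    High-entropy examples (ps ~ 0.5) get little dropout; examples with
    extreme propensity get a heavily thinned network."""
    p = ps_score.clamp(1e-6, 1 - 1e-6)
    entropy = -(p * torch.log2(p) + (1 - p) * torch.log2(1 - p))  # in [0, 1]
    return 1.0 - gamma / 2.0 - entropy / 2.0

# e.g. ps = 0.5 -> dropout 0.0 (with gamma = 1); ps -> 0 or 1 -> dropout 0.5
print(propensity_dropout_prob(torch.tensor([0.5, 0.9, 0.99])))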
Code: GitHub - Shantanu48114860/Deep-Counterfactual-Networks-with-Propensity-Dropout: Implementation of the paper "Deep Counterfactual Networks with Propensity-Dropout" (https://arxiv.org/pdf/1706.05966.pdf) in the PyTorch framework
Network structure:
Note: if a parameter has requires_grad=False but is still registered in the optimizer, it is simply not updated and no error is raised, e.g.:
network.hidden1_Y1.weight.requires_grad = False
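A quick self-contained check of that claim (illustrative, not from the repo): the frozen weight keeps its value after optimizer.step() and no error is raised.

import torch
import torch.nn as nn
import torch.optim as optim

layer = nn.Linear(4, 2)
opt = optim.SGD(layer.parameters(), lr=0.1)  # frozen param stays registered here

layer.weight.requires_grad = False           # freeze after building the optimizer
before = layer.weight.clone()

loss = layer(torch.randn(8, 4)).sum()
loss.backward()                              # no grad is produced for layer.weight
opt.step()                                   # params with grad=None are skipped

assert torch.equal(layer.weight, before)     # unchanged, and no error was raised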
import torch
import torch.nn as nn
import torch.optim as optim

from DCN import DCN


class DCN_network:

    def train(self, train_parameters, device):
        epochs = train_parameters["epochs"]
        treated_batch_size = train_parameters["treated_batch_size"]
        control_batch_size = train_parameters["control_batch_size"]
        lr = train_parameters["lr"]
        shuffle = train_parameters["shuffle"]
        model_save_path = train_parameters["model_save_path"].format(epochs, lr)

        treated_set = train_parameters["treated_set"]
        control_set = train_parameters["control_set"]

        print("Saved model path: {0}".format(model_save_path))

        # The treated and control populations get separate loaders because
        # the two outcome heads are trained on them in alternating epochs.
        treated_data_loader = torch.utils.data.DataLoader(treated_set,
                                                          batch_size=treated_batch_size,
                                                          shuffle=shuffle,
                                                          num_workers=1)
        control_data_loader = torch.utils.data.DataLoader(control_set,
                                                          batch_size=control_batch_size,
                                                          shuffle=shuffle,
                                                          num_workers=1)

        network = DCN(training_flag=True).to(device)
        optimizer = optim.Adam(network.parameters(), lr=lr)
        lossF = nn.MSELoss()
        min_loss = 100000.0  # only used by the commented-out best-model saving below
        dataset_loss = 0.0
        print(".. Training started ..")
        print(device)

        for epoch in range(epochs):
            network.train()
            total_loss = 0
            train_set_size = 0

            if epoch % 2 == 0:
                dataset_loss = 0
                # Even epoch: train on the treated samples. Unfreeze the Y1
                # (treated-outcome) head and freeze the Y0 (control) head, so
                # only the shared layers and the Y1 head are updated.
                network.hidden1_Y1.weight.requires_grad = True
                network.hidden1_Y1.bias.requires_grad = True
                network.hidden2_Y1.weight.requires_grad = True
                network.hidden2_Y1.bias.requires_grad = True
                network.out_Y1.weight.requires_grad = True
                network.out_Y1.bias.requires_grad = True

                network.hidden1_Y0.weight.requires_grad = False
                network.hidden1_Y0.bias.requires_grad = False
                network.hidden2_Y0.weight.requires_grad = False
                network.hidden2_Y0.bias.requires_grad = False
                network.out_Y0.weight.requires_grad = False
                network.out_Y0.bias.requires_grad = False

                for batch in treated_data_loader:
                    covariates_X, ps_score, y_f, y_cf = batch
                    covariates_X = covariates_X.to(device)
                    ps_score = ps_score.squeeze().to(device)
                    train_set_size += covariates_X.size(0)

                    # treatment_pred[0] -> y1, treatment_pred[1] -> y0
                    treatment_pred = network(covariates_X, ps_score)
                    predicted_ITE = treatment_pred[0] - treatment_pred[1]
                    # For treated samples the factual outcome is y1,
                    # so the true ITE is y_f - y_cf.
                    true_ITE = y_f - y_cf
                    if torch.cuda.is_available():
                        loss = lossF(predicted_ITE.float().cuda(),
                                     true_ITE.float().cuda()).to(device)
                    else:
                        loss = lossF(predicted_ITE.float(),
                                     true_ITE.float()).to(device)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    total_loss += loss.item()
                dataset_loss = total_loss

            elif epoch % 2 == 1:
                # Odd epoch: train on the control samples with the roles of
                # the two heads reversed.
                network.hidden1_Y1.weight.requires_grad = False
                network.hidden1_Y1.bias.requires_grad = False
                network.hidden2_Y1.weight.requires_grad = False
                network.hidden2_Y1.bias.requires_grad = False
                network.out_Y1.weight.requires_grad = False
                network.out_Y1.bias.requires_grad = False

                network.hidden1_Y0.weight.requires_grad = True
                network.hidden1_Y0.bias.requires_grad = True
                network.hidden2_Y0.weight.requires_grad = True
                network.hidden2_Y0.bias.requires_grad = True
                network.out_Y0.weight.requires_grad = True
                network.out_Y0.bias.requires_grad = True

                for batch in control_data_loader:
                    covariates_X, ps_score, y_f, y_cf = batch
                    covariates_X = covariates_X.to(device)
                    ps_score = ps_score.squeeze().to(device)
                    train_set_size += covariates_X.size(0)

                    # treatment_pred[0] -> y1, treatment_pred[1] -> y0
                    treatment_pred = network(covariates_X, ps_score)
                    predicted_ITE = treatment_pred[0] - treatment_pred[1]
                    # For control samples the factual outcome is y0,
                    # so the true ITE is y_cf - y_f.
                    true_ITE = y_cf - y_f
                    if torch.cuda.is_available():
                        loss = lossF(predicted_ITE.float().cuda(),
                                     true_ITE.float().cuda()).to(device)
                    else:
                        loss = lossF(predicted_ITE.float(),
                                     true_ITE.float()).to(device)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    total_loss += loss.item()
                dataset_loss = dataset_loss + total_loss

            print("epoch: {0}, train_set_size: {1} loss: {2}".
                  format(epoch, train_set_size, total_loss))
            if epoch % 2 == 1:
                print("Treated + Control loss: {0}".format(dataset_loss))
                # if dataset_loss < min_loss:
                #     print("Current loss: {0}, over previous: {1}, Saving model".
                #           format(dataset_loss, min_loss))
                #     min_loss = dataset_loss
                #     torch.save(network.state_dict(), model_save_path)

        torch.save(network.state_dict(), model_save_path)

    @staticmethod
    def eval(eval_parameters, device):
        print(".. Evaluation started ..")
        treated_set = eval_parameters["treated_set"]
        control_set = eval_parameters["control_set"]
        model_path = eval_parameters["model_save_path"]

        network = DCN(training_flag=False).to(device)
        network.load_state_dict(torch.load(model_path, map_location=device))
        network.eval()

        # batch_size defaults to 1, which makes diff.item() below valid.
        treated_data_loader = torch.utils.data.DataLoader(treated_set,
                                                          shuffle=False, num_workers=1)
        control_data_loader = torch.utils.data.DataLoader(control_set,
                                                          shuffle=False, num_workers=1)

        err_treated_list = []
        err_control_list = []

        for batch in treated_data_loader:
            covariates_X, ps_score, y_f, y_cf = batch
            covariates_X = covariates_X.to(device)
            ps_score = ps_score.squeeze().to(device)
            treatment_pred = network(covariates_X, ps_score)
            predicted_ITE = treatment_pred[0] - treatment_pred[1]
            true_ITE = y_f - y_cf
            if torch.cuda.is_available():
                diff = true_ITE.float().cuda() - predicted_ITE.float().cuda()
            else:
                diff = true_ITE.float() - predicted_ITE.float()
            err_treated_list.append(diff.item())

        for batch in control_data_loader:
            covariates_X, ps_score, y_f, y_cf = batch
            covariates_X = covariates_X.to(device)
            ps_score = ps_score.squeeze().to(device)
            treatment_pred = network(covariates_X, ps_score)
            predicted_ITE = treatment_pred[0] - treatment_pred[1]
            true_ITE = y_cf - y_f
            if torch.cuda.is_available():
                diff = true_ITE.float().cuda() - predicted_ITE.float().cuda()
            else:
                diff = true_ITE.float() - predicted_ITE.float()
            err_control_list.append(diff.item())

        return {
            "treated_err": err_treated_list,
            "control_err": err_control_list,
        }
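For orientation, a hedged sketch of how the two entry points above might be driven. The parameter keys match the training code, but the import path DCN_network, the 25-dimensional covariates, and the TensorDataset layout (X, ps_score, y_f, y_cf) are assumptions about the repo, not verified details.

import torch
from DCN_network import DCN_network  # assumed module path

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# treated_set / control_set: datasets of (X, ps_score, y_f, y_cf);
# shapes here are illustrative placeholders, not real data.
treated_set = torch.utils.data.TensorDataset(
    torch.randn(100, 25), torch.rand(100, 1), torch.randn(100), torch.randn(100))
control_set = torch.utils.data.TensorDataset(
    torch.randn(100, 25), torch.rand(100, 1), torch.randn(100), torch.randn(100))

train_parameters = {
    "epochs": 100, "lr": 1e-3, "shuffle": True,
    "treated_batch_size": 32, "control_batch_size": 32,
    "treated_set": treated_set, "control_set": control_set,
    "model_save_path": "./dcn_pd_epoch{0}_lr{1}.pth",
}
dcn = DCN_network()
dcn.train(train_parameters, device)

errors = DCN_network.eval(
    {"treated_set": treated_set, "control_set": control_set,
     "model_save_path": train_parameters["model_save_path"].format(100, 1e-3)},
    device)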
We refer to our potential-outcomes model as the Deep Counterfactual Network (DCN), and we use the acronym DCN-PD for a DCN with propensity-dropout regularization. Since the model captures both the propensity scores and the outcomes, it is a doubly-robust model.
@article{LizhenNie2021VCNetAF,
  title={VCNet and Functional Targeted Regularization For Learning Causal Effects of Continuous Treatments},
  author={Lizhen Nie and Mao Ye and Qiang Liu and Dan L. Nicolae},
  journal={arXiv: Learning},
  year={2021}
}
References:
Questions:
1. In multi-task training, each sample is a unique (X, T, Y) triple. The model outputs two heads, x -> [y1, y2], but each row only carries one label, i.e., it has the form x -> [y1, null] or x -> [null, y0]. How is this trained?
Answer: by freezing parameters, i.e., only the head whose label is present has its parameters updated.
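Besides freezing per-head parameters, a common alternative for rows of the form x -> [y1, null] (a generic sketch, not taken from any of the papers above) is to mask each task's loss so that missing labels contribute nothing:

import torch

def masked_multitask_loss(pred1, pred2, y1, y2, mask1, mask2):
    """MSE per task, counting only rows where that task's label exists.
    mask_k is 1.0 where y_k is observed and 0.0 where it is null."""
    loss1 = (mask1 * (pred1 - y1) ** 2).sum() / mask1.sum().clamp(min=1)
    loss2 = (mask2 * (pred2 - y2) ** 2).sum() / mask2.sum().clamp(min=1)
    return loss1 + loss2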
2. When causal inference is used to debias a correlation-driven model, what is the relationship between causal inference and machine learning?
3. What kinds of problems do causal inference and machine learning each solve?
Answer: machine learning solves prediction problems and does not need to know the causes; causal inference predicts outcomes given knowledge of the causes.
4. What can machine learning not solve?
Answer: it does not intervene and does not explore causes, so it cannot answer causal questions.
5. What can causal inference not solve?
6. Is a question without an intervention still a causal question?
Answer: my understanding is that it is not.
7. How should the bias be addressed?