MMoE: Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

Table of Contents

  • Summary
  • Details
  • Experiments

Summary

Instead of one fully shared bottom, the input is fed through several expert networks, and each task has its own gating network, i.e. its own attention over the experts.

Details

Motivation: existing multi-task methods are sensitive to the relationships between tasks; performance degrades when the tasks are only weakly related.

MTL

Improvement 1 (L2-Constrained): drop the shared bottom and give each task its own parameters, but add an L2 regularization term between the parameter sets of the different tasks.
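As a rough illustration, here is a minimal sketch of the L2-constrained idea in PyTorch (the framework, layer sizes, and names like `bottom_a` / `l2_constraint` are my own assumptions, not from the paper): each task keeps its own bottom parameters, and an L2 penalty on their difference is added to the training loss.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, for illustration only.
INPUT_DIM, HIDDEN_DIM = 64, 32

bottom_a = nn.Linear(INPUT_DIM, HIDDEN_DIM)  # task-A-specific bottom
bottom_b = nn.Linear(INPUT_DIM, HIDDEN_DIM)  # task-B-specific bottom

def l2_constraint(alpha: float = 0.1) -> torch.Tensor:
    """L2 penalty that pulls the two task-specific parameter sets toward each other."""
    penalty = torch.tensor(0.0)
    for p_a, p_b in zip(bottom_a.parameters(), bottom_b.parameters()):
        penalty = penalty + (p_a - p_b).pow(2).sum()
    return alpha * penalty

# total_loss = loss_task_a + loss_task_b + l2_constraint()
```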

(Figure 1 from the paper)

Shared-Bottom: all tasks share a single bottom network $f(x)$ (the shared embedding), and each task $k$ stacks its own tower network $h^k$ on top. Sharing reduces overfitting, but when the tasks are unrelated the shared representation learns poorly:

$y_k = h^k(f(x))$
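A minimal Shared-Bottom sketch, again assuming PyTorch with hypothetical layer sizes: one shared bottom $f$ feeding a small tower per task.

```python
import torch
import torch.nn as nn

class SharedBottom(nn.Module):
    """y_k = h^k(f(x)): one shared bottom network, one tower per task."""

    def __init__(self, input_dim=64, bottom_dim=32, tower_dim=16, num_tasks=2):
        super().__init__()
        self.bottom = nn.Sequential(nn.Linear(input_dim, bottom_dim), nn.ReLU())
        self.towers = nn.ModuleList([
            nn.Sequential(nn.Linear(bottom_dim, tower_dim), nn.ReLU(),
                          nn.Linear(tower_dim, 1))
            for _ in range(num_tasks)
        ])

    def forward(self, x):
        shared = self.bottom(x)                          # f(x), reused by every task
        return [tower(shared) for tower in self.towers]  # one output per task
```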

MoE: the shared bottom is replaced by $n$ expert networks $f_i$; a gating network $g$ produces a softmax weight for each expert, and the output is the weighted sum of the expert outputs, much like attention:

$y = \sum_{i=1}^{n} g(x)_i \, f_i(x)$
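A sketch of the one-gate MoE bottom under the same assumptions (PyTorch, hypothetical sizes): $n$ expert networks and a single softmax gate $g(x)$ that mixes their outputs.

```python
import torch
import torch.nn as nn

class MoEBottom(nn.Module):
    """y = sum_i g(x)_i * f_i(x): n experts, one gate shared by all tasks (OMoE)."""

    def __init__(self, input_dim=64, expert_dim=32, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(input_dim, num_experts)

    def forward(self, x):
        expert_out = torch.stack([f(x) for f in self.experts], dim=1)  # (B, n, expert_dim)
        weights = torch.softmax(self.gate(x), dim=-1).unsqueeze(-1)    # g(x): (B, n, 1)
        return (weights * expert_out).sum(dim=1)                       # weighted sum over experts
```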

MMoE: the experts are shared across tasks, but each task $k$ gets its own gating network $g^k$ (its own attention over the experts) and its own tower:

$y_k = h^k\!\left(\sum_{i=1}^{n} g^k(x)_i \, f_i(x)\right)$
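And an MMoE sketch under the same assumptions: the experts are shared, while each task gets its own gate and tower.

```python
import torch
import torch.nn as nn

class MMoE(nn.Module):
    """y_k = h^k( sum_i g^k(x)_i * f_i(x) ): shared experts, one gate and one tower per task."""

    def __init__(self, input_dim=64, expert_dim=32, tower_dim=16,
                 num_experts=4, num_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
            for _ in range(num_experts)
        ])
        self.gates = nn.ModuleList([nn.Linear(input_dim, num_experts)
                                    for _ in range(num_tasks)])
        self.towers = nn.ModuleList([
            nn.Sequential(nn.Linear(expert_dim, tower_dim), nn.ReLU(),
                          nn.Linear(tower_dim, 1))
            for _ in range(num_tasks)
        ])

    def forward(self, x):
        expert_out = torch.stack([f(x) for f in self.experts], dim=1)  # (B, n, expert_dim)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)           # g^k(x): (B, n, 1)
            mixed = (w * expert_out).sum(dim=1)                        # per-task expert mixture
            outputs.append(tower(mixed))                               # y_k = h^k(...)
        return outputs
```

The only structural difference from the one-gate MoE above is that the gate is indexed by task, which lets each task weight the shared experts differently when the tasks diverge.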

Experiments

Dataset: UCI Census-Income
Metric: AUC
Models compared: Shared-Bottom, L2-Constrained, Cross-Stitch, OMoE, and MMoE
