NLP Paper Notes: Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling

Paper: Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling
Shen, Tao, et al. "Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling." arXiv preprint arXiv:1801.10296 (2018).

Problem: soft attention is promising for modeling local/global dependencies within a sentence, but it is computationally expensive because attention weights are computed over all tokens; hard attention is direct and effective, but it is non-differentiable. Can the two be combined?
Tasks: natural language inference (NLI); semantic relatedness on SICK.
Motivation: many natural-language tasks hinge on sparse dependencies among only a few tokens in the text. The two attention types have complementary strengths and weaknesses, so can they be combined? Moreover, hard attention can be trained with policy gradient. Hard attention selects a subset of tokens for soft attention to compute over, while the feedforward signal from soft attention in turn provides the reward signal for hard attention.
Method: Reinforced Sequence Sampling (RSS) learns the hard attention; Reinforced Self-Attention (ReSA) combines it with soft self-attention (source2token self-attention). Training: define an objective for the hard attention plus a regularizer on the selection. A rough sketch of how the two parts fit together is given below.
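To make the mechanism concrete, here is a minimal, self-contained sketch of the RSS + ReSA idea in PyTorch. The module names, layer sizes, and scoring functions are my own illustrative assumptions, not the paper's exact equations: RSS makes a per-token binary keep/drop decision (sampled during training so that policy gradient applies), and the soft self-attention is then restricted to the kept tokens.

```python
# Minimal sketch of the RSS + ReSA idea (assumed PyTorch implementation;
# layer sizes and scoring functions are illustrative, not the paper's exact equations).
import torch
import torch.nn as nn


class RSS(nn.Module):
    """Reinforced Sequence Sampling: per-token binary keep/drop decisions."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # hypothetical policy network

    def forward(self, x):                                   # x: (batch, seq_len, dim)
        probs = torch.sigmoid(self.scorer(x)).squeeze(-1)   # keep probability per token
        if self.training:
            actions = torch.bernoulli(probs)                 # sampled hard decisions (0/1)
        else:
            actions = (probs > 0.5).float()                  # greedy at inference
        log_probs = torch.distributions.Bernoulli(probs).log_prob(actions)
        return actions, log_probs                            # log_probs feed the policy gradient


class SoftSelfAttention(nn.Module):
    """Soft self-attention restricted to the tokens kept by hard attention."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x, keep_mask):                         # keep_mask: (batch, seq_len), 0/1
        scores = self.q(x) @ self.k(x).transpose(1, 2) / x.size(-1) ** 0.5
        # mask out pairs where either token was dropped by hard attention
        pair_mask = keep_mask.unsqueeze(1) * keep_mask.unsqueeze(2)
        scores = scores.masked_fill(pair_mask == 0, float('-inf'))
        attn = torch.softmax(scores, dim=-1)
        attn = torch.nan_to_num(attn)                         # fully masked rows -> zeros
        return attn @ self.v(x)


if __name__ == "__main__":
    x = torch.randn(2, 7, 16)                 # toy batch: 2 sentences, 7 tokens, dim 16
    rss, resa = RSS(16), SoftSelfAttention(16)
    keep, log_p = rss(x)
    out = resa(x, keep)
    print(out.shape)                          # torch.Size([2, 7, 16])
```

The key point is that the sampled 0/1 actions are non-differentiable, so the RSS scorer receives no gradient from the task loss; it can only be trained through the log-probabilities and a reward, which is exactly where the policy gradient comes in.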
Experiments


Ablation study ("we conduct an ablation study"): removing each module in turn shows that soft attention has the largest impact. In my view, although hard attention does help in this paper, it contributes only a 0.3 gain, whereas soft attention contributes 3.1!
Conclusion, in the authors' own words:
The hard attention modules could be used to trim a long sequence into a much shorter one and encode rich dependencies information for a soft self-attention mechanism to process. Conversely, the soft self-attention mechanism could be used to provide a stable environment and strong reward signals, which improves the feasibility of training the hard attention modules.

Novelty: integrating the roles of hard and soft attention; how they are integrated: via policy gradient; training procedure: train the soft attention first, and only after this cold-start phase switch on the reinforcement learning of the hard attention (see the sketch below).
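Below is a rough sketch of that staged training schedule, reusing the hypothetical RSS/SoftSelfAttention modules from the earlier snippet. The warm-up length and the reward definition (negative task loss) are assumptions for illustration, not the paper's exact recipe.

```python
# Staged training sketch: soft pathway only during warm-up, then add the
# REINFORCE term for the hard attention (assumed schedule, not the paper's exact recipe).
import torch

WARMUP_STEPS = 1000  # hypothetical cold-start length for the soft pathway

def train_step(step, x, y, rss, resa, classifier, optimizer, task_loss):
    keep, log_p = rss(x)                                  # hard selections + log-probs
    logits = classifier(resa(x, keep).mean(dim=1))        # soft attention over kept tokens
    loss = task_loss(logits, y)

    if step >= WARMUP_STEPS:
        # After warm-up, reward the sampled hard selections by how well the
        # (now stable) soft pathway performs, and apply REINFORCE.
        with torch.no_grad():
            reward = -task_loss(logits, y)                # higher reward = lower loss
        loss = loss - reward * log_p.sum(dim=-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

During warm-up the RSS scorer is effectively frozen with respect to the task loss (its samples are non-differentiable), which matches the idea of giving the soft attention a stable environment before the reinforcement learning starts.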
Open questions / points of confusion: 1) Why is the contribution of hard attention so small? Is its main role just to improve learning efficiency? 2) The hard attention in the paper uses a softmax, yet the output ultimately has to be converted into 0/1 values; does the author do this by truncating with a threshold?
Thoughts: hard attention has more potential to be exploited, and its benefit is likely task-dependent.
