A Deep Reinforced Model for Abstractive Summarization

Romain Paulus, Caiming Xiong, and Richard Socher. 2017. A Deep Reinforced Model for Abstractive Summarization.

Official blog post

Introduction

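This week's paper is Salesforce Research's work on automatic text summarization. Automatic summarization falls into two families: extractive and abstractive. Extractive methods pick sentences or phrases from the source that are relevant to the topic and stitch them into a summary, much like copy and paste. Abstractive methods first build an understanding of the source, then abstract over it and write the summary using semantically similar words or different phrasings, which is closer to how a human summarizes. When summarizing long texts, however, models often produce incoherent or irrelevant content and repeated phrases. To address these problems the paper uses a number of tricks, including an improved attention mechanism and a reinforcement-learning training method, and reports new state-of-the-art results on the CNN/Daily Mail and New York Times datasets.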

Model

The overall framework is still Seq2Seq: the input is the source text and the output is the summary. The encoder is a bi-LSTM and the decoder is a single-layer LSTM.

  • A new Attention and Decoding Mechanism

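    The paper uses two attention mechanisms: intra-temporal attention and intra-decoder attention. The former attends over the encoder: it computes a weight for every input token so that the generated content covers the source. The latter attends over the decoder: it computes weights over the tokens already generated, which helps avoid repeated content. The two are then concatenated to decode the next word, as shown in the figure below: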

    [Figure 2: intra-temporal attention and intra-decoder attention]

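    When computing the intra-temporal attention weights, the paper penalizes input tokens that received high weights at earlier decoding steps, so that later steps do not assign them high weights again. The formulas are given below; the intra-decoder attention weights are computed the same way, minus the second formula.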

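    $$e_{ti} = {h^d_t}^{\top} W^e_{\text{attn}} h^e_i$$

    $$e'_{ti} = \begin{cases} \exp(e_{ti}) & \text{if } t = 1 \\ \dfrac{\exp(e_{ti})}{\sum_{j=1}^{t-1} \exp(e_{ji})} & \text{otherwise} \end{cases}$$

    $$\alpha^e_{ti} = \frac{e'_{ti}}{\sum_{j=1}^{n} e'_{tj}}$$

    $$c^e_t = \sum_{i=1}^{n} \alpha^e_{ti} h^e_i$$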
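
The four attention formulas can be traced in a few lines of plain Python. This is only a minimal sketch, not the authors' implementation: the function names are made up for illustration, vectors are plain lists, and there is no protection against overflow in `exp`.

```python
import math

def bilinear_score(h_dec, W, h_enc):
    """Raw score e_ti = h_dec^T W h_enc, with W given as a list of rows."""
    Wh = [sum(w * x for w, x in zip(row, h_enc)) for row in W]
    return sum(d * v for d, v in zip(h_dec, Wh))

def intra_temporal_attention(scores):
    """scores[t][i] is e_ti for decoder step t and input token i.

    Returns alpha[t][i], the normalized attention weights for every step.
    """
    n = len(scores[0])
    past_exp_sum = [0.0] * n  # sum of exp(e_ji) over earlier steps j < t
    alphas = []
    for t, row in enumerate(scores):
        exp_row = [math.exp(e) for e in row]
        if t == 0:
            penalized = exp_row  # first step: no history to penalize with
        else:
            # Tokens that scored high in the past get divided down.
            penalized = [x / s for x, s in zip(exp_row, past_exp_sum)]
        total = sum(penalized)
        alphas.append([p / total for p in penalized])
        past_exp_sum = [s + x for s, x in zip(past_exp_sum, exp_row)]
    return alphas

def context_vector(alpha_t, encoder_states):
    """c_t = sum_i alpha_ti * h_i for one decoder step."""
    dim = len(encoder_states[0])
    return [sum(a * h[k] for a, h in zip(alpha_t, encoder_states))
            for k in range(dim)]

# Token 0 scores high at both steps, but its weight drops at the second step.
alphas = intra_temporal_attention([[2.0, 0.0], [2.0, 0.0]])
print(alphas[0][0] > alphas[1][0])  # True
```

In the toy call, the first input token gets most of the weight at the first step and only half at the second, even though its raw score has not changed. That is the penalty at work.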

  • Supervised Learning vs. Reinforcement Learning

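    An RNN generally runs in one of two modes: (1) free-running mode and (2) teacher-forcing mode. Free-running is the ordinary way of running an RNN: the output of the previous step is fed in as the input to the next step. The risk is that early in training, a very bad result at an early state affects every later state, so the final output is poor and the error is hard to trace back to its source. Teacher forcing instead feeds the corresponding previous ground-truth token, rather than the previous output, as the input to the next step. This produces correct outputs faster, but the model becomes overly dependent on the data: it does well in training and worse at test time, when no ground truth is available.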
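
The difference between the two modes comes down to one line in the decoding loop. The sketch below uses a toy lookup table in place of a real decoder step; `decode`, `toy_step`, and `TABLE` are hypothetical names invented for this illustration.

```python
def decode(step_fn, reference, teacher_forcing):
    """Generate len(reference) tokens with a next-token function.

    step_fn(prev_token) returns the predicted next token; it stands in for
    one decoder step followed by an argmax.
    """
    outputs = []
    prev = "<s>"
    for gold in reference:
        pred = step_fn(prev)
        outputs.append(pred)
        # Teacher forcing feeds the ground-truth token to the next step;
        # free running feeds the model's own prediction back in.
        prev = gold if teacher_forcing else pred
    return outputs

# Toy "model" with one wrong entry: after "<s>" it says "tech", not "science".
TABLE = {"<s>": "tech", "science": "is", "is": "fun",
         "tech": "stocks", "stocks": "fell"}

def toy_step(prev):
    return TABLE.get(prev, "<unk>")

reference = ["science", "is", "fun"]
print(decode(toy_step, reference, teacher_forcing=True))   # ['tech', 'is', 'fun']
print(decode(toy_step, reference, teacher_forcing=False))  # ['tech', 'stocks', 'fell']
```

With teacher forcing the single mistake stays local. In free-running mode the same mistake changes every later input, so the rest of the sequence drifts away from the reference.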


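    The traditional objective is maximum likelihood, but for summary generation it has the following limitations. 1) Under teacher-forced supervised training the ground truth is always available, so errors do not accumulate; at test time errors are not corrected and propagate forward, so they do accumulate. For example, suppose that during training the model generates 'tech' at step t while the ground truth is 'science'. The input for step t+1 is still the ground-truth 'science', not 'tech', so the error stays at step t and does not propagate. At test time it does. 2) A text usually admits more than one reading and more than one valid reference summary. Word-by-word supervised learning with a maximum-likelihood objective pushes the model to reproduce exactly the pattern of the reference summary.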

    $$L_{ml} = -\sum_{t=1}^{n'} \log p(y_t \mid y_1, \dots, y_{t-1}, x)$$
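
As a small worked example of the maximum-likelihood objective, the sketch below takes the probabilities a model assigned to each reference token (under teacher forcing) and returns the negative log-likelihood. The function name and the input format are assumptions made for illustration.

```python
import math

def ml_loss(reference_token_probs):
    """Negative log-likelihood of the reference summary.

    reference_token_probs[t] is p(y_t | y_1..y_{t-1}, x), the probability the
    model gives to the ground-truth token at step t.
    """
    return -sum(math.log(p) for p in reference_token_probs)

# Two steps with probabilities 0.5 and 0.25: loss = log(2) + log(4) = log(8).
print(ml_loss([0.5, 0.25]))
```

The loss is zero only when every reference token gets probability 1, which is why this objective pulls the model toward copying the reference word for word.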

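The ROUGE metric used to evaluate generated summaries allows for this flexibility: it scores a generated summary by comparing it with the reference summary. It is therefore desirable to bring ROUGE into training. But ROUGE is not differentiable, so gradients cannot be computed through it directly. Reinforcement learning offers a way to add ROUGE to the training objective.

The paper's approach is to have the model produce two summaries. In the first, each step picks the highest-probability word $\hat{y}_t$; this greedy summary serves as the baseline. In the second, each word $y^s_t$ is sampled. Both $\hat{y}$ and $y^s$ are scored with ROUGE against the ground truth $y$, and the scores are used as rewards to reward or penalize the model and update its parameters.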

$$L_{rl} = \left(r(\hat{y}) - r(y^s)\right) \sum_{t=1}^{n'} \log p(y^s_t \mid y^s_1, \dots, y^s_{t-1}, x)$$
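
The sign of this loss is easy to get wrong, so here is a minimal sketch of the computation for one training example. The reward is a unigram-overlap F1, used here as a simplified stand-in for ROUGE (an assumption for illustration, not the metric used in the paper), and all function names are hypothetical.

```python
import math
from collections import Counter

def unigram_f1(candidate, reference):
    """Stand-in reward: unigram-overlap F1, a simplification of ROUGE-1."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def self_critical_loss(sample_tokens, sample_probs, greedy_tokens, reference):
    """L_rl = (r(greedy) - r(sample)) * sum_t log p(sample_t | ...).

    sample_probs[t] is the probability the model gave to the sampled token
    at step t; the greedy summary only supplies the baseline reward.
    """
    r_sample = unigram_f1(sample_tokens, reference)
    r_greedy = unigram_f1(greedy_tokens, reference)
    log_prob = sum(math.log(p) for p in sample_probs)
    return (r_greedy - r_sample) * log_prob

reference = ["google", "wallet", "funds", "insured"]
greedy = ["google", "wallet", "changed", "policy"]   # reward 0.5
sample = ["google", "wallet", "funds", "insured"]    # reward 1.0
loss = self_critical_loss(sample, [0.5, 0.5, 0.5, 0.5], greedy, reference)
print(loss > 0)  # True: minimizing it raises the sample's log-probability
```

When the sampled summary beats the greedy baseline, minimizing the loss makes that sample more likely. When it does worse, the factor in front flips sign and the sample is made less likely.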

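Training with the reinforcement-learning objective alone cannot guarantee the quality and readability of the summaries, so the two objectives are finally combined as a weighted sum, which preserves readability while keeping the flexibility of the summaries.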

$$L_{mixed} = \gamma L_{rl} + (1 - \gamma) L_{ml}$$
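
The mixed objective is a convex combination of the two losses. The helper below is a trivial sketch with made-up numbers; the name `mixed_loss` and the value of `gamma` in the example are illustrative assumptions, not settings taken from the paper.

```python
def mixed_loss(loss_rl, loss_ml, gamma):
    """Weighted sum gamma * L_rl + (1 - gamma) * L_ml."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * loss_rl + (1.0 - gamma) * loss_ml

# gamma = 0 recovers pure maximum likelihood, gamma = 1 pure RL.
print(mixed_loss(2.0, 4.0, 0.75))  # 2.5
```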


  • Token generation and pointer

    Some words are copied directly from the source, which addresses part of the OOV problem. This mainly follows the papers below.

    Jiatao Gu, Zhengdong Lu, Hang Li, et al. “Incorporating Copying Mechanism in Sequence-to-Sequence Learning”. ACL 2016.

    Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati et al. “Pointing the Unknown Words”. ACL 2016.
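
To see why copying helps with OOV words, consider a generic generate-or-copy mixture: a switch probability splits the mass between the fixed vocabulary and the source positions, with the attention weights serving as the copy distribution. This is a rough sketch of the general idea under those assumptions, not the exact formulation of any of the cited papers, and the function name is hypothetical.

```python
def mix_generate_and_copy(p_copy, vocab_probs, attention, source_tokens):
    """Combine a generation distribution with a copy distribution.

    p_copy: probability of copying at this step.
    vocab_probs: dict token -> generation probability over the fixed vocabulary.
    attention: weights over source positions (sums to 1), used for copying.
    Returns a dict token -> final probability.
    """
    final = {tok: (1.0 - p_copy) * p for tok, p in vocab_probs.items()}
    for tok, a in zip(source_tokens, attention):
        # A source token outside the vocabulary still receives probability.
        final[tok] = final.get(tok, 0.0) + p_copy * a
    return final

vocab_probs = {"the": 0.6, "<unk>": 0.4}
source = ["honda4ridered", "the"]
final = mix_generate_and_copy(0.5, vocab_probs, [0.75, 0.25], source)
print(final["honda4ridered"])  # 0.375
```

The user name `honda4ridered` is not in the toy vocabulary, yet it ends up with the second-highest probability because the attention points at it.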

  • Sharing decoder weights

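    The encoder and decoder share the same embedding weights. The paper additionally ties the decoder's output projection to this embedding matrix, $W_{out} = \tanh(W_{emb} W_{proj})$.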

  • Repetition avoidance at test time

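    At test time, any candidate that would repeat a tri-gram already present in the generated summary is simply discarded. A rule like this has to come from analyzing the data and hand-crafting the constraint.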
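
The tri-gram rule can be sketched as a filter on the next-token distribution: before picking a token, drop every candidate that would complete a tri-gram already in the output. The function names and the dict-based distribution are assumptions for illustration; a real decoder would apply this inside beam search.

```python
def creates_repeated_trigram(prefix, candidate):
    """True if appending candidate repeats a tri-gram already in prefix."""
    if len(prefix) < 2:
        return False
    new_trigram = (prefix[-2], prefix[-1], candidate)
    seen = {tuple(prefix[i:i + 3]) for i in range(len(prefix) - 2)}
    return new_trigram in seen

def block_repeats(prefix, probs):
    """Set the probability of every repeating candidate to zero."""
    return {tok: (0.0 if creates_repeated_trigram(prefix, tok) else p)
            for tok, p in probs.items()}

prefix = ["the", "clip", "was", "uploaded", "the", "clip"]
print(block_repeats(prefix, {"was": 0.7, "is": 0.3}))
# {'was': 0.0, 'is': 0.3}
```

The filtered values are not renormalized here; for greedy or beam search only their ranking matters.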

Result

With intra-attention and reinforcement learning removed, the summaries contain many repeated sentences. Below are a few article summaries taken from the blog post:

Example 1

Article: Google Wallet says it has changed its policy when storing users’ funds as they will now be federally-insured (file photo) For those who use Google Wallet, their money just became safer with federal-level insurance. Google confirmed to Yahoo Finance in a statement that its current policy changed - meaning the company will store the balances for users of the mobile transfer service (similar to PayPal and Venmo) in multiple federally-insured banking institutions. This is good news for people who place large amounts of money in their Wallet Balance because the Federal Deposit Insurance Corporation insures funds for banking institutions up to $250,000. Currently, Google’s user agreement says funds are not protected by the FDIC. However, a Google spokesperson told Yahoo Finance that the current policy has changed. (…)

Summary (ground truth): Google spokesperson confirmed current policy changed meaning funds will be protected by the federal deposit insurance corporation. As a non-banking institution, Google Wallet, along with competitors PayPal and Venmo, is not legally required to be federally insured. With the new change to its policy, funds in wallet balance are protected if anything were to happen to the company like bankruptcy.

Summary (our model): Google confirmed to Yahoo Finance in a statement that its current policy changed. The company will store the balances for users of the mobile transfer service (similar to PayPal and Venmo) in multiple federally-insured banking institutions. Google’s user agreement says funds are not protected by the federal deposit insurance corporation.

Example 2

Article: Talk about a chain reaction! This is the moment a billiards player performs a complex trick shot by setting up a domino train to pot four balls. Video footage shows a white ball being rolled down a positioned cue. It then bounces off one side of the red-clothed table and hits the first in a long line of dominoes. One by one the small counters fall down, tapping balls into various pockets as they go. First a yellow, then a blue, then a red. Finally, the last domino gently hits an orange ball, causing it to roll down another positioned cue lying on the table. The orb then knocks a green ball into the center pocket. In less than 30 seconds the stunt comes to a close. (…)

Summary (ground truth): The clip was uploaded by youtube user honda4ridered. In another upload the skilled billiards player shows viewers how to pocket four balls in a single shot-and for those who miss it there’s a slow motion version.

Summary (our model): Video footage shows a white ball being rolled down a jumper. It then bounces off one side of the red-clothed table and hits the first in a long line of dominoes. One by one the small counters fall down, tapping balls into pockets as they go-first a yellow. It comes to a close. The clip was uploaded by youtube user honda4ridered.

Example 3

Article: Kelly Osbourne didn’t always want to grow up to be like her famous mom - but in a letter published in the new book A Letter to My Mom, the TV personality admitted that she is now proud to be Sharon Osbourne’s daughter. For author Lisa Erspamer’s third collection of tributes, celebrities such as Melissa Rivers, Shania Twain, will.i.am, Christy Turlington Burns, and Kristin Chenoweth all composed messages of love and gratitude to the women who raised them. And the heartwarming epistolary book, which was published last week, has arrived just in time for Mother’s Day on May 10. ‘Like all teenage girls I had this ridiculous fear of growing up and becoming just like you,’ Kelly Osbourne wrote in her letter, republished on Yahoo Parenting. ‘I was so ignorant and adamant about creating my “own” identity.’ Scroll down for video Mini-me: In Lisa Erspamer’s new book A Letter to My Mom, Kelly Osbourne (R) wrote a letter to her mother Sharon (L) saying that she’s happy to have grown up to be just like her (…)

Summary (ground truth): Author Lisa Erspamer invited celebrities and a number of other people to write heartfelt notes to their mothers for her new book a letter to my mom. Stars such as Melissa Rivers, will.i.am, and Christy Turlington participated in the moving project.

Summary (our model): Kelly didn’t always want to grow up to be like her famous mom. Lisa Erspamer’s third collection of tributes, celebrities such as Melissa rivers, Shania Twain, will.i.am, Christy Turlington, and Kristin Chenoweth all composed messages of love and gratitude to the women who raised them. Kelly wrote a letter to her mom before Joan’s death last year. She has arrived just in time for Mother’s Day on May 10.

Thoughts

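  • The paper pulls together techniques from much of the recent literature and uses many tricks, but some of them come without comparison experiments, so it is hard to tell whether a given trick actually contributes. The intra-decoder attention mentioned earlier is one example.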

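  • The paper presents itself as abstractive summarization, but judging from the examples in the blog post, the summaries seem to be assembled from sentences in the source, which looks more like extraction. Perhaps this is because of the pointer, with the pointer doing most of the work?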

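  • Calling this reinforcement learning does not seem entirely accurate; perhaps it only borrows the idea of a reward? The design of the reward function also seems rather simple.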
