Paper reading: Keyphrase Extraction for N-best Reranking in Multi-Sentence Compression

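Authors: Florian Boudin and Emmanuel Morin
Source: NAACL-HLT 2013
Overview:
This paper extends Filippova's (2010) word graph-based multi-sentence compression (MSC) method with a reranking step, so that the compression containing the most relevant keyphrases is selected.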
Resources:
1. Code: https://github.com/boudinfl/takahe
2. Dataset: https://github.com/boudinfl/lina-msc
Related work:
1. Multi-sentence compression
a) Approaches that use a syntactic parser (to control the grammaticality of the output)
b) Word graph-based approaches that only require a POS tagger (the key assumption is that redundancy provides a reliable way of generating grammatical sentences)
2. Keyphrase extraction
Supervised: treats the task as a binary classification problem. Drawbacks: the need for training data and a bias towards the training domain.
Unsupervised: a) language modeling, b) graph-based ranking, c) clustering
Model:
Given a set of redundant sentences, a word graph is constructed by iteratively adding sentences to it. The best compression is obtained by finding the shortest path in the word graph. The original algorithm is described in:
Katja Filippova, Multi-Sentence Compression: Finding Shortest Paths in Word Graphs, COLING 2010.
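The word graph idea can be illustrated with a heavily simplified sketch. It is not Filippova's algorithm or the takahe implementation: it assumes whitespace tokenization, ignores POS tags, stopwords and punctuation, uses 1 / (edge count) as the edge cost instead of the original weighting, and imposes no minimum length or verb constraint. All function names are made up for illustration.

```python
import heapq
from collections import defaultdict

# Sentinel nodes; every node is a (token, occurrence_index) pair.
START, END = ("<s>", 0), ("</s>", 0)

def build_word_graph(sentences):
    """Return {node: {next_node: count}} for whitespace-tokenized sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        seen = defaultdict(int)
        prev = START
        for token in sentence.lower().split():
            # The k-th occurrence of a word inside one sentence gets its own
            # node, so a sentence never maps two of its words to one node.
            node = (token, seen[token])
            seen[token] += 1
            counts[prev][node] += 1
            prev = node
        counts[prev][END] += 1
    return counts

def shortest_compression(counts):
    """Dijkstra from START to END; frequent transitions are cheap (cost 1/count)."""
    heap = [(0.0, [START])]
    done = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            return " ".join(tok for tok, _ in path[1:-1]), cost
        if node in done:
            continue
        done.add(node)
        for nxt, count in counts[node].items():
            if nxt not in done:
                heapq.heappush(heap, (cost + 1.0 / count, path + [nxt]))
    return None
```

Without a length constraint this sketch tends to return very short paths, which is one reason the real method filters candidate paths before picking one.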
A keyphrase-based reranking method can then be applied to generate more informative compressions.
Step 1: Compute a salience score for each node with TextRank.
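A minimal sketch of the weighted TextRank update that Step 1 relies on. It assumes a symmetric weighted graph, the commonly used damping factor d = 0.85, and a fixed number of iterations instead of a convergence test. The function name and input layout are illustrative and not taken from the takahe code.

```python
def textrank(weights, d=0.85, iterations=50):
    """Weighted TextRank on a symmetric graph.

    weights: {node: {neighbor: edge_weight}} with weights[a][b] == weights[b][a].
    Returns {node: salience score}.
    """
    scores = {node: 1.0 for node in weights}
    # Total edge weight leaving each node, used to normalize its votes.
    out_sum = {node: sum(nbrs.values()) for node, nbrs in weights.items()}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in weights.items():
            # Each neighbor m passes on a share of its score proportional
            # to the weight of the edge between m and this node.
            rank = sum(w / out_sum[m] * scores[m] for m, w in nbrs.items())
            updated[node] = (1 - d) + d * rank
        scores = updated
    return scores
```

Nodes connected by many heavy edges end up with higher scores, which is what makes the score usable as a word-level salience measure.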
Step 2: Generate keyphrase candidates and compute a score for each one.
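Step 2 can be sketched as follows. Two assumptions are made here and should be checked against the paper: candidates are taken to be maximal runs of adjacent nouns and adjectives, and a candidate's score is taken to be the sum of its word scores divided by (length + 1). The Penn Treebank tag prefixes and function names are illustrative.

```python
def keyphrase_candidates(tagged_tokens, keep_tags=("NN", "JJ")):
    """Collect maximal runs of adjacent words whose POS tag starts with keep_tags.

    tagged_tokens: list of (word, pos_tag) pairs.
    Returns a list of lowercase word tuples.
    """
    phrases, current = [], []
    for word, tag in tagged_tokens:
        if tag.startswith(keep_tags):
            current.append(word.lower())
        else:
            if current:
                phrases.append(tuple(current))
            current = []
    if current:
        phrases.append(tuple(current))
    return phrases

def keyphrase_score(phrase, word_scores):
    """Assumed scoring: sum of word salience scores, normalized by length + 1."""
    total = sum(word_scores.get(word, 0.0) for word in phrase)
    return total / (len(phrase) + 1)
```

Dividing by length + 1 rather than length keeps longer phrases from being penalized as strongly as a plain average would.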
Step 3: Generate more candidate paths than Filippova (2010), then rerank them by computing a final score for each sentence compression c.
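A sketch of the reranking in Step 3, assuming the final score of a compression is its path weight divided by (its length times the summed scores of the keyphrases it contains), with lower scores ranked first. Treating a compression with no keyphrase as having an infinite score is a choice made for this sketch, and all names are illustrative.

```python
import math

def contains(tokens, phrase):
    """True if phrase occurs as a contiguous subsequence of tokens."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == tuple(phrase)
               for i in range(len(tokens) - n + 1))

def final_score(tokens, path_weight, keyphrase_scores):
    """Assumed score: path_weight / (length * sum of contained keyphrase scores)."""
    covered = sum(score for phrase, score in keyphrase_scores.items()
                  if contains(tokens, phrase))
    if covered == 0:
        return math.inf  # no keyphrase covered: rank last (sketch-specific choice)
    return path_weight / (len(tokens) * covered)

def rerank(candidates, keyphrase_scores):
    """candidates: list of (tokens, path_weight); best (lowest score) first."""
    return sorted(candidates,
                  key=lambda c: final_score(c[0], c[1], keyphrase_scores))
```

Under this scoring a slightly heavier path can still win if it covers more, or more salient, keyphrases.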
When computing ROUGE scores, stopwords were removed and words were stemmed (Snowball stemmer: http://snowballstem.org/).
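The effect of this preprocessing can be seen in a small ROUGE-1 recall sketch. It is not the official ROUGE toolkit: the stopword list is a tiny hypothetical one, and crude_stem is a stand-in suffix stripper used only so the example has no dependencies. A real Snowball stemmer would be passed in through the stem parameter.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "on", "in", "to"}  # tiny illustrative list

def crude_stem(word):
    """Stand-in for a real stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess(text, stem=crude_stem):
    """Lowercase, drop stopwords, then stem the remaining tokens."""
    return [stem(tok) for tok in text.lower().split() if tok not in STOPWORDS]

def rouge1_recall(candidate, reference, stem=crude_stem):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(preprocess(candidate, stem))
    ref = Counter(preprocess(reference, stem))
    if not ref:
        return 0.0
    overlap = sum((cand & ref).values())
    return overlap / sum(ref.values())
```

With stemming, "visits" and "visited" count as a match, so the preprocessing rewards compressions that convey the same content with different inflections.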
