Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning

2019.09 - EMNLP 2019

Paper
Leaderboard

Introduction

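Cosmos QA is a large-scale dataset of 35.6K problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. It focuses on reading between the lines over a diverse collection of people's everyday narratives, asking about the likely causes or effects of events, which requires reasoning beyond the exact text spans in the context.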

The best model accuracy on this dataset is currently 68.4%, compared with 94% for human performance.
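The multiple-choice formulation and the accuracy metric can be made concrete with a minimal sketch. The `MultipleChoiceExample` class and its field names are hypothetical and do not reflect the dataset's actual file schema. The two toy items are invented for illustration and are not taken from Cosmos QA.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MultipleChoiceExample:
    """One multiple-choice item (hypothetical schema, for illustration only)."""
    context: str
    question: str
    options: List[str]
    label: int  # index of the correct option


def accuracy(examples: Sequence[MultipleChoiceExample],
             predictions: Sequence[int]) -> float:
    """Fraction of examples whose predicted option index equals the label."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(1 for ex, pred in zip(examples, predictions) if ex.label == pred)
    return correct / len(examples)


# Invented toy items, not taken from the dataset.
toy = [
    MultipleChoiceExample(
        context="I missed the bus and got to work an hour late.",
        question="What may happen next?",
        options=["My manager may ask why I was late.",
                 "I will get to work early."],
        label=0,
    ),
    MultipleChoiceExample(
        context="We packed umbrellas before leaving the house.",
        question="Why did they pack umbrellas?",
        options=["They wanted to go swimming.",
                 "They expected rain."],
        label=1,
    ),
]

# One of the two predictions below matches its label.
print(accuracy(toy, [0, 0]))
```

The reported 68.4% and 94% figures are accuracies of this kind: the share of questions for which the chosen option is the correct one.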

Related reading comprehension datasets

SQuAD 2016
NEWSQA 2017
SearchQA 2017
NarrativeQA 2018
ProPara 2018
CoQA 2018
ReCoRD 2018
Dream 2019
MCTest 2013
RACE 2017
CNN/Daily Mail 2015
Children’s Book Test 2015
MCScript 2018
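Most of these datasets focus on relatively explicit understanding of the context paragraph, so only a relatively small or unknown fraction of each dataset, if any, requires commonsense reasoning.
ReCoRD is the exception: it was designed specifically to challenge reading comprehension through commonsense reasoning.
ReCoRD paper
ReCoRD leaderboard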

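How Cosmos QA differs from ReCoRD:
Cosmos QA complements ReCoRD with three unique challenges: (1) Its contexts come from web blogs rather than news, so they require commonsense reasoning about everyday events rather than newsworthy events. (2) All answers in ReCoRD are contained in the passage and are assumed to be entities. In Cosmos QA, by contrast, more than 83% of the answers are not mentioned in the passage, which poses a unique modeling challenge. (3) Besides multiple-choice evaluation, Cosmos QA can also be used for generative evaluation.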
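To illustrate what "not mentioned in the passage" means in practice, here is a rough sketch that tests whether an answer string occurs verbatim in a passage. The case-insensitive, whitespace-normalized substring match is an assumption chosen for simplicity. It is not necessarily how the 83% figure was measured, and the function names are hypothetical.

```python
import re
from typing import Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def answer_in_passage(passage: str, answer: str) -> bool:
    """True if the normalized answer occurs as a substring of the normalized passage."""
    return normalize(answer) in normalize(passage)


def fraction_not_in_passage(pairs: Iterable[Tuple[str, str]]) -> float:
    """Share of (passage, answer) pairs whose answer is absent from its passage."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    missing = sum(1 for passage, answer in pairs
                  if not answer_in_passage(passage, answer))
    return missing / len(pairs)
```

For an extractive dataset such as SQuAD, a check like this succeeds for nearly every answer. When most answers fail it, a model cannot simply select a span from the passage and has to compare or generate free-form answers.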

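There are also other datasets aimed specifically at commonsense question answering, such as
CommonsenseQA, 2018 (based on ConceptNet)
Social IQA, 2019… (based on ATOMIC)
Compared with these, the distinctive contribution of Cosmos QA is that it combines reading comprehension with commonsense reasoning, which is more demanding: it requires contextual commonsense reasoning over diverse and longer contexts.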

Example

Paragraph: A woman had topped herself by jumping off the roof of the hospital she had just recently been admitted to. She was there because the first or perhaps latest suicide attempt was unsuccessful. She put her clothes on, folded the hospital gown and made the bed. She walked through the unit unimpeded and took the elevator to the top floor.

Question: What would have happened to the woman if the staff at the hospital were doing their job properly?

Options: [figure: the answer choices, shown as an image in the original post]

Comparison

CommonsenseQA

Requires no context, only commonsense knowledge
Reference

SQuAD

Requires only the context of the reading passage, not commonsense knowledge
Reference
