AllenNLP Series, Part 4: Coreference Resolution

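Coreference resolution is one of the major tasks in natural language processing, and it is an indispensable part of information extraction. In information extraction, the events and semantic relations between entities that users care about are often scattered across different parts of a text, and the entities involved can usually be expressed in several different ways. For example, an entity in a semantic relation may appear as a pronoun. To extract the relevant information accurately and without omissions, the coreference in the text has to be resolved. Beyond information extraction, coreference resolution is also critical in applications such as machine translation, text summarization, and question answering.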

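Take the first sentence of this article: "Coreference resolution is one of the major tasks in natural language processing, and it is an indispensable part of information extraction." Here "it" refers back to "coreference resolution".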

One nice thing about AllenNLP is that it provides coreference resolution out of the box. Its description reads:

Coreference Resolution

Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for many higher level NLP tasks that involve natural language understanding, such as document summarization, question answering and information extraction. Our implementation is based on End-to-End Coreference Resolution (Lee et al., 2017), a neural model which considers all possible spans in the document as potential mentions and learns distributions over possible antecedents for each span. This approach achieved state-of-the-art results on the OntoNotes 5.0 dataset in early 2017. The AllenNLP implementation achieves 63.0% F1 on the CoNLL test set. Please note that this model does not include speaker features (impractical for general use), variational dropout (currently difficult to implement in PyTorch) or data augmentation, and considers 100 antecedents rather than 250 due to memory constraints.
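The last sentence of the description says each span only considers a bounded number of candidate antecedents (100 here instead of 250). A minimal sketch of what that cap means, assuming spans are already sorted by position and that a span's candidates are simply the spans immediately before it; `candidate_antecedents` and `count_pairs` are hypothetical helpers, not AllenNLP's API.

```python
from typing import List


def candidate_antecedents(num_spans: int, max_antecedents: int) -> List[List[int]]:
    """For each span i (spans sorted by position), list the earlier spans it
    may link to, nearest first, keeping at most `max_antecedents` of them."""
    candidates = []
    for i in range(num_spans):
        first = max(0, i - max_antecedents)
        candidates.append(list(range(i - 1, first - 1, -1)))
    return candidates


def count_pairs(num_spans: int, max_antecedents: int) -> int:
    """Number of (span, antecedent) pairs the model would have to score."""
    return sum(len(c) for c in candidate_antecedents(num_spans, max_antecedents))


print(candidate_antecedents(5, 2))  # [[], [0], [1, 0], [2, 1], [3, 2]]
for k in (100, 250):
    print(k, count_pairs(1000, k))
```

With 1,000 candidate spans, capping at 100 antecedents instead of 250 cuts the number of scored pairs from 218,625 to 94,950, which is the kind of memory saving the description refers to.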

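The basic idea behind coreference resolution is covered in Lecture 15 of Stanford's CS224n course: find all the mentions in a text, pair them up, and score each pair, as illustrated in the course slides: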

[Figure 1: illustration from the CS224n slides]

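Because the model does not know in advance which mentions will end up in the same coreference cluster, it has to pair the mentions up and score every pair.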
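To make "pair up and score" concrete, here is a toy mention-pair sketch: it enumerates every pair of mentions (an earlier mention j and a later mention i) and scores each with a hand-written function. The pronoun list and the features and weights in `toy_pair_score` are made up for illustration; a real system learns this score.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical pronoun list, only for this toy example.
PRONOUNS = {"he", "she", "it", "her", "his", "him", "they", "them", "their"}


def toy_pair_score(antecedent: str, anaphor: str) -> float:
    """Made-up hand-written score: higher means 'more likely coreferent'."""
    score = 0.0
    if antecedent.lower() == anaphor.lower():
        score += 2.0  # exact string match
    if anaphor.lower() in PRONOUNS and antecedent.lower() not in PRONOUNS:
        score += 1.0  # a pronoun usually points back to a full noun phrase
    return score


def score_all_pairs(mentions: List[str]) -> Dict[Tuple[int, int], float]:
    """Score every (earlier j, later i) pair: n * (n - 1) / 2 pairs for n mentions."""
    return {(j, i): toy_pair_score(mentions[j], mentions[i])
            for j, i in combinations(range(len(mentions)), 2)}


mentions = ["The woman", "a newspaper", "the bench", "her"]
for (j, i), s in score_all_pairs(mentions).items():
    print(f"{mentions[j]!r} <- {mentions[i]!r}: {s}")
```

All three candidates for "her" get the same toy score, since nothing here tells a woman apart from a newspaper; that gap is exactly what a learned scorer has to close.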

[Figure 2: scoring mention pairs]

After scoring, the mentions are clustered as shown below, and these clusters are the output of coreference resolution.

[Figure 3: clustering result after scoring]
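One simple way to turn pairwise scores into clusters, swapped in here for illustration (the slides' exact procedure may differ), is to keep every pair whose score passes a threshold and take the transitive closure with union-find. The end-to-end model discussed next works differently: each span picks a single best antecedent (or none), and clusters are formed by following those links. The scores and threshold below are hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_from_pairs(num_mentions: int,
                       pair_scores: Dict[Tuple[int, int], float],
                       threshold: float) -> List[List[int]]:
    """Link every pair scoring above `threshold`, then return the connected
    components (transitive closure) that contain more than one mention."""
    parent = list(range(num_mentions))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score > threshold:
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for m in range(num_mentions):
        groups.setdefault(find(m), []).append(m)
    return [g for g in groups.values() if len(g) > 1]  # drop singletons


# Hypothetical scores for mentions 0..4.
scores = {(0, 1): 0.9, (1, 3): 0.7, (2, 4): 0.8, (0, 2): 0.1}
print(cluster_from_pairs(5, scores, threshold=0.5))  # [[0, 1, 3], [2, 4]]
```

Mentions 0 and 3 are never scored against each other directly, yet they land in the same cluster through mention 1.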

1. The Paper's Method

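AllenNLP integrates the coreference algorithm from EMNLP 2017, End-to-end Neural Coreference Resolution (Lee et al., 2017). The problem it tackles is the pairing step above: the number of pairs grows very quickly with document length (as the fourth power of the document length if every span is paired with every other span). The model therefore uses pruning strategies to cut down the number of pairs, which makes it faster, and the paper reports better accuracy as well.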

Scoring all span pairs in our end-to-end model is impractical, since the complexity would be quartic in the document length. Therefore we factor the model over unary mention scores and pairwise antecedent scores, both of which are simple functions of the learned span embedding. The unary mention scores are used to prune the space of spans and antecedents, to aggressively reduce the number of pairwise computations.
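The quartic blow-up and the effect of pruning are easy to check by counting. A minimal sketch under illustrative assumptions: a maximum span width of 10 and keeping 0.4·T spans are assumed values, and the cap of 100 antecedents comes from the AllenNLP description above. The mention scores themselves are not computed; the sketch only counts how many pairs would remain. All function names are hypothetical.

```python
from typing import Optional


def num_spans(T: int, max_width: Optional[int] = None) -> int:
    """Contiguous spans in a T-token document, optionally at most max_width tokens long."""
    L = T if max_width is None else min(max_width, T)
    # A span of width w can start at T - w + 1 different positions.
    return sum(T - w + 1 for w in range(1, L + 1))


def pairs_unpruned(T: int) -> int:
    """Every span paired with every other span: roughly T**4 / 8 pairs."""
    n = num_spans(T)
    return n * (n - 1) // 2


def pairs_pruned(T: int, max_width: int, ratio: float, max_antecedents: int) -> int:
    """Keep at most ratio * T spans (as if chosen by mention score) among spans of
    width <= max_width; each kept span considers at most max_antecedents earlier ones."""
    kept = min(int(ratio * T), num_spans(T, max_width))
    return sum(min(i, max_antecedents) for i in range(kept))


for T in (100, 500, 1000):
    print(T, pairs_unpruned(T), pairs_pruned(T, max_width=10, ratio=0.4, max_antecedents=100))
```

For a 1,000-token document, pairing all spans gives about 1.25 × 10^11 pairs, while the pruned setting leaves 34,950.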

Its architecture is shown below:

[Figure 4: model architecture]

[Figure 5: model architecture (continued)]

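The model works in two steps. The input is word embeddings (together with character-level embeddings). The first step computes a representation and a mention score for each candidate span; the second step scores pairs of spans to choose antecedents. A head-finding attention mechanism over the tokens of each span improves the span representations used in this pairing.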
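The two steps can be sketched in plain Python on toy numbers. This is a simplified illustration, not AllenNLP's code: the learned feed-forward scorers are replaced by a dot product with a hypothetical weight vector `w_alpha` and by hand-set scores, and the span-width feature is omitted. It shows the attention-weighted "head" vector inside a span embedding, and the factorization quoted above: the score for span i taking antecedent j is s_m(i) + s_m(j) + s_a(i, j), with a dummy "no antecedent" option assumed to have a fixed score of 0.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def span_embedding(states: List[Vector], start: int, end: int, w_alpha: Vector) -> Vector:
    """[x_start; x_end; attention-weighted 'head' vector] for the inclusive span.
    A dot product with w_alpha stands in for the learned feed-forward scorer."""
    span = states[start:end + 1]
    weights = softmax([dot(w_alpha, x) for x in span])  # attention over the span's tokens
    head = [sum(w * x[d] for w, x in zip(weights, span)) for d in range(len(span[0]))]
    return states[start] + states[end] + head


def antecedent_probs(mention_scores: List[float], i: int, candidates: List[int],
                     pair_score: Callable[[int, int], float]) -> List[float]:
    """Distribution over [dummy, *candidates] for span i, with
    s(i, j) = s_m(i) + s_m(j) + s_a(i, j) and a fixed score of 0 for the dummy."""
    scores = [0.0] + [mention_scores[i] + mention_scores[j] + pair_score(i, j)
                      for j in candidates]
    return softmax(scores)


def toy_pair_score(i: int, j: int) -> float:
    """Hypothetical s_a: favours linking span 2 to span 0."""
    return 0.5 if (i, j) == (2, 0) else -2.0


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy contextual token vectors
print(span_embedding(states, 0, 2, w_alpha=[2.0, 0.0]))

mention_scores = [2.0, -1.0, 1.5]  # toy s_m values for three candidate spans
print(antecedent_probs(mention_scores, 2, [1, 0], toy_pair_score))
```

Because unary mention scores are shared across all pairs, a span with a low s_m drags down every pair it appears in, which is what lets the model prune spans before scoring pairs.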


2. Trying It Out

(1) Test example: The woman reading a newspaper sat on the bench with her dog.

[Figure 6: model output for test example (1)]
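The output reports each cluster as a list of token-index spans whose end index is inclusive, which is how [0-4] and [10] should be read. A minimal sketch for mapping such spans back to text, assuming whitespace-style tokenization with the final period as its own token and a `clusters` list shaped like the spans in the output; `cluster_texts` is a hypothetical helper.

```python
from typing import List


def cluster_texts(tokens: List[str], clusters: List[List[List[int]]]) -> List[List[str]]:
    """Map each [start, end] span (end inclusive) back to its surface text."""
    return [[" ".join(tokens[s:e + 1]) for s, e in cluster] for cluster in clusters]


tokens = "The woman reading a newspaper sat on the bench with her dog .".split()
clusters = [[[0, 4], [10, 10]]]  # the spans [0-4] and [10] from the output
print(cluster_texts(tokens, clusters))
# [['The woman reading a newspaper', 'her']]
```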

The output contains a single cluster made up of the spans [0-4] and [10] (token indices, end inclusive), that is:

  • The woman reading a newspaper, her

(2) Test example:

Xuming Zhang , Chairman of the Chinese Enterprise Association in Macau said that , at present there were more than 200 enterprises operating with Chinese capital in Macau and that the total value of assets is more than 90 billion patacas . Chinese capital enterprises have become the biggest foreign investors in Macau . Xuming Zhang recently said at the joint meeting for the fifth anniversary of the establishment of the Chinese Enterprise Association in Macau , that Macau 's inland investment enterprises , from small to large and from weak to strong , have developed into an important force in Macau 's economic domain . They have made important contributions to the prosperity and stability of Macau . According to presentations , these enterprises have extensively taken part in many areas of operating activities such as trade , industry , finance , insurance , tourism , catering , traffic and transportation , construction , real estate , etc. in Macau . Among these , the proportion that Chinese capital accounts for in financial insurance has reached 50 % . It accounts for from 50 % to 70 % of the tourism industry , accounts for 30 % of imports and exports , and accounts for 70 % of real estate . Xuming Zhang expressed that enterprises operating with Chinese capital in Macau will continue to take the direction of Xiaoping Deng 's program of one country , two systems and of all national guidelines and policies for Hong Kong and Macau , to adhere to the principle of Some to do and some not to do , and to strive together with local figures in industrial and commercial circles to make more contributions to promoting Macau 's economic stability and social development .

[Figure 7: model output for test example (2)]

The test results are visualized on the web demo page:

[Figure 8: web demo visualization of the test results]
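The web page highlights the mentions of each cluster directly in the text. A plain-text analogue, assumed here for illustration and not the demo's actual rendering code, brackets each mention and tags it with its cluster id. It takes the same inclusive [start, end] spans as the output above; `annotate` is a hypothetical helper.

```python
from typing import List, Tuple


def annotate(tokens: List[str], clusters: List[List[List[int]]]) -> str:
    """Render mentions inline as [ ... ]_k, k being the cluster id.
    Spans use inclusive [start, end] token indices; nested spans are
    closed innermost-first so the brackets stay balanced."""
    opens = [0] * len(tokens)
    closes: List[List[Tuple[int, str]]] = [[] for _ in tokens]
    for k, cluster in enumerate(clusters):
        for start, end in cluster:
            opens[start] += 1
            closes[end].append((end - start, f"]_{k}"))
    out = []
    for i, tok in enumerate(tokens):
        tail = "".join(mark for _, mark in sorted(closes[i]))  # shortest span closes first
        out.append("[" * opens[i] + tok + tail)
    return " ".join(out)


tokens = "The woman reading a newspaper sat on the bench with her dog .".split()
print(annotate(tokens, [[[0, 4], [10, 10]]]))
# [The woman reading a newspaper]_0 sat on the bench with [her]_0 dog .
```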
