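Relation Extraction: The SemEval-2010 Task 8 Dataset

Task Description

For details on SemEval-2010 Task 8, refer to the official documentation.

Task:

Given a sentence and two annotated nominals in it, choose the most suitable relation from the given relation inventory.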
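This sketch makes the input side of the task concrete. It assumes the two nominals are marked inline with `<e1>...</e1>` and `<e2>...</e2>` tags, which is consistent with the e1/e2 labels used throughout this post. The function name `extract_nominals` is hypothetical and is not part of any official tooling.

```python
import re
from typing import Tuple

# Assumed markup: each nominal is wrapped in <e1>...</e1> or <e2>...</e2>.
MARKER = re.compile(r"<(e[12])>(.*?)</\1>")


def extract_nominals(sentence: str) -> Tuple[str, str, str]:
    """Return (plain_sentence, e1_text, e2_text) for a marked-up sentence."""
    matches = MARKER.findall(sentence)  # list of (tag, text) pairs
    if sorted(tag for tag, _ in matches) != ["e1", "e2"]:
        raise ValueError("expected exactly one <e1> span and one <e2> span")
    nominals = dict(matches)
    # Strip the tags but keep the nominal text in place.
    plain = MARKER.sub(lambda m: m.group(2), sentence)
    return plain, nominals["e1"], nominals["e2"]


plain, e1, e2 = extract_nominals("The <e1>apples</e1> are in the <e2>basket</e2>.")
print(plain, "|", e1, "|", e2)  # The apples are in the basket. | apples | basket
```

A classifier then receives the plain sentence together with the two nominals and predicts one label from the inventory below.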

The relation inventory (9 relations + Other) is as follows:

Relation    Definition    Example

Cause-Effect

(causal relation)

Cause-Effect(X, Y) is true for a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails that X is the cause of Y, or that X causes/makes/produces/emits/... Y.

"A person infected with a particular flu virus strain develops an antibody against that virus."

Cause-Effect(e2, e1)

Comment: flu is a state, virus is the causal agent, thus (a) is satisfied; the virus is actively involved in causing flu and thus (c) is satisfied.
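A label such as Cause-Effect(e2, e1) means that e2 fills the X slot and e1 fills the Y slot of Cause-Effect(X, Y). The sketch below maps a label onto the two nominals. It accepts both spellings that appear in this post, with and without a space after the comma. The markup for the flu sentence is not shown here, so the usage line assumes e1 = "flu" and e2 = "virus". That assumption is consistent with the comment above. `resolve_arguments` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches labels such as "Cause-Effect(e2, e1)" or "Cause-Effect(e2,e1)".
LABEL = re.compile(r"^([A-Za-z]+-[A-Za-z]+)\((e[12]),\s*(e[12])\)$")


def resolve_arguments(label: str, e1: str, e2: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Map a directional label onto (relation, X, Y) using the two nominals."""
    label = label.strip()
    if label == "Other":
        return "Other", None, None  # Other carries no argument order
    m = LABEL.match(label)
    if m is None or m.group(2) == m.group(3):
        raise ValueError(f"unrecognized label: {label!r}")
    relation, first, second = m.groups()
    nominals = {"e1": e1, "e2": e2}
    # The first slot is X and the second slot is Y in Relation(X, Y).
    return relation, nominals[first], nominals[second]


# Hypothetical markup: e1 = "flu", e2 = "virus".
print(resolve_arguments("Cause-Effect(e2, e1)", "flu", "virus"))
# ('Cause-Effect', 'virus', 'flu')
```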

Instrument-Agency

Instrument-Agency(X, Y) is true of a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that X is the instrument (tool) of Y or, equivalently, that Y uses X.

Product-Producer

(the relation between a product and its producer)

Product-Producer(X, Y) is true for a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that X is a product of Y, or Y produces X.

"The honey bee is the third insect genome published by scientists, after a lab workhorse, the fruit fly, and a health menace, the mosquito."

Product-Producer(e1, e2)

Comment: This is a typical example of Product-Producer. Honey is a tangible concrete object (c), and the bee is actively involved in producing it (a).

Content-Container

Content-Container(X, Y) is true for a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails that X is or was (usually temporarily) stored or carried inside Y.

"The apples are in the basket."

Content-Container(e1, e2)

Comment: This is a prototypical example of Content-Container.

Entity-Origin

Entity-Origin(X, Y) is true for a sentence S that mentions the entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails that Y is the origin of an entity X (rather than its location), and X is coming or derived from that origin.

"Under state law, minors are not permitted to have grain alcohol, even if a parent provides it to their children."

Entity-Origin(e2, e1)

Comment: This is a prototypical example of a material Entity-Origin relation. Restriction (b.4) applies.

Entity-Destination

Entity-Destination(X, Y) is true for a sentence S that mentions the entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that Y is the destination of X in the sense of X moving (in a physical or abstract sense) toward Y.

"The boy ran into the school cafeteria."

Entity-Destination(e1, e2)

Comment: school cafeteria is a spatial/geographical destination.

Component-Whole

Component-Whole(X, Y) is true for a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails that X is a component of Y;

(3) X has a functional relation with Y. In other words, X has an operating or usable purpose within Y.

"We don't need Einstein's quantum mechanics to understand why each hand has 5 fingers, and not 4 or 6."

Component-Whole(e2, e1)

Comment: Fingers are functional, integral parts of the hand.

Member-Collection

Member-Collection(X, Y) is true for a sentence S that mentions entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that X is a member of Y.

"Italian playing cards most commonly consist of a deck of 40 cards."

Member-Collection(e2, e1)

Comment: A deck is a collection of cards, cards are different and separable from the deck, not functional to the deck.

Message-Topic

Message-Topic(X, Y) is true for a sentence S that mentions the entities X and Y if and only if:

(1) S, X and Y are in accordance with the general annotation guidelines (http://docs.google.com/Doc?docid=dfhkmm46_0f63mfvf7)

(2) the situation described in S entails the fact that X is a communicative message containing information about Y.

"The recommendations contained the following key points about the new politics of the government."

Message-Topic(e1, e2)

Comment: politics is the topic of the key points.

Other: when the entities in a sentence do not satisfy any of the nine relations above, the label is set to Other.
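Each of the nine relations is directional: Relation(X, Y) and Relation(Y, X) are distinct labels. Other has no direction. A classifier that must also choose the argument order therefore picks among 9 × 2 + 1 = 19 labels. The sketch below builds that label set. The "(e1,e2)" spelling without a space is an assumption, so a small normalizer is included to make "Cause-Effect(e2, e1)" and "Cause-Effect(e2,e1)" map to the same ID.

```python
from typing import Dict, List

RELATIONS = [
    "Cause-Effect", "Instrument-Agency", "Product-Producer",
    "Content-Container", "Entity-Origin", "Entity-Destination",
    "Component-Whole", "Member-Collection", "Message-Topic",
]


def build_label_inventory(relations: List[str] = RELATIONS) -> List[str]:
    """Expand each directional relation into both argument orders, then add Other."""
    labels = []
    for rel in relations:
        labels.append(f"{rel}(e1,e2)")
        labels.append(f"{rel}(e2,e1)")
    labels.append("Other")  # Other is undirected, so it appears only once
    return labels


def normalize(label: str) -> str:
    """Drop spaces so that "Cause-Effect(e2, e1)" and "Cause-Effect(e2,e1)" coincide."""
    return label.replace(" ", "")


LABELS = build_label_inventory()
LABEL_TO_ID: Dict[str, int] = {lab: i for i, lab in enumerate(LABELS)}
print(len(LABELS))  # 19
```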

The proportion of each relation class is shown in the figure below:

[Figure 1: proportion of each relation class in the dataset]
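You can reproduce proportions like those in the figure from a list of label strings. The sketch below merges the two directions of each relation by default. The toy labels in the usage line are invented for illustration and are not figures from the dataset. `class_proportions` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def class_proportions(labels: Iterable[str], merge_direction: bool = True) -> Dict[str, float]:
    """Return {class: fraction of examples}, most frequent class first."""
    if merge_direction:
        # "Cause-Effect(e2,e1)" -> "Cause-Effect"; "Other" is unchanged.
        labels = [lab.split("(")[0].strip() for lab in labels]
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}


# Toy labels for illustration only, not the real dataset.
toy = ["Cause-Effect(e2,e1)", "Cause-Effect(e1,e2)", "Other", "Message-Topic(e1,e2)"]
print(class_proportions(toy))
```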

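Datasets

  1. Trial Dataset: The trial dataset was released on August 30, 2009, and contains data for the first five relations. It also includes some mentions of the other four relations; in the trial data these can simply be treated as Other without further processing.
  2. Training Dataset: The training set contains 8,000 examples covering the 9+1 relations listed above.
  3. Development Dataset: No official development set is provided, but participants may use part of the training set to tune their parameters, for example with cross-validation.
  4. Test Dataset: The test set contains 2,717 examples covering the 9+1 relations listed above and was released on March 18, 2010.
  5. WordNet senses: Unlike SemEval-2007 Task 4, no manually annotated WordNet senses are provided, which makes the task more realistic.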
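No official development set exists, so one option is k-fold cross-validation on the training set. The sketch below is a plain-Python stratified splitter. It shuffles each class and deals it round-robin across the folds, so every fold keeps roughly the same label distribution. `stratified_k_fold` and its defaults are illustrative assumptions. If the dependency is acceptable, scikit-learn's `StratifiedKFold` does the same job.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[str], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, dev_indices) so each class is spread evenly over k folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    start = 0
    for lab in sorted(by_class):  # sorted for reproducibility
        idxs = by_class[lab]
        rng.shuffle(idxs)
        # Deal this class round-robin, continuing where the previous class stopped.
        for j, idx in enumerate(idxs):
            folds[(start + j) % k].append(idx)
        start = (start + len(idxs)) % k
    for f in range(k):
        dev = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, dev


labels = ["Other"] * 6 + ["Cause-Effect(e1,e2)"] * 4  # toy labels
for train, dev in stratified_k_fold(labels, k=2):
    print(len(train), len(dev))
```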

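SemEval-2010 Task 8 vs. SemEval-2007 Task 4

  • In 2007, each relation had its own dataset and a corresponding binary classification task; in 2010 there is only a single multi-class dataset.
  • The task is therefore a single multi-class classification task.
  • Candidate entities are still provided, but systems must determine which slot (X or Y) each entity fills in the relation.
  • WordNet senses and query strings are no longer provided.
  • The dataset is much larger (more than 10,000 annotated sentences: 8,000 for training plus 2,717 for testing).
  • The set of relations is also larger.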
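The first point contrasts 2007's per-relation binary tasks with 2010's single multi-class dataset. To see the difference, you can project the multi-class labels onto a yes/no question for one relation. This is only a rough illustration. It ignores direction and does not reproduce the actual 2007 data format. `binary_view` is a hypothetical helper.

```python
from typing import List


def binary_view(labels: List[str], relation: str) -> List[bool]:
    """True where the example's label is `relation`, in either direction."""
    return [lab.split("(")[0].strip() == relation for lab in labels]


toy = ["Cause-Effect(e2,e1)", "Other", "Cause-Effect(e1,e2)", "Entity-Origin(e1,e2)"]
print(binary_view(toy, "Cause-Effect"))  # [True, False, True, False]
```

In the 2010 setting, a single model makes one decision per sentence among all labels instead of running nine separate yes/no classifiers.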

Challenges

The relation inventory contains two groups of closely related relations:

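Group 1

  • Component-Whole
  • Member-Collection
  • Both are special cases of Part-Whole.

Group 2

  • Content-Container
  • Entity-Origin
  • Entity-Destination
  • These can be distinguished by whether the situation expressed is static (Content-Container: X is stored or carried inside Y) or dynamic (Entity-Origin and Entity-Destination: X comes from an origin or moves toward a destination).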
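One way to check whether these two groups really cause trouble for a model is to separate its errors into confusions inside a group and confusions across groups. In the sketch below, the group names "Part-Whole" and "Container/Origin/Destination" are labels chosen here for illustration. Direction is ignored, and relations outside the two groups map to themselves. `count_error_types` is a hypothetical helper.

```python
from collections import Counter
from typing import List

# The two confusable groups named above; other relations map to themselves.
CONFUSABLE_GROUPS = {
    "Component-Whole": "Part-Whole",
    "Member-Collection": "Part-Whole",
    "Content-Container": "Container/Origin/Destination",
    "Entity-Origin": "Container/Origin/Destination",
    "Entity-Destination": "Container/Origin/Destination",
}


def relation_name(label: str) -> str:
    """Strip the direction: "Entity-Origin(e1,e2)" -> "Entity-Origin"."""
    return label.split("(")[0].strip()


def group_of(label: str) -> str:
    name = relation_name(label)
    return CONFUSABLE_GROUPS.get(name, name)


def count_error_types(gold: List[str], pred: List[str]) -> Counter:
    """Count wrong relations, split into within-group and across-group errors."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    counts: Counter = Counter()
    for g, p in zip(gold, pred):
        if relation_name(g) == relation_name(p):
            continue  # correct relation (direction is not checked here)
        key = "within_group" if group_of(g) == group_of(p) else "across_group"
        counts[key] += 1
    return counts
```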

 
