The Stanford Natural Language Inference (SNLI) and Multi-Genre NLI Corpus (MultiNLI) datasets




https://nlp.stanford.edu/projects/snli/
https://www.nyu.edu/projects/bowman/multinli/
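MultiNLI is a successor to SNLI. It uses the same format and is of comparable size, but its text is more varied, and it also includes an additional test set for evaluating cross-genre transfer.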

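SNLI 1.0 contains 570,000 human-written English sentence pairs, manually annotated with a balanced set of classification labels: entailment, contradiction, and neutral.
It supports the natural language inference (NLI) task, also known as recognizing textual entailment (RTE).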

For a detailed description, see:
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).

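Besides the gold label, each record holds up to five individual annotator labels (label1 to label5). Each sentence is also given in two parse formats, a binary bracketing and a labeled constituency parse. The fields and one example record: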

gold_label sentence1_binary_parse sentence2_binary_parse sentence1_parse sentence2_parse sentence1 sentence2 captionID pairID label1 label2 label3 label4 label5
gold_label: neutral
sentence1_binary_parse: ( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )
sentence2_binary_parse: ( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )
sentence1_parse: (ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))
sentence2_parse: (ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .)))
sentence1: A person on a horse jumps over a broken down airplane.
sentence2: A person is training his horse for a competition.
captionID: 3416050480.jpg#4
pairID: 3416050480.jpg#4r1n
label1: neutral
label2 to label5: (empty)
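
The record above can be read with Python's standard csv module. This is only a minimal sketch: it assumes the fields are separated by tabs and that the first line is the header row shown above. The helper names read_snli and binary_parse_tokens are hypothetical, and the sample is built in-line from the example record rather than read from the distributed files.

```python
import csv
import io

# Field names copied from the header row above.
FIELDS = [
    "gold_label", "sentence1_binary_parse", "sentence2_binary_parse",
    "sentence1_parse", "sentence2_parse", "sentence1", "sentence2",
    "captionID", "pairID", "label1", "label2", "label3", "label4", "label5",
]

# The example record above, one list item per field (label2..label5 are empty).
ROW = [
    "neutral",
    "( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )",
    "( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )",
    "(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))",
    "(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .)))",
    "A person on a horse jumps over a broken down airplane.",
    "A person is training his horse for a competition.",
    "3416050480.jpg#4",
    "3416050480.jpg#4r1n",
    "neutral", "", "", "", "",
]

# Assumption: tab-separated text with a header line.
SAMPLE = "\t".join(FIELDS) + "\n" + "\t".join(ROW) + "\n"


def read_snli(stream):
    """Yield one dict per record from a tab-separated stream with a header."""
    # QUOTE_NONE keeps quote characters inside sentences as plain text.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Keep only the annotator labels that are actually filled in.
        labels = [row[f"label{i}"] for i in range(1, 6) if row[f"label{i}"]]
        yield {
            "premise": row["sentence1"],
            "hypothesis": row["sentence2"],
            "gold_label": row["gold_label"],
            "annotator_labels": labels,
        }


def binary_parse_tokens(parse):
    """Recover the token sequence from a space-separated binary parse."""
    return [tok for tok in parse.split() if tok not in ("(", ")")]


for example in read_snli(io.StringIO(SAMPLE)):
    print(example["gold_label"], "|", example["premise"], "|", example["hypothesis"])
print(binary_parse_tokens(ROW[1]))
```

To read a real file, an open file handle can be passed to read_snli in place of the io.StringIO object. Dropping the brackets from the binary parse is a convenient way to get a tokenized sentence without running a separate tokenizer.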


