Cora Dataset


The paper "Object Identification with Attribute-Mediated Dependences" gives the source of the Cora dataset:
http://www.cs.umass.edu/~mccallum/data/
The paper "A Pitfall and Solution in Multi-Class Feature Selection for Text Classification" provided the key hint: Cora has 6 top-level categories and 36 subcategories. That finally resolved the relevance problem.

(a) cora-refs.tar.gz dataset
Cora Citation Matching [reference matching, object correspondence]
Text of citations hand-clustered into groups referring to the same paper. 
(b) cora-ie.tar.gz dataset
Cora Information Extraction [information extraction]
Research paper headers and citations, with labeled segments for authors, title, institutions, venue, date, page numbers and several other fields.
(c) cora-classify.tar.gz dataset
Cora Research Paper Classification   [relational document classification]
Research papers classified into a topic hierarchy with 73 leaves. We call this a relational data set, because the citations provide relations among papers.
(d) cora-hmm.tar.gz
Cora HMM is the C implementation of HMMs used for information extraction in Cora. It was written by Kristie Seymore.
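
The four archives listed above could be fetched and unpacked with a short script. The following is only a minimal sketch: the base URL and file names are taken from the text, but the assumption that each archive sits directly under that URL has not been verified, and the helper names (`archive_url`, `extract_archive`, `fetch_and_extract`) and the `cora_data` directory are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL as given in the text. That each archive is served directly
# under this URL is an assumption, not something the source confirms.
BASE_URL = "http://www.cs.umass.edu/~mccallum/data/"

# Archive names as listed in items (a)-(d).
ARCHIVES = [
    "cora-refs.tar.gz",
    "cora-ie.tar.gz",
    "cora-classify.tar.gz",
    "cora-hmm.tar.gz",
]


def archive_url(name):
    """Build the assumed download URL for one archive."""
    return BASE_URL + name


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz file into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


def fetch_and_extract(name, dest_dir="cora_data"):
    """Download one archive (if not already present) and unpack it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    local = dest / name
    if not local.exists():
        urllib.request.urlretrieve(archive_url(name), local)
    # Unpack into a subdirectory named after the archive, e.g. cora_data/cora-refs
    return extract_archive(local, dest / name.replace(".tar.gz", ""))
```

Calling `fetch_and_extract("cora-classify.tar.gz")` would then download and unpack the classification data, provided the assumed URL is still valid. The layout of the files inside each archive is not described here, so inspect the returned member names before writing any parsing code.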
