Data Preprocessing: AMiner to JSON


1. Processing Requirements

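  The data processed here comes from the datasets linked on the official Extraction and Mining of Academic Social Networks site. This post uses "AMiner-Paper.txt" as an example. The raw data looks like this: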

[Figure 1: screenshot of the raw AMiner-Paper.txt data]

  The result after processing looks like this:

[Figure 2: screenshot of the converted JSON data]
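
As a rough illustration of the target shape, each paper becomes one dict and the whole file becomes a JSON array of such dicts. The keys below are the ones the script in the next section writes; the values are made up for illustration and are not taken from the dataset.

```python
import json

# One converted record; keys match the conversion script, values are invented
paper = {
    'index': '1',
    'title': 'An example title',
    'authors': 'A. Author;B. Author',
    'abstract': 'An example abstract.',
    'references': ['2', '3'],
}

# The output file holds a JSON array with one such object per paper
serialized = json.dumps([paper], indent=2)
print(serialized)
```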

2. Processing Code

# Convert AMiner data to JSON format
import json

# Paths of the input file and of the converted output file
inputPath = u"D:/DataSets/AMiner/AMiner-Paper.txt"
outputPath = u"D:/DataSets/AMiner/AMiner-Paper.json"

# Parse the data file
def format2josn(file):
    '''
    :param file: open text file in AMiner format
    :return: AMiner, a list with one dict per paper
    '''
    AMiner = []
    onePaper = []
    # Iterate over the file lazily instead of loading it all with readlines()
    for line in file:
        strLine = line.strip()
        onePaper.append(strLine)
        # The abstract line ('#!') is treated as the end of a record
        if strLine[0:2] != '#!':
            continue
        paper = {}
        refences = []
        for item in onePaper:
            # Cut off the tag and strip whitespace, so the value stays whole
            # however many spaces follow the tag
            if item[0:6] == '#index':
                paper['index'] = item[6:].strip()
            elif item[0:2] == '#*':
                paper['title'] = item[2:].strip()
            elif item[0:2] == '#@':
                paper['authors'] = item[2:].strip()
            elif item[0:2] == '#%':
                refences.append(item[2:].strip())
            elif item[0:2] == '#!':
                paper['abstract'] = item[2:].strip()
        paper['references'] = refences
        AMiner.append(paper)
        onePaper = []
    return AMiner

# Write the converted records to the output file
def store(AMiner):
    with open(outputPath, 'w', encoding='utf-8') as f:
        json.dump(AMiner, f)

if __name__ == "__main__":
    # Open the input file and close it automatically when done
    with open(inputPath, encoding='utf-8') as file:
        store(format2josn(file=file))
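
One limitation of treating the `#!` line as the end of a record is that a paper without an abstract line would be merged into the following record. Below is a minimal sketch of an alternative, assuming every record starts with an `#index` line. The names `iter_papers` and `TAGS` are hypothetical and not part of the script above, and the sample text is made up.

```python
import io

# Hypothetical mapping from AMiner line tags to output keys
TAGS = {'#index': 'index', '#*': 'title', '#@': 'authors', '#!': 'abstract'}

def iter_papers(lines):
    """Yield one dict per paper, starting a new record at each '#index' line."""
    paper = None
    for raw in lines:
        line = raw.strip()
        if line.startswith('#index'):
            if paper is not None:
                yield paper  # the previous record is complete
            paper = {'references': []}
        if paper is None:
            continue  # skip anything before the first record
        if line.startswith('#%'):
            ref = line[2:].strip()
            if ref:  # skip empty reference lines
                paper['references'].append(ref)
            continue
        for tag, key in TAGS.items():
            if line.startswith(tag):
                paper[key] = line[len(tag):].strip()
                break
    if paper is not None:
        yield paper  # emit the last record

# Made-up sample: the first record has no abstract line
sample = io.StringIO(
    "#index 1\n#* First title\n#@ A. Author;B. Author\n\n"
    "#index 2\n#* Second title\n#% 1\n#! Some abstract\n"
)
papers = list(iter_papers(sample))
print(len(papers))
```

Because `iter_papers` is a generator, records can also be written out one at a time instead of collecting the whole dataset in a list first.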


# Load the converted JSON data
import json
with open(outputPath, encoding='utf-8') as f:
    data = json.load(f)
print(data)
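
Once loaded, `data` is a plain list of dicts, so it can be indexed for lookups. The following is a small sketch, assuming the `authors` string separates names with `;` (an assumption about the dataset, not something stated above). It uses a made-up in-memory `data` list in place of the real file, and `split_authors` and `by_index` are hypothetical names.

```python
# Made-up stand-in for the list returned by json.load
data = [
    {'index': '1', 'title': 'First title', 'authors': 'A. Author;B. Author', 'references': []},
    {'index': '2', 'title': 'Second title', 'authors': 'C. Author', 'references': ['1', '99']},
]

def split_authors(paper):
    """Split the authors string into a list (assumes ';' as the separator)."""
    return [name.strip() for name in paper.get('authors', '').split(';') if name.strip()]

# Dict for looking papers up by their index value
by_index = {paper['index']: paper for paper in data}

# Resolve the references of paper '2' to titles, ignoring ids missing from the data
cited_titles = [by_index[ref]['title'] for ref in by_index['2']['references'] if ref in by_index]
print(split_authors(by_index['1']))
print(cited_titles)
```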
