读取json数据并嵌套读取值,保存到excel中。将句子进行jieba分词,保存到excel中

1.数据样式

{"source": "PMC", 
"date": "20140719", 
"key": "pmc.key", 
"infons": {}, 
"documents": [{"id": "555756", "infons": {}, 
                          "passages": [{"offset": 0, 
                                                "infons": {"name_3": "sunames:Seppo A",
                                                "text": "Gluten-free diet may alleviate depressive and behavioural symptoms in adolescents with coeliac disease: a prospective follow-up case-series study", "sentences": [],                                                 "annotations": [{"id": "MIC1", 
                                                                           "infons": {"type": "MeSH_Indexing_Chemical",                                                                                                "entry_term": "Amino Acids"}, 
                                                                           "text": "", "locations": []}, 
                                                                         {"id": "MIC2", 
                                                                           "infons": {"type":

读取json数据并嵌套读取值,保存到excel中。将句子进行jieba分词,保存到excel中_第1张图片

2.代码

主要提取text 和entry_term并且保存到数据库中,其中

import jieba
import json
import jsonpath


file_ = open('555756_v1.json')
text_ = json.load(file_ )
texteach = jsonpath.jsonpath(text_,"$.documents[0].passages[0].text") #读取句子内容
lenn_entry = jsonpath.jsonpath(text_,"$.documents[2].passages") 

eachentry = jsonpath.jsonpath(text_,"$.documents[0].passages[0].annotations[0].infons.entry_term") 

print("结果:",len(text_ ['documents'][0]['passages']))#输出实体的个数,便于遍历


-----------------------------------------------------------------------------------

import json
import jsonpath
import xlwt

file_ = open('555756_v1.json')
text_ = json.load(file_ )

# 创建一个workbook 设置编码
workbook = xlwt.Workbook(encoding = 'utf-8')
worksheet = workbook.add_sheet("my1")
eachentry1 = jsonpath.jsonpath(text_,"$.documents[0].passages[0].annotations[0].infons.entry_term") 
for i in range(len(text_ ['documents'][0]['passages'])):
    #文本
    w="$.documents[0].passages["+str(i)+"].text"
    texteach = jsonpath.jsonpath(text_,w) 
   
    #实体
    m="$.documents[0].passages["+str(i)+"].annotations[0].infons.entry_term"
    
    eachentry = jsonpath.jsonpath(text_,m) 
    worksheet.write(i, 0,eachentry)  # 第i行0列
    worksheet.write(i, 1, texteach) # 第i行1列
print("结果:ok")
# 保存
workbook.save('Excel_test.xls')


    

读取json数据并嵌套读取值,保存到excel中。将句子进行jieba分词,保存到excel中_第2张图片

3.结巴分词

cut_text = jieba.cut("Gluten-free diet may alleviate depressive and behavioural symptoms in adolescents with coeliac disease: a prospective follow-up case-series study")
result = " ".join(cut_text)
#print("句子:Gluten-free diet may alleviate depressive and behavioural symptoms in adolescents with coeliac disease: a prospective follow-up case-series study")
#print("结果:",result)

 

你可能感兴趣的:(数据分析)