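Extracting values from a deeply nested dictionary by key and index, especially in bulk, is awkward. The jsonpath module solves exactly this problem, so let's learn how to use it.
jsonpath extracts data from a Python dictionary in bulk by key, reaching directly into deeply nested structures.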
Without jsonpath, a deeply nested value has to be reached one key at a time. Example code:
data = {'key1': {'key2': {'key3': {'key4': {'key5': {'key6': 'python'}}}}}}
print(data)
print(data['key1']['key2']['key3']['key4']['key5']['key6'])
Output:
{'key1': {'key2': {'key3': {'key4': {'key5': {'key6': 'python'}}}}}}
python
jsonpath is a third-party module and has to be installed separately:
pip install jsonpath
Basic usage:
from jsonpath import jsonpath
ret = jsonpath(a, 'jsonpath expression string')  # a is the dictionary to query
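jsonpath() returns a list of every match, or False when nothing matches, so code that needs a single value usually guards the result before indexing it. Below is a minimal sketch of such a guard; first_or_default is a hypothetical helper name, and the calls pass literal stand-ins for jsonpath() results so the snippet runs without the package installed.

```python
def first_or_default(result, default=None):
    """Return the first match of a jsonpath() result, or `default`.

    Assumes the jsonpath package's convention: a list of matches on
    success, False when the expression matches nothing.
    """
    if not result:  # False (no match) or an empty list
        return default
    return result[0]


# Literal stand-ins for jsonpath() return values:
print(first_or_default(['python']))        # python
print(first_or_default(False))             # None
print(first_or_default(False, 'missing'))  # missing
```

In real code it would wrap the call directly, e.g. value = first_or_default(jsonpath(data, '$..key6')).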
Example code, extracting the same value with jsonpath:
from jsonpath import jsonpath
data = {'key1': {'key2': {'key3': {'key4': {'key5': {'key6': 'python'}}}}}}
print(data)
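# Plain indexing, for comparison
print(data['key1']['key2']['key3']['key4']['key5']['key6'])
# jsonpath returns a list of matches, so index it to get the value itself
print(jsonpath(data, '$.key1.key2.key3.key4.key5.key6'))
print(jsonpath(data, '$.key1.key2.key3.key4.key5.key6')[0])
# '..' searches at any depth, so only the last key is needed
print(jsonpath(data, '$..key6')[0])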
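Output:
{'key1': {'key2': {'key3': {'key4': {'key5': {'key6': 'python'}}}}}}
python
['python']
python
python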
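The '..' operator is recursive descent: it looks for the key in the current node and then in every descendant. Here is a minimal plain-Python sketch of what $..key6 does. The function name find_all is hypothetical, this is not the jsonpath package's implementation, and unlike jsonpath() it returns an empty list rather than False when nothing matches.

```python
def find_all(obj, key):
    """Collect values stored under `key` at any depth, like '$..key'.

    The current node is checked before its children, so matches come
    out in depth-first (pre-order) order.
    """
    found = []
    if isinstance(obj, dict):
        if key in obj:
            found.append(obj[key])
        for value in obj.values():
            found.extend(find_all(value, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_all(item, key))
    return found


data = {'key1': {'key2': {'key3': {'key4': {'key5': {'key6': 'python'}}}}}}
print(find_all(data, 'key6'))     # ['python']
print(find_all(data, 'missing'))  # []
```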
The next example uses the bookstore document that is commonly used to illustrate JSONPath:
from jsonpath import jsonpath
book_dict = {
    "store": {
        "book": [
            {
                "category": "reference",
                "author": "Nigel Rees",
                "title": "Sayings of the Century",
                "price": 8.95
            },
            {
                "category": "fiction",
                "author": "Evelyn Waugh",
                "title": "Sword of Honour",
                "price": 12.99
            },
            {
                "category": "fiction",
                "author": "Herman Melville",
                "title": "Moby Dick",
                "isbn": "0-553-21311-3",
                "price": 8.99
            },
            {
                "category": "fiction",
                "author": "J. R. R. Tolkien",
                "title": "The Lord of the Rings",
                "isbn": "0-395-19395-8",
                "price": 22.99
            }
        ],
        "bicycle": {
            "color": "red",
            "price": 19.95
        }
    }
}
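print(jsonpath(book_dict, '$..author'))         # every author, at any depth
print(jsonpath(book_dict, '$..prince'))         # misspelled key: no match, prints False
print(jsonpath(book_dict, '$..price'))          # all prices: the four books and the bicycle
print(jsonpath(book_dict, '$..book..price'))    # only the prices found under "book"
print(jsonpath(book_dict, '$..book'))           # a one-element list holding the whole book list
print(jsonpath(book_dict, '$..bicycle.color'))  # ['red']
print(jsonpath(book_dict, '$..color'))          # ['red'], the only "color" key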
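Note: a query such as jsonpath(book_dict, '$..author') returns a list of every match, with one element if there is a single match and all of them if there are several. If nothing matches, jsonpath returns False instead of a list; printing that is harmless, but indexing it (for example with [0]) raises a TypeError.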
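To see what the bookstore queries select, it can help to write the same selections in plain Python. The sketch below uses a trimmed copy of book_dict (two books plus the bicycle) and ordinary comprehensions; it illustrates the results, not how jsonpath works internally.

```python
# Trimmed copy of book_dict: two books and the bicycle
book_dict = {
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees",
             "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Herman Melville",
             "title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99},
        ],
        "bicycle": {"color": "red", "price": 19.95},
    }
}

store = book_dict["store"]

# $..author: every author (here they all live in the book list)
authors = [book["author"] for book in store["book"]]

# $..book..price: prices below "book" only
book_prices = [book["price"] for book in store["book"]]

# $..price: book prices plus the bicycle's price
all_prices = book_prices + [store["bicycle"]["price"]]

# $..isbn: only the books that actually have an isbn key
isbns = [book["isbn"] for book in store["book"] if "isbn" in book]

# $..bicycle.color
bicycle_color = [store["bicycle"]["color"]]

print(authors)        # ['Nigel Rees', 'Herman Melville']
print(book_prices)    # [8.95, 8.99]
print(all_prices)     # [8.95, 8.99, 19.95]
print(isbns)          # ['0-553-21311-3']
print(bicycle_color)  # ['red']
```

These comprehensions only work because the shape of the data is hard-coded into them; the '..' expressions find the same values without that knowledge.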
Let's take Lagou's city JSON file http://www.lagou.com/lbs/getAllCitySearchLabels.json as an example: get a list of all city names and write it to a file.
Reference code:
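import json

import jsonpath
import requests

# Fetch Lagou's city JSON string
url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json'
headers = {
    "User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
html_str = response.content.decode()
print(html_str)

# Convert the JSON string into a Python object
jsonobj = json.loads(html_str)
print(jsonobj)

# Starting from the root node, collect every value whose key is "name"
citylist = jsonpath.jsonpath(jsonobj, '$..name')
if not citylist:  # jsonpath returns False when nothing matches
    citylist = []
print(citylist)

# Write the list to a file (the file name is only an example)
with open('city_names.json', 'w', encoding='utf-8') as f:
    json.dump(citylist, f, ensure_ascii=False)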
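The task asks for the city names to be written to a file. Lagou's city names are Chinese, and json.dumps escapes non-ASCII characters by default, which makes the saved file hard to read. The sketch below uses two hard-coded sample names (not real API output) to compare the two ensure_ascii settings, then round-trips the list through a temporary file.

```python
import json
import os
import tempfile

names = ["北京", "上海"]  # sample values, not fetched from Lagou

print(json.dumps(names))                      # ["\u5317\u4eac", "\u4e0a\u6d77"]
print(json.dumps(names, ensure_ascii=False))  # ["北京", "上海"]

# Write with UTF-8 and read back to confirm nothing is lost
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(names, f, ensure_ascii=False)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
finally:
    os.remove(path)

print(loaded == names)  # True
```

Both forms load back to the same list; ensure_ascii=False only changes how readable the file is, which is why it pairs with an explicit encoding="utf-8" when opening the file.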