07 - Data Extraction - jsonpath

jsonpath is used to parse JSON data with multiple levels of nesting.
jsonpath official documentation

Installation
pip install jsonpath
Syntax
JSONPath    Description
$           Root node
. or []     Child node
..          Recursive descent: selects all matching nodes, regardless of position
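To make the three operators concrete, here is a minimal pure-Python sketch of what they do conceptually. It is not how the jsonpath package is implemented; the helpers `child` and `descend` and the small `data` dictionary are hypothetical names made up for illustration.

```python
def child(node, key):
    """'.key' or "['key']": the direct child of `node` named `key`, as a list."""
    if isinstance(node, dict) and key in node:
        return [node[key]]
    return []


def descend(node, key):
    """'..key': every value stored under `key` at any depth below `node`."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep searching inside the value, whatever its key was
            found.extend(descend(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(descend(item, key))
    return found


# Hypothetical sample data, smaller than a real document
data = {"store": {"book": [{"price": 8.95}, {"price": 12.99}],
                  "bicycle": {"price": 19.95}}}

root = data                                          # $
store = child(root, "store")[0]                      # $.store
print(child(child(store, "bicycle")[0], "price"))    # $.store.bicycle.price
print(descend(root, "price"))                        # $..price
```

The child operator only looks one level down, so every intermediate key must be spelled out, while recursive descent finds the key wherever it occurs.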
Usage

The root node of a dictionary is its outermost pair of braces.
jsonpath() returns a list of results, or False when nothing matches.

import jsonpath

dict_data = { "store": {
    "book": [
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}

print(jsonpath.jsonpath(dict_data, "$.store.bicycle.price"))
# >> [19.95]
print(jsonpath.jsonpath(dict_data, "$..price"))
# >> [8.95, 12.99, 8.99, 22.99, 19.95]
Exercise

Scrape the European and American film data listed under bilibili's movie category.

import json
import jsonpath
import requests

url = "https://api.bilibili.com/archive_rank/getarchiverankbypartion?jsonp=jsonp&tid=145&pn=1"

headers = {"User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Mobile Safari/537.36"}

response = requests.get(url, headers=headers)

# Parse the JSON response body into a Python dict
html_dict = json.loads(response.content)

# jsonpath() returns False when nothing matches, so fall back to an empty list
movies = jsonpath.jsonpath(html_dict, "$..data..archives..title") or []

# Open the file once and append one title per line
with open("bilibili.txt", "a", encoding="utf-8") as f:
    for title in movies:
        f.write(title + "\n")
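The extraction and file-writing steps can be tried without a network request. The sketch below is only an illustration under stated assumptions: the shape of `SAMPLE` is inferred from the path "$..data..archives..title", the titles are made-up placeholders rather than real API output, and `extract_titles` and `write_titles` are hypothetical helpers that use plain dictionary access instead of jsonpath.

```python
import io
import json

# Hypothetical payload: shape inferred from "$..data..archives..title";
# the titles are placeholders, not real API output.
SAMPLE = '{"data": {"archives": [{"title": "Movie A"}, {"title": "Movie B"}]}}'


def extract_titles(payload):
    """Collect archive titles, returning [] if the expected keys are missing."""
    archives = payload.get("data", {}).get("archives", [])
    return [item["title"] for item in archives if "title" in item]


def write_titles(titles, fp):
    """Write one title per line to an already-open text file object."""
    for title in titles:
        fp.write(title + "\n")


titles = extract_titles(json.loads(SAMPLE))
buffer = io.StringIO()  # stands in for open("bilibili.txt", "a", encoding="utf-8")
write_titles(titles, buffer)
print(buffer.getvalue(), end="")
```

Passing an open file object into `write_titles` keeps the file handling in one place, so the same function works with a real file or an in-memory buffer.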
