Summary of Python Web-Scraping Parsing Tools: JSONPath

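Symbol meanings

  • $ root element
  • @ current element
  • . child operator (dot notation)
  • [] child operator (bracket notation), also used for array subscripts
  • .. recursive descent: searches all descendants
  • * wildcard: matches every element
  • ?() filter expression
  • () script expression, evaluated by the underlying script engine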
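
To make the `..` operator more concrete, here is a minimal plain-Python sketch of what a recursive search such as `$..author` does conceptually. The helper `find_key` and the small `doc` sample are hypothetical names introduced only for illustration; this is not how the `jsonpath` package is implemented, and the order of results here is simply this sketch's own traversal order.

```python
def find_key(node, key):
    """Yield every value stored under `key` at any depth, similar in spirit to `$..key`."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key may appear deeper in the tree
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)
    # Scalars (str, int, float, None) have no children, so nothing is yielded


# Hypothetical sample, a trimmed version of the store document used in this post
doc = {
    "store": {
        "book": [
            {"author": "Nigel Rees", "price": 8.95},
            {"author": "Evelyn Waugh", "price": 12.99},
        ],
        "bicycle": {"color": "red", "price": 19.95, "author": "test"},
    }
}

authors = list(find_key(doc, "author"))          # roughly '$..author'
prices = list(find_key(doc["store"], "price"))   # roughly '$.store..price'
print(authors)
print(prices)
```

The point of the sketch is that `..` looks at every level below the starting element, which is why the bicycle's `author` is picked up along with the books' authors.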

Example code

import jsonpath
import json


s = """
{ "store": {
    "book": [ 
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95,
      "author": "test"
    }
  }
}
"""
# Parse the JSON string into a dict
data = json.loads(s)
# Extract data; uncomment one expression at a time to try it
# res = jsonpath.jsonpath(data, '$.store.book[*].author')  # authors of all books in store (a list)
# res = jsonpath.jsonpath(data, '$..author')  # every author, at any depth
# res = jsonpath.jsonpath(data, '$.store.*')  # everything directly under store: the book list and the bicycle object
# res = jsonpath.jsonpath(data, '$.store..price')  # every price inside store
# res = jsonpath.jsonpath(data, '$..book[2]')  # the third book (indexes start at 0)
# res = jsonpath.jsonpath(data, '$..book[(@.length-1)]')  # the last book, via a script expression; @ is the current element and @.length is its length
# res = jsonpath.jsonpath(data, '$..book[-1:]')  # the last book, via a slice
# res = jsonpath.jsonpath(data, '$..book[0,1]')  # the first two books, via an index union (not a slice)
# res = jsonpath.jsonpath(data, '$..book[?(@.isbn)]')  # filter: books that have an isbn; @.isbn is the current element's isbn field
# res = jsonpath.jsonpath(data, '$..book[?(@.price<10)]')  # filter: books whose price is below 10
res = jsonpath.jsonpath(data, '$..*')  # every member in the structure
print(res)
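
As a way to sanity-check what the filter, slice and union expressions above select, the same selections can be written in plain Python. This is only a sketch for comparison: the `books` list repeats the titles, isbn values and prices from the sample document, and `or_empty` is a hypothetical helper. It rests on the assumption that the `jsonpath` package returns `False` rather than an empty list when nothing matches, which is worth verifying against the version you have installed.

```python
books = [
    {"title": "Sayings of the Century", "price": 8.95},
    {"title": "Sword of Honour", "price": 12.99},
    {"title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99},
    {"title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99},
]

# Counterpart of '$..book[?(@.isbn)]', approximated here as "the key is present"
with_isbn = [b for b in books if "isbn" in b]

# Counterpart of '$..book[?(@.price<10)]'
cheap = [b for b in books if b["price"] < 10]

# Counterpart of '$..book[-1:]': a slice, so the result is still a list
last = books[-1:]

# Counterpart of '$..book[0,1]': a union of two explicit indexes
first_two = [books[i] for i in (0, 1)]


def or_empty(result):
    """Turn a falsy no-match result into an empty list.

    Assumption (not confirmed by the original post): jsonpath.jsonpath()
    returns False when the expression matches nothing.
    """
    return result if result else []


print([b["title"] for b in with_isbn])
print([b["title"] for b in cheap])
print(last[0]["title"])
```

Wrapping a call as `or_empty(jsonpath.jsonpath(data, expr))` would let later code loop over the result without a separate check for the no-match case.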

Reference

  • http://goessner.net/articles/JsonPath/
