Python Web Scraping: Installing and Using the JsonPath Parsing Library

Contents

1. Installing JsonPath

2. How JsonPath differs from XPath

3. Example: Taopiaopiao (淘票票)


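1. Installing JsonPath

JsonPath is a query syntax for extracting values from JSON data.

Open CMD, change to the Scripts directory of your Python installation, and run: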

pip install jsonpath
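
To confirm that the install succeeded, one option is to ask Python whether the module can be found. This is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once `pip install jsonpath` has completed for this interpreter
print("jsonpath installed:", is_installed("jsonpath"))
```

If this prints False, pip probably installed the package into a different Python installation than the one running the script.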

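2. How JsonPath differs from XPath

In this post, JsonPath is applied to JSON loaded from a local file, while XPath is applied both to local files and to server responses. Strictly speaking, jsonpath.jsonpath() takes an already parsed Python object (a dict or list), so the local file is a convenience rather than a requirement. Reference article: http://blog.csdn.net/luxideyao/article/details/77802389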
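
A minimal sketch of querying JSON text held in memory, with no intermediate file: json.loads turns the string into a Python object, and jsonpath.jsonpath() accepts that object. The response_text value is a made-up stand-in for a server response, and the import guard is only there so the snippet still runs where the package is missing.

```python
import json

try:
    import jsonpath  # third-party; installed with `pip install jsonpath`
except ImportError:
    jsonpath = None  # assumption: allow the sketch to run without the package

# Hypothetical response body; in a crawler this would come from
# response.read().decode('utf-8')
response_text = '{"store": {"bicycle": {"author": "老马", "price": 19.95}}}'

# json.loads parses a string, json.load parses an open file object
obj = json.loads(response_text)

if jsonpath is not None:
    # jsonpath.jsonpath returns False when nothing matches,
    # so `or []` turns that case into an empty list
    authors = jsonpath.jsonpath(obj, '$..author') or []
    print(authors)
```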

Write the JSON file (saved as 4.json, the name the code below opens):

{ "store": {
    "book": [
      { "category": "修真",
        "author": "六道",
        "title": "坏蛋是怎样练成的",
        "price": 8.95
      },
      { "category": "修真",
        "author": "天蚕土豆",
        "title": "斗破苍穹",
        "price": 12.99
      },
      { "category": "修真",
        "author": "唐家三少",
        "title": "斗罗大陆",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "修真",
        "author": "南派三叔",
        "title": "星辰变",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "author": "老马",
      "color": "黑色",
      "price": 19.95
    }
  }
}

Write the Python code:

import jsonpath
import json

# Load the sample file; the with block closes it automatically
with open('4.json', 'r', encoding='utf-8') as fp:
    obj = json.load(fp)

# Authors of all books in the store
author_list = jsonpath.jsonpath(obj, '$.store.book[*].author')

# All authors, at any depth
author_list = jsonpath.jsonpath(obj, '$..author')

# All elements directly under store
tag_list = jsonpath.jsonpath(obj, '$.store.*')

# The price of everything in the store
price_list = jsonpath.jsonpath(obj, '$.store..price')

# The third book (indexes start at 0)
book = jsonpath.jsonpath(obj, '$..book[2]')

# The last book
book = jsonpath.jsonpath(obj, '$..book[(@.length-1)]')

# The first two books
# book_list = jsonpath.jsonpath(obj, '$..book[0,1]')
book_list = jsonpath.jsonpath(obj, '$..book[:2]')

# A filter needs a question mark before the parentheses
# All books that have an isbn
book_list = jsonpath.jsonpath(obj, '$..book[?(@.isbn)]')

# Books that cost more than 10
book_list = jsonpath.jsonpath(obj, '$..book[?(@.price>10)]')

print(book_list)
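
To make clearer what a few of these expressions select, here is a rough plain-Python equivalent. It is a sketch for intuition only, not how the jsonpath package is implemented; the helper find_key is hypothetical, and the data is a trimmed copy of the sample file, inlined so the snippet runs on its own.

```python
import json

# Trimmed copy of the sample data from 4.json
obj = json.loads('''
{"store": {
    "book": [
        {"author": "六道", "title": "坏蛋是怎样练成的", "price": 8.95},
        {"author": "天蚕土豆", "title": "斗破苍穹", "price": 12.99},
        {"author": "唐家三少", "title": "斗罗大陆", "isbn": "0-553-21311-3", "price": 8.99}
    ],
    "bicycle": {"author": "老马", "price": 19.95}
}}
''')


def find_key(node, key):
    """Yield every value stored under `key`, at any depth (like '$..key')."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


books = obj["store"]["book"]

all_authors = list(find_key(obj, "author"))       # roughly '$..author'
first_two = books[:2]                             # roughly '$..book[:2]'
with_isbn = [b for b in books if "isbn" in b]     # roughly '$..book[?(@.isbn)]'
over_ten = [b for b in books if b["price"] > 10]  # roughly '$..book[?(@.price>10)]'

print(all_authors)
print([b["title"] for b in over_ten])
```

The JsonPath expressions are shorter and do not need the exact nesting spelled out, which is the main reason to use them on large responses.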

3. Example: Taopiaopiao (淘票票)

Goal: fetch the city names listed on the Taopiaopiao website.

The code is as follows:

import urllib.request
import json
import jsonpath

url = 'https://dianying.taobao.com/cityAction.json?activityId&_ksTS=1660459746946_108&jsoncallback=jsonp109&action=cityAction&n_s=new&event_submit_doGetAllRegion=true'

headers = {
    # ':authority': 'dianying.taobao.com',
    # ':method': 'GET',
    # ':path': '/cityAction.json?activityId&_ksTS=1660459746946_108&jsoncallback=jsonp109&action=cityAction&n_s=new&event_submit_doGetAllRegion=true',
    # ':scheme': 'https',
    'accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
    # 'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9',
    'bx-v': '2.2.2',
    'cookie': 't=e1b36d2553572e46a78d2f77ae0ba17a; cookie2=1a48588cb63edadedfe7415e8ba197de; v=0; _tb_token_=7b03734ef7eb9; cna=Q1x9G4NYUxcCAbe6vxxlbI/T; xlly_s=1; tb_city=110100; tb_cityName="sbG+qQ=="; tfstk=cDcNBVGd8CdNMFDrLWN2lBSqdcwOZDr0V6zzI3sYulzWbSeGijsY-_QU8r7FmRf..; l=eBag1-XmLvObyoEWBO5Zourza77tNIRb4sPzaNbMiInca6ddtFTqRNCHWorHSdtjgtCfWetzqZSAbdLHR3AgCc0c07kqm05o3xvO.; isg=BDAwbCdMJst8r_rPQMuk2uyyAf6CeRTDLIjnTiqBWgte5dCP0okoU87bPe2F9cyb',
    'referer': 'https://dianying.taobao.com/?spm=a1z21.3046609.city.1.32c0112aTZ8oQq&city=110100',
    'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}

request = urllib.request.Request(url=url,headers=headers)

response = urllib.request.urlopen(request)

content = response.read().decode('utf-8')

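# Strip the JSONP wrapper, jsonp109(...), to leave plain JSON
content = content[content.index('(') + 1:content.rindex(')')]

# Save a local copy. jsonpath works on a parsed Python object, so
# json.loads(content) would also work without the file.
with open('淘票票.json', 'w', encoding='utf-8') as fp:
    fp.write(content)

# Load the saved file back into a Python object
with open('淘票票.json', 'r', encoding='utf-8') as fp:
    obj = json.load(fp)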

city_list = jsonpath.jsonpath(obj,'$..regionName')

print(city_list)
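
The response is JSONP, so the callback wrapper has to be removed before the body can be parsed as JSON. Below is a sketch of one way to do this that tolerates parentheses inside the payload; a naive split on '(' and ')' would cut the text at the first inner parenthesis. The helper name strip_jsonp and the sample string are made up for illustration, and the real response layout may differ.

```python
import json


def strip_jsonp(text):
    """Return the JSON payload inside a JSONP wrapper such as jsonp109({...});"""
    start = text.index('(') + 1  # the first '(' ends the callback name
    end = text.rindex(')')       # the last ')' closes the callback call
    return text[start:end]


# Hypothetical response body with parentheses inside a value
sample = 'jsonp109({"A":[{"regionName":"阿坝(州)"}]});'

naive = sample.split('(')[1].split(')')[0]  # cut short at the inner parenthesis
payload = strip_jsonp(sample)

obj = json.loads(payload)
print(obj["A"][0]["regionName"])
```

Because the result is already a Python object, it could be passed to jsonpath.jsonpath(obj, '$..regionName') directly.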
