Python Dynamic Web Scraping Example

Contents

  • Scraping the New Book Recommendations section of Posts & Telecom Press (人民邮电出版社)
    • Approach
    • Implementation

Scraping the New Book Recommendations section of Posts & Telecom Press (人民邮电出版社)

Approach

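1. Find the endpoint that returns bookTagId and collect every bookTagId, so that the book information can be saved in bulk later.
[Figure 1]
2. Switch between book categories to find the new endpoint that gets called, and locate the book information (the preview portion) it returns for the selected category. Note that this endpoint is called with a parameter (bookTagId).
[Figure 2]
3. Open a book and find the corresponding endpoint. It returns the detailed information for the book identified by the bookId parameter sent with the request.
[Figure 3]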
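The three steps above form a chain: tags, then books per tag, then details per book. A minimal sketch of that chain follows, written so that it runs without touching the network. The `fetch` callable, the `collect_book_details` and `fake_fetch` names, and all IDs and titles in the fake data are hypothetical; only the endpoint names and the `bookTagId`, `bookId`, and `data` keys come from the steps and code in this article.

```python
from typing import Any, Callable, Dict, List

# Hypothetical abstraction: a fetcher takes (endpoint_name, params) and
# returns the parsed JSON body as a dict.
Fetcher = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def collect_book_details(fetch: Fetcher) -> List[Dict[str, Any]]:
    """Walk tags -> books -> details and return each book's 'data' payload."""
    details = []
    # Step 1: every recommendation category carries a bookTagId.
    tags = fetch("getRecommendTypeListForPortal", {})
    for tag in tags["data"]:
        # Step 2: list the books under that category.
        books = fetch("getRecommendBookListForPortal",
                      {"bookTagId": tag["bookTagId"]})
        for book in books["data"]:
            # Step 3: look up one book's details by its bookId.
            detail = fetch("getBookDetailsById", {"bookId": book["bookId"]})
            details.append(detail["data"])
    return details


def fake_fetch(endpoint: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the real HTTP calls, returning made-up IDs and titles."""
    if endpoint == "getRecommendTypeListForPortal":
        return {"data": [{"bookTagId": "t1"}, {"bookTagId": "t2"}]}
    if endpoint == "getRecommendBookListForPortal":
        return {"data": [{"bookId": params["bookTagId"] + "-b1"}]}
    return {"data": {"bookName": "Book " + params["bookId"]}}


result = collect_book_details(fake_fetch)
print([d["bookName"] for d in result])  # ['Book t1-b1', 'Book t2-b1']
```

Passing the fetcher in as an argument keeps the traversal logic separate from the HTTP details, so the same function could be driven by real requests once the response shapes are confirmed in the browser.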

Implementation

import requests
from openpyxl import Workbook

# The three endpoints found in the steps above
url = r'https://www.ptpress.com.cn/recommendBook/getRecommendTypeListForPortal'
url_tag = r'https://www.ptpress.com.cn/recommendBook/getRecommendBookListForPortal'
url_Id = r'https://www.ptpress.com.cn/bookinfo/getBookDetailsById'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.51",
    "Accept-Encoding": "gzip, deflate, br"
}

wb = Workbook()
ws = wb.worksheets[0]
lists = []  # one [bookName, author, discountPrice, isbn] row per book

# Step 1: fetch every recommendation category and its bookTagId
res = requests.get(url=url, headers=headers)
res.encoding = 'utf-8'
bookTag = res.json()
for bookTag_one in bookTag['data']:
    # Step 2: list the books under this category
    res_Tag = requests.get(url=url_tag, params={'bookTagId': bookTag_one['bookTagId']}, headers=headers)
    bookIds = res_Tag.json()
    for bookId in bookIds['data']:
        # Step 3: fetch the details of one book (this endpoint is called with POST)
        res_Id = requests.post(url=url_Id, data={'bookId': bookId['bookId']}, headers=headers)
        book = res_Id.json()['data']
        lists.append([book['bookName'], book['author'], book['discountPrice'],
                      book['bookDetail']['data']['isbn']])

# Write one book per row. enumerate yields the correct position even when
# two rows or two values are equal, which an index() lookup does not.
for row_num, row in enumerate(lists, start=1):
    for col_num, value in enumerate(row, start=1):
        ws.cell(row=row_num, column=col_num, value=value)
wb.save("人民邮电出版社.xlsx")
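When rows are written to a worksheet cell by cell, the cell position can be derived either by looking each value up with `index` or by counting with `enumerate`. The sketch below compares the two on made-up rows (the titles, prices, and ISBN strings are hypothetical) to show why the lookup approach loses cells whenever a row or a value repeats. The function names are illustrative only.

```python
# Made-up sample rows: one duplicated row and one row with a repeated value.
rows = [
    ["Book A", "Author X", 59.0, "isbn-1"],
    ["Book A", "Author X", 59.0, "isbn-1"],  # same content as the first row
    ["Same", "Same", 30.0, "isbn-2"],        # first two values are equal
]


def positions_by_index(rows):
    """Cell positions found via index(): always the FIRST matching item."""
    return [(rows.index(row) + 1, row.index(value) + 1)
            for row in rows for value in row]


def positions_by_enumerate(rows):
    """Cell positions found by counting rows and columns directly."""
    return [(r, c)
            for r, row in enumerate(rows, start=1)
            for c, _ in enumerate(row, start=1)]


index_cells = set(positions_by_index(rows))
enum_cells = set(positions_by_enumerate(rows))

# 12 values exist, but the index() lookup only reaches 7 distinct cells:
# the duplicate row overwrites row 1, and the repeated value overwrites column 1.
print(len(index_cells), len(enum_cells))  # 7 12
```

With scraped book data this matters in practice, since the same book can appear under more than one category and would then overwrite its own earlier row.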

Resulting spreadsheet
[Figure 4]

This is a first attempt at scraping dynamic web pages.
