Scraping for Beginners: How to Identify Referer/Origin Request-Header Anti-Scraping Checks

Contents

      • I. Site Analysis
      • II. Final Code

I. Site Analysis

  • 1. Open the site and work out which API endpoint serves the data you want
  • 2. Anti-scraping parameters: validation of the referer/origin request headers, plus the x-api-key header
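  • 3. For the full analysis walkthrough, see the video by 十一姐时一 on Bilibili, or the illustrated article in the 时光漫漫 group on 知识星球 (Knowledge Planet)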
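
One way to confirm which headers the API actually validates (point 2 above) is to replay the request while leaving out one header at a time and watching for a failure status. The sketch below is not part of the original script: `find_required_headers`, `send`, and `fake_send` are hypothetical names introduced for illustration, and `send` is assumed to take a header dict and return an HTTP status code. The stand-in server simply rejects requests that lack `referer` or `x-api-key`, so the demo runs without network access.

```python
from typing import Callable, Dict, List


def find_required_headers(
    send: Callable[[Dict[str, str]], int],
    headers: Dict[str, str],
) -> List[str]:
    """Return the header names whose removal makes the request fail.

    `send` is a hypothetical callable: it takes a header dict, performs the
    request, and returns the HTTP status code.
    """
    baseline = send(dict(headers))
    if baseline >= 400:
        raise ValueError(f"request already fails with all headers: {baseline}")

    required = []
    for name in headers:
        # Replay the request with exactly one header left out.
        trimmed = {k: v for k, v in headers.items() if k != name}
        if send(trimmed) >= 400:
            required.append(name)
    return required


def fake_send(h: Dict[str, str]) -> int:
    """Stand-in server (an assumption for this demo): checks referer and x-api-key."""
    lowered = {k.lower() for k in h}
    return 200 if {"referer", "x-api-key"} <= lowered else 403


demo_headers = {
    "accept": "application/vnd.api+json",
    "referer": "https://www.regulations.gov/",
    "user-agent": "Mozilla/5.0",
    "X-Api-Key": "dummy-key",
}
required_headers = find_required_headers(fake_send, demo_headers)
print(required_headers)  # ['referer', 'X-Api-Key']
```

Against a real endpoint, `send` could be something like `lambda h: requests.get(url, headers=h, timeout=20).status_code`, where `url` is the API address found in step 1. Keep the number of probes small, since each one is a live request.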

II. Final Code

# -*- coding: utf-8 -*-
# @Time : 2023-08-13
# @Author: sy
# @WeChat official account: 逆向OneByOne
# @url: https://www.regulations.gov/docket/FDA-2016-D-1399/document
# @desc: Anti-scraping checks on the referer/origin request headers and X-Api-Key
from loguru import logger
import requests
import re


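headers = {
    # Mirrors the browser's HTTP/2 ":authority" pseudo-header; requests sends it as an ordinary header
    "authority": "api.regulations.gov",
    "accept": "application/vnd.api+json",
    "accept-language": "zh-CN,zh;q=0.9",
    # The API checks where the request claims to come from
    "referer": "https://www.regulations.gov/",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
}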
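# First request: fetch the docket page; its source contains the API key in percent-encoded form
doc_url = "https://www.regulations.gov/docket/FDA-2016-D-1399/document"
res = requests.get(doc_url, headers=headers, timeout=20)
logger.info(f"page request status: {res.status_code}")
key_match = re.search(r"apiKey%22%3A%22(.*?)%22%2C%22api", res.text)
if key_match is None:
    # Fail with a clear message instead of an AttributeError on None
    raise RuntimeError("apiKey not found in the page source; the page may have changed")
api_key = key_match.group(1)
# The docket ID is the path segment between "/docket/" and "/document"
doc_id = re.search(r"/(FDA.*?)/document", doc_url).group(1)
headers.update({"X-Api-Key": api_key})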
# Second request: call the JSON API with the extracted key in the X-Api-Key header
doc_true_url = f"https://api.regulations.gov/v4/documents?filter[docketId]={doc_id}&page[number]=1&sort=-commentEndDate"
res = requests.get(doc_true_url, headers=headers, timeout=20)
logger.info(f"API request status: {res.status_code}")
# Each entry in "data" is one document; log its ID and title
for file_a in res.json()['data']:
    file_title = file_a['attributes']['title']
    logger.info(f">>>file_id is {file_a['id']}, title is {file_title}")
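
The regular expression used in the first request matches percent-encoded text: `%22` is `"`, `%3A` is `:`, and `%2C` is `,`. An alternative sketch is to decode the text first and then match the plain form, which is easier to read. The helper name `extract_api_key` and the sample string are made up for illustration; the sample only mimics the shape implied by the original regex, and the real page content may differ.

```python
import re
from typing import Optional
from urllib.parse import unquote


def extract_api_key(page_text: str) -> Optional[str]:
    """Return the value of "apiKey" from percent-encoded page text, or None."""
    decoded = unquote(page_text)  # %22 -> ", %3A -> :, %2C -> ,
    match = re.search(r'"apiKey":"(.*?)"', decoded)
    return match.group(1) if match else None


# Made-up sample that mimics the shape implied by the original regex.
sample = "%22apiKey%22%3A%22DEMO_KEY_123%22%2C%22apiUrl%22%3A%22x%22"
api_key_demo = extract_api_key(sample)
print(api_key_demo)                    # DEMO_KEY_123
print(extract_api_key("no key here"))  # None
```

Returning `None` instead of calling `.group(1)` directly lets the caller decide how to handle a page that no longer contains the key.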
