What to do when scraping Baidu Images lands on Baidu's security verification page?

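Contents

  • Preface
  • Scraping Baidu Images works only intermittently: the solution
    • The problem:
    • Changing the headers:
      • Finding your own headers
          • We are all still growing; believe in yourself! Sincerely, end.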


Preface

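  When scraping Baidu Images, my requests worked only some of the time (more often they failed). I have solved this. If you spot any mistakes, please point them out. Many thanks!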


Scraping Baidu Images works only intermittently: the solution

The problem:

  Judging from the debug output, the response I got back was:

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="utf-8">
    <title>百度安全验证</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta name="apple-mobile-web-app-capable" content="yes">
    <meta name="apple-mobile-web-app-status-bar-style" content="black">
    <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
    <meta name="format-detection" content="telephone=no, email=no">
    <link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon">
    <link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
    <link rel="stylesheet" href="https://ppui-static-wap.cdn.bcebos.com/static/touch/css/api/mkdjump_0635445.css" />
</head>
<body>
    <div class="timeout hide">
        <div class="timeout-img"></div>
        <div class="timeout-title">网络不给力,请稍后重试</div>
        <button type="button" class="timeout-button">返回首页</button>
    </div>
    <div class="timeout-feedback hide">
        <div class="timeout-feedback-icon"></div>
        <p class="timeout-feedback-title">问题反馈</p>
    </div>

<script src="https://wappass.baidu.com/static/machine/js/api/mkd.js"></script>
<script src="https://ppui-static-wap.cdn.bcebos.com/static/touch/js/mkdjump_1448d18.js"></script>
</body>
</html>

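  So the response was Baidu's security verification page (its title, 百度安全验证, means "Baidu Security Verification"), not the search results!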
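Because the failures were intermittent, it helps to check each response before parsing it. Below is a minimal sketch of such a check. The function name `is_baidu_verification` and the two markers are my own assumptions, based only on the page shown above (its `<title>` text and the `wappass.baidu.com/static/machine` script URL); Baidu may change this page at any time.

```python
# Assumption: markers copied from the verification page shown above.
# Baidu may change this page, so treat them as a heuristic, not a guarantee.
VERIFY_MARKERS = (
    "<title>百度安全验证</title>",
    "wappass.baidu.com/static/machine",
)


def is_baidu_verification(html: str) -> bool:
    """Return True if the response body looks like Baidu's security verification page."""
    return any(marker in html for marker in VERIFY_MARKERS)


if __name__ == "__main__":
    print(is_baidu_verification("<html><head><title>百度安全验证</title></head></html>"))  # True
    print(is_baidu_verification("<html><head><title>百度图片</title></head></html>"))  # False
```

In a scraper you would call it on `res.text` and skip (or retry later) any page for which it returns True.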

Changing the headers:

  My original headers were:

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82'}

  I then added the following fields:

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
}

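  Problem solved!
  Don't leave just yet: browser versions and other details differ from person to person, so your headers will not necessarily match mine. Note also that the new User-Agent ends with Safari/537.36, which my original User-Agent string lacked.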

Finding your own headers

(Using Baidu Images as the example)

  1. Open the page you want to scrape: Baidu Images.

  2. Press F12 to open the developer tools, then press F5 to refresh the page.

  3. Click Network, choose the Doc filter, click the request listed under Name, and find its Headers.

  4. Under Request Headers, find the Accept, Accept-Encoding, Accept-Language, Cache-Control, Connection, sec-ch-ua and User-Agent fields and copy them.

  5. Convert the copied fields into a Python dictionary.

Example:
Accept-Encoding: gzip, deflate, br
becomes 'Accept-Encoding': 'gzip, deflate, br' (use straight quotes; curly quotes are a syntax error in Python)
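Converting many fields by hand is error-prone, so here is a small sketch that does step 5 automatically. It assumes the copied text uses the one-field-per-line `Name: value` format shown in the example above; the function name `headers_from_devtools` is hypothetical. It also skips HTTP/2 pseudo-headers such as `:authority`, which DevTools may list for HTTP/2 requests but which are not ordinary request headers.

```python
def headers_from_devtools(raw: str) -> dict:
    """Turn 'Name: value' lines copied from DevTools into a headers dict.

    Assumption: one header per line, name and value separated by the first ':'.
    """
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'.
        if not line or line.startswith(':'):
            continue
        # Split on the first ':' only, so values like 'https://...' stay intact.
        name, sep, value = line.partition(':')
        if not sep:
            continue  # not a 'Name: value' line
        headers[name.strip()] = value.strip()
    return headers


if __name__ == "__main__":
    raw = """Accept-Encoding: gzip, deflate, br
Cache-Control: max-age=0
Connection: keep-alive"""
    print(headers_from_devtools(raw))
    # {'Accept-Encoding': 'gzip, deflate, br', 'Cache-Control': 'max-age=0', 'Connection': 'keep-alive'}
```

Paste the copied block into `raw` and pass the result straight to `requests.get(..., headers=...)`.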

  Part of the Python code (for reference only: versions may differ, so follow the steps above to find your own headers and url):
import requests
from urllib.parse import quote

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # requests (via urllib3) only decodes 'br' (Brotli) responses if the brotli package is installed
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
}

name = '猫'  # the keyword to search images for (example value)
i = 0        # page index; pn = i*30 is the result offset

url = 'https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&fm=detail&lm=-1&hd=&latest=&copyright=&st=-1&sf=2&fmq=1616167633329_R_D&fm=detail&pv=&ic=0&nc=1&z=&se=&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=' + quote(name) + '&pn=' + str(i * 30)

res = requests.get(url, headers=headers, timeout=10)
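To fetch several pages, the URL has to be rebuilt for each page index. As a sketch, `urllib.parse.urlencode` can percent-encode the keyword and append the offset `pn = page * 30`, matching `str(i*30)` in the URL above. `BASE_URL` and `build_search_url` are hypothetical names; `BASE_URL` simply reuses the fixed part of that URL verbatim, and I have not verified which of these parameters Baidu actually requires.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical constant: the fixed part of the search URL above, kept verbatim
# (including the repeated fm=detail). Assumption: these parameters still work.
BASE_URL = (
    'https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2'
    '&fm=detail&lm=-1&hd=&latest=&copyright=&st=-1&sf=2&fmq=1616167633329_R_D'
    '&fm=detail&pv=&ic=0&nc=1&z=&se=&showtab=0&fb=0&width=&height=&face=0'
    '&istype=2&ie=utf-8'
)


def build_search_url(name: str, page: int) -> str:
    """Return the search URL for keyword `name` and page index `page`.

    The keyword is percent-encoded as UTF-8, and pn is the result offset
    page * 30, as in the original code.
    """
    return BASE_URL + '&' + urlencode({'word': name, 'pn': page * 30})


if __name__ == '__main__':
    for page in range(3):
        url = build_search_url('猫', page)
        query = parse_qs(urlsplit(url).query)
        print(query['word'], query['pn'])
```

Each URL can then be passed to `requests.get(url, headers=headers, timeout=10)` inside the loop.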

  Solved!


We are all still growing; believe in yourself! Sincerely, end.
