Python3 Crawlers: Using the retrying Module and Handling Cookie-Related Requests

1.1. Using the timeout parameter

  • requests.get(url, headers=headers, timeout=3)  # set a timeout: if the url does not respond within 3 seconds, an exception is raised
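The timeout behaviour can be demonstrated without touching the real network by pointing requests at a local socket that accepts the connection but never answers. This is a self-contained sketch; the throwaway local server stands in for a slow remote host:

```python
import socket
import threading

import requests

# A local socket that accepts connections but never sends a response,
# so the read phase of the request is guaranteed to time out.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
conns = []  # keep the accepted connection referenced so it stays open
threading.Thread(target=lambda: conns.append(srv.accept()), daemon=True).start()

try:
    requests.get(f"http://127.0.0.1:{port}/", timeout=3)
    result = "responded"
except requests.exceptions.Timeout:
    result = "timed out"

print(result)  # timed out
```

Note that `timeout` covers connecting and waiting for a response; it is not a limit on the total download time of a large body.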

1.2. Using the retrying module (a third-party module)

from retrying import retry

@retry(stop_max_attempt_number=3)
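To make the semantics of `stop_max_attempt_number=3` concrete, here is a minimal stdlib stand-in for the decorator (a sketch of the behaviour, not the retrying library's actual implementation), plus a counter showing that a function that always fails is attempted exactly 3 times:

```python
import functools

def retry(stop_max_attempt_number=3):
    """Minimal stand-in for retrying.retry: re-run the function until it
    succeeds or the attempt limit is reached, then re-raise the last error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(stop_max_attempt_number):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
            raise last_exc
        return wrapper
    return decorator

calls = []

@retry(stop_max_attempt_number=3)
def flaky():
    calls.append(1)
    raise ValueError("always fails")

try:
    flaky()
except ValueError:
    pass

print(len(calls))  # 3
```

A function that succeeds on any attempt returns immediately; only after all attempts fail does the caller see the exception, which is why the wrapper in the next example pairs the decorator with a try/except.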


Example (requesting Baidu)

import requests
from retrying import retry

# function dedicated to requesting the url

@retry(stop_max_attempt_number=3)  # retry the decorated function up to 3 times; raise only if all 3 attempts fail, return as soon as one succeeds
def aparse_url(url):
    print("entering the decorator here" + "*" * 100)
    response = requests.get(url, headers=headers, timeout=5)
    print("got a response for the url here")
    return response.content.decode()


def parse_url(url):
    # wrap aparse_url so callers get None instead of an exception
    try:
        html_str = aparse_url(url)
    except Exception:
        html_str = None
    return html_str

if __name__ == '__main__':
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
    }
    url = "https://www.baidu.com"
    print("start of the program")
    print(parse_url(url)[:100])
start of the program
entering the decorator here****************************************************************************************************
got a response for the url here

<html class=""><head><meta name="referrer" content="always" /><meta

1.3. Handling cookie-related requests

  • 1. Carry the cookie directly in the headers when requesting the url
    headers = {"User-Agent": "...", "Cookie": "cookie string"}
  • 2. Pass a cookie dict to the cookies parameter
    requests.get(url, cookies=cookie_dict)

  • 3. First send a POST request to log in and obtain the cookie, then carry that cookie when requesting the post-login page

    • 3.1 session = requests.session()  # a session has the same methods as requests
    • 3.2 session.post(url, data, headers)  # the cookie the server sets is saved in the session
    • 3.3 session.get(url)  # carries the cookie saved in the session, so the request can succeed
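Methods 2 and 3 can be sketched in a few lines. The cookie values, `login_url`, and form fields below are hypothetical placeholders; the session cookie is seeded manually to stand in for what a real login POST would set:

```python
import requests

# Method 2: turn a raw Cookie header string into a dict for the cookies parameter
cookie_str = "sessionid=abc123; theme=dark"
cookie_dict = dict(item.split("=", 1) for item in cookie_str.split("; "))
print(cookie_dict)  # {'sessionid': 'abc123', 'theme': 'dark'}
# requests.get(url, cookies=cookie_dict)

# Method 3: a session keeps cookies between requests.
session = requests.session()
session.cookies.set("sessionid", "abc123")  # stand-in for a real login POST
# session.post(login_url, data=form_data, headers=headers)  # real flow
# session.get(profile_url)  # would automatically send the saved cookie
print(session.cookies.get("sessionid"))  # abc123
```

The `split("=", 1)` limit matters because cookie values themselves may contain `=` characters.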
