Scraping Zhihu wallpapers: log in with selenium to get cookies, then pass the cookies to requests

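  1. selenium is convenient, but it is slow when scraping large amounts of data.
  2. Log in through selenium to obtain the cookies, then hand those cookies to requests and do the actual scraping with requests, which is faster.
  3. Taking wallpaper scraping from Zhihu as an example, the code is as follows: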
from selenium import webdriver
import requests
from lxml import etree
import time
import os

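# Log in to Zhihu with selenium by scanning the QR code, then collect the cookies
post_url = 'https://www.zhihu.com/signin?next=%2F'
driver = webdriver.Chrome()
driver.get(post_url)
# Leaves 10 seconds to scan the QR code; increase this if the login takes longer
time.sleep(10)
post_cookies = driver.get_cookies()
print('Cookies obtained!')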
cookies = {}
for post_cookie in post_cookies:
    cookies[post_cookie['name']] = post_cookie['value']
print('='*30)
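
The loop above copies every cookie the browser holds into a plain name-to-value dict. As a minimal sketch of a slightly stricter conversion, the hypothetical helper below (`build_cookie_dict`, its `domain_keyword` and `now` parameters, and the sample data are all assumptions for illustration, not part of the original script) keeps only cookies for the target domain and skips ones whose `expiry` timestamp has already passed. It relies only on the `name`, `value`, `domain`, and optional `expiry` keys that selenium's `get_cookies()` returns.

```python
import time


def build_cookie_dict(selenium_cookies, domain_keyword="zhihu.com", now=None):
    """Convert selenium's cookie list into a name -> value dict for requests.

    Hypothetical helper: keeps only cookies whose domain contains
    domain_keyword and skips cookies whose 'expiry' timestamp has passed.
    """
    now = time.time() if now is None else now
    result = {}
    for cookie in selenium_cookies:
        if domain_keyword not in cookie.get("domain", ""):
            continue  # cookie belongs to another site
        expiry = cookie.get("expiry")  # session cookies have no expiry
        if expiry is not None and expiry <= now:
            continue  # cookie has already expired
        result[cookie["name"]] = cookie["value"]
    return result


# Made-up sample data in the shape returned by driver.get_cookies()
sample = [
    {"name": "session_id", "value": "token", "domain": ".zhihu.com", "expiry": 2000},
    {"name": "old", "value": "x", "domain": ".zhihu.com", "expiry": 500},
    {"name": "ad", "value": "y", "domain": ".example.com"},
]
print(build_cookie_dict(sample, now=1000))  # {'session_id': 'token'}
```

The returned dict can be passed to `requests.get(..., cookies=...)` in the same way as the `cookies` dict built above.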


# Pass the collected cookies to requests, then scrape the images
url = input('Enter the URL to scrape (if you paste it, delete the last character and retype it by hand): ')
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36"
}
resp = requests.get(url=url,headers=headers,cookies=cookies)
text = resp.text
html = etree.HTML(text)
images_url = html.xpath('//figure[@data-size="normal"]/img/@data-original')
save_dir = 'zhihu_wallpapers'
os.makedirs(save_dir, exist_ok=True)
for index, image_url in enumerate(images_url, start=1):
    image_resp = requests.get(image_url, headers=headers, cookies=cookies)
    # os.path.join builds a path that works on Windows, macOS, and Linux
    file_path = os.path.join(save_dir, '%d.jpg' % index)
    with open(file_path, 'wb') as fp:
        fp.write(image_resp.content)
    print('Image %d downloaded!' % index)
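
The download loop above saves every image with a `.jpg` extension, whatever the URL actually points to. As a rough sketch, assuming the image URLs end in a recognizable extension, the hypothetical helper below (`image_filename` and its `default_ext` parameter are not part of the original script) derives the extension from the URL and falls back to `.jpg` when none is found. The URLs in the usage lines are placeholders.

```python
import os
from urllib.parse import urlparse


def image_filename(index, image_url, default_ext=".jpg"):
    """Build a file name such as '001.png' from an index and the image URL."""
    path = urlparse(image_url).path          # drops the query string and fragment
    ext = os.path.splitext(path)[1].lower()  # '' when the URL has no extension
    if ext not in {".jpg", ".jpeg", ".png", ".gif", ".webp"}:
        ext = default_ext                    # unknown or missing extension
    return "%03d%s" % (index, ext)


print(image_filename(1, "https://example.com/images/a_r.png?source=1"))  # 001.png
print(image_filename(12, "https://example.com/images/b"))                # 012.jpg
```

Zero-padding the index keeps the files in download order when a file manager sorts them by name.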
