Web Scraping (Part 4): Faking a Logged-In Session

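import os
import requests
from pyquery import PyQuery as pq
import config  # local config.py that defines `cookie`; keep it out of version control


def get(url):
    """Fetch url while presenting ourselves as a logged-in browser."""
    headers = {
        # Identifies the client as a desktop Chrome browser
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/62.0.3202.94 Safari/537.36',
        # Cookie string copied from a logged-in browser session
        'Cookie': config.cookie,
    }
    r = requests.get(url, headers=headers)
    page = r.content  # raw response body as bytes
    return page


def main():
    url = 'https://www.zhihu.com'
    page = get(url)
    # decode() assumes the page is UTF-8 encoded
    print(page.decode())


if __name__ == '__main__':
    main()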
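  • User-Agent: identifies the browser that is making the request.
  • Cookie: carries the login state. Put it in a separate file (config.py here) and import it, so the private value stays out of the main script.
  • If the login still fails after adding the Cookie, pass in all of the request headers that the browser sends.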
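
For the last point, retyping every header by hand is tedious. One possible approach is to copy the request headers from the browser's developer tools as plain text and convert them into a dict. The sketch below is only an illustration: parse_raw_headers and RAW are hypothetical names that do not appear in the original script, and the header values are placeholders rather than real credentials.

```python
def parse_raw_headers(raw):
    """Turn header text copied from browser dev tools into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority"
        if not line or line.startswith(':'):
            continue
        # Split on the first colon only, so values like URLs stay intact
        name, sep, value = line.partition(':')
        if not sep:
            continue
        headers[name.strip()] = value.strip()
    return headers


# Placeholder header text; paste the real headers from your own browser
RAW = """
Accept: text/html
Accept-Language: zh-CN,zh;q=0.9
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Cookie: name=value; other=value2
"""

headers = parse_raw_headers(RAW)
```

The resulting dict can then be passed as requests.get(url, headers=headers) in place of the two-entry dict used in get(). As with the cookie, the pasted text may be better kept in config.py than in the main script.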
