Python: Douban Crawler Login

How to log in to Douban with requests and scrape content
Note:
1. To view other pages after logging in, keep the session so that the login cookies are reused:

   s = requests.session()
   r = s.post(loginUrl, data=formData, headers=headers)
   res = s.get("http://movie.douban.com/mine", cookies=r.cookies, headers=headers)

2. r.history holds the redirect responses (such as the 302 status) that were followed after the login request.
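
The two notes above can be tried without touching Douban at all. The following is a minimal sketch that assumes a hypothetical local server (`DemoHandler`, `run_demo`, the `sid=demo` cookie and the `/login` and `/mine` paths are all made up for illustration) which answers a login POST with a cookie and a 302 redirect. It is not Douban's real behaviour, only a way to watch the session cookie and `r.history` at work.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a login endpoint; not Douban's real behaviour."""

    def do_POST(self):
        # Drain the form body, then answer like a typical login:
        # set a cookie and redirect with a 302.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(302)
        self.send_header("Set-Cookie", "sid=demo; Path=/")
        self.send_header("Location", "/mine")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Report whether the client sent the login cookie back.
        cookie = self.headers.get("Cookie", "")
        body = b"logged in" if "sid=demo" in cookie else b"anonymous"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        s = requests.Session()
        # The POST is redirected; the session stores the cookie on the way.
        r = s.post(base + "/login", data={"form_email": "x", "form_password": "y"})
        history = [resp.status_code for resp in r.history]
        with_session = s.get(base + "/mine").text
        # A bare requests.get() starts with an empty cookie jar.
        without_session = requests.get(base + "/mine").text
    finally:
        server.shutdown()
        server.server_close()
    return history, r.status_code, with_session, without_session


if __name__ == "__main__":
    print(run_demo())
```

The session object carries the cookie set during the redirect into the later `s.get()` call, whereas the plain `requests.get()` call is treated as an anonymous visitor. The intermediate 302 response ends up in `r.history`, while `r` itself is the final page.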

Code:

# -*- encoding:utf-8 -*-  
##############################  
__author__ = "KevinZhou"
__date__ = "2017/7/23"
###############################  

import requests
from bs4 import BeautifulSoup
import urllib.request
import re

loginUrl = 'https://accounts.douban.com/login'
formData = {
    "redir": "http://movie.douban.com/mine",
    "form_email": "******",
    "form_password": "******",
    "login": u'登录',
    "source":"index_nav"
}
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

r = requests.post(loginUrl, data=formData, headers=headers)
page = r.text
print(r.url)

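# Fetch the captcha image
# Use bs4 to find the captcha address
soup = BeautifulSoup(page, "html.parser")
captchaImg = soup.find('img', id='captcha_image')
# find() returns None when the page has no captcha image, so guard against it
captchaAddr = captchaImg['src'] if captchaImg else None
# Use a regular expression to extract the captcha ID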
# reCaptchaID = r'
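
The original pattern for the captcha ID is cut off above, so it is not reconstructed here. As an alternative to a regular expression, the sketch below reads the ID from the query string with `urllib.parse`. It assumes the captcha address carries the ID in an `id` query parameter. The helper name `captcha_id_from_url` and the example address are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def captcha_id_from_url(captcha_url):
    """Return the value of the `id` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(captcha_url).query)
    values = query.get("id")
    return values[0] if values else None


# Hypothetical address in the assumed format; the ID value is made up.
example = "https://www.douban.com/misc/captcha?id=abc123:en&size=s"
print(captcha_id_from_url(example))
```

Parsing the query string avoids hand-written escaping of `?` and `&` in a pattern, and it returns `None` rather than raising when the parameter is missing.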
