Python学习——学习记录

表单登录

代码如下:

import time
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36',
}
url = 'https://www.zhihu.com'


def kill_captcha(data):
    with open('1.gif', 'wb') as fp:
        fp.write(data)
    return input('captcha : ')


def login(username, password, kill_captcha):
    session = requests.session()
    _xsrf = session.get(url+'/#signin', headers=headers).cookies['_xsrf']
    captcha_content = session.get(url+'/captcha.gif?r=%d&type=login' % (time.time() * 1000), headers=headers).content

    data = {
        '_xsrf': _xsrf,
        'password': password,
        'captcha': kill_captcha(captcha_content),
        'email': username,
        'remember_me': 'true'
        # 字典的键值对顺序可以随机
    }

    resp = session.post(url+'/login/email', data=data, headers=headers).text
    # 登录成功
    assert r'\u767b\u5f55\u6210\u529f' in resp
    return session


if __name__ == '__main__':
    session = login(username, password, kill_captcha)
    page = session.get(url, headers=headers).text
    soup = BeautifulSoup(page, 'lxml').findAll('a', {'class': 'question_link'})
    for s in soup:
        print(s.get('href'))

模拟知乎表单登录,还需要手动输入验证码。用户登录的页面分为手机和email地址是不同的,这里演示的是email的登录,手机号登录post的url不同,其他没有区别,图片验证码的地址是一个时间戳.

你可能感兴趣的:(Python学习——学习记录)