RoboBrowser: a handy tool for form login in spiders

Conventional form login

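First fetch the hidden fields the server needs for verification, then submit the username and password together with those hidden fields.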

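import requests
from bs4 import BeautifulSoup


def login():
    # Fetch the login page to obtain the cookies and the hidden form fields
    resp = requests.get('https://github.com/login')
    if resp.status_code != 200:
        return None
    cookies = resp.cookies.get_dict()
    soup = BeautifulSoup(resp.text, 'lxml')
    # Hidden fields that must be sent back along with the credentials
    utf8_value = soup.select_one('form input[name=utf8]').attrs['value']
    authenticity_token_value = soup.select_one('form input[name=authenticity_token]').attrs['value']
    print(utf8_value)
    print(authenticity_token_value)
    print(cookies)
    data = {
        'utf8': utf8_value,
        # The key must match the input's name attribute
        'authenticity_token': authenticity_token_value,
        'login': '123456',
        'password': '123456'
    }
    # Passing files makes requests encode the body as multipart/form-data
    files = {
        'file1': '',
        'file2': ''
    }
    response = requests.post('https://github.com/session', data=data, cookies=cookies,
                             files=files)
    print(response.text)
    return response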
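
The code above looks up each hidden field by name. A more general variant collects every hidden input automatically and merges the credentials into the result. The sketch below illustrates that idea using only the standard library; `HiddenFieldParser`, `build_login_payload` and `SAMPLE_HTML` are hypothetical names introduced for this example, and the sample form is made up rather than copied from a real page. As a simplification it gathers hidden inputs from the whole page, not from one specific form.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attr = dict(attrs)
        # An attribute without a value is reported as None, hence the "or ''"
        if (attr.get('type') or '').lower() == 'hidden' and attr.get('name'):
            self.fields[attr['name']] = attr.get('value') or ''


def build_login_payload(html, username, password):
    """Return the hidden fields found in html plus the login credentials."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    # Field names 'login' and 'password' follow the example above
    payload['login'] = username
    payload['password'] = password
    return payload


# Made-up login form, used only to exercise the helper offline
SAMPLE_HTML = '''
<form action="/session" method="post">
  <input type="hidden" name="utf8" value="ok">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
'''

payload = build_login_payload(SAMPLE_HTML, 'user', 'secret')
print(payload)
```

The resulting dictionary could be passed as `data=` to `requests.post`, in place of the hand-built `data` dictionary.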

Using RoboBrowser, the form tool, which spares you from handling the form's hidden fields by hand

import robobrowser


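def main():
    # RoboBrowser wraps requests and BeautifulSoup behind a browser-like API
    b = robobrowser.RoboBrowser(parser='lxml')
    b.open('https://v.taobao.com/v/content/live?catetype=704&from=taonvlang')
    # select() takes a CSS selector; print the src of every image on the page
    for img_tag in b.select('img[src]'):
        print(img_tag.attrs['src'])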


if __name__ == '__main__':
    main()
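
The example above only fetches a page and lists its images. To show how RoboBrowser could handle the login form itself, here is a minimal sketch, not a tested recipe. It assumes the login form's action is `/session` and that the visible fields are named `login` and `password`, as in the requests example; `fill_login_form` and `login_with_robobrowser` are hypothetical helper names. Only the visible fields are set: the hidden inputs stay in the parsed form and are sent with it on submit.

```python
def fill_login_form(form, username, password):
    """Set only the visible fields; hidden inputs keep their parsed values."""
    form['login'].value = username
    form['password'].value = password
    return form


def login_with_robobrowser(username, password):
    # Imported here so fill_login_form can be used without RoboBrowser installed
    import robobrowser

    browser = robobrowser.RoboBrowser(parser='lxml')
    browser.open('https://github.com/login')
    # Assumption: the login form posts to /session
    form = browser.get_form(action='/session')
    fill_login_form(form, username, password)
    # Hidden fields such as authenticity_token are submitted with the form
    browser.submit_form(form)
    return browser
```

Calling `login_with_robobrowser('user', 'secret')` would return the browser object, whose `parsed` attribute holds the page received after the form was submitted.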
