python crawler - simulating a form login with Session and downloading the logged-in user's avatar (demo)

Target site for the login: https://www.1point3acres.com/bbs/
Find the action attribute of the login form to get the URL the form is submitted to:
https://www.1point3acres.com/bbs/member.php?mod=logging&action=login&loginsubmit=yes&infloat=yes&lssubmit=yes&inajax=1
[Screenshot 1: the login form's action URL in the browser dev tools]
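Instead of reading the action URL out of the dev tools by hand, you can also pull it from the page source. A minimal sketch, assuming the login form appears in the HTML served to non-logged-in visitors and that its action contains mod=logging (an assumption about the markup, not verified here):

import requests
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36'}

page = requests.get('https://www.1point3acres.com/bbs/', headers=header).text
soup = BeautifulSoup(page, 'lxml')

# Look for a form whose action points at the login handler; adjust the filter if the markup differs.
login_form = soup.find('form', action=lambda a: a and 'mod=logging' in a)
if login_form is not None:
    print(login_form['action'])
else:
    print('login form not found in the HTML (it may be rendered differently)')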

After logging in once, look at the Form Data in the dev tools Network panel and use those fields as the POST parameters:
[Screenshot 2: the Form Data fields (username, password, quickforward, handlekey) in the dev tools]
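If copying the fields by hand feels error-prone, you can also enumerate the form's input elements and build the payload from their default name/value pairs, then overwrite username and password. This is only a sketch under the same markup assumption as above; the credentials below are hypothetical placeholders:

import requests
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36'}

soup = BeautifulSoup(requests.get('https://www.1point3acres.com/bbs/', headers=header).text, 'lxml')
login_form = soup.find('form', action=lambda a: a and 'mod=logging' in a)

if login_form is not None:
    # Start from whatever hidden/default inputs the form already carries.
    payload = {inp['name']: inp.get('value', '') for inp in login_form.find_all('input') if inp.get('name')}
    payload['username'] = 'your_username'   # hypothetical placeholders, fill in your own
    payload['password'] = 'your_password'
    print(payload)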
The last step is to locate the avatar on the page:
[Screenshot 3: the avatar <img> inside <div class="avt y"> in the element inspector]
Use BeautifulSoup to find that div first, then walk its child nodes and read the src attribute of the img tags.

import requests
from bs4 import BeautifulSoup

header = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36'
}

# Form Data copied from the dev tools Network panel (use your own credentials)
form_data = {
    'username' : 'dave_lzw2020',
    'password' : "Password123456.",
    'quickforward' : 'yes',
    'handlekey' : 'ls'
}

# A Session keeps the cookies set by the login, so later requests are authenticated.
session = requests.Session()

# Submit the login form to the action URL found above.
html = session.post(
    'https://www.1point3acres.com/bbs/member.php?mod=logging&action=login&loginsubmit=yes&infloat=yes&lssubmit=yes&inajax=1',
    headers=header,
    data=form_data
)

# print(html.text)  # useful for checking whether the login actually succeeded

# Fetch the forum homepage with the now-authenticated session.
resp = session.get('https://www.1point3acres.com/bbs/', headers=header).text

# print(resp)

ht = BeautifulSoup(resp, 'lxml')

# The avatar <img> sits inside <div class="avt y">
div_node = ht.find('div', {'class': 'avt y'})
print(div_node)

# div_node.children also yields bare whitespace text nodes, which would break
# .find('img') below, so keep only the direct child tags.
chnodes = div_node.find_all(True, recursive=False)
print(chnodes)

img_src = [chnode.find('img')['src'] for chnode in chnodes if chnode.find('img') is not None]
print(img_src)

for src in img_src:
    # verify=False skips SSL certificate verification (see note 2 below)
    img_content = session.get(src, headers=header, verify=False).content
    # lstrip() strips a *set* of characters rather than a prefix, so it could also
    # eat leading letters of the hostname; strip the scheme explicitly instead.
    src = src.replace('https://', '', 1).replace('/', '-')
    print(src)
    with open('{src}.jpg'.format_map(vars()), 'wb+') as f:
        f.write(img_content)

# vars(): returns a dict of an object's attributes and their values; called with no argument it returns the namespace at the call site, similar to locals()
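As a quick illustration of what that comment means (the function and variable names here are just for the example):

def demo():
    name = 'avatar'
    # Inside a function, vars() with no argument behaves like locals(),
    # so format_map() can look up the local variable `name` by its key.
    print('{name}.jpg'.format_map(vars()))   # prints: avatar.jpg

demo()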



Errors and notes

1. Make sure form_data is filled in correctly. Otherwise the login fails and every visit to a user page keeps showing
Access denied | www.1point3acres.com used Cloudflare to restrict access, which sent me down a long detour looking for ways to bypass Cloudflare.
Only after printing the page returned by the POST did I realize the password was wrong and the login had never succeeded.
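One way to catch this earlier is to verify the session really is logged in before scraping anything. A small sketch, meant to run right after the session.post() call above, assuming the board shows your username somewhere on the homepage once you are logged in (typical for Discuz forums, but worth confirming):

# Check the session is authenticated before going any further.
check = session.get('https://www.1point3acres.com/bbs/', headers=header).text
if 'dave_lzw2020' not in check:
    print('Login appears to have failed; dump the POST response to see the error:')
    print(html.text)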

2. If you hit [SSL: CERTIFICATE_VERIFY_FAILED], add verify=False to the get() call, like this:
img_content = session.get(src, headers=header, verify=False).content
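Note that verify=False disables certificate checking entirely, and requests (via urllib3) will then print an InsecureRequestWarning on every such call. If you want to silence that warning, the standard urllib3 call is:

import urllib3

# Suppress the InsecureRequestWarning triggered by verify=False.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)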
