Site to log in to: https://www.1point3acres.com/bbs/
Find the action attribute of the login form to see the address the form is submitted to:
https://www.1point3acres.com/bbs/member.php?mod=logging&action=login&loginsubmit=yes&infloat=yes&lssubmit=yes&inajax=1
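To see what that address carries, the query string can be split into its individual parameters. This is a minimal sketch using only the standard library; it does not contact the site.

```python
from urllib.parse import urlsplit, parse_qs

login_url = (
    "https://www.1point3acres.com/bbs/member.php"
    "?mod=logging&action=login&loginsubmit=yes"
    "&infloat=yes&lssubmit=yes&inajax=1"
)

parts = urlsplit(login_url)
# parse_qs maps each parameter name to a list of values; keep the first one.
params = {name: values[0] for name, values in parse_qs(parts.query).items()}

print(parts.path)  # /bbs/member.php
for name, value in params.items():
    print(name, "=", value)
```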
After logging in, look at the Form Data of the login request and use those fields as the POST parameters.
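For reference, when a dict is passed as data=, requests sends it as a URL-encoded form body. The sketch below reproduces that encoding with the standard library. The field names are the ones used in this post; the values are placeholders, not real credentials.

```python
from urllib.parse import urlencode

form_data = {
    "username": "your_username",  # placeholder
    "password": "your_password",  # placeholder
    "quickforward": "yes",
    "handlekey": "ls",
}

# This is the kind of string that ends up in the POST body.
body = urlencode(form_data)
print(body)
# username=your_username&password=your_password&quickforward=yes&handlekey=ls
```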
Finally, locate the avatar: with BeautifulSoup, first find the div, then search inside it for the img and read its src attribute.
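The lookup can be tried offline first. The sketch below runs the same div, then img, then src steps against a small hand-written HTML fragment; the fragment and its URLs are made up for illustration and are not copied from the real page. It uses the built-in "html.parser" backend so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the avatar block of a logged-in page.
sample_html = """
<div class="avt y">
  <a href="/bbs/space-uid-1.html"><img src="https://example.com/avatar/001_small.jpg"></a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
div_node = soup.find("div", {"class": "avt y"})

# find_all searches all descendants and only returns tags,
# so whitespace text nodes between the tags are skipped.
img_src = [img["src"] for img in div_node.find_all("img", src=True)]
print(img_src)
```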
# Full script: log in, fetch the forum home page, and download the avatar.
import requests
from bs4 import BeautifulSoup

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36'
}
# Field names copied from the Form Data of the browser's login request.
form_data = {
    'username': 'dave_lzw2020',
    'password': 'Password123456.',
    'quickforward': 'yes',
    'handlekey': 'ls'
}
# A Session keeps the login cookies for the requests that follow.
session = requests.Session()
html = session.post(
    'https://www.1point3acres.com/bbs/member.php?mod=logging&action=login&loginsubmit=yes&infloat=yes&lssubmit=yes&inajax=1',
    headers=header,
    data=form_data
)
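# If anything fails later, print the login response: it shows the real error.
# print(html.text)
resp = session.get('https://www.1point3acres.com/bbs/', headers=header).text
# print(resp)
ht = BeautifulSoup(resp, 'lxml')
div_node = ht.find('div', {'class': 'avt y'})
print(div_node)
if div_node is None:
    raise SystemExit('Avatar div not found; the login probably failed.')
# Search inside the div for img tags. Looping over div_node.children and calling
# find() on each child can break on text nodes, where find() is str.find.
img_src = [img['src'] for img in div_node.find_all('img', src=True)]
print(img_src)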
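for src in img_src:
    # verify=False skips certificate verification (see note 2 below).
    img_content = session.get(src, headers=header, verify=False).content
    # Drop the scheme and replace slashes to build a file name.
    # lstrip('https://') would strip a set of characters, not the prefix.
    src = src.split('://', 1)[-1].replace('/', '-')
    print(src)
    # vars() with no argument returns the current namespace as a dict, like locals(),
    # so format_map(vars()) fills {src} from the variable src.
    with open('{src}.jpg'.format_map(vars()), 'wb+') as f:
        f.write(img_content)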
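A note on turning an image URL into a file name: str.lstrip takes a set of characters to remove, not a prefix, so it can eat the start of the host name as well. The sketch below shows the difference on a made-up URL; url_to_filename is a hypothetical helper name, not part of any library.

```python
url = "https://static.example.com/avatar/001.jpg"  # hypothetical URL

# lstrip removes any leading characters from the set {h, t, p, s, :, /},
# so the "st" of "static" is stripped too.
print(url.lstrip("https://"))  # atic.example.com/avatar/001.jpg


def url_to_filename(url: str) -> str:
    """Hypothetical helper: drop the scheme, then replace slashes with dashes."""
    without_scheme = url.split("://", 1)[-1]
    return without_scheme.replace("/", "-")


print(url_to_filename(url))  # static.example.com-avatar-001.jpg
```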
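Errors and things to watch out for:
1. form_data must be filled in correctly. If the login fails, every later request to the user page returns "Access denied | www.1point3acres.com used Cloudflare to restrict", which had me searching for ways to get around Cloudflare. Only after printing the page returned by the post did I see that the password was wrong and the login had never succeeded.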
2. Error: [SSL: CERTIFICATE_VERIFY_FAILED]. Adding verify=False to the get call makes the error go away, at the cost of skipping certificate verification:
img_content = session.get(src,headers=header,verify=False).content
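For note 1, a cheap guard can surface a failed login right away instead of letting it show up later as an "Access denied" page. The sketch below is only an illustration: ensure_logged_in is a hypothetical helper, and it assumes that the avatar div (class "avt y") appears only on pages served to a logged-in user. The two sample pages are made up.

```python
def ensure_logged_in(home_html: str, marker: str = 'class="avt y"') -> None:
    """Raise early if the page lacks the logged-in marker.

    Assumption: the avatar div is only rendered for a logged-in user.
    """
    if marker not in home_html:
        raise RuntimeError(
            "login appears to have failed; print the POST response to see why"
        )


# Hypothetical page bodies for illustration.
logged_in_page = '<div class="avt y"><img src="https://example.com/a.jpg"></div>'
denied_page = "<title>Access denied</title>"

ensure_logged_in(logged_in_page)  # passes silently

error_message = ""
try:
    ensure_logged_in(denied_page)
except RuntimeError as exc:
    error_message = str(exc)
    print(error_message)
```

In the script above, this check would be called on resp right after the home page is fetched.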