[Python] Web Scraping Overview (3): Getting Past Web Forms and Login Pages

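  • Submitting Forms with the Python Requests Library
  • Submitting Files and Images
  • Handling Logins: Cookies, Sessions, and HTTP Basic Authentication
    • Cookie
    • Session
    • HTTP Basic Access Authentication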

Submitting Forms with the Python Requests Library

import requests

# Each key must match the name attribute of an input field in the form
params = {"firstname": "Liu", "lastname": "Vi"}
r = requests.post("http://pythonscraping.com/files/processing.php", data=params)
print(r.text)
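To see what data=params actually puts on the wire, you can build the same request with requests.Request and call prepare() instead of sending it. A minimal offline sketch: the prepared body is the URL-encoded field string, and requests sets the form Content-Type header for you.

```python
import requests

params = {"firstname": "Liu", "lastname": "Vi"}

# Build the request but do not send it, so we can inspect what requests.post would transmit
req = requests.Request("POST", "http://pythonscraping.com/files/processing.php", data=params)
prepared = req.prepare()

print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                      # firstname=Liu&lastname=Vi
```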

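If you don't know the field names and values, or the URL the form submits to, look them up in the page source (the form's action attribute and each input's name attribute) or in the Network panel of the browser's developer tools while submitting the form manually.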
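The field names can also be pulled out of the page source programmatically. A minimal sketch using only the standard library's html.parser; the FormFieldParser class and the HTML snippet are hypothetical, modeled on a simple first/last-name form rather than copied from the real page, and only <input> elements are collected (not <select> or <textarea>):

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collects each <form>'s action, method, and named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per <form>: {"action", "method", "fields"}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag == "input" and self.forms:
            name = attrs.get("name")
            if name:  # unnamed inputs (e.g. a plain submit button) are not submitted
                # Keep any default value, e.g. hidden tokens that must be sent back
                self.forms[-1]["fields"][name] = attrs.get("value", "")


# Hypothetical markup for illustration
html = """
<form method="post" action="processing.php">
  First name: <input type="text" name="firstname"><br>
  Last name: <input type="text" name="lastname"><br>
  <input type="submit" value="Submit">
</form>
"""

parser = FormFieldParser()
parser.feed(html)
form = parser.forms[0]
print(form["action"], form["method"], form["fields"])
```

The resulting fields dict can be filled in and passed as data= to requests.post, with the action resolved against the page URL.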

Submitting Files and Images

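import requests

# The dict key must match the name attribute of the form's file input
with open("1.jpg", "rb") as f:  # the with block closes the file after the upload
    r = requests.post("http://pythonscraping.com/files/processing2.php", files={"uploadFile": f})
print(r.text)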
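The files argument produces a multipart/form-data body instead of a URL-encoded one. This sketch prepares the upload without sending it; the in-memory bytes are placeholder data standing in for 1.jpg (an assumption for the example), and the 3-tuple (filename, file object, content type) is how requests lets you set the filename and MIME type explicitly:

```python
import io

import requests

# Placeholder bytes standing in for open("1.jpg", "rb"); not a real JPEG
fake_image = io.BytesIO(b"\xff\xd8\xff placeholder bytes")

# (filename, file object, content type) sets the part's filename and MIME type
files = {"uploadFile": ("1.jpg", fake_image, "image/jpeg")}
prepared = requests.Request(
    "POST", "http://pythonscraping.com/files/processing2.php", files=files
).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=<random>
print(prepared.body[:200])                # raw multipart bytes, including the part headers
```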

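Handling Logins: Cookies, Sessions, and HTTP Basic Authentication

Cookie

Logging in with a POST to welcome.php returns cookies on the response. To stay logged in, the code sends those cookies back explicitly through the cookies parameter when requesting the profile page.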

import requests

# Log in; the server sets its login cookies on this response
params = {"username": "vi", "password": "password"}
r = requests.post("http://pythonscraping.com/pages/cookies/welcome.php", data=params)
print("Cookie is set to: ")
print(r.cookies.get_dict())
print("--------------------")
print("Going to profile page...")
# Pass the login cookies back explicitly with the next request
r = requests.get("http://pythonscraping.com/pages/cookies/profile.php", cookies=r.cookies)
print(r.text)
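Passing cookies= simply makes requests write a Cookie header on the outgoing request. The sketch prepares the request without sending it; the cookie names and values are hypothetical placeholders, not what welcome.php actually sets:

```python
import requests

# Hypothetical login cookies; in the real flow these come from r.cookies
login_cookies = {"username": "vi", "loggedin": "1"}

prepared = requests.Request(
    "GET",
    "http://pythonscraping.com/pages/cookies/profile.php",
    cookies=login_cookies,
).prepare()

# All cookies are serialized into a single Cookie request header
print(prepared.headers["Cookie"])
```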

Session

import requests

# A Session stores cookies (and default headers) and reuses them on every request
session = requests.Session()
params = {'username': 'vi', 'password': 'password'}
s = session.post("http://pythonscraping.com/pages/cookies/welcome.php", data=params)
print("Cookie is set to: ")
print(s.cookies.get_dict())
print("-------------------")
print("Going to profile page...")
# No cookies argument needed: the session sends the cookies it stored at login
s = session.get("http://pythonscraping.com/pages/cookies/profile.php")
print(s.text)
print(session.headers)
print('---------------')
print(session.cookies)
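A Session merges its own headers and cookie jar into each request it sends. This sketch checks that offline with Session.prepare_request; the User-Agent string and the loggedin cookie are hypothetical values set by hand in place of a real login response:

```python
import requests

with requests.Session() as session:
    # Hypothetical values: a custom User-Agent and a cookie as if a login had set it
    session.headers.update({"User-Agent": "my-scraper/0.1"})
    session.cookies.set("loggedin", "1")

    # prepare_request merges the session's headers and cookies into the request
    prepared = session.prepare_request(
        requests.Request("GET", "http://pythonscraping.com/pages/cookies/profile.php")
    )

print(prepared.headers["User-Agent"])  # my-scraper/0.1
print(prepared.headers["Cookie"])      # loggedin=1
```

This is why the second request in the Session example needs no cookies argument: anything the server stored in session.cookies is attached automatically.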

HTTP Basic Access Authentication

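import requests
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('vi', 'password')  # credentials go into an Authorization header
r = requests.post(url="http://pythonscraping.com/pages/auth/login.php", auth=auth)
print(r.text)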
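HTTP basic authentication is just an Authorization header containing "Basic " followed by base64("username:password"). The sketch prepares the request without sending it to show this; the credentials are only encoded, not encrypted, so basic auth should only be trusted over HTTPS:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "http://pythonscraping.com/pages/auth/login.php"

# Prepare (but do not send) the request to inspect the header HTTPBasicAuth adds
prepared = requests.Request("POST", url, auth=HTTPBasicAuth("vi", "password")).prepare()
print(prepared.headers["Authorization"])  # Basic dmk6cGFzc3dvcmQ=

# The same value computed by hand
expected = "Basic " + base64.b64encode(b"vi:password").decode("ascii")
print(prepared.headers["Authorization"] == expected)  # True

# A plain (username, password) tuple is shorthand for HTTPBasicAuth
same = requests.Request("POST", url, auth=("vi", "password")).prepare()
print(same.headers["Authorization"] == expected)  # True
```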

Reference book:
Web Scraping with Python (Chinese edition: 《Python网络数据采集》)
