Accessing Web Pages with the urllib.request Module in Python 3 and Disguising the Client

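In Python 3, the urllib.request module can be used to fetch a web page. To save the fetched page locally, either write the response body to a file yourself or call urllib.request.urlretrieve().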
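Method 1:
import urllib.request

url_list = ["http://www.bundcredit.com", "http://www.baidu.com", "http://www.winnerlook.com", "http://www.winnertoke.com"]
for urlinfo in url_list:
    # Fetch the page and read the raw response bytes
    file = urllib.request.urlopen(urlinfo)
    data = file.read()
    # Name the file after the second dot-separated part of the URL, e.g. "baidu.html"
    with open(str(urlinfo).split(".")[1] + ".html", "wb") as fileinfo:
        fileinfo.write(data)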
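Method 1 derives the file name by splitting the whole URL on dots, which works for the four URLs listed but would put part of the path into the name for a URL such as http://example.com/index.html. A minimal sketch of a more defensive variant follows; html_filename is a hypothetical helper that is not part of the original post, and the example.com URL is only an illustration.

```python
from urllib.parse import urlparse


def html_filename(url):
    """Build '<label>.html' from the host name only (hypothetical helper)."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Use the label after a leading prefix such as "www" when there is one,
    # otherwise fall back to the first label of the host.
    name = labels[1] if len(labels) > 2 else labels[0]
    return (name or "page") + ".html"


print(html_filename("http://www.baidu.com"))
print(html_filename("http://example.com/index.html"))
```

Because only the host is inspected, dots in the path or query string no longer affect the resulting file name.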

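Method 2:
import urllib.request
fileline = "sz.html"  # assumed local file name; fileline is not defined in the original
filename, headers = urllib.request.urlretrieve("http://www.cniao5.com/course/sz.html", filename=str(fileline))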
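urllib.request.urlretrieve() returns a tuple of the local file name and the response headers, not just a file name. The sketch below shows that return value without touching the network: it copies a temporary local file through a file:// URL. The names retrieve_demo, source.html and copy.html are assumptions made for this illustration.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def retrieve_demo():
    """Copy a local file with urlretrieve() and report what it returns."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.html"
        source.write_bytes(b"<html>demo</html>")
        target = os.path.join(tmp, "copy.html")
        # urlretrieve() returns (local_filename, headers)
        saved_path, headers = urllib.request.urlretrieve(source.as_uri(), filename=target)
        return saved_path == target, Path(saved_path).read_bytes()


same_path, copied = retrieve_demo()
print(same_path, copied)
```

With an http:// URL the call works the same way, with the page body written to the given file name.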
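Checking the access log of the Nginx web server:
Each entry records the client IP address, time, request method, protocol, status code, and so on. The user agent field reads Python-urllib/3.5, which identifies these requests as coming from a script: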
180.156.222.228 - - [26/Nov/2017:20:02:02 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
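To pull the user agent out of entries like these, a small regular expression is enough. This is a sketch that assumes the layout seen in the lines above (it resembles Nginx's "combined" format with one extra trailing field); the group names and the parse_line helper are chosen for this illustration and do not come from Nginx.

```python
import re

# Pattern assumed from the sample entries above; the trailing field is ignored.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = ('180.156.222.228 - - [26/Nov/2017:20:02:02 +0800] '
          '"GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"')
entry = parse_line(sample)
print(entry["ip"], entry["status"], entry["user_agent"])
```

Filtering on entry["user_agent"] is one way a server operator could spot script traffic, which is why the default value gives the client away.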
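Simulating a browser - Headers attribute, method 1:
import urllib.request

url = "http://www.bundcredit.com"
# A (name, value) tuple carrying a Firefox User-Agent string
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0")
opener = urllib.request.build_opener()
# Replace the opener's default headers, including the default Python-urllib User-Agent
opener.addheaders = [headers]
data = opener.open(url).read()
with open("1.html", "wb") as fileinfo:
    fileinfo.write(data)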

The requests as logged after the disguise:
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
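The difference between the two sets of log entries can be reproduced without an Nginx server. The sketch below is an illustration only: it starts a throwaway HTTP server on 127.0.0.1 from the standard library, sends one request with the opener's default headers and one with a browser User-Agent, and records what the server received. The names observed_user_agents and RecordingHandler are assumptions, and the empty ProxyHandler is there only to keep any configured proxy out of the local test.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) "
              "Gecko/20100101 Firefox/57.0")


def observed_user_agents():
    """Return the User-Agent values a local server sees for two requests."""
    seen = []

    class RecordingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Record the header exactly as the server received it
            seen.append(self.headers.get("User-Agent"))
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the demo output quiet

    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:  # default headers
            response.read()
        opener.addheaders = [("User-Agent", BROWSER_UA)]
        with opener.open(url) as response:  # disguised as Firefox
            response.read()
    finally:
        server.shutdown()
        server.server_close()
    return seen


default_agent, disguised_agent = observed_user_agents()
print(default_agent)
print(disguised_agent)
```

The first value begins with Python-urllib/ followed by the running Python version, while the second is the Firefox string, mirroring the change seen in the access log.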

Simulating a browser - Headers attribute, method 2:
import urllib.request

url = "http://www.bundcredit.com"
req = urllib.request.Request(url)
# Attach the browser User-Agent to this single request
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0")
data = urllib.request.urlopen(req).read()
print(data)
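The same header can also be supplied when the Request object is created, and it can be inspected before anything is sent. A minimal sketch, with build_request as a hypothetical helper name; note that Request stores header names in capitalized form, so the lookup key is "User-agent".

```python
import urllib.request

BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) "
              "Gecko/20100101 Firefox/57.0")


def build_request(url, user_agent=BROWSER_UA):
    """Create a Request that already carries a browser User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.bundcredit.com")
# No network traffic happens here; the header is only stored on the object.
print(req.has_header("User-agent"))
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen() then sends the request with that header, as in method 2.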

Reposted from: https://blog.51cto.com/dreamlinux/2044474
