[Usage of urlopen() in urllib2, with examples]
http://www.cnblogs.com/langdashu/p/4963053.html
Differences between GET and POST
1. With POST the data is carried in the request body and does not appear in the URL; with GET the data has to appear in the URL.
2. POST can carry a large amount of data (the limit is set by the server, commonly a few MB), whereas GET is constrained by URL length limits in browsers and servers, typically on the order of a few KB.
3. As the names suggest, POST is meant to send data to the server and GET is meant to fetch data from it; the data a GET request carries only tells the server what you want. POST data travels in the body of the HTTP request, while GET data travels in the query string of the URL.
Passing data with GET
# coding:utf-8
import urllib2
import urllib
values = {}
values['username'] = '186######26'
values['password'] = '######'
data = urllib.urlencode(values)  # encode the dict into application/x-www-form-urlencoded format
url = 'https://accounts.douban.com/login?alias=&redir=https%3A%2F%2Fwww.douban.com%2F&source=index_nav&error=1001'
getUrl = url + '&' + data  # with GET the data is visible in the URL; the base URL already carries a query string, so join with '&'
request = urllib2.Request(getUrl)
response = urllib2.urlopen(request)
# print response.read()
print(getUrl)
Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
https://accounts.douban.com/login?alias=&redir=https%3A%2F%2Fwww.douban.com%2F&source=index_nav&error=1001&username=186######26&password=######
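Besides read(), the response object returned by urlopen() also exposes getcode() and geturl(). A minimal sketch, reusing the request built above:
print response.getcode()    # HTTP status code, e.g. 200
print response.geturl()     # the final URL, after any redirects
print len(response.read())  # the response body is returned as a str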
Passing data with POST
# coding:utf-8
import urllib2
import urllib
values = {}
values['username'] = '186######26'
values['password'] = '######'
data = urllib.urlencode(values)
url = 'https://accounts.douban.com/login?alias=&redir=https%3A%2F%2Fwww.douban.com%2F&source=index_nav&error=1001'
request = urllib2.Request(url, data)  # passing data as the second argument turns this into a POST request
response = urllib2.urlopen(request)
print response.read()  # the HTML returned by the server
Output
··································
(the HTML of the Douban login page, whose title is 登录豆瓣 "Log in to Douban", is printed)
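Some servers reject requests that do not look like they come from a browser; request headers such as User-Agent can be passed as the third argument of urllib2.Request. A sketch (the User-Agent value here is only a placeholder):
# coding:utf-8
import urllib2
import urllib
values = {'username': '186######26', 'password': '######'}
data = urllib.urlencode(values)
headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder value; use a real browser UA string if needed
url = 'https://accounts.douban.com/login?alias=&redir=https%3A%2F%2Fwww.douban.com%2F&source=index_nav&error=1001'
request = urllib2.Request(url, data, headers)  # url, POST data, request headers
response = urllib2.urlopen(request)
print response.getcode()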
How to read the header information (including the encoding) of an HTTP response: use info(), which returns an httplib.HTTPMessage object representing the headers returned by the remote server.
from urllib2 import urlopen
doc = urlopen("http://www.baidu.com")
print doc.info()                            # all response headers
print doc.info().getheader('Content-Type')  # a single header field
Output
···························
Transfer-Encoding: chunked
Bdpagetype: 1
Bdqid: 0xad9de3e700024e01
Cache-Control: private
Content-Type: text/html; charset=utf-8
Cxy_all: baidu+ddb991b06b5ef88b2a906ae2f393f374
Date: Wed, 02 May 2018 09:29:39 GMT
Expires: Wed, 02 May 2018 09:29:05 GMT
Keep-Alive: timeout=38
P3p: CP=" OTI DSP COR IVA OUR IND COM "
Server: BWS/1.1
Set-Cookie: BAIDUID=727D13931FCB93DA64AF865355E9B838:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BIDUPSID=727D13931FCB93DA64AF865355E9B838; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=1525253379; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BDSVRTM=0; path=/
Set-Cookie: BD_HOME=0; path=/
Set-Cookie: H_PS_PSSID=1467_21110_26307_20927; path=/; domain=.baidu.com
Vary: Accept-Encoding
X-Powered-By: HPHP
X-Ua-Compatible: IE=Edge,chrome=1
text/html; charset=utf-8
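The charset declared in Content-Type can be used to decode the body into unicode. A small sketch (the string split assumes the header ends with charset=...):
# coding:utf-8
from urllib2 import urlopen
doc = urlopen("http://www.baidu.com")
content_type = doc.info().getheader('Content-Type')  # e.g. 'text/html; charset=utf-8'
charset = 'utf-8'  # fallback when no charset is declared
if content_type and 'charset=' in content_type:
    charset = content_type.split('charset=')[-1]
html = doc.read().decode(charset)  # the body as unicode text
print charset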
requests
Install the third-party library requests (for example with pip install requests)
Response and encoding
# coding:utf-8
import requests
url = 'http://www.baidu.com'
r = requests.get(url)    # send a GET request
print type(r)            # the type of the returned Response object
print r.status_code      # response status code
print r.encoding         # the encoding requests will use to decode the body
print r.content          # raw response body (bytes)
print r.cookies          # cookies set by the server
..............................................
<class 'requests.models.Response'>
200
ISO-8859-1
(the HTML of the Baidu home page, titled 百度一下,你就知道)
<RequestsCookieJar[<Cookie ...>]>
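The ISO-8859-1 above is only the default requests falls back to when the response's Content-Type header declares no charset; r.apparent_encoding (guessed from the body) is usually closer to the real encoding. A sketch:
# coding:utf-8
import requests
r = requests.get('http://www.baidu.com')
print r.encoding           # ISO-8859-1, the fallback taken from the headers
print r.apparent_encoding  # guessed from the body, typically utf-8 here
r.encoding = r.apparent_encoding
print r.text[:100]         # body decoded with the corrected encoding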
GET requests
values = {'user': 'aaa', 'id': '123'}
r = requests.get(url, params=values)  # the data is sent as the query string of a GET request
print r.url
............................
http://www.baidu.com/?user=aaa&id=123
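requests url-encodes the params dict automatically, including non-ASCII values, which is easy to verify from r.url. A sketch (Baidu simply ignores the extra parameter here; the point is the encoding):
# coding:utf-8
import requests
r = requests.get('http://www.baidu.com', params={'wd': u'python 爬虫'})
print r.url  # the keyword is percent-encoded into the query string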
POST requests
values = {'user': 'aaa', 'id': '123'}
r = requests.post(url, data=values)  # the data is sent in the body of a POST request
print r.url
Output
............................
http://www.baidu.com/
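The POST data does not show up in r.url because it travels in the request body; it can be inspected on the prepared request (a sketch):
# coding:utf-8
import requests
values = {'user': 'aaa', 'id': '123'}
r = requests.post('http://www.baidu.com', data=values)
print r.url           # no query string: http://www.baidu.com/
print r.request.body  # 'user=aaa&id=123' (key order may vary)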
Handling request headers
user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4295.400 QQBrowser/9.7.12661.400'
header = {'User-Agent': user_agent}
url = 'http://www.baidu.com/'
r = requests.get(url, headers=header)
print r.content
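The headers that were actually sent can be checked on r.request.headers (a sketch, with a placeholder User-Agent):
# coding:utf-8
import requests
header = {'User-Agent': 'Mozilla/5.0'}  # placeholder value
r = requests.get('http://www.baidu.com/', headers=header)
print r.request.headers['User-Agent']   # confirms the header was sent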
Handling the status code and response headers
url = 'http://www.baidu.com'
r = requests.get(url)
if r.status_code == requests.codes.ok:   # requests' built-in status-code lookup object
    print r.status_code
    print r.headers
    print r.headers.get('content-type')  # the recommended way to read a single header field
else:
    r.raise_for_status()                 # raises an exception for a failed (4xx/5xx) request
Output
..........................................................
200
{'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Server': 'bfe/1.0.8.18', 'Last-Modified': 'Mon, 23 Jan 2017 13:27:36 GMT', 'Connection': 'Keep-Alive', 'Pragma': 'no-cache', 'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Date': 'Wed, 02 May 2018 17:26:58 GMT', 'Content-Type': 'text/html'}
text/html
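For a failing request, raise_for_status() raises requests.exceptions.HTTPError, which can be caught. A sketch (it assumes the httpbin.org test service, whose /status/404 endpoint returns a 404):
# coding:utf-8
import requests
r = requests.get('http://httpbin.org/status/404')
try:
    r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print e  # e.g. 404 Client Error: NOT FOUND for url: ...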
Cookie handling
url = 'https://www.zhihu.com/'
r = requests.get(url)
print r.cookies
print r.cookies.keys()
Output
...................................
<RequestsCookieJar[<Cookie aliyungf_tc=... for www.zhihu.com/>]>
['aliyungf_tc']
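Cookies can also be sent with a request, and a Session keeps the cookies set by earlier responses so they are reused automatically (a sketch):
# coding:utf-8
import requests
# send a cookie explicitly with a single request
r = requests.get('http://www.baidu.com', cookies={'key': 'value'})
# or let a Session remember cookies across requests
s = requests.Session()
s.get('http://www.baidu.com')  # any cookies the server sets are stored on the session
print s.cookies.keys()         # reused automatically by later s.get()/s.post() calls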
Redirects and history
Handling redirects only requires the allow_redirects parameter: setting allow_redirects=True allows redirects to be followed (this is the default), while allow_redirects=False disables them; r.history records the responses that were followed along the way.
url = 'http://www.baidu.com'
r = requests.get(url, allow_redirects=True)
print r.url
print r.status_code
print r.history
Output
...................................................
http://www.baidu.com/
200
[]
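http://www.baidu.com answers directly, so history is empty. A URL that does redirect shows the intermediate responses in r.history; the sketch below assumes http://github.com, which at the time of writing 301-redirects to HTTPS:
# coding:utf-8
import requests
r = requests.get('http://github.com')  # the redirect is followed automatically
print r.url          # https://github.com/
print r.history      # [<Response [301]>]
r = requests.get('http://github.com', allow_redirects=False)
print r.status_code  # 301, the redirect response itself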