Python third-party packages: requests or urllib?

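I noticed something odd. In the Jikexueyuan (极客学院) web-scraping course videos, the instructor says to use requests, but in other study groups many students are discussing urllib/urllib2.
That left me confused: why are there three of these? I dug through blogs in China, and most of them cover urllib and advise against the requests package. Then I searched outside the firewall and found that most people abroad recommend requests. I was stunned: the two views are complete opposites.
Digging a little further, I found that many bloggers in China simply say requests is not recommended and then introduce urllib, without explaining why not requests; overseas sources are somewhat better, with plenty of discussion on the topic.
Since requests is what I started with, I naturally like it more, so when I came across a blog post comparing them, I reposted it (see below).
Also, I saw a related question and discussion on Stack Overflow; that site is accessible, so I won't repost it.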

I just think requests is better, hahaha.


urllib/urllib2 vs requests package in Python
Reposted from: http://avi-urllib-vs-requests.blogspot.com/

Python has several libraries for interacting with websites and opening HTTP URLs.
Examples: urllib/urllib2, requests.

1. urllib/urllib2:

· urllib is a Python module used for opening HTTP URLs.
· It handles tasks such as basic authentication, getting cookies, sending GET/POST requests, error handling, and viewing headers.
· urllib2 is an improved Python module that adds functionality to several of these methods.
· Hence some urllib methods have been replaced by urllib2 methods.
· Despite the additional features, urllib2 cannot completely replace urllib, since urllib provides important functions (e.g., urlencode(), used for generating GET query strings) that are absent from urllib2.
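
As a minimal sketch of the urlencode() step mentioned above: this assumes Python 3, where the function lives in urllib.parse rather than in the Python 2 urllib module the post discusses. The base URL and field values are illustrative only.

```python
from urllib.parse import urlencode, parse_qs

# Illustrative values; not taken from any real service.
base_url = "https://www.example.com/search"
params = {"firstname": "abc", "lastname": "x y&z"}

# urlencode() escapes each key/value pair and joins the pairs with "&".
query = urlencode(params)
full_url = base_url + "?" + query

# parse_qs() reverses the encoding and returns a dict of lists.
decoded = parse_qs(query)

print(full_url)
```

Characters that would otherwise break the query string (the space and the "&" in the last name) are escaped, which is why this step cannot be skipped when building URLs by hand.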

2. Python Requests:

· ‘Requests’ is a simple, easy-to-use HTTP library written in Python.
· Requests makes interacting with Web services seamless.

Features of Python Requests:

· Connection pooling: There is a pool of connections, and a connection is released only once all its data has been read.
· Sessions with cookie persistence: You can make a session object and set certain parameters and cookie values. This allows you to persist these parameters and cookies across all requests made from the session instance.
· Python Requests encodes the parameters automatically, so you just pass them as simple arguments, unlike urllib, where you need to call urllib.urlencode() to encode the parameters before passing them.
· Python Requests automatically decodes the response body into Unicode (available as r.text).
· Python Requests handles multi-part file uploads, as well as automatic form-encoding.
· In requests.get(), auth is an optional parameter, since a request may or may not require authentication.
· Python Requests supports all the HTTP methods used by RESTful APIs: PUT, GET, DELETE, POST.
· Unlike urllib/urllib2, Requests causes no confusion about which module to use, as a single module does the entire task.
· Code is shorter and easier to write.
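
The session and parameter-encoding features above can be sketched without touching the network. This is an illustration, not the post's own code: it assumes Python 3 with the requests package installed, and the URL, header value, and cookie are hypothetical. The request is only prepared, never sent.

```python
import requests

# A Session carries headers and cookies across every request made from it.
session = requests.Session()
session.headers.update({"User-Agent": "demo-client/0.1"})  # hypothetical value
session.cookies.set("token", "abc123")                     # hypothetical cookie

# Build a request without sending it; params are passed as a plain dict.
request = requests.Request(
    "GET",
    "https://www.example.com/search",
    params={"firstname": "abc", "lastname": "xyz"},
)
prepared = session.prepare_request(request)

print(prepared.url)                    # query string encoded by Requests
print(prepared.headers["User-Agent"])  # header inherited from the session
print(prepared.headers.get("Cookie"))  # cookie inherited from the session
```

Calling session.send(prepared) would actually perform the request; it is left out here so the sketch runs offline.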

Comparison between Python Requests and urllib/urllib2:

Example 1: A simple HTTP GET request and authentication

Using urllib2: In this example, to make a simple HTTP GET request we need to call a lot of methods.
Remembering the names of these methods can be difficult:

# Python 2 only: urllib2 was split into urllib.request and urllib.error in Python 3.
import urllib2

url = 'https://www.example.com'
username = 'user'
password = 'pass'

request = urllib2.Request(url)

# Register the credentials for this URL (None means "any realm").
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, url, username, password)

# Build an opener that answers Basic auth challenges and install it globally.
auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)

urllib2.install_opener(opener)

handler = urllib2.urlopen(request)

print(handler.getcode())
print(handler.headers.getheader('content-type'))

Using Requests: The same HTTP GET request with authentication takes a single line, compared with the much longer urllib2 version.

import requests

# auth takes a (username, password) tuple for HTTP Basic authentication.
r = requests.get('https://www.example.com', auth=('user', 'pass'))

print(r.status_code)
print(r.headers['content-type'])
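
To make concrete what both versions put on the wire: HTTP Basic authentication boils down to an Authorization header holding the Base64 encoding of "username:password". The sketch below builds that header value by hand with the standard library only. The helper name basic_auth_header is hypothetical, and Python 3 is assumed.

```python
import base64

def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token

header_value = basic_auth_header("user", "pass")
print(header_value)
```

Base64 is an encoding, not encryption, so Basic authentication is only reasonable over HTTPS, as in the URLs used in the examples.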

Example 2: Making a POST request
Using urllib2/urllib: Note that in this example we had to make use of both the urllib and urllib2 modules in order to write a script for a simple POST request:

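# Python 2 only: both modules are needed for a form-encoded POST.
import urllib
import urllib2

url = "http://www.example.com"
values = {"firstname": "abc", "lastname": "xyz"}

header = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5;Windows NT)"}

# urlencode() lives in urllib; Request and urlopen() live in urllib2.
values = urllib.urlencode(values)
request = urllib2.Request(url, values, header)

# Passing a data argument makes urllib2 send a POST instead of a GET.
response = urllib2.urlopen(request)
html_content = response.read()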

Using Requests: Here we do not need to import multiple modules; the single requests module accomplishes the entire task:

import requests

# The dict is form-encoded automatically; no separate urlencode() call is needed.
values = {"firstname": "abc", "lastname": "xyz"}
r = requests.post('https://www.example.com', data=values)
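
For readers on Python 3, where urllib2 no longer exists, the standard-library version of this POST might look roughly like the sketch below. It assumes Python 3's urllib.request and urllib.parse, reuses the values from the example above, and only builds the request object without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://www.example.com"
values = {"firstname": "abc", "lastname": "xyz"}
header = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5;Windows NT)"}

# In Python 3 the request body must be bytes, so encode after urlencode().
body = urlencode(values).encode("ascii")
request = Request(url, data=body, headers=header)

# Supplying data switches the method from GET to POST.
print(request.get_method())
print(request.data)
```

Passing the request to urllib.request.urlopen() would send it. Even in Python 3 the manual encoding step remains, which is the step Requests removes.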

I hope the examples above make it clear that the requests library is much easier to use.

Thanks for reading, guys :) Please share your comments or thoughts.
