The pycurl package is a Python interface to libcurl. It is written in C, so it is powerful and fast. Because pycurl has so many options and methods, this post records them for reference.

pip install pycurl

If installation fails, search for instructions specific to your system version, e.g. "centos7.1 install pycurl".
```python
import pycurl
import urllib.parse
from io import BytesIO

url = 'http://www.baidu.com'
headers = [
    "User-Agent:Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3",
]
data = {"cityListName": "", "trade": ""}

c = pycurl.Curl()                            # create a Curl object
# c.setopt(pycurl.REFERER, 'http://www.baidu.com/')  # set the Referer header
c.setopt(pycurl.FOLLOWLOCATION, True)        # follow redirects automatically
c.setopt(pycurl.MAXREDIRS, 5)                # maximum number of redirects
c.setopt(pycurl.CONNECTTIMEOUT, 60)          # connection timeout (seconds)
c.setopt(pycurl.TIMEOUT, 120)                # download timeout (seconds)
c.setopt(pycurl.ENCODING, 'gzip,deflate')    # accept gzip/deflate responses
# c.setopt(c.PROXY, ip)                      # proxy
c.fp = BytesIO()
c.setopt(pycurl.URL, url)                    # URL to fetch
c.setopt(pycurl.HTTPHEADER, headers)         # request headers
c.setopt(pycurl.POST, 1)
c.setopt(pycurl.POSTFIELDS, urllib.parse.urlencode(data))  # POST body
c.setopt(c.WRITEFUNCTION, c.fp.write)        # write the response into the buffer
c.perform()
code = c.getinfo(c.HTTP_CODE)                # response status code
html = c.fp.getvalue()                       # response body (bytes)
print(c.getinfo(c.TOTAL_TIME))
```
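Note that `POSTFIELDS` expects an already URL-encoded string, which is why the dict is passed through `urllib.parse.urlencode` first. A stdlib-only illustration of what that produces (the extra `q` key is added here just for the example):

```python
from urllib.parse import urlencode

# Empty values still produce "key=" pairs; non-ASCII values are percent-encoded.
data = {"cityListName": "", "trade": "", "q": "北京"}
body = urlencode(data)
print(body)  # cityListName=&trade=&q=%E5%8C%97%E4%BA%AC
```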
```python
c = pycurl.Curl()                          # create a Curl object
c.setopt(pycurl.FOLLOWLOCATION, True)      # follow redirects automatically
c.setopt(pycurl.MAXREDIRS, 5)              # maximum number of redirects
c.setopt(pycurl.CONNECTTIMEOUT, 60)        # connection timeout (seconds)
c.setopt(pycurl.TIMEOUT, 120)              # download timeout (seconds)
c.setopt(pycurl.ENCODING, 'gzip,deflate')  # accept gzip/deflate responses
# c.setopt(c.PROXY, ip)                    # proxy
c.fp = BytesIO()
c.setopt(pycurl.URL, url)                  # URL to fetch
c.setopt(pycurl.USERAGENT, ua)             # set the User-Agent (ua defined elsewhere)
# c.setopt(pycurl.HTTPHEADER, self.headers)  # request headers
c.setopt(c.WRITEFUNCTION, c.fp.write)      # write the response into the buffer
c.perform()
code = c.getinfo(c.HTTP_CODE)              # response status code
html = c.fp.getvalue()                     # response body (bytes)
```
```python
c = pycurl.Curl()                          # create a Curl object
c.setopt(pycurl.FOLLOWLOCATION, True)      # follow redirects automatically
c.setopt(pycurl.MAXREDIRS, 5)              # maximum number of redirects
c.setopt(pycurl.CONNECTTIMEOUT, 60)        # connection timeout (seconds)
c.setopt(pycurl.TIMEOUT, 120)              # download timeout (seconds)
c.setopt(pycurl.ENCODING, 'gzip,deflate')  # accept gzip/deflate responses
# c.setopt(c.PROXY, ip)                    # proxy
c.fp = BytesIO()
c.setopt(pycurl.URL, url)                  # URL to fetch
c.setopt(pycurl.USERAGENT, ua)             # set the User-Agent (ua defined elsewhere)
# c.setopt(pycurl.HTTPHEADER, headers)     # request headers
c.setopt(pycurl.POST, 1)
c.setopt(pycurl.POSTFIELDS, urllib.parse.urlencode(data))  # POST body
c.setopt(c.WRITEFUNCTION, c.fp.write)      # write the response into the buffer
c.perform()
code = c.getinfo(c.HTTP_CODE)              # response status code
html = c.fp.getvalue()                     # response body (bytes)
```
On Windows, accessing HTTPS URLs requires a CA certificate bundle:

```python
import certifi
c.setopt(pycurl.CAINFO, certifi.where())  # point libcurl at certifi's CA bundle
```
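On other platforms you can often rely on the system's default CA locations instead; the stdlib's `ssl` module can report where they are. A small sketch (not pycurl-specific; `cafile` may be `None` on some systems, in which case `capath` points to a certificate directory):

```python
import ssl

paths = ssl.get_default_verify_paths()
print(paths.cafile or paths.capath)  # system CA file or directory, if any
```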
```python
c.getinfo(pycurl.EFFECTIVE_URL)  # the final URL of the page, after any redirects
```
```python
c.setopt(pycurl.COOKIEFILE, "cookie_file_etherscan")  # read cookies from this file
c.setopt(pycurl.COOKIEJAR, "cookie_file_etherscan")   # write cookies back to this file
```
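`COOKIEJAR` writes cookies in the Netscape cookie file format, which the stdlib can also read and write. A sketch of round-tripping such a file with `http.cookiejar` (the file name and cookie values here are made up for the example):

```python
import http.cookiejar
import os
import tempfile

# Build one cookie by hand and save it in Netscape format,
# the same format pycurl's COOKIEJAR produces.
jar = http.cookiejar.MozillaCookieJar()
cookie = http.cookiejar.Cookie(
    version=0, name="session", value="abc123", port=None, port_specified=False,
    domain="example.com", domain_specified=True, domain_initial_dot=False,
    path="/", path_specified=True, secure=False, expires=2000000000,
    discard=False, comment=None, comment_url=None, rest={},
)
jar.set_cookie(cookie)
path = os.path.join(tempfile.mkdtemp(), "cookie_file")
jar.save(path)

# A fresh jar can load the file back.
reloaded = http.cookiejar.MozillaCookieJar()
reloaded.load(path)
print([ck.name for ck in reloaded])  # ['session']
```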
Part of the pycurl API:

```python
c = pycurl.Curl()                      # create a pycurl object
c.setopt(pycurl.URL, 'http://www.google.com.hk')  # URL to fetch
c.setopt(pycurl.MAXREDIRS, 5)          # maximum number of redirects
c.setopt(pycurl.CONNECTTIMEOUT, 60)    # connection timeout
c.setopt(pycurl.TIMEOUT, 300)          # total timeout
c.setopt(pycurl.USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)")  # emulate a browser
c.perform()                            # perform the request
c.getinfo(pycurl.HTTP_CODE)            # HTTP status, like the status attribute in urllib

pycurl.NAMELOOKUP_TIME          # DNS resolution time
pycurl.CONNECT_TIME             # time to connect to the remote server
pycurl.PRETRANSFER_TIME         # time from connect until the transfer begins
pycurl.STARTTRANSFER_TIME       # time until the first byte is received
pycurl.TOTAL_TIME               # total time of the last request
pycurl.REDIRECT_TIME            # time spent on redirects, if any
pycurl.HTTP_CODE                # HTTP response code
pycurl.REDIRECT_COUNT           # number of redirects
pycurl.SIZE_UPLOAD              # bytes uploaded
pycurl.SIZE_DOWNLOAD            # bytes downloaded
pycurl.SPEED_UPLOAD             # upload speed
pycurl.HEADER_SIZE              # header size
pycurl.REQUEST_SIZE             # request size
pycurl.CONTENT_LENGTH_DOWNLOAD  # Content-Length of the download
pycurl.CONTENT_LENGTH_UPLOAD    # Content-Length of the upload
pycurl.CONTENT_TYPE             # content type
pycurl.RESPONSE_CODE            # response code
pycurl.SPEED_DOWNLOAD           # download speed
pycurl.INFO_FILETIME            # file modification time, if available
pycurl.HTTP_CONNECTCODE         # HTTP CONNECT response code
```
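The `*_TIME` values above are all cumulative since the start of the request, so per-phase durations come from subtracting adjacent values. A sketch with made-up timing numbers standing in for `c.getinfo(...)` results:

```python
# Hypothetical values (seconds) as libcurl would report them after a request;
# each one measures elapsed time from the start of the request.
timings = {
    "NAMELOOKUP_TIME": 0.012,
    "CONNECT_TIME": 0.045,
    "PRETRANSFER_TIME": 0.046,
    "STARTTRANSFER_TIME": 0.180,
    "TOTAL_TIME": 0.230,
}

# Subtract adjacent checkpoints to isolate each phase.
phases = {
    "dns": timings["NAMELOOKUP_TIME"],
    "tcp_connect": timings["CONNECT_TIME"] - timings["NAMELOOKUP_TIME"],
    "pre_transfer": timings["PRETRANSFER_TIME"] - timings["CONNECT_TIME"],
    "server_wait": timings["STARTTRANSFER_TIME"] - timings["PRETRANSFER_TIME"],
    "download": timings["TOTAL_TIME"] - timings["STARTTRANSFER_TIME"],
}
for name, seconds in phases.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

The phase durations necessarily add back up to `TOTAL_TIME`, which makes this a handy way to see whether a slow request is losing time in DNS, connection setup, or the server's response.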
References:
http://pycurl.io/docs/latest/quickstart.html
https://stackoverflow.com/questions/15461995/python-requests-vs-pycurl-performance

From the second link: first and foremost, requests is built on top of the urllib3 library; the stdlib urllib or urllib2 libraries are not used at all.
There is little point in comparing requests with pycurl on performance. pycurl may use C code for its work, but like all network programming, your execution speed depends largely on the network that separates your machine from the target server. Moreover, the target server could be slow to respond.
In the end, requests has a far more friendly API to work with, and you'll find that you'll be more productive using that friendlier API.