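curl is a command-line tool for uploading and downloading files using URL syntax. It supports HTTP, HTTPS, FTP, FTPS, TELNET, and many other protocols, and is commonly used to fetch web pages and to monitor the status of web servers.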
I. Linux curl usage examples:
1. Fetching a web page with curl:
Fetch Baidu:
    curl http://www.baidu.com
If the output is garbled, convert its encoding with iconv:
    curl http://iframe.ip138.com/ic.asp | iconv -f gb2312
For iconv usage, see: Using the iconv command to fix garbled Chinese text in files on Linux/Unix.
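To try the iconv step without depending on a remote page, the sketch below pushes four GB2312-encoded bytes (the word 中文) through the same conversion. The sample bytes and the explicit `-t utf-8` target are illustrative additions, not part of the original example.

```shell
# GB2312 bytes for the two characters 中文, written as octal escapes
# so the script does not depend on the encoding of this file.
sample_gb2312='\326\320\316\304'

# Same conversion as in the curl pipeline, with an explicit target encoding.
converted=$(printf "$sample_gb2312" | iconv -f gb2312 -t utf-8)
printf '%s\n' "$converted"
```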
2. Using a proxy with curl:
Fetch a page through an HTTP proxy:
    curl -x 111.95.243.36:80 http://iframe.ip138.com/ic.asp | iconv -f gb2312
    curl -x 111.95.243.36:80 -U aiezu:password http://www.baidu.com
Fetch a page through a SOCKS proxy:
    curl --socks4 202.113.65.229:443 http://iframe.ip138.com/ic.asp | iconv -f gb2312
    curl --socks5 202.113.65.229:443 http://iframe.ip138.com/ic.asp | iconv -f gb2312
Proxy server addresses can be obtained from the 爬虫代理 (crawler proxy) site.
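The three proxy flags shown above can be wrapped in one small function. This is only a sketch: `fetch_via_proxy` and its argument order are hypothetical names chosen for illustration, while `-x`, `--socks4`, and `--socks5` are the curl options already used in this section.

```shell
# Hypothetical helper: fetch a URL through a proxy of the given type.
# Usage: fetch_via_proxy <http|socks4|socks5> <host:port> <url>
fetch_via_proxy() {
    proxy_type=$1
    proxy_addr=$2
    url=$3
    case "$proxy_type" in
        http)   curl -x "$proxy_addr" "$url" ;;
        socks4) curl --socks4 "$proxy_addr" "$url" ;;
        socks5) curl --socks5 "$proxy_addr" "$url" ;;
        *)      echo "unknown proxy type: $proxy_type" >&2; return 2 ;;
    esac
}
```

For example, `fetch_via_proxy socks5 202.113.65.229:443 http://iframe.ip138.com/ic.asp | iconv -f gb2312` would run the second SOCKS command from this section.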
3. Handling cookies with curl:
Receive cookies:
    curl -c /tmp/cookies http://www.baidu.com  # save cookies to /tmp/cookies
Send cookies:
    curl -b "key1=val1;key2=val2;" http://www.baidu.com  # send cookies given as text
    curl -b /tmp/cookies http://www.baidu.com  # read cookies from a file
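Receiving and sending are often combined: one request stores the cookies and a later request sends them back. The sketch below assumes a two-step flow; `fetch_with_session` and its parameters are hypothetical names. Note that curl treats the `-b` argument as a file name only when it contains no `=` character.

```shell
# Hypothetical helper: visit one page, then a second page, sharing one cookie jar.
# Usage: fetch_with_session <cookie_jar> <first_url> <second_url>
fetch_with_session() {
    jar=$1
    # -c writes cookies received from the first response into the jar.
    curl -s -o /dev/null -c "$jar" "$2" || return 1
    # -b reads them back; -c keeps the jar updated with any new cookies.
    curl -s -b "$jar" -c "$jar" "$3"
}
```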
4. Sending data with curl:
Submit data with GET:
    curl -G -d "name=value&name2=value2" http://www.baidu.com
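With `-G`, the data given to `-d` is appended to the URL as a query string instead of being sent in the request body. When a value may contain spaces or other reserved characters, `--data-urlencode` can be used in place of `-d`. The wrapper below is a sketch; `get_with_param` is a hypothetical name.

```shell
# Hypothetical helper: send a GET request with one URL-encoded query parameter.
# Usage: get_with_param <url> <name> <value>
get_with_param() {
    # -G moves the data into the query string instead of the request body;
    # --data-urlencode percent-encodes the value part of name=value.
    curl -s -G --data-urlencode "$2=$3" "$1"
}
```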
Submit data with POST:
    curl -d "name=value&name2=value2" http://www.baidu.com  # POST data
    curl -d "a=b&c=d" --data-urlencode "txt@/tmp/txt" http://www.baidu.com  # POST the contents of a file as the txt field
Upload a file as a form submission:
    curl -F file=@/tmp/me.txt http://www.aiezu.com
This is equivalent to setting both method="POST" and enctype='multipart/form-data' on an HTML form.
5. Handling HTTP headers with curl:
Set HTTP request headers:
    curl -A "Mozilla/5.0 Firefox/21.0" http://www.baidu.com  # set the User-Agent request header
    curl -e "http://pachong.org/" http://www.baidu.com  # set the Referer request header
    curl -H "Connection: keep-alive" -H "User-Agent: Mozilla/5.0" http://www.aiezu.com  # set arbitrary headers, one -H per header
Handle HTTP response headers:
    curl -I http://www.aiezu.com  # return only the headers
    curl -D /tmp/header http://www.aiezu.com  # save the HTTP headers to /tmp/header
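A header file written by `-D` can be inspected afterwards, for example to read the status code from its first line. The sketch below uses a made-up header dump in a temporary file in place of /tmp/header; `status_from_headers` is a hypothetical helper, and it assumes a single response with no redirects.

```shell
# Hypothetical helper: print the status code from a header file written by `curl -D`.
status_from_headers() {
    # The first line looks like "HTTP/1.1 200 OK"; the code is the second field.
    # tr removes the carriage return in case the status line has no reason phrase.
    awk 'NR == 1 { print $2 }' "$1" | tr -d '\r'
}

# Sample header dump (made-up content) standing in for /tmp/header.
sample=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n' > "$sample"
status=$(status_from_headers "$sample")
echo "$status"
rm -f "$sample"
```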
6. Authentication with curl:
    curl -u aiezu:password http://www.aiezu.com  # username and password authentication
    curl -E mycert.pem https://www.baidu.com  # client certificate authentication
7. Other:
    curl -# http://www.baidu.com  # show progress as a bar of "#" characters
    curl -o /tmp/aiezu http://www.baidu.com  # save the HTTP response to /tmp/aiezu
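For the server-monitoring use mentioned at the start, `-o` can be combined with curl's `-w '%{http_code}'` option to print only the status code. The function below is a minimal sketch; `check_status` is a hypothetical name, and treating only 200 as healthy is an assumption to adjust as needed.

```shell
# Hypothetical helper: succeed only if the URL answers with HTTP 200.
# Usage: check_status <url>
check_status() {
    # -s silences progress output, -o /dev/null discards the body,
    # -w '%{http_code}' prints only the numeric status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    [ "$code" = "200" ]
}
```

A call such as `check_status http://www.baidu.com && echo up || echo down` could then be run from a cron job or a monitoring script.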