Why curl on the command line sometimes downloads an empty page

I often use wget or curl on the command line to download pages, and occasionally curl comes back with nothing. For example:

zhuliting@zhuliting:~$ curl 'http://m.youku.com' -o youku.html
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
zhuliting@zhuliting:~$ file youku.html
youku.html: ERROR: cannot open `youku.html' (No such file or directory)

wget, on the other hand, does get the content:

zhuliting@zhuliting:~$ wget 'http://m.youku.com' -O youku.html
--2015-08-06 20:09:44--  http://m.youku.com/
Resolving m.youku.com (m.youku.com)... 211.151.146.60
Connecting to m.youku.com (m.youku.com)|211.151.146.60|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://m.youku.com/wap/ [following]
--2015-08-06 20:09:44--  http://m.youku.com/wap/
Reusing existing connection to m.youku.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 16603 (16K) [text/html]
Saving to: ‘youku.html’

100%[======================================================================================================>] 16,603      86.4KB/s   in 0.2s

2015-08-06 20:09:44 (86.4 KB/s) - ‘youku.html’ saved [16603/16603]

zhuliting@zhuliting:~$ file youku.html 
youku.html: HTML document, UTF-8 Unicode text

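When this happens, the most likely cause is a redirect during the download. By default wget follows redirects and downloads the target URL, whereas curl does not. Debugging curl with the -v option confirms it: the server answers with a 302 redirect and an empty body (Content-Length: 0), so there is nothing to save: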

zhuliting@zhuliting:~$ curl 'http://m.youku.com' -o youku.html -v
* Rebuilt URL to: http://m.youku.com/
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 211.151.146.60...
* Connected to m.youku.com (211.151.146.60) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: m.youku.com
> Accept: */*
> 
< HTTP/1.1 302 Moved Temporarily
* Server nginx/0.8.54 is not blacklisted
< Server: nginx/0.8.54
< Date: Thu, 06 Aug 2015 12:14:52 GMT
< Content-Type: text/html;charset=utf-8
< Connection: keep-alive
< Set-Cookie: JSESSIONID=ED87E6E12D2605176672773D570B8377; Path=/; HttpOnly
< Location: http://www.youku.com
< Content-Length: 0
< 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host m.youku.com left intact
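
Reading the whole -v trace just to find the redirect target gets tedious. As a rough sketch, the Location header can be pulled out of the response headers with a small shell function. The name extract_location is made up for this illustration; -s, -D and -o are standard curl options.

```shell
#!/bin/sh
# extract_location (hypothetical helper, not part of curl):
# reads raw HTTP response headers on stdin and prints the value of the
# first Location header, or nothing if there is none.
extract_location() {
    # Strip the CR of each CRLF line ending, then match the header name
    # case-insensitively and drop everything up to the first colon.
    tr -d '\r' | awk '{
        line = $0
        if (tolower(substr(line, 1, 9)) == "location:") {
            sub(/^[^:]*:[ \t]*/, "", line)
            print line
            exit
        }
    }'
}

# Usage (needs network access, so it is left commented out here):
#   -s           silent, no progress meter
#   -D -         dump the response headers to stdout
#   -o /dev/null discard the (possibly empty) body
# curl -s -D - -o /dev/null 'http://m.youku.com' | extract_location
```

If this prints a URL, the server redirected the request, and a plain curl call will save only the body of the redirect response itself, which was empty in the trace above.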


According to man curl, the -L option tells curl to follow redirects, which fixes the empty download:

zhuliting@zhuliting:~$ curl 'http://m.youku.com' -o youku.html -v -L
* Rebuilt URL to: http://m.youku.com/
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 211.151.146.60...
* Connected to m.youku.com (211.151.146.60) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: m.youku.com
> Accept: */*
> 
< HTTP/1.1 302 Moved Temporarily
* Server nginx/0.8.54 is not blacklisted
< Server: nginx/0.8.54
< Date: Thu, 06 Aug 2015 12:17:43 GMT
< Content-Type: text/html;charset=utf-8
< Connection: keep-alive
< Set-Cookie: JSESSIONID=5ED7472830728CC0C65B60F9EDCCE087; Path=/; HttpOnly
< Location: http://www.youku.com
< Content-Length: 0
< 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host m.youku.com left intact
* Issue another request to this URL: 'http://www.youku.com'
* Rebuilt URL to: http://www.youku.com/
* Hostname was NOT found in DNS cache
*   Trying 43.250.12.42...
* Connected to www.youku.com (43.250.12.42) port 80 (#1)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: www.youku.com
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/html
< Accept-Ranges: bytes
< ETag: "2574430301"
< Last-Modified: Thu, 06 Aug 2015 12:16:02 GMT
< Content-Length: 619907
< Connection: close
< Date: Thu, 06 Aug 2015 12:17:43 GMT
* Server b28www4 is not blacklisted
< Server: b28www4
< 
{ [data not shown]
100  605k  100  605k    0     0   394k      0  0:00:01  0:00:01 --:--:--  458k
* Closing connection 1
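
In a script it may be worth combining -L with a check that something was actually saved, since the first example in this post produced no file at all. The following is only a sketch under assumptions: fetch_page and download_ok are hypothetical helper names, and the limit of 5 redirects is an arbitrary choice. -L, --max-redirs, -f, -sS and -o are standard curl options.

```shell
#!/bin/sh
# download_ok (hypothetical helper): succeeds only if the file exists
# and is non-empty.
download_ok() {
    [ -s "$1" ]
}

# fetch_page (hypothetical helper): download a URL to a file, following
# redirects, and fail if nothing was saved.
fetch_page() {
    url=$1
    out=$2
    # -sS             no progress meter, but still print errors
    # -L              follow redirects
    # --max-redirs 5  give up after 5 redirects (arbitrary limit)
    # -f              treat HTTP error statuses (>= 400) as failures
    curl -sS -L --max-redirs 5 -f -o "$out" "$url" && download_ok "$out"
}

# Usage (needs network access, so it is left commented out here):
# fetch_page 'http://m.youku.com' youku.html || echo 'download failed or empty' >&2
```

Capping the number of redirects protects against redirect loops, and the explicit size check catches the case where curl exits successfully but the response body was empty.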



