Downloading a File with Requests

I wanted to download a file using the requests library.

Environment: Python 3.5.2 + requests + Windows 10

import requests

# Download a file with the requests library
url = 'https://www.gipsa.usda.gov/fgis/exportgrain/CY2016.csv'
r = requests.get(url)
print(r.content)
with open("myCY2016.csv", "wb") as code:
    code.write(r.content)

But it raised an error.

  File "C:\Users\admin\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\adapters.py", line 497, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

I found this explanation on Stack Overflow:

The problem you are having is caused by an untrusted SSL certificate.
Like @dirk mentioned in a previous comment, the *quickest* fix is setting verify=False.
Please note that this will cause the certificate not to be verified. **This will expose your application to security risks, such as man-in-the-middle attacks.**
Of course, apply judgment. As mentioned in the comments, this *may* be acceptable for quick/throwaway applications/scripts, *but really should not go to production software*.
If just skipping the certificate check is not acceptable in your particular context, consider the following options. Your best option is to set the verify parameter to a string that is the path of the .pem file of the certificate (which you should obtain by some sort of secure means).
So, as of version 2.0, the verify parameter accepts the following values, with their respective semantics:
True: causes the certificate to be validated against the library's own trusted certificate authorities (Note: you can see which Root Certificates Requests uses via the Certifi library, a trust database of RCs extracted from Requests: [Certifi - Trust Database for Humans](http://certifiio.readthedocs.org/en/latest/)).
False: bypasses certificate validation *completely*.
Path to a CA_BUNDLE file for Requests to use to validate the certificates.

Source: [Requests - SSL Cert Verification](http://docs.python-requests.org/en/master/user/advanced/?highlight=ssl#ssl-cert-verification)
Also take a look at the cert parameter on the same link.
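
The verify options quoted above can be sketched like this. It is only an illustration, not code from the answer: make_session is a hypothetical helper and "corp-ca.pem" is a placeholder path. It relies on Session.verify, which Requests uses as the default for every request sent through that session. No network request is made here.

```python
import requests


def make_session(ca_bundle=None):
    """Return a Session that keeps certificate verification switched on.

    ca_bundle is an optional path to a PEM file of trusted CAs
    (hypothetical helper; the caller supplies the path).
    """
    session = requests.Session()
    # True -> validate against the default CA bundle
    # path -> validate against the given bundle instead
    session.verify = ca_bundle if ca_bundle else True
    return session


default_session = make_session()
custom_session = make_session("corp-ca.pem")  # placeholder path, not a real file
print(default_session.verify)
print(custom_session.verify)
```

With a real bundle path, custom_session.get(url) would validate the server against that file, which avoids falling back to verify=False.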

OK, I didn't fully understand all of that, but I set verify=False and carried on.

C:\Users\admin\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connectionpool.py:843: 
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. 
See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)

Then I found this on Stack Overflow:

The reason doing urllib3.disable_warnings() didn't work for you is because it looks like you're using a separate instance of urllib3 vendored inside of requests.
I gather this based on the path here: /usr/lib/python2.6/site-packages/requests/packages/urllib3/connectionpool.py

To disable warnings in requests' vendored urllib3, you'll need to import that specific instance of the module:
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
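
As an alternative to switching the warning off globally, it can be silenced only around the one unverified call. The sketch below is an assumption-laden illustration: insecure_warning_suppressed is a hypothetical helper, and the import assumes urllib3 is installed as a standalone package (with a vendored copy, as in the path above, the class would come from requests.packages.urllib3.exceptions instead).

```python
import warnings
from contextlib import contextmanager

# Assumes a standalone urllib3 install; adjust the import for a vendored copy.
from urllib3.exceptions import InsecureRequestWarning


@contextmanager
def insecure_warning_suppressed():
    """Silence InsecureRequestWarning only inside the with-block."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", InsecureRequestWarning)
        yield


# Intended use (not executed here, since it needs network access):
# with insecure_warning_suppressed():
#     r = requests.get(url, verify=False)
```

Outside the with-block the warning filters are restored, so any other unverified request in the program would still produce the warning.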

OK, it ran successfully.

Here is the final code:

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

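# Download a file with the requests library
url = 'https://www.gipsa.usda.gov/fgis/exportgrain/CY2016.csv'
# verify=False skips certificate validation (insecure, see the quote above)
r = requests.get(url, verify=False)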
print(r.content)
with open("myCY2016.csv", "wb") as code:
    code.write(r.content)

However, the download seemed extremely slow.
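
One thing that might help with a large file is streaming the body to disk in chunks instead of holding it all in memory and printing it. This is only a sketch: write_chunks and download are hypothetical helper names, the chunk size and timeout are arbitrary choices, and it will not make a slow connection any faster.

```python
import requests


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


def download(url, path, chunk_size=8192, timeout=30):
    """Stream url to path without loading the whole body into memory."""
    # stream=True defers reading the body until iter_content() is consumed
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        return write_chunks(r.iter_content(chunk_size=chunk_size), path)
```

Calling download(url, "myCY2016.csv") would then replace the get/print/write sequence above. Dropping print(r.content) may also matter, since printing a large byte string to the console can itself take a while.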

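So here come the questions, and there are quite a few. What exactly is SSL? Why does the same GET request trigger a download straight away in the browser, while the scripted request just gets the data back in the response body? Perhaps some header field tells the browser to save the body to disk? And why is HTTP data sometimes described as being sent as bytes and sometimes as characters, when by the time it goes out over the network it is all a byte stream anyway? Perhaps the representation changes at some layer?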
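
On the download question: a browser decides what to do from the response headers, for example a Content-Disposition: attachment header or a Content-Type it cannot display, whereas requests simply hands the body back. On bytes versus characters: the body always arrives as bytes (r.content), and r.text is those bytes decoded with a charset. A minimal sketch of both ideas follows. The header values are made up for illustration and were not captured from the real server, and filename_from_headers is a hypothetical helper.

```python
from email.message import Message


def filename_from_headers(headers, default="download.bin"):
    """Return the filename suggested by a Content-Disposition header, if any."""
    value = headers.get("Content-Disposition")
    if not value:
        return default
    # Reuse the standard library's header-parameter parsing
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_filename() or default


# Hypothetical headers for illustration only.
sample_headers = {
    "Content-Type": "text/csv; charset=utf-8",
    "Content-Disposition": 'attachment; filename="CY2016.csv"',
}
print(filename_from_headers(sample_headers))

# The body travels as bytes; text is those bytes decoded with a charset.
raw_body = "a,b\n1,2\n".encode("utf-8")  # the kind of value r.content holds
text_body = raw_body.decode("utf-8")     # what decoding with the charset yields
print(type(raw_body).__name__, type(text_body).__name__)
```

With a real response, r.headers could be passed in place of sample_headers to see whether the server suggests a filename at all.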

Also, Stack Overflow really is great, isn't it!?
