Python Requests and Proxies

One of Requests’ most popular features is its simple proxying support. HTTP as a protocol has very well-defined semantics for dealing with proxies, and this has led to widespread deployment of HTTP proxies.

The vast majority of these proxies are ‘transparent’: that is, they sit on the message path and quietly capture HTTP messages before forwarding them on. These proxies are not a problem for people interacting with HTTP exactly because of their transparency: you don’t need to know anything about them to get your messages through.

Many proxies however are non-transparent. The most prevalent use of this kind of HTTP proxy is at the border between a controlled LAN and the wider internet. In particular, companies and state institutions (e.g. schools) deploy HTTP proxies very widely. These proxies require explicit configuration on HTTP clients because all HTTP traffic must pass through them.

The widespread nature of this kind of deployment means that Requests is essentially obligated to support routing HTTP requests through proxies. Today I’m going to talk briefly about how this is done, and about some particular problems we’ve had with the implementation.

The Good: The API

From the perspective of the Requests user, the configuration of proxies is the perfect combination of simple and powerful. You simply build a dictionary that maps each URL scheme to the URL of the proxy to use for that scheme. A proxy dictionary could look like this:

proxies = {'http': 'http://10.0.0.1:8080',
           'https': 'https://10.0.0.1:4444'}

This dictionary would then get passed into the standard Requests call:
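
The following is a minimal sketch of that call. The helper name fetch, the placeholder URL http://example.org/ and the timeout value are our own additions, and the proxy addresses are the example values above. Requests chooses the proxy by the scheme of the URL being requested, so a plain-HTTP request goes to http://10.0.0.1:8080.

```python
import requests

# Example mapping from above: URL scheme -> URL of the proxy to use for it.
proxies = {
    'http': 'http://10.0.0.1:8080',
    'https': 'https://10.0.0.1:4444',
}


def fetch(url, timeout=10):
    """GET url, letting Requests route it through the proxy registered for its scheme.

    Hypothetical helper; the timeout makes an unreachable proxy fail fast.
    """
    return requests.get(url, proxies=proxies, timeout=timeout)


# Usage (only succeeds if a proxy really is listening at 10.0.0.1:8080):
#   r = fetch('http://example.org/')
#   print(r.status_code)
```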

Voila! Your HTTP messages are now being routed through the proxy at 10.0.0.1. If you were using a Session object then you’d just configure the proxy dictionary on the Session:

import requests

# Every request made through this Session now uses the proxies dictionary.
s = requests.Session()
s.proxies = proxies

No big deal, right?

Also Good: The Requests Internals

Happily, inside Requests everything also looks pretty good. The proxies parameter isn’t used until it reaches the Transport Adapter at the bottom of the Requests stack. Here, it is used for three things. The first two are simple: it can affect the URL that Requests passes to urllib3 and we can potentially add a Proxy-Authorization header (in an ugly hack I’m not entirely proud of writing). The third thing, however, is the most complex: it affects what connection pool we use.
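
The first two effects can be sketched with the standard library. This is a simplified illustration, not Requests’ actual code, and both helper names are hypothetical. When a plain-HTTP request goes through a proxy, its request line carries the full URL (the "absolute form") rather than just the path, so the proxy knows where to forward it. When the proxy URL embeds credentials (http://user:pass@host:port), they can be sent to the proxy as a Basic Proxy-Authorization header.

```python
import base64
from urllib.parse import unquote, urlsplit


def request_target(url, proxy=None):
    """Return the request-line target: the path when direct, the full URL via a proxy."""
    parts = urlsplit(url)
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query
    # Proxied plain-HTTP requests use the absolute form so the proxy can route them.
    return url if proxy else path


def proxy_auth_header(proxy_url):
    """Build a Basic Proxy-Authorization header from credentials in proxy_url, if any."""
    parts = urlsplit(proxy_url)
    if not parts.username:
        return {}
    # Credentials in URLs may be percent-encoded (e.g. '@' written as %40).
    user = unquote(parts.username)
    password = unquote(parts.password or '')
    token = base64.b64encode(f'{user}:{password}'.encode('utf-8')).decode('ascii')
    return {'Proxy-Authorization': f'Basic {token}'}


proxy = 'http://user:pass@10.0.0.1:8080'
print(request_target('http://www.google.com/search?q=x'))         # /search?q=x
print(request_target('http://www.google.com/search?q=x', proxy))  # http://www.google.com/search?q=x
print(proxy_auth_header(proxy))  # {'Proxy-Authorization': 'Basic dXNlcjpwYXNz'}
```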

This is the bit that matters most. We take great advantage of the urllib3 connection pools, and obviously all requests that pass through a proxy should use the same connection pool: after all, they’re all going to the same place. The urllib3 connection pool used for proxies is basically the same as the standard kind, but it adds a few extra headers and does a bit less sanity checking. No big deal. Another win for code sanity!
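
The pooling rule can be illustrated with a toy cache. The classes below are hypothetical and are not urllib3’s PoolManager or Requests’ HTTPAdapter. Direct requests share a pool per (scheme, host, port). Every request routed through a given proxy shares that proxy’s single pool, because the TCP connection is always opened to the proxy.

```python
from urllib.parse import urlsplit


class ToyPool:
    """Stand-in for a connection pool; it only records which endpoint it connects to."""

    def __init__(self, endpoint):
        self.endpoint = endpoint


class PoolCache:
    """Hand out one pool per TCP endpoint: the origin server directly, or the proxy."""

    DEFAULT_PORTS = {'http': 80, 'https': 443}

    def __init__(self):
        self._pools = {}

    def _endpoint(self, url):
        parts = urlsplit(url)
        port = parts.port or self.DEFAULT_PORTS[parts.scheme]
        return (parts.scheme, parts.hostname, port)

    def pool_for(self, url, proxy=None):
        # Through a proxy the socket goes to the proxy, so the proxy URL is the key.
        key = self._endpoint(proxy if proxy else url)
        if key not in self._pools:
            self._pools[key] = ToyPool(key)
        return self._pools[key]


cache = PoolCache()
via_proxy_a = cache.pool_for('http://www.google.com/', 'http://10.0.0.1:8080')
via_proxy_b = cache.pool_for('http://example.org/', 'http://10.0.0.1:8080')
print(via_proxy_a is via_proxy_b)  # True: both requests go to the same proxy
```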

The Bad: HTTPS

So far so good, right? Unfortunately, this is where I tell you that the idealised view of proxies provided above is only half the story. You see, with the above steps, HTTP over proxies works like a charm. In fact, Requests has had functioning proxy support over HTTP for a very long time, and it has almost never broken. It’s one of the stablest parts of the library.

However, proxying and HTTPS is a totally different story. To explain why, I’m going to walk you through a little bit of proxying in Requests.

To do that, we’re going to use a tool that I consider to be a vital weapon in the arsenal of the network programmer: mitmproxy. The list of sweet features in mitmproxy is as long as my arm, so I’ll just direct you to their website. In this case, we’re going to abuse it as a cheap, easy to run proxy.

We crack it out, and then get to work. First, let’s pass a simple HTTP request through it:
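
Here is a sketch of that first request. It assumes mitmproxy is running locally on its default listen address, 127.0.0.1:8080. The helper name get_via_mitmproxy and the timeout are our own. Only the 'http' key is set because this request is plain HTTP.

```python
import requests

# Assumption: mitmproxy is listening on its default address.
MITMPROXY = 'http://127.0.0.1:8080'


def get_via_mitmproxy(url, timeout=10):
    """Send a plain-HTTP GET through the local mitmproxy instance (hypothetical helper)."""
    return requests.get(url, proxies={'http': MITMPROXY}, timeout=timeout)


# Usage, with mitmproxy running:
#   r = get_via_mitmproxy('http://www.google.com/')
#   print(r.status_code)
```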

In the mitmproxy window we can see the request and response come through, no big deal:

GET http://www.google.com/
    <- 200 text/html 10.58kB

Awesome, so we know it works. Now, let’s try to pass an HTTPS request through it:

Uh-oh. What the hell happened there?

The short answer is that everything went to hell in a hand-basket. To understand why, you need to understand what happens when you try to proxy an HTTPS request.

Proxying HTTPS

The thing about HTTPS is that it relies on secure connections created using public key cryptography. The keys for the connection are established using cryptographically signed certificates, which are handed out by certificate authorities (by ‘handed out’ I mean ‘exorbitantly charged for’). In principle these authorities (also called CAs) should verify that the person applying for the certificate owns the domain in question. In practice, if a government comes to them and asks really nicely, they’ll usually hand over a new set of keys.

When your computer establishes an SSL connection, it begins by performing the SSL handshake, during which the server hands over its certificate. That certificate is valid only for the domain(s) named in it, and nothing else. If your User-Agent verifies SSL certificates (as Requests does by default), your connection will fail if the machine you’re connecting to hands over a certificate that isn’t valid for the domain you asked for.
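
Python’s standard ssl module makes these two checks visible. The sketch uses the standard library rather than Requests, and the helper names are hypothetical. A context from ssl.create_default_context() requires a trusted certificate chain and checks that the certificate matches the name passed as server_hostname. If either check fails, the handshake raises an error. Requests’ default verify=True enables the same kind of checking.

```python
import socket
import ssl


def make_verifying_context():
    """Return a client TLS context that checks both the certificate chain and the host name."""
    # create_default_context() sets verify_mode=CERT_REQUIRED and check_hostname=True.
    return ssl.create_default_context()


def open_verified_tls(host, port=443, timeout=10):
    """Connect to host:port; the handshake fails unless the certificate is valid for host."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # server_hostname is the name the certificate is matched against (also sent as SNI).
        return make_verifying_context().wrap_socket(sock, server_hostname=host)
    except BaseException:
        sock.close()
        raise


# Usage (needs network access):
#   tls = open_verified_tls('www.google.com')
#   print(tls.version())
# A certificate for the wrong name raises ssl.SSLCertVerificationError (Python 3.7+).
```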

This poses a problem for proxying HTTPS traffic. To send the message on, the proxy needs to know where it’s going, but it can’t find out without performing the SSL handshake. It can’t do that because it doesn’t have the right certificate for the connection, so the User-Agent will terminate the connection attempt. (Those au fait with SSL/TLS will note I’ve simplified a lot here, but we don’t have time for the full discussion.)

The solution has been to use the HTTP CONNECT verb. The CONNECT verb essentially turns the HTTP connection into a tunnel over which you can send raw TCP data: once the proxy accepts the request, it simply relays bytes between the client and the destination server. This adds overhead (an extra round trip to set up the tunnel, plus a relay hop for every byte), but it means the proxy can pass your handshake (and then the subsequent encrypted messages) along without needing to be able to read them.
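
A standard-library sketch of a CONNECT tunnel follows. It assumes an HTTP proxy that needs no authentication. The helper names are hypothetical, and this is not how Requests or urllib3 implement it. The client asks the proxy for a tunnel and waits for a 2xx reply. Only then does it start the TLS handshake, matching the certificate against the destination host rather than the proxy.

```python
import socket
import ssl


def build_connect_request(host, port):
    """Bytes of a CONNECT request asking the proxy to open a tunnel to host:port."""
    authority = f'{host}:{port}'
    return f'CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n'.encode('ascii')


def open_tunnel(proxy_host, proxy_port, host, port=443, timeout=10):
    """Open a CONNECT tunnel through an HTTP proxy, then do TLS end to end with host."""
    sock = socket.create_connection((proxy_host, proxy_port), timeout=timeout)
    try:
        sock.sendall(build_connect_request(host, port))
        # Read the proxy's reply up to the blank line that ends its headers.
        reply = b''
        while b'\r\n\r\n' not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError('proxy closed the connection during CONNECT')
            reply += chunk
        status_line = reply.split(b'\r\n', 1)[0]
        fields = status_line.split()
        # Any 2xx status means the tunnel is open; anything else (e.g. 407) is a refusal.
        if len(fields) < 2 or not fields[1].startswith(b'2'):
            raise ConnectionError(f'proxy refused CONNECT: {status_line!r}')
        # From here the proxy just relays bytes: the TLS handshake and the certificate
        # check (against host, not the proxy) happen directly with the destination.
        context = ssl.create_default_context()
        return context.wrap_socket(sock, server_hostname=host)
    except BaseException:
        sock.close()
        raise


# Usage, with a hypothetical forwarding proxy at proxy.example:3128:
#   tls = open_tunnel('proxy.example', 3128, 'www.google.com')
```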

So what’s the problem?

Yeah, We Don’t Do That

Requests does not support the CONNECT verb. At all. This is because our underlying HTTP connection library, urllib3, also doesn’t support it. There has been an open Pull Request on urllib3 for some time, but it has been essentially abandoned by its original author and there’s not been a sufficient push to get the rebased version up to standards. This is, in my opinion, the single biggest problem Requests has as a library at the moment.

How Do I Get HTTPS Working?

It depends. If you want HTTPS proxying without the proxy being able to read your traffic, I’m afraid you can’t use Requests right now. This is unfortunate, but until we can get that urllib3 Pull Request moving forward, that’s just where we are.

However, if you want to be able to connect to HTTPS URLs and don’t care if the proxy can read it (more fool you!), you can set up your proxies argument like this:

proxies = {'https': 'http://127.0.0.1:8080'}

Source: https://www.pybloggers.com/2013/07/python-requests-and-proxies/
