python httplib2 - errors when using a proxy

Congbo (从波), cnblogs (博客园)

httplib2 relies on SocksiPy for its proxy support. Example code:

import httplib2 
import socks 

client = httplib2.Http(proxy_info = httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, "190.253.95.219", 8080)) 

 

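With many proxies the connection gets stuck in an endless loop, and with some it fails outright, yet the same proxies work without problems when tested with pycurl.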

 

Python version: 2.7

httplib2 version: 0.7.2

socks version: 1.00

 

The socks.py module is described at http://socksipy.sourceforge.net/readme.txt. The relevant part reads:

PROXY COMPATIBILITY
SocksiPy is compatible with three different types of proxies:
1. SOCKS Version 4 (Socks4), including the Socks4a extension.
2. SOCKS Version 5 (Socks5).
3. HTTP Proxies which support tunneling using the CONNECT method.

 

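The proxies that fail with httplib2 do so because SocksiPy only supports HTTP proxies that implement the CONNECT method. CONNECT is the method HTTP/1.1 reserves for proxies that can switch the connection into a tunnel, which in practice is what carries HTTPS. HTTPS URLs begin with "https://" and default to port 443, whereas HTTP URLs begin with "http://" and default to port 80. A proxy that only forwards plain HTTP, or that only permits CONNECT to port 443, may therefore refuse or mishandle a CONNECT request made for an ordinary http:// URL.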
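
To make the difference concrete, here is a minimal sketch of the two request forms a client can send to an HTTP proxy. The function names `build_connect_request` and `build_forward_request` are illustrative only and are not part of SocksiPy or httplib2, and `example.com` is a placeholder host.

```python
def build_connect_request(host, port):
    """Request asking the proxy to open a raw tunnel to host:port."""
    return "CONNECT %s:%d HTTP/1.1\r\nHost: %s\r\n\r\n" % (host, port, host)


def build_forward_request(method, host, path):
    """Plain-HTTP request for a proxy that does not tunnel.

    The request line carries the absolute URL instead of just the path.
    """
    return "%s http://%s%s HTTP/1.1\r\nHost: %s\r\n\r\n" % (method, host, path, host)


# Show only the request line of each form.
print(build_connect_request("example.com", 80).splitlines()[0])
print(build_forward_request("GET", "example.com", "/index.html").splitlines()[0])
```

A tunneling proxy answers the first form with a status line and then relays bytes blindly; a non-tunneling proxy expects the second form for every request.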

 

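The fix is to modify the socks.py module source, /usr/lib/python2.6/dist-packages/socks.py, and add a "PROXY_TYPE_HTTP_NO_TUNNEL" type that sends requests to the proxy without a CONNECT tunnel and falls back to the "PROXY_TYPE_HTTP" behaviour when that cannot work (in the patch, for connections to port 443).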
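
The fallback rule can be sketched in isolation as follows. The constant values 3 and 4 are taken from the patched socks.py; the function name `effective_proxy_type` is hypothetical, and the sketch assumes, as the patch does, that port 443 always means SSL.

```python
PROXY_TYPE_HTTP = 3
PROXY_TYPE_HTTP_NO_TUNNEL = 4  # value used by the patched socks.py


def effective_proxy_type(requested_type, dest_port):
    """Return the proxy type that would actually be used for dest_port.

    Assumption: port 443 means SSL/TLS, which cannot pass through a
    proxy without a CONNECT tunnel.
    """
    if requested_type == PROXY_TYPE_HTTP_NO_TUNNEL and dest_port == 443:
        return PROXY_TYPE_HTTP
    return requested_type


print(effective_proxy_type(PROXY_TYPE_HTTP_NO_TUNNEL, 80))
print(effective_proxy_type(PROXY_TYPE_HTTP_NO_TUNNEL, 443))
```

With the patched module installed, the client at the top of this post would presumably be created with `socks.PROXY_TYPE_HTTP_NO_TUNNEL` in place of `socks.PROXY_TYPE_HTTP`.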

 

Patch reference: http://code.google.com/p/xbmc-iplayerv2/source/browse/trunk/plugin.video.iplayer/lib/httplib2/socks.py. It has been tested and works; the code is as follows:

"""SocksiPy - Python SOCKS module.
Version 1.00

Copyright 2006 Dan-Haim. All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
3. Neither the name of Dan Haim nor the names of his contributors may be used
   to endorse or promote products derived from this software without specific
   prior written permission.

THIS SOFTWARE IS PROVIDED BY DAN HAIM "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL DAN HAIM OR HIS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMANGE.


This module provides a standard socket-like interface for Python
for tunneling connections through SOCKS proxies.

"""

"""

Minor modifications made by Christopher Gilbert (http://motomastyle.com/)
for use in PyLoris (http://pyloris.sourceforge.net/)

Minor modifications made by Mario Vilas (http://breakingcode.wordpress.com/)
mainly to merge bug fixes found in Sourceforge

"""

import socket
import struct
import sys
import base64

PROXY_TYPE_SOCKS4 = 1
PROXY_TYPE_SOCKS5 = 2
PROXY_TYPE_HTTP = 3
PROXY_TYPE_HTTP_NO_TUNNEL = 4

_defaultproxy = None
_orgsocket = socket.socket

class ProxyError(Exception): pass
class GeneralProxyError(ProxyError): pass
class Socks5AuthError(ProxyError): pass
class Socks5Error(ProxyError): pass
class Socks4Error(ProxyError): pass
class HTTPError(ProxyError): pass

_generalerrors = ("success",
    "invalid data",
    "not connected",
    "not available",
    "bad proxy type",
    "bad input")

_socks5errors = ("succeeded",
    "general SOCKS server failure",
    "connection not allowed by ruleset",
    "Network unreachable",
    "Host unreachable",
    "Connection refused",
    "TTL expired",
    "Command not supported",
    "Address type not supported",
    "Unknown error")

_socks5autherrors = ("succeeded",
    "authentication is required",
    "all offered authentication methods were rejected",
    "unknown username or invalid password",
    "unknown error")

_socks4errors = ("request granted",
    "request rejected or failed",
    "request rejected because SOCKS server cannot connect to identd on the client",
    "request rejected because the client program and identd report different user-ids",
    "unknown error")

def setdefaultproxy(proxytype=None, addr=None, port=None, rdns=True, username=None, password=None):
    """setdefaultproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
    Sets a default proxy which all further socksocket objects will use,
    unless explicitly changed.
    """
    global _defaultproxy
    _defaultproxy = (proxytype, addr, port, rdns, username, password)

def wrapmodule(module):
    """wrapmodule(module)
    Attempts to replace a module's socket library with a SOCKS socket. Must set
    a default proxy using setdefaultproxy(...) first.
    This will only work on modules that import socket directly into the namespace;
    most of the Python Standard Library falls into this category.
    """
    if _defaultproxy != None:
        module.socket.socket = socksocket
    else:
        raise GeneralProxyError((4, "no proxy specified"))

class socksocket(socket.socket):
    """socksocket([family[, type[, proto]]]) -> socket object
    Open a SOCKS enabled socket. The parameters are the same as
    those of the standard socket init. In order for SOCKS to work,
    you must specify family=AF_INET, type=SOCK_STREAM and proto=0.
    """

    def __init__(self, family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0, _sock=None):
        _orgsocket.__init__(self, family, type, proto, _sock)
        if _defaultproxy != None:
            self.__proxy = _defaultproxy
        else:
            self.__proxy = (None, None, None, None, None, None)
        self.__proxysockname = None
        self.__proxypeername = None

        # True while requests go through a CONNECT tunnel (or no HTTP proxy);
        # set to False by connect() for PROXY_TYPE_HTTP_NO_TUNNEL.
        self.__httptunnel = True

    def __recvall(self, count):
        """__recvall(count) -> data
        Receive EXACTLY the number of bytes requested from the socket.
        Blocks until the required number of bytes have been received.
        """
        data = self.recv(count)
        while len(data) < count:
            d = self.recv(count-len(data))
            if not d: raise GeneralProxyError((0, "connection closed unexpectedly"))
            data = data + d
        return data

    def sendall(self, content, *args):
        """ override socket.socket.sendall method to rewrite the header 
        for non-tunneling proxies if needed 
        """
        if not self.__httptunnel:
            content = self.__rewriteproxy(content)

        return super(socksocket, self).sendall(content, *args)

    def __rewriteproxy(self, header):
        """ rewrite HTTP request headers to support non-tunneling proxies 
        (i.e. those which do not support the CONNECT method).
        This only works for HTTP (not HTTPS) since HTTPS requires tunneling.
        """
        host, endpt = None, None
        hdrs = header.split("\r\n")
        for hdr in hdrs:
            if hdr.lower().startswith("host:"):
                host = hdr
            elif hdr.lower().startswith("get") or hdr.lower().startswith("post"):
                endpt = hdr
        if host and endpt: 
            hdrs.remove(host)
            hdrs.remove(endpt)
            host = host.split(" ")[1]
            endpt = endpt.split(" ")
            if (self.__proxy[4] != None and self.__proxy[5] != None):
                hdrs.insert(0, self.__getauthheader())
            hdrs.insert(0, "Host: %s" % host)
            # Request line with an absolute URL: "<method> http://<host><path> <version>"
            hdrs.insert(0, "%s http://%s%s %s" % (endpt[0], host, endpt[1], endpt[2]))

        return "\r\n".join(hdrs)

    def __getauthheader(self):
        auth = self.__proxy[4] + ":" + self.__proxy[5]
        return "Proxy-Authorization: Basic " + base64.b64encode(auth)

    def setproxy(self, proxytype=None, addr=None, port=None, rdns=True, username=None, password=None):
        """setproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
        Sets the proxy to be used.
        proxytype -    The type of the proxy to be used. Four types
                are supported: PROXY_TYPE_SOCKS4 (including socks4a),
                PROXY_TYPE_SOCKS5, PROXY_TYPE_HTTP and
                PROXY_TYPE_HTTP_NO_TUNNEL
        addr -        The address of the server (IP or DNS).
        port -        The port of the server. Defaults to 1080 for SOCKS
                servers and 8080 for HTTP proxy servers.
        rdns -        Should DNS queries be performed on the remote side
                (rather than the local side). The default is True.
                Note: This has no effect with SOCKS4 servers.
        username -    Username to authenticate with to the server.
                The default is no authentication.
        password -    Password to authenticate with to the server.
                Only relevant when username is also provided.
        """
        self.__proxy = (proxytype, addr, port, rdns, username, password)

    def __negotiatesocks5(self, destaddr, destport):
        """__negotiatesocks5(self,destaddr,destport)
        Negotiates a connection through a SOCKS5 server.
        """
        # First we'll send the authentication packages we support.
        if (self.__proxy[4]!=None) and (self.__proxy[5]!=None):
            # The username/password details were supplied to the
            # setproxy method so we support the USERNAME/PASSWORD
            # authentication (in addition to the standard none).
            self.sendall(struct.pack('BBBB', 0x05, 0x02, 0x00, 0x02))
        else:
            # No username/password were entered, therefore we
            # only support connections with no authentication.
            self.sendall(struct.pack('BBB', 0x05, 0x01, 0x00))
        # We'll receive the server's response to determine which
        # method was selected
        chosenauth = self.__recvall(2)
        if chosenauth[0:1] != chr(0x05).encode():
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        # Check the chosen authentication method
        if chosenauth[1:2] == chr(0x00).encode():
            # No authentication is required
            pass
        elif chosenauth[1:2] == chr(0x02).encode():
            # Okay, we need to perform a basic username/password
            # authentication.
            self.sendall(chr(0x01).encode() + chr(len(self.__proxy[4])) + self.__proxy[4] + chr(len(self.__proxy[5])) + self.__proxy[5])
            authstat = self.__recvall(2)
            if authstat[0:1] != chr(0x01).encode():
                # Bad response
                self.close()
                raise GeneralProxyError((1, _generalerrors[1]))
            if authstat[1:2] != chr(0x00).encode():
                # Authentication failed
                self.close()
                raise Socks5AuthError((3, _socks5autherrors[3]))
            # Authentication succeeded
        else:
            # Reaching here is always bad
            self.close()
            # Compare against the raw byte: on Python 2, calling .encode()
            # on chr(0xFF) raises UnicodeDecodeError.
            if chosenauth[1:2] == chr(0xFF):
                raise Socks5AuthError((2, _socks5autherrors[2]))
            else:
                raise GeneralProxyError((1, _generalerrors[1]))
        # Now we can request the actual connection
        req = struct.pack('BBB', 0x05, 0x01, 0x00)
        # If the given destination address is an IP address, we'll
        # use the IPv4 address request even if remote resolving was specified.
        try:
            ipaddr = socket.inet_aton(destaddr)
            req = req + chr(0x01).encode() + ipaddr
        except socket.error:
            # Well it's not an IP number,  so it's probably a DNS name.
            if self.__proxy[3]:
                # Resolve remotely
                ipaddr = None
                req = req + chr(0x03).encode() + chr(len(destaddr)).encode() + destaddr
            else:
                # Resolve locally
                ipaddr = socket.inet_aton(socket.gethostbyname(destaddr))
                req = req + chr(0x01).encode() + ipaddr
        req = req + struct.pack(">H", destport)
        self.sendall(req)
        # Get the response
        resp = self.__recvall(4)
        if resp[0:1] != chr(0x05).encode():
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        elif resp[1:2] != chr(0x00).encode():
            # Connection failed
            self.close()
            if ord(resp[1:2])<=8:
                raise Socks5Error((ord(resp[1:2]), _socks5errors[ord(resp[1:2])]))
            else:
                raise Socks5Error((9, _socks5errors[9]))
        # Get the bound address/port
        elif resp[3:4] == chr(0x01).encode():
            boundaddr = self.__recvall(4)
        elif resp[3:4] == chr(0x03).encode():
            resp = resp + self.recv(1)
            boundaddr = self.__recvall(ord(resp[4:5]))
        else:
            self.close()
            raise GeneralProxyError((1,_generalerrors[1]))
        boundport = struct.unpack(">H", self.__recvall(2))[0]
        self.__proxysockname = (boundaddr, boundport)
        if ipaddr != None:
            self.__proxypeername = (socket.inet_ntoa(ipaddr), destport)
        else:
            self.__proxypeername = (destaddr, destport)

    def getproxysockname(self):
        """getproxysockname() -> address info
        Returns the bound IP address and port number at the proxy.
        """
        return self.__proxysockname

    def getproxypeername(self):
        """getproxypeername() -> address info
        Returns the IP and port number of the proxy.
        """
        return _orgsocket.getpeername(self)

    def getpeername(self):
        """getpeername() -> address info
        Returns the IP address and port number of the destination
        machine (note: getproxypeername returns the proxy)
        """
        return self.__proxypeername

    def __negotiatesocks4(self,destaddr,destport):
        """__negotiatesocks4(self,destaddr,destport)
        Negotiates a connection through a SOCKS4 server.
        """
        # Check if the destination address provided is an IP address
        rmtrslv = False
        try:
            ipaddr = socket.inet_aton(destaddr)
        except socket.error:
            # It's a DNS name. Check where it should be resolved.
            if self.__proxy[3]:
                ipaddr = struct.pack("BBBB", 0x00, 0x00, 0x00, 0x01)
                rmtrslv = True
            else:
                ipaddr = socket.inet_aton(socket.gethostbyname(destaddr))
        # Construct the request packet
        req = struct.pack(">BBH", 0x04, 0x01, destport) + ipaddr
        # The username parameter is considered userid for SOCKS4
        if self.__proxy[4] != None:
            req = req + self.__proxy[4]
        req = req + chr(0x00).encode()
        # DNS name if remote resolving is required
        # NOTE: This is actually an extension to the SOCKS4 protocol
        # called SOCKS4A and may not be supported in all cases.
        if rmtrslv:
            req = req + destaddr + chr(0x00).encode()
        self.sendall(req)
        # Get the response from the server
        resp = self.__recvall(8)
        if resp[0:1] != chr(0x00).encode():
            # Bad data
            self.close()
            raise GeneralProxyError((1,_generalerrors[1]))
        if resp[1:2] != chr(0x5A).encode():
            # Server returned an error
            self.close()
            if ord(resp[1:2]) in (91, 92, 93):
                raise Socks4Error((ord(resp[1:2]), _socks4errors[ord(resp[1:2]) - 90]))
            else:
                raise Socks4Error((94, _socks4errors[4]))
        # Get the bound address/port
        self.__proxysockname = (socket.inet_ntoa(resp[4:]), struct.unpack(">H", resp[2:4])[0])
        # rmtrslv is a bool, so the original "rmtrslv != None" test was always
        # true; report the host name when the proxy did the resolving.
        if rmtrslv:
            self.__proxypeername = (destaddr, destport)
        else:
            self.__proxypeername = (socket.inet_ntoa(ipaddr), destport)

    def __negotiatehttp(self, destaddr, destport):
        """__negotiatehttp(self,destaddr,destport)
        Negotiates a connection through an HTTP server.
        """
        # If we need to resolve locally, we do this now
        if not self.__proxy[3]:
            addr = socket.gethostbyname(destaddr)
        else:
            addr = destaddr
        headers =  "CONNECT " + addr + ":" + str(destport) + " HTTP/1.1\r\n"
        headers += "Host: " + destaddr + "\r\n"
        if (self.__proxy[4] != None and self.__proxy[5] != None):
                headers += self.__getauthheader() + "\r\n"
        headers += "\r\n"
        self.sendall(headers.encode())
        # We read the response until we get the string "\r\n\r\n"
        resp = self.recv(1)
        while resp.find("\r\n\r\n".encode()) == -1:
            d = self.recv(1)
            if not d:
                # The proxy closed the connection; without this check the
                # loop would spin forever on empty reads.
                self.close()
                raise GeneralProxyError((0, "connection closed unexpectedly"))
            resp = resp + d
        # We just need the first line to check if the connection
        # was successful
        statusline = resp.splitlines()[0].split(" ".encode(), 2)
        if statusline[0] not in ("HTTP/1.0".encode(), "HTTP/1.1".encode()):
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        try:
            statuscode = int(statusline[1])
        except ValueError:
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        if statuscode != 200:
            self.close()
            raise HTTPError((statuscode, statusline[2]))
        self.__proxysockname = ("0.0.0.0", 0)
        self.__proxypeername = (addr, destport)

    def connect(self, destpair):
        """connect(self, destpair)
        Connects to the specified destination through a proxy.
        destpair - A tuple of the IP/DNS address and the port number.
        (identical to socket's connect).
        To select the proxy server use setproxy().
        """
        # Do a minimal input check first
        if (not type(destpair) in (list,tuple)) or (len(destpair) < 2) or (type(destpair[0]) != type('')) or (type(destpair[1]) != int):
            raise GeneralProxyError((5, _generalerrors[5]))
        if self.__proxy[0] == PROXY_TYPE_SOCKS5:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 1080
            _orgsocket.connect(self, (self.__proxy[1], portnum))
            self.__negotiatesocks5(destpair[0], destpair[1])
        elif self.__proxy[0] == PROXY_TYPE_SOCKS4:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 1080
            _orgsocket.connect(self,(self.__proxy[1], portnum))
            self.__negotiatesocks4(destpair[0], destpair[1])
        elif self.__proxy[0] == PROXY_TYPE_HTTP:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 8080
            _orgsocket.connect(self,(self.__proxy[1], portnum))
            self.__negotiatehttp(destpair[0], destpair[1])
        elif self.__proxy[0] == PROXY_TYPE_HTTP_NO_TUNNEL:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 8080
            _orgsocket.connect(self,(self.__proxy[1],portnum))
            if destpair[1] == 443:
                print "WARN: SSL connections (generally on port 443) require the use of tunneling - falling back to PROXY_TYPE_HTTP"
                self.__negotiatehttp(destpair[0],destpair[1])
            else:
                # No CONNECT handshake; sendall() rewrites each request instead.
                self.__httptunnel = False
        elif self.__proxy[0] == None:
            _orgsocket.connect(self, (destpair[0], destpair[1]))
        else:
            raise GeneralProxyError((4, _generalerrors[4]))
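
The core of the patch is the header rewrite done for non-tunneling proxies. The following is a standalone sketch of that idea, not the patch's own code: the function name `rewrite_for_plain_proxy` is hypothetical, it works on text strings rather than on the raw data passed to the socket, and unlike the listing it rewrites any method, leaves requests without a Host header untouched, and keeps the remaining headers in their original order.

```python
import base64


def rewrite_for_plain_proxy(request, username=None, password=None):
    """Turn an origin-form HTTP request into one a non-tunneling proxy accepts."""
    head, sep, body = request.partition("\r\n\r\n")
    lines = head.split("\r\n")
    parts = lines[0].split(" ", 2)
    if len(parts) != 3:
        return request  # not a request line we understand
    method, path, version = parts
    host = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
    if host is None or not path.startswith("/"):
        return request  # no Host header, or the target is already absolute
    # Put the absolute URL into the request line.
    lines[0] = "%s http://%s%s %s" % (method, host, path, version)
    if username is not None and password is not None:
        token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
        lines.insert(1, "Proxy-Authorization: Basic " + token.decode("ascii"))
    return "\r\n".join(lines) + sep + body


original = "GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
print(rewrite_for_plain_proxy(original).split("\r\n")[0])
```

Only the request line changes for an unauthenticated proxy; the Proxy-Authorization header is added when both a username and a password are supplied, mirroring the condition used in the listing.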

 

Reference:

http://code.google.com/p/httplib2/issues/detail?id=38

 

Original post: http://www.cnblogs.com/congbo/archive/2012/08/16/2641079.html
