python httplib2 - proxy errors - 从波 - 博客园 (cnblogs)
python httplib2 - proxy errors
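httplib2 implements proxy support through socksipy. Example code:
import httplib2
import socks
client = httplib2.Http(proxy_info = httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, "190.253.95.219", 8080))
With many proxies the connection hangs in an endless loop, and with some it fails outright, yet the same proxies work without problems when tested with pycurl.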
Python version: 2.7
httplib2 version: 0.7.2
socks version: 1.00
Introduction to the socks.py module: http://socksipy.sourceforge.net/readme.txt. The relevant part reads:
PROXY COMPATIBILITY
SocksiPy is compatible with three different types of proxies:
1. SOCKS Version 4 (Socks4), including the Socks4a extension.
2. SOCKS Version 5 (Socks5).
3. HTTP Proxies which support tunneling using the CONNECT method.
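The proxies that fail with httplib2 fail because socksipy only supports proxy servers that accept the CONNECT method. CONNECT is the method HTTP/1.1 reserves for proxies that can switch the connection into a tunnel, which is what HTTPS through a proxy requires. A proxy that only forwards plain HTTP requests may reject CONNECT or never answer it. Unlike HTTP URLs, which begin with "http://" and default to port 80, HTTPS URLs begin with "https://" and default to port 443.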
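To make the difference concrete, here is a minimal sketch of the two request forms a client can send to an HTTP proxy. The helper names `build_connect_request` and `build_forward_request` are hypothetical, chosen only for illustration; they are not part of socksipy or httplib2.

```python
def build_connect_request(host, port):
    """Request for a tunneling proxy: ask it to open a raw TCP tunnel."""
    return (
        "CONNECT %s:%d HTTP/1.1\r\n"
        "Host: %s\r\n"
        "\r\n"
    ) % (host, port, host)


def build_forward_request(method, host, path):
    """Request for a non-tunneling proxy: the request line carries the absolute URL."""
    return (
        "%s http://%s%s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "\r\n"
    ) % (method, host, path, host)


connect_req = build_connect_request("example.com", 443)
forward_req = build_forward_request("GET", "example.com", "/index.html")

# Show only the request lines, which is where the two forms differ.
print(connect_req.splitlines()[0])
print(forward_req.splitlines()[0])
```

A proxy that understands only the second form has nothing useful to do with the first, which matches the failures described above.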
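Modify the relevant source of the socks.py module, /usr/lib/python2.6/dist-packages/socks.py, adding a "PROXY_TYPE_HTTP_NO_TUNNEL" type that falls back to the "PROXY_TYPE_HTTP" type when the non-tunneling mode cannot be used (in the patch, for connections to port 443).
Patch reference: http://code.google.com/p/xbmc-iplayerv2/source/browse/trunk/plugin.video.iplayer/lib/httplib2/socks.py. Tested and working; the code is as follows: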
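The decision the patched `connect()` makes can be summarized in a few lines. This is only a sketch of that branch, not code from the patch: the function name `effective_proxy_type` is hypothetical, and the constants simply repeat the values used in socks.py.

```python
PROXY_TYPE_HTTP = 3            # CONNECT tunneling
PROXY_TYPE_HTTP_NO_TUNNEL = 4  # plain forwarding, added by the patch


def effective_proxy_type(requested_type, dest_port):
    """Return the proxy mode actually used for a destination port."""
    if requested_type == PROXY_TYPE_HTTP_NO_TUNNEL and dest_port == 443:
        # HTTPS cannot be forwarded as plain text, so tunneling is required.
        return PROXY_TYPE_HTTP
    return requested_type


print(effective_proxy_type(PROXY_TYPE_HTTP_NO_TUNNEL, 80))
print(effective_proxy_type(PROXY_TYPE_HTTP_NO_TUNNEL, 443))
```

Note that the check is on the port number alone, so HTTPS served on a non-standard port would not trigger the fallback.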
"""SocksiPy - Python SOCKS module.
Version 1.00

Copyright 2006 Dan-Haim. All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
3. Neither the name of Dan Haim nor the names of his contributors may be used
   to endorse or promote products derived from this software without specific
   prior written permission.

THIS SOFTWARE IS PROVIDED BY DAN HAIM "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL DAN HAIM OR HIS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMANGE.


This module provides a standard socket-like interface for Python
for tunneling connections through SOCKS proxies.

"""

"""

Minor modifications made by Christopher Gilbert (http://motomastyle.com/)
for use in PyLoris (http://pyloris.sourceforge.net/)

Minor modifications made by Mario Vilas (http://breakingcode.wordpress.com/)
mainly to merge bug fixes found in Sourceforge

"""

import socket
import struct
import sys
import base64

PROXY_TYPE_SOCKS4 = 1
PROXY_TYPE_SOCKS5 = 2
PROXY_TYPE_HTTP = 3
# Added by the patch: HTTP proxy that does not support CONNECT tunneling.
PROXY_TYPE_HTTP_NO_TUNNEL = 4

_defaultproxy = None
_orgsocket = socket.socket

class ProxyError(Exception): pass
class GeneralProxyError(ProxyError): pass
class Socks5AuthError(ProxyError): pass
class Socks5Error(ProxyError): pass
class Socks4Error(ProxyError): pass
class HTTPError(ProxyError): pass

_generalerrors = ("success",
    "invalid data",
    "not connected",
    "not available",
    "bad proxy type",
    "bad input")

_socks5errors = ("succeeded",
    "general SOCKS server failure",
    "connection not allowed by ruleset",
    "Network unreachable",
    "Host unreachable",
    "Connection refused",
    "TTL expired",
    "Command not supported",
    "Address type not supported",
    "Unknown error")

_socks5autherrors = ("succeeded",
    "authentication is required",
    "all offered authentication methods were rejected",
    "unknown username or invalid password",
    "unknown error")

_socks4errors = ("request granted",
    "request rejected or failed",
    "request rejected because SOCKS server cannot connect to identd on the client",
    "request rejected because the client program and identd report different user-ids",
    "unknown error")

def setdefaultproxy(proxytype=None, addr=None, port=None, rdns=True, username=None, password=None):
    """setdefaultproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
    Sets a default proxy which all further socksocket objects will use,
    unless explicitly changed.
    """
    global _defaultproxy
    _defaultproxy = (proxytype, addr, port, rdns, username, password)

def wrapmodule(module):
    """wrapmodule(module)
    Attempts to replace a module's socket library with a SOCKS socket. Must set
    a default proxy using setdefaultproxy(...) first.
    This will only work on modules that import socket directly into the namespace;
    most of the Python Standard Library falls into this category.
    """
    if _defaultproxy != None:
        module.socket.socket = socksocket
    else:
        raise GeneralProxyError((4, "no proxy specified"))

class socksocket(socket.socket):
    """socksocket([family[, type[, proto]]]) -> socket object
    Open a SOCKS enabled socket. The parameters are the same as
    those of the standard socket init. In order for SOCKS to work,
    you must specify family=AF_INET, type=SOCK_STREAM and proto=0.
    """

    def __init__(self, family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0, _sock=None):
        _orgsocket.__init__(self, family, type, proto, _sock)
        if _defaultproxy != None:
            self.__proxy = _defaultproxy
        else:
            self.__proxy = (None, None, None, None, None, None)
        self.__proxysockname = None
        self.__proxypeername = None

        # True while requests are sent unmodified (tunnel or direct connection).
        self.__httptunnel = True

    def __recvall(self, count):
        """__recvall(count) -> data
        Receive EXACTLY the number of bytes requested from the socket.
        Blocks until the required number of bytes have been received.
        """
        data = self.recv(count)
        while len(data) < count:
            d = self.recv(count-len(data))
            if not d: raise GeneralProxyError((0, "connection closed unexpectedly"))
            data = data + d
        return data

    def sendall(self, content, *args):
        """ override socket.socket.sendall method to rewrite the header
        for non-tunneling proxies if needed
        """
        if not self.__httptunnel:
            content = self.__rewriteproxy(content)

        return super(socksocket, self).sendall(content, *args)

    def __rewriteproxy(self, header):
        """ rewrite HTTP request headers to support non-tunneling proxies
        (i.e. those which do not support the CONNECT method).
        This only works for HTTP (not HTTPS) since HTTPS requires tunneling.
        """
        host, endpt = None, None
        hdrs = header.split("\r\n")
        for hdr in hdrs:
            if hdr.lower().startswith("host:"):
                host = hdr
            elif hdr.lower().startswith("get") or hdr.lower().startswith("post"):
                endpt = hdr
        if host and endpt:
            hdrs.remove(host)
            hdrs.remove(endpt)
            host = host.split(" ")[1]
            endpt = endpt.split(" ")
            if (self.__proxy[4] != None and self.__proxy[5] != None):
                hdrs.insert(0, self.__getauthheader())
            hdrs.insert(0, "Host: %s" % host)
            # Request line in absolute-URI form, as a forwarding proxy expects.
            hdrs.insert(0, "%s http://%s%s %s" % (endpt[0], host, endpt[1], endpt[2]))

        return "\r\n".join(hdrs)

    def __getauthheader(self):
        auth = self.__proxy[4] + ":" + self.__proxy[5]
        return "Proxy-Authorization: Basic " + base64.b64encode(auth)

    def setproxy(self, proxytype=None, addr=None, port=None, rdns=True, username=None, password=None):
        """setproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
        Sets the proxy to be used.
        proxytype -    The type of the proxy to be used. Three types
                are supported: PROXY_TYPE_SOCKS4 (including socks4a),
                PROXY_TYPE_SOCKS5 and PROXY_TYPE_HTTP
        addr -        The address of the server (IP or DNS).
        port -        The port of the server. Defaults to 1080 for SOCKS
                servers and 8080 for HTTP proxy servers.
        rdns -        Should DNS queries be performed on the remote side
                (rather than the local side). The default is True.
                Note: This has no effect with SOCKS4 servers.
        username -    Username to authenticate with to the server.
                The default is no authentication.
        password -    Password to authenticate with to the server.
                Only relevant when username is also provided.
        """
        self.__proxy = (proxytype, addr, port, rdns, username, password)

    def __negotiatesocks5(self, destaddr, destport):
        """__negotiatesocks5(self,destaddr,destport)
        Negotiates a connection through a SOCKS5 server.
        """
        # First we'll send the authentication packages we support.
        if (self.__proxy[4]!=None) and (self.__proxy[5]!=None):
            # The username/password details were supplied to the
            # setproxy method so we support the USERNAME/PASSWORD
            # authentication (in addition to the standard none).
            self.sendall(struct.pack('BBBB', 0x05, 0x02, 0x00, 0x02))
        else:
            # No username/password were entered, therefore we
            # only support connections with no authentication.
            self.sendall(struct.pack('BBB', 0x05, 0x01, 0x00))
        # We'll receive the server's response to determine which
        # method was selected
        chosenauth = self.__recvall(2)
        if chosenauth[0:1] != chr(0x05).encode():
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        # Check the chosen authentication method
        if chosenauth[1:2] == chr(0x00).encode():
            # No authentication is required
            pass
        elif chosenauth[1:2] == chr(0x02).encode():
            # Okay, we need to perform a basic username/password
            # authentication.
            self.sendall(chr(0x01).encode() + chr(len(self.__proxy[4])) + self.__proxy[4] + chr(len(self.__proxy[5])) + self.__proxy[5])
            authstat = self.__recvall(2)
            if authstat[0:1] != chr(0x01).encode():
                # Bad response
                self.close()
                raise GeneralProxyError((1, _generalerrors[1]))
            if authstat[1:2] != chr(0x00).encode():
                # Authentication failed
                self.close()
                raise Socks5AuthError((3, _socks5autherrors[3]))
            # Authentication succeeded
        else:
            # Reaching here is always bad
            self.close()
            if chosenauth[1:2] == chr(0xFF).encode():
                raise Socks5AuthError((2, _socks5autherrors[2]))
            else:
                raise GeneralProxyError((1, _generalerrors[1]))
        # Now we can request the actual connection
        req = struct.pack('BBB', 0x05, 0x01, 0x00)
        # If the given destination address is an IP address, we'll
        # use the IPv4 address request even if remote resolving was specified.
        try:
            ipaddr = socket.inet_aton(destaddr)
            req = req + chr(0x01).encode() + ipaddr
        except socket.error:
            # Well it's not an IP number, so it's probably a DNS name.
            if self.__proxy[3]:
                # Resolve remotely
                ipaddr = None
                req = req + chr(0x03).encode() + chr(len(destaddr)).encode() + destaddr
            else:
                # Resolve locally
                ipaddr = socket.inet_aton(socket.gethostbyname(destaddr))
                req = req + chr(0x01).encode() + ipaddr
        req = req + struct.pack(">H", destport)
        self.sendall(req)
        # Get the response
        resp = self.__recvall(4)
        if resp[0:1] != chr(0x05).encode():
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        elif resp[1:2] != chr(0x00).encode():
            # Connection failed
            self.close()
            if ord(resp[1:2])<=8:
                raise Socks5Error((ord(resp[1:2]), _socks5errors[ord(resp[1:2])]))
            else:
                raise Socks5Error((9, _socks5errors[9]))
        # Get the bound address/port
        elif resp[3:4] == chr(0x01).encode():
            boundaddr = self.__recvall(4)
        elif resp[3:4] == chr(0x03).encode():
            resp = resp + self.recv(1)
            boundaddr = self.__recvall(ord(resp[4:5]))
        else:
            self.close()
            raise GeneralProxyError((1,_generalerrors[1]))
        boundport = struct.unpack(">H", self.__recvall(2))[0]
        self.__proxysockname = (boundaddr, boundport)
        if ipaddr != None:
            self.__proxypeername = (socket.inet_ntoa(ipaddr), destport)
        else:
            self.__proxypeername = (destaddr, destport)

    def getproxysockname(self):
        """getproxysockname() -> address info
        Returns the bound IP address and port number at the proxy.
        """
        return self.__proxysockname

    def getproxypeername(self):
        """getproxypeername() -> address info
        Returns the IP and port number of the proxy.
        """
        return _orgsocket.getpeername(self)

    def getpeername(self):
        """getpeername() -> address info
        Returns the IP address and port number of the destination
        machine (note: getproxypeername returns the proxy)
        """
        return self.__proxypeername

    def __negotiatesocks4(self,destaddr,destport):
        """__negotiatesocks4(self,destaddr,destport)
        Negotiates a connection through a SOCKS4 server.
        """
        # Check if the destination address provided is an IP address
        rmtrslv = False
        try:
            ipaddr = socket.inet_aton(destaddr)
        except socket.error:
            # It's a DNS name. Check where it should be resolved.
            if self.__proxy[3]:
                ipaddr = struct.pack("BBBB", 0x00, 0x00, 0x00, 0x01)
                rmtrslv = True
            else:
                ipaddr = socket.inet_aton(socket.gethostbyname(destaddr))
        # Construct the request packet
        req = struct.pack(">BBH", 0x04, 0x01, destport) + ipaddr
        # The username parameter is considered userid for SOCKS4
        if self.__proxy[4] != None:
            req = req + self.__proxy[4]
        req = req + chr(0x00).encode()
        # DNS name if remote resolving is required
        # NOTE: This is actually an extension to the SOCKS4 protocol
        # called SOCKS4A and may not be supported in all cases.
        if rmtrslv:
            req = req + destaddr + chr(0x00).encode()
        self.sendall(req)
        # Get the response from the server
        resp = self.__recvall(8)
        if resp[0:1] != chr(0x00).encode():
            # Bad data
            self.close()
            raise GeneralProxyError((1,_generalerrors[1]))
        if resp[1:2] != chr(0x5A).encode():
            # Server returned an error
            self.close()
            if ord(resp[1:2]) in (91, 92, 93):
                raise Socks4Error((ord(resp[1:2]), _socks4errors[ord(resp[1:2]) - 90]))
            else:
                raise Socks4Error((94, _socks4errors[4]))
        # Get the bound address/port
        self.__proxysockname = (socket.inet_ntoa(resp[4:]), struct.unpack(">H", resp[2:4])[0])
        # Fixed: the original tested "rmtrslv != None", which is always true.
        # With remote resolving ipaddr is the 0.0.0.1 placeholder, so report the name.
        if rmtrslv:
            self.__proxypeername = (destaddr, destport)
        else:
            self.__proxypeername = (socket.inet_ntoa(ipaddr), destport)

    def __negotiatehttp(self, destaddr, destport):
        """__negotiatehttp(self,destaddr,destport)
        Negotiates a connection through an HTTP server.
        """
        # If we need to resolve locally, we do this now
        if not self.__proxy[3]:
            addr = socket.gethostbyname(destaddr)
        else:
            addr = destaddr
        headers = "CONNECT " + addr + ":" + str(destport) + " HTTP/1.1\r\n"
        headers += "Host: " + destaddr + "\r\n"
        if (self.__proxy[4] != None and self.__proxy[5] != None):
            headers += self.__getauthheader() + "\r\n"
        headers += "\r\n"
        self.sendall(headers.encode())
        # We read the response until we get the string "\r\n\r\n"
        resp = self.recv(1)
        while resp.find("\r\n\r\n".encode()) == -1:
            d = self.recv(1)
            # Added guard: recv() returns an empty string once the proxy closes
            # the connection; without this check the loop would never end.
            if not d:
                self.close()
                raise GeneralProxyError((0, "connection closed unexpectedly"))
            resp = resp + d
        # We just need the first line to check if the connection
        # was successful
        statusline = resp.splitlines()[0].split(" ".encode(), 2)
        if statusline[0] not in ("HTTP/1.0".encode(), "HTTP/1.1".encode()):
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        try:
            statuscode = int(statusline[1])
        except ValueError:
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        if statuscode != 200:
            self.close()
            raise HTTPError((statuscode, statusline[2]))
        self.__proxysockname = ("0.0.0.0", 0)
        self.__proxypeername = (addr, destport)

    def connect(self, destpair):
        """connect(self, destpair)
        Connects to the specified destination through a proxy.
        destpair - A tuple of the IP/DNS address and the port number.
        (identical to socket's connect).
        To select the proxy server use setproxy().
        """
        # Do a minimal input check first
        if (not type(destpair) in (list,tuple)) or (len(destpair) < 2) or (type(destpair[0]) != type('')) or (type(destpair[1]) != int):
            raise GeneralProxyError((5, _generalerrors[5]))
        if self.__proxy[0] == PROXY_TYPE_SOCKS5:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 1080
            _orgsocket.connect(self, (self.__proxy[1], portnum))
            self.__negotiatesocks5(destpair[0], destpair[1])
        elif self.__proxy[0] == PROXY_TYPE_SOCKS4:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 1080
            _orgsocket.connect(self,(self.__proxy[1], portnum))
            self.__negotiatesocks4(destpair[0], destpair[1])
        elif self.__proxy[0] == PROXY_TYPE_HTTP:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 8080
            _orgsocket.connect(self,(self.__proxy[1], portnum))
            self.__negotiatehttp(destpair[0], destpair[1])
        elif self.__proxy[0] == PROXY_TYPE_HTTP_NO_TUNNEL:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 8080
            _orgsocket.connect(self,(self.__proxy[1],portnum))
            if destpair[1] == 443:
                print "WARN: SSL connections (generally on port 443) require the use of tunneling - failing back to PROXY_TYPE_HTTP"
                self.__negotiatehttp(destpair[0],destpair[1])
            else:
                # No CONNECT handshake: sendall() rewrites each request instead.
                self.__httptunnel = False
        elif self.__proxy[0] == None:
            _orgsocket.connect(self, (destpair[0], destpair[1]))
        else:
            raise GeneralProxyError((4, _generalerrors[4]))
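The core of the patch is the header rewriting done in `__rewriteproxy`. The following standalone sketch mirrors that logic so it can be tried without a proxy. It is an illustration rather than the patch itself: the function name `rewrite_for_plain_proxy` and the sample request are hypothetical, and it targets Python 3 text strings, whereas the patch works on Python 2 byte strings.

```python
import base64


def rewrite_for_plain_proxy(request, username=None, password=None):
    """Rewrite an origin-form HTTP request into the absolute-URI form
    that a non-tunneling proxy expects. `request` is a text string."""
    host, request_line = None, None
    lines = request.split("\r\n")
    for line in lines:
        lowered = line.lower()
        if lowered.startswith("host:"):
            host = line
        elif lowered.startswith(("get ", "post ")):
            request_line = line
    if not (host and request_line):
        # Not the start of a GET/POST request (e.g. a body chunk): leave it alone.
        return request
    lines.remove(host)
    lines.remove(request_line)
    host_value = host.split(":", 1)[1].strip()
    method, path, version = request_line.split(" ", 2)
    if username is not None and password is not None:
        credentials = ("%s:%s" % (username, password)).encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        lines.insert(0, "Proxy-Authorization: Basic " + token)
    lines.insert(0, "Host: " + host_value)
    # The request line now names the full URL instead of just the path.
    lines.insert(0, "%s http://%s%s %s" % (method, host_value, path, version))
    return "\r\n".join(lines)


original = "GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
rewritten = rewrite_for_plain_proxy(original, "user", "secret")
print(rewritten)
```

One caveat that applies to the patch as well as to this sketch: the rewrite runs on whatever is passed to `sendall`, so it relies on the request line and the Host header arriving in the same call, and only GET and POST are recognized.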
References:
http://code.google.com/p/httplib2/issues/detail?id=38
Original post: http://www.cnblogs.com/congbo/archive/2012/08/16/2641079.html