python httplib2 - errors when using a proxy

httplib2 relies on SocksiPy for its proxy support. Example code:

import httplib2
import socks

client = httplib2.Http(proxy_info=httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, "190.253.95.219", 8080))

 

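With many proxies the connection gets stuck in an endless loop, and with some others it fails outright, yet the same proxies work without problems when tested with pycurl.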

 

python version: 2.7

httplib2 version: 0.7.2

socks version: 1.00

 

The socks.py module is described at http://socksipy.sourceforge.net/readme.txt. The relevant part reads:

PROXY COMPATIBILITY

SocksiPy is compatible with three different types of proxies:

1. SOCKS Version 4 (Socks4), including the Socks4a extension.

2. SOCKS Version 5 (Socks5).

3. HTTP Proxies which support tunneling using the CONNECT method.

 

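The proxies that fail with httplib2 fail because SocksiPy only supports HTTP proxies that implement the CONNECT method. CONNECT is the method HTTP/1.1 reserves for proxies that can switch the connection into a tunnel, which is what HTTPS support requires. Unlike HTTP URLs, which start with "http://" and default to port 80, HTTPS URLs start with "https://" and default to port 443. A proxy that only forwards ordinary HTTP requests may reject a CONNECT request or never answer it, which would account for both the failed and the hanging connections.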
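To make the difference concrete, here is a minimal sketch of the two request forms an HTTP proxy may receive. The helper names `connect_request` and `forward_request` and the host `example.com` are illustrative assumptions, not part of SocksiPy or httplib2.

```python
def connect_request(host, port):
    """Request sent to a tunneling proxy: CONNECT host:port."""
    return ("CONNECT %s:%d HTTP/1.1\r\n"
            "Host: %s\r\n"
            "\r\n") % (host, port, host)


def forward_request(method, host, path):
    """Request sent to a non-tunneling proxy: the request line carries
    the absolute URL instead of just the path."""
    return ("%s http://%s%s HTTP/1.1\r\n"
            "Host: %s\r\n"
            "\r\n") % (method, host, path, host)


if __name__ == "__main__":
    print(connect_request("example.com", 443))
    print(forward_request("GET", "example.com", "/index.html"))
```

SocksiPy 1.00 only ever sends the first form for HTTP proxies, so a proxy that understands only the second form cannot be used without a change to the module.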

 

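The fix is to patch the socks.py module source (/usr/lib/python2.6/dist-packages/socks.py) by adding a "PROXY_TYPE_HTTP_NO_TUNNEL" type, which falls back to the "PROXY_TYPE_HTTP" type when tunneling cannot be avoided (in the patch below, when the destination port is 443).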
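The decision that the patched `connect()` makes for HTTP proxies can be sketched on its own. The function name `http_proxy_mode` and the labels it returns are hypothetical; only the two constants and the port-443 rule are taken from the patch.

```python
# Constants mirror the values defined in the patched socks.py.
PROXY_TYPE_HTTP = 3
PROXY_TYPE_HTTP_NO_TUNNEL = 4


def http_proxy_mode(proxy_type, dest_port):
    """Return "connect" when a CONNECT tunnel is negotiated, or "rewrite"
    when requests are sent to the proxy with an absolute URL instead."""
    if proxy_type == PROXY_TYPE_HTTP:
        return "connect"
    if proxy_type == PROXY_TYPE_HTTP_NO_TUNNEL:
        # SSL traffic cannot be rewritten, so port 443 falls back to CONNECT.
        return "connect" if dest_port == 443 else "rewrite"
    raise ValueError("not an HTTP proxy type: %r" % (proxy_type,))
```

With the patched module in place, the client from the first example would presumably be created with `socks.PROXY_TYPE_HTTP_NO_TUNNEL` in place of `socks.PROXY_TYPE_HTTP`.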

 

Patch reference: http://code.google.com/p/xbmc-iplayerv2/source/browse/trunk/plugin.video.iplayer/lib/httplib2/socks.py. It tested as working. The code is as follows:

"""SocksiPy - Python SOCKS module.
Version 1.00

Copyright 2006 Dan-Haim. All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
3. Neither the name of Dan Haim nor the names of his contributors may be used
   to endorse or promote products derived from this software without specific
   prior written permission.

THIS SOFTWARE IS PROVIDED BY DAN HAIM "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL DAN HAIM OR HIS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMANGE.


This module provides a standard socket-like interface for Python
for tunneling connections through SOCKS proxies.

"""

"""

Minor modifications made by Christopher Gilbert (http://motomastyle.com/)
for use in PyLoris (http://pyloris.sourceforge.net/)

Minor modifications made by Mario Vilas (http://breakingcode.wordpress.com/)
mainly to merge bug fixes found in Sourceforge

"""

import socket
import struct
import sys
import base64

PROXY_TYPE_SOCKS4 = 1
PROXY_TYPE_SOCKS5 = 2
PROXY_TYPE_HTTP = 3
PROXY_TYPE_HTTP_NO_TUNNEL = 4

_defaultproxy = None
_orgsocket = socket.socket

class ProxyError(Exception): pass
class GeneralProxyError(ProxyError): pass
class Socks5AuthError(ProxyError): pass
class Socks5Error(ProxyError): pass
class Socks4Error(ProxyError): pass
class HTTPError(ProxyError): pass

_generalerrors = ("success",
    "invalid data",
    "not connected",
    "not available",
    "bad proxy type",
    "bad input")

_socks5errors = ("succeeded",
    "general SOCKS server failure",
    "connection not allowed by ruleset",
    "Network unreachable",
    "Host unreachable",
    "Connection refused",
    "TTL expired",
    "Command not supported",
    "Address type not supported",
    "Unknown error")

_socks5autherrors = ("succeeded",
    "authentication is required",
    "all offered authentication methods were rejected",
    "unknown username or invalid password",
    "unknown error")

_socks4errors = ("request granted",
    "request rejected or failed",
    "request rejected because SOCKS server cannot connect to identd on the client",
    "request rejected because the client program and identd report different user-ids",
    "unknown error")

def setdefaultproxy(proxytype=None, addr=None, port=None, rdns=True, username=None, password=None):
    """setdefaultproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
    Sets a default proxy which all further socksocket objects will use,
    unless explicitly changed.
    """
    global _defaultproxy
    _defaultproxy = (proxytype, addr, port, rdns, username, password)

def wrapmodule(module):
    """wrapmodule(module)
    Attempts to replace a module's socket library with a SOCKS socket. Must set
    a default proxy using setdefaultproxy(...) first.
    This will only work on modules that import socket directly into the namespace;
    most of the Python Standard Library falls into this category.
    """
    if _defaultproxy != None:
        module.socket.socket = socksocket
    else:
        raise GeneralProxyError((4, "no proxy specified"))

class socksocket(socket.socket):
    """socksocket([family[, type[, proto]]]) -> socket object
    Open a SOCKS enabled socket. The parameters are the same as
    those of the standard socket init. In order for SOCKS to work,
    you must specify family=AF_INET, type=SOCK_STREAM and proto=0.
    """

    def __init__(self, family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0, _sock=None):
        _orgsocket.__init__(self, family, type, proto, _sock)
        if _defaultproxy != None:
            self.__proxy = _defaultproxy
        else:
            self.__proxy = (None, None, None, None, None, None)
        self.__proxysockname = None
        self.__proxypeername = None

        self.__httptunnel = True

    def __recvall(self, count):
        """__recvall(count) -> data
        Receive EXACTLY the number of bytes requested from the socket.
        Blocks until the required number of bytes have been received.
        """
        data = self.recv(count)
        while len(data) < count:
            d = self.recv(count-len(data))
            if not d: raise GeneralProxyError((0, "connection closed unexpectedly"))
            data = data + d
        return data

    def sendall(self, content, *args):
        """ override socket.socket.sendall method to rewrite the header 
        for non-tunneling proxies if needed 
        """
        if not self.__httptunnel:
            content = self.__rewriteproxy(content)

        return super(socksocket, self).sendall(content, *args)

    def __rewriteproxy(self, header):
        """ rewrite HTTP request headers to support non-tunneling proxies 
        (i.e. those which do not support the CONNECT method).
        This only works for HTTP (not HTTPS) since HTTPS requires tunneling.
        """
        host, endpt = None, None
        hdrs = header.split("\r\n")
        for hdr in hdrs:
            if hdr.lower().startswith("host:"):
                host = hdr
            elif hdr.lower().startswith("get") or hdr.lower().startswith("post"):
                endpt = hdr
        if host and endpt: 
            hdrs.remove(host)
            hdrs.remove(endpt)
            host = host.split(" ")[1]
            endpt = endpt.split(" ")
            if (self.__proxy[4] != None and self.__proxy[5] != None):
                hdrs.insert(0, self.__getauthheader())
            hdrs.insert(0, "Host: %s" % host)
            hdrs.insert(0, "%s http://%s%s %s" % (endpt[0], host, endpt[1], endpt[2]))

        return "\r\n".join(hdrs)

    def __getauthheader(self):
        auth = self.__proxy[4] + ":" + self.__proxy[5]
        return "Proxy-Authorization: Basic " + base64.b64encode(auth)

    def setproxy(self, proxytype=None, addr=None, port=None, rdns=True, username=None, password=None):
        """setproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
        Sets the proxy to be used.
        proxytype -    The type of the proxy to be used. Four types
                are supported: PROXY_TYPE_SOCKS4 (including socks4a),
                PROXY_TYPE_SOCKS5, PROXY_TYPE_HTTP and
                PROXY_TYPE_HTTP_NO_TUNNEL
        addr -        The address of the server (IP or DNS).
        port -        The port of the server. Defaults to 1080 for SOCKS
                servers and 8080 for HTTP proxy servers.
        rdns -        Should DNS queries be performed on the remote side
                (rather than the local side). The default is True.
                Note: This has no effect with SOCKS4 servers.
        username -    Username to authenticate with to the server.
                The default is no authentication.
        password -    Password to authenticate with to the server.
                Only relevant when username is also provided.
        """
        self.__proxy = (proxytype, addr, port, rdns, username, password)

    def __negotiatesocks5(self, destaddr, destport):
        """__negotiatesocks5(self,destaddr,destport)
        Negotiates a connection through a SOCKS5 server.
        """
        # First we'll send the authentication packages we support.
        if (self.__proxy[4]!=None) and (self.__proxy[5]!=None):
            # The username/password details were supplied to the
            # setproxy method so we support the USERNAME/PASSWORD
            # authentication (in addition to the standard none).
            self.sendall(struct.pack('BBBB', 0x05, 0x02, 0x00, 0x02))
        else:
            # No username/password were entered, therefore we
            # only support connections with no authentication.
            self.sendall(struct.pack('BBB', 0x05, 0x01, 0x00))
        # We'll receive the server's response to determine which
        # method was selected
        chosenauth = self.__recvall(2)
        if chosenauth[0:1] != chr(0x05).encode():
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        # Check the chosen authentication method
        if chosenauth[1:2] == chr(0x00).encode():
            # No authentication is required
            pass
        elif chosenauth[1:2] == chr(0x02).encode():
            # Okay, we need to perform a basic username/password
            # authentication.
            self.sendall(chr(0x01).encode() + chr(len(self.__proxy[4])) + self.__proxy[4] + chr(len(self.__proxy[5])) + self.__proxy[5])
            authstat = self.__recvall(2)
            if authstat[0:1] != chr(0x01).encode():
                # Bad response
                self.close()
                raise GeneralProxyError((1, _generalerrors[1]))
            if authstat[1:2] != chr(0x00).encode():
                # Authentication failed
                self.close()
                raise Socks5AuthError((3, _socks5autherrors[3]))
            # Authentication succeeded
        else:
            # Reaching here is always bad
            self.close()
            # Compare against the raw byte: chr(0xFF).encode() raises
            # UnicodeDecodeError on Python 2 because 0xFF is not ASCII.
            if chosenauth[1:2] == "\xff":
                raise Socks5AuthError((2, _socks5autherrors[2]))
            else:
                raise GeneralProxyError((1, _generalerrors[1]))
        # Now we can request the actual connection
        req = struct.pack('BBB', 0x05, 0x01, 0x00)
        # If the given destination address is an IP address, we'll
        # use the IPv4 address request even if remote resolving was specified.
        try:
            ipaddr = socket.inet_aton(destaddr)
            req = req + chr(0x01).encode() + ipaddr
        except socket.error:
            # Well it's not an IP number,  so it's probably a DNS name.
            if self.__proxy[3]:
                # Resolve remotely
                ipaddr = None
                req = req + chr(0x03).encode() + chr(len(destaddr)).encode() + destaddr
            else:
                # Resolve locally
                ipaddr = socket.inet_aton(socket.gethostbyname(destaddr))
                req = req + chr(0x01).encode() + ipaddr
        req = req + struct.pack(">H", destport)
        self.sendall(req)
        # Get the response
        resp = self.__recvall(4)
        if resp[0:1] != chr(0x05).encode():
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        elif resp[1:2] != chr(0x00).encode():
            # Connection failed
            self.close()
            if ord(resp[1:2])<=8:
                raise Socks5Error((ord(resp[1:2]), _socks5errors[ord(resp[1:2])]))
            else:
                raise Socks5Error((9, _socks5errors[9]))
        # Get the bound address/port
        elif resp[3:4] == chr(0x01).encode():
            boundaddr = self.__recvall(4)
        elif resp[3:4] == chr(0x03).encode():
            resp = resp + self.recv(1)
            boundaddr = self.__recvall(ord(resp[4:5]))
        else:
            self.close()
            raise GeneralProxyError((1,_generalerrors[1]))
        boundport = struct.unpack(">H", self.__recvall(2))[0]
        self.__proxysockname = (boundaddr, boundport)
        if ipaddr != None:
            self.__proxypeername = (socket.inet_ntoa(ipaddr), destport)
        else:
            self.__proxypeername = (destaddr, destport)

    def getproxysockname(self):
        """getproxysockname() -> address info
        Returns the bound IP address and port number at the proxy.
        """
        return self.__proxysockname

    def getproxypeername(self):
        """getproxypeername() -> address info
        Returns the IP and port number of the proxy.
        """
        return _orgsocket.getpeername(self)

    def getpeername(self):
        """getpeername() -> address info
        Returns the IP address and port number of the destination
        machine (note: getproxypeername returns the proxy)
        """
        return self.__proxypeername

    def __negotiatesocks4(self,destaddr,destport):
        """__negotiatesocks4(self,destaddr,destport)
        Negotiates a connection through a SOCKS4 server.
        """
        # Check if the destination address provided is an IP address
        rmtrslv = False
        try:
            ipaddr = socket.inet_aton(destaddr)
        except socket.error:
            # It's a DNS name. Check where it should be resolved.
            if self.__proxy[3]:
                ipaddr = struct.pack("BBBB", 0x00, 0x00, 0x00, 0x01)
                rmtrslv = True
            else:
                ipaddr = socket.inet_aton(socket.gethostbyname(destaddr))
        # Construct the request packet
        req = struct.pack(">BBH", 0x04, 0x01, destport) + ipaddr
        # The username parameter is considered userid for SOCKS4
        if self.__proxy[4] != None:
            req = req + self.__proxy[4]
        req = req + chr(0x00).encode()
        # DNS name if remote resolving is required
        # NOTE: This is actually an extension to the SOCKS4 protocol
        # called SOCKS4A and may not be supported in all cases.
        if rmtrslv:
            req = req + destaddr + chr(0x00).encode()
        self.sendall(req)
        # Get the response from the server
        resp = self.__recvall(8)
        if resp[0:1] != chr(0x00).encode():
            # Bad data
            self.close()
            raise GeneralProxyError((1,_generalerrors[1]))
        if resp[1:2] != chr(0x5A).encode():
            # Server returned an error
            self.close()
            if ord(resp[1:2]) in (91, 92, 93):
                raise Socks4Error((ord(resp[1:2]), _socks4errors[ord(resp[1:2]) - 90]))
            else:
                raise Socks4Error((94, _socks4errors[4]))
        # Get the bound address/port
        self.__proxysockname = (socket.inet_ntoa(resp[4:]), struct.unpack(">H", resp[2:4])[0])
        # rmtrslv is always True or False, so the original "!= None" test
        # could never reach the else branch.
        if not rmtrslv:
            # Resolved locally (or given as an IP): report the real address
            self.__proxypeername = (socket.inet_ntoa(ipaddr), destport)
        else:
            # Resolved by the proxy: ipaddr is only the 0.0.0.1 placeholder
            self.__proxypeername = (destaddr, destport)

    def __negotiatehttp(self, destaddr, destport):
        """__negotiatehttp(self,destaddr,destport)
        Negotiates a connection through an HTTP server.
        """
        # If we need to resolve locally, we do this now
        if not self.__proxy[3]:
            addr = socket.gethostbyname(destaddr)
        else:
            addr = destaddr
        headers =  "CONNECT " + addr + ":" + str(destport) + " HTTP/1.1\r\n"
        headers += "Host: " + destaddr + "\r\n"
        if (self.__proxy[4] != None and self.__proxy[5] != None):
            headers += self.__getauthheader() + "\r\n"
        headers += "\r\n"
        self.sendall(headers.encode())
        # We read the response until we get the string "\r\n\r\n"
        resp = self.recv(1)
        while resp.find("\r\n\r\n".encode()) == -1:
            d = self.recv(1)
            if not d:
                # The proxy closed the connection without a complete reply;
                # without this check the loop would never terminate.
                self.close()
                raise GeneralProxyError((0, "connection closed unexpectedly"))
            resp = resp + d
        # We just need the first line to check if the connection
        # was successful
        statusline = resp.splitlines()[0].split(" ".encode(), 2)
        if statusline[0] not in ("HTTP/1.0".encode(), "HTTP/1.1".encode()):
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        try:
            statuscode = int(statusline[1])
        except ValueError:
            self.close()
            raise GeneralProxyError((1, _generalerrors[1]))
        if statuscode != 200:
            self.close()
            raise HTTPError((statuscode, statusline[2]))
        self.__proxysockname = ("0.0.0.0", 0)
        self.__proxypeername = (addr, destport)

    def connect(self, destpair):
        """connect(self, destpair)
        Connects to the specified destination through a proxy.
        destpair - A tuple of the IP/DNS address and the port number.
        (identical to socket's connect).
        To select the proxy server use setproxy().
        """
        # Do a minimal input check first
        if (not type(destpair) in (list,tuple)) or (len(destpair) < 2) or (type(destpair[0]) != type('')) or (type(destpair[1]) != int):
            raise GeneralProxyError((5, _generalerrors[5]))
        if self.__proxy[0] == PROXY_TYPE_SOCKS5:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 1080
            _orgsocket.connect(self, (self.__proxy[1], portnum))
            self.__negotiatesocks5(destpair[0], destpair[1])
        elif self.__proxy[0] == PROXY_TYPE_SOCKS4:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 1080
            _orgsocket.connect(self,(self.__proxy[1], portnum))
            self.__negotiatesocks4(destpair[0], destpair[1])
        elif self.__proxy[0] == PROXY_TYPE_HTTP:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 8080
            _orgsocket.connect(self,(self.__proxy[1], portnum))
            self.__negotiatehttp(destpair[0], destpair[1])
        elif self.__proxy[0] == PROXY_TYPE_HTTP_NO_TUNNEL:
            if self.__proxy[2] != None:
                portnum = self.__proxy[2]
            else:
                portnum = 8080
            _orgsocket.connect(self,(self.__proxy[1],portnum))
            if destpair[1] == 443:
                # HTTPS cannot be rewritten, so negotiate a CONNECT tunnel instead
                print("WARN: SSL connections (generally on port 443) require the use of tunneling - falling back to PROXY_TYPE_HTTP")
                self.__negotiatehttp(destpair[0],destpair[1])
            else:
                # Plain HTTP: sendall() will rewrite each request for the proxy
                self.__httptunnel = False
        elif self.__proxy[0] == None:
            _orgsocket.connect(self, (destpair[0], destpair[1]))
        else:
            raise GeneralProxyError((4, _generalerrors[4]))
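
For experimenting with the header rewriting outside the socket class, the logic of `__rewriteproxy` and `__getauthheader` can be restated as a standalone function. This is a sketch, not the patch itself: the name `rewrite_for_proxy`, its `username` and `password` parameters, and the bytes handling around `base64.b64encode` (which Python 3 needs) are assumptions added for illustration.

```python
import base64


def rewrite_for_proxy(request, username=None, password=None):
    """Turn 'GET /path' plus 'Host: h' into 'GET http://h/path' for a
    non-tunneling proxy; any other request is returned unchanged."""
    lines = request.split("\r\n")
    host_line, request_line = None, None
    for line in lines:
        lowered = line.lower()
        if lowered.startswith("host:"):
            host_line = line
        elif lowered.startswith("get ") or lowered.startswith("post "):
            request_line = line
    if not (host_line and request_line):
        return request

    lines.remove(host_line)
    lines.remove(request_line)
    host = host_line.split(":", 1)[1].strip()
    method, path, version = request_line.split(" ", 2)

    if username is not None and password is not None:
        # b64encode works on bytes, so encode first and decode the result.
        token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
        lines.insert(0, "Proxy-Authorization: Basic " + token.decode("ascii"))
    lines.insert(0, "Host: %s" % host)
    lines.insert(0, "%s http://%s%s %s" % (method, host, path, version))
    return "\r\n".join(lines)
```

As in the patch, only GET and POST request lines are rewritten, and a CONNECT request passes through untouched.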

 

Reference:

http://code.google.com/p/httplib2/issues/detail?id=38

 

Original article: http://www.cnblogs.com/congbo/archive/2012/08/16/2641079.html
