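The computers at my company go online through a proxy; as far as I can tell, the proxy is configured in IE.
In a Python script, urlopen from the urllib module can fetch web pages: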
import urllib
print urllib.urlopen('http://www.pythoncool.com').read()  # urlopen picks up the system proxy settings by itself
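For comparison, here is a minimal Python 3 sketch of the same fetch with the proxy given explicitly instead of being read from the system settings. The proxy address 10.0.0.1:8080 and the helper name make_proxy_opener are hypothetical. (In Python 2, urllib.urlopen takes a proxies dict directly, e.g. urllib.urlopen(url, proxies={'http': 'http://10.0.0.1:8080'}).)

```python
import urllib.request

# Hypothetical proxy address; substitute the one your LAN actually uses.
PROXY = "http://10.0.0.1:8080"

def make_proxy_opener(proxy_url):
    """Build an opener that routes http:// requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    # Passing our own ProxyHandler replaces the default one that reads system settings.
    return urllib.request.build_opener(handler)

opener = make_proxy_opener(PROXY)
# Keep the "http://" scheme; without it urlopen cannot tell this is a web URL.
# html = opener.open("http://www.pythoncool.com/").read()
```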
But httplib's HTTPConnection.request() fails with error 11001 (getaddrinfo failed, i.e. the host name could not be resolved):
import httplib
conn = httplib.HTTPConnection('www.pythoncool.com')
conn.request('get', '/')  # this step fails (HTTP method names are case-sensitive; the standard form is 'GET')
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    conn.request('get','/')
  File "C:\Python25\lib\httplib.py", line 862, in request
    self._send_request(method, url, body, headers)
  File "C:\Python25\lib\httplib.py", line 885, in _send_request
    self.endheaders()
  File "C:\Python25\lib\httplib.py", line 856, in endheaders
    self._send_output()
  File "C:\Python25\lib\httplib.py", line 728, in _send_output
    self.send(msg)
  File "C:\Python25\lib\httplib.py", line 695, in send
    self.connect()
  File "C:\Python25\lib\httplib.py", line 663, in connect
    socket.SOCK_STREAM):
gaierror: (11001, 'getaddrinfo failed')
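One way around this, sketched with Python 3's http.client (the successor of httplib): point the HTTPConnection at the proxy itself and put the full URL in the request line, so the proxy, not the local machine, resolves the target host. The helper name fetch_via_proxy and any proxy host/port you pass in are hypothetical; this only covers plain http://, not HTTPS (which would need HTTPConnection.set_tunnel()). The same idea should carry over to httplib in Python 2.5, but only the Python 3 version is shown here.

```python
import http.client

def fetch_via_proxy(proxy_host, proxy_port, url, timeout=10):
    """GET an absolute http:// URL through a plain HTTP proxy.

    The TCP connection goes to the proxy, so the client never resolves the
    target host name itself; the proxy does that.
    """
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=timeout)
    try:
        # Through a proxy, the request target is the whole URL, not just "/".
        # http.client derives the Host header from the URL's netloc.
        conn.request("GET", url)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Usage, with a made-up proxy address: status, body = fetch_via_proxy('10.0.0.1', 8080, 'http://www.pythoncool.com/').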
-------------------------------------------------------------------------------
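In the urllib module,
getproxies() returns the proxy address and port this machine uses: on Windows it first checks the environment variables (getproxies_environment()) and then falls back to the IE settings stored in the registry (getproxies_registry()).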
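A small sketch of turning that proxy mapping into the (ip, port) pair used below, assuming Python 3 (where getproxies lives in urllib.request). The helper name proxy_address is hypothetical, and falling back to port 80 when the proxy URL has no port is an assumption.

```python
import urllib.parse
import urllib.request

def proxy_address(proxies=None, scheme="http"):
    """Return (host, port) of the proxy for `scheme`, or None if none is set."""
    if proxies is None:
        # On Windows this consults environment variables, then the IE/registry settings.
        proxies = urllib.request.getproxies()
    url = proxies.get(scheme)
    if not url:
        return None
    if "://" not in url:
        # Tolerate a bare "host:port" value by giving it a scheme urlsplit understands.
        url = "http://" + url
    parts = urllib.parse.urlsplit(url)
    # Assumption: no explicit port means the HTTP default, 80.
    return parts.hostname, parts.port or 80
```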
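Tracing urllib.urlopen() shows that, when a proxy is configured, it simply sends the corresponding HTTP request to the proxy. httplib.HTTPConnection, by contrast, connects straight to the target host, so it has to resolve the host name locally, and that is the step that fails here.
Understanding this in more detail requires a look at the HTTP protocol.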
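Roughly: first obtain the IP address and port of the LAN proxy, (ip, port).
To fetch http://bbs.chinaunix.net/forum-55-1.html,
open a TCP connection to (ip, port),
send it 'GET http://bbs.chinaunix.net/forum-55-1.html HTTP/1.0\r\nHost: bbs.chinaunix.net\r\nUser-Agent: Python-urllib/1.17\r\n\r\n' (the request line carries the full URL, not just the path),
then read and process the response.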
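The steps above can be sketched in Python 3 with a plain socket. The function names build_proxy_request and raw_get_via_proxy are hypothetical; the User-Agent value is copied from the request string above and is not required. The sketch returns the raw reply (status line, headers and body together) and leaves parsing it out.

```python
import socket
from urllib.parse import urlsplit

def build_proxy_request(url, user_agent="Python-urllib/1.17"):
    """Build the bytes a client sends to an HTTP proxy to GET `url`."""
    host = urlsplit(url).netloc
    lines = [
        "GET %s HTTP/1.0" % url,   # absolute URL as the request target
        "Host: %s" % host,
        "User-Agent: %s" % user_agent,
        "",                        # these two empty items produce the final
        "",                        # "\r\n\r\n" that ends the header block
    ]
    return "\r\n".join(lines).encode("ascii")

def raw_get_via_proxy(proxy, url, timeout=10):
    """Send the request to proxy=(ip, port) over TCP and return the raw reply bytes."""
    with socket.create_connection(proxy, timeout=timeout) as sock:
        sock.sendall(build_proxy_request(url))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:           # HTTP/1.0 without keep-alive: the server closes when done
                break
            chunks.append(data)
    return b"".join(chunks)
```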