Small case study series: why the socks5 proxy username and password in python3's requests library do not support characters such as #
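A while ago a friend described a strange phenomenon while testing a socks5 account: once the parameters were set, a "network unreachable" exception was thrown. Stepping in a little further showed that only part of the parameters came back; even the server address was missing. After various tests it turned out that the problem appears whenever the username or password contains a "#". That made no sense: the socks5 protocol does not forbid "#" in the username or password, so the problem had to be in the python3 libraries.
Never let an anomaly slip by lightly. We run into them often while developing 蜂巢指纹浏览器 (NestBrowser), and we do our best to find the "why".
The test code is as follows: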
import requests
from urllib3.exceptions import InsecureRequestWarning

proxy_data = {
    "server": 'x.x.x.x',
    "port": "456",
    "user": "test",
    "pwd": "xxxx#111",  # the "#" in the password is what triggers the problem
}
proxy = f'{proxy_data.get("user")}:{proxy_data.get("pwd")}@{proxy_data.get("server")}:{proxy_data.get("port")}'
proxies = {
    'http': 'socks5h://' + proxy,
    'https': 'socks5h://' + proxy,
}
try:
    # verify=False below would otherwise emit an InsecureRequestWarning
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    url = ""  # target URL (left empty in the original post)
    response = requests.get(url, proxies=proxies, timeout=5, verify=False)
    content = response.content.decode('utf-8')
    print("content:", content)
    response.close()
except Exception as e:
    print('Error', proxy, e)
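1. Once the problem shows up, the only way forward is to step into the requests library being used. Development of 蜂巢指纹浏览器 (NestBrowser) also relies heavily on python3 for many features.
On Ubuntu the library lives under: /usr/local/lib/python3.6/site-packages
Run: pip3 list
Then run: pip3 show requests
The output confirms the file path of the library.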
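The same lookup can also be done from inside Python, which helps when several interpreters are installed and it is unclear which copy of the library is actually imported. The following is a minimal sketch using only the standard library; the helper name package_dir is made up for this illustration.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory a top-level package is imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


# "requests" and "urllib3" are the libraries traced in this post; either may
# be absent on a given machine, in which case None is printed for it.
for pkg in ("requests", "urllib3"):
    print(pkg, "->", package_dir(pkg))
```

The printed directory should match the Location reported by pip3 show, provided pip3 and the interpreter belong to the same installation.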
2. Open the requests library directory; api.py contains the declaration of the get interface:
def get(url, params=None, **kwargs):
    r"""Sends a GET request.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """
    return request('get', url, params=params, **kwargs)
3. Continue into models.py:
from urllib3.util import parse_url

try:
    scheme, auth, host, port, path, query, fragment = parse_url(url)
except LocationParseError as e:
    raise InvalidURL(*e.args)
4. Continue into urllib3/util/url.py; requests depends on urllib3:
def parse_url(url):
    """
    Given a url, return a parsed :class:`.Url` namedtuple. Best-effort is
    performed to parse incomplete urls. Fields not provided will be None.
    This parser is RFC 3986 and RFC 6874 compliant.
    The parser logic and helper functions are based heavily on
    work done in the ``rfc3986`` module.
    :param str url: URL to parse into a :class:`.Url` namedtuple.
    Partly backwards-compatible with :mod:`urlparse`.
    Example::
        >>> parse_url(' ')
        Url(scheme='http', host='google.com', port=None, path='/mail/', ...)
        >>> parse_url('google.com:80')
        Url(scheme=None, host='google.com', port=80, path=None, ...)
        >>> parse_url('/foo?bar')
        Url(scheme=None, host=None, port=None, path='/foo', query='bar', ...)
    """
    source_url = url
    if not SCHEME_RE.search(url):
        url = "//" + url
    try:
        scheme, authority, path, query, fragment = URI_RE.match(url).groups()
        normalize_uri = scheme is None or scheme.lower() in NORMALIZABLE_SCHEMES
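At this point the problem is clear. The socks5 username and password are embedded in the proxy URL, and that URL is parsed with a standard URL parser ("This parser is RFC 3986 and RFC 6874 compliant"). Under RFC 3986, "#" begins the fragment component, so everything after the "#" in the password, including the server address and port, is treated as a fragment rather than as part of the authority. That is why only part of the parameters came back and the server address was missing.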
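The effect is easy to reproduce without a proxy server. The sketch below uses urlsplit from the Python standard library as a stand-in for urllib3's parse_url (an assumption made so that the example has no third-party dependency; both follow the same RFC 3986 rule for "#"). The URL reuses the placeholder values from the test code.

```python
from urllib.parse import urlsplit

# Same placeholder credentials and address as in the test code
proxy_url = "socks5h://test:xxxx#111@x.x.x.x:456"

parts = urlsplit(proxy_url)

# The authority stops at "#"; the rest of the password, the "@", the host
# and the port all end up in the fragment.
print("netloc:  ", repr(parts.netloc))
print("fragment:", repr(parts.fragment))
print("username:", repr(parts.username))
```

With no "@" left inside the authority, the parser no longer sees any credentials at all, and "test" is taken for the host name.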
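So what if such a password really has to be passed in as-is? The approach taken here is to substitute the character before the URL is parsed and to restore it once parsing is done. The substitution code is as follows:
1. Modify urllib3/util/url.py so that any "#" is replaced with "_-_":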
def parse_url(xurl):
    # Patched: hide any "#" behind a placeholder before parsing
    url = str(xurl).replace("#", "_-_")
    if not url:
        return Url()
2. Modify urllib3/contrib/socks.py to turn "_-_" back into "#":
def __init__(
    self,
    proxy_url,
    username=None,
    password=None,
    num_pools=10,
    headers=None,
    **connection_pool_kw
):
    parsed = parse_url(proxy_url)
    if username is None and password is None and parsed.auth is not None:
        split = parsed.auth.split(":")
        if len(split) == 2:
            username, password = split
    if parsed.scheme == "socks5":
        socks_version = socks.PROXY_TYPE_SOCKS5
        rdns = False
    elif parsed.scheme == "socks5h":
        socks_version = socks.PROXY_TYPE_SOCKS5
        rdns = True
    elif parsed.scheme == "socks4":
        socks_version = socks.PROXY_TYPE_SOCKS4
        rdns = False
    elif parsed.scheme == "socks4a":
        socks_version = socks.PROXY_TYPE_SOCKS4
        rdns = True
    else:
        raise ValueError("Unable to determine SOCKS version from %s" % proxy_url)
    self.proxy_url = proxy_url
    # Patched: restore the "#" that parse_url replaced with the placeholder
    if password is not None:
        password = str(password).replace('_-_', '#')
Replace the two files with these modified versions, and the test runs through normally.
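The substitute-then-restore idea can be tried in isolation before touching any installed files. This is a minimal sketch, again using the standard library's urlsplit in place of urllib3's parse_url; the function name parse_proxy_with_placeholder is hypothetical and is not part of urllib3.

```python
from urllib.parse import urlsplit

PLACEHOLDER = "_-_"  # same token as in the patch


def parse_proxy_with_placeholder(proxy_url):
    """Hide '#' before parsing, then restore it in the password."""
    masked = proxy_url.replace("#", PLACEHOLDER)
    parts = urlsplit(masked)
    password = parts.password
    if password is not None:
        password = password.replace(PLACEHOLDER, "#")
    return parts.username, password, parts.hostname, parts.port


# The password survives the round trip and the host is found again.
print(parse_proxy_with_placeholder("socks5h://test:xxxx#111@x.x.x.x:456"))

# Caveat: a password that really contains "_-_" is rewritten as well.
print(parse_proxy_with_placeholder("socks5h://test:a_-_b@x.x.x.x:456"))
```

The second call shows one limit of the trick: the placeholder cannot be told apart from a literal "_-_" in the password. The patched parse_url also rewrites "#" in every URL it is given, not only in proxy URLs, so ordinary URLs that carry a fragment are likely to be affected too.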
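This was just an exercise in getting to the bottom of things; under normal circumstances you should stick to the library's default format and protocol. 蜂巢指纹浏览器 (NestBrowser) is waiting for its users to dig just as deep.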
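In practice, sticking to the default format usually means percent-encoding reserved characters in the credentials, so that "#" is written as "%23" and never reaches the parser as a fragment delimiter. Below is a minimal sketch using only the standard library; build_proxy_url is a hypothetical helper. The sketch also assumes that requests decodes the credentials again before handing them to the SOCKS layer (my understanding is that requests.utils.get_auth_from_url does this, but it is worth verifying against the installed version).

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(user, pwd, server, port, scheme="socks5h"):
    """Build a proxy URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, "#" included
    user_enc = quote(user, safe="")
    pwd_enc = quote(pwd, safe="")
    return f"{scheme}://{user_enc}:{pwd_enc}@{server}:{port}"


proxy_url = build_proxy_url("test", "xxxx#111", "x.x.x.x", "456")
print(proxy_url)

# The encoded URL parses cleanly, and decoding recovers the real password.
parts = urlsplit(proxy_url)
decoded_password = unquote(parts.password)
print(parts.hostname, parts.port, decoded_password)

# This dict has the shape requests expects for its proxies argument.
proxies = {"http": proxy_url, "https": proxy_url}
```

If the assumption about decoding holds, this route needs no changes to the installed library files and does not depend on a placeholder that might collide with a real password.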