Why SOCKS5 proxy usernames and passwords in Python 3 cannot contain characters such as # and *, and how to fix it

Small case study series: why SOCKS5 proxy credentials in Python 3's requests library do not support characters such as #

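A while ago a friend ran into something strange while testing a SOCKS5 account: once the proxy parameters were set, the request failed with a "network unreachable" exception. Stepping in a little further showed that parsing returned only part of the parameters, and even the server address was missing. After a series of tests it turned out that the problem appears whenever the username or password contains a #. That made no sense at first: the SOCKS5 protocol does not forbid # in a username or password, so suspicion fell on the Python 3 library.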

Never let an anomaly slide. They come up often in the development of 蜂巢指纹浏览器 (NestBrowser), and each time we do our best to find the "why".

The test code is as follows:

import requests
from urllib3.exceptions import InsecureRequestWarning

proxy_data = {
    "server": 'x.x.x.x',
    "port": "456",
    "user": "test",
    "pwd": "xxxx#111",
}
proxy = f'{proxy_data.get("user")}:{proxy_data.get("pwd")}@{proxy_data.get("server")}:{proxy_data.get("port")}'
proxies = {
    'http': 'socks5h://' + proxy,
    'https': 'socks5h://' + proxy,
}

try:
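    # verify=False triggers InsecureRequestWarning; silence it for this test.
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    url = ""  # target URL to request through the proxy
    response = requests.get(url, proxies=proxies, timeout=5, verify=False)
    content = response.content.decode('utf-8')
    print("content:", content)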
    response.close()
except Exception as e:
    print('Error', proxy, e)
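1. With the problem reproduced, the only way forward is to step into the requests library itself.
蜂巢指纹浏览器 (NestBrowser) also relies heavily on Python 3 for many of its features.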

On Ubuntu, the library lives under: /usr/local/lib/python3.6/site-packages

Run: pip3 list

Then run: pip3 show requests

This confirms the file path of the library.
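
The same path can also be read from inside Python, which helps when several interpreters are installed and it is not obvious which one pip3 belongs to. This is a minimal sketch using only the standard library; `package_location` is a hypothetical helper name, not something provided by requests or pip.

```python
import importlib.util


def package_location(name):
    """Return the file that `import name` would load, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # "socks" is the module name installed by PySocks, which urllib3's SOCKS support uses.
    for pkg in ("requests", "urllib3", "socks"):
        print(pkg, "->", package_location(pkg))
```

Running it with the same interpreter that runs the test script shows which copies of requests and urllib3 are actually being imported.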


2. Open the requests library directory; api.py contains the declaration of the get interface:

def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request('get', url, params=params, **kwargs)

3. Continue into models.py:

from urllib3.util import parse_url

try:
    scheme, auth, host, port, path, query, fragment = parse_url(url)
except LocationParseError as e:
    raise InvalidURL(*e.args)
4. Continue into urllib3/util/url.py, since requests depends on urllib3:
def parse_url(url):
    """
    Given a url, return a parsed :class:`.Url` namedtuple. Best-effort is
    performed to parse incomplete urls. Fields not provided will be None.
    This parser is RFC 3986 and RFC 6874 compliant.

    The parser logic and helper functions are based heavily on
    work done in the ``rfc3986`` module.

    :param str url: URL to parse into a :class:`.Url` namedtuple.

    Partly backwards-compatible with :mod:`urlparse`.

    Example::

        >>> parse_url('')
        Url(scheme='http', host='google.com', port=None, path='/mail/', ...)
        >>> parse_url('google.com:80')
        Url(scheme=None, host='google.com', port=80, path=None, ...)
        >>> parse_url('/foo?bar')
        Url(scheme=None, host=None, port=None, path='/foo', query='bar', ...)
    """

    source_url = url
    if not SCHEME_RE.search(url):
        url = "//" + url

    try:
        scheme, authority, path, query, fragment = URI_RE.match(url).groups()
        normalize_uri = scheme is None or scheme.lower() in NORMALIZABLE_SCHEMES

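At this point the problem is clear. The SOCKS5 username and password are embedded in the proxy URL that gets passed in, and that URL is parsed with standard URL rules ("This parser is RFC 3986 and RFC 6874 compliant"). Under RFC 3986 a # starts the fragment, so everything after the # in the password, including the server address and port, is treated as a fragment instead of as part of the authority.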
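
The effect is easy to see in isolation. The sketch below uses the standard library's `urllib.parse.urlsplit` rather than urllib3's `parse_url`, on the assumption that both apply the same RFC 3986 rule for `#`; the address 10.0.0.1 and the credentials are placeholders.

```python
from urllib.parse import urlsplit

# Placeholder proxy URL built the same way as in the test code.
proxy_url = "socks5h://test:xxxx#111@10.0.0.1:456"

parts = urlsplit(proxy_url)

# The authority stops at '#', so the host and port never reach it.
print("netloc:  ", parts.netloc)    # test:xxxx
print("fragment:", parts.fragment)  # 111@10.0.0.1:456
```

With no `@` left in the authority, the parser has no way to tell that `test:xxxx` was meant as credentials, which matches the symptom of the server address going missing.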

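So what if such a password really has to be used? The workaround taken here is to replace the character when the URL is passed in and restore it once parsing is done. The replacement code is as follows: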

1. Modify urllib3/util/url.py so that any "#" is replaced with "_-_":

def parse_url(xurl):
    url = str(xurl).replace("#","_-_")
    if not url:
        return Url()

2. Modify urllib3/contrib/socks.py so that "_-_" is replaced back:

    def __init__(
        self,
        proxy_url,
        username=None,
        password=None,
        num_pools=10,
        headers=None,
        **connection_pool_kw
    ):
        parsed = parse_url(proxy_url)

        if username is None and password is None and parsed.auth is not None:
            split = parsed.auth.split(":")
            if len(split) == 2:
                username, password = split
        if parsed.scheme == "socks5":
            socks_version = socks.PROXY_TYPE_SOCKS5
            rdns = False
        elif parsed.scheme == "socks5h":
            socks_version = socks.PROXY_TYPE_SOCKS5
            rdns = True
        elif parsed.scheme == "socks4":
            socks_version = socks.PROXY_TYPE_SOCKS4
            rdns = False
        elif parsed.scheme == "socks4a":
            socks_version = socks.PROXY_TYPE_SOCKS4
            rdns = True
        else:
            raise ValueError("Unable to determine SOCKS version from %s" % proxy_url)

        self.proxy_url = proxy_url

        if password is not None:
            password = str(password).replace('_-_', '#')

Swap in these two modified .py files and the test runs through normally.
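
The two-step patch above amounts to a masking round trip, which can be tried without touching the installed library. This is only a sketch of the idea: `mask_hash` and `unmask_hash` are hypothetical names, the address is a placeholder, and the standard library's `urlsplit` stands in for urllib3's parser.

```python
from urllib.parse import urlsplit

PLACEHOLDER = "_-_"  # same token the patch above uses


def mask_hash(url):
    """Step 1: hide '#' so the URL parser does not start a fragment."""
    return url.replace("#", PLACEHOLDER)


def unmask_hash(value):
    """Step 2: restore '#' in a field taken from the parsed URL."""
    return value.replace(PLACEHOLDER, "#")


parts = urlsplit(mask_hash("socks5h://test:xxxx#111@10.0.0.1:456"))
password = unmask_hash(parts.password)

print(parts.hostname, parts.port)  # 10.0.0.1 456
print(password)                    # xxxx#111
```

The sketch also exposes two limits of the approach. A password that genuinely contains `_-_` comes back altered, and because the replacement sits inside `parse_url`, every URL that passes through it loses its real fragment as well, not only proxy URLs.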

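This was only an exercise in getting to the bottom of things; under normal circumstances you should stick to the library's default format and protocol. 蜂巢指纹浏览器 (NestBrowser) is waiting for users to dig just as deep.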
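
Sticking to the default URL format usually means percent-encoding the credentials before they go into the proxy URL, so that `#` travels as `%23`. Below is a minimal sketch using only the standard library; `build_proxy_url` is a hypothetical helper and the address is a placeholder. Whether the receiving side decodes the credentials depends on the library: requests appears to decode credentials taken from a SOCKS proxy URL before handing them on, but that is worth confirming against the installed version.

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(user, pwd, server, port, scheme="socks5h"):
    """Percent-encode the credentials so reserved characters survive URL parsing."""
    # safe="" encodes every reserved character, including '#', '@', ':' and '/'.
    return f"{scheme}://{quote(user, safe='')}:{quote(pwd, safe='')}@{server}:{port}"


proxy = build_proxy_url("test", "xxxx#111", "10.0.0.1", "456")
parts = urlsplit(proxy)

print(proxy)                                  # socks5h://test:xxxx%23111@10.0.0.1:456
print(parts.hostname, parts.port)             # 10.0.0.1 456
print(unquote(parts.password))                # xxxx#111
```

The resulting string can be used for the 'http' and 'https' entries of the proxies dictionary in place of the hand-built one, with no changes to the library files.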
