小猿圈 Python learning: using proxy IPs with Selenium crawlers

Today 小猿圈 shares two ways to keep your IP from being banned while scraping. Study them carefully and you won't have to worry about IP bans anymore.

Method 1:

Slow down the request rate. Using sleep from the time module to pause for one second after each request greatly reduces the chance of the IP being banned.
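The throttling idea can be sketched as follows. This is a minimal illustration: fetch is a stand-in for a real request (requests.get, driver.get, etc.) and the URLs are placeholders, not part of the original article.

```python
import time

def fetch(url):
    # Stand-in for a real request (e.g. requests.get or Selenium's driver.get).
    return "ok"

urls = ["http://httpbin.org/ip", "http://httpbin.org/get"]  # illustrative URLs

start = time.time()
results = []
for url in urls:
    results.append(fetch(url))
    time.sleep(1)  # pause 1 second after every request to lower the ban risk
elapsed = time.time() - start
```

The trade-off is obvious: each extra second of sleep is a second of lost throughput, which is why the second method exists.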

Method 2:

To keep throughput up, use proxy IPs instead. The IPs in the example come from 亿牛云's dynamic forwarding proxy; the proxy configuration process is shown below.

Selenium

from selenium import webdriver
import string
import zipfile

# Proxy server
proxyHost = "t.16yun.cn"
proxyPort = "31111"

# Proxy tunnel credentials
proxyUser = "username"
proxyPass = "password"

def create_proxy_auth_extension(proxy_host, proxy_port,
                                proxy_username, proxy_password,
                                scheme='http', plugin_path=None):
    # Build a small Chrome extension (as a zip) that sets the proxy and
    # answers the authentication challenge automatically.
    if plugin_path is None:
        plugin_path = r'C:/{}_{}@t.16yun.zip'.format(proxy_username, proxy_password)

    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "16YUNProxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version": "22.0.0"
    }
    """

    background_js = string.Template(
        """
        var config = {
            mode: "fixed_servers",
            rules: {
                singleProxy: {
                    scheme: "${scheme}",
                    host: "${host}",
                    port: parseInt(${port})
                },
                bypassList: ["foobar.com"]
            }
        };

        chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

        function callbackFn(details) {
            return {
                authCredentials: {
                    username: "${username}",
                    password: "${password}"
                }
            };
        }

        chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
        );
        """
    ).substitute(
        host=proxy_host,
        port=proxy_port,
        username=proxy_username,
        password=proxy_password,
        scheme=scheme,
    )

    # Package the two files into the extension zip.
    with zipfile.ZipFile(plugin_path, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    return plugin_path

proxy_auth_plugin_path = create_proxy_auth_extension(
    proxy_host=proxyHost,
    proxy_port=proxyPort,
    proxy_username=proxyUser,
    proxy_password=proxyPass)

option = webdriver.ChromeOptions()
option.add_argument("--start-maximized")
option.add_extension(proxy_auth_plugin_path)

driver = webdriver.Chrome(options=option)  # Selenium 4 uses options=; chrome_options is removed
driver.get("http://httpbin.org/ip")
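To confirm the proxy is actually in effect, note that http://httpbin.org/ip returns the exit IP as JSON, e.g. {"origin": "1.2.3.4"}. A small helper to pull the IP out of that page text (the sample IP below is illustrative, not from the article):

```python
import json

def extract_origin(page_text):
    # httpbin.org/ip responds with JSON like {"origin": "1.2.3.4"};
    # return the "origin" field, i.e. the exit IP the server saw.
    return json.loads(page_text)["origin"]

# With the driver above you would read the page body and pass it in,
# for example: extract_origin(driver.find_element("tag name", "pre").text)
sample = '{"origin": "203.0.113.7"}'  # illustrative documentation-range IP
origin = extract_origin(sample)
```

If the value printed differs from your real IP, the traffic is going through the proxy.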


That wraps up today's 小猿圈 Python learning share. The code above can be used as-is, but the proxy credentials in it have most likely expired, so you may need to contact the proxy provider to activate the service before using it. Finally, bookmark this and remember to take notes: a dull pencil beats a sharp memory.
