Proxypool代理池搭建

个人博客阅读体验更佳:点我

前言

项目地址 : https://github.com/jhao104/proxy_pool

这个项目是github上一个大佬基于python爬虫制作的定时获取免费可用代理并入池的代理池项目

我们来具体实现一下。


具体操作

1.安装配置redis

将自动爬取的代理入池需要redis数据库,首先就得安装redis。

redis官方建议我们在linux上安装,安装方式主要有两种,直接包获取或手动安装。

- 指令安装

apt-get install redis-server

- 手动安装

在官网下载最新redis安装包,导入Linux。

tar -zxvf redis-6.2.6.tar.gz
cd redis-6.2.6/
make
make install
cd /usr/local/bin
mkdir config
cp /opt/redis-6.2.6/redis.conf config		# 默认安装位置为/opt

配置文件修改

修改redis配置文件(注意两种安装方式的配置文件位置不同,自动安装在/etc/redis/redis.conf,手动安装在/opt/redis-6.2.6/redis.conf),进行如下修改:

daemonize yes		# 守护进程开启
protected-mode no   # 关闭保护模式
# bind 127.0.0.1 ::1			# 此条为仅允许本地访问,必须注释掉
port 6379			# redis 开放端口(如果是有防火墙的服务器需要开启该端口)

开启redis

redis-server config/redis.conf
redis-cli

Proxypool代理池搭建_第1张图片

如需停止:

shutdown
exit

2.拉取并使用脚本

根据项目文档,可以手动配置也可以使用docker部署(推荐)

docker 使用方法见另一篇博客

docker pull jhao104/proxy_pool
docker run --env DB_CONN=redis://:[password]@[ip]:[port]/[db] -p 5010:5010 jhao104/proxy_pool:latest

password 没有可为空

db 默认0

运行成功应如图:

Proxypool代理池搭建_第2张图片


3.生成配置文件并导入Proxyfier

首先pip安装redis包

pip install redis

编译以下代码,注意修改第8行的ip和port(redis)

# -*- coding:utf8 -*-
import redis
import json
from xml.etree import ElementTree

def RedisProxyGet():
    ConnectString = []
    pool = redis.ConnectionPool(host='[ip]', port=[port], db=0, decode_responses=True)
    use_proxy = redis.Redis(connection_pool=pool)
    key = use_proxy.hkeys('use_proxy')
    for temp in key:
        try:
            ConnectString.append(json.loads(use_proxy.hget('use_proxy',temp)))
        except json.JSONDecodeError: # JSON解析异常处理
            pass
    return ConnectString

def xmlOutputs(data):
    i = 101
    ProxyIDList = []
    ProxifierProfile = ElementTree.Element("ProxifierProfile")
    ProxifierProfile.set("version", str(i))
    ProxifierProfile.set("platform", "Windows")
    ProxifierProfile.set("product_id", "0")
    ProxifierProfile.set("product_minver", "310")
    Options = ElementTree.SubElement(ProxifierProfile, "Options")
    Resolve = ElementTree.SubElement(Options, "Resolve")
    AutoModeDetection = ElementTree.SubElement(Resolve, "AutoModeDetection")
    AutoModeDetection.set("enabled", "false")
    ViaProxy = ElementTree.SubElement(Resolve, "ViaProxy")
    ViaProxy.set("enabled", "false")
    TryLocalDnsFirst = ElementTree.SubElement(ViaProxy, "TryLocalDnsFirst")
    TryLocalDnsFirst.set("enabled", "false")
    ExclusionList = ElementTree.SubElement(Resolve, "ExclusionList")
    ExclusionList.text = "%ComputerName%; localhost; *.local"
    Encryption = ElementTree.SubElement(Options, "Encryption")
    Encryption.set("mode", 'basic')
    Encryption = ElementTree.SubElement(Options, "HttpProxiesSupport")
    Encryption.set("enabled", 'true')
    Encryption = ElementTree.SubElement(Options, "HandleDirectConnections")
    Encryption.set("enabled", 'false')
    Encryption = ElementTree.SubElement(Options, "ConnectionLoopDetection")
    Encryption.set("enabled", 'true')
    Encryption = ElementTree.SubElement(Options, "ProcessServices")
    Encryption.set("enabled", 'false')
    Encryption = ElementTree.SubElement(Options, "ProcessOtherUsers")
    Encryption.set("enabled", 'false')
    ProxyList = ElementTree.SubElement(ProxifierProfile, "ProxyList")
    for temp in data:
        i += 1  # 从101开始增加
        Proxy = ElementTree.SubElement(ProxyList, "Proxy")
        Proxy.set("id", str(i))
        if not temp['https']:
            Proxy.set("type", "HTTP")
        else:
            Proxy.set("type", "HTTPS")
            Proxy.text = str(i)
            ProxyIDList.append(i)
        Address = ElementTree.SubElement(Proxy, "Address")
        Address.text = temp['proxy'].split(":", 1)[0]

        Port = ElementTree.SubElement(Proxy, "Port")
        Port.text = temp['proxy'].split(":", 1)[1]

        Options = ElementTree.SubElement(Proxy, "Options")
        Options.text = "48"
    ChainList = ElementTree.SubElement(ProxifierProfile, "ChainList")

    Chain = ElementTree.SubElement(ChainList, "Chain")
    Chain.set("id", str(i))
    Chain.set("type", "simple")

    Name = ElementTree.SubElement(Chain, "Name")
    Name.text="AgentPool"

    for temp_id in ProxyIDList:
        Proxy = ElementTree.SubElement(Chain, "Proxy")
        Proxy.set("enabled", "true")
        Proxy.text=str(temp_id)
    RuleList = ElementTree.SubElement(ProxifierProfile, "RuleList")

    Rule = ElementTree.SubElement(RuleList, "Rule")
    Rule.set("enabled", "true")
    Name = ElementTree.SubElement(Rule,"Name")
    Applications = ElementTree.SubElement(Rule,"Applications")
    Action = ElementTree.SubElement(Rule,"Action")

    Name.text="御剑后台扫描工具.exe [auto-created]"
    Applications.text="御剑后台扫描工具.exe"
    Action.set("type","Direct")

    # Rule
    Rule = ElementTree.SubElement(RuleList, "Rule")
    Rule.set("enabled", "true")
    Name = ElementTree.SubElement(Rule,"Name")
    Targets = ElementTree.SubElement(Rule,"Targets")
    Action = ElementTree.SubElement(Rule,"Action")

    Name.text="Localhost"
    Targets.text="localhost; 127.0.0.1; %ComputerName%"
    Action.set("type", "Direct")

    # Rule
    Rule = ElementTree.SubElement(RuleList, "Rule")
    Rule.set("enabled", "true")
    Name = ElementTree.SubElement(Rule, "Name")
    Action = ElementTree.SubElement(Rule, "Action")
    Name.text = "Default"
    Action.text = "102"
    Action.set("type", "Proxy")

    tree = ElementTree.ElementTree(ProxifierProfile)
    tree.write("ProxifierConf.ppx", encoding="UTF-8", xml_declaration=True)
    if __name__ == '__main__':
    proxy_data = RedisProxyGet()
    xmlOutputs(proxy_data)
    print("ProxifierConf.ppx配置文件创建完成....")

编译成功生成ProxyfierConf.ppx文件。双击导入proxyfier即可

这里proxyfier的版本不能太高,否则会报错,建议3.3.1

Proxypool代理池搭建_第3张图片

你可能感兴趣的:(技术分享总结,笔记,数据库,爬虫,docker)