Python Web Scraping in Practice | (19) Building a Cookies Pool

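In this post we build a cookies pool. The previous post built an IP proxy pool. Unlike a proxy pool, a cookies pool is site-specific: to scrape Weibo you need a pool of Weibo cookies, and to scrape Zhihu you need a pool of Zhihu cookies. An IP proxy pool, by contrast, is general-purpose and can be shared by different scraping tasks.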

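For example, to build a Weibo cookies pool we need a set of Weibo accounts. We use selenium to simulate the login and solve the CAPTCHA; after a successful login we take the cookies for that account and store them in a redis database. This maintains a Weibo cookies pool in which each account maps to one set of cookies. The same applies to Zhihu and other sites.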

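Once the pool is built, a crawler can authenticate by sending stored cookies directly, without repeating the simulated login (typing the username and password, solving the CAPTCHA, and so on).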
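
As a minimal sketch of what "logging in with cookies" means at the HTTP level: the stored name/value pairs are sent back in a `Cookie` request header. The helper name `build_request` and the cookie names below are hypothetical, and the request is only constructed, not sent.

```python
from urllib.request import Request


def build_request(url, cookies):
    """Attach a cookies dict (name -> value) to a request as a Cookie header."""
    cookie_header = '; '.join(
        '{}={}'.format(name, value) for name, value in cookies.items()
    )
    return Request(url, headers={'Cookie': cookie_header})


# Hypothetical cookie names and values, for illustration only
request = build_request('https://www.weibo.com', {'token_a': 'xxx', 'token_b': 'yyy'})
print(request.get_header('Cookie'))
```

With the requests library, passing the same dict as `cookies=` to `requests.get` has the same effect.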

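In this post we use redis and flask to build an extensible cookies pool. We build pools for Weibo and Zhihu; a pool for any other site can be added in a similar way, so the system can keep growing and maintain several pools at once.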

 

  • Simulating a Weibo login and obtaining cookies

First we use Chaojiying (超级鹰), a CAPTCHA-solving service, to recognize the image CAPTCHA shown at Weibo login:

chaojiying.py:

import requests
from hashlib import md5


class Chaojiying(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        self.password = md5(password.encode('utf-8')).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }
        

    def post_pic(self, im, codetype):
        """
        im: image bytes
        codetype: CAPTCHA type, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files, headers=self.headers)
        return r.json()

    def report_error(self, im_id):
        """
        im_id: image ID of the CAPTCHA being reported as wrongly solved
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()

Use selenium to simulate the Weibo login and, once logged in, obtain the cookies for the account:

loginweibo.py:

import requests
from requests import RequestException
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from CookiesPool.chaojiying import Chaojiying

# Chaojiying username, password, software ID and CAPTCHA type
CHAOJIYING_USERNAME = ''
CHAOJIYING_PASSWORD = ''
CHAOJIYING_SOFT_ID = ''  # placeholder: fill in your own software ID
CHAOJIYING_KIND = 1902

class LoginWeibo():
    def __init__(self,username,password,browser):
        self.url = 'https://www.weibo.com'
        self.browser = browser
        self.wait = WebDriverWait(self.browser,20)
        self.username = username
        self.password = password
        self.chaojiying = Chaojiying(CHAOJIYING_USERNAME, CHAOJIYING_PASSWORD, CHAOJIYING_SOFT_ID)

    # def __del__(self):
    #     self.browser.close()

    def open(self):
        """
        Open the page and fill in the username and password.
        :return: None
        """
        self.browser.get(self.url)
        # Locate the username and password input boxes
        username = self.wait.until(EC.presence_of_element_located((By.ID,'loginname')))
        password = self.wait.until(EC.presence_of_element_located((By.NAME,'password')))
        # Type the username and password
        username.send_keys(self.username)
        password.send_keys(self.password)

    def get_click_button(self):
        """
        Locate the login button.
        :return: the login button element
        """
        button = self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME,'W_btn_a')))
        return button

    def login_successfully(self):
        """
        Check whether the login succeeded.
        :return: True if the mail icon, which is only visible after login, appears within 5 seconds
        """
        try:
            return bool(
                WebDriverWait(self.browser,5).until(EC.presence_of_element_located((By.CSS_SELECTOR,'.ficon_mail')))
            )
        except TimeoutException:
            return False

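    def get_click_image(self,name='captcha.png'):
        """
        Download the CAPTCHA image.
        :param name: file name used to save a local copy
        :return: image bytes, or None if the image could not be fetched
        """
        try:
            element = self.wait.until(EC.presence_of_element_located((By.XPATH,'//img[@action-type="btn_change_verifycode"]')))
        except (NoSuchElementException, TimeoutException):
            # wait.until raises TimeoutException when the element never appears
            print('CAPTCHA image not found')
            return None
        image_url = element.get_attribute('src')
        response = get_html(image_url)
        if response is None:
            return None
        image = response.content
        with open(name, 'wb') as f:
            f.write(image)
        return image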

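    def password_error(self):
        """
        Check whether the page reports a wrong username or password.
        :return: True if the error message is shown, otherwise False
        """
        try:
            element = WebDriverWait(self.browser, 5).until(
                EC.presence_of_element_located((By.XPATH, '//span[@node-type="text"]')))
            print(element.text)
            # Compare against the message text shown on the Weibo page
            return element.text == '用户名或密码错误。查看帮助'
        except TimeoutException:
            return False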

    def get_cookies(self):
        """
        Get the cookies of the current browser session.
        :return: list of cookie dicts
        """
        cookies = self.browser.get_cookies()
        print(cookies)
        return cookies

    def login(self):
        # Open the page and type the username and password
        self.open()
        # Click the login button
        button = self.get_click_button()
        button.click()
        if self.password_error():
            print('Wrong username or password')
            return {
                'status': 2,
                'content': 'Wrong username or password'
            }
        if self.login_successfully():
            print('Login succeeded')
            # Get the cookies for this account
            cookies = self.get_cookies()
            return {
                'status': 1,
                'content': cookies
            }
        else:  # Sometimes a CAPTCHA is required
            # Download the CAPTCHA image
            image = self.get_click_image()
            # Solve the CAPTCHA
            result = self.chaojiying.post_pic(image, CHAOJIYING_KIND)
            print(result)
            # Type the CAPTCHA answer
            verifycode = self.wait.until(EC.presence_of_element_located((By.NAME, 'verifycode')))
            verifycode.send_keys(result['pic_str'])
            # Click the login button
            button = self.get_click_button()
            button.click()
            if self.login_successfully():
                print('Login succeeded')
                # Get the cookies for this account
                cookies = self.get_cookies()
                return {
                    'status': 1,
                    'content': cookies
                }
            else:
                # Report the wrong answer, then retry and return the retry's result
                self.chaojiying.report_error(result['pic_id'])
                return self.login()

def get_html(url):
    try:
        # Add a User-Agent header so the request looks like it comes from a browser
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'
        }
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            response.encoding = response.apparent_encoding
            return response
        return None
    except RequestException:
        return None

if __name__ == '__main__':
    result = LoginWeibo('','',webdriver.Chrome()).login()
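
`login()` retries by calling itself whenever the CAPTCHA answer is rejected, so a persistently failing account can recurse without limit. One possible alternative is a bounded retry loop. The sketch below is not part of the original project: `login_with_retries`, `attempt` and `max_attempts` are hypothetical names, and `attempt` stands in for a single login try that returns the same `{'status': ..., 'content': ...}` dict.

```python
def login_with_retries(attempt, max_attempts=3):
    """Call attempt() until it returns a final result or attempts run out.

    attempt is any zero-argument callable returning a dict with a 'status'
    key: 1 = success, 2 = wrong credentials, anything else = retryable.
    """
    for _ in range(max_attempts):
        result = attempt()
        if result.get('status') in (1, 2):
            # Success and wrong-password are final; retrying cannot help
            return result
    return {'status': 3, 'content': 'login failed after {} attempts'.format(max_attempts)}


# Stand-in for a login that fails twice on the CAPTCHA, then succeeds
outcomes = iter([{'status': 3}, {'status': 3}, {'status': 1, 'content': []}])
print(login_with_retries(lambda: next(outcomes)))
```

A caller such as the generator can then treat status 3 as "skip this account for now" instead of waiting on an open-ended retry.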


  • Simulating a Zhihu login and obtaining cookies

Use requests to simulate the Zhihu login and, once logged in, obtain the cookies for the account:

loginzhihu.py:

import requests
import re
import execjs
import time
import hmac
from hashlib import sha1


class Zhihu(object):

    def __init__(self, username, password):

        self.username = username
        self.password = password
        self.session = requests.session()
        self.headers = {
            'content-type': 'application/x-www-form-urlencoded',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
            'x-zse-83': '3_1.1'
        }

    def login(self):

        # Request login_url, udid_url and captcha_url to load the cookies needed for login
        login_url = 'https://www.zhihu.com/signup?next=/'
        resp = self.session.get(login_url, headers=self.headers)
        print("Requested {}, response status code: {}".format(login_url, resp.status_code))
        # print(self.session.cookies.get_dict())
        # self.save_file('login',resp.text)

        udid_url = 'https://www.zhihu.com/udid'
        resp = self.session.post(udid_url, headers=self.headers)
        print("Requested {}, response status code: {}".format(udid_url, resp.status_code))
        # print(self.session.cookies.get_dict())

        captcha_url = 'https://www.zhihu.com/api/v3/oauth/captcha?lang=en'
        resp = self.session.get(captcha_url, headers=self.headers)
        print("Requested {}, response status code: {}".format(captcha_url, resp.status_code))
        # print(self.session.cookies.get_dict())
        # print(resp.text)
        # self.save_file('captcha',resp.text)

        # Check whether a CAPTCHA is required; if so, give up on this account
        # (the author has not run into a CAPTCHA here so far)
        if re.search('true', resp.text):
            print('CAPTCHA required')
            # Return a result instead of exit() so the calling process keeps running
            return {
                'status': 3,
                'content': 'CAPTCHA required'
            }

        # Compute the signature parameter
        self.time_str = str(int(time.time() * 1000))
        signature = self.get_signature()
        # print(signature)

        # Build the form string that will be encrypted
        string = "client_id=c3cef7c66a1843f8b3a9e6a1e3160e20&grant_type=password&timestamp={}&source=com.zhihu.web&signature={}&username={}&password={}&captcha=&lang=en&ref_source=homepage&utm_source=".format(
            self.time_str, signature, self.username, self.password)
        # print(string)
        # Encrypt the string
        encrypt_string = self.encrypt(string)
        # print(encrypt_string)

        # POST to the sign-in endpoint
        post_url = "https://www.zhihu.com/api/v3/oauth/sign_in"
        resp = self.session.post(post_url, data=encrypt_string, headers=self.headers)
        print("Requested {}, response status code: {}".format(post_url, resp.status_code))
        print(self.session.cookies.get_dict())
        # print(resp.text)
        # self.save_file('post', resp.text)

        # Check whether the login succeeded
        if re.search('user_id', resp.text):
            print('Login succeeded')
            return {
                'status': 1,
                'content': self.session.cookies.get_dict()
            }
        else:
            print("Login failed")
            return {
                'status': 2,
                'content': "Login failed"
            }

    def test(self):

        # Request the profile endpoint to check that the session is logged in
        me_url = 'https://www.zhihu.com/api/v4/me'
        data = {
            'include': 'ad_type;available_message_types,default_notifications_count,follow_notifications_count,vote_thank_notifications_count,messages_count;draft_count;following_question_count;account_status,is_bind_phone,is_force_renamed,email,renamed_fullname;ad_type'
        }
        # The include list is a query-string parameter of a GET request, so pass it via params
        resp = self.session.get(me_url, params=data, headers=self.headers)
        print("Requested {}, response status code: {}".format(me_url, resp.status_code))
        print(resp.text)
        return resp.status_code

    def encrypt(self, string):
        with open('zhihu.js', 'r', encoding='utf-8') as f:
            js = f.read()
        result = execjs.compile(js).call('encrypt', string)
        return result

    def get_signature(self):

        h = hmac.new(key='d1b964811afb40118a12068ff74a12f4'.encode('utf-8'), digestmod=sha1)
        grant_type = 'password'
        client_id = 'c3cef7c66a1843f8b3a9e6a1e3160e20'
        source = 'com.zhihu.web'
        now = self.time_str
        h.update((grant_type + client_id + source + now).encode('utf-8'))
        return h.hexdigest()

    def save_file(self, name, html):

        with open('{}.html'.format(name), 'w', encoding='utf-8') as f:
            f.write(html)


if __name__ == "__main__":
    account = Zhihu('', '')  # fill in your own Zhihu username and password
    account.login()
    account.test()
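
The `signature` field is an HMAC-SHA1 digest over the concatenation of grant type, client ID, source and the millisecond timestamp, as in `get_signature` above. The standalone sketch below reproduces that computation with only the standard library so it can be checked in isolation; the function name `make_signature` and the sample key and values are placeholders, not Zhihu's.

```python
import hmac
import time
from hashlib import sha1


def make_signature(key, grant_type, client_id, source, timestamp_ms):
    """HMAC-SHA1 over grant_type + client_id + source + timestamp, hex-encoded."""
    message = grant_type + client_id + source + timestamp_ms
    return hmac.new(key.encode('utf-8'), message.encode('utf-8'), sha1).hexdigest()


# Placeholder key and IDs, for illustration only
timestamp_ms = str(int(time.time() * 1000))
signature = make_signature('example-key', 'password', 'example-client', 'com.example.web', timestamp_ms)
print(timestamp_ms, signature)
```

Because the timestamp is part of the signed message, the same timestamp string has to be sent in the `timestamp` form field.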

  • Building an extensible cookies pool

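The pool consists of four main modules: storage, generation, validation, and API.

There is also a configuration file, config.py, and a scheduler module, scheduler.py:

Configuration file:

config.py stores the variables and settings used by the other modules: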

# Redis host
REDIS_HOST = 'localhost'

# Redis port
REDIS_PORT = 6379

# Redis password; use None if there is none
REDIS_PASSWORD = None

# Browser used by the generators
BROWSER_TYPE = 'Chrome'

# Generator classes; register additional sites here
GENERATOR_MAP = {
    'weibo': 'WeiboCookiesGenerator',
    'zhihu': 'ZhihuCookiesGenerator'

    #'XXX':'XXXCookiesGenerator'

}

# Tester classes; register additional sites here
TESTER_MAP = {
    'weibo': 'WeiboValidTester',
    'zhihu': 'ZhihuValidTester'
    #'XXX':'XXXValidTester'
}

# URL requested by each tester to check whether cookies are still valid
TEST_URL_MAP = {
    'weibo': 'https://www.weibo.com',
    'zhihu': 'https://www.zhihu.com'
}

# Interval in seconds between generator and tester runs
CYCLE = 120

# API host and port
API_HOST = '127.0.0.1'
API_PORT = 5000

# Generator switch: simulate logins and add cookies
GENERATOR_PROCESS = True
# Tester switch: periodically check stored cookies and delete invalid ones
VALID_PROCESS = True
# API service switch
API_PROCESS = True

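Generation module:

Responsible for generating new cookies. This module takes each account's username and password from the storage module in turn, simulates a login to the target site and, if the login succeeds, hands the resulting cookies to the storage module. generator.py: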

import json
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
from CookiesPool.cookiespool.config import *
from CookiesPool.cookiespool.db import RedisClient
from CookiesPool.weibologin.loginweibo import LoginWeibo
from CookiesPool.zhihulogin.loginzhihu import Zhihu


class CookiesGenerator(object):
    def __init__(self, website='default'):
        """
        Base class; initializes the shared objects.
        :param website: site name
        """
        self.website = website
        self.cookies_db = RedisClient('cookies', self.website)  # username -> cookies
        self.accounts_db = RedisClient('accounts', self.website)  # username -> password
        self.init_browser()

    def __del__(self):
        self.close()

    def init_browser(self):
        """
        Create the browser used for simulated logins, according to BROWSER_TYPE.
        :return:
        """
        if BROWSER_TYPE == 'PhantomJS':
            caps = DesiredCapabilities.PHANTOMJS
            caps[
                "phantomjs.page.settings.userAgent"] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
            self.browser = webdriver.PhantomJS(desired_capabilities=caps)
            self.browser.set_window_size(1400, 500)
        elif BROWSER_TYPE == 'Chrome':
            self.browser = webdriver.Chrome()

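    def new_cookies(self, username, password):
        """
        Generate new cookies; subclasses must override this.
        :param username: username
        :param password: password
        :return:
        """
        raise NotImplementedError

    def process_cookies(self, cookies):
        """
        Convert the browser's list of cookie dicts into a name -> value mapping.
        :param cookies: list of cookie dicts as returned by the browser
        :return: dict mapping cookie names to values
        """
        # Avoid naming the variable "dict", which would shadow the builtin
        cookies_dict = {}
        for cookie in cookies:
            cookies_dict[cookie['name']] = cookie['value']
        return cookies_dict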

    def run(self):
        """
        Fetch all accounts and log in to each one that has no cookies yet.
        :return:
        """
        accounts_usernames = self.accounts_db.usernames()
        cookies_usernames = self.cookies_db.usernames()

        for username in accounts_usernames:
            if username not in cookies_usernames:
                password = self.accounts_db.get(username)
                print('Generating cookies', 'account', username, 'password', password)
                result = self.new_cookies(username, password)
                # Login succeeded
                if self.website == 'weibo':
                    if result.get('status') == 1:
                        cookies = self.process_cookies(result.get('content'))
                        print('Got cookies', cookies)
                        if self.cookies_db.set(username, json.dumps(cookies)):
                            print('Cookies saved')
                    # Wrong password: remove the account
                    elif result.get('status') == 2:
                        print(result.get('content'))
                        if self.accounts_db.delete(username):
                            print('Account deleted')
                    else:
                        print(result.get('content'))
                if self.website == 'zhihu':
                    if result.get('status') == 1:
                        cookies = result.get('content')
                        print('Got cookies', cookies)
                        if self.cookies_db.set(username, json.dumps(cookies)):
                            print('Cookies saved')
        else:
            # The loop has no break, so this runs after every pass over the accounts
            print('All accounts have been processed')

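    def close(self):
        """
        Close the browser.
        :return:
        """
        try:
            print('Closing Browser')
            self.browser.close()
            del self.browser
        except (TypeError, AttributeError):
            # AttributeError: the browser was never created or has already been closed
            print('Browser not opened')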


class WeiboCookiesGenerator(CookiesGenerator):
    def __init__(self, website='weibo'):
        """
        Initialize the Weibo generator.
        :param website: site name
        """
        CookiesGenerator.__init__(self, website)
        self.website = website

    def new_cookies(self, username, password):
        """
        Generate cookies by logging in through the browser.
        :param username: username
        :param password: password
        :return: result dict with 'status' and 'content'
        """
        return LoginWeibo(username, password, self.browser).login()


class ZhihuCookiesGenerator(CookiesGenerator):
    def __init__(self, website='zhihu'):
        """
        Initialize the Zhihu generator.
        :param website: site name
        """
        CookiesGenerator.__init__(self, website)
        self.website = website

    def new_cookies(self, username, password):
        """
        Generate cookies by logging in with requests.
        :param username: username
        :param password: password
        :return: result dict with 'status' and 'content'
        """
        return Zhihu(username, password).login()


if __name__ == '__main__':
    generator = WeiboCookiesGenerator()
    generator.run()
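
Selenium's `get_cookies()` returns a list of dicts, one per cookie, with keys such as `name`, `value` and `domain`, while the pool stores a flat name-to-value mapping serialized as JSON. The following sketch mirrors what `process_cookies` and `json.dumps` do together; `flatten_cookies` is a hypothetical name and the sample cookie entries are made up.

```python
import json


def flatten_cookies(browser_cookies):
    """Reduce Selenium-style cookie dicts to a {name: value} mapping."""
    return {cookie['name']: cookie['value'] for cookie in browser_cookies}


# Made-up entries in the shape returned by webdriver.get_cookies()
browser_cookies = [
    {'name': 'SID', 'value': 'abc', 'domain': '.example.com', 'path': '/'},
    {'name': 'lang', 'value': 'en', 'domain': '.example.com', 'path': '/'},
]
stored = json.dumps(flatten_cookies(browser_cookies))
print(stored)              # the string that would be written to redis
print(json.loads(stored))  # the dict a crawler gets back after parsing
```

Dropping `domain`, `path` and expiry keeps the stored value small; the crawler is then responsible for sending the cookies only to the site they came from.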

Storage module:

There are two stores: one holds each account's username and password, the other holds the username and its cookies. db.py:

import random
import redis
from CookiesPool.cookiespool.config import *


class RedisClient(object):
    def __init__(self, type, website, host=REDIS_HOST, port=REDIS_PORT, password=REDIS_PASSWORD):
        """
        Initialize the Redis connection.
        :param type: kind of data stored, 'accounts' or 'cookies'
        :param website: site name
        :param host: host
        :param port: port
        :param password: password
        """
        self.db = redis.StrictRedis(host=host, port=port, password=password, decode_responses=True)
        self.type = type
        self.website = website

    def name(self):
        """
        Get the name of the hash.
        :return: hash name, e.g. 'accounts:weibo'
        """
        return "{type}:{website}".format(type=self.type, website=self.website)

    def set(self, username, value):
        """
        Set a field in the hash.
        :param username: username
        :param value: password or cookies
        :return:
        """
        return self.db.hset(self.name(), username, value)

    def get(self, username):
        """
        Get the value stored for a username.
        :param username: username
        :return:
        """
        return self.db.hget(self.name(), username)

    def delete(self, username):
        """
        Delete the entry for a username.
        :param username: username
        :return: result of the deletion
        """
        return self.db.hdel(self.name(), username)

    def count(self):
        """
        Get the number of entries.
        :return: number of entries
        """
        return self.db.hlen(self.name())

    def random(self):
        """
        Get a random value; used to hand out random cookies.
        :return: random cookies
        """
        return random.choice(self.db.hvals(self.name()))

    def usernames(self):
        """
        Get all usernames.
        :return: all usernames
        """
        return self.db.hkeys(self.name())

    def all(self):
        """
        Get all entries.
        :return: mapping of usernames to passwords or cookies
        """
        return self.db.hgetall(self.name())


if __name__ == '__main__':
    conn = RedisClient('accounts', 'weibo')
    # Placeholder credentials; replace with a real account
    result = conn.set('your_username', 'your_password')
    print(result)
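
Each `RedisClient` instance works on one redis hash named `{type}:{website}`: for example, `accounts:weibo` maps usernames to passwords and `cookies:weibo` maps usernames to JSON-encoded cookies. To make the layout concrete without a running redis server, the sketch below uses a plain dict as a stand-in; `FakeHashStore` is a hypothetical class, not part of the project or of redis-py.

```python
import random


class FakeHashStore(object):
    """In-memory stand-in for the redis hashes used by the pool."""

    def __init__(self):
        self.hashes = {}  # hash name -> {field: value}

    def hset(self, name, field, value):
        self.hashes.setdefault(name, {})[field] = value

    def hkeys(self, name):
        return list(self.hashes.get(name, {}))

    def hvals(self, name):
        return list(self.hashes.get(name, {}).values())


store = FakeHashStore()
store.hset('accounts:weibo', 'alice', 'secret')          # username -> password
store.hset('cookies:weibo', 'alice', '{"SID": "abc"}')   # username -> cookies JSON

# Accounts that still need cookies: in accounts:<site> but not in cookies:<site>
store.hset('accounts:weibo', 'bob', 'hunter2')
pending = [u for u in store.hkeys('accounts:weibo') if u not in store.hkeys('cookies:weibo')]
print(pending)

# random() in db.py picks one stored value; guard against an empty pool
values = store.hvals('cookies:weibo')
print(random.choice(values) if values else None)
```

Note that `random.choice` raises `IndexError` on an empty list, so `RedisClient.random()` fails while a site's pool has no cookies yet; the guard above is one way to handle that case.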

Validation module:

Periodically checks whether the cookies in the database can still be used to log in. tester.py:

import json
import re
import requests
from requests.exceptions import ConnectionError
from CookiesPool.cookiespool.db import *
from CookiesPool.zhihulogin.loginzhihu import Zhihu


class ValidTester(object):
    def __init__(self, website='default'):
        self.website = website
        self.cookies_db = RedisClient('cookies', self.website)
        self.accounts_db = RedisClient('accounts', self.website)

    def test(self, username, cookies):
        raise NotImplementedError

    def run(self):
        cookies_groups = self.cookies_db.all()
        for username, cookies in cookies_groups.items():
            self.test(username, cookies)


class WeiboValidTester(ValidTester):
    def __init__(self, website='weibo'):
        ValidTester.__init__(self, website)

    def test(self, username, cookies):
        print('Testing cookies', 'username', username)
        try:
            cookies = json.loads(cookies)
        except (TypeError, ValueError):
            # json.loads raises TypeError for None and ValueError for malformed JSON
            print('Invalid cookies', username)
            self.cookies_db.delete(username)
            print('Deleted cookies', username)
            return
        try:
            test_url = TEST_URL_MAP[self.website]
            response = requests.get(test_url, cookies=cookies, timeout=5)
            if response.status_code == 200:
                html = response.text
                pattern = re.compile("'islogin']='1'")
                islogin = pattern.search(html)
                if islogin:
                    print('Cookies valid', username)
                else:
                    print(response.status_code, response.headers)
                    print('Cookies expired', username)
                    self.cookies_db.delete(username)
                    print('Deleted cookies', username)
            else:
                print(response.status_code, response.headers)
                print('Cookies expired', username)
                self.cookies_db.delete(username)
                print('Deleted cookies', username)
        except ConnectionError as e:
            print('Exception occurred', e.args)


class ZhihuValidTester(ValidTester):
    def __init__(self, website='zhihu'):
        ValidTester.__init__(self, website)

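    def test(self, username, cookies):
        print('Testing cookies', 'username', username)
        try:
            cookies = json.loads(cookies)
        except (TypeError, ValueError):
            # json.loads raises TypeError for None and ValueError for malformed JSON
            print('Invalid cookies', username)
            self.cookies_db.delete(username)
            print('Deleted cookies', username)
            return
        try:
            # The password is not needed here; only the stored cookies are checked
            zhihu = Zhihu(username, '')
            zhihu.session.cookies.update(cookies)
            # Zhihu.test() returns the HTTP status code of the profile request
            status_code = zhihu.test()
            if status_code == 200:
                print('Cookies valid', username)
            else:
                print(status_code)
                print('Cookies expired', username)
                self.cookies_db.delete(username)
                print('Deleted cookies', username)
        except ConnectionError as e:
            print('Exception occurred', e.args)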


if __name__ == '__main__':
    WeiboValidTester().run()
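
The Weibo tester decides validity from two signals: the HTTP status code and whether the returned page contains the marker `'islogin']='1'`. As a sketch, that decision can be isolated into a pure function that is easy to test without network access; `looks_logged_in` is a hypothetical helper and the HTML snippets are made up.

```python
import re

# Same marker as in WeiboValidTester; the bracket is escaped so it matches literally
LOGIN_MARKER = re.compile(r"'islogin'\]='1'")


def looks_logged_in(status_code, html):
    """Return True only for a 200 response whose body carries the login marker."""
    return status_code == 200 and LOGIN_MARKER.search(html) is not None


print(looks_logged_in(200, "$CONFIG['islogin']='1';"))
print(looks_logged_in(200, "$CONFIG['islogin']='0';"))
print(looks_logged_in(302, ""))
```

Keeping the check separate from the redis and requests calls also makes it easier to adjust when the site changes its page markup.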

API module:

Exposes the pool to other programs over HTTP. api.py:

import json
from flask import Flask, g
from CookiesPool.cookiespool.config import *
from CookiesPool.cookiespool.db import *

__all__ = ['app']

app = Flask(__name__)

@app.route('/')
def index():
    return '<h2>Welcome to Cookie Pool System</h2>'


def get_conn():
    """
    Create Redis clients for every configured site and attach them to flask.g.
    :return: flask.g
    """
    for website in GENERATOR_MAP:
        print(website)
        if not hasattr(g, website):
            setattr(g, website + '_cookies', eval('RedisClient' + '("cookies", "' + website + '")'))
            setattr(g, website + '_accounts', eval('RedisClient' + '("accounts", "' + website + '")'))
    return g


@app.route('/<website>/random')
def random(website):
    """
    Get a random set of cookies, e.g. /weibo/random
    :return: random cookies
    """
    g = get_conn()
    cookies = getattr(g, website + '_cookies').random()
    return cookies


@app.route('/<website>/add/<username>/<password>')
def add(website, username, password):
    """
    Add an account, e.g. /weibo/add/user/password
    :param website: site name
    :param username: username
    :param password: password
    :return:
    """
    g = get_conn()
    print(username, password)
    getattr(g, website + '_accounts').set(username, password)
    return json.dumps({'status': '1'})


@app.route('/<website>/count')
def count(website):
    """
    Get the total number of cookies for a site.
    """
    g = get_conn()
    count = getattr(g, website + '_cookies').count()
    return json.dumps({'status': '1', 'count': count})


if __name__ == '__main__':
    app.run(host=API_HOST, port=API_PORT)

Scheduler module:

Coordinates the running of the other modules. scheduler.py:

import time
from multiprocessing import Process
from CookiesPool.cookiespool.api import app
from CookiesPool.cookiespool.config import *
from CookiesPool.cookiespool.generator import *
from CookiesPool.cookiespool.tester import *


class Scheduler(object):
    @staticmethod
    def valid_cookie(cycle=CYCLE):
        while True:
            print('Cookies validation process started')
            try:
                for website, cls in TESTER_MAP.items():
                    tester = eval(cls + '(website="' + website + '")')
                    tester.run()
                    print('Cookies validation finished')
                    del tester
                    time.sleep(cycle)
            except Exception as e:
                print(e.args)

    @staticmethod
    def generate_cookie(cycle=CYCLE):
        while True:
            print('Cookies generation process started')
            try:
                for website, cls in GENERATOR_MAP.items():
                    generator = eval(cls + '(website="' + website + '")')
                    generator.run()
                    print('Cookies generation finished')
                    generator.close()
                    time.sleep(cycle)
            except Exception as e:
                print(e.args)

    @staticmethod
    def api():
        print('API service started')
        app.run(host=API_HOST, port=API_PORT)

    def run(self):
        if API_PROCESS:
            api_process = Process(target=Scheduler.api)
            api_process.start()

        if GENERATOR_PROCESS:
            generate_process = Process(target=Scheduler.generate_cookie)
            generate_process.start()

        if VALID_PROCESS:
            valid_process = Process(target=Scheduler.valid_cookie)
            valid_process.start()


if __name__ == '__main__':
    test = Scheduler()
    test.run()
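
The scheduler builds each tester and generator with `eval` on a class name taken from config.py. A possible alternative, sketched below under the assumption that all classes can be collected in one mapping, is an explicit dict lookup, which fails with a clear error on an unknown name and never executes arbitrary strings. `DemoGenerator`, `REGISTRY`, `DEMO_MAP` and `make_instance` are hypothetical names for illustration.

```python
class DemoGenerator(object):
    """Stand-in for a class such as WeiboCookiesGenerator."""

    def __init__(self, website='default'):
        self.website = website


# Name -> class registry; in the project this could be built from the
# generator or tester module instead of being written by hand
REGISTRY = {'DemoGenerator': DemoGenerator}

# Same shape as GENERATOR_MAP in config.py: site name -> class name
DEMO_MAP = {'weibo': 'DemoGenerator'}


def make_instance(website, class_name, registry=REGISTRY):
    """Instantiate the class registered under class_name for the given site."""
    try:
        cls = registry[class_name]
    except KeyError:
        raise ValueError('unknown class name: {}'.format(class_name))
    return cls(website=website)


for website, class_name in DEMO_MAP.items():
    instance = make_instance(website, class_name)
    print(type(instance).__name__, instance.website)
```

The config file keeps its current shape with this approach; only the way a name is turned into a class changes.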

Final result:

Usage:

First run scheduler.py. The console output shows the API address, e.g. http://127.0.0.1:5000. Copy it into a browser and you will see the welcome page.


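Next, add a username and password. For Weibo, visit http://127.0.0.1:5000/weibo/add/username/password (replacing username and password with the real values) and press Enter; the account is then stored in the redis hash accounts:weibo. Accounts for Zhihu or other sites are added the same way, just change the site name in the URL.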
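
Because the account is passed in the URL path, characters such as `@` or spaces in a username or password should be percent-encoded. A small sketch using the standard library follows; `build_add_url` is a hypothetical helper, and the host and port are the defaults from config.py.

```python
from urllib.parse import quote


def build_add_url(website, username, password, host='127.0.0.1', port=5000):
    """Build the /<website>/add/<username>/<password> URL with encoded parts."""
    return 'http://{}:{}/{}/add/{}/{}'.format(
        host, port, quote(website, safe=''), quote(username, safe=''), quote(password, safe='')
    )


print(build_add_url('weibo', 'user@example.com', 'pass word'))
```

A password containing `/` may still fail to match the route even when encoded, since the path is typically decoded before routing; such accounts can be stored directly through `RedisClient.set` instead.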

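The program then automatically generates cookies for the accounts that were added, stores them in the redis hash cookies:weibo, and periodically checks that they are still valid.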

Repeat this process: keep adding usernames and passwords by hand, and the cookies are generated automatically, so both hashes gradually fill up.

Finally, when a crawler needs cookies from the pool, it can request http://127.0.0.1:5000/weibo/random to get a random set of Weibo cookies. Other sites work the same way, just change the site name.
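
In a crawler, fetching cookies from the pool could look like the sketch below. It assumes the pool API is running at the default address and that `/weibo/random` returns the JSON string stored by the generator; `parse_cookies` and `fetch_random_cookies` are hypothetical helper names. Only the parsing step is exercised here, so the example runs without the service.

```python
import json
from urllib.request import urlopen


def parse_cookies(body):
    """Decode the JSON body returned by /<website>/random into a dict."""
    cookies = json.loads(body)
    if not isinstance(cookies, dict):
        raise ValueError('expected a JSON object of cookie names and values')
    return cookies


def fetch_random_cookies(website, base_url='http://127.0.0.1:5000'):
    """Ask the pool for one random set of cookies (requires the API to be running)."""
    with urlopen('{}/{}/random'.format(base_url, website), timeout=5) as response:
        return parse_cookies(response.read().decode('utf-8'))


# Parsing a made-up response body; call fetch_random_cookies('weibo') against a live pool
print(parse_cookies('{"SID": "abc", "lang": "en"}'))
```

The resulting dict can be passed as `cookies=` to `requests.get`, in the same way tester.py does when it checks validity.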

