Across the scenarios below, we validate each Codis cluster HA scheme mainly through Redis get and set operations.
When the cluster contains only Group1, every key the test program touches is guaranteed to live in that group's Redis instances (most programs behind codis-proxy do not use hash tags to pin slots, so this scenario covers the bulk of everyday operations and checks basic functionality).
Take Group1.Master offline. > 1. get/set keep working, with roughly 5 seconds of unavailability.
Take Group1.Slave offline. > 1. get/set work normally.
Take Group1.Master and Group1.Slave offline together, then bring them back one by one. > 1. get/set are unavailable while both are down; service recovers once they come back online.
Add Group2 and Group3 to the cluster, using hash tags to make sure the Redis operations land on specific slots.
While Group2 and Group3 are being added (does the automatic slot rebalancing disturb traffic?). > 1. Verify whether get/set misbehave.
Take Group1.Master and Group2.Master offline together (when masters in several groups fail at once, can codis-ha promote the slaves cleanly?). > 1. Verify that the data in these two groups is still intact.
When ZooKeeper is unavailable.
Check whether get/set still work.
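The get/set checks above can be driven by a small probe that times each window of unavailability. Below is a minimal sketch; the `UnavailabilityProbe` name is ours, and `run_against_proxy` assumes the third-party redis-py client pointed at this test bed's proxy address:

```python
import time

class UnavailabilityProbe:
    """Repeated SET/GET of one key, timing outage windows.

    `client` is anything with get/set methods (e.g. a redis-py client
    pointed at a codis-proxy); any exception counts as downtime.
    """

    def __init__(self, client):
        self.client = client
        self.outages = []        # completed outage durations, in seconds
        self._down_since = None  # start of the current outage, if any
        self._counter = 0

    def step(self, now=None):
        """One probe round; returns True if the service answered correctly."""
        now = time.time() if now is None else now
        try:
            self._counter += 1
            self.client.set('ha:probe', str(self._counter))
            if self.client.get('ha:probe') != str(self._counter):
                raise ValueError('stale read')
            if self._down_since is not None:
                # service just recovered: record how long it was down
                self.outages.append(now - self._down_since)
                self._down_since = None
            return True
        except Exception:
            if self._down_since is None:
                self._down_since = now
            return False

def run_against_proxy(host='192.168.200.120', port=19000):
    """Drive the probe against a codis-proxy (requires redis-py installed)."""
    import redis
    probe = UnavailabilityProbe(
        redis.StrictRedis(host=host, port=port, decode_responses=True))
    while True:
        probe.step()
        time.sleep(0.5)
```

During the Group1.Master failover scenario, `probe.outages` should show a single window of about 5 seconds.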
Initial state
group1: 192.168.200.120:8001 (master), 192.168.200.120:8002 (slave)
group2: 192.168.200.120:8003 (master), 192.168.200.120:8004 (slave)
2 codis-proxy instances: 192.168.200.120:19000, 192.168.200.120:19001
2 groups
1 codis-server: 192.168.200.120:8005
1 HAProxy:
Performance test of a single stock Redis (version 2.8.21)
Goal: a baseline for comparing codis-server against stock Redis, ruling out differences caused by the machine itself.
Benchmark target:
200 concurrent clients,
Performance test of a single codis-server
redisMAMA monitoring screenshot of the codis-server
Benchmark client screenshot 1 for the codis-server run
Benchmark client screenshot 2 for the codis-server run
Goal: measure raw codis-server performance without proxy forwarding.
Benchmark target: 192.168.200.120:8001
200 concurrent clients, 15000 qps
Benchmark results
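A run like the one above can be reproduced with the stock `redis-benchmark` tool. This is a sketch: the request count (`-n`) and payload size (`-d`) are placeholder assumptions, and the addresses are this test bed's codis-server and proxy:

```shell
# Direct against the codis-server, 200 concurrent clients, GET/SET only
redis-benchmark -h 192.168.200.120 -p 8001 -c 200 -n 1000000 -t get,set -d 128

# Same load replayed through codis-proxy, to measure the forwarding overhead
redis-benchmark -h 192.168.200.120 -p 19000 -c 200 -n 1000000 -t get,set -d 128
```

Comparing the two result sets isolates the proxy's cost, matching the comparison goal stated in the next section.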
Add codis-proxy in front of a single group1 (1 master, 1 slave)
Read test
Write test
Goal: compare against the standalone codis-server results to quantify the overhead added by codis-proxy.
Benchmark target: 192.168.200.120:19000
200 concurrent clients, 15000 qps
Benchmark results
Dynamically add a new group2 (1 master, 1 slave), growing to two groups
Read test
Write test
Goal: check whether codis-proxy throughput improves after the extra group is added.
Benchmark target: 192.168.200.120:19000
200 concurrent clients, xxx qps
Benchmark results
Dynamically remove group2, shrinking back to one group
Goal: measure the performance hit of the data migration during dynamic shrinking.
keepAlive stress test
Benchmark target
Start the Codis web dashboard, which registers the Codis group nodes in ZooKeeper
nohup ./bin/codis-config -c config.ini -L ./log/dashboard.log dashboard --addr=:18087 --http-log=./log/requests.log &>/dev/null
Initialize the slots
$CODIS_HOME/bin/codis-config -c $CODIS_HOME/conf/config.ini slot init -f
* Show the info of slot 1: ```./codis-config -c /usr/local/codis/conf/config.ini slot info 1```
Add a Codis group
codis-config -c /usr/local/codis/conf/config.ini server add-group 1
* Remove a Codis group: ```codis-config -c /usr/local/codis/conf/config.ini server remove-group 1```
Add a server to a Codis group
codis-config -c /usr/local/codis/conf/config.ini server add 2 192.168.10.170:6380 master
* Remove a Redis instance from a Codis group: ```codis-config -c /usr/local/codis/conf/config.ini server remove 1 192.168.10.169:6379```
Assign slots to groups
./bin/codis-config -c /etc/codis/config_10.ini slot range-set 0 300 1 online
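For example, to spread all 1024 slots evenly across the two groups of the initial state, following the same `slot range-set <from> <to> <group_id> <status>` form as above (a sketch; config paths as in the surrounding commands):

```shell
# Group 1 serves slots 0-511, group 2 serves slots 512-1023
./bin/codis-config -c /etc/codis/config_10.ini slot range-set 0   511  1 online
./bin/codis-config -c /etc/codis/config_10.ini slot range-set 512 1023 2 online
```

Every slot must be assigned and online before the proxies will serve traffic.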
Register a proxy: this records the proxy name codis_proxy_1 in ZooKeeper and sets it to the offline state
/usr/local/codis/bin/codis-config -c /usr/local/codis/conf/config.ini proxy offline codis_proxy_1
Start the proxy: client port 19000, HTTP port 11000
nohup /usr/local/codis/bin/codis-proxy --log-level info -c /usr/local/codis/conf/config.ini -L /usr/local/codis/logs/proxy.log --cpu=8 --addr=0.0.0.0:19000 --http-addr=0.0.0.0:11000 &
Set the proxy with id codis_proxy_1 to the online state, so clients can reach it
/usr/local/codis/bin/codis-config -c /usr/local/codis/conf/config.ini proxy online codis_proxy_1
Start codis-ha, which auto-promotes a slave to master within a group; point it at the product name to watch
nohup ./codis-ha --codis-config=127.0.0.1:18087 --productName=testgroup1 &
When the master in group_1 dies, first remove the dead master node from group_1 (to be automated by a script).
codis-config -c ../conf/config.ini server remove 1 192.168.10.170:6379
Re-add the former master as a slave node (to be automated by a script)
codis-config -c ../conf/config.ini server add 1 192.168.10.170:6379 slave
If group_1 has >= 2 slaves when the master dies, the slaves that were not promoted keep replicating from the dead master; remove those slaves from group_1 one by one, then re-add them one by one.
When a slave in group_1 dies and later recovers, it has to be re-added to group_1 by hand (to be automated by a script).
codis-config -c ../conf/config.ini server add 1 192.168.10.170:6379 slave
Automatic failover script
```python
#!/usr/bin/python
# -*- coding:utf-8 -*-
import os
import socket
import logging
import time
from logging import handlers

smslog = '/var/log/codisswitch.log'

def log(msg, level='info', logfile=smslog):
    if not os.path.exists(logfile):
        os.mknod(logfile)
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    logger = logging.getLogger('codis-sms-sender')
    logger.setLevel(logging.DEBUG)
    file_handler = logging.handlers.TimedRotatingFileHandler(logfile, 'D')
    file_handler.setFormatter(formatter)
    levels = {'debug': logger.debug,
              'info': logger.info,
              'warning': logger.warning,
              'error': logger.error,
              'critical': logger.critical}
    logger.addHandler(file_handler)
    levels[level](msg)
    logger.removeHandler(file_handler)
    file_handler.close()

def IsOpen(ip, port):
    # TCP probe: 'true' if the port accepts a connection, else 'false'
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect((ip, int(port)))
        s.shutdown(2)
        print '%s is open' % port
        return 'true'
    except:
        print '%s is down' % port
        return 'false'

def codisSwitch():
    codisInfo = eval(os.popen('/usr/local/codis/bin/codis-config -c /usr/local/codis/conf/config.ini server list').read())
    for i in codisInfo:
        for ii in i['servers']:
            print ii
            if ii['type'] == 'offline':
                ip = ii['addr'].split(':')[0]
                port = ii['addr'].split(':')[1]
                codisgroup = ii['group_id']
                msg = ip, str(port) + ' as the master of group ' + str(codisgroup) + ' is down'
                log(msg, 'critical')
                portStatus = IsOpen(ip, port)
                print 'group:' + str(codisgroup), 'master:' + ii['addr'], 'is down'
                if portStatus == 'true':
                    # the node is reachable again: drop it from the group,
                    # then re-add it as a slave of the promoted master
                    codisInfo = eval(os.popen('/usr/local/codis/bin/codis-config -c /usr/local/codis/conf/config.ini server remove %s %s' % (str(codisgroup), ii['addr'])).read())
                    print codisInfo
                    if codisInfo['msg'] == 'OK':
                        print '------'
                        msg = ip, port, 'removed from the codis cluster'
                        log(msg, 'critical')
                        codisaddInfo = eval(os.popen('/usr/local/codis/bin/codis-config -c /usr/local/codis/conf/config.ini server add %s %s slave' % (str(codisgroup), ii['addr'])).read())
                        if codisaddInfo['msg'] == 'OK':
                            print '++++++'
                            msg = ip, port, 're-added to the codis cluster as a slave'
                            log(msg, 'critical')
            elif ii['type'] == 'slave':
                print '-----------------'
                ip = ii['addr'].split(':')[0]
                port = ii['addr'].split(':')[1]
                codisgroup = ii['group_id']
                portStatus = IsOpen(ip, port)
                if portStatus == 'false':
                    print '======'
                    msg = 'group:' + str(codisgroup), 'slave:' + ii['addr'], 'is down'
                    log(msg, 'critical')
                else:
                    os.popen('/usr/local/codis/bin/codis-config -c /usr/local/codis/conf/config.ini server add %s %s slave' % (str(codisgroup), ii['addr'])).read()

if __name__ == '__main__':
    while True:
        codisSwitch()
        time.sleep(5)  # avoid hammering codis-config in a busy loop
```
Business isolation through namespaces
```
codis
└── db_{product name}  (the product name provides multi-tenant isolation)
    ├── fence  (proxy addresses)
    │   └── 10-219:19000  (proxy addr)
    ├── servers
    │   └── group1
    │       └── ip:port = {"type":"slave","group_id":1,"addr":"192.168.10.168:6379"}
    ├── slots
    │   └── slot_345 = {"product_name":"testgroup1","id":1,"group_id":1,"state":{"status":"online","migrate_status":{"from":-1,"to":-1},"last_op_ts":"0"}}
    ├── proxy  (proxy list)
    │   └── codis_proxy_1  (proxy name, ephemeral node) = {"id":"codis_proxy_1","addr":"10-219:19000","last_event":"","last_event_ts":0,"state":"online","description":"","debug_var_addr":"10-219:11000","pid":20659,"start_at":"2015-10-19 11:58:21.473980358 +0800 CST"}
    ├── migrate_tasks
    ├── dashboard
    ├── LOCK
    ├── actions  (1033 child nodes)
    │   └── 0000001602 = {"type":"slot_changed","desc":"","target":{"product_name":"testgroup1","id":801,"group_id":-1,"state":{"status":"offline","migrate_status":{"from":-1,"to":-1},"last_op_ts":"0"}},"ts":"1445224886","receivers":null}
    └── ActionResponse  (1033 child nodes; should correspond 1:1 with actions)
        └── 0000001602 = {"type":"slot_changed","desc":"","target":{"product_name":"testgroup1","id":462,"group_id":-1,"state":{"status":"offline","migrate_status":{"from":-1,"to":-1},"last_op_ts":"0"}},"ts":"1445224755","receivers":null}
```
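The namespace can be browsed with the stock ZooKeeper CLI. The commands below are a sketch: they assume the default `/zk/codis/db_{product name}` layout with product name `testgroup1`, and a local ZooKeeper on port 2181 (adjust to the `zk` address in config.ini):

```shell
# List the top-level nodes of this product's namespace
zkCli.sh -server 127.0.0.1:2181 ls /zk/codis/db_testgroup1

# Dump one slot's JSON payload (e.g. the slot_345 node shown above)
zkCli.sh -server 127.0.0.1:2181 get /zk/codis/db_testgroup1/slots/slot_345
```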
Dynamic scale-in and scale-out, fully transparent to applications: instances can be added for a big promotion such as Singles' Day and removed once the peak passes.
Multi-tenant isolation via product name.
A mature management UI.
Dynamic data migration coordinated through ZooKeeper.
Supports MSET, MGET.
Supports LPUSH, LPOP.
Supports EVAL via hash tags.
A complete data migration story.
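The hash-tag support listed above works because Codis, like Redis Cluster, hashes only the substring between the first `{` and the following `}` when computing a key's slot (Codis uses CRC32 modulo 1024). A minimal sketch of that mapping; the `codis_slot` helper and the key names are ours, for illustration:

```python
import binascii

def codis_slot(key, num_slots=1024):
    """Map a key to a Codis slot: CRC32 of the key (or of its
    {hash tag}, when one is present) modulo 1024."""
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end > start + 1:  # a non-empty tag replaces the whole key
            key = key[start + 1:end]
    return binascii.crc32(key.encode()) % num_slots

# All three keys hash on the tag "order:42", so they share one slot and
# can safely be touched together by MGET/MSET or an EVAL script:
keys = ['{order:42}:items', '{order:42}:total', '{order:42}:status']
assert len({codis_slot(k) for k in keys}) == 1
```

This is also what the HA test plan relies on when it uses hash tags to force operations onto the slots owned by a particular group.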