1, Installation environment
+ Servers
- ccsp-1-1(192.168.16.12, master node by default)
- ccsp-2-1(192.168.16.14)
- ccsp-1-3(192.168.16.16)
- ccsp-2-3(192.168.16.17)
+ Redis version installed: redis-2.8.12
- http://download.redis.io/releases/redis-2.8.12.tar.gz
+ Read/write attributes: the master node is readable and writable; slave nodes are read-only
+ Ports and paths used on every Redis node:
Port : 6379
Config file : /etc/redis/6379.conf
Log file : /var/log/redis_6379.log
Data dir : /var/lib/redis/6379
Executable : /usr/local/bin/redis-server
Cli Executable : /usr/local/bin/redis-cli
Sentinel Config: /etc/redis/sentinel.conf
Cluster Name : ccsp
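These notes do not reproduce the contents of /etc/redis/sentinel.conf. As a rough sketch only, the Python helper below renders the handful of standard Sentinel directives such a file would typically contain. The master name (ccsp), master address and quorum of 2 come from these notes (the quorum from the "#quorum 3/2" log line in section 4); the down-after-milliseconds, failover-timeout and parallel-syncs values are placeholder assumptions, not the values used on these servers, and render_sentinel_conf is a hypothetical helper name.

```python
def render_sentinel_conf(master_name, master_ip, master_port, quorum,
                         down_after_ms, failover_timeout_ms,
                         parallel_syncs=1, sentinel_port=26379):
    """Render a minimal sentinel.conf as text.

    Only standard Sentinel directives are emitted; all values are supplied
    by the caller, so nothing here reflects the real file on the servers.
    """
    lines = [
        f"port {sentinel_port}",
        # Monitor the master; `quorum` Sentinels must agree that it is down.
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        # How long the master must be unreachable before it is flagged as down.
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        # How many slaves are reconfigured at the same time after a failover.
        f"sentinel parallel-syncs {master_name} {parallel_syncs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Timing values below are placeholders (assumptions), not measured settings.
    print(render_sentinel_conf("ccsp", "192.168.16.12", 6379, quorum=2,
                               down_after_ms=30000,
                               failover_timeout_ms=180000), end="")
```

The same file would be deployed on every node, since each node runs its own Sentinel against the same initial master.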
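2, Installation overview
The Redis cluster is built mainly on Sentinel, which ships with the official Redis release. Sentinel performs four main tasks:
+ Monitoring: monitor the health of the master and slave nodes;
+ Notification: when a Redis node failure is detected, Sentinel notifies the system administrator or other programs through an API;
+ Automatic failover: if the master node fails, Sentinel starts the failover process:
- select one of the remaining slave nodes and promote it to master;
- reconfigure the other slave nodes to point to the new master;
- notify applications connected to this Redis cluster of the new address to connect to;
+ Configuration provider: Sentinel also acts as the source of authority for client service discovery. A client first connects to Sentinel to obtain the address of the current master; when a failover occurs, Sentinel reports the address of the new master.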
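The configuration-provider role can be illustrated with a small client. The Python sketch below uses only the standard library: it encodes the SENTINEL get-master-addr-by-name command in the Redis wire protocol and decodes the reply. The function names are hypothetical, and the Sentinel port 26379 is taken from the Sentinel addresses that appear in the logs in section 4; real applications would normally rely on a Sentinel-aware client library instead.

```python
import socket


def encode_command(*args):
    """Encode a command as a Redis protocol array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def read_reply(stream):
    """Read one reply from a binary file-like object."""
    line = stream.readline()
    if not line.endswith(b"\r\n"):
        raise ConnectionError("incomplete reply from server")
    kind, payload = line[:1], line[1:-2]
    if kind == b"+":                      # simple string
        return payload.decode("utf-8")
    if kind == b"-":                      # error reply
        raise RuntimeError(payload.decode("utf-8"))
    if kind == b":":                      # integer
        return int(payload)
    if kind == b"$":                      # bulk string, -1 means null
        length = int(payload)
        if length == -1:
            return None
        return stream.read(length + 2)[:-2].decode("utf-8")
    if kind == b"*":                      # array, -1 means null
        count = int(payload)
        if count == -1:
            return None
        return [read_reply(stream) for _ in range(count)]
    raise ValueError("unexpected reply type: %r" % kind)


def get_master_addr(sentinel_host, master_name, sentinel_port=26379, timeout=2.0):
    """Ask one Sentinel for the current master; returns (host, port) or None."""
    with socket.create_connection((sentinel_host, sentinel_port), timeout=timeout) as sock:
        sock.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        with sock.makefile("rb") as stream:
            reply = read_reply(stream)
    if reply is None:
        return None
    host, port = reply
    return host, int(port)
```

A client would call something like get_master_addr("192.168.16.14", "ccsp") before opening its data connection, and call it again whenever the connection to the master is lost, trying another Sentinel if the first one is unreachable.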
3, Configuration
+ Run the redis server and the sentinel service on every node
- redis-server /etc/redis/6379.conf
- redis-sentinel /etc/redis/sentinel.conf
+ On any node, connect with redis-cli and run the INFO command to check the cluster status (look for the Replication section):
* Status as seen on the master node:
127.0.0.1:6379> INFO
# Server
redis_version:2.8.12
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:a0ced9d6bbfe5950
redis_mode:standalone
os:Linux 2.6.32-358.el6.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.7
process_id:8942
run_id:e8b0962919eefc55f771e502f6aeb3dfcc32aa88
tcp_port:6379
uptime_in_seconds:3132
uptime_in_days:0
hz:10
lru_clock:13877460
config_file:/etc/redis/6379.conf
# Clients
connected_clients:9
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
... ...
# Replication
role:master
connected_slaves:3
slave0:ip=192.168.16.14,port=6379,state=online,offset=439275,lag=0
slave1:ip=192.168.16.16,port=6379,state=online,offset=439412,lag=0
slave2:ip=192.168.16.17,port=6379,state=online,offset=439275,lag=1
master_repl_offset:439549
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:439548
* Status as seen on one of the slave nodes:
# Replication
role:slave
master_host:192.168.16.12
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:25834
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
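Reading the Replication section by eye does not scale to four nodes, so the check can be scripted. The following Python sketch parses INFO text of the form shown above and extracts the role, the master host (on a slave) and the slaveN entries (on a master). The function names are hypothetical, and the text is assumed to have been captured beforehand, for example from redis-cli output.

```python
def parse_info(text):
    """Parse INFO output into {section_name: {key: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # "# Replication" starts a new section.
            current = sections.setdefault(line[1:].strip(), {})
            continue
        if current is None or ":" not in line:
            continue  # skip prompts, "... ..." elisions and other noise
        key, value = line.split(":", 1)
        current[key] = value
    return sections


def parse_slave_entry(value):
    """Turn 'ip=...,port=...,state=...,offset=...,lag=...' into a dict."""
    return dict(item.split("=", 1) for item in value.split(","))


def replication_summary(text):
    """Summarise the Replication section of one node's INFO output."""
    repl = parse_info(text).get("Replication", {})
    # Keep only slave0, slave1, ... (not slave_priority and similar keys).
    numbered = [(int(k[5:]), v) for k, v in repl.items()
                if k.startswith("slave") and k[5:].isdigit()]
    return {
        "role": repl.get("role"),
        "master_host": repl.get("master_host"),
        "slaves": [parse_slave_entry(v) for _, v in sorted(numbered)],
    }


if __name__ == "__main__":
    sample = (
        "# Replication\n"
        "role:master\n"
        "connected_slaves:3\n"
        "slave0:ip=192.168.16.14,port=6379,state=online,offset=439275,lag=0\n"
        "slave1:ip=192.168.16.16,port=6379,state=online,offset=439412,lag=0\n"
        "slave2:ip=192.168.16.17,port=6379,state=online,offset=439275,lag=1\n"
    )
    print(replication_summary(sample))
```

A healthy cluster should report exactly one node with role master, and every slave should report that node's address as master_host with master_link_status up.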
4, Testing
+ Data replication to the other nodes:
- First, on the master node ccsp-1-1, write key testkey with value testval:
[root@ccsp-1-1 redis]# redis-cli -h ccsp-1-1 -p 6379 lpush testkey testval
(integer) 1
[root@ccsp-1-1 redis]#
- Then read it back from any slave node:
[root@ccsp-1-1 redis]# redis-cli -h ccsp-2-1 -p 6379 lrange testkey 0 -1
1) "testval"
[root@ccsp-1-1 redis]# redis-cli -h ccsp-1-3 -p 6379 lrange testkey 0 -1
1) "testval"
[root@ccsp-1-1 redis]# redis-cli -h ccsp-2-3 -p 6379 lrange testkey 0 -1
1) "testval"
[root@ccsp-1-1 redis]#
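+ The master node goes down, and Sentinel promotes one of the slaves to be the new master:
- Here the failure is simulated by stopping the firewall on ccsp-1-1 with the service command, so that the other slave nodes can no longer connect to the master node
- The logs on the other nodes show that the slave nodes keep trying to reconnect to the MASTER node: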
[7914] 26 Jul 08:02:26.696 * MASTER <-> SLAVE sync started
[7914] 26 Jul 08:02:26.696 # Error condition on socket for SYNC: Connection refused
[7914] 26 Jul 08:02:27.710 * Connecting to MASTER 192.168.16.12:6379
[7914] 26 Jul 08:02:27.710 * MASTER <-> SLAVE sync started
[7914] 26 Jul 08:02:27.711 # Error condition on socket for SYNC: Connection refused
[7914] 26 Jul 08:02:28.733 - DB 0: 2 keys (0 volatile) in 4 slots HT.
[7914] 26 Jul 08:02:28.734 - 8 clients connected (0 slaves), 976976 bytes in use
[7914] 26 Jul 08:02:28.740 * Connecting to MASTER 192.168.16.12:6379
[7914] 26 Jul 08:02:28.741 * MASTER <-> SLAVE sync started
[7914] 26 Jul 08:02:28.741 # Error condition on socket for SYNC: Connection refused
[7914] 26 Jul 08:02:29.767 * Connecting to MASTER 192.168.16.12:6379
[7914] 26 Jul 08:02:29.768 * MASTER <-> SLAVE sync started
[7914] 26 Jul 08:02:29.769 # Error condition on socket for SYNC: Connection refused
- Once the down-after-milliseconds interval configured in sentinel.conf has elapsed, the election of a new master node begins:
[7374] 26 Jul 08:17:30.142 # +sdown sentinel 192.168.16.12:26379 192.168.16.12 26379 @ ccsp 192.168.16.12 6379
[7374] 26 Jul 08:17:30.388 # +new-epoch 3
[7374] 26 Jul 08:17:30.406 # +vote-for-leader 2ebcd5c3243e72aad45c4dc507ee6fdc820b20a6 3
[7374] 26 Jul 08:17:30.697 # +sdown master ccsp 192.168.16.12 6379
[7374] 26 Jul 08:17:30.767 # +odown master ccsp 192.168.16.12 6379 #quorum 3/2
[7374] 26 Jul 08:17:30.767 # Next failover delay: I will not start a failover before Sat Jul 26 08:23:31 2014
[7374] 26 Jul 08:17:30.770 - -role-change slave 192.168.16.17:6379 192.168.16.17 6379 @ ccsp 192.168.16.12 6379 new reported role is master
[7374] 26 Jul 08:17:31.499 # +config-update-from sentinel 192.168.16.14:26379 192.168.16.14 26379 @ ccsp 192.168.16.12 6379
[7374] 26 Jul 08:17:31.499 # +switch-master ccsp 192.168.16.12 6379 192.168.16.17 6379
[7374] 26 Jul 08:17:31.501 * +slave slave 192.168.16.16:6379 192.168.16.16 6379 @ ccsp 192.168.16.17 6379
[7374] 26 Jul 08:17:31.542 * +slave slave 192.168.16.14:6379 192.168.16.14 6379 @ ccsp 192.168.16.17 6379
[7374] 26 Jul 08:17:31.573 * +slave slave 192.168.16.12:6379 192.168.16.12 6379 @ ccsp 192.168.16.17 6379
[7374] 26 Jul 08:18:01.597 # +sdown slave 192.168.16.12:6379 192.168.16.12 6379 @ ccsp 192.168.16.17 6379
- The INFO command shows that ccsp-2-3(192.168.16.17) was chosen from the 3 remaining slave nodes as the new master node:
The following is the INFO output on ccsp-2-3
[root@ccsp-1-1 ~]# redis-cli -h ccsp-2-3 -p 6379
ccsp-2-3:6379> INFO
# Server
redis_version:2.8.12
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:91d7988321783ab4
redis_mode:standalone
os:Linux 2.6.32-358.el6.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.7
process_id:7914
run_id:35730970956ffa0e85c95c46eb31070e548430a2
tcp_port:6379
uptime_in_seconds:1654
uptime_in_days:0
hz:10
lru_clock:13879056
config_file:/etc/redis/6379.conf
... ...
# Replication
role:master
connected_slaves:3
slave0:ip=192.168.16.14,port=6379,state=online,offset=103743,lag=0
slave1:ip=192.168.16.16,port=6379,state=online,offset=103743,lag=0
master_repl_offset:104017
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:68558
repl_backlog_histlen:35460
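The failover itself is recorded by the +switch-master event in the Sentinel log above, which carries the master name followed by the old and the new address. As a sketch under the assumption that the log lines keep the layout shown in these notes, the Python snippet below extracts those events from log text; find_failovers is a hypothetical helper name.

```python
import re

# Matches e.g. "+switch-master ccsp 192.168.16.12 6379 192.168.16.17 6379"
SWITCH_MASTER = re.compile(
    r"\+switch-master\s+(?P<name>\S+)"
    r"\s+(?P<old_ip>\S+)\s+(?P<old_port>\d+)"
    r"\s+(?P<new_ip>\S+)\s+(?P<new_port>\d+)"
)


def find_failovers(log_text):
    """Return one dict per +switch-master event found in the log text."""
    events = []
    for line in log_text.splitlines():
        match = SWITCH_MASTER.search(line)
        if match:
            events.append({
                "master_name": match.group("name"),
                "old": (match.group("old_ip"), int(match.group("old_port"))),
                "new": (match.group("new_ip"), int(match.group("new_port"))),
            })
    return events


if __name__ == "__main__":
    sample = (
        "[7374] 26 Jul 08:17:30.767 # +odown master ccsp 192.168.16.12 6379 #quorum 3/2\n"
        "[7374] 26 Jul 08:17:31.499 # +switch-master ccsp 192.168.16.12 6379 192.168.16.17 6379\n"
    )
    for event in find_failovers(sample):
        print(event)
```

Comparing the "new" address with the node that reports role:master in INFO is a quick way to confirm that Sentinel and the Redis nodes agree on the outcome.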
+ Rejoin the original node to the cluster; the current master node of the cluster must be specified when it rejoins
[root@ccsp-1-1 redis]# redis-server /etc/redis/6379.conf --slaveof ccsp-2-3 6379
[root@ccsp-1-1 redis]# redis-sentinel /etc/redis/sentinel.conf
5, References:
* Redis Sentinel official documentation <http://redis.io/topics/sentinel>