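I. Introduction to Redis Cluster
Before version 3.0, Redis had no built-in clustering, only master-slave replication. The drawback of that mode is that when the master goes down, client requests can no longer be served. Sentinel can provide high availability for this case, but with a very large data set a single master still becomes a performance bottleneck, so Redis Cluster is worth considering. Redis Cluster is a deployment in which data is shared across multiple Redis nodes. It does not support multi-key commands whose keys live in different hash slots, because that would require moving data between nodes, which would cost Redis its usual performance and could cause unpredictable behavior under high load. Through partitioning, the cluster also provides a degree of availability: it keeps processing commands when some nodes fail or become unreachable.
1. Advantages of Redis Cluster:
a. Data is automatically split across different nodes.
b. The cluster keeps processing commands when a subset of its nodes has failed or is unreachable.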
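2. Data sharding in Redis Cluster
Redis Cluster does not use consistent hashing; it introduces the concept of hash slots instead. The cluster has 16384 hash slots, and the slot of a key is its CRC16 checksum modulo 16384. Each node in the cluster is responsible for a subset of the hash slots. For example, with 3 nodes:
Node A holds hash slots 0 to 5500.
Node B holds hash slots 5501 to 11000.
Node C holds hash slots 11001 to 16383.
This layout makes it easy to add or remove nodes. To add a new node D, some slots from nodes A, B and C are moved to D. To remove node A, its slots are moved to B and C; once A holds no slots, it can be removed from the cluster. Moving hash slots from one node to another does not require stopping the service, so adding nodes, removing nodes, or changing the number of slots a node holds never makes the cluster unavailable.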
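The slot calculation described above can be sketched in plain POSIX shell. This is only an illustration: the function name key_slot is made up for this example, the CRC16 variant is assumed to be the XMODEM one (polynomial 0x1021, initial value 0), and hash tags in curly braces are deliberately ignored.

```shell
# key_slot KEY: print the hash slot (0..16383) of KEY.
# Assumption: CRC16/XMODEM; hash tags ({...}) are not handled in this sketch.
key_slot() (
    crc=0
    # od prints each byte of the key as an unsigned decimal number
    for byte in $(printf '%s' "$1" | od -An -v -tu1); do
        crc=$(( crc ^ (byte << 8) ))
        i=0
        while [ "$i" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            i=$(( i + 1 ))
        done
    done
    echo $(( crc % 16384 ))
)

key_slot name    # the redis-cli session later in this article reports slot 5798 for this key
```

A pure-shell loop like this is slow and only meant for checking a handful of keys by hand; real clients compute the slot internally.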
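3. Consistency guarantees in Redis Cluster
Redis Cluster does not guarantee strong consistency. In practice this means that under certain conditions the cluster may lose writes that were already acknowledged to the client.
a. The first reason is that the cluster uses asynchronous replication. A write proceeds as follows:
The client writes a command to master B.
Master B replies to the client with the command status.
Master B replicates the write to its slaves B1, B2 and B3.
The master replicates the command only after it has replied to the client. If it had to wait for replication on every request, its throughput would drop sharply, so this is a trade-off between performance and consistency.
Note: when a write really must reach the replicas first, Redis offers the WAIT command, which blocks until the write has been acknowledged by the requested number of replicas. This makes lost writes much less likely, but it still does not amount to strong consistency.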
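b. Writes can also be lost during a network partition in which a client is isolated together with a minority of instances that includes at least one master.
For example, suppose the cluster consists of six nodes A, B, C, A1, B1 and C1, where A, B and C are masters and A1, B1 and C1 are their respective slaves, plus a client Z1. If a network partition occurs, the cluster may split into two sides: the majority side with A, C, A1, B1 and C1, and the minority side with B and the client Z1. Z1 can still write to master B. If the partition is short, the cluster simply continues to operate normally. If it lasts long enough for the majority side to promote B1 to master, the data Z1 wrote to B in the meantime is lost.
Note: there is a limit on how long Z1 can keep sending writes to master B during the partition. This limit is the node timeout (cluster-node-timeout), an important Redis Cluster configuration option: once a master on the minority side has been unable to reach the majority for that long, it stops accepting writes.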
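II. Redis Cluster architecture
Server environment: CentOS 6.6 x86_64
Redis version: 3.0.7
Server IP      Port
10.0.18.145    7000
10.0.18.145    7001
10.0.18.146    7002
10.0.18.146    7003
10.0.18.147    7004
10.0.18.147    7005
Note: Redis is compiled and installed on all three servers; see http://linuxg.blog.51cto.com/4410110/1862040 for reference. Here I use 3 servers with 2 processes each, 6 processes in total, to simulate a cluster with 3 masters and 3 slaves. In production, use at least 6 servers for a Redis Cluster. Setting up the cluster depends on Ruby (redis-trib.rb is a Ruby script), so Ruby has to be installed on the 3 servers: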
#yum install ruby rubygems -y
III. Configuring the Redis cluster
1. Provide a Redis configuration file for each port
First, on 10.0.18.145:
#cd /usr/local/redis/conf
#cat cluster_7000.conf
bind 10.0.18.145 127.0.0.1
daemonize yes
pidfile /var/run/cluster_redis7000.pid
port 7000
tcp-backlog 511
timeout 0
tcp-keepalive 0
loglevel notice
logfile "/usr/local/redis/log/cluster_redis7000.log"
databases 16
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum yes
dbfilename dump_7000.rdb
dir /usr/local/redis/
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
appendonly yes
appendfilename "appendonly_7000.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
cluster-enabled yes
cluster-config-file nodes_7000.conf
cluster-node-timeout 10000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
maxmemory 1024mb
maxmemory-policy volatile-lru
rename-command FLUSHALL "XSFLUSHALL"
rename-command FLUSHDB "XSFLUSHDB"
rename-command SHUTDOWN "XSSHUTDOWN"
rename-command DEBUG "XSDEBUG"
rename-command CONFIG "XSCONFIG"
rename-command SLAVEOF "XSSLAVEOF"

#cat cluster_7001.conf
Note: cluster_7001.conf is identical to cluster_7000.conf except for the following settings:
pidfile /var/run/cluster_redis7001.pid
port 7001
logfile "/usr/local/redis/log/cluster_redis7001.log"
dbfilename dump_7001.rdb
appendfilename "appendonly_7001.aof"
cluster-config-file nodes_7001.conf

Start the 7000 and 7001 processes:
#cd /usr/local/redis/
#nohup ./bin/redis-server ./conf/cluster_7000.conf &
#nohup ./bin/redis-server ./conf/cluster_7001.conf &

Check the processes:
#ps aux | grep redis
root 14799 0.1 0.1 137452 2432 ? Ssl 17:54 0:00 ./bin/redis-server 10.0.18.145:7000 [cluster]
root 14803 0.1 0.1 137452 2428 ? Ssl 17:55 0:00 ./bin/redis-server 10.0.18.145:7001 [cluster]
root 14842 0.0 0.0 103252 840 pts/0 S+ 18:06 0:00 grep redis

Check the listening ports:
#netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 4628/sshd
tcp 0 0 127.0.0.1:7000 0.0.0.0:* LISTEN 14799/./bin/redis-s
tcp 0 0 10.0.18.145:7000 0.0.0.0:* LISTEN 14799/./bin/redis-s
tcp 0 0 127.0.0.1:7001 0.0.0.0:* LISTEN 14803/./bin/redis-s
tcp 0 0 10.0.18.145:7001 0.0.0.0:* LISTEN 14803/./bin/redis-s
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1101/master
tcp 0 0 127.0.0.1:17000 0.0.0.0:* LISTEN 14799/./bin/redis-s
tcp 0 0 10.0.18.145:17000 0.0.0.0:* LISTEN 14799/./bin/redis-s
tcp 0 0 127.0.0.1:17001 0.0.0.0:* LISTEN 14803/./bin/redis-s
tcp 0 0 10.0.18.145:17001 0.0.0.0:* LISTEN 14803/./bin/redis-s
tcp 0 0 :::22 :::* LISTEN 4628/sshd
tcp 0 0 ::1:25 :::* LISTEN 1101/master
Each instance also listens on its client port plus 10000 (17000 and 17001 here); this is the cluster bus port used for node-to-node communication.

Check the node information:
#cat nodes_7000.conf
2e2f563a1467c2515857d70d762c26b06faabbd4 :0 myself,master - 0 0 0 connected
vars currentEpoch 0 lastVoteEpoch 0
#cat nodes_7001.conf
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 :0 myself,master - 0 0 0 connected
vars currentEpoch 0 lastVoteEpoch 0
The cluster has not been created yet, which is why each node only knows about itself.
Next, on 10.0.18.146:
On this server, cluster_7002.conf and cluster_7003.conf have the same content as cluster_7000.conf, except for the following:
bind 10.0.18.146 127.0.0.1 # bind to this machine's own address
In the remaining settings, every value that contains the port has to be changed to 7002 or 7003 respectively.

Start the 7002 and 7003 processes:
#cd /usr/local/redis/
#nohup ./bin/redis-server conf/cluster_7002.conf &
#nohup ./bin/redis-server conf/cluster_7003.conf &

Check the processes:
#ps aux | grep redis
root 24102 0.1 0.1 137452 2468 ? Ssl 18:16 0:00 ./bin/redis-server 10.0.18.146:7002 [cluster]
root 24106 0.0 0.1 137452 2464 ? Ssl 18:16 0:00 ./bin/redis-server 10.0.18.146:7003 [cluster]
root 24110 0.0 0.0 103252 844 pts/0 S+ 18:17 0:00 grep redis

Check the listening ports:
#netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1402/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1103/master
tcp 0 0 127.0.0.1:7002 0.0.0.0:* LISTEN 24102/./bin/redis-s
tcp 0 0 10.0.18.146:7002 0.0.0.0:* LISTEN 24102/./bin/redis-s
tcp 0 0 127.0.0.1:7003 0.0.0.0:* LISTEN 24106/./bin/redis-s
tcp 0 0 10.0.18.146:7003 0.0.0.0:* LISTEN 24106/./bin/redis-s
tcp 0 0 127.0.0.1:17002 0.0.0.0:* LISTEN 24102/./bin/redis-s
tcp 0 0 10.0.18.146:17002 0.0.0.0:* LISTEN 24102/./bin/redis-s
tcp 0 0 127.0.0.1:17003 0.0.0.0:* LISTEN 24106/./bin/redis-s
tcp 0 0 10.0.18.146:17003 0.0.0.0:* LISTEN 24106/./bin/redis-s
tcp 0 0 :::22 :::* LISTEN 1402/sshd
tcp 0 0 ::1:25 :::* LISTEN 1103/master
Finally, do the same on 10.0.18.147 (ports 7004 and 7005). The procedure is identical to 10.0.18.146, so it is not repeated here; just make sure both ports come up properly.
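Since the per-port files differ only in the port number and, across servers, the bind address, they can be derived from one template. The following is a minimal sketch under that assumption; the function name make_node_conf is hypothetical, and the blanket sed replacement is the same trick the article itself uses later for ports 7006 and 7007.

```shell
# make_node_conf TEMPLATE OLD_PORT NEW_PORT [NEW_BIND_IP]
# Prints a copy of TEMPLATE with every occurrence of OLD_PORT replaced by
# NEW_PORT and, if NEW_BIND_IP is given, the first address of the bind line
# replaced as well. Hypothetical helper, not part of Redis.
make_node_conf() {
    if [ -n "${4:-}" ]; then
        sed -e "s/$2/$3/g" -e "s/^bind [^ ]*/bind $4/" "$1"
    else
        sed "s/$2/$3/g" "$1"
    fi
}

# Example (paths as used in this article):
#   make_node_conf conf/cluster_7000.conf 7000 7001 > conf/cluster_7001.conf
#   make_node_conf conf/cluster_7000.conf 7000 7002 10.0.18.146 > cluster_7002.conf
```

Because the replacement is purely textual, it would also rewrite any unrelated value that happens to contain the old port number, so it is worth diffing the generated file against the template before starting the instance.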
2. Create the cluster
The create command can be run from any of the three servers; here it is run on 10.0.18.145:
#redis-trib.rb create --replicas 1 10.0.18.145:7000 10.0.18.145:7001 10.0.18.146:7002 10.0.18.146:7003 10.0.18.147:7004 10.0.18.147:7005
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- redis (LoadError)
from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /usr/local/redis/bin/redis-trib.rb:25
The command fails. The error shows that the Ruby client library for Redis (the redis gem) is missing, so install it with gem:
#gem install redis # install on all three Redis servers
Successfully installed redis-3.3.1
1 gem installed
Installing ri documentation for redis-3.3.1...
Installing RDoc documentation for redis-3.3.1...
Note: if the Redis servers have no Internet access, run gem install redis on a machine that does. The file /usr/lib/ruby/gems/1.8/cache/redis-3.3.1.gem will then be present on that machine; copy it to the Redis servers without Internet access and install it there with:
#gem install redis-3.3.1.gem

Run the create command again:
#redis-trib.rb create --replicas 1 10.0.18.145:7000 10.0.18.145:7001 10.0.18.146:7002 10.0.18.146:7003 10.0.18.147:7004 10.0.18.147:7005
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters: # the following host:port pairs were selected as masters
10.0.18.147:7004
10.0.18.146:7002
10.0.18.145:7000
Adding replica 10.0.18.146:7003 to 10.0.18.147:7004 # master/slave pairing
Adding replica 10.0.18.147:7005 to 10.0.18.146:7002
Adding replica 10.0.18.145:7001 to 10.0.18.145:7000
M: 2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 slots:10923-16383 (5461 slots) master
S: b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 replicates 2e2f563a1467c2515857d70d762c26b06faabbd4
M: faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 slots:5461-10922 (5462 slots) master
S: 4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 replicates c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
M: c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 slots:0-5460 (5461 slots) master
S: 37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 replicates faee6763e5f591d9d63df8d841041255ae7992f3
Can I set the above configuration? (type 'yes' to accept): yes # type yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join....
>>> Performing Cluster Check (using node 10.0.18.145:7000)
M: 2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 slots:10923-16383 (5461 slots) master
M: b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slots: (0 slots) master replicates 2e2f563a1467c2515857d70d762c26b06faabbd4
M: faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 slots:5461-10922 (5462 slots) master
M: 4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slots: (0 slots) master replicates c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
M: c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 slots:0-5460 (5461 slots) master
M: 37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slots: (0 slots) master replicates faee6763e5f591d9d63df8d841041255ae7992f3
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
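The slot ranges in this output (5461, 5462 and 5461 slots) come from dividing 16384 slots as evenly as possible over the masters. The sketch below reproduces those three ranges; the function name split_slots and the rounding rule are assumptions chosen to match the output above, not a copy of what redis-trib.rb does internally.

```shell
# split_slots MASTERS: print "first-last count" for each master when the
# 16384 slots are divided as evenly as possible.
# Assumption: the upper bound of master i is round(i * 16384 / MASTERS) - 1,
# and the last master always ends at slot 16383.
split_slots() (
    masters=$1
    total=16384
    first=0
    i=1
    while [ "$i" -le "$masters" ]; do
        if [ "$i" -eq "$masters" ]; then
            last=$(( total - 1 ))
        else
            # integer form of round(i * total / masters) - 1
            last=$(( (2 * i * total + masters) / (2 * masters) - 1 ))
        fi
        echo "$first-$last $(( last - first + 1 ))"
        first=$(( last + 1 ))
        i=$(( i + 1 ))
    done
)

split_slots 3
```

For 3 masters this prints 0-5460, 5461-10922 and 10923-16383, the same ranges redis-trib.rb assigned to 7004, 7002 and 7000.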
Check the nodes files on host 10.0.18.145:
#cat nodes_7000.conf
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476878663502 2 connected
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476878662501 5 connected 0-5460
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476878663502 6 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 myself,master - 0 0 1 connected 10923-16383
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476878662501 5 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 master - 0 1476878661499 3 connected 5461-10922
vars currentEpoch 6 lastVoteEpoch 0
#cat nodes_7001.conf
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476878662184 5 connected 0-5460
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476878663186 5 connected
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476878661181 6 connected
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 myself,slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 0 2 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 master - 0 1476878660175 3 connected 5461-10922
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 master - 0 1476878659173 1 connected 10923-16383
vars currentEpoch 6 lastVoteEpoch 0
Each nodes_xxxx.conf file records the cluster as seen by the redis-server instance on that port: the node IDs, addresses, roles and slot ranges of all nodes.
The same information can be queried from the command line on any of the three servers. On 10.0.18.145:
#redis-cli -h 10.0.18.145 -p 7000 cluster nodes
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476879080657 2 connected
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476879077649 5 connected 0-5460
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476879078652 6 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 myself,master - 0 0 1 connected 10923-16383
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476879080657 5 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 master - 0 1476879079654 3 connected 5461-10922
On 10.0.18.146:
#./bin/redis-cli -h 10.0.18.146 -p 7003 cluster nodes
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476879258036 6 connected
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476879259039 5 connected 0-5460
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476879258036 2 connected
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 myself,slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 0 4 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 master - 0 1476879260041 1 connected 10923-16383
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 master - 0 1476879257033 3 connected 5461-10922
#./bin/redis-cli -h 10.0.18.146 -p 7002 cluster nodes
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476879264453 6 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 myself,master - 0 0 3 connected 5461-10922
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476879262447 5 connected 0-5460
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476879261444 5 connected
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476879263449 2 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 master - 0 1476879261444 1 connected 10923-16383
As you can see, the order of the entries, and which entry carries the myself flag, depends on the node that is queried; the cluster layout they describe is the same.
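Reading slot ownership off these listings by eye gets tedious, so here is a small sketch that summarizes the output of cluster nodes. The function name master_slots is made up; the field positions (flags in column 3, slot ranges from column 9 onward) are taken from the listings above.

```shell
# master_slots: read `cluster nodes` output on stdin and print, for each
# master, its address and the number of slots it owns, then the total.
# Hypothetical helper based on the column layout shown in this article.
master_slots() {
    awk '$3 ~ /master/ {
        n = 0
        for (i = 9; i <= NF; i++) {
            if ($i ~ /^\[/) continue      # skip migrating/importing markers
            split($i, r, "-")             # "a-b" is a range, "a" a single slot
            n += (r[2] == "" ? 1 : r[2] - r[1] + 1)
        }
        total += n
        print $2, n
    }
    END { print "total", total + 0 }'
}

# Example:
#   redis-cli -h 10.0.18.145 -p 7000 cluster nodes | master_slots
```

A total other than 16384 would mean that some slots are not covered by any master.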
3. Setting a key/value pair manually
On 10.0.18.146:
#./bin/redis-cli -h 10.0.18.146 -p 7002 -c # connect to port 7002 in cluster mode
10.0.18.146:7002> set name kobe # set a key
OK
10.0.18.146:7002> get name # read it back
"kobe"
Check from 10.0.18.145:
#redis-cli -h 10.0.18.145 -p 7000 -c
10.0.18.145:7000> get name
-> Redirected to slot [5798] located at 10.0.18.146:7002
"kobe"
# the client is told that this key lives on the 7002 instance of 10.0.18.146
Check from 10.0.18.147:
#./bin/redis-cli -h 10.0.18.147 -p 7004 -c
10.0.18.147:7004> get name
-> Redirected to slot [5798] located at 10.0.18.146:7002
"kobe"
# again the client is redirected to the 7002 instance of 10.0.18.146
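The redirection shown here is handled by redis-cli because of the -c option. Without it, the node that does not own the slot answers with a MOVED error of the form "MOVED 5798 10.0.18.146:7002", and the client is expected to retry against that address. That error text is not shown in this article, so treat the example input below as an assumption; the helper name moved_target is hypothetical.

```shell
# moved_target REPLY: if REPLY is a MOVED error, print the slot and the
# host/port that should be contacted instead; otherwise return non-zero.
# Assumed reply format: "MOVED <slot> <host>:<port>".
moved_target() {
    set -- $1                      # split the reply into words
    [ "$1" = "MOVED" ] || return 1
    echo "slot=$2 host=${3%:*} port=${3##*:}"
}

moved_target "MOVED 5798 10.0.18.146:7002"
```

A script that talks to the cluster without -c could use a check like this to decide which node to reconnect to.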
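IV. Problems encountered while using Redis Cluster
1. Scaling out the cluster:
As the business grows, the data volume of the originally planned cluster keeps increasing, and at some point the cluster has to be expanded, for example from 3 masters and 3 slaves to 4 masters and 4 slaves or more.
Here I start two more instances, on ports 7006 and 7007, on server 10.0.18.147: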
#cd /usr/local/redis/conf
#cp cluster_7005.conf cluster_7006.conf
#cp cluster_7005.conf cluster_7007.conf
Change the port in the new configuration files:
#sed -i 's/7005/7006/g' cluster_7006.conf
#sed -i 's/7005/7007/g' cluster_7007.conf
Then start the instances:
#cd /usr/local/redis/
#nohup bin/redis-server conf/cluster_7006.conf &
#nohup bin/redis-server conf/cluster_7007.conf &
Check that they are running:
#ps aux| grep redis
root 25805 0.1 0.1 137452 2756 ? Ssl Oct19 2:07 ./bin/redis-server 10.0.18.147:7004 [cluster]
root 25809 0.1 0.1 137452 2724 ? Ssl Oct19 2:00 ./bin/redis-server 10.0.18.147:7005 [cluster]
root 26115 0.0 0.1 137452 2472 ? Ssl 13:56 0:00 bin/redis-server 10.0.18.147:7006 [cluster]
root 26119 0.0 0.1 137452 2464 ? Ssl 13:56 0:00 bin/redis-server 10.0.18.147:7007 [cluster]
root 26123 0.0 0.0 103252 844 pts/0 S+ 13:57 0:00 grep redis
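Now add the new nodes to the cluster.
Before doing so, here is how add-node works:
The add-node command adds a new node to the cluster, either as a master or as the slave of an existing master.
Usage: add-node new_host:new_port existing_host:existing_port --slave --master-id <arg>
add-node has two optional parameters:
--slave: if set, the new node joins the cluster as a slave.
--master-id: only takes effect together with --slave and gives the node ID of the master the new node should replicate. If it is omitted, redis-trib chooses the master itself (it picks a master with the fewest replicas).
Add the node. This is done on 10.0.18.145, but either of the other two servers works as well.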
#redis-trib.rb add-node 10.0.18.147:7006 10.0.18.145:7000
>>> Adding node 10.0.18.147:7006 to cluster 10.0.18.145:7000
>>> Performing Cluster Check (using node 10.0.18.145:7000)
M: 2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 slots:10923-16383 (5461 slots) master 1 additional replica(s)
S: b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slots: (0 slots) slave replicates 2e2f563a1467c2515857d70d762c26b06faabbd4
M: c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 slots:0-5460 (5461 slots) master 1 additional replica(s)
S: 37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slots: (0 slots) slave replicates faee6763e5f591d9d63df8d841041255ae7992f3
S: 4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slots: (0 slots) slave replicates c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
M: faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 slots:5461-10922 (5462 slots) master 1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 10.0.18.147:7006 to make it join the cluster.
[OK] New node added correctly.
The output shows that the node was added successfully. Check the cluster state:
#redis-cli -h 10.0.18.145 -p 7000 cluster nodes
5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 master - 0 1476945360520 0 connected # the new node joined as a master but has no slots yet; slots are assigned to it below
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476945356508 2 connected
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476945359516 5 connected 0-5460
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476945357509 6 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 myself,master - 0 0 1 connected 10923-16383
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476945359516 5 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 master - 0 1476945358513 3 connected 5461-10922
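Note: the new node is now connected to the cluster, is a member of it, and can already redirect client requests. Compared with the other masters, however, it differs in two ways:
a. It holds no data, because it has no hash slots assigned.
b. Because it is a master without any hash slots, it does not take part in the election when a slave wants to be promoted to master.
Next, redis-trib is used to move some hash slots of the cluster to the new node, which turns it into a real master. This operation is called reshard, so here is a short introduction: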
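The reshard command moves slots online from the nodes that currently own them to another node. It can be used to scale the cluster out and in without downtime. reshard takes a number of parameters:
reshard host:port --from <arg> --to <arg> --slots <arg> --yes --timeout <arg> --pipeline <arg>
host:port: required. The node from which the information about the whole cluster is read; it is the entry point for the operation.
--from: the node IDs of the source nodes to take slots from. Several source nodes can be given, separated by commas, or --from all to use every master of the cluster as a source. If omitted, the user is prompted for it during the reshard.
--to: the node ID of the destination node. Only one destination can be given. If omitted, the user is prompted for it during the reshard.
--slots: the number of slots to move. If omitted, the user is prompted for it during the reshard.
--yes: if set, the reshard plan is executed right after it is printed, without asking the user to confirm it by typing yes.
--timeout: the timeout of the migrate command.
--pipeline: the number of keys fetched per cluster getkeysinslot call; the default is 10.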
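With --from, --to and --slots given explicitly, a reshard does not depend on answering the interactive prompts in the right order. The sketch below only prints the command line instead of running it; the helper name reshard_cmd is hypothetical, and placing the options before host:port is an assumption based on how the add-node call with --slave is written later in this article.

```shell
# reshard_cmd ENTRY FROM_ID TO_ID SLOTS: print a non-interactive reshard
# command. Refuses a plan whose source and destination are the same node.
# Hypothetical helper; it prints the command rather than executing it.
reshard_cmd() {
    if [ "$2" = "$3" ]; then
        echo "source and destination must differ" >&2
        return 1
    fi
    echo "redis-trib.rb reshard --from $2 --to $3 --slots $4 --yes $1"
}

# Example: give 500 slots from the 7004 master to the new 7006 node.
reshard_cmd 10.0.18.147:7006 \
    c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b \
    5cbec92f52fbe9da7cd643cd34e87055c5358018 500
```

Reviewing the printed command before running it makes it harder to swap the source and destination node IDs by accident.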
On 10.0.18.145:
#redis-trib.rb reshard 10.0.18.147:7006 # entry node for the reshard; here the newly added node is used
………………
>>> Performing Cluster Check (using node 10.0.18.147:7006)
M: 5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 slots: (0 slots) master 0 additional replica(s) # the new 7006 instance has 0 slots
M: faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 slots:5461-10922 (5462 slots) master 1 additional replica(s)
M: 2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 slots:10923-16383 (5461 slots) master 1 additional replica(s)
S: b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slots: (0 slots) slave replicates 2e2f563a1467c2515857d70d762c26b06faabbd4
S: 37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slots: (0 slots) slave replicates faee6763e5f591d9d63df8d841041255ae7992f3
M: c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 slots:0-5460 (5461 slots) master 1 additional replica(s)
S: 4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slots: (0 slots) slave replicates c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 500 # number of slots to give to the new node
What is the receiving node ID? 5cbec92f52fbe9da7cd643cd34e87055c5358018 # node ID of the new 7006 instance
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1: c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b # node ID to take the 500 slots from; here the master 10.0.18.147:7004
Source node #2:done # type done
……………………
Moving slot 487 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 488 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 489 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 490 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 491 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 492 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 493 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 494 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 495 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 496 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 497 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 498 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 499 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Do you want to proceed with the proposed reshard plan (yes/no)? yes # type yes
…………
Moving slot 492 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 493 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 494 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 495 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 496 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 497 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 498 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 499 from 10.0.18.147:7004 to 10.0.18.147:7006:
When it has finished, check the cluster state again:
#redis-cli -h 10.0.18.145 -p 7000 cluster nodes
5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 master - 0 1476946198888 7 connected 0-499 # the new node now owns slots 0-499
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476946196884 2 connected
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476946197886 5 connected 500-5460 # 500 slots have been moved away from the 7004 node
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476946199892 6 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 myself,master - 0 0 1 connected 10923-16383
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476946198888 5 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 master - 0 1476946199892 3 connected 5461-10922
Above, 10.0.18.147:7006 was added to the cluster and became a master by default. How is a new node added as the slave of a given master? The following demonstrates it:
Add node 10.0.18.147:7007 as a slave of the master 10.0.18.147:7006:
#redis-trib.rb add-node --slave --master-id 5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7007 10.0.18.147:7006
>>> Adding node 10.0.18.147:7007 to cluster 10.0.18.147:7006
>>> Performing Cluster Check (using node 10.0.18.147:7006)
M: 5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 slots:0-499 (500 slots) master 0 additional replica(s)
S: b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slots: (0 slots) slave replicates 2e2f563a1467c2515857d70d762c26b06faabbd4
M: c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 slots:500-5460 (4961 slots) master 1 additional replica(s)
S: 4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slots: (0 slots) slave replicates c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
M: 2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 slots:10923-16383 (5461 slots) master 1 additional replica(s)
S: 37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slots: (0 slots) slave replicates faee6763e5f591d9d63df8d841041255ae7992f3
M: faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 slots:5461-10922 (5462 slots) master 1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[ERR] Node 10.0.18.147:7007 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
The command fails, so the node was not added. The fixes suggested online are:
1) Delete the local persistence files (aof, rdb and so on) of the node that is to be added;
2) Also delete the cluster configuration file of the new node, i.e. the file named by cluster-config-file in its redis.conf;
3) If adding the node still fails, log in to the new node with ./redis-cli -h x -p and clear its database:
127.0.0.1:7001> flushdb # empty the current database
Here is how I solved it: delete the files appendonly_7007.aof and nodes_7007.conf, stop the redis-server process on port 7007, and start it again. Running the command above once more then succeeds:
#redis-trib.rb add-node --slave --master-id 5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7007 10.0.18.147:7006
>>> Adding node 10.0.18.147:7007 to cluster 10.0.18.147:7006
>>> Performing Cluster Check (using node 10.0.18.147:7006)
M: 5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 slots:0-499 (500 slots) master 0 additional replica(s)
S: b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slots: (0 slots) slave replicates 2e2f563a1467c2515857d70d762c26b06faabbd4
M: c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 slots:500-5460 (4961 slots) master 1 additional replica(s)
S: 4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slots: (0 slots) slave replicates c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
M: 2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 slots:10923-16383 (5461 slots) master 1 additional replica(s)
S: 37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slots: (0 slots) slave replicates faee6763e5f591d9d63df8d841041255ae7992f3
M: faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 slots:5461-10922 (5462 slots) master 1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 10.0.18.147:7007 to make it join the cluster.
Waiting for the cluster to join.
>>> Configure node as replica of 10.0.18.147:7006.
[OK] New node added correctly.
Check the cluster information:
#redis-cli -h 10.0.18.145 -p 7000 cluster nodes
5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 master - 0 1476951249203 7 connected 0-499
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476951250206 2 connected
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476951250206 5 connected 500-5460
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476951249203 6 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 myself,master - 0 0 1 connected 10923-16383
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476951251208 5 connected
e80b8f454affd7e8149dc5bd4f6f3cf1ef3872a1 10.0.18.147:7007 slave 5cbec92f52fbe9da7cd643cd34e87055c5358018 0 1476951251209 7 connected # the newly added slave
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 master - 0 1476951248200 3 connected 5461-10922
Note: the cluster node information seen from 10.0.18.146 and 10.0.18.147 is the same.
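The cleanup that fixed the "is not empty" error (removing the AOF file and the nodes file of port 7007) can be wrapped in a small helper. This is a sketch only: the function name reset_node_files is made up, the file names follow the naming scheme of the configuration shown earlier (appendonly_PORT.aof, dump_PORT.rdb, nodes_PORT.conf in the Redis dir), and stopping and restarting the process is left to the operator.

```shell
# reset_node_files DIR PORT: delete the persistence and cluster state files
# of the instance on PORT so that it starts as an empty node again.
# Hypothetical helper; file names assume this article's configuration.
reset_node_files() (
    dir=$1 port=$2
    for f in "appendonly_$port.aof" "dump_$port.rdb" "nodes_$port.conf"; do
        if [ -f "$dir/$f" ]; then
            rm -f "$dir/$f" && echo "removed $f"
        fi
    done
)

# Example (directory as set by "dir" in the configuration):
#   reset_node_files /usr/local/redis 7007
```

Running it while the instance is stopped is probably the safer order, since a running process could write its state back; the author deleted the files first and then restarted the process, which also worked. Never point it at a node that holds data you still need.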
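2. Removing a node from the cluster
To shrink the cluster, nodes can be removed, but the slave has to be removed first and the master afterwards. First, how del-node works:
del-node removes a node from the cluster. It can only remove nodes that have no slots assigned. It takes two parameters:
host:port: the node from which the cluster information is read.
node_id: the ID of the node to remove.
Usage: redis-trib.rb del-node host:port node_id
Remove the slave 10.0.18.147:7007, working on 10.0.18.146: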
#./bin/redis-trib.rb del-node 10.0.18.147:7007 e80b8f454affd7e8149dc5bd4f6f3cf1ef3872a1
>>> Removing node e80b8f454affd7e8149dc5bd4f6f3cf1ef3872a1 from cluster 10.0.18.147:7007
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
/usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis/client.rb:121:in `call': ERR unknown command 'shutdown' (Redis::CommandError)
from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:306:in `shutdown'
from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis/client.rb:293:in `with_reconnect'
from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:304:in `shutdown'
from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:58:in `synchronize'
from /usr/lib/ruby/1.8/monitor.rb:242:in `mon_synchronize'
from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:58:in `synchronize'
from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:303:in `shutdown'
from ./bin/redis-trib.rb:1389:in `delnode_cluster_cmd'
from ./bin/redis-trib.rb:1695:in `send'
from ./bin/redis-trib.rb:1695
Note: the "from" lines are a Ruby stack trace. As its last step redis-trib.rb sends SHUTDOWN to the removed node, but the configuration used here renames SHUTDOWN to XSSHUTDOWN, so the node rejects it as an unknown command. CLUSTER FORGET had already been sent at that point, so the slave is still removed from the cluster; only the redis-server process on port 7007 keeps running and has to be stopped by hand.
Check:
#./bin/redis-cli -h 10.0.18.146 -p 7002 cluster nodes
5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 master - 0 1476955092868 7 connected 0-499
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476955091865 6 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 myself,master - 0 0 3 connected 5461-10922
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476955093871 5 connected 500-5460
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476955094872 5 connected
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476955094872 2 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 master - 0 1476955095874 1 connected 10923-16383
The slave 10.0.18.147:7007 is no longer part of the cluster.
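The "unknown command 'shutdown'" error in this output is consistent with the rename-command lines in the configuration from step 1, where SHUTDOWN is renamed to XSSHUTDOWN. A small lookup like the following can tell which name a command currently goes by on a given instance; the function name renamed_cmd is hypothetical and the sketch assumes the one-line rename-command syntax used in this article's configuration files.

```shell
# renamed_cmd CONF COMMAND: print the new name COMMAND has been given by a
# rename-command line in CONF; prints nothing if it was not renamed.
# Hypothetical helper for the configuration layout used in this article.
renamed_cmd() {
    awk -v cmd="$2" '
        tolower($1) == "rename-command" && toupper($2) == toupper(cmd) {
            gsub(/"/, "", $3)      # strip the surrounding quotes
            print $3
        }' "$1"
}

# Example: stop the removed 7007 instance by hand using the renamed command.
#   redis-cli -h 10.0.18.147 -p 7007 "$(renamed_cmd conf/cluster_7007.conf SHUTDOWN)"
```

With the configuration shown earlier the lookup prints XSSHUTDOWN, which is the command the leftover process on port 7007 would still accept.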
Now remove the master 10.0.18.147:7006:
#./bin/redis-trib.rb del-node 10.0.18.147:7006 5cbec92f52fbe9da7cd643cd34e87055c5358018
>>> Removing node 5cbec92f52fbe9da7cd643cd34e87055c5358018 from cluster 10.0.18.147:7006
[ERR] Node 10.0.18.147:7006 is not empty! Reshard data away and try again.
The removal fails because this master is not empty: it still owns slots. Its slots have to be moved to another master first.
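Whether a node still owns slots can be checked before calling del-node. The sketch below reads the output of cluster nodes and succeeds only if the given node is listed without any slot ranges; the function name node_is_empty is made up, and the rule "no fields after the eighth column" is taken from the listings in this article.

```shell
# node_is_empty NODE_ID: read `cluster nodes` output on stdin; exit status 0
# if NODE_ID is present and owns no slots, non-zero otherwise.
# Hypothetical helper based on the column layout shown in this article.
node_is_empty() {
    awk -v id="$1" '
        $1 == id { found = 1; if (NF > 8) slots = 1 }
        END { exit ((found && !slots) ? 0 : 1) }'
}

# Example:
#   if ./bin/redis-cli -h 10.0.18.146 -p 7002 cluster nodes |
#          node_is_empty 5cbec92f52fbe9da7cd643cd34e87055c5358018; then
#       ./bin/redis-trib.rb del-node 10.0.18.147:7006 5cbec92f52fbe9da7cd643cd34e87055c5358018
#   fi
```

The check treats an unknown node ID as "not empty" so that a typo in the ID never leads to a delete attempt.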
将10.1.18.147:7006这个master节点的slot转移到10.0.18.147:7004上,如下:
./bin/redis-trib.rb reshard 10.0.18.147:7006 (这个是搞砸的,不要按照这个操作!) >>> Performing Cluster Check (using node 10.0.18.147:7006) M: 5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 slots:0-499 (500 slots) master #10.0.18.147:7006节点的slot数量是500 0 additional replica(s) S: b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slots: (0 slots) slave replicates 2e2f563a1467c2515857d70d762c26b06faabbd4 M: c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 slots:500-5460 (4961 slots) master 1 additional replica(s) S: 4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slots: (0 slots) slave replicates c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b M: 2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 slots:10923-16383 (5461 slots) master 1 additional replica(s) S: 37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slots: (0 slots) slave replicates faee6763e5f591d9d63df8d841041255ae7992f3 M: faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 slots:5461-10922 (5462 slots) master 1 additional replica(s) [OK] All nodes agree about slots configuration. >>> Check for open slots... >>> Check slots coverage... [OK] All 16384 slots covered. How many slots do you want to move (from 1 to 16384)? 500 #输入500 What is the receiving node ID? 5cbec92f52fbe9da7cd643cd34e87055c5358018 #输入10.0.18.147:7006节点的node id Please enter all the source node IDs. Type 'all' to use all the nodes as source nodes for the hash slots. Type 'done' once you entered all the source nodes IDs. 
Source node #1:c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b     #this is the mistake: I meant 10.0.18.147:7004 to receive the 500 slots, but entering it here makes it the source
Source node #2:done     #enter done
………………
Moving slot 993 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 994 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 995 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 996 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 997 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 998 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Moving slot 999 from c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
Do you want to proceed with the proposed reshard plan (yes/no)? yes
………………
Moving slot 996 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 997 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 998 from 10.0.18.147:7004 to 10.0.18.147:7006:
Moving slot 999 from 10.0.18.147:7004 to 10.0.18.147:7006:
I got this wrong. I meant to move the 500 slots on 10.0.18.147:7006 over to 10.0.18.147:7004, but did it the other way round, so 10.0.18.147:7006 now holds slots 0-999, as shown below:
#./bin/redis-cli -h 10.0.18.146 -p 7002 cluster nodes
5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 master - 0 1476956724238 7 connected 0-999
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476956726241 6 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 myself,master - 0 0 3 connected 5461-10922
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476956724237 5 connected 1000-5460
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476956727243 5 connected
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476956725239 2 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 master - 0 1476956726241 1 connected 10923-16383
Please forgive my mistake! Let's do it again:
#./bin/redis-trib.rb reshard 10.0.18.147:7006
>>> Performing Cluster Check (using node 10.0.18.147:7006)
M: 5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006
   slots:0-999 (1000 slots) master     #node 10.0.18.147:7006 now holds 1000 slots
   0 additional replica(s)
S: b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001
   slots: (0 slots) slave
   replicates 2e2f563a1467c2515857d70d762c26b06faabbd4
M: c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004
   slots:1000-5460 (4461 slots) master
   1 additional replica(s)
S: 4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003
   slots: (0 slots) slave
   replicates c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b
M: 2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005
   slots: (0 slots) slave
   replicates faee6763e5f591d9d63df8d841041255ae7992f3
M: faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1000     #enter 1000
What is the receiving node ID? c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b     #enter the node id of 10.0.18.147:7004, because that node receives the 1000 slots
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:5cbec92f52fbe9da7cd643cd34e87055c5358018     #enter the node id of 10.0.18.147:7006, because the slots are moved away from this node
Source node #2:done     #enter done
…………
Moving slot 996 from 5cbec92f52fbe9da7cd643cd34e87055c5358018
Moving slot 997 from 5cbec92f52fbe9da7cd643cd34e87055c5358018
Moving slot 998 from 5cbec92f52fbe9da7cd643cd34e87055c5358018
Moving slot 999 from 5cbec92f52fbe9da7cd643cd34e87055c5358018
Do you want to proceed with the proposed reshard plan (yes/no)? yes     #enter yes
………………
Moving slot 996 from 10.0.18.147:7006 to 10.0.18.147:7004:
Moving slot 997 from 10.0.18.147:7006 to 10.0.18.147:7004:
Moving slot 998 from 10.0.18.147:7006 to 10.0.18.147:7004:
Moving slot 999 from 10.0.18.147:7006 to 10.0.18.147:7004:
Check the cluster information:
#./bin/redis-cli -h 10.0.18.146 -p 7002 cluster nodes
5cbec92f52fbe9da7cd643cd34e87055c5358018 10.0.18.147:7006 master - 0 1476957466206 7 connected
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476957469212 6 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 myself,master - 0 0 3 connected 5461-10922
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476957467206 8 connected 0-5460
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476957469211 8 connected
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476957468209 2 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 master - 0 1476957470214 1 connected 10923-16383
The master node 10.0.18.147:7006 no longer holds any slots, so it can now be deleted, as follows:
#./bin/redis-trib.rb del-node 10.0.18.147:7006 5cbec92f52fbe9da7cd643cd34e87055c5358018
>>> Removing node 5cbec92f52fbe9da7cd643cd34e87055c5358018 from cluster 10.0.18.147:7006
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
/usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis/client.rb:121:in `call': ERR unknown command 'shutdown' (Redis::CommandError)
        from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:306:in `shutdown'
        from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis/client.rb:293:in `with_reconnect'
        from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:304:in `shutdown'
        from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:58:in `synchronize'
        from /usr/lib/ruby/1.8/monitor.rb:242:in `mon_synchronize'
        from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:58:in `synchronize'
        from /usr/lib/ruby/gems/1.8/gems/redis-3.3.1/lib/redis.rb:303:in `shutdown'
        from ./bin/redis-trib.rb:1389:in `delnode_cluster_cmd'
        from ./bin/redis-trib.rb:1695:in `send'
        from ./bin/redis-trib.rb:1695
Note: this error most likely comes from the rename-command settings in the configuration files used here, which rename SHUTDOWN to XSSHUTDOWN, so the SHUTDOWN that redis-trib.rb sends as its last step is rejected. The CLUSTER FORGET messages had already been sent by then, so the node is still removed from the cluster, but its redis-server process is probably still running and has to be stopped by hand.
After the deletion, check the node information again:
#./bin/redis-cli -h 10.0.18.146 -p 7002 cluster nodes
37ea68d5cfd3953bde636656dd2250d5f6a7029d 10.0.18.147:7005 slave faee6763e5f591d9d63df8d841041255ae7992f3 0 1476957619621 6 connected
faee6763e5f591d9d63df8d841041255ae7992f3 10.0.18.146:7002 myself,master - 0 0 3 connected 5461-10922
c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 10.0.18.147:7004 master - 0 1476957618620 8 connected 0-5460
4eeee08df0ee4ee0db34b39bf6922a24edb356d8 10.0.18.146:7003 slave c1ea9a2adaa3f9790ccf29227293c9bafc6efb7b 0 1476957619621 8 connected
b3798ae7f13e4fca57e7a505eca0dae1f19ce4c3 10.0.18.145:7001 slave 2e2f563a1467c2515857d70d762c26b06faabbd4 0 1476957616615 2 connected
2e2f563a1467c2515857d70d762c26b06faabbd4 10.0.18.145:7000 master - 0 1476957618620 1 connected 10923-16383
As you can see, the master node 10.0.18.147:7006 is no longer part of the cluster!
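The slot counts that were checked by eye in the cluster nodes output (1000 slots on 10.0.18.147:7006 after the botched run, none right before del-node) can also be computed by a small script. What follows is only a sketch under stated assumptions: the function name count_slots is made up for this illustration, and it assumes the cluster nodes layout shown in this article, where a node's slot ranges start at the ninth field.

```shell
#!/bin/sh
# count_slots NODE_ID (hypothetical helper, not part of redis)
# Reads `cluster nodes` output on stdin and prints how many hash slots the
# node with the given ID owns. Returns a non-zero status if the ID is absent.
count_slots() {
    awk -v id="$1" '
        $1 == id {
            total = 0
            # Assumption: fields 1-8 are id, address, flags, master, ping-sent,
            # pong-recv, config-epoch and link-state; slots follow from field 9.
            for (i = 9; i <= NF; i++) {
                if ($i ~ /^\[/) continue        # skip entries for slots being migrated
                n = split($i, r, "-")
                if (n == 2)
                    total += r[2] - r[1] + 1    # a range such as 0-999
                else
                    total += 1                  # a single slot number
            }
            print total
            found = 1
        }
        END { if (!found) exit 1 }
    '
}

# Example use against a live cluster (not executed here):
#   ./bin/redis-cli -h 10.0.18.146 -p 7002 cluster nodes \
#       | count_slots 5cbec92f52fbe9da7cd643cd34e87055c5358018
```

Run before and after a reshard, a check like this makes it easier to notice that slots moved in the wrong direction, and a result of 0 for a master indicates that it no longer holds slots and is ready for del-node.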
Reference: http://blog.csdn.net/huwei2003/article/details/50973967
If anything here is lacking or wrong, please point it out!