Redis 4 Cluster

Configure and start the Redis instances

A typical setup uses three servers, each running two instances, so that the servers back each other up as masters and slaves.

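  1. Download the Redis 4.0.9 source and compile it.
  2. Configure two instances on each server, distinguished by port, each in its own directory (7001 and 7002 in the example below; the cluster built later uses 7000/7001 on 10.30.16.202, 7002/7003 on 10.30.16.203 and 7004/7005 on 10.30.16.204):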
mkdir -p 7001 
mkdir -p 7002 

In each of the 7001 and 7002 directories, copy redis.conf and edit it. The settings to change are (shown for port 7001):

port 7001
cluster-enabled yes
bind 10.30.16.202
cluster-config-file nodes-7001.conf
cluster-node-timeout 5000
appendonly yes

  3. Start the instance:
./src/redis-server 7001/redis.conf > redis-server-7001.out 2>&1 &

The log is written to redis-server-7001.out.
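For readers who prefer to script the per-instance setup, here is a minimal Python sketch that renders the settings shown above for any port and writes them to <port>/redis.conf. `render_conf` and `write_confs` are hypothetical helper names. Unlike the procedure above, which copies the full stock redis.conf and edits it, this sketch writes only these directives and leaves every other setting at the Redis default.

```python
from pathlib import Path


def render_conf(bind_ip: str, port: int, node_timeout_ms: int = 5000) -> str:
    """Return the cluster-related redis.conf lines for one instance."""
    lines = [
        f"port {port}",                            # must match the instance directory
        "cluster-enabled yes",
        f"bind {bind_ip}",
        f"cluster-config-file nodes-{port}.conf",  # unique per instance
        f"cluster-node-timeout {node_timeout_ms}",
        "appendonly yes",
    ]
    return "\n".join(lines) + "\n"


def write_confs(bind_ip: str, ports, base_dir: str = ".") -> list:
    """Create <base_dir>/<port>/redis.conf for each port (like mkdir -p <port>)."""
    written = []
    for port in ports:
        directory = Path(base_dir) / str(port)
        directory.mkdir(parents=True, exist_ok=True)
        path = directory / "redis.conf"
        path.write_text(render_conf(bind_ip, port))
        written.append(path)
    return written


if __name__ == "__main__":
    # Preview only; call write_confs("10.30.16.202", [7000, 7001]) to create the files.
    print(render_conf("10.30.16.202", 7000), end="")
```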

Below is the log printed after starting an instance. Its warnings show that some kernel parameters need adjusting:

88841:C 09 Jun 15:59:11.598 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
88841:C 09 Jun 15:59:11.599 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=88841, just started
88841:C 09 Jun 15:59:11.599 # Configuration loaded
88841:M 09 Jun 15:59:11.600 * No cluster configuration found, I'm 81801de023462c4b4096cf374350adb5b7100e84
88841:M 09 Jun 15:59:11.604 * Running mode=cluster, port=7002.
88841:M 09 Jun 15:59:11.604 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
88841:M 09 Jun 15:59:11.604 # Server initialized
88841:M 09 Jun 15:59:11.604 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
88841:M 09 Jun 15:59:11.604 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
88841:M 09 Jun 15:59:11.604 * Ready to accept connections

  • Parameter /proc/sys/net/core/somaxconn
    Temporary change: echo 511 > /proc/sys/net/core/somaxconn
    Permanent change: add the following to /etc/sysctl.conf, then apply it with sysctl -p (or reboot):
# for redis cluster tcp backlog warning 
net.core.somaxconn = 1024
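  • Parameter vm.overcommit_memory:
    Processes normally request memory with malloc(); the kernel decides whether enough memory is available and grants or refuses the request. Linux supports overcommitting memory, i.e. granting allocations that exceed the available RAM plus swap.
    vm.overcommit_memory has three possible values:

0 (the default): heuristic overcommit. The kernel estimates whether enough memory is available; obviously excessive requests are refused (an error is returned to the application) and the rest are granted.
1: always overcommit. The kernel performs no check and every allocation request succeeds.
2: strict accounting, no overcommit. Total committed memory may not exceed swap plus vm.overcommit_ratio percent of physical RAM, i.e. CommitLimit = swap + RAM × overcommit_ratio / 100; requests beyond that fail.
For example, with vm.overcommit_ratio = 50, 1 GB of RAM and 1 GB of swap, allocation requests start failing once about 1.5 GB has been committed.
Besides these common Linux kernel tunings, there are others that administrators can adjust as needed.

Hadoop requires this setting to be 0; Redis requires it to be 1 (see the startup warning above).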

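  • Transparent Huge Pages: many kernels enable THP by default, and it must be turned off. As root, run 'echo never > /sys/kernel/mm/transparent_hugepage/enabled', add the same line to /etc/rc.local so it survives reboots, and then restart Redis.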
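The three warnings can be checked before starting Redis. A minimal Python sketch with hypothetical function names, assuming the standard Linux paths named above: each check takes the file contents as a string, so it can be tried without root, and `commit_limit_kb` evaluates the mode-2 formula (ignoring hugetlb pages, which the kernel also subtracts).

```python
from pathlib import Path


def check_somaxconn(raw: str, tcp_backlog: int = 511) -> bool:
    # Redis warns when somaxconn is lower than its tcp-backlog (511 by default).
    return int(raw.strip()) >= tcp_backlog


def check_overcommit(raw: str) -> bool:
    # Redis asks for vm.overcommit_memory = 1.
    return raw.strip() == "1"


def thp_mode(raw: str) -> str:
    # The sysfs file lists every mode and brackets the active one,
    # e.g. "always madvise [never]".
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode in {raw!r}")


def check_thp(raw: str) -> bool:
    return thp_mode(raw) == "never"


def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int = 50) -> int:
    # Only enforced when vm.overcommit_memory = 2; hugetlb pages are ignored here.
    return swap_kb + ram_kb * overcommit_ratio // 100


CHECKS = {
    "/proc/sys/net/core/somaxconn": check_somaxconn,
    "/proc/sys/vm/overcommit_memory": check_overcommit,
    "/sys/kernel/mm/transparent_hugepage/enabled": check_thp,
}

if __name__ == "__main__":
    for path, check in CHECKS.items():
        try:
            ok = check(Path(path).read_text())
        except (OSError, ValueError) as exc:
            print(f"{path}: cannot check ({exc})")
            continue
        print(f"{path}: {'OK' if ok else 'needs tuning'}")
```

For the overcommit example, commit_limit_kb(1048576, 1048576, 50) returns 1572864 kB, i.e. the 1.5 GB limit.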

Once all six instances are running, the next step is to join them into a cluster:

# Script: startCluster.sh
./src/redis-trib.rb create --replicas 1 10.30.16.202:7000 10.30.16.202:7001 10.30.16.203:7002 10.30.16.203:7003 10.30.16.204:7004 10.30.16.204:7005

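Here --replicas 1 means that each master created from the node list gets one slave.
(If Ruby is not installed on the machine, redis-trib.rb fails with an error; see the separate article on installing Ruby.)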
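To make the --replicas 1 layout concrete, here is a simplified Python model of the placement. It is not redis-trib's actual algorithm: it simply takes the first node on each host as a master and attaches every remaining node to the master of the previous host, so no slave shares a machine with its master. For the six nodes in the command above it happens to reproduce the pairing redis-trib printed. `place` is a hypothetical name.

```python
def place(nodes):
    """Simplified anti-affinity placement (NOT redis-trib's real algorithm)."""
    by_host = {}  # dicts keep insertion order, so hosts stay in command-line order
    for node in nodes:
        host = node.rsplit(":", 1)[0]
        by_host.setdefault(host, []).append(node)
    hosts = list(by_host)
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to separate masters from replicas")
    masters = [by_host[h][0] for h in hosts]  # first node on each host
    replica_of = {}
    for i, host in enumerate(hosts):
        for node in by_host[host][1:]:
            # Attach to the master on the previous host (wrapping around).
            replica_of[node] = masters[(i - 1) % len(hosts)]
    return masters, replica_of


if __name__ == "__main__":
    nodes = ["10.30.16.202:7000", "10.30.16.202:7001", "10.30.16.203:7002",
             "10.30.16.203:7003", "10.30.16.204:7004", "10.30.16.204:7005"]
    masters, replica_of = place(nodes)
    print("masters:", masters)
    for replica, master in replica_of.items():
        print(f"Adding replica {replica} to {master}")
```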

Running the script creates the cluster:

[root@node202 redis-4.0.9]# ./startCluster.sh 
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
10.30.16.202:7000
10.30.16.203:7002
10.30.16.204:7004
Adding replica 10.30.16.203:7003 to 10.30.16.202:7000
Adding replica 10.30.16.204:7005 to 10.30.16.203:7002
Adding replica 10.30.16.202:7001 to 10.30.16.204:7004
M: e5e7bb21b75c5fd222cf8a6eb99ef19606480c99 10.30.16.202:7000
   slots:0-5460 (5461 slots) master
S: e8c848eca117a91e7166f35abdea3012cda35bbe 10.30.16.202:7001
   replicates d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60
M: 81801de023462c4b4096cf374350adb5b7100e84 10.30.16.203:7002
   slots:5461-10922 (5462 slots) master
S: efef338ae359169cbc88348dc992155471932797 10.30.16.203:7003
   replicates e5e7bb21b75c5fd222cf8a6eb99ef19606480c99
M: d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60 10.30.16.204:7004
   slots:10923-16383 (5461 slots) master
S: 8752f40fe95cb4cd65e43a76208a883ed50bba6f 10.30.16.204:7005
   replicates 81801de023462c4b4096cf374350adb5b7100e84
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join...
>>> Performing Cluster Check (using node 10.30.16.202:7000)
M: e5e7bb21b75c5fd222cf8a6eb99ef19606480c99 10.30.16.202:7000
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 8752f40fe95cb4cd65e43a76208a883ed50bba6f 10.30.16.204:7005
   slots: (0 slots) slave
   replicates 81801de023462c4b4096cf374350adb5b7100e84
S: e8c848eca117a91e7166f35abdea3012cda35bbe 10.30.16.202:7001
   slots: (0 slots) slave
   replicates d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60
S: efef338ae359169cbc88348dc992155471932797 10.30.16.203:7003
   slots: (0 slots) slave
   replicates e5e7bb21b75c5fd222cf8a6eb99ef19606480c99
M: d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60 10.30.16.204:7004
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 81801de023462c4b4096cf374350adb5b7100e84 10.30.16.203:7002
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
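The slot split in the output above (5461, 5462 and 5461 slots) comes from dividing 16384 evenly across the three masters and rounding the range boundaries. A Python sketch modeled on that even split; `alloc_slots` is a hypothetical name, and this is an illustration rather than redis-trib's own code.

```python
import math

CLUSTER_HASH_SLOTS = 16384


def alloc_slots(num_masters: int) -> list:
    """Split slots 0..16383 into one contiguous (first, last) range per master."""
    if not 1 <= num_masters <= CLUSTER_HASH_SLOTS:
        raise ValueError("need between 1 and 16384 masters")
    per_node = CLUSTER_HASH_SLOTS / num_masters
    ranges, first, cursor = [], 0, 0.0
    for i in range(num_masters):
        # floor(x + 0.5) rounds halves up for x >= 0.
        last = math.floor(cursor + per_node - 1 + 0.5)
        if i == num_masters - 1:
            last = CLUSTER_HASH_SLOTS - 1  # the last master takes the remainder
        last = max(last, first)  # every master gets at least one slot
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


if __name__ == "__main__":
    for first, last in alloc_slots(3):
        print(f"slots:{first}-{last} ({last - first + 1} slots)")
```

With three masters it prints the same three ranges as redis-trib: 0-5460, 5461-10922 and 10923-16383.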

Verification

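redis-cli -h hostname -c -p port
-h the IP address or domain name of the host to connect to;
-p the port;
-c cluster mode: follow the cluster's -MOVED/-ASK redirections to other nodes automatically.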

[root@node202 src]# ./redis-cli -h 10.30.16.202 -c -p 7001
10.30.16.202:7001> show
(error) ERR unknown command 'show'
10.30.16.202:7001> set hello world
-> Redirected to slot [866] located at 10.30.16.202:7000
OK
10.30.16.202:7000> get hello
"world"
10.30.16.202:7000> keys
(error) ERR wrong number of arguments for 'keys' command
10.30.16.202:7000> KEYS *
1) "hello"
10.30.16.202:7000> set zzz 123
-> Redirected to slot [10118] located at 10.30.16.203:7002
OK
10.30.16.203:7002> keys *
1) "zzz"
10.30.16.203:7002> KEYS *
1) "zzz"
10.30.16.203:7002> get hello
-> Redirected to slot [866] located at 10.30.16.202:7000
"world"
10.30.16.202:7000> 

As shown, although the client connected to port 7001 (a slave), each command is automatically redirected to the node and port that own the key.

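Redis Cluster does not use traditional consistent hashing to distribute data; it uses hash slots instead. A cluster has 16384 slots. When a key is set, its slot is computed as CRC16(key) % 16384, and the key is stored on the node whose slot range contains that slot. This is why, in the test above, set and get jumped to nodes on different ports.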
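The slot formula can be sketched in a few lines of Python. Redis uses the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0). The sketch also applies Redis's hash-tag rule: if a key contains a non-empty {...}, only the text inside the first such braces is hashed, which lets related keys land in the same slot. The function names are illustrative. For the two keys in the session above it gives the slots redis-cli reported (866 for hello, 10118 for zzz).

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot = CRC16(key) % 16384, hashing only a non-empty {tag} if present."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # "{}" (empty tag) falls back to the whole key
            raw = raw[start + 1:end]
    return crc16(raw) % 16384


if __name__ == "__main__":
    for key in ("hello", "zzz"):
        print(key, key_hash_slot(key))
```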

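Redis Cluster stores each key on one master and synchronizes the data between that master and its slaves. Reads are likewise routed by hash slot (not by consistent hashing) to the master that owns the key. A slave is automatically promoted to master only when its master fails.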

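Note: at least 3 master nodes are required, otherwise cluster creation fails. In addition, once the surviving masters no longer form a majority of all masters, the whole cluster stops serving requests.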
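The majority rule is simple arithmetic, and it explains the three-master minimum. This Python sketch (hypothetical helper names) counts how many masters can be lost while the rest still form a strict majority. It only models the vote count and assumes every failed master has a slave available to take over.

```python
def majority(masters: int) -> int:
    """Smallest number of masters that is a strict majority."""
    return masters // 2 + 1


def tolerated_master_failures(masters: int) -> int:
    """Masters that can be lost while the survivors still form a majority."""
    return masters - majority(masters)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} masters: majority {majority(n)}, "
              f"can lose {tolerated_master_failures(n)}")
```

With 1 or 2 masters no failure can be tolerated, while 3 masters survive the loss of one.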

Cluster status commands

cluster info: shows the cluster's state information
cluster nodes: shows information about every node in the cluster

10.30.16.202:7000> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:3619
cluster_stats_messages_pong_sent:3653
cluster_stats_messages_sent:7272
cluster_stats_messages_ping_received:3648
cluster_stats_messages_pong_received:3619
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:7272
10.30.16.202:7000> cluster nodes
8752f40fe95cb4cd65e43a76208a883ed50bba6f 10.30.16.204:7005@17005 slave 81801de023462c4b4096cf374350adb5b7100e84 0 1528539083570 6 connected
e8c848eca117a91e7166f35abdea3012cda35bbe 10.30.16.202:7001@17001 slave d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60 0 1528539084570 5 connected
efef338ae359169cbc88348dc992155471932797 10.30.16.203:7003@17003 slave e5e7bb21b75c5fd222cf8a6eb99ef19606480c99 0 1528539084570 4 connected
d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60 10.30.16.204:7004@17004 master - 0 1528539083000 5 connected 10923-16383
e5e7bb21b75c5fd222cf8a6eb99ef19606480c99 10.30.16.202:7000@17000 myself,master - 0 1528539084000 1 connected 0-5460
81801de023462c4b4096cf374350adb5b7100e84 10.30.16.203:7002@17002 master - 0 1528539083570 3 connected 5461-10922
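Both commands return plain text, so they are easy to check from a script. A minimal Python sketch with hypothetical function names: it parses the key:value lines of cluster info and the space-separated cluster nodes format shown above (id, ip:port@cport, flags, master id, ping sent, pong received, config epoch, link state, then slot ranges), and lists any slots not served by a healthy master. For the six lines above, no slot is uncovered.

```python
def parse_cluster_info(text: str) -> dict:
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value
    return info


def parse_cluster_nodes(text: str) -> list:
    nodes = []
    for line in text.strip().splitlines():
        f = line.split()
        nodes.append({
            "id": f[0],
            "addr": f[1].split("@")[0],           # drop the cluster bus port
            "flags": f[2].split(","),
            "master": None if f[3] == "-" else f[3],
            "epoch": int(f[6]),
            "link": f[7],
            "slots": f[8:],
        })
    return nodes


def uncovered_slots(nodes: list) -> set:
    """Slots not served by any master that is not flagged as failed."""
    covered = set()
    for n in nodes:
        if "master" not in n["flags"] or "fail" in n["flags"]:
            continue
        for spec in n["slots"]:
            if spec.startswith("["):  # migrating/importing markers, not owned slots
                continue
            lo, _, hi = spec.partition("-")
            covered.update(range(int(lo), int(hi or lo) + 1))
    return set(range(16384)) - covered
```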

Failover

  • Run DEBUG SEGFAULT on a master or slave
    This command deliberately crashes the instance on that port, so its process no longer appears in ps -ef | grep redis.
10.30.16.202:7000> DEBUG SEGFAULT
Could not connect to Redis at 10.30.16.202:7000: Connection refused
not connected> 
not connected> 
not connected> exit
[root@node202 src]# ./redis-cli -h 10.30.16.202 -c -p 7001
10.30.16.202:7001> cluster nodes
81801de023462c4b4096cf374350adb5b7100e84 10.30.16.203:7002@17002 master - 0 1528539342049 3 connected 5461-10922
e5e7bb21b75c5fd222cf8a6eb99ef19606480c99 10.30.16.202:7000@17000 master,fail - 1528539328099 1528539325593 1 disconnected
d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60 10.30.16.204:7004@17004 master - 0 1528539342049 5 connected 10923-16383
8752f40fe95cb4cd65e43a76208a883ed50bba6f 10.30.16.204:7005@17005 slave 81801de023462c4b4096cf374350adb5b7100e84 0 1528539341548 6 connected
efef338ae359169cbc88348dc992155471932797 10.30.16.203:7003@17003 master - 0 1528539342551 7 connected 0-5460
e8c848eca117a91e7166f35abdea3012cda35bbe 10.30.16.202:7001@17001 myself,slave d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60 0 1528539341000 2 connected

(To recover from DEBUG SEGFAULT, the crashed instance must be restarted.)

[root@node202 redis-4.0.9]# ./src/redis-cli -h 10.30.16.202 -c -p 7000
10.30.16.202:7000> cluster nodes
e5e7bb21b75c5fd222cf8a6eb99ef19606480c99 10.30.16.202:7000@17000 myself,slave efef338ae359169cbc88348dc992155471932797 0 1528539832000 1 connected
efef338ae359169cbc88348dc992155471932797 10.30.16.203:7003@17003 master - 0 1528539831268 7 connected 0-5460
8752f40fe95cb4cd65e43a76208a883ed50bba6f 10.30.16.204:7005@17005 slave 81801de023462c4b4096cf374350adb5b7100e84 0 1528539832000 6 connected
e8c848eca117a91e7166f35abdea3012cda35bbe 10.30.16.202:7001@17001 master - 0 1528539832571 8 connected 10923-16383
d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60 10.30.16.204:7004@17004 slave e8c848eca117a91e7166f35abdea3012cda35bbe 0 1528539832270 8 connected
81801de023462c4b4096cf374350adb5b7100e84 10.30.16.203:7002@17002 master - 0 1528539831268 3 connected 5461-10922

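After the restart, 7000 rejoins as a slave of 7003, which was promoted to master of slots 0-5460 when 7000 failed. 7001 also appears as a master here (holding 10923-16383, with 7004 as its slave); that is the result of the CLUSTER FAILOVER command shown below, which, judging by the timestamps, was run before this restart.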

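  • Run CLUSTER FAILOVER on a slave
    This command can only be sent to a slave. It promotes that slave to master, and its former master becomes its slave (below, 7004 now replicates 7001).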
10.30.16.202:7001> CLUSTER FAILOVER
OK
10.30.16.202:7001> 
10.30.16.202:7001> 
10.30.16.202:7001> cluster nodes
81801de023462c4b4096cf374350adb5b7100e84 10.30.16.203:7002@17002 master - 0 1528539628000 3 connected 5461-10922
e5e7bb21b75c5fd222cf8a6eb99ef19606480c99 10.30.16.202:7000@17000 master,fail - 1528539328099 1528539325593 1 disconnected
d2b2e76cdc4e7ec41b7a0002d43cbe80127f3a60 10.30.16.204:7004@17004 slave e8c848eca117a91e7166f35abdea3012cda35bbe 0 1528539628959 8 connected
8752f40fe95cb4cd65e43a76208a883ed50bba6f 10.30.16.204:7005@17005 slave 81801de023462c4b4096cf374350adb5b7100e84 0 1528539627555 6 connected
efef338ae359169cbc88348dc992155471932797 10.30.16.203:7003@17003 master - 0 1528539628557 7 connected 0-5460
e8c848eca117a91e7166f35abdea3012cda35bbe 10.30.16.202:7001@17001 myself,master - 0 1528539628000 8 connected 10923-16383
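To see what a failover changed, two cluster nodes listings can be compared. A Python sketch with hypothetical helper names: it reports nodes that were slaves in the first listing and are masters in the second, together with the slot ranges they now serve. Fed the listing taken before DEBUG SEGFAULT and the one taken right after the crash, it reports 10.30.16.203:7003 taking over slots 0-5460.

```python
def roles(text: str) -> dict:
    """Map ip:port -> (role, slot ranges) from CLUSTER NODES output."""
    out = {}
    for line in text.strip().splitlines():
        f = line.split()
        addr = f[1].split("@")[0]
        flags = f[2].split(",")
        role = "master" if "master" in flags else "slave"
        if "fail" in flags:
            role += ",fail"
        out[addr] = (role, tuple(f[8:]))
    return out


def promotions(before: str, after: str) -> list:
    """Nodes that were slaves in `before` and are healthy masters in `after`."""
    b, a = roles(before), roles(after)
    return [
        (addr, slots)
        for addr, (role, slots) in a.items()
        if role == "master" and b.get(addr, ("",))[0] == "slave"
    ]
```

Comparing the first listing with the one taken after 7000 was restarted reports two promotions instead: 7003 (0-5460) and 7001 (10923-16383).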
