Redis Cluster: Deployment, Failure Simulation, and Monitoring

I. Base Environment

1) System: CentOS 7.x, Ansible

2) Hosts:

192.168.1.115 redis01
192.168.1.192 redis02
192.168.1.23 redis03
192.168.1.47 redis04
192.168.1.65 redis05
192.168.1.14 redis06
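
The Ansible commands later in this guide target a host group named redis-cluster, but the inventory itself is never shown. A minimal Python sketch that renders such an inventory from the host table above could look like this. The function name render_inventory and the one-host-per-line layout are assumptions made for illustration, not part of the original setup.

```python
# Hosts copied from the table above: (IP address, hostname).
HOSTS = [
    ("192.168.1.115", "redis01"),
    ("192.168.1.192", "redis02"),
    ("192.168.1.23", "redis03"),
    ("192.168.1.47", "redis04"),
    ("192.168.1.65", "redis05"),
    ("192.168.1.14", "redis06"),
]


def render_inventory(hosts, group="redis-cluster"):
    """Return INI-style Ansible inventory text with one host per line."""
    lines = [f"[{group}]"]
    for ip, name in hosts:
        # ansible_host lets Ansible reach the node by IP even without DNS.
        lines.append(f"{name} ansible_host={ip}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_inventory(HOSTS), end="")
```

The printed text can be saved as an inventory file and passed to ansible with -i.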

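3) Software directory: /application/redis-3.2.3.xxxx
Data directory: /data/cluster/<port>

4) Prepare the environment
Pre-installation:
Install Ruby
yum install -y rubygems   # install RubyGems
gem install redis         # install the Redis client gem
gem list
If the commands above fail with dependency errors, update the system first: yum update
Dependencies: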

yum install gcc-c++ patch readline readline-devel zlib zlib-devel   
yum install libyaml-devel libffi-devel openssl-devel make   
yum install bzip2 autoconf automake libtool bison iconv-devel sqlite-devel 
curl -L get.rvm.io | bash -s stable

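RVM is a command-line tool for installing, managing, and switching between multiple Ruby versions; it can also manage a separate gemset per project.
If no SSH key has been generated, this step may fail; generate one with ssh-keygen and run the installer again.
If it still fails, import the RVM signing key: command curl -sSL https://rvm.io/mpapis.asc | gpg2 --import -

Then install RVM again and follow the installer's prompts. Afterwards, rvm -v should show that RVM is installed.
If the installation still fails, check the network connection and try again.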
find / -name rvm -print

/usr/local/rvm 
/usr/local/rvm/src/rvm 
/usr/local/rvm/src/rvm/bin/rvm 
/usr/local/rvm/src/rvm/lib/rvm 
/usr/local/rvm/src/rvm/scripts/rvm 
/usr/local/rvm/bin/rvm 
/usr/local/rvm/lib/rvm 
/usr/local/rvm/scripts/rvm
 
source /usr/local/rvm/scripts/rvm

rvm list known

rvm install 2.3.7        # install ruby-2.3.7
rvm use 2.3.7            # switch to this Ruby version
rvm use 2.3.7 --default  # make it the default version

Software: redis-3.2.3.tar.gz
Extract and build: tar -zxvf redis-3.2.3.tar.gz
cd redis-3.2.3 && make && make install

Create the data directories:
ansible redis-cluster -m shell -a "mkdir /data/cluster"
ansible redis-cluster -m shell -a "mkdir /data/cluster/{7000..7005}/logs -p"
# This creates the data root /data/cluster on all six nodes; the subdirectories 7000-7005 are named after the ports they hold data for.

5) Write the configuration file

bind 192.168.1.115

protected-mode yes

port 7000

tcp-backlog 511
timeout 0
tcp-keepalive 300

daemonize yes

supervised no
pidfile /var/run/redis_6379.pid

loglevel notice
logfile "/data/cluster/7000/logs/redis.log"

databases 16

save 900 1
save 300 10
save 60 10000

stop-writes-on-bgsave-error yes

rdbcompression yes

rdbchecksum yes

dbfilename dump.rdb

dir ./

masterauth redis-cluster-gmjk
requirepass redis-cluster-gmjk

slave-serve-stale-data yes

slave-read-only yes

repl-diskless-sync no

repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no

slave-priority 100
appendonly yes

appendfilename "appendonly.aof"
appendfsync everysec

no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

aof-load-truncated yes
lua-time-limit 5000

cluster-enabled yes


cluster-config-file nodes.conf
cluster-node-timeout 5000

slowlog-log-slower-than 10000

slowlog-max-len 128


latency-monitor-threshold 0


notify-keyspace-events ""

hash-max-ziplist-entries 512
hash-max-ziplist-value 64

list-max-ziplist-size -2

list-compress-depth 0

set-max-intset-entries 512

zset-max-ziplist-entries 128
zset-max-ziplist-value 64

hll-sparse-max-bytes 3000

activerehashing yes

client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60

hz 10

aof-rewrite-incremental-fsync yes

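# Copy the configuration file to the other five machines, then change the port on each one (the bind address must also be set to that host's own IP).
ansible redis-cluster -m copy -a "src=/data/cluster/ dest=/data/cluster/"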
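
Instead of editing each copy by hand, the node-specific directives can be derived from the 7000 template. The Python sketch below rewrites only bind, port, and logfile. The host-to-port mapping follows the redis-trib.rb create command used in this guide, while the function name render_node_conf and the choice of which directives to rewrite are assumptions for illustration.

```python
# Host-to-port mapping as used by the redis-trib.rb create command in this guide.
NODES = {
    "192.168.1.115": 7000,
    "192.168.1.192": 7001,
    "192.168.1.23": 7002,
    "192.168.1.47": 7003,
    "192.168.1.65": 7004,
    "192.168.1.14": 7005,
}


def render_node_conf(template, ip, port):
    """Return the template text with bind, port, and logfile set for one node."""
    replacements = {
        "bind": f"bind {ip}",
        "port": f"port {port}",
        "logfile": f'logfile "/data/cluster/{port}/logs/redis.log"',
    }
    out = []
    for line in template.splitlines():
        parts = line.split(None, 1)
        directive = parts[0] if parts else ""
        # Lines whose first word is not a node-specific directive pass through unchanged.
        out.append(replacements.get(directive, line))
    return "\n".join(out) + "\n"
```

Calling render_node_conf(template_text, ip, port) for every entry in NODES yields six files that differ only in those three lines.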

II. Deploy Redis Cluster

redis01> redis-trib.rb create --replicas 1 192.168.1.115:7000 192.168.1.192:7001 192.168.1.23:7002 192.168.1.47:7003 192.168.1.65:7004 192.168.1.14:7005

# Type yes and press Enter; output like the following indicates that the deployment succeeded.

[Figure 2: redis-trib.rb create output]
# --replicas 1 means each master gets one replica

# Confirm the master/replica relationships:
[Figure 3: cluster master/replica relationships]
Master1:7000-->slave:7004
Master2:7002-->slave:7005
Master3:7001-->slave:7003
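
With three masters, the 16384 hash slots are divided into three nearly equal ranges. The sketch below is modeled on how redis-trib.rb appears to allocate slots at creation time; it is an illustration rather than the tool's actual code, and the function name split_slots is hypothetical. For three masters it yields the same ranges that the redis-trib.rb check output in section VI reports.

```python
TOTAL_SLOTS = 16384


def split_slots(master_count):
    """Return one inclusive (first, last) slot range per master."""
    per_master = TOTAL_SLOTS / master_count
    ranges = []
    first = 0
    cursor = 0.0
    for index in range(master_count):
        # Round to the nearest whole slot (half rounds up).
        last = int(cursor + per_master - 1 + 0.5)
        # The last master always takes whatever remains.
        if index == master_count - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return ranges


if __name__ == "__main__":
    for first, last in split_slots(3):
        print(f"{first}-{last} ({last - first + 1} slots)")
```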

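III. Visualization Tools

1. Relumin

2. TreeNMS
Install directory: /application/tomcat7-8088-treenms
URL: 192.168.1.23:8088/treenms
Account: Admin/treenms@1948
# Installation of the visualization and management tools is omitted here.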

IV. Possible Errors

**1) Cluster password configuration**

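	1. Setting the password (recommended)
	Method 1: add the following to redis.conf on every node in the cluster:
	masterauth passwd123
	requirepass passwd123
	Note: this method requires restarting every node.
	Method 2: set it on each running instance:
	./redis-cli -c -p 7000
	config set masterauth passwd123
	config set requirepass passwd123
	config rewrite
	Then run ./redis-cli -c -p 7001, ./redis-cli -c -p 7002, ... in turn to set the password on each remaining node.
	Note: the password must be identical on every node; otherwise redirection between nodes (Redirected) fails. Method 2 is recommended: config rewrite writes the password into redis.conf, and no restart is needed.
	After changing the password with method 2, ./redis-trib.rb check 10.104.111.174:6379 may report [ERR] Sorry, can't connect to node 10.104.111.174:6379, because no password setting was found in the redis.conf of the 6379 instance.
	2. Using redis-trib.rb commands after a password is set
	For example, ./redis-trib.rb check 127.0.0.1:7000 reports [ERR] Sorry, can't connect to node 127.0.0.1:7000
	Fix: vim /usr/local/rvm/gems/ruby-2.3.3/gems/redis-4.0.0/lib/redis/client.rb and set the password (the Ruby and gem versions in the path depend on what is installed)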
	
	class Client
	    DEFAULTS = {
	      :url => lambda { ENV["REDIS_URL"] },
	      :scheme => "redis",
	      :host => "127.0.0.1",
	      :port => 6379,
	      :path => nil,
	      :timeout => 5.0,
	      :password => "passwd123",
	      :db => 0,
	      :driver => nil,
	      :id => nil,
	      :tcp_keepalive => 0,
	      :reconnect_attempts => 1,
	      :inherit_socket => false
	    }
	
	Note: the path to client.rb can be found with: find / -name 'client.rb'
	
	Access the cluster with the password:
	./redis-cli -c -p 7000 -a passwd123
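
The -a option makes redis-cli authenticate with the AUTH command before anything else is sent. As a rough illustration of what that looks like on the wire, the sketch below encodes a command in the Redis serialization protocol (RESP) as an array of bulk strings. The helper name encode_command is hypothetical, and the password is the example value used above.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is sent as $<byte length>, CRLF, the bytes, CRLF.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_command("AUTH", "passwd123"))
```

Because every node receives its own AUTH, a client that is redirected to another node can only continue if that node accepts the same password.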


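**2) Remote access not enabled in the configuration file**
	In redis.conf, bind x.x.x.x should be an IP address on a network where all cluster nodes can reach one another.

**3) The biggest pitfall: installing Ruby**
	See item 4) under Base Environment.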

V. Simulated Failover Test

[Figure 4: cluster node status]
From the figure above:
Before the failure:
Master1:7000-->slave:7004
Master2:7002-->slave:7005
Master3:7001-->slave:7003

Kill the Redis process on 192.168.1.23:7002 with kill -9.
After the failure:
Master1:7000-->slave:7003
Master2:7001-->slave:7002
Master3:7005-->slave:7004

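After restarting the node on 192.168.1.23:
[Figure 5: cluster node status after the restart]

# The master/replica roles did not switch back and the cluster state is healthy, so a recovered node does not trigger further role changes that could affect the project's stability.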
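
The behavior of the killed shard can be summarized with a toy model: when a master fails, one of its replicas is promoted, and when the old master comes back it rejoins as a replica of the promoted node instead of reclaiming its role. The Python sketch below is a deliberately simplified model under that assumption. It ignores the election among the remaining masters and the cluster-node-timeout delay, and the function names are hypothetical.

```python
def fail_master(topology, failed):
    """Promote the first replica of a failed master; the old master goes offline."""
    replicas = topology.pop(failed)
    if not replicas:
        raise RuntimeError(f"no replica available to replace {failed}")
    promoted, remaining = replicas[0], replicas[1:]
    topology[promoted] = remaining
    return promoted


def rejoin_as_replica(topology, node, master):
    """A restarted former master rejoins as a replica of the promoted node."""
    topology[master].append(node)


if __name__ == "__main__":
    # Master port -> replica ports, as listed before the failure.
    topology = {"7000": ["7004"], "7002": ["7005"], "7001": ["7003"]}
    promoted = fail_master(topology, "7002")
    rejoin_as_replica(topology, "7002", promoted)
    print(topology)
```

In this model 7005 stays master after 7002 returns, which matches the cluster nodes output in section VI, where 7002 is listed as a slave of 7005.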

VI. Common Commands

1. redis-cli -c -h 192.168.1.115 -p 7000
# -c enables cluster mode; -h specifies the IP address to connect to
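
Cluster mode matters because every key belongs to exactly one of the 16384 slots, and a node that does not own the slot redirects the client to the node that does. The Redis Cluster specification defines the slot as CRC16 of the key modulo 16384, where only the text between the first { and the following } is hashed if that text is non-empty. A minimal Python sketch of that rule follows; the function names are chosen here for illustration.

```python
def crc16_xmodem(data):
    """CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key):
    """Map a key to one of the 16384 cluster slots, honoring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the braces replaces the full key.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384


if __name__ == "__main__":
    for key in ("foo", "{user1000}.following", "{user1000}.followers"):
        print(key, key_slot(key))
```

Keys that share a hash tag, such as the two user1000 keys, land in the same slot and therefore on the same node.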

2. 192.168.1.115:7000> auth 'redis-cluster-gmjk'
# After connecting, authenticate before running any other command.

3. cluster nodes
# After logging in, view the cluster state
e65d83a1cb4bbf1607030aae0b59b807aa09d7eb 192.168.1.23:7002 slave dd6bffa94b1d0bfba19bcb4d78b26be688767df2 0 1554775152404 7 connected
bc185eaa8aed942d8dd2b20e9fb280b852c2fa62 192.168.1.47:7003 slave c89b43c9a9fe300bac1129e4173d1b9ca8b57657 0 1554775152905 4 connected
c89b43c9a9fe300bac1129e4173d1b9ca8b57657 192.168.1.115:7000 myself,master - 0 0 1 connected 0-5460
39fd20947e7d972ef45fd127441da9a6c8322549 192.168.1.192:7001 master - 0 1554775153908 2 connected 5461-10922
dd6bffa94b1d0bfba19bcb4d78b26be688767df2 192.168.1.14:7005 master - 0 1554775152404 7 connected 10923-16383
6bc6508262b091e4fcd66864c581101782ac6a3a 192.168.1.65:7004 slave 39fd20947e7d972ef45fd127441da9a6c8322549 0 1554775153406 5 connected
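
Reading the master/replica pairs out of this output by eye is error-prone, since replicas reference their master by node ID. The Python sketch below resolves those IDs into addresses. It assumes the Redis 3.2 line layout shown above (ID, address, flags, master ID, ...), and the function name replica_map is hypothetical.

```python
def replica_map(cluster_nodes_text):
    """Return {master address: [replica addresses]} from CLUSTER NODES output."""
    rows = [line.split() for line in cluster_nodes_text.splitlines() if line.strip()]
    addr_by_id = {}
    replicas_of = {}
    # First pass: record every node's address and register the masters.
    for fields in rows:
        node_id, addr, flags = fields[0], fields[1], fields[2].split(",")
        addr_by_id[node_id] = addr
        if "master" in flags:
            replicas_of.setdefault(node_id, [])
    # Second pass: attach each replica to the master ID in the fourth field.
    for fields in rows:
        addr, flags, master_id = fields[1], fields[2].split(","), fields[3]
        if "slave" in flags:
            replicas_of.setdefault(master_id, []).append(addr)
    # Fall back to the raw ID if a master is missing from the output.
    return {addr_by_id.get(m, m): reps for m, reps in replicas_of.items()}
```

Applied to the six lines above, it pairs 192.168.1.115:7000 with 192.168.1.47:7003, 192.168.1.192:7001 with 192.168.1.65:7004, and 192.168.1.14:7005 with 192.168.1.23:7002.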

4. redis-trib.rb check 192.168.1.115:7000
# Check the cluster state without logging in

>>> Performing Cluster Check (using node 192.168.1.115:7000)
M: c89b43c9a9fe300bac1129e4173d1b9ca8b57657 192.168.1.115:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: e65d83a1cb4bbf1607030aae0b59b807aa09d7eb 192.168.1.23:7002
slots: (0 slots) slave
replicates dd6bffa94b1d0bfba19bcb4d78b26be688767df2
S: bc185eaa8aed942d8dd2b20e9fb280b852c2fa62 192.168.1.47:7003
slots: (0 slots) slave
replicates c89b43c9a9fe300bac1129e4173d1b9ca8b57657
M: 39fd20947e7d972ef45fd127441da9a6c8322549 192.168.1.192:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: dd6bffa94b1d0bfba19bcb4d78b26be688767df2 192.168.1.14:7005
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 6bc6508262b091e4fcd66864c581101782ac6a3a 192.168.1.65:7004
slots: (0 slots) slave
replicates 39fd20947e7d972ef45fd127441da9a6c8322549
[OK] All nodes agree about slots configuration.

>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
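
The final [OK] line means that every one of the 16384 slots is owned by some master; a cluster with uncovered slots cannot serve the keys that hash into them. The same check can be sketched in a few lines of Python, given the slot ranges from the M: entries above. The function name uncovered_slots is hypothetical.

```python
def uncovered_slots(ranges, total=16384):
    """Return the sorted slots not covered by any inclusive (first, last) range."""
    covered = set()
    for first, last in ranges:
        covered.update(range(first, last + 1))
    return sorted(set(range(total)) - covered)


if __name__ == "__main__":
    ranges = [(0, 5460), (5461, 10922), (10923, 16383)]
    missing = uncovered_slots(ranges)
    print("all slots covered" if not missing else f"{len(missing)} slots uncovered")
```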

Further reading:
http://www.cnblogs.com/zhoujinyi/p/6477133.html
https://blog.csdn.net/qq_41319311/article/details/80940874
https://www.cnblogs.com/linjiqin/p/7462822.html

VII. Redis Cluster Monitoring

Shell + Zabbix:
https://blog.csdn.net/meijinmeng/article/details/103378759

Python:
https://blog.csdn.net/meijinmeng/article/details/103320695
