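After the project went live, we found that SESSION synchronization was broken (the previous project had been managed by WAS, which took care of this). Tomcat did not give us this out of the box, so a session cluster had to be set up in the current environment. After some research, the most widely accepted solution turned out to be a redis cluster, hence today's notes.
The first thing to understand is that redis cluster is decentralized. After skimming a few documents I venture one conclusion: the number of redis nodes in a cluster has no necessary relationship to the number of machines (so you can run several redis nodes on one machine). For high availability, though, spreading the nodes across different machines is safer, because hardware failures cannot be ruled out.
Here I simulate six nodes on a single machine, using a local virtual machine with IP 192.168.5.131.
Create six nodes: 192.168.5.131:7001 192.168.5.131:7002 192.168.5.131:7003 192.168.5.131:7004 192.168.5.131:7005 192.168.5.131:7006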
Create the user rds, create the directory /rds to hold everything installed here, and give rds ownership of that directory
useradd rds
mkdir /rds
chown -R rds /rds
Give rds ownership of /etc/profile (later steps append PATH entries to it)
chown -R rds /etc/profile
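Open network access (this is a single machine, so only access between the application and the nodes above is needed)
With two machines (192.168.0.1:7001, 192.168.0.1:7002, 192.168.0.1:7003, 192.168.0.2:7004, 192.168.0.2:7005, 192.168.0.2:7006), besides opening access between the application and both machines, the following network policies must also be opened. These are the cluster bus ports, which by default are the node port plus 10000.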
192.168.0.1 -> 192.168.0.2:17004
192.168.0.1 -> 192.168.0.2:17005
192.168.0.1 -> 192.168.0.2:17006
192.168.0.2 -> 192.168.0.1:17001
192.168.0.2 -> 192.168.0.1:17002
192.168.0.2 -> 192.168.0.1:17003
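The rule list above can be derived rather than typed by hand. The following is a minimal sketch, assuming the default convention that the cluster bus port is the client port plus 10000; the function names `bus_port` and `print_rules` are made up for illustration.

```shell
#!/bin/sh
# Cluster bus port = client port + 10000 (default Redis Cluster convention).
bus_port() {
    echo $(( $1 + 10000 ))
}

# Two-machine layout taken from the example above.
HOST_A=192.168.0.1; PORTS_A="7001 7002 7003"
HOST_B=192.168.0.2; PORTS_B="7004 7005 7006"

# print_rules SRC DST PORT...: one line per bus port that SRC must reach on DST.
print_rules() {
    src=$1; dst=$2; shift 2
    for port in "$@"; do
        echo "$src -> $dst:$(bus_port "$port")"
    done
}

rules=$(
    print_rules "$HOST_A" "$HOST_B" $PORTS_B
    print_rules "$HOST_B" "$HOST_A" $PORTS_A
)
echo "$rules"
```

The output is only a checklist for whoever manages the firewall; it does not change any network policy itself.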
Prepare the following packages
- ruby
- zlib
- rubygems
- redisapi (the redis gem for ruby)
- tcl
- redis
Download: https://download.csdn.net/download/a116475939/11135926 (I have already bundled them together)
Upload the files above to /rds
Create six redis config files
Each file is named redis.conf
daemonize yes
pidfile /var/run/redis_7001.pid
logfile "/rds/redis/log/7001.log"
port 7001
bind 192.168.5.131
cluster-enabled yes
cluster-config-file nodes_7001.conf
cluster-node-timeout 5000
appendonly yes
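The port, the IP, and the values derived from the port (pidfile, logfile, cluster-config-file) must be changed by hand; make sure they are different for every node!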
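Editing six nearly identical files by hand is error-prone, so the template above can be stamped out in a loop. This is a minimal sketch: it writes into a scratch directory created with `mktemp -d` (an assumption, so nothing under /rds is touched), and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch: generate one redis.conf per node from the template above.
IP=192.168.5.131          # node IP used in this walkthrough
OUT=$(mktemp -d)          # assumption: scratch dir instead of /rds/redis/etc/redis-cluster

port=7001
while [ "$port" -le 7006 ]; do
    mkdir -p "$OUT/$port"
    # Only the port-dependent values and the bind address vary per node.
    cat > "$OUT/$port/redis.conf" <<EOF
daemonize yes
pidfile /var/run/redis_${port}.pid
logfile "/rds/redis/log/${port}.log"
port ${port}
bind ${IP}
cluster-enabled yes
cluster-config-file nodes_${port}.conf
cluster-node-timeout 5000
appendonly yes
EOF
    port=$((port + 1))
done

ls "$OUT"
```

Note that the template sets no `dir`, so each node keeps appendonly.aof and its nodes_*.conf in whatever directory it was started from, and nodes started from the same directory would end up sharing one appendonly.aof. Adding a per-node `dir` line is one way around that, though it is not part of the original setup.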
Install ruby
tar zxf ruby-2.3.1.tar.gz
mkdir /rds/ruby
cd ruby-2.3.1
./configure --prefix=/rds/ruby
make && make install
echo 'export PATH=$PATH:/rds/ruby/bin' >> /etc/profile
source /etc/profile
After all that, run
ruby -v
and you should see
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
ruby is installed
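This guide appends a PATH line to /etc/profile in several steps, and re-running a step would append the same line again. Below is a minimal sketch of an append that only happens once; it works on a temporary file standing in for /etc/profile (an assumption, so the real file is untouched), and `add_path` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: append a PATH entry to a profile file only if it is not there yet.
PROFILE=$(mktemp)         # assumption: stand-in for /etc/profile

add_path() {
    # The escaped \$PATH stays literal in the file, so it is expanded at login
    # time rather than frozen to the value it has right now.
    line="export PATH=\$PATH:$1"
    # -x: match the whole line, -F: treat the pattern as a fixed string.
    grep -qxF "$line" "$PROFILE" || echo "$line" >> "$PROFILE"
}

add_path /rds/ruby/bin
add_path /rds/ruby/bin    # second call finds the line and does nothing
cat "$PROFILE"
```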
Install zlib
mkdir /rds/zlib
tar -vxf /rds/zlib-1.2.11.tar.gz -C /rds/zlib
cd /rds/zlib/zlib-1.2.11/
./configure --prefix=/rds/zlib
make && make install
cd /rds/ruby-2.3.1/ext/zlib/
ruby extconf.rb --with-zlib-include=/rds/zlib/include/ --with-zlib-lib=/rds/zlib/lib
make && make install
Install rubygems
mkdir /rds/rubygems
tar zxf /rds/rubygems-2.7.6.tgz -C /rds/rubygems
cd /rds/rubygems/rubygems-2.7.6
ruby setup.rb
echo 'export PATH=$PATH:/rds/rubygems/rubygems-2.7.6/bin' >> /etc/profile
source /etc/profile
Install redisapi (the redis gem)
cd /rds
gem install -l redis-3.3.0.gem
gem list redis
Install tcl; the various warnings that show up along the way can be ignored
mkdir /rds/tcl
cd /rds
tar zxf tcl8.6.0-src.tar.gz -C /rds/tcl
cd /rds/tcl/tcl8.6.0/unix
./configure --prefix=/rds/tcl
make && make install
make install-private-headers
ln -v -sf tclsh8.6 /rds/tcl/bin/tclsh
echo 'export PATH=$PATH:/rds/tcl/bin' >> /etc/profile
source /etc/profile
Install redis
mkdir /rds/redis
cd /rds
tar zxf redis-3.2.0.tar.gz -C /rds/redis
cd /rds/redis/redis-3.2.0/src
Set your own install directory
vi Makefile (set PREFIX?=/rds/redis)
Run the build and install
make && make test && make install
Create the cluster directories
mkdir -p /rds/redis/etc/redis-cluster/{7001,7002,7003,7004,7005,7006}
Put the redis.conf files edited above into these six folders, one per port, then create the log directory
mkdir /rds/redis/log
Then start the six nodes
/rds/redis/bin/redis-server /rds/redis/etc/redis-cluster/7001/redis.conf
/rds/redis/bin/redis-server /rds/redis/etc/redis-cluster/7002/redis.conf
/rds/redis/bin/redis-server /rds/redis/etc/redis-cluster/7003/redis.conf
/rds/redis/bin/redis-server /rds/redis/etc/redis-cluster/7004/redis.conf
/rds/redis/bin/redis-server /rds/redis/etc/redis-cluster/7005/redis.conf
/rds/redis/bin/redis-server /rds/redis/etc/redis-cluster/7006/redis.conf
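The six start commands differ only in the port, so they can be produced by a loop. This is a minimal sketch using the paths from this guide; the `DRY_RUN` switch and the `start_nodes` name are assumptions added for illustration, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: build the six start commands in a loop instead of typing them.
REDIS_SERVER=/rds/redis/bin/redis-server
CONF_ROOT=/rds/redis/etc/redis-cluster
DRY_RUN=${DRY_RUN:-1}     # assumption: 1 = print only, 0 = actually start

start_nodes() {
    port=7001
    while [ "$port" -le 7006 ]; do
        cmd="$REDIS_SERVER $CONF_ROOT/$port/redis.conf"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
        port=$((port + 1))
    done
}

start_nodes
```

Running it with `DRY_RUN=0` in the environment would launch the nodes for real, provided redis is installed under /rds/redis as described.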
Checking the processes now shows that the servers are running
ps -ef | grep redis-server
rds 75181 1 0 02:25 ? 00:00:00 /rds/redis/bin/redis-server 192.168.5.131:7001 [cluster]
rds 75185 1 0 02:25 ? 00:00:00 /rds/redis/bin/redis-server 192.168.5.131:7002 [cluster]
rds 75189 1 0 02:25 ? 00:00:00 /rds/redis/bin/redis-server 192.168.5.131:7003 [cluster]
rds 75193 1 0 02:25 ? 00:00:00 /rds/redis/bin/redis-server 192.168.5.131:7004 [cluster]
rds 75197 1 0 02:25 ? 00:00:00 /rds/redis/bin/redis-server 192.168.5.131:7005 [cluster]
rds 75201 1 0 02:25 ? 00:00:00 /rds/redis/bin/redis-server 192.168.5.131:7006 [cluster]
rds 75205 36563 0 02:25 pts/0 00:00:00 grep --color=auto redis-server
Copy the cluster script over and run it (for nodes on different machines, substitute your own IPs)
mkdir /rds/bin
cp /rds/redis/redis-3.2.0/src/redis-trib.rb /rds/bin
/rds/bin/redis-trib.rb create --replicas 1 192.168.5.131:7001 192.168.5.131:7002 192.168.5.131:7003 192.168.5.131:7004 192.168.5.131:7005 192.168.5.131:7006
It then prompts
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
192.168.5.131:7001
192.168.5.131:7002
192.168.5.131:7003
Adding replica 192.168.5.131:7004 to 192.168.5.131:7001
Adding replica 192.168.5.131:7005 to 192.168.5.131:7002
Adding replica 192.168.5.131:7006 to 192.168.5.131:7003
M: 3f7c78854b2586dbd5eac4a88b908c77a45c213f 192.168.5.131:7001
slots:0-5460 (5461 slots) master
M: 32d3e2b63a8e408790f220bd304d89f262a32d97 192.168.5.131:7002
slots:5461-10922 (5462 slots) master
M: c9e44b0d6d6fc5aa1628c5dc5438eee454a3595d 192.168.5.131:7003
slots:10923-16383 (5461 slots) master
S: 576f072139166640c83c1a5a1d72cb383bd0223c 192.168.5.131:7004
replicates 3f7c78854b2586dbd5eac4a88b908c77a45c213f
S: acb73e4a2d516409a3f672bad843a1d5a7b28500 192.168.5.131:7005
replicates 32d3e2b63a8e408790f220bd304d89f262a32d97
S: 7b825c7e731e3063db48b4fc2e70629f9e329241 192.168.5.131:7006
replicates c9e44b0d6d6fc5aa1628c5dc5438eee454a3595d
Can I set the above configuration? (type 'yes' to accept):
Type yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join.
>>> Performing Cluster Check (using node 192.168.5.131:7001)
M: 3f7c78854b2586dbd5eac4a88b908c77a45c213f 192.168.5.131:7001
slots:0-5460 (5461 slots) master
M: 32d3e2b63a8e408790f220bd304d89f262a32d97 192.168.5.131:7002
slots:5461-10922 (5462 slots) master
M: c9e44b0d6d6fc5aa1628c5dc5438eee454a3595d 192.168.5.131:7003
slots:10923-16383 (5461 slots) master
M: 576f072139166640c83c1a5a1d72cb383bd0223c 192.168.5.131:7004
slots: (0 slots) master
replicates 3f7c78854b2586dbd5eac4a88b908c77a45c213f
M: acb73e4a2d516409a3f672bad843a1d5a7b28500 192.168.5.131:7005
slots: (0 slots) master
replicates 32d3e2b63a8e408790f220bd304d89f262a32d97
M: 7b825c7e731e3063db48b4fc2e70629f9e329241 192.168.5.131:7006
slots: (0 slots) master
replicates c9e44b0d6d6fc5aa1628c5dc5438eee454a3595d
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
This means the cluster has been set up
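The slot ranges in the output come from dividing the 16384 hash slots as evenly as possible among the masters. The sketch below reproduces the three ranges shown above with integer arithmetic; it is an illustration of the split, not a copy of the redis-trib.rb code, and `slot_ranges` is a made-up name.

```shell
#!/bin/sh
# Sketch: split the 16384 hash slots evenly across N masters.
TOTAL=16384

slot_ranges() {
    n=$1; i=0; first=0
    while [ "$i" -lt "$n" ]; do
        if [ "$i" -eq $((n - 1)) ]; then
            last=$((TOTAL - 1))       # the last master takes whatever remains
        else
            # round((i+1)*TOTAL/n - 1), written with integers only
            last=$(( (2 * ((i + 1) * TOTAL - n) + n) / (2 * n) ))
        fi
        echo "$first-$last ($((last - first + 1)) slots)"
        first=$((last + 1))
        i=$((i + 1))
    done
}

slot_ranges 3
```

With three masters this prints 0-5460, 5461-10922 and 10923-16383, matching the allocation reported during cluster creation.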
Connect to one of the nodes
/rds/redis/bin/redis-cli -c -h 192.168.5.131 -p 7001
Print the cluster info and the current node list
cluster info
cluster nodes
The result looks like this
192.168.5.131:7001> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_sent:630
cluster_stats_messages_received:630
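For scripted checks, the `cluster info` output can be reduced to a yes/no answer. The following is a minimal sketch that parses a sample copied from the output above; on a live cluster the text could instead come from `redis-cli -h 192.168.5.131 -p 7001 cluster info`. The helper name `get_field` is made up.

```shell
#!/bin/sh
# Sketch: decide whether a cluster looks healthy from `cluster info` text.
info='cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3'

# get_field NAME: print the value of one "name:value" line.
# tr strips carriage returns in case the text came straight from redis-cli.
get_field() {
    echo "$info" | tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

if [ "$(get_field cluster_state)" = "ok" ] &&
   [ "$(get_field cluster_slots_ok)" = "16384" ]; then
    echo "cluster healthy: $(get_field cluster_known_nodes) nodes, $(get_field cluster_size) masters"
else
    echo "cluster NOT healthy"
fi
```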
The master/replica relationships are all visible at a glance
32d3e2b63a8e408790f220bd304d89f262a32d97 192.168.5.131:7002 master - 0 1555925720332 2 connected 5461-10922
acb73e4a2d516409a3f672bad843a1d5a7b28500 192.168.5.131:7005 slave 32d3e2b63a8e408790f220bd304d89f262a32d97 0 1555925720835 5 connected
3f7c78854b2586dbd5eac4a88b908c77a45c213f 192.168.5.131:7001 myself,master - 0 0 1 connected 0-5460
7b825c7e731e3063db48b4fc2e70629f9e329241 192.168.5.131:7006 slave c9e44b0d6d6fc5aa1628c5dc5438eee454a3595d 0 1555925721842 6 connected
c9e44b0d6d6fc5aa1628c5dc5438eee454a3595d 192.168.5.131:7003 master - 0 1555925720332 3 connected 10923-16383
576f072139166640c83c1a5a1d72cb383bd0223c 192.168.5.131:7004 slave 3f7c78854b2586dbd5eac4a88b908c77a45c213f 0 1555925721339 4 connected
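The node list is easier to read once each replica is matched to its master's address instead of its ID. Here is a minimal sketch that does this with awk on the six lines above; it assumes the Redis 3.2 line format shown here, where the second field is a plain ip:port.

```shell
#!/bin/sh
# Sketch: pair each replica with its master using `cluster nodes` output.
nodes='32d3e2b63a8e408790f220bd304d89f262a32d97 192.168.5.131:7002 master - 0 1555925720332 2 connected 5461-10922
acb73e4a2d516409a3f672bad843a1d5a7b28500 192.168.5.131:7005 slave 32d3e2b63a8e408790f220bd304d89f262a32d97 0 1555925720835 5 connected
3f7c78854b2586dbd5eac4a88b908c77a45c213f 192.168.5.131:7001 myself,master - 0 0 1 connected 0-5460
7b825c7e731e3063db48b4fc2e70629f9e329241 192.168.5.131:7006 slave c9e44b0d6d6fc5aa1628c5dc5438eee454a3595d 0 1555925721842 6 connected
c9e44b0d6d6fc5aa1628c5dc5438eee454a3595d 192.168.5.131:7003 master - 0 1555925720332 3 connected 10923-16383
576f072139166640c83c1a5a1d72cb383bd0223c 192.168.5.131:7004 slave 3f7c78854b2586dbd5eac4a88b908c77a45c213f 0 1555925721339 4 connected'

pairs=$(echo "$nodes" | awk '
    { addr[$1] = $2 }                       # node id -> ip:port
    $3 ~ /slave/ { master_of[$2] = $4 }     # replica ip:port -> master id
    END {
        for (r in master_of) print addr[master_of[r]] " <- " r
    }' | sort)
echo "$pairs"
```

Each output line reads as master <- replica, so 7001 is paired with 7004, 7002 with 7005, and 7003 with 7006, the same pairing redis-trib.rb announced when the cluster was created.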
If creating the cluster fails with this error
Node 192.168.5.131:7001 is not empty
then connect to each node in turn (replace 700X with each port) and run
/rds/redis/bin/redis-cli -c -h 192.168.5.131 -p 700X
flushall
cluster reset
/rds/redis/bin/redis-cli -h 192.168.5.131 -p 700X shutdown
Then delete the related data files from the directory the nodes were started in
cd /rds/redis
rm -rf appendonly.aof nodes_*
Start the six nodes again, then create the cluster once more
/rds/bin/redis-trib.rb create --replicas 1 192.168.5.131:7001 192.168.5.131:7002 192.168.5.131:7003 192.168.5.131:7004 192.168.5.131:7005 192.168.5.131:7006
make test fails while installing redis
[err]: Test replication partial resync: ok psync (diskless: no, reconnect: 1) in tests/integration/replication-psync.tcl
Expected condition '[s -1 sync_partial_ok] > 0' to be true ([s -1 sync_partial_ok] > 0)
[err]: Test replication partial resync: ok psync (diskless: yes, reconnect: 1) in tests/integration/replication-psync.tcl
Expected condition '[s -1 sync_partial_ok] > 0' to be true ([s -1 sync_partial_ok] > 0)
Cleanup: may take some time... OK
make: *** [test] Error 1
Solution
https://blog.csdn.net/chenggong2dm/article/details/51911053
Other common problems will be added here as they come up. To be continued.