Scenario: Redisson locks against a Redis cluster (redis-server) with three masters and three slaves.
The error messages were:
Unable to write command into connection! Increase connection pool size. Node source: NodeSource [slot=5073, addr=null, redisClient=null, redirect=null, entry=null], connection: RedisConnection@1161754400 [redisClient=[addr=redis://172.21.75.153:6379], channel=[id: 0x8490cd34, L:0.0.0.0/0.0.0.0:33988 ! R:172.21.75.153/172.21.75.153:6379], currentCommand=null, usage=1], command: (EXISTS), params: [redisLock_serviceLock] after 3 retry attempts
org.redisson.client.RedisConnectionException: SlaveConnectionPool no available Redis entries. Master entry host: 172.21.75.157/172.21.75.157:6379 Disconnected hosts: [172.21.75.157/172.21.75.157:6379] Hosts disconnected due to errors during `failedSlaveCheckInterval`: [172.21.75.153/172.21.75.153:6379]
org.redisson.client.RedisConnectionException: SlaveConnectionPool no available Redis entries. Master entry host: 172.21.75.157/172.21.75.157:6379 Disconnected hosts: [172.21.75.157/172.21.75.157:6379, 172.21.75.153/172.21.75.153:6379]
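The advice on the official Redisson GitHub was to upgrade, but we were already on version 20.
The errors show that node 153 could not be reached and that the failing command was EXISTS. Evidently, acquiring the lock issues an EXISTS command, and that read failed against node 153.
The redis-server log on 153 contains the errors below. 157 is the master of the failing slave, so the log shows that the slave cannot reach its master: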
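The `slot=5073` field in the first error is the hash slot the client computed for the lock key. Redis Cluster maps a key to a slot as CRC16(key) mod 16384, hashing only the part inside `{...}` when the key carries a non-empty hash tag. The sketch below is a minimal stand-alone version of that calculation (the class name `KeySlot` is made up for illustration, and it is not Redisson's own code); running it on the key from the error lets you compare the result with the slot reported there.

```java
import java.nio.charset.StandardCharsets;

final class KeySlot {
    /** CRC16, XMODEM variant (polynomial 0x1021, initial value 0), the variant Redis Cluster uses. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep the value within 16 bits
            }
        }
        return crc;
    }

    /** Returns the cluster slot (0..16383) for a key, honoring a non-empty {hash tag}. */
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) { // only a non-empty tag replaces the key
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // Key name taken from the params field of the error message.
        System.out.println(slot("redisLock_serviceLock"));
    }
}
```

Whichever node owns that slot (or, with slave reads enabled, one of its slaves) is where the EXISTS command is sent.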
3759119:S 06 May 2023 13:43:04.123 # Cluster state changed: fail
3759119:S 06 May 2023 13:43:42.325 # MASTER timeout: no data nor PING received...
3759119:S 06 May 2023 13:43:42.325 # Connection with master lost.
3759119:S 06 May 2023 13:43:42.325 * Caching the disconnected master state.
3759119:S 06 May 2023 13:43:42.325 * Connecting to MASTER 172.21.75.157:6379
3759119:S 06 May 2023 13:43:42.325 # Unable to connect to MASTER: Network is unreachable
3759119:S 06 May 2023 13:43:43.327 * Connecting to MASTER 172.21.75.157:6379
3759119:S 06 May 2023 13:43:43.327 # Unable to connect to MASTER: Network is unreachable
3759119:S 06 May 2023 13:43:44.329 * Connecting to MASTER 172.21.75.157:6379
3759119:S 06 May 2023 13:43:44.329 # Unable to connect to MASTER: Network is unreachable
3759119:S 06 May 2023 13:43:45.331 * Connecting to MASTER 172.21.75.157:6379
3759119:S 06 May 2023 13:43:45.331 # Unable to connect to MASTER: Network is unreachable
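In a long log, the repeating "Connecting to MASTER" / "Unable to connect to MASTER" pair above is easy to miss. As a rough sketch, the failed attempts against one master can be counted as follows; the class and method names are hypothetical, and the matching relies only on the two message texts seen in this log.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MasterLinkLogScan {
    private static final Pattern CONNECTING = Pattern.compile("Connecting to MASTER (\\S+)");
    private static final String FAILED = "Unable to connect to MASTER";

    /** Counts failed connection attempts whose preceding "Connecting to MASTER" line names masterAddr. */
    static int failedAttempts(List<String> lines, String masterAddr) {
        String lastTarget = null;
        int failures = 0;
        for (String line : lines) {
            Matcher m = CONNECTING.matcher(line);
            if (m.find()) {
                lastTarget = m.group(1); // remember which master this attempt targets
            } else if (line.contains(FAILED) && masterAddr.equals(lastTarget)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Four lines copied from the log above.
        List<String> sample = List.of(
            "3759119:S 06 May 2023 13:43:42.325 * Connecting to MASTER 172.21.75.157:6379",
            "3759119:S 06 May 2023 13:43:42.325 # Unable to connect to MASTER: Network is unreachable",
            "3759119:S 06 May 2023 13:43:43.327 * Connecting to MASTER 172.21.75.157:6379",
            "3759119:S 06 May 2023 13:43:43.327 # Unable to connect to MASTER: Network is unreachable");
        System.out.println(failedAttempts(sample, "172.21.75.157:6379")); // prints 2
    }
}
```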
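The redis-server log shows that one slave in the cluster was in trouble: its network connection to its own master was broken, yet it still communicated normally with the other nodes, so it was never removed from the cluster.
Redisson, however, was configured to read from slaves and write to masters, so the EXISTS command issued while acquiring the lock was sent to the faulty node and the read failed.
The fix is to configure Redisson to read and write on masters only and to stop reading from slaves, which removes the impact of an unstable slave.
In cluster mode, redis-server itself also serves reads and writes only on masters by default. You can see this by connecting with redis-cli in cluster mode (the -c option): reading or writing any key is redirected to the master that owns it and is never served by a slave. Slaves are there only to take over when a master goes down.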
// Read from masters only, so an unstable slave is never used for reads.
BaseConfig baseConfig = config.setCodec(JsonJacksonCodec.INSTANCE)
        .useClusterServers()
        .addNodeAddress(xxx)
        .setReadMode(ReadMode.MASTER);
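Why the read-mode setting matters can be shown with a toy model. The sketch below does not use Redisson and does not reproduce its connection-pool logic. `ToyReadMode`, `Node` and `Shard` are made-up names, and "usable" is a simplification standing for "the client can serve a read from this node". It models one shard whose slave is unusable while its master is healthy, which is the situation described above.

```java
import java.util.List;

final class ReadRoutingSketch {
    enum ToyReadMode { MASTER, SLAVE }

    static final class Node {
        final String addr;
        final boolean usable;
        Node(String addr, boolean usable) { this.addr = addr; this.usable = usable; }
    }

    static final class Shard {
        final Node master;
        final List<Node> slaves;
        Shard(Node master, List<Node> slaves) { this.master = master; this.slaves = slaves; }
    }

    /** Picks the node a read would go to under the given mode, or throws if none is usable. */
    static Node pickReadNode(Shard shard, ToyReadMode mode) {
        List<Node> candidates = (mode == ToyReadMode.MASTER) ? List.of(shard.master) : shard.slaves;
        for (Node n : candidates) {
            if (n.usable) {
                return n;
            }
        }
        throw new IllegalStateException("no usable node for " + mode + " reads");
    }

    public static void main(String[] args) {
        // Addresses taken from the error messages; the slave is marked unusable.
        Shard shard = new Shard(
                new Node("172.21.75.157:6379", true),
                List.of(new Node("172.21.75.153:6379", false)));
        System.out.println(pickReadNode(shard, ToyReadMode.MASTER).addr);
        try {
            pickReadNode(shard, ToyReadMode.SLAVE);
        } catch (IllegalStateException e) {
            System.out.println("SLAVE reads fail: " + e.getMessage());
        }
    }
}
```

With master-only reads the bad slave is never a candidate, at the cost of giving up read offloading to slaves.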
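After observing several master/slave failovers, the conclusion is that when a node fails or the network has problems, the Redis cluster completes the failover on its own, and the Redisson client detects the new cluster state and opens new connections to the new master or slave (this is visible in the Redisson log). The client side therefore needs no intervention.
If a problem like the one above does occur, it means a node in the cluster is faulty but has neither been removed nor failed over, so the cluster itself is in an unhealthy state.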
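One way to spot such a half-broken slave before clients hit it is to look at the output of `redis-cli info replication` on each slave, which reports `role` and `master_link_status`. The following is a minimal sketch of a check over that text; the class name is hypothetical, and the sample in `main` is a hand-written illustration rather than captured output.

```java
final class ReplicationInfoCheck {
    /** Returns true if the INFO replication text describes a slave whose link to its master is down. */
    static boolean masterLinkDown(String infoReplication) {
        String role = null;
        String link = null;
        for (String line : infoReplication.split("\\r?\\n")) {
            int idx = line.indexOf(':');
            if (idx < 0) {
                continue; // skip the "# Replication" header and blank lines
            }
            String key = line.substring(0, idx).trim();
            String value = line.substring(idx + 1).trim();
            if (key.equals("role")) {
                role = value;
            } else if (key.equals("master_link_status")) {
                link = value;
            }
        }
        return "slave".equals(role) && "down".equals(link);
    }

    public static void main(String[] args) {
        // Illustrative sample only; the master address is the one from the log above.
        String sample = String.join("\n",
                "# Replication",
                "role:slave",
                "master_host:172.21.75.157",
                "master_port:6379",
                "master_link_status:down");
        System.out.println(masterLinkDown(sample)); // prints true
    }
}
```

A slave that reports a down link for a sustained period while the cluster still lists it as a member matches the state described above.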