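【0】For how Redis master-slave databases are set up, see: Redis master-slave replication
1)Role of master-slave databases in Redis: in a one-master, multi-slave Redis system, the slave databases provide redundant data backups and read/write splitting;
2)Sentinel, available since Redis 2.8: provides automated system monitoring and failover;
【1】Start the sentinel process
1)Edit the sentinel startup configuration file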
Add a sentinel.conf file to the redis_home directory and set the master database it monitors, as follows:
sentinel monitor mymaster 192.168.186.100 6379 1
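Here mymaster is the name of the monitored master database; the name is arbitrary and is bound to the address 192.168.186.100. 6379 is the master's port, and 1 is the quorum, i.e. the minimum number of sentinels that must agree that the master is down;
(The full conf file is listed at the end of this post.)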
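To make the four fields of the monitor directive explicit, here is a minimal Python sketch that splits the line above into its parts. The names MonitorDirective and parse_monitor_line are hypothetical helpers invented for this illustration; they are not part of Redis or of any client library.

```python
from typing import NamedTuple


class MonitorDirective(NamedTuple):
    """Fields of a 'sentinel monitor' line (hypothetical helper type)."""
    master_name: str
    ip: str
    port: int
    quorum: int


def parse_monitor_line(line: str) -> MonitorDirective:
    """Split 'sentinel monitor <name> <ip> <port> <quorum>' into typed fields."""
    parts = line.split()
    if len(parts) != 6 or parts[:2] != ["sentinel", "monitor"]:
        raise ValueError(f"not a sentinel monitor directive: {line!r}")
    name, ip, port, quorum = parts[2:]
    return MonitorDirective(name, ip, int(port), int(quorum))


directive = parse_monitor_line("sentinel monitor mymaster 192.168.186.100 6379 1")
print(directive)
```

With the line from this post, the parsed quorum is 1, which matches the "quorum 1" that the sentinel prints in its +monitor log line at startup.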
2)Start the sentinel:
[pacoson@localhost redis-4.0.8]$ redis-sentinel sentinel.conf
3256:X 09 Mar 07:46:54.108 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
3256:X 09 Mar 07:46:54.108 # Redis version=4.0.8, bits=32, commit=00000000, modified=0, pid=3256, just started
3256:X 09 Mar 07:46:54.108 # Configuration loaded
3256:X 09 Mar 07:46:54.111 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
3256:X 09 Mar 07:46:54.111 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
3256:X 09 Mar 07:46:54.111 # Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
3256:X 09 Mar 07:46:54.113 # Warning: 32 bit instance detected but no memory limit set. Setting 3 GB maxmemory limit with 'noeviction' policy now.
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 4.0.8 (00000000/0) 32 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26379
| `-._ `._ / _.-' | PID: 3256
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
3256:X 09 Mar 07:46:54.115 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
3256:X 09 Mar 07:46:54.142 # Sentinel ID is 285f3e0832faf2c4c94a7ce89c2011fe76beb797
3256:X 09 Mar 07:46:54.142 # +monitor master mymaster 192.168.186.100 6379 quorum 1
3256:X 09 Mar 07:46:54.147 * +slave slave 192.168.186.100:6380 192.168.186.100 6380 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:46:54.151 * +slave slave 192.168.186.100:6381 192.168.186.100 6381 @ mymaster 192.168.186.100 6379
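【Notes】
1)The last 4 lines of the startup log are the important ones:
Sentinel ID: the unique ID of this sentinel process;
+monitor: the name, IP address, port, and quorum of the master database the sentinel monitors;
+slave: a slave database was newly discovered; here the sentinel found two slave databases (ports 6380 and 6381);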
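The event lines above follow a regular shape (pid, timestamp, a "#" or "*" marker, then an event name starting with "+" or "-"), so they are easy to pick out programmatically. The following Python sketch extracts the events from the last four startup lines. The regular expression and the name parse_events are assumptions made for this illustration, derived only from the log lines shown in this post.

```python
import re

# Pattern inferred from the log lines in this post (an assumption, not an
# official format specification).
LOG_EVENT = re.compile(
    r"^(?P<pid>\d+):X (?P<time>\d{2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) [#*] "
    r"(?P<event>[+-][a-z-]+) (?P<details>.*)$"
)

STARTUP_LOG = """\
3256:X 09 Mar 07:46:54.142 # Sentinel ID is 285f3e0832faf2c4c94a7ce89c2011fe76beb797
3256:X 09 Mar 07:46:54.142 # +monitor master mymaster 192.168.186.100 6379 quorum 1
3256:X 09 Mar 07:46:54.147 * +slave slave 192.168.186.100:6380 192.168.186.100 6380 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:46:54.151 * +slave slave 192.168.186.100:6381 192.168.186.100 6381 @ mymaster 192.168.186.100 6379
"""


def parse_events(log_text):
    """Return (event, details) pairs for lines that carry a +/- event name."""
    events = []
    for line in log_text.splitlines():
        match = LOG_EVENT.match(line)
        if match:  # the "Sentinel ID is ..." line has no event name and is skipped
            events.append((match.group("event"), match.group("details")))
    return events


events = parse_events(STARTUP_LOG)
# In a +slave line the details read: slave <ip:port> <ip> <port> @ <master> ...
slave_ports = [int(details.split()[3]) for event, details in events if event == "+slave"]
print(events)
print(slave_ports)
```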
【3】Stop the master database server
[pacoson@localhost redis-4.0.8]$ ps -ef |grep redis
pacoson 2112 1 0 06:23 ? 00:00:17 redis-server 192.168.186.100:6379
pacoson 2557 1931 0 06:25 pts/0 00:00:00 redis-cli -h 192.168.186.100
pacoson 2908 2533 0 07:23 pts/1 00:00:08 redis-server *:6380
pacoson 2913 2854 0 07:23 pts/5 00:00:08 redis-server *:6381
pacoson 2918 2662 0 07:23 pts/2 00:00:00 redis-cli -h 192.168.186.100 -p 6380
pacoson 2920 2793 0 07:24 pts/4 00:00:00 redis-cli -h 192.168.186.100 -p 6381
pacoson 3256 2733 1 07:46 pts/3 00:00:06 redis-sentinel *:26379 [sentinel]
pacoson 3314 3270 1 07:55 pts/7 00:00:00 grep redis
[pacoson@localhost redis-4.0.8]$ kill 2112
[pacoson@localhost redis-4.0.8]$ kill 2112
-bash: kill: (2112) - No such process
[pacoson@localhost redis-4.0.8]$ ps -ef |grep redis
pacoson 2908 2533 0 07:23 pts/1 00:00:08 redis-server *:6380
pacoson 2913 2854 0 07:23 pts/5 00:00:08 redis-server *:6381
pacoson 2918 2662 0 07:23 pts/2 00:00:00 redis-cli -h 192.168.186.100 -p 6380
pacoson 2920 2793 0 07:24 pts/4 00:00:00 redis-cli -h 192.168.186.100 -p 6381
pacoson 3256 2733 1 07:46 pts/3 00:00:07 redis-sentinel *:26379 [sentinel]
pacoson 3317 3270 0 07:56 pts/7 00:00:00 grep redis
[pacoson@localhost redis-4.0.8]$
【4】After about 30 seconds (configurable through sentinel down-after-milliseconds), the sentinel process prints the following:
3256:X 09 Mar 07:46:54.115 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
3256:X 09 Mar 07:46:54.142 # Sentinel ID is 285f3e0832faf2c4c94a7ce89c2011fe76beb797
3256:X 09 Mar 07:46:54.142 # +monitor master mymaster 192.168.186.100 6379 quorum 1
3256:X 09 Mar 07:46:54.147 * +slave slave 192.168.186.100:6380 192.168.186.100 6380 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:46:54.151 * +slave slave 192.168.186.100:6381 192.168.186.100 6381 @ mymaster 192.168.186.100 6379
// A: sentinel monitoring log after the master server was stopped
3256:X 09 Mar 07:56:49.816 # +sdown master mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:49.816 # +odown master mymaster 192.168.186.100 6379 #quorum 1/1
3256:X 09 Mar 07:56:49.816 # +new-epoch 1
// B: the sentinel tries to pick a slave database and promote it to master, i.e. it performs the failover;
3256:X 09 Mar 07:56:49.816 # +try-failover master mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:49.841 # +vote-for-leader 285f3e0832faf2c4c94a7ce89c2011fe76beb797 1
3256:X 09 Mar 07:56:49.841 # +elected-leader master mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:49.841 # +failover-state-select-slave master mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:49.896 # +selected-slave slave 192.168.186.100:6380 192.168.186.100 6380 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:49.896 * +failover-state-send-slaveof-noone slave 192.168.186.100:6380 192.168.186.100 6380 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:49.973 * +failover-state-wait-promotion slave 192.168.186.100:6380 192.168.186.100 6380 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:50.814 # +promoted-slave slave 192.168.186.100:6380 192.168.186.100 6380 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:50.814 # +failover-state-reconf-slaves master mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:50.886 * +slave-reconf-sent slave 192.168.186.100:6381 192.168.186.100 6381 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:51.685 * +slave-reconf-inprog slave 192.168.186.100:6381 192.168.186.100 6381 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:52.754 * +slave-reconf-done slave 192.168.186.100:6381 192.168.186.100 6381 @ mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:52.817 # +failover-end master mymaster 192.168.186.100 6379
3256:X 09 Mar 07:56:52.817 # +switch-master mymaster 192.168.186.100 6379 192.168.186.100 6380
3256:X 09 Mar 07:56:52.817 * +slave slave 192.168.186.100:6381 192.168.186.100 6381 @ mymaster 192.168.186.100 6380
3256:X 09 Mar 07:56:52.818 * +slave slave 192.168.186.100:6379 192.168.186.100 6379 @ mymaster 192.168.186.100 6380
3256:X 09 Mar 07:57:22.857 # +sdown slave 192.168.186.100:6379 192.168.186.100 6379 @ mymaster 192.168.186.100 6380
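【Notes】
1)+sdown: the sentinel subjectively considers the master database down (this sentinel alone could not reach it);
2)+odown: the sentinel objectively considers the master database down (the quorum was reached; here 1/1);
3)At this point the sentinel starts the failover: it picks a slave database and promotes it to master, printing the following:
+try-failover: the sentinel starts the failover;
+failover-end: the sentinel has completed the failover. The steps in between are fairly involved and include electing the leader sentinel, selecting the candidate slave database, and so on;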
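The difference between the quorum (needed for +odown) and the majority (needed before a failover may start) is easy to mix up. The Python sketch below is based only on the comments in the sentinel.conf listing at the end of this post: the master is considered objectively down once at least quorum sentinels agree, and a sentinel must be elected by the majority of the known sentinels to start a failover. The function names are hypothetical, and the sketch deliberately ignores every other detail of the election.

```python
def can_mark_odown(agreeing_sentinels: int, quorum: int) -> bool:
    """True when enough sentinels agree that the master is down (+odown)."""
    return agreeing_sentinels >= quorum


def failover_majority(known_sentinels: int) -> int:
    """Smallest number of votes that is a strict majority of the known sentinels."""
    return known_sentinels // 2 + 1


# The setup in this post: a single sentinel with quorum 1.
print(can_mark_odown(agreeing_sentinels=1, quorum=1))  # the "#quorum 1/1" case
print(failover_majority(known_sentinels=1))

# Hypothetical larger deployments, for comparison.
for sentinels in (3, 5):
    print(sentinels, "sentinels need", failover_majority(sentinels), "votes")
```

With one sentinel both thresholds are 1, which is why +sdown, +odown, and +try-failover all carry the same timestamp in the log above.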
【4.1】Look at the last 4 lines of the output:
3256:X 09 Mar 07:56:52.817 # +switch-master mymaster 192.168.186.100 6379 192.168.186.100 6380
3256:X 09 Mar 07:56:52.817 * +slave slave 192.168.186.100:6381 192.168.186.100 6381 @ mymaster 192.168.186.100 6380
3256:X 09 Mar 07:56:52.818 * +slave slave 192.168.186.100:6379 192.168.186.100 6379 @ mymaster 192.168.186.100 6380
3256:X 09 Mar 07:57:22.857 # +sdown slave 192.168.186.100:6379 192.168.186.100 6379 @ mymaster 192.168.186.100 6380
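【Notes】
1)+switch-master: the master database moved from port 6379 to port 6380, i.e. the Redis instance on port 6380 was promoted to master;
2)+slave: lists the two slave databases of the new master, 6381 and 6379;
3)Why is 6379 still treated as a slave database? Because the stopped instance may come back at some later time; once it does, it will run as a slave of the 6380 master. The final +sdown line shows that, for now, the sentinel sees this 6379 slave as down;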
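The +switch-master line carries the master name followed by the old and the new address, so the new master can be read straight from it. A minimal Python sketch, assuming the field order seen in the log line above (name, old ip, old port, new ip, new port); parse_switch_master is a hypothetical helper written for this illustration:

```python
def parse_switch_master(line: str):
    """Return (master_name, (old_ip, old_port), (new_ip, new_port))."""
    _, found, payload = line.partition("+switch-master ")
    if not found:
        raise ValueError("not a +switch-master line")
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return name, (old_ip, int(old_port)), (new_ip, int(new_port))


LINE = (
    "3256:X 09 Mar 07:56:52.817 # +switch-master mymaster "
    "192.168.186.100 6379 192.168.186.100 6380"
)
name, old_master, new_master = parse_switch_master(LINE)
print(f"{name}: {old_master} -> {new_master}")
```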
【5】After the failover, run info replication to check the replication status on 6380 and 6381;
192.168.186.100:6380> info replication
# Replication
role:master // master database
connected_slaves:1
slave0:ip=192.168.186.100,port=6381,state=online,offset=89055,lag=0
master_replid:4327aef198c3c958e0d8a23414ccb69498cca6f2
master_replid2:28c0673df46b0bceda233e58a7a3cd9ae9d5b2e6
master_repl_offset:89200
second_repl_offset:43056
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:43
repl_backlog_histlen:89158
192.168.186.100:6380>
192.168.186.100:6381> info replication
# Replication
role:slave // slave database
master_host:192.168.186.100
master_port:6380
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:90098
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:4327aef198c3c958e0d8a23414ccb69498cca6f2
master_replid2:28c0673df46b0bceda233e58a7a3cd9ae9d5b2e6
master_repl_offset:90098
second_repl_offset:43056
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:211
repl_backlog_histlen:89888
192.168.186.100:6381>
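The info replication output is a list of key:value lines, and each slaveN value is itself a comma-separated list of key=value pairs. The Python sketch below parses a shortened copy of the 6380 output above (with the "//" annotations removed) and computes how far slave0 is behind the master in replication-offset bytes. The names parse_info and parse_slave_entry are hypothetical helpers, and the sketch works on pasted text rather than on a live connection.

```python
INFO_6380 = """\
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.186.100,port=6381,state=online,offset=89055,lag=0
master_repl_offset:89200
"""


def parse_info(text):
    """Turn 'key:value' lines into a dict, skipping blanks and '#' section headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def parse_slave_entry(value):
    """Turn 'ip=...,port=...,state=...' into a dict of strings."""
    return dict(pair.split("=", 1) for pair in value.split(","))


info = parse_info(INFO_6380)
slave0 = parse_slave_entry(info["slave0"])
offset_gap = int(info["master_repl_offset"]) - int(slave0["offset"])
print(info["role"], slave0["port"], slave0["state"], offset_gap)
```

A small, changing gap like this one is normal for a snapshot taken while replication is running; the slave's acknowledged offset is simply slightly older than the master's current offset.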
【6】Restart the previously stopped 6379 instance:
redis-server redis.conf
1)After it starts, the sentinel process prints the following log:
3256:X 09 Mar 08:09:55.694 # -sdown slave 192.168.186.100:6379 192.168.186.100 6379 @ mymaster 192.168.186.100 6380
3256:X 09 Mar 08:10:05.693 * +convert-to-slave slave 192.168.186.100:6379 192.168.186.100 6379 @ mymaster 192.168.186.100 6380
2)Connect a client to the 6379 database and print its replication info:
192.168.186.100:6379> info replication
# Replication
role:slave // slave database
master_host:192.168.186.100
master_port:6380 // the master database is on port 6380
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:103858
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:4327aef198c3c958e0d8a23414ccb69498cca6f2
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:103858
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:100141
repl_backlog_histlen:3718
192.168.186.100:6379>
192.168.186.100:6380> info replication
# Replication
role:master // master database
connected_slaves:2 // two slave databases attached: 6381 and 6379
slave0:ip=192.168.186.100,port=6381,state=online,offset=110813,lag=0
slave1:ip=192.168.186.100,port=6379,state=online,offset=110813,lag=0
master_replid:4327aef198c3c958e0d8a23414ccb69498cca6f2
master_replid2:28c0673df46b0bceda233e58a7a3cd9ae9d5b2e6
master_repl_offset:110813
second_repl_offset:43056
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:43
repl_backlog_histlen:110771
192.168.186.100:6380>
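After the restart, the three outputs together should describe exactly one master (6380) with every slave pointing at it. The following Python sketch checks that on values copied by hand from the outputs in this post (the 6381 values come from the earlier output in 【5】). The function check_topology and the TOPOLOGY dict are hypothetical; a real check would fill the dict from live info replication calls.

```python
TOPOLOGY = {
    6379: {"role": "slave", "master_port": "6380", "master_link_status": "up"},
    6380: {"role": "master", "connected_slaves": "2"},
    6381: {"role": "slave", "master_port": "6380", "master_link_status": "up"},
}


def check_topology(nodes):
    """Return a list of problems; an empty list means the topology is consistent."""
    masters = [port for port, info in nodes.items() if info["role"] == "master"]
    if len(masters) != 1:
        return [f"expected exactly one master, found {len(masters)}"]
    master_port = masters[0]
    slaves = [port for port, info in nodes.items() if info["role"] == "slave"]
    problems = []
    for port in slaves:
        info = nodes[port]
        if int(info["master_port"]) != master_port:
            problems.append(f"{port} replicates from {info['master_port']}, not {master_port}")
        if info["master_link_status"] != "up":
            problems.append(f"{port} has master_link_status={info['master_link_status']}")
    if int(nodes[master_port]["connected_slaves"]) != len(slaves):
        problems.append("connected_slaves does not match the number of slaves")
    return problems


problems = check_topology(TOPOLOGY)
print(problems or "topology is consistent")
```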
【Appendix】The sentinel.conf file
# Example sentinel.conf
# *** IMPORTANT ***
#
# By default Sentinel will not be reachable from interfaces different than
# localhost, either use the 'bind' directive to bind to a list of network
# interfaces, or disable protected mode with "protected-mode no" by
# adding it to this configuration file.
#
# Before doing that MAKE SURE the instance is protected from the outside
# world via firewalling or other means.
#
# For example you may use one of the following:
#
# bind 127.0.0.1 192.168.1.1
#
# protected-mode no
# port <sentinel-port>
# The port that this sentinel instance will run on
port 26379
# sentinel announce-ip <ip>
# sentinel announce-port <port>
#
# The above two configuration directives are useful in environments where,
# because of NAT, Sentinel is reachable from outside via a non-local address.
#
# When announce-ip is provided, the Sentinel will claim the specified IP address
# in HELLO messages used to gossip its presence, instead of auto-detecting the
# local address as it usually does.
#
# Similarly when announce-port is provided and is valid and non-zero, Sentinel
# will announce the specified TCP port.
#
# The two options don't need to be used together, if only announce-ip is
# provided, the Sentinel will announce the specified IP and the server port
# as specified by the "port" option. If only announce-port is provided, the
# Sentinel will announce the auto-detected local IP and the specified port.
#
# Example:
#
# sentinel announce-ip 1.2.3.4
# dir <working-directory>
# Every long running process should have a well-defined working directory.
# For Redis Sentinel to chdir to /tmp at startup is the simplest thing
# for the process to don't interfere with administrative tasks such as
# unmounting filesystems.
dir "/tmp"
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Slaves are auto-discovered, so you don't need to specify slaves in
# any way. Sentinel itself will rewrite this configuration file adding
# the slaves using additional configuration options.
# Also note that the configuration file is rewritten when a
# slave is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
sentinel myid 285f3e0832faf2c4c94a7ce89c2011fe76beb797
# sentinel auth-pass <master-name> <password>
#
# Set the password to use to authenticate with the master and slaves.
# Useful if there is a password set in the Redis instances to monitor.
#
# Note that the master password is also used for slaves, so it is not
# possible to set a different password in masters and slaves instances
# if you want to be able to monitor these instances with Sentinel.
#
# However you can have Redis instances without the authentication enabled
# mixed with Redis instances requiring the authentication (as long as the
# password set is the same for all the instances requiring the password) as
# the AUTH command will have no effect in Redis instances with authentication
# switched off.
#
# Example:
#
# sentinel auth-pass mymaster MySUPER--secret-0123passw0rd
# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached slave or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.
# (author's note) this is the monitor line edited in section 【1】
sentinel monitor mymaster 192.168.186.100 6379 1
# sentinel parallel-syncs <master-name> <numslaves>
#
# How many slaves we can reconfigure to point to the new slave simultaneously
# during the failover. Use a low number if you use the slaves to serve query
# to avoid that all the slaves will be unreachable at about the same
# time while performing the synchronization with the master.
sentinel config-epoch mymaster 0
# sentinel failover-timeout <master-name> <milliseconds>
#
# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
# already tried against the same master by a given Sentinel, is two
# times the failover timeout.
#
# - The time needed for a slave replicating to a wrong master according
# to a Sentinel current configuration, to be forced to replicate
# with the right master, is exactly the failover timeout (counting since
# the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
# did not produced any configuration change (SLAVEOF NO ONE yet not
# acknowledged by the promoted slave).
#
# - The maximum time a failover in progress waits for all the slaves to be
# reconfigured as slaves of the new master. However even after this time
# the slaves will be reconfigured by the Sentinels anyway, but not with
# the exact parallel-syncs progression as specified.
#
# Default is 3 minutes.
sentinel leader-epoch mymaster 0
# SCRIPTS EXECUTION
#
# sentinel notification-script and sentinel reconfig-script are used in order
# to configure scripts that are called to notify the system administrator
# or to reconfigure clients after a failover. The scripts are executed
# with the following rules for error handling:
#
# If script exits with "1" the execution is retried later (up to a maximum
# number of times currently set to 10).
#
# If script exits with "2" (or an higher value) the script execution is
# not retried.
#
# If script terminates because it receives a signal the behavior is the same
# as exit code 1.
#
# A script has a maximum running time of 60 seconds. After this limit is
# reached the script is terminated with a SIGKILL and the execution retried.
# NOTIFICATION SCRIPT
#
# sentinel notification-script <master-name> <script-path>
#
# Call the specified notification script for any sentinel event that is
# generated in the WARNING level (for instance -sdown, -odown, and so forth).
# This script should notify the system administrator via email, SMS, or any
# other messaging system, that there is something wrong with the monitored
# Redis systems.
#
# The script is called with just two arguments: the first is the event type
# and the second the event description.
#
# The script must exist and be executable in order for sentinel to start if
# this option is provided.
#
# Example:
#
# sentinel notification-script mymaster /var/redis/notify.sh
# CLIENTS RECONFIGURATION SCRIPT
#
# sentinel client-reconfig-script <master-name> <script-path>
#
# When the master changed because of a failover a script can be called in
# order to perform application-specific tasks to notify the clients that the
# configuration has changed and the master is at a different address.
#
# The following arguments are passed to the script:
#
# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
#
# <state> is currently always "failover"
# <role> is either "leader" or "observer"
#
# The arguments from-ip, from-port, to-ip, to-port are used to communicate
# the old address of the master and the new address of the elected slave
# (now a master).
#
# This script should be resistant to multiple invocations.
#
# Example:
#
# sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
# Generated by CONFIG REWRITE
maxclients 4064
maxmemory 3gb
sentinel known-slave mymaster 192.168.186.100 6381
sentinel known-slave mymaster 192.168.186.100 6380
sentinel current-epoch 0
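The NOTIFICATION SCRIPT comments in the file above say the script receives two arguments (the event type and the event description) and that exit code 1 is retried while 2 or higher is not. The Python sketch below shows what such a script could look like. It is only an illustration under those stated rules: the file name notify.py and the helpers format_alert and main are hypothetical, and it prints the alert instead of sending mail or SMS.

```python
import sys


def format_alert(event_type: str, description: str) -> str:
    """Build a one-line alert message from the two script arguments."""
    return f"[sentinel] {event_type}: {description}"


def main(argv) -> int:
    """Return the exit code the script would hand back to the sentinel."""
    if len(argv) != 3:
        # Wrong usage will not fix itself, so use an exit code >= 2,
        # which the comments above describe as "not retried".
        print("usage: notify.py <event-type> <event-description>", file=sys.stderr)
        return 2
    # A real script would send email/SMS here; this sketch only prints.
    print(format_alert(argv[1], argv[2]))
    return 0


# Simulated invocation with the +sdown event from this post.
# A deployed script would end with: sys.exit(main(sys.argv))
exit_code = main(["notify.py", "+sdown", "master mymaster 192.168.186.100 6379"])
print("exit code:", exit_code)
```

To try it for real, the file would be made executable and referenced with a line such as the example in the comments above (sentinel notification-script mymaster followed by the script path); as those comments note, the sentinel refuses to start if the configured script does not exist or is not executable.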