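MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. The tool applies only to MySQL Replication environments, and its purpose is to keep the master highly available. MHA is a software package that automates master failover and slave promotion on top of standard MySQL replication (asynchronous or semi-synchronous).
MHA consists of two parts: MHA Manager (the management node) and MHA Node (the data node).
MHA Manager periodically probes the master of the cluster. When the master fails, it automatically promotes the slave holding the most recent data to be the new master and then repoints all other slaves to it. The failover itself is designed to be transparent to the application.
During automatic failover, MHA tries to save the binary logs from the crashed master to minimize data loss, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. Semi-synchronous replication greatly reduces this risk, and MHA can be combined with it: as long as at least one slave has received the latest binary log events, MHA can apply them to all other slaves, which keeps the data on all nodes consistent.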
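The `masterha_master_monitor` heartbeat script monitors the database nodes. By default it probes 4 times, once every `ping_interval=2` seconds; if the master still returns no heartbeat, it is considered down and the failover process starts.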
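
The probing rule above can be sketched in a few lines. This is a minimal illustration, not MHA's actual implementation: `master_is_dead`, `ping` and `sleep` are hypothetical names, and the defaults (4 probes, 2 seconds apart) simply mirror the values quoted in this walkthrough.

```python
import time
from typing import Callable


def master_is_dead(ping: Callable[[], bool],
                   max_probes: int = 4,
                   ping_interval: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if every one of `max_probes` heartbeats fails."""
    for attempt in range(max_probes):
        if ping():
            return False  # one successful heartbeat means the master is alive
        if attempt < max_probes - 1:
            sleep(ping_interval)  # wait before the next probe
    return True  # all probes failed: treat the master as down
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in a real monitor `ping` would wrap a connection attempt to the master.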
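* If a weight is set with `candidate_master=1`, that server is forced to be the candidate master.
* `check_repl_delay=0`: do not check how far the candidate lags behind in the logs; select it regardless.
* If the slaves differ in data (judged by position or GTID), the slave closest to the master becomes the candidate master.
* If the slaves hold identical data (judged by position or GTID), the new master is chosen in configuration-file order.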
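
As a rough sketch of how these election rules combine, the following models each slave with a single integer for its replication progress. All names here (`Slave`, `applied_pos`, `choose_new_master`, `max_lag`) are hypothetical, and the lag threshold is a simplification of whatever MHA really measures when `check_repl_delay` is enabled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    name: str
    applied_pos: int                # replication progress (position or GTID sequence)
    candidate_master: bool = False  # mirrors candidate_master=1
    no_master: bool = False         # mirrors no_master=1


def choose_new_master(slaves: List[Slave],
                      check_repl_delay: bool = True,
                      max_lag: int = 0) -> Optional[Slave]:
    """Pick a new master; `slaves` must be in configuration-file order."""
    eligible = [s for s in slaves if not s.no_master]
    if not eligible:
        return None
    latest = max(s.applied_pos for s in eligible)
    # Rule 1: a flagged candidate wins, unless the delay check rejects it.
    for s in eligible:
        lag = latest - s.applied_pos
        if s.candidate_master and (not check_repl_delay or lag <= max_lag):
            return s
    # Rules 2 and 3: the most advanced slave wins; ties fall back to config order.
    for s in eligible:
        if s.applied_pos == latest:
            return s
    return None
```

With `check_repl_delay=False` a lagging candidate is still chosen, which corresponds to the "select it regardless" behaviour described above.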
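* When SSH is reachable, the slaves use the `save_binary_logs` script to compare against the master's GTID or position, then immediately save the missing binary logs to each slave and apply them.
* When SSH is not reachable, the slaves use `apply_diff_relay_logs` to compare their relay logs and compensate for the differences among themselves.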
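
The two compensation paths can be pictured with a toy model in which every log is a list of event labels and each slave holds a prefix of the reference log. This is only an illustration of the idea; `plan_compensation` and `missing_events` are hypothetical helpers, not part of MHA.

```python
from typing import Dict, List


def missing_events(reference: List[str], slave_events: List[str]) -> List[str]:
    """Events the slave still lacks, assuming it holds a prefix of `reference`."""
    return reference[len(slave_events):]


def plan_compensation(ssh_ok: bool,
                      master_binlog: List[str],
                      slaves: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per slave, the events that must be applied to it."""
    if ssh_ok:
        # Master reachable over SSH: its binlog is the reference (save_binary_logs path).
        reference = master_binlog
    else:
        # Master unreachable: the most advanced slave is the reference (apply_diff_relay_logs path).
        reference = max(slaves.values(), key=len, default=[])
    return {name: missing_events(reference, events) for name, events in slaves.items()}
```

In the second branch, events that never left the master are simply absent from the plan, which is the data-loss case the introduction warns about.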
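The `masterha_conf_host` script removes the failed server from the cluster configuration.
* `masterha_check_ssh`: checks MHA's SSH configuration
* `masterha_check_repl`: checks the MySQL replication status
* `masterha_manager`: starts MHA
* `masterha_check_status`: reports the current MHA running state
* `masterha_master_monitor`: detects whether the master is down
* `masterha_master_switch`: controls failover (automatic or manual)
* `masterha_conf_host`: adds or removes server entries in the configuration
* `save_binary_logs`: saves and copies the master's binary logs
* `apply_diff_relay_logs`: identifies differing relay log events and applies them to the other slaves
* `filter_mysqlbinlog`: strips unnecessary ROLLBACK events (MHA no longer uses this tool)
* `purge_relay_logs`: purges relay logs (without blocking the SQL thread)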
Host | MySQL role | MHA role | server_id | IP |
---|---|---|---|---|
db01 | Master | node | 51 | 192.168.159.51 |
db02 | Slave | node | 52 | 192.168.159.52 |
db03 | Slave | node, manager | 53 | 192.168.159.53 |
grant replication slave on *.* to repl@'192.168.159.%' identified by '123';
change master to
master_host='192.168.159.51',
master_user='repl',
master_password='123' ,
MASTER_AUTO_POSITION=1;
start slave;
# MHA invokes these two commands, so they must be available under /usr/bin
ln -s /app/database/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /app/database/mysql/bin/mysql /usr/bin/mysql
# db01:
rm -rf /root/.ssh
ssh-keygen
cd /root/.ssh
mv id_rsa.pub authorized_keys
scp -r /root/.ssh 192.168.159.52:/root
scp -r /root/.ssh 192.168.159.53:/root
========== Verify on every node ==========
# db01:
ssh 192.168.159.51 date
ssh 192.168.159.52 date
ssh 192.168.159.53 date
# db02:
ssh 192.168.159.51 date
ssh 192.168.159.52 date
ssh 192.168.159.53 date
# db03:
ssh 192.168.159.51 date
ssh 192.168.159.52 date
ssh 192.168.159.53 date
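
The verification above is a full mesh: every node must reach every node, itself included. A small sketch that enumerates the checks for any host list follows; `ssh_check_matrix` is a hypothetical helper that only builds the command strings and does not open any connection.

```python
from itertools import product
from typing import List, Tuple


def ssh_check_matrix(hosts: List[str]) -> List[Tuple[str, str]]:
    """Return (node to run on, command to run) pairs covering the full mesh."""
    return [(src, f"ssh {dst} date") for src, dst in product(hosts, repeat=2)]


if __name__ == "__main__":
    for node, command in ssh_check_matrix(
            ["192.168.159.51", "192.168.159.52", "192.168.159.53"]):
        print(f"on {node}: {command}")
```

For three hosts this yields the same nine checks listed above, and it scales to larger clusters without hand-copying.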
Download the MHA software
MHA website: https://code.google.com/archive/p/mysql-master-ha/
GitHub downloads: https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
Install the node dependency and package on all nodes
yum install perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Create a dedicated MHA user on the master
db01 [(none)]>grant all privileges on *.* to mha@'192.168.159.%' identified by 'mha';
db02 [(none)]>select user,host from mysql.user;
+---------------+---------------+
| user | host |
+---------------+---------------+
| mha | 192.168.159.% |
+---------------+---------------+
db03 [(none)]>select user,host from mysql.user;
+---------------+---------------+
| user | host |
+---------------+---------------+
| mha | 192.168.159.% |
+---------------+---------------+
Install the Manager package (db03)
yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
# Create the configuration directory
mkdir -p /etc/mha
# Create the log directory
mkdir -p /var/log/mha/app1
# Write the MHA configuration file
cat > /etc/mha/app1.cnf << EOF
[server default]
# MHA log file
manager_log=/var/log/mha/app1/manager
# MHA working directory
manager_workdir=/var/log/mha/app1
# Location of the master's binlogs
master_binlog_dir=/data/binlog
# MHA user and password
user=mha
password=mha
# MHA monitoring interval
ping_interval=2
# Replication user and password
repl_password=123
repl_user=repl
# SSH mutual-trust user
ssh_user=root
# Node information
[server1]
hostname=192.168.159.51
port=3306
[server2]
hostname=192.168.159.52
port=3306
[server3]
hostname=192.168.159.53
port=3306
EOF
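
Before running the MHA check tools, a quick sanity check of the file can catch simple omissions. The sketch below parses the ini-style text with Python's standard `configparser`. The helper name `check_mha_config` and the list `EXPECTED_DEFAULTS` are assumptions: the list just mirrors the keys this walkthrough sets and is not MHA's own notion of required parameters.

```python
import configparser
from typing import List

# Keys this walkthrough places in [server default]; not an official MHA list.
EXPECTED_DEFAULTS = ("manager_log", "manager_workdir", "user", "password",
                     "repl_user", "repl_password", "ssh_user")


def check_mha_config(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("server default"):
        return ["missing [server default] section"]
    problems = [f"[server default] lacks {key}"
                for key in EXPECTED_DEFAULTS
                if not parser.has_option("server default", key)]
    servers = [s for s in parser.sections()
               if s.startswith("server") and s != "server default"]
    if not servers:
        problems.append("no [serverN] sections defined")
    for section in servers:
        if not parser.has_option(section, "hostname"):
            problems.append(f"[{section}] has no hostname")
    return problems
```

Calling `check_mha_config(open("/etc/mha/app1.cnf").read())` on the file written above should return an empty list; it complements `masterha_check_ssh` and `masterha_check_repl` rather than replacing them.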
Check SSH mutual trust
[root@db03 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sun Jul 5 15:21:28 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jul 5 15:21:28 2020 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sun Jul 5 15:21:28 2020 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sun Jul 5 15:21:28 2020 - [info] Starting SSH connection tests..
Sun Jul 5 15:21:29 2020 - [debug]
Sun Jul 5 15:21:28 2020 - [debug] Connecting via SSH from root@192.168.159.51(192.168.159.51:22) to root@192.168.159.52(192.168.159.52:22)..
Sun Jul 5 15:21:29 2020 - [debug] ok.
Sun Jul 5 15:21:29 2020 - [debug] Connecting via SSH from root@192.168.159.51(192.168.159.51:22) to root@192.168.159.53(192.168.159.53:22)..
Sun Jul 5 15:21:29 2020 - [debug] ok.
Sun Jul 5 15:21:30 2020 - [debug]
Sun Jul 5 15:21:29 2020 - [debug] Connecting via SSH from root@192.168.159.53(192.168.159.53:22) to root@192.168.159.51(192.168.159.51:22)..
Sun Jul 5 15:21:30 2020 - [debug] ok.
Sun Jul 5 15:21:30 2020 - [debug] Connecting via SSH from root@192.168.159.53(192.168.159.53:22) to root@192.168.159.52(192.168.159.52:22)..
Sun Jul 5 15:21:30 2020 - [debug] ok.
Sun Jul 5 15:21:30 2020 - [debug]
Sun Jul 5 15:21:29 2020 - [debug] Connecting via SSH from root@192.168.159.52(192.168.159.52:22) to root@192.168.159.51(192.168.159.51:22)..
Sun Jul 5 15:21:29 2020 - [debug] ok.
Sun Jul 5 15:21:29 2020 - [debug] Connecting via SSH from root@192.168.159.52(192.168.159.52:22) to root@192.168.159.53(192.168.159.53:22)..
Sun Jul 5 15:21:29 2020 - [debug] ok.
Sun Jul 5 15:21:30 2020 - [info] All SSH connection tests passed successfully.
Check replication status
[root@db03 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Sun Jul 5 15:23:07 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jul 5 15:23:07 2020 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sun Jul 5 15:23:07 2020 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sun Jul 5 15:23:07 2020 - [info] MHA::MasterMonitor version 0.56.
Sun Jul 5 15:23:08 2020 - [info] GTID failover mode = 1
Sun Jul 5 15:23:08 2020 - [info] Dead Servers:
Sun Jul 5 15:23:08 2020 - [info] Alive Servers:
Sun Jul 5 15:23:08 2020 - [info] 192.168.159.51(192.168.159.51:3306)
Sun Jul 5 15:23:08 2020 - [info] 192.168.159.52(192.168.159.52:3306)
Sun Jul 5 15:23:08 2020 - [info] 192.168.159.53(192.168.159.53:3306)
Sun Jul 5 15:23:08 2020 - [info] Alive Slaves:
Sun Jul 5 15:23:08 2020 - [info] 192.168.159.52(192.168.159.52:3306) Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Sun Jul 5 15:23:08 2020 - [info] GTID ON
Sun Jul 5 15:23:08 2020 - [info] Replicating from 192.168.159.51(192.168.159.51:3306)
Sun Jul 5 15:23:08 2020 - [info] 192.168.159.53(192.168.159.53:3306) Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Sun Jul 5 15:23:08 2020 - [info] GTID ON
Sun Jul 5 15:23:08 2020 - [info] Replicating from 192.168.159.51(192.168.159.51:3306)
Sun Jul 5 15:23:08 2020 - [info] Current Alive Master: 192.168.159.51(192.168.159.51:3306)
Sun Jul 5 15:23:08 2020 - [info] Checking slave configurations..
Sun Jul 5 15:23:08 2020 - [info] read_only=1 is not set on slave 192.168.159.52(192.168.159.52:3306).
Sun Jul 5 15:23:08 2020 - [info] read_only=1 is not set on slave 192.168.159.53(192.168.159.53:3306).
Sun Jul 5 15:23:08 2020 - [info] Checking replication filtering settings..
Sun Jul 5 15:23:08 2020 - [info] binlog_do_db= , binlog_ignore_db=
Sun Jul 5 15:23:08 2020 - [info] Replication filtering check ok.
Sun Jul 5 15:23:08 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sun Jul 5 15:23:08 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jul 5 15:23:09 2020 - [info] HealthCheck: SSH to 192.168.159.51 is reachable.
Sun Jul 5 15:23:09 2020 - [info]
192.168.159.51(192.168.159.51:3306) (current master)
+--192.168.159.52(192.168.159.52:3306)
+--192.168.159.53(192.168.159.53:3306)
Sun Jul 5 15:23:09 2020 - [info] Checking replication health on 192.168.159.52..
Sun Jul 5 15:23:09 2020 - [info] ok.
Sun Jul 5 15:23:09 2020 - [info] Checking replication health on 192.168.159.53..
Sun Jul 5 15:23:09 2020 - [info] ok.
Sun Jul 5 15:23:09 2020 - [warning] master_ip_failover_script is not defined.
Sun Jul 5 15:23:09 2020 - [warning] shutdown_script is not defined.
Sun Jul 5 15:23:09 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4898) is running(0:PING_OK), master:192.168.159.51
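
For scripted health checks it can help to turn that status line into structured data. The sketch below assumes the line always has the shape shown above; `parse_status` is a hypothetical helper and other states may be formatted differently, in which case it returns `None`.

```python
import re
from typing import Dict, Optional, Union

STATUS_RE = re.compile(
    r"^(?P<app>\S+) \(pid:(?P<pid>\d+)\) is running"
    r"\((?P<code>\d+):(?P<state>\w+)\), master:(?P<master>\S+)$"
)


def parse_status(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Parse a 'running' line from masterha_check_status, or return None."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    fields: Dict[str, Union[str, int]] = dict(match.groupdict())
    fields["pid"] = int(match.group("pid"))
    fields["code"] = int(match.group("code"))
    return fields
```

A monitoring job could alert whenever the result is `None` or the `state` field is not `PING_OK`.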
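Application:
* There is no VIP support, so the application can only connect to the cluster through the master's IP
* If the master goes down, the application configuration has to be changed
* High availability is incomplete and requires administrator involvement
Data:
* If the master host goes down entirely and SSH is unreachable, some transactions are very likely to be lost
Notification:
* There is no failure notification, so the administrator does not get timely feedback
Architecture improvements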
vim /etc/mha/app1.cnf
[server default]
master_ip_failover_script=/usr/local/bin/master_ip_failover
Prepare the master_ip_failover script
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
my (
$command, $ssh_user, $orig_master_host,
$orig_master_ip, $orig_master_port, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password
);
my $vip = '192.168.159.55/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
Modify the script contents
vim /usr/local/bin/master_ip_failover
my $vip = '192.168.159.55/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
Add execute permission
chmod +x /usr/local/bin/master_ip_failover
The first VIP must be created manually on the master; make sure the network interface matches the one used in the script
[root@db01 ~]# ifconfig ens33:1 192.168.159.55/24
[root@db01 ~]# ip ad sh ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:c8:dc:7e brd ff:ff:ff:ff:ff:ff
inet 192.168.159.51/24 brd 192.168.159.255 scope global ens33
valid_lft forever preferred_lft forever
inet 192.168.159.55/24 brd 192.168.159.255 scope global secondary ens33:1
valid_lft forever preferred_lft forever
inet6 fe80::cb00:3b4a:384e:8ddf/64 scope link
valid_lft forever preferred_lft forever
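
To confirm programmatically that the VIP is bound, the IPv4 addresses can be pulled out of the `ip` output shown above. This is a small sketch that works on captured text only; `interface_addresses` and `has_vip` are hypothetical helper names.

```python
import re
from typing import List


def interface_addresses(ip_output: str) -> List[str]:
    """Return the IPv4 'address/prefix' entries found in `ip addr show` output."""
    # 'inet ' followed by a dotted quad; 'inet6' lines do not match.
    return re.findall(r"^\s*inet (\d+\.\d+\.\d+\.\d+/\d+)", ip_output, flags=re.MULTILINE)


def has_vip(ip_output: str, vip: str) -> bool:
    """True if `vip` (for example '192.168.159.55/24') is bound on the interface."""
    return vip in interface_addresses(ip_output)
```

The text could come from `subprocess.run(["ip", "addr", "show", "ens33"], capture_output=True, text=True).stdout` on the master.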
masterha_stop --conf=/etc/mha/app1.cnf
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:9675) is running(0:PING_OK), master:192.168.159.51
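The binlog plays a vital role in backups. Traditionally, binlog files could only be backed up locally first and then transferred to a remote server. Since MySQL 5.6, the mysqlbinlog command can pull logs from a remote machine into a local directory, which makes safe binlog backup much more convenient.
With the binlog server acting as an MHA node, all binlogs on the master are backed up as quickly as possible; when the master goes down, the other nodes can pull the binlogs from the binlog server.
Pick an additional machine, which must run MySQL 5.6 or later with GTID support enabled; db03 is used here.
vim /etc/mha/app1.cnf
# Add the following configuration
[binlog1]
# Exclude this server from master election
no_master=1
hostname=192.168.159.53
# Where the binlog server stores its data; must differ from the local binlog directory
master_binlog_dir=/data/mysql/binlog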
mkdir -p /data/mysql/binlog
chown -R mysql.mysql /data/*
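Common options
* `-R` (`--read-from-remote-server`): enables binlog backup by requesting binlogs from the given master into the local directory
* `--raw`: stores the copied binlog in binary format; without this option the output is text
* `-r` (`--result-file`): specifies a directory or file-name prefix
* `-t`: starts pulling from the specified binlog and continues up to the last binlog currently on the master
* `--stop-never`: keeps pulling binlogs from the master, backing up to the current last one and continuing beyond it; this option implies `-t`
* `--stop-never-slave-server-id`: the server ID used when pulling binlogs, 65535 by default; used to avoid ID conflicts when there are several mysqlbinlog processes or several slaves
# You must first change into the target data directory
cd /data/mysql/binlog
# Choose the starting log as needed; usually the binlog the slaves have already reached (the master's current binlog position)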
mysqlbinlog -R --host=192.168.159.51 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &
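
If the pull is launched from a supervisor script, the same command line can be assembled programmatically. The sketch below only builds the argument list from the options used above; `binlog_pull_command` and `latest_binlog` are hypothetical helpers, and the six-digit suffix pattern is an assumption based on the file name `mysql-bin.000001`.

```python
import re
from typing import List, Optional


def latest_binlog(file_names: List[str]) -> Optional[str]:
    """Return the highest-numbered file named like 'mysql-bin.000001', or None."""
    numbered = [n for n in file_names if re.fullmatch(r".+\.\d{6}", n)]
    if not numbered:
        return None
    return max(numbered, key=lambda n: int(n.rsplit(".", 1)[1]))


def binlog_pull_command(host: str, user: str, password: str,
                        start_file: str, stop_never: bool = True) -> List[str]:
    """Build the mysqlbinlog argument list used in this walkthrough."""
    cmd = ["mysqlbinlog", "-R", f"--host={host}", f"--user={user}",
           f"--password={password}", "--raw"]
    if stop_never:
        cmd.append("--stop-never")  # keep following the master after catching up
    cmd.append(start_file)
    return cmd
```

The resulting list can be handed to `subprocess.Popen(cmd, cwd="/data/mysql/binlog")`, which also covers the requirement to run from the target directory.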
[root@db03 ~]# masterha_stop --conf=/etc/mha/app1.cnf
[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:9675) is running(0:PING_OK), master:192.168.159.51
vim /etc/mha/app1.cnf
# Add configuration
[server default]
report_script=/usr/local/bin/send
# Install a third-party SMTP client
yum install -y sendemail
# Write the mail script
vim /usr/local/bin/send
#!/bin/bash
/usr/bin/sendemail -o tls=no -f [email protected] -t [email protected] -s smtp.qq.com:25 -xu 979828334 -xp ctujxaczkfkmbbjc -u "MHA Warning" -m "YOUR MHA MAY BE FAILOVER" &> /tmp/sendemail.log
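
The shell script above sends a fixed message. A report script could instead include the failover details that MHA Manager passes on the command line. The sketch below assumes the arguments `--orig_master_host`, `--new_master_host`, `--new_slave_hosts`, `--subject` and `--body`, which is my understanding of the MHA documentation and should be verified for your version. `build_report` is a hypothetical helper that only builds the message and does not send it.

```python
import argparse
from email.message import EmailMessage
from typing import List


def build_report(argv: List[str], sender: str, recipient: str) -> EmailMessage:
    """Build a notification mail from the arguments MHA is assumed to pass."""
    parser = argparse.ArgumentParser()
    for name in ("orig_master_host", "new_master_host",
                 "new_slave_hosts", "subject", "body"):
        parser.add_argument(f"--{name}", default="")
    args, _unknown = parser.parse_known_args(argv)  # tolerate extra arguments

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = args.subject or "MHA Warning"
    msg.set_content(
        f"Old master: {args.orig_master_host}\n"
        f"New master: {args.new_master_host}\n"
        f"New slaves: {args.new_slave_hosts}\n\n"
        f"{args.body}\n"
    )
    return msg
```

Sending is left out on purpose; `smtplib.SMTP(...).send_message(msg)` with your own server and credentials would be the usual next step.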
masterha_stop --conf=/etc/mha/app1.cnf
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:9675) is running(0:PING_OK), master:192.168.159.51
Test by taking down the master; expected result:
[root@db01 ~]# systemctl stop mysqld
----- Failover Report -----
app1: MySQL Master failover 192.168.159.51(192.168.159.51:3306) to 192.168.159.53(192.168.159.53:3306) succeeded
Master 192.168.159.51(192.168.159.51:3306) is down!
Check MHA Manager logs at db03:/var/log/mha/app1/manager for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.159.51(192.168.159.51:3306)
Selected 192.168.159.53(192.168.159.53:3306) as a new master.
192.168.159.53(192.168.159.53:3306): OK: Applying all logs succeeded.
192.168.159.53(192.168.159.53:3306): OK: Activated master IP address.
192.168.159.53(192.168.159.53:3306): Resetting slave info succeeded.
Master failover to 192.168.159.53(192.168.159.53:3306) completed successfully.
Sun Jul 5 21:23:42 2020 - [info] Sending mail..
db03 [(none)]> show slave status\G
Master_Host: 192.168.159.52
[binlog1]
hostname=192.168.159.53
master_binlog_dir=/data/mysql/binlog
no_master=1
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha
ping_interval=2
repl_password=123
repl_user=repl
report_script=/usr/local/bin/send
ssh_user=root
user=mha
[server2]
hostname=192.168.159.52
[server3]
hostname=192.168.159.53
port=3306
[root@db03 ~]# ps -ef | grep manager
root 14734 10028 0 21:29 pts/1 00:00:00 grep --color=auto manager
[root@db03 ~]# ps -ef | grep mysqlbinlog
root 14736 10028 0 21:29 pts/1 00:00:00 grep --color=auto mysqlbinlog
[root@db03 ~]# tail /var/log/mha/app1/manager
[root@db01 ~]# systemctl start mysqld
# After repairing the failed server, manually add it back to the existing replication topology as a slave
db01 [(none)]>change master to
-> master_host='192.168.159.52',
-> master_user='repl',
-> master_password='123' ,
-> MASTER_AUTO_POSITION=1;
db01 [(none)]>start slave;
# Add the failed node's entry back to the configuration
[server1]
hostname=192.168.159.51
port=3306
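
Because `--remove_dead_master_conf` deletes the dead master's section, it is easy to forget which servers need to be added back. A minimal sketch that lists them follows; `missing_hosts` is a hypothetical helper, and the expected host list is supplied by the caller.

```python
import configparser
from typing import List


def missing_hosts(config_text: str, expected_hosts: List[str]) -> List[str]:
    """Return the expected hosts that no [serverN] section currently names."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    configured = {
        parser.get(section, "hostname")
        for section in parser.sections()
        if section.startswith("server") and section != "server default"
        and parser.has_option(section, "hostname")
    }
    return [host for host in expected_hosts if host not in configured]
```

Run against the post-failover configuration shown earlier, it would report 192.168.159.51, the entry restored in the step above.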
[root@db03 binlog]# rm -rf *
[root@db03 binlog]# mysqlbinlog -R --host=192.168.159.52 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &
[root@db03 binlog]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sun Jul 5 21:42:20 2020 - [info] All SSH connection tests passed successfully.
[root@db03 binlog]# masterha_check_repl --conf=/etc/mha/app1.cnf
MySQL Replication Health is OK.
[root@db02 ~]# ip ad sh ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
inet 192.168.159.55/24 brd 192.168.159.255 scope global ens33:1
valid_lft forever preferred_lft forever
[root@db03 binlog]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[2] 14866
[root@db03 binlog]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:14866) is running(0:PING_OK), master:192.168.159.52