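As the data store, MySQL is arguably the most important link in the whole architecture. Whatever the architecture or design, most systems still depend on a relational database, and if the database fails the whole system becomes unavailable. That is why MySQL high availability matters so much. This article first covers the theory behind two common MySQL high-availability architectures, MMM and MHA, and then builds a highly available MHA architecture from scratch, step by step.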
This article builds on the previous one, so please read that prerequisite article first.
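There are two common high-availability architectures for MySQL master-slave replication: MMM and MHA. To make master-slave replication highly available, a solution has to do the following:
- Monitor the master node.
- When the master node goes down, move the VIP (Virtual IP Address) to the new master node.
- Reconfigure the slave nodes to replicate from the new master.
MMM (Master-Master replication manager for MySQL) is a set of scripts that supports dual-master failover and day-to-day dual-master management. It is mainly used to monitor and manage Master-Master (dual-master) replication. Despite the name, only one master accepts writes at any given moment; the other acts as a backup of the active master, which keeps the standby warm and speeds up a master-to-master switch. MMM provides automatic failover and can also load-balance reads across multiple slaves.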
The overall architecture of MMM is shown in the following diagram.
[Figure: MMM architecture diagram]
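From the MMM architecture diagram we can see that:
- MMM needs two master nodes that act as active and standby for each other. Only one master serves traffic at any given time.
- There are one or more slave nodes.
- The active master is assigned a write VIP, which can only move between the two masters; each slave node is assigned a read VIP, which can move to any slave node.
When the master goes down, the MMM manager repoints all slave nodes to the standby master and moves the write VIP to the standby server, and the slaves then replicate from the new master. The whole switchover is simple and crude, so data consistency cannot be guaranteed.
When a slave node goes down, the MMM manager moves its read VIP to another slave node, so one slave node can hold several VIPs.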
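MMM does not support GTID-based replication (unless you modify its Perl scripts yourself).
MHA (Master High Availability) is an open-source high-availability tool for MySQL. When the MHA manager detects that the master node has failed, it promotes the slave node holding the master's most recent data to be the new master and makes the other slave nodes replicate from it. MHA also supports online master switching, that is, switching the master/slave roles on demand.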
The architecture of MHA is shown in the following diagram.
[Figure: MHA architecture diagram]
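As the MHA architecture diagram shows, MHA only monitors the health of the master. When the master goes down, the MHA manager picks, from all of the master's slaves, the one whose data is closest to the master and promotes it to be the new master.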
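The selection rule described above can be illustrated with a small shell sketch. This is not MHA's actual implementation: the input format (`host binlog_file read_position candidate_flag`) and the function name `pick_new_master` are assumptions made for illustration, and the rule is simplified to "among candidate slaves, take the one that has read furthest into the master's binlog".

```shell
#!/usr/bin/env bash
# Sketch only: choose the candidate slave that has received the most binlog.
# Each input line (hypothetical format): <host> <master_log_file> <read_pos> <candidate:0|1>
pick_new_master() {
  awk '
    $4 == 1 {
      # Binlog file names sort lexicographically (mysql-bin.000003 > mysql-bin.000002);
      # within the same file, the larger read position is more up to date.
      if (best == "" || $2 > file || ($2 == file && $3 + 0 > pos + 0)) {
        best = $1; file = $2; pos = $3
      }
    }
    END { if (best != "") print best }
  '
}

# Example with the two slaves used later in this article:
printf '%s\n' \
  '192.168.1.102 mysql-bin.000003 2435 1' \
  '192.168.1.103 mysql-bin.000003 2435 0' | pick_new_master
```

Slaves with the flag set to 0 are skipped, which mirrors the role of `no_master=1` in the configuration shown later.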
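With MySQL master-slave replication managed by MHA, failover after a master failure goes through the following steps:
1. Remove the VIP from the failed master and pick the slave with the most recent data.
2. Try to save the binary logs from the failed master (this may succeed if only the MySQL instance has crashed).
3. Apply the differential relay log to the other slaves. The relay log on the candidate master may differ from the relay logs on the other slave nodes, so the candidate master's relay log is applied to the other slave nodes.
4. Apply the binary logs saved from the failed master (if step 2 succeeded).
5. Promote the candidate master to be the new master.
6. Make the other slaves replicate from the new master, and move the write VIP to the new master.
MHA has the following characteristics:
- It supports both GTID-based and log-position-based replication.
- It can pick the most suitable new master from multiple slave nodes.
- It tries to save as many unsynchronized logs as possible from the old master.
- It is not always able to retrieve the old master's unsynchronized logs.
- The VIP failover script has to be written separately.
- It only monitors the master and provides no high availability for the slaves.
The walkthrough below uses GTID-based replication. Setting up MHA is not particularly complicated, but it involves quite a few steps, so it is worth reading through the whole procedure before starting.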
First, run the following on the master node (192.168.1.101), pressing Enter at every prompt:
ssh-keygen
The output looks like this:
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:HFpSaM7IVW+TjQVUM0m1JBNgrnhH85O3wuur58sev1E [email protected]
The key's randomart image is:
+---[RSA 2048]----+
| oo.==X+o |
| +. + =.* . |
| . *. o X . . |
| o o* = + . |
| o S . + . E|
| . . . o o |
| + o |
| ..= . |
| .*Ooo. |
+----[SHA256]-----+
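Copy the generated public key to all three nodes, including the local one. Given the key path /root/.ssh/id_rsa, ssh-copy-id installs the matching public key /root/.ssh/id_rsa.pub: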
ssh-copy-id -i /root/.ssh/id_rsa root@192.168.1.101
ssh-copy-id -i /root/.ssh/id_rsa root@192.168.1.102
ssh-copy-id -i /root/.ssh/id_rsa root@192.168.1.103
Once this is done, 192.168.1.101 can connect to 102 and 103 over ssh without a password:
ssh 192.168.1.102
Repeat the steps above on 192.168.1.102 and 192.168.1.103.
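Because the same key distribution has to be repeated on every node, a small loop can help avoid typos. The following is a minimal sketch; the `NODES` variable and both function names are hypothetical, and `check_passwordless` relies on the standard OpenSSH `BatchMode` option, which makes ssh fail instead of prompting for a password.

```shell
#!/usr/bin/env bash
# Hypothetical node list; adjust to your environment.
NODES="192.168.1.101 192.168.1.102 192.168.1.103"

# Print the ssh-copy-id command for every node (dry run, nothing is executed).
print_copy_id_cmds() {
  for host in $NODES; do
    echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@${host}"
  done
}

# Return 0 only if key-based login to the given host works without a prompt.
check_passwordless() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

print_copy_id_cmds
```

After running the printed commands, something like `for h in $NODES; do check_passwordless "$h" || echo "FAILED: $h"; done` on each node should report any pair that still asks for a password.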
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
vim /etc/yum.repos.d/epel.repo
Only one setting needs to change: gpgcheck in the [epel] section.
[epel]
...
## Only the gpgcheck property in the [epel] section needs to change
gpgcheck=0
...
Repeat the steps above on every node.
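Since this edit is repeated on every node, it can also be done non-interactively instead of with vim. The sketch below is one possible approach, not part of the original procedure: the function name `disable_epel_gpgcheck` is hypothetical, and it assumes the file uses the usual INI layout where `[epel]` is followed by other sections such as `[epel-debuginfo]`.

```shell
#!/usr/bin/env bash
# Set gpgcheck=0 only inside the [epel] section of a yum repo file.
disable_epel_gpgcheck() {
  local repo_file="$1"
  # The sed range starts at the "[epel]" header and ends at the next section header,
  # so gpgcheck lines in later sections are left untouched.
  sed '/^\[epel\]/,/^\[/ s/^gpgcheck=.*/gpgcheck=0/' "$repo_file" > "${repo_file}.tmp" \
    && mv "${repo_file}.tmp" "$repo_file"
}

# Usage (not executed here):
# disable_epel_gpgcheck /etc/yum.repos.d/epel.repo
```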
Run the following command:
yum -y install perl-DBD-MySQL ncftp perl-DBI
Repeat the step above on every node.
Download: https://download.csdn.net/download/Baisitao_/12505957
## Install mha-node
rpm -ivh mha4mysql-node-0.57-0.el7.noarch.rpm
Repeat the step above on every node.
Strictly speaking, the monitoring tool should run on a dedicated node. To save a node, it is installed on 192.168.1.103 here.
yum -y install perl-Config-Tiny.noarch perl-Time-HiRes.x86_64 perl-Parallel-ForkManager perl-Log-Dispatch
Once the dependencies are installed, install mha-manager:
rpm -ivh mha4mysql-manager-0.57-0.el7.noarch.rpm
On the monitoring node (192.168.1.103), create the MHA configuration directory:
## Configuration directory
mkdir -p /etc/mha
On every node, create the MHA working directory:
## Working directory: when the master goes down, the master's binlog is copied into this directory
mkdir -p /root/mha
On the master node (192.168.1.101), create the account that MHA needs and grant it privileges:
## Create the user
create user dba_mha@'192.168.1.%' identified by 'your password';
## Grant privileges
grant all privileges on *.* to dba_mha@'192.168.1.%';
On the monitoring node (192.168.1.103), create and edit the configuration file:
vim /etc/mha/mysql-mha.conf
Add the following content, adjusting it to your environment (password, IP addresses, directories and so on):
[server default]
user=dba_mha
## Remember to change this to your own password
password=your password
manager_workdir=/root/mha
manager_log=/root/mha/manager.log
remote_workdir=/root/mha
ssh_user=root
repl_password=your password
ping_interval=1
master_binlog_dir=/home/mysql/sql_log
ssh_port=22
master_ip_failover_script=/usr/bin/master_ip_failover
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.1.101 -s 192.168.1.102 -s 192.168.1.103
[server1]
hostname=192.168.1.101
candidate_master=1
[server2]
hostname=192.168.1.102
candidate_master=1
[server3]
hostname=192.168.1.103
## This node is also the monitoring node, so it is excluded from master candidates
no_master=1
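Before running the MHA checks, it can be useful to confirm which hosts the configuration actually marks as master candidates. The following is a minimal sketch under stated assumptions: `list_candidate_masters` is a hypothetical helper, and it assumes `hostname` appears before `candidate_master` within each `[serverN]` section, as in the file above.

```shell
#!/usr/bin/env bash
# Print the hostname of every section that sets candidate_master=1.
list_candidate_masters() {
  awk -F= '
    /^\[/ { host = ""; next }                    # a new section resets the current host
    $1 == "hostname" { host = $2 }
    $1 == "candidate_master" && $2 == 1 && host != "" { print host }
  ' "$1"
}

# Usage (not executed here):
# list_candidate_masters /etc/mha/mysql-mha.conf
```

With the configuration above, the helper would be expected to list 192.168.1.101 and 192.168.1.102, while 192.168.1.103 is omitted because it only sets `no_master=1`.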
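As the configuration file shows, the master_ip_failover_script parameter points to /usr/bin/master_ip_failover, the script that moves the write VIP when the master fails. This script has to be provided as well, so create and edit it: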
vim /usr/bin/master_ip_failover
Add the following content, adjusting it to your environment:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $orig_master_host, $orig_master_ip,$ssh_user,
$orig_master_port, $new_master_host, $new_master_ip,$new_master_port,
$orig_master_ssh_port,$new_master_ssh_port,$new_master_user,$new_master_password
);
my $vip = '192.168.1.88/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
my $ssh_Bcast_arp= "/sbin/arping -I ens33 -c 3 -A 192.168.1.88";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_ssh_port=i' => \$orig_master_ssh_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
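# The '=i' / '=s' suffixes let these options take a value; without them Getopt::Long
# reports "Option ... does not take an argument" when MHA passes the value.
'new_master_ssh_port=i' => \$new_master_ssh_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password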
);
exit &main();
sub main {
$ssh_user = defined $ssh_user ? $ssh_user : 'root';
print "\n\nIN SCRIPT TEST====$ssh_user|$ssh_stop_vip==$ssh_user|$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
&start_arp();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub start_arp() {
`ssh $ssh_user\@$new_master_host \" $ssh_Bcast_arp \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --ssh_user=user --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
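The part of the script that deserves attention is the block of variable definitions near the top, starting at my $vip.
vip is the write virtual IP, not the master node's own IP address. ens33 is the name of the network interface, which you can look up with ifconfig.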
This script moves the write VIP automatically when the master fails.
Once the script is saved, make it executable:
chmod +x /usr/bin/master_ip_failover
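A quick sanity check of the script before handing it to MHA can catch a missing executable bit or a Perl syntax error early. This is a sketch, not an MHA feature: `check_failover_script` is a hypothetical helper, and it only relies on `perl -c`, which compiles a script without running it.

```shell
#!/usr/bin/env bash
# Verify that the failover script is executable and compiles cleanly.
check_failover_script() {
  local script="$1"
  [ -x "$script" ] || { echo "not executable: $script"; return 1; }
  perl -c "$script" >/dev/null 2>&1 || { echo "syntax error: $script"; return 1; }
  echo "ok: $script"
}

# Usage (not executed here):
# check_failover_script /usr/bin/master_ip_failover
```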
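There is a lot of configuration and it is easy to get something wrong, so validate it first. Run the following checks on the monitoring node (192.168.1.103).
Check the SSH configuration: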
masterha_check_ssh --conf=/etc/mha/mysql-mha.conf
Output:
Tue Jun 9 22:11:11 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 9 22:11:11 2020 - [info] Reading application default configuration from /etc/mha/mysql-mha.conf..
Tue Jun 9 22:11:11 2020 - [info] Reading server configuration from /etc/mha/mysql-mha.conf..
Tue Jun 9 22:11:11 2020 - [info] Starting SSH connection tests..
Tue Jun 9 22:11:16 2020 - [debug]
Tue Jun 9 22:11:12 2020 - [debug] Connecting via SSH from root@192.168.1.103(192.168.1.103:22) to root@192.168.1.101(192.168.1.101:22)..
Tue Jun 9 22:11:14 2020 - [debug] ok.
Tue Jun 9 22:11:14 2020 - [debug] Connecting via SSH from root@192.168.1.103(192.168.1.103:22) to root@192.168.1.102(192.168.1.102:22)..
Tue Jun 9 22:11:15 2020 - [debug] ok.
Tue Jun 9 22:11:19 2020 - [debug]
Tue Jun 9 22:11:11 2020 - [debug] Connecting via SSH from root@192.168.1.101(192.168.1.101:22) to root@192.168.1.102(192.168.1.102:22)..
Tue Jun 9 22:11:17 2020 - [debug] ok.
Tue Jun 9 22:11:17 2020 - [debug] Connecting via SSH from root@192.168.1.101(192.168.1.101:22) to root@192.168.1.103(192.168.1.103:22)..
Tue Jun 9 22:11:18 2020 - [debug] ok.
Tue Jun 9 22:11:25 2020 - [debug]
Tue Jun 9 22:11:12 2020 - [debug] Connecting via SSH from root@192.168.1.102(192.168.1.102:22) to root@192.168.1.101(192.168.1.101:22)..
Tue Jun 9 22:11:13 2020 - [debug] ok.
Tue Jun 9 22:11:13 2020 - [debug] Connecting via SSH from root@192.168.1.102(192.168.1.102:22) to root@192.168.1.103(192.168.1.103:22)..
Tue Jun 9 22:11:24 2020 - [debug] ok.
Tue Jun 9 22:11:25 2020 - [info] All SSH connection tests passed successfully.
The log shows that the SSH configuration is correct.
Check the replication configuration:
masterha_check_repl --conf=/etc/mha/mysql-mha.conf
Output:
Tue Jun 9 22:22:43 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 9 22:22:43 2020 - [info] Reading application default configuration from /etc/mha/mysql-mha.conf..
Tue Jun 9 22:22:43 2020 - [info] Reading server configuration from /etc/mha/mysql-mha.conf..
Tue Jun 9 22:22:43 2020 - [info] MHA::MasterMonitor version 0.57.
Tue Jun 9 22:22:45 2020 - [info] GTID failover mode = 1
Tue Jun 9 22:22:45 2020 - [info] Dead Servers:
Tue Jun 9 22:22:45 2020 - [info] Alive Servers:
Tue Jun 9 22:22:45 2020 - [info] 192.168.1.101(192.168.1.101:3306)
Tue Jun 9 22:22:45 2020 - [info] 192.168.1.102(192.168.1.102:3306)
Tue Jun 9 22:22:45 2020 - [info] 192.168.1.103(192.168.1.103:3306)
Tue Jun 9 22:22:45 2020 - [info] Alive Slaves:
Tue Jun 9 22:22:45 2020 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Tue Jun 9 22:22:45 2020 - [info] GTID ON
Tue Jun 9 22:22:45 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Tue Jun 9 22:22:45 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 9 22:22:45 2020 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Tue Jun 9 22:22:45 2020 - [info] GTID ON
Tue Jun 9 22:22:45 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Tue Jun 9 22:22:45 2020 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 9 22:22:45 2020 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306)
Tue Jun 9 22:22:45 2020 - [info] Checking slave configurations..
Tue Jun 9 22:22:45 2020 - [info] read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Tue Jun 9 22:22:45 2020 - [info] read_only=1 is not set on slave 192.168.1.103(192.168.1.103:3306).
Tue Jun 9 22:22:45 2020 - [info] Checking replication filtering settings..
Tue Jun 9 22:22:45 2020 - [info] binlog_do_db= , binlog_ignore_db=
Tue Jun 9 22:22:45 2020 - [info] Replication filtering check ok.
Tue Jun 9 22:22:45 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Jun 9 22:22:45 2020 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jun 9 22:22:50 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.1.101! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Tue Jun 9 22:22:50 2020 - [info]
192.168.1.101(192.168.1.101:3306) (current master)
+--192.168.1.102(192.168.1.102:3306)
+--192.168.1.103(192.168.1.103:3306)
Tue Jun 9 22:22:50 2020 - [info] Checking replication health on 192.168.1.102..
Tue Jun 9 22:22:50 2020 - [info] ok.
Tue Jun 9 22:22:50 2020 - [info] Checking replication health on 192.168.1.103..
Tue Jun 9 22:22:50 2020 - [info] ok.
Tue Jun 9 22:22:50 2020 - [info] Checking master_ip_failover_script status:
Tue Jun 9 22:22:50 2020 - [info] /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306
IN SCRIPT TEST====root|/sbin/ifconfig ens33:1 down==root|/sbin/ifconfig ens33:1 192.168.1.88/24===
Checking the Status of the script.. OK
Tue Jun 9 22:22:50 2020 - [info] OK.
Tue Jun 9 22:22:50 2020 - [warning] shutdown_script is not defined.
Tue Jun 9 22:22:50 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
The output shows that the replication configuration is also correct.
To see which other check tools are available, run ll /usr/bin/ |grep master.
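MHA only moves the VIP during a failover, so before starting MHA for the first time the write VIP has to be assigned to the master node (192.168.1.101) by hand. Run the following command on the master node (192.168.1.101), adjusting the parameters to your environment: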
/sbin/ifconfig ens33:1 192.168.1.88/24
ens33 is the name of the network interface and 192.168.1.88 is the write VIP; both were already specified in the master_ip_failover script.
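The ifconfig command comes from the net-tools package, which some newer distributions no longer install by default. As a hedged alternative, the same VIP operations can be expressed with the iproute2 `ip` command. The sketch below only prints the commands; the variable and function names are hypothetical, and the values simply mirror the ones used in this article.

```shell
#!/usr/bin/env bash
# Values taken from this article; adjust to your environment.
VIP="192.168.1.88/24"
IFACE="ens33"
LABEL="ens33:1"

# Each helper prints the command instead of running it, so it is safe to try anywhere.
vip_add_cmd() { echo "ip addr add ${VIP} dev ${IFACE} label ${LABEL}"; }
vip_del_cmd() { echo "ip addr del ${VIP} dev ${IFACE}"; }
# Announce the VIP with gratuitous ARP; ${VIP%/*} strips the /24 suffix.
vip_arp_cmd() { echo "arping -I ${IFACE} -c 3 -A ${VIP%/*}"; }

vip_add_cmd
vip_del_cmd
vip_arp_cmd
```

If you switch to `ip`, the `$ssh_start_vip` and `$ssh_stop_vip` strings in the failover script would need the equivalent change so that both sides manage the address the same way.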
Before the write VIP is configured, ifconfig prints the following:
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.101 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::bce6:1d30:472c:d811 prefixlen 64 scopeid 0x20<link>
inet6 2409:8a4c:a13:3f30:9d96:8b33:ca89:c62c prefixlen 64 scopeid 0x0<global>
ether 00:0c:29:28:70:7c txqueuelen 1000 (Ethernet)
RX packets 979338 bytes 460658144 (439.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 693198 bytes 278374776 (265.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 208973 bytes 18422224 (17.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 208973 bytes 18422224 (17.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
After the write VIP is configured, ifconfig prints the following:
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.101 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::bce6:1d30:472c:d811 prefixlen 64 scopeid 0x20<link>
inet6 2409:8a4c:a13:3f30:9d96:8b33:ca89:c62c prefixlen 64 scopeid 0x0<global>
ether 00:0c:29:28:70:7c txqueuelen 1000 (Ethernet)
RX packets 1040146 bytes 477864466 (455.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 733075 bytes 299370855 (285.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.88 netmask 255.255.255.0 broadcast 192.168.1.255
ether 00:0c:29:28:70:7c txqueuelen 1000 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 222701 bytes 19630288 (18.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 222701 bytes 19630288 (18.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
On the monitoring node (192.168.1.103), run the following command (it runs in the foreground by default):
masterha_manager --conf=/etc/mha/mysql-mha.conf
It prints the following log output:
Tue Jun 9 22:38:05 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 9 22:38:05 2020 - [info] Reading application default configuration from /etc/mha/mysql-mha.conf..
Tue Jun 9 22:38:05 2020 - [info] Reading server configuration from /etc/mha/mysql-mha.conf..
This shows that MHA has started successfully.
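Because masterha_manager runs in the foreground, closing the terminal stops monitoring. One common way around this is to start it with nohup in the background. The sketch below assumes the paths used in this article; the function names and the `manager.stdout` file are hypothetical, while `masterha_check_status` and `masterha_stop` are tools shipped with the mha-manager package.

```shell
#!/usr/bin/env bash
MHA_CONF="/etc/mha/mysql-mha.conf"
# Hypothetical file for the manager's stdout/stderr; manager.log itself is set in the config.
MHA_STDOUT="/root/mha/manager.stdout"

# Start the manager detached from the terminal.
start_manager_bg() {
  nohup masterha_manager --conf="$MHA_CONF" < /dev/null >> "$MHA_STDOUT" 2>&1 &
  echo "masterha_manager started in background, pid $!"
}

# Ask MHA whether the manager is running and which master it monitors.
manager_status() { masterha_check_status --conf="$MHA_CONF"; }

# Stop the manager cleanly.
stop_manager() { masterha_stop --conf="$MHA_CONF"; }
```

Nothing is started when the block is sourced; call `start_manager_bg` on the monitoring node, then `manager_status` to confirm it is running.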
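In addition, the /root/mha directory contains two related files, manager.log and mysql-mha.master_status.health, which record the MHA log and the health status of the master node respectively.
At this point the MHA architecture is fully set up.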
Because the master node's VIP is 192.168.1.88, write operations only need to connect to this VIP. If the connection fails, make sure MySQL allows remote access.
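In theory the MHA setup is now highly available: when the master goes down, a slave is promoted to be the new master right away. Theory is one thing, though, so let's test it in practice.
The following is the log printed by the MHA manager after the master went down (only the MySQL service was stopped).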
Thu Jun 11 20:59:53 2020 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Thu Jun 11 20:59:53 2020 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s 192.168.1.101 -s 192.168.1.102 -s 192.168.1.103 --user=root --master_host=192.168.1.101 --master_ip=192.168.1.101 --master_port=3306 --master_user=dba_mha --master_password=Ppnn13y,dkst2yc. --ping_type=SELECT
Thu Jun 11 20:59:53 2020 - [info] Executing SSH check script: exit 0
Thu Jun 11 20:59:54 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.101' (111))
Thu Jun 11 20:59:54 2020 - [warning] Connection failed 2 time(s)..
Thu Jun 11 20:59:54 2020 - [info] HealthCheck: SSH to 192.168.1.101 is reachable.
Monitoring server 192.168.1.101 is reachable, Master is not reachable from 192.168.1.101. OK.
Thu Jun 11 20:59:55 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.101' (111))
Thu Jun 11 20:59:55 2020 - [warning] Connection failed 3 time(s)..
Monitoring server 192.168.1.102 is reachable, Master is not reachable from 192.168.1.102. OK.
Thu Jun 11 20:59:56 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.101' (111))
Thu Jun 11 20:59:56 2020 - [warning] Connection failed 4 time(s)..
Monitoring server 192.168.1.103 is reachable, Master is not reachable from 192.168.1.103. OK.
Thu Jun 11 20:59:56 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Thu Jun 11 20:59:56 2020 - [warning] Master is not reachable from health checker!
Thu Jun 11 20:59:56 2020 - [warning] Master 192.168.1.101(192.168.1.101:3306) is not reachable!
Thu Jun 11 20:59:56 2020 - [warning] SSH is reachable.
Thu Jun 11 20:59:56 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/mysql-mha.conf again, and trying to connect to all servers to check server status..
Thu Jun 11 20:59:56 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jun 11 20:59:56 2020 - [info] Reading application default configuration from /etc/mha/mysql-mha.conf..
Thu Jun 11 20:59:56 2020 - [info] Reading server configuration from /etc/mha/mysql-mha.conf..
Thu Jun 11 20:59:57 2020 - [info] GTID failover mode = 1
Thu Jun 11 20:59:57 2020 - [info] Dead Servers:
Thu Jun 11 20:59:57 2020 - [info] 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:57 2020 - [info] Alive Servers:
Thu Jun 11 20:59:57 2020 - [info] 192.168.1.102(192.168.1.102:3306)
Thu Jun 11 20:59:57 2020 - [info] 192.168.1.103(192.168.1.103:3306)
Thu Jun 11 20:59:57 2020 - [info] Alive Slaves:
Thu Jun 11 20:59:57 2020 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:57 2020 - [info] GTID ON
Thu Jun 11 20:59:57 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:57 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jun 11 20:59:57 2020 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:57 2020 - [info] GTID ON
Thu Jun 11 20:59:57 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:57 2020 - [info] Not candidate for the new Master (no_master is set)
Thu Jun 11 20:59:57 2020 - [info] Checking slave configurations..
Thu Jun 11 20:59:57 2020 - [info] read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
Thu Jun 11 20:59:57 2020 - [info] read_only=1 is not set on slave 192.168.1.103(192.168.1.103:3306).
Thu Jun 11 20:59:57 2020 - [info] Checking replication filtering settings..
Thu Jun 11 20:59:57 2020 - [info] Replication filtering check ok.
Thu Jun 11 20:59:57 2020 - [info] Master is down!
Thu Jun 11 20:59:57 2020 - [info] Terminating monitoring script.
Thu Jun 11 20:59:57 2020 - [info] Got exit code 20 (Master dead).
Thu Jun 11 20:59:57 2020 - [info] MHA::MasterFailover version 0.57.
Thu Jun 11 20:59:57 2020 - [info] Starting master failover.
Thu Jun 11 20:59:57 2020 - [info]
Thu Jun 11 20:59:57 2020 - [info] * Phase 1: Configuration Check Phase..
Thu Jun 11 20:59:57 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] GTID failover mode = 1
Thu Jun 11 20:59:59 2020 - [info] Dead Servers:
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:59 2020 - [info] Checking master reachability via MySQL(double check)...
Thu Jun 11 20:59:59 2020 - [info] ok.
Thu Jun 11 20:59:59 2020 - [info] Alive Servers:
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.102(192.168.1.102:3306)
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.103(192.168.1.103:3306)
Thu Jun 11 20:59:59 2020 - [info] Alive Slaves:
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:59 2020 - [info] GTID ON
Thu Jun 11 20:59:59 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:59 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:59 2020 - [info] GTID ON
Thu Jun 11 20:59:59 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:59 2020 - [info] Not candidate for the new Master (no_master is set)
Thu Jun 11 20:59:59 2020 - [info] Starting GTID based failover.
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jun 11 20:59:59 2020 - [info] Executing master IP deactivation script:
Thu Jun 11 20:59:59 2020 - [info] /usr/bin/master_ip_failover --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====root|/sbin/ifconfig ens33:1 down==root|/sbin/ifconfig ens33:1 192.168.1.88/24===
Disabling the VIP on old master: 192.168.1.101
Thu Jun 11 20:59:59 2020 - [info] done.
Thu Jun 11 20:59:59 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jun 11 20:59:59 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] * Phase 3: Master Recovery Phase..
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000003:2435
Thu Jun 11 20:59:59 2020 - [info] Retrieved Gtid Set: 81502f9e-a592-11ea-b912-000c2928707c:12-16
Thu Jun 11 20:59:59 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:59 2020 - [info] GTID ON
Thu Jun 11 20:59:59 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:59 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:59 2020 - [info] GTID ON
Thu Jun 11 20:59:59 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:59 2020 - [info] Not candidate for the new Master (no_master is set)
Thu Jun 11 20:59:59 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000003:2435
Thu Jun 11 20:59:59 2020 - [info] Retrieved Gtid Set: 81502f9e-a592-11ea-b912-000c2928707c:12-16
Thu Jun 11 20:59:59 2020 - [info] Oldest slaves:
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:59 2020 - [info] GTID ON
Thu Jun 11 20:59:59 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:59 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:59 2020 - [info] GTID ON
Thu Jun 11 20:59:59 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:59 2020 - [info] Not candidate for the new Master (no_master is set)
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] Searching new master from slaves..
Thu Jun 11 20:59:59 2020 - [info] Candidate masters from the configuration file:
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:59 2020 - [info] GTID ON
Thu Jun 11 20:59:59 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:59 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jun 11 20:59:59 2020 - [info] Non-candidate masters:
Thu Jun 11 20:59:59 2020 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled
Thu Jun 11 20:59:59 2020 - [info] GTID ON
Thu Jun 11 20:59:59 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
Thu Jun 11 20:59:59 2020 - [info] Not candidate for the new Master (no_master is set)
Thu Jun 11 20:59:59 2020 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Jun 11 20:59:59 2020 - [info] New master is 192.168.1.102(192.168.1.102:3306)
Thu Jun 11 20:59:59 2020 - [info] Starting master failover..
Thu Jun 11 20:59:59 2020 - [info]
From:
192.168.1.101(192.168.1.101:3306) (current master)
+--192.168.1.102(192.168.1.102:3306)
+--192.168.1.103(192.168.1.103:3306)
To:
192.168.1.102(192.168.1.102:3306) (new master)
+--192.168.1.103(192.168.1.103:3306)
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Jun 11 20:59:59 2020 - [info]
Thu Jun 11 20:59:59 2020 - [info] Waiting all logs to be applied..
Thu Jun 11 20:59:59 2020 - [info] done.
Thu Jun 11 20:59:59 2020 - [info] Getting new master's binlog name and position..
Thu Jun 11 20:59:59 2020 - [info] mysql-bin.000002:463
Thu Jun 11 20:59:59 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.102', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jun 11 20:59:59 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000002, 463, 1dbd5375-a4d9-11ea-9eef-000c29cf4cca:1,
81502f9e-a592-11ea-b912-000c2928707c:1-16
Thu Jun 11 20:59:59 2020 - [info] Executing master IP activate script:
Thu Jun 11 20:59:59 2020 - [info] /usr/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 --new_master_host=192.168.1.102 --new_master_ip=192.168.1.102 --new_master_port=3306 --new_master_user='dba_mha' --new_master_password=xxx
Option new_master_user does not take an argument
Option new_master_password does not take an argument
IN SCRIPT TEST====root|/sbin/ifconfig ens33:1 down==root|/sbin/ifconfig ens33:1 192.168.1.88/24===
Enabling the VIP - 192.168.1.88/24 on the new master - 192.168.1.102
Thu Jun 11 21:00:02 2020 - [info] OK.
Thu Jun 11 21:00:02 2020 - [info] ** Finished master recovery successfully.
Thu Jun 11 21:00:02 2020 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jun 11 21:00:02 2020 - [info]
Thu Jun 11 21:00:02 2020 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jun 11 21:00:02 2020 - [info]
Thu Jun 11 21:00:02 2020 - [info]
Thu Jun 11 21:00:02 2020 - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Jun 11 21:00:02 2020 - [info]
Thu Jun 11 21:00:02 2020 - [info] -- Slave recovery on host 192.168.1.103(192.168.1.103:3306) started, pid: 28647. Check tmp log /root/mha/192.168.1.103_3306_20200611205957.log if it takes time..
Thu Jun 11 21:00:04 2020 - [info]
Thu Jun 11 21:00:04 2020 - [info] Log messages from 192.168.1.103 ...
Thu Jun 11 21:00:04 2020 - [info]
Thu Jun 11 21:00:02 2020 - [info] Resetting slave 192.168.1.103(192.168.1.103:3306) and starting replication from the new master 192.168.1.102(192.168.1.102:3306)..
Thu Jun 11 21:00:02 2020 - [info] Executed CHANGE MASTER.
Thu Jun 11 21:00:03 2020 - [info] Slave started.
Thu Jun 11 21:00:03 2020 - [info] gtid_wait(1dbd5375-a4d9-11ea-9eef-000c29cf4cca:1,
81502f9e-a592-11ea-b912-000c2928707c:1-16) completed on 192.168.1.103(192.168.1.103:3306). Executed 0 events.
Thu Jun 11 21:00:04 2020 - [info] End of log messages from 192.168.1.103.
Thu Jun 11 21:00:04 2020 - [info] -- Slave on host 192.168.1.103(192.168.1.103:3306) started.
Thu Jun 11 21:00:04 2020 - [info] All new slave servers recovered successfully.
Thu Jun 11 21:00:04 2020 - [info]
Thu Jun 11 21:00:04 2020 - [info] * Phase 5: New master cleanup phase..
Thu Jun 11 21:00:04 2020 - [info]
Thu Jun 11 21:00:04 2020 - [info] Resetting slave info on the new master..
Thu Jun 11 21:00:04 2020 - [info] 192.168.1.102: Resetting slave info succeeded.
Thu Jun 11 21:00:04 2020 - [info] Master failover to 192.168.1.102(192.168.1.102:3306) completed successfully.
Thu Jun 11 21:00:04 2020 - [info]
----- Failover Report -----
mysql-mha: MySQL Master failover 192.168.1.101(192.168.1.101:3306) to 192.168.1.102(192.168.1.102:3306) succeeded
Master 192.168.1.101(192.168.1.101:3306) is down!
Check MHA Manager logs at localhost.localdomain:/root/mha/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.1.101(192.168.1.101:3306)
Selected 192.168.1.102(192.168.1.102:3306) as a new master.
192.168.1.102(192.168.1.102:3306): OK: Applying all logs succeeded.
192.168.1.102(192.168.1.102:3306): OK: Activated master IP address.
192.168.1.103(192.168.1.103:3306): OK: Slave started, replicating from 192.168.1.102(192.168.1.102:3306)
192.168.1.102(192.168.1.102:3306): Resetting slave info succeeded.
Master failover to 192.168.1.102(192.168.1.102:3306) completed successfully.
The log shows that 192.168.1.102 was promoted to be the new master.
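The failover log is long, so it can help to pull out just the two facts that matter most: which host became the new master and roughly how long the failover took. The following is a small sketch with hypothetical helper names; it assumes the log line format shown above and that the whole failover happens within one calendar day.

```shell
#!/usr/bin/env bash
# Print the host named in the last "New master is <ip>(...)" line of the log.
new_master_from_log() {
  sed -n 's/.*\[info\] New master is \([0-9.]*\)(.*/\1/p' "$1" | tail -n 1
}

# Print the seconds between the first and last timestamped lines of the log.
failover_seconds() {
  awk '
    $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
      split($4, t, ":"); s = t[1] * 3600 + t[2] * 60 + t[3]
      if (first == "") first = s
      last = s
    }
    END { if (first != "") print last - first }
  ' "$1"
}

# Usage (not executed here), on a file that holds only the failover log:
# new_master_from_log failover.log
# failover_seconds failover.log
```

In the log above, the first error is stamped 20:59:53 and the completion message 21:00:04, so the elapsed time works out to 11 seconds.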
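If the original master comes back, will it take the master role back, or will there be more than one master?
If the original master were still a master after recovering, the replication cluster would contain two masters, which is a split-brain situation.
Let's see what actually happens.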
Restart the MySQL service on 192.168.1.101. The new master is now 192.168.1.102, so run show slave hosts on the new master to see how many slaves are connected to it:
show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID |
+-----------+------+------+-----------+--------------------------------------+
| 103 | | 3306 | 102 | d6532e2a-a592-11ea-99c3-000c297f5b55 |
+-----------+------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)
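Only 192.168.1.103 is left as a slave. In other words, after recovering, the original master neither took the master role back nor became a slave.
To bring the original master back into the cluster, it has to be reconfigured as a slave of the new master: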
change master to master_host='192.168.1.102', master_user='repl', master_password='your password', master_auto_position=1;
start slave;
After configuring and starting replication, check the master's slave nodes again:
show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID |
+-----------+------+------+-----------+--------------------------------------+
| 101 | | 3306 | 102 | 81502f9e-a592-11ea-b912-000c2928707c |
| 103 | | 3306 | 102 | d6532e2a-a592-11ea-99c3-000c297f5b55 |
+-----------+------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)
The original master has now rejoined the cluster, as a slave, after recovering from the failure.
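The rejoin statement only differs by the new master's address and the replication credentials, so it can be generated rather than typed by hand. This is a minimal sketch: `rejoin_sql` is a hypothetical helper, and the user name `repl` is taken from the statement shown in the failover log above.

```shell
#!/usr/bin/env bash
# Build the GTID-based statements that turn a recovered master into a slave.
rejoin_sql() {
  local new_master="$1" repl_user="$2" repl_password="$3"
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_AUTO_POSITION=1; START SLAVE;\n" \
    "$new_master" "$repl_user" "$repl_password"
}

# Usage on the recovered node (not executed here):
# rejoin_sql 192.168.1.102 repl 'your password' | mysql -uroot -p
```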
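MySQL high availability matters a great deal. Building an MHA architecture by hand gives a much better understanding of how MHA works, and it means we are not left helpless when MySQL fails.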