This walkthrough builds MHA on CentOS 7.4 with MySQL 8.0. Why 8.0? The setup is no different from 5.7 — I had already built it on 5.7 — but our senior DBA suggested trying 8.0 (it adds new aggregate/window functions), so I rebuilt the MHA cluster on 8.0; the MHA configuration itself is identical.
Because my test machines are limited, this setup lives in Docker containers. They are containers, but each one is built up from a bare centos7.4 base image with its own IP and port, so the configuration is no different from doing it on physical machines.
How MHA works is explained all over Baidu/Google/Bing, so I won't repeat it; this post mainly records the build process and shares how I dealt with the errors I ran into along the way.
Prerequisites for MHA:
1. Passwordless SSH login must work between every node (both the manager machine and the MySQL node machines).
2. A master-slave replication cluster must already be running on the MySQL node machines (with semi-synchronous replication enabled in their configuration files).
3. The MHA tools must be installed (manager + node packages on the manager machine, the node package on every MySQL machine).
4. Check everything, adjust the configuration, and run.
Prerequisite 1, passwordless SSH, has its own separate post and is not explained here; a minimal reminder is sketched just below.
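(As a quick reminder only — a rough sketch of what prerequisite 1 boils down to; run this on each node, pushing the key to every other node, which in this post means 172.20.0.3 and 172.20.0.4 plus the manager container:)
# generate a key pair once per node, then copy the public key to each of the other nodes
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id [email protected]
ssh-copy-id [email protected]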
Prerequisite 2, the MySQL master-slave setup, is also covered in a separate post, so here I only share the master and slave configuration files I used:
# Master
[mysqld]
port=50031
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
server-id=3
log-error=/var/log/mysqld.log
# Slave
[mysqld]
port=50051
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
# master-slave settings
server-id=4
log-bin=master-bin
relay-log=relay-bin
skip_name_resolve=ON
innodb_file_per_table=ON
max_connections=65536
# disable automatic relay log purging
relay_log_purge=0
# make the slave read-only
read_only=1
By this point it is assumed that passwordless SSH works between all nodes and that MySQL master-slave replication is already running. The master-slave setup itself is essentially the same whether you build it on 5.7 or on 8.0.
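One thing worth flagging: the master section above shows no log-bin setting, yet the MHA checks later in this post find binlogs with the prefix master-log, so binary logging is clearly enabled on the master too. For completeness, here is a rough sketch of what prerequisite 2 boils down to on these two hosts — the repl account matches the MHA config used later, the plugin names are the pre-8.0.26 semi-sync plugins, and the binlog coordinates are placeholders (use SHOW MASTER STATUS for the real ones):
# on the master (172.20.0.3:50031): replication account plus the semi-sync master plugin
mysql -uroot -p -h172.20.0.3 -P50031 -e "CREATE USER 'repl'@'172.20.0.%' IDENTIFIED WITH mysql_native_password BY 'Lc123456.'; GRANT REPLICATION SLAVE ON *.* TO 'repl'@'172.20.0.%'; INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; SET GLOBAL rpl_semi_sync_master_enabled = 1;"
# on the slave (172.20.0.4:50051): semi-sync slave plugin, then point it at the master and start replicating
mysql -uroot -p -h172.20.0.4 -P50051 -e "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; SET GLOBAL rpl_semi_sync_slave_enabled = 1; CHANGE MASTER TO MASTER_HOST='172.20.0.3', MASTER_PORT=50031, MASTER_USER='repl', MASTER_PASSWORD='Lc123456.', MASTER_LOG_FILE='master-log.000001', MASTER_LOG_POS=4; START SLAVE;"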
Prerequisite 3:
Install the MHA management tools.
MHA has two parts: mha-manager, the management/monitoring piece, and mha-node, which can be thought of as the piece that executes the manager's instructions.
Because the manager and the managed MySQL nodes are not on the same machine, they are installed separately: every MySQL machine gets the mha-node package.
On the dedicated manager machine we install both the manager tool and the node tool.
In short: every machine gets mha-node; the manager machine additionally gets mha-manager.
As with most Linux software, the workflow is program + configuration file: install the program first, then write the configuration, then run it.
MHA is written in Perl, so before installing MHA we need the Perl dependencies, and before those we need the epel-release repository to extend yum.
I'll assume epel-release is already in place from here on.
Note: mha-manager depends on mha-node, so on the manager machine node must be installed before manager.
Install the Perl dependencies:
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager -y
Some of these may well fail to install because the packages can't be found; don't worry about it for now, we will sort that out later.
MHA itself is not in the yum repositories, so you have to search for its packages; the official MHA project is usually among the first couple of Google results. Make sure you pick the CentOS 7 (el7) builds — the top Baidu results keep pointing to the CentOS 6 ones.
Find the download URL yourself, then wget the node and manager packages.
You end up with the node package: mha4mysql-node-0.58-0.el7.centos.noarch.rpm
and the manager package: mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
I count three ways to install it: 1. rpm install; 2. compile with make; 3. yum install.
Method 1: rpm install.
Download the rpm packages and install them with rpm -ivh.
If some Perl dependencies were not found during the yum step above,
rpm -ivh will then fail with dependency errors. The fix is to search for each missing dependency, download its rpm and install them one by one (when looking for dependency rpms, include your OS version in the search).
Method 2: compile with make.
Another approach seen online is to download the tarball, unpack it, run perl Makefile.PL and then make && make install. This can fail on missing dependencies in the same way; when make fails, install the missing dependencies, re-run perl Makefile.PL, then make again.
Problems with the two methods above:
Both of them (rpm and make) share a fatal failure mode: after you have installed every dependency and MHA itself, a test run may still die with dependency version conflicts. At that point it is painful to uninstall and painful to patch — this kind of version mess kept me stuck for two days.
So the method I finally recommend is a local yum install, still based on the rpm packages:
Since epel is installed and the rpm packages are already downloaded, we can run
yum localinstall <package> to install them through yum. If a dependency is missing, yum downloads the matching version itself, which avoids the dependency-compatibility problems later on.
yum localinstall solves the dependency-version problem, but it creates a new one:
If you installed via make you may have noticed that installing node and manager generates template configuration files; with yum localinstall those templates are nowhere to be found. That's fine — a missing config template is no big deal, we can simply write the configuration ourselves. Far easier than being unable to run at all because of dependency conflicts.
After node is installed, install the mha-manager package on the manager machine, again with yum localinstall.
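Putting the whole install together, the sequence I ended up with looks roughly like this (same package files as above; the node install runs on every host, the manager install only on the manager host):
# on every host (manager machine and both MySQL nodes)
yum install -y epel-release
yum localinstall -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm
# on the manager host only, after node is in place
yum localinstall -y mha4mysql-manager-0.58-0.el7.centos.noarch.rpm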
Once installed, the usual tutorials tell you to find the sample configuration file, copy it under /etc and edit it. Here, however, you will find there is no configuration file to copy.
Don't panic — we write it ourselves. The generated file would only have been a template that needs hand-editing anyway, so we may as well write it from scratch.
Create an mha directory under /etc:
mkdir -p /etc/mha
Create a working directory for MHA under the root filesystem:
mkdir -p /masterha/app1
Then create a configuration file under /etc/mha with the following content:
vi /etc/mha/default.cnf
[server default]
# account used to log in to and monitor the databases
user=root
password=Lc123456.
# manager working directory
manager_workdir=/masterha/app1
# manager log file
manager_log=/masterha/app1/manager.log
# where the remote MySQL hosts keep the binlogs saved during a switchover
remote_workdir=/masterha/app1
# SSH account
ssh_user=root
# MySQL replication account
repl_user=repl
repl_password=Lc123456.
# interval (seconds) between ping packets sent to the master; default is 3. After three failed attempts MHA starts an automatic failover
ping_interval=1
# script run during automatic failover (optional)
#master_ip_failover_script="/etc/mha/scripts/master_ip_failover"
# script run during a manual/online switchover (optional)
master_ip_online_change_script="/etc/mha/scripts/master_ip_online_change"
# script that sends an alert after a switchover (optional)
report_script="/etc/mha/scripts/send_report"
# script that powers off the failed master to prevent split brain; not used here (optional)
#shutdown_script=""
# if the manager's own check of the master fails, it double-checks the master's reachability through these hosts (172.20.0.3 and 172.20.0.4)
secondary_check_script=masterha_secondary_check -s 172.20.0.3 -s 172.20.0.4
[server1]
# node IP address
hostname=172.20.0.3
# the port parameter can be omitted if MySQL still listens on the default 3306; with a non-default port it must be specified
port=50031
# directory where the master keeps its binlogs so MHA can find them; here it is the MySQL datadir
master_binlog_dir=/var/lib/mysql/
# mark this node as a candidate master: after a failover this slave is promoted even if it is not the most up-to-date slave in the cluster
candidate_master=1
#relay_log_purge=0
[server2]
hostname=172.20.0.4
port=50051
master_binlog_dir=/var/lib/mysql/
candidate_master=1
Read the configuration file over a few times and you will notice a few more details we still need:
1. A MySQL replication account (with the necessary access and privileges)
2. A monitoring/login account for the databases (root is more than is needed — you can create a dedicated one; I lazily used root)
3. The working directories (the paths we just created; these three items can be taken over as-is)
4. The remaining scripts (added and tweaked gradually; mine are shared below). A sketch of the monitoring account follows this list; the replication account was already sketched under prerequisite 2.
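For item 2, root works as used here, but a dedicated monitoring user would look roughly like this — a minimal sketch; the mha user name and the 172.20.0.% host pattern are my own placeholders:
# optional dedicated monitoring user instead of root (MHA needs broad privileges: it runs CHANGE MASTER, SET GLOBAL read_only, and so on)
mysql -uroot -p -h172.20.0.3 -P50031 -e "CREATE USER 'mha'@'172.20.0.%' IDENTIFIED BY 'Lc123456.'; GRANT ALL PRIVILEGES ON *.* TO 'mha'@'172.20.0.%';"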
The three scripts referenced by the configuration file are created and edited by hand with vi.
master_ip_failover: (if the connection test later fails because of this script, just comment it out in the config — none of these scripts is strictly required)
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);

# VIP definition
my $vip = '172.20.0.6/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth0:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
master_ip_online_change:
#!/bin/bash
source /root/.bash_profile
vip=`echo '172.20.0.7/24'`    # the VIP
key=`echo '1'`
command=`echo "$1" | awk -F = '{print $2}'`
orig_master_host=`echo "$2" | awk -F = '{print $2}'`
new_master_host=`echo "$7" | awk -F = '{print $2}'`
orig_master_ssh_user=`echo "${12}" | awk -F = '{print $2}'`
new_master_ssh_user=`echo "${13}" | awk -F = '{print $2}'`
# the NIC name must be identical on every node (eth0 in this environment)
stop_vip=`echo "ssh root@$orig_master_host /usr/sbin/ifconfig eth0:$key down"`
start_vip=`echo "ssh root@$new_master_host /usr/sbin/ifconfig eth0:$key $vip"`
if [ $command = 'stop' ]
  then
    echo -e "\n\n\n****************************\n"
    echo -e "Disabling the VIP - $vip on old master: $orig_master_host \n"
    $stop_vip
    if [ $? -eq 0 ]
      then
        echo "Disabled the VIP successfully"
      else
        echo "Disabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
fi
if [ $command = 'start' -o $command = 'status' ]
  then
    echo -e "\n\n\n*************************\n"
    echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"
    $start_vip
    if [ $? -eq 0 ]
      then
        echo "Enabled the VIP successfully"
      else
        echo "Enabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
fi
send_report:
#!/bin/bash
source /root/.bash_profile
# parse the arguments passed in by MHA
orig_master_host=`echo "$1" | awk -F = '{print $2}'`
new_master_host=`echo "$2" | awk -F = '{print $2}'`
new_slave_hosts=`echo "$3" | awk -F = '{print $2}'`
subject=`echo "$4" | awk -F = '{print $2}'`
body=`echo "$5" | awk -F = '{print $2}'`
# recipient address
email="[email protected]"
tac /var/log/mha/app1/manager.log | sed -n 2p | grep 'successfully' > /dev/null
if [ $? -eq 0 ]
  then
    messages=`echo -e "MHA $subject failover succeeded\n master: $orig_master_host --> $new_master_host \n $body \n current slaves: $new_slave_hosts"`
    echo "$messages" | mail -s "MySQL instance down, MHA $subject failover succeeded" $email >> /tmp/mailx.log 2>&1
  else
    messages=`echo -e "MHA $subject failover failed\n master: $orig_master_host --> $new_master_host \n $body"`
    echo "$messages" | mail -s "MySQL instance down, MHA $subject failover failed" $email >> /tmp/mailx.log 2>&1
fi
Adapt the contents of these scripts to your own environment, then make them executable:
chmod +x /etc/mha/scripts/master_ip_failover
chmod +x /etc/mha/scripts/master_ip_online_change
chmod +x /etc/mha/scripts/send_report
That completes the MHA installation. Now test connectivity from the manager machine (only meaningful once MySQL replication and SSH are known to be healthy):
##### Use masterha_check_ssh to verify that SSH trust (passwordless login) works
[root@bc9aac747b7f scripts]# masterha_check_ssh --conf=/etc/mha/default.cnf
Wed Feb 20 02:42:00 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Feb 20 02:42:00 2019 - [info] Reading application default configuration from /etc/mha/default.cnf..
Wed Feb 20 02:42:00 2019 - [info] Reading server configuration from /etc/mha/default.cnf..
Wed Feb 20 02:42:00 2019 - [info] Starting SSH connection tests..
Wed Feb 20 02:42:00 2019 - [debug]
Wed Feb 20 02:42:00 2019 - [debug] Connecting via SSH from [email protected](172.20.0.3:22) to [email protected](172.20.0.4:22)..
Wed Feb 20 02:42:00 2019 - [debug] ok.
Wed Feb 20 02:42:01 2019 - [debug]
Wed Feb 20 02:42:00 2019 - [debug] Connecting via SSH from [email protected](172.20.0.4:22) to [email protected](172.20.0.3:22)..
Wed Feb 20 02:42:00 2019 - [debug] ok.
Wed Feb 20 02:42:01 2019 - [info] All SSH connection tests passed successfully.
# Use masterha_check_repl to verify that MySQL replication is healthy (the output below means the check passed)
[root@bc9aac747b7f scripts]# masterha_check_repl --conf=/etc/mha/default.cnf
Wed Feb 20 02:42:54 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Feb 20 02:42:54 2019 - [info] Reading application default configuration from /etc/mha/default.cnf..
Wed Feb 20 02:42:54 2019 - [info] Reading server configuration from /etc/mha/default.cnf..
Wed Feb 20 02:42:54 2019 - [info] MHA::MasterMonitor version 0.58.
Wed Feb 20 02:42:55 2019 - [info] GTID failover mode = 0
Wed Feb 20 02:42:55 2019 - [info] Dead Servers:
Wed Feb 20 02:42:55 2019 - [info] Alive Servers:
Wed Feb 20 02:42:55 2019 - [info] 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 02:42:55 2019 - [info] 172.20.0.4(172.20.0.4:50051)
Wed Feb 20 02:42:55 2019 - [info] Alive Slaves:
Wed Feb 20 02:42:55 2019 - [info] 172.20.0.4(172.20.0.4:50051) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Wed Feb 20 02:42:55 2019 - [info] Replicating from 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 02:42:55 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Feb 20 02:42:55 2019 - [info] Current Alive Master: 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 02:42:55 2019 - [info] Checking slave configurations..
Wed Feb 20 02:42:55 2019 - [info] Checking replication filtering settings..
Wed Feb 20 02:42:55 2019 - [info] binlog_do_db= , binlog_ignore_db=
Wed Feb 20 02:42:55 2019 - [info] Replication filtering check ok.
Wed Feb 20 02:42:55 2019 - [info] GTID (with auto-pos) is not supported
Wed Feb 20 02:42:55 2019 - [info] Starting SSH connection tests..
Wed Feb 20 02:42:56 2019 - [info] All SSH connection tests passed successfully.
Wed Feb 20 02:42:56 2019 - [info] Checking MHA Node version..
Wed Feb 20 02:42:56 2019 - [info] Version check ok.
Wed Feb 20 02:42:56 2019 - [info] Checking SSH publickey authentication settings on the current master..
Wed Feb 20 02:42:57 2019 - [info] HealthCheck: SSH to 172.20.0.3 is reachable.
Wed Feb 20 02:42:57 2019 - [info] Master MHA Node version is 0.58.
Wed Feb 20 02:42:57 2019 - [info] Checking recovery script configurations on 172.20.0.3(172.20.0.3:50031)..
Wed Feb 20 02:42:57 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql/ --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.58 --start_file=master-log.000002
Wed Feb 20 02:42:57 2019 - [info] Connecting to [email protected](172.20.0.3:22)..
Creating /masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql/, up to master-log.000002
Wed Feb 20 02:42:57 2019 - [info] Binlog setting check done.
Wed Feb 20 02:42:57 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Feb 20 02:42:57 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.20.0.4 --slave_ip=172.20.0.4 --slave_port=50051 --workdir=/masterha/app1 --target_version=8.0.15 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=relay-bin.000002 --slave_pass=xxx
Wed Feb 20 02:42:57 2019 - [info] Connecting to [email protected](172.20.0.4:22)..
Checking slave recovery environment settings..
Relay log found at /var/lib/mysql, up to relay-bin.000002
Temporary relay log file is /var/lib/mysql/relay-bin.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Feb 20 02:42:58 2019 - [info] Slaves settings check done.
Wed Feb 20 02:42:58 2019 - [info]
172.20.0.3(172.20.0.3:50031) (current master)
+--172.20.0.4(172.20.0.4:50051)
Wed Feb 20 02:42:58 2019 - [info] Checking replication health on 172.20.0.4..
Wed Feb 20 02:42:58 2019 - [info] ok.
Wed Feb 20 02:42:58 2019 - [warning] master_ip_failover_script is not defined.
Wed Feb 20 02:42:58 2019 - [warning] shutdown_script is not defined.
Wed Feb 20 02:42:58 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
At this point the MHA checks pass (again, this only works if SSH and the replication topology are healthy).
Log in to every MySQL slave node (not the master)
and change the relay-log purge behaviour:
set global relay_log_purge=0;
Note:
During a switchover MHA relies on the relay logs to recover the other slaves, so automatic relay-log purging must be switched off and relay logs cleaned up manually instead. By default a slave deletes a relay log automatically once the SQL thread has applied it, but in an MHA setup those relay logs may still be needed to recover other slaves, which is why the automatic deletion is disabled. Scheduled relay-log cleanup also has to account for replication delay: on an ext3 filesystem, deleting a large file is slow enough to cause serious replication lag, so a hard link to the relay log is created first — deleting a large file through a hard link is fast on Linux (the same hard-link trick is often used when dropping large tables in MySQL).
Set up a scheduled relay-log purge script (adjust the account, password and port to your own instance — the values below do not match the 50051 instance used in this post):
cat purge_relay_log.sh
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir='/data/masterha/log'
work_dir='/data'
purge='/usr/local/bin/purge_relay_logs'
if [ ! -d $log_dir ]
then
mkdir $log_dir -p
fi
$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
192.168.2.129 [root ~]$ crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh
Starting MHA, and the pre-start check:
Check the MHA Manager status:
[root@bc9aac747b7f scripts]# masterha_check_status --conf=/etc/mha/default.cnf
default is stopped(2:NOT_RUNNING).
Start the MHA Manager monitor:
mkdir -p /var/log/masterha/app1/
[root@bc9aac747b7f scripts]# nohup masterha_manager --conf=/etc/mha/default.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 28781
[2] 28782
Start-up options explained:
--remove_dead_master_conf  // after a master switchover, the old master's entry is removed from the configuration file.
--manager_log  // log location (not passed on the command line above; here the manager_log setting in the config and the output redirection take care of it).
--ignore_last_failover  // by default, if MHA detects that two failovers would happen less than 8 hours apart it refuses to fail over again, to avoid ping-pong switching. After each failover MHA leaves an <app>.failover.complete marker file in the manager working directory (here /masterha/app1), and the next failover is blocked while that file exists unless it is deleted first. For convenience I pass --ignore_last_failover to skip this check; a sketch for clearing the marker by hand follows.
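If you would rather not pass --ignore_last_failover, the marker can be removed by hand before restarting the manager — a minimal sketch, assuming the application name (default, from this config file) and the manager_workdir used above:
# clear the failover marker left behind by the previous switchover
rm -f /masterha/app1/default.failover.complete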
Check that the MHA Manager monitor is running properly:
[root@bc9aac747b7f scripts]# masterha_check_status --conf=/etc/mha/default.cnf
default (pid:28781) is running(0:PING_OK), master:172.20.0.3
The manager's own log goes to the path specified in the configuration file; the /var/log/masterha/app1/ directory created above is where the nohup output of this run is redirected.
Stopping MHA:
Stop the MHA Manager monitor:
[root@bc9aac747b7f app1]# masterha_stop --conf=/etc/mha/default.cnf
Stopped default successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/mha/default.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 (wd: /etc/mha/scripts)
(wd now: /var/log/masterha/app1)
Test: kill the master and watch the switchover.
Shut down the master database,
then check the MHA status:
[root@bc9aac747b7f app1]# masterha_check_status --conf=/etc/mha/default.cnf
default is stopped(2:NOT_RUNNING).
[1]+ Done nohup masterha_manager --conf=/etc/mha/default.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 (wd: /var/log/masterha/app1)
(wd now: /masterha/app1)
The MHA manager process has exited on its own.
Check the MHA log:
[root@bc9aac747b7f app1]# cat manager.log
Wed Feb 20 02:57:48 2019 - [info] MHA::MasterMonitor version 0.58.
Wed Feb 20 02:57:49 2019 - [info] GTID failover mode = 0
Wed Feb 20 02:57:49 2019 - [info] Dead Servers:
Wed Feb 20 02:57:49 2019 - [info] Alive Servers:
Wed Feb 20 02:57:49 2019 - [info] 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 02:57:49 2019 - [info] 172.20.0.4(172.20.0.4:50051)
Wed Feb 20 02:57:49 2019 - [info] Alive Slaves:
Wed Feb 20 02:57:49 2019 - [info] 172.20.0.4(172.20.0.4:50051) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Wed Feb 20 02:57:49 2019 - [info] Replicating from 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 02:57:49 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Feb 20 02:57:49 2019 - [info] Current Alive Master: 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 02:57:49 2019 - [info] Checking slave configurations..
Wed Feb 20 02:57:49 2019 - [info] Checking replication filtering settings..
Wed Feb 20 02:57:49 2019 - [info] binlog_do_db= , binlog_ignore_db=
Wed Feb 20 02:57:49 2019 - [info] Replication filtering check ok.
Wed Feb 20 02:57:49 2019 - [info] GTID (with auto-pos) is not supported
Wed Feb 20 02:57:49 2019 - [info] Starting SSH connection tests..
Wed Feb 20 02:57:50 2019 - [info] All SSH connection tests passed successfully.
Wed Feb 20 02:57:50 2019 - [info] Checking MHA Node version..
Wed Feb 20 02:57:50 2019 - [info] Version check ok.
Wed Feb 20 02:57:50 2019 - [info] Checking SSH publickey authentication settings on the current master..
Wed Feb 20 02:57:51 2019 - [info] HealthCheck: SSH to 172.20.0.3 is reachable.
Wed Feb 20 02:57:51 2019 - [info] Master MHA Node version is 0.58.
Wed Feb 20 02:57:51 2019 - [info] Checking recovery script configurations on 172.20.0.3(172.20.0.3:50031)..
Wed Feb 20 02:57:51 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql/ --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.58 --start_file=master-log.000002
Wed Feb 20 02:57:51 2019 - [info] Connecting to [email protected](172.20.0.3:22)..
Creating /masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql/, up to master-log.000002
Wed Feb 20 02:57:51 2019 - [info] Binlog setting check done.
Wed Feb 20 02:57:51 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Feb 20 02:57:51 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.20.0.4 --slave_ip=172.20.0.4 --slave_port=50051 --workdir=/masterha/app1 --target_version=8.0.15 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=relay-bin.000002 --slave_pass=xxx
Wed Feb 20 02:57:51 2019 - [info] Connecting to [email protected](172.20.0.4:22)..
Checking slave recovery environment settings..
Relay log found at /var/lib/mysql, up to relay-bin.000002
Temporary relay log file is /var/lib/mysql/relay-bin.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Feb 20 02:57:52 2019 - [info] Slaves settings check done.
Wed Feb 20 02:57:52 2019 - [info]
172.20.0.3(172.20.0.3:50031) (current master)
+--172.20.0.4(172.20.0.4:50051)
Wed Feb 20 02:57:52 2019 - [warning] master_ip_failover_script is not defined.
Wed Feb 20 02:57:52 2019 - [warning] shutdown_script is not defined.
Wed Feb 20 02:57:52 2019 - [info] Set master ping interval 1 seconds.
Wed Feb 20 02:57:52 2019 - [info] Set secondary check script: masterha_secondary_check -s 172.20.0.3 -s 172.20.0.4
Wed Feb 20 02:57:52 2019 - [info] Starting ping health check on 172.20.0.3(172.20.0.3:50031)..
Wed Feb 20 02:57:52 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Wed Feb 20 03:04:14 2019 - [info] Got terminate signal. Exit.
Wed Feb 20 03:05:25 2019 - [info] MHA::MasterMonitor version 0.58.
Wed Feb 20 03:05:26 2019 - [info] GTID failover mode = 0
Wed Feb 20 03:05:26 2019 - [info] Dead Servers:
Wed Feb 20 03:05:26 2019 - [info] Alive Servers:
Wed Feb 20 03:05:26 2019 - [info] 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 03:05:26 2019 - [info] 172.20.0.4(172.20.0.4:50051)
Wed Feb 20 03:05:26 2019 - [info] Alive Slaves:
Wed Feb 20 03:05:26 2019 - [info] 172.20.0.4(172.20.0.4:50051) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Wed Feb 20 03:05:26 2019 - [info] Replicating from 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 03:05:26 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Feb 20 03:05:26 2019 - [info] Current Alive Master: 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 03:05:26 2019 - [info] Checking slave configurations..
Wed Feb 20 03:05:26 2019 - [info] Checking replication filtering settings..
Wed Feb 20 03:05:26 2019 - [info] binlog_do_db= , binlog_ignore_db=
Wed Feb 20 03:05:26 2019 - [info] Replication filtering check ok.
Wed Feb 20 03:05:26 2019 - [info] GTID (with auto-pos) is not supported
Wed Feb 20 03:05:26 2019 - [info] Starting SSH connection tests..
Wed Feb 20 03:05:27 2019 - [info] All SSH connection tests passed successfully.
Wed Feb 20 03:05:27 2019 - [info] Checking MHA Node version..
Wed Feb 20 03:05:27 2019 - [info] Version check ok.
Wed Feb 20 03:05:27 2019 - [info] Checking SSH publickey authentication settings on the current master..
Wed Feb 20 03:05:27 2019 - [info] HealthCheck: SSH to 172.20.0.3 is reachable.
Wed Feb 20 03:05:28 2019 - [info] Master MHA Node version is 0.58.
Wed Feb 20 03:05:28 2019 - [info] Checking recovery script configurations on 172.20.0.3(172.20.0.3:50031)..
Wed Feb 20 03:05:28 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql/ --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.58 --start_file=master-log.000002
Wed Feb 20 03:05:28 2019 - [info] Connecting to [email protected](172.20.0.3:22)..
Creating /masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql/, up to master-log.000002
Wed Feb 20 03:05:28 2019 - [info] Binlog setting check done.
Wed Feb 20 03:05:28 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Feb 20 03:05:28 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.20.0.4 --slave_ip=172.20.0.4 --slave_port=50051 --workdir=/masterha/app1 --target_version=8.0.15 --manager_version=0.58 --relay_dir=/var/lib/mysql --current_relay_log=relay-bin.000002 --slave_pass=xxx
Wed Feb 20 03:05:28 2019 - [info] Connecting to [email protected](172.20.0.4:22)..
Checking slave recovery environment settings..
Relay log found at /var/lib/mysql, up to relay-bin.000002
Temporary relay log file is /var/lib/mysql/relay-bin.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Feb 20 03:05:28 2019 - [info] Slaves settings check done.
Wed Feb 20 03:05:28 2019 - [info]
172.20.0.3(172.20.0.3:50031) (current master)
+--172.20.0.4(172.20.0.4:50051)
Wed Feb 20 03:05:28 2019 - [warning] master_ip_failover_script is not defined.
Wed Feb 20 03:05:28 2019 - [warning] shutdown_script is not defined.
Wed Feb 20 03:05:28 2019 - [info] Set master ping interval 1 seconds.
Wed Feb 20 03:05:28 2019 - [info] Set secondary check script: masterha_secondary_check -s 172.20.0.3 -s 172.20.0.4
Wed Feb 20 03:05:28 2019 - [info] Starting ping health check on 172.20.0.3(172.20.0.3:50031)..
Wed Feb 20 03:05:28 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Wed Feb 20 05:01:12 2019 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Wed Feb 20 05:01:12 2019 - [info] Executing secondary network check script: masterha_secondary_check -s 172.20.0.3 -s 172.20.0.4 --user=root --master_host=172.20.0.3 --master_ip=172.20.0.3 --master_port=50031 --master_user=root --master_password=Lc123456. --ping_type=SELECT
Wed Feb 20 05:01:12 2019 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql/ --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.58 --binlog_prefix=master-log
Wed Feb 20 05:01:13 2019 - [info] HealthCheck: SSH to 172.20.0.3 is reachable.
Monitoring server 172.20.0.3 is reachable, Master is not reachable from 172.20.0.3. OK.
Monitoring server 172.20.0.4 is reachable, Master is not reachable from 172.20.0.4. OK.
Wed Feb 20 05:01:13 2019 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Wed Feb 20 05:01:13 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.20.0.3' (111))
Wed Feb 20 05:01:13 2019 - [warning] Connection failed 2 time(s)..
Wed Feb 20 05:01:14 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.20.0.3' (111))
Wed Feb 20 05:01:14 2019 - [warning] Connection failed 3 time(s)..
Wed Feb 20 05:01:15 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.20.0.3' (111))
Wed Feb 20 05:01:15 2019 - [warning] Connection failed 4 time(s)..
Wed Feb 20 05:01:15 2019 - [warning] Master is not reachable from health checker!
Wed Feb 20 05:01:15 2019 - [warning] Master 172.20.0.3(172.20.0.3:50031) is not reachable!
Wed Feb 20 05:01:15 2019 - [warning] SSH is reachable.
Wed Feb 20 05:01:15 2019 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/default.cnf again, and trying to connect to all servers to check server status..
Wed Feb 20 05:01:15 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Feb 20 05:01:15 2019 - [info] Reading application default configuration from /etc/mha/default.cnf..
Wed Feb 20 05:01:15 2019 - [info] Reading server configuration from /etc/mha/default.cnf..
Wed Feb 20 05:01:16 2019 - [info] GTID failover mode = 0
Wed Feb 20 05:01:16 2019 - [info] Dead Servers:
Wed Feb 20 05:01:16 2019 - [info] 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 05:01:16 2019 - [info] Alive Servers:
Wed Feb 20 05:01:16 2019 - [info] 172.20.0.4(172.20.0.4:50051)
Wed Feb 20 05:01:16 2019 - [info] Alive Slaves:
Wed Feb 20 05:01:16 2019 - [info] 172.20.0.4(172.20.0.4:50051) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Wed Feb 20 05:01:16 2019 - [info] Replicating from 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 05:01:16 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Feb 20 05:01:16 2019 - [info] Checking slave configurations..
Wed Feb 20 05:01:16 2019 - [info] Checking replication filtering settings..
Wed Feb 20 05:01:16 2019 - [info] Replication filtering check ok.
Wed Feb 20 05:01:16 2019 - [info] Master is down!
Wed Feb 20 05:01:16 2019 - [info] Terminating monitoring script.
Wed Feb 20 05:01:16 2019 - [info] Got exit code 20 (Master dead).
Wed Feb 20 05:01:16 2019 - [info] MHA::MasterFailover version 0.58.
Wed Feb 20 05:01:16 2019 - [info] Starting master failover.
Wed Feb 20 05:01:16 2019 - [info]
Wed Feb 20 05:01:16 2019 - [info] * Phase 1: Configuration Check Phase..
Wed Feb 20 05:01:16 2019 - [info]
Wed Feb 20 05:01:17 2019 - [info] GTID failover mode = 0
Wed Feb 20 05:01:17 2019 - [info] Dead Servers:
Wed Feb 20 05:01:17 2019 - [info] 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 05:01:17 2019 - [info] Checking master reachability via MySQL(double check)...
Wed Feb 20 05:01:17 2019 - [info] ok.
Wed Feb 20 05:01:17 2019 - [info] Alive Servers:
Wed Feb 20 05:01:17 2019 - [info] 172.20.0.4(172.20.0.4:50051)
Wed Feb 20 05:01:17 2019 - [info] Alive Slaves:
Wed Feb 20 05:01:17 2019 - [info] 172.20.0.4(172.20.0.4:50051) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Wed Feb 20 05:01:17 2019 - [info] Replicating from 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 05:01:17 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Feb 20 05:01:17 2019 - [info] Starting Non-GTID based failover.
Wed Feb 20 05:01:17 2019 - [info]
Wed Feb 20 05:01:17 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Feb 20 05:01:17 2019 - [info]
Wed Feb 20 05:01:17 2019 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Feb 20 05:01:17 2019 - [info]
Wed Feb 20 05:01:17 2019 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Feb 20 05:01:17 2019 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Wed Feb 20 05:01:17 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Feb 20 05:01:18 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Feb 20 05:01:18 2019 - [info]
Wed Feb 20 05:01:18 2019 - [info] * Phase 3: Master Recovery Phase..
Wed Feb 20 05:01:18 2019 - [info]
Wed Feb 20 05:01:18 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Feb 20 05:01:18 2019 - [info]
Wed Feb 20 05:01:18 2019 - [info] The latest binary log file/position on all slaves is master-log.000002:1954
Wed Feb 20 05:01:18 2019 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Feb 20 05:01:18 2019 - [info] 172.20.0.4(172.20.0.4:50051) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Wed Feb 20 05:01:18 2019 - [info] Replicating from 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 05:01:18 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Feb 20 05:01:18 2019 - [info] The oldest binary log file/position on all slaves is master-log.000002:1954
Wed Feb 20 05:01:18 2019 - [info] Oldest slaves:
Wed Feb 20 05:01:18 2019 - [info] 172.20.0.4(172.20.0.4:50051) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Wed Feb 20 05:01:18 2019 - [info] Replicating from 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 05:01:18 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Feb 20 05:01:18 2019 - [info]
Wed Feb 20 05:01:18 2019 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Feb 20 05:01:18 2019 - [info]
Wed Feb 20 05:01:19 2019 - [info] Fetching dead master's binary logs..
Wed Feb 20 05:01:19 2019 - [info] Executing command on the dead master 172.20.0.3(172.20.0.3:50031): save_binary_logs --command=save --start_file=master-log.000002 --start_pos=1954 --binlog_dir=/var/lib/mysql/ --output_file=/masterha/app1/saved_master_binlog_from_172.20.0.3_50031_20190220050116.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
Creating /masterha/app1 if not exists.. ok.
Concat binary/relay logs from master-log.000002 pos 1954 to master-log.000002 EOF into /masterha/app1/saved_master_binlog_from_172.20.0.3_50031_20190220050116.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 155.. ok.
Dumping effective binlog data from /var/lib/mysql//master-log.000002 position 1954 to tail(1977).. ok.
Binlog Checksum enabled
Concat succeeded.
Wed Feb 20 05:01:20 2019 - [info] scp from [email protected]:/masterha/app1/saved_master_binlog_from_172.20.0.3_50031_20190220050116.binlog to local:/masterha/app1/saved_master_binlog_from_172.20.0.3_50031_20190220050116.binlog succeeded.
Wed Feb 20 05:01:20 2019 - [info] HealthCheck: SSH to 172.20.0.4 is reachable.
Wed Feb 20 05:01:20 2019 - [info]
Wed Feb 20 05:01:20 2019 - [info] * Phase 3.3: Determining New Master Phase..
Wed Feb 20 05:01:20 2019 - [info]
Wed Feb 20 05:01:20 2019 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Feb 20 05:01:20 2019 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Feb 20 05:01:20 2019 - [info] Searching new master from slaves..
Wed Feb 20 05:01:20 2019 - [info] Candidate masters from the configuration file:
Wed Feb 20 05:01:20 2019 - [info] 172.20.0.4(172.20.0.4:50051) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Wed Feb 20 05:01:20 2019 - [info] Replicating from 172.20.0.3(172.20.0.3:50031)
Wed Feb 20 05:01:20 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Feb 20 05:01:20 2019 - [info] Non-candidate masters:
Wed Feb 20 05:01:20 2019 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Feb 20 05:01:20 2019 - [info] New master is 172.20.0.4(172.20.0.4:50051)
Wed Feb 20 05:01:20 2019 - [info] Starting master failover..
Wed Feb 20 05:01:20 2019 - [info]
From:
172.20.0.3(172.20.0.3:50031) (current master)
+--172.20.0.4(172.20.0.4:50051)
To:
172.20.0.4(172.20.0.4:50051) (new master)
Wed Feb 20 05:01:20 2019 - [info]
Wed Feb 20 05:01:20 2019 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Wed Feb 20 05:01:20 2019 - [info]
Wed Feb 20 05:01:20 2019 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Feb 20 05:01:20 2019 - [info] Sending binlog..
Wed Feb 20 05:01:21 2019 - [info] scp from local:/masterha/app1/saved_master_binlog_from_172.20.0.3_50031_20190220050116.binlog to [email protected]:/masterha/app1/saved_master_binlog_from_172.20.0.3_50031_20190220050116.binlog succeeded.
Wed Feb 20 05:01:21 2019 - [info]
Wed Feb 20 05:01:21 2019 - [info] * Phase 3.5: Master Log Apply Phase..
Wed Feb 20 05:01:21 2019 - [info]
Wed Feb 20 05:01:21 2019 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Feb 20 05:01:21 2019 - [info] Starting recovery on 172.20.0.4(172.20.0.4:50051)..
Wed Feb 20 05:01:21 2019 - [info] Generating diffs succeeded.
Wed Feb 20 05:01:21 2019 - [info] Waiting until all relay logs are applied.
Wed Feb 20 05:01:21 2019 - [info] done.
Wed Feb 20 05:01:21 2019 - [info] Getting slave status..
Wed Feb 20 05:01:21 2019 - [info] This slave(172.20.0.4)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(master-log.000002:1954). No need to recover from Exec_Master_Log_Pos.
Wed Feb 20 05:01:21 2019 - [info] Connecting to the target slave host 172.20.0.4, running recover script..
Wed Feb 20 05:01:21 2019 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=172.20.0.4 --slave_ip=172.20.0.4 --slave_port=50051 --apply_files=/masterha/app1/saved_master_binlog_from_172.20.0.3_50031_20190220050116.binlog --workdir=/masterha/app1 --target_version=8.0.15 --timestamp=20190220050116 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx
Wed Feb 20 05:01:21 2019 - [info]
Applying differential binary/relay log files /masterha/app1/saved_master_binlog_from_172.20.0.3_50031_20190220050116.binlog on 172.20.0.4:50051. This may take long time...
Applying log files succeeded.
Wed Feb 20 05:01:21 2019 - [info] All relay logs were successfully applied.
Wed Feb 20 05:01:21 2019 - [info] Getting new master's binlog name and position..
Wed Feb 20 05:01:21 2019 - [info] master-bin.000002:1954
Wed Feb 20 05:01:21 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.20.0.4', MASTER_PORT=50051, MASTER_LOG_FILE='master-bin.000002', MASTER_LOG_POS=1954, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Wed Feb 20 05:01:21 2019 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Wed Feb 20 05:01:21 2019 - [info] Setting read_only=0 on 172.20.0.4(172.20.0.4:50051)..
Wed Feb 20 05:01:21 2019 - [info] ok.
Wed Feb 20 05:01:21 2019 - [info] ** Finished master recovery successfully.
Wed Feb 20 05:01:21 2019 - [info] * Phase 3: Master Recovery Phase completed.
Wed Feb 20 05:01:21 2019 - [info]
Wed Feb 20 05:01:21 2019 - [info] * Phase 4: Slaves Recovery Phase..
Wed Feb 20 05:01:21 2019 - [info]
Wed Feb 20 05:01:21 2019 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Feb 20 05:01:21 2019 - [info]
Wed Feb 20 05:01:21 2019 - [info] Generating relay diff files from the latest slave succeeded.
Wed Feb 20 05:01:21 2019 - [info]
Wed Feb 20 05:01:21 2019 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Feb 20 05:01:21 2019 - [info]
Wed Feb 20 05:01:21 2019 - [info] All new slave servers recovered successfully.
Wed Feb 20 05:01:21 2019 - [info]
Wed Feb 20 05:01:21 2019 - [info] * Phase 5: New master cleanup phase..
Wed Feb 20 05:01:21 2019 - [info]
Wed Feb 20 05:01:21 2019 - [info] Resetting slave info on the new master..
Wed Feb 20 05:01:21 2019 - [info] 172.20.0.4: Resetting slave info succeeded.
Wed Feb 20 05:01:21 2019 - [info] Master failover to 172.20.0.4(172.20.0.4:50051) completed successfully.
Wed Feb 20 05:01:21 2019 - [info] Deleted server1 entry from /etc/mha/default.cnf .
Wed Feb 20 05:01:21 2019 - [info]
----- Failover Report -----
default: MySQL Master failover 172.20.0.3(172.20.0.3:50031) to 172.20.0.4(172.20.0.4:50051) succeeded
Master 172.20.0.3(172.20.0.3:50031) is down!
Check MHA Manager logs at bc9aac747b7f:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 172.20.0.4(172.20.0.4:50051) has all relay logs for recovery.
Selected 172.20.0.4(172.20.0.4:50051) as a new master.
172.20.0.4(172.20.0.4:50051): OK: Applying all logs succeeded.
Generating relay diff files from the latest slave succeeded.
172.20.0.4(172.20.0.4:50051): Resetting slave info succeeded.
Master failover to 172.20.0.4(172.20.0.4:50051) completed successfully.
Wed Feb 20 05:01:21 2019 - [info] Sending mail..
/etc/mha/scripts/send_report: line 4: awk−F=′print--new_master_host=172.20.0.4′‘newmasterhost=‘echo2: command not found
tac: failed to open ‘/var/log/mha/app1/manager.log’ for reading: No such file or directory
/etc/mha/scripts/send_report: line 11: syntax error near unexpected token `fi'
/etc/mha/scripts/send_report: line 11: `fi'
Wed Feb 20 05:01:21 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2089] Failed to send mail with return code 2:0
At the end of the log the slave has been promoted to master, so the MHA setup works. (The send_report errors just above come from my hand-written mail script — its argument parsing broke and it reads /var/log/mha/app1/manager.log instead of the manager_log path set in the config — and they do not affect the failover itself.)
To sum up:
After a failover MHA shuts itself down. To bring it back you have to restart the failed old master, attach it to the new master as a slave with CHANGE MASTER, and only then restart mha-manager (a sketch follows this summary).
When there are several slaves to choose from, MHA compares their binlogs and promotes the most complete slave, so the election costs some time; the official claim is that the switchover completes within roughly 10 seconds.
With semi-synchronous replication enabled, MySQL puts demands on the network rather than on host resources: a semi-sync master waits for the slave to acknowledge that the transaction has reached its relay log before the commit returns, so semi-sync between distant data centers takes a noticeable hit.
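A minimal sketch of that recovery, reusing the CHANGE MASTER statement MHA itself printed in the failover log above (the binlog coordinates and the repl password come from that run and this post's config — take yours from your own log):
# 1. bring the old master (172.20.0.3:50031) back up, then attach it to the new master as a read-only slave
mysql -uroot -p -h172.20.0.3 -P50031 -e "CHANGE MASTER TO MASTER_HOST='172.20.0.4', MASTER_PORT=50051, MASTER_LOG_FILE='master-bin.000002', MASTER_LOG_POS=1954, MASTER_USER='repl', MASTER_PASSWORD='Lc123456.'; START SLAVE; SET GLOBAL read_only=1;"
# 2. add the [server1] section back to /etc/mha/default.cnf (it was removed by --remove_dead_master_conf), then restart the manager
nohup masterha_manager --conf=/etc/mha/default.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &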
Reference:
https://www.cnblogs.com/EikiXu/p/9604391.html