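1, Set up passwordless SSH login
On all four nodes (server1, server2, server3, and monitor), run ssh-keygen -t rsa to generate a key pair:
ssh-keygen -t rsa
Then distribute each node's public key to the other nodes; every machine only needs to copy its key to the other three: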
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
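The key distribution step can also be written as a loop. The following is a minimal sketch, assuming the four host names used in this article resolve on every node and that the SSH user is root (as in the MHA configuration later on). The DRY_RUN switch and the distribute_key function name are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: push this node's public key to every other node in the cluster.
# NODES uses the host names from this article; SSH_USER and DRY_RUN are assumptions.
NODES="server1 server2 server3 monitor"
SSH_USER="${SSH_USER:-root}"

distribute_key() {
    local self="$1" node
    for node in $NODES; do
        # Skip the local machine; it only needs to reach the other three.
        [ "$node" = "$self" ] && continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (default): only print the command that would be executed.
            echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub $SSH_USER@$node"
        else
            ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$SSH_USER@$node"
        fi
    done
}

# Example: pass the local node name, or let uname -n supply it.
distribute_key "${1:-$(uname -n)}"
```

Run it once on each node; set DRY_RUN=0 to actually copy the keys.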
2, Set up the MySQL master-slave environment
Just use the MySQL installation and configuration script to set this up. The version I deployed is 5.7.19-log.
3, Install MHA node and manager
Download the packages first, then install them:
total 244
-rw-r--r-- 1 root root 87119 Apr 21 16:42 mha4mysql-manager-0.56-0.el6.noarch.rpm
drwxr-xr-x 10 1001 1001 4096 May 31 2015 mha4mysql-manager-0.57
-rw-r--r-- 1 root root 118521 Apr 21 16:42 mha4mysql-manager-0.57.tar.gz
-rw-r--r-- 1 root root 36326 Apr 21 16:43 mha4mysql-node-0.56-0.el6.noarch.rpm
[root@host-10-185-161-202 mha]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Preparing... ########################################### [100%]
package mha4mysql-node-0.56-0.el6.noarch is already installed
MHA is written in Perl and depends on the Perl environment. If the installation fails with:
error: Failed dependencies:
perl(Config::Tiny) is needed by mha4mysql-manager-0.56-0.el6.noarch
perl(Log::Dispatch) is needed by mha4mysql-manager-0.56-0.el6.noarch
perl(Log::Dispatch::File) is needed by mha4mysql-manager-0.56-0.el6.noarch
perl(Log::Dispatch::Screen) is needed by mha4mysql-manager-0.56-0.el6.noarch
perl(Parallel::ForkManager) is needed by mha4mysql-manager-0.56-0.el6.noarch
perl(Time::HiRes) is needed by mha4mysql-manager-0.56-0.el6.noarch
then install the missing dependencies:
yum install perl-Config-Tiny
yum install perl-Log-Dispatch
yum install perl-Parallel-ForkManager
yum install perl-Time-HiRes
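To confirm that the dependencies are really in place before retrying the rpm install, the modules named in the error can be probed directly. This is a small sketch; the function name missing_perl_modules is made up for illustration, and it relies only on perl -M<module> -e 1 exiting non-zero when a module cannot be loaded.

```shell
#!/usr/bin/env bash
# Sketch: report which Perl modules from the rpm error are still missing.
MODULES="Config::Tiny Log::Dispatch Log::Dispatch::File Log::Dispatch::Screen Parallel::ForkManager Time::HiRes"

missing_perl_modules() {
    local mod
    for mod in "$@"; do
        # Print the module name when perl cannot load it (or perl itself is absent).
        perl -M"$mod" -e 1 2>/dev/null || echo "$mod"
    done
}

# Prints one line per module that still needs to be installed; no output means all are present.
missing_perl_modules $MODULES
```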
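Some servers have a misconfigured yum repository and may not be able to install these packages. In that case, copy the repo files under /etc/yum.repos.d/ from a correctly configured server, then install again.
Install the failover scripts: the mha4mysql-manager-0.57/samples/scripts directory contains several files; see the previous article introducing MHA for what each one does.
-rwxr-xr-x 1 1001 1001 3648 May 31 2015 master_ip_failover
-rwxr-xr-x 1 1001 1001 9870 May 31 2015 master_ip_online_change
-rwxr-xr-x 1 1001 1001 11867 May 31 2015 power_manager
-rwxr-xr-x 1 1001 1001 1360 May 31 2015 send_report
Copy these executables to /usr/local/bin/.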
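The copy step could look like the sketch below. The function name install_mha_scripts is hypothetical; it uses the standard install utility so that the executable bit is set explicitly, and it stops at the first script that is not found.

```shell
#!/usr/bin/env bash
# Sketch: copy the four sample scripts into a bin directory, keeping them executable.
install_mha_scripts() {
    local src="$1" dst="$2" f
    for f in master_ip_failover master_ip_online_change power_manager send_report; do
        if [ -f "$src/$f" ]; then
            install -m 755 "$src/$f" "$dst/$f"
        else
            echo "missing: $src/$f" >&2
            return 1
        fi
    done
}

# Example (paths from this article):
# install_mha_scripts mha4mysql-manager-0.57/samples/scripts /usr/local/bin
```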
Create the configuration directory and copy in the sample configuration file:
mkdir /etc/masterha
cp mha4mysql-manager-0.57/samples/conf/app1.cnf /etc/masterha/
Edit the configuration file:
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
#master binlog path
master_binlog_dir=/export/data/mysql/
#failover script
master_ip_failover_script=/usr/local/bin/master_ip_failover
#MHA master switch
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
log_level=debug
#monitor user
user=mha
password='123456'
#time interval
ping_interval=3
#health check uses INSERT; supported since MHA 0.56
#an infra database is created on the master
ping_type=INSERT
#directory on the remote MySQL hosts where binlogs are saved during a failover
remote_workdir=/export/data/dbbak/
#replication user
repl_user=repl
repl_password=1234
#alert script
#report_script=/usr/local/bin/send_report
#script that double-checks the master's reachability via the slaves
secondary_check_script=/usr/local/bin/masterha_secondary_check --master_host=172.20.130.103 -s 172.28.114.94 --user=mha -s 172.20.130.105 --master_port=3358
#script that powers off the host after a failure
#shutdown_script="/usr/local/bin/power_manager --command=stopssh2 --ssh_user=root"
#ssh user
ssh_user=root
[server1]
hostname=server1
ip=172.20.130.105
candidate_master=1
port=3358
[server2]
hostname=server2
ip=172.28.114.94
#candidate_master=0
port=3358
[server3]
hostname=server3
ip=172.20.130.103
candidate_master=1
check_repl_delay=0
port=3358
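Before running the MHA checks, it can help to list what the configuration file actually declares for each server. The following is a rough sketch, assuming key=value lines without spaces around the equals sign, as in the file above. The function name list_mha_servers is illustrative and this is not an MHA tool.

```shell
#!/usr/bin/env bash
# Sketch: print "section hostname ip port" for every [serverN] block in an MHA config.
list_mha_servers() {
    awk -F= '
        # Print the collected values of the section that just ended, if any.
        function emit() { if (sec != "") print sec, host, ip, port }
        /^\[server[0-9]+\]/ {
            emit()
            sec = substr($0, 2, index($0, "]") - 2)
            host = ip = port = ""
            next
        }
        # Any other section header (for example [server default]) ends the current block.
        /^\[/ { emit(); sec = ""; next }
        sec != "" && $1 == "hostname" { host = $2 }
        sec != "" && $1 == "ip"       { ip = $2 }
        sec != "" && $1 == "port"     { port = $2 }
        END { emit() }
    ' "$1"
}

# Example:
# list_mha_servers /etc/masterha/app1.cnf
```

A quick look at the output makes a mistyped IP or port easy to spot.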
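Environment checks:
Run masterha_check_ssh --conf=/etc/masterha/app1.cnf to verify that passwordless login is configured correctly.
The script logs in to each node and, from there, connects over SSH to every other node. If it reports an error, reconfigure passwordless login.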
[root@host-10-185-161-202 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Fri Apr 24 19:46:05 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 24 19:46:05 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Apr 24 19:46:05 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Apr 24 19:46:05 2020 - [info] Starting SSH connection tests..
Fri Apr 24 19:46:06 2020 - [debug]
Fri Apr 24 19:46:05 2020 - [debug] Connecting via SSH from root@server1(172.20.130.105:22) to root@server2(172.28.114.94:22)..
Fri Apr 24 19:46:05 2020 - [debug] ok.
Fri Apr 24 19:46:05 2020 - [debug] Connecting via SSH from root@server1(172.20.130.105:22) to root@server3(172.20.130.103:22)..
Fri Apr 24 19:46:05 2020 - [debug] ok.
Fri Apr 24 19:46:07 2020 - [debug]
Fri Apr 24 19:46:06 2020 - [debug] Connecting via SSH from root@server3(172.20.130.103:22) to root@server1(172.20.130.105:22)..
Fri Apr 24 19:46:06 2020 - [debug] ok.
Fri Apr 24 19:46:06 2020 - [debug] Connecting via SSH from root@server3(172.20.130.103:22) to root@server2(172.28.114.94:22)..
Fri Apr 24 19:46:06 2020 - [debug] ok.
Fri Apr 24 19:46:07 2020 - [debug]
Fri Apr 24 19:46:05 2020 - [debug] Connecting via SSH from root@server2(172.28.114.94:22) to root@server1(172.20.130.105:22)..
Fri Apr 24 19:46:06 2020 - [debug] ok.
Fri Apr 24 19:46:06 2020 - [debug] Connecting via SSH from root@server2(172.28.114.94:22) to root@server3(172.20.130.103:22)..
Fri Apr 24 19:46:07 2020 - [debug] ok.
Fri Apr 24 19:46:07 2020 - [info] All SSH connection tests passed successfully.
Replication health check: masterha_check_repl --conf=/etc/masterha/app1.cnf checks the master-slave replication status.
If it fails with: Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 93,
just comment out that line.
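One way to comment out the placeholder without opening an editor is a sed one-liner. This is a sketch only: it assumes the offending lines contain the literal text FIXME_xxx, it comments out every such line rather than just line 93, and the function name comment_out_fixme is made up here. A .bak copy of the original file is kept.

```shell
#!/usr/bin/env bash
# Sketch: prefix every line containing the FIXME_xxx placeholder with '#',
# keeping a backup of the original script next to it as <file>.bak.
comment_out_fixme() {
    local script="$1"
    sed -i.bak '/FIXME_xxx/ s/^/#/' "$script"
}

# Example:
# comment_out_fixme /usr/local/bin/master_ip_failover
```

Review the result afterwards, since the sample script expects that placeholder to be replaced with real IP switching logic in production.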
[root@host-10-185-161-202 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Fri Apr 24 19:57:22 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 24 19:57:22 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Apr 24 19:57:22 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Apr 24 19:57:22 2020 - [info] MHA::MasterMonitor version 0.56.
Fri Apr 24 19:57:22 2020 - [debug] Connecting to servers..
Fri Apr 24 19:57:23 2020 - [debug] Connected to: server1(172.20.130.105:3358), user=mha
Fri Apr 24 19:57:23 2020 - [debug] Number of slave worker threads on host server1(172.20.130.105:3358): 0
Fri Apr 24 19:57:23 2020 - [debug] Connected to: server2(172.28.114.94:3358), user=mha
Fri Apr 24 19:57:23 2020 - [debug] Number of slave worker threads on host server2(172.28.114.94:3358): 0
Fri Apr 24 19:57:23 2020 - [debug] Connected to: server3(172.20.130.103:3358), user=mha
Fri Apr 24 19:57:23 2020 - [debug] Number of slave worker threads on host server3(172.20.130.103:3358): 0
Fri Apr 24 19:57:23 2020 - [warning] SQL Thread is stopped(no error) on server1(172.20.130.105:3358)
Fri Apr 24 19:57:23 2020 - [debug] Comparing MySQL versions..
Fri Apr 24 19:57:23 2020 - [debug] Comparing MySQL versions done.
Fri Apr 24 19:57:23 2020 - [debug] Connecting to servers done.
Fri Apr 24 19:57:23 2020 - [info] Multi-master configuration is detected. Current primary(writable) master is server1(172.20.130.105:3358)
Fri Apr 24 19:57:23 2020 - [info] Master configurations are as below:
Master server1(172.20.130.105:3358), replicating from 172.20.130.103(172.20.130.103:3358)
Master server3(172.20.130.103:3358), replicating from 172.20.130.105(172.20.130.105:3358), read-only
Fri Apr 24 19:57:23 2020 - [info] GTID failover mode = 0
Fri Apr 24 19:57:23 2020 - [info] Dead Servers:
Fri Apr 24 19:57:23 2020 - [info] Alive Servers:
Fri Apr 24 19:57:23 2020 - [info] server1(172.20.130.105:3358)
Fri Apr 24 19:57:23 2020 - [info] server2(172.28.114.94:3358)
Fri Apr 24 19:57:23 2020 - [info] server3(172.20.130.103:3358)
Fri Apr 24 19:57:23 2020 - [info] Alive Slaves:
Fri Apr 24 19:57:23 2020 - [info] server2(172.28.114.94:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 19:57:23 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 19:57:23 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 19:57:23 2020 - [info] server3(172.20.130.103:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 19:57:23 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 19:57:23 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 19:57:23 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 19:57:23 2020 - [info] Current Alive Master: server1(172.20.130.105:3358)
Fri Apr 24 19:57:23 2020 - [info] Checking slave configurations..
Fri Apr 24 19:57:23 2020 - [info] Checking replication filtering settings..
Fri Apr 24 19:57:23 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Apr 24 19:57:23 2020 - [info] Replication filtering check ok.
Fri Apr 24 19:57:23 2020 - [info] GTID (with auto-pos) is not supported
Fri Apr 24 19:57:23 2020 - [info] Starting SSH connection tests..
Fri Apr 24 19:57:23 2020 - [debug]
Fri Apr 24 19:57:23 2020 - [debug] Connecting via SSH from root@server1(172.20.130.105:22) to root@server2(172.28.114.94:22)..
Fri Apr 24 19:57:23 2020 - [debug] ok.
Fri Apr 24 19:57:23 2020 - [debug] Connecting via SSH from root@server1(172.20.130.105:22) to root@server3(172.20.130.103:22)..
Fri Apr 24 19:57:23 2020 - [debug] ok.
Fri Apr 24 19:57:24 2020 - [debug]
Fri Apr 24 19:57:24 2020 - [debug] Connecting via SSH from root@server3(172.20.130.103:22) to root@server1(172.20.130.105:22)..
Fri Apr 24 19:57:24 2020 - [debug] ok.
Fri Apr 24 19:57:24 2020 - [debug] Connecting via SSH from root@server3(172.20.130.103:22) to root@server2(172.28.114.94:22)..
Fri Apr 24 19:57:24 2020 - [debug] ok.
Fri Apr 24 19:57:24 2020 - [debug]
Fri Apr 24 19:57:23 2020 - [debug] Connecting via SSH from root@server2(172.28.114.94:22) to root@server1(172.20.130.105:22)..
Fri Apr 24 19:57:24 2020 - [debug] ok.
Fri Apr 24 19:57:24 2020 - [debug] Connecting via SSH from root@server2(172.28.114.94:22) to root@server3(172.20.130.103:22)..
Fri Apr 24 19:57:24 2020 - [debug] ok.
Fri Apr 24 19:57:24 2020 - [info] All SSH connection tests passed successfully.
Fri Apr 24 19:57:24 2020 - [info] Checking MHA Node version..
Fri Apr 24 19:57:25 2020 - [info] Version check ok.
Fri Apr 24 19:57:25 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Apr 24 19:57:25 2020 - [debug] SSH connection test to server1, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Fri Apr 24 19:57:25 2020 - [info] HealthCheck: SSH to server1 is reachable.
Fri Apr 24 19:57:25 2020 - [info] Master MHA Node version is 0.56.
Fri Apr 24 19:57:25 2020 - [info] Checking recovery script configurations on server1(172.20.130.105:3358)..
Fri Apr 24 19:57:25 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/export/data/mysql/ --output_file=/export/data/dbbak//save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000007 --debug
Fri Apr 24 19:57:25 2020 - [info] Connecting to [email protected](server1:22)..
Creating /export/data/dbbak if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /export/data/mysql/, up to mysql-bin.000007
Fri Apr 24 19:57:25 2020 - [info] Binlog setting check done.
Fri Apr 24 19:57:25 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Apr 24 19:57:25 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=server2 --slave_ip=172.28.114.94 --slave_port=3358 --workdir=/export/data/dbbak/ --target_version=5.7.19-log --manager_version=0.56 --relay_log_info=/export/data/mysql/relay-log.info --relay_dir=/export/data/mysql/ --debug --slave_pass=xxx
Fri Apr 24 19:57:25 2020 - [info] Connecting to [email protected](server2:22)..
Checking slave recovery environment settings..
Opening /export/data/mysql/relay-log.info ... ok.
Relay log found at /export/data/mysql, up to A01-R04-I114-94-GV54352-relay-bin.000002
Temporary relay log file is /export/data/mysql/A01-R04-I114-94-GV54352-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Apr 24 19:57:26 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=server3 --slave_ip=172.20.130.103 --slave_port=3358 --workdir=/export/data/dbbak/ --target_version=5.7.19-log --manager_version=0.56 --relay_log_info=/export/data/mysql/relay-log.info --relay_dir=/export/data/mysql/ --debug --slave_pass=xxx
Fri Apr 24 19:57:26 2020 - [info] Connecting to [email protected](server3:22)..
Checking slave recovery environment settings..
Opening /export/data/mysql/relay-log.info ... ok.
Relay log found at /export/data/mysql, up to LF-MYSQL-130-103-relay-bin.000002
Temporary relay log file is /export/data/mysql/LF-MYSQL-130-103-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Apr 24 19:57:26 2020 - [info] Slaves settings check done.
Fri Apr 24 19:57:26 2020 - [info]
server1(172.20.130.105:3358) (current master)
+--server2(172.28.114.94:3358)
+--server3(172.20.130.103:3358)
Fri Apr 24 19:57:26 2020 - [info] Checking replication health on server2..
Fri Apr 24 19:57:26 2020 - [info] ok.
Fri Apr 24 19:57:26 2020 - [info] Checking replication health on server3..
Fri Apr 24 19:57:26 2020 - [info] ok.
Fri Apr 24 19:57:26 2020 - [info] Checking master_ip_failover_script status:
Fri Apr 24 19:57:26 2020 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server1 --orig_master_ip=172.20.130.105 --orig_master_port=3358
Fri Apr 24 19:57:26 2020 - [info] OK.
Fri Apr 24 19:57:26 2020 - [warning] shutdown_script is not defined.
Fri Apr 24 19:57:26 2020 - [debug] Disconnected from server1(172.20.130.105:3358)
Fri Apr 24 19:57:26 2020 - [debug] Disconnected from server2(172.28.114.94:3358)
Fri Apr 24 19:57:26 2020 - [debug] Disconnected from server3(172.20.130.103:3358)
Fri Apr 24 19:57:26 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
Start MHA:
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
Check the MHA running status: masterha_check_status --conf=/etc/masterha/app1.cnf
[root@host-10-185-161-202 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:366067) is running(0:PING_OK), master:server1
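For scripting, the current master can be pulled out of that status line. The sketch below assumes the line has exactly the format shown above; the function name current_master is illustrative, and it prints nothing when the manager is not in the PING_OK state.

```shell
#!/usr/bin/env bash
# Sketch: extract the current master from a masterha_check_status line such as
#   app1 (pid:366067) is running(0:PING_OK), master:server1
current_master() {
    sed -n 's/.*is running(0:PING_OK), master:\([^ ]*\).*/\1/p'
}

# Example:
# masterha_check_status --conf=/etc/masterha/app1.cnf | current_master
```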
Failover test:
On the master, run service mysql stop to shut down the master instance, then watch what happens:
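The manager log is long, so a filter that keeps only the failover milestones can make it easier to follow. This is a small sketch; the patterns are taken from messages that appear in the log in this article, and the function name failover_milestones is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show only the lines that mark failover progress in an MHA manager log.
failover_milestones() {
    grep -E 'Phase [0-9.]+:|Master is down!|Got exit code|Starting master failover' "$1"
}

# Example:
# failover_milestones /var/log/masterha/app1/manager.log
```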
[root@host-10-185-161-202 ~]# cat /var/log/masterha/app1/manager.log
Fri Apr 24 19:58:00 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 24 19:58:00 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Apr 24 19:58:00 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Creating /export/data/dbbak if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /export/data/mysql/, up to mysql-bin.000007
Fri Apr 24 20:03:53 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 24 20:03:53 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Apr 24 20:03:53 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
st server2(172.28.114.94:3358): 0
Fri Apr 24 19:58:02 2020 - [debug] Connected to: server3(172.20.130.103:3358), user=mha
Fri Apr 24 19:58:02 2020 - [debug] Number of slave worker threads on host server3(172.20.130.103:3358): 0
Fri Apr 24 19:58:02 2020 - [warning] SQL Thread is stopped(no error) on server1(172.20.130.105:3358)
Fri Apr 24 19:58:02 2020 - [debug] Comparing MySQL versions..
Fri Apr 24 19:58:02 2020 - [debug] Comparing MySQL versions done.
Fri Apr 24 19:58:02 2020 - [debug] Connecting to servers done.
Fri Apr 24 19:58:02 2020 - [info] Multi-master configuration is detected. Current primary(writable) master is server1(172.20.130.105:3358)
Fri Apr 24 19:58:02 2020 - [info] Master configurations are as below:
Master server1(172.20.130.105:3358), replicating from 172.20.130.103(172.20.130.103:3358)
Master server3(172.20.130.103:3358), replicating from 172.20.130.105(172.20.130.105:3358), read-only
Fri Apr 24 19:58:02 2020 - [info] GTID failover mode = 0
Fri Apr 24 19:58:02 2020 - [info] Dead Servers:
Fri Apr 24 19:58:02 2020 - [info] Alive Servers:
Fri Apr 24 19:58:02 2020 - [info] server1(172.20.130.105:3358)
Fri Apr 24 19:58:02 2020 - [info] server2(172.28.114.94:3358)
Fri Apr 24 19:58:02 2020 - [info] server3(172.20.130.103:3358)
Fri Apr 24 19:58:02 2020 - [info] Alive Slaves:
Fri Apr 24 19:58:02 2020 - [info] server2(172.28.114.94:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 19:58:02 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 19:58:02 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 19:58:02 2020 - [info] server3(172.20.130.103:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 19:58:02 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 19:58:02 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 19:58:02 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 19:58:02 2020 - [info] Current Alive Master: server1(172.20.130.105:3358)
Fri Apr 24 19:58:02 2020 - [info] Checking slave configurations..
Fri Apr 24 19:58:02 2020 - [info] Checking replication filtering settings..
Fri Apr 24 19:58:02 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Apr 24 19:58:02 2020 - [info] Replication filtering check ok.
Fri Apr 24 19:58:02 2020 - [info] GTID (with auto-pos) is not supported
Fri Apr 24 19:58:02 2020 - [info] Starting SSH connection tests..
Fri Apr 24 19:58:03 2020 - [debug]
Fri Apr 24 19:58:03 2020 - [debug] Connecting via SSH from root@server3(172.20.130.103:22) to root@server1(172.20.130.105:22)..
Fri Apr 24 19:58:03 2020 - [debug] ok.
Fri Apr 24 19:58:03 2020 - [debug] Connecting via SSH from root@server3(172.20.130.103:22) to root@server2(172.28.114.94:22)..
Fri Apr 24 19:58:03 2020 - [debug] ok.
Fri Apr 24 19:58:03 2020 - [debug]
Fri Apr 24 19:58:02 2020 - [debug] Connecting via SSH from root@server2(172.28.114.94:22) to root@server1(172.20.130.105:22)..
Fri Apr 24 19:58:02 2020 - [debug] ok.
Fri Apr 24 19:58:02 2020 - [debug] Connecting via SSH from root@server2(172.28.114.94:22) to root@server3(172.20.130.103:22)..
Fri Apr 24 19:58:03 2020 - [debug] ok.
Fri Apr 24 19:58:03 2020 - [debug]
Fri Apr 24 19:58:02 2020 - [debug] Connecting via SSH from root@server1(172.20.130.105:22) to root@server2(172.28.114.94:22)..
Fri Apr 24 19:58:02 2020 - [debug] ok.
Fri Apr 24 19:58:02 2020 - [debug] Connecting via SSH from root@server1(172.20.130.105:22) to root@server3(172.20.130.103:22)..
Fri Apr 24 19:58:03 2020 - [debug] ok.
Fri Apr 24 19:58:03 2020 - [info] All SSH connection tests passed successfully.
Fri Apr 24 19:58:03 2020 - [info] Checking MHA Node version..
Fri Apr 24 19:58:04 2020 - [info] Version check ok.
Fri Apr 24 19:58:04 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Apr 24 19:58:04 2020 - [debug] SSH connection test to server1, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Fri Apr 24 19:58:04 2020 - [info] HealthCheck: SSH to server1 is reachable.
Fri Apr 24 19:58:04 2020 - [info] Master MHA Node version is 0.56.
Fri Apr 24 19:58:04 2020 - [info] Checking recovery script configurations on server1(172.20.130.105:3358)..
Fri Apr 24 19:58:04 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/export/data/mysql/ --output_file=/export/data/dbbak//save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000007 --debug
Fri Apr 24 19:58:04 2020 - [info] Connecting to [email protected](server1:22)..
Creating /export/data/dbbak if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /export/data/mysql/, up to mysql-bin.000007
Fri Apr 24 19:58:04 2020 - [info] Binlog setting check done.
Fri Apr 24 19:58:04 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Apr 24 19:58:04 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=server2 --slave_ip=172.28.114.94 --slave_port=3358 --workdir=/export/data/dbbak/ --target_version=5.7.19-log --manager_version=0.56 --relay_log_info=/export/data/mysql/relay-log.info --relay_dir=/export/data/mysql/ --debug --slave_pass=xxx
Fri Apr 24 19:58:04 2020 - [info] Connecting to [email protected](server2:22)..
Checking slave recovery environment settings..
Opening /export/data/mysql/relay-log.info ... ok.
Relay log found at /export/data/mysql, up to A01-R04-I114-94-GV54352-relay-bin.000002
Temporary relay log file is /export/data/mysql/A01-R04-I114-94-GV54352-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Apr 24 19:58:04 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=server3 --slave_ip=172.20.130.103 --slave_port=3358 --workdir=/export/data/dbbak/ --target_version=5.7.19-log --manager_version=0.56 --relay_log_info=/export/data/mysql/relay-log.info --relay_dir=/export/data/mysql/ --debug --slave_pass=xxx
Fri Apr 24 19:58:04 2020 - [info] Connecting to [email protected](server3:22)..
Checking slave recovery environment settings..
Opening /export/data/mysql/relay-log.info ... ok.
Relay log found at /export/data/mysql, up to LF-MYSQL-130-103-relay-bin.000002
Temporary relay log file is /export/data/mysql/LF-MYSQL-130-103-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Apr 24 19:58:04 2020 - [info] Slaves settings check done.
Fri Apr 24 19:58:04 2020 - [info]
server1(172.20.130.105:3358) (current master)
+--server2(172.28.114.94:3358)
+--server3(172.20.130.103:3358)
Fri Apr 24 19:58:04 2020 - [info] Checking master_ip_failover_script status:
Fri Apr 24 19:58:04 2020 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server1 --orig_master_ip=172.20.130.105 --orig_master_port=3358
Fri Apr 24 19:58:05 2020 - [info] OK.
Fri Apr 24 19:58:05 2020 - [warning] shutdown_script is not defined.
Fri Apr 24 19:58:05 2020 - [debug] Disconnected from server1(172.20.130.105:3358)
Fri Apr 24 19:58:05 2020 - [debug] Disconnected from server2(172.28.114.94:3358)
Fri Apr 24 19:58:05 2020 - [debug] Disconnected from server3(172.20.130.103:3358)
Fri Apr 24 19:58:05 2020 - [debug] SSH check command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/export/data/mysql/ --output_file=/export/data/dbbak//save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin --debug
Fri Apr 24 19:58:05 2020 - [info] Set master ping interval 3 seconds.
Fri Apr 24 19:58:05 2020 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check --master_host=172.20.130.103 -s 172.28.114.94 --user=mha -s 172.20.130.105 --master_port=3358
Fri Apr 24 19:58:05 2020 - [info] Starting ping health check on server1(172.20.130.105:3358)..
Fri Apr 24 19:58:05 2020 - [debug] Connected on master.
Fri Apr 24 19:58:05 2020 - [debug] Set short wait_timeout on master: 6 seconds
Fri Apr 24 19:58:05 2020 - [debug] Trying to get advisory lock..
Fri Apr 24 19:58:05 2020 - [info] Ping(INSERT) succeeded, waiting until MySQL doesn't respond..
Fri Apr 24 20:03:42 2020 - [warning] Got error on MySQL insert ping: 2006 (MySQL server has gone away)
Fri Apr 24 20:03:43 2020 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/export/data/mysql/ --output_file=/export/data/dbbak//save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin --debug
Fri Apr 24 20:03:43 2020 - [debug] SSH connection test to server1, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Fri Apr 24 20:03:43 2020 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check --master_host=172.20.130.103 -s 172.28.114.94 --user=mha -s 172.20.130.105 --master_port=3358 --user=root --master_host=server1 --master_ip=172.20.130.105 --master_port=3358 --master_user=mha --master_password=w5kRBGysiZvc1sNiMphf --ping_type=INSERT
Fri Apr 24 20:03:43 2020 - [info] HealthCheck: SSH to server1 is reachable.
Monitoring server 172.28.114.94 is reachable, Master is not reachable from 172.28.114.94. OK.
Monitoring server 172.20.130.105 is reachable, Master is not reachable from 172.20.130.105. OK.
Fri Apr 24 20:03:43 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Apr 24 20:03:45 2020 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Apr 24 20:03:45 2020 - [warning] Connection failed 2 time(s)..
Fri Apr 24 20:03:48 2020 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Apr 24 20:03:48 2020 - [warning] Connection failed 3 time(s)..
Fri Apr 24 20:03:51 2020 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Fri Apr 24 20:03:51 2020 - [warning] Connection failed 4 time(s)..
Fri Apr 24 20:03:51 2020 - [warning] Master is not reachable from health checker!
Fri Apr 24 20:03:51 2020 - [warning] Master server1(172.20.130.105:3358) is not reachable!
Fri Apr 24 20:03:51 2020 - [warning] SSH is reachable.
Fri Apr 24 20:03:51 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Fri Apr 24 20:03:51 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 24 20:03:51 2020 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Apr 24 20:03:51 2020 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Apr 24 20:03:51 2020 - [debug] Skipping connecting to dead master server1(172.20.130.105:3358).
Fri Apr 24 20:03:51 2020 - [debug] Connecting to servers..
Fri Apr 24 20:03:53 2020 - [debug] Connected to: server2(172.28.114.94:3358), user=mha
Fri Apr 24 20:03:53 2020 - [debug] Number of slave worker threads on host server2(172.28.114.94:3358): 0
Fri Apr 24 20:03:53 2020 - [debug] Connected to: server3(172.20.130.103:3358), user=mha
Fri Apr 24 20:03:53 2020 - [debug] Number of slave worker threads on host server3(172.20.130.103:3358): 0
Fri Apr 24 20:03:53 2020 - [debug] Comparing MySQL versions..
Fri Apr 24 20:03:53 2020 - [debug] Comparing MySQL versions done.
Fri Apr 24 20:03:53 2020 - [debug] Connecting to servers done.
Fri Apr 24 20:03:53 2020 - [info] GTID failover mode = 0
Fri Apr 24 20:03:53 2020 - [info] Dead Servers:
Fri Apr 24 20:03:53 2020 - [info] server1(172.20.130.105:3358)
Fri Apr 24 20:03:53 2020 - [info] Alive Servers:
Fri Apr 24 20:03:53 2020 - [info] server2(172.28.114.94:3358)
Fri Apr 24 20:03:53 2020 - [info] server3(172.20.130.103:3358)
Fri Apr 24 20:03:53 2020 - [info] Alive Slaves:
Fri Apr 24 20:03:53 2020 - [info] server2(172.28.114.94:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:53 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:53 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:53 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:53 2020 - [info] server3(172.20.130.103:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:53 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:53 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:53 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:53 2020 - [info] Checking slave configurations..
Fri Apr 24 20:03:53 2020 - [info] Checking replication filtering settings..
Fri Apr 24 20:03:53 2020 - [info] Replication filtering check ok.
Fri Apr 24 20:03:53 2020 - [info] Master is down!
Fri Apr 24 20:03:53 2020 - [info] Terminating monitoring script.
Fri Apr 24 20:03:53 2020 - [info] Got exit code 20 (Master dead).
Fri Apr 24 20:03:53 2020 - [info] MHA::MasterFailover version 0.56.
Fri Apr 24 20:03:53 2020 - [info] Starting master failover.
Fri Apr 24 20:03:53 2020 - [info]
Fri Apr 24 20:03:53 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Apr 24 20:03:53 2020 - [info]
Fri Apr 24 20:03:53 2020 - [debug] Skipping connecting to dead master server1.
Fri Apr 24 20:03:53 2020 - [debug] Connecting to servers..
Fri Apr 24 20:03:54 2020 - [debug] Connected to: server2(172.28.114.94:3358), user=mha
Fri Apr 24 20:03:54 2020 - [debug] Number of slave worker threads on host server2(172.28.114.94:3358): 0
Fri Apr 24 20:03:54 2020 - [debug] Connected to: server3(172.20.130.103:3358), user=mha
Fri Apr 24 20:03:54 2020 - [debug] Number of slave worker threads on host server3(172.20.130.103:3358): 0
Fri Apr 24 20:03:54 2020 - [debug] Comparing MySQL versions..
Fri Apr 24 20:03:54 2020 - [debug] Comparing MySQL versions done.
Fri Apr 24 20:03:54 2020 - [debug] Connecting to servers done.
Fri Apr 24 20:03:54 2020 - [info] GTID failover mode = 0
Fri Apr 24 20:03:54 2020 - [info] Dead Servers:
Fri Apr 24 20:03:54 2020 - [info] server1(172.20.130.105:3358)
Fri Apr 24 20:03:54 2020 - [info] Alive Servers:
Fri Apr 24 20:03:54 2020 - [info] server2(172.28.114.94:3358)
Fri Apr 24 20:03:54 2020 - [info] server3(172.20.130.103:3358)
Fri Apr 24 20:03:54 2020 - [info] Alive Slaves:
Fri Apr 24 20:03:54 2020 - [info] server2(172.28.114.94:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:54 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:54 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:54 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:54 2020 - [info] server3(172.20.130.103:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:54 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:54 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:54 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:54 2020 - [info] Starting Non-GTID based failover.
Fri Apr 24 20:03:54 2020 - [info]
Fri Apr 24 20:03:54 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Apr 24 20:03:54 2020 - [info]
Fri Apr 24 20:03:54 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Apr 24 20:03:54 2020 - [info]
Fri Apr 24 20:03:54 2020 - [debug] Stopping IO thread on server2(172.28.114.94:3358)..
Fri Apr 24 20:03:54 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Apr 24 20:03:54 2020 - [info] Executing master IP deactivation script:
Fri Apr 24 20:03:54 2020 - [info] /usr/local/bin/master_ip_failover --orig_master_host=server1 --orig_master_ip=172.20.130.105 --orig_master_port=3358 --command=stopssh --ssh_user=root
Fri Apr 24 20:03:54 2020 - [debug] Stopping IO thread on server3(172.20.130.103:3358)..
Fri Apr 24 20:03:54 2020 - [debug] Stop IO thread on server2(172.28.114.94:3358) done.
Fri Apr 24 20:03:54 2020 - [debug] Stop IO thread on server3(172.20.130.103:3358) done.
Fri Apr 24 20:03:54 2020 - [info] done.
Fri Apr 24 20:03:54 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Apr 24 20:03:54 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Apr 24 20:03:54 2020 - [info]
Fri Apr 24 20:03:54 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Apr 24 20:03:54 2020 - [info]
Fri Apr 24 20:03:54 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Apr 24 20:03:54 2020 - [info]
Fri Apr 24 20:03:54 2020 - [debug] Fetching current slave status..
Fri Apr 24 20:03:54 2020 - [debug] Fetching current slave status done.
Fri Apr 24 20:03:54 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000007:85790
Fri Apr 24 20:03:54 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Apr 24 20:03:54 2020 - [info] server2(172.28.114.94:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:54 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:54 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:54 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:54 2020 - [info] server3(172.20.130.103:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:54 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:54 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:54 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:54 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000007:85790
Fri Apr 24 20:03:54 2020 - [info] Oldest slaves:
Fri Apr 24 20:03:54 2020 - [info] server2(172.28.114.94:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:54 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:54 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:54 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:54 2020 - [info] server3(172.20.130.103:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:54 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:54 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:54 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:54 2020 - [info]
Fri Apr 24 20:03:54 2020 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Apr 24 20:03:54 2020 - [info]
Fri Apr 24 20:03:54 2020 - [info] Fetching dead master's binary logs..
Fri Apr 24 20:03:54 2020 - [info] Executing command on the dead master server1(172.20.130.105:3358): save_binary_logs --command=save --start_file=mysql-bin.000007 --start_pos=85790 --binlog_dir=/export/data/mysql/ --output_file=/export/data/dbbak//saved_master_binlog_from_server1_3358_20200424200353.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug
Creating /export/data/dbbak if not exists.. ok.
Concat binary/relay logs from mysql-bin.000007 pos 85790 to mysql-bin.000007 EOF into /export/data/dbbak//saved_master_binlog_from_server1_3358_20200424200353.binlog ..
parse_init_headers: file=mysql-bin.000007 event_type=15 server_id=130105 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
Binlog Checksum enabled
parse_init_headers: file=mysql-bin.000007 event_type=35 server_id=130105 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
Got previous gtids log event: 154.
parse_init_headers: file=mysql-bin.000007 event_type=34 server_id=130105 length=65 nextmpos=219 prevrelay=154 cur(post)relay=219
Dumping binlog format description event, from position 0 to 154.. ok.
Dumping effective binlog data from /export/data/mysql//mysql-bin.000007 position 85790 to tail(85813).. ok.
parse_init_headers: file=saved_master_binlog_from_server1_3358_20200424200353.binlog event_type=15 server_id=130105 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
Binlog Checksum enabled
parse_init_headers: file=saved_master_binlog_from_server1_3358_20200424200353.binlog event_type=35 server_id=130105 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
Got previous gtids log event: 154.
parse_init_headers: file=saved_master_binlog_from_server1_3358_20200424200353.binlog event_type=3 server_id=130105 length=23 nextmpos=85813 prevrelay=154 cur(post)relay=177
Concat succeeded.
Fri Apr 24 20:03:54 2020 - [info] scp from [email protected]:/export/data/dbbak//saved_master_binlog_from_server1_3358_20200424200353.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_server1_3358_20200424200353.binlog succeeded.
Fri Apr 24 20:03:54 2020 - [debug] SSH connection test to server2, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Fri Apr 24 20:03:54 2020 - [info] HealthCheck: SSH to server2 is reachable.
Fri Apr 24 20:03:55 2020 - [debug] SSH connection test to server3, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Fri Apr 24 20:03:55 2020 - [info] HealthCheck: SSH to server3 is reachable.
Fri Apr 24 20:03:55 2020 - [info]
Fri Apr 24 20:03:55 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Apr 24 20:03:55 2020 - [info]
Fri Apr 24 20:03:55 2020 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Apr 24 20:03:55 2020 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Apr 24 20:03:55 2020 - [info] Searching new master from slaves..
Fri Apr 24 20:03:55 2020 - [info] Candidate masters from the configuration file:
Fri Apr 24 20:03:55 2020 - [info] server2(172.28.114.94:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:55 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:55 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:55 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:55 2020 - [info] server3(172.20.130.103:3358) Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Fri Apr 24 20:03:55 2020 - [debug] Relay log info repository: FILE
Fri Apr 24 20:03:55 2020 - [info] Replicating from 172.20.130.105(172.20.130.105:3358)
Fri Apr 24 20:03:55 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Apr 24 20:03:55 2020 - [info] Non-candidate masters:
Fri Apr 24 20:03:55 2020 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Apr 24 20:03:55 2020 - [info] New master is server2(172.28.114.94:3358)
Fri Apr 24 20:03:55 2020 - [info] Starting master failover..
Fri Apr 24 20:03:55 2020 - [info]
From:
server1(172.20.130.105:3358) (current master)
+--server2(172.28.114.94:3358)
+--server3(172.20.130.103:3358)
To:
server2(172.28.114.94:3358) (new master)
+--server3(172.20.130.103:3358)
Fri Apr 24 20:03:55 2020 - [info]
Fri Apr 24 20:03:55 2020 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Apr 24 20:03:55 2020 - [info]
Fri Apr 24 20:03:55 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Apr 24 20:03:55 2020 - [info] Sending binlog..
Fri Apr 24 20:03:55 2020 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_server1_3358_20200424200353.binlog to root@server2:/export/data/dbbak//saved_master_binlog_from_server1_3358_20200424200353.binlog succeeded.
Fri Apr 24 20:03:55 2020 - [info]
Fri Apr 24 20:03:55 2020 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Apr 24 20:03:55 2020 - [info]
Fri Apr 24 20:03:55 2020 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Apr 24 20:03:55 2020 - [info] Starting recovery on server2(172.28.114.94:3358)..
Fri Apr 24 20:03:55 2020 - [info] Generating diffs succeeded.
Fri Apr 24 20:03:55 2020 - [info] Waiting until all relay logs are applied.
Fri Apr 24 20:03:55 2020 - [info] done.
Fri Apr 24 20:03:55 2020 - [debug] Stopping SQL thread on server2(172.28.114.94:3358)..
Fri Apr 24 20:03:55 2020 - [debug] done.
Fri Apr 24 20:03:55 2020 - [info] Getting slave status..
Fri Apr 24 20:03:55 2020 - [info] This slave(server2)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000007:85790). No need to recover from Exec_Master_Log_Pos.
Fri Apr 24 20:03:55 2020 - [debug] Current max_allowed_packet is 67108864.
Fri Apr 24 20:03:55 2020 - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
Fri Apr 24 20:03:55 2020 - [info] Connecting to the target slave host server2, running recover script..
Fri Apr 24 20:03:55 2020 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=server2 --slave_ip=172.28.114.94 --slave_port=3358 --apply_files=/export/data/dbbak//saved_master_binlog_from_server1_3358_20200424200353.binlog --workdir=/export/data/dbbak/ --target_version=5.7.19-log --timestamp=20200424200353 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug --slave_pass=xxx
Fri Apr 24 20:03:56 2020 - [info]
MySQL client version is 5.7.19. Using --binary-mode.
Applying differential binary/relay log files /export/data/dbbak//saved_master_binlog_from_server1_3358_20200424200353.binlog on server2:3358. This may take long time...
Applying log files succeeded.
Fri Apr 24 20:03:56 2020 - [debug] Setting max_allowed_packet back to 67108864 succeeded.
Fri Apr 24 20:03:56 2020 - [info] All relay logs were successfully applied.
Fri Apr 24 20:03:56 2020 - [info] Getting new master's binlog name and position..
Fri Apr 24 20:03:56 2020 - [info] mysql-bin.000007:122748
Fri Apr 24 20:03:56 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='server2 or 172.28.114.94', MASTER_PORT=3358, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=122748, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Apr 24 20:03:56 2020 - [info] Executing master IP activate script:
Fri Apr 24 20:03:56 2020 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=server1 --orig_master_ip=172.20.130.105 --orig_master_port=3358 --new_master_host=server2 --new_master_ip=172.28.114.94 --new_master_port=3358 --new_master_user='mha' --new_master_password='w5kRBGysiZvc1sNiMphf'
Undefined subroutine &main::FIXME_xxx_create_user called at /usr/local/bin/master_ip_failover line 88.
Set read_only=0 on the new master.
Creating app user on the new master..
Fri Apr 24 20:03:56 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1588] Failed to activate master IP address for server2(172.28.114.94:3358) with return code 10:0
Fri Apr 24 20:03:56 2020 - [warning] Proceeding.
Fri Apr 24 20:03:56 2020 - [info] ** Finished master recovery successfully.
Fri Apr 24 20:03:56 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Apr 24 20:03:56 2020 - [info]
Fri Apr 24 20:03:56 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Apr 24 20:03:56 2020 - [info]
Fri Apr 24 20:03:56 2020 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Apr 24 20:03:56 2020 - [info]
Fri Apr 24 20:03:56 2020 - [info] -- Slave diff file generation on host server3(172.20.130.103:3358) started, pid: 367940. Check tmp log /var/log/masterha/app1/server3_3358_20200424200353.log if it takes time..
Fri Apr 24 20:03:57 2020 - [info]
Fri Apr 24 20:03:57 2020 - [info] Log messages from server3 ...
Fri Apr 24 20:03:57 2020 - [info]
Fri Apr 24 20:03:56 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Apr 24 20:03:57 2020 - [info] End of log messages from server3.
Fri Apr 24 20:03:57 2020 - [info] -- server3(172.20.130.103:3358) has the latest relay log events.
Fri Apr 24 20:03:57 2020 - [info] Generating relay diff files from the latest slave succeeded.
Fri Apr 24 20:03:57 2020 - [info]
Fri Apr 24 20:03:57 2020 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Apr 24 20:03:57 2020 - [info]
Fri Apr 24 20:03:57 2020 - [info] -- Slave recovery on host server3(172.20.130.103:3358) started, pid: 367942. Check tmp log /var/log/masterha/app1/server3_3358_20200424200353.log if it takes time..
Fri Apr 24 20:03:57 2020 - [debug] Explicitly disabled relay_log_purge.
Fri Apr 24 20:03:58 2020 - [info]
Fri Apr 24 20:03:58 2020 - [info] Log messages from server3 ...
Fri Apr 24 20:03:58 2020 - [info]
Fri Apr 24 20:03:57 2020 - [info] Sending binlog..
Fri Apr 24 20:03:57 2020 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_server1_3358_20200424200353.binlog to root@server3:/export/data/dbbak//saved_master_binlog_from_server1_3358_20200424200353.binlog succeeded.
Fri Apr 24 20:03:57 2020 - [info] Starting recovery on server3(172.20.130.103:3358)..
Fri Apr 24 20:03:57 2020 - [info] Generating diffs succeeded.
Fri Apr 24 20:03:57 2020 - [info] Waiting until all relay logs are applied.
Fri Apr 24 20:03:57 2020 - [info] done.
Fri Apr 24 20:03:57 2020 - [debug] Stopping SQL thread on server3(172.20.130.103:3358)..
Fri Apr 24 20:03:57 2020 - [debug] done.
Fri Apr 24 20:03:57 2020 - [info] Getting slave status..
Fri Apr 24 20:03:57 2020 - [info] This slave(server3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000007:85790). No need to recover from Exec_Master_Log_Pos.
Fri Apr 24 20:03:57 2020 - [debug] Current max_allowed_packet is 67108864.
Fri Apr 24 20:03:57 2020 - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
Fri Apr 24 20:03:57 2020 - [info] Connecting to the target slave host server3, running recover script..
Fri Apr 24 20:03:57 2020 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=server3 --slave_ip=172.20.130.103 --slave_port=3358 --apply_files=/export/data/dbbak//saved_master_binlog_from_server1_3358_20200424200353.binlog --workdir=/export/data/dbbak/ --target_version=5.7.19-log --timestamp=20200424200353 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug --slave_pass=xxx
Fri Apr 24 20:03:57 2020 - [info]
MySQL client version is 5.7.19. Using --binary-mode.
Applying differential binary/relay log files /export/data/dbbak//saved_master_binlog_from_server1_3358_20200424200353.binlog on server3:3358. This may take long time...
Applying log files succeeded.
Fri Apr 24 20:03:57 2020 - [debug] Setting max_allowed_packet back to 67108864 succeeded.
Fri Apr 24 20:03:57 2020 - [info] All relay logs were successfully applied.
Fri Apr 24 20:03:57 2020 - [info] Resetting slave server3(172.20.130.103:3358) and starting replication from the new master server2(172.28.114.94:3358)..
Fri Apr 24 20:03:57 2020 - [debug] Stopping slave IO/SQL thread on server3(172.20.130.103:3358)..
Fri Apr 24 20:03:57 2020 - [debug] done.
Fri Apr 24 20:03:57 2020 - [info] Executed CHANGE MASTER.
Fri Apr 24 20:03:57 2020 - [debug] Starting slave IO/SQL thread on server3(172.20.130.103:3358)..
Fri Apr 24 20:03:57 2020 - [debug] done.
Fri Apr 24 20:03:57 2020 - [info] Slave started.
Fri Apr 24 20:03:58 2020 - [info] End of log messages from server3.
Fri Apr 24 20:03:58 2020 - [info] -- Slave recovery on host server3(172.20.130.103:3358) succeeded.
Fri Apr 24 20:03:58 2020 - [info] All new slave servers recovered successfully.
Fri Apr 24 20:03:58 2020 - [info]
Fri Apr 24 20:03:58 2020 - [info] * Phase 5: New master cleanup phase..
Fri Apr 24 20:03:58 2020 - [info]
Fri Apr 24 20:03:58 2020 - [info] Resetting slave info on the new master..
Fri Apr 24 20:03:58 2020 - [debug] Clearing slave info..
Fri Apr 24 20:03:58 2020 - [debug] Stopping slave IO/SQL thread on server2(172.28.114.94:3358)..
Fri Apr 24 20:03:58 2020 - [debug] done.
Fri Apr 24 20:03:58 2020 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln705] SHOW SLAVE STATUS shows new master replicates from somewhere. Check for details!
Fri Apr 24 20:03:58 2020 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln719] server2: Resetting slave info failed.
Fri Apr 24 20:03:58 2020 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2021] Master failover to server2(172.28.114.94:3358) done, but recovery on slave partially failed.
Fri Apr 24 20:03:58 2020 - [debug] Disconnected from server2(172.28.114.94:3358)
Fri Apr 24 20:03:58 2020 - [debug] Disconnected from server3(172.20.130.103:3358)
Fri Apr 24 20:03:58 2020 - [info]
----- Failover Report -----
app1: MySQL Master failover server1(172.20.130.105:3358) to server2(172.28.114.94:3358)
Master server1(172.20.130.105:3358) is down!
Check MHA Manager logs at host-10-185-161-202:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on server1(172.20.130.105:3358)
The latest slave server2(172.28.114.94:3358) has all relay logs for recovery.
Selected server2(172.28.114.94:3358) as a new master.
server2(172.28.114.94:3358): OK: Applying all logs succeeded.
Failed to activate master IP address for server2(172.28.114.94:3358) with return code 10:0
server3(172.20.130.103:3358): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
server3(172.20.130.103:3358): OK: Applying all logs succeeded. Slave started, replicating from server2(172.28.114.94:3358)
server2(172.28.114.94:3358): Resetting slave info failed.
Master failover to server2(172.28.114.94:3358) done, but recovery on slave partially failed.
From:
server1(172.20.130.105:3358) (current master)
+--server2(172.28.114.94:3358)
+--server3(172.20.130.103:3358)
To:
server2(172.28.114.94:3358) (new master)
+--server3(172.20.130.103:3358)
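The log shows that server1 went down and server2 was promoted to master, with server3 now replicating from server2. Note that the failover report also records two problems: the master_ip_failover script failed with "Undefined subroutine &main::FIXME_xxx_create_user" (return code 10:0), and resetting slave info on server2 failed, so the failover finished but recovery partially failed.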
For what each step of the failover in this log actually does, see the previous article introducing MHA.
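Reading a log this long by eye is error-prone, so here is a minimal sketch for pulling out the two facts that matter most: which server was promoted, and whether any [error] lines were logged. The function name summarize_mha_log is made up for illustration. The log path in the usage note comes from the failover report above. The sketch assumes the manager writes the "New master is ..." and "[error]" lines in the same form as shown in this log.

```shell
#!/bin/sh
# summarize_mha_log (hypothetical helper): print the promoted master and
# every [error] line found in an MHA manager log.
summarize_mha_log() {
    log_file="$1"

    # The manager logs a line such as:
    #   ... - [info] New master is server2(172.28.114.94:3358)
    # Keep only the last match in case the file holds several failovers.
    new_master=$(sed -n 's/.*New master is //p' "$log_file" | tail -n 1)

    # grep -c prints 0 but exits non-zero when nothing matches,
    # so its exit status is ignored here.
    error_count=$(grep -c -F '[error]' "$log_file" || true)

    echo "new master: ${new_master:-unknown}"
    echo "errors: ${error_count}"

    # List the error lines themselves, if any.
    grep -F '[error]' "$log_file" || true
}
```

Usage, assuming the log location from this setup: summarize_mha_log /var/log/masterha/app1/manager.log. For the log above it should report server2(172.28.114.94:3358) as the new master, followed by the [error] lines from MasterFailover.pm and Server.pm.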