以下所有测试,全部基于以下复制结构完成
host_1(host_1:3306) (current master)
+--host_2(host_2:3306)
+--host_3(host_3:3306)
mha node 所有服务器节点都要安装
mha manager 只需要安装在manager节点
1. yum install perl-DBD-MySQL
2. rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
1. yum install perl-DBD-MySQL
2. yum install perl-Config-Tiny
3. yum install perl-Log-Dispatch
4. yum install perl-Parallel-ForkManager
5. yum install perl-Time-HiRes
6. rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
* Node
Node> rpm -qpl mha4mysql-node-0.56-0.el6.noarch.rpm
/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs
/usr/share/man/man1/apply_diff_relay_logs.1.gz
/usr/share/man/man1/filter_mysqlbinlog.1.gz
/usr/share/man/man1/purge_relay_logs.1.gz
/usr/share/man/man1/save_binary_logs.1.gz
/usr/share/perl5/vendor_perl/MHA/BinlogHeaderParser.pm
/usr/share/perl5/vendor_perl/MHA/BinlogManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinder.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderElp.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderXid.pm
/usr/share/perl5/vendor_perl/MHA/NodeConst.pm
/usr/share/perl5/vendor_perl/MHA/NodeUtil.pm
/usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm
* Manager
Manager> rpm -qpl mha4mysql-manager-0.56-0.el6.noarch.rpm
/usr/bin/masterha_check_repl
/usr/bin/masterha_check_ssh
/usr/bin/masterha_check_status
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor
/usr/bin/masterha_master_switch
/usr/bin/masterha_secondary_check
/usr/bin/masterha_stop
/usr/share/man/man1/masterha_check_repl.1.gz
/usr/share/man/man1/masterha_check_ssh.1.gz
/usr/share/man/man1/masterha_check_status.1.gz
/usr/share/man/man1/masterha_conf_host.1.gz
/usr/share/man/man1/masterha_manager.1.gz
/usr/share/man/man1/masterha_master_monitor.1.gz
/usr/share/man/man1/masterha_master_switch.1.gz
/usr/share/man/man1/masterha_secondary_check.1.gz
/usr/share/man/man1/masterha_stop.1.gz
/usr/share/perl5/vendor_perl/MHA/Config.pm
/usr/share/perl5/vendor_perl/MHA/DBHelper.pm
/usr/share/perl5/vendor_perl/MHA/FileStatus.pm
/usr/share/perl5/vendor_perl/MHA/HealthCheck.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdmin.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdminWrapper.pm
/usr/share/perl5/vendor_perl/MHA/ManagerConst.pm
/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm
/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm
/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm
/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm
/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm
/usr/share/perl5/vendor_perl/MHA/Server.pm
/usr/share/perl5/vendor_perl/MHA/ServerManager.pm
* /etc/masterha_default.cnf
[server default] -- 这下面的都是全局配置,适用于所有app.cnf
user=dba
password=dba
ssh_user=root
master_binlog_dir= /data/mysql.bin
secondary_check_script= masterha_secondary_check -s remote_host1 -s remote_host2
ping_interval=3
master_ip_failover_script=/script/masterha/master_ip_failover
shutdown_script= /script/masterha/power_manager
report_script= /script/masterha/send_master_failover_mail
* /etc/app1.cnf
[server default]
remote_workdir=/var/log/masterha/app1
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
[server default] 里面的这些配置,会影响到下面的N 个 server
* /etc/app1.cnf
[server1]
hostname=192.168.1.10
candidate_master=1
[server2]
hostname=192.168.1.11
candidate_master=1
[server3]
hostname=192.168.1.12
no_master=1
每个server下面的配置,专属于某个server
基本上MHA manager,MNA node,以及二次检测的节点,都需要互相信任
# masterha_check_ssh --conf=/etc/app1.cnf
Sat May 14 14:42:19 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 14 14:42:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Sat May 14 14:42:19 2011 - [info] Reading server configurations from /etc/app1.cnf..
Sat May 14 14:42:19 2011 - [info] Starting SSH connection tests..
Sat May 14 14:42:19 2011 - [debug] Connecting via SSH from root@host1(192.168.0.1) to root@host2(192.168.0.2)..
Sat May 14 14:42:20 2011 - [debug] ok.
Sat May 14 14:42:20 2011 - [debug] Connecting via SSH from root@host1(192.168.0.1) to root@host3(192.168.0.3)..
Sat May 14 14:42:20 2011 - [debug] ok.
Sat May 14 14:42:21 2011 - [debug] Connecting via SSH from root@host2(192.168.0.2) to root@host1(192.168.0.1)..
Sat May 14 14:42:21 2011 - [debug] ok.
Sat May 14 14:42:21 2011 - [debug] Connecting via SSH from root@host2(192.168.0.2) to root@host3(192.168.0.3)..
Sat May 14 14:42:21 2011 - [debug] ok.
Sat May 14 14:42:22 2011 - [debug] Connecting via SSH from root@host3(192.168.0.3) to root@host1(192.168.0.1)..
Sat May 14 14:42:22 2011 - [debug] ok.
Sat May 14 14:42:22 2011 - [debug] Connecting via SSH from root@host3(192.168.0.3) to root@host2(192.168.0.2)..
Sat May 14 14:42:22 2011 - [debug] ok.
Sat May 14 14:42:22 2011 - [info] All SSH connection tests passed successfully.
如果slave比较多,实例比较多,最好提高下 /etc/ssh/sshd_config MaxStartups 的值(默认是10)
仅在Linux上测试过
打从一开始,MHA就是为了解决数据一致性而出生,所以,最好是多个slave
如果你只有一个slave,根本就碰不到数据一致性问题,也就不需要mha了
如果是一个slave,用版同步复制也能解决
从0.52开始,MHA就支持多master的复制架构了,下面列举了多master环境下注意点
默认情况下,MHA是不支持3层或多层复制架构的(Master1 -> Master2 -> Slave3)
MHA可以恢复Master2,但是不能恢复Slave3,因为Master2,Slave3有不同的master
为了让MHA支持以上架构,可以参考如下配置:
[app@slave_host1]$ mysqlbinlog --version
mysqlbinlog Ver 3.3 for unknown-linux-gnu at x86_64
默认情况下,如果SQL线程执行完relay-log,relay logs就会被自动删除。
但是这些relay-logs 也许还会用来恢复其他的slave,所以你需要关闭自动删除relay-logs的purge线程,然后自己阶段性的来删除
如果是你自己来删的话,必须考虑repl 延迟问题
最好让slave删除relay log不要在同一时间点,假如需要恢复,那么这个时间点所有relay logs都被删除了就不好了
不管是SBR,还是RBR,最好不要使用load data
- [功能测试] 测试场景一、手工操作,手动完成,failover
- [功能测试] 测试场景二、手工操作, 自动完成,failover
- [功能测试] 测试场景三、手工操作,自动完成,online master switch
- [功能测试] 测试场景四、自动监控,自动操作,自动完成,failover
- [用例测试] MySQL master 服务 down掉,是否成功自动failover
- [用例测试] MySQL master too many connection,无权限,响应慢,是否自动failover
- [用例测试] MySQL master 服务器 down掉,且候选master落后的最多, 是否自动failover,知否可以成功的做日志补偿
- [用例测试] MySQL slave 服务 down掉,是否自动failover
- [用例测试] MySQL 有一台slave,或者多台slave服务器延迟很大,是否自动failover
- [用例测试] MySQL slave IO/SQL线程 stop,是否自动failover
- [用例测试] MySQL slave IO/SQL线程 报错,是否自动failover
- [用例测试] MySQL master 有大事务超过100s再执行,是否可以online master switch
- [用例测试] MySQL master 网络断掉,是否自动failover
- [用例测试] MySQL master 网路瞬断(1~30秒),是否自动failover
- [用例测试] MySQL master 和 候选master 网络都挂掉的情况,是否自动failover
- [用例测试] MHA manager 和 MySQL master之间的网络断掉,但是master和slave之间的网络是好的,是否自动failover
- [用例测试] MHA manager 和 MySQL master,slave 网络都断掉的情况,是否自动failover
- [用例测试] GTID模式下,还需要relay-log吗?是否能够成功的补齐日志
- [用例测试] 多线程复制模式下,做failover 和 online master switch,会不会有问题呢?
- [用例测试] 在一开始没有开启MHA的group中,如何做到日志补偿,然后change master呢?
manager_host$ cat /etc/app1.cnf
[server default]
# mysql user and password
user=root
password=mysqlpass
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
[server1]
hostname=host1
[server2]
hostname=host2
[server3]
hostname=host3
# masterha_check_ssh --conf=/etc/app1.cnf
manager_host$ masterha_check_repl --conf=/etc/app1.cnf
manager_host$ masterha_manager --conf=/etc/app1.cnf
manager_host$ masterha_check_status --conf=/etc/app1.cnf
manager_host$ masterha_stop --conf=/etc/app1.cnf
host1$ killall -9 mysqld mysqld_safe
* 二次检测: 检查master是否真的挂了,避免脑裂 secondary_network_script
* master_ip_failover: 默认的是空,什么都不做
ip漂移
新master赋予写
日志保留
报警
等等
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
* 关闭mha 监控
masterha_stop --conf=/etc/app1.cnf
killall -9 mysqld mysqld_safe
masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host_1 --interactive=1 --ignore_last_failover
* 切换日志如下:
--dead_master_ip= is not set. Using host_1.
--dead_master_port= is not set. Using 3306.
Thu Jul 28 16:37:04 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 16:37:04 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 16:37:04 2016 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 28 16:37:04 2016 - [info] MHA::MasterFailover version 0.56.
Thu Jul 28 16:37:04 2016 - [info] Starting master failover.
Thu Jul 28 16:37:04 2016 - [info]
Thu Jul 28 16:37:04 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 16:37:04 2016 - [info]
Thu Jul 28 16:37:04 2016 - [info] GTID failover mode = 0
Thu Jul 28 16:37:04 2016 - [info] Dead Servers:
Thu Jul 28 16:37:04 2016 - [info] host_1(host_1:3306)
Thu Jul 28 16:37:04 2016 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 28 16:37:04 2016 - [info] ok.
Thu Jul 28 16:37:04 2016 - [info] Alive Servers:
Thu Jul 28 16:37:04 2016 - [info] host_2(host_2:3306)
Thu Jul 28 16:37:04 2016 - [info] host_3(host_3:3306)
Thu Jul 28 16:37:04 2016 - [info] Alive Slaves:
Thu Jul 28 16:37:04 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:04 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:04 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:04 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:04 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:04 2016 - [info] Not candidate for the new Master (no_master is set)
Master host_1(host_1:3306) is dead. Proceed? (yes/NO): yes
Thu Jul 28 16:37:06 2016 - [info] Starting Non-GTID based failover.
Thu Jul 28 16:37:06 2016 - [info]
Thu Jul 28 16:37:06 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 16:37:06 2016 - [info]
Thu Jul 28 16:37:06 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 28 16:37:06 2016 - [info]
Thu Jul 28 16:37:06 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Jul 28 16:37:07 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 28 16:37:07 2016 - [info] Executing master IP deactivation script:
Thu Jul 28 16:37:07 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Thu Jul 28 16:37:07 2016 - [info] done.
Thu Jul 28 16:37:07 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 28 16:37:07 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 16:37:07 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 28 16:37:07 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:07 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:37:07 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 16:37:07 2016 - [info] Oldest slaves:
Thu Jul 28 16:37:07 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:07 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] Fetching dead master's binary logs..
Thu Jul 28 16:37:07 2016 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1_name.000004 --start_pos=154 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728163704.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /var/log/masterha/app1 if not exists.. ok.
Concat binary/relay logs from host_1_name.000004 pos 154 to host_1_name.000004 EOF into /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728163704.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
No need to dump effective binlog data from /data/mysql.bin/host_1_name.000004 (pos starts 154, filesize 154). Skipping.
Binlog Checksum enabled
/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728163704.binlog has no effective data events.
Event not exists.
Thu Jul 28 16:37:07 2016 - [info] Additional events were not found from the orig master. No need to save.
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jul 28 16:37:07 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Jul 28 16:37:07 2016 - [info] Searching new master from slaves..
Thu Jul 28 16:37:07 2016 - [info] Candidate masters from the configuration file:
Thu Jul 28 16:37:07 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:07 2016 - [info] Non-candidate masters:
Thu Jul 28 16:37:07 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:37:07 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 28 16:37:07 2016 - [info] New master is host_2(host_2:3306)
Thu Jul 28 16:37:07 2016 - [info] Starting master failover..
Thu Jul 28 16:37:07 2016 - [info]
From:
host_1(host_1:3306) (current master)
+--host_2(host_2:3306)
+--host_3(host_3:3306)
To:
host_2(host_2:3306) (new master)
+--host_3(host_3:3306)
Starting master switch from host_1(host_1:3306) to host_2(host_2:3306)? (yes/NO): yes
Thu Jul 28 16:37:09 2016 - [info] New master decided manually is host_2(host_2:3306)
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jul 28 16:37:09 2016 - [info] Starting recovery on host_2(host_2:3306)..
Thu Jul 28 16:37:09 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 16:37:09 2016 - [info] done.
Thu Jul 28 16:37:09 2016 - [info] All relay logs were successfully applied.
Thu Jul 28 16:37:09 2016 - [info] Getting new master's binlog name and position..
Thu Jul 28 16:37:09 2016 - [info] host_2_name.000002:294
Thu Jul 28 16:37:09 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000002', MASTER_LOG_POS=294, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jul 28 16:37:09 2016 - [info] Executing master IP activate script:
Thu Jul 28 16:37:09 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Thu Jul 28 16:37:09 2016 - [info] OK.
Thu Jul 28 16:37:09 2016 - [info] ** Finished master recovery successfully.
Thu Jul 28 16:37:09 2016 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 6211. Check tmp log /var/log/masterha/app1/host_3_3306_20160728163704.log if it takes time..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] Log messages from host_3 ...
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 16:37:09 2016 - [info] End of log messages from host_3.
Thu Jul 28 16:37:09 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Thu Jul 28 16:37:09 2016 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 6216. Check tmp log /var/log/masterha/app1/host_3_3306_20160728163704.log if it takes time..
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:10 2016 - [info] Log messages from host_3 ...
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] Starting recovery on host_3(host_3:3306)..
Thu Jul 28 16:37:09 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 16:37:09 2016 - [info] done.
Thu Jul 28 16:37:09 2016 - [info] All relay logs were successfully applied.
Thu Jul 28 16:37:09 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Thu Jul 28 16:37:09 2016 - [info] Executed CHANGE MASTER.
Thu Jul 28 16:37:10 2016 - [info] Slave started.
Thu Jul 28 16:37:10 2016 - [info] End of log messages from host_3.
Thu Jul 28 16:37:10 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Thu Jul 28 16:37:10 2016 - [info] All new slave servers recovered successfully.
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:10 2016 - [info] * Phase 5: New master cleanup phase..
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:10 2016 - [info] Resetting slave info on the new master..
Thu Jul 28 16:37:10 2016 - [info] host_2: Resetting slave info succeeded.
Thu Jul 28 16:37:10 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Thu Jul 28 16:37:10 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded
Master host_1(host_1:3306) is down!
Check MHA Manager logs at host_manager_name for details.
Started manual(interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
* 关闭mha 监控
masterha_stop --conf=/etc/app1.cnf
killall -9 mysqld mysqld_safe
masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host_1 --interactive=0 --ignore_last_failover
* 切换日志如下:
--dead_master_ip= is not set. Using host_1.
--dead_master_port= is not set. Using 3306.
Thu Jul 28 16:33:21 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 16:33:21 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 16:33:21 2016 - [info] Reading server configuration from /etc/app1.cnf..
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
killall -9 mysqld mysqld_safe
masterha_master_switch --master_state=alive --conf=/etc/app1.cnf --orig_master_is_new_slave --interactive=0
* 在online master切换过程中,如果MHA manager 开启,那么切换不会成功
Thu Jul 28 16:13:57 2016 - [info] Checking MHA is not monitoring or doing failover..
Thu Jul 28 16:13:57 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Thu Jul 28 16:13:57 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53
* 在online master切换过程中,如果设置--orig_master_is_new_slave, 但是没有设置repl_password,就会报错
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln784] Slave could not be started on host_1(host_1:3306)! Check slave status.
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln862] Starting slave IO/SQL thread on host_1(host_1:3306) failed!
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln573] Failed!
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln602] Switching master to host_2(host_2:3306) done, but switching slaves partially failed.
* 切换成功的日志如下:
Thu Jul 28 16:23:25 2016 - [info] MHA::MasterRotate version 0.56.
Thu Jul 28 16:23:25 2016 - [info] Starting online master switch..
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 16:23:25 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 16:23:25 2016 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 28 16:23:25 2016 - [info] GTID failover mode = 0
Thu Jul 28 16:23:25 2016 - [info] Current Alive Master: host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Alive Slaves:
Thu Jul 28 16:23:25 2016 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:23:25 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:23:25 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] Checking MHA is not monitoring or doing failover..
Thu Jul 28 16:23:25 2016 - [info] Checking replication health on host_1..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] Checking replication health on host_3..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] Searching new master from slaves..
Thu Jul 28 16:23:25 2016 - [info] Candidate masters from the configuration file:
Thu Jul 28 16:23:25 2016 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:23:25 2016 - [info] host_2(host_2:3306) Version=5.7.13-log log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Non-candidate masters:
Thu Jul 28 16:23:25 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info] Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 16:23:25 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 28 16:23:25 2016 - [info]
From:
host_2(host_2:3306) (current master)
+--host_1(host_1:3306)
+--host_3(host_3:3306)
To:
host_1(host_1:3306) (new master)
+--host_3(host_3:3306)
+--host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Checking whether host_1(host_1:3306) is ok for the new master..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] host_2(host_2:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Thu Jul 28 16:23:25 2016 - [info] host_2(host_2:3306): Resetting slave pointing to the dummy host.
Thu Jul 28 16:23:25 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [info] * Phase 2: Rejecting updates Phase..
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Thu Jul 28 16:23:25 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Thu Jul 28 16:23:25 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Thu Jul 28 16:23:25 2016 - [info] ok.
Thu Jul 28 16:23:25 2016 - [info] Orig master binlog:pos is host_2_name.000002:294.
Thu Jul 28 16:23:25 2016 - [info] Waiting to execute all relay logs on host_1(host_1:3306)..
Thu Jul 28 16:23:25 2016 - [info] master_pos_wait(host_2_name.000002:294) completed on host_1(host_1:3306). Executed 0 events.
Thu Jul 28 16:23:25 2016 - [info] done.
Thu Jul 28 16:23:25 2016 - [info] Getting new master's binlog name and position..
Thu Jul 28 16:23:25 2016 - [info] host_1_name.000004:154
Thu Jul 28 16:23:25 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1_name.000004', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jul 28 16:23:25 2016 - [info] Setting read_only=0 on host_1(host_1:3306)..
Thu Jul 28 16:23:26 2016 - [info] ok.
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] * Switching slaves in parallel..
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] -- Slave switch on host host_3(host_3:3306) started, pid: 32120
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] Log messages from host_3 ...
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] Waiting to execute all relay logs on host_3(host_3:3306)..
Thu Jul 28 16:23:26 2016 - [info] master_pos_wait(host_2_name.000002:294) completed on host_3(host_3:3306). Executed 0 events.
Thu Jul 28 16:23:26 2016 - [info] done.
Thu Jul 28 16:23:26 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306)..
Thu Jul 28 16:23:26 2016 - [info] Executed CHANGE MASTER.
Thu Jul 28 16:23:26 2016 - [info] Slave started.
Thu Jul 28 16:23:26 2016 - [info] End of log messages from host_3 ...
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] -- Slave switch on host host_3(host_3:3306) succeeded.
Thu Jul 28 16:23:26 2016 - [info] Unlocking all tables on the orig master:
Thu Jul 28 16:23:26 2016 - [info] Executing UNLOCK TABLES..
Thu Jul 28 16:23:26 2016 - [info] ok.
Thu Jul 28 16:23:26 2016 - [info] Starting orig master as a new slave..
Thu Jul 28 16:23:26 2016 - [info] Resetting slave host_2(host_2:3306) and starting replication from the new master host_1(host_1:3306)..
Thu Jul 28 16:23:26 2016 - [info] Executed CHANGE MASTER.
Thu Jul 28 16:23:26 2016 - [info] Slave started.
Thu Jul 28 16:23:26 2016 - [info] All new slave servers switched successfully.
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] * Phase 5: New master cleanup phase..
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] host_1: Resetting slave info succeeded.
Thu Jul 28 16:23:26 2016 - [info] Switching master to host_1(host_1:3306) completed successfully.
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
killall -9 mysqld mysqld_safe
Thu Jul 28 17:10:25 2016 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 28 17:10:26 2016 - [info] GTID failover mode = 0
Thu Jul 28 17:10:26 2016 - [info] Dead Servers:
Thu Jul 28 17:10:26 2016 - [info] Alive Servers:
Thu Jul 28 17:10:26 2016 - [info] host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info] host_2(host_2:3306)
Thu Jul 28 17:10:26 2016 - [info] host_3(host_3:3306)
Thu Jul 28 17:10:26 2016 - [info] Alive Slaves:
Thu Jul 28 17:10:26 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:10:26 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:10:26 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:10:26 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:10:26 2016 - [info] Current Alive Master: host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info] Checking slave configurations..
Thu Jul 28 17:10:26 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Thu Jul 28 17:10:26 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Thu Jul 28 17:10:26 2016 - [info] Checking replication filtering settings..
Thu Jul 28 17:10:26 2016 - [info] binlog_do_db= , binlog_ignore_db=
Thu Jul 28 17:10:26 2016 - [info] Replication filtering check ok.
Thu Jul 28 17:10:26 2016 - [info] GTID (with auto-pos) is not supported
Thu Jul 28 17:10:26 2016 - [info] Starting SSH connection tests..
Thu Jul 28 17:10:27 2016 - [info] All SSH connection tests passed successfully.
Thu Jul 28 17:10:27 2016 - [info] Checking MHA Node version..
Thu Jul 28 17:10:28 2016 - [info] Version check ok.
Thu Jul 28 17:10:28 2016 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 28 17:10:28 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Jul 28 17:10:29 2016 - [info] Master MHA Node version is 0.56.
Thu Jul 28 17:10:29 2016 - [info] Checking recovery script configurations on host_1(host_1:3306)..
Thu Jul 28 17:10:29 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=host_1_name.000004
Thu Jul 28 17:10:29 2016 - [info] Connecting to root@host_1(host_1:22)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql.bin, up to host_1_name.000004
Thu Jul 28 17:10:29 2016 - [info] Binlog setting check done.
Thu Jul 28 17:10:29 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Jul 28 17:10:29 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='dba' --slave_host=host_2 --slave_ip=host_2 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2_name-relay-bin.000001 --slave_pass=xxx
Thu Jul 28 17:10:29 2016 - [info] Connecting to root@host_2(host_2:22)..
Checking slave recovery environment settings..
Relay log found at /data/mysql_data, up to host_2_name-relay-bin.000002
Temporary relay log file is /data/mysql_data/host_2_name-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Thu Jul 28 17:10:29 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3_name-relay-bin.000001 --slave_pass=xxx
Thu Jul 28 17:10:29 2016 - [info] Connecting to root@host_3(host_3:22)..
Checking slave recovery environment settings..
Relay log found at /data/mysql_data, up to host_3_name-relay-bin.000002
Temporary relay log file is /data/mysql_data/host_3_name-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Thu Jul 28 17:10:29 2016 - [info] Slaves settings check done.
Thu Jul 28 17:10:29 2016 - [info]
host_1(host_1:3306) (current master)
+--host_2(host_2:3306)
+--host_3(host_3:3306)
Thu Jul 28 17:10:29 2016 - [info] Checking master_ip_failover_script status:
Thu Jul 28 17:10:29 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306
Thu Jul 28 17:10:30 2016 - [info] OK.
Thu Jul 28 17:10:30 2016 - [warning] shutdown_script is not defined.
Thu Jul 28 17:10:30 2016 - [info] Set master ping interval 3 seconds.
Thu Jul 28 17:10:30 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Jul 28 17:10:30 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Thu Jul 28 17:10:30 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Jul 28 17:11:21 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu Jul 28 17:11:21 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Thu Jul 28 17:11:21 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Jul 28 17:11:24 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 28 17:11:24 2016 - [warning] Connection failed 2 time(s)..
Thu Jul 28 17:11:27 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 28 17:11:27 2016 - [warning] Connection failed 3 time(s)..
Thu Jul 28 17:11:30 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 28 17:11:30 2016 - [warning] Connection failed 4 time(s)..
Thu Jul 28 17:11:30 2016 - [warning] Master is not reachable from health checker!
Thu Jul 28 17:11:30 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Thu Jul 28 17:11:30 2016 - [warning] SSH is reachable.
Thu Jul 28 17:11:30 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Thu Jul 28 17:11:30 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 17:11:30 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 17:11:30 2016 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 28 17:11:30 2016 - [info] GTID failover mode = 0
Thu Jul 28 17:11:30 2016 - [info] Dead Servers:
Thu Jul 28 17:11:30 2016 - [info] host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Alive Servers:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306)
Thu Jul 28 17:11:30 2016 - [info] Alive Slaves:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info] Checking slave configurations..
Thu Jul 28 17:11:30 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Thu Jul 28 17:11:30 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Thu Jul 28 17:11:30 2016 - [info] Checking replication filtering settings..
Thu Jul 28 17:11:30 2016 - [info] Replication filtering check ok.
Thu Jul 28 17:11:30 2016 - [info] Master is down!
Thu Jul 28 17:11:30 2016 - [info] Terminating monitoring script.
Thu Jul 28 17:11:30 2016 - [info] Got exit code 20 (Master dead).
Thu Jul 28 17:11:30 2016 - [info] MHA::MasterFailover version 0.56.
Thu Jul 28 17:11:30 2016 - [info] Starting master failover.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] GTID failover mode = 0
Thu Jul 28 17:11:30 2016 - [info] Dead Servers:
Thu Jul 28 17:11:30 2016 - [info] host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 28 17:11:30 2016 - [info] ok.
Thu Jul 28 17:11:30 2016 - [info] Alive Servers:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306)
Thu Jul 28 17:11:30 2016 - [info] Alive Slaves:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info] Starting Non-GTID based failover.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 28 17:11:30 2016 - [info] Executing master IP deactivation script:
Thu Jul 28 17:11:30 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Thu Jul 28 17:11:30 2016 - [info] done.
Thu Jul 28 17:11:30 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 28 17:11:30 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 17:11:30 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 17:11:30 2016 - [info] Oldest slaves:
Thu Jul 28 17:11:30 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] Fetching dead master's binary logs..
Thu Jul 28 17:11:30 2016 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1_name.000004 --start_pos=154 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728171130.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /var/log/masterha/app1 if not exists.. ok.
Concat binary/relay logs from host_1_name.000004 pos 154 to host_1_name.000004 EOF into /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728171130.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
No need to dump effective binlog data from /data/mysql.bin/host_1_name.000004 (pos starts 154, filesize 154). Skipping.
Binlog Checksum enabled
/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728171130.binlog has no effective data events.
Event not exists.
Thu Jul 28 17:11:31 2016 - [info] Additional events were not found from the orig master. No need to save.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jul 28 17:11:31 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Jul 28 17:11:31 2016 - [info] Searching new master from slaves..
Thu Jul 28 17:11:31 2016 - [info] Candidate masters from the configuration file:
Thu Jul 28 17:11:31 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:31 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:31 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:31 2016 - [info] Non-candidate masters:
Thu Jul 28 17:11:31 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:31 2016 - [info] Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:31 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:31 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 28 17:11:31 2016 - [info] New master is host_2(host_2:3306)
Thu Jul 28 17:11:31 2016 - [info] Starting master failover..
Thu Jul 28 17:11:31 2016 - [info]
From:
host_1(host_1:3306) (current master)
+--host_2(host_2:3306)
+--host_3(host_3:3306)
To:
host_2(host_2:3306) (new master)
+--host_3(host_3:3306)
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jul 28 17:11:31 2016 - [info] Starting recovery on host_2(host_2:3306)..
Thu Jul 28 17:11:31 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 17:11:31 2016 - [info] done.
Thu Jul 28 17:11:31 2016 - [info] All relay logs were successfully applied.
Thu Jul 28 17:11:31 2016 - [info] Getting new master's binlog name and position..
Thu Jul 28 17:11:31 2016 - [info] host_2_name.000002:294
Thu Jul 28 17:11:31 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000002', MASTER_LOG_POS=294, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jul 28 17:11:31 2016 - [info] Executing master IP activate script:
Thu Jul 28 17:11:31 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Thu Jul 28 17:11:31 2016 - [info] OK.
Thu Jul 28 17:11:31 2016 - [info] ** Finished master recovery successfully.
Thu Jul 28 17:11:31 2016 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 22427. Check tmp log /var/log/masterha/app1/host_3_3306_20160728171130.log if it takes time..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Log messages from host_3 ...
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 17:11:31 2016 - [info] End of log messages from host_3.
Thu Jul 28 17:11:31 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Thu Jul 28 17:11:31 2016 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 22429. Check tmp log /var/log/masterha/app1/host_3_3306_20160728171130.log if it takes time..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Log messages from host_3 ...
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Starting recovery on host_3(host_3:3306)..
Thu Jul 28 17:11:31 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 17:11:31 2016 - [info] done.
Thu Jul 28 17:11:31 2016 - [info] All relay logs were successfully applied.
Thu Jul 28 17:11:31 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Thu Jul 28 17:11:31 2016 - [info] Executed CHANGE MASTER.
Thu Jul 28 17:11:31 2016 - [info] Slave started.
Thu Jul 28 17:11:31 2016 - [info] End of log messages from host_3.
Thu Jul 28 17:11:31 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Thu Jul 28 17:11:31 2016 - [info] All new slave servers recovered successfully.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 5: New master cleanup phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Resetting slave info on the new master..
Thu Jul 28 17:11:31 2016 - [info] host_2: Resetting slave info succeeded.
Thu Jul 28 17:11:31 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Thu Jul 28 17:11:31 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded
Master host_1(host_1:3306) is down!
Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
mysqlslap --concurrency=4000 --iterations=10 --query='insert into tb(id_2) select (1);' --debug-info -hhost_1 -uxx -pxx --create-schema=lc
mysqlslap: Error when connecting to server: 1040 Too many connections
mysqlslap: Error when connecting to server: 1040 Too many connections
mysqlslap: Error when connecting to server: 1040 Too many connections
* 1. 手动开启另一个session,链接报错tmc。 但是并没有发生failover,说明master 还活着
* 2. 检查了MHA的health check机制,原来是长链接
DIGEST_TEXT: SELECT ? AS VALUE
COUNT_STAR: 46
FIRST_SEEN: 2016-08-02 11:28:00
LAST_SEEN: 2016-08-02 11:30:37
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
候选master> stop slave; sleep 600s ; start slave;
master > reboot
Tue Aug 2 14:33:34 2016 - [warning] Slave host_2(host_2:3306) SQL Thread delays too much. Latest log file:host_1_name.000002:576864, Current log file:host_1_name.000001:39731438. This server is not selected as a new master because recovery will take long time.
Tue Aug 2 14:33:34 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln970] None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too far behind.
Tue Aug 2 14:33:34 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53
由于要指定特定的slave为候选master,而此slave还落后非常多
必须在每组服务器上都加上check_repl_delay=0
vi /etc/app1.cnf
[server default]
remote_workdir=/var/log/masterha/app1
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
[server1]
hostname=host_1
candidate_master=1
check_repl_delay=0
[server2]
hostname=host_2
candidate_master=1
check_repl_delay=0
[server3]
hostname=host_3
no_master=1
check_repl_delay=0
masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host_1 --interactive=1 --ignore_last_failover --new_master_host=host_2
Tue Aug 2 14:38:55 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Tue Aug 2 14:38:55 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Tue Aug 2 14:38:55 2016 - [info] Reading server configuration from /etc/app1.cnf..
Tue Aug 2 14:38:55 2016 - [info] MHA::MasterFailover version 0.56.
Tue Aug 2 14:38:55 2016 - [info] Starting master failover.
Tue Aug 2 14:38:55 2016 - [info]
Tue Aug 2 14:38:55 2016 - [info] * Phase 1: Configuration Check Phase..
Tue Aug 2 14:38:55 2016 - [info]
Tue Aug 2 14:38:55 2016 - [info] GTID failover mode = 0
Tue Aug 2 14:38:55 2016 - [info] Dead Servers:
Tue Aug 2 14:38:55 2016 - [info] host_1(host_1:3306)
Tue Aug 2 14:38:55 2016 - [info] Checking master reachability via MySQL(double check)...
Tue Aug 2 14:38:55 2016 - [info] ok.
Tue Aug 2 14:38:55 2016 - [info] Alive Servers:
Tue Aug 2 14:38:55 2016 - [info] host_2(host_2:3306)
Tue Aug 2 14:38:55 2016 - [info] host_3(host_3:3306)
Tue Aug 2 14:38:55 2016 - [info] Alive Slaves:
Tue Aug 2 14:38:55 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:38:55 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:38:55 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Aug 2 14:38:55 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:38:55 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:38:55 2016 - [info] Not candidate for the new Master (no_master is set)
Master host_1(host_1:3306) is dead. Proceed? (yes/NO): yes
Tue Aug 2 14:38:57 2016 - [info] Starting Non-GTID based failover.
Tue Aug 2 14:38:57 2016 - [info]
Tue Aug 2 14:38:57 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Aug 2 14:38:57 2016 - [info]
Tue Aug 2 14:38:57 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Aug 2 14:38:57 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Tue Aug 2 14:38:58 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Aug 2 14:38:58 2016 - [info] Executing master IP deactivation script:
Tue Aug 2 14:38:58 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Tue Aug 2 14:38:58 2016 - [info] done.
Tue Aug 2 14:38:58 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Aug 2 14:38:58 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] * Phase 3: Master Recovery Phase..
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000002:576864
Tue Aug 2 14:38:58 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Aug 2 14:38:58 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:38:58 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:38:58 2016 - [info] Not candidate for the new Master (no_master is set)
Tue Aug 2 14:38:58 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000001:39731619
Tue Aug 2 14:38:58 2016 - [info] Oldest slaves:
Tue Aug 2 14:38:58 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:38:58 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:38:58 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Aug 2 14:38:58 2016 - [info]
Tue Aug 2 14:38:58 2016 - [info] Fetching dead master's binary logs..
Tue Aug 2 14:38:58 2016 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1_name.000002 --start_pos=576864 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /var/log/masterha/app1 if not exists.. ok.
Concat binary/relay logs from host_1_name.000002 pos 576864 to host_1_name.000002 EOF into /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
Dumping effective binlog data from /data/mysql.bin/host_1_name.000002 position 576864 to tail(577435).. ok.
Binlog Checksum enabled
Concat succeeded.
saved_master_binlog_from_host_1_3306_20160802143855.binlog 100% 725 0.7KB/s 00:00
Tue Aug 2 14:38:59 2016 - [info] scp from root@host_1:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog succeeded.
Tue Aug 2 14:38:59 2016 - [info] HealthCheck: SSH to host_2 is reachable.
Tue Aug 2 14:39:00 2016 - [info] HealthCheck: SSH to host_3 is reachable.
Tue Aug 2 14:39:00 2016 - [info]
Tue Aug 2 14:39:00 2016 - [info] * Phase 3.3: Determining New Master Phase..
Tue Aug 2 14:39:00 2016 - [info]
Tue Aug 2 14:39:00 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Aug 2 14:39:00 2016 - [info] Checking whether host_3 has relay logs from the oldest position..
Tue Aug 2 14:39:00 2016 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_1_name.000002 --latest_rmlp=576864 --target_mlf=host_1_name.000001 --target_rmlp=39731619 --server_id=12616606 --workdir=/var/log/masterha/app1 --timestamp=20160802143855 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3_name-relay-bin.000004 :
Relay log found at /data/mysql_data, up to host_3_name-relay-bin.000004
Fast relay log position search failed. Reading relay logs to find..
Reading host_3_name-relay-bin.000004
Binlog Checksum enabled
Master Version is 5.7.13-log
Binlog Checksum enabled
host_3_name-relay-bin.000004 contains master host_1_name.000002 from position 4
Reading host_3_name-relay-bin.000003
Binlog Checksum enabled
host_3_name-relay-bin.000003 contains master host_1_name.000001 from position 218246073
Reading host_3_name-relay-bin.000002
Binlog Checksum enabled
host_3_name-relay-bin.000002 contains master host_1_name.000001 from position 154
Target relay log FOUND!
Tue Aug 2 14:39:00 2016 - [info] OK. host_3 has all relay logs.
Tue Aug 2 14:39:00 2016 - [info] Searching new master from slaves..
Tue Aug 2 14:39:00 2016 - [info] Candidate masters from the configuration file:
Tue Aug 2 14:39:00 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:39:00 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:39:00 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Aug 2 14:39:00 2016 - [info] Non-candidate masters:
Tue Aug 2 14:39:00 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug 2 14:39:00 2016 - [info] Replicating from host_1(host_1:3306)
Tue Aug 2 14:39:00 2016 - [info] Not candidate for the new Master (no_master is set)
Tue Aug 2 14:39:00 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Aug 2 14:39:00 2016 - [info] Not found.
Tue Aug 2 14:39:00 2016 - [info] Searching from all candidate_master slaves..
Tue Aug 2 14:39:00 2016 - [info] New master is host_2(host_2:3306)
Tue Aug 2 14:39:00 2016 - [info] Starting master failover..
Tue Aug 2 14:39:00 2016 - [info]
From:
host_1(host_1:3306) (current master)
+--host_2(host_2:3306)
+--host_3(host_3:3306)
To:
host_2(host_2:3306) (new master)
+--host_3(host_3:3306)
Starting master switch from host_1(host_1:3306) to host_2(host_2:3306)? (yes/NO): yes
Tue Aug 2 14:39:05 2016 - [info] New master decided manually is host_2(host_2:3306)
Tue Aug 2 14:39:05 2016 - [info]
Tue Aug 2 14:39:05 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Aug 2 14:39:05 2016 - [info]
Tue Aug 2 14:39:05 2016 - [info] Server host_2 received relay logs up to: host_1_name.000001:39731619
Tue Aug 2 14:39:05 2016 - [info] Need to get diffs from the latest slave(host_3) up to: host_1_name.000002:576864 (using the latest slave's relay logs)
Tue Aug 2 14:39:05 2016 - [info] Connecting to the latest slave host host_3, generating diff relay log files..
Tue Aug 2 14:39:05 2016 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_2 --latest_mlf=host_1_name.000002 --latest_rmlp=576864 --target_mlf=host_1_name.000001 --target_rmlp=39731619 --server_id=12616606 --diff_file_readtolatest=/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog --workdir=/var/log/masterha/app1 --timestamp=20160802143855 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3_name-relay-bin.000004
Tue Aug 2 14:39:11 2016 - [info]
Relay log found at /data/mysql_data, up to host_3_name-relay-bin.000004
Fast relay log position search failed. Reading relay logs to find..
Reading host_3_name-relay-bin.000004
Binlog Checksum enabled
Master Version is 5.7.13-log
Binlog Checksum enabled
host_3_name-relay-bin.000004 contains master host_1_name.000002 from position 4
Reading host_3_name-relay-bin.000003
Binlog Checksum enabled
host_3_name-relay-bin.000003 contains master host_1_name.000001 from position 218246073
Reading host_3_name-relay-bin.000002
Binlog Checksum enabled
host_3_name-relay-bin.000002 contains master host_1_name.000001 from position 154
Target relay log file/position found. start_file:host_3_name-relay-bin.000002, start_pos:39731788.
Concat binary/relay logs from host_3_name-relay-bin.000002 pos 39731788 to host_3_name-relay-bin.000004 EOF into /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog ..
Binlog Checksum enabled
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 323.. ok.
Dumping effective binlog data from /data/mysql_data/host_3_name-relay-bin.000002 position 39731788 to tail(218246252).. ok.
Dumping binlog head events (rotate events), skipping format description events from /data/mysql_data/host_3_name-relay-bin.000003.. Binlog Checksum enabled
dumped up to pos 264. ok.
No need to dump effective binlog data from /data/mysql_data/host_3_name-relay-bin.000003 (pos starts 264, filesize 264). Skipping.
Dumping binlog head events (rotate events), skipping format description events from /data/mysql_data/host_3_name-relay-bin.000004.. Binlog Checksum enabled
Binlog Checksum enabled
dumped up to pos 373. ok.
Dumping effective binlog data from /data/mysql_data/host_3_name-relay-bin.000004 position 373 to tail(577083).. ok.
Binlog Checksum enabled
Binlog Checksum enabled
Concat succeeded.
Generating diff relay log succeeded. Saved at /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog .
scp host_3_name.58os.org:/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog to root@host_2(22) succeeded.
Tue Aug 2 14:39:11 2016 - [info] Generating diff files succeeded.
Tue Aug 2 14:39:11 2016 - [info] Sending binlog..
saved_master_binlog_from_host_1_3306_20160802143855.binlog 100% 725 0.7KB/s 00:00
Tue Aug 2 14:39:12 2016 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog to root@host_2:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog succeeded.
Tue Aug 2 14:39:12 2016 - [info]
Tue Aug 2 14:39:12 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Aug 2 14:39:12 2016 - [info]
Tue Aug 2 14:39:12 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Aug 2 14:39:12 2016 - [info] Starting recovery on host_2(host_2:3306)..
Tue Aug 2 14:39:12 2016 - [info] Generating diffs succeeded.
Tue Aug 2 14:39:12 2016 - [info] Waiting until all relay logs are applied.
Tue Aug 2 14:39:12 2016 - [info] done.
Tue Aug 2 14:39:12 2016 - [info] Getting slave status..
Tue Aug 2 14:39:12 2016 - [info] This slave(host_2)'s Exec_Master_Log_Pos(host_1_name.000001:39731438) does not equal to Read_Master_Log_Pos(host_1_name.000001:39731619). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.
Tue Aug 2 14:39:12 2016 - [info] Saving local relay logs from exec pos to read pos on host_2: from host_2_name-relay-bin.000003:39726380 to the end of the relay log..
Tue Aug 2 14:39:12 2016 - [info] Executing command : save_binary_logs --command=save --start_file=host_2_name-relay-bin.000003 --start_pos=39726380 --output_file=/var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --binlog_dir=/data/mysql_data
Tue Aug 2 14:39:12 2016 - [info]
Creating /var/log/masterha/app1 if not exists.. ok.
Concat binary/relay logs from host_2_name-relay-bin.000003 pos 39726380 to host_2_name-relay-bin.000003 EOF into /var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog ..
Binlog Checksum enabled
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 323.. ok.
Dumping effective binlog data from /data/mysql_data/host_2_name-relay-bin.000003 position 39726380 to tail(39726561).. ok.
Binlog Checksum enabled
Binlog Checksum enabled
Concat succeeded.
Tue Aug 2 14:39:12 2016 - [info] Connecting to the target slave host host_2, running recover script..
Tue Aug 2 14:39:12 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_2 --slave_ip=host_2 --slave_port=3306 --apply_files=/var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --timestamp=20160802143855 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Aug 2 14:42:11 2016 - [info]
Concat all apply files to /var/log/masterha/app1/total_binlog_for_host_2_3306.20160802143855.binlog ..
Copying the first binlog file /var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog to /var/log/masterha/app1/total_binlog_for_host_2_3306.20160802143855.binlog.. ok.
Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog.. Binlog Checksum enabled
Binlog Checksum enabled
dumped up to pos 323. ok.
/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog has effective binlog events from pos 323.
Dumping effective binlog data from /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog position 323 to tail(179091707).. ok.
Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog.. Binlog Checksum enabled
dumped up to pos 154. ok.
/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog has effective binlog events from pos 154.
Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog position 154 to tail(725).. ok.
Concat succeeded.
All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_host_2_3306.20160802143855.binlog .
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog on host_2:3306. This may take long time...
Applying log files succeeded.
Tue Aug 2 14:42:11 2016 - [info] All relay logs were successfully applied.
Tue Aug 2 14:42:11 2016 - [info] Getting new master's binlog name and position..
Tue Aug 2 14:42:11 2016 - [info] host_2_name.000001:217647730
Tue Aug 2 14:42:11 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000001', MASTER_LOG_POS=217647730, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Aug 2 14:42:11 2016 - [info] Executing master IP activate script:
Tue Aug 2 14:42:11 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Tue Aug 2 14:42:11 2016 - [info] OK.
Tue Aug 2 14:42:11 2016 - [info] ** Finished master recovery successfully.
Tue Aug 2 14:42:11 2016 - [info] * Phase 3: Master Recovery Phase completed.
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] * Phase 4: Slaves Recovery Phase..
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 22541. Check tmp log /var/log/masterha/app1/host_3_3306_20160802143855.log if it takes time..
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] Log messages from host_3 ...
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Aug 2 14:42:11 2016 - [info] End of log messages from host_3.
Tue Aug 2 14:42:11 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Tue Aug 2 14:42:11 2016 - [info] Generating relay diff files from the latest slave succeeded.
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Aug 2 14:42:11 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 22543. Check tmp log /var/log/masterha/app1/host_3_3306_20160802143855.log if it takes time..
saved_master_binlog_from_host_1_3306_20160802143855.binlog 100% 725 0.7KB/s 00:00
Tue Aug 2 14:42:12 2016 - [info]
Tue Aug 2 14:42:12 2016 - [info] Log messages from host_3 ...
Tue Aug 2 14:42:12 2016 - [info]
Tue Aug 2 14:42:11 2016 - [info] Sending binlog..
Tue Aug 2 14:42:12 2016 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog to root@host_3:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog succeeded.
Tue Aug 2 14:42:12 2016 - [info] Starting recovery on host_3(host_3:3306)..
Tue Aug 2 14:42:12 2016 - [info] Generating diffs succeeded.
Tue Aug 2 14:42:12 2016 - [info] Waiting until all relay logs are applied.
Tue Aug 2 14:42:12 2016 - [info] done.
Tue Aug 2 14:42:12 2016 - [info] Getting slave status..
Tue Aug 2 14:42:12 2016 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_1_name.000002:576864). No need to recover from Exec_Master_Log_Pos.
Tue Aug 2 14:42:12 2016 - [info] Connecting to the target slave host host_3, running recover script..
Tue Aug 2 14:42:12 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --timestamp=20160802143855 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Aug 2 14:42:12 2016 - [info]
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog on host_3:3306. This may take long time...
Applying log files succeeded.
Tue Aug 2 14:42:12 2016 - [info] All relay logs were successfully applied.
Tue Aug 2 14:42:12 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Tue Aug 2 14:42:12 2016 - [info] Executed CHANGE MASTER.
Tue Aug 2 14:42:12 2016 - [info] Slave started.
Tue Aug 2 14:42:12 2016 - [info] End of log messages from host_3.
Tue Aug 2 14:42:12 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Tue Aug 2 14:42:12 2016 - [info] All new slave servers recovered successfully.
Tue Aug 2 14:42:12 2016 - [info]
Tue Aug 2 14:42:12 2016 - [info] * Phase 5: New master cleanup phase..
Tue Aug 2 14:42:12 2016 - [info]
Tue Aug 2 14:42:12 2016 - [info] Resetting slave info on the new master..
Tue Aug 2 14:42:12 2016 - [info] host_2: Resetting slave info succeeded.
Tue Aug 2 14:42:12 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Tue Aug 2 14:42:12 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded
Master host_1(host_1:3306) is down!
Check MHA Manager logs at host_manager_name for details.
Started manual(interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_3(host_3:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
* MHA 只会监控master的状态,只有master挂了,才会自动failover
[info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
> 10. [用例测试] MySQL slave IO/SQL线程 stop,是否自动failover
> 11. [用例测试] MySQL slave IO/SQL线程 报错,是否自动failover
MySQL slave IO/SQL线程 stop
MySQL slave IO/SQL线程 延迟很多
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
某台slave> stop slave;
master> shutdown mysql;
* 如果是slave 停止同步,然后master也挂了,任然会自动failover,不会报错
* 如果是slave 延迟严重,然后master也挂了,任然会自动failover,不会报错
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
master> select id , sleep(20) from tb for update;
* slave 有问题的时候
IO,sql thread有问题
slave 挂了
slave 延迟太多
复制过滤规则不统一
等等
以上情况,都会导致切换失败
如果有大事务,MHA会一直等待,直到事务结束
Wed Aug 3 15:33:04 2016 - [info] MHA::MasterRotate version 0.56.
Wed Aug 3 15:33:04 2016 - [info] Starting online master switch..
Wed Aug 3 15:33:04 2016 - [info]
Wed Aug 3 15:33:04 2016 - [info] * Phase 1: Configuration Check Phase..
Wed Aug 3 15:33:04 2016 - [info]
Wed Aug 3 15:33:04 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Aug 3 15:33:04 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Wed Aug 3 15:33:04 2016 - [info] Reading server configuration from /etc/app1.cnf..
Wed Aug 3 15:33:04 2016 - [info] GTID failover mode = 0
Wed Aug 3 15:33:04 2016 - [info] Current Alive Master: host_1(host_1:3306)
Wed Aug 3 15:33:04 2016 - [info] Alive Slaves:
Wed Aug 3 15:33:04 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 15:33:04 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 15:33:04 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 3 15:33:04 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 15:33:04 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 15:33:04 2016 - [info] Not candidate for the new Master (no_master is set)
Wed Aug 3 15:33:04 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
... 一直等待 ...
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
* MASTER 网络断掉,只允许host_monitor的22端口访问(便于恢复网络)
master> iptables -A INPUT -p tcp -s host_monitor --dprot 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
* MASTER 恢复网络
master> service iptables restart
Wed Aug 3 18:32:26 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Wed Aug 3 18:32:26 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Wed Aug 3 18:32:29 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:32:29 2016 - [warning] Connection failed 2 time(s)..
Wed Aug 3 18:32:31 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Wed Aug 3 18:32:32 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:32:32 2016 - [warning] Connection failed 3 time(s)..
Wed Aug 3 18:32:35 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:32:35 2016 - [warning] Connection failed 4 time(s)..
Wed Aug 3 18:32:35 2016 - [warning] Master is not reachable from health checker!
Wed Aug 3 18:32:35 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Wed Aug 3 18:32:35 2016 - [warning] SSH is NOT reachable.
Wed Aug 3 18:32:35 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Wed Aug 3 18:32:35 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Aug 3 18:32:35 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Wed Aug 3 18:32:35 2016 - [info] Reading server configuration from /etc/app1.cnf..
Wed Aug 3 18:32:35 2016 - [info] GTID failover mode = 0
Wed Aug 3 18:32:35 2016 - [info] Dead Servers:
Wed Aug 3 18:32:35 2016 - [info] host_1(host_1:3306)
Wed Aug 3 18:32:35 2016 - [info] Alive Servers:
Wed Aug 3 18:32:35 2016 - [info] host_2(host_2:3306)
Wed Aug 3 18:32:35 2016 - [info] host_3(host_3:3306)
Wed Aug 3 18:32:35 2016 - [info] Alive Slaves:
Wed Aug 3 18:32:35 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 18:32:35 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 18:32:35 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 3 18:32:35 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 18:32:35 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 18:32:35 2016 - [info] Not candidate for the new Master (no_master is set)
Wed Aug 3 18:32:35 2016 - [info] Checking slave configurations..
Wed Aug 3 18:32:35 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Wed Aug 3 18:32:35 2016 - [info] read_only=1 is not set on slave host_3(host_3:3306).
Wed Aug 3 18:32:35 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Wed Aug 3 18:32:35 2016 - [info] Checking replication filtering settings..
Wed Aug 3 18:32:35 2016 - [info] Replication filtering check ok.
Wed Aug 3 18:32:35 2016 - [info] Master is down!
Wed Aug 3 18:32:35 2016 - [info] Terminating monitoring script.
Wed Aug 3 18:32:35 2016 - [info] Got exit code 20 (Master dead).
Wed Aug 3 18:32:35 2016 - [info] MHA::MasterFailover version 0.56.
Wed Aug 3 18:32:35 2016 - [info] Starting master failover.
Wed Aug 3 18:32:35 2016 - [info]
Wed Aug 3 18:32:35 2016 - [info] * Phase 1: Configuration Check Phase..
Wed Aug 3 18:32:35 2016 - [info]
Wed Aug 3 18:32:35 2016 - [info] GTID failover mode = 0
Wed Aug 3 18:32:35 2016 - [info] Dead Servers:
Wed Aug 3 18:32:35 2016 - [info] host_1(host_1:3306)
Wed Aug 3 18:32:35 2016 - [info] Checking master reachability via MySQL(double check)...
Wed Aug 3 18:32:36 2016 - [info] ok.
Wed Aug 3 18:32:36 2016 - [info] Alive Servers:
Wed Aug 3 18:32:36 2016 - [info] host_2(host_2:3306)
Wed Aug 3 18:32:36 2016 - [info] host_3(host_3:3306)
Wed Aug 3 18:32:36 2016 - [info] Alive Slaves:
Wed Aug 3 18:32:36 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 18:32:36 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 18:32:36 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 3 18:32:36 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug 3 18:32:36 2016 - [info] Replicating from host_1(host_1:3306)
Wed Aug 3 18:32:36 2016 - [info] Not candidate for the new Master (no_master is set)
Wed Aug 3 18:32:36 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln309] Last failover was done at 2016/08/03 16:07:43. Current time is too early to do failover again. If you want to do failover, manually remove /var/log/masterha/app1/app1.failover.complete and run this script again.
Wed Aug 3 18:32:36 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_manager line 65
* 由于之前做过多次切换,然后时间太短,所以failover终止,但是从这里可以看出,MHA已经检测到网络问题,进行failover动作了
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
* MASTER 网络断掉,只允许host_monitor的22端口访问(便于恢复网络)
master> iptables -A INPUT -p tcp -s host_monitor --dprot 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
* 1~10秒左右 MASTER 恢复网络
master> service iptables restart
结论:如果时间非常短,比如1~2秒的网络瞬断,不会failover
结论:如果时间比较长,比如:> 5~10秒,那么如果MHA已经判定到Master is not reachable from health checker! 那么,就会进行failover阶段
Wed Aug 3 18:48:35 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Wed Aug 3 18:48:50 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Wed Aug 3 18:48:50 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Wed Aug 3 18:48:53 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:48:53 2016 - [warning] Connection failed 2 time(s)..
Wed Aug 3 18:48:55 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Wed Aug 3 18:48:56 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:48:56 2016 - [warning] Connection failed 3 time(s)..
Wed Aug 3 18:48:59 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug 3 18:48:59 2016 - [warning] Connection failed 4 time(s)..
Wed Aug 3 18:48:59 2016 - [warning] Master is not reachable from health checker!
Wed Aug 3 18:48:59 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Wed Aug 3 18:48:59 2016 - [warning] SSH is NOT reachable.
----- Failover Report -----
app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded
Master host_1(host_1:3306) is down!
Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
Manager <-- 不通 --> Master
Manager <-- 正常 --> S1 <-- 正常 --> master
Manager <-- 正常 --> S2 <-- 正常 --> master
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2 -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
已经发生failover
----- Failover Report -----
app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded
Master host_1(host_1:3306) is down!
Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
Manager <-- 不通 --> Master
Manager <-- 正常 --> S1 <-- 正常 --> master
Manager <-- 正常 --> S2 <-- 正常 --> master
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf --last_failover_minute=1 &
* 二次检测
secondary_check_script= masterha_secondary_check -s host_2 -s host_3
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2 -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
结论: 因为二次检查可以通过,所以MHA认为master没有挂,没有failover,但是在一直循环检测中
结论: 只要有二次检测,那么只要有一个别的server可以连通master,那么就会认为master没有挂,就不会failover
Fri Aug 5 10:16:24 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 5 10:16:24 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Fri Aug 5 10:17:06 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:17:06 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:17:06 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Master is reachable from host_2!
Fri Aug 5 10:17:07 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:17:09 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:09 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:17:11 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 10:17:12 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:12 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:17:15 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:15 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:17:15 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:17:18 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:18 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:17:19 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:17:19 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Master is reachable from host_2!
Fri Aug 5 10:17:19 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:17:22 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:22 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:17:24 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 10:17:25 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:25 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:17:28 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:28 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:17:28 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:17:31 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:17:31 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:17:31 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:17:31 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Master is reachable from host_2!
Manager <-- 不通 --> Master
Manager <-- 正常 --> S1 <-- 不通 --> master
Manager <-- 正常 --> S2 <-- 正常 --> master
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
结论: 因为二次检查可以通过,所以MHA认为master没有挂,没有failover,但是在一直循环检测中
结论: 只要有二次检测,那么只要有一个别的server可以连通master,那么就会认为master没有挂,就不会failover
Fri Aug 5 10:32:27 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Fri Aug 5 10:32:27 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 5 10:33:30 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:33:30 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:33:30 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:33:33 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:33 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:33:35 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Master is reachable from host_3!
Fri Aug 5 10:33:35 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:33:36 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:36 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:33:39 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:39 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:33:39 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:33:42 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:42 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:33:42 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:33:42 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:33:45 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:45 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:33:47 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Master is reachable from host_3!
Fri Aug 5 10:33:47 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:33:48 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:48 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:33:51 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:51 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:33:51 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:33:54 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:54 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:33:54 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:33:54 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:33:57 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:33:57 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:33:59 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Master is reachable from host_3!
Fri Aug 5 10:34:00 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug 5 10:34:00 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:34:00 2016 - [warning] Connection failed 3 time(s)..
Manager <-- 不通 --> Master
Manager <-- 正常 --> S1 <-- 不通 --> master
Manager <-- 正常 --> S2 <-- 不通 --> master
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
由于都不通,那么会failover
Fri Aug 5 10:27:56 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:27:56 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:27:56 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:27:59 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:27:59 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:28:01 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Fri Aug 5 10:28:02 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:28:02 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:28:05 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:28:05 2016 - [warning] Connection failed 4 time(s)..
Monitoring server host_3 is reachable, Master is not reachable from host_3. OK.
Fri Aug 5 10:28:06 2016 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Aug 5 10:28:06 2016 - [warning] Master is not reachable from health checker!
Fri Aug 5 10:28:06 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug 5 10:28:06 2016 - [warning] SSH is NOT reachable.
Fri Aug 5 10:28:06 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 5 10:28:06 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 5 10:28:06 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug 5 10:28:06 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug 5 10:28:06 2016 - [info] GTID failover mode = 0
Fri Aug 5 10:28:06 2016 - [info] Dead Servers:
Fri Aug 5 10:28:06 2016 - [info] host_1(host_1:3306)
Fri Aug 5 10:28:06 2016 - [info] Alive Servers:
Fri Aug 5 10:28:06 2016 - [info] host_2(host_2:3306)
Fri Aug 5 10:28:06 2016 - [info] host_3(host_3:3306)
Fri Aug 5 10:28:06 2016 - [info] Alive Slaves:
Fri Aug 5 10:28:06 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:06 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:06 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:06 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:06 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:06 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:06 2016 - [info] Checking slave configurations..
Fri Aug 5 10:28:06 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Fri Aug 5 10:28:06 2016 - [info] read_only=1 is not set on slave host_3(host_3:3306).
Fri Aug 5 10:28:06 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Fri Aug 5 10:28:06 2016 - [info] Checking replication filtering settings..
Fri Aug 5 10:28:06 2016 - [info] Replication filtering check ok.
Fri Aug 5 10:28:06 2016 - [info] Master is down!
Fri Aug 5 10:28:06 2016 - [info] Terminating monitoring script.
Fri Aug 5 10:28:06 2016 - [info] Got exit code 20 (Master dead).
Fri Aug 5 10:28:06 2016 - [info] MHA::MasterFailover version 0.56.
Fri Aug 5 10:28:06 2016 - [info] Starting master failover.
Fri Aug 5 10:28:06 2016 - [info]
Fri Aug 5 10:28:06 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 5 10:28:06 2016 - [info]
Fri Aug 5 10:28:07 2016 - [info] GTID failover mode = 0
Fri Aug 5 10:28:07 2016 - [info] Dead Servers:
Fri Aug 5 10:28:07 2016 - [info] host_1(host_1:3306)
Fri Aug 5 10:28:07 2016 - [info] Checking master reachability via MySQL(double check)...
Fri Aug 5 10:28:08 2016 - [info] ok.
Fri Aug 5 10:28:08 2016 - [info] Alive Servers:
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306)
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306)
Fri Aug 5 10:28:08 2016 - [info] Alive Slaves:
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:08 2016 - [info] Starting Non-GTID based failover.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 5 10:28:08 2016 - [info] Executing master IP deactivation script:
Fri Aug 5 10:28:08 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stop
Fri Aug 5 10:28:08 2016 - [info] done.
Fri Aug 5 10:28:08 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 5 10:28:08 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000001:154
Fri Aug 5 10:28:08 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:08 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000001:154
Fri Aug 5 10:28:08 2016 - [info] Oldest slaves:
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Aug 5 10:28:08 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Aug 5 10:28:08 2016 - [info] Searching new master from slaves..
Fri Aug 5 10:28:08 2016 - [info] Candidate masters from the configuration file:
Fri Aug 5 10:28:08 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 10:28:08 2016 - [info] Non-candidate masters:
Fri Aug 5 10:28:08 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 10:28:08 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 10:28:08 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 10:28:08 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Aug 5 10:28:08 2016 - [info] New master is host_2(host_2:3306)
Fri Aug 5 10:28:08 2016 - [info] Starting master failover..
Fri Aug 5 10:28:08 2016 - [info]
From:
host_1(host_1:3306) (current master)
+--host_2(host_2:3306)
+--host_3(host_3:3306)
To:
host_2(host_2:3306) (new master)
+--host_3(host_3:3306)
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Aug 5 10:28:08 2016 - [info] Starting recovery on host_2(host_2:3306)..
Fri Aug 5 10:28:08 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Fri Aug 5 10:28:08 2016 - [info] done.
Fri Aug 5 10:28:08 2016 - [info] All relay logs were successfully applied.
Fri Aug 5 10:28:08 2016 - [info] Getting new master's binlog name and position..
Fri Aug 5 10:28:08 2016 - [info] host_2_name.000001:154
Fri Aug 5 10:28:08 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000001', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 5 10:28:08 2016 - [info] Executing master IP activate script:
Fri Aug 5 10:28:08 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Fri Aug 5 10:28:08 2016 - [info] OK.
Fri Aug 5 10:28:08 2016 - [info] ** Finished master recovery successfully.
Fri Aug 5 10:28:08 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 3964. Check tmp log /var/log/masterha/app1/host_3_3306_20160805102806.log if it takes time..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Log messages from host_3 ...
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug 5 10:28:08 2016 - [info] End of log messages from host_3.
Fri Aug 5 10:28:08 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Fri Aug 5 10:28:08 2016 - [info] Generating relay diff files from the latest slave succeeded.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 3966. Check tmp log /var/log/masterha/app1/host_3_3306_20160805102806.log if it takes time..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Log messages from host_3 ...
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Starting recovery on host_3(host_3:3306)..
Fri Aug 5 10:28:08 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Fri Aug 5 10:28:08 2016 - [info] done.
Fri Aug 5 10:28:08 2016 - [info] All relay logs were successfully applied.
Fri Aug 5 10:28:08 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Fri Aug 5 10:28:08 2016 - [info] Executed CHANGE MASTER.
Fri Aug 5 10:28:08 2016 - [info] Slave started.
Fri Aug 5 10:28:08 2016 - [info] End of log messages from host_3.
Fri Aug 5 10:28:08 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Fri Aug 5 10:28:08 2016 - [info] All new slave servers recovered successfully.
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] * Phase 5: New master cleanup phase..
Fri Aug 5 10:28:08 2016 - [info]
Fri Aug 5 10:28:08 2016 - [info] Resetting slave info on the new master..
Fri Aug 5 10:28:08 2016 - [info] host_2: Resetting slave info succeeded.
Fri Aug 5 10:28:08 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Fri Aug 5 10:28:08 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded
Master host_1(host_1:3306) is down!
Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
Manager <-- 不通 --> Master
Manager <-- 不通 --> S1 <-- 正常 --> master
Manager <-- 不通 --> S2 <-- 正常 --> master
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2 -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
s1> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s1> iptables -A INPUT -p tcp -s host_1 -j ACCEPT
s1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
s2> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s2> iptables -A INPUT -p tcp -s host_1 -j ACCEPT
s2> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
结论:因为master不通,其他slave也不通,那么没有存活的slave,也就不会failover了
Fri Aug 5 10:39:39 2016 - [info] Checking master_ip_failover_script status:
Fri Aug 5 10:39:39 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306
Fri Aug 5 10:39:39 2016 - [info] OK.
Fri Aug 5 10:39:39 2016 - [warning] shutdown_script is not defined.
Fri Aug 5 10:39:39 2016 - [info] Set master ping interval 3 seconds.
Fri Aug 5 10:39:39 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Aug 5 10:39:39 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Fri Aug 5 10:39:39 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 5 10:41:51 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:41:51 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:41:54 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:41:54 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:41:56 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 10:41:57 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:41:57 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:42:00 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:42:00 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:42:00 2016 - [warning] Master is not reachable from health checker!
Fri Aug 5 10:42:00 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug 5 10:42:00 2016 - [warning] SSH is NOT reachable.
Fri Aug 5 10:42:00 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 5 10:42:00 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 5 10:42:00 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug 5 10:42:00 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug 5 10:42:08 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Fri Aug 5 10:42:08 2016 - [warning] Got Error: at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 558
Fri Aug 5 10:42:08 2016 - [info] Got exit code 1 (Not master dead).
Manager <-- 不通 --> Master
Manager <-- 不通 --> S1 <-- 正常 --> master
Manager <-- 不通 --> S2 <-- 正常 --> master
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2 -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
s1> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s1> iptables -A INPUT -p tcp -s host_1 -j ACCEPT
s1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
s2> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s2> iptables -A INPUT -p tcp -s host_1 -j ACCEPT
s2> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
Monitoring server host_2 is NOT reachable!
只要monitor 服务器,有一台不通,那么就不会failover,MHA manager会认为这是网络故障
Fri Aug 5 10:47:41 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 5 10:55:50 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 10:55:50 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:55:50 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:55:53 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:55:53 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:55:55 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
ssh: connect to host host_2 port 22: Connection timed out
Monitoring server host_2 is NOT reachable!
Fri Aug 5 10:55:55 2016 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Fri Aug 5 10:55:56 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:55:56 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 10:55:59 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:55:59 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 10:55:59 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug 5 10:56:02 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:56:02 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 10:56:02 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3 --user=root --master_host=host_1 --master_ip=host_1 --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug 5 10:56:02 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 10:56:05 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 10:56:05 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 10:56:07 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
ssh: connect to host host_2 port 22: Connection timed out
Monitoring server host_2 is NOT reachable!
Fri Aug 5 10:56:07 2016 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Manager <-- 不通 --> Master
Manager <-- 不通 --> S1 <-- 不通 --> master
Manager <-- 不通 --> S2 <-- 不通 --> master
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
s1> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
s2> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s2> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
由于都不通,所以不会failover
Manager <-- 不通 --> Master
Manager <-- 不通 --> S1 <-- 不通 --> master
Manager <-- 不通 --> S2 <-- 不通 --> master
* 检查SSH
masterha_check_ssh --conf=/etc/app1.cnf
* 检查复制(如果是非GTID模式,slave如果没有设置relay_log_purge=0,那么会有警告)
masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Thu Jul 28 14:03:33 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
* 检查mha状态
masterha_check_status --conf=/etc/app1.cnf
* 开启mha 监控
nohup masterha_manager --conf=/etc/app1.cnf &
master> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
s1> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
s2> iptables -A INPUT -p tcp -s host_monitor -j ACCEPT
s2> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP
由于都不通,所以不会failover
如果没有足够的relay-log,在恢复的时候,会报错的
在生产差异relay log的时候发现没有足够的日志,所以就报错,不会failover
Fri Aug 5 11:09:51 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug 5 11:10:42 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug 5 11:10:42 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug 5 11:10:45 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 11:10:45 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 11:10:47 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 11:10:48 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 11:10:48 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 11:10:51 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 11:10:51 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 11:10:51 2016 - [warning] Master is not reachable from health checker!
Fri Aug 5 11:10:51 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug 5 11:10:51 2016 - [warning] SSH is NOT reachable.
Fri Aug 5 11:10:51 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 5 11:10:51 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 5 11:10:51 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug 5 11:10:51 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug 5 11:10:51 2016 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Fri Aug 5 11:10:51 2016 - [info] GTID failover mode = 0
Fri Aug 5 11:10:51 2016 - [info] Dead Servers:
Fri Aug 5 11:10:51 2016 - [info] host_1(host_1:3306)
Fri Aug 5 11:10:51 2016 - [info] Alive Servers:
Fri Aug 5 11:10:51 2016 - [info] host_2(host_2:3306)
Fri Aug 5 11:10:51 2016 - [info] host_3(host_3:3306)
Fri Aug 5 11:10:51 2016 - [info] Alive Slaves:
Fri Aug 5 11:10:51 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:51 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:51 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 11:10:51 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:51 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:51 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 11:10:51 2016 - [info] Checking slave configurations..
Fri Aug 5 11:10:51 2016 - [info] read_only=1 is not set on slave host_2(host_2:3306).
Fri Aug 5 11:10:51 2016 - [warning] relay_log_purge=0 is not set on slave host_2(host_2:3306).
Fri Aug 5 11:10:51 2016 - [info] read_only=1 is not set on slave host_3(host_3:3306).
Fri Aug 5 11:10:51 2016 - [warning] relay_log_purge=0 is not set on slave host_3(host_3:3306).
Fri Aug 5 11:10:51 2016 - [info] Checking replication filtering settings..
Fri Aug 5 11:10:51 2016 - [info] Replication filtering check ok.
Fri Aug 5 11:10:51 2016 - [info] Master is down!
Fri Aug 5 11:10:51 2016 - [info] Terminating monitoring script.
Fri Aug 5 11:10:51 2016 - [info] Got exit code 20 (Master dead).
Fri Aug 5 11:10:51 2016 - [info] MHA::MasterFailover version 0.56.
Fri Aug 5 11:10:51 2016 - [info] Starting master failover.
Fri Aug 5 11:10:51 2016 - [info]
Fri Aug 5 11:10:51 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 5 11:10:51 2016 - [info]
Fri Aug 5 11:10:51 2016 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Fri Aug 5 11:10:51 2016 - [info] GTID failover mode = 0
Fri Aug 5 11:10:51 2016 - [info] Dead Servers:
Fri Aug 5 11:10:51 2016 - [info] host_1(host_1:3306)
Fri Aug 5 11:10:51 2016 - [info] Checking master reachability via MySQL(double check)...
Fri Aug 5 11:10:52 2016 - [info] ok.
Fri Aug 5 11:10:52 2016 - [info] Alive Servers:
Fri Aug 5 11:10:52 2016 - [info] host_2(host_2:3306)
Fri Aug 5 11:10:52 2016 - [info] host_3(host_3:3306)
Fri Aug 5 11:10:52 2016 - [info] Alive Slaves:
Fri Aug 5 11:10:52 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:52 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:52 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 11:10:52 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:52 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:52 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 11:10:52 2016 - [info] Starting SQL thread on host_3(host_3:3306) ..
Fri Aug 5 11:10:52 2016 - [info] done.
Fri Aug 5 11:10:52 2016 - [info] Starting Non-GTID based failover.
Fri Aug 5 11:10:52 2016 - [info]
Fri Aug 5 11:10:52 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 5 11:10:52 2016 - [info]
Fri Aug 5 11:10:52 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 5 11:10:52 2016 - [info]
Fri Aug 5 11:10:52 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 5 11:10:52 2016 - [info] Executing master IP deactivation script:
Fri Aug 5 11:10:52 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stop
Fri Aug 5 11:10:53 2016 - [info] done.
Fri Aug 5 11:10:53 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 5 11:10:53 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000001:32925754
Fri Aug 5 11:10:53 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 5 11:10:53 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:53 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:53 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 11:10:53 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000001:2093512
Fri Aug 5 11:10:53 2016 - [info] Oldest slaves:
Fri Aug 5 11:10:53 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 11:10:53 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 11:10:53 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 5 11:10:53 2016 - [info]
Fri Aug 5 11:10:53 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Aug 5 11:10:53 2016 - [info] HealthCheck: SSH to host_2 is reachable.
Fri Aug 5 11:10:53 2016 - [info] Checking whether host_2 has relay logs from the oldest position..
Fri Aug 5 11:10:53 2016 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_1_name.000001 --latest_rmlp=32925754 --target_mlf=host_1_name.000001 --target_rmlp=2093512 --server_id=12616506 --workdir=/var/log/masterha/app1 --timestamp=20160805111051 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2_name-relay-bin.000008 :
Relay log found at /data/mysql_data, up to host_2_name-relay-bin.000008
Fast relay log position search failed. Reading relay logs to find..
Reading host_2_name-relay-bin.000008
Binlog Checksum enabled
Master Version is 5.7.13-log
Binlog Checksum enabled
host_2_name-relay-bin.000008 contains master host_1_name.000001 from position 26175199
Reading host_2_name-relay-bin.000007
Binlog Checksum enabled
host_2_name-relay-bin.000007 contains master host_1_name.000001 from position 12184106
Reading host_2_name-relay-bin.000006
No such file or directory:/data/mysql_data/host_2_name-relay-bin.000006 at /usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm line 102
Fri Aug 5 11:10:53 2016 - [warning] host_2 doesn't have all relay logs. Maybe some logs were purged.
Fri Aug 5 11:10:53 2016 - [warning] None of latest servers have enough relay logs from oldest position. We can't recover oldest slaves.
Fri Aug 5 11:10:53 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln947] None of the latest slaves has enough relay logs for recovery.
Fri Aug 5 11:10:53 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_manager line 65
Fri Aug 5 11:10:53 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover host_1(host_1:3306)
Master host_1(host_1:3306) is down!
Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
None of the latest slaves has enough relay logs for recovery.
Got Error so couldn't continue failover from here.
GTID模式下,不需要relay-log,因为是走新的协议。
但是必须打开 binlog,log_slave_updates
Fri Aug 5 13:32:25 2016 - [info] OK.
Fri Aug 5 13:32:25 2016 - [warning] shutdown_script is not defined.
Fri Aug 5 13:32:25 2016 - [info] Set master ping interval 3 seconds.
Fri Aug 5 13:32:25 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Aug 5 13:32:25 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Fri Aug 5 13:32:28 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 13:32:28 2016 - [warning] Connection failed 1 time(s)..
Fri Aug 5 13:32:28 2016 - [info] Executing SSH check script: exit 0
Fri Aug 5 13:32:31 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 13:32:31 2016 - [warning] Connection failed 2 time(s)..
Fri Aug 5 13:32:33 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug 5 13:32:34 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 13:32:34 2016 - [warning] Connection failed 3 time(s)..
Fri Aug 5 13:32:37 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug 5 13:32:37 2016 - [warning] Connection failed 4 time(s)..
Fri Aug 5 13:32:37 2016 - [warning] Master is not reachable from health checker!
Fri Aug 5 13:32:37 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug 5 13:32:37 2016 - [warning] SSH is NOT reachable.
Fri Aug 5 13:32:37 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug 5 13:32:37 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 5 13:32:37 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug 5 13:32:37 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug 5 13:32:37 2016 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Fri Aug 5 13:32:37 2016 - [info] GTID failover mode = 1
Fri Aug 5 13:32:37 2016 - [info] Dead Servers:
Fri Aug 5 13:32:37 2016 - [info] host_1(host_1:3306)
Fri Aug 5 13:32:37 2016 - [info] Alive Servers:
Fri Aug 5 13:32:37 2016 - [info] host_2(host_2:3306)
Fri Aug 5 13:32:37 2016 - [info] host_3(host_3:3306)
Fri Aug 5 13:32:37 2016 - [info] Alive Slaves:
Fri Aug 5 13:32:37 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:37 2016 - [info] GTID ON
Fri Aug 5 13:32:37 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:37 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 13:32:37 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:37 2016 - [info] GTID ON
Fri Aug 5 13:32:37 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:37 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 13:32:37 2016 - [info] Checking slave configurations..
Fri Aug 5 13:32:37 2016 - [info] Checking replication filtering settings..
Fri Aug 5 13:32:37 2016 - [info] Replication filtering check ok.
Fri Aug 5 13:32:37 2016 - [info] Master is down!
Fri Aug 5 13:32:37 2016 - [info] Terminating monitoring script.
Fri Aug 5 13:32:37 2016 - [info] Got exit code 20 (Master dead).
Fri Aug 5 13:32:37 2016 - [info] MHA::MasterFailover version 0.56.
Fri Aug 5 13:32:37 2016 - [info] Starting master failover.
Fri Aug 5 13:32:37 2016 - [info]
Fri Aug 5 13:32:37 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug 5 13:32:37 2016 - [info]
Fri Aug 5 13:32:37 2016 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Fri Aug 5 13:32:37 2016 - [info] GTID failover mode = 1
Fri Aug 5 13:32:37 2016 - [info] Dead Servers:
Fri Aug 5 13:32:37 2016 - [info] host_1(host_1:3306)
Fri Aug 5 13:32:37 2016 - [info] Checking master reachability via MySQL(double check)...
Fri Aug 5 13:32:38 2016 - [info] ok.
Fri Aug 5 13:32:38 2016 - [info] Alive Servers:
Fri Aug 5 13:32:38 2016 - [info] host_2(host_2:3306)
Fri Aug 5 13:32:38 2016 - [info] host_3(host_3:3306)
Fri Aug 5 13:32:38 2016 - [info] Alive Slaves:
Fri Aug 5 13:32:38 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 13:32:38 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 13:32:38 2016 - [info] Starting SQL thread on host_3(host_3:3306) ..
Fri Aug 5 13:32:38 2016 - [info] done.
Fri Aug 5 13:32:38 2016 - [info] Starting GTID based failover.
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug 5 13:32:38 2016 - [info] Executing master IP deactivation script:
Fri Aug 5 13:32:38 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stop
Fri Aug 5 13:32:38 2016 - [info] done.
Fri Aug 5 13:32:38 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug 5 13:32:38 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000002:56745425
Fri Aug 5 13:32:38 2016 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-210949
Fri Aug 5 13:32:38 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug 5 13:32:38 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 13:32:38 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000002:517969
Fri Aug 5 13:32:38 2016 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-1925
Fri Aug 5 13:32:38 2016 - [info] Oldest slaves:
Fri Aug 5 13:32:38 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] Searching new master from slaves..
Fri Aug 5 13:32:38 2016 - [info] Candidate masters from the configuration file:
Fri Aug 5 13:32:38 2016 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 5 13:32:38 2016 - [info] Non-candidate masters:
Fri Aug 5 13:32:38 2016 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug 5 13:32:38 2016 - [info] GTID ON
Fri Aug 5 13:32:38 2016 - [info] Replicating from host_1(host_1:3306)
Fri Aug 5 13:32:38 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 5 13:32:38 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Aug 5 13:32:38 2016 - [info] New master is host_2(host_2:3306)
Fri Aug 5 13:32:38 2016 - [info] Starting master failover..
Fri Aug 5 13:32:38 2016 - [info]
From:
host_1(host_1:3306) (current master)
+--host_2(host_2:3306)
+--host_3(host_3:3306)
To:
host_2(host_2:3306) (new master)
+--host_3(host_3:3306)
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] Waiting all logs to be applied..
Fri Aug 5 13:32:38 2016 - [info] done.
Fri Aug 5 13:32:38 2016 - [info] Getting new master's binlog name and position..
Fri Aug 5 13:32:38 2016 - [info] host_2_name.000008:10198806
Fri Aug 5 13:32:38 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug 5 13:32:38 2016 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_2_name.000008, 10198806, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-210949
Fri Aug 5 13:32:38 2016 - [info] Executing master IP activate script:
Fri Aug 5 13:32:38 2016 - [info] /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Fri Aug 5 13:32:38 2016 - [info] OK.
Fri Aug 5 13:32:38 2016 - [info] Setting read_only=0 on host_2(host_2:3306)..
Fri Aug 5 13:32:38 2016 - [info] ok.
Fri Aug 5 13:32:38 2016 - [info] ** Finished master recovery successfully.
Fri Aug 5 13:32:38 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Aug 5 13:32:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 22029. Check tmp log /var/log/masterha/app1/host_3_3306_20160805133237.log if it takes time..
Fri Aug 5 13:33:38 2016 - [info]
Fri Aug 5 13:33:38 2016 - [info] Log messages from host_3 ...
Fri Aug 5 13:33:38 2016 - [info]
Fri Aug 5 13:32:38 2016 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Fri Aug 5 13:32:38 2016 - [info] Executed CHANGE MASTER.
Fri Aug 5 13:32:39 2016 - [info] Slave started.
Fri Aug 5 13:33:38 2016 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-210949) completed on host_3(host_3:3306). Executed 206170 events.
Fri Aug 5 13:33:38 2016 - [info] End of log messages from host_3.
Fri Aug 5 13:33:38 2016 - [info] -- Slave on host host_3(host_3:3306) started.
Fri Aug 5 13:33:38 2016 - [info] All new slave servers recovered successfully.
Fri Aug 5 13:33:38 2016 - [info]
Fri Aug 5 13:33:38 2016 - [info] * Phase 5: New master cleanup phase..
Fri Aug 5 13:33:38 2016 - [info]
Fri Aug 5 13:33:38 2016 - [info] Resetting slave info on the new master..
Fri Aug 5 13:33:38 2016 - [info] host_2: Resetting slave info succeeded.
Fri Aug 5 13:33:38 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Fri Aug 5 13:33:38 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded
Master host_1(host_1:3306) is down!
Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): OK: Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.
* 如果master挂了,而且没有HA,那怎么做日志补偿呢?
1. 其实很简单,只要之前在slave都设置了relay-log-purge=0就可以
2. 然后自己在按照正常流程部署好manager和node
3. 操作 手工-自动 failover 就可以了