MySQL MHA–故障切换模式(GTID模式和非GTID模式)
MHA with binlog server
这两篇文章结论就是
MHA has changed failover procedure when turn on GTID.
另外基于GTID的MHA failover不需要relay log, binlog就可以 relay_log_purge可以设置为1
https://yq.aliyun.com/articles/58920/#
这个需要实际测试一下, 不过按照上面的第二条来看, 只要chang就行了, slave开了binlog和log_slave_updates就行, 这一般没有人会不开的. 这样看的话线上没必要设置relay_log_purge=0了
如果是基于GTID的 failover ,不会从原master去拉差异binlog, 如果没配置binlog server就可能丢数据
看代码也确实是这样
另外看manager启动日志也发现
n Mar 1 21:21:01 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
不检查ssh了
代码中也可以看到如果开了GTID就不查了
我实际测试也确实没有走3.2这一步save binlog
Fri Feb 28 20:55:02 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Feb 28 20:55:02 2020 - [info]
Fri Feb 28 20:55:02 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Feb 28 20:55:02 2020 - [info]
Fri Feb 28 20:55:02 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000004:234
Fri Feb 28 20:55:02 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Feb 28 20:55:02 2020 - [info] 192.168.98.10(192.168.98.10:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 20:55:02 2020 - [info] GTID ON
Fri Feb 28 20:55:02 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 20:55:02 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Feb 28 20:55:02 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 20:55:02 2020 - [info] GTID ON
Fri Feb 28 20:55:02 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 20:55:02 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000004:234
Fri Feb 28 20:55:02 2020 - [info] Oldest slaves:
Fri Feb 28 20:55:02 2020 - [info] 192.168.98.10(192.168.98.10:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 20:55:02 2020 - [info] GTID ON
Fri Feb 28 20:55:02 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 20:55:02 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Feb 28 20:55:02 2020 - [info] 192.168.98.12(192.168.98.12:3306) Version=5.7.29-32-log (oldest major version between slaves) log-bin:enabled
Fri Feb 28 20:55:02 2020 - [info] GTID ON
Fri Feb 28 20:55:02 2020 - [info] Replicating from 192.168.98.11(192.168.98.11:3306)
Fri Feb 28 20:55:02 2020 - [info]
Fri Feb 28 20:55:02 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Feb 28 20:55:02 2020 - [info]
Fri Feb 28 20:55:02 2020 - [info] Searching new master from slaves..
Fri Feb 28 20:55:02 2020 - [info] Candidate masters from the configuration file:
这有个文章也说了这个问题mha 0.58 gtid模式下 丢数据问题修复
那么要么按照上面的文章改代码
要么就按照MHA with binlog server文章中说的配置binlog server.
MHA check binlog server 1 and find error with binlog then try to recover from binlog server 2.
With this feature we can add master server as first binlog server to guarantee no binlog loss.If master server crash and can not been reached,binlog server 2 will also provide service.
将原主库配置成binlog server 1, 然后再拍一个binlog server. 因为binlog server也不是半同步的, 也可能缺binlog
[server1]
hostname=192.168.98.10
candidate_master=1
[server2]
hostname=192.168.98.11
candidate_master=1
[server3]
hostname=192.168.98.12
# no_master=1
[binlog1]
no_master=1
hostname=192.168.98.10
master_binlog_dir=/data/mysql_3306/data/
按此配置启动manager后可以看到日志
Sun Mar 1 21:38:16 2020 - [info] Binlog server 192.168.98.10 is reachable.
Sun Mar 1 21:38:16 2020 - [info] Checking recovery script configurations on 192.168.98.10(192.168.98.10:3306)..
Sun Mar 1 21:38:16 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_3306/data/ --output_file=/masterha//save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000014
Sun Mar 1 21:38:16 2020 - [info] Connecting to [email protected](192.168.98.10:22)..
Creating /masterha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql_3306/data/, up to mysql-bin.000014
Sun Mar 1 21:38:17 2020 - [info] Binlog setting check done.
尝试failover, 可以看到日志中多了
Sun Mar 1 21:43:37 2020 - [info] -- Saving binlog from host 192.168.98.10 started, pid: 98503
Sun Mar 1 21:43:39 2020 - [info]
Sun Mar 1 21:43:39 2020 - [info] Log messages from 192.168.98.10 ...
Sun Mar 1 21:43:39 2020 - [info]
Sun Mar 1 21:43:37 2020 - [info] Fetching binary logs from binlog server 192.168.98.10..
Sun Mar 1 21:43:37 2020 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin.000014 --start_pos=234 --output_file=/masterha//saved_binlog_binlog1_20200301214334.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.58 --oldest_version=5.7.29-32-log --binlog_dir=/data/mysql_3306/data/
Creating /masterha if not exists.. ok.
Concat binary/relay logs from mysql-bin.000014 pos 234 to mysql-bin.000014 EOF into /masterha//saved_binlog_binlog1_20200301214334.binlog ..
Concat succeeded.
Sun Mar 1 21:43:39 2020 - [info] scp from [email protected]:/masterha//saved_binlog_binlog1_20200301214334.binlog to local:/masterha/cls_all//saved_binlog_192.168.98.10_binlog1_20200301214334.binlog succeeded.
Sun Mar 1 21:43:39 2020 - [info] End of log messages from 192.168.98.10.
Sun Mar 1 21:43:39 2020 - [info] Saved mysqlbinlog size from 192.168.98.10 is 820 bytes.
没配置binlog server的日志
Fri Feb 28 20:55:02 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Feb 28 20:55:02 2020 - [info]
Fri Feb 28 20:55:02 2020 - [info] Waiting all logs to be applied..
Fri Feb 28 20:55:02 2020 - [info] done.
Fri Feb 28 20:55:02 2020 - [info] Getting new master's binlog name and position..
Fri Feb 28 20:55:02 2020 - [info] mysql-bin.000010:234
Fri Feb 28 20:55:02 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.98.10', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Fri Feb 28 20:55:02 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000010, 234, 3a60f8c7-592c-11ea-8cb1-000c2973aaf0:1-6,
68703597-592c-11ea-88b3-000c2998280b:1-4
Fri Feb 28 20:55:02 2020 - [info] Executing master IP activate script:
配置了binlog server的日志
Sun Mar 1 21:43:37 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Sun Mar 1 21:43:37 2020 - [info]
Sun Mar 1 21:43:37 2020 - [info] Waiting all logs to be applied..
Sun Mar 1 21:43:37 2020 - [info] done.
Sun Mar 1 21:43:37 2020 - [info] -- Saving binlog from host 192.168.98.10 started, pid: 98503
Sun Mar 1 21:43:39 2020 - [info]
Sun Mar 1 21:43:39 2020 - [info] Log messages from 192.168.98.10 ...
Sun Mar 1 21:43:39 2020 - [info]
Sun Mar 1 21:43:37 2020 - [info] Fetching binary logs from binlog server 192.168.98.10..
Sun Mar 1 21:43:37 2020 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin.000014 --start_pos=234 --output_file=/masterha//saved_binlog_binlog1_20200301214334.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.58 --oldest_version=5.7.29-32-log --binlog_dir=/data/mysql_3306/data/
Creating /masterha if not exists.. ok.
Concat binary/relay logs from mysql-bin.000014 pos 234 to mysql-bin.000014 EOF into /masterha//saved_binlog_binlog1_20200301214334.binlog ..
Concat succeeded.
Sun Mar 1 21:43:39 2020 - [info] scp from [email protected]:/masterha//saved_binlog_binlog1_20200301214334.binlog to local:/masterha/cls_all//saved_binlog_192.168.98.10_binlog1_20200301214334.binlog succeeded.
Sun Mar 1 21:43:39 2020 - [info] End of log messages from 192.168.98.10.
Sun Mar 1 21:43:39 2020 - [info] Saved mysqlbinlog size from 192.168.98.10 is 820 bytes.
Sun Mar 1 21:43:39 2020 - [info] Checking if super_read_only is defined and turned on..
Sun Mar 1 21:43:39 2020 - [info] not present or turned off, ignoring.
Sun Mar 1 21:43:39 2020 - [info] Applying differential binlog /masterha/cls_all//saved_binlog_192.168.98.10_binlog1_20200301214334.binlog ..
Sun Mar 1 21:43:39 2020 - [info] Differential log apply from binlog server succeeded.
Sun Mar 1 21:43:39 2020 - [info] Getting new master's binlog name and position..
Sun Mar 1 21:43:39 2020 - [info] mysql-bin.000012:234
Sun Mar 1 21:43:39 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.98.11', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repler', MASTER_PASSWORD='xxx';
Sun Mar 1 21:43:39 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000012, 234, 3a60f8c7-592c-11ea-8cb1-000c2973aaf0:1-7,
68703597-592c-11ea-88b3-000c2998280b:1-74
Sun Mar 1 21:43:39 2020 - [info] Executing master IP activate script:
当然如果你说你是MHA加半同步, 那么这些你可能都不用关心了