MySQL-HA: MHA (Master High Availability Manager) Setup Guide


MHA recovery process:
[Figure 1: MHA recovery process]

MHA architecture:

[Figure 2: MHA architecture diagram]



The test environment consists of 3 servers.

One of the slaves also serves as the MHA manager node.

--------------------------------------
IP             hostname      MySQL role
--------------------------------------
10.1.10.244    data01        master
10.1.10.80     data02        slave
10.1.10.91     manager       slave
--------------------------------------

Host details:

OS distribution: CentOS 6.7
Kernel: 2.6.32-220.el6.x86_64
Database version: MySQL Community 5.6.29




------------------------------------- Change the hostnames -------------------------------------

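Run the following on each of the three servers:
244 (master):
  # vi /etc/sysconfig/network
Set HOSTNAME to data01

80 (slave):
  # vi /etc/sysconfig/network
Set HOSTNAME to data02

91 (slave):
  # vi /etc/sysconfig/network
Set HOSTNAME to manager

On CentOS 6 this file is read at boot, so the new name takes effect after a reboot; running hostname <name> (e.g. hostname data01) applies it to the running system immediately.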


Also edit /etc/hosts on all three servers:
  # vi /etc/hosts
Add:
10.1.10.244 data01
10.1.10.80 data02
10.1.10.91 manager
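If you prefer to script the /etc/hosts change (for example when repeating this setup), here is a minimal sketch. `add_host_entry` is a hypothetical helper, not part of MHA or CentOS; it only appends a line when that IP/hostname pair is missing, so running it twice does no harm.

```bash
#!/usr/bin/env bash
# add_host_entry IP NAME [FILE]  (hypothetical helper)
# Append "IP NAME" to FILE (default /etc/hosts) unless that IP already maps to NAME.
add_host_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  # awk exits 0 only if some line has IP in field 1 and NAME in a later field.
  if awk -v ip="$ip" -v name="$name" '
       $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
       END { exit !found }' "$file" 2>/dev/null; then
    return 0   # entry already present, nothing to do
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Run it as root on each server: `add_host_entry 10.1.10.244 data01`, `add_host_entry 10.1.10.80 data02`, `add_host_entry 10.1.10.91 manager`. Passing a scratch file as the third argument lets you try it out first.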



------------------------------------- Set up SSH mutual trust (passwordless root login) ------------------------------------- 

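Run the following on each of the three servers:
244 (master), 80 (slave), 91 (slave):
  # ssh-keygen -t rsa
  # cd /root/.ssh
Accept all defaults by pressing Enter at every prompt. ssh-keygen creates /root/.ssh if it does not exist yet, so run it before the cd.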


Then on 244 (master):
  # cd /root/.ssh
  # cat id_rsa.pub >> authorized_keys
  # scp root@10.1.10.80:/root/.ssh/id_rsa.pub ./id_rsa.pub.data02.80
  # scp root@10.1.10.91:/root/.ssh/id_rsa.pub ./id_rsa.pub.manager.91
  # cat id_rsa.pub.data02.80 >> authorized_keys
  # cat id_rsa.pub.manager.91 >> authorized_keys
  # scp authorized_keys root@10.1.10.80:/root/.ssh/authorized_keys
  # scp authorized_keys root@10.1.10.91:/root/.ssh/authorized_keys
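The `cat ... >> authorized_keys` steps append blindly, so repeating them leaves duplicate keys behind. Below is a sketch of a repeat-safe merge, assuming the file names used above; `merge_authorized_keys` is a hypothetical helper, not a standard tool.

```bash
#!/usr/bin/env bash
# merge_authorized_keys TARGET KEYFILE...  (hypothetical helper)
# Merge public-key files into TARGET, dropping duplicate lines.
merge_authorized_keys() {
  local target="$1"; shift
  local tmp
  # Create the temp file next to TARGET: same filesystem, and it is created with
  # that directory's SELinux context (mv keeps a file's context).
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  # Existing keys (if any) plus the new ones; sort -u drops exact duplicates.
  cat "$target" "$@" 2>/dev/null | sort -u > "$tmp"
  # sshd (StrictModes) refuses key files writable by group or others.
  chmod 600 "$tmp"
  mv -f "$tmp" "$target"
}
```

On 244 this would be `merge_authorized_keys /root/.ssh/authorized_keys id_rsa.pub id_rsa.pub.data02.80 id_rsa.pub.manager.91`, followed by the same two scp commands to copy the result to 80 and 91.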


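Starting from 244 (master), verify each pair in both directions. The first connection to a host asks you to confirm its host key; answer yes, so that later non-interactive logins (which MHA relies on) do not stop at that prompt.
[root@data01 ~]# ssh data02
...(output omitted)...
[root@data02 ~]# ssh manager
...(output omitted)...
[root@manager ~]# ssh data01
...(output omitted)...
[root@data01 ~]# ssh manager
...(output omitted)...
[root@manager ~]# ssh data02
...(output omitted)...
[root@data02 ~]# ssh data01
...(output omitted)...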
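Checking all six directions by hand is tedious. The sketch below enumerates every ordered host pair and tests each hop without prompts. `ssh_pairs` and `check_all_pairs` are hypothetical helpers; they assume the hostnames from /etc/hosts above and that every host key has already been accepted once interactively, because BatchMode also refuses the first-connection host-key prompt.

```bash
#!/usr/bin/env bash
# ssh_pairs HOST...  -> prints "src dst" for every ordered pair of distinct hosts.
ssh_pairs() {
  local src dst
  for src in "$@"; do
    for dst in "$@"; do
      if [ "$src" != "$dst" ]; then
        printf '%s %s\n' "$src" "$dst"
      fi
    done
  done
}

# check_all_pairs HOST...  -> logs in to src, then from src to dst, without prompts.
# BatchMode=yes makes ssh fail instead of asking for a password, so a missing
# key shows up as FAIL rather than a hang.
check_all_pairs() {
  local src dst
  ssh_pairs "$@" | while read -r src dst; do
    # -n keeps ssh from reading the loop's stdin (the remaining pairs).
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$src" \
         "ssh -o BatchMode=yes -o ConnectTimeout=5 $dst true"; then
      echo "ok   $src -> $dst"
    else
      echo "FAIL $src -> $dst"
    fi
  done
}
```

Run `check_all_pairs data01 data02 manager` from a host that can already log in to all three (data01 itself included, since its own key is in authorized_keys).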



------------------------------------- Set up master/slave replication ------------------------------------- 

Run the following in MySQL on each of the three servers. Every server needs the replication account, because any slave may be promoted to master:

244 (master), 80 (slave), 91 (slave):

  mysql> GRANT super, replication slave, reload ON *.* TO repl_user@'10.1.10.%' IDENTIFIED BY 'repl_user';
  Query OK, 0 rows affected (0.00 sec)

  mysql> FLUSH PRIVILEGES;
  Query OK, 0 rows affected (0.00 sec)


Edit the MySQL configuration file (my.cnf) on each server:
244 (master):
  [mysqld]
  log-bin = /var/lib/mysql/data01-bin
  log-bin-index = /var/lib/mysql/data01-bin.index
  server-id = 244
  relay-log = /var/lib/mysql/data01-relay-bin
  relay-log-index = /var/lib/mysql/data01-relay-bin.index
  log-slave-updates

80 (slave):
  [mysqld]
  log-bin = /var/lib/mysql/data02-bin
  log-bin-index = /var/lib/mysql/data02-bin.index
  server-id = 80
  relay-log = /var/lib/mysql/data02-relay-bin
  relay-log-index = /var/lib/mysql/data02-relay-bin.index
  log-slave-updates

91 (slave):
  [mysqld]
  log-bin = /var/lib/mysql/manager-bin
  log-bin-index = /var/lib/mysql/manager-bin.index
  server-id = 91
  relay-log = /var/lib/mysql/manager-relay-bin
  relay-log-index = /var/lib/mysql/manager-relay-bin.index
  log-slave-updates

Restart mysqld on each server afterwards so the new settings take effect.
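Since the three blocks differ only in the file prefix and the server-id, they can be generated. A sketch; `mycnf_repl_block` is a hypothetical helper, and the default directory /var/lib/mysql is taken from this guide. Merge its output into the existing [mysqld] section rather than adding a second one.

```bash
#!/usr/bin/env bash
# mycnf_repl_block NAME SERVER_ID [DIR]  (hypothetical helper)
# Print the [mysqld] replication settings used in this guide for one host.
# NAME is the log-file prefix (the hostname), SERVER_ID the last IP octet here.
mycnf_repl_block() {
  local name="$1" id="$2" dir="${3:-/var/lib/mysql}"
  cat <<EOF
[mysqld]
log-bin = ${dir}/${name}-bin
log-bin-index = ${dir}/${name}-bin.index
server-id = ${id}
relay-log = ${dir}/${name}-relay-bin
relay-log-index = ${dir}/${name}-relay-bin.index
log-slave-updates
EOF
}
```

For example, `mycnf_repl_block data01 244`, `mycnf_repl_block data02 80` and `mycnf_repl_block manager 91` print the three blocks shown above.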


On the two slaves, run:
80 (slave), 91 (slave):
  mysql> CHANGE MASTER TO
      -> MASTER_HOST = '10.1.10.244',
      -> MASTER_PORT = 3306,
      -> MASTER_USER = 'repl_user',
      -> MASTER_PASSWORD = 'repl_user';
  Query OK, 0 rows affected, 2 warnings (0.03 sec)

  mysql> START SLAVE;
  Query OK, 0 rows affected (0.00 sec)


Afterwards, check whether replication is running correctly:
  mysql> SHOW SLAVE STATUS\G

If both the IO and SQL threads show Yes, replication is set up:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

You can also check with SHOW PROCESSLIST.
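To check this from a script instead of reading the output by eye, the \G output can be filtered for the two thread states. A minimal sketch; `slave_threads_ok` is a hypothetical helper that reads the status text on standard input.

```bash
#!/usr/bin/env bash
# slave_threads_ok  (hypothetical helper)
# Read "SHOW SLAVE STATUS\G" output on stdin; succeed only if both
# replication threads report Yes.
slave_threads_ok() {
  awk -F': ' '
    $1 ~ /Slave_IO_Running$/  { io  = $2 }   # anchored, so *_State lines do not match
    $1 ~ /Slave_SQL_Running$/ { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}
```

Usage on a slave: `mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | slave_threads_ok && echo "replication OK"`.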



------------------------------------- Install mha-manager and mha-node ------------------------------------- 

Packages needed:
mha4mysql-node-0.56.tar.gz
mha4mysql-manager-0.56.tar.gz
Both tarballs can be found by searching for their names online.


Run the following on each of the three servers:
244 (data01), 80 (data02), 91 (manager):

  # yum install -y perl-DBD-MySQL


Then install the mha-node tools on all three servers:
mha-node installation:
  # tar zxvf mha4mysql-node-0.56.tar.gz
  # cd mha4mysql-node-0.56
  # perl Makefile.PL
  # make && make install
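After make install it is worth confirming on every server that the node scripts are on PATH, since the manager runs them over SSH (save_binary_logs appears in the failover log later in this guide). A sketch, assuming the node package installs save_binary_logs, apply_diff_relay_logs, filter_mysqlbinlog and purge_relay_logs; `require_commands` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# require_commands CMD...  (hypothetical helper)
# Print each command that is not on PATH; return non-zero if any are missing.
require_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_commands save_binary_logs apply_diff_relay_logs filter_mysqlbinlog purge_relay_logs && echo "mha-node tools found"`.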


Install the mha-manager tools on the management server:
91 (manager):
First install some dependencies and Perl modules on the manager server:
  # yum install -y perl-Config-Tiny

The following three may fail to install via yum:
  # yum install -y perl-Log-Dispatch
  # yum install -y perl-Parallel-ForkManager
  # yum install -y perl-Config-IniFiles

In that case install them through Perl's CPAN; if CPAN itself is missing, install it with yum first:
  # yum install -y perl-CPAN


Open the CPAN shell and install the Perl modules:
  # perl -MCPAN -e shell
  cpan[1]> install Log::Dispatch
  cpan[1]> install Parallel::ForkManager
  cpan[1]> install Config::IniFiles

If an installation fails, read the error messages and install the missing dependency modules first. The ones I have needed so far (run inside the CPAN shell):

  install Test::Requires
  install Module::Runtime
  install Dist::CheckConflicts
  install Params::Validate
  install Module::Implementation
  install ExtUtils::MakeMaker
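Before running perl Makefile.PL you can check the modules directly: `perl -MModule::Name -e 1` exits non-zero when a module cannot be loaded. `check_perl_modules` is a hypothetical wrapper around that check.

```bash
#!/usr/bin/env bash
# check_perl_modules MODULE...  (hypothetical helper)
# Try to load each Perl module; print the ones that fail and return non-zero.
check_perl_modules() {
  local mod missing=0
  for mod in "$@"; do
    if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
      echo "missing: $mod"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage on the manager: `check_perl_modules DBI DBD::mysql Config::Tiny Log::Dispatch Parallel::ForkManager Config::IniFiles`.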


mha-manager installation:
  # tar zxvf mha4mysql-manager-0.56.tar.gz
  # cd mha4mysql-manager-0.56
  # perl Makefile.PL

The output of perl Makefile.PL shows whether the modules above were installed successfully.
It looks like this:

…………
[Core Features]
- DBI                   ...loaded. (1.609)
- DBD::mysql            ...loaded. (4.013)
- Time::HiRes           ...loaded. (1.9728)
- Config::Tiny          ...loaded. (2.12)
- Log::Dispatch         ...loaded. (2.54)
- Parallel::ForkManager ...loaded. (1.17)
- MHA::NodeConst        ...loaded. (0.56)
…………

A module marked "missing" is not installed.
If everything is loaded, build and install:
  # make && make install



------------------------------------- Add the MHA configuration file -------------------------------------

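Configure on the management server:
91 (manager):
  # mkdir /var/log/mha
  # vi /etc/mha-manager.cnf

This test manages a single replication group. To manage several groups from one manager, split the manager's configuration into separate files:
create /etc/masterha_default.cnf containing the [server default] section;
create /etc/node1.cnf containing that group's server sections, such as [server_1], [server_2], and so on;
then pass both with --global_conf=/etc/masterha_default.cnf --conf=/etc/node1.cnf.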


-------- Configuration file --------
[server default]
manager_workdir = /var/log/mha/
manager_log = /var/log/mha/manager.log
ssh_user = root
repl_user = repl_user
repl_password = repl_user

[server_1]
hostname = 10.1.10.244
user = root
password = root
candidate_master = 1
master_binlog_dir = /var/lib/mysql

[server_2]
hostname = 10.1.10.80
user = root
password = root
#candidate_master = 1
master_binlog_dir = /var/lib/mysql

[server_3]
hostname = 10.1.10.91
user = root
password = root
#candidate_master = 1
master_binlog_dir = /var/lib/mysql

All other parameters keep their default values; see the references at the end of this article for the full list.
-------- Configuration file --------
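The configuration above can also be generated from the list of server IPs. A sketch that reproduces this guide's layout; `gen_mha_cnf` is hypothetical, the first IP gets candidate_master = 1 as in the file above, and the root/root and repl_user credentials are this test setup's values, so change them for real use.

```bash
#!/usr/bin/env bash
# gen_mha_cnf IP...  (hypothetical helper)
# Print an MHA manager config in the layout used in this guide.
gen_mha_cnf() {
  local i=1 ip
  cat <<'EOF'
[server default]
manager_workdir = /var/log/mha/
manager_log = /var/log/mha/manager.log
ssh_user = root
repl_user = repl_user
repl_password = repl_user
EOF
  for ip in "$@"; do
    printf '\n[server_%d]\n' "$i"
    printf 'hostname = %s\n' "$ip"
    printf 'user = root\npassword = root\n'   # test-setup credentials
    if [ "$i" -eq 1 ]; then
      printf 'candidate_master = 1\n'
    else
      printf '#candidate_master = 1\n'
    fi
    printf 'master_binlog_dir = /var/lib/mysql\n'
    i=$((i + 1))
  done
}
```

Usage: `gen_mha_cnf 10.1.10.244 10.1.10.80 10.1.10.91 > /etc/mha-manager.cnf`.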



Test SSH connectivity:
  # masterha_check_ssh --conf=/etc/mha-manager.cnf

 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
 - [info] Reading application default configuration from /etc/mha-manager.cnf..
 - [info] Reading server configuration from /etc/mha-manager.cnf..
 - [info] Starting SSH connection tests..
 - [debug] 
 - [debug]  Connecting via SSH from root@10.1.10.244(10.1.10.244:22) to root@10.1.10.80(10.1.10.80:22)..
 - [debug]   ok.
 - [debug]  Connecting via SSH from root@10.1.10.244(10.1.10.244:22) to root@10.1.10.91(10.1.10.91:22)..
 - [debug]   ok.
 - [debug] 
 - [debug]  Connecting via SSH from root@10.1.10.80(10.1.10.80:22) to root@10.1.10.244(10.1.10.244:22)..
 - [debug]   ok.
 - [debug]  Connecting via SSH from root@10.1.10.80(10.1.10.80:22) to root@10.1.10.91(10.1.10.91:22)..
 - [debug]   ok.
 - [debug] 
 - [debug]  Connecting via SSH from root@10.1.10.91(10.1.10.91:22) to root@10.1.10.244(10.1.10.244:22)..
Warning: Permanently added '10.1.10.91' (RSA) to the list of known hosts.
 - [debug]   ok.
 - [debug]  Connecting via SSH from root@10.1.10.91(10.1.10.91:22) to root@10.1.10.80(10.1.10.80:22)..
 - [debug]   ok.
 - [info] All SSH connection tests passed successfully.


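Next, check replication:
  # masterha_check_repl --conf=/etc/mha-manager.cnf

If the output contains "MySQL Replication Health is OK.", the check passed.

If it fails, check the firewall, the privileges of the MySQL accounts on the servers involved, and the OS user's read/write permissions on the relevant files.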
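Both check commands print a success line ("All SSH connection tests passed successfully." and "MySQL Replication Health is OK."), so they can be wrapped for use in scripts. A sketch; `run_and_expect` is a hypothetical helper that runs a command and succeeds only if the expected text appears in its output, printing the last 20 lines to stderr otherwise to help with troubleshooting.

```bash
#!/usr/bin/env bash
# run_and_expect PATTERN CMD [ARGS...]  (hypothetical helper)
# Run CMD, capture stdout+stderr, succeed only if PATTERN (fixed string) appears.
run_and_expect() {
  local pattern="$1"; shift
  local out
  out=$("$@" 2>&1)
  if printf '%s\n' "$out" | grep -qF -- "$pattern"; then
    return 0
  fi
  # Show the tail of the output for troubleshooting.
  printf '%s\n' "$out" | tail -n 20 >&2
  return 1
}
```

Usage: `run_and_expect 'All SSH connection tests passed successfully' masterha_check_ssh --conf=/etc/mha-manager.cnf && run_and_expect 'MySQL Replication Health is OK' masterha_check_repl --conf=/etc/mha-manager.cnf`.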


------------------------------------- Managing the manager -------------------------------------

Next, start the manager on the management server:
91 (manager):
Start the manager:
  # nohup masterha_manager --conf=/etc/mha-manager.cnf < /dev/null > /var/log/mha/manager.log 2>&1 &

Check the process:
  # ps -ef | grep manager
root     18210 10044  0 18:11 pts/1    00:00:00 perl /usr/local/bin/masterha_manager --conf=/etc/mha-manager.cnf


Check the last lines of the log:
  # tail -20 /var/log/mha/manager.log
……………………
Thu Feb 18 18:11:37 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..


Check the master status:
  # masterha_check_status --conf=/etc/mha-manager.cnf
mha-manager (pid:18210) is running(0:PING_OK), master:10.1.10.244


Stop the manager:
  # masterha_stop --conf=/etc/mha-manager.cnf
Stopped mha-manager successfully.
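For monitoring, the status line can be parsed. A sketch based only on the sample line shown above (other states may print differently); `status_master` and `status_ping_ok` are hypothetical helpers that read masterha_check_status output on stdin.

```bash
#!/usr/bin/env bash
# status_master  (hypothetical helper)
# Extract the master address from a line such as
# "mha-manager (pid:18210) is running(0:PING_OK), master:10.1.10.244".
status_master() {
  sed -n 's/.*master:\([0-9.]*\).*/\1/p'
}

# status_ping_ok  (hypothetical helper)
# Succeed only if the status output reports PING_OK.
status_ping_ok() {
  grep -q 'PING_OK'
}
```

Usage: `masterha_check_status --conf=/etc/mha-manager.cnf | status_master` prints the current master, e.g. 10.1.10.244 before the failover test below.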


------------------------------------- Failover test -------------------------------------

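This test simulates the entire master server going down.
Procedure: with the manager running (start it again if you stopped it above), reboot the master server directly (here data01, i.e. 244).

Follow /var/log/mha/manager.log with tail -f; entries like the following appear: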
Thu Feb 18 18:23:37 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu Feb 18 18:23:37 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=data01-bin
Thu Feb 18 18:23:37 2016 - [warning] HealthCheck: SSH to 10.1.10.244 is NOT reachable.
Thu Feb 18 18:23:43 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.1.10.244' (4))
Thu Feb 18 18:23:43 2016 - [warning] Connection failed 2 time(s)..
Thu Feb 18 18:23:46 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.1.10.244' (4))
Thu Feb 18 18:23:46 2016 - [warning] Connection failed 3 time(s)..
Thu Feb 18 18:23:49 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.1.10.244' (4))
Thu Feb 18 18:23:49 2016 - [warning] Connection failed 4 time(s)..
Thu Feb 18 18:23:49 2016 - [warning] Master is not reachable from health checker!
Thu Feb 18 18:23:49 2016 - [warning] Master 10.1.10.244(10.1.10.244:3306) is not reachable!
Thu Feb 18 18:23:49 2016 - [warning] SSH is NOT reachable.
Thu Feb 18 18:23:49 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha-manager.cnf again, and trying to connect to all servers to check server status..
Thu Feb 18 18:23:49 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Feb 18 18:23:49 2016 - [info] Reading application default configuration from /etc/mha-manager.cnf..
Thu Feb 18 18:23:49 2016 - [info] Reading server configuration from /etc/mha-manager.cnf..
Thu Feb 18 18:23:50 2016 - [info] GTID failover mode = 0
Thu Feb 18 18:23:50 2016 - [info] Dead Servers:
Thu Feb 18 18:23:50 2016 - [info] 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:50 2016 - [info] Alive Servers:
Thu Feb 18 18:23:50 2016 - [info] 10.1.10.80(10.1.10.80:3306)
Thu Feb 18 18:23:50 2016 - [info] 10.1.10.91(10.1.10.91:3306)
Thu Feb 18 18:23:50 2016 - [info] Alive Slaves:
Thu Feb 18 18:23:50 2016 - [info] 10.1.10.80(10.1.10.80:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Thu Feb 18 18:23:50 2016 - [info] Replicating from 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:50 2016 - [info] 10.1.10.91(10.1.10.91:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Thu Feb 18 18:23:50 2016 - [info] Replicating from 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:50 2016 - [info] Checking slave configurations..
Thu Feb 18 18:23:50 2016 - [info] read_only=1 is not set on slave 10.1.10.80(10.1.10.80:3306).
Thu Feb 18 18:23:50 2016 - [warning] relay_log_purge=0 is not set on slave 10.1.10.80(10.1.10.80:3306).
Thu Feb 18 18:23:50 2016 - [info] read_only=1 is not set on slave 10.1.10.91(10.1.10.91:3306).
Thu Feb 18 18:23:50 2016 - [warning] relay_log_purge=0 is not set on slave 10.1.10.91(10.1.10.91:3306).
Thu Feb 18 18:23:50 2016 - [info] Checking replication filtering settings..
Thu Feb 18 18:23:50 2016 - [info] Replication filtering check ok.
Thu Feb 18 18:23:50 2016 - [info] Master is down!
Thu Feb 18 18:23:50 2016 - [info] Terminating monitoring script.
Thu Feb 18 18:23:50 2016 - [info] Got exit code 20 (Master dead).
Thu Feb 18 18:23:50 2016 - [info] MHA::MasterFailover version 0.56.
Thu Feb 18 18:23:50 2016 - [info] Starting master failover.
Thu Feb 18 18:23:50 2016 - [info]
Thu Feb 18 18:23:50 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Feb 18 18:23:50 2016 - [info]
Thu Feb 18 18:23:51 2016 - [info] GTID failover mode = 0
Thu Feb 18 18:23:51 2016 - [info] Dead Servers:
Thu Feb 18 18:23:51 2016 - [info] 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:51 2016 - [info] Checking master reachability via MySQL(double check)...
Thu Feb 18 18:23:52 2016 - [info] ok.
Thu Feb 18 18:23:52 2016 - [info] Alive Servers:
Thu Feb 18 18:23:52 2016 - [info] 10.1.10.80(10.1.10.80:3306)
Thu Feb 18 18:23:52 2016 - [info] 10.1.10.91(10.1.10.91:3306)
Thu Feb 18 18:23:52 2016 - [info] Alive Slaves:
Thu Feb 18 18:23:52 2016 - [info] 10.1.10.80(10.1.10.80:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Thu Feb 18 18:23:52 2016 - [info] Replicating from 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:52 2016 - [info] 10.1.10.91(10.1.10.91:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Thu Feb 18 18:23:52 2016 - [info] Replicating from 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:52 2016 - [info] Starting Non-GTID based failover.
Thu Feb 18 18:23:52 2016 - [info]
Thu Feb 18 18:23:52 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Feb 18 18:23:52 2016 - [info]
Thu Feb 18 18:23:52 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Feb 18 18:23:52 2016 - [info]
Thu Feb 18 18:23:52 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Feb 18 18:23:52 2016 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Thu Feb 18 18:23:52 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Feb 18 18:23:53 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] * Phase 3: Master Recovery Phase..
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] The latest binary log file/position on all slaves is data01-bin.000016:120
Thu Feb 18 18:23:53 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Feb 18 18:23:53 2016 - [info] 10.1.10.80(10.1.10.80:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Thu Feb 18 18:23:53 2016 - [info] Replicating from 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:53 2016 - [info] 10.1.10.91(10.1.10.91:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Thu Feb 18 18:23:53 2016 - [info] Replicating from 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:53 2016 - [info] The oldest binary log file/position on all slaves is data01-bin.000016:120
Thu Feb 18 18:23:53 2016 - [info] Oldest slaves:
Thu Feb 18 18:23:53 2016 - [info] 10.1.10.80(10.1.10.80:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Thu Feb 18 18:23:53 2016 - [info] Replicating from 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:53 2016 - [info] 10.1.10.91(10.1.10.91:3306) Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Thu Feb 18 18:23:53 2016 - [info] Replicating from 10.1.10.244(10.1.10.244:3306)
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] * Phase 3.3: Determining New Master Phase..
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Feb 18 18:23:53 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Feb 18 18:23:53 2016 - [info] Searching new master from slaves..
Thu Feb 18 18:23:53 2016 - [info] Candidate masters from the configuration file:
Thu Feb 18 18:23:53 2016 - [info] Non-candidate masters:
Thu Feb 18 18:23:53 2016 - [info] New master is 10.1.10.80(10.1.10.80:3306)
Thu Feb 18 18:23:53 2016 - [info] Starting master failover..
Thu Feb 18 18:23:53 2016 - [info]
From:
10.1.10.244(10.1.10.244:3306) (current master)
+--10.1.10.80(10.1.10.80:3306)
+--10.1.10.91(10.1.10.91:3306)
To:
10.1.10.80(10.1.10.80:3306) (new master)
+--10.1.10.91(10.1.10.91:3306)
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Feb 18 18:23:53 2016 - [info] Starting recovery on 10.1.10.80(10.1.10.80:3306)..
Thu Feb 18 18:23:53 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Feb 18 18:23:53 2016 - [info] done.
Thu Feb 18 18:23:53 2016 - [info] All relay logs were successfully applied.
Thu Feb 18 18:23:53 2016 - [info] Getting new master's binlog name and position..
Thu Feb 18 18:23:53 2016 - [info] data02-bin.000003:684
Thu Feb 18 18:23:53 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.1.10.80', MASTER_PORT=3306, MASTER_LOG_FILE='data02-bin.000003', MASTER_LOG_POS=684, MASTER_USER='repl_user', MASTER_PASSWORD='xxx';
Thu Feb 18 18:23:53 2016 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Thu Feb 18 18:23:53 2016 - [info] ** Finished master recovery successfully.
Thu Feb 18 18:23:53 2016 - [info] * Phase 3: Master Recovery Phase completed.
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] * Phase 4: Slaves Recovery Phase..
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Feb 18 18:23:53 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] -- Slave diff file generation on host 10.1.10.91(10.1.10.91:3306) started, pid: 18577. Check tmp log /var/log/mha//10.1.10.91_3306_20160218182350.log if it takes time..
Thu Feb 18 18:23:54 2016 - [info]
Thu Feb 18 18:23:54 2016 - [info] Log messages from 10.1.10.91 ...
Thu Feb 18 18:23:54 2016 - [info]
Thu Feb 18 18:23:53 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Feb 18 18:23:54 2016 - [info] End of log messages from 10.1.10.91.
Thu Feb 18 18:23:54 2016 - [info] -- 10.1.10.91(10.1.10.91:3306) has the latest relay log events.
Thu Feb 18 18:23:54 2016 - [info] Generating relay diff files from the latest slave succeeded.
Thu Feb 18 18:23:54 2016 - [info]
Thu Feb 18 18:23:54 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Feb 18 18:23:54 2016 - [info]
Thu Feb 18 18:23:54 2016 - [info] -- Slave recovery on host 10.1.10.91(10.1.10.91:3306) started, pid: 18579. Check tmp log /var/log/mha//10.1.10.91_3306_20160218182350.log if it takes time..
Thu Feb 18 18:23:55 2016 - [info]
Thu Feb 18 18:23:55 2016 - [info] Log messages from 10.1.10.91 ...
Thu Feb 18 18:23:55 2016 - [info]
Thu Feb 18 18:23:54 2016 - [info] Starting recovery on 10.1.10.91(10.1.10.91:3306)..
Thu Feb 18 18:23:54 2016 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Feb 18 18:23:54 2016 - [info] done.
Thu Feb 18 18:23:54 2016 - [info] All relay logs were successfully applied.
Thu Feb 18 18:23:54 2016 - [info] Resetting slave 10.1.10.91(10.1.10.91:3306) and starting replication from the new master 10.1.10.80(10.1.10.80:3306)..
Thu Feb 18 18:23:54 2016 - [info] Executed CHANGE MASTER.
Thu Feb 18 18:23:54 2016 - [info] Slave started.
Thu Feb 18 18:23:55 2016 - [info] End of log messages from 10.1.10.91.
Thu Feb 18 18:23:55 2016 - [info] -- Slave recovery on host 10.1.10.91(10.1.10.91:3306) succeeded.
Thu Feb 18 18:23:55 2016 - [info] All new slave servers recovered successfully.
Thu Feb 18 18:23:55 2016 - [info]
Thu Feb 18 18:23:55 2016 - [info] * Phase 5: New master cleanup phase..
Thu Feb 18 18:23:55 2016 - [info]
Thu Feb 18 18:23:55 2016 - [info] Resetting slave info on the new master..
Thu Feb 18 18:23:55 2016 - [info] 10.1.10.80: Resetting slave info succeeded.
Thu Feb 18 18:23:55 2016 - [info] Master failover to 10.1.10.80(10.1.10.80:3306) completed successfully.
Thu Feb 18 18:23:55 2016 - [info]
----- Failover Report -----
mha-manager: MySQL Master failover 10.1.10.244(10.1.10.244:3306) to 10.1.10.80(10.1.10.80:3306) succeeded
Master 10.1.10.244(10.1.10.244:3306) is down!
Check MHA Manager logs at manager:/var/log/mha/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 10.1.10.80(10.1.10.80:3306) has all relay logs for recovery.
Selected 10.1.10.80(10.1.10.80:3306) as a new master.
10.1.10.80(10.1.10.80:3306): OK: Applying all logs succeeded.
10.1.10.91(10.1.10.91:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.1.10.91(10.1.10.91:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.1.10.80(10.1.10.80:3306)
10.1.10.80(10.1.10.80:3306): Resetting slave info succeeded.
Master failover to 10.1.10.80(10.1.10.80:3306) completed successfully.
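After a failover, the two lines that matter most in this log are the new master and the CHANGE MASTER statement MHA prints for the other slaves. A sketch that pulls them out, based only on the line formats shown above; both functions are hypothetical helpers. MHA masks the password as 'xxx', so substitute the real one before running the statement, for example when attaching the old master to the new one by hand.

```bash
#!/usr/bin/env bash
# failover_new_master [LOGFILE...]  (hypothetical helper)
# Print the new master from the last "Master failover to HOST(...) completed successfully." line.
failover_new_master() {
  sed -n 's/.*Master failover to \([^(]*\)(.*completed successfully\..*/\1/p' "$@" | tail -n 1
}

# failover_change_master [LOGFILE...]  (hypothetical helper)
# Print the last CHANGE MASTER statement suggested by MHA (password is masked as 'xxx').
failover_change_master() {
  sed -n 's/.*Statement should be: \(CHANGE MASTER TO .*;\)$/\1/p' "$@" | tail -n 1
}
```

Usage: `failover_new_master /var/log/mha/manager.log` and `failover_change_master /var/log/mha/manager.log`; with no file argument both read standard input.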



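End of the simulation.

I have not tested much yet; one pitfall I know of so far:
MHA recommends disabling relay_log_purge (relay_log_purge=0) so that enough relay logs are kept to recover the other slaves. As a consequence, relay logs are no longer purged automatically.
You will probably need to add a scheduled job to purge them manually, for example:
 ....../purge_relay_logs --user=$user --password=$password --disable_relay_log_purge >> ....../log/purge_relay_logs.log 2>&1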
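A sketch of that scheduled job as a crontab entry. The script path /usr/local/bin/purge_relay_logs, the log path /var/log/mha/purge_relay_logs.log and the daily 04:00 schedule are assumptions (the paths are elided above); `purge_cron_line` is a hypothetical helper that only prints the line. Note that cron turns an unescaped % into a newline, so a password containing % must be written as \% in the crontab.

```bash
#!/usr/bin/env bash
# purge_cron_line USER PASSWORD [BIN] [LOG]  (hypothetical helper)
# Print a crontab entry that runs purge_relay_logs daily at 04:00.
purge_cron_line() {
  local user="$1" password="$2"
  local bin="${3:-/usr/local/bin/purge_relay_logs}"    # assumed install path
  local log="${4:-/var/log/mha/purge_relay_logs.log}"  # assumed log path
  printf '0 4 * * * %s --user=%s --password=%s --disable_relay_log_purge >> %s 2>&1\n' \
    "$bin" "$user" "$password" "$log"
}
```

To install it on a slave: `(crontab -l 2>/dev/null; purge_cron_line root 'your_password') | crontab -`.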




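You also need a VIP (virtual IP) or a similar mechanism so that failover is transparent to applications. In the test above master_ip_failover_script was not set, so MHA skipped moving any IP address to the new master.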

References:

https://code.google.com/p/mysql-master-ha/wiki/TableOfContents?tm=6
http://wubx.net/mha-parameters/ 
http://www.chocolee.cn/archives/276#_Toc14903





From the ITPUB blog, link: http://blog.itpub.net/29773961/viewspace-1990914/. Please credit the source when reposting; otherwise legal liability will be pursued.
