Problems encountered while installing MySQL MHA

1. Running masterha_check_repl --conf=/etc/masterha/app1.cnf reports:

   Can't exec "mysqlbinlog": No such file or directory at /usr/local/perl5/MHA/BinlogManager.pm line 99.

   On the node, run which mysqlbinlog. On my machine, for example, the result is:

   [localhost~]$ which mysqlbinlog
   /usr/local/mysql/bin/mysqlbinlog

   MHA does not find the binary in its search path, so create a symbolic link in /usr/bin:

   ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
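   The link has to exist on every node. Below is a small sketch that creates it idempotently, assuming the binaries live in /usr/local/mysql/bin as shown above. link_into_bin is a hypothetical helper name, not part of MHA:

```shell
#!/bin/bash
# Hypothetical helper (not part of MHA): link_into_bin SRC [DEST_DIR]
# Creates DEST_DIR/<basename of SRC> as a symlink to SRC. DEST_DIR defaults to
# /usr/bin. An existing file or link is left untouched, so reruns are harmless.
link_into_bin() {
  local src=$1
  local dest_dir=${2:-/usr/bin}
  local target="$dest_dir/$(basename "$src")"

  if [ ! -x "$src" ]; then
    echo "missing: $src" >&2
    return 1
  fi
  if [ -e "$target" ] || [ -L "$target" ]; then
    echo "exists: $target"
  else
    ln -s "$src" "$target" && echo "linked: $target -> $src"
  fi
}
```

   For example, on each node run: for b in mysqlbinlog mysql; do link_into_bin /usr/local/mysql/bin/$b; done. Linking the mysql client as well is a precaution, on the assumption that the node tools look it up the same way. Afterwards, ssh root@192.168.17.200 'which mysqlbinlog' run from the manager should print a path, because the manager invokes the node tools over SSH.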


2. Running masterha_check_ssh --conf=/etc/masterha/app1.cnf reports:

   connection via SSH root@192.168.17.200  ...

   permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)

   [error]  [/usr/local/share/perl5/MHA/SSHcheck.pm,ln163]

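   This is usually a public-key problem. Delete the entries for the affected IP from /root/.ssh/known_hosts and regenerate them, and the check should then pass.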
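   Here is a sketch of that cleanup for all three IPs at once. It relies on ssh-keygen -R, which removes every key for a host from a known_hosts file and keeps a .old backup. forget_hosts is a hypothetical name, and root's default known_hosts path is assumed:

```shell
#!/bin/bash
# Hypothetical helper: forget_hosts HOST...
# Removes every cached host key for each HOST from root's known_hosts file
# (override with KNOWN_HOSTS=/path). ssh-keygen -R leaves a known_hosts.old backup.
forget_hosts() {
  local file=${KNOWN_HOSTS:-/root/.ssh/known_hosts}
  local h
  for h in "$@"; do
    ssh-keygen -R "$h" -f "$file"
  done
}
```

   For example, run forget_hosts 192.168.17.199 192.168.17.200 192.168.17.201. Then log in once with ssh to each IP to accept the new host key, and rerun masterha_check_ssh.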


3. Resolve the Perl dependencies in advance

   yum -y install perl-Config-Tiny perl-Params-Validate perl-Log-Dispatch perl-Parallel-ForkManager

   yum -y install perl-DBD-MySQL  ncftp

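   Installing modules automatically with CPAN, method 1:

   Before installing, the machine must be online and you need root privileges.

   perl -MCPAN -e shell

   The first time CPAN runs, it asks a few configuration questions. If your machine is connected directly to the Internet (dial-up, leased line, etc.), just press Enter through all of them. Only at the last step do you need to choose the CPAN mirror nearest to you; I chose ftp://www.perl87.cn/CPAN/, which is in China. If your machine is behind a firewall, you also need to configure an FTP or HTTP proxy. Common cpan commands: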

   Get help:

   cpan>help

   List all modules on CPAN:

   cpan>m

   Install a module. This runs the whole process for Net::Server automatically, from download to installation:

   cpan>install Net::Server

   Quit:

   cpan>quit

   Installing modules automatically with CPAN, method 2:

   cpan -i <module name>    e.g. cpan -i Net::Server


Installation and configuration

192.168.17.199 node, manager
192.168.17.200 node
192.168.17.201 node


First download the mha-manager and mha-node packages from https://code.google.com/p/mysql-master-ha/downloads/list

I downloaded mha4mysql-manager-0.54.tar.gz and mha4mysql-node-0.54.tar.gz.

After downloading, install the Perl dependency modules first:

yum -y install perl-Config-Tiny perl-Params-Validate perl-Log-Dispatch perl-Parallel-ForkManager

yum -y install perl-DBD-MySQL  ncftp


1. Install mha-node (on all three machines)

[local]# tar -zxvf mha4mysql-node-0.54.tar.gz -C /usr/local/

[local]# cd /usr/local/mha4mysql-node-0.54/

[local]#perl Makefile.PL

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node

[local]#make && make install
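To confirm the install on each machine, here is a sketch that looks the installed scripts up on PATH. check_cmds is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: check_cmds CMD...
# Prints "ok: <path>" for each command found on PATH and "MISSING: <cmd>"
# otherwise; returns 1 if at least one command is missing.
check_cmds() {
  local c found rc=0
  for c in "$@"; do
    if found=$(command -v "$c"); then
      echo "ok: $found"
    else
      echo "MISSING: $c"
      rc=1
    fi
  done
  return "$rc"
}
```

On each node, run check_cmds save_binary_logs apply_diff_relay_logs filter_mysqlbinlog purge_relay_logs. save_binary_logs and apply_diff_relay_logs are the node scripts the manager calls in the logs below. After step 2, the same helper can check masterha_manager, masterha_check_ssh, masterha_check_repl and masterha_check_status on 192.168.17.199.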


2. Install mha-manager (on 192.168.17.199 only)

[local]# tar -zxvf mha4mysql-manager-0.54.tar.gz -C /usr/local/
[local]# cd /usr/local/mha4mysql-manager-0.54/
[local]#perl Makefile.PL

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.609)
- DBD::mysql            ...loaded. (4.013)
- Time::HiRes           ...loaded. (1.9721)
- Config::Tiny          ...loaded. (2.19)
- Log::Dispatch         ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.05)
- MHA::NodeConst        ...loaded. (0.54)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good

Writing Makefile for mha4mysql::manager
[local]#make && make install

3. Edit the configuration file

[local]#mkdir /etc/masterha
[local]#mkdir -p /masterha/app1
[local]#cp samples/conf/* /etc/masterha/

[local]#cat /etc/masterha/app1.cnf  

[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
#mysql user and password
user=king
password=king123
#
ssh_user=root
repl_user=repl
repl_password=repl
ping_interval=1
shutdown_script=""
#master_ip_failover_script="/data/master_ip_failover"
master_ip_online_change_script=""
report_script=""
[server1]
hostname=192.168.17.199
master_binlog_dir="/data/mydb/db01/logs/binlog/"
candidate_master=1
[server2]
hostname=192.168.17.200
master_binlog_dir="/data/mydb/db01/logs/binlog/"
candidate_master=1
[server3]
hostname=192.168.17.201
master_binlog_dir="/data/mydb/db01/logs/binlog/"
candidate_master=1

[local]#
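master_binlog_dir must exist on every server listed in app1.cnf. The sketch below reads the hostnames straight from the file and lists that directory over SSH. Both helper names are hypothetical. It assumes every server uses the same master_binlog_dir, as in the app1.cnf above, and that key-based SSH (step 4) already works:

```shell
#!/bin/bash
# Hypothetical helper: cnf_values KEY FILE
# Prints every value of KEY=... in an MHA .cnf file, one per line, with double
# quotes stripped. Lines starting with # are ignored.
cnf_values() {
  awk -F= -v k="$1" '
    /^[ \t]*#/ { next }
    $1 == k {
      v = substr($0, index($0, "=") + 1)
      gsub(/"/, "", v)
      print v
    }
  ' "$2"
}

# Hypothetical helper: check_binlog_dirs FILE
# Lists the last few files of master_binlog_dir on every hostname= in FILE over
# SSH as root. Assumes all servers share one master_binlog_dir, as in app1.cnf.
check_binlog_dirs() {
  local dir h
  dir=$(cnf_values master_binlog_dir "$1" | head -n 1)
  for h in $(cnf_values hostname "$1"); do
    echo "== $h:$dir"
    ssh -o BatchMode=yes "root@$h" "ls -1 '$dir' | tail -n 3"
  done
}
```

For example: check_binlog_dirs /etc/masterha/app1.cnf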

4. Set up SSH public-key trust among the three machines

      On 192.168.17.199:
      [local]# ssh-keygen -t rsa
      [local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.200
      [local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.201


      On 192.168.17.200:
      [local]# ssh-keygen -t rsa
      [local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.199
      [local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.201


      On 192.168.17.201:
      [local]# ssh-keygen -t rsa
      [local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.199
      [local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.200
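The same key exchange can be written as one loop that is run on each machine. A sketch follows, assuming root's default key path (overridable with SSH_KEY). trust_peers is a hypothetical helper, and ssh-copy-id still prompts once for each peer's root password:

```shell
#!/bin/bash
# Hypothetical helper: trust_peers SELF HOST...
# Creates an RSA key pair if none exists yet (default /root/.ssh/id_rsa,
# override with SSH_KEY=/path), then copies the public key to every HOST
# except SELF.
trust_peers() {
  local self=$1 key=${SSH_KEY:-/root/.ssh/id_rsa} h
  shift
  [ -f "$key" ] || ssh-keygen -t rsa -N '' -f "$key" || return 1
  for h in "$@"; do
    [ "$h" = "$self" ] && continue
    ssh-copy-id -i "$key.pub" "root@$h"
  done
}
```

For example, on 192.168.17.200: trust_peers 192.168.17.200 192.168.17.199 192.168.17.200 192.168.17.201. Afterwards, ssh -o BatchMode=yes root@192.168.17.201 true fails instead of prompting if key authentication is still broken.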

5. Test the SSH connections

[local]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Tue Nov 19 02:19:56 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 19 02:19:56 2013 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:19:56 2013 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:19:56 2013 - [info] Starting SSH connection tests..
Tue Nov 19 02:19:57 2013 - [debug]
Tue Nov 19 02:19:56 2013 - [debug]  Connecting via SSH from root@192.168.17.199(192.168.17.199:22) to root@192.168.17.200(192.168.17.200:22)..
Tue Nov 19 02:19:56 2013 - [debug]   ok.
Tue Nov 19 02:19:56 2013 - [debug]  Connecting via SSH from root@192.168.17.199(192.168.17.199:22) to root@192.168.17.201(192.168.17.201:22)..
Tue Nov 19 02:19:57 2013 - [debug]   ok.
Tue Nov 19 02:19:57 2013 - [debug]
Tue Nov 19 02:19:56 2013 - [debug]  Connecting via SSH from root@192.168.17.200(192.168.17.200:22) to root@192.168.17.199(192.168.17.199:22)..
Tue Nov 19 02:19:57 2013 - [debug]   ok.
Tue Nov 19 02:19:57 2013 - [debug]  Connecting via SSH from root@192.168.17.200(192.168.17.200:22) to root@192.168.17.201(192.168.17.201:22)..
Tue Nov 19 02:19:57 2013 - [debug]   ok.
Tue Nov 19 02:19:58 2013 - [debug]
Tue Nov 19 02:19:57 2013 - [debug]  Connecting via SSH from root@192.168.17.201(192.168.17.201:22) to root@192.168.17.199(192.168.17.199:22)..
Tue Nov 19 02:19:57 2013 - [debug]   ok.
Tue Nov 19 02:19:57 2013 - [debug]  Connecting via SSH from root@192.168.17.201(192.168.17.201:22) to root@192.168.17.200(192.168.17.200:22)..
Tue Nov 19 02:19:58 2013 - [debug]   ok.

Tue Nov 19 02:19:58 2013 - [info] All SSH connection tests passed successfully.

[local]#

6. Configure master/slave replication (details omitted)

192.168.17.199:3306  master

192.168.17.200:3306  slave1

192.168.17.201:3306  slave2


Create the king and repl users in MySQL on all three machines:

GRANT ALL PRIVILEGES ON *.* TO 'king'@'%' IDENTIFIED BY 'king123';

GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'repl';
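Before running the MHA checks, it can help to confirm that king can log in to each slave from the manager and that both replication threads are running. The sketch below assumes the mysql client is on PATH on the manager and uses the king/king123 account above; remote login is allowed by the '%' host in the grant. slave_ok is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: slave_ok HOST
# Connects as king and checks that Slave_IO_Running and Slave_SQL_Running
# both read "Yes" in SHOW SLAVE STATUS. Returns 2 if the login itself fails.
slave_ok() {
  local status io sql
  status=$(mysql -h "$1" -P 3306 -uking -pking123 -e 'SHOW SLAVE STATUS\G') || return 2
  io=$(printf '%s\n' "$status" | awk -F': ' '$1 ~ /Slave_IO_Running$/ {print $2}')
  sql=$(printf '%s\n' "$status" | awk -F': ' '$1 ~ /Slave_SQL_Running$/ {print $2}')
  echo "$1 IO=$io SQL=$sql"
  [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]
}
```

For example: for h in 192.168.17.200 192.168.17.201; do slave_ok $h; done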


7.测试replication

[local]#masterha_check_repl --conf=/etc/masterha/app1.cnf

Tue Nov 19 02:27:17 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 19 02:27:17 2013 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:27:17 2013 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:27:17 2013 - [info] MHA::MasterMonitor version 0.54.
Tue Nov 19 02:27:17 2013 - [info] Dead Servers:
Tue Nov 19 02:27:17 2013 - [info] Alive Servers:
Tue Nov 19 02:27:17 2013 - [info]   192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info]   192.168.17.200(192.168.17.200:3306)
Tue Nov 19 02:27:17 2013 - [info]   192.168.17.201(192.168.17.201:3306)
Tue Nov 19 02:27:17 2013 - [info] Alive Slaves:
Tue Nov 19 02:27:17 2013 - [info]   192.168.17.200(192.168.17.200:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 02:27:17 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 02:27:17 2013 - [info]   192.168.17.201(192.168.17.201:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 02:27:17 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 02:27:17 2013 - [info] Current Alive Master: 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info] Checking slave configurations..
Tue Nov 19 02:27:17 2013 - [info]  read_only=1 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 02:27:17 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 02:27:17 2013 - [info]  read_only=1 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 02:27:17 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 02:27:17 2013 - [info] Checking replication filtering settings..
Tue Nov 19 02:27:17 2013 - [info]  binlog_do_db= , binlog_ignore_db= information_schema.%,mysql.%
Tue Nov 19 02:27:17 2013 - [info]  Replication filtering check ok.
Tue Nov 19 02:27:17 2013 - [info] Starting SSH connection tests..
Tue Nov 19 02:27:19 2013 - [info] All SSH connection tests passed successfully.
Tue Nov 19 02:27:19 2013 - [info] Checking MHA Node version..
Tue Nov 19 02:27:20 2013 - [info]  Version check ok.
Tue Nov 19 02:27:20 2013 - [info] Checking SSH publickey authentication settings on the current master..
Tue Nov 19 02:27:20 2013 - [info] HealthCheck: SSH to 192.168.17.199 is reachable.
Tue Nov 19 02:27:20 2013 - [info] Master MHA Node version is 0.54.
Tue Nov 19 02:27:20 2013 - [info] Checking recovery script configurations on the current master..
Tue Nov 19 02:27:20 2013 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000009
Tue Nov 19 02:27:20 2013 - [info]   Connecting to root@192.168.17.199(192.168.17.199)..
 Creating /var/tmp if not exists..    ok.
 Checking output directory is accessible or not..
  ok.
 Binlog found at /data/mydb/db01/logs/binlog/, up to mysql-bin.000009
Tue Nov 19 02:27:20 2013 - [info] Master setting check done.
Tue Nov 19 02:27:20 2013 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Nov 19 02:27:20 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.200 --slave_ip=192.168.17.200 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info  --relay_dir=/data/mydb/db01/data/  --slave_pass=xxx
Tue Nov 19 02:27:20 2013 - [info]   Connecting to root@192.168.17.200(192.168.17.200:22)..
 Checking slave recovery environment settings..
   Opening /data/mydb/db01/data/relay-log.info ... ok.
   Relay log found at /data/mydb/db01/data, up to relay-bin.000004
   Temporary relay log file is /data/mydb/db01/data/relay-bin.000004
   Testing mysql connection and privileges.. done.
   Testing mysqlbinlog output.. done.
   Cleaning up test file(s).. done.
Tue Nov 19 02:27:21 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.201 --slave_ip=192.168.17.201 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info  --relay_dir=/data/mydb/db01/data/  --slave_pass=xxx
Tue Nov 19 02:27:21 2013 - [info]   Connecting to root@192.168.17.201(192.168.17.201:22)..
 Checking slave recovery environment settings..
   Opening /data/mydb/db01/data/relay-log.info ... ok.
   Relay log found at /data/mydb/db01/data, up to relay-bin.000004
   Temporary relay log file is /data/mydb/db01/data/relay-bin.000004
   Testing mysql connection and privileges.. done.
   Testing mysqlbinlog output.. done.
   Cleaning up test file(s).. done.
Tue Nov 19 02:27:21 2013 - [info] Slaves settings check done.
Tue Nov 19 02:27:21 2013 - [info]
192.168.17.199 (current master)
+--192.168.17.200
+--192.168.17.201

Tue Nov 19 02:27:21 2013 - [info] Checking replication health on 192.168.17.200..
Tue Nov 19 02:27:21 2013 - [info]  ok.
Tue Nov 19 02:27:21 2013 - [info] Checking replication health on 192.168.17.201..
Tue Nov 19 02:27:21 2013 - [info]  ok.
Tue Nov 19 02:27:21 2013 - [warning] master_ip_failover_script is not defined.
Tue Nov 19 02:27:21 2013 - [warning] shutdown_script is not defined.
Tue Nov 19 02:27:21 2013 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

[local]#
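The check above reports relay_log_purge=0 as a [warning] on both slaves; read_only=1 is only an [info] message and is left out here. Below is a sketch that applies the setting through the king account and reads it back. tune_slave is hypothetical. SET GLOBAL does not survive a restart, so relay_log_purge=0 should also go under [mysqld] in my.cnf. With automatic purging off, relay logs have to be cleaned up periodically, which is what the purge_relay_logs script from mha4mysql-node is for.

```shell
#!/bin/bash
# Hypothetical helper: tune_slave HOST
# Turns off automatic relay log purging on one slave via the king account,
# then reads the global value back. Returns 0 only if it now reads 0.
tune_slave() {
  local host=$1
  mysql -h "$host" -P 3306 -uking -pking123 -e 'SET GLOBAL relay_log_purge = 0;' || return 1
  # -N: no column header, -s: silent output, so only the value is printed.
  [ "$(mysql -h "$host" -P 3306 -uking -pking123 -N -s -e 'SELECT @@global.relay_log_purge;')" = "0" ]
}
```

For example: tune_slave 192.168.17.200 && tune_slave 192.168.17.201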


8. Start the manager process

[local]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log  < /dev/null 2>&1 &

[local]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:22852) is running(0:PING_OK), master:192.168.17.199
[local]#
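For cron-style monitoring, here is a sketch that wraps masterha_check_status and succeeds only when it reports PING_OK, matching the status line shown above. mha_ping_ok is hypothetical. To stop the manager on purpose, MHA ships masterha_stop --conf=/etc/masterha/app1.cnf.

```shell
#!/bin/bash
# Hypothetical helper: mha_ping_ok CONF
# Echoes the masterha_check_status line and succeeds only if it contains
# PING_OK, as in "app1 (pid:22852) is running(0:PING_OK), master:...".
mha_ping_ok() {
  local line
  line=$(masterha_check_status --conf="$1" 2>&1)
  echo "$line"
  case "$line" in
    *PING_OK*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: mha_ping_ok /etc/masterha/app1.cnf || echo "MHA manager is not monitoring app1"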

9. Test master failover

On 192.168.17.199 (the manager), run tail -f /masterha/app1/manager.log.

Then stop the MySQL instance on port 3306 of 192.168.17.199 and watch manager.log:


[local]# tail -f /masterha/app1/manager.log
192.168.17.199 (current master)
+--192.168.17.200
+--192.168.17.201

Tue Nov 19 00:32:04 2013 - [warning] master_ip_failover_script is not defined.
Tue Nov 19 00:32:04 2013 - [warning] shutdown_script is not defined.
Tue Nov 19 00:32:04 2013 - [info] Set master ping interval 1 seconds.
Tue Nov 19 00:32:04 2013 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Nov 19 00:32:04 2013 - [info] Starting ping health check on 192.168.17.199(192.168.17.199:3306)..
Tue Nov 19 00:32:04 2013 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Tue Nov 19 17:59:07 2013 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Nov 19 17:59:07 2013 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --binlog_prefix=mysql-bin
Tue Nov 19 17:59:08 2013 - [info] HealthCheck: SSH to 192.168.17.199 is reachable.
Tue Nov 19 17:59:08 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 19 17:59:08 2013 - [warning] Connection failed 1 time(s)..
Tue Nov 19 17:59:09 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 19 17:59:09 2013 - [warning] Connection failed 2 time(s)..
Tue Nov 19 17:59:10 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 19 17:59:10 2013 - [warning] Connection failed 3 time(s)..
Tue Nov 19 17:59:10 2013 - [warning] Master is not reachable from health checker!
Tue Nov 19 17:59:10 2013 - [warning] Master 192.168.17.199(192.168.17.199:3306) is not reachable!
Tue Nov 19 17:59:10 2013 - [warning] SSH is reachable.
Tue Nov 19 17:59:10 2013 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Tue Nov 19 17:59:10 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 19 17:59:10 2013 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Tue Nov 19 17:59:10 2013 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Tue Nov 19 17:59:10 2013 - [info] Dead Servers:
Tue Nov 19 17:59:10 2013 - [info]   192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:10 2013 - [info] Alive Servers:
Tue Nov 19 17:59:10 2013 - [info]   192.168.17.200(192.168.17.200:3306)
Tue Nov 19 17:59:10 2013 - [info]   192.168.17.201(192.168.17.201:3306)
Tue Nov 19 17:59:10 2013 - [info] Alive Slaves:
Tue Nov 19 17:59:10 2013 - [info]   192.168.17.200(192.168.17.200:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:10 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:10 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:10 2013 - [info]   192.168.17.201(192.168.17.201:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:10 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:10 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:10 2013 - [info] Checking slave configurations..
Tue Nov 19 17:59:10 2013 - [info]  read_only=1 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 17:59:10 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 17:59:10 2013 - [info]  read_only=1 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 17:59:10 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 17:59:10 2013 - [info] Checking replication filtering settings..
Tue Nov 19 17:59:10 2013 - [info]  Replication filtering check ok.
Tue Nov 19 17:59:10 2013 - [info] Master is down!
Tue Nov 19 17:59:10 2013 - [info] Terminating monitoring script.
Tue Nov 19 17:59:10 2013 - [info] Got exit code 20 (Master dead).
Tue Nov 19 17:59:10 2013 - [info] MHA::MasterFailover version 0.54.
Tue Nov 19 17:59:10 2013 - [info] Starting master failover.
Tue Nov 19 17:59:10 2013 - [info]
Tue Nov 19 17:59:10 2013 - [info] * Phase 1: Configuration Check Phase..
Tue Nov 19 17:59:10 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Dead Servers:
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Checking master reachability via mysql(double check)..
Tue Nov 19 17:59:11 2013 - [info]  ok.
Tue Nov 19 17:59:11 2013 - [info] Alive Servers:
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.200(192.168.17.200:3306)
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.201(192.168.17.201:3306)
Tue Nov 19 17:59:11 2013 - [info] Alive Slaves:
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.200(192.168.17.200:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.201(192.168.17.201:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Nov 19 17:59:11 2013 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Tue Nov 19 17:59:11 2013 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Nov 19 17:59:11 2013 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3: Master Recovery Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:2386
Tue Nov 19 17:59:11 2013 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.200(192.168.17.200:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.201(192.168.17.201:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:2386
Tue Nov 19 17:59:11 2013 - [info] Oldest slaves:
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.200(192.168.17.200:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.201(192.168.17.201:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Fetching dead master's binary logs..
Tue Nov 19 17:59:11 2013 - [info] Executing command on the dead master 192.168.17.199(192.168.17.199:3306): save_binary_logs --command=save --start_file=mysql-bin.000009  --start_pos=2386 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/saved_master_binlog_from_192.168.17.199_3306_20131119175910.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54
 Creating /var/tmp if not exists..    ok.
Concat binary/relay logs from mysql-bin.000009 pos 2386 to mysql-bin.000009 EOF into /var/tmp/saved_master_binlog_from_192.168.17.199_3306_20131119175910.binlog ..
 Dumping binlog format description event, from position 0 to 107.. ok.
 Dumping effective binlog data from /data/mydb/db01/logs/binlog//mysql-bin.000009 position 2386 to tail(2405).. ok.
sh: mysqlbinlog: command not found
Failed to save binary log: /var/tmp/saved_master_binlog_from_192.168.17.199_3306_20131119175910.binlog is broken!
at /usr/local/bin/save_binary_logs line 170
Tue Nov 19 17:59:11 2013 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln577] Failed to save binary log events from the orig master. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.3: Determining New Master Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Nov 19 17:59:11 2013 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Nov 19 17:59:11 2013 - [info] Searching new master from slaves..
Tue Nov 19 17:59:11 2013 - [info]  Candidate masters from the configuration file:
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.200(192.168.17.200:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info]   192.168.17.201(192.168.17.201:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info]     Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info]  Non-candidate masters:
Tue Nov 19 17:59:11 2013 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue Nov 19 17:59:11 2013 - [info] New master is 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 17:59:11 2013 - [info] Starting master failover..
Tue Nov 19 17:59:11 2013 - [info]
From:
192.168.17.199 (current master)
+--192.168.17.200
+--192.168.17.201

To:
192.168.17.200 (new master)
+--192.168.17.201
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Nov 19 17:59:11 2013 - [info] Starting recovery on 192.168.17.200(192.168.17.200:3306)..
Tue Nov 19 17:59:11 2013 - [info]  This server has all relay logs. Waiting all logs to be applied..
Tue Nov 19 17:59:11 2013 - [info]   done.
Tue Nov 19 17:59:11 2013 - [info]  All relay logs were successfully applied.
Tue Nov 19 17:59:11 2013 - [info] Getting new master's binlog name and position..
Tue Nov 19 17:59:11 2013 - [info]  mysql-bin.000008:2606
Tue Nov 19 17:59:11 2013 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.17.200', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=2606, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Nov 19 17:59:11 2013 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Tue Nov 19 17:59:11 2013 - [info] ** Finished master recovery successfully.
Tue Nov 19 17:59:11 2013 - [info] * Phase 3: Master Recovery Phase completed.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 4: Slaves Recovery Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] -- Slave diff file generation on host 192.168.17.201(192.168.17.201:3306) started, pid: 38557. Check tmp log /masterha/app1/192.168.17.201_3306_20131119175910.log if it takes time..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Log messages from 192.168.17.201 ...
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Nov 19 17:59:11 2013 - [info] End of log messages from 192.168.17.201.
Tue Nov 19 17:59:11 2013 - [info] -- 192.168.17.201(192.168.17.201:3306) has the latest relay log events.
Tue Nov 19 17:59:11 2013 - [info] Generating relay diff files from the latest slave succeeded.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] -- Slave recovery on host 192.168.17.201(192.168.17.201:3306) started, pid: 38559. Check tmp log /masterha/app1/192.168.17.201_3306_20131119175910.log if it takes time..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Log messages from 192.168.17.201 ...
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Starting recovery on 192.168.17.201(192.168.17.201:3306)..
Tue Nov 19 17:59:11 2013 - [info]  This server has all relay logs. Waiting all logs to be applied..
Tue Nov 19 17:59:11 2013 - [info]   done.
Tue Nov 19 17:59:11 2013 - [info]  All relay logs were successfully applied.
Tue Nov 19 17:59:11 2013 - [info]  Resetting slave 192.168.17.201(192.168.17.201:3306) and starting replication from the new master 192.168.17.200(192.168.17.200:3306)..
Tue Nov 19 17:59:11 2013 - [info]  Executed CHANGE MASTER.
Tue Nov 19 17:59:11 2013 - [info]  Slave started.
Tue Nov 19 17:59:11 2013 - [info] End of log messages from 192.168.17.201.
Tue Nov 19 17:59:11 2013 - [info] -- Slave recovery on host 192.168.17.201(192.168.17.201:3306) succeeded.
Tue Nov 19 17:59:11 2013 - [info] All new slave servers recovered successfully.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 5: New master cleanup phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Resetting slave info on the new master..
Tue Nov 19 17:59:12 2013 - [info]  192.168.17.200: Resetting slave info succeeded.
Tue Nov 19 17:59:12 2013 - [info] Master failover to 192.168.17.200(192.168.17.200:3306) completed successfully.
Tue Nov 19 17:59:12 2013 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.17.199 to 192.168.17.200 succeeded

Master 192.168.17.199 is down!

Check MHA Manager logs at rhel-king-01:/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 192.168.17.200(192.168.17.200:3306) has all relay logs for recovery.
Selected 192.168.17.200 as a new master.
192.168.17.200: OK: Applying all logs succeeded.
192.168.17.201: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.17.201: OK: Applying all logs succeeded. Slave started, replicating from 192.168.17.200.
192.168.17.200: Resetting slave info succeeded.
Master failover to 192.168.17.200(192.168.17.200:3306) completed successfully.
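Two lines of this log are worth pulling out after every failover: which server became the new master, and the CHANGE MASTER statement MHA prints for the other slaves (with the password masked as xxx). Below is a sketch using grep and sed on manager.log. Both function names are hypothetical, and the patterns match the 0.54 messages shown above.

```shell
#!/bin/bash
# Hypothetical helpers for reading an MHA 0.54 manager.log after a failover.

# last_failover LOG: prints the new master from the last
# "Master failover to ... completed successfully." line; returns 1 if none.
last_failover() {
  local line
  line=$(grep 'Master failover to .* completed successfully' "$1" | tail -n 1)
  [ -n "$line" ] || return 1
  printf '%s\n' "$line" | sed 's/.*Master failover to \([^ ]*\) completed successfully.*/\1/'
}

# change_master_stmt LOG: prints the CHANGE MASTER statement MHA suggests for
# the remaining slaves (its password is masked as xxx in the log).
change_master_stmt() {
  grep 'All other slaves should start replication from here' "$1" | tail -n 1 |
    sed 's/.*Statement should be: //'
}
```

For the failover above, last_failover /masterha/app1/manager.log prints 192.168.17.200(192.168.17.200:3306).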


10. Repairing the old master after failover and bringing it back online

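The master has been switched from 192.168.17.199:3306 to 192.168.17.200:3306. In a real environment the data keeps changing, and the old master may hold binlog events that never reached the slaves. In this test, the step that saves the dead master's remaining binlog events even failed with sh: mysqlbinlog: command not found (see problem 1 at the top). MHA does log the new master's coordinates at the switch point: the CHANGE MASTER TO ... MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=2606 statement in manager.log. However, those coordinates only fit a server that stopped exactly in sync with the slaves. So you cannot simply start 192.168.17.199:3306 and run CHANGE MASTER TO 192.168.17.200:3306. Instead, take a full backup of the new master or of slave2, restore it on 192.168.17.199, and then run CHANGE MASTER.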
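Below is a sketch of that rebuild, taking the full backup from the new master with mysqldump. None of the following assumptions come from the original setup:

- the tables are InnoDB, so --single-transaction gives a consistent dump;
- the king/king123 account above is used throughout;
- MySQL on 192.168.17.199 has been started again with nothing writing to it, ideally as a freshly initialized instance;
- --master-data=2 is used, so the dump records the new master's binlog coordinates as a commented CHANGE MASTER line.

The helper names are hypothetical.

```shell
#!/bin/bash
# Sketch only: rebuild the old master as a slave of the new master.

# Hypothetical helper: dump_coords DUMPFILE
# mysqldump --master-data=2 writes a commented line such as
#   -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=2606;
# This prints "<file> <pos>" from the first such line (nothing if absent).
dump_coords() {
  sed -n "s/^-- CHANGE MASTER TO MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\);.*/\1 \2/p" "$1" | head -n 1
}

# Hypothetical helper: rebuild_old_master NEW_MASTER OLD_MASTER DUMPFILE
rebuild_old_master() {
  local new=$1 old=$2 dump=$3 coords file pos
  # Consistent full dump of the new master, coordinates recorded as a comment.
  mysqldump -h "$new" -uking -pking123 --all-databases \
    --single-transaction --master-data=2 > "$dump" || return 1
  coords=$(dump_coords "$dump")
  [ -n "$coords" ] || { echo "no coordinates in $dump" >&2; return 1; }
  file=${coords% *}
  pos=${coords#* }
  # Restore on the old master, then start replicating from the new master.
  mysql -h "$old" -uking -pking123 < "$dump" || return 1
  mysql -h "$old" -uking -pking123 -e "CHANGE MASTER TO MASTER_HOST='$new', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='repl', MASTER_LOG_FILE='$file', MASTER_LOG_POS=$pos; START SLAVE;"
}
```

For example, run rebuild_old_master 192.168.17.200 192.168.17.199 /tmp/full_from_new_master.sql, then restart masterha_manager as described next. If the backup is taken from slave2 instead, mysqldump's --dump-slave option records the coordinates of slave2's master. Check the format of that line before relying on dump_coords.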

Also, after a failover the masterha_manager process on the manager node stops by itself, so once the old master has been repaired, start it again:


[local]#nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log  < /dev/null 2>&1 &
[2] 41276
[local]# masterha_check_status --conf=/etc/masterha/app1.cnf                                                                      
app1 (pid:41276) is running(0:PING_OK), master:192.168.17.200


The detailed log:


Tue Nov 19 20:52:38 2013 - [info] MHA::MasterMonitor version 0.54.
Tue Nov 19 20:52:38 2013 - [info] Dead Servers:
Tue Nov 19 20:52:38 2013 - [info] Alive Servers:
Tue Nov 19 20:52:38 2013 - [info]   192.168.17.199(192.168.17.199:3306)
Tue Nov 19 20:52:38 2013 - [info]   192.168.17.200(192.168.17.200:3306)
Tue Nov 19 20:52:38 2013 - [info]   192.168.17.201(192.168.17.201:3306)

Tue Nov 19 20:52:38 2013 - [info] Alive Slaves:
Tue Nov 19 20:52:38 2013 - [info]   192.168.17.199(192.168.17.199:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 20:52:38 2013 - [info]     Replicating from 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 20:52:38 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 20:52:38 2013 - [info]   192.168.17.201(192.168.17.201:3306)  Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 20:52:38 2013 - [info]     Replicating from 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 20:52:38 2013 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 20:52:38 2013 - [info] Current Alive Master: 192.168.17.200(192.168.17.200:3306)

Tue Nov 19 20:52:38 2013 - [info] Checking slave configurations..
Tue Nov 19 20:52:38 2013 - [info]  read_only=1 is not set on slave 192.168.17.199(192.168.17.199:3306).
Tue Nov 19 20:52:38 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.17.199(192.168.17.199:3306).
Tue Nov 19 20:52:38 2013 - [info]  read_only=1 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 20:52:38 2013 - [warning]  relay_log_purge=0 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 20:52:38 2013 - [info] Checking replication filtering settings..
Tue Nov 19 20:52:38 2013 - [info]  binlog_do_db= , binlog_ignore_db= information_schema.%,mysql.%
Tue Nov 19 20:52:38 2013 - [info]  Replication filtering check ok.
Tue Nov 19 20:52:38 2013 - [info] Starting SSH connection tests..
Tue Nov 19 20:52:40 2013 - [info] All SSH connection tests passed successfully.
Tue Nov 19 20:52:40 2013 - [info] Checking MHA Node version..
Tue Nov 19 20:52:41 2013 - [info]  Version check ok.
Tue Nov 19 20:52:41 2013 - [info] Checking SSH publickey authentication settings on the current master..
Tue Nov 19 20:52:41 2013 - [info] HealthCheck: SSH to 192.168.17.200 is reachable.
Tue Nov 19 20:52:41 2013 - [info] Master MHA Node version is 0.54.
Tue Nov 19 20:52:41 2013 - [info] Checking recovery script configurations on the current master..
Tue Nov 19 20:52:41 2013 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000008
Tue Nov 19 20:52:41 2013 - [info]   Connecting to root@192.168.17.200(192.168.17.200)..
 Creating /var/tmp if not exists..    ok.
 Checking output directory is accessible or not..
  ok.
 Binlog found at /data/mydb/db01/logs/binlog/, up to mysql-bin.000008
Tue Nov 19 20:52:42 2013 - [info] Master setting check done.
Tue Nov 19 20:52:42 2013 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Nov 19 20:52:42 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.199 --slave_ip=192.168.17.199 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info  --relay_dir=/data/mydb/db01/data/  --slave_pass=xxx
Tue Nov 19 20:52:42 2013 - [info]   Connecting to root@192.168.17.199(192.168.17.199:22)..
 Checking slave recovery environment settings..
   Opening /data/mydb/db01/data/relay-log.info ... ok.
   Relay log found at /data/mydb/db01/data, up to relay-bin.000002
   Temporary relay log file is /data/mydb/db01/data/relay-bin.000002
   Testing mysql connection and privileges.. done.
   Testing mysqlbinlog output.. done.
   Cleaning up test file(s).. done.
Tue Nov 19 20:52:42 2013 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.201 --slave_ip=192.168.17.201 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info  --relay_dir=/data/mydb/db01/data/  --slave_pass=xxx
Tue Nov 19 20:52:42 2013 - [info]   Connecting to root@192.168.17.201(192.168.17.201:22)..
 Checking slave recovery environment settings..
   Opening /data/mydb/db01/data/relay-log.info ... ok.
   Relay log found at /data/mydb/db01/data, up to relay-bin.000002
   Temporary relay log file is /data/mydb/db01/data/relay-bin.000002
   Testing mysql connection and privileges.. done.
   Testing mysqlbinlog output.. done.
   Cleaning up test file(s).. done.
Tue Nov 19 20:52:42 2013 - [info] Slaves settings check done.
Tue Nov 19 20:52:42 2013 - [info]
192.168.17.200 (current master)
+--192.168.17.199
+--192.168.17.201

Tue Nov 19 20:52:42 2013 - [warning] master_ip_failover_script is not defined.
Tue Nov 19 20:52:42 2013 - [warning] shutdown_script is not defined.
Tue Nov 19 20:52:42 2013 - [info] Set master ping interval 1 seconds.
Tue Nov 19 20:52:42 2013 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Nov 19 20:52:42 2013 - [info] Starting ping health check on 192.168.17.200(192.168.17.200:3306)..
Tue Nov 19 20:52:42 2013 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..