Master:10.1.5.8:3306
Slave1:10.1.5.9:3306 (candidate master)
Slave2:10.1.5.195:3306
log-bin = mysql-bin
log-bin-index = mysql-bin.index
read_only=1
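relay_log_purge=0 # Not needed with one master and one slave. Recommended with two or more slaves, so that a slave promoted to master has not already auto-deleted relay logs that the other slaves still need to apply.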
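As a rough sanity check, the settings above can be verified with a small shell function. This is only a sketch: the function name check_slave_settings is hypothetical (it is not part of MHA or MySQL), and it assumes each setting sits on its own line of a my.cnf-style file.

```shell
# Hypothetical helper (not part of MHA): checks a my.cnf-style file for the
# settings listed above and reports any that are absent.
check_slave_settings() {
    cnf_file="$1"
    status=0
    # Patterns anchor at line start, so commented-out settings do not count.
    for pattern in \
        '^log[-_]bin[[:space:]]*=' \
        '^read_only[[:space:]]*=[[:space:]]*1' \
        '^relay_log_purge[[:space:]]*=[[:space:]]*0'
    do
        if ! grep -Eq "$pattern" "$cnf_file"; then
            echo "not set: $pattern"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_slave_settings /etc/my.cnf returns 0 only when all three settings are present; the path /etc/my.cnf is an assumption and may differ on your system.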
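Steps omitted.
Slaves that may be promoted to master should be configured with semi-synchronous replication.
For the detailed master-slave configuration, see: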
http://blog.csdn.net/lichangzai/article/details/50423906
manager:10.1.5.8
node1:10.1.5.8
node2:10.1.5.9
node3:10.1.5.195
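Note: the manager node can be installed on a separate server. To save machines, in this example the manager is installed on the master (10.1.5.8).
For the detailed configuration, see the "Configuring passwordless SSH login" section of: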
http://blog.csdn.net/lichangzai/article/details/8206834
Download URL:
https://code.google.com/p/mysql-master-ha/wiki/Downloads?tm=2
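Either the rpm package or the tarball works. The rpm package is recommended because it is simpler to install.
MHA Node must be installed on the manager and on every node.
rpm installation: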
# yum install perl-DBD-MySQL
# rpm -ivh mha4mysql-node-0.56-0.el5.noarch.rpm
tarball installation:
tar -zxf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56
perl Makefile.PL
make
make install
# yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
# rpm -ivh mha4mysql-node-0.56-0.el5.noarch.rpm
# rpm -ivh mha4mysql-manager-0.56-0.el5.noarch.rpm
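Note:
Some of the packages above can only be installed with yum after the EPEL (Extra Packages for Enterprise Linux) repository has been added.
For installing the EPEL repository, see the "yum installation" section of: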
http://blog.csdn.net/lichangzai/article/details/39272469
Alternative way to install MHA Manager:
MHA Manager 0.56 installation from the source tarball
tar -zxf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56
perl Makefile.PL
make
make install
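Meaning of each parameter: https://code.google.com/p/mysql-master-ha/wiki/Parameters#no_master
Configure on the MHA Manager side. Running as the root OS user is recommended, because starting and stopping the VIP is involved.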
# mkdir -p /etc/masterha/app1
# vi /etc/masterha/app1/app1.cnf
[server default]
manager_workdir=/etc/masterha/app1
manager_log=/etc/masterha/app1/manager.log
user=root
password=root
ssh_user=root
repl_user=repl_user
repl_password=licz
#ping_interval=10
# Executed on master failover; not needed if no VIP is configured.
#master_ip_failover_script=/etc/masterha/app1/master_ip_failover
#shutdown_script=/etc/masterha/power_manager
# Executed on master failover; optional.
#report_script=/etc/masterha/app1/send_report
# Executed on master switchover; not needed if no VIP is configured.
#master_ip_online_change_script=/etc/masterha/app1/master_ip_online_change
[server1]
hostname=10.1.5.8
port=3306
master_binlog_dir=/var/lib/mysql
candidate_master=1
check_repl_delay=0
[server2]
hostname=10.1.5.9
port=3306
master_binlog_dir=/var/lib/mysql
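candidate_master=1
# If the candidate master lags by more than 100 MB of relay logs, MHA by default does not select it as the new master. check_repl_delay=0 makes MHA ignore replication delay when choosing the new master.
check_repl_delay=0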
[server3]
hostname=10.1.5.195
port=3306
master_binlog_dir=/var/lib/mysql
# By default MHA cannot be used if this node is down. With ignore_fail=1, MHA still works when this slave is down.
ignore_fail=1
# Never promote this host to master.
no_master=1
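To review which hosts the configuration above defines, a short awk-based function can list every [serverN] section with its hostname. This is a sketch only: list_mha_servers is a hypothetical name, and it assumes key=value lines without spaces around the equals sign, as in the file above.

```shell
# Hypothetical helper: print "[serverN] hostname" for each server section
# of an MHA application configuration file.
list_mha_servers() {
    awk -F= '
        /^\[server[0-9]+\]/ { section = $0; next }   # entering a [serverN] block
        /^\[/               { section = ""; next }   # any other block, e.g. [server default]
        section != "" && $1 == "hostname" { print section, $2 }
    ' "$1"
}
```

Running list_mha_servers /etc/masterha/app1/app1.cnf against the file above should print one line per server, which makes a missing or mistyped hostname easy to spot.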
masterha_check_ssh --conf=/etc/masterha/app1/app1.cnf
[root@host8 ~]# masterha_check_ssh --conf=/etc/masterha/app1/app1.cnf
Tue Jan 5 17:16:40 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jan 5 17:16:40 2016 - [info] Reading application default configuration from /etc/masterha/app1/app1.cnf..
Tue Jan 5 17:16:40 2016 - [info] Reading server configuration from /etc/masterha/app1/app1.cnf..
Tue Jan 5 17:16:40 2016 - [info] Starting SSH connection tests..
Tue Jan 5 17:16:41 2016 - [debug]
Tue Jan 5 17:16:40 2016 - [debug] Connecting via SSH root@10.1.5.8(10.1.5.8:22) to root@10.1.5.9(10.1.5.9:22)..
Tue Jan 5 17:16:40 2016 - [debug] ok.
Tue Jan 5 17:16:40 2016 - [debug] Connecting via SSH root@10.1.5.8(10.1.5.8:22) to root@10.1.5.195(10.1.5.195:22)..
......
Tue Jan 5 17:16:41 2016 - [debug] Connecting via SSH root@10.1.5.195(10.1.5.195:22) to root@10.1.5.9(10.1.5.9:22)..
Tue Jan 5 17:16:41 2016 - [debug] ok.
Tue Jan 5 17:16:41 2016 - [info] All SSH connection tests passed successfully.
Success!
masterha_check_repl --conf=/etc/masterha/app1/app1.cnf
# masterha_check_repl --conf=/etc/masterha/app1/app1.cnf
Tue Jan 5 10:50:16 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jan 5 10:50:16 2016 - [info] Reading application default configuration from /etc/masterha/app1/app1.cnf..
Tue Jan 5 10:50:16 2016 - [info] Reading server configuration from /etc/masterha/app1/app1.cnf..
Tue Jan 5 10:50:16 2016 - [info] MHA::MasterMonitor version 0.56.
Tue Jan 5 10:50:16 2016 - [info] GTID failover mode = 0
Tue Jan 5 10:50:16 2016 - [info] Dead Servers:
Tue Jan 5 10:50:16 2016 - [info] Alive Servers:
Tue Jan 5 10:50:16 2016 - [info] 10.1.5.8(10.1.5.8:3306)
Tue Jan 5 10:50:16 2016 - [info] 10.1.5.9(10.1.5.9:3306)
Tue Jan 5 10:50:16 2016 - [info] 10.1.5.195(10.1.5.195:3306)
Tue Jan 5 10:50:16 2016 - [info] Alive Slaves:
Tue Jan 5 10:50:16 2016 - [info] 10.1.5.9(10.1.5.9:3306) Version=5.6.28-log (oldest major version between slaves) log-bin:enabled
.......
Tue Jan 5 13:34:16 2016 - [info] Slaves settings check done.
Tue Jan 5 13:34:16 2016 - [info]
10.1.5.8(10.1.5.8:3306)(current master)
+--10.1.5.9(10.1.5.9:3306)
+--10.1.5.195(10.1.5.195:3306)
Tue Jan 5 13:34:16 2016 - [info] Checking replication health on 10.1.5.9..
Tue Jan 5 13:34:16 2016 - [info] ok.
Tue Jan 5 13:34:16 2016 - [info] Checking replication health on 10.1.5.195..
Tue Jan 5 13:34:16 2016 - [info] ok.
Tue Jan 5 13:34:16 2016 - [warning] master_ip_failover_script is not defined.
Tue Jan 5 13:34:16 2016 - [warning] shutdown_script is not defined.
Tue Jan 5 13:34:16 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
Success!
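Both checks end with a recognizable success line, so a script can test for it instead of reading the whole output. The sketch below assumes the output of masterha_check_ssh or masterha_check_repl was saved to a file; the function name mha_check_passed and the file path in the usage note are hypothetical.

```shell
# Hypothetical helper: succeed only if the saved output of an MHA check
# contains the final success line shown in the listings above.
mha_check_passed() {
    kind="$1"          # "ssh" or "repl"
    output_file="$2"   # file holding the captured command output
    case "$kind" in
        ssh)  marker='All SSH connection tests passed successfully.' ;;
        repl) marker='MySQL Replication Health is OK.' ;;
        *)    echo "unknown check kind: $kind" >&2; return 2 ;;
    esac
    # -F treats the marker as a fixed string, -q suppresses output.
    grep -Fq "$marker" "$output_file"
}
```

A possible use is to run masterha_check_repl --conf=/etc/masterha/app1/app1.cnf > /tmp/check_repl.out 2>&1 and then call mha_check_passed repl /tmp/check_repl.out before starting the manager.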
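Note: you may run into some problems along the way; see the problem summary in another article:
① Check whether the following files exist, and delete them if they do.
(After a master failover, the MHA manager service stops automatically and creates the file app1.failover.complete in the manager_workdir directory. Before starting MHA again, make sure this file does not exist.)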
# ll /etc/masterha/app1/app1.failover.complete
# ll /etc/masterha/app1/app1.failover.error
② Check the current MHA configuration:
# masterha_check_repl --conf=/etc/masterha/app1/app1.cnf
③ Start MHA:
# nohup masterha_manager --conf=/etc/masterha/app1/app1.cnf > /etc/masterha/app1/mha_manager.log 2>&1 &
By default, MHA will not start when a slave node is down. Adding --ignore_fail_on_start lets MHA start even when a slave marked ignore_fail=1 is down, as follows:
# nohup masterha_manager --conf=/etc/masterha/app1/app1.cnf --ignore_fail_on_start >/etc/masterha/app1/mha_manager.log 2>&1 &
④ Check the status:
# masterha_check_status --conf=/etc/masterha/app1/app1.cnf
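⑤ Check the log:
# tail -f /etc/masterha/app1/manager.log
⑥ Follow-up work after a master switch
After the master has been switched, repair the original master into a new slave and then repeat the five steps above. If the data files of the original master are intact, the last CHANGE MASTER command can be found as follows: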
# grep "CHANGE MASTER TO MASTER" /etc/masterha/app1/manager.log | tail -1
CHANGE MASTER TO MASTER_HOST='10.1.5.9',MASTER_PORT=3306, MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=120,MASTER_USER='repl_user', MASTER_PASSWORD='xxx';
--Finally, start the new slave
mysql> start slave;
mysql> show slave status\G
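Step ① above can be scripted so that stale marker files never block a restart. This is a minimal sketch, assuming the marker files are named <app>.failover.complete and <app>.failover.error inside manager_workdir as described above; the function name clear_failover_markers is hypothetical.

```shell
# Hypothetical helper: remove leftover failover marker files before
# starting masterha_manager again, and report what was removed.
clear_failover_markers() {
    workdir="$1"   # manager_workdir, e.g. /etc/masterha/app1
    app="$2"       # application name, e.g. app1
    for suffix in complete error; do
        marker="$workdir/$app.failover.$suffix"
        if [ -e "$marker" ]; then
            rm -f "$marker"
            echo "removed $marker"
        fi
    done
}
```

Calling clear_failover_markers /etc/masterha/app1 app1 right before the nohup masterha_manager command covers step ①. Check the log for the cause of an earlier failover before removing the .error file.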
Scenario 1: the master dies while MHA is running, and the candidate master (a slave) automatically fails over to become the master.
--shutdown mysql master node
# service mysql stop
--check new master node
mysql> show master status\G
*************************** 1. row ***************************
File: master-bin.000001
Position: 330
--check slave node
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.1.5.9
Master_User: repl_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000001
--check manager.log
[root@host8 ~]# tail -100f /etc/masterha/app1/manager.log
----- Failover Report -----
app1: MySQL Master failover 10.1.5.8(10.1.5.8:3306) to 10.1.5.9(10.1.5.9:3306) succeeded
Master 10.1.5.8(10.1.5.8:3306) is down!
Check MHA Manager logs at host8.localdomain:/etc/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 10.1.5.9(10.1.5.9:3306) has all relay logs for recovery.
Selected 10.1.5.9(10.1.5.9:3306) as a new master.
10.1.5.9(10.1.5.9:3306): OK: Applying all logs succeeded.
10.1.5.195(10.1.5.195:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.1.5.195(10.1.5.195:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.1.5.9(10.1.5.9:3306)
10.1.5.9(10.1.5.9:3306): Resetting slave info succeeded.
Master failover to 10.1.5.9(10.1.5.9:3306) completed successfully.
--Finally, repair the original master into a new slave
# grep "CHANGE MASTER TO MASTER" /etc/masterha/app1/manager.log | tail -1
CHANGE MASTER TO MASTER_HOST='10.1.5.9',MASTER_PORT=3306, MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=120,MASTER_USER='repl_user', MASTER_PASSWORD='xxx';
mysql> CHANGE MASTER TO MASTER_HOST='10.1.5.9', MASTER_PORT=3306,MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=120,MASTER_USER='repl_user', MASTER_PASSWORD='licz';
Query OK, 0 rows affected, 2 warnings (0.17 sec)
mysql> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.1.5.9
Master_User: repl_user
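The grep used above to recover the CHANGE MASTER statement can be wrapped in a function that also strips any log prefix in front of the statement. This is a sketch: last_change_master is a hypothetical name, and it assumes the statement sits on a single line of manager.log. The password is masked as 'xxx' in the log, so it still has to be replaced by hand before running the statement.

```shell
# Hypothetical helper: print the most recent CHANGE MASTER statement
# recorded in an MHA manager log, without any leading log prefix.
last_change_master() {
    log_file="$1"
    grep 'CHANGE MASTER TO MASTER' "$log_file" \
        | tail -n 1 \
        | sed 's/^.*\(CHANGE MASTER TO \)/\1/'   # drop everything before the statement
}
```

For example, last_change_master /etc/masterha/app1/manager.log prints the statement to paste into the mysql client on the repaired host.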
Scenario 2: the master dies while MHA is not running; a manual failover can be performed.
1. Check whether the following files exist, and delete them if they do.
# ll /etc/masterha/app1/app1.failover.complete
# ll /etc/masterha/app1/app1.failover.error
2. If MHA is running, stop it first: masterha_stop --conf=/etc/masterha/app1/app1.cnf
3. Check the current MHA configuration: masterha_check_repl --conf=/etc/masterha/app1/app1.cnf
4. Manual switch: masterha_master_switch --conf=/etc/masterha/app1/app1.cnf --master_state=dead --dead_master_host=10.1.5.9 --dead_master_port=3306
# Continuing from the above
The following command specifies new_master_host and new_master_port for the switch. If new_master_host is not specified, it is chosen according to the configuration file app1.cnf, and new_master_port defaults to 3306.
# masterha_master_switch --conf=/etc/masterha/app1/app1.cnf --master_state=dead --dead_master_host=10.1.5.9 --dead_master_port=3306 --new_master_host=10.1.5.8 --new_master_port=3306
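The two forms of the manual failover command differ only in the optional new-master arguments, which a small wrapper can make explicit. This is a sketch: build_failover_cmd is a hypothetical name, it only prints the command for review instead of running it, and it assumes port 3306 on both hosts as in the examples above.

```shell
# Hypothetical helper: print the masterha_master_switch command for a
# dead master, optionally pinning the new master host.
build_failover_cmd() {
    conf="$1"
    dead_host="$2"
    new_host="${3:-}"   # optional; empty lets MHA choose from the config file
    cmd="masterha_master_switch --conf=$conf --master_state=dead --dead_master_host=$dead_host --dead_master_port=3306"
    if [ -n "$new_host" ]; then
        cmd="$cmd --new_master_host=$new_host --new_master_port=3306"
    fi
    echo "$cmd"
}
```

Printing the command first gives a chance to confirm which host is declared dead before anything is executed.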
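Scenario 1: the master and slaves are healthy and MHA is running; during maintenance (for example replacing host hardware, or adding/dropping columns or primary keys), the master is manually switched online to another host.
1. If MHA is running, stop it first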
masterha_stop --conf=/etc/masterha/app1/app1.cnf
2. Check the current MHA configuration
masterha_check_repl --conf=/etc/masterha/app1/app1.cnf
3. Manual switch
masterha_master_switch --master_state=alive --conf=/etc/masterha/app1/app1.cnf --orig_master_is_new_slave --running_updates_limit=3600 --interactive=0
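Note: an online switch with masterha_master_switch does not call the master_ip_failover_script script but the master_ip_online_change_script script, which is where starting and stopping the VIP can be placed. If the VIP is not handled in that script, switch the VIP manually, as follows: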
ssh root@$orig_master_ip /sbin/ifconfig eth0:1 down
ssh root@$new_master_ip /sbin/ifconfig eth0:1 10.1.5.21/24
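The following command specifies new_master_host and new_master_port for the switch. If new_master_host is not specified, it is chosen according to the configuration file app1.cnf, and new_master_port defaults to 3306.
masterha_master_switch --master_state=alive --conf=/etc/masterha/app1/app1.cnf --orig_master_is_new_slave --running_updates_limit=3600 --interactive=0 --new_master_host=10.1.5.9 --new_master_port=3306
Parameter --running_updates_limit: if a write query on the current master has been running longer than this value, or Seconds_Behind_Master on any slave exceeds this value, the master switch is aborted automatically. The default is 1 second.
Parameter --interactive=0: non-interactive switch. Recommended, because it greatly speeds up the switch; when the database is not busy, the switch completes in about 3 seconds.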
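The slave-lag half of the --running_updates_limit rule can be illustrated with a few lines of shell. This is a simplified sketch of the rule as described above, not MHA's own code: switch_would_abort is a hypothetical name, the lag values stand for Seconds_Behind_Master read from each slave, and the check on long-running writes on the master is not covered.

```shell
# Hypothetical illustration: succeed (exit 0) if any slave lag exceeds the
# limit, which is the case where the online switch would be aborted.
switch_would_abort() {
    limit="$1"; shift        # value passed to --running_updates_limit, in seconds
    for lag in "$@"; do      # one Seconds_Behind_Master value per slave
        if [ "$lag" -gt "$limit" ]; then
            return 0
        fi
    done
    return 1
}
```

With the default limit of 1, a call such as switch_would_abort 1 0 5 succeeds, so a single slave that is 5 seconds behind is enough to abort the switch. With --running_updates_limit=3600 the same lag is tolerated.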
Because relay_log_purge=0 was set on every slave in the first step, relay logs must be purged periodically on the slave nodes. It is recommended to stagger the purge times across the slave nodes.
crontab -e
0 5 * * * /usr/bin/purge_relay_logs --user=root --password=123456 --port=3306 --disable_relay_log_purge >> /var/lib/mysql/purge_relay.log 2>&1
For purging relay logs, see: https://code.google.com/p/mysql-master-ha/wiki/Requirements#Replication_user_must_exist_on_candidate_masters
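One way to stagger the purge times is to generate one crontab line per slave with a fixed offset between them. This is a sketch under stated assumptions: staggered_purge_cron is a hypothetical name, the schedule starts at 05:00 as in the entry above, the credentials are the ones from that entry, and the offsets are assumed small enough that the hour stays below 24.

```shell
# Hypothetical helper: print a crontab entry per slave, each shifted by
# step_minutes from the previous one, starting at 05:00.
staggered_purge_cron() {
    step_minutes="$1"; shift
    index=0
    for host in "$@"; do
        total=$((index * step_minutes))
        hour=$((5 + total / 60))
        minute=$((total % 60))
        echo "# $host"
        echo "$minute $hour * * * /usr/bin/purge_relay_logs --user=root --password=123456 --port=3306 --disable_relay_log_purge >> /var/lib/mysql/purge_relay.log 2>&1"
        index=$((index + 1))
    done
}
```

For example, staggered_purge_cron 20 10.1.5.9 10.1.5.195 prints an entry for 05:00 and one for 05:20; each entry is then added with crontab -e on the corresponding slave.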
--View the purge_relay_logs command help
# purge_relay_logs -h
Option h is ambiguous (help,host)
Usage:
purge_relay_logs --user=root --password=rootpass --host=127.0.0.1
See online reference
(http://code.google.com/p/mysql-master-ha/wiki/Requirements#purge_relay_
logs_script) for details.