注:阅读本文前请先参考一下mha官方站点的内容,之后安装和部署的过程遇到一些VIP切换的问题时可以参考本文。
一。环境
操作系统:Redhat 6.3
MySQL:MySQL 5.5
MHA:mha4mysql-manager-0.55,mha4mysql-node-0.54
perl:ActivePerl-5.16
二。安装
1.安装node(装有mysql实例的所有机器上安装,包括mha manager机器)
a.安装依赖的perl module
#安装cpan
#yum install cpan -y
依赖的包(可以通过下载的tar.gz包中的Makefile.PL找到)
YAML DBD::mysql |
依次安装如上moudle
命令参考如下(假设如上列表存在于文件/tmp/mha.list):
#for i in $(cat /tmp/mha.list);do yes| cpan $i;done
发布yes命令可以跳过无数次的yes确认
如果安装失败,可以尝试force安装.
a.下载和安装mha node
#cd /tmp/
#wget https://mysql-master-ha.googlecode.com/files/mha4mysql-node-0.54.tar.gz
#tar zxvf mha4mysql-node-0.54.tar.gz
#cd /tmp/mha4mysql-node-0.54
#perl Makefile.PL
#make
#make install
make install Installing /usr/local/share/perl5/MHA/BinlogPosFinderXid.pm Installing /usr/local/share/perl5/MHA/SlaveUtil.pm Installing /usr/local/share/perl5/MHA/BinlogPosFindManager.pm Installing /usr/local/share/perl5/MHA/BinlogManager.pm Installing /usr/local/share/perl5/MHA/NodeUtil.pm Installing /usr/local/share/perl5/MHA/BinlogHeaderParser.pm Installing /usr/local/share/perl5/MHA/BinlogPosFinderElp.pm Installing /usr/local/share/perl5/MHA/BinlogPosFinder.pm Installing /usr/local/share/perl5/MHA/NodeConst.pm Installing /usr/local/share/man/man1/filter_mysqlbinlog.1 Installing /usr/local/share/man/man1/save_binary_logs.1 Installing /usr/local/share/man/man1/apply_diff_relay_logs.1 Installing /usr/local/share/man/man1/purge_relay_logs.1 Installing /usr/local/bin/purge_relay_logs Installing /usr/local/bin/save_binary_logs Installing /usr/local/bin/filter_mysqlbinlog Installing /usr/local/bin/apply_diff_relay_logs Appending installation info to /usr/lib64/perl5/perllocal.pod |
2.安装MHA Manager(只安装在管理机即可)
建议:安装在独立的服务器,不建议安装在线上MySQL 服务器,因为假设安装在线上的master的服务器上,该服务器的OS宕机,将不能执行迁移该maste实例.
a.安装依赖的perl module
#安装cpan
#yum install cpan -y
依赖的包(可以通过下载的tar.gz包中的Makefile.PL找到
YAML DBI DBD::mysql Time::HiRes Config::Tiny Log::Dispatch Parallel::ForkManager MHA::NodeConst |
依次安装如下上moudle
命令参考如下(假设如上列表存在于文件/tmp/mha.list):
#for i in $(cat /tmp/mha.list);do yes| cpan $i;done
发布yes命令可以跳过无数次的yes确认
如果安装失败,可以尝试force安装.
a.下载安装mha manager
#cd /tmp
#wget https://mysql-master-ha.googlecode.com/files/mha4mysql-manager-0.55.tar.gz
#tar zxvf mha4mysql-manager-0.55.tar.gz
#cd mha4mysql-manager-0.55
#perl Makefile.PL
# make
#make install
#make install /usr/bin/perl "-Iinc" Makefile.PL --config= --installdeps=MHA::NodeConst,0 *** Installing dependencies... *** Installing MHA::NodeConst... CPAN: Storable loaded ok (v2.20) Going to read '/root/.cpan/Metadata' Database was generated on Fri, 21 Mar 2014 05:17:02 GMT *** Could not find a version 0 or above for MHA::NodeConst; skipping. *** Module::AutoInstall installation finished. Installing /usr/local/share/perl5/MHA/ManagerUtil.pm Installing /usr/local/share/perl5/MHA/HealthCheck.pm Installing /usr/local/share/perl5/MHA/Config.pm Installing /usr/local/share/perl5/MHA/ServerManager.pm Installing /usr/local/share/perl5/MHA/ManagerConst.pm Installing /usr/local/share/perl5/MHA/FileStatus.pm Installing /usr/local/share/perl5/MHA/ManagerAdmin.pm Installing /usr/local/share/perl5/MHA/MasterFailover.pm Installing /usr/local/share/perl5/MHA/ManagerAdminWrapper.pm Installing /usr/local/share/perl5/MHA/MasterRotate.pm Installing /usr/local/share/perl5/MHA/MasterMonitor.pm Installing /usr/local/share/perl5/MHA/Server.pm Installing /usr/local/share/perl5/MHA/SSHCheck.pm Installing /usr/local/share/perl5/MHA/DBHelper.pm Installing /usr/local/share/man/man1/masterha_check_repl.1 Installing /usr/local/share/man/man1/masterha_check_ssh.1 Installing /usr/local/share/man/man1/masterha_check_status.1 Installing /usr/local/share/man/man1/masterha_conf_host.1 Installing /usr/local/share/man/man1/masterha_manager.1 Installing /usr/local/share/man/man1/masterha_master_monitor.1 Installing /usr/local/share/man/man1/masterha_master_switch.1 Installing /usr/local/share/man/man1/masterha_secondary_check.1 Installing /usr/local/share/man/man1/masterha_stop.1 Installing /usr/local/bin/masterha_stop Installing /usr/local/bin/masterha_conf_host Installing /usr/local/bin/masterha_check_repl Installing /usr/local/bin/masterha_check_status Installing /usr/local/bin/masterha_master_monitor Installing /usr/local/bin/masterha_check_ssh Installing /usr/local/bin/masterha_master_switch Installing /usr/local/bin/masterha_secondary_check Installing /usr/local/bin/masterha_manager Appending installation info to /usr/lib64/perl5/perllocal.pod |
三.通过集成的perl脚本实现VIP漂移.
1.涉及到两个文件
master_ip_failover
master_ip_online_change
两个文件的配置保持相同,具体如下:
my $vip = '10.0.0.100/24'; # Virtual IP my $key = "3306"; my $nic = "em1"; my $ssh_start_vip = "sudo /sbin/ifconfig $nic:$key $vip"; my $ssh_stop_vip = "sudo /sbin/ifconfig $nic:$key down"; |
10.0.0.100就是虚拟ip,分别在master宕机和手动切换master时漂移.
3306 可以自己喜好随意指定,只要和其它虚拟ip冲突即可.我的环境是一个服务器多台实例,所以会根据端口来作为虚拟ip的名字
em1就是网卡的设备号,通过ifconfig可以看到.
四。Running MHA Manager from daemontools
Currently MHA Manager process does not run as a daemon. If failover completed successfully or the master process was by accident, the manager stops working. To run as a daemon, daemontool. or any external daemon program can be used. Here is an example to run from daemontools.
1. 安装daemontools
1).下载daemontools
wget ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/utilities/RHEL_6/x86_64/daemontools-doc-0.76-2.2.x86_64.rpm
2).
2.配置daemontools(参考http://www.linuxquestions.org/questions/linux-server-73/svc-warning-unable-to-control-service-qmail-smtpd-file-does-not-exist-948161/)
1)增加文件 /etc/init/svscan_test1.conf,如果有多个MHA集群,请增加多个文件:
start on runlevel [345]
respawn
exec masterha_manager --global_conf=/etc/mha/masterha_default.cnf --conf=/etc/mha/masterha_test1/conf/app.conf --wait_on_monitor_error=60 --w从工 on_failover_error=60 >> /var/log/masterha/test1/app.log 2>&1
2)运行如下linux命令
# initctl reload-configuration
# initctl start svscan_test1
如果没有按如下操作,将会报错如下
svc: warning: unable to control /service/masterha_test1: file does not exist 或 svc: warning: unable to control /service/masterha_test1: supervise not running |
五.注意事项:
1.mha manager并会初始化虚拟ip,只会切换IP,所以请在开启mha manager前,在master增加VIP,具体命令如下:
#/sbin/ifconfig em1:3306 10.0.0.100/24
2.如果master以前是slave,并且slave线程停止掉了,请reset掉,不然启动manager时会报错,reset slave的命令如下:
mysql> reset slave all;
五.日志:
最后show 一下正常启动manager的log
Tue Mar 25 18:16:38 2014 - [info] MHA::MasterMonitor version 0.55. Tue Mar 25 18:16:38 2014 - [info] Dead Servers: Tue Mar 25 18:16:38 2014 - [info] Alive Servers: Tue Mar 25 18:16:38 2014 - [info] db3dg.example.com(10.0.0.177:3306) Tue Mar 25 18:16:38 2014 - [info] db4dg.example.com(10.0.0.178:3306) Tue Mar 25 18:16:38 2014 - [info] db5dg.example.com(10.0.0.179:3306) Tue Mar 25 18:16:38 2014 - [info] Alive Slaves: Tue Mar 25 18:16:38 2014 - [info] db4dg.example.com(10.0.0.178:3306) Version=5.5.31-log (oldest major version between slaves) log-bin:enabled Tue Mar 25 18:16:38 2014 - [info] Replicating from 10.0.0.177(10.0.0.177:3306) Tue Mar 25 18:16:38 2014 - [info] Primary candidate for the new Master (candidate_master is set) Tue Mar 25 18:16:38 2014 - [info] db5dg.example.com(10.0.0.179:3306) Version=5.5.31-log (oldest major version between slaves) log-bin:enabled Tue Mar 25 18:16:38 2014 - [info] Replicating from 10.0.0.177(10.0.0.177:3306) Tue Mar 25 18:16:38 2014 - [info] Primary candidate for the new Master (candidate_master is set) Tue Mar 25 18:16:38 2014 - [info] Current Alive Master: db3dg.example.com(10.0.0.177:3306) Tue Mar 25 18:16:38 2014 - [info] Checking slave configurations.. Tue Mar 25 18:16:38 2014 - [info] read_only=1 is not set on slave db4dg.example.com(10.0.0.178:3306). Tue Mar 25 18:16:38 2014 - [warning] relay_log_purge=0 is not set on slave db4dg.example.com(10.0.0.178:3306). Tue Mar 25 18:16:38 2014 - [info] read_only=1 is not set on slave db5dg.example.com(10.0.0.179:3306). Tue Mar 25 18:16:38 2014 - [warning] relay_log_purge=0 is not set on slave db5dg.example.com(10.0.0.179:3306). Tue Mar 25 18:16:38 2014 - [info] Checking replication filtering settings.. Tue Mar 25 18:16:38 2014 - [info] binlog_do_db= , binlog_ignore_db= Tue Mar 25 18:16:38 2014 - [info] Replication filtering check ok. Tue Mar 25 18:16:38 2014 - [info] Starting SSH connection tests.. Tue Mar 25 18:16:40 2014 - [info] All SSH connection tests passed successfully. Tue Mar 25 18:16:40 2014 - [info] Checking MHA Node version.. Tue Mar 25 18:16:40 2014 - [info] Version check ok. Tue Mar 25 18:16:40 2014 - [info] Checking SSH publickey authentication settings on the current master.. Tue Mar 25 18:16:40 2014 - [info] HealthCheck: SSH to db3dg.example.com is reachable. Tue Mar 25 18:16:41 2014 - [info] Master MHA Node version is 0.54. Tue Mar 25 18:16:41 2014 - [info] Checking recovery script configurations on the current master.. Tue Mar 25 18:16:41 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/installed/mysql_offline_multi/data --output_file=/data/installed/mysql_offline_multi/masterha/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000159 Tue Mar 25 18:16:41 2014 - [info] Connecting to [email protected](db3dg.example.com).. Warning: Permanently added 'db3dg.example.com' (RSA) to the list of known hosts. Creating /data/installed/mysql_offline_multi/masterha if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /data/installed/mysql_offline_multi/data, up to mysql-bin.000159 Tue Mar 25 18:16:41 2014 - [info] Master setting check done. Tue Mar 25 18:16:41 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Tue Mar 25 18:16:41 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=db4dg.example.com --slave_ip=10.0.0.178 --slave_port=3306 --workdir=/data/installed/mysql_offline_multi/masterha --target_version=5.5.31-log --manager_version=0.55 --relay_log_info=/data/installed/mysql_offline_multi/data/relay-log.info --relay_dir=/data/installed/mysql_offline_multi/data/ --slave_pass=xxx Tue Mar 25 18:16:41 2014 - [info] Connecting to [email protected](db4dg.example.com:22).. Checking slave recovery environment settings.. Opening /data/installed/mysql_offline_multi/data/relay-log.info ... ok. Relay log found at /data/installed/mysql_offline_multi/data, up to mysql-relay-bin.000447 Temporary relay log file is /data/installed/mysql_offline_multi/data/mysql-relay-bin.000447 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Tue Mar 25 18:16:41 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=db5dg.example.com --slave_ip=10.0.0.179 --slave_port=3306 --workdir=/data/installed/mysql_offline_multi/masterha --target_version=5.5.31-log --manager_version=0.55 --relay_log_info=/data/installed/mysql_offline_multi/data/relay-log.info --relay_dir=/data/installed/mysql_offline_multi/data/ --slave_pass=xxx Tue Mar 25 18:16:41 2014 - [info] Connecting to [email protected](db5dg.example.com:22).. Checking slave recovery environment settings.. Opening /data/installed/mysql_offline_multi/data/relay-log.info ... ok. Relay log found at /data/installed/mysql_offline_multi/data, up to mysql-relay-bin.000447 Temporary relay log file is /data/installed/mysql_offline_multi/data/mysql-relay-bin.000447 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Tue Mar 25 18:16:42 2014 - [info] Slaves settings check done. Tue Mar 25 18:16:42 2014 - [info] db3dg.example.com (current master) +--db4dg.example.com +--db5dg.example.com Tue Mar 25 18:16:42 2014 - [info] Checking master_ip_failover_script status: Tue Mar 25 18:16:42 2014 - [info] /etc/mha/masterha_offline_multi/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=db3dg.example.com --orig_master_ip=10.0.0.177 --orig_master_port=3306 IN SCRIPT TEST====sudo /sbin/ifconfig em1:3306 down==sudo /sbin/ifconfig em1:3306 10.0.0.210/24=== Checking the Status of the script.. OK Tue Mar 25 18:16:42 2014 - [info] OK. Tue Mar 25 18:16:42 2014 - [warning] shutdown_script is not defined. Tue Mar 25 18:16:42 2014 - [info] Set master ping interval 3 seconds. Tue Mar 25 18:16:42 2014 - [info] Set secondary check script: masterha_secondary_check -s db3dg.example.com -s db4dg.example.com -s db5dg.example.com Tue Mar 25 18:16:42 2014 - [info] Starting ping health check on db3dg.example.com(10.0.0.177:3306).. Tue Mar 25 18:16:42 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. |
六.其它配置:
如上笔记是我遇到问题的地方,记录了下来,从而可以形成可用的mha解决方案.
有关MHA的其它配置,请参考附件中我的project及mha 官方wiki.
我的project如下:
https://code.google.com/p/mha-config
如果有疑问和建议,请留言.
七。其它细节:
1.确保master及每个slave的配置文件及show global variables中关闭自动清除relay log 的功能。
relay_log_purge=0
自动清除,将如下行加入到/etc/crontab文件中,根据mysql实例端口进行更改。
0 5 * * * root echo "-----33-06----" >>/var/log/masterha/purge_relay_logs.log && /usr/local/bin/purge_relay_logs --host=127.0.0.1 --port=3306 --user=mha --password='abcd' --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1abc
2.配置文件masterha_default.cnf中的mysql用户密码部分不要含有特殊符号,否则mha manager将无法登录mysql进行failover,最恶心的是vip也无法漂移,但master/slave仍然能够执行相关操作,料想是scripts/master_ip_failover 这个脚本文件写得不够好。
如果无法漂移vip,/var/log/masterha/test1/app.log将会出现如下报错
Tue Apr 1 15:18:50 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='db4dg.example.com or 10.47.7.178', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000189', MASTER_LOG_POS=547798028, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Apr 1 15:18:50 2014 - [info] Executing master IP activate script: Tue Apr 1 15:18:50 2014 - [info] /etc/mha/masterha_offline_multi/scripts/master_ip_failover --command=start --ssh_user=root --orig_master_host=db3dg.example.com --orig_master_ip=10.47.7.177 --orig_master_port=3306 --new_master_host=db4dg.example.com --new_master_ip=10.47.7.178 --new_master_port=3306 --new_master_user='mha' --new_master_password='Adyav3984\#' DBI connect(';host=10.0.0.100;port=33306;mysql_connect_timeout=4','mha',...) failed: Access denied for user 'mha'@'10.47.7.217' (using password: YES) at /usr/local/share/perl5/MHA/DBHelper.pm line 181 at /etc/mha/masterha_offline_multi/scripts/master_ip_failover line 69 IN SCRIPT TEST====sudo /sbin/ifconfig em1:3306 down==sudo /sbin/ifconfig em1:3306 10.0.0100/24=== Tue Apr 1 15:18:50 2014 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln1262] Failed to activate master IP address for db4dg.example.com with return code 10:0 |
3.在线切换命令,如下需求:
1.原master作为slave运行且alive.
2.非交互。
3.端口非默认3306而是9306
切换命令
masterha_master_switch --orig_master_is_new_slave --master_state=alive --global_conf=/etc/mha/masterha_default.cnf --conf=/etc/mha/masterha_test/conf/app.conf --interactive=0 --new_master_host=test1 --new_master_port=9306 |
注:1.如果MySQL端口不是默认端口,原来的master将无法Change到新的master.具体报错如下:
Mon Apr 14 16:33:48 2014 - [info] Executed CHANGE MASTER. Mon Apr 14 16:34:02 2014 - [error][/usr/local/share/perl5/MHA/Server.pm, ln744] Slave could not be started on test2(192.168.3.107:9306)! Check slave status. Mon Apr 14 16:34:02 2014 - [error][/usr/local/share/perl5/MHA/Server.pm, ln817] Starting slave IO/SQL thread on test2(192.168.3.107:9306) failed! Mon Apr 14 16:34:02 2014 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln561] Failed! Mon Apr 14 16:34:02 2014 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln590] Switching master to test1(192.168.3.12:9306) done, but switching slaves partially failed. |
八。自动切换后需要处理的事宜:
1.启动原来的master作为slave运行,Change的信息(比如binlog文件和position)可以查看mha的日志。
2.调整监控,比如zabbix等slave监控。
3.其它相关。
参考:
https://code.google.com/p/mysql-master-ha/wiki/Installation
http://cr.yp.to/daemontools.html