Host | IP address |
---|---|
mha-manager | 192.168.29.134 |
mysql-master | 192.168.29.132 |
mysql-slave1 | 192.168.29.131 |
mysql-slave2 | 192.168.29.133 |
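MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at the Japanese company DeNA (he now works at Facebook) and is an excellent piece of software for failover and master promotion in MySQL high-availability environments. When the master fails, MHA can complete the database failover automatically within 0 to 30 seconds, and during the failover it preserves data consistency as far as possible, which is what makes it highly available in the true sense.
The software consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or on one of the slave nodes. MHA Node runs on every MySQL server. MHA Manager periodically probes the master of the cluster; when the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to it. The whole failover is completely transparent to the application.
During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it performs the failover anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it: as long as at least one slave has received the latest binary log events, MHA can apply them to all other slaves and so keep the data on all nodes consistent.
At present MHA mainly supports a one-master, multi-slave architecture. Building MHA requires at least three database servers in a replication cluster, one master and two slaves: one server acts as the master, one as the standby master, and one as a slave. Because at least three servers are needed, Taobao modified MHA for cost reasons, and Taobao's TMHA now supports one master and one slave. MHA works with any storage engine that supports master-slave replication; it is not limited to the transactional InnoDB engine.
Official introduction: https://code.google.com/p/mysql-master-ha/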
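An MHA failover roughly consists of the following steps:
(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave that has the latest updates;
(3) Apply the differential relay logs to the other slaves;
(4) Apply the binlog events saved from the master;
(5) Promote one slave to be the new master;
(6) Make the other slaves connect to the new master and replicate from it.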
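The MHA software consists of two parts, the Manager toolkit and the Node toolkit, described below.
The Manager toolkit mainly includes the following tools:
masterha_check_ssh checks the SSH configuration for MHA
masterha_check_repl checks the MySQL replication status
masterha_manager starts MHA
masterha_check_status checks the current running state of MHA
masterha_master_monitor detects whether the master is down
masterha_master_switch controls failover (automatic or manual)
masterha_conf_host adds or removes configured server entries
The Node toolkit (these tools are normally triggered by MHA Manager scripts and need no manual operation) mainly includes the following tools:
save_binary_logs saves and copies the master's binary logs
apply_diff_relay_logs identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog removes unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs purges relay logs (without blocking the SQL thread)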
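In the original architecture, 132 and 131 are already configured as master and slave, so here we only need to configure 133 as mysql-slave2 and, on 131, change the configuration file my.cnf to log-bin=mysql-slave1. For the detailed procedure see "Web architecture: nginx reverse proxy + LAP load balancing + MySQL master-slave (part 1)"; it is not repeated here. The configuration of the three hosts is as follows:
Set read_only on both slave servers (the slaves serve read traffic; the setting is not written to the configuration file because a slave may be promoted to master at any time):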
[root@sql-slave1 ~]# mysql -uroot -p123.com -e 'set global read_only=1'
[root@sql-slave2 ~]# mysql -uroot -p123.com -e 'set global read_only=1'
Create the monitoring user; run this on the master and on both slaves:
MariaDB [(none)]> grant all privileges on *.* to 'root'@'192.168.29.%' identified by '123.com';
MariaDB [(none)]> flush privileges;
[root@sql-master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? n
[root@sql-master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:tJCy92OdIanewneDTGoz0oSsFymFGRaN6nr/XWF5XQ0 root@sql-master
The key's randomart image is:
+---[RSA 2048]----+
| .+ E |
| + . . ..|
| o +. o . o|
|. o .o o o . . . |
|. o.o. S = . . |
| .. =..o.+ = |
|. o =.++.+ |
|. o o.O++oo |
| . o.+o=o. . |
+----[SHA256]-----+
[root@sql-master ~]# ssh-copy-id 192.168.29.131
[root@sql-master ~]# ssh-copy-id 192.168.29.133
[root@sql-master ~]# ssh-copy-id 192.168.29.134
Repeat the steps above on the other hosts.
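Because the same key distribution has to be repeated on every host, the individual ssh-copy-id calls can be wrapped in a loop. The following is only a minimal sketch: it assumes the four addresses from the host table and root as the login user, and the function name copy_key_to_all and its dry-run argument are illustrative, not part of MHA or OpenSSH.

```shell
#!/usr/bin/env bash
# Hosts taken from the table at the top of this article.
MHA_HOSTS="192.168.29.131 192.168.29.132 192.168.29.133 192.168.29.134"

# Run (or, with "echo" as first argument, only print) one ssh-copy-id per host.
copy_key_to_all() {
    local runner="${1:-}"
    local host
    for host in $MHA_HOSTS; do
        # $runner is intentionally unquoted: when empty it expands to nothing.
        $runner ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@${host}"
    done
}

# Dry run: show the commands without contacting any host.
copy_key_to_all echo
```

Calling copy_key_to_all without an argument would execute the commands and prompt for each host's password once.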
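Because of network problems, all packages used here were uploaded from my local machine.
1. Install the yum EPEL repository and the required dependencies on all hosts.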
[root@localhost ~]# rpm -ivh epel-release-latest-7.noarch.rpm
[root@localhost ~]# yum clean all
[root@localhost ~]# yum list
[root@localhost ~]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager --skip-broken
[root@localhost ~]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
After installation, the following scripts are generated under /usr/bin/:
[root@localhost ~]# cd /usr/bin/
[root@localhost bin]# ll apply_* filter_* purge* save*
-rwxr-xr-x. 1 root root 16367 Apr 1 2014 apply_diff_relay_logs
-rwxr-xr-x. 1 root root 4807 Apr 1 2014 filter_mysqlbinlog
-rwxr-xr-x. 1 root root 8261 Apr 1 2014 purge_relay_logs
-rwxr-xr-x. 1 root root 7525 Apr 1 2014 save_binary_logs
2. Install the manager package on mha-manager.
[root@localhost ~]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
# If it reports missing dependencies, run:
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-CPAN -y
[root@localhost ~]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
Preparing... ################################# [100%]
Updating / installing...
1:mha4mysql-manager-0.56-0.el6 ################################# [100%]
Check the installation result:
[root@localhost ~]# ll /usr/bin/master*
-rwxr-xr-x. 1 root root 1995 Apr 1 2014 /usr/bin/masterha_check_repl
-rwxr-xr-x. 1 root root 1779 Apr 1 2014 /usr/bin/masterha_check_ssh
-rwxr-xr-x. 1 root root 1865 Apr 1 2014 /usr/bin/masterha_check_status
-rwxr-xr-x. 1 root root 3201 Apr 1 2014 /usr/bin/masterha_conf_host
-rwxr-xr-x. 1 root root 2517 Apr 1 2014 /usr/bin/masterha_manager
-rwxr-xr-x. 1 root root 2165 Apr 1 2014 /usr/bin/masterha_master_monitor
-rwxr-xr-x. 1 root root 2373 Apr 1 2014 /usr/bin/masterha_master_switch
-rwxr-xr-x. 1 root root 5171 Apr 1 2014 /usr/bin/masterha_secondary_check
-rwxr-xr-x. 1 root root 1739 Apr 1 2014 /usr/bin/masterha_stop
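3. Create the configuration directory and configuration file for mha-manager. If you downloaded the .tar package, it usually contains configuration templates and the related scripts; I did not use the 0.56 .tar package here.
[root@localhost ~]# mkdir -p /etc/masterha
[root@localhost ~]# mkdir -p /var/log/masterha/app1
[root@localhost ~]# touch /etc/masterha/app1.cnf
[root@localhost ~]# vim /etc/masterha/app1.cnf
Write the following content: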
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/var/lib/mysql
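#master_ip_failover_script= /usr/local/bin/master_ip_failover # I do not have this script; leaving the line uncommented causes an error
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change # I do not have this script; leaving the line uncommented causes an error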
password=123.com
user=root
ping_interval=1
remote_workdir=/tmp
repl_password=123.com
repl_user=tongbu
#report_script=/usr/local/send_report # I do not have this script; leaving the line uncommented causes an error
shutdown_script=""
ssh_user=root
[server1]
hostname=192.168.29.132
port=3306
[server2]
hostname=192.168.29.131
candidate_master=1
check_repl_delay=0
[server3]
hostname=192.168.29.133
port=3306
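Configuration file explained:
[server default]
manager_workdir=/var/log/masterha/app1 // working directory of the manager
manager_log=/var/log/masterha/app1/manager.log // log file of the manager
master_binlog_dir=/var/lib/mysql // where the master stores its binlogs, so that MHA can find them; here this is the MySQL data directory
master_ip_failover_script= /usr/local/bin/master_ip_failover // switch script used during automatic failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change // switch script used during a manual switch
password=123.com // password of the MySQL root user, i.e. the password of the monitoring user created earlier
user=root // the monitoring user, root
ping_interval=1 // interval in seconds between pings sent to the master; the default is 3 seconds, and failover starts automatically after three attempts without a response
remote_workdir=/tmp // where binlogs are saved on the remote MySQL host when a switch happens
repl_password=123.com // password of the replication user
repl_user=tongbu // name of the replication user in the replication setup
report_script=/usr/local/send_report // script that sends an alert after a switch
shutdown_script="" // script that shuts down the failed host after a failure (its main purpose is to power off the host to prevent split brain; not used here)
ssh_user=root // SSH login user
[server1]
hostname=192.168.29.132
port=3306
[server2]
hostname=192.168.29.131
port=3306
candidate_master=1 # marks this host as a candidate master; with this parameter set, this slave is promoted to master after a failover even if it is not the slave with the latest events in the cluster
check_repl_delay=0 # by default, MHA does not choose a slave as the new master if it is more than 100 MB of relay logs behind the master, because recovering that slave would take a long time; with check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful for a host with candidate_master=1, because it guarantees that the candidate becomes the new master during the switch
[server3]
hostname=192.168.29.133
port=3306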
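Before running the MHA checks it can help to confirm which hosts the configuration file actually lists. The sketch below is a hypothetical helper (list_mha_hosts is not an MHA tool) that prints the hostname value of every [serverN] section; it assumes the plain key=value layout shown above, without inline comments.

```shell
# Print the hostname= value of every [serverN] section in an MHA config file.
list_mha_hosts() {
    awk -F'=' '
        /^\[server[0-9]+\]/ { in_server = 1; next }   # entering a [serverN] section
        /^\[/               { in_server = 0; next }   # any other section, such as [server default]
        in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
            gsub(/[ \t]/, "", $2)                     # trim whitespace around the value
            print $2
        }
    ' "$1"
}
```

On the manager host this would be called as list_mha_hosts /etc/masterha/app1.cnf.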
4. Check the SSH configuration: verify the SSH connections from MHA Manager to all MHA Nodes:
[root@localhost ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
5. Check the state of the whole replication environment with the masterha_check_repl script:
[root@localhost ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
MySQL Replication Health is NOT OK! (this message means there is a problem)
MySQL Replication Health is OK. (this message means everything is fine)
6. Check the state of MHA Manager with the masterha_check_status script:
[root@localhost ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
When monitoring is healthy, "PING_OK" is shown; the current "NOT_RUNNING" means that MHA monitoring has not been started.
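For scripting, the state name can be pulled out of the status line. The following is a minimal sketch that assumes the state always appears in the "(number:STATE)" form seen in the output above; mha_state is a hypothetical helper name.

```shell
# Extract the state name (for example NOT_RUNNING) from one line of
# masterha_check_status output of the form "...(<number>:<STATE>)...".
mha_state() {
    printf '%s\n' "$1" | sed -n 's/.*([0-9][0-9]*:\([A-Z_]*\)).*/\1/p'
}

# Example with the line shown above.
mha_state "app1 is stopped(2:NOT_RUNNING)."   # prints: NOT_RUNNING
```

In practice the argument would be the captured output of masterha_check_status --conf=/etc/masterha/app1.cnf.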
7. Start MHA Manager monitoring. (To stop it: masterha_stop --conf=/etc/masterha/app1.cnf)
[root@localhost ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
Optional parameters and their meaning:
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
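Startup parameters:
--remove_dead_master_conf After a master switch, the IP of the old master is removed from the configuration file.
--manager_log Location of the log file.
--ignore_last_failover By default, if MHA detects consecutive outages and the interval between two outages is less than 8 hours, it does not perform a failover; this restriction avoids a ping-pong effect. This parameter tells MHA to ignore the file left by the previous switch. By default, after a switch MHA creates the file app1.failover.complete in the log directory (the one I set above); if that file exists when the next switch is triggered, the switch is refused unless the file was deleted manually after the first switch. For convenience, --ignore_last_failover is set here.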
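To see whether a leftover marker file would block the next failover, the rule described above can be checked by hand. This is only a sketch of that rule, not MHA's own implementation: it assumes the marker is named <app>.failover.complete inside the manager working directory and uses the 8-hour (480-minute) window from the text; recent_failover_marker is a hypothetical helper and relies on find's -maxdepth and -mmin options.

```shell
# Succeed (exit status 0) if <workdir>/<app>.failover.complete exists and was
# modified less than 8 hours (480 minutes) ago.
recent_failover_marker() {
    local workdir="$1" app="$2"
    [ -n "$(find "$workdir" -maxdepth 1 -name "${app}.failover.complete" -mmin -480 2>/dev/null)" ]
}
```

A typical call would be recent_failover_marker /var/log/masterha/app1 app1 && echo "failover would be refused without --ignore_last_failover".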
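Check whether MHA Manager monitoring is working:
[root@localhost ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
The output shows that the current mysql-master is 192.168.29.132.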
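On mha-manager, start MHA monitoring and follow the log:
[root@localhost ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
[root@localhost ~]# tail -f /var/log/masterha/app1/manager.log
Stop the MySQL service on mysql-master to simulate a master crash, and follow the mha-manager log to see whether the master is switched automatically; according to the configuration it should switch to mysql-slave1 (192.168.29.131).
Stop the MySQL service on 192.168.29.132:
[root@sql-master ~]# systemctl stop mariadb
On 192.168.29.134 (mha-manager), start MHA monitoring and follow the log: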
[root@localhost ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 96284
[root@localhost ~]# tail -f /var/log/masterha/app1/manager.log
############# How MHA promotes mysql-slave1 (192.168.29.131) #####################################
192.168.29.132(192.168.29.132:3306) (current master)
+--192.168.29.131(192.168.29.131:3306)
+--192.168.29.133(192.168.29.133:3306)
Tue Jun 11 07:43:58 2019 - [warning] master_ip_failover_script is not defined.
Tue Jun 11 07:43:58 2019 - [warning] shutdown_script is not defined.
Tue Jun 11 07:43:58 2019 - [info] Set master ping interval 1 seconds.
Tue Jun 11 07:43:58 2019 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Jun 11 07:43:58 2019 - [info] Starting ping health check on 192.168.29.132(192.168.29.132:3306)..
Tue Jun 11 07:43:58 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Tue Jun 11 07:44:14 2019 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Jun 11 07:44:14 2019 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin-master
Tue Jun 11 07:44:15 2019 - [info] HealthCheck: SSH to 192.168.29.132 is reachable.
Tue Jun 11 07:44:15 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.29.132' (111))
Tue Jun 11 07:44:15 2019 - [warning] Connection failed 2 time(s)..
Tue Jun 11 07:44:16 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.29.132' (111))
Tue Jun 11 07:44:16 2019 - [warning] Connection failed 3 time(s)..
Tue Jun 11 07:44:17 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.29.132' (111))
Tue Jun 11 07:44:17 2019 - [warning] Connection failed 4 time(s)..
Tue Jun 11 07:44:17 2019 - [warning] Master is not reachable from health checker!
Tue Jun 11 07:44:17 2019 - [warning] Master 192.168.29.132(192.168.29.132:3306) is not reachable!
Tue Jun 11 07:44:17 2019 - [warning] SSH is reachable.
Tue Jun 11 07:44:17 2019 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Tue Jun 11 07:44:17 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 11 07:44:17 2019 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Jun 11 07:44:17 2019 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Jun 11 07:44:18 2019 - [info] GTID failover mode = 0
Tue Jun 11 07:44:18 2019 - [info] Dead Servers:
Tue Jun 11 07:44:18 2019 - [info] 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:18 2019 - [info] Alive Servers:
Tue Jun 11 07:44:18 2019 - [info] 192.168.29.131(192.168.29.131:3306)
Tue Jun 11 07:44:18 2019 - [info] 192.168.29.133(192.168.29.133:3306)
Tue Jun 11 07:44:18 2019 - [info] Alive Slaves:
Tue Jun 11 07:44:18 2019 - [info] 192.168.29.131(192.168.29.131:3306) Version=5.5.60-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Jun 11 07:44:18 2019 - [info] Replicating from 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:18 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 11 07:44:18 2019 - [info] 192.168.29.133(192.168.29.133:3306) Version=5.5.60-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Jun 11 07:44:18 2019 - [info] Replicating from 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:18 2019 - [info] Checking slave configurations..
Tue Jun 11 07:44:18 2019 - [warning] relay_log_purge=0 is not set on slave 192.168.29.131(192.168.29.131:3306).
Tue Jun 11 07:44:18 2019 - [warning] relay_log_purge=0 is not set on slave 192.168.29.133(192.168.29.133:3306).
Tue Jun 11 07:44:18 2019 - [info] Checking replication filtering settings..
Tue Jun 11 07:44:18 2019 - [info] Replication filtering check ok.
Tue Jun 11 07:44:18 2019 - [info] Master is down!
Tue Jun 11 07:44:18 2019 - [info] Terminating monitoring script.
Tue Jun 11 07:44:18 2019 - [info] Got exit code 20 (Master dead).
Tue Jun 11 07:44:18 2019 - [info] MHA::MasterFailover version 0.56.
Tue Jun 11 07:44:18 2019 - [info] Starting master failover.
Tue Jun 11 07:44:18 2019 - [info]
Tue Jun 11 07:44:18 2019 - [info] * Phase 1: Configuration Check Phase..
Tue Jun 11 07:44:18 2019 - [info]
Tue Jun 11 07:44:19 2019 - [info] GTID failover mode = 0
Tue Jun 11 07:44:19 2019 - [info] Dead Servers:
Tue Jun 11 07:44:19 2019 - [info] 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:19 2019 - [info] Checking master reachability via MySQL(double check)...
Tue Jun 11 07:44:19 2019 - [info] ok.
Tue Jun 11 07:44:19 2019 - [info] Alive Servers:
Tue Jun 11 07:44:19 2019 - [info] 192.168.29.131(192.168.29.131:3306)
Tue Jun 11 07:44:19 2019 - [info] 192.168.29.133(192.168.29.133:3306)
Tue Jun 11 07:44:19 2019 - [info] Alive Slaves:
Tue Jun 11 07:44:19 2019 - [info] 192.168.29.131(192.168.29.131:3306) Version=5.5.60-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Jun 11 07:44:19 2019 - [info] Replicating from 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:19 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 11 07:44:19 2019 - [info] 192.168.29.133(192.168.29.133:3306) Version=5.5.60-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Jun 11 07:44:19 2019 - [info] Replicating from 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:19 2019 - [info] Starting Non-GTID based failover.
Tue Jun 11 07:44:19 2019 - [info]
Tue Jun 11 07:44:19 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Jun 11 07:44:19 2019 - [info]
Tue Jun 11 07:44:19 2019 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Jun 11 07:44:19 2019 - [info]
Tue Jun 11 07:44:19 2019 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Jun 11 07:44:19 2019 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Tue Jun 11 07:44:19 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Jun 11 07:44:20 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Jun 11 07:44:20 2019 - [info]
Tue Jun 11 07:44:20 2019 - [info] * Phase 3: Master Recovery Phase..
Tue Jun 11 07:44:20 2019 - [info]
Tue Jun 11 07:44:20 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Jun 11 07:44:20 2019 - [info]
Tue Jun 11 07:44:20 2019 - [info] The latest binary log file/position on all slaves is mysql-bin-master.000024:245
Tue Jun 11 07:44:20 2019 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Jun 11 07:44:20 2019 - [info] 192.168.29.131(192.168.29.131:3306) Version=5.5.60-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Jun 11 07:44:20 2019 - [info] Replicating from 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:20 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 11 07:44:20 2019 - [info] 192.168.29.133(192.168.29.133:3306) Version=5.5.60-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Jun 11 07:44:20 2019 - [info] Replicating from 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:20 2019 - [info] The oldest binary log file/position on all slaves is mysql-bin-master.000024:245
Tue Jun 11 07:44:20 2019 - [info] Oldest slaves:
Tue Jun 11 07:44:20 2019 - [info] 192.168.29.131(192.168.29.131:3306) Version=5.5.60-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Jun 11 07:44:20 2019 - [info] Replicating from 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:20 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 11 07:44:20 2019 - [info] 192.168.29.133(192.168.29.133:3306) Version=5.5.60-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Jun 11 07:44:20 2019 - [info] Replicating from 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:20 2019 - [info]
Tue Jun 11 07:44:20 2019 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Jun 11 07:44:20 2019 - [info]
Tue Jun 11 07:44:21 2019 - [info] Fetching dead master's binary logs..
Tue Jun 11 07:44:21 2019 - [info] Executing command on the dead master 192.168.29.132(192.168.29.132:3306): save_binary_logs --command=save --start_file=mysql-bin-master.000024 --start_pos=245 --binlog_dir=/var/lib/mysql --output_file=/tmp/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin-master.000024 pos 245 to mysql-bin-master.000024 EOF into /tmp/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog ..
Dumping binlog format description event, from position 0 to 245.. ok.
Dumping effective binlog data from /var/lib/mysql/mysql-bin-master.000024 position 245 to tail(264).. ok.
Concat succeeded.
Tue Jun 11 07:44:22 2019 - [info] scp from root@192.168.29.132:/tmp/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog succeeded.
Tue Jun 11 07:44:22 2019 - [info] HealthCheck: SSH to 192.168.29.131 is reachable.
Tue Jun 11 07:44:23 2019 - [info] HealthCheck: SSH to 192.168.29.133 is reachable.
Tue Jun 11 07:44:23 2019 - [info]
Tue Jun 11 07:44:23 2019 - [info] * Phase 3.3: Determining New Master Phase..
Tue Jun 11 07:44:23 2019 - [info]
Tue Jun 11 07:44:23 2019 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Jun 11 07:44:23 2019 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Jun 11 07:44:23 2019 - [info] Searching new master from slaves..
Tue Jun 11 07:44:23 2019 - [info] Candidate masters from the configuration file:
Tue Jun 11 07:44:23 2019 - [info] 192.168.29.131(192.168.29.131:3306) Version=5.5.60-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Jun 11 07:44:23 2019 - [info] Replicating from 192.168.29.132(192.168.29.132:3306)
Tue Jun 11 07:44:23 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 11 07:44:23 2019 - [info] Non-candidate masters:
Tue Jun 11 07:44:23 2019 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Jun 11 07:44:23 2019 - [info] New master is 192.168.29.131(192.168.29.131:3306)
Tue Jun 11 07:44:23 2019 - [info] Starting master failover..
Tue Jun 11 07:44:23 2019 - [info]
From:
192.168.29.132(192.168.29.132:3306) (current master)
+--192.168.29.131(192.168.29.131:3306)
+--192.168.29.133(192.168.29.133:3306)
To:
192.168.29.131(192.168.29.131:3306) (new master)
+--192.168.29.133(192.168.29.133:3306)
Tue Jun 11 07:44:23 2019 - [info]
Tue Jun 11 07:44:23 2019 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Jun 11 07:44:23 2019 - [info]
Tue Jun 11 07:44:23 2019 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 11 07:44:23 2019 - [info] Sending binlog..
Tue Jun 11 07:44:24 2019 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog to root@192.168.29.131:/tmp/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog succeeded.
Tue Jun 11 07:44:24 2019 - [info]
Tue Jun 11 07:44:24 2019 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Jun 11 07:44:24 2019 - [info]
Tue Jun 11 07:44:24 2019 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Jun 11 07:44:24 2019 - [info] Starting recovery on 192.168.29.131(192.168.29.131:3306)..
Tue Jun 11 07:44:24 2019 - [info] Generating diffs succeeded.
Tue Jun 11 07:44:24 2019 - [info] Waiting until all relay logs are applied.
Tue Jun 11 07:44:24 2019 - [info] done.
Tue Jun 11 07:44:24 2019 - [info] Getting slave status..
Tue Jun 11 07:44:24 2019 - [info] This slave(192.168.29.131)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin-master.000024:245). No need to recover from Exec_Master_Log_Pos.
Tue Jun 11 07:44:24 2019 - [info] Connecting to the target slave host 192.168.29.131, running recover script..
Tue Jun 11 07:44:24 2019 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.29.131 --slave_ip=192.168.29.131 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog --workdir=/tmp --target_version=5.5.60-MariaDB --timestamp=20190611074418 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Jun 11 07:44:25 2019 - [info]
Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog on 192.168.29.131:3306. This may take long time...
Applying log files succeeded.
Tue Jun 11 07:44:25 2019 - [info] All relay logs were successfully applied.
Tue Jun 11 07:44:25 2019 - [info] Getting new master's binlog name and position..
Tue Jun 11 07:44:25 2019 - [info] mysql-slave1.000007:245
Tue Jun 11 07:44:25 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.29.131', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-slave1.000007', MASTER_LOG_POS=245, MASTER_USER='tongbu', MASTER_PASSWORD='xxx';
Tue Jun 11 07:44:25 2019 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Tue Jun 11 07:44:25 2019 - [info] Setting read_only=0 on 192.168.29.131(192.168.29.131:3306)..
Tue Jun 11 07:44:25 2019 - [info] ok.
Tue Jun 11 07:44:25 2019 - [info] ** Finished master recovery successfully.
Tue Jun 11 07:44:25 2019 - [info] * Phase 3: Master Recovery Phase completed.
Tue Jun 11 07:44:25 2019 - [info]
Tue Jun 11 07:44:25 2019 - [info] * Phase 4: Slaves Recovery Phase..
Tue Jun 11 07:44:25 2019 - [info]
Tue Jun 11 07:44:25 2019 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Jun 11 07:44:25 2019 - [info]
Tue Jun 11 07:44:25 2019 - [info] -- Slave diff file generation on host 192.168.29.133(192.168.29.133:3306) started, pid: 96496. Check tmp log /var/log/masterha/app1/192.168.29.133_3306_20190611074418.log if it takes time..
Tue Jun 11 07:44:26 2019 - [info]
Tue Jun 11 07:44:26 2019 - [info] Log messages from 192.168.29.133 ...
Tue Jun 11 07:44:26 2019 - [info]
Tue Jun 11 07:44:25 2019 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 11 07:44:26 2019 - [info] End of log messages from 192.168.29.133.
Tue Jun 11 07:44:26 2019 - [info] -- 192.168.29.133(192.168.29.133:3306) has the latest relay log events.
Tue Jun 11 07:44:26 2019 - [info] Generating relay diff files from the latest slave succeeded.
Tue Jun 11 07:44:26 2019 - [info]
Tue Jun 11 07:44:26 2019 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Jun 11 07:44:26 2019 - [info]
Tue Jun 11 07:44:26 2019 - [info] -- Slave recovery on host 192.168.29.133(192.168.29.133:3306) started, pid: 96498. Check tmp log /var/log/masterha/app1/192.168.29.133_3306_20190611074418.log if it takes time..
Tue Jun 11 07:44:28 2019 - [info]
Tue Jun 11 07:44:28 2019 - [info] Log messages from 192.168.29.133 ...
Tue Jun 11 07:44:28 2019 - [info]
Tue Jun 11 07:44:26 2019 - [info] Sending binlog..
Tue Jun 11 07:44:27 2019 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog to root@192.168.29.133:/tmp/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog succeeded.
Tue Jun 11 07:44:27 2019 - [info] Starting recovery on 192.168.29.133(192.168.29.133:3306)..
Tue Jun 11 07:44:27 2019 - [info] Generating diffs succeeded.
Tue Jun 11 07:44:27 2019 - [info] Waiting until all relay logs are applied.
Tue Jun 11 07:44:27 2019 - [info] done.
Tue Jun 11 07:44:27 2019 - [info] Getting slave status..
Tue Jun 11 07:44:27 2019 - [info] This slave(192.168.29.133)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin-master.000024:245). No need to recover from Exec_Master_Log_Pos.
Tue Jun 11 07:44:27 2019 - [info] Connecting to the target slave host 192.168.29.133, running recover script..
Tue Jun 11 07:44:27 2019 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=192.168.29.133 --slave_ip=192.168.29.133 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog --workdir=/tmp --target_version=5.5.60-MariaDB --timestamp=20190611074418 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Jun 11 07:44:27 2019 - [info]
Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.29.132_3306_20190611074418.binlog on 192.168.29.133:3306. This may take long time...
Applying log files succeeded.
Tue Jun 11 07:44:27 2019 - [info] All relay logs were successfully applied.
Tue Jun 11 07:44:27 2019 - [info] Resetting slave 192.168.29.133(192.168.29.133:3306) and starting replication from the new master 192.168.29.131(192.168.29.131:3306)..
Tue Jun 11 07:44:27 2019 - [info] Executed CHANGE MASTER.
Tue Jun 11 07:44:27 2019 - [info] Slave started.
Tue Jun 11 07:44:28 2019 - [info] End of log messages from 192.168.29.133.
Tue Jun 11 07:44:28 2019 - [info] -- Slave recovery on host 192.168.29.133(192.168.29.133:3306) succeeded.
Tue Jun 11 07:44:28 2019 - [info] All new slave servers recovered successfully.
Tue Jun 11 07:44:28 2019 - [info]
Tue Jun 11 07:44:28 2019 - [info] * Phase 5: New master cleanup phase..
Tue Jun 11 07:44:28 2019 - [info]
Tue Jun 11 07:44:28 2019 - [info] Resetting slave info on the new master..
Tue Jun 11 07:44:28 2019 - [info] 192.168.29.131: Resetting slave info succeeded.
Tue Jun 11 07:44:28 2019 - [info] Master failover to 192.168.29.131(192.168.29.131:3306) completed successfully.
Tue Jun 11 07:44:28 2019 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .
Tue Jun 11 07:44:28 2019 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.29.132(192.168.29.132:3306) to 192.168.29.131(192.168.29.131:3306) succeeded
Master 192.168.29.132(192.168.29.132:3306) is down!
Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.29.131(192.168.29.131:3306) has all relay logs for recovery.
Selected 192.168.29.131(192.168.29.131:3306) as a new master.
192.168.29.131(192.168.29.131:3306): OK: Applying all logs succeeded.
192.168.29.133(192.168.29.133:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.29.133(192.168.29.133:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.29.131(192.168.29.131:3306)
192.168.29.131(192.168.29.131:3306): Resetting slave info succeeded.
Master failover to 192.168.29.131(192.168.29.131:3306) completed successfully.
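The log above is long, so the decisive line is easy to miss. A minimal sketch for pulling the promoted host out of the manager log follows; it assumes the "New master is <ip>(<ip>:<port>)" wording seen in this log, and new_master_from_log is a hypothetical helper name.

```shell
# Print the address from the last "New master is ..." line of a manager log.
new_master_from_log() {
    sed -n 's/.*\[info] New master is \([0-9.]*\)(.*/\1/p' "$1" | tail -n 1
}
```

It would be called as new_master_from_log /var/log/masterha/app1/manager.log.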
Confirm that the switch succeeded:
[root@localhost ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
[root@localhost ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
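On 192.168.29.133, check the master:
MHA is configured and working.
Summary:
The current solution provides database high availability to a certain degree; there are other solutions such as heartbeat+drbd, Cluster, and MGR. Each has its strengths and weaknesses. When choosing one, the deciding factors are the business and its requirements for data consistency.
Extension (source: the web):
Configuring a VIP for use with MHA
The VIP can be managed in two ways: keepalived can manage the floating virtual IP, or a script can bring the virtual IP up (without keepalived, heartbeat, or similar software). To prevent split brain, managing the virtual IP with a script rather than with keepalived is recommended for production. Search for details yourself or see http://www.cnblogs.com/gomysql/p/3675429.html. Below, the VIP is managed with a script by modifying /usr/local/bin/master_ip_failover. When a script manages the VIP, a VIP must first be bound manually on the master server. Writing the script /usr/ /bin/master_ip_failover requires knowledge of Perl (do this on the master, 192.168.29.132).
The modified script on MHA Manager is as follows (reference material is scarce):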
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
my $vip           = '192.168.29.88';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip  = "/etc/init.d/keepalived stop";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        #`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
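The Perl script above starts and stops keepalived on the hosts. If the VIP were instead bound directly, as the text describes for script-managed VIPs, the values of $ssh_start_vip and $ssh_stop_vip could be built from ip addr commands. The following is a minimal sketch only: the /24 prefix and the interface name eth0 are assumptions about the environment, and vip_command is a hypothetical helper that merely prints the command instead of running it.

```shell
VIP="192.168.29.88"   # same VIP as in the Perl script above
VIP_PREFIX="24"       # assumption: the hosts are on a /24 network
VIP_DEV="eth0"        # assumption: name of the network interface

# Print the command that would bind ("start") or release ("stop") the VIP.
vip_command() {
    case "$1" in
        start) printf 'ip addr add %s/%s dev %s\n' "$VIP" "$VIP_PREFIX" "$VIP_DEV" ;;
        stop)  printf 'ip addr del %s/%s dev %s\n' "$VIP" "$VIP_PREFIX" "$VIP_DEV" ;;
        *)     echo "usage: vip_command start|stop" >&2; return 1 ;;
    esac
}

vip_command start   # prints: ip addr add 192.168.29.88/24 dev eth0
vip_command stop    # prints: ip addr del 192.168.29.88/24 dev eth0
```

The same start command is what would be run once by hand on the current master to bind the VIP initially.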