MySQL High-Availability Architecture: MHA

I. Introduction

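MHA (Master High Availability) is a relatively mature high-availability solution for MySQL. It automates master failover and slave promotion in a replicated MySQL environment. When the master fails, MHA can complete the failover automatically, typically within 30 seconds, while preserving data consistency as far as possible during the switch.

MHA consists of two components: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server. MHA Manager probes the master of each cluster at a fixed interval; when the master fails, it promotes the slave holding the most recent data to be the new master and repoints all remaining slaves to it. The failover needs no change on the application side, provided clients reach the master through an address that moves with it, such as the VIP configured in step 17.

During an automatic failover MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. If the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent transactions are lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it: as long as at least one slave has received the latest binary log events, MHA can apply them to all other slaves and so keep every node consistent.

MHA currently targets the one-master, multiple-slave topology. A replication cluster managed by MHA needs at least three database servers: one master, one candidate master, and one additional slave.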

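MHA's failover procedure can be summarized as follows:

(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave with the most recent updates;
(3) Apply the differential relay log events to the other slaves;
(4) Apply the binlog events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.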
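
Steps (2) and (3) can be illustrated with a toy model. This is not MHA code: it assumes each slave's progress can be reduced to a single integer position in the master's event stream, and the function names `pick_latest_slave` and `missing_events` are made up for this illustration.

```python
def pick_latest_slave(received):
    """Return the name of the slave that has received the most events."""
    return max(received, key=received.get)


def missing_events(received):
    """For every slave that is behind, give the event range it still lacks."""
    latest = pick_latest_slave(received)
    top = received[latest]
    return {
        name: (pos + 1, top)
        for name, pos in received.items()
        if name != latest and pos < top
    }


# Hypothetical positions: server3 is three events behind server2.
received = {"server2": 100, "server3": 97}
print(pick_latest_slave(received))
print(missing_events(received))
```

In the real tool the "position" is a relay log coordinate or a GTID set, and the missing range is extracted from the relay log of the most advanced slave.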

Official project page: https://code.google.com/p/mysql-master-ha/

II. Deploying MHA

1. Lab environment

Red Hat Enterprise Linux 6.5, MySQL 5.7.19

Role              IP           Hostname  server_id  Service
Monitor host      172.25.27.4  server4   -          monitors the replication group
Master            172.25.27.1  server1   1          writes
Candidate master  172.25.27.2  server2   2          reads
Slave             172.25.27.3  server3   3          reads

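server2 and server3 are slaves of server1. The master serves writes; the candidate master (server2, an ordinary slave until it is promoted) and the slave both serve reads. If the master goes down, the candidate master is promoted to master and the slave is repointed to it.

2. Set up master-slave replication

Download the MySQL RPM packages:
https://dev.mysql.com/downloads/mysql/

## On server1 (172.25.27.1):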

[root@server1 mysql5.7.19]# ls
mysql-community-client-5.7.19-1.el6.x86_64.rpm
mysql-community-common-5.7.19-1.el6.x86_64.rpm
mysql-community-libs-5.7.19-1.el6.x86_64.rpm
mysql-community-libs-compat-5.7.19-1.el6.x86_64.rpm
mysql-community-server-5.7.19-1.el6.x86_64.rpm
[root@server1 mysql5.7.19]# yum install -y *
[root@server1 mysql5.7.19]# vim /etc/my.cnf

symbolic-links=0

server-id=1
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
binlog_do_db=test
binlog_ignore_db=mysql

log-error=/var/log/mysqld.log


[root@server1 mysql5.7.19]# scp /etc/my.cnf server2:/etc/
[root@server1 mysql5.7.19]# scp /etc/my.cnf server3:/etc/

[root@server1 mysql5.7.19]# /etc/init.d/mysqld start
Initializing MySQL database:                               [  OK  ]
Starting mysqld:                                           [  OK  ]
[root@server1 mysql5.7.19]# cat /var/log/mysqld.log | grep localhost        ## show the initial root password
2017-10-07T08:23:32.276339Z 1 [Note] A temporary password is generated for root@localhost: ekso,kwhk1B&
2017-10-07T08:23:39.043717Z 3 [Note] Access denied for user 'UNKNOWN_MYSQL_USER'@'localhost' (using password: NO)
[root@server1 mysql5.7.19]# mysql -p
Enter password: 
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'Mypasswd+1';
Query OK, 0 rows affected (0.00 sec)
mysql> grant replication slave on *.* to repl@'172.25.27.%' identified by 'Mypasswd+1';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 |      691 | test         | mysql            | ce1d2cae-ab38-11e7-9244-525400ac4caf:1-2 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)


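## On server2 and server3 (172.25.27.2 and 172.25.27.3):
## Install MySQL with yum exactly as on server1; the steps are not repeated here
[root@server2 ~]# vim /etc/my.cnf
server-id=2         ## use 2 on server2 and 3 on server3; any number works as long as it is unique across the three servers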
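
Because the same my.cnf is copied to all three machines, it is easy to forget to change server-id on one of them. The following sketch shows one way to check for duplicates once the three files have been collected as text. The helper names `read_server_id` and `duplicated_ids` are hypothetical and only serve this illustration.

```python
import re

# Accept both spellings, server-id and server_id.
SERVER_ID_RE = re.compile(r"^\s*server[-_]id\s*=\s*(\d+)", re.MULTILINE)


def read_server_id(cnf_text):
    """Return the server-id found in a my.cnf text, or None if absent."""
    match = SERVER_ID_RE.search(cnf_text)
    return int(match.group(1)) if match else None


def duplicated_ids(cnf_by_host):
    """Map each server-id used more than once to the hosts that use it."""
    hosts_by_id = {}
    for host, text in cnf_by_host.items():
        hosts_by_id.setdefault(read_server_id(text), []).append(host)
    return {sid: hosts for sid, hosts in hosts_by_id.items() if len(hosts) > 1}


configs = {
    "server1": "server-id=1\ngtid_mode=ON\n",
    "server2": "server-id=1\ngtid_mode=ON\n",  # forgot to edit after scp
    "server3": "server-id=3\ngtid_mode=ON\n",
}
print(duplicated_ids(configs))
```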

## Then start the database and change the root password, as on server1
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'Mypasswd+1';
Query OK, 0 rows affected (0.01 sec)

mysql> grant replication slave on *.* to repl@'172.25.27.%' identified by 'Mypasswd+1';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> change master to master_host='172.25.27.1', master_user='repl', master_password='Mypasswd+1', master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.07 sec)

mysql> start slave;
Query OK, 0 rows affected (0.02 sec)

mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

## Alternatively, check with the following command
[root@server2 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | egrep 'Slave_IO|Slave_SQL'
mysql: [Warning] Using a password on the command line interface can be insecure.
               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
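
The same check can be scripted. The sketch below parses the vertical (`\G`) output shown above into a dictionary and tests the two thread flags. It assumes the text has already been captured, for example from the `mysql -e` call; `parse_vertical` and `replication_ok` are illustrative names, not part of any MySQL tool.

```python
def parse_vertical(output):
    """Turn `SHOW SLAVE STATUS\\G` style text into a {field: value} dict."""
    fields = {}
    for line in output.splitlines():
        # Skip the "*** 1. row ***" separator and lines without a field.
        if line.lstrip().startswith("***") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_ok(fields):
    """Both replication threads must report Yes."""
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
    )


sample = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.1
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
"""
status = parse_vertical(sample)
print(status["Master_Host"], replication_ok(status))
```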

3. Test replication

##SERVER 2/3:
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+

##SERVER 1:
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
mysql> CREATE DATABASE test;
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+

##SERVER2/3 :
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+
5 rows in set (0.00 sec)

Replication works: the test database created on server1 now appears on server2 and server3.
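
Comparing the listings by eye works for one database. For a longer list, a small script can report what is missing on a slave. This sketch assumes the `SHOW DATABASES` output of each server has been saved as text; the function names are invented for the example.

```python
def parse_table_column(output):
    """Extract the data rows of a one-column mysql ASCII table."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            rows.append(line.strip("|").strip())
    return rows[1:]  # drop the header row ("Database")


def missing_on_slave(master_out, slave_out):
    """Databases present on the master but absent on the slave."""
    master = set(parse_table_column(master_out))
    slave = set(parse_table_column(slave_out))
    return sorted(master - slave)


MASTER = """\
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| test               |
+--------------------+
"""
SLAVE = """\
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
+--------------------+
"""
print(missing_on_slave(MASTER, SLAVE))
```

An empty result means every database on the master also exists on the slave; it says nothing about whether the table contents match.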

4. Install MHA Manager

[root@server4 ~]# cd MHA/
[root@server4 MHA]# ls
master_ip_failover
master_ip_online_change
mha4mysql-manager-0.56-0.el6.noarch.rpm
mha4mysql-node-0.56-0.el6.noarch.rpm
perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
perl-Email-Date-Format-1.002-5.el6.noarch.rpm
perl-Log-Dispatch-2.27-1.el6.noarch.rpm
perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
perl-MIME-Lite-3.027-2.el6.noarch.rpm
perl-MIME-Types-1.28-2.el6.noarch.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm

[root@server4 MHA]# yum install -y *.rpm

5. Create the monitoring user

Run on the master (server1):

mysql> grant all on *.* to 'root'@'172.25.27.%' identified  by 'Mypasswd+2';

6. Create the MHA working directory and configuration file


[root@server4 MHA]# mkdir /etc/masterha
[root@server4 MHA]# cd /etc/masterha/
[root@server4 masterha]# vim app.cnf

[server default]
manager_workdir=/etc/masterha
manager_log=/etc/masterha/mha.log
master_binlog_dir=/var/lib/mysql
#master_ip_failover_script=/etc/masterha/master_ip_failover
#master_ip_online_change_script= /etc/masterha/master_ip_online_change
password=Mypasswd+2
user=root
ping_interval=1
remote_workdir=/tmp
repl_password=Mypasswd+1
repl_user=repl
#report_script=/usr/local/send_report
#secondary_check_script=/usr/bin/masterha_secondary_check -s 172.25.27.2 -s 172.25.27.3
#shutdown_script=""
ssh_user=root

[server1]
hostname=172.25.27.1
port=3306
#candidate_master=1
#check_repl_delay=0

[server2]
hostname=172.25.27.2
port=3306
#candidate_master=1
#check_repl_delay=0

[server3]
hostname=172.25.27.3
port=3306
no_master=1
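
The file uses ini syntax, so it can be read with a standard ini parser. The sketch below lists the hosts that are not excluded from promotion by no_master=1. It mirrors only that one setting; MHA's real selection also takes candidate_master, replication delay and other factors into account, and `promotable_hosts` is a hypothetical helper. The embedded text is a shortened copy of the configuration above.

```python
import configparser

APP_CNF = """\
[server default]
user=root
ssh_user=root
ping_interval=1

[server1]
hostname=172.25.27.1
port=3306
#candidate_master=1

[server2]
hostname=172.25.27.2
port=3306

[server3]
hostname=172.25.27.3
port=3306
no_master=1
"""


def promotable_hosts(cnf_text):
    """Return hostnames of server sections that do not set no_master=1."""
    # interpolation=None keeps '%' in values such as passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    hosts = []
    for section in parser.sections():
        if section == "server default":
            continue
        if parser.get(section, "no_master", fallback="0") == "1":
            continue
        hosts.append(parser.get(section, "hostname"))
    return hosts


print(promotable_hosts(APP_CNF))
```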

7. Install MHA Node on all database nodes:

##server1/2/3

[root@server1 ~]# ls
mha4mysql-node-0.56-0.el6.noarch.rpm
[root@server1 ~]# yum install -y mha4mysql-node-0.56-0.el6.noarch.rpm

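8. Configure passwordless SSH login

The servers must be able to log in to each other over SSH with keys, without a password prompt. Key-based login is only outlined here. Note: do not disable SSH password login, otherwise errors occur.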

[root@server4 ~]# ssh-keygen 
[root@server4 ~]# ssh-copy-id 172.25.27.1
[root@server4 ~]# ssh-copy-id 172.25.27.2
[root@server4 ~]# ssh-copy-id 172.25.27.3
[root@server4 ~]# ssh root@172.25.27.1
[root@server4 ~]# ssh root@172.25.27.2
[root@server4 ~]# ssh root@172.25.27.3

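Every database server also needs passwordless SSH access to every other one, so repeat the key setup on server1, server2, and server3 (not shown here).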
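
With three database servers, "every server to every other one" means six directed connections. The following sketch enumerates them and groups the ssh-copy-id commands by the host they have to be run on. It assumes the root account is used on all hosts, as in this setup; `ssh_pairs` and `setup_commands` are names chosen for the example.

```python
from itertools import permutations


def ssh_pairs(hosts):
    """All ordered (source, destination) pairs between distinct hosts."""
    return list(permutations(hosts, 2))


def setup_commands(hosts, user="root"):
    """ssh-copy-id invocations, keyed by the host they must be run on."""
    commands = {}
    for src, dst in ssh_pairs(hosts):
        commands.setdefault(src, []).append(f"ssh-copy-id {user}@{dst}")
    return commands


db_hosts = ["172.25.27.1", "172.25.27.2", "172.25.27.3"]
for src, cmds in setup_commands(db_hosts).items():
    print(src)
    for cmd in cmds:
        print("   ", cmd)
```

Each host still needs its own key pair (ssh-keygen) before these commands are run.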

9. Check the SSH configuration

Use the masterha_check_ssh tool to verify MHA's SSH setup:

[root@server4 ~]# masterha_check_ssh --conf=/etc/masterha/app.cnf
Mon Oct  9 11:15:21 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct  9 11:15:21 2017 - [info] Reading application default configuration from /etc/masterha/app.cnf..
Mon Oct  9 11:15:21 2017 - [info] Reading server configuration from /etc/masterha/app.cnf..
Mon Oct  9 11:15:21 2017 - [info] Starting SSH connection tests..
Mon Oct  9 11:15:22 2017 - [debug] 
Mon Oct  9 11:15:21 2017 - [debug]  Connecting via SSH from root@172.25.27.1(172.25.27.1:22) to root@172.25.27.2(172.25.27.2:22)..
Mon Oct  9 11:15:21 2017 - [debug]   ok.
Mon Oct  9 11:15:21 2017 - [debug]  Connecting via SSH from root@172.25.27.1(172.25.27.1:22) to root@172.25.27.3(172.25.27.3:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:22 2017 - [debug] 
Mon Oct  9 11:15:22 2017 - [debug]  Connecting via SSH from root@172.25.27.2(172.25.27.2:22) to root@172.25.27.1(172.25.27.1:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:22 2017 - [debug]  Connecting via SSH from root@172.25.27.2(172.25.27.2:22) to root@172.25.27.3(172.25.27.3:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:23 2017 - [debug] 
Mon Oct  9 11:15:22 2017 - [debug]  Connecting via SSH from root@172.25.27.3(172.25.27.3:22) to root@172.25.27.1(172.25.27.1:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:22 2017 - [debug]  Connecting via SSH from root@172.25.27.3(172.25.27.3:22) to root@172.25.27.2(172.25.27.2:22)..
Mon Oct  9 11:15:22 2017 - [debug]   ok.
Mon Oct  9 11:15:23 2017 - [info] All SSH connection tests passed successfully.

The output must not contain any error; an error means passwordless SSH login is not configured correctly.
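
When the check runs from a script, the same rule can be applied to the captured output. This sketch relies only on the log format visible above (a `[level]` tag after the timestamp and the final success line); `ssh_check_passed` is an illustrative helper, not an MHA command.

```python
import re

# Matches the level tag in lines such as "... 2017 - [info] ...".
LEVEL_RE = re.compile(r" - \[(\w+)\]")
SUCCESS_LINE = "All SSH connection tests passed successfully."


def ssh_check_passed(log_text):
    """True if the success line is present and no [error] line appears."""
    levels = LEVEL_RE.findall(log_text)
    return SUCCESS_LINE in log_text and "error" not in levels


good = """\
Mon Oct  9 11:15:21 2017 - [info] Starting SSH connection tests..
Mon Oct  9 11:15:21 2017 - [debug]   ok.
Mon Oct  9 11:15:23 2017 - [info] All SSH connection tests passed successfully.
"""
print(ssh_check_passed(good))
```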

10. Check the replication environment

Use the masterha_check_repl script to check the state of the whole cluster:

[root@server4 ~]# yum install -y mysql-server
[root@server4 ~]# mysql -h 172.25.27.1 -u repl -pMypasswd+1
[root@server4 ~]# mysql -h 172.25.27.1 -u root -pMypasswd+2
[root@server4 ~]# masterha_check_repl --conf=/etc/masterha/app.cnf
Mon Oct  9 11:32:21 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Oct  9 11:32:21 2017 - [info] Reading application default configuration from /etc/masterha/app.cnf..
Mon Oct  9 11:32:21 2017 - [info] Reading server configuration from /etc/masterha/app.cnf..
Mon Oct  9 11:32:21 2017 - [info] MHA::MasterMonitor version 0.56.
Mon Oct  9 11:32:22 2017 - [info] GTID failover mode = 1
Mon Oct  9 11:32:22 2017 - [info] Dead Servers:
Mon Oct  9 11:32:22 2017 - [info] Alive Servers:
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.1(172.25.27.1:3306)
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.2(172.25.27.2:3306)
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.3(172.25.27.3:3306)
Mon Oct  9 11:32:22 2017 - [info] Alive Slaves:
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.2(172.25.27.2:3306)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Mon Oct  9 11:32:22 2017 - [info]     GTID ON
Mon Oct  9 11:32:22 2017 - [info]     Replicating from 172.25.27.1(172.25.27.1:3306)
Mon Oct  9 11:32:22 2017 - [info]   172.25.27.3(172.25.27.3:3306)  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Mon Oct  9 11:32:22 2017 - [info]     GTID ON
Mon Oct  9 11:32:22 2017 - [info]     Replicating from 172.25.27.1(172.25.27.1:3306)
Mon Oct  9 11:32:22 2017 - [info]     Not candidate for the new Master (no_master is set)
Mon Oct  9 11:32:22 2017 - [info] Current Alive Master: 172.25.27.1(172.25.27.1:3306)
Mon Oct  9 11:32:22 2017 - [info] Checking slave configurations..
Mon Oct  9 11:32:22 2017 - [info]  read_only=1 is not set on slave 172.25.27.2(172.25.27.2:3306).
Mon Oct  9 11:32:22 2017 - [info]  read_only=1 is not set on slave 172.25.27.3(172.25.27.3:3306).
Mon Oct  9 11:32:22 2017 - [info] Checking replication filtering settings..
Mon Oct  9 11:32:22 2017 - [info]  binlog_do_db= test, binlog_ignore_db= mysql
Mon Oct  9 11:32:22 2017 - [info]  Replication filtering check ok.
Mon Oct  9 11:32:22 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct  9 11:32:22 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct  9 11:32:22 2017 - [info] HealthCheck: SSH to 172.25.27.1 is reachable.
Mon Oct  9 11:32:22 2017 - [info] 
172.25.27.1(172.25.27.1:3306) (current master)
 +--172.25.27.2(172.25.27.2:3306)
 +--172.25.27.3(172.25.27.3:3306)

Mon Oct  9 11:32:22 2017 - [info] Checking replication health on 172.25.27.2..
Mon Oct  9 11:32:22 2017 - [info]  ok.
Mon Oct  9 11:32:22 2017 - [info] Checking replication health on 172.25.27.3..
Mon Oct  9 11:32:22 2017 - [info]  ok.
Mon Oct  9 11:32:22 2017 - [warning] master_ip_failover_script is not defined.
Mon Oct  9 11:32:22 2017 - [warning] shutdown_script is not defined.
Mon Oct  9 11:32:22 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

There are no errors, only two warnings, and replication is reported as healthy.

11. Check the MHA Manager status

Use the masterha_check_status script to check the Manager status:

[root@server4 ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app is stopped(2:NOT_RUNNING).

Note: when monitoring is running, the output contains "PING_OK"; otherwise it shows "NOT_RUNNING", which means MHA monitoring has not been started.
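
For use in a monitoring script, the status line can be reduced to its numeric code and name. The sketch below handles the two output forms that appear in this walkthrough; `parse_status` is a made-up helper and other status values are not covered here.

```python
import re

# Captures the "(code:NAME)" part, e.g. "(2:NOT_RUNNING)" or "(0:PING_OK)".
STATUS_RE = re.compile(r"\((\d+):([A-Z_]+)\)")


def parse_status(line):
    """Return (code, name) from a masterha_check_status output line."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    return int(match.group(1)), match.group(2)


print(parse_status("app is stopped(2:NOT_RUNNING)."))
print(parse_status("app (pid:1272) is running(0:PING_OK), master:172.25.27.1"))
```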

12. Start MHA Manager monitoring

[root@server4 ~]# nohup masterha_manager --conf=/etc/masterha/app.cnf &
[1] 1272
[root@server4 ~]# nohup: ignoring input and appending output to `nohup.out'

[root@server4 ~]#

Check that MHA Manager monitoring is running:

[root@server4 ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app (pid:1272) is running(0:PING_OK), master:172.25.27.1

Monitoring is active, and the current master is 172.25.27.1.

13. Check the startup log

[root@server4 ~]# tail -n20 /etc/masterha/mha.log 
Mon Oct  9 11:39:19 2017 - [info] Checking slave configurations..
Mon Oct  9 11:39:19 2017 - [info]  read_only=1 is not set on slave 172.25.27.2(172.25.27.2:3306).
Mon Oct  9 11:39:19 2017 - [info]  read_only=1 is not set on slave 172.25.27.3(172.25.27.3:3306).
Mon Oct  9 11:39:19 2017 - [info] Checking replication filtering settings..
Mon Oct  9 11:39:19 2017 - [info]  binlog_do_db= test, binlog_ignore_db= mysql
Mon Oct  9 11:39:19 2017 - [info]  Replication filtering check ok.
Mon Oct  9 11:39:19 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct  9 11:39:19 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct  9 11:39:19 2017 - [info] HealthCheck: SSH to 172.25.27.1 is reachable.
Mon Oct  9 11:39:19 2017 - [info] 
172.25.27.1(172.25.27.1:3306) (current master)
 +--172.25.27.2(172.25.27.2:3306)
 +--172.25.27.3(172.25.27.3:3306)

Mon Oct  9 11:39:19 2017 - [warning] master_ip_failover_script is not defined.
Mon Oct  9 11:39:19 2017 - [warning] shutdown_script is not defined.
Mon Oct  9 11:39:19 2017 - [info] Set master ping interval 1 seconds.
Mon Oct  9 11:39:19 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon Oct  9 11:39:19 2017 - [info] Starting ping health check on 172.25.27.1(172.25.27.1:3306)..
Mon Oct  9 11:39:19 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

The line "Ping(SELECT) succeeded, waiting until MySQL doesn't respond.." shows that monitoring of the master has started.

14. Master failure test

## On server2 and server3: check the current master
[root@server2 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 6
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60

The master is 172.25.27.1.
Next, simulate a failure of 172.25.27.1:

[root@server4 ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app (pid:1272) is running(0:PING_OK), master:172.25.27.1

[root@server1 ~]# ps -ax | grep mysql
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
  922 ?        S      0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
 1236 ?        Sl     0:03 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 1706 pts/0    S+     0:00 grep mysql

[root@server1 ~]# kill -9 922 1236
[root@server1 ~]# ps -ax | grep mysql
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
 1713 pts/0    S+     0:00 grep mysql

Now check the master again on server3:

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.2
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 774
               Relay_Log_File: server3-relay-bin.000002
                Relay_Log_Pos: 695
        Relay_Master_Log_File: binlog.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

The master has switched to server2.
Next, repair server1 and rejoin it to the cluster as a slave:

[root@server1 ~]# /etc/init.d/mysqld start
Starting mysqld:                                           [  OK  ]
[root@server1 ~]# mysql -p
Enter password: 
mysql> change master to master_host='172.25.27.2', master_user='repl', master_password='Mypasswd+1', master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.05 sec)

mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.2
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 774
               Relay_Log_File: server1-relay-bin.000003
                Relay_Log_Pos: 735
        Relay_Master_Log_File: binlog.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

15. Manual online switchover

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.2
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 774
               Relay_Log_File: server3-relay-bin.000002
                Relay_Log_Pos: 695
        Relay_Master_Log_File: binlog.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

The master is server2. Next, manually switch the master back to server1:

[root@server4 ~]# masterha_master_switch --conf=/etc/masterha/app.cnf --master_state=alive --new_master_host=172.25.27.1 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

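The tool asks for confirmation; answer yes three times.
Meaning of the options:
--orig_master_is_new_slave: after the switch, the original master becomes a slave of the new master. Without this option, the original master is left out of the new replication topology.

--running_updates_limit=10000: the switchover aborts if the candidate master is lagging. With this option, the switch is allowed as long as the lag stays within the given number of seconds. How long the switch actually takes depends on the size of the relay logs that have to be applied during recovery.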
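
The lag rule can be pictured as a simple threshold test. The sketch below is a simplified model, not MHA's implementation: it assumes the lag of each slave is available as a number of seconds (as in Seconds_Behind_Master, which is NULL when replication is not running), and `switchover_allowed` is a hypothetical name.

```python
def switchover_allowed(seconds_behind, limit):
    """Return (allowed, offenders) for a {host: lag_in_seconds} dict.

    A lag of None stands for an unknown lag and always blocks the switch.
    """
    offenders = sorted(
        host
        for host, lag in seconds_behind.items()
        if lag is None or lag > limit
    )
    return (not offenders, offenders)


# Hypothetical lag values for the two slaves.
lag = {"172.25.27.1": 0, "172.25.27.3": 5}
print(switchover_allowed(lag, limit=10000))
print(switchover_allowed(lag, limit=1))
```

A limit of 10000 seconds, as used in the command above, effectively disables the check in a lab; on a production system a much smaller value is the safer choice.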

[root@server2 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000004
          Read_Master_Log_Pos: 896
               Relay_Log_File: server3-relay-bin.000002
                Relay_Log_Pos: 405
        Relay_Master_Log_File: binlog.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Viewed from server2, the master is now server1 and server2 has become a slave, so the switchover succeeded.

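Check the manager log. Its tail still shows the report of the automatic failover from step 14 (172.25.27.1 to 172.25.27.2):

[root@server4 ~]# tail -n20 /etc/masterha/mha.log
Mon Oct  9 11:55:33 2017 - [info]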
Mon Oct  9 11:55:33 2017 - [info] Resetting slave info on the new master..
Mon Oct  9 11:55:33 2017 - [info]  172.25.27.2: Resetting slave info succeeded.
Mon Oct  9 11:55:33 2017 - [info] Master failover to 172.25.27.2(172.25.27.2:3306) completed successfully.
Mon Oct  9 11:55:33 2017 - [info] 

----- Failover Report -----

app: MySQL Master failover 172.25.27.1(172.25.27.1:3306) to 172.25.27.2(172.25.27.2:3306) succeeded

Master 172.25.27.1(172.25.27.1:3306) is down!

Check MHA Manager logs at server4:/etc/masterha/mha.log for details.

Started automated(non-interactive) failover.
Selected 172.25.27.2(172.25.27.2:3306) as a new master.
172.25.27.2(172.25.27.2:3306): OK: Applying all logs succeeded.
172.25.27.3(172.25.27.3:3306): OK: Slave started, replicating from 172.25.27.2(172.25.27.2:3306)
172.25.27.2(172.25.27.2:3306): Resetting slave info succeeded.
Master failover to 172.25.27.2(172.25.27.2:3306) completed successfully.

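16. Manual failover

As shown earlier, while MHA Manager monitoring is running it reacts to a master failure by promoting one of the slaves automatically. After one successful failover, however, the Manager process exits. If the new master then fails, no automatic failover happens and the switch has to be done by hand.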

Check the MHA Manager monitoring status:

[root@server4 ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app is stopped(2:NOT_RUNNING).

It is indeed stopped.
Check which host is the master:

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 5
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.1
                  Master_User: repl
                  Master_Port: 3306

server1 is the master. Next, simulate a master failure:

[root@server1 ~]# ps -ax | grep mysql
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
 1742 pts/0    S      0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
 2060 pts/0    Sl     0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 2113 pts/0    S+     0:00 grep mysql
[root@server1 ~]# kill -9 1742 2060
[root@server1 ~]# ps -ax | grep mysql
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
 2115 pts/0    S+     0:00 grep mysql

Check the state of server2 and server3:

[root@server2 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: 172.25.27.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000004
          Read_Master_Log_Pos: 896
               Relay_Log_File: server2-relay-bin.000002
                Relay_Log_Pos: 405
        Relay_Master_Log_File: binlog.000004
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes


[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: 172.25.27.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000004
          Read_Master_Log_Pos: 896
               Relay_Log_File: server3-relay-bin.000002
                Relay_Log_Pos: 405
        Relay_Master_Log_File: binlog.000004
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes

Both slaves still point to server1, but server1 is down and cannot serve requests. The candidate master has to be promoted manually:

[root@server4 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app.cnf --dead_master_host=172.25.27.1 --dead_master_port=3306 --new_master_host=172.25.27.2 --new_master_port=3306 --ignore_last_failover
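
The command line is long and easy to mistype during an outage. One option is to assemble it from the few values that change. The sketch below only builds the string and uses exactly the options from the command above; `manual_failover_command` is a hypothetical helper, and running the result is left to the operator.

```python
import shlex


def manual_failover_command(conf, dead_host, new_host, port=3306):
    """Build the argv for a manual failover away from a dead master."""
    return [
        "masterha_master_switch",
        "--master_state=dead",
        f"--conf={conf}",
        f"--dead_master_host={dead_host}",
        f"--dead_master_port={port}",
        f"--new_master_host={new_host}",
        f"--new_master_port={port}",
        "--ignore_last_failover",
    ]


argv = manual_failover_command(
    "/etc/masterha/app.cnf", "172.25.27.1", "172.25.27.2"
)
print(shlex.join(argv))
```

Keeping the arguments as a list also makes it possible to pass them to subprocess.run without going through a shell.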

Check that the new master has taken over:

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.2
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 774
               Relay_Log_File: server3-relay-bin.000002
                Relay_Log_Pos: 405
        Relay_Master_Log_File: binlog.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

server2 has taken over as master, so the switch succeeded.
Check the log:

[root@server4 ~]# tail -n20 /etc/masterha/mha.log
Mon Oct  9 11:55:33 2017 - [info]
Mon Oct  9 11:55:33 2017 - [info] Resetting slave info on the new master..
Mon Oct  9 11:55:33 2017 - [info]  172.25.27.2: Resetting slave info succeeded.
Mon Oct  9 11:55:33 2017 - [info] Master failover to 172.25.27.2(172.25.27.2:3306) completed successfully.
Mon Oct  9 11:55:33 2017 - [info] 

----- Failover Report -----

app: MySQL Master failover 172.25.27.1(172.25.27.1:3306) to 172.25.27.2(172.25.27.2:3306) succeeded

Master 172.25.27.1(172.25.27.1:3306) is down!

Check MHA Manager logs at server4:/etc/masterha/mha.log for details.

Started automated(non-interactive) failover.
Selected 172.25.27.2(172.25.27.2:3306) as a new master.
172.25.27.2(172.25.27.2:3306): OK: Applying all logs succeeded.
172.25.27.3(172.25.27.3:3306): OK: Slave started, replicating from 172.25.27.2(172.25.27.2:3306)
172.25.27.2(172.25.27.2:3306): Resetting slave info succeeded.
Master failover to 172.25.27.2(172.25.27.2:3306) completed successfully.

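Next, repair server1 by hand and rejoin it to the cluster as a slave; the procedure is the same as in step 14.

17. Configure a VIP

A VIP can be managed in two ways: with keepalived, which floats the virtual IP between hosts, or with scripts that bring the virtual IP up and down directly (no keepalived, heartbeat, or similar software required).

1. Managing the VIP with scripts

This approach edits the master_ip_failover and master_ip_online_change scripts referenced in app.cnf (here under /etc/masterha/). They are written in Perl, but any other language, such as PHP, would also work; a PHP version is not covered here. The finished scripts follow. Note that with script-managed VIPs, the VIP has to be bound manually on the master server the first time.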

[root@server4 ~]# vim /etc/masterha/app.cnf
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script=/etc/masterha/master_ip_online_change
## uncomment these two lines

[root@server4 ~]# cd /etc/masterha/
[root@server4 masterha]# vim master_ip_online_change

#!/usr/bin/env perl
use strict;
use warnings FATAL =>'all';

use Getopt::Long;

my $vip = '172.25.27.100/24';  # Virtual IP  
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $exit_code = 0;

my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user,
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);


exit &main();

sub main {

    #print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Disabling the VIP - $vip on old master: $orig_master_host\n";
            print "***************************************************************\n\n\n\n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Enabling the VIP - $vip on new master: $new_master_host \n";
            print "***************************************************************\n\n\n\n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_online_change --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

[root@server4 masterha]# vim master_ip_failover

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '172.25.27.100/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@server4 masterha]# chmod +x master_ip_*
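
Stripped of argument parsing and error handling, both Perl scripts above come down to two remote commands: take the VIP alias down on the old master, then bring it up on the new one. The sketch below only prints those commands so they can be reviewed before a switch. It assumes the interface eth0 and alias key 1 used in the scripts; `vip_commands` is a name invented for this example.

```python
def vip_commands(vip, key, ssh_user, orig_host, new_host, iface="eth0"):
    """Return the stop and start commands in the order they are executed."""
    start = f"/sbin/ifconfig {iface}:{key} {vip}"
    stop = f"/sbin/ifconfig {iface}:{key} down"
    return [
        f'ssh {ssh_user}@{orig_host} " {stop} "',
        f'ssh {ssh_user}@{new_host} " {start} "',
    ]


for cmd in vip_commands(
    "172.25.27.100/24", "1", "root", "172.25.27.2", "172.25.27.1"
):
    print(cmd)
```

Stopping before starting matters: if the VIP were raised on the new master while the old one still held it, both hosts would answer for the same address.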

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.2
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 774
               Relay_Log_File: server3-relay-bin.000002
                Relay_Log_Pos: 405
        Relay_Master_Log_File: binlog.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

[root@server4 masterha]# masterha_master_switch --conf=/etc/masterha/app.cnf --master_state=alive --new_master_host=172.25.27.1 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

[root@server3 ~]# mysql -pMypasswd+1 -e 'show slave status\G' | head -n 13
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.27.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000005
          Read_Master_Log_Pos: 234
               Relay_Log_File: server3-relay-bin.000002
                Relay_Log_Pos: 361
        Relay_Master_Log_File: binlog.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes


[root@server1 ~]# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:ac:4c:af brd ff:ff:ff:ff:ff:ff
    inet 172.25.27.1/24 brd 172.25.27.255 scope global eth0
    inet 172.25.27.100/24 brd 172.25.27.255 scope global secondary eth0:1
    inet6 fe80::5054:ff:feac:4caf/64 scope link 
       valid_lft forever preferred_lft forever

The switchover succeeded and the virtual IP is now bound on server1; the service is reachable through the virtual IP:

[root@server4 ~]# mysql -h 172.25.27.100 -u root -pMypasswd+2
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+
5 rows in set (0.00 sec)

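The master_ip_failover script written above can be exercised in the same way by simulating a forced master failover; that test is left to the reader.

2. Managing the VIP with keepalived

Managing the VIP with keepalived will be covered in a later article.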
