MYSQL高可用架构之MHA的搭建

#############################相关概念##############################

MySQL MHA架构介绍:

MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于 Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。在MySQL故障切换过程中,MHA能做到在 0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。

该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)。MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。MHA Node运行在每台MySQL服务器上,MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

在MHA自动故障切换过程中,MHA试图从宕机(down机:指电脑不能正常工作)的主服务器上保存二进制日志,最大程度的保证数据的不丢失,但这并不总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失了最新的数据。使用MySQL5.5的半同步复制,可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性。

目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库,因为至少需要三台服务器,出于机器成本的考虑,淘宝也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。

MHA集群架构图:

MYSQL高可用架构之MHA的搭建_第1张图片

MHA工作原理总结为以下几条:

(1)从宕机崩溃的master保存二进制日志事件(binlog events);

(2)识别含有最新更新的slave;

(3)应用差异的中继日志(relay log) 到其他slave;

(4)应用从master保存的二进制日志事件(binlog events);

(5)提升一个slave为新master;

(6)使用其他的slave连接新的master进行复制。

MHA软件由两部分组成,Manager工具包和Node工具包,具体的说明如下。

Manager工具包主要包括以下几个工具:

masterha_check_ssh              检查MHA的SSH配置状况
masterha_check_repl             检查MySQL复制状况
masterha_manger                 启动MHA
masterha_check_status           检测当前MHA运行状态
masterha_master_monitor         检测master是否宕机
masterha_master_switch          控制故障转移(自动或者手动)
masterha_conf_host              添加或删除配置的server信息

Node工具包(这些工具通常由MHA Manager的脚本触发,无需人为操作)主要包括以下几个工具:

save_binary_logs                保存和复制master的二进制日志
apply_diff_relay_logs           识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog              去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs                清除中继日志(不会阻塞SQL线程)

注意:

为了尽可能的减少主库硬件损坏宕机造成的数据丢失,因此在配置MHA的同时建议配置成MySQL 5.7的半同步复制。

参考文档:
MHA原理:https://code.google.com/p/mysql-master-ha/wiki/HowMHAWorks
MHA原理PPT:http://www.slideshare.net/matsunobu/automated-master-failover
Linux配置代理方法:http://blog.csdn.net/bojie5744/article/details/42148719

#############################MHA搭建###################################

配置环境:

版本:redhat6.5
master:server1(172.25.9.1)
Candicate slave:server2(172.25.9.2)
slave:server3(172.25.9.3)
manager:server4(172.25.9.4)

角色                    ip地址          主机名          server_id                  类型
Master                  172.25.9.1    server1            1                      写入
Candicate master        172.25.9.2    server2            2                      读
Slave                   172.25.9.3    server3            3                      读
Monitor host            172.25.9.4    server4            4                      监控复制组

其中master对外提供写服务,备选master(实际的slave,主机名server03)提供读服务,slave也提供相关的读服务,一旦master宕机,将会把备选master提升为新的master,slave指向新的master。

一.搭建主从复制(一主两从)

    点击查看详细过程(主从复制在这里不再赘述)

二.安装MHA软件

(1)三个节点(server1,server2,server3)安装

mha4mysql-node-0.56-0.el6.noarch.rpm

root@server1 mha]# rpm -qa | grep mha4   ##安装的mha node软件
mha4mysql-node-0.56-0.el6.noarch
[root@server1 mha]# rpm -qa | grep perl   ##需要的依赖包
perl-DBD-MySQL-4.013-3.el6.x86_64
perl-Pod-Escapes-1.04-136.el6.x86_64
perl-Pod-Simple-3.13-136.el6.x86_64
perl-Module-Pluggable-3.90-136.el6.x86_64
perl-DBI-1.609-4.el6.x86_64
perl-Error-0.17015-4.el6.noarch
perl-Git-1.7.1-3.el6_4.1.noarch
perl-libs-5.10.1-136.el6.x86_64
perl-version-0.77-136.el6.x86_64
perl-5.10.1-136.el6.x86_64

(2)管理节点安装manager

mha4mysql-manager-0.56-0.el6.noarch

MYSQL高可用架构之MHA的搭建_第2张图片

(3)免密配置

方法一(运用rsync(两边都要装此命令)命令):

root@server4 ~]# ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
11:bd:98:f5:22:27:33:a0:df:46:85:de:cd:7d:1c:b4 root@server4
The key's randomart image is:
+--[ RSA 2048]----+
|        .o    .. |
|      . ..+    ..|
|     . o.* = . E.|
|    .   X.= + . o|
|     . oS* .   . |
|      . o        |
|       .         |
|                 |
|                 |
+-----------------+
[root@server4 ~]# cd /root/.ssh/
[root@server4 .ssh]# yum install -y rsync
[root@server4 .ssh]# ssh-copy-id server4
The authenticity of host 'server4 (172.25.254.4)' can't be established.
RSA key fingerprint is 57:9d:a3:b0:00:cb:7e:c0:8a:a5:75:55:de:53:19:7b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'server4,172.25.254.4' (RSA) to the list of known hosts.
root@server4's password: 
Now try logging into the machine, with "ssh 'server4'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@server4 .ssh]# rsync -p * server1:/root/.ssh/
root@server1's password: 
[root@server4 .ssh]# rsync -p * server2:/root/.ssh/
root@server2's password: 
[root@server4 .ssh]# rsync -p * server3:/root/.ssh/
root@server3's password: 
[root@server4 .ssh]# ssh server1
[root@server1 ~]# logout
Connection to server1 closed.
[root@server4 .ssh]# ssh server2
Last login: Wed Jul 11 23:34:42 2018 from server1
[root@server2 ~]# logout
Connection to server2 closed.
[root@server4 .ssh]# ssh server3
Last login: Wed Jul 11 23:35:15 2018 from server4
[root@server3 ~]# logout
Connection to server3 closed.

方法二:

四台机子都需要配置比较麻烦

root@server1 ~]# ssh-keygen -t rsa ##Enter 即可,选择默认方式
[root@server1 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
[root@server1 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
[root@server1 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]

三.MHA配置

manager节点配置:

(1)新建目录,并创建文件

root@server4 masterha]# pwd
/etc/masterha    ##此目录为mkdir目录
[root@server4 masterha]# vim app.cnf

编写内容:

[server default]
manager_log=/etc/masterha/mha.log
manager_workdir=/etc/masterha/
master_binlog_dir=/var/lib/mysql
#master_ip_online_change_script=/etc/masterha/master_ip_online_change
password=Chao@199512
ping_interval=1
remote_workdir=/tmp
repl_password=redhat
repl_user=wuyanzu
ssh_user=root
user=root

[server1]
hostname=172.25.9.1
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=172.25.9.2
port=3306

[server3]
hostname=172.25.9.3
port=3306

slave节点(server2和server3)配置:

server2和server3配置relay log的清除方式和slave配置只读,但不要写入配置文件,因为master机down掉后可能随时会升级成master。

root@server2 mysql]# mysql -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.

mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global read_only=on;
Query OK, 0 rows affected (0.00 sec)

manager节点检测ssh配置:

root@server4 ~]# masterha_check_ssh --conf=/etc/masterha/app.cnf

Wed Aug 09 20:55:27 2018 - [info] All SSH connection tests passed successfully.

manager节点检测repl环境:

root@server4 ~]# masterha_check_repl --conf=/etc/masterha/app.cnf 
Wed Aug 09 23:56:50 2018 - [warning] Global configuration file
MySQL Replication Health is NOT OK!


 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 172.25.9.2(172.25.9.2:3306) :1045:Access denied for user 'root'@'server4' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297
Thu Aug 09 00:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 172.25.9.1(172.25.254.1:3306) :1045:Access denied for user 'root'@'server4' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Aug 09 10:00:50 2018 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

我们会发现这时候报错了,那么如何解决呢?

解决方法:master机(server1)给与监控权限,然后再manager机进行测试就会发现repl检测成功。

root@server1 ~]# mysql -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.7.17-log MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

mysql> grant all on *.* to root@'172.25.9.%' identified by 'Chao@199512';
Query OK, 0 rows affected, 1 warning (0.13 sec)
root@server4 ~]# masterha_check_repl --conf=/etc/masterha/app.cnf 
Thu Aug 09 23:04:51 2018 - [info] Checking replication health on 172.25.254.2..
Thu Aug 09 23:04:51 2018 - [info]  ok.
Thu Aug 09 23:04:51 2018 - [info] Checking replication health on 172.25.254.3..
Thu Aug 09 23:04:51 2018 - [info]  ok.
Thu Aug 09 23:04:51 2018 - [warning] master_ip_failover_script is not defined.
Thu Aug 09 23:04:51 2018 - [warning] shutdown_script is not defined.
Thu Aug 09 23:04:51 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

四.测试

manager机(server4)开启监控:

root@server4 ~]# nohup masterha_manager --conf=/etc/masterha/app.cnf &
[1] 1201
[root@server4 ~]# nohup: ignoring input and appending output to `nohup.out'

将master机(server1)的mysql down掉:

1527 pts/0    S      0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysq
 1772 pts/0    Sl     0:02 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/m
 1969 pts/0    R+     0:00 ps ax
[root@server1 ~]# kill -9 1527
[root@server1 ~]# kill -9 1772
或者
[root@server1 ~]# killall -9 mysqld_safe ##这样就会直接跳过去
[root@server1 ~]# killall -9 mysqld

查看manager机(server4)会自动生成日志等文件:

root@server4 masterha]# ls
app.cnf  app.failover.complete  mha.log
[root@server4 masterha]# cat mha.log
----- Failover Report -----

app: MySQL Master failover 172.25.9.1(172.25.9.1:3306) to 172.25.9.2(172.25.9.2:3306) succeeded

Master 172.25.9.1(172.25.9.1:3306) is down!

Check MHA Manager logs at server4:/etc/masterha/mha.log for details.

Started automated(non-interactive) failover.
Selected 172.25.9.2(172.25.9.2:3306) as a new master.
172.25.9.2(172.25.9.2:3306): OK: Applying all logs succeeded.
172.25.9.3(172.25.9.3:3306o_master=1): OK: Slave started, replicating from 172.25.9.2(172.25.9.2:3306)
172.25.254.2(172.25.254.2:3306): Resetting slave info succeeded.
Master failover to 172.25.254.2(172.25.254.2:3306) completed successfully.

日志文件中提示master已经由server2接管,此时我们在server2和server3分别查看,可知:

在server2上面查看,会有master的状态,没有slave状态,server3上面查看,master和slave状态都有。

这时我们将serve1的mysql开启,并将他手动设置为slave,指向新的master。

[root@server1 ~]# mysql -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.17-log MySQL Community Server (GPL)

mysql> show slave status;
Empty set (0.00 sec)

mysql> change master to master_host='172.25.9.2',master_user='root',master_password='Chao@199512',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.34 sec)

mysql> start slave;
Query OK, 0 rows affected (0.03 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.9.2
                  Master_User: root
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 710
               Relay_Log_File: server1-relay-bin.000002
                Relay_Log_Pos: 923
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

 

 

你可能感兴趣的:(运维,linux)