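Introduction to the MySQL MHA architecture:
MHA (Master High Availability) is a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later joined Facebook) and handles failover and slave promotion in a MySQL high-availability environment. When the master fails, MHA can complete the database failover automatically within 30 seconds, and during the failover it preserves data consistency as far as possible.
The software has two components: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server. MHA Manager probes the master of each cluster at regular intervals; when the master fails, it automatically promotes the slave with the most recent data to be the new master and repoints all other slaves at it. The failover is intended to be transparent to applications, provided client traffic is redirected to the new master (for example through a virtual IP).
During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. If the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it performs the failover anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it: as long as at least one slave has received the latest binary log events, MHA can apply them to all the other slaves and so keep every node consistent.
MHA currently targets a one-master, multiple-slave topology. A replication cluster managed by MHA needs at least three database servers: one master, one candidate master, and one further slave. Because three servers are the minimum, Taobao modified MHA to reduce hardware cost; its TMHA variant supports one master and one slave.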
[Figure: MHA cluster architecture diagram]
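MHA's failover procedure can be summarized as follows:
(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave with the most recent updates;
(3) Apply the differential relay logs to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.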
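Step (2) can be illustrated with a small shell sketch. This is a simplification and not MHA's own code: it assumes that each slave's Master_Log_File and Read_Master_Log_Pos (from SHOW SLAVE STATUS) have already been collected into lines of the form `host file position`. The function name `pick_latest_slave` and the sample positions are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: choose the slave that has read furthest into the
# master's binary log. Input on stdin: one "host log_file log_pos" per line.
pick_latest_slave() {
    # Sort by binlog file name, then numerically by position; the last
    # line belongs to the most advanced slave. LC_ALL=C keeps the order stable.
    LC_ALL=C sort -t ' ' -k2,2 -k3,3n | tail -n 1 | cut -d ' ' -f 1
}

# Example input (positions are illustrative, not taken from a real cluster).
printf '%s\n' \
    '172.25.9.2 mysql-bin.000002 710' \
    '172.25.9.3 mysql-bin.000002 154' | pick_latest_slave
```

Sorting on the file name works because MySQL pads the binlog sequence number to a fixed width; the position must be compared numerically, which is why the third key uses `n`.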
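MHA is delivered as two packages, the Manager package and the Node package, described below.
The Manager package mainly contains the following tools:
masterha_check_ssh       checks MHA's SSH configuration
masterha_check_repl      checks the MySQL replication status
masterha_manager         starts MHA
masterha_check_status    reports the current MHA running state
masterha_master_monitor  detects whether the master is down
masterha_master_switch   controls failover (automatic or manual)
masterha_conf_host       adds or removes server entries in the configuration
The Node package mainly contains the following tools (they are normally invoked by MHA Manager's scripts and need no manual operation):
save_binary_logs         saves and copies the master's binary logs
apply_diff_relay_logs    identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog       strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs         purges relay logs (without blocking the SQL thread)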
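Note:
To minimize data loss when the master goes down because of a hardware failure, it is recommended to configure MySQL 5.7 semi-synchronous replication alongside MHA.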
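As a rough sketch of what enabling semi-synchronous replication involves, the helper below prints the SQL for each role so it can be reviewed and then piped into the mysql client. The function name `semisync_sql` is hypothetical; the plugin and variable names are the ones used by MySQL 5.5 through 5.7. `SET GLOBAL` does not survive a restart, so a permanent setup would also need the corresponding entries in my.cnf.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that enables semi-synchronous
# replication for the given role ("master" or "slave").
semisync_sql() {
    case "$1" in
        master)
            echo "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';"
            echo "SET GLOBAL rpl_semi_sync_master_enabled = 1;"
            ;;
        slave)
            echo "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';"
            echo "SET GLOBAL rpl_semi_sync_slave_enabled = 1;"
            # Restart the IO thread so the slave registers as semi-sync.
            echo "STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;"
            ;;
        *)
            echo "usage: semisync_sql master|slave" >&2
            return 1
            ;;
    esac
}

# Usage (not run here): semisync_sql master | mysql -uroot -p
semisync_sql master
```

Since the candidate master can be promoted at any time, it is reasonable to install both plugins on it so that it can act in either role.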
References:
How MHA works: https://code.google.com/p/mysql-master-ha/wiki/HowMHAWorks
MHA slides: http://www.slideshare.net/matsunobu/automated-master-failover
Configuring a proxy on Linux: http://blog.csdn.net/bojie5744/article/details/42148719
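Environment: Red Hat Enterprise Linux 6.5
master: server1 (172.25.9.1)
candidate master: server2 (172.25.9.2)
slave: server3 (172.25.9.3)
manager: server4 (172.25.9.4)
Role              IP address   Hostname  server_id  Purpose
Master            172.25.9.1   server1   1          writes
Candidate master  172.25.9.2   server2   2          reads
Slave             172.25.9.3   server3   3          reads
Monitor host      172.25.9.4   server4   4          monitors the replication group
The master serves writes. The candidate master (in practice a slave, hostname server2) serves reads, and so does the slave. If the master goes down, the candidate master is promoted to be the new master and the slave is repointed at it.
Setting up master-slave replication itself is not covered here.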
(1) Install mha4mysql-node-0.56-0.el6.noarch.rpm on the three database nodes (server1, server2, server3)
[root@server1 mha]# rpm -qa | grep mha4 ## the installed MHA Node package
mha4mysql-node-0.56-0.el6.noarch
[root@server1 mha]# rpm -qa | grep perl ## required dependency packages
perl-DBD-MySQL-4.013-3.el6.x86_64
perl-Pod-Escapes-1.04-136.el6.x86_64
perl-Pod-Simple-3.13-136.el6.x86_64
perl-Module-Pluggable-3.90-136.el6.x86_64
perl-DBI-1.609-4.el6.x86_64
perl-Error-0.17015-4.el6.noarch
perl-Git-1.7.1-3.el6_4.1.noarch
perl-libs-5.10.1-136.el6.x86_64
perl-version-0.77-136.el6.x86_64
perl-5.10.1-136.el6.x86_64
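(2) Install the manager package on the management node
mha4mysql-manager-0.56-0.el6.noarch
(3) Set up passwordless SSH
Method 1 (generate one key pair on server4 and copy the whole ~/.ssh directory to the other nodes with rsync, which must be installed on both ends):
[root@server4 ~]# ssh-keygen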
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
11:bd:98:f5:22:27:33:a0:df:46:85:de:cd:7d:1c:b4 root@server4
The key's randomart image is:
+--[ RSA 2048]----+
| .o .. |
| . ..+ ..|
| . o.* = . E.|
| . X.= + . o|
| . oS* . . |
| . o |
| . |
| |
| |
+-----------------+
[root@server4 ~]# cd /root/.ssh/
[root@server4 .ssh]# yum install -y rsync
[root@server4 .ssh]# ssh-copy-id server4
The authenticity of host 'server4 (172.25.9.4)' can't be established.
RSA key fingerprint is 57:9d:a3:b0:00:cb:7e:c0:8a:a5:75:55:de:53:19:7b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'server4,172.25.9.4' (RSA) to the list of known hosts.
root@server4's password:
Now try logging into the machine, with "ssh 'server4'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@server4 .ssh]# rsync -p * server1:/root/.ssh/
root@server1's password:
[root@server4 .ssh]# rsync -p * server2:/root/.ssh/
root@server2's password:
[root@server4 .ssh]# rsync -p * server3:/root/.ssh/
root@server3's password:
[root@server4 .ssh]# ssh server1
[root@server1 ~]# logout
Connection to server1 closed.
[root@server4 .ssh]# ssh server2
Last login: Wed Jul 11 23:34:42 2018 from server1
[root@server2 ~]# logout
Connection to server2 closed.
[root@server4 .ssh]# ssh server3
Last login: Wed Jul 11 23:35:15 2018 from server4
[root@server3 ~]# logout
Connection to server3 closed.
Method 2:
Every one of the four machines has to be configured this way, which is more tedious.
[root@server1 ~]# ssh-keygen -t rsa ## press Enter to accept the defaults
[root@server1 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
[root@server1 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
[root@server1 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
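Whichever method is used, a quick pre-check before running masterha_check_ssh is to confirm that key-based login works without a prompt. The sketch below is one way to do that; the function name `check_passwordless_ssh` is hypothetical, and it assumes logins as root, as in this walkthrough. `BatchMode=yes` makes ssh fail instead of asking for a password.

```shell
#!/bin/sh
# Hypothetical helper: confirm that key-based SSH works to every host given
# as an argument. Returns non-zero if any host still requires a password.
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage, from any node (hostnames as in this walkthrough):
#   check_passwordless_ssh server1 server2 server3 server4
```

This only tests connections from the node it runs on. MHA needs every node to reach every other node, so the same check would have to be repeated on each machine.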
Manager node configuration:
(1) Create the directory and the configuration file
[root@server4 masterha]# pwd
/etc/masterha ## created beforehand with mkdir
[root@server4 masterha]# vim app.cnf
File contents:
[server default]
manager_log=/etc/masterha/mha.log
manager_workdir=/etc/masterha/
master_binlog_dir=/var/lib/mysql
#master_ip_online_change_script=/etc/masterha/master_ip_online_change
password=Chao@199512
ping_interval=1
remote_workdir=/tmp
repl_password=redhat
repl_user=wuyanzu
ssh_user=root
user=root
[server1]
hostname=172.25.9.1
port=3306
[server2]
candidate_master=1
check_repl_delay=0
hostname=172.25.9.2
port=3306
[server3]
hostname=172.25.9.3
port=3306
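The configuration has one `[server default]` block with shared settings and one `[serverN]` block per MySQL node. As a small illustration of that layout, the sketch below lists the `hostname=` value of every node block, which is handy for feeding other checks. The function name `mha_hosts` is hypothetical, and the default path is simply the one used in this walkthrough.

```shell
#!/bin/sh
# Hypothetical helper: print the hostname= value of every server block
# in an MHA configuration file.
mha_hosts() {
    conf="${1:-/etc/masterha/app.cnf}"
    # Split on "=", tolerate spaces around it, and print the value only.
    awk -F '=' '/^[ \t]*hostname[ \t]*=/ {
        gsub(/[ \t]/, "", $2)
        print $2
    }' "$conf"
}

# Usage (not run here): mha_hosts /etc/masterha/app.cnf
```

With the file shown above, this would list the three database nodes in the order they appear.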
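Slave node (server2 and server3) configuration:
On server2 and server3, disable automatic relay log purging and make the slaves read-only. Set both at runtime rather than in the configuration file, because either slave may be promoted to master once the current master goes down.
[root@server2 mysql]# mysql -p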
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global read_only=on;
Query OK, 0 rows affected (0.00 sec)
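With relay_log_purge set to 0, relay logs are no longer removed automatically, so they are usually cleaned up periodically with the purge_relay_logs tool from the Node package. The sketch below prints a cron entry for that purpose. The function name `purge_cron_entry`, the schedule, the log path, and the MySQL account are all assumptions for illustration, and `--password` is left out on purpose; add credentials in whatever way suits your environment.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/cron.d entry that runs purge_relay_logs
# once a day. Arguments: hour, MySQL user, log file (all optional).
purge_cron_entry() {
    hour="${1:-5}"
    mysql_user="${2:-root}"
    log="${3:-/var/log/purge_relay_logs.log}"
    printf '0 %s * * * root /usr/bin/purge_relay_logs --user=%s --disable_relay_log_purge --workdir=/tmp >> %s 2>&1\n' \
        "$hour" "$mysql_user" "$log"
}

purge_cron_entry 5 root
```

Using a different hour on server2 and server3 avoids both slaves purging at the same moment, which keeps at least one full set of relay logs available for recovery.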
Check the SSH configuration from the manager node:
[root@server4 ~]# masterha_check_ssh --conf=/etc/masterha/app.cnf
Wed Aug 09 20:55:27 2018 - [info] All SSH connection tests passed successfully.
Check the replication environment from the manager node:
[root@server4 ~]# masterha_check_repl --conf=/etc/masterha/app.cnf
Wed Aug 09 23:56:50 2018 - [warning] Global configuration file
MySQL Replication Health is NOT OK!
at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 172.25.9.2(172.25.9.2:3306) :1045:Access denied for user 'root'@'server4' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297
Thu Aug 09 00:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 172.25.9.1(172.25.9.1:3306) :1045:Access denied for user 'root'@'server4' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Aug 09 10:00:50 2018 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
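The check fails with an access-denied error (1045) for 'root'@'server4'.
Fix: on the master (server1), grant the monitoring account access from the cluster subnet (the grant reaches the slaves through replication). Running the check again from the manager then succeeds.
[root@server1 ~]# mysql -p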
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.7.17-log MySQL Community Server (GPL)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
mysql> grant all on *.* to root@'172.25.9.%' identified by 'Chao@199512';
Query OK, 0 rows affected, 1 warning (0.13 sec)
[root@server4 ~]# masterha_check_repl --conf=/etc/masterha/app.cnf
Thu Aug 09 23:04:51 2018 - [info] Checking replication health on 172.25.9.2..
Thu Aug 09 23:04:51 2018 - [info] ok.
Thu Aug 09 23:04:51 2018 - [info] Checking replication health on 172.25.9.3..
Thu Aug 09 23:04:51 2018 - [info] ok.
Thu Aug 09 23:04:51 2018 - [warning] master_ip_failover_script is not defined.
Thu Aug 09 23:04:51 2018 - [warning] shutdown_script is not defined.
Thu Aug 09 23:04:51 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
4. Testing
Start monitoring on the manager (server4):
[root@server4 ~]# nohup masterha_manager --conf=/etc/masterha/app.cnf &
[1] 1201
[root@server4 ~]# nohup: ignoring input and appending output to `nohup.out'
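Before breaking anything on purpose, it helps to confirm that the manager really is up. A minimal sketch using masterha_check_status (listed among the Manager tools above) follows. The function name `wait_for_manager` is hypothetical, and the sketch assumes that masterha_check_status exits with status 0 only while the manager is running and pinging the master.

```shell
#!/bin/sh
# Hypothetical helper: poll masterha_check_status until the manager reports
# that it is running, or give up after a number of attempts.
wait_for_manager() {
    conf="${1:-/etc/masterha/app.cnf}"
    tries="${2:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Assumption: exit status 0 means the manager is up and monitoring.
        if masterha_check_status --conf="$conf" >/dev/null 2>&1; then
            echo "manager is running"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "manager did not come up after $tries checks" >&2
    return 1
}

# Usage (not run here): wait_for_manager /etc/masterha/app.cnf 10
```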
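Bring down MySQL on the master (server1). The ps ax output below shows the PIDs of mysqld_safe and mysqld; kill mysqld_safe first so that it cannot restart mysqld: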
1527 pts/0 S 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysq
1772 pts/0 Sl 0:02 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/m
1969 pts/0 R+ 0:00 ps ax
[root@server1 ~]# kill -9 1527
[root@server1 ~]# kill -9 1772
Alternatively:
[root@server1 ~]# killall -9 mysqld_safe ## kill by process name, skipping the PID lookup
[root@server1 ~]# killall -9 mysqld
The manager (server4) now contains the generated log and state files:
[root@server4 masterha]# ls
app.cnf app.failover.complete mha.log
[root@server4 masterha]# cat mha.log
----- Failover Report -----
app: MySQL Master failover 172.25.9.1(172.25.9.1:3306) to 172.25.9.2(172.25.9.2:3306) succeeded
Master 172.25.9.1(172.25.9.1:3306) is down!
Check MHA Manager logs at server4:/etc/masterha/mha.log for details.
Started automated(non-interactive) failover.
Selected 172.25.9.2(172.25.9.2:3306) as a new master.
172.25.9.2(172.25.9.2:3306): OK: Applying all logs succeeded.
172.25.9.3(172.25.9.3:3306): OK: Slave started, replicating from 172.25.9.2(172.25.9.2:3306)
172.25.9.2(172.25.9.2:3306): Resetting slave info succeeded.
Master failover to 172.25.9.2(172.25.9.2:3306) completed successfully.
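The log shows that server2 has taken over as master. Checking server2 and server3 confirms this:
On server2 there is master status but no slave status; on server3 both master status and slave status are present.
Now start MySQL on server1 again and manually configure it as a slave of the new master.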
[root@server1 ~]# mysql -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.17-log MySQL Community Server (GPL)
mysql> show slave status;
Empty set (0.00 sec)
mysql> change master to master_host='172.25.9.2',master_user='root',master_password='Chao@199512',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.34 sec)
mysql> start slave;
Query OK, 0 rows affected (0.03 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.9.2
Master_User: root
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 710
Relay_Log_File: server1-relay-bin.000002
Relay_Log_Pos: 923
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
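The two fields that matter in this output are Slave_IO_Running and Slave_SQL_Running; both must be Yes for replication to be healthy. The check can be scripted along the following lines. The function name `replication_ok` is hypothetical, and it simply reads the `\G`-formatted output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read "SHOW SLAVE STATUS\G" output on stdin and succeed
# only if both the IO thread and the SQL thread report "Yes".
replication_ok() {
    awk -F ': *' '
        { sub(/^[ \t]+/, "", $1) }   # drop the left padding of the field name
        $1 == "Slave_IO_Running"  && $2 == "Yes" { io = 1 }
        $1 == "Slave_SQL_Running" && $2 == "Yes" { sql = 1 }
        END { exit !(io && sql) }
    '
}

# Usage (not run here):
#   mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication healthy"
```

An empty result, as server1 returned before CHANGE MASTER was issued, counts as a failure because neither field is present.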