1. Introduction
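MHA (Master High Availability) is a relatively mature solution for MySQL high availability, and a number of large e-commerce sites have adopted it. It is a tool for failing over a MySQL master and promoting a slave in a high-availability environment. When the master fails, MHA can complete the switch within roughly 0 to 30 seconds, either manually or automatically (automatic switching requires supporting scripts). During the switch it preserves data consistency as far as it can, which is what makes the setup highly available in practice. For this reason many large e-commerce sites favor it and have built their own customized versions on top of it.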
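The software has two components: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server. MHA Manager probes the master of each cluster at a fixed interval. When the master fails, it automatically promotes the slave holding the most recent data to be the new master and then re-points all remaining slaves at it. The failover is intended to be transparent to applications, although in practice that depends on clients being redirected to the new master, for example by a virtual IP that a failover script moves.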
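The promotion step described above can be sketched in a few lines of Python. This is a simplified model and not MHA's actual selection code: the `Slave` class, `pick_new_master` and `repoint_plan` are hypothetical names, and the rule "most advanced binlog coordinate wins, `candidate_master` breaks ties" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Slave:
    host: str
    master_log_file: str        # binlog file of the old master read so far
    read_master_log_pos: int    # byte offset inside that file
    candidate_master: bool = False


def coordinate(slave: Slave) -> Tuple[int, int]:
    """Order slaves by (binlog sequence number, position)."""
    sequence = int(slave.master_log_file.rsplit(".", 1)[1])
    return (sequence, slave.read_master_log_pos)


def pick_new_master(slaves: List[Slave]) -> Slave:
    """Return the most advanced slave, preferring candidate_master on a tie."""
    if not slaves:
        raise ValueError("no surviving slave to promote")
    latest = max(coordinate(s) for s in slaves)
    most_advanced = [s for s in slaves if coordinate(s) == latest]
    preferred = [s for s in most_advanced if s.candidate_master]
    return (preferred or most_advanced)[0]


def repoint_plan(slaves: List[Slave], new_master: Slave) -> List[str]:
    """Hosts that have to be re-pointed at the new master."""
    return [s.host for s in slaves if s.host != new_master.host]


if __name__ == "__main__":
    slaves = [
        Slave("10.1.10.67", "master-bin.000003", 245, candidate_master=True),
        Slave("10.1.10.68", "master-bin.000003", 245),
    ]
    new_master = pick_new_master(slaves)
    print(new_master.host, repoint_plan(slaves, new_master))
```

The real tool also weighs settings such as `candidate_master` differently and brings lagging slaves up to date before promotion, so treat this only as a picture of the decision, not a replacement for it.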
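During an automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. If the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it. As long as one slave has received the latest binary log events, MHA can apply them to all the other slaves, which keeps the data on every node consistent. Separately, a slave is sometimes deliberately configured to lag behind the master, so that if a database is dropped by accident and data is lost, it can be recovered from that slave's binary logs.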
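The idea of bringing every slave up to the level of the most advanced one can be illustrated with a toy model. In the sketch below each slave's received events are a plain list of labels; real MHA works on relay log files and binlog positions, so the data model and the names `missing_events` and `recovery_plan` are assumptions for illustration only.

```python
from typing import Dict, List


def missing_events(most_advanced: List[str], lagging: List[str]) -> List[str]:
    """Events the lagging slave still needs, given the most advanced slave."""
    # A lagging slave must hold a prefix of what the most advanced slave holds.
    if most_advanced[: len(lagging)] != lagging:
        raise ValueError("lagging slave has diverged from the most advanced slave")
    return most_advanced[len(lagging):]


def recovery_plan(received: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each slave to the events that must be applied to it."""
    reference = max(received.values(), key=len)
    return {host: missing_events(reference, events)
            for host, events in received.items()}


if __name__ == "__main__":
    received = {
        "10.1.10.67": ["e1", "e2", "e3"],   # received everything
        "10.1.10.68": ["e1"],               # lagging by two events
    }
    print(recovery_plan(received))
```

After the plan is applied, every slave holds the same events as the most advanced one, which is the consistency property the paragraph above describes.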
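MHA currently targets one-master, multiple-slave topologies. Building an MHA cluster requires at least three database servers in the replication group: one master, one standby (candidate) master, and one ordinary slave. Because three servers are the minimum, Taobao adapted MHA to reduce machine cost, and its TMHA variant now supports a one-master, one-slave setup.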
MHA high-availability cluster architecture diagram:
2. Lab Setup and Requirements
IP address plan:
Server role | IP address | Hostname
MHA Manager | 10.1.10.65 | node1.alren.com
master | 10.1.10.66 | node2.alren.com
slave01 | 10.1.10.67 | node3.alren.com
slave02 | 10.1.10.68 | node4.alren.com
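Requirements:
① All nodes must be able to reach each other by hostname (this is simple to set up, so it is not covered here).
② The MHA manager node needs both packages, mha4mysql-manager and mha4mysql-node; the MySQL servers need mha4mysql-node.
③ The configuration directory and the configuration file have to be created by hand.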
3. Hands-on MHA Configuration
Configure MySQL on each node and start the service:

node2 (MySQL master): add the following to the [mysqld] section of /etc/my.cnf:

innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server-id = 1

Start the service:

systemctl start mariadb.service

Create an account with all privileges for MHA:

grant all on *.* to 'mhauser'@'10.1.10.%' identified by 'cncn';

Create an account with replication privileges:

grant replication slave, replication client on *.* to 'repluser'@'10.1.10.%' identified by 'cncn';
flush privileges;

node3 (MySQL slave1): add the following to the [mysqld] section of /etc/my.cnf:

innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server-id = 3
read-only = 1
relay-log-purge = 0

Start the service and point the slave at the master. MASTER_LOG_FILE and MASTER_LOG_POS should match what SHOW MASTER STATUS reports on the master:

systemctl start mariadb.service

change master to MASTER_HOST='10.1.10.66',MASTER_USER='repluser',MASTER_PASSWORD='cncn',MASTER_LOG_FILE='master-bin.000001',MASTER_LOG_POS=245;
start slave;

node4 (MySQL slave2): add the following to the [mysqld] section of /etc/my.cnf:

innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server-id = 4
read-only = 1
relay-log-purge = 0

Start the service and point the slave at the master:

systemctl start mariadb.service

change master to MASTER_HOST='10.1.10.66',MASTER_USER='repluser',MASTER_PASSWORD='cncn',MASTER_LOG_FILE='master-bin.000001',MASTER_LOG_POS=245;
start slave;
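Before starting replication it can help to sanity-check the three sets of my.cnf settings against each other. The Python sketch below is a hypothetical helper, not part of MHA or MariaDB: `check_replication_settings` and its rules (unique `server-id`, `log-bin` everywhere because any slave may be promoted, `read-only = 1` and `relay-log-purge = 0` on slaves) simply restate the settings used in this walkthrough.

```python
from typing import Dict, List


def check_replication_settings(nodes: Dict[str, Dict[str, str]],
                               slaves: List[str]) -> List[str]:
    """Return a list of human-readable problems; empty means no problem found."""
    problems = []
    owner_of_id: Dict[str, str] = {}
    for host, settings in nodes.items():
        server_id = settings.get("server-id")
        if server_id is None:
            problems.append(f"{host}: server-id is not set")
        elif server_id in owner_of_id:
            problems.append(
                f"{host}: server-id {server_id} is already used by {owner_of_id[server_id]}")
        else:
            owner_of_id[server_id] = host
        if "log-bin" not in settings:
            problems.append(f"{host}: log-bin is not enabled")
        if host in slaves:
            if settings.get("read-only") != "1":
                problems.append(f"{host}: read-only should be 1 on a slave")
            if settings.get("relay-log-purge") != "0":
                problems.append(f"{host}: relay-log-purge should be 0 on a slave")
    return problems


if __name__ == "__main__":
    slave_extra = {"read-only": "1", "relay-log-purge": "0"}
    nodes = {
        "10.1.10.66": {"server-id": "1", "log-bin": "master-bin"},
        "10.1.10.67": {"server-id": "3", "log-bin": "master-bin", **slave_extra},
        "10.1.10.68": {"server-id": "4", "log-bin": "master-bin", **slave_extra},
    }
    print(check_replication_settings(nodes, ["10.1.10.67", "10.1.10.68"]))
```

The settings are passed in as plain dictionaries to keep the sketch independent of how the files are read.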
Create the MHA configuration directory and the configuration file by hand on the manager node:

mkdir /etc/mastermha
vim /etc/mastermha/app1.conf

with the following content:

[server default]
user=mhauser
password=cncn
manager_workdir=/data/masterha/app1
manager_log=/data/masterha/app1/manager.log
remote_workdir=/data/masterha/app1
ssh_user=root
repl_user=repluser
repl_password=cncn
ping_interval=1

[server1]
hostname=10.1.10.66
candidate_master=1

[server2]
hostname=10.1.10.67
candidate_master=1

[server3]
hostname=10.1.10.68
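If the same layout has to be produced for several clusters, the file can be generated rather than typed. The following is a minimal Python sketch using the standard `configparser` module; `render_mha_config` is a hypothetical helper name, and the keys are only the ones that appear in app1.conf above.

```python
import configparser
import io
from typing import Dict, List


def render_mha_config(defaults: Dict[str, str],
                      servers: List[Dict[str, str]]) -> str:
    """Render a [server default] section followed by [server1], [server2], ..."""
    # interpolation=None keeps characters such as '%' in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["server default"] = defaults
    for index, server in enumerate(servers, start=1):
        parser[f"server{index}"] = server
    buffer = io.StringIO()
    # Write key=value without spaces, matching the style of app1.conf.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    defaults = {
        "user": "mhauser",
        "password": "cncn",
        "manager_workdir": "/data/masterha/app1",
        "manager_log": "/data/masterha/app1/manager.log",
        "remote_workdir": "/data/masterha/app1",
        "ssh_user": "root",
        "repl_user": "repluser",
        "repl_password": "cncn",
        "ping_interval": "1",
    }
    servers = [
        {"hostname": "10.1.10.66", "candidate_master": "1"},
        {"hostname": "10.1.10.67", "candidate_master": "1"},
        {"hostname": "10.1.10.68"},
    ]
    print(render_mha_config(defaults, servers))
```

The function returns the text instead of writing a file, so the result can be reviewed before it is saved next to the hand-written configuration.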
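Check that the nodes can reach each other and that every node is healthy. masterha_check_ssh needs key-based (passwordless) SSH for ssh_user between all MySQL nodes, and masterha_check_repl verifies the replication setup: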
[root@node1 ~]# masterha_check_ssh --conf=/etc/mastermha/app1.conf
Fri Nov 25 21:46:49 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Nov 25 21:46:49 2016 - [info] Reading application default configuration from /etc/mastermha/app1.conf..
Fri Nov 25 21:46:49 2016 - [info] Reading server configuration from /etc/mastermha/app1.conf..
Fri Nov 25 21:46:49 2016 - [info] Starting SSH connection tests..
Fri Nov 25 21:46:51 2016 - [debug]
Fri Nov 25 21:46:49 2016 - [debug] Connecting via SSH from root@10.1.10.66(10.1.10.66:22) to root@10.1.10.67(10.1.10.67:22)..
Fri Nov 25 21:46:50 2016 - [debug] ok.
Fri Nov 25 21:46:50 2016 - [debug] Connecting via SSH from root@10.1.10.66(10.1.10.66:22) to root@10.1.10.68(10.1.10.68:22)..
Fri Nov 25 21:46:51 2016 - [debug] ok.
Fri Nov 25 21:46:51 2016 - [debug]
Fri Nov 25 21:46:49 2016 - [debug] Connecting via SSH from root@10.1.10.67(10.1.10.67:22) to root@10.1.10.66(10.1.10.66:22)..
Fri Nov 25 21:46:51 2016 - [debug] ok.
Fri Nov 25 21:46:51 2016 - [debug] Connecting via SSH from root@10.1.10.67(10.1.10.67:22) to root@10.1.10.68(10.1.10.68:22)..
Fri Nov 25 21:46:51 2016 - [debug] ok.
Fri Nov 25 21:46:52 2016 - [debug]
Fri Nov 25 21:46:50 2016 - [debug] Connecting via SSH from root@10.1.10.68(10.1.10.68:22) to root@10.1.10.66(10.1.10.66:22)..
Fri Nov 25 21:46:51 2016 - [debug] ok.
Fri Nov 25 21:46:51 2016 - [debug] Connecting via SSH from root@10.1.10.68(10.1.10.68:22) to root@10.1.10.67(10.1.10.67:22)..
Fri Nov 25 21:46:52 2016 - [debug] ok.
Fri Nov 25 21:46:52 2016 - [info] All SSH connection tests passed successfully.
# The nodes can all reach each other over SSH
[root@node1 ~]# masterha_check_repl --conf=/etc/mastermha/app1.conf
Fri Nov 25 22:14:12 2016 - [warning] Global configuration file /etc/masterha_default.c
Fri Nov 25 22:14:12 2016 - [info] Reading application default configuration from /etc/
Fri Nov 25 22:14:12 2016 - [info] Reading server configuration from /etc/mastermha/app
Fri Nov 25 22:14:12 2016 - [info] MHA::MasterMonitor version 0.56.
Fri Nov 25 22:14:13 2016 - [info] GTID failover mode = 0
Fri Nov 25 22:14:13 2016 - [info] Dead Servers:
Fri Nov 25 22:14:13 2016 - [info] Alive Servers:
Fri Nov 25 22:14:13 2016 - [info] 10.1.10.66(10.1.10.66:3306)
Fri Nov 25 22:14:13 2016 - [info] 10.1.10.67(10.1.10.67:3306)
Fri Nov 25 22:14:13 2016 - [info] 10.1.10.68(10.1.10.68:3306)
Fri Nov 25 22:14:13 2016 - [info] Alive Slaves:
Fri Nov 25 22:14:13 2016 - [info] 10.1.10.67(10.1.10.67:3306) Version=5.5.44-MariaD
Fri Nov 25 22:14:13 2016 - [info] Replicating from 10.1.10.66(10.1.10.66:3306)
Fri Nov 25 22:14:13 2016 - [info] Primary candidate for the new Master (candidate_
Fri Nov 25 22:14:13 2016 - [info] 10.1.10.68(10.1.10.68:3306) Version=5.5.44-MariaD
Fri Nov 25 22:14:13 2016 - [info] Replicating from 10.1.10.66(10.1.10.66:3306)
Fri Nov 25 22:14:13 2016 - [info] Current Alive Master: 10.1.10.66(10.1.10.66:3306)
Fri Nov 25 22:14:13 2016 - [info] Checking slave configurations..
Fri Nov 25 22:14:13 2016 - [warning] relay_log_purge=0 is not set on slave 10.1.10.67
Fri Nov 25 22:14:13 2016 - [warning] relay_log_purge=0 is not set on slave 10.1.10.68
Fri Nov 25 22:14:13 2016 - [info] Checking replication filtering settings..
Fri Nov 25 22:14:13 2016 - [info] binlog_do_db= , binlog_ignore_db=
Fri Nov 25 22:14:13 2016 - [info] Replication filtering check ok.
Fri Nov 25 22:14:13 2016 - [info] GTID (with auto-pos) is not supported
Fri Nov 25 22:14:13 2016 - [info] Starting SSH connection tests..
Fri Nov 25 22:14:16 2016 - [info] All SSH connection tests passed successfully.
Fri Nov 25 22:14:16 2016 - [info] Checking MHA Node version..
Fri Nov 25 22:14:18 2016 - [info] Version check ok.
Fri Nov 25 22:14:18 2016 - [info] Checking SSH publickey authentication settings on th
Fri Nov 25 22:14:18 2016 - [info] HealthCheck: SSH to 10.1.10.66 is reachable.
Fri Nov 25 22:14:19 2016 - [info] Master MHA Node version is 0.56.
Fri Nov 25 22:14:19 2016 - [info] Checking recovery script configurations on 10.1.10.6
Fri Nov 25 22:14:19 2016 - [info] Executing command: save_binary_logs --command=testterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000003
Fri Nov 25 22:14:19 2016 - [info] Connecting to root@10.1.10.66(10.1.10.66:22)..
Creating /data/masterha/app1 if not exists.. Creating directory /data/masterha/app1. ok.
Checking output directory is accessible or not.. ok.
Binlog found at /var/lib/mysql, up to master-bin.000003
Fri Nov 25 22:14:19 2016 - [info] Binlog setting check done.
Fri Nov 25 22:14:19 2016 - [info] Checking SSH publickey authentication and checking r
Fri Nov 25 22:14:19 2016 - [info] Executing command : apply_diff_relay_logs --commanve_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.44-MariaDB-log --managlib/mysql/ --slave_pass=xxx
Fri Nov 25 22:14:19 2016 - [info] Connecting to root@10.1.10.67(10.1.10.67:22)..
Creating directory /data/masterha/app1.. done.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-bin.000002
Temporary relay log file is /var/lib/mysql/relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Nov 25 22:14:20 2016 - [info] Executing command : apply_diff_relay_logs --commanve_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.44-MariaDB-log --managlib/mysql/ --slave_pass=xxx
Fri Nov 25 22:14:20 2016 - [info] Connecting to root@10.1.10.68(10.1.10.68:22)..
Creating directory /data/masterha/app1.. done.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-bin.000002
Temporary relay log file is /var/lib/mysql/relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Nov 25 22:14:21 2016 - [info] Slaves settings check done.
Fri Nov 25 22:14:21 2016 - [info]
10.1.10.66(10.1.10.66:3306) (current master)
 +--10.1.10.67(10.1.10.67:3306)
 +--10.1.10.68(10.1.10.68:3306)
Fri Nov 25 22:14:21 2016 - [info] Checking replication health on 10.1.10.67..
Fri Nov 25 22:14:21 2016 - [info] ok.
Fri Nov 25 22:14:21 2016 - [info] Checking replication health on 10.1.10.68..
Fri Nov 25 22:14:21 2016 - [info] ok.
Fri Nov 25 22:14:21 2016 - [warning] master_ip_failover_script is not defined.
Fri Nov 25 22:14:21 2016 - [warning] shutdown_script is not defined.
Fri Nov 25 22:14:21 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
# Replication is healthy on all nodes (several lines above were cut off at the terminal width when the output was captured)
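The check output is long, and the interesting parts are the `[warning]` and `[error]` lines plus the final health line. A small Python sketch can pull those out of a saved transcript. `summarize_check_output` is a hypothetical helper; it relies only on the line shapes visible in the output above (`- [level] message` and the closing `MySQL Replication Health is OK.`).

```python
import re
from typing import List, Tuple

# Matches the " - [warning] ..." and " - [error]..." parts of a log line.
LEVEL = re.compile(r" - \[(warning|error)\]\s*(.*)$")


def summarize_check_output(text: str) -> Tuple[bool, List[str], List[str]]:
    """Return (healthy, warnings, errors) for a saved masterha_check_repl transcript."""
    warnings: List[str] = []
    errors: List[str] = []
    for line in text.splitlines():
        match = LEVEL.search(line)
        if not match:
            continue
        level, message = match.groups()
        (warnings if level == "warning" else errors).append(message)
    healthy = "MySQL Replication Health is OK." in text and not errors
    return healthy, warnings, errors


if __name__ == "__main__":
    sample = "\n".join([
        "Fri Nov 25 22:14:21 2016 - [warning] master_ip_failover_script is not defined.",
        "Fri Nov 25 22:14:21 2016 - [warning] shutdown_script is not defined.",
        "Fri Nov 25 22:14:21 2016 - [info] Got exit code 0 (Not master dead).",
        "MySQL Replication Health is OK.",
    ])
    healthy, warnings, errors = summarize_check_output(sample)
    print(healthy, warnings, errors)
```

In the run above, the two warnings about `master_ip_failover_script` and `shutdown_script` are worth reading: without a failover script, nothing redirects clients to the new master after a switch.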
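Start the cluster manager. In normal operation it can simply run in the background; while debugging, keep it in the foreground. In the session below, the first attempt fails because the configuration path is mistyped (/etc/masterha instead of /etc/mastermha), and the second attempt uses the correct path: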
[root@node1 ~]# masterha_manager --conf=/etc/masterha/app1.conf
Fri Nov 25 22:15:37 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln280] Configuration file /etc/masterha/app1.conf not found!
Fri Nov 25 22:15:37 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_manager line 50.
Fri Nov 25 22:15:37 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Fri Nov 25 22:15:37 2016 - [info] Got exit code 1 (Not master dead).
[root@node1 ~]# masterha_manager --conf=/etc/mastermha/app1.conf
Fri Nov 25 22:16:40 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Nov 25 22:16:40 2016 - [info] Reading application default configuration from /etc/mastermha/app1.conf..
Fri Nov 25 22:16:40 2016 - [info] Reading server configuration from /etc/mastermha/app1.conf..
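The failed first attempt in the session above comes down to a mistyped path, which a small pre-flight check can catch before the manager is launched. The Python sketch below is one way to do that and to start the manager in the background. `manager_command` and `start_in_background` are hypothetical helper names; the only MHA option used is `--conf`, as in the session above, and the output file name in the usage note is an assumption.

```python
import subprocess
from pathlib import Path
from typing import List


def manager_command(conf_path: str) -> List[str]:
    """Build the masterha_manager command line, failing early on a bad path."""
    conf = Path(conf_path)
    if not conf.is_file():
        raise FileNotFoundError(f"MHA configuration file not found: {conf}")
    return ["masterha_manager", f"--conf={conf}"]


def start_in_background(argv: List[str], log_path: str) -> subprocess.Popen:
    """Start a command detached from the terminal, appending its output to a file."""
    with open(log_path, "ab") as log:
        # The child keeps its own copy of the file descriptor after we close ours.
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,   # survive the closing of the login shell
        )


if __name__ == "__main__":
    try:
        command = manager_command("/etc/mastermha/app1.conf")
    except FileNotFoundError as error:
        print(error)
    else:
        # Hypothetical output file; adjust to taste.
        process = start_in_background(command, "/tmp/masterha_manager.out")
        print("started masterha_manager with pid", process.pid)
```

While debugging, running the command directly in the foreground as shown in the session is simpler; the background variant is meant for the steady state.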
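Summary: try simulating a node failure to see whether the master role moves to another node, then bring a new node online and check that its status is reported correctly. In production, MHA can greatly reduce downtime after a master failure and so improve database availability. For sensitive data, automatically repairing the failed node with scripts is not recommended; a manual recovery is usually the safer choice.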