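1. About MHA
MHA is an open-source high-availability solution for MySQL that adds automated master failover to MySQL master/slave replication setups. When MHA detects that the master node has failed, it promotes the slave holding the most recent data to be the new master; during this process it collects additional information from the other slaves to avoid data inconsistency. MHA also supports online master switching, that is, switching the master/slave roles on demand.
An MHA deployment has two roles, MHA Manager (the management node) and MHA Node (the data node):
MHA Manager: usually deployed on a dedicated machine, from which it manages multiple master/slave clusters; each master/slave cluster is called an application.
MHA Node: runs on every MySQL server (and on the manager as well); it speeds up failover by means of scripts that can parse and purge logs.
In short, a failover does two things:
(1) It promotes one slave to be the new master.
(2) Before the promotion, it merges all the data recorded on the other slaves into the slave that is about to be promoted.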
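The idea of promoting the slave with the newest data can be illustrated with a small shell sketch. This is not how MHA is implemented internally; the function name `pick_latest_slave` and the three-column input format (host, `Master_Log_File`, `Read_Master_Log_Pos`, as reported by `show slave status`) are assumptions made only for this illustration.

```shell
# Hypothetical helper, not part of MHA.
# stdin: one line per slave, "<host> <Master_Log_File> <Read_Master_Log_Pos>".
# Prints the host that has read furthest into the master's binary log.
pick_latest_slave() {
    # Order by log file name, then numerically by position; the last line
    # belongs to the slave that has received the most events.
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{ print $1 }'
}

# Possible usage (the positions are made-up sample values):
#   printf '%s\n' '10.1.51.50 master-log.000003 495' \
#                 '10.1.51.60 master-log.000003 245' | pick_latest_slave
```

The comparison relies on the binary log files carrying zero-padded numeric suffixes, so that a plain byte-wise sort of the file names matches their chronological order.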
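2. MHA components
(1) Manager node
masterha_check_ssh: checks the SSH environment that MHA depends on
masterha_check_repl: checks the MySQL replication environment
masterha_manager: the main MHA service program
masterha_check_status: reports the running status of MHA
masterha_master_monitor: checks the availability of the MySQL master node
masterha_master_switch: switches the master node
masterha_conf_host: adds or removes a host entry in the configuration
masterha_stop: stops the MHA service
(2) Node
save_binary_logs: saves and copies the master's binary logs
apply_diff_relay_logs: identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog: strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs: purges relay logs (without blocking the SQL thread)
(3) Custom extensions
secondary_check_script: checks the availability of the master over multiple network routes
master_ip_failover_script: updates the master IP address used by the application
shutdown_script: forcibly shuts down the master node
report_script: sends a report
init_conf_load_script: loads initial configuration parameters
master_ip_online_change_script: updates the master node's IP address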
3. Configuring MHA
Make sure the clocks on all nodes are synchronized.
(1) Setting up master/slave replication
1) Configuring the master node
Install mariadb-server
[root@localhost ~]# yum -y install mariadb-server
Edit /etc/my.cnf
[root@localhost ~]# vim /etc/my.cnf
Append the following to the end of the [mysqld] section:
innodb_file_per_table = ON
skip_name_resolve = ON
server-id = 1
relay-log = relay-log
log-bin = master-log
Create a user with replication privileges
[root@localhost ~]# systemctl start mariadb.service
[root@localhost ~]# mysql
MariaDB [(none)]> grant replication slave,replication client on *.* to 'repluser'@'10.1.51.%' identified by 'replpass';
MariaDB [(none)]> flush privileges;
MariaDB [(none)]> show master status;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-log.000003 |      495 |              |                  |
+-------------------+----------+--------------+------------------+
MariaDB [(none)]> show binlog events in 'master-log.000003';
Based on the binlog events, we will later set the replication start position on the slaves to 245. The grant statement is then replicated to the slaves as well, so there is no need to create the replication user on each slave by hand.
2) Configuring the slave nodes
The two slaves are configured identically except for server-id; the difference is pointed out below.
Install mariadb-server
[root@localhost ~]# yum -y install mariadb-server
Edit /etc/my.cnf
[root@localhost ~]# vim /etc/my.cnf
Append the following to the end of the [mysqld] section:
innodb_file_per_table = ON
skip_name_resolve = ON
server-id = 2        # note: on slave2 use server-id = 3
relay-log = relay-log
log-bin = master-log
read-only = 1
relay-log-purge = 0
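Since every node in the replication cluster needs its own server-id, a quick duplicate check can catch a copy-and-paste mistake. The sketch below is only an illustration: the helper name `duplicate_server_ids` and its "host server-id" input format are made up for this example.

```shell
# Hypothetical helper. stdin: one line per node, "<host> <server-id>".
# Prints every server-id that is used by more than one node;
# prints nothing when all ids are unique.
duplicate_server_ids() {
    awk '{ seen[$2]++ } END { for (id in seen) if (seen[id] > 1) print id }'
}

# Possible usage; the ids could be collected on each node with
#   mysql -N -e 'select @@server_id'
#
#   printf '%s\n' '10.1.51.30 1' '10.1.51.50 2' '10.1.51.60 3' | duplicate_server_ids
```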
Point the slave at the master
[root@localhost ~]# systemctl start mariadb.service
[root@localhost ~]# mysql
MariaDB [(none)]> change master to master_host='10.1.51.30',master_user='repluser',master_password='replpass',master_log_file='master-log.000003',master_log_pos=245;
MariaDB [(none)]> start slave;
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
       Slave_IO_State: Waiting for master to send event
          Master_Host: 10.1.51.30
          Master_User: repluser
          Master_Port: 3306
        Connect_Retry: 60
      Master_Log_File: master-log.000003
  Read_Master_Log_Pos: 495
       Relay_Log_File: relay-log.000002
        Relay_Log_Pos: 780
Relay_Master_Log_File: master-log.000003
     Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
... ...
MariaDB [(none)]> select user,host from mysql.user;
+----------+-----------------------+
| user     | host                  |
+----------+-----------------------+
| repluser | 10.1.51.%             |   # the repluser account has been replicated
| root     | 127.0.0.1             |
| root     | ::1                   |
|          | localhost             |
| root     | localhost             |
|          | localhost.localdomain |
| root     | localhost.localdomain |
+----------+-----------------------+
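The two fields that matter most in the `show slave status\G` output are `Slave_IO_Running` and `Slave_SQL_Running`; both must be `Yes`. A minimal sketch of an automated check, assuming the status text is piped in on standard input (the function name `slave_is_healthy` is made up for this example):

```shell
# Hypothetical helper. stdin: the text printed by `show slave status\G`.
# Exit status 0 only if both replication threads report "Yes".
slave_is_healthy() {
    awk -F': *' '
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END {
            # Drop trailing whitespace before comparing.
            sub(/[ \t\r]+$/, "", io)
            sub(/[ \t\r]+$/, "", sql)
            exit !(io == "Yes" && sql == "Yes")
        }
    '
}

# Possible usage on a slave node:
#   mysql -e 'show slave status\G' | slave_is_healthy && echo "replication OK"
```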
At this point the MariaDB master/slave replication cluster is fully configured.
(2) Preparing SSH-based communication between the nodes
Run the following on the manager node
[root@localhost ~]# ssh-keygen -t rsa -P ''    # generate the key pair
[root@localhost ~]# cat .ssh/id_rsa.pub > .ssh/authorized_keys
[root@localhost ~]# scp .ssh/authorized_keys .ssh/id_rsa .ssh/id_rsa.pub 10.1.51.30:/root/.ssh/
[root@localhost ~]# scp .ssh/authorized_keys .ssh/id_rsa .ssh/id_rsa.pub 10.1.51.50:/root/.ssh/
[root@localhost ~]# scp .ssh/authorized_keys .ssh/id_rsa .ssh/id_rsa.pub 10.1.51.60:/root/.ssh/
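With more than a handful of nodes, the repeated scp commands are easier to maintain as a loop. The following is a sketch only: the variable `MHA_HOSTS`, the function `distribute_keys`, and the `DRY_RUN` switch are names invented for this example, and by default the function just prints the commands instead of running them.

```shell
# Hosts from this article; adjust to your own environment.
MHA_HOSTS="10.1.51.30 10.1.51.50 10.1.51.60"

# Hypothetical helper: copies the shared key files to every host.
# With DRY_RUN=1 (the default here) the scp commands are only printed.
distribute_keys() {
    for host in $MHA_HOSTS; do
        cmd="scp .ssh/authorized_keys .ssh/id_rsa .ssh/id_rsa.pub $host:/root/.ssh/"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf '%s\n' "$cmd"
        else
            $cmd
        fi
    done
}

# Possible usage, run from root's home directory on the manager node:
#   distribute_keys              # show what would be copied
#   DRY_RUN=0 distribute_keys    # actually copy the files
```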
(3) Setting up MHA
MHA is officially distributed as RPM packages, which you can download yourself.
1) Installing MHA
Manager node
[root@localhost ~]# yum -y install mha4mysql-node-0.56-0.el6.noarch.rpm mha4mysql-manager-0.56-0.el6.noarch.rpm
All nodes, including the manager
[root@localhost ~]# yum -y install mha4mysql-node-0.56-0.el6.noarch.rpm
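2) Configuring MHA on the manager node
The manager needs a dedicated configuration file for each master/slave cluster it monitors, and all clusters can additionally share a global configuration. The per-application configuration file can be named freely; here we use /etc/masterha/app.cnf.
For security reasons, create a dedicated MySQL administrative user on the master node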
MariaDB [(none)]> grant all on *.* to 'mhaadmin'@'10.1.51.%' identified by 'mhapass';
Edit /etc/masterha/app.cnf
[root@localhost ~]# vim /etc/masterha/app.cnf
[server default]
# MySQL administrative user
user=mhaadmin
# password of the MySQL administrative user
password=mhapass
# working directory of the manager; created automatically
manager_workdir=/data/masterha/app
# manager log file
manager_log=/data/masterha/app/manager.log
# working directory on the remote hosts
remote_workdir=/data/masterha/app
ssh_user=root
repl_user=repluser
repl_password=replpass
ping_interval=1
[server1]
hostname=10.1.51.30
ssh_port=22
candidate_master=1
[server2]
hostname=10.1.51.50
ssh_port=22
candidate_master=1
[server3]
hostname=10.1.51.60
ssh_port=22
candidate_master=1
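A misspelled parameter name in this file (for example a typo in `ping_interval`) is easy to overlook, so it can help to compare the keys against a list of expected names before running the MHA check tools. The sketch below is an illustration under stated assumptions: `KNOWN_KEYS` contains only the parameters used in this article, not the full set MHA accepts, and `unknown_keys` is a made-up helper name.

```shell
# Parameters used in this article only; this is NOT the complete MHA list.
KNOWN_KEYS="user password manager_workdir manager_log remote_workdir ssh_user repl_user repl_password ping_interval hostname ssh_port candidate_master"

# Hypothetical helper. stdin: the contents of an MHA application config file.
# Prints every parameter name that does not appear in KNOWN_KEYS.
unknown_keys() {
    awk -F'=' -v known="$KNOWN_KEYS" '
        BEGIN { n = split(known, k, " "); for (i = 1; i <= n; i++) ok[k[i]] = 1 }
        /^[ \t]*$/  { next }    # blank line
        /^[ \t]*#/  { next }    # comment
        /^[ \t]*;/  { next }    # comment
        /^[ \t]*\[/ { next }    # section header such as [server1]
        {
            key = $1
            gsub(/[ \t]/, "", key)
            if (!(key in ok)) print key
        }
    '
}

# Possible usage:
#   unknown_keys < /etc/masterha/app.cnf
```

If you use further MHA parameters, add them to `KNOWN_KEYS` first; otherwise they will be reported as unknown.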
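Check that SSH communication between the nodes is set up correctly
[root@localhost ~]# masterha_check_ssh --conf=/etc/masterha/app.cnf
If the last two lines of the output look like this, everything is fine:
Sat Nov 26 20:25:40 2016 - [debug] ok.
Sat Nov 26 20:25:40 2016 - [info] All SSH connection tests passed successfully.
Check that the connection parameters of the managed MySQL replication cluster are correct
[root@localhost ~]# masterha_check_repl --conf=/etc/masterha/app.cnf
If the last line of the output is the following, everything is fine:
MySQL Replication Health is OK.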
Start MHA
[root@localhost ~]# masterha_manager --conf=/etc/masterha/app.cnf
This command keeps running in the foreground.
Open another terminal to check the state of the master node:
[root@localhost ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app (pid:5740) is running(0:PING_OK), master:10.1.51.30
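For scripting, the address of the current master can be pulled out of the masterha_check_status line shown above. This is a minimal sketch that assumes the output has exactly that shape; the function name `current_master` is made up for this example, and it prints nothing when the line does not match (for instance when MHA is not running).

```shell
# Hypothetical helper. stdin: the output of masterha_check_status.
# Prints the master address from a line such as
#   app (pid:5740) is running(0:PING_OK), master:10.1.51.30
current_master() {
    sed -n 's/.*is running(0:PING_OK), master:\([^ ]*\).*/\1/p'
}

# Possible usage:
#   masterha_check_status --conf=/etc/masterha/app.cnf | current_master
```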
3) Testing
Manually kill the MariaDB service on the master node
[root@localhost ~]# killall -9 mysqld mysqld_safe
Check the log on the manager node
[root@localhost ~]# cat /data/masterha/app/manager.log
Started automated(non-interactive) failover.
The latest slave 10.1.51.50(10.1.51.50:3306) has all relay logs for recovery.
Selected 10.1.51.50(10.1.51.50:3306) as a new master.    # the master role moved automatically to 10.1.51.50
10.1.51.50(10.1.51.50:3306): OK: Applying all logs succeeded.
10.1.51.60(10.1.51.60:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.1.51.60(10.1.51.60:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.1.51.50(10.1.51.50:3306)
10.1.51.50(10.1.51.50:3306): Resetting slave info succeeded.
Master failover to 10.1.51.50(10.1.51.50:3306) completed successfully.
At this point MHA can no longer be started.
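Instead of reading the whole log, the outcome can be extracted from the final "completed successfully" line. The sketch below assumes the log contains a line of exactly the form shown above; `failover_target` is a made-up helper name, and it prints nothing if no completed failover is found.

```shell
# Hypothetical helper. stdin: the contents of the manager log.
# Prints the host named in the most recent line of the form
#   Master failover to <host>(<host>:<port>) completed successfully.
failover_target() {
    sed -n 's/.*Master failover to \([^( ]*\)(.*) completed successfully\..*/\1/p' | tail -n 1
}

# Possible usage:
#   failover_target < /data/masterha/app/manager.log
```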
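Check the replication status on the 10.1.51.60 node
4) Repairing the 10.1.51.30 node and bringing it back as a slave
Add the following two lines to the end of the [mysqld] section in /etc/my.cnf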
read-only = 1
relay-log-purge = 0
[root@localhost ~]# systemctl start mariadb
Point it at the new master
Note: first check the binary log file and position on the current master and use those values here.
MariaDB [(none)]> change master to master_host='10.1.51.50',master_user='repluser',master_password='replpass',master_log_file='master-log.000003',master_log_pos=245;
MariaDB [(none)]> start slave;
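Because the log file name and position have to be read from the current master, it is easy to mistype them. One way to avoid that, sketched here under stated assumptions, is to generate the statement from the output of `mysql -N -e 'show master status'` (tab-separated when piped, with the file name and position in the first two columns). The function name `change_master_sql` is made up, and the replication user and password are the ones used throughout this article.

```shell
# Hypothetical helper.
# $1    = host of the current master
# stdin = output of: mysql -N -e 'show master status'  (run on that master)
# Prints a CHANGE MASTER TO statement; \047 is a single quote in awk.
change_master_sql() {
    awk -F'\t' -v host="$1" 'NR == 1 {
        printf "CHANGE MASTER TO master_host=\047%s\047, master_user=\047repluser\047, master_password=\047replpass\047, master_log_file=\047%s\047, master_log_pos=%d;\n", host, $1, $2
    }'
}

# Possible usage from the node being repaired:
#   mysql -h 10.1.51.50 -u mhaadmin -pmhapass -N -e 'show master status' \
#       | change_master_sql 10.1.51.50
```

Review the generated statement before running it on the slave, followed by `start slave;`.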
Starting MHA again restores normal operation
[root@localhost ~]# masterha_manager --conf=/etc/masterha/app.cnf
[root@localhost ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app (pid:6343) is running(0:PING_OK), master:10.1.51.50
Addendum: switching the master manually
1) Run the following on the current master (10.1.51.50)
MariaDB [(none)]> FLUSH NO_WRITE_TO_BINLOG TABLES;
2) Run the following on the manager node; note that MHA must be stopped first
[root@localhost ~]# masterha_stop --conf=/etc/masterha/app.cnf
[root@localhost ~]# masterha_master_switch --master_state=alive --conf=/etc/masterha/app.cnf
Once the command reports success, the master has been switched to the 10.1.51.30 node.
3) Run the following on the former master (10.1.51.50)
MariaDB [(none)]> change master to master_host='10.1.51.30',master_user='repluser',master_password='replpass',master_log_file='master-log.000003',master_log_pos=245;
MariaDB [(none)]> start slave;
4) Restart MHA
[root@localhost ~]# masterha_manager --conf=/etc/masterha/app.cnf
[root@localhost ~]# masterha_check_status --conf=/etc/masterha/app.cnf
app (pid:7758) is running(0:PING_OK), master:10.1.51.30
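As a possible variation on the manual switch, masterha_master_switch also accepts `--new_master_host` to name the target explicitly and `--orig_master_is_new_slave` to turn the previous master into a slave of the new one, which, as far as I know, makes the separate `change master to` step on the old master unnecessary. The sketch below only prints the command so that it can be reviewed first; the function name `switch_command` and the `MHA_CONF` variable are made up for this example, and the behaviour should be verified in a test environment before relying on it.

```shell
# Hypothetical helper: prints (does not run) a planned-switch command.
# $1 = host that should become the new master.
# MHA_CONF may override the config path used in this article.
switch_command() {
    printf 'masterha_master_switch --conf=%s --master_state=alive --new_master_host=%s --orig_master_is_new_slave\n' \
        "${MHA_CONF:-/etc/masterha/app.cnf}" "$1"
}

# Possible usage on the manager node, after masterha_stop:
#   switch_command 10.1.51.30        # review the command
#   eval "$(switch_command 10.1.51.30)"   # then run it
```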