MHA
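For a portal site, the most important component is the database. There are many ways to make web servers highly available and load balanced, but far fewer options exist for databases. The MySQL HA setup I built earlier with heartbeat+DRBD clearly did not support MySQL very well. That gap is why so many other techniques and tools have emerged.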
How MHA works
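MHA (Master High Availability) is a MySQL failover solution written by a Japanese developer to keep the database highly available. It can complete a failover within about 30 seconds. The system has two parts: MHA Manager (the management node) and MHA Node (the data nodes). The data nodes are the MySQL servers themselves, and the Node package installs helper scripts on them. The Manager periodically probes the master in the cluster. If the master goes down, MHA promotes the slave with the most recent data to master and then repoints the remaining slaves to the new master.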
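In terms of functionality, MHA can:
1. Save the binary log contents of the crashed master.
2. Identify the slave with the most recent data.
3. Promote a slave to master.
4. Make the other slaves replicate from the new master.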
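Step 2 comes down to comparing binary log coordinates. The sketch below is a simplified illustration in Python, not MHA's own code (MHA is written in Perl). It assumes each slave's Master_Log_File and Read_Master_Log_Pos have already been collected from SHOW SLAVE STATUS; the function names and sample positions are hypothetical.

```python
from typing import Dict, Tuple


def binlog_coordinate(log_file: str, pos: int) -> Tuple[int, int]:
    # 'master-bin.000007' -> 7. Comparing (sequence, position) tuples orders
    # events correctly even after the master has rotated to a new binlog file.
    seq = int(log_file.rsplit(".", 1)[1])
    return (seq, pos)


def latest_slave(slaves: Dict[str, Tuple[str, int]]) -> str:
    # slaves maps a host name to (Master_Log_File, Read_Master_Log_Pos)
    # as reported by SHOW SLAVE STATUS on that host.
    if not slaves:
        raise ValueError("no slaves to choose from")
    return max(slaves, key=lambda host: binlog_coordinate(*slaves[host]))


if __name__ == "__main__":
    # Hypothetical positions read from the two slaves after the master crashed.
    observed = {
        "slave1": ("master-bin.000007", 532),
        "slave2": ("master-bin.000007", 1024),
    }
    print(latest_slave(observed))  # slave2
```

The chosen slave's relay logs are what the other slaves can be brought up to date from, which is the job of apply_diff_relay_logs listed below.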
MHA components
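Installing MHA provides a number of utilities.
Manager node:
masterha_check_ssh checks MHA's SSH configuration
masterha_check_repl checks the MySQL replication status
masterha_manager starts MHA
masterha_check_status checks the current running state of MHA
masterha_master_monitor detects whether the master is down
masterha_master_switch controls failover
masterha_conf_host adds or removes server entries in the configuration
masterha_stop stops the MHA service
Node (on every MySQL server):
save_binary_logs saves and copies the master's binary logs
apply_diff_relay_logs identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog removes unnecessary ROLLBACK events
purge_relay_logs purges relay logs (without blocking the SQL thread)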
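MySQL environment
MHA requires binary logging and relay logging to be enabled on every node. Each slave must explicitly enable read_only and disable relay_log_purge, among other settings; relay logs must be kept because MHA uses them to recover the differences between slaves.
MySQL version: 5.6.26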
MySQL cluster environment
Role                  | IP address      | server_id
monitor (MHA Manager) | 192.168.217.130 | -
master                | 192.168.217.14  | 1
slave1                | 192.168.217.15  | 2
slave2                | 192.168.217.13  | 3
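Because the nodes communicate with each other over SSH, it is best to configure the /etc/hosts file on every host (the commands below refer to the machines as monitor, master, slave1, and slave2).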
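A small Python sketch that prints the /etc/hosts lines for this cluster. The host names match the ones used by the scp and ssh commands later in this article, and the addresses are the ones those commands actually reach (slave2 answers as 192.168.217.13). The TOPOLOGY mapping and hosts_entries are hypothetical names written out by hand for this example.

```python
# Hand-written topology for this article's cluster (an assumption; adjust to your hosts).
TOPOLOGY = {
    "monitor": "192.168.217.130",
    "master": "192.168.217.14",
    "slave1": "192.168.217.15",
    "slave2": "192.168.217.13",
}


def hosts_entries(topology: dict) -> str:
    # One "<ip><TAB><hostname>" line per node, ready to append to /etc/hosts.
    return "\n".join(f"{ip}\t{name}" for name, ip in topology.items())


if __name__ == "__main__":
    print(hosts_entries(TOPOLOGY))
```

Append the printed lines to /etc/hosts on all four machines so that every node resolves the others by the same names.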
Initial configuration of the master node:
server_id = 1
relay-log = relay-bin
log-bin = master-bin
Initial configuration of all slave nodes:
server_id = 2 (3 on slave2)
relay-log=relay-bin
log-bin=master-bin
relay_log_purge=0
read_only = 1
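The requirements above can be checked mechanically before MHA is started. The sketch below is a hypothetical helper, not part of MHA: it assumes the options live in the [mysqld] section of a single my.cnf without !include directives, and it only checks the options this article sets.

```python
import configparser
from typing import List


def check_mysqld_options(cnf_text: str, role: str) -> List[str]:
    # Return human-readable problems with the [mysqld] options this article
    # sets for MHA; an empty list means nothing was found.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    # MySQL treats '-' and '_' in option names as equivalent (log-bin == log_bin).
    opts = {key.replace("-", "_"): value for key, value in parser.items("mysqld")}
    problems = []
    for name in ("server_id", "log_bin", "relay_log"):
        if name not in opts:
            problems.append(f"{name} is not set")
    if role == "slave":
        if str(opts.get("read_only")).upper() not in ("1", "ON"):
            problems.append("read_only should be 1 on a slave")
        if str(opts.get("relay_log_purge")).upper() not in ("0", "OFF"):
            problems.append("relay_log_purge should be 0 on a slave")
    return problems


if __name__ == "__main__":
    slave_cnf = """
[mysqld]
server_id = 2
relay-log=relay-bin
log-bin=master-bin
relay_log_purge=0
read_only = 1
"""
    print(check_mysqld_options(slave_cnf, "slave"))  # []
```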
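MHA only provides high availability for MySQL; it does not set up master-slave replication, so replication must be configured by hand.
Configure the master.
Edit the MySQL configuration file ([mysqld] section):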
innodb_file_per_table = 1
skip_name_resolve = 1
log-bin=master-bin
relay-log=relay-bin
server_id = 1
socket=/usr/local/mysql/data/mysql.sock
Start MySQL and add the replication user.
mysql> grant replication slave,replication client on *.*
-> to 'repluser'@'192.168.%.%' identified by 'replpass';
Query OK, 0 rows affected (0.38 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.06 sec)
Check the master's status.
mysql> show master status \G
*************************** 1. row ***************************
File: master-bin.000007
Position: 532
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 42fcdd78-002b-11e7-994b-000c2969e289:1-7
1 row in set (0.00 sec)
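The File and Position values above are exactly what each slave's CHANGE MASTER TO needs. A Python sketch, with hypothetical function names, that parses this vertical output and builds the statement; the values are inserted without SQL escaping, so it is only meant for trusted input such as this lab setup.

```python
from typing import Dict


def parse_vertical(output: str) -> Dict[str, str]:
    # Parse the "Key: value" rows the mysql client prints for a statement
    # ended with \G. The "*** 1. row ***" banner and the "row in set" footer
    # contain no colon, so they are skipped; values may themselves contain ':'.
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key and " " not in key:
            fields[key] = value.strip()
    return fields


def change_master_sql(master_status: Dict[str, str], host: str,
                      user: str, password: str) -> str:
    # Build the CHANGE MASTER TO statement for a slave from the binlog file
    # and position reported by the master's SHOW MASTER STATUS.
    return (
        f"CHANGE MASTER TO master_host='{host}', master_user='{user}', "
        f"master_password='{password}', "
        f"master_log_file='{master_status['File']}', "
        f"master_log_pos={int(master_status['Position'])};"
    )


if __name__ == "__main__":
    sample = """*************************** 1. row ***************************
             File: master-bin.000007
         Position: 532
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 42fcdd78-002b-11e7-994b-000c2969e289:1-7
1 row in set (0.00 sec)"""
    status = parse_vertical(sample)
    print(change_master_sql(status, "192.168.217.14", "repluser", "replpass"))
```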
Configure slave1 (slave2 is configured the same way).
Edit the MySQL configuration file (on slave2, change server-id):
innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server-id = 2
read_only = 1
relay_log_purge = 0
socket=/usr/local/mysql/data/mysql.sock
Start MySQL and set up replication.
mysql> change master to master_host='192.168.217.14',
-> master_user='repluser',
-> master_password='replpass',
-> master_log_file='master-bin.000007',
-> master_log_pos=532;
Query OK, 0 rows affected, 2 warnings (0.27 sec)
After running start slave; check the status:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.217.14
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000007
Read_Master_Log_Pos: 532
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 284
Relay_Master_Log_File: master-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 532
Relay_Log_Space: 451
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 42fcdd78-002b-11e7-994b-000c2969e289
Master_Info_File: /usr/local/mysql/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 42fcdd78-002b-11e7-994b-000c2969e289:1-7
Auto_Position: 0
1 row in set (0.00 sec)
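The fields that matter most in this output are Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master. A Python sketch of a health check over the text of SHOW SLAVE STATUS\G; the function names and the 60-second lag threshold are assumptions chosen for this example.

```python
import re
from typing import Dict

# One "Key: value" row of vertical (\G) output; the key is a single word.
ROW = re.compile(r"^[ \t]*(\w+):[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def slave_fields(status_text: str) -> Dict[str, str]:
    # Collect every "Key: value" pair; banner and footer lines do not match.
    return dict(ROW.findall(status_text))


def slave_is_healthy(status_text: str, max_lag: int = 60) -> bool:
    # Healthy means both replication threads run and the lag is known and small.
    f = slave_fields(status_text)
    if f.get("Slave_IO_Running") != "Yes" or f.get("Slave_SQL_Running") != "Yes":
        return False
    # Seconds_Behind_Master is NULL when MySQL cannot compute the lag.
    lag = f.get("Seconds_Behind_Master", "NULL")
    return lag.isdigit() and int(lag) <= max_lag
```

Running this against the output above would report slave1 as healthy: both threads show Yes and Seconds_Behind_Master is 0.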
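Create the MHA management account on the master (replication is already running, so the GRANT is replicated to both slaves as well):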
mysql> grant all on *.* to 'mhauser'@'192.168.217.%' identified by 'mhapass';
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
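On monitor, generate a key pair and set up passwordless login.
The same private key and authorized_keys file are copied to every node, so each node can also log in to every other node without a password: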
[root@monitor ~]# ssh-keygen -t rsa -P ''
[root@monitor ~]# cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
[root@monitor ~]# chmod 600 ~/.ssh/authorized_keys
[root@monitor ~]# scp -p ~/.ssh/id_rsa ~/.ssh/authorized_keys master:~root/.ssh
root@master's password:
id_rsa 100% 1675 1.6KB/s 00:00
authorized_keys 100% 394 0.4KB/s 00:00
[root@monitor ~]# scp -p ~/.ssh/id_rsa ~/.ssh/authorized_keys slave1:~root/.ssh
root@slave1's password:
id_rsa 100% 1675 1.6KB/s 00:00
authorized_keys 100% 394 0.4KB/s 00:00
[root@monitor ~]# scp -p ~/.ssh/id_rsa ~/.ssh/authorized_keys slave2:~root/.ssh
root@slave2's password:
id_rsa 100% 1675 1.6KB/s 00:00
authorized_keys 100% 394 0.4KB/s 00:00
Test passwordless login:
[root@monitor ~]# ssh master 'hostname -I'
192.168.217.14
[root@monitor ~]# ssh slave1 'hostname -I'
192.168.217.15
[root@monitor ~]# ssh slave2 'hostname -I'
192.168.217.13
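These tests only prove that monitor can reach each node. During failover MHA also needs the MySQL nodes to reach each other, which is why the same key pair was copied everywhere; the masterha_check_ssh log further down checks every node-to-node direction. A rough Python sketch, with hypothetical function names, that lists those directions and tests each one by hopping through ssh from monitor. Hosts whose keys are not yet in known_hosts will be reported as failures, because BatchMode refuses to prompt.

```python
import subprocess
from itertools import permutations
from typing import List, Tuple


def mesh_pairs(hosts: List[str]) -> List[Tuple[str, str]]:
    # Every node must reach every other node, so n MySQL nodes need
    # n * (n - 1) passwordless connections.
    return list(permutations(hosts, 2))


def pair_works(src: str, dst: str, timeout: float = 15.0) -> bool:
    # Log in to src, then run a second ssh from src to dst. BatchMode=yes makes
    # both hops fail instead of prompting for a password.
    cmd = ["ssh", "-o", "BatchMode=yes", src, f"ssh -o BatchMode=yes {dst} true"]
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    for src, dst in mesh_pairs(["master", "slave1", "slave2"]):
        print(f"{src} -> {dst}: {'ok' if pair_works(src, dst) else 'FAILED'}")
```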
Every node needs mha4mysql-node-0.56-0.el6.noarch.rpm,
and the monitor node additionally needs mha4mysql-manager-0.56-0.el6.noarch.rpm.
On each MySQL node (master shown; repeat on slave1 and slave2):
[root@master ~]# yum -y install mha4mysql-node-0.56-0.el6.noarch.rpm
On the monitor host (install the manager):
[root@monitor ~]# yum -y install mha4mysql-manager-0.56-0.el6.noarch.rpm mha4mysql-node-0.56-0.el6.noarch.rpm
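Because mha4mysql-manager depends on many Perl modules, the Perl environment must be installed as well (add the EPEL repository to yum).
MHA configuration file.
The global configuration (/etc/masterha_default.cnf) provides default settings for every application. This setup does not use one, which is why the checks below warn that it was not found.
Application configuration: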
[root@monitor ~]# mkdir /etc/masterha
[root@monitor ~]# vim /etc/masterha/app1.cnf
with the following content:
[server default]
user=mhauser
password=mhapass
manager_workdir=/data/masterha/app1
manager_log=/data/masterha/app1/manager.log
remote_workdir=/data/masterha/app1
ssh_user=root
repl_user=repluser
repl_password=replpass
ping_interval=1
[server1]
hostname=192.168.217.14
[server2]
hostname=192.168.217.15
[server3]
hostname=192.168.217.13
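A quick structural check of app1.cnf can catch typos before running the MHA checks. This Python sketch is hypothetical and much weaker than masterha_check_repl: it only verifies that the keys this article sets in [server default] are present, that every [serverN] section has a unique hostname, and that there are at least two servers. MHA has built-in defaults for some of these keys, so treating them all as mandatory is this sketch's own choice.

```python
import configparser
from typing import List

# Keys this article sets in [server default].
EXPECTED_DEFAULTS = ("user", "password", "manager_workdir", "manager_log",
                     "ssh_user", "repl_user", "repl_password")


def check_app_conf(text: str) -> List[str]:
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    if not parser.has_section("server default"):
        problems.append("missing [server default] section")
    else:
        for key in EXPECTED_DEFAULTS:
            if not parser.has_option("server default", key):
                problems.append(f"[server default] lacks {key}")
    servers = [s for s in parser.sections() if s != "server default"]
    hostnames = []
    for name in servers:
        if parser.has_option(name, "hostname"):
            hostnames.append(parser.get(name, "hostname"))
        else:
            problems.append(f"[{name}] lacks hostname")
    if len(set(hostnames)) != len(hostnames):
        problems.append("the same hostname appears in more than one [serverN] section")
    if len(servers) < 2:
        problems.append("need at least a master and one slave")
    return problems


if __name__ == "__main__":
    with open("/etc/masterha/app1.cnf") as fh:
        print(check_app_conf(fh.read()))
```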
Check SSH connectivity. Output like the following means the check passed and everything is working:
[root@monitor masterha]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Wed Mar 29 14:57:33 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Mar 29 14:57:33 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Mar 29 14:57:33 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Mar 29 14:57:33 2017 - [info] Starting SSH connection tests..
Wed Mar 29 14:57:35 2017 - [debug]
Wed Mar 29 14:57:33 2017 - [debug] Connecting via SSH from root@192.168.217.14(192.168.217.14:22) to root@192.168.217.15(192.168.217.15:22)..
Warning: Permanently added '192.168.217.14' (RSA) to the list of known hosts.
Wed Mar 29 14:57:33 2017 - [debug] ok.
Wed Mar 29 14:57:33 2017 - [debug] Connecting via SSH from root@192.168.217.14(192.168.217.14:22) to root@192.168.217.13(192.168.217.13:22)..
Wed Mar 29 14:57:35 2017 - [debug] ok.
Wed Mar 29 14:57:35 2017 - [debug]
Wed Mar 29 14:57:33 2017 - [debug] Connecting via SSH from root@192.168.217.15(192.168.217.15:22) to root@192.168.217.14(192.168.217.14:22)..
Warning: Permanently added '192.168.217.15' (RSA) to the list of known hosts.
Wed Mar 29 14:57:34 2017 - [debug] ok.
Wed Mar 29 14:57:34 2017 - [debug] Connecting via SSH from root@192.168.217.15(192.168.217.15:22) to root@192.168.217.13(192.168.217.13:22)..
Wed Mar 29 14:57:35 2017 - [debug] ok.
Wed Mar 29 14:57:36 2017 - [debug]
Wed Mar 29 14:57:34 2017 - [debug] Connecting via SSH from root@192.168.217.13(192.168.217.13:22) to root@192.168.217.14(192.168.217.14:22)..
Warning: Permanently added '192.168.217.13' (RSA) to the list of known hosts.
Wed Mar 29 14:57:35 2017 - [debug] ok.
Wed Mar 29 14:57:35 2017 - [debug] Connecting via SSH from root@192.168.217.13(192.168.217.13:22) to root@192.168.217.15(192.168.217.15:22)..
Wed Mar 29 14:57:36 2017 - [debug] ok.
Wed Mar 29 14:57:36 2017 - [info] All SSH connection tests passed successfully.
Check the replication status.
[root@monitor masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Mon Jul 1 02:08:37 2013 - [warning] shutdown_script is not defined. (output in between omitted)
Mon Jul 1 02:08:37 2013 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
Start MHA (masterha_manager runs in the foreground):
[root@monitor masterha]# masterha_manager --conf=/etc/masterha/app1.cnf
Wed Mar 29 15:27:21 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Mar 29 15:27:21 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Mar 29 15:27:21 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
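That completes the MHA configuration. MHA is still not a complete solution, though: for example, after the master goes down and MHA finishes the master-slave switchover, masterha_manager exits on its own and has to be started again. Handling that requires writing your own script.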
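A minimal supervisor sketch in Python for that restart problem, assuming the manager is launched from this script instead of by hand; MANAGER_CMD and supervise are hypothetical names. The two flags are documented masterha_manager options: --remove_dead_master_conf removes the failed master's entry from app1.cnf after a failover, so the restarted manager does not try to monitor a dead host, and --ignore_last_failover stops MHA from refusing a new failover because a previous one happened recently. Blind restarts are only reasonable if someone still repairs or rejoins the old master; with a single slave left, a second failover has no spare to promote.

```python
import subprocess
import sys
import time
from typing import List, Optional

MANAGER_CMD = [
    "masterha_manager",
    "--conf=/etc/masterha/app1.cnf",
    "--remove_dead_master_conf",  # drop the dead master's [serverN] block after failover
    "--ignore_last_failover",     # do not refuse to fail over because of a recent failover
]


def supervise(cmd: List[str], max_runs: Optional[int] = None,
              delay: float = 10.0) -> List[int]:
    # Start cmd, wait for it to exit, pause `delay` seconds, and start it again.
    # The pause keeps a misconfigured manager from restarting in a tight loop.
    # max_runs=None loops forever; a number is mainly useful for testing.
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        print(f"{cmd[0]} exited with code {code}", file=sys.stderr)
        time.sleep(delay)
    return exit_codes


if __name__ == "__main__":
    supervise(MANAGER_CMD)
```

On monitor this could be started in the background with something like nohup python3 mha_supervisor.py &, where mha_supervisor.py is a hypothetical file name.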