MySQL Group Replication (hereafter MGR) has been GA for a little over a month. After a round of simple testing, I found that its guarantee of eventual data consistency (note: eventual consistency; real-time is not achievable, there is still some lag) is fairly reliable. As covered in my earlier posts introducing MGR, it has two modes, single-primary and multi-primary. Given the current restrictions on multi-primary mode, plus a few problems found during testing, multi-primary is not very practical yet.
In fact, MGR's single-primary mode already removes the risk, present in a traditional master-slave setup, of data ending up inconsistent after a failover. But one problem remains: HA. How do we make failover of the MGR primary automatic?
We know that in single-primary mode, when the primary fails, MGR internally holds an election and picks a new primary; this is decided and carried out entirely inside MGR. However, MGR does not take care of everything for us: when the primary goes down, the application's connections will not switch over by themselves. In other words, MGR provides no built-in mechanism to make a primary failover transparent to the application.
From the MGR official documentation:
Quite obviously, regardless the mode Group Replication is deployed, it does not handle client-side fail-over. That must be handled by the application itself, connector or a middleware framework such as a proxy or router.
In other words: MGR will not handle client-side failover for you; that has to be done by the application itself, by the connector, or by middleware such as a proxy or router.
In practice, of course, what we want is that when the primary goes down, the application does not need to restart and its connections are automatically redirected to the new primary so it can keep serving traffic. With that in mind, I went searching on Google and was inspired by the following blog post:
http://lefred.be/content/ha-with-mysql-group-replication-and-proxysql/
The problem described above can be solved with a MySQL middleware called ProxySQL.
With all that said, our goal in one sentence:
In MGR single-primary mode, make primary-node failover transparent to the application.
Reaching this goal relies on the ProxySQL middleware.
As mentioned, ProxySQL can get us there. Without further ado, here is a diagram of the idea:
To walk through the diagram: the application connects to ProxySQL and reaches the backend MGR primary through it. ProxySQL keeps the MGR nodes' access information in its internal configuration tables, and a scheduler runs a check script (a shell script) that periodically inspects the state of the backend MGR nodes. When the MGR primary dies, the scheduled script detects the failure, determines the new primary, drops the old connections it was holding, and opens connections to the new node (internally ProxySQL maintains a connection pool to each backend MGR node).
Throughout this process the application needs no changes at all. From the moment the application sees the failure to the moment its connections point at the new primary and service resumes, the gap is on the order of seconds.
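The per-backend connection pool mentioned above can be observed through ProxySQL's stats schema. A quick sketch, run against the ProxySQL admin interface (port 6032 by default); table and column names come from ProxySQL's stats schema:
-- Inspect ProxySQL's per-backend connection pool (run on the admin interface).
-- hostgroup/srv_host/srv_port identify the backend; status shows ONLINE, SHUNNED, etc.
SELECT hostgroup, srv_host, srv_port, status, ConnUsed, ConnFree, ConnOK, ConnERR
FROM stats.stats_mysql_connection_pool
ORDER BY hostgroup, srv_port;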
[Important] The check logic of the script is shown in the pseudocode below (a SQL sketch of the underlying checks follows right after it):
set flag switchOver = false;
find current write node;

for (read node in readhostgroup) {
    isOk = check read node status;
    if (read node is current write node) {
        if (!isOk) {
            // need to find a new write node
            set flag switchOver = true;
            update current read node status to be 'OFFLINE_SOFT';
            update current write node status to be 'OFFLINE_HARD';
        } else {
            // the node is reachable, but is it still the primary?
            isPrimaryNode = check current write node is really the primary node;
            if (!isPrimaryNode) {
                // need to find a new write node
                set flag switchOver = true;
                update current write node status to be 'OFFLINE_HARD';
                if (read node status != 'ONLINE') {
                    update read node status to be 'ONLINE';
                }
                continue;
            }
            // it is still the primary node
            if (read node status != 'ONLINE') {
                update current write node status to be 'ONLINE';
                update read node status to be 'ONLINE';
            }
        }
    } else if (!isOk) { // node is not the current write node and its status is not ok
        update read node status to be 'OFFLINE_SOFT';
    } else if (isOk and read node status == 'OFFLINE_SOFT') {
        update read node status to be 'ONLINE';
    }
}

if (switchOver) {
    // need to find a new write node
    successSwitchOver = false;
    for (read node in readhostgroup and status is 'ONLINE') {
        isNewPrimaryNode = check node is the new primary node;
        if (isNewPrimaryNode) {
            update current write node info to be this read node;
            successSwitchOver = true;
            break;
        }
    }
    if (!successSwitchOver) {
        // could not find a new write node
        report error msg;
    }
}
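The "is the node ok" and "is it really the primary" checks in the pseudocode boil down to SQL run against each MGR node. A minimal sketch of what such checks can look like (an illustration under my own assumptions; the actual script referenced later may implement them differently):
-- Run against an individual MGR node.
-- 1) Is this member currently ONLINE in the group?
SELECT MEMBER_STATE
FROM performance_schema.replication_group_members
WHERE MEMBER_ID = @@server_uuid;
-- 2) In single-primary mode, is this member the current primary?
SELECT (SELECT VARIABLE_VALUE
          FROM performance_schema.global_status
         WHERE VARIABLE_NAME = 'group_replication_primary_member') = @@server_uuid AS is_primary;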
The rest of this post describes how to configure ProxySQL and MGR together to reach this goal.
Installing and deploying the software involved (including building the MGR cluster itself) is out of scope here.
Assume a 3-node MGR cluster has already been deployed on a single machine (machine resources are limited), running in single-primary mode:
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 4a48f63a-d47b-11e6-a16f-a434d920fb4d | CrazyPig-PC | 24801 | ONLINE |
| group_replication_applier | 592b4ea5-d47b-11e6-a3cd-a434d920fb4d | CrazyPig-PC | 24802 | ONLINE |
| group_replication_applier | 6610aa92-d47b-11e6-a60e-a434d920fb4d | CrazyPig-PC | 24803 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)
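The member list above does not tell you which node is the primary. In single-primary mode you can find out with a query like the following, run on any member (it joins the member list with the group_replication_primary_member status variable):
SELECT m.MEMBER_HOST, m.MEMBER_PORT
FROM performance_schema.replication_group_members m
JOIN performance_schema.global_status s
  ON s.VARIABLE_NAME = 'group_replication_primary_member'
 AND s.VARIABLE_VALUE = m.MEMBER_ID;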
ProxySQL has also been installed and started on the same machine. The configuration work that follows consists of these steps:
1) Create the user and grants on the MGR cluster
To allow ProxySQL to periodically check MGR node state, and to act as the proxy layer in front of the backend MGR cluster, the user ProxySQL logs in with must be created on the MGR cluster and granted the necessary privileges.
Run on the MGR primary:
grant all privileges on *.* to 'proxysql'@'%' identified by 'proxysql';
flush privileges;
2) Create the functions and view used to check MGR node state
Following the blog post referenced earlier, run the SQL from the link below on the MGR primary:
https://github.com/lefred/mysql_gr_routing_check/blob/master/addition_to_sys.sql
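Among other things, that SQL file adds helper functions and a view to the sys schema that reports whether a member is a viable routing candidate. After running it on the primary, a quick sanity check on any member looks roughly like this (view name taken from addition_to_sys.sql; the exact column set may vary with the script version):
-- Should return one row describing this member's suitability as a routing target.
SELECT * FROM sys.gr_member_routing_candidate_status;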
3) Configure ProxySQL
Add the MGR member nodes to ProxySQL's mysql_servers table:
insert into mysql_servers (hostgroup_id, hostname, port) values(1, '127.0.0.1', 24801);
insert into mysql_servers (hostgroup_id, hostname, port) values(2, '127.0.0.1', 24801);
insert into mysql_servers (hostgroup_id, hostname, port) values(2, '127.0.0.1', 24802);
insert into mysql_servers (hostgroup_id, hostname, port) values(2, '127.0.0.1', 24803);
hostgroup_id = 1 is the write group; given the single-writer constraint discussed above, only one node is configured there. hostgroup_id = 2 is the read group and contains all MGR nodes.
ProxySQL can also be configured for read/write splitting, but that feature is not covered in this post. With the hostgroup configuration above, all reads and writes are by default sent to the hostgroup with hostgroup_id = 1, i.e. to the write node.
Next, change ProxySQL's monitoring username and password to the user and password created in step 1):
UPDATE global_variables SET variable_value='proxysql' WHERE variable_name='mysql-monitor_username';
UPDATE global_variables SET variable_value='proxysql' WHERE variable_name='mysql-monitor_password';
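Once the variables are loaded to RUNTIME (done a couple of steps below), you can check that ProxySQL's monitor module is actually able to log in to the backends. A sketch against the admin interface, using ProxySQL's monitor schema:
-- Latest connect checks performed by the monitor module; connect_error should be NULL for healthy nodes.
SELECT hostname, port, time_start_us, connect_success_time_us, connect_error
FROM monitor.mysql_server_connect_log
ORDER BY time_start_us DESC
LIMIT 6;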
Also add the user that the application will use to access the backend MGR nodes through ProxySQL:
insert into mysql_users(username, password) values('proxysql', 'proxysql');
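One caveat, and this is my own note rather than part of the original setup: ProxySQL routes a query to the user's default_hostgroup unless a query rule overrides it, and default_hostgroup defaults to 0. If traffic does not land on the write hostgroup in your environment, point the user at hostgroup 1 explicitly, for example:
-- Make hostgroup 1 (the write group) this user's default hostgroup.
UPDATE mysql_users SET default_hostgroup = 1 WHERE username = 'proxysql';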
Finally, load the global_variables, mysql_servers and mysql_users tables to RUNTIME, and then persist them to DISK:
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
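To confirm that the configuration is really active, you can query the runtime_ tables on the admin interface, for example:
-- The runtime_ tables reflect what ProxySQL is currently using, not the pending in-memory config.
SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;
SELECT username, default_hostgroup, active FROM runtime_mysql_users;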
4) Configure the scheduler
First, download gr_sw_mode_checker.sh from the GitHub repository https://github.com/ZzzCrazyPig/proxysql_groupreplication_checker
Then place the downloaded gr_sw_mode_checker.sh in the directory /var/lib/proxysql/.
Finally, insert the following record into ProxySQL's scheduler table, load it to RUNTIME to make it take effect, and optionally persist it to disk:
insert into scheduler(id, active, interval_ms, filename, arg1, arg2, arg3, arg4)
values(1, 1, 3000, '/var/lib/proxysql/gr_sw_mode_checker.sh', 1, 2, 1, '/var/lib/proxysql/checker.log');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
The script and its arguments are as follows:
gr_sw_mode_checker.sh writehostgroup_id readhostgroup_id [writeNodeCanRead] [log file]
In the scheduler record above, arg1 = 1 is the writehostgroup_id, arg2 = 2 is the readhostgroup_id, arg3 = 1 means the write node may also serve reads, and arg4 = '/var/lib/proxysql/checker.log' is the log file (if omitted, the script logs to './checker.log').
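After LOAD SCHEDULER TO RUNTIME you can verify that the job is registered and active, and then watch /var/lib/proxysql/checker.log to see the checks running:
-- The entry must appear here (with active = 1) for the script to run every interval_ms milliseconds.
SELECT id, active, interval_ms, filename, arg1, arg2, arg3, arg4 FROM runtime_scheduler;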
And with that, the setup is done.
Since I am a Java developer, I wrote a small Java program that connects to ProxySQL over JDBC (note: the client connects to ProxySQL, not to a MySQL node directly), runs select @@port to see which backend MGR node it is currently talking to, and then I manually simulate the primary going down and watch what happens. The code is as follows:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TestMgrHAWithProxysql {

    private static final String JDBC_URL = "jdbc:mysql://10.202.7.88:6033/test";
    private static final String USER = "proxysql";
    private static final String PASSWORD = "proxysql";

    public static void main(String[] args) {
        tryAgain();
    }

    private static void tryAgain() {
        Connection conn = null;
        try {
            conn = DriverManager.getConnection(JDBC_URL, USER, PASSWORD);
            conn.setAutoCommit(false);
            String sql = "select @@port";
            Statement stmt = conn.createStatement();
            // Poll the backend port every 500 ms so we can see which MGR node we are connected to.
            while (true) {
                ResultSet rs = stmt.executeQuery(sql);
                if (rs.next()) {
                    System.out.println("port : " + rs.getString(1));
                }
                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        } catch (SQLException e) {
            // The connection broke (e.g. the primary was killed): reconnect and keep polling.
            e.printStackTrace();
            tryAgain();
        } finally {
            if (conn != null) {
                try {
                    conn.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
The initial MGR primary is the node on port 24802, so the program keeps printing:
port : 24802
...
...
...
With the program still running, simulate the primary going down:
mysql> stop group_replication;
Query OK, 0 rows affected (8.34 sec)
At this point the Java program prints an exception, and then keeps printing:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 567 milliseconds ago. The last packet sent successfully to the server was 66 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3715)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3604)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4149)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2783)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1569)
at TestMgrHAWithProxysql.tryAgain(TestMgrHAWithProxysql.java:26)
at TestMgrHAWithProxysql.main(TestMgrHAWithProxysql.java:15)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3161)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3615)
... 9 more
port : 24803
...
...
The primary has switched to port 24803, i.e. the third node. To verify once more, first bring the node on port 24802, which was just kicked out, back into the group:
mysql> start group_replication;
Query OK, 0 rows affected (2.64 sec)
Then log in to the node on port 24803 and run:
mysql> stop group_replication;
Query OK, 0 rows affected (8.15 sec)
The primary switches again, back to 24802:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 547 milliseconds ago. The last packet sent successfully to the server was 47 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3715)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3604)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4149)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2783)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1569)
at TestMgrHAWithProxysql.tryAgain(TestMgrHAWithProxysql.java:26)
at TestMgrHAWithProxysql.main(TestMgrHAWithProxysql.java:15)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3161)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3615)
... 9 more
port : 24802
...
...
This time, instead of re-adding 24803, stop group replication on 24802 again, leaving only one node in the whole MGR group, and see whether the node on port 24801 gets picked. The answer is yes:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 542 milliseconds ago. The last packet sent successfully to the server was 41 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1137)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3715)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3604)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4149)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2783)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1569)
at TestMgrHAWithProxysql.tryAgain(TestMgrHAWithProxysql.java:26)
at TestMgrHAWithProxysql.main(TestMgrHAWithProxysql.java:15)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3161)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3615)
... 9 more
port : 24801
...
...
Of course, if the node on port 24801 also goes down at this point, there is nothing left to play with.
This solution introduces a middleware layer to make primary failover in MGR single-primary mode transparent to the application. It has the following implications: