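Group replication is started and stopped with START/STOP GROUP_REPLICATION:
mysql> start group_replication; /* starts the group replication threads and begins listening on the group communication port, e.g. 33061 */
mysql> stop group_replication; /* stops the group replication threads */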
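The group_replication_start_on_boot variable controls whether group replication starts automatically together with MySQL. If the group has no other members yet, the first member to start must create the group (bootstrap). Otherwise neither that member nor any later member can join, because there is no group to join yet.
The "group" in group replication is best seen as a dynamic view rather than a fixed entity. It comes into existence when the first member bootstraps it, grows and shrinks as members join and leave, and ceases to exist when the last member leaves.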
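The bootstrap step can be sketched as the sequence of statements that a script or application would send to a member over JDBC. This is a minimal sketch that assumes the standard `group_replication_bootstrap_group` system variable. The class `GroupStartPlan` and its method are hypothetical names. The variable is switched back OFF immediately after START GROUP_REPLICATION, so that a later restart of the same server does not bootstrap a second, separate group.

```java
import java.util.List;

// Hypothetical helper: the statements to run on a member when starting group
// replication. Only the first member of a brand-new group bootstraps it.
class GroupStartPlan {
    static List<String> statements(boolean bootstrapNewGroup) {
        if (bootstrapNewGroup) {
            return List.of(
                "SET GLOBAL group_replication_bootstrap_group=ON",   // create the group
                "START GROUP_REPLICATION",
                "SET GLOBAL group_replication_bootstrap_group=OFF"); // never leave it ON
        }
        // Every later member simply joins the existing group.
        return List.of("START GROUP_REPLICATION");
    }
}
```

A caller holding a `java.sql.Statement stmt` would run `for (String sql : GroupStartPlan.statements(true)) stmt.execute(sql);` on the first member only, and then use `statements(false)` on the others.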
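Group replication provides a set of tables in performance_schema for checking replication status.
The following tables show the state and statistics of every member of the group. Under normal conditions the result is the same no matter which MySQL server you query.
select * from performance_schema.replication_group_members; /* group members and their states */
select * from performance_schema.replication_group_member_stats; /* per-member statistics */
Normally every member's state should be ONLINE.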
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 15f92590-e587-11d4-a2d8-525400401a99 | oceanbase02 | 3306 | ONLINE | PRIMARY | 8.0.19 |
| group_replication_applier | 2f4e8149-e624-11d4-b492-525400bab9b9 | oceanbase03 | 3306 | ONLINE | PRIMARY | 8.0.19 |
| group_replication_applier | 3eed933f-e584-11d4-a71b-525400046468 | oceanbase01 | 3306 | ONLINE | PRIMARY | 8.0.19 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
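A health check built on this table could look like the following sketch. It assumes an open `java.sql.Connection` to any member. `GroupHealth`, `memberStates` and `notOnline` are hypothetical names. Any state other than ONLINE is reported, for example RECOVERING, OFFLINE, ERROR or UNREACHABLE.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical health check over performance_schema.replication_group_members.
class GroupHealth {
    // Reads "host:port" -> MEMBER_STATE for every member the queried server knows about.
    static Map<String, String> memberStates(Connection conn) throws SQLException {
        Map<String, String> states = new LinkedHashMap<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT member_host, member_port, member_state "
                 + "FROM performance_schema.replication_group_members")) {
            while (rs.next()) {
                states.put(rs.getString(1) + ":" + rs.getInt(2), rs.getString(3));
            }
        }
        return states;
    }

    // Returns the members whose state is anything other than ONLINE.
    static List<String> notOnline(Map<String, String> states) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> e : states.entrySet()) {
            if (!"ONLINE".equals(e.getValue())) bad.add(e.getKey());
        }
        return bad;
    }
}
```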
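The group replication plugin normally creates two replication channels: group_replication_applier and group_replication_recovery.
Each channel is in effect an independent replica (slave) with its own I/O thread and SQL thread. Use the following statements to monitor replication (note that under group replication, SHOW SLAVE STATUS produces no output):
select * from performance_schema.replication_connection_status; /* I/O thread status */
select * from performance_schema.replication_applier_status; /* SQL (applier) thread status */
select * from performance_schema.replication_applier_status_by_coordinator; /* multi-threaded apply: coordinator thread status (summary) */
select * from performance_schema.replication_applier_status_by_worker; /* multi-threaded apply: status of each worker thread */
Normally the "group_replication_recovery" channel is not running, and the "group_replication_applier" channel applies the transactions received from the group members.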
mysql> select channel_name, service_state, last_error_number, last_error_message, received_transaction_set, queueing_transaction from performance_schema.replication_connection_status;
+----------------------------+---------------+-------------------+--------------------+----------------------------------------------------------------------------------------------------+----------------------+
| channel_name | service_state | last_error_number | last_error_message | received_transaction_set | queueing_transaction |
+----------------------------+---------------+-------------------+--------------------+----------------------------------------------------------------------------------------------------+----------------------+
| group_replication_applier | ON | 0 | | 3eed933f-e584-11d4-a71b-525400046468:1-4,65e5f89b-770d-4936-b72f-c7a190d5884a:1-9:1000008-1000085 | |
| group_replication_recovery | OFF | 0 | | | |
+----------------------------+---------------+-------------------+--------------------+----------------------------------------------------------------------------------------------------+----------------------+
2 rows in set (0.00 sec)
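The steady-state rule described above can be turned into a small check. In this sketch the map is assumed to hold channel_name → service_state, as returned by the `QUERY` string. The class and method names are hypothetical. The recovery channel is flagged when it is ON, because, as noted above, it is normally not running outside of distributed recovery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical check of the normal channel layout under group replication.
class ChannelCheck {
    // Its rows (channel_name, service_state) are meant to fill the map passed to problems().
    static final String QUERY =
        "SELECT channel_name, service_state "
        + "FROM performance_schema.replication_connection_status";

    static List<String> problems(Map<String, String> serviceStateByChannel) {
        List<String> out = new ArrayList<>();
        String applier = serviceStateByChannel.get("group_replication_applier");
        if (!"ON".equals(applier)) {
            out.add("group_replication_applier is " + applier);
        }
        String recovery = serviceStateByChannel.get("group_replication_recovery");
        // ON is only expected while a member is catching up with the group.
        if (recovery != null && !"OFF".equals(recovery)) {
            out.add("group_replication_recovery is " + recovery);
        }
        return out;
    }
}
```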
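MySQL Router provides transparent routing between applications and the back-end MySQL servers, and it redirects connections when a MySQL server fails. It only forwards requests and does not unpack or parse the packets. With a static destinations list, as used here, MySQL Router is not replication-aware: it does not know which back-end server (destination) is the primary and which are secondaries, or whether they are all secondaries. MySQL Router therefore cannot route by statement type (SELECT/UPDATE/INSERT/...), and you cannot simply rely on it for read/write splitting.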
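Suppose a single-primary group with server1 (primary), server2 and server3. One possible design:
- Define one route: connections arriving on bind_port=7001 are forwarded with routing_strategy=next-available to the primary (destinations=server1). To handle failover, you may instead list all members, ordered by their primary-election priority;
- Define a second route: connections arriving on bind_port=7002 are forwarded with routing_strategy=round-robin to the secondaries (destinations=server2,server3);
In the application, open two connections, one to port 7001 and one to port 7002. Use the 7001 connection for transactions and the 7002 connection for queries and reports.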
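The two-connection design can be wired up by deriving each JDBC URL from the port of the matching route. This is only a sketch. The Router host passed in is an assumption (the demo code later in this post uses 192.168.203.154), and `RouterUrls` is a hypothetical name.

```java
// Hypothetical helper for the two-port design: transactions go to bind_port 7001
// (primary route), queries and reports go to bind_port 7002 (secondaries route).
class RouterUrls {
    static final int WRITE_PORT = 7001;
    static final int READ_PORT = 7002;

    static String url(String routerHost, String database, boolean readOnly) {
        int port = readOnly ? READ_PORT : WRITE_PORT;
        return "jdbc:mysql://" + routerHost + ":" + port + "/" + database;
    }
}
```

The application would then hold two `Connection` objects. It would call `DriverManager.getConnection(RouterUrls.url(host, "tpcc", false), props)` for transactions and the `true` variant for reporting queries.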
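In practice it is certainly not that simple. The application must support reconnecting, and for reliability it also needs extra code to retry transactions that fail. Read/write splitting is not transparent to the application, so you have to plan for it in both design and coding.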
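The retry code mentioned above might be sketched as a small wrapper that re-runs a whole transaction when the failure is considered transient. The caller supplies the `retryable` predicate (for example, matching SQLState 40001 for deadlocks). `TxRetry` and `SqlWork` are hypothetical names. The work passed in is assumed to redo the entire transaction, not just the statement that failed.

```java
import java.sql.SQLException;
import java.util.function.Predicate;

// Hypothetical retry wrapper: re-runs a complete transaction when it fails with
// an error the caller considers transient (e.g. a deadlock or a group conflict).
class TxRetry {
    @FunctionalInterface
    interface SqlWork<T> {
        // Must execute the complete transaction (begin ... commit) on every call.
        T run() throws SQLException;
    }

    static <T> T run(int maxAttempts, Predicate<SQLException> retryable, SqlWork<T> work)
            throws SQLException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!retryable.test(e)) throw e;   // permanent error: give up immediately
                last = e;                          // transient: run the transaction again
            }
        }
        throw last;                                // still failing after maxAttempts
    }
}
```

A real application would usually also roll back and back off briefly between attempts. Those steps are left out here to keep the control flow visible.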
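I only tried MySQL Router briefly and did not study it in depth, so I am just sharing my installation steps here. For the configuration options, see: https://dev.mysql.com/doc/mysql-router/8.0/en/mysql-router-conf-options.html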
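Note: starting with MySQL Router 8.0.4, the old 'mode' option is deprecated and replaced by the new 'routing_strategy' option. mode and routing_strategy cannot both be set.
The two values of the old 'mode' option map as follows:
- mode=read-write: can be replaced by routing_strategy=next-available; the two behave identically;
- mode=read-only: can be replaced by routing_strategy=round-robin; the two behave identically.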
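The mapping above can be captured in a small migration helper that also rejects sections combining the two options. This is a sketch with hypothetical names. The section is assumed to be a key/value map already parsed from one [routing:...] block of the configuration file.

```java
import java.util.Map;

// Hypothetical helper for migrating pre-8.0.4 [routing:...] sections.
class RoutingStrategyMigration {
    // Maps a legacy 'mode' value to the equivalent 'routing_strategy' value.
    static String strategyFor(String legacyMode) {
        switch (legacyMode) {
            case "read-write": return "next-available";
            case "read-only":  return "round-robin";
            default: throw new IllegalArgumentException("unknown mode: " + legacyMode);
        }
    }

    // Rejects a section that sets both options, which Router does not allow.
    static void validate(Map<String, String> section) {
        if (section.containsKey("mode") && section.containsKey("routing_strategy")) {
            throw new IllegalArgumentException("'mode' and 'routing_strategy' cannot both be set");
        }
    }
}
```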
# Download the MySQL Router binary package; once extracted it can be run directly.
# unxz mysql-router-8.0.20-linux-glibc2.12-x86_64.tar.xz
# tar -xvf mysql-router-8.0.20-linux-glibc2.12-x86_64.tar
# mv mysql-router-8.0.20-linux-glibc2.12-x86_64 /usr/local/
# cd /usr/local/
# ln -s mysql-router-8.0.20-linux-glibc2.12-x86_64 mysql-router
Create a configuration file 'mysqlrouter.conf'. Its name and location are arbitrary; just specify it when starting Router.
# cat /etc/mysqlrouter.conf
[DEFAULT]
logging_folder = /usr/local/mysql-router/log
user=root
connect_timeout=15
read_timeout=30
[logger]
level = INFO
timestamp_precision = second
[routing:read_only]
bind_address = 0.0.0.0
bind_port = 7001
routing_strategy = round-robin
destinations = 192.168.203.115:3306,192.168.203.116:3306
[routing:read_write]
bind_address = 0.0.0.0
bind_port = 7002
routing_strategy = round-robin
destinations = 192.168.203.114:3306
[routing:mysql_tpcc]
bind_address = 0.0.0.0
bind_port = 7003
routing_strategy = first-available
destinations = 192.168.203.114:3306,192.168.203.115:3306,192.168.203.116:3306
# Start Router by pointing it at the configuration file; clients can then reach the database through the listening ports:
# mysqlrouter --config=/etc/mysqlrouter.conf
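The file above uses three routing strategies. To compare them, here is a simplified, hypothetical model of how a destination might be chosen for each new connection. It follows the strategy descriptions in the MySQL Router documentation:
- first-available always scans the list from the top.
- next-available additionally discards servers that failed, until Router restarts.
- round-robin rotates through the list.

This is not Router's implementation. `reachable` stands in for Router's connection attempt, and `StrategyModel` is a made-up name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Simplified, hypothetical model of per-connection destination selection.
class StrategyModel {
    private final String strategy;
    private final List<String> destinations;
    private final Set<String> discarded = new HashSet<>(); // next-available only
    private int next = 0;                                   // round-robin only

    StrategyModel(String strategy, List<String> destinations) {
        this.strategy = strategy;
        this.destinations = new ArrayList<>(destinations);
    }

    // Returns the destination for a new connection, or null if none is reachable.
    String pick(Predicate<String> reachable) {
        int n = destinations.size();
        switch (strategy) {
            case "first-available":
                // Always start at the top, so a recovered server is used again.
                for (String d : destinations) if (reachable.test(d)) return d;
                return null;
            case "next-available":
                // Like first-available, but a failed server is never retried.
                for (String d : destinations) {
                    if (discarded.contains(d)) continue;
                    if (reachable.test(d)) return d;
                    discarded.add(d);
                }
                return null;
            case "round-robin":
                // Start just after the previously chosen destination.
                for (int i = 0; i < n; i++) {
                    String d = destinations.get((next + i) % n);
                    if (reachable.test(d)) {
                        next = (next + i + 1) % n;
                        return d;
                    }
                }
                return null;
            default:
                throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }
}
```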
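MySQL's JDBC driver, Connector/J, is very capable: on its own it supports automatic reconnection, failover and load balancing. The official Connector/J documentation covers this in great detail: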
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-master-slave-replication-connection.html
Below is the demo code I used for testing:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

public class MySQL1 {
    public static void main(String[] args) throws Throwable {
        // Optional with Connector/J 8 (JDBC 4 drivers register themselves); kept for clarity.
        Class.forName("com.mysql.cj.jdbc.Driver");
        final Properties props = new Properties();
        props.put("user", "tpcc_user");
        props.put("password", "Passw0rd");
        props.put("useSSL", "false");
        props.put("useUnicode", "true");
        props.put("characterEncoding", "UTF-8");
        props.put("autoReconnect", "true");          // reconnect on next use after a connection failure
        props.put("roundRobinLoadBalance", "true");
        // Plain connection through MySQL Router:
        //final String url = "jdbc:mysql://192.168.203.154:7001/tpcc";
        // Replication-aware connection: first host = source, remaining hosts = replicas.
        final String url = "jdbc:mysql:replication://192.168.203.115:3306,192.168.203.114:3306,192.168.203.116:3306/tpcc";
        Connection connection = DriverManager.getConnection(url, props);
        Statement roStmt = connection.createStatement();
        PreparedStatement rwStmt = connection.prepareStatement("insert into tpcc.test1 values(?)");
        final SimpleDateFormat dtFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        int i = 1;
        String output = "";
        while (i < 100) {
            try {
                if ((i % 5) == 0) {
                    // Every 5th iteration: insert a row in an explicit transaction on the source.
                    connection.setReadOnly(false);
                    connection.setAutoCommit(false);
                    rwStmt.setTimestamp(1, new Timestamp(new Date().getTime()));
                    rwStmt.execute();
                    connection.commit();
                    connection.setAutoCommit(true);  // restore autocommit for the following reads
                } else {
                    // Read-only: Connector/J sends the following query to a replica.
                    connection.setReadOnly(true);
                }
                // Report which server actually answered.
                ResultSet rs = roStmt.executeQuery("select @@server_id server_id, @@hostname hostname ");
                if (rs.next()) {
                    output = dtFormat.format(new Date()) + ": server_id=" + rs.getString("server_id") + ", hostname=" + rs.getString("hostname");
                    output += " ( ro=" + connection.isReadOnly() + " )";
                    System.out.println(output);
                }
                rs.close();
                Thread.sleep(1000);
            } catch (SQLException e) {
                // Log and keep looping so that failover behaviour is visible in the output.
                System.out.println(dtFormat.format(new Date()) + ": continue with exception:" + e.getMessage());
            }
            i++;
        }
        roStmt.close();
        connection.close();
    }
}
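Setting the connection property autoReconnect=true is enough to enable automatic reconnection. With it enabled, when the database the connection is using fails, the JDBC connection is not closed but switches automatically to an available server. The statement that was running at the moment of failure still fails, as the log below shows. With it disabled, the connection is closed and has to be created and opened again.
// With autoReconnect=true, the program switches to server 102 and carries on.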
2020-05-19 14:34:43: server_id=101, hostname=oceanbase01 ( ro=false )
2020-05-19 14:34:44: server_id=101, hostname=oceanbase01 ( ro=false )
2020-05-19 14:34:45: continue with exception:Server shutdown in progress
2020-05-19 14:34:45: server_id=102, hostname=oceanbase02 ( ro=false )
2020-05-19 14:34:46: server_id=102, hostname=oceanbase02 ( ro=false )
2020-05-19 14:34:47: server_id=102, hostname=oceanbase02 ( ro=false )
// With autoReconnect=false, the program does not reconnect and cannot continue.
2020-05-19 14:37:50: server_id=101, hostname=oceanbase01 ( ro=false )
2020-05-19 14:37:51: server_id=101, hostname=oceanbase01 ( ro=false )
2020-05-19 14:37:52: continue with exception:Server shutdown in progress
2020-05-19 14:37:52: continue with exception:Communications link failure
The last packet successfully received from the server was 46 milliseconds ago. The last packet sent successfully to the server was 48 milliseconds ago.
2020-05-19 14:37:52: continue with exception:No operations allowed after statement closed.
2020-05-19 14:37:52: continue with exception:No operations allowed after statement closed.
2020-05-19 14:37:52: continue with exception:No operations allowed after statement closed.
2020-05-19 14:37:52: continue with exception:No operations allowed after statement closed.
...
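Connector/J supports replication-aware connections. Based on the connection's read-only flag (set with Connection.setReadOnly() and read back with Connection.isReadOnly()), it automatically performs read/write splitting, failover and load balancing in a replication environment.
Note that an ordinary MySQL JDBC URL has the form "jdbc:mysql://…", whereas a replication-aware one has the form "jdbc:mysql:replication://…".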
...
2020-05-19 14:44:12: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:13: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:14: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:15: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:16: server_id=102, hostname=oceanbase02 ( ro=false ) --> insert...
2020-05-19 14:44:17: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:18: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:19: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:20: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:21: server_id=102, hostname=oceanbase02 ( ro=false ) --> insert ...
2020-05-19 14:44:22: server_id=103, hostname=oceanbase03 ( ro=true )
...
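A replication-aware URL can be built from a host list as sketched below. `ReplicationUrl.build` is a hypothetical helper. The host order matters: Connector/J treats the first host as the source, which receives traffic while the connection is not read-only, and the remaining hosts as replicas. In single-primary mode the secondaries run with super_read_only enabled, so the first host should be the group's primary. The example assumes 192.168.203.114 is the primary, as in the Router configuration above.

```java
import java.util.List;

// Hypothetical helper that builds a Connector/J replication URL:
// the first host is the source (writes), the rest are read-only replicas.
class ReplicationUrl {
    static String build(String source, List<String> replicas, String database) {
        StringBuilder sb = new StringBuilder("jdbc:mysql:replication://").append(source);
        for (String r : replicas) sb.append(',').append(r);
        return sb.append('/').append(database).toString();
    }
}
```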
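tpcc_mysql did not run very well under group replication (especially in multi-primary mode), so this part of the test fizzled out. It does, however, show once again that group replication is not transparent to applications. An application whose design and code never took group replication (clustering) into account is likely to run into problems when simply moved onto it, and it cannot take advantage of what group replication offers.
tpcc_mysql's history table has no primary key (or an equivalent unique key), which does not meet the requirement in the official documentation, so it has to be modified. Otherwise table creation fails: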
ERROR 3750 (HY000): Unable to create or change a table without a primary key, when the system variable 'sql_require_primary_key' is set.
Add a primary key to the table or unset this variable to avoid this message.
Note that tables without a primary key can cause performance problems in row-based replication, so please consult your DBA before changing this setting.
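Before loading an application's schema into a group, you can list tables without a primary key from information_schema. The sketch below uses a hypothetical class `MissingPkCheck` and assumes an open `java.sql.Connection` and a schema name such as tpcc. It left-joins base tables against their PRIMARY KEY constraints and keeps the tables that have none.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check: base tables in a schema that have no PRIMARY KEY.
class MissingPkCheck {
    static final String QUERY =
        "SELECT t.table_name "
        + "FROM information_schema.tables t "
        + "LEFT JOIN information_schema.table_constraints c "
        + "  ON c.table_schema = t.table_schema "
        + " AND c.table_name = t.table_name "
        + " AND c.constraint_type = 'PRIMARY KEY' "
        + "WHERE t.table_schema = ? "
        + "  AND t.table_type = 'BASE TABLE' "
        + "  AND c.constraint_name IS NULL";

    static List<String> tablesWithoutPk(Connection conn, String schema) throws SQLException {
        List<String> tables = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setString(1, schema);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) tables.add(rs.getString(1));
            }
        }
        return tables;
    }
}
```

On the original tpcc_mysql schema, `tablesWithoutPk(conn, "tpcc")` would be expected to include history.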
I added a primary key by hand (perhaps not a sensible one, only to satisfy MGR's requirement) by changing the history table in create_table.sql:
create table history (
h_u_id int not null auto_increment primary key, /* added auto-increment column used as the primary key */
h_c_id int,
h_c_d_id tinyint,
h_c_w_id smallint,
h_d_id tinyint,
h_w_id smallint,
h_date datetime,
h_amount decimal(6,2),
h_data varchar(24) ) Engine=InnoDB;
The program has to be changed accordingly. In load.c under the src directory, change:
if( mysql_stmt_prepare(stmt[5],
"INSERT INTO history values(?,?,?,?,?,?,?,?)",
43) ) goto Error_SqlCall_close;
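to the following (the third argument is the length of the statement string, so it grows from 43 to 48 characters):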
if( mysql_stmt_prepare(stmt[5],
"INSERT INTO history values(NULL,?,?,?,?,?,?,?,?)",
48) ) goto Error_SqlCall_close;
Then rebuild:
# cd src
# make clean
# make
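When connecting to a single member (the primary) with several concurrent connections (-c 6), single-primary mode ran fairly smoothly with only occasional transaction-rollback errors, whereas multi-primary mode produced a great many rollback errors.
In multi-primary mode, when the load was balanced through MySQL Router (round-robin) across all 3 members, so that all 3 accepted writes at the same time, tpcc_mysql completed even fewer transactions per minute, and 'Lock deadlock' errors appeared.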
# ./tpcc_start -h 192.168.203.115 -u tpcc_user -p"Passw0rd" -c 6
...
payment 1:10
3101, 40000, Plugin instructed the server to rollback the current transaction.
payment 1:10
1180, HY000, Got error 149 - 'Lock deadlock; Retry transaction' during COMMIT
neword 4:5
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 1:10
3101, 40000, Plugin instructed the server to rollback the current transaction.
payment 3:10
3101, 40000, Plugin instructed the server to rollback the current transaction.
payment 0:7
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:5
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
neword 4:8
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 0:7
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:7
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:10
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
neword 1:6
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:5
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:2
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
neword 5:9
3101, 40000, Plugin instructed the server to rollback the current transaction.
payment 0:10
1180, HY000, Got error 149 - 'Lock deadlock; Retry transaction' during COMMIT
neword 1:8
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 0:10
3101, 40000, Plugin instructed the server to rollback the current transaction.
...
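Judging from these tests, for an application that has not been optimized for group replication, multi-primary mode performs no better than single-primary mode and increases the likelihood of transaction conflicts and rollbacks.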
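The errors in the log above come down to a few error codes. A client-side classifier that decides which of them justify retrying the whole transaction could be sketched as follows. Codes 1213 and 3101 are taken from the log; 3101 is what a transaction receives when the group rolls it back at commit. `GrErrors` is a hypothetical name. Error 1180 ("Got error 149 ... during COMMIT") is deliberately left out, because its SQLState HY000 is generic and would need message matching.

```java
import java.sql.SQLException;

// Hypothetical classifier: which failures seen in the tpcc_mysql log a client
// would reasonably retry as a complete transaction.
class GrErrors {
    static final int ER_LOCK_DEADLOCK = 1213;                      // "Deadlock found when trying to get lock"
    static final int ER_TRANSACTION_ROLLBACK_DURING_COMMIT = 3101; // "Plugin instructed the server to rollback ..."

    static boolean isRetryable(SQLException e) {
        int code = e.getErrorCode();
        return code == ER_LOCK_DEADLOCK
            || code == ER_TRANSACTION_ROLLBACK_DURING_COMMIT
            || "40001".equals(e.getSQLState());   // SQL-standard serialization failure
    }
}
```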
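Since its release, group replication has evolved very quickly, with new features arriving steadily even in minor releases. Group replication is clearly MySQL's main direction for high-availability clustering.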