MySQL Group Replication Study Notes (Based on MySQL 8+) -- Usage

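3.1. Starting and Stopping

Use START GROUP_REPLICATION and STOP GROUP_REPLICATION to start and stop Group Replication.

mysql> start group_replication; /* starts the Group Replication threads and begins listening on the group communication port, e.g. 33061 */
mysql> stop group_replication;  /* stops the Group Replication threads */

The group_replication_start_on_boot variable controls whether Group Replication starts automatically when the MySQL server starts. If the group has no other members, the first member to start must create the group (bootstrap); otherwise neither that node nor any later node can join, because no group exists yet.

The group is best seen as a dynamic view rather than a fixed set: it comes into being when the first member bootstraps it, grows and shrinks as other members join and leave, and ceases to exist when the last member leaves.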
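The bootstrap step can be sketched in JDBC terms. This is a minimal sketch, not the author's procedure: it assumes the standard group_replication_bootstrap_group system variable (not mentioned in these notes) and a connection with sufficient privileges; the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: starts Group Replication on one member over JDBC.
final class GroupBootstrap {
    static final String ENABLE_BOOTSTRAP = "SET GLOBAL group_replication_bootstrap_group = ON";
    static final String START = "START GROUP_REPLICATION";
    static final String DISABLE_BOOTSTRAP = "SET GLOBAL group_replication_bootstrap_group = OFF";

    /** Statements to run: the first member bootstraps, later members only start. */
    static String[] statementsFor(boolean firstMember) {
        return firstMember
                ? new String[] {ENABLE_BOOTSTRAP, START, DISABLE_BOOTSTRAP}
                : new String[] {START};
    }

    /** Runs the statements; the bootstrap flag is cleared even if START fails. */
    static void start(Connection conn, boolean firstMember) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            if (!firstMember) {
                stmt.execute(START);
                return;
            }
            stmt.execute(ENABLE_BOOTSTRAP);
            try {
                stmt.execute(START);
            } finally {
                // Leaving the flag ON could make a later restart create a second group.
                stmt.execute(DISABLE_BOOTSTRAP);
            }
        }
    }
}
```

Only one member should ever be started with firstMember=true; every other member joins the existing group with a plain START GROUP_REPLICATION.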

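3.2. Monitoring Group Replication

Group Replication exposes its state through a set of tables in performance_schema:

1). Monitoring group members

The following tables report the state and statistics of every member in the group. Under normal conditions the result is the same no matter which MySQL server you query.

select * from performance_schema.replication_group_members;      /* group members and their states */
select * from performance_schema.replication_group_member_stats; /* per-member statistics */

Under normal conditions every member should be in the ONLINE state.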

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 15f92590-e587-11d4-a2d8-525400401a99 | oceanbase02 |        3306 | ONLINE       | PRIMARY     | 8.0.19         |
| group_replication_applier | 2f4e8149-e624-11d4-b492-525400bab9b9 | oceanbase03 |        3306 | ONLINE       | PRIMARY     | 8.0.19         |
| group_replication_applier | 3eed933f-e584-11d4-a71b-525400046468 | oceanbase01 |        3306 | ONLINE       | PRIMARY     | 8.0.19         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
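The ONLINE check above can be automated. The following is a minimal sketch, assuming a JDBC connection to any member; the class and method names are hypothetical, and only the table and column names come from the output above.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports group members that are not ONLINE.
final class GroupHealth {
    /** Reads "host:port" -> MEMBER_STATE from replication_group_members. */
    static Map<String, String> loadStates(Connection conn) throws SQLException {
        Map<String, String> states = new LinkedHashMap<>();
        String sql = "select member_host, member_port, member_state "
                   + "from performance_schema.replication_group_members";
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                String member = rs.getString("member_host") + ":" + rs.getInt("member_port");
                states.put(member, rs.getString("member_state"));
            }
        }
        return states;
    }

    /** Returns the members whose state is anything other than ONLINE. */
    static List<String> notOnline(Map<String, String> states) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : states.entrySet()) {
            if (!"ONLINE".equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A monitoring job could call notOnline(loadStates(conn)) periodically and alert when the list is not empty.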

2). Monitoring replication

The Group Replication plugin normally creates the following replication channels:

  • group_replication_recovery - This channel is used for the replication changes that are related to the distributed recovery phase.
  • group_replication_applier - This channel is used for the incoming changes from the group. This is the channel used to apply transactions coming directly from the group.

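Each channel is effectively an independent replica (slave) with its own io_thread and sql_thread. Use the following statements to monitor replication (note that SHOW SLAVE STATUS returns no output under Group Replication):

select * from performance_schema.replication_connection_status; /* io_thread status */
select * from performance_schema.replication_applier_status;    /* sql_thread status */
select * from performance_schema.replication_applier_status_by_coordinator; /* multi-threaded apply: coordinator thread status (summary) */
select * from performance_schema.replication_applier_status_by_worker;      /* multi-threaded apply: status of each worker thread */

Under normal conditions the "group_replication_recovery" channel is not running, and the "group_replication_applier" channel applies the transactions coming from the group members.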

mysql> select channel_name, service_state, last_error_number, last_error_message, received_transaction_set, queueing_transaction  from performance_schema.replication_connection_status;
+----------------------------+---------------+-------------------+--------------------+----------------------------------------------------------------------------------------------------+----------------------+
| channel_name               | service_state | last_error_number | last_error_message | received_transaction_set                                                                           | queueing_transaction |
+----------------------------+---------------+-------------------+--------------------+----------------------------------------------------------------------------------------------------+----------------------+
| group_replication_applier  | ON            |                 0 |                    | 3eed933f-e584-11d4-a71b-525400046468:1-4,65e5f89b-770d-4936-b72f-c7a190d5884a:1-9:1000008-1000085  |                      |
| group_replication_recovery | OFF           |                 0 |                    |                                                                                                    |                      |
+----------------------------+---------------+-------------------+--------------------+----------------------------------------------------------------------------------------------------+----------------------+
2 rows in set (0.00 sec)
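The received_transaction_set column holds a GTID set: comma-separated entries of the form uuid:interval[:interval...], where an interval is either a single number or first-last. As a rough aid for reading such values, here is a minimal sketch that counts the transactions in a set. The class name is hypothetical, and it assumes the plain format shown above (no other GTID syntax is handled).

```java
// Hypothetical helper: counts the transactions in a GTID set string.
final class GtidSets {
    /** Sums the sizes of all intervals across all UUIDs in the set. */
    static long count(String gtidSet) {
        long total = 0;
        for (String uuidSet : gtidSet.split(",")) {
            // parts[0] is the server/group UUID; the remaining parts are intervals.
            String[] parts = uuidSet.trim().split(":");
            for (int i = 1; i < parts.length; i++) {
                String[] bounds = parts[i].trim().split("-");
                long first = Long.parseLong(bounds[0]);
                long last = bounds.length > 1 ? Long.parseLong(bounds[1]) : first;
                total += last - first + 1;
            }
        }
        return total;
    }
}
```

For the value in the output above, the intervals 1-4, 1-9 and 1000008-1000085 contain 4, 9 and 78 transactions, so count(...) returns 91.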

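3.3. MySQL Router 8

MySQL Router provides transparent routing between applications and backend MySQL servers, and redirects connections after a MySQL server fails. It only forwards requests and does not unpack or parse the request packets. With statically configured destinations, as used in these notes, MySQL Router is not replication-aware: it does not know which of the backend servers (destinations) is the primary, which are secondaries, or whether all of them are secondaries. Consequently it does not route by statement type (select/update/insert/…), and you cannot rely on MySQL Router alone for read/write splitting.

Consider a single-primary replication group with server1 (primary), server2, and server3. One possible design:
  Define a route that accepts connections on bind_port=7001 and forwards them with routing_strategy=next-available to the primary (destinations=server1). To handle failover, also consider listing all members in primary-election priority order.
  Define a route that accepts connections on bind_port=7002 and forwards them with routing_strategy=round-robin to the secondaries (destinations=server2,server3).
In the application, create two connections, one to 7001 and one to 7002; use the 7001 connection for transactions and the 7002 connection for queries and reports.

In practice it is not this simple. The application must support reconnecting, and for reliability it also needs code that retries a transaction after it fails. Read/write splitting is not transparent to the application; you have to plan for it in both design and code.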
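On the application side, the two-connection design can be sketched as a choice of JDBC URL per workload. This is a minimal sketch with hypothetical class and method names. It follows the port assignment in the design above (7001 for the primary route, 7002 for the secondaries); the sample configuration file later in these notes assigns the ports differently, so adjust the constants to match the Router configuration actually in use.

```java
// Hypothetical helper: picks the Router port by workload type.
final class RouterEndpoints {
    static final int READ_WRITE_PORT = 7001; // route to the primary (assumed, per the design above)
    static final int READ_ONLY_PORT = 7002;  // route to the secondaries (assumed, per the design above)

    /** Builds the JDBC URL for a transactional (readOnly=false) or reporting (readOnly=true) connection. */
    static String urlFor(String routerHost, boolean readOnly, String database) {
        int port = readOnly ? READ_ONLY_PORT : READ_WRITE_PORT;
        return "jdbc:mysql://" + routerHost + ":" + port + "/" + database;
    }
}
```

The application would then keep one connection (or pool) per URL and decide in its own code which one each unit of work uses, since the Router does not inspect statements.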

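I only tried MySQL Router briefly and did not study it in depth, so I am just sharing my installation steps. For the configuration options, see: https://dev.mysql.com/doc/mysql-router/8.0/en/mysql-router-conf-options.html

Note: starting with MySQL Router 8.0.4, the old 'mode' option is deprecated and replaced by the newly introduced 'routing_strategy' option; mode and routing_strategy cannot both be set.
The two values of the old 'mode' option:
 - mode=read-write: can be replaced by routing_strategy=first-available, which behaves the same way;
 - mode=read-only: can be replaced by routing_strategy=round-robin, which behaves the same way;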

# Download the MySQL Router binary package; it runs as-is once unpacked.
# unxz mysql-router-8.0.20-linux-glibc2.12-x86_64.tar.xz
# tar -xvf mysql-router-8.0.20-linux-glibc2.12-x86_64.tar
# mv mysql-router-8.0.20-linux-glibc2.12-x86_64 /usr/local/
# cd /usr/local/
# ln -s mysql-router-8.0.20-linux-glibc2.12-x86_64 mysql-router

Create a configuration file 'mysqlrouter.conf'. Its name and location are up to you; just specify it when you start the Router.

# cat /etc/mysqlrouter.conf 
[DEFAULT]
logging_folder = /usr/local/mysql-router/log
user=root
connect_timeout=15
read_timeout=30

[logger]
level = INFO
timestamp_precision = second

[routing:read_only]
bind_address     = 0.0.0.0
bind_port        = 7001
routing_strategy = round-robin
destinations     = 192.168.203.115:3306,192.168.203.116:3306

[routing:read_write]
bind_address     = 0.0.0.0
bind_port        = 7002
routing_strategy = round-robin
destinations     = 192.168.203.114:3306

[routing:mysql_tpcc]
bind_address     = 0.0.0.0
bind_port        = 7003
routing_strategy = first-available
destinations     = 192.168.203.114:3306,192.168.203.115:3306,192.168.203.116:3306

# Start the Router by pointing it at the configuration file; clients can then reach the database through the listening ports:
# mysqlrouter --config=/etc/mysqlrouter.conf 

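3.4. Connector/J (MySQL JDBC)

MySQL's JDBC driver, Connector/J, is quite capable: on its own it supports automatic reconnection, failover, and load balancing. The official Connector/J documentation covers this in detail:
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-master-slave-replication-connection.html

Below is the demo code I used for testing: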

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

public class MySQL1 {
	
	public static void main(String[] args) throws Throwable {
		Class.forName("com.mysql.cj.jdbc.Driver");
		
		final Properties props = new Properties();
	    props.put("user", "tpcc_user");
	    props.put("password", "Passw0rd");
	    props.put("useSSL", "false");
	    props.put("useUnicode", "true");
	    props.put("characterEncoding", "UTF-8");
	    
		// Let the driver try to re-establish dead connections.
		props.put("autoReconnect", "true");
		// With autoReconnect enabled, pick the host to reconnect to on a round-robin basis.
		props.put("roundRobinLoadBalance", "true");

		// Alternative: connect through MySQL Router instead of using the replication-aware URL.
		//final String url = "jdbc:mysql://192.168.203.154:7001/tpcc";
		final String url = "jdbc:mysql:replication://192.168.203.115:3306,192.168.203.114:3306,192.168.203.116:3306/tpcc";
		Connection connection = DriverManager.getConnection(url, props);
		Statement roStmt = connection.createStatement();
		PreparedStatement rwStmt = connection.prepareStatement("insert into tpcc.test1 values(?)");
		
		final SimpleDateFormat dtFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
		int i = 1;
		String output = "";
		while (i < 100) {
			try {
				
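				if ((i % 5) == 0) {
					// Every fifth iteration: switch to read-write and insert one row in a transaction.
					connection.setReadOnly(false);
					connection.setAutoCommit(false);
					rwStmt.setTimestamp(1, new Timestamp(new Date().getTime()));
					rwStmt.execute();
					connection.commit();
				} else {
					// Otherwise mark the connection read-only so queries can go to a replica.
					connection.setReadOnly(true);
				}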
				
				ResultSet rs = roStmt.executeQuery("select @@server_id server_id, @@hostname hostname ");
				if (rs.next()) {
					output = dtFormat.format(new Date())+": server_id=" + rs.getString("server_id") + ", hostname=" + rs.getString("hostname");
					output += " ( ro="+connection.isReadOnly()+" )";
					System.out.println(output);
				}
				rs.close();
				Thread.sleep(1000);
			} catch (SQLException e) {
				System.out.println(dtFormat.format(new Date())+": continue with exception:" +e.getMessage());
			}
			i++;
		}
		
		roStmt.close();
		rwStmt.close();
		connection.close();
	}
}

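1. Automatic reconnection (autoReconnect)

Setting the connection property autoReconnect=true is enough to enable automatic reconnection. With it enabled, when the connected database fails, the JDBC connection is not closed but switches automatically to an available server. With it disabled, the connection is closed and has to be created again.

// With autoReconnect=true, the program switches to server 102 and keeps running.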
2020-05-19 14:34:43: server_id=101, hostname=oceanbase01 ( ro=false )
2020-05-19 14:34:44: server_id=101, hostname=oceanbase01 ( ro=false )
2020-05-19 14:34:45: continue with exception:Server shutdown in progress
2020-05-19 14:34:45: server_id=102, hostname=oceanbase02 ( ro=false )
2020-05-19 14:34:46: server_id=102, hostname=oceanbase02 ( ro=false )
2020-05-19 14:34:47: server_id=102, hostname=oceanbase02 ( ro=false )

// With autoReconnect=false, the program does not reconnect and cannot continue.
2020-05-19 14:37:50: server_id=101, hostname=oceanbase01 ( ro=false )
2020-05-19 14:37:51: server_id=101, hostname=oceanbase01 ( ro=false )
2020-05-19 14:37:52: continue with exception:Server shutdown in progress
2020-05-19 14:37:52: continue with exception:Communications link failure

The last packet successfully received from the server was 46 milliseconds ago. The last packet sent successfully to the server was 48 milliseconds ago.
2020-05-19 14:37:52: continue with exception:No operations allowed after statement closed.
2020-05-19 14:37:52: continue with exception:No operations allowed after statement closed.
2020-05-19 14:37:52: continue with exception:No operations allowed after statement closed.
2020-05-19 14:37:52: continue with exception:No operations allowed after statement closed.
...

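2. Replication-aware connections

Connector/J supports replication-aware connections: based on the connection's read-only state (set with Connection.setReadOnly() and reported by Connection.isReadOnly()), it can automatically provide read/write splitting, failover, and load balancing in a replication setup.

  • Before running queries, call connection.setReadOnly(true) so that the queries are sent to a slave automatically;
  • Before running modifying statements, call connection.setReadOnly(false) so that the updates are sent to the master automatically.

Connector/J also provides a rich set of replication-related methods: getMasterHosts/promoteSlaveToMaster(String host)/…

Note that an ordinary MySQL JDBC connection string has the form "jdbc:mysql://…", whereas a replication-aware one has the form "jdbc:mysql:replication://…".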
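In a replication-aware connection string, Connector/J treats the first host as the source (master) and the remaining hosts as replicas (slaves). A minimal sketch of assembling such a URL follows; the class and method names are hypothetical, and which of the demo's hosts plays which role is an assumption made only for the usage example.

```java
import java.util.List;

// Hypothetical helper: assembles a replication-aware Connector/J URL.
final class ReplicationUrl {
    /** The source host comes first, followed by the replicas, then the database name. */
    static String build(String source, List<String> replicas, String database) {
        StringBuilder url = new StringBuilder("jdbc:mysql:replication://").append(source);
        for (String replica : replicas) {
            url.append(',').append(replica);
        }
        return url.append('/').append(database).toString();
    }
}
```

For example, build("192.168.203.114:3306", Arrays.asList("192.168.203.115:3306", "192.168.203.116:3306"), "tpcc") yields a URL with 192.168.203.114 in the source position. In a single-primary group the primary should be the host listed first, so the URL has to be revisited if the primary changes.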

...
2020-05-19 14:44:12: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:13: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:14: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:15: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:16: server_id=102, hostname=oceanbase02 ( ro=false ) --> insert...
2020-05-19 14:44:17: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:18: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:19: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:20: server_id=103, hostname=oceanbase03 ( ro=true )
2020-05-19 14:44:21: server_id=102, hostname=oceanbase02 ( ro=false )  --> insert ...
2020-05-19 14:44:22: server_id=103, hostname=oceanbase03 ( ro=true )
...

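3.5. Stress Testing (tpcc_mysql)

tpcc_mysql did not run very well under Group Replication (especially in multi-primary mode), so this part ends rather weakly. Still, it shows once more that Group Replication is not transparent to applications: an application whose design and code never considered Group Replication (a cluster) is likely to run into problems when dropped into a Group Replication environment as-is, and it cannot take advantage of Group Replication's strengths.

The history table in tpcc_mysql has no primary key (or equivalent unique key), which does not meet the requirements in the official documentation, so it has to be modified.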

ERROR 3750 (HY000): Unable to create or change a table without a primary key, when the system variable 'sql_require_primary_key' is set.  
Add a primary key to the table or unset this variable to avoid this message. 
Note that tables without a primary key can cause performance problems in row-based replication, so please consult your DBA before changing this setting.

Add a primary key by hand (possibly not the best design; it is only there to satisfy MGR's requirement) by changing the history table in create_table.sql:

create table history (
h_u_id int not null auto_increment primary key, /* added auto-increment column, used as the primary key */
h_c_id int, 
h_c_d_id tinyint, 
h_c_w_id smallint,
h_d_id tinyint,
h_w_id smallint,
h_date datetime,
h_amount decimal(6,2), 
h_data varchar(24) ) Engine=InnoDB;

The loader has to change accordingly. In load.c under the src directory, change:

	if( mysql_stmt_prepare(stmt[5],
			       "INSERT INTO history values(?,?,?,?,?,?,?,?)",
			       43) ) goto Error_SqlCall_close;

to (one extra NULL for the new auto-increment column, so the statement length grows from 43 to 48):

	if( mysql_stmt_prepare(stmt[5],
			       "INSERT INTO history values(NULL,?,?,?,?,?,?,?,?)",
			       48) ) goto Error_SqlCall_close;

Then rebuild:

# cd src
# make clean
# make

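When connecting to a single member (the primary) with several concurrent connections (-c 6), single-primary mode ran fairly smoothly with only occasional transaction-rollback errors, whereas multi-primary mode produced a great many transaction-rollback errors.

In multi-primary mode, load-balancing (round-robin) through MySQL Router across the 3 members, so that all 3 accept writes at the same time, tpcc_mysql completed even fewer transactions per minute and also reported 'Lock deadlock' errors.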

# ./tpcc_start -u root -h 192.168.203.115 -u tpcc_user -p"Passw0rd" -c 6
...
payment 1:10
3101, 40000, Plugin instructed the server to rollback the current transaction.
payment 1:10
1180, HY000, Got error 149 - 'Lock deadlock; Retry transaction' during COMMIT
neword 4:5
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 1:10
3101, 40000, Plugin instructed the server to rollback the current transaction.
payment 3:10
3101, 40000, Plugin instructed the server to rollback the current transaction.
payment 0:7
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:5
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
neword 4:8
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 0:7
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:7
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:10
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
neword 1:6
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:5
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 3:2
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
neword 5:9
3101, 40000, Plugin instructed the server to rollback the current transaction.
payment 0:10
1180, HY000, Got error 149 - 'Lock deadlock; Retry transaction' during COMMIT
neword 1:8
1213, 40001, Deadlock found when trying to get lock; try restarting transaction
payment 0:10
3101, 40000, Plugin instructed the server to rollback the current transaction.
...

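Judging from these tests, for an application that has not been optimized for Group Replication, multi-primary mode performs no better than single-primary mode and increases the likelihood of transaction conflicts and rollbacks.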
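One way an application can cope with these rollbacks is to retry the whole transaction. The following is a minimal sketch, not something tpcc_mysql does: it treats the three error codes seen in the output above (3101, 1213, 1180) as retryable and retries with a simple linear backoff. The class, interface, and method names are hypothetical, and the retry limits are arbitrary.

```java
import java.sql.SQLException;

// Hypothetical helper: retries a transaction that was rolled back by a conflict or deadlock.
final class TxRetry {
    /** One complete attempt: begin, run statements, commit; roll back before rethrowing. */
    interface Tx<T> {
        T run() throws SQLException;
    }

    /** True for the error codes observed in the tpcc_mysql output above. */
    static boolean isRetryable(SQLException e) {
        int code = e.getErrorCode();
        return code == 3101   // plugin instructed the server to roll back the transaction
            || code == 1213   // deadlock found when trying to get lock
            || code == 1180;  // error during COMMIT ('Lock deadlock; Retry transaction')
    }

    /** Runs tx up to maxAttempts times, sleeping backoffMillis * attempt between attempts. */
    static <T> T withRetry(Tx<T> tx, int maxAttempts, long backoffMillis) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return tx.run();
            } catch (SQLException e) {
                if (!isRetryable(e) || attempt >= maxAttempts) {
                    throw e; // not a conflict, or out of attempts
                }
                try {
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

The Tx body must be safe to run more than once, which means it should contain the entire transaction, including any reads whose results feed the writes.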

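Since its release, Group Replication has been evolving quickly, with new features arriving regularly in minor releases; it appears to be MySQL's main direction for high-availability clustering.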
