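In January 2020, besides the coronavirus whose origin stories read like a collection of mystery tales, MySQL officially released the brand-new version 8.0.19, and its biggest highlight is the ReplicaSet feature.
An InnoDB ReplicaSet consists of one primary database and multiple secondary databases. It is operated through the ReplicaSet object and the AdminAPI, for example to manually fail over to a new primary when a failure occurs.
The official MySQL Router supports ReplicaSet as well: it can configure itself against an InnoDB ReplicaSet and use it automatically, with no hand-written configuration file.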
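InnoDB ReplicaSet prerequisites:
Only MySQL 8.0 and later is supported
Only GTID-based replication is supported
Only row-based replication (RBR) is supported; statement-based replication (SBR) is not
Replication filters are not supported, e.g. replicating only a particular schema or table.
A replica set contains at most one primary instance and one or more secondary instances. There is no limit on the number of secondaries that can be added, but every MySQL Router connected to the ReplicaSet has to monitor each instance, so don't add too many.
The ReplicaSet must be managed through MySQL Shell, e.g. the creation and management of the replication accounts. Changing the replication setup directly with SQL statements is not supported.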
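The prerequisites above can be turned into a quick pre-flight check. The following is a minimal sketch in JavaScript (the language of MySQL Shell's JS mode), assuming the values have already been read from a server, for example with SELECT @@version, @@gtid_mode, @@enforce_gtid_consistency, @@binlog_format. The function name checkPrerequisites and the shape of its input object are made up for illustration, and replication filters are not covered.

```javascript
// Hypothetical helper, not part of the AdminAPI.
// `vars` holds system variable values already read from one server.
function checkPrerequisites(vars) {
  var problems = [];
  // "8.0.19" -> 8
  var major = parseInt(String(vars.version).split('.')[0], 10);
  if (!(major >= 8)) {
    problems.push('MySQL 8.0 or later is required, found ' + vars.version);
  }
  if (String(vars.gtid_mode).toUpperCase() !== 'ON') {
    problems.push('gtid_mode must be ON');
  }
  if (String(vars.enforce_gtid_consistency).toUpperCase() !== 'ON') {
    problems.push('enforce_gtid_consistency must be ON');
  }
  if (String(vars.binlog_format).toUpperCase() !== 'ROW') {
    problems.push('binlog_format must be ROW');
  }
  return problems; // an empty array means no problem was found
}
```

dba.configureReplicaSetInstance() runs its own validation, so a helper like this is only useful as an early sanity check before touching the servers.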
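In general, InnoDB ReplicaSet does not provide high availability by itself. Compared with InnoDB Cluster it is less complete and comes with quite a few limitations, so deploying InnoDB Cluster wherever possible is still the recommendation.
Limitations of InnoDB ReplicaSet include:
No automatic failover. If the primary becomes unavailable, a failover has to be triggered manually through the AdminAPI before any changes can be made again.
Transactions that had not yet been applied before the failure may be lost. The secondaries remain available for reads, but there is no protection against partial data loss caused by an unexpected halt or unavailability.
No protection against inconsistencies after a crash or unavailability. If a failover promotes a secondary while the former primary is still available (for example, because of a network partition), split-brain can introduce inconsistencies.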
1. Configuring the ReplicaSet
# Add the host names to /etc/hosts
192.168.0.77 bj77
192.168.0.78 bj78
192.168.0.79 bj79
# Edit the option file on all 3 servers; make sure each server_id is different and GTID is enabled:
# Test server 1:
report_host = bj77
report_port = 3306
enforce_gtid_consistency = 1
gtid_mode = ON
server_id = 777
# Test server 2:
report_host = bj78
report_port = 3306
enforce_gtid_consistency = 1
gtid_mode = ON
server_id = 888
# Test server 3:
report_host = bj79
report_port = 3306
enforce_gtid_consistency = 1
gtid_mode = ON
server_id = 999
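Since every instance needs its own server_id, a small check over the planned values can catch a copy-and-paste mistake before the servers are restarted. This is only a sketch in JavaScript; the function name duplicateServerIds and the host-to-id map are assumptions made for illustration.

```javascript
// Hypothetical helper: `nodes` maps host name -> server_id as written in
// each option file. Returns the server_id values that are used more than once.
function duplicateServerIds(nodes) {
  var hostsById = {};
  Object.keys(nodes).forEach(function (host) {
    var id = String(nodes[host]);
    if (!hostsById[id]) {
      hostsById[id] = [];
    }
    hostsById[id].push(host);
  });
  return Object.keys(hostsById).filter(function (id) {
    return hostsById[id].length > 1;
  });
}
```

For the three test servers above, duplicateServerIds({ bj77: 777, bj78: 888, bj79: 999 }) returns an empty array.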
# Start MySQL Shell on node 1 (192.168.0.77 bj77):
mysqlsh -uroot -P3306 -p"*****"
# Configure the instance for ReplicaSet use; enter the root password when prompted
MySQL 8.0.19 localhost:33060+ ssl JS > dba.configureReplicaSetInstance('root@localhost:3306', {clusterAdmin: "'rsadmin'@'%'", clusterAdminPassword: 'TestRepl@123'});
Please provide the password for 'root@localhost:3306': *********
Save password for 'root@localhost:3306'? [Y]es/[N]o/Ne[v]er (default No):
Configuring local MySQL instance listening at port 3306 for use in an InnoDB ReplicaSet...
This instance reports its own address as bj77:3306
Clients and other cluster members will communicate with it through this address by default. If this is not correct, the report_host MySQL system variable should be changed.
The instance 'bj77:3306' is valid to be used in an InnoDB ReplicaSet.
Cluster admin user 'rsadmin'@'%' created.
The instance 'bj77:3306' is already ready to be used in an InnoDB ReplicaSet.
# Create the ReplicaSet
MySQL 8.0.19 localhost:33060+ ssl JS > var rs = dba.createReplicaSet("TestReplicaset")
A new replicaset with instance 'bj77:3306' will be created.
* Checking MySQL instance at bj77:3306
This instance reports its own address as bj77:3306
bj77:3306: Instance configuration is suitable.
* Updating metadata...
ReplicaSet object successfully created for bj77:3306.
Use rs.addInstance() to add more asynchronously replicated instances to this replicaset and rs.status() to check its status.
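# Check the status:
# Add the second node: bj78 192.168.0.78 (still working from node 1)
# Configuring it with the user created a moment ago failed, because rsadmin had not been created on node 2.
# So, taking a shortcut, no user is specified and the system generates one automatically
# Recovery method for the replica's data: choose C (Clone)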
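One way to avoid the rsadmin failure described above is to run dba.configureReplicaSetInstance() against each new instance before adding it, so that the admin account exists on every node. The following is a sketch for MySQL Shell's JS mode, not the exact commands used in this test: the function name prepareAndAdd is hypothetical, dba and rs are passed in as parameters (inside the shell they would be the global dba object and the handle returned by dba.createReplicaSet()), and it assumes root may connect to the target remotely.

```javascript
// Hypothetical wrapper around two AdminAPI calls.
// `adminOptions` is the same kind of object used on node 1, e.g.
// { clusterAdmin: "'rsadmin'@'%'", clusterAdminPassword: '...' }.
function prepareAndAdd(dba, rs, address, adminOptions) {
  // Create the admin account on the target first. Assumes root can connect
  // to the target remotely; otherwise run this step locally on that node.
  dba.configureReplicaSetInstance('root@' + address, adminOptions);
  // recoveryMethod 'clone' corresponds to answering "C" at the prompt.
  rs.addInstance(address, { recoveryMethod: 'clone' });
  return address;
}
```

Keep in mind that clone provisioning overwrites the data on the target with a snapshot from an existing member, so it is only appropriate for an instance whose data can be discarded.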
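# Check the status again: node 2 has been added, with the SECONDARY role
# Add node 3 the same way: bj79 192.168.0.79 (still working from node 1)
# Check the status: all 3 nodes are present, 1 PRIMARY and 2 SECONDARY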
MySQL 8.0.19 localhost:3306 ssl JS > rs.status()
{
"replicaSet": {
"name": "TestReplicaset",
"primary": "bj77:3306",
"status": "AVAILABLE",
"statusText": "All instances available.",
"topology": {
"bj77:3306": {
"address": "bj77:3306",
"instanceRole": "PRIMARY",
"mode": "R/W",
"status": "ONLINE"
},
"bj78:3306": {
"address": "bj78:3306",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"bj79:3306": {
"address": "bj79:3306",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
}
},
"type": "ASYNC"
}
}
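The status document is long, so for routine checks it helps to boil it down to who is PRIMARY, who is SECONDARY, and who is not ONLINE. The sketch below assumes the value returned by rs.status() can be treated as a plain JavaScript object with the layout shown above; summarizeTopology is a made-up name.

```javascript
// Hypothetical helper: condenses an rs.status()-shaped object.
function summarizeTopology(status) {
  var topology = status.replicaSet.topology;
  var summary = { primary: null, secondaries: [], notOnline: [] };
  Object.keys(topology).forEach(function (name) {
    var node = topology[name];
    if (node.status !== 'ONLINE') {
      // Anything other than ONLINE (ERROR, UNREACHABLE, INVALIDATED, ...)
      summary.notOnline.push(name);
    } else if (node.instanceRole === 'PRIMARY') {
      summary.primary = name;
    } else if (node.instanceRole === 'SECONDARY') {
      summary.secondaries.push(name);
    }
  });
  return summary;
}
```

In the shell this would be called as summarizeTopology(rs.status()).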
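# Connect to the 2 SECONDARY nodes: replication is running normally.
# The automatically created replication account is named mysql_innodb_rs_ followed by the instance's own server_id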
[root@localhost][(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: bj77
Master_User: mysql_innodb_rs_888 <--- server_id of node 2 = 888
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000007
Read_Master_Log_Pos: 28380
Relay_Log_File: relay.000002
Relay_Log_Pos: 3637
Relay_Master_Log_File: binlog.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
[root@localhost][(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: bj77
Master_User: mysql_innodb_rs_999 <--- server_id of node 3 = 999
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000007
Read_Master_Log_Pos: 32768
Relay_Log_File: relay.000002
Relay_Log_Pos: 3579
Relay_Master_Log_File: binlog.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
# These 3 users and their privileges can also be seen in the user table
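The naming rule observed above (mysql_innodb_rs_ followed by the replica's own server_id) is easy to express as a helper, for instance to compare against the Master_User value on each secondary. This is a sketch based only on what this test showed in 8.0.19; both function names are hypothetical.

```javascript
// Builds the account name seen in this test for a given server_id.
function replicationAccountFor(serverId) {
  return 'mysql_innodb_rs_' + serverId;
}

// Returns true when the Master_User reported by a secondary matches
// the name expected for that secondary's server_id.
function usesExpectedAccount(serverId, masterUser) {
  return masterUser === replicationAccountFor(serverId);
}
```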
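2. Managing the ReplicaSet
Common commands:
ReplicaSet.addInstance() adds an instance
ReplicaSet.removeInstance() removes an instance
ReplicaSet.getName() returns the ReplicaSet name
ReplicaSet.status() shows the ReplicaSet status
ReplicaSet.rejoinInstance() rejoins an instance to the replica set
\help ReplicaSet or ReplicaSet.help() shows the built-in help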
MySQL 8.0.19 localhost:3306 ssl mysql JS > \help ReplicaSet
NAME
ReplicaSet - Represents an InnoDB ReplicaSet.
DESCRIPTION
The ReplicaSet object is used to manage MySQL server topologies that use
asynchronous replication. It can be created using the
dba.createReplicaSet() or dba.getReplicaSet() functions.
PROPERTIES
name
Returns the name of the replicaset.
FUNCTIONS
addInstance(instance[, options])
Adds an instance to the replicaset.
disconnect()
Disconnects all internal sessions used by the replicaset object.
forcePrimaryInstance(instance, options)
Performs a failover in a replicaset with an unavailable PRIMARY.
getName()
Returns the name of the replicaset.
help([member])
Provides help about this class and it's members
listRouters([options])
Lists the Router instances.
rejoinInstance(instance[, options])
Rejoins an instance to the replicaset.
removeInstance(instance[, options])
Removes an Instance from the replicaset.
removeRouterMetadata(routerDef)
Removes metadata for a router instance.
setPrimaryInstance(instance, options)
Performs a safe PRIMARY switchover, promoting the given instance.
status([options])
Describe the status of the replicaset.
For more help on a specific function, use the \help shell command, e.g.:
\help ReplicaSet.addInstance
3. Switchover test
# Current roles:
192.168.0.77 bj77 -- primary
192.168.0.78 bj78 -- secondary
192.168.0.79 bj79 -- secondary
# setPrimaryInstance performs a manual switchover:
# In this example bj78 becomes the PRIMARY and bj77 becomes a SECONDARY member.
MySQL 8.0.19 localhost:3306 ssl mysql JS > var rs = dba.getReplicaSet()
You are connected to a member of replicaset 'TestReplicaset'.
MySQL 8.0.19 localhost:3306 ssl mysql JS > rs.setPrimaryInstance('bj78:3306')
bj78:3306 will be promoted to PRIMARY of 'TestReplicaset'.
The current PRIMARY is bj77:3306.
* Connecting to replicaset instances
** Connecting to bj77:3306
** Connecting to bj78:3306
** Connecting to bj79:3306
** Connecting to bj77:3306
** Connecting to bj78:3306
** Connecting to bj79:3306
* Performing validation checks
** Checking async replication topology...
** Checking transaction state of the instance...
* Synchronizing transaction backlog at bj78:3306
* Updating metadata
* Acquiring locks in replicaset instances
** Pre-synchronizing SECONDARIES
** Acquiring global lock at PRIMARY
** Acquiring global lock at SECONDARIES
* Updating replication topology
** Configuring bj77:3306 to replicate from bj78:3306
** Changing replication source of bj79:3306 to bj78:3306
bj78:3306 was promoted to PRIMARY.
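When a switchover is scripted rather than typed by hand, it can be worth checking the target in rs.status() first. The wrapper below is a sketch under the assumption that rs behaves like the ReplicaSet handle used above and that its status() result can be read as a plain object; safeSwitchover is a made-up name, and setPrimaryInstance() performs its own validation regardless.

```javascript
// Hypothetical guard around rs.setPrimaryInstance().
// Returns true if a switchover was requested, false if the target
// already is the PRIMARY; throws if the target is unknown or not ONLINE.
function safeSwitchover(rs, target) {
  var node = rs.status().replicaSet.topology[target];
  if (!node) {
    throw new Error(target + ' is not a member of the replica set');
  }
  if (node.instanceRole === 'PRIMARY') {
    return false; // nothing to do
  }
  if (node.status !== 'ONLINE') {
    throw new Error(target + ' has status ' + node.status + ', refusing to promote');
  }
  rs.setPrimaryInstance(target);
  return true;
}
```

In this test the equivalent call would be safeSwitchover(rs, 'bj78:3306').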
# Check the status: the primary is now bj78.
MySQL 8.0.19 localhost:3306 ssl mysql JS > rs.status()
{
"replicaSet": {
"name": "TestReplicaset",
"primary": "bj78:3306",
"status": "AVAILABLE",
"statusText": "All instances available.",
"topology": {
"bj77:3306": {
"address": "bj77:3306",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"bj78:3306": {
"address": "bj78:3306",
"instanceRole": "PRIMARY",
"mode": "R/W",
"status": "ONLINE"
},
"bj79:3306": {
"address": "bj79:3306",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
}
},
"type": "ASYNC"
}
}
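4. Failover
Unlike InnoDB Cluster, InnoDB ReplicaSet has no automatic failure detection and is not built on Group Replication. When the primary fails unexpectedly, no automatic failover takes place. If the primary is broken and unavailable, the ReplicaSet is effectively read-only and a new primary has to be chosen.
ReplicaSet.forcePrimaryInstance() forces the promotion of a new PRIMARY instance (a failover). It should be used only in a disaster, when the current primary is unavailable and cannot be restored.
This test fails over to bj79.
# Current roles:
192.168.0.77 bj77 secondary
192.168.0.78 bj78 primary
192.168.0.79 bj79 secondary
# While the primary is still healthy and available, the command is rejected: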
MySQL 8.0.19 localhost:3306 ssl mysql JS > rs.forcePrimaryInstance('bj79:3306')
* Connecting to replicaset instances
** Connecting to bj77:3306
** Connecting to bj79:3306
* Waiting for all received transactions to be applied
** Waiting for received transactions to be applied at bj77:3306
** Waiting for received transactions to be applied at bj79:3306
bj79:3306 will be promoted to PRIMARY of the replicaset and the former PRIMARY will be invalidated.
* Checking status of last known PRIMARY
PRIMARY bj78:3306 is still available.
Operation not allowed while there is still an available PRIMARY. Use setPrimaryInstance() to safely switch the PRIMARY.
ReplicaSet.forcePrimaryInstance: PRIMARY still available (MYSQLSH 51116)
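The rule demonstrated by this error (planned switchover while the PRIMARY is reachable, forced failover only when it is not) can be sketched as a small decision function. It assumes an rs.status()-shaped plain object as input; promotionMethod is a hypothetical name and it only returns the name of the AdminAPI function to use, without calling it.

```javascript
// Hypothetical helper: decides which AdminAPI call fits the current state.
function promotionMethod(status) {
  var info = status.replicaSet;
  // `primary` names the last known PRIMARY; its topology entry tells
  // whether the shell could still reach it.
  var primary = info.topology[info.primary];
  if (primary && primary.status === 'ONLINE') {
    return 'setPrimaryInstance';
  }
  return 'forcePrimaryInstance';
}
```

A PRIMARY that looks unreachable from one host may still be serving clients elsewhere, for example during a network partition, so the result should be treated as a hint for the operator and not as permission to force a failover automatically.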
# On the primary bj78, kill mysqld_safe and mysqld
(steps omitted)
# Connect to bj79 and confirm that the status now reports the current primary bj78 as unavailable.
MySQL 8.0.19 localhost:3306 ssl JS > var rs = dba.getReplicaSet()
You are connected to a member of replicaset 'TestReplicaset'.
MySQL 8.0.19 localhost:3306 ssl JS > rs.status()
ERROR: Unable to connect to the PRIMARY of the replicaset TestReplicaset: bj78:3306: Can't connect to MySQL server on 'bj78' (111)
Cluster change operations will not be possible unless the PRIMARY can be reached.
If the PRIMARY is unavailable, you must either repair it or perform a forced failover.
See \help forcePrimaryInstance for more information.
WARNING: MYSQLSH 51118: PRIMARY instance is unavailable
{
"replicaSet": {
"name": "TestReplicaset",
"primary": "bj78:3306",
"status": "UNAVAILABLE",
"statusText": "PRIMARY instance is not available, but there is at least one SECONDARY that could be force-promoted.",
"topology": {
"bj77:3306": {
"address": "bj77:3306",
"fenced": true,
"instanceErrors": [
"ERROR: Replication I/O thread (receiver) has stopped with an error."
],
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"expectedSource": "bj78:3306",
"receiverLastError": "error reconnecting to master 'mysql_innodb_rs_777@bj78:3306' - retry-time: 60 retries: 2 message: Can't connect to MySQL server on 'bj78' (111)",
"receiverLastErrorNumber": 2003,
"receiverLastErrorTimestamp": "2020-01-23 19:55:59.782554",
"receiverStatus": "ERROR",
"receiverThreadState": "",
"replicationLag": null,
"source": "bj78:3306"
},
"status": "ERROR",
"transactionSetConsistencyStatus": null
},
"bj78:3306": {
"address": "bj78:3306",
"connectError": "bj78:3306: Can't connect to MySQL server on 'bj78' (111)",
"fenced": null,
"instanceRole": "PRIMARY",
"mode": null,
"status": "UNREACHABLE"
},
"bj79:3306": {
"address": "bj79:3306",
"fenced": true,
"instanceErrors": [
"ERROR: Replication I/O thread (receiver) has stopped with an error."
],
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"expectedSource": "bj78:3306",
"receiverLastError": "error reconnecting to master 'mysql_innodb_rs_999@bj78:3306' - retry-time: 60 retries: 2 message: Can't connect to MySQL server on 'bj78' (111)",
"receiverLastErrorNumber": 2003,
"receiverLastErrorTimestamp": "2020-01-23 19:55:59.782926",
"receiverStatus": "ERROR",
"receiverThreadState": "",
"replicationLag": null,
"source": "bj78:3306"
},
"status": "ERROR",
"transactionSetConsistencyStatus": null
}
},
"type": "ASYNC"
}
}
# Force-promote bj79 to be the new primary
MySQL 8.0.19 localhost:3306 ssl JS > rs.forcePrimaryInstance('bj79:3306')
* Connecting to replicaset instances
** Connecting to bj77:3306
** Connecting to bj79:3306
* Waiting for all received transactions to be applied
** Waiting for received transactions to be applied at bj79:3306
** Waiting for received transactions to be applied at bj77:3306
bj79:3306 will be promoted to PRIMARY of the replicaset and the former PRIMARY will be invalidated.
* Checking status of last known PRIMARY
NOTE: bj78:3306 is UNREACHABLE
* Checking status of promoted instance
NOTE: bj79:3306 has status ERROR
* Checking transaction set status
* Promoting bj79:3306 to a PRIMARY...
* Updating metadata...
bj79:3306 was force-promoted to PRIMARY.
NOTE: Former PRIMARY bj78:3306 is now invalidated and must be removed from the replicaset.
* Updating source of remaining SECONDARY instances
** Changing replication source of bj77:3306 to bj79:3306
Failover finished successfully.
# Check the current status:
# Note that the status is now AVAILABLE_PARTIAL: the primary is available, but at least one SECONDARY node is not
MySQL 8.0.19 localhost:3306 ssl JS > rs.status()
{
"replicaSet": {
"name": "TestReplicaset",
"primary": "bj79:3306",
"status": "AVAILABLE_PARTIAL",
"statusText": "The PRIMARY instance is available, but one or more SECONDARY instances are not.",
"topology": {
"bj77:3306": {
"address": "bj77:3306",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"bj78:3306": {
"address": "bj78:3306",
"connectError": "bj78:3306: Can't connect to MySQL server on 'bj78' (111)",
"fenced": null,
"instanceRole": null,
"mode": null,
"status": "INVALIDATED"
},
"bj79:3306": {
"address": "bj79:3306",
"instanceRole": "PRIMARY",
"mode": "R/W",
"status": "ONLINE"
}
},
"type": "ASYNC"
}
}
# Start the former primary bj78:
(steps omitted)
# Even after it is started, bj78 is still not part of the replica set:
"bj78:3306": {
"address": "bj78:3306",
"fenced": false,
"instanceErrors": [
"WARNING: Instance was INVALIDATED and must be removed from the replicaset.",
"ERROR: Instance is NOT a PRIMARY but super_read_only option is OFF. Accidental updates to this instance are possible and will cause inconsistencies in the replicaset."
],
"instanceRole": null,
"mode": null,
"status": "INVALIDATED",
"transactionSetConsistencyStatus": "OK"
},
# Use rs.rejoinInstance() to bring it back into the replica set
MySQL 8.0.19 localhost:3306 ssl JS > rs.rejoinInstance('bj78:3306')
* Validating instance...
** Checking transaction state of the instance...
The safest and most convenient way to provision a new instance is through automatic clone provisioning, which will completely overwrite the state of 'bj78:3306' with a physical snapshot from an existing replicaset member. To use this method by default, set the 'recoveryMethod' option to 'clone'.
WARNING: It should be safe to rely on replication to incrementally recover the state of the new instance if you are sure all updates ever executed in the replicaset were done with GTIDs enabled, there are no purged transactions and the new instance contains the same GTID set as the replicaset or a subset of it. To use this method by default, set the 'recoveryMethod' option to 'incremental'.
Incremental state recovery was selected because it seems to be safely usable.
* Rejoining instance to replicaset...
** Configuring bj78:3306 to replicate from bj79:3306
** Checking replication channel status...
** Waiting for rejoined instance to synchronize with PRIMARY...
* Updating the Metadata...
The instance 'bj78:3306' rejoined the replicaset and is replicating from bj79:3306.
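After a forced failover, finding and rejoining invalidated members can be scripted along the same lines. The following sketch assumes rs is a ReplicaSet handle whose status() result can be read as a plain object; rejoinInvalidated is a made-up name. It skips members that still report a connectError, because an instance has to be running again before it can be rejoined.

```javascript
// Hypothetical helper: rejoins every reachable INVALIDATED member and
// returns the list of addresses it attempted to rejoin.
function rejoinInvalidated(rs) {
  var topology = rs.status().replicaSet.topology;
  var rejoined = [];
  Object.keys(topology).forEach(function (name) {
    var node = topology[name];
    if (node.status === 'INVALIDATED' && !node.connectError) {
      rs.rejoinInstance(name);
      rejoined.push(name);
    }
  });
  return rejoined;
}
```

If the old primary had applied transactions that never reached the new primary, incremental recovery is not safe; in that case the instance would need to be re-provisioned instead, for example with the clone method mentioned in the output above.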
# Final check: the replica set is fully back to normal
MySQL 8.0.19 localhost:3306 ssl JS > rs.status()
{
"replicaSet": {
"name": "TestReplicaset",
"primary": "bj79:3306",
"status": "AVAILABLE",
"statusText": "All instances available.",
"topology": {
"bj77:3306": {
"address": "bj77:3306",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"bj78:3306": {
"address": "bj78:3306",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"bj79:3306": {
"address": "bj79:3306",
"instanceRole": "PRIMARY",
"mode": "R/W",
"status": "ONLINE"
}
},
"type": "ASYNC"
}
}
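Overall, ReplicaSet is managed in a way that feels a lot like MongoDB, but the functionality offered so far is not very complete and its purpose is not very clear either. In the short term probably nobody will dare to run it in production; once the surrounding tooling has matured, we will test it further.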