【背景介绍】
故障方描述:一次用户刷权限的时候不小心把数据库用户表记录删掉了,执行之后发现不对后重建用户,杀掉进程后重新MGR启动报错。
【报错信息】
2018-06-13T12:47:41.405593Z 32 [Note] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "DISABLED"'
2018-06-13T12:47:41.405820Z 32 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 127.0.0.1/8,172.xx.xxx.xxx/26,192.xxx.xx.xxx/24 to the whitelist'
2018-06-13T12:47:41.406172Z 32 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2018-06-13T12:47:41.406216Z 32 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "b47a8cea-6cf5-4ea4-933f-a8c20905f900"; group_replication_local_address: "172.xx.xxx.xxx:xxx1"; group_replication_group_seeds: "172.xx.xxx.xxx:xxx1,172.xx.xxx.xxx:24901,172.xx.xxx.xxx:xxxx1"; group_replication_bootstrap_group: true; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 100; group_replication_ip_whitelist: "AUTOMATIC"'
2018-06-13T12:47:41.406944Z 34 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='
2018-06-13T12:47:41.434136Z 34 [ERROR] Slave SQL for channel 'group_replication_applier': Slave failed to initialize relay log info structure from the repository, Error_code: 1872
2018-06-13T12:47:41.434183Z 34 [ERROR] Plugin group_replication reported: 'Error while starting the group replication applier thread'
2018-06-13T12:47:41.434323Z 34 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
2018-06-13T12:47:41.434389Z 32 [ERROR] Plugin group_replication reported: 'Unable to initialize the Group Replication applier module.'
2018-06-13T12:47:41.434551Z 32 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2018-06-13T12:47:41.434588Z 32 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
【问题分析】
从报错日志查看,数据库在识别relay log时出现问题,从Oracle官方文档可以确认异常终止MGR服务命中了Bug25534078:
MySQL Innodb Cluster Node Failed To Start With Error: "[ERROR] Slave SQL for channel 'group_replication_applier': Slave failed to initialize relay log info structure from the repository, Error_code: 1872" (文档 ID 2386403.1)
进行信息查询符合上面BUG现象:
【解决办法】
清理mysql.slave_relay_log_info时先记录日志信息,按照如下方法正常修复,同时建议参数文件要指定relay log参数路径。
但是由于再重新创建用户时,没有关闭binlog同步到其他节点,导致其他节点加入集群是报错。
提供第一个方案:进行reset master(此操作非常危险),但是由于MGR本身存在问题时间比较久导致binlog过期丢失,因此无法修复。
提供第二个方案:暂时提供单节点服务,制定好方案后,找另一个时间窗口对MGR架构进行修复。
【总结】
在涉及数据库重要配置时,谨慎操作。
在出现问题的时候更加注意再次误操作,导致更加难恢复。
对数据库重要进程进行监控,及时发现问题及时修复。