m
墨墨导读:本文是作者用MySQL数据库手动注册binlog文件造成主从同步异常后,详述整个分析与解决的过程。
云和恩墨大讲堂线上分享:《weblogic 优化》
时间:2019年7月30日(星期二)20:30-21:30
立即报名:https://cs.enmotech.com/event/57
将备份的一些binlog文件加入到了binlog日志目录文件中。
修改index文件,加入了这些binlog文件。
flush binary logs注册文件。
做完这些操作后,主库确实binlog注册回来了, 但是整个主从环境大量延迟。
(root:db1@xucl:10:30:22)[(none)]> show binary logs;+---------------------+-----------+| Log_name | File_size |+---------------------+-----------+| mysql-binlog.000035 | 2134 || mysql-binlog.000036 | 845915 || mysql-binlog.000037 | 11735 || mysql-binlog.000038 | 284 || mysql-binlog.000039 | 284 || mysql-binlog.000040 | 284 || mysql-binlog.000041 | 284 || mysql-binlog.000042 | 234 |+---------------------+-----------+12 rows in set (0.00 sec)
+---------------------+-----------+
| Log_name | File_size |
+---------------------+-----------+
| mysql-binlog.000035 | 2134 |
| mysql-binlog.000036 | 845915 |
| mysql-binlog.000037 | 11735 |
| mysql-binlog.000038 | 284 |
| mysql-binlog.000039 | 284 |
| mysql-binlog.000040 | 284 |
| mysql-binlog.000041 | 284 |
| mysql-binlog.000042 | 234 |
+---------------------+-----------+
12 rows in set (0.00 sec)
拷贝需要清理的binlog到备份目录,即binlog 35、36、37、38。
(root:db1@xucl:10:30:34)[(none)]> purge binary logs to 'mysql-binlog.000039';Query OK, 0 rows affected (0.00 sec)(root:db1@xucl:10:30:49)[(none)]> show binary logs;+---------------------+-----------+| Log_name | File_size |+---------------------+-----------+| mysql-binlog.000039 | 284 || mysql-binlog.000040 | 284 || mysql-binlog.000041 | 284 || mysql-binlog.000042 | 234 |+---------------------+-----------+4 rows in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
(root:db1@xucl:10:30:49)[(none)]> show binary logs;
+---------------------+-----------+
| Log_name | File_size |
+---------------------+-----------+
| mysql-binlog.000039 | 284 |
| mysql-binlog.000040 | 284 |
| mysql-binlog.000041 | 284 |
| mysql-binlog.000042 | 234 |
+---------------------+-----------+
4 rows in set (0.00 sec)
[root@izbp12nspj47ypto9t6vyez logs]# ll-rw-r----- 1 mysql mysql 2134 5月 20 22:03 mysql-binlog.000035-rw-r----- 1 mysql mysql 845915 5月 20 22:03 mysql-binlog.000036-rw-r----- 1 mysql mysql 11735 5月 20 22:05 mysql-binlog.000037-rw-r----- 1 mysql mysql 284 5月 20 22:06 mysql-binlog.000038-rw-r----- 1 mysql mysql 284 5月 21 10:28 mysql-binlog.000039-rw-r----- 1 mysql mysql 284 5月 21 10:28 mysql-binlog.000040-rw-r----- 1 mysql mysql 284 5月 21 10:28 mysql-binlog.000041-rw-r----- 1 mysql mysql 491 5月 21 10:31 mysql-binlog.000042-rw-r----- 1 mysql mysql 204 5月 21 10:30 mysql-binlog.index
-rw-r----- 1 mysql mysql 2134 5月 20 22:03 mysql-binlog.000035
-rw-r----- 1 mysql mysql 845915 5月 20 22:03 mysql-binlog.000036
-rw-r----- 1 mysql mysql 11735 5月 20 22:05 mysql-binlog.000037
-rw-r----- 1 mysql mysql 284 5月 20 22:06 mysql-binlog.000038
-rw-r----- 1 mysql mysql 284 5月 21 10:28 mysql-binlog.000039
-rw-r----- 1 mysql mysql 284 5月 21 10:28 mysql-binlog.000040
-rw-r----- 1 mysql mysql 284 5月 21 10:28 mysql-binlog.000041
-rw-r----- 1 mysql mysql 491 5月 21 10:31 mysql-binlog.000042
-rw-r----- 1 mysql mysql 204 5月 21 10:30 mysql-binlog.index
(root:db1@xucl:10:32:51)[(none)]> flush binary logs;Query OK, 0 rows affected (0.01 sec)(root:db1@xucl:10:32:57)[(none)]> show binary logs;+---------------------+-----------+| Log_name | File_size |+---------------------+-----------+| mysql-binlog.000035 | 2134 || mysql-binlog.000036 | 845915 || mysql-binlog.000037 | 11735 || mysql-binlog.000038 | 284 || mysql-binlog.000039 | 284 || mysql-binlog.000040 | 284 || mysql-binlog.000041 | 284 || mysql-binlog.000042 | 541 || mysql-binlog.000043 | 234 |+---------------------+-----------+13 rows in set (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
(root:db1@xucl:10:32:57)[(none)]> show binary logs;
+---------------------+-----------+
| Log_name | File_size |
+---------------------+-----------+
| mysql-binlog.000035 | 2134 |
| mysql-binlog.000036 | 845915 |
| mysql-binlog.000037 | 11735 |
| mysql-binlog.000038 | 284 |
| mysql-binlog.000039 | 284 |
| mysql-binlog.000040 | 284 |
| mysql-binlog.000041 | 284 |
| mysql-binlog.000042 | 541 |
| mysql-binlog.000043 | 234 |
+---------------------+-----------+
13 rows in set (0.00 sec)
报错如下:
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate GTID-transaction when @@GLOBAL.GTID_MODE = OFF, at file /storage/single/mysql3306/logs/mysql-binlog.000035, position 194.
transaction when @@GLOBAL.GTID_MODE = OFF, at file /storage/single/mysql3306/logs/mysql-binlog.000035, position 194.
从这个错误的位点来看,在主库flush binary logs后,视乎主库的DUMP线程又在重新发送老的mysql-binlog.000035这个binlog。报错是因为以前是GTID模式,但是测试的时候已经改为了POSITION模式,是主库DUMP线程在检测主库的GTID Event和从库的GTID模式是否兼容的时候报出来的,如下图
这个错误在这里并不重要,我们只需要知道老的binlog mysql-binlog.000035居然再次发送了。
具体可以参考函数MYSQL_BIN_LOG::new_file_impl。其中要说明一下获取新的binlog文件名字是通过函数find_uniq_filename实现的,其中包含如下代码:
file_info= dir_info->dir_entry; for (i= dir_info->number_off_files ; i-- ; file_info++) { if (strncmp(file_info->name, start, length) == 0 && is_number(file_info->name+length, &number,0)) { set_if_bigger(max_found, number); } }... *next= (need_next || max_found == 0) ? max_found + 1 : max_found;
for (i= dir_info->number_off_files ; i-- ; file_info++)
{
if (strncmp(file_info->name, start, length) == 0 &&
is_number(file_info->name+length, &number,0))
{
set_if_bigger(max_found, number);
}
}
...
*next= (need_next || max_found == 0) ? max_found + 1 : max_found;
大概意思就是扫描binlog目录下的binlog文件,获取其序号最高的一个,然后加1。因此即便我们手动修改了index file,flush binary logs不会有问题,因为它实际扫描了binlog文件。同时我们也看到flush binary logs重新加载了index file,这个时候手动修改的index file就生效了,使用show binary logs就能查看到你手动加入的文件了。
栈帧如下:
#0 find_uniq_filename (name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", next=0x7fffec5ec678, need_next=true) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3679#1 0x000000000187d208 in generate_new_log_name (new_name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", new_ext=0x0, log_name=0x7ffedc011520 "/mysqldata/mysql3340/log/binlog", is_binlog=true) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3767#2 0x0000000001884fdb in MYSQL_BIN_LOG::new_file_impl (this=0x2e83640, need_lock_log=false, extra_description_event=0x0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7175#3 0x0000000001884cbb in MYSQL_BIN_LOG::new_file_without_locking (this=0x2e83640, extra_description_event=0x0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7099#4 0x0000000001886b6b in MYSQL_BIN_LOG::rotate (this=0x2e83640, force_rotate=true, check_purge=0x7fffec5ecbfb) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7775#5 0x0000000001886d53 in MYSQL_BIN_LOG::rotate_and_purge (this=0x2e83640, thd=0x7ffedc000b90, force_rotate=true) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7846
at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3679
#1 0x000000000187d208 in generate_new_log_name (new_name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", new_ext=0x0,
log_name=0x7ffedc011520 "/mysqldata/mysql3340/log/binlog", is_binlog=true) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3767
#2 0x0000000001884fdb in MYSQL_BIN_LOG::new_file_impl (this=0x2e83640, need_lock_log=false, extra_description_event=0x0)
at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7175
#3 0x0000000001884cbb in MYSQL_BIN_LOG::new_file_without_locking (this=0x2e83640, extra_description_event=0x0)
at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7099
#4 0x0000000001886b6b in MYSQL_BIN_LOG::rotate (this=0x2e83640, force_rotate=true, check_purge=0x7fffec5ecbfb)
at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7775
#5 0x0000000001886d53 in MYSQL_BIN_LOG::rotate_and_purge (this=0x2e83640, thd=0x7ffedc000b90, force_rotate=true)
at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7846
int error= mysql_bin_log.find_next_log(&m_linfo, 0);
mysql_bin_log.find_next_log 包含的代码:
my_b_seek(&index_file, linfo->index_file_offset);//进行偏移量偏移 linfo->index_file_start_offset= linfo->index_file_offset; length=my_b_gets(&index_file, fname, FN_REFLEN));//读取文件名字... if(normalize_binlog_name(full_fname, fname, is_relay_log))... linfo->index_file_offset= my_b_tell(&index_file);//偏移量从新记录以备下一次使用
linfo->index_file_start_offset= linfo->index_file_offset;
length=my_b_gets(&index_file, fname, FN_REFLEN));//读取文件名字
...
if(normalize_binlog_name(full_fname, fname, is_relay_log))
...
linfo->index_file_offset= my_b_tell(&index_file);//偏移量从新记录以备下一次使用
我们能够看到DUMP线程并没有实际扫描整个index文件,而是通过一个index文件的偏移量进行读取。如果手动修改index文件那么这个偏移量就不会维护,DUMP发送的下一个文件将是不确定的,因此出现了发送手动注册的binlog文件给从库的现象,根据主从设置的不同将会出现下面的情况:
另外我们需要考虑一下purge binary logs会删除binlog,为什么不会出现偏移量错乱的问题呢,因为purge binary logs会实际的维护这个偏移量在函数adjust_linfo_offsets中调用Adjust_offset::operator()进行维护,代码如下:
linfo->index_file_offset -= m_purge_offset;
mysql> show binary logs;+---------------+-----------+| Log_name | File_size |+---------------+-----------+| binlog.000001 | 198 || binlog.000002 | 154 |+---------------+-----------+2 rows in set (0.00 sec)mysql> show create table testcp \G;*************************** 1. row *************************** Table: testcpCreate Table: CREATE TABLE `testcp` ( `id` int(11) NOT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf81 row in set (0.00 sec)ERROR: No query specified
+---------------+-----------+
| Log_name | File_size |
+---------------+-----------+
| binlog.000001 | 198 |
| binlog.000002 | 154 |
+---------------+-----------+
2 rows in set (0.00 sec)
mysql> show create table testcp \G;
*************************** 1. row ***************************
Table: testcp
Create Table: CREATE TABLE `testcp` (
`id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
ERROR:
No query specified
主库:
mysql> insert into testcp values(20);Query OK, 1 row affected (0.43 sec)mysql> select * from testcp;+----+| id |+----+| 10 || 20 |+----+2 rows in set (0.01 sec)
Query OK, 1 row affected (0.43 sec)
mysql> select * from testcp;
+----+
| id |
+----+
| 10 |
| 20 |
+----+
2 rows in set (0.01 sec)
从库:
mysql> show slave status \G;*************************** 1. row ***************************... Master_Log_File: binlog.000002 Read_Master_Log_Pos: 417 Relay_Log_File: relay.000004 Relay_Log_Pos: 624 Relay_Master_Log_File: binlog.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes...mysql> select * from testcp;+----+| id |+----+| 10 || 20 |+----+
*************************** 1. row ***************************
...
Master_Log_File: binlog.000002
Read_Master_Log_Pos: 417
Relay_Log_File: relay.000004
Relay_Log_Pos: 624
Relay_Master_Log_File: binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
mysql> select * from testcp;
+----+
| id |
+----+
| 10 |
| 20 |
+----+
这个时候DUMP线程index file偏移指针如下,如下图:
做之前拷贝原有binlog.000001为binlog.000001bak因为等会要拷贝回去
mysql> purge binary logs to 'binlog.000002';Query OK, 0 rows affected (3.14 sec)mysql> show binary logs;+---------------+-----------+| Log_name | File_size |+---------------+-----------+| binlog.000002 | 417 |+---------------+-----------+1 row in set (0.00 sec)
Query OK, 0 rows affected (3.14 sec)
mysql> show binary logs;
+---------------+-----------+
| Log_name | File_size |
+---------------+-----------+
| binlog.000002 | 417 |
+---------------+-----------+
1 row in set (0.00 sec)
因为purge binary logs命令会维护偏移指针,这个时候DUMP线程的index file偏移指针如下,如下图:
将binlog.000001bak拷贝为binlog.000001,修改index file将binlog.000001加入回去,然后flush binary logs如下:
mysql> flush binary logs;Query OK, 0 rows affected (0.15 sec)mysql> show binary logs;+---------------+-----------+| Log_name | File_size |+---------------+-----------+| binlog.000001 | 198 || binlog.000002 | 461 |+---------------+-----------+2 rows in set (0.00 sec)
Query OK, 0 rows affected (0.15 sec)
mysql> show binary logs;
+---------------+-----------+
| Log_name | File_size |
+---------------+-----------+
| binlog.000001 | 198 |
| binlog.000002 | 461 |
+---------------+-----------+
2 rows in set (0.00 sec)
手动完成这个工作并不会维护DUMP线程的index file偏移指针,如下图:
这个时候我们已经看到这个偏移指针已经错乱了,接下来DUMP线程将再次发送binlog.000002。
就这样DUMP线程重新发送了一次binlog.000002的内容,POSITION 模式的从库只能报错了如下,我们可以看到重复的行出现了,错误1062如下:
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'ANONYMOUS' at master log binlog.000002, end_log_pos 386; Could not execute Write_rows event on table testmts.testcp; Duplicate entry '20' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000002, end_log_pos 386
end_log_pos 386; Could not execute Write_rows event on table testmts.testcp; Duplicate entry '20' for key
'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log
binlog.000002, end_log_pos 386
这样重新启动从库将重新初始化DUMP的index file偏移指针,应该是没有问题的。但是应该还是要尽量避免此类操作,因为index file和binlog都是MySQL自己维护的。
原创:高鹏(八怪),现任易极付高级DBA,从事Oracle/MySQL相关工作10余年Oracle OCM 有丰富的数据库性能调优故障诊断经验,对MySQL源码有一定研究。《深入理解MySQL主从原理》作者。请关注《深入MySQL主从原理32节》,地址:http://j.youzan.com/yEY_Xi 。
扩展阅读
谈PaaS平台建设:如何应对企业架构多元异构资源的挑战
一次执行truncate触发ORA-02266解决过程
《数据安全警示录》一书修订版出版
美女DBA带你了解PostgreSQL用户及角色
分布式锁用Redis坚决不用Zookeeper?
频发:记ADG备库日志应用延迟的一次故障处理-云和恩墨技术通讯精选
MySQL锁的灵魂七拷问
从Oracle到PostgreSQL:最全控制文件
数据和云
ID:OraNews
如有收获,请划至底部,点击“在看”,谢谢!
资源下载
关注公众号:数据和云(OraNews)回复关键字获取
help,30万+下载的完整菜单栏
2019DTCC,数据库大会PPT
2018DTCC , 数据库大会PPT
2018DTC,2018 DTC 大会 PPT
ENMOBK,《Oracle性能优化与诊断案例》
DBALIFE,“DBA 的一天”海报
DBA04,DBA 手记4 电子书
122ARCH,Oracle 12.2体系结构图
2018OOW,Oracle OpenWorld 资料
产品推荐云和恩墨Bethune Pro2 企业版,集监控、巡检、安全于一身,你的专属数据库实时监控和智能巡检平台,漂亮的不像实力派,你值得拥有!
云和恩墨zData一体机现已发布超融合版本和精简版,支持各种简化场景部署,零数据丢失备份一体机ZDBM也已发布,欢迎关注。
云和恩墨大讲堂 | 一个分享交流的地方
长按,识别二维码,加入万人交流社群
请备注:云和恩墨大讲堂