1. Versions
1) Operating system version
cat /proc/version
Linux version 3.10.0-1062.9.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019
2) Database version
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.17 |
+-----------+
1 row in set (0.00 sec)
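2. Problem description
The developers reported that some tables in their stress-test database could not be queried. When I ran a query myself, it failed with the following error: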
mysql> select * from test_part;
ERROR 2013 (HY000): Lost connection to MySQL server during query
The error log shows:
2020-07-06T17:42:52.413990+08:00 64 [ERROR] [MY-013183] [InnoDB] Assertion failure: btr0pcur.cc:318:page_is_comp(next_page) == page_is_comp(page) thread 140622855923456
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
09:42:52 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7fe0d80102c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fe54f5fdc30 thread_stack 0x46000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x3d) [0x1e435bd]
/usr/sbin/mysqld(handle_fatal_signal+0x333) [0xf25e83]
/lib64/libpthread.so.0(+0xf5f0) [0x7fe5cf88a5f0]
/lib64/libc.so.6(gsignal+0x37) [0x7fe5cdb9f337]
/lib64/libc.so.6(abort+0x148) [0x7fe5cdba0a28]
/usr/sbin/mysqld() [0xca40da]
/usr/sbin/mysqld(btr_pcur_t::move_to_next_page(mtr_t*)+0x1b8) [0x212d458]
/usr/sbin/mysqld(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x118a) [0x20620ea]
/usr/sbin/mysqld(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int)+0xd3) [0x1f07d53]
/usr/sbin/mysqld(handler::ha_rnd_next(unsigned char*)+0x66) [0x102f606]
/usr/sbin/mysqld(TableScanIterator::Read()+0x1d) [0xd51b1d]
/usr/sbin/mysqld(JOIN::exec()+0x45e) [0xdbbe6e]
/usr/sbin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x2cc) [0xe4088c]
/usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x418) [0xe47e18]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x14a0) [0xdf4360]
/usr/sbin/mysqld(mysql_parse(THD*, Parser_state*)+0x353) [0xdf8703]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2d10) [0xdfb920]
/usr/sbin/mysqld(do_command(THD*)+0x1b4) [0xdfc404]
/usr/sbin/mysqld() [0xf17630]
/usr/sbin/mysqld() [0x233495c]
/lib64/libpthread.so.0(+0x7e65) [0x7fe5cf882e65]
/lib64/libc.so.6(clone+0x6d) [0x7fe5cdc6788d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fe0d9d17b28): select * from test_part
Connection ID (thread ID): 64
Status: NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
# The log begins with Assertion failure: btr0pcur.cc:318:page_is_comp(next_page) == page_is_comp(page), which means the full table scan ran into a bad block (a corrupted page)
mysql> check table test_part;
ERROR 2013 (HY000): Lost connection to MySQL server during query
After check table failed, the error log showed:
2020-07-06T17:43:28.385547+08:00 16 [ERROR] [MY-013051] [InnoDB] In pages [page id: space=105, page number=21] and [page id: space=105, page number=22] of index `PRIMARY` of table `saic_trip_order2`.`test_part`
InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links
2020-07-06T17:43:28.385628+08:00 16 [ERROR] [MY-013051] [InnoDB] In pages [page id: space=105, page number=21] and [page id: space=105, page number=22] of index `PRIMARY` of table `saic_trip_order2`.`test_part`
InnoDB: 'compact' flag mismatch
2020-07-06T17:43:28.385646+08:00 16 [ERROR] [MY-011866] [InnoDB] Page index id 0 != data dictionary index id 443
2020-07-06T17:43:28.385658+08:00 16 [ERROR] [MY-013183] [InnoDB] Assertion failure: btr0btr.cc:4256:!page_is_empty(page) || (level == 0 && page_get_page_no(page) == dict_index_get_page(index)) thread 139819373106944
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
09:43:28 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f25940008c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f2a3c110c30 thread_stack 0x46000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x3d) [0x1e435bd]
/usr/sbin/mysqld(handle_fatal_signal+0x333) [0xf25e83]
/lib64/libpthread.so.0(+0xf5f0) [0x7f2a6e6475f0]
/lib64/libc.so.6(gsignal+0x37) [0x7f2a6c95c337]
/lib64/libc.so.6(abort+0x148) [0x7f2a6c95da28]
/usr/sbin/mysqld() [0xca40da]
/usr/sbin/mysqld() [0x211789b]
/usr/sbin/mysqld(btr_validate_index(dict_index_t*, trx_t const*, bool)+0x3f5) [0x2117eb5]
/usr/sbin/mysqld(ha_innobase::check(THD*, HA_CHECK_OPT*)+0x3c6) [0x1f0c7a6]
/usr/sbin/mysqld(handler::ha_check(THD*, HA_CHECK_OPT*)+0x13b) [0x1032aeb]
/usr/sbin/mysqld() [0x120becb]
/usr/sbin/mysqld(Sql_cmd_check_table::execute(THD*)+0x99) [0x120ccd9]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x14a0) [0xdf4360]
/usr/sbin/mysqld(mysql_parse(THD*, Parser_state*)+0x353) [0xdf8703]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2d10) [0xdfb920]
/usr/sbin/mysqld(do_command(THD*)+0x1b4) [0xdfc404]
/usr/sbin/mysqld() [0xf17630]
/usr/sbin/mysqld() [0x233495c]
/lib64/libpthread.so.0(+0x7e65) [0x7f2a6e63fe65]
/lib64/libc.so.6(clone+0x6d) [0x7f2a6ca2488d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f2594008a98): check table test_part
Connection ID (thread ID): 16
Status: NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
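# InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links: the doubly linked list of InnoDB data pages is broken (for example, FIL_PAGE_NEXT says the next page is page 32, but the page actually read has a different page number)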
2020-07-06T17:43:28.385646+08:00 16 [ERROR] [MY-011866] [InnoDB] Page index id 0 != data dictionary index id 443
2020-07-06T17:43:28.385658+08:00 16 [ERROR] [MY-013183] [InnoDB] Assertion failure: btr0btr.cc:4256:!page_is_empty(page) || (level == 0 && page_get_page_no(page) == dict_index_get_page(index)) thread 139819373106944
## The data dictionary metadata for this table and index is as follows:
select t.TABLE_ID, t.NAME table_name, t.ROW_FORMAT, t.SPACE table_space,
       i.INDEX_ID, i.NAME index_name, i.TYPE index_type, i.PAGE_NO index_page, i.SPACE index_space
from information_schema.INNODB_TABLES t, information_schema.INNODB_INDEXES i
where t.TABLE_ID = i.TABLE_ID and i.INDEX_ID = 443;
+----------+-----------------------------------+------------+-------------+----------+------------+------------+------------+-------------+
| TABLE_ID | table_name | ROW_FORMAT | table_space | INDEX_ID | index_name | index_type | index_page | index_space |
+----------+-----------------------------------+------------+-------------+----------+------------+------------+------------+-------------+
| 1162 | test_shao/test_part | Dynamic | 105 | 443 | PRIMARY | 3 | 4 | 105 |
+----------+-----------------------------------+------------+-------------+----------+------------+------------+------------+-------------+
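The three complaints from check table (broken page links, 'compact' flag mismatch, wrong index id) all come from a few fixed-offset fields in the page header, so they can be inspected directly in the tablespace file. The following is a minimal sketch, not the check InnoDB itself performs. It assumes the default 16 KiB innodb_page_size, an uncompressed and unencrypted file-per-table tablespace, and that it is run against a copy of the .ibd file or with mysqld stopped. The function names and the file path in the usage note are made up for illustration.

```python
import struct

# Byte offsets inside an InnoDB page (FIL header is 38 bytes, page header follows).
FIL_PAGE_OFFSET = 4        # page number stored in the page itself
FIL_PAGE_PREV = 8          # previous page in the same B-tree level
FIL_PAGE_NEXT = 12         # next page in the same B-tree level
FIL_PAGE_TYPE = 24
PAGE_N_HEAP = 38 + 4       # high bit set: new-style (compact) record format
PAGE_INDEX_ID = 38 + 28    # id of the index this page belongs to
FIL_PAGE_INDEX = 17855     # page type of a B-tree index page


def read_page(path, page_no, page_size=16384):
    """Read one raw page from a tablespace file (page_size is an assumption)."""
    with open(path, "rb") as f:
        f.seek(page_no * page_size)
        return f.read(page_size)


def parse_page(buf):
    """Decode the header fields that the check table messages refer to."""
    page_no, prev_no, next_no = struct.unpack_from(">III", buf, FIL_PAGE_OFFSET)
    (page_type,) = struct.unpack_from(">H", buf, FIL_PAGE_TYPE)
    (n_heap,) = struct.unpack_from(">H", buf, PAGE_N_HEAP)
    (index_id,) = struct.unpack_from(">Q", buf, PAGE_INDEX_ID)
    return {
        "page_no": page_no,
        "prev": prev_no,
        "next": next_no,
        "is_index_page": page_type == FIL_PAGE_INDEX,
        "compact": bool(n_heap & 0x8000),
        "index_id": index_id,
    }


def check_pair(left, right, expected_index_id):
    """Compare two pages that should be neighbours on the same level."""
    problems = []
    if left["next"] != right["page_no"] or right["prev"] != left["page_no"]:
        problems.append("broken FIL_PAGE_NEXT or FIL_PAGE_PREV links")
    if left["compact"] != right["compact"]:
        problems.append("'compact' flag mismatch")
    for page in (left, right):
        if page["index_id"] != expected_index_id:
            problems.append(
                "page %d: index id %d != data dictionary index id %d"
                % (page["page_no"], page["index_id"], expected_index_id)
            )
    return problems
```

For the case above, one would compare pages 21 and 22 against index id 443, for example check_pair(parse_page(read_page("/path/to/copy/test_part.ibd", 21)), parse_page(read_page("/path/to/copy/test_part.ibd", 22)), 443), where the path is a placeholder. An empty list means the two pages look consistent on these three fields; it does not prove that the rest of the page is intact.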
3. Handling the problem
1) Recover with undrop-for-innodb
For details, see:
Recover Corrupt MySQL Database
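2) Use select to recover the data before the first bad block
# All rows that precede the first bad block on the primary key's page linked list can still be read and backed up with a select statement; the rows after it cannot be accessed (so this method only recovers the data before the first bad block)
# The position of the first bad block can be roughly determined as follows
Exporting the table with mysqldump fails with the following error: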
mysqldump -uroot -pxxxx -S /data/mysql/mysql3306/run/mysql.3306.sock --single-transaction --default-character-set=utf8mb4 --complete-insert --flush-privileges --set-gtid-purged=OFF --hex-blob --skip-add-locks --skip-add-drop-table test_shao test_part >test_part.sql
mysqldump: [Warning] Using a password on the command line interface can be insecure.
mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `test_part` at row: 637
# The row number reported here is not exact; the bad block is only somewhere near that row.
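Since the row number from mysqldump is only approximate, the exact number of readable rows can be narrowed down by bisection. The sketch below is one possible way to do that, not a procedure from the original investigation. It assumes that readability is monotonic (once a row count reaches the bad page, every larger count fails too), and can_read is a hypothetical callback that would run something like select * from test_part limit n and return False when error 2013 comes back. Every failed probe crashes mysqld exactly as shown in the logs above, so this only makes sense on a test instance, and the callback has to wait for the server to restart.

```python
def largest_readable_limit(can_read, upper):
    """Return the largest n in [0, upper] for which can_read(n) is True.

    can_read(0) is taken to be True, because reading zero rows touches no page.
    """
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the loop always makes progress
        if can_read(mid):
            lo = mid               # mid rows are readable, try more
        else:
            hi = mid - 1           # mid rows hit the bad page, try fewer
    return lo
```

With the mysqldump result above, an upper bound a little over 637 would be a natural starting point. The returned count can then be used as the limit of the select that copies the surviving rows into a new table.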
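3) check table and optimize table cannot repair the table, and innodb_force_recovery cannot fix this problem either
# Because a data page is corrupted, even if the database is restarted with the innodb_force_recovery parameter set, it still crashes as soon as the bad page is read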