Mysql选用InnoDB出现数据冲突的解决方法

1. 环境描述

操作系统：ubuntu16.04
数据库：mysql5.7.11
数据库引擎：InnoDB

2. 现象描述

数据库中包含一个工作的database，暂时命名为abc，其中有很多表，如table1，table2，...， tablen。

通过终端连入，执行use abc，执行show tables；
出现如下错误：
2006，MySQL Server has gone away.

3. 排查过程

通过ps命令查看，数据库服务任然在工作，而且可以正常的启动、停止。
通过程序日志，能查到的日志就是2003，Cannot Connect to MySQL Server on 'xxx'，而程序这里通过的内网ip连接，且所有的内网ip均已经通过权限放通，并且包括localhost上面的设备也会报这个错，所以排除权限的问题，并且程序是间歇性的报数据库无法连接的错误。
通过程序、终端提示，获取到的2003和2006号错误信息范围太过于广泛，而且绝大部分的google结果都是建议设置my.cnf中的max_allowed_packet等参数，这里出现的问题和这些实际操作并无意义。

分析到这里，想到了mysql的error日志中可能还存在某些有意义的信息，于是进行了排查。果然找到了如下的信息：

2017-07-03T03:37:45.826669Z 10 [Note] InnoDB: Uncompressed page, stored checksum in field1 3016282142, calcksum in field2 2124946097, calculated checksums for field2: crc32 860595562/1814632262, innodb 1314564691, red to page already) 1, space id (if created with >= MySQL-4.1.1 and stored already) 8292
InnoDB: Page may be an insert buffer bitmap page
2017-07-03T03:37:45.826689Z 10 [Note] InnoDB: It is also possible that your operating system has corrupted  You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can us/forcing-innodb-recovery.html for information about forcing recovery.
2017-07-03T03:37:45.826728Z 10 [ERROR] InnoDB: Database page corruption on disk or a failed file read of pa
2017-07-03T03:37:45.826740Z 10 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
@                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          @                                                                                                          
2017-07-03T03:37:45.691584Z 10 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
InnoDB: End of page dump
InnoDB: Page may be an insert buffer bitmap page
2017-07-03T03:37:30.574822Z 8 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2017-07-03T03:37:30.574844Z 8 [ERROR] [FATAL] InnoDB: Unable to read page [page id: space=8292, page number=1] into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. Or, the table was compressed with with an algorithm that is not supported by this instance. If it is not a decompress failure, you can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/5.7/en/ for more details. Aborting...
2017-07-03 11:37:30 0x7f34b1b81700  InnoDB: Assertion failure in thread 139864296658688 in file ut0ut.cc line 920
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
03:37:30 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=152
max_threads=151
thread_count=135
connection_count=135
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68188 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f2255a93560
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f225d717e40 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xe9cb1c]
/usr/sbin/mysqld(handle_fatal_signal+0x451)[0x78e9e1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f22c3980340]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)[0x7f22c2dc1cc9]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f22c2dc50d8]
/usr/sbin/mysqld[0x75f732]
/usr/sbin/mysqld(_ZN2ib5fatalD1Ev+0xee)[0x105807e]

...

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f2255b01260): is an invalid pointer
Connection ID (thread ID): 24530
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2017-07-03T03:37:10.753743Z mysqld_safe Number of processes running now: 0
2017-07-03T03:37:10.754693Z mysqld_safe mysqld restarted
2017-07-03T03:37:10.762109Z 0 [Warning] Changed limits: max_open_files: 1024 (requested 5000)
2017-07-03T03:37:10.762218Z 0 [Warning] Changed limits: table_open_cache: 431 (requested 2000)
2017-07-03T03:37:10.917967Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.11) starting as process 10490 ...
2017-07-03T03:37:10.922060Z 0 [Note] InnoDB: PUNCH HOLE support available
2017-07-03T03:37:10.922090Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-07-03T03:37:10.922096Z 0 [Note] InnoDB: Uses event mutexes
2017-07-03T03:37:10.922099Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2017-07-03T03:37:10.922105Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2017-07-03T03:37:10.922108Z 0 [Note] InnoDB: Using Linux native AIO
2017-07-03T03:37:10.922397Z 0 [Note] InnoDB: Number of pools: 1
2017-07-03T03:37:10.922533Z 0 [Note] InnoDB: Using CPU crc32 instructions
2017-07-03T03:37:10.939738Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2017-07-03T03:37:10.948098Z 0 [Note] InnoDB: Completed initialization of buffer pool
2017-07-03T03:37:10.954216Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2017-07-03T03:37:10.966637Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2017-07-03T03:37:10.968310Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 15822260650
2017-07-03T03:37:10.968329Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15822261566
2017-07-03T03:37:10.968492Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 15822261566
2017-07-03T03:37:10.968677Z 0 [Note] InnoDB: Database was not shutdown normally!
2017-07-03T03:37:10.968684Z 0 [Note] InnoDB: Starting crash recovery.
2017-07-03T03:37:10.985498Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2017-07-03T03:37:11.489700Z 0 [Note] InnoDB: Apply batch completed
2017-07-03T03:37:11.598387Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2017-07-03T03:37:11.598423Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2017-07-03T03:37:11.598487Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2017-07-03T03:37:11.853450Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2017-07-03T03:37:11.854759Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2017-07-03T03:37:11.854772Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2017-07-03T03:37:11.855597Z 0 [Note] InnoDB: 5.7.11 started; log sequence number 15822261566
2017-07-03T03:37:11.856051Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2017-07-03T03:37:11.856330Z 0 [Note] Plugin 'FEDERATED' is disabled.
2017-07-03T03:37:11.856844Z 0 [Note] InnoDB: Buffer pool(s) load completed at 170703 11:37:11
2017-07-03T03:37:11.857795Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2017-07-03T03:37:11.858042Z 0 [Warning] CA certificate ca.pem is self signed.
2017-07-03T03:37:11.859534Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2017-07-03T03:37:11.859592Z 0 [Note] IPv6 is available.
2017-07-03T03:37:11.859605Z 0 [Note]   - '::' resolves to '::';
2017-07-03T03:37:11.859614Z 0 [Note] Server socket created on IP: '::'.
2017-07-03T03:37:11.868865Z 0 [Note] Event Scheduler: Loaded 0 events
2017-07-03T03:37:11.869455Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.11'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)

上面这一大段的日志信息反应出的关键信息就一个——表遭到损坏了。紧接着导致了线程出现异常，mysqld_safe执行重启操作解决冲突，但实际上是失败了。
这里看到日志里面提供了参考链接：https://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html。
在这个参考链接里，找到了一些信息。

4.解决方案

通过上面链接的指引，尝试修改my.cnf文件中的如下配置（没有的话加上），重启服务，再次登陆mysql，发现现在可以操作数据库了，但是，还是存在某些问题，如：导入数据失败。所以到这里，并未完全解决这个问题。

[mysqld]
innodb_force_recovery = 1

上面这个参数可设置的值为1-6，此处设置为1的含义为：让服务器运行，即使它检测到损坏的页面。尝试使SELECT * FROM tbl_name跳过损坏的索引记录和页面，这有助于转储表。
也就是说，这里只是强制让mysql不去使用损坏的数据，而不是根本上解决问题。
通过大量翻阅其他资料，包括参考mysql bug反馈社区的一些帖子，发现完全解决这个问题的方式为如下（查到的，可能存在更好的方式）:

在my.cnf 中设置innodb_force_recovery = 1，重启数据库服务
备份数据库数据
重新注释innodb_force_recovery = 1这个信息或者设置innodb_force_recovery = 0，重启数据库服务
通过备份数据，恢复数据库数据

这里还有一个语句能辅助帮助去检查哪个表遭到了损坏：check table table_name。

5. 问题原因猜测

猜测一：出现这个问题之前，数据库设备存在一次日志和备份文件将磁盘空间占满，并导致程序写入数据库失败大量rollback的现象，后来清除多余数据库日志和备份释放了数据库空间。
猜测二：操作设备文件中，对数据库相关文件造成了损坏。

6.额外的信息

网上有人遇到了同样的问题，信息如下：

Oracle/MySQL blames in such cases the Operating System, the I/O system or the hardware. What we have seen is that such incidents occur more often on Windows systems and when people are running their databases in a virtualized environment (VMWare). Because of the small number of cases we are aware of, this may not be representative.

他指出，在windows操作系统上或者虚拟机上面遇到这个问题的概率比较大，虽然他不确信。这也验证了在查找问题的过程中，发现网上遇到问题比较多的是windows 2003的设备。另外，我们的设备采用了虚拟机部署的方式，也值得我们多加注意。

7.参考文章

WHICH TABLE IS HIT BY AN INNODB PAGE CORRUPTION
Forcing InnoDB Recovery
Mysql 之配置文件my.cnf
mysql bug报告
CHECK TABLE Syntax