Error: semaphore wait has lasted > 600 seconds, causing the database instance to restart

Our production database instance suddenly restarted one evening (a painful thing to happen).

1. Environment:

DB version: MariaDB 10.0.28 x64

OS version: CentOS 6.6 x64

kernel: 2.6.32-504.el6.x86_64

System sem: kernel.sem = 1000 4096000 1000 4096

2. Error log

InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Warning: a long semaphore wait:
--Thread 139562561287936 has waited at row0ins.cc line 2730 for 923.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7ef7b756b4a0 '&block->lock'
a writer (thread id 139562561287936) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4152
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.28/storage/xtradb/row/row0ins.cc line 2730
InnoDB: Warning: a long semaphore wait:
--Thread 139562523481856 has waited at row0ins.cc line 2730 for 921.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7ef9c7eb5500 '&block->lock'
a writer (thread id 139562523481856) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file buf0flu.cc line 1069
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.28/storage/xtradb/btr/btr0sea.cc line 979
InnoDB: Warning: a long semaphore wait:
--Thread 139562609211136 has waited at row0ins.cc line 2730 for 911.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7ef7a6ef37e0 '&block->lock'
a writer (thread id 139562609211136) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file buf0flu.cc line 1069
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.28/storage/xtradb/row/row0ins.cc line 2730
InnoDB: Warning: a long semaphore wait:
--Thread 139562551437056 has waited at trx0undo.ic line 171 for 907.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7ef9c5d1e4e0 '&block->lock'
a writer (thread id 139562551437056) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file buf0flu.cc line 1069
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.28/storage/xtradb/include/trx0undo.ic line 171
InnoDB: Warning: a long semaphore wait:
--Thread 139560391223040 has waited at row0ins.cc line 2730 for 906.00 seconds the semaphore:
X-lock on RW-latch at 0x7ef9c7eb5500 '&block->lock'
a writer (thread id 139562523481856) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file buf0flu.cc line 1069
Last time write locked in file /home/buildbot/buildbot/build/mariadb-10.0.28/storage/xtradb/btr/btr0sea.cc line 979
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2017-03-30 21:11:18 7eee7d9fe700  InnoDB: Assertion failure in thread 139562774947584 in file srv0srv.cc line 2222
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.

3. Temporary workaround

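While reading the manual, I found that the adaptive hash index can cause contention when acquiring the RW-latch created in btr0sea.c, which in turn shows up as a SEMAPHORES problem. This fits the log above, where two of the five waits report btr0sea.cc line 979 as the place the latch was last write locked.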

Details: https://dev.mysql.com/doc/refman/5.7/en/innodb-adaptive-hash.html
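To check how often a given source file appears in the long semaphore waits, the log can be tallied with a small script. This is only a rough sketch: the function name `summarize_waits` and the two regular expressions are my own, written against the log format shown in section 2, and other MariaDB/MySQL versions may word these lines differently.

```python
import re
from collections import Counter

# "--Thread <id> has waited at <file> line <n> for <secs> seconds the semaphore:"
WAIT_RE = re.compile(r"--Thread \d+ has waited at (\S+) line (\d+) for [\d.]+ seconds")
# "Last time write locked in file <path> line <n>"
WRITE_RE = re.compile(r"Last time write locked in file (\S+) line (\d+)")


def summarize_waits(log_text):
    """Count long semaphore waits by waiting location and by last write-lock location.

    Returns two Counters keyed by "file:line".
    """
    waited_at = Counter()
    last_write = Counter()
    for line in log_text.splitlines():
        m = WAIT_RE.search(line)
        if m:
            waited_at[f"{m.group(1)}:{m.group(2)}"] += 1
            continue
        m = WRITE_RE.search(line)
        if m:
            # Drop the build directory prefix and keep only the file name.
            filename = m.group(1).rsplit("/", 1)[-1]
            last_write[f"{filename}:{m.group(2)}"] += 1
    return waited_at, last_write
```

Calling `summarize_waits(open("error.log").read())` (the file name is a placeholder) returns the two counters; `last_write.most_common()` then lists the locations that most often held the write lock. A high share of btr0sea entries would point towards the adaptive hash index.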

Temporary workaround: set global innodb_adaptive_hash_index=0;

4. Follow-up

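I filed a bug with MariaDB and learned that CentOS 6.6 has a pitfall: on Haswell-based servers, the CentOS 6.6 kernel can cause processes to hang (which matches my environment exactly). Details: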
https://www.infoq.com/news/2015/05/redhat-futex
https://groups.google.com/forum/?hl=zh-Cn#!starred/codership-team/Ne6WsTWixH8

If the problem shows up again, I hope to capture a gdb dump in time and hand it over to upstream for investigation.
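One way to grab such a dump while the server is still hung is to attach gdb in batch mode and print every thread's backtrace. The sketch below is an assumption about how that could be scripted, not what was actually run here: `build_gdb_command` and `dump_backtraces` are hypothetical helper names, gdb must be installed, and the caller needs permission to attach to the mysqld process. Attaching pauses the process while gdb collects the traces.

```python
import subprocess


def build_gdb_command(pid):
    """Build a gdb invocation that prints a backtrace of every thread, then exits."""
    return [
        "gdb",
        "--batch",                      # run the given commands and exit
        "-p", str(pid),                 # attach to the running process
        "-ex", "thread apply all bt",   # backtrace for all threads
    ]


def dump_backtraces(pid, out_path):
    """Run gdb against `pid` and write its output to `out_path`."""
    with open(out_path, "w") as out:
        subprocess.run(
            build_gdb_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,
        )
```

For example, `dump_backtraces(12345, "mysqld_threads.txt")`, where 12345 stands in for the mysqld process id. Installing the matching debug symbols beforehand makes the resulting traces far more useful to whoever reads them.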


