同事突然反应数据库连接失败,提示超过连接数
一听到这个消息,感觉很不可能,当时配的数据库最大连接是2000
应用的连接很少,怎么会突然这么多呢,
登上数据库执行
show processlist;
发现已经占用的连接有1850多个,很多都是update语句,并且等待时间都已经高达60000s
而且当我查询select * from information_schema.processlist的时候提示
The total number of locks exceeds the lock table size
查询相关资料,这个错误大部分原因是innodb_buffer_pool_size 大小不足
但是数据库配置的buffer_pool_size 为20G,根据业务量绝对可以满足,正要进一步查看的时候,数据库自己crash
重启了。
查看后端mysql.error日志出现大量的
--Thread 139835962963712 has waited at lock0lock.cc line 6407 for 0.00 seconds the semaphore:
Mutex at 0x25dbdb08, Mutex LOCK_SYS created lock0lock.cc:446, lock var 1
--Thread 139835957638912 has waited at lock0lock.cc line 6407 for 0.00 seconds the semaphore:
Mutex at 0x25dbdb08, Mutex LOCK_SYS created lock0lock.cc:446, lock var 1
--Thread 139836010886912 has waited at lock0lock.cc line 6407 for 0.00 seconds the semaphore:
Mutex at 0x25dbdb08, Mutex LOCK_SYS created lock0lock.cc:446, lock var 1
--Thread 139833355343616 has waited at lock0lock.cc line 6407 for 0.00 seconds the semaphore:
Mutex at 0x25dbdb08, Mutex LOCK_SYS created lock0lock.cc:446, lock var 1
wait has ended
--Thread 139835982010112 has waited at lock0lock.cc line 6407 for 0.00 seconds the semaphore:
Mutex at 0x25dbdb08, Mutex LOCK_SYS created lock0lock.cc:446, lock var 1
--Thread 139833334249216 has waited at lock0lock.cc line 6407 for 0.00 seconds the semaphore:
Mutex at 0x25dbdb08, Mutex LOCK_SYS created lock0lock.cc:446, lock var 1
--Thread 139837043767040 has waited at lock0lock.cc line 4269 for 0.00 seconds the semaphore:
Mutex at 0x25dbdb08, Mutex LOCK_SYS created lock0lock.cc:446, lock var 1
--Thread 139837032707840 has waited at lock0lock.cc line 4269 for 0.00 seconds the semaphore:
Mutex at 0x25dbdb08, Mutex LOCK_SYS created lock0lock.cc:446, lock var 1
OS WAIT ARRAY INFO: signal count 2150204236
RW-shared spins 0, rounds 3489036, OS waits 2721875
RW-excl spins 0, rounds 4316611, OS waits 140186
RW-sx spins 1180703, rounds 35403715, OS waits 1179317
Spin rounds per wait: 3489036.00 RW-shared, 4316611.00 RW-excl, 29.99 RW-sx
FAIL TO OBTAIN LOCK MUTEX, SKIP LOCK INFO PRINTING
这是系统信号量等待
查看操作系统sem
250 32000 32 128
更改为:
250 32000 100 128
sysctl -w kernel.sem="250 32000 100 128"
echo "kernel.sem=250 32000 100 128">> /etc/sysctl.conf
2 SET GLOBAL innodb_adaptive_hash_index=0;
发现spin waits和OS waits等待时间相当高,在手册里查到了这一句话:
You can monitor the use of the adaptive hash index and the contention for its use in the SEMAPHORES section of the output of the SHOW ENGINE INNODB STATUS command. If you see many threads waiting on an RW-latch created in btr0sea.c, then it might be useful to disable adaptive hash indexing.
Sometimes, the read/write lock that guards access to the adaptive hash index can become a source of contention under heavy workloads, such as multiple concurrent joins.
由于自适应哈希索引造成大量的锁争用,进而堵塞很多进程,最终导致MySQL崩溃重启
set global innodb_adaptive_hash_index = 0;
修改完后继续观察看下效果.