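In theory, INSERT ... ON DUPLICATE KEY UPDATE acquires its locks in a consistent order, and placing a Record Lock on a unique index should carry no risk of deadlock. Yet our production systems have recently hit DEADLOCK several times in exactly this scenario. I will use this case to cover some background knowledge and a way of approaching the investigation; it should be useful both for hands-on work and for interviews.
Every deadlock comes from contention for resources: of two threads A and B, A holds resource 1 while waiting for resource 2, and B holds resource 2 while waiting for resource 1. In program design we usually prevent deadlocks by making sure locks are always acquired in the same order, but in real production systems, with all their trade-offs, a one-sentence rule rarely solves complex problems across many scenarios.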
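The circular wait described above can be pictured as a wait-for graph: an edge from A to B means A is blocked on something B holds, and a deadlock is a cycle in that graph. The following is a minimal Python sketch of the idea, not InnoDB's actual detector; the function name `find_deadlock` and the dictionary representation are assumptions made purely for illustration.

```python
def find_deadlock(waits_for):
    """Return the transactions that form a wait cycle, or None.

    waits_for maps each blocked transaction to the single transaction
    it is waiting on (a hypothetical representation for illustration).
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of "is waiting for" edges from this transaction.
        while current in waits_for:
            if current in seen:
                # We returned to a transaction already on the path: a cycle.
                return seen[seen.index(current):]
            seen.append(current)
            current = waits_for[current]
    return None


# A holds resource 1 and waits for B; B holds resource 2 and waits for A.
cycle = find_deadlock({"A": "B", "B": "A"})
# A waits for B, but B is not blocked on anything, so nobody is stuck forever.
no_cycle = find_deadlock({"A": "B"})
```

Here `cycle` contains both transactions, while `no_cycle` is `None` because B can finish and release what A needs.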
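For security reasons I will not paste the production scenario here. Instead I reuse a scenario from an official MySQL bug report whose context is almost identical: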
CREATE TABLE `tt` (
`a` bigint NOT NULL AUTO_INCREMENT,
`b` bigint DEFAULT NULL,
`c` bigint DEFAULT NULL,
`d` bigint DEFAULT NULL,
PRIMARY KEY (`a`),
UNIQUE KEY `blah` (`b`,`c`)
)
T1: INSERT ... (b,c,d) VALUES (1, rand, rand),(1, rand, rand) ON DUPLICATE KEY UPDATE d=VALUES(d)
T2: INSERT ... (b,c,d) VALUES (2, rand, rand),(2, rand, rand) ON DUPLICATE KEY UPDATE d=VALUES(d)
Deadlock log:
LATEST DETECTED DEADLOCK
------------------------
2021-05-04 10:53:24 0x7f2e1b4ee700
*** (1) TRANSACTION:
TRANSACTION 2310041, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 1
MySQL thread id 291, OS thread handle 139847773697792, query id 768872 localhost local_admin:sys.database update
INSERT INTO S230677 (b, c, d) VALUES (2697564,509247.782987144,1620150804984388.8),(2697564,680856.9544766529,1620150804984389.8) ON DUPLICATE KEY UPDATE d=VALUES(d)
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 26 page no 34703 n bits 304 index PRIMARY of table `test`.`S230677` trx id 2310041 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 26 page no 34703 n bits 304 index PRIMARY of table `test`.`S230677` trx id 2310041 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
*** (2) TRANSACTION:
TRANSACTION 2310042, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 6 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 1
MySQL thread id 290, OS thread handle 139847812486912, query id 768873 localhost local_admin:sys.database update
INSERT INTO S230677 (b, c, d) VALUES (2697563,36712.30166479333,1620150804984440.0),(2697563,784844.3560763699,1620150804984440.8) ON DUPLICATE KEY UPDATE d=VALUES(d)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 26 page no 34703 n bits 304 index PRIMARY of table `test`.`S230677` trx id 2310042 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 26 page no 34703 n bits 304 index PRIMARY of table `test`.`S230677` trx id 2310042 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
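Conclusion first: this behavior is an InnoDB bug in MySQL 5.7 and can be resolved by upgrading the server version. Stopping there, however, teaches us almost nothing, so let's look in detail at why the bug occurs.
In my production environment the transaction isolation level is RC (READ COMMITTED). Gap locks still exist under RC; I come back to why InnoDB still takes gap locks under RC in the background section further down.
Reading the deadlock log above is confusing at first: a transaction that already holds "RECORD LOCKS space id 26 page no 34703 n bits 304" is still requesting that same lock. The reason is that the log is misleading. RECORD LOCKS in this log does not mean only row locks; by constructing test scenarios I found that Gap Locks, Record Locks, and Insert Intention locks are all printed under the generic RECORD LOCKS heading.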
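To tell the lock kinds apart, it helps to look at what follows `lock_mode X` on each RECORD LOCKS line rather than at the heading. The sketch below is a small Python helper that does this for lines like the ones in the log above. The function name `classify_lock` and the regular expression are my own assumptions; the suffix strings it checks ("locks rec but not gap", "locks gap before rec", "insert intention", "waiting") are, to my knowledge, the ones InnoDB prints, and a bare `lock_mode X` denotes a next-key lock, which on the supremum pseudo-record covers only the gap after the largest record in the index.

```python
import re

# Matches the header line InnoDB prints for each record-level lock.
LOCK_LINE = re.compile(
    r"RECORD LOCKS .* index (?P<index>\S+) of table (?P<table>\S+) "
    r"trx id (?P<trx>\d+) lock_mode (?P<mode>\S+)(?P<flags>.*)"
)


def classify_lock(line):
    """Return (trx id, index, lock kind, waiting?) for a RECORD LOCKS line, or None."""
    match = LOCK_LINE.search(line)
    if match is None:
        return None
    flags = match.group("flags")
    if "insert intention" in flags:
        kind = "insert intention"
    elif "locks rec but not gap" in flags:
        kind = "record lock"
    elif "locks gap before rec" in flags:
        kind = "gap lock"
    else:
        # No qualifier after the mode: record plus the gap before it.
        kind = "next-key lock"
    return match.group("trx"), match.group("index"), kind, "waiting" in flags


held = classify_lock(
    "RECORD LOCKS space id 26 page no 34703 n bits 304 index PRIMARY "
    "of table `test`.`S230677` trx id 2310041 lock_mode X"
)
wanted = classify_lock(
    "RECORD LOCKS space id 26 page no 34703 n bits 304 index PRIMARY "
    "of table `test`.`S230677` trx id 2310041 lock_mode X insert intention waiting"
)
```

Applied to transaction 2310041, `held` is classified as a next-key lock on supremum and `wanted` as a waiting insert intention lock, so the two lines are not the same lock even though both appear under RECORD LOCKS.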
Locking order:
A deadlock results.
According to the official MySQL bug report, a fix for the related Bug#50413 went into version 5.7.4:
Despite the title, Bug#50413 can occur even if MySQL replication or binlog is not used. That bug has was fixed in MySQL 5.7.4. The fix is that when we encounter a duplicate key in the clustered index or in any unique secondary index during an INSERT, we will acquire gap locks in the not-yet-checked secondary indexes as well. In this way, the INSERT will already have acquired some locks for the ON DUPLICATE KEY UPDATE part, thus avoiding some potential deadlocks.
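Judging from this fix note, they strengthened the locking on the clustered index and unique indexes under RC: gap locks on the clustered index and unique indexes are now taken during the INSERT itself, instead of only when the UPDATE part performs its uniqueness check.
To put it plainly, under the current locking mechanism this deadlock does exist and cannot be avoided; a consistent locking order alone is not enough to prevent it. I plan to raise this on the official forum and will update this post once there is a conclusion.
Here I only sketch one principle: a lower-level lock such as an S lock is acquired first and a higher-level lock such as an X lock afterwards, similar to the locking order mentioned above. This upgrade-like design is part of what keeps MySQL fast; the downside is deadlocks in extreme scenarios. For now I consider this a shortcoming that application design has to work around, rather than a design flaw in MySQL.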
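If the application has to work around these rare deadlocks, one common defensive measure is to re-run the transaction that InnoDB rolled back as the deadlock victim. The following is a minimal Python sketch under stated assumptions: `DeadlockError` and `run_with_retry` are hypothetical names, and with a real driver you would instead check for MySQL error code 1213 (ER_LOCK_DEADLOCK) on the exception it raises.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver error carrying MySQL error 1213."""


def run_with_retry(transaction, max_attempts=3, base_delay=0.01):
    """Run transaction(); retry with jittered backoff if it was a deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the deadlock.
                raise
            # Wait a little longer each time, with jitter so that two
            # colliding transactions do not retry in lockstep.
            time.sleep(base_delay * attempt * random.uniform(0.5, 1.5))
```

The callable passed in should contain the whole transaction, because the victim's work has been rolled back and must be redone from the start.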
MySQL :: MySQL 8.0 Reference Manual :: 15.7.1 InnoDB Locking
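On reflection, it is best to read the official documentation directly; leave me a comment if you have questions.
Before MySQL 5.7.4: Gap locking is only used for foreign-key constraint checking and duplicate-key checking.
In other words, gap locks are used only for foreign-key checks and for unique index conflict checks.
From 5.7.4 on, as mentioned above, locks are also taken during the INSERT.
So why was it designed this way?
I will quote an English-language explanation directly: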
Suppose transaction T1, with the RC isolation, updates the value of a column that is part of a unique key (index). Clearly this is only possible if the new key value does not already exist. Suppose also transaction T2, also with the RC isolation, attempts to insert a new record with the key value equal to that just created by T1. Due to the RC isolation level T2 does not see the change made by T1, since it has not been committed yet, so the duplicate key error is not raised. If it weren't for a gap lock on the unique index in question, this would eventually result in two records with the same purportedly unique key.
MySQL engine features - about InnoDB transaction locks (1) - Alibaba Cloud Developer Forums: Cloud Discussion Forums
MySQL :: MySQL 8.0 Reference Manual :: 13.2.6.2 INSERT ... ON DUPLICATE KEY UPDATE Statement
MySQL Bugs: #103576: insert ... on duplicate key ... disjoint sets of rows deadlock on supremum
MySQL Bugs: #52020: InnoDB can still deadlock on just INSERT...ON DUPLICATE KEY
https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/innodb-locks-set.html
https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/innodb-transaction-isolation-levels.html
https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/innodb-locking.html#innodb-gap-locks