mysql死锁问题定位与解决
java后台报错如下
java.util.concurrent.ExecutionException: org.springframework.dao.DeadlockLoserDataAccessException:
### Error updating database. Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
### The error may involve com.ruisitech.bi.mapper.bireport.RhMapper.updateRhShowInfo-Inline
### The error occurred while setting parameters
### SQL: update dau_result set num= ? where time=? and condition=?
### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
mysql中执行语句:show engine innodb status;
关注的信息如下:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-01-24 17:03:26 7efcb18aa700
*** (1) TRANSACTION:
TRANSACTION 1748980943, ACTIVE 0 sec fetching rows
mysql tables in use 3, locked 3
LOCK WAIT 10 lock struct(s), heap size 2936, 121 row lock(s)
MySQL thread id 12653229, OS thread handle 0x7efef2fd6700, query id 1708475432 172.16.27.230 boss_ddl Searching rows for update
update dau_result
set num= 68
where time='20181001' and condition='332'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 8959 page no 21 n bits 352 index `PRIMARY` of table `boss_bi`.`dau_result` trx id 1748980943 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 1748980942, ACTIVE 0 sec fetching rows, thread declared inside InnoDB 4997
mysql tables in use 3, locked 3
12 lock struct(s), heap size 2936, 115 row lock(s)
MySQL thread id 12652735, OS thread handle 0x7efcb18aa700, query id 1708475433 172.16.27.230 boss_ddl Searching rows for update
update dau_result
set num= 244618
where time='20181002' and condition='333'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 8959 page no 21 n bits 352 index `PRIMARY` of table `boss_bi`.`dau_result` trx id 1748980942 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 8959 page no 21 n bits 352 index `PRIMARY` of table `boss_bi`.`dau_result` trx id 1748980942 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (1)
------------
现在将问题抽象出来,假设建表语句如下:
CREATE TABLE `dau_result` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增主键',
`time` varchar(16) NOT NULL DEFAULT '' COMMENT '时间,格式如1001',
`condition` int(11) NOT NULL DEFAULT '-1' COMMENT '条件id',
`num` int(11) NOT NULL DEFAULT '0' COMMENT '的数量',
PRIMARY KEY (`id`) USING BTREE,
KEY `idx_time` (`time`) USING BTREE,
KEY `idx_condition` (`condition`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=24609 DEFAULT CHARSET=utf8mb4 COMMENT='日活量查询结果表';
表中的数据有如下6条(1,20181001,332,10),(2,20181001,333,10)(3,20181002,332,10),(4,20181002,333,10),(5,20181003,332,10),(6,20181003,333,10)
发生死锁的两个sql如下:
sqlA:update dau_result set num= 244618 where time='20181002' and condition='333'
sqlB:update dau_result set num= 68 where time='20181001' and condition='332'
首先确定的是两个sql最终的定位不是同一条记录,怎么发生死锁了呢?其实即使是定位到同一条记录也只是会阻塞而已,并不会导致死锁,下面分析导致死锁的原因。
explain update dau_result set num= 68 where time='20181002' and condition='333' ;
结果:
key:idx_query_condition,idx_time
extra:Using intersect(idx_condition,idx_time); Using where; Using temporary
单个sql会同时走idx_condition和idx_time,但是两个sql的time和condition都不同,但是上面explain的结果中,key同时使用了idx_condition,idx_time两个索引,extra中也包含了Using intersect(idx_condition,idx_time),所以mysql做了优化),查询符合条件的记录时,实际使用了下面两个个sql并取交集:
sqlA1:select * from dau_result where time='20181002' for update;//此时对应上锁的id范围是id=3和id=4
sqlA2:select * from dau_result where condition='333' for update ;//此时对应上锁的id范围是id=2和id=4和id=6
注意虽然虽然sqlA1和sqlA2都对id=4上互斥锁,但是不会有阻塞问题,因为sqlA1和sqlA2在同一个事务中。取交集后定位id=4的进行修改,sql如下
sqlA3:update *** where id = 4 (time='20181002' and condition='333'语句对应的id=4)
说明:上面需要有mysql锁相关知识,可以参考:
https://blog.csdn.net/h2604396739/article/details/86518943和https://blog.csdn.net/h2604396739/article/details/81483051
如果对sqlB也做如上分析,则sqlB1上锁范围为:id=1和id=2
sqlB2上锁范围,id=1,id=3和id=5
那多线程中整体执行流程就可能如下图:
session1(sqlA) | session2(sqlB) | |
t1 | sqlA1:id=3和id=4上锁 | |
t2 | sqlB1:id=1和id=2上锁 | |
t3 | sqlA2:id=2和id=4和id=6上锁(id=2导致blocked) | |
t4 | sqlB2:id=1,id=3和id=5(id=3导致blocked) | |
t5 | deadlock | deadlock |
t6 | sqlA rollback,并且释放id=3的锁 | |
t7 | sqlB获取锁并执行 |
导致死锁原因:
被访问到的记录上的是互斥锁;并且因为分别走了两个索引导致上锁范围扩大。
解决方案
1修改为共享锁:定位实际修改记录过程中被访问记录使用共享锁,先按照条件查询对应的id(因为select上的是共享锁S,不会导致死锁),然后再用id来更新语句,因为id是唯一主键,两条sql会对应不同的主键,互斥锁也会加在不同的id上,不会导致死锁。
即update dau_result set num= 68, update_time=now() where time='20181001' and query_condition='332' 变更为下面两条sql:
select id from dau_result where time='20181001' and condition='332'';#此时仅仅有共享锁,不会死多
update dau_result set num= 68 where id=1 ;#只对一条记录上锁,一定不会死锁,至多阻塞
2减小上锁范围:强制使用索引idx_time或idx_condition,减小上锁的范围。以idx_time为例此时对应sql如下
update dau_result FORCE index(idx_time) set num= 68 where time='20181001' and query_condition='332' ;
update dau_result FORCE index(idx_time) set num= 244618 where time='20181002' and query_condition='333' ;
执行时序如下图,不会存在锁冲突的问题。
session1(sqlA) | session2(sqlB) | |
t1 | sqlA1:id=3和id=4上锁 | |
t2 | sqlB1:id=1和id=2上锁 | |
t3 | sqlA 执行 | sqlB执行 |
3如果idx_condition索引是冗余的,则可以直接删除,这样就一定会使用索引idx_time,本质和方案二一致,也能解决问题
方案一修改起来比较费力,而且会成倍增加查询次数;方案三可能每个索引也确实有用,所以有时方案三不可行;方案二的问题是将指定索引(FORCE index(idx_time))写在了代码(mapper.xml)里,如果索引idx_time被删除了,那么记得一定同步修改代码里面的force index 。这里推荐方案二。