Troubleshooting a MySQL Deadlock: A Case Study

Our project reported a deadlock exception:

org.apache.ibatis.exceptions.PersistenceException:
### Error flushing statements. Cause: org.apache.ibatis.executor.BatchExecutorException: com.baturu.wms.business.outbound.dao.OutboundNoticeHeaderDao.updateById (batch index #1) failed. Cause: java.sql.BatchUpdateException: Deadlock found when trying to get lock; try restarting transaction
### Cause: org.apache.ibatis.executor.BatchExecutorException: com.baturu.wms.business.outbound.dao.OutboundNoticeHeaderDao.updateById (batch index #1) failed. Cause: java.sql.BatchUpdateException: Deadlock found when trying to get lock; try restarting transaction
at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
...
Caused by: org.apache.ibatis.executor.BatchExecutorException: com.baturu.wms.business.outbound.dao.OutboundNoticeHeaderDao.updateById (batch index #1) failed. Cause: java.sql.BatchUpdateException: Deadlock found when trying to get lock; try restarting transaction
at org.apache.ibatis.executor.BatchExecutor.doFlushStatements(BatchExecutor.java:148)
...
at org.apache.ibatis.session.defaults.DefaultSqlSession.flushStatements(DefaultSqlSession.java:253)
... 44 more
Caused by: java.sql.BatchUpdateException: Deadlock found when trying to get lock; try restarting transaction
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
...
at org.apache.ibatis.executor.BatchExecutor.doFlushStatements(BatchExecutor.java:122)
... 52 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
...

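This was my first time troubleshooting a deadlock, and at first I did not know where to start. Since I had been reading about InnoDB recently, I began by searching for how to view the deadlock log in MySQL.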

Viewing the MySQL deadlock log

Command:

mysql> show engine innodb status; 

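This prints a lot of output. Don't panic, just look at the deadlock section we need.
A deadlock is usually caused by two transactions waiting for each other's locks, which is why the log shows a transaction 1 and a transaction 2. The goal of reading the log is to find out which locks the two transactions are waiting on.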
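
To avoid scrolling through the whole report, the deadlock part can be cut out programmatically. The following is a minimal sketch, assuming the status text has already been read into a string (for example from the `Status` column that `SHOW ENGINE INNODB STATUS` returns) and that the section sits between the `LATEST DETECTED DEADLOCK` and `TRANSACTIONS` headers, as it does in the standard report layout. The class and method names are hypothetical.

```java
public class DeadlockSectionExtractor {

    /**
     * Returns the lines between the "LATEST DETECTED DEADLOCK" header and the
     * following "TRANSACTIONS" header, or an empty string if no deadlock
     * section is present. Dashed rules and blank lines are dropped.
     */
    public static String extract(String status) {
        int header = status.indexOf("LATEST DETECTED DEADLOCK");
        if (header < 0) {
            return "";
        }
        // Start right after the header line.
        int bodyStart = status.indexOf('\n', header);
        if (bodyStart < 0) {
            return "";
        }
        // The next section begins with a line that is exactly "TRANSACTIONS".
        int end = status.indexOf("\nTRANSACTIONS\n", bodyStart);
        String body = end < 0 ? status.substring(bodyStart) : status.substring(bodyStart, end);

        StringBuilder out = new StringBuilder();
        for (String line : body.split("\n")) {
            if (line.isEmpty() || line.matches("-+")) {
                continue; // skip blank lines and dashed separator rules
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }
}
```

Only string handling from the JDK is used here; how the status text is fetched (JDBC, the mysql client, a monitoring tool) is left open.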

*** (1) TRANSACTION:
TRANSACTION 468429219, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1136, 7 row lock(s), undo log entries 17
MySQL thread id 31567782, OS thread handle 140580182210304, query id 786068511 1.1.1.1 ops updating
UPDATE wms_outbound_notice_header SET status=10,wave_code='19089',wave_error_msg='',updater=null,update_date='2021-11-10 17:01:09.377'  WHERE id=1458342344177786881

Transaction 1 was blocked waiting for a lock while executing [UPDATE wms_outbound_notice_header …]

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 9028 page no 90 n bits 136 index PRIMARY of table `btr_wms`.`wms_outbound_notice_header` trx id 468429219 lock_mode X locks rec but not gap waiting

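Lock that transaction 1 is waiting for:
Transaction id [468429219] is waiting for an X (exclusive) record lock on the PRIMARY index of table [btr_wms.wms_outbound_notice_header]
So the id of transaction 1 is 468429219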

*** (2) TRANSACTION:
TRANSACTION 468429218, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
248 lock struct(s), heap size 41168, 40534 row lock(s), undo log entries 1662
MySQL thread id 31568472, OS thread handle 140582284355328, query id 786068544 1.1.1.1 ops updating
UPDATE wms_wave_detail  SET active=0,updater=null,update_date='2021-11-10 17:01:09.441' WHERE outbound_notice_id IN (1452477021064904706)

Transaction 2 was waiting for a lock while executing [UPDATE wms_wave_detail …]

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 9028 page no 90 n bits 136 index PRIMARY of table `btr_wms`.`wms_outbound_notice_header` trx id 468429218 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 8978 page no 395 n bits 88 index PRIMARY of table `btr_wms`.`wms_wave_detail` trx id 468429218 lock_mode X waiting

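Lock held by transaction 2:
Transaction id [468429218] holds an X record lock on the PRIMARY index of table btr_wms.wms_outbound_notice_header
Lock that transaction 2 is waiting for:
Transaction id [468429218] is waiting for an X lock on the PRIMARY index of table btr_wms.wms_wave_detail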

*** WE ROLL BACK TRANSACTION (1)

In the end, InnoDB rolled back transaction 1 to resolve the deadlock.
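
The error message itself suggests "try restarting transaction". This is not the fix used in this article, but as a hedged illustration, a caller could re-run the rolled-back transaction when it recognises a deadlock. The sketch below assumes the deadlock surfaces somewhere in the cause chain as a `java.sql.SQLException` carrying MySQL error code 1213 or SQLSTATE 40001; `DeadlockRetry`, `runWithRetry` and `maxAttempts` are hypothetical names.

```java
import java.sql.SQLException;
import java.util.function.Supplier;

public class DeadlockRetry {

    /** Walks the cause chain looking for a SQLException that signals a deadlock. */
    public static boolean isDeadlock(Throwable error) {
        for (Throwable cause = error; cause != null; cause = cause.getCause()) {
            if (cause instanceof SQLException) {
                SQLException sql = (SQLException) cause;
                if (sql.getErrorCode() == 1213 || "40001".equals(sql.getSQLState())) {
                    return true;
                }
            }
        }
        return false;
    }

    /**
     * Runs the work, retrying only when the failure is a deadlock.
     * The supplier is assumed to start and commit its own transaction on
     * every call, because the victim's previous attempt was rolled back.
     */
    public static <T> T runWithRetry(Supplier<T> transactionalWork, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return transactionalWork.get();
            } catch (RuntimeException e) {
                if (!isDeadlock(e)) {
                    throw e; // unrelated failure: do not retry
                }
                last = e;
            }
        }
        throw last; // still deadlocking after maxAttempts tries
    }
}
```

Retrying only hides the symptom; the lock ordering problem still has to be found, which is what the rest of the analysis does.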

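From the log above we can conclude that the two transactions are waiting on each other's locks:

  • Transaction 2 holds a lock on table wms_outbound_notice_header and is waiting for a lock on table wms_wave_detail
  • Transaction 1 holds a lock on table wms_wave_detail (inferred, since the log only prints the locks held by transaction 2) and is waiting for a lock on table wms_outbound_notice_header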
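
The mutual wait in the two bullets above can be written down as a tiny wait-for graph. This is only an illustrative sketch, not how InnoDB implements deadlock detection: each transaction waits for at most one lock, each lock has at most one holder, and the class and method names are made up for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WaitForGraph {
    private final Map<String, String> holderOfLock = new HashMap<>(); // lock -> transaction holding it
    private final Map<String, String> awaitedLock = new HashMap<>();  // transaction -> lock it waits for

    public void holds(String tx, String lock) {
        holderOfLock.put(lock, tx);
    }

    public void waitsFor(String tx, String lock) {
        awaitedLock.put(tx, lock);
    }

    /** Follows tx -> awaited lock -> holder of that lock, and reports whether the chain returns to startTx. */
    public boolean isDeadlocked(String startTx) {
        Set<String> visited = new HashSet<>();
        String tx = startTx;
        while (tx != null && visited.add(tx)) {
            String lock = awaitedLock.get(tx);
            tx = (lock == null) ? null : holderOfLock.get(lock);
            if (startTx.equals(tx)) {
                return true; // the wait chain closed into a cycle
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph graph = new WaitForGraph();
        // Edges taken from the deadlock log and the conclusion above.
        graph.holds("trx 468429218", "wms_outbound_notice_header");
        graph.waitsFor("trx 468429218", "wms_wave_detail");
        graph.holds("trx 468429219", "wms_wave_detail"); // inferred, not printed in the log
        graph.waitsFor("trx 468429219", "wms_outbound_notice_header");
        System.out.println(graph.isDeadlocked("trx 468429219"));
    }
}
```

Removing either `holds` edge breaks the cycle, which is the idea behind the fix found later: make sure one side no longer holds its lock when the other side needs it.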

Code analysis

The exception stack trace tells us where in the code transaction 1 was executing:

@Override
@Transactional(rollbackFor = Exception.class)
public Long createWave(Xxx xxx) {  
	...
	waveDetailService.insertBatch(saveWaveDetailList);
	// the update below is where the transaction waits for the lock
	outboundNoticeHeaderService.updateBatchById(updateOutboundNoticeHeaderList);
	...
}

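Next, we look for the place that updates outboundNoticeHeader first and then goes on to update waveDetail.
The SQL that transaction 2 was executing points to the code that updates waveDetail: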

@Override
@Transactional(rollbackFor = Exception.class)
public void saveWaveErrorMsg(Xxx xxx) {
    ...
    waveDetailService.update(WaveDetailEntity.builder().active(BooleanStatus.FREEZE).build(), detailQueryWrapper);
    ...
    outboundNoticeHeaderService.update(
            OutboundNoticeHeaderEntity.builder()
                    .status(OutboundNoticeStatusEnum.CREATE.getType())
                    .waveErrorMsg(errorMsg)
                    .build(),
            outboundNoticeHeaderEntityQueryWrapper);
}

This looks fine: it also updates waveDetail first and then outboundNoticeHeader.
But a look at the calling code reveals the real problem:

@Transactional(rollbackFor = Exception.class)
public void execute(Xxx xxx){
	...
	for (OutboundNoticeHeaderDTO outboundNoticeHeaderDTO : outboundNoticeHeaderDTOS) {
		...
		waveHeaderService.saveWaveErrorMsg(errorMsg, Lists.newArrayList(outboundNoticeHeaderDTO));
		...
	}
	...
}

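The method [saveWaveErrorMsg] is called inside a for loop, and the method [execute] that contains the loop opens a transaction.
The first iteration's [saveWaveErrorMsg] acquires the lock on outboundNoticeHeader. Because the outer transaction has not committed yet, that lock is still held when the second iteration reaches [waveDetailService.update] and has to wait for the waveDetail lock.
That is exactly the reverse of the order in which [createWave] acquires its locks (waveDetail first, then outboundNoticeHeader), hence the deadlock.
The analysis showed that opening a transaction on [execute] was a mistake, so removing its @Transactional solved the problem.
A simplified diagram: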
[Figure 1: simplified diagram of the deadlock (image not available)]
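
The effect of the outer transaction can also be shown with a small simulation. It relies on one fact about InnoDB, that row locks are kept until the transaction commits or rolls back, and otherwise simplifies heavily: locks are plain strings, each iteration locks one waveDetail entry and one outboundNoticeHeader entry, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LockScopeSketch {

    /**
     * For each loop iteration, records which locks are already held at the
     * moment the iteration starts, i.e. just before its waveDetail update.
     *
     * @param outerTransaction true models @Transactional on execute (one
     *        transaction around the whole loop); false models one
     *        transaction per saveWaveErrorMsg call
     */
    public static List<Set<String>> heldAtIterationStart(int iterations, boolean outerTransaction) {
        List<Set<String>> snapshots = new ArrayList<>();
        Set<String> held = new LinkedHashSet<>();
        for (int i = 1; i <= iterations; i++) {
            snapshots.add(new LinkedHashSet<>(held));
            held.add("wave_detail#" + i);   // waveDetailService.update(...)
            held.add("notice_header#" + i); // outboundNoticeHeaderService.update(...)
            if (!outerTransaction) {
                held.clear(); // the inner transaction commits and releases its locks
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        System.out.println("with outer transaction:    " + heldAtIterationStart(2, true));
        System.out.println("without outer transaction: " + heldAtIterationStart(2, false));
    }
}
```

With the outer transaction, the second iteration starts while the header lock from the first iteration is still held, so the effective order becomes header then waveDetail. Without it, every iteration starts with no locks held and keeps the same waveDetail then header order as [createWave].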
