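Now that we have covered the basics of transactions and locks, we finally reach the main topic: deadlock analysis. The earlier chapters can be revisited through the links below.
I. Transaction basics
II. Lock basics
When something goes wrong in production and the application talks to the database through JDBC, the first thing we usually see is an error like the following in the application log: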
com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.Util.getInstance(Util.java:386)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1065)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4120)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4052)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2503)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2664)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2794)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2458)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2375)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2359)
    at com.trs.util.dbcp.impl.ImplPreparedStatement.executeUpdate(ImplPreparedStatement.java:226)
    ......
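The message itself says "try restarting transaction". On the application side this can be handled roughly as in the sketch below, which assumes that a deadlock surfaces as a java.sql.SQLException carrying MySQL error code 1213 (SQLSTATE 40001). The class name DeadlockRetry, the attempt limit and the pause length are hypothetical choices for illustration, not part of any driver API.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

final class DeadlockRetry {
    // MySQL reports a deadlock as error 1213 (ER_LOCK_DEADLOCK) with SQLSTATE 40001.
    static boolean isDeadlock(SQLException e) {
        return e.getErrorCode() == 1213 || "40001".equals(e.getSQLState());
    }

    // Runs the whole transaction again when it was rolled back as the deadlock victim.
    // Any other SQLException, or a deadlock on the last attempt, is rethrown.
    static <T> T runWithRetry(Callable<T> transaction, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return transaction.call();
            } catch (SQLException e) {
                if (!isDeadlock(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(50L * attempt); // short, growing pause before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated transaction: fails twice with a deadlock, then succeeds.
        String result = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new SQLException("Deadlock found when trying to get lock", "40001", 1213);
            }
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying only hides the symptom; the cause of the deadlock still has to be found in the database.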
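The stack trace helps us locate the code path that failed, but it says nothing about why the SQL deadlocked. For that we need the database's deadlock log, which shows which SQL statements were contending for the locks. We can obtain it by running the following in the database: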
show engine innodb status;
The output contains the deadlock log and looks like this:
=====================================
2021-08-30 21:31:43 0x7f39abf80700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 38 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 954091 srv_active, 0 srv_shutdown, 2508681 srv_idle
srv_master_thread log flush and writes: 3462771
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 216391537
OS WAIT ARRAY INFO: signal count 917101415
RW-shared spins 4424452927, rounds 24011275677, OS waits 203368606
RW-excl spins 168213427, rounds 576279420, OS waits 5793524
RW-sx spins 129855, rounds 1708261, OS waits 15698
Spin rounds per wait: 5.43 RW-shared, 3.43 RW-excl, 13.16 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2021-08-30 21:26:20 0x7f39ad52e700
*** (1) TRANSACTION:
TRANSACTION 672853416, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 1170656, OS thread handle 139885679015680, query id 1614382927 59.212.147.206 trs Searching rows for update
update WCMDocument set DOCStatus=10 where DocId=3019271 and DocChannel=70659 and not DocStatus=10 and DocStatus>0
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1048 page no 491022 n bits 792 index IX_WCMDOCUMENT_CHNL of table `trs_hycloud_iip`.`wcmdocument` trx id 672853416 lock_mode X waiting
Record lock, heap no 439 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80011403; asc ;;
 1: len 4; hex 8000000a; asc ;;
 2: len 4; hex 80000000; asc ;;
 3: len 4; hex 802e0c61; asc . a;;
*** (2) TRANSACTION:
TRANSACTION 672853406, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
3103 lock struct(s), heap size 335992, 9037 row lock(s), undo log entries 1
MySQL thread id 1170651, OS thread handle 139885697754880, query id 1614382738 59.212.147.206 trs Updating
update WCMDocument set DOCStatus=10 where DocId=3017705 and DocChannel=70659 and not DocStatus=10 and DocStatus>0
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1048 page no 491022 n bits 792 index IX_WCMDOCUMENT_CHNL of table `trs_hycloud_iip`.`wcmdocument` trx id 672853406 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1048 page no 491022 n bits 792 index IX_WCMDOCUMENT_CHNL of table `trs_hycloud_iip`.`wcmdocument` trx id 672853406 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 439 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80011403; asc ;;
 1: len 4; hex 8000000a; asc ;;
 2: len 4; hex 80000000; asc ;;
 3: len 4; hex 802e0c61; asc . a;;
*** WE ROLL BACK TRANSACTION (1)
......
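The log records when the deadlock occurred and which transactions caused it (only two transactions are shown, even when more were involved), together with the SQL statement each transaction was executing, the lock it was waiting for and the locks it held. Let's study this log and see whether it reveals the cause of the deadlock.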
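As a first pass over such a log, the transaction ids and the rolled-back victim can be pulled out mechanically. The sketch below assumes it is given only the LATEST DETECTED DEADLOCK section as a string and that the section uses the "TRANSACTION <id>, ACTIVE" and "WE ROLL BACK TRANSACTION (n)" wording seen above. DeadlockSummary and its method names are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeadlockSummary {
    private static final Pattern TRX = Pattern.compile("TRANSACTION (\\d+), ACTIVE");
    private static final Pattern VICTIM = Pattern.compile("WE ROLL BACK TRANSACTION \\((\\d+)\\)");

    // Transaction ids in the order they appear, i.e. (1), (2).
    static List<String> transactionIds(String section) {
        List<String> ids = new ArrayList<>();
        Matcher m = TRX.matcher(section);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    // Id of the transaction that was rolled back, or null if the section does not say.
    static String victimTransactionId(String section) {
        Matcher m = VICTIM.matcher(section);
        if (!m.find()) {
            return null;
        }
        int position = Integer.parseInt(m.group(1)); // 1-based position in the log
        List<String> ids = transactionIds(section);
        return position >= 1 && position <= ids.size() ? ids.get(position - 1) : null;
    }

    public static void main(String[] args) {
        // Abbreviated excerpt of the deadlock section shown above.
        String section =
            "*** (1) TRANSACTION:\n"
            + "TRANSACTION 672853416, ACTIVE 0 sec starting index read\n"
            + "*** (2) TRANSACTION:\n"
            + "TRANSACTION 672853406, ACTIVE 0 sec updating or deleting\n"
            + "*** WE ROLL BACK TRANSACTION (1)\n";
        System.out.println(transactionIds(section));      // [672853416, 672853406]
        System.out.println(victimTransactionId(section)); // 672853416
    }
}
```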
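Start with transaction 1: ACTIVE 0 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s)
ACTIVE 0 sec is how long the transaction has been active. starting index read is its current state: it is reading an index. tables in use 1 means one table is being used. LOCK WAIT means the transaction is waiting for a lock. 3 lock struct(s) means the transaction's lock list has length 3; each node in the list is one lock structure held by the transaction, which may be a table lock, a record lock, an autoinc lock and so on. heap size 1128 is the size of the lock heap memory allocated to the transaction. 2 row lock(s) is the number of row locks the transaction currently holds.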
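The fields described above can also be read out programmatically when many deadlock logs have to be compared. The following is a minimal sketch that assumes the header text uses exactly the wording shown in this log; TrxHeader and its field names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrxHeader {
    private static final Pattern ACTIVE = Pattern.compile("ACTIVE (\\d+) sec");
    private static final Pattern LOCKS = Pattern.compile(
        "(\\d+) lock struct\\(s\\), heap size (\\d+), (\\d+) row lock\\(s\\)");

    final int activeSeconds; // how long the transaction has been active
    final boolean lockWait;  // true if the header carries the LOCK WAIT marker
    final int lockStructs;   // length of the transaction's lock list
    final int heapSize;      // lock heap memory allocated to the transaction
    final int rowLocks;      // number of row locks held

    private TrxHeader(int activeSeconds, boolean lockWait, int lockStructs, int heapSize, int rowLocks) {
        this.activeSeconds = activeSeconds;
        this.lockWait = lockWait;
        this.lockStructs = lockStructs;
        this.heapSize = heapSize;
        this.rowLocks = rowLocks;
    }

    // Parses the header text of one transaction; line breaks inside it do not matter.
    static TrxHeader parse(String text) {
        Matcher active = ACTIVE.matcher(text);
        Matcher locks = LOCKS.matcher(text);
        if (!active.find() || !locks.find()) {
            throw new IllegalArgumentException("not a transaction header: " + text);
        }
        return new TrxHeader(
            Integer.parseInt(active.group(1)),
            text.contains("LOCK WAIT"),
            Integer.parseInt(locks.group(1)),
            Integer.parseInt(locks.group(2)),
            Integer.parseInt(locks.group(3)));
    }

    public static void main(String[] args) {
        TrxHeader h = parse("ACTIVE 0 sec starting index read mysql tables in use 1, locked 1 "
            + "LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s)");
        System.out.println(h.lockStructs + " lock structs, " + h.rowLocks + " row locks, waiting=" + h.lockWait);
    }
}
```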
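In part II, on database locks, we introduced four kinds of row lock. They appear in the deadlock log as follows:
Record lock (LOCK_REC_NOT_GAP): lock_mode X locks rec but not gap
Gap lock (LOCK_GAP): lock_mode X locks gap before rec
Next-key lock (LOCK_ORDINARY): lock_mode X
Insert intention lock (LOCK_INSERT_INTENTION): lock_mode X locks gap before rec insert intention
One thing to watch out for: seeing lock_mode X in the log does not always mean a next-key lock. There is one exception. When the lock is on the supremum record (the pseudo-record that marks the upper bound of a page), locks gap before rec is omitted, so a gap lock is shown as lock_mode X and an insert intention lock is shown as lock_mode X insert intention.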
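The mapping and its supremum exception can be written down as a small decision function. This is only a sketch of the rules as stated above; LockModeClassifier and the LockKind enum are hypothetical names, and the caller is assumed to know from the log whether the locked record is the supremum record.

```java
final class LockModeClassifier {
    enum LockKind { RECORD, GAP, NEXT_KEY, INSERT_INTENTION }

    // lockText is the lock description from the log, e.g. "lock_mode X locks rec but not gap".
    // onSupremum tells whether the locked record is the supremum record.
    static LockKind classify(String lockText, boolean onSupremum) {
        // Check insert intention first: its text also contains "locks gap before rec".
        if (lockText.contains("insert intention")) {
            return LockKind.INSERT_INTENTION;
        }
        if (lockText.contains("locks rec but not gap")) {
            return LockKind.RECORD;
        }
        if (lockText.contains("locks gap before rec")) {
            return LockKind.GAP;
        }
        // Bare "lock_mode X": a next-key lock, except on supremum, where the
        // "locks gap before rec" suffix is omitted and the lock is a gap lock.
        return onSupremum ? LockKind.GAP : LockKind.NEXT_KEY;
    }

    public static void main(String[] args) {
        System.out.println(classify("lock_mode X waiting", false));                                       // NEXT_KEY
        System.out.println(classify("lock_mode X", true));                                                // GAP
        System.out.println(classify("lock_mode X locks gap before rec insert intention waiting", false)); // INSERT_INTENTION
    }
}
```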
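From this we can see that transaction 1 is waiting for a next-key lock (lock_mode X on heap no 439). Transaction 2 holds a lock shown as lock_mode X (on the supremum record, so the caveat above applies) and is waiting for an insert intention lock on the gap before the same record, heap no 439, that transaction 1 is waiting on.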
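Once we have the deadlock log, we can look at the execution plan of the SQL statements involved in the database and analyse the specific problem from there.
Possible solutions
Option 1: access index records and tables in the same order wherever possible. When a program processes data in batches, sorting the data beforehand so that every thread handles records in a fixed order greatly reduces the chance of deadlock between threads that need locks on the same data.
Option 2: add appropriate indexes to the table. If a statement cannot use an index, it locks every row of the table, which greatly increases the chance of deadlock.
Option 3: avoid large transactions and split them into several small ones where possible. A large transaction holds more resources for longer, so it is more likely to conflict with other transactions.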
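Options 1 and 3 can be combined when preparing a batch: sort the keys so every thread touches rows in the same order, then cut the work into small slices that each run in their own short transaction. The sketch below shows only the planning step; OrderedBatches, the batch size and the id 3018000 are hypothetical, and the database work per slice is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

final class OrderedBatches {
    // Returns the ids deduplicated, in ascending order, split into slices of at most batchSize.
    static List<List<Long>> plan(Collection<Long> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Long> sorted = new ArrayList<>(new TreeSet<>(ids)); // fixed, global processing order
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < sorted.size(); start += batchSize) {
            int end = Math.min(start + batchSize, sorted.size());
            batches.add(new ArrayList<>(sorted.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Two DocIds from the log above plus one made-up id.
        List<List<Long>> batches = plan(Arrays.asList(3019271L, 3017705L, 3018000L), 2);
        System.out.println(batches); // [[3017705, 3018000], [3019271]]
        // Each inner list would then be updated and committed in its own short transaction.
    }
}
```

Smaller slices shorten the time locks are held, at the cost of giving up atomicity across the whole batch, so this only fits work that tolerates partial completion.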