实际场景代码如下:
10个线程同时执行该方法,死锁;
@Override
@Transactional(rollbackFor = Throwable.class)
public Integer save(T t) {
if (StringUtils.hasText(t.getId())) {
Integer updCnt = update(t);
if (updCnt <= 0) {
return insert(t);
} else {
return updCnt;
}
} else {
return insert(t);
}
}
错误信息:
"code":1004,"msg":"系统出现故障,请联系管理员进行排查!","tag":"\n### Error updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction\n### The error may exist in gyqx/spd/herp/purchase/mapper/McmsPurchaseMapper.java (best guess)\n### The error may involve gyqx.spd.herp.purchase.mapper.McmsPurchaseMapper.insert-Inline\n### The error occurred while setting parameters\n### SQL: INSERT INTO mcms_purchase ( id, order_no, bill_mode, rec_type, hos_id, branch_id, dept_id, dept_name, rec_org_id, rec_org_name, area_code, area_name, source_id, source_type, rec_user, rec_phone, rec_addr, prov_id, prov_name, sub_send_flag, exe_prov_id, exe_prov_name, bill_relation_json, ext_info, barcode_flag, pkg_flag, remark, priority, close_status, close_date, up_status, tb_status, del_flag, last_modified, version, create_time, create_user, last_modified_user ) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, now(), 0, now(), ?, ? )\n### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction\n; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction","data":null
sql如下:
START TRANSACTION;
UPDATE table_a SET ... WHERE id = x ;
IF(ROW_COUNT() = 0) THEN
INSERT INTO table_a id VALUES x;
END IF;
COMMIT;
其中id为主键。
平均一天有不到10次的死锁。
首先查看程序日志,发现死锁都只有新用户首次登录时才出现。也就是说,update时发现数据库中并没有相应的行,所以会进行接下来的插入操作,这时发生了死锁。
然后,查询了innodb的日志,这里贴出关键的部分。
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-02-02 23:35:03 7fe7f03ff700
*** (1) TRANSACTION:
TRANSACTION 72155984, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 1279838, OS thread handle 0x7fe803bff700, query id 988205122 172.16.0.123 acspassport update
INSERT INTO table_a id VALUES x
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 85 page no 3763 n bits 184 index `PRIMARY` of table `accountdb`.`last_login_openid` trx id 72155984 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
*** (2) TRANSACTION:
TRANSACTION 72155983, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 1248122, OS thread handle 0x7fe7f03ff700, query id 988205121 172.16.0.123 acspassport update
INSERT INTO table_a id VALUES x
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 85 page no 3763 n bits 184 index `PRIMARY` of table `accountdb`.`last_login_openid` trx id 72155983 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 85 page no 3763 n bits 184 index `PRIMARY` of table `accountdb`.`last_login_openid` trx id 72155983 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
*** WE ROLL BACK TRANSACTION (2)
从两个transaction的WAITING FOR THIS LOCK TO BE GRANTED
可以看出两个transaction都在insert申请insert intent X lock 时等待,从而导致死锁。
总结出精简后的导致死锁的过程如下,可以轻松的使用控制台复现。
transaction A | transaction B |
---|---|
begin | begin |
update row a失败 | |
update row b失败 | |
insert row a等待 | |
insert row b等待 | |
死锁发生 |
innodb不是行锁吗,为什么会发生死锁呢?
这里就涉及到innodb的锁机制了,innodb使用了Repeatable Read(RR)的隔离级别。在此级别下,innodb为了防止幻读(Phantom Rows),在实现上使用了gap lock。并且在search和scan的时候使用next-key lock(record lock + gap lock),但是对于在主键或唯一索引上进行查找的时候仅使用record lock。这里有一个例外,即当使用主键找不到的时候该记录的时候,则在该区间加gap lock。这次死锁的就是由这个例外情况引起的。
下面根据以上原理分析本次死锁的成因。
transaction A | 锁的情况 | transaction B | 锁的情况 |
---|---|---|---|
begin | begin | ||
update row a(a>x)失败 | 区间(x,正无穷)gap lock | ||
update row b(b>x)失败 | 区间(x,正无穷)gap lock(gap lock之间不互斥) | ||
insert row a等待 | insert intention lock等待(gap lock only stop *other* transactions from inserting to the gap) | ||
insert row b等待 | insert intention lock 等待 | ||
死锁发生 |
根据以上原理,包括
select ... where id=x lock in share mode/for update;
if(found_rows()=0)
insert into id values 1;
也可能导致死锁。
根据select结果判断,这里有极小的概率出现insert duplicate,因为每次多一次select,效率肯定不如原来的。如果还有问题可以试试使用insert ignore或者insert on duplicate key update。
SELECT 1 FROM table_a WHERE id=x;
IF(FOUND_ROWS() = 0) THEN
INSERT INTO table_a id VALUES x;
ELSE
UPDATE table_a SET ... WHERE id=x;
END IF;
以上。