Database deadlock when deleting data concurrently

1. Scenario:

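After the direct-connect hotel static information update JOB compares the data and pushes it to the resource system successfully, the old data used to be logically deleted (soft delete). As the number of hotels grew, the pressure on the database rose sharply, so we decided to physically delete the old data instead. Single-threaded execution works fine, but when updates for several hotels run concurrently, the database deadlocks.
MySQL version: 5.6.38 MySQL Community Server (GPL)
Transaction isolation level: the MySQL default, RR (Repeatable Read)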
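
Both values can be confirmed from any client session. A minimal check, assuming MySQL 5.6, where the isolation level is exposed through the `tx_isolation` system variable:

```sql
-- Server version string (5.6.38 was reported for the instance in this post)
SELECT VERSION();

-- Isolation level of the current session and the server-wide default
SELECT @@session.tx_isolation AS session_level,
       @@global.tx_isolation  AS global_level;
```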

A sample of the error message:

 
  
    ### Error updating database. Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
    ### The error may involve com.banma.ota.mapper.hotel.DescriptionDoMapper.deleteByHotelInfoId-Inline
    ### The error occurred while setting parameters
    ### SQL: delete from ota_policy_info where ota_policy_info.policy_id in (select p.id from ota_hotel_info h,ota_policy p where h.id=p.hotel_info_id and h.id=?)
    ### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction; SQL []; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction

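2. The logic for deleting table data is as follows:

There are two cases:

a. Look up the auto-increment id of the hotel_info table by hotel_code, then use that id to delete from the other tables.

b. The policy and policy_info tables, and likewise the address and address_zone tables, have a parent-child relationship. For these, first find the matching policy_id values in the policy table by hotel_info_id, then delete from policy_info by policy_id, and finally delete from policy.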

3. The SQL statements that raised the error are collected below

(None of the SQL below was written by me =,=)

 
  
    delete from ota_hotel_room_type where ota_hotel_room_type.hotel_info_id=(select id from ota_hotel_info where id=?)
    delete from ota_contact_info where ota_contact_info.hotel_info_id =(select id from ota_hotel_info where id=?)
    delete from ota_affiliation_info where ota_affiliation_info.hotel_info_id=(select id from ota_hotel_info where id=?)
    delete from ota_contact_info where ota_contact_info.hotel_info_id =(select id from ota_hotel_info where id=?)
    delete from ota_address_zone where ota_address_zone.address_id in (select a.id from ota_hotel_info h,ota_hotel_address a where h.id=a.hotel_info_id and h.id=?)
    delete from ota_policy_info where ota_policy_info.policy_id in (select p.id from ota_hotel_info h,ota_policy p where h.id=p.hotel_info_id and h.id=?)

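4. InnoDB's locking mechanism
(The following is based on material found online)
Locks in InnoDB are index-based: when a transaction locks data, it locks the clustered index and secondary index records that the data involves.

If a SELECT serves as a subquery feeding a write operation, it takes part in locking. Such a SELECT is a "current read": it reads the latest version of each record and locks every record it returns, so that other transactions cannot modify those records concurrently.

UPDATE and DELETE place exclusive locks on the records they touch while evaluating the WHERE condition; INSERT places an exclusive lock on the inserted row.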
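
For comparison, a locking read can also be requested explicitly. The sketch below reuses the `ota_policy` table and a hotel id from this post purely for illustration; it is not part of the job's actual code.

```sql
START TRANSACTION;

-- Plain SELECT: consistent (snapshot) read, sets no row locks under RR
SELECT id FROM ota_policy WHERE hotel_info_id = 3133;

-- Locking read that sets shared locks on the index records it scans
SELECT id FROM ota_policy WHERE hotel_info_id = 3133 LOCK IN SHARE MODE;

-- Locking read that sets exclusive locks, as an UPDATE of those rows would
SELECT id FROM ota_policy WHERE hotel_info_id = 3133 FOR UPDATE;

COMMIT;  -- all locks taken in the transaction are released here
```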

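5. SQL analysis

a. Index status of each table
On the day the old data was deleted concurrently on the test server, none of the tables involved in the test database ota_hotel_test had an index on the hotel_info_id column. The policy_info table had no index on policy_id either, and the same was true of address_id in the address_zone table. Each of these tables had only the primary key index on id.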
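
One way to verify which indexes exist is to query `information_schema`. This is a sketch: the schema and table names come from this post, and the list of tables is only the subset named in the failing statements.

```sql
-- Lists every indexed column per table; a table that only has the
-- primary key shows a single index named PRIMARY
SELECT table_name, index_name, seq_in_index, column_name
FROM information_schema.statistics
WHERE table_schema = 'ota_hotel_test'
  AND table_name IN ('ota_policy', 'ota_policy_info',
                     'ota_hotel_address', 'ota_address_zone')
ORDER BY table_name, index_name, seq_in_index;
```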

b. SQL execution plan
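
A plan for the failing statement can be obtained with EXPLAIN, which accepts DELETE statements as of MySQL 5.6.3. The hotel id below is the one that appears in the deadlock log; no output is shown here because it depends on the data and indexes present.

```sql
-- Check the "type" and "key" columns of each row: type = ALL with
-- key = NULL means the table is read with a full scan
EXPLAIN
DELETE FROM ota_policy_info
WHERE ota_policy_info.policy_id IN (
    SELECT p.id
    FROM ota_hotel_info h, ota_policy p
    WHERE h.id = p.hotel_info_id
      AND h.id = 3138
);
```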


c. Database deadlock log

 
  
    LATEST DETECTED DEADLOCK
    ------------------------
    2018-05-11 15:53:45 7f333d7d1700
    *** (1) TRANSACTION:
    TRANSACTION 1062077, ACTIVE 2 sec fetching rows
    mysql tables in use 3, locked 3
    LOCK WAIT 144 lock struct(s), heap size 30248, 12350 row lock(s), undo log entries 6144
    MySQL thread id 3677, OS thread handle 0x7f333d74f700, query id 5017870 172.16.41.239 root Sending data
    delete from ota_policy_info where ota_policy_info.policy_id in (select p.id from ota_hotel_info h,ota_policy p where h.id=p.hotel_info_id and h.id=3138)
    *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 723 page no 97 n bits 152 index `PRIMARY` of table `ota_hotel_test`.`ota_policy_info` trx id 1062077 lock_mode X waiting
    Record lock, heap no 70 PHYSICAL RECORD: n_fields 8; compact format; info bits 0
    0: len 4; hex 8000586a; asc Xj;;
    1: len 6; hex 0000001034bc; asc 4 ;;
    2: len 7; hex a200000b623082; asc b0 ;;
    3: len 4; hex 80000ca3; asc ;;
    4: len 30; hex e585a5e4bd8fe697b6e997b4efbc9a31353a3030e4bba5e5908e20202020; asc 15:00 ; (total 58 bytes);
    5: len 19; hex 4172726976616c416e64446570617274757265; asc ArrivalAndDeparture;;
    6: len 5; hex 999fd6fd6d; asc m;;
    7: len 5; hex 999fd6fd6d; asc m;;
    *** (2) TRANSACTION:
    TRANSACTION 1062076, ACTIVE 2 sec starting index read
    mysql tables in use 3, locked 3
    70 lock struct(s), heap size 13864, 8718 row lock(s), undo log entries 8726
    MySQL thread id 3673, OS thread handle 0x7f333d7d1700, query id 5017876 172.16.41.239 root updating
    delete from ota_policy_info where ota_policy_info.policy_id in (select p.id from ota_hotel_info h,ota_policy p where h.id=p.hotel_info_id and h.id=3133)
    *** (2) HOLDS THE LOCK(S):
    RECORD LOCKS space id 723 page no 97 n bits 152 index `PRIMARY` of table `ota_hotel_test`.`ota_policy_info` trx id 1062076 lock_mode X locks rec but not gap
    Record lock, heap no 70 PHYSICAL RECORD: n_fields 8; compact format; info bits 0
    0: len 4; hex 8000586a; asc Xj;;
    1: len 6; hex 0000001034bc; asc 4 ;;
    2: len 7; hex a200000b623082; asc b0 ;;
    3: len 4; hex 80000ca3; asc ;;
    4: len 30; hex e585a5e4bd8fe697b6e997b4efbc9a31353a3030e4bba5e5908e20202020; asc 15:00 ; (total 58 bytes);
    5: len 19; hex 4172726976616c416e64446570617274757265; asc ArrivalAndDeparture;;
    6: len 5; hex 999fd6fd6d; asc m;;
    7: len 5; hex 999fd6fd6d; asc m;;
    *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 723 page no 4 n bits 168 index `PRIMARY` of table `ota_hotel_test`.`ota_policy_info` trx id 1062076 lock_mode X waiting
    Record lock, heap no 23 PHYSICAL RECORD: n_fields 8; compact format; info bits 0
    0: len 4; hex 80000040; asc @;;
    1: len 6; hex 0000000df6ae; asc ;;
    2: len 7; hex b700000b6709f4; asc g ;;
    3: len 4; hex 8000000a; asc ;;
    4: len 30; hex e585a5e4bd8fe697b6e997b4efbc9a31343a3030e4bba5e5908e20202020; asc 14:00 ; (total 58 bytes);
    5: len 19; hex 4172726976616c416e64446570617274757265; asc ArrivalAndDeparture;;
    6: len 5; hex 999fa12cb9; asc , ;;
    7: len 5; hex 999fa12cb9; asc , ;;
    *** WE ROLL BACK TRANSACTION (1)
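
The post does not say how this log was captured. The usual way to get a "LATEST DETECTED DEADLOCK" section is the InnoDB monitor output, so the sketch below assumes that route.

```sql
-- Prints the InnoDB monitor output; the most recent deadlock is listed
-- under the "LATEST DETECTED DEADLOCK" heading
SHOW ENGINE INNODB STATUS;

-- Optional: also write every deadlock to the MySQL error log, so earlier
-- ones are not overwritten by the latest (requires the SUPER privilege)
SET GLOBAL innodb_print_all_deadlocks = ON;
```

In the log above, transaction (2) holds an exclusive record lock on heap no 70 of page 97 in the PRIMARY index of ota_policy_info and waits for heap no 23 of page 4. Transaction (1) waits for the record that (2) holds, and it is the one rolled back.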

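6. Cause of the deadlock (initial guess)

Because the relevant columns have no index, the SQL filters rows through a full scan of the clustered index. Even if only a few records match the delete condition, every record in the clustered index gets an exclusive lock, and the gap before each record gets a GAP lock.

MySQL optimizes this case: with semi-consistent read enabled, the storage engine locks and returns every record, the MySQL Server layer then filters them, and the locks on records that do not match the condition are released early, right after the check. Non-matching records therefore go through a lock followed by an unlock.

Conditions that trigger semi-consistent read: the isolation level is RC, or it is RR with the innodb_locks_unsafe_for_binlog parameter enabled. According to the MySQL documentation, semi-consistent read itself applies to UPDATE statements; for DELETE, the related behavior is the early release of locks on non-matching rows, which depends on the same conditions. I am still reading up on semi-consistent read.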
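
Whether those conditions hold on a given server can be checked as follows. This is a sketch assuming MySQL 5.6, where `innodb_locks_unsafe_for_binlog` is a startup option that cannot be changed with SET at runtime.

```sql
-- OFF (the default) together with REPEATABLE-READ means locks on
-- non-matching rows are kept until the transaction ends
SHOW VARIABLES LIKE 'innodb_locks_unsafe_for_binlog';
SELECT @@session.tx_isolation;

-- For an experiment, one session can be switched to READ COMMITTED;
-- this affects transactions started afterwards in this session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```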

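7. Solution

1. The measures taken that evening: add an index on the relevant column of every table, remove the redundant subqueries, and split the deletion for the policy_info and address_zone tables into two steps. That is, first look up the policy_id values by hotel_info_id, then delete from policy_info by policy_id in a separate session. After the SQL was improved, several concurrent runs locally produced no deadlock errors.

2. Today I tried to reproduce the deadlock by changing one variable at a time. a. Removing the indexes alone did not produce a deadlock. b. After reverting the two SQL statements below to the old version, deleting old data for 14 hotels concurrently produced the deadlock error.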

 
  
    delete from ota_address_zone where ota_address_zone.address_id in (select a.id from ota_hotel_info h,ota_hotel_address a where h.id=a.hotel_info_id and h.id=?)
    delete from ota_policy_info where ota_policy_info.policy_id in (select p.id from ota_hotel_info h,ota_policy p where h.id=p.hotel_info_id and h.id=?)
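
The measures in item 1 can be sketched roughly as below. The post does not show the actual statements, so the index names, the `@hotel_info_id` variable, and the literal policy ids are all hypothetical; only the table and column names come from the post.

```sql
-- Indexes on the lookup columns (index names are hypothetical)
ALTER TABLE ota_policy        ADD INDEX idx_hotel_info_id (hotel_info_id);
ALTER TABLE ota_hotel_address ADD INDEX idx_hotel_info_id (hotel_info_id);
ALTER TABLE ota_policy_info   ADD INDEX idx_policy_id (policy_id);
ALTER TABLE ota_address_zone  ADD INDEX idx_address_id (address_id);

SET @hotel_info_id = 3133;  -- illustrative value, bound by the job in practice

-- Redundant subquery removed: filter on hotel_info_id directly
DELETE FROM ota_contact_info WHERE hotel_info_id = @hotel_info_id;

-- Step 1: look up the parent ids with a plain, non-locking SELECT
SELECT id FROM ota_policy WHERE hotel_info_id = @hotel_info_id;

-- Step 2: delete the child rows by the ids returned in step 1
-- (101 and 102 are placeholders for those ids)
DELETE FROM ota_policy_info WHERE policy_id IN (101, 102);

-- Step 3: delete the parent rows
DELETE FROM ota_policy WHERE hotel_info_id = @hotel_info_id;
```

The address and address_zone pair would follow the same three steps, using `address_id` in place of `policy_id`.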
