【mysql并行批量删除死锁排查】

文章目录

    • 背景
    • 表单和索引结构
    • 原因分析
    • 解决方案

背景

mysql批量删除并插入新数据的场景下,为提高执行效率,使用了多线程并发执行的方式。当然mysql建表时使用了分区(partition)机制,聚焦到我们这次讨论的问题,分区(partition)以及跟案例无关的内容暂且不提。

测试环境并发量不高,简单验证执行ok,预发环境并发量比较高,逻辑验证时,出现了以下错误:


2023-08-03 15:15:17.924 32098 ERROR [] --- [thread-6] druid.sql.Statement                     :149 : {conn-210049, pstmt-220068} execute error. DELETE FROM `tablename_202308`
        WHERE `date` = 20230803 AND `app_code` = 'APPCODE' AND `referer` = ?
         AND `target_type` IN (  1
         , 6
         , 8
         , 9
         , 12
         , 5
         )

        LIMIT 5000;

com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:123)
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
        at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
        at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)
        at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:370)
        at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3461)

mysql死锁了:Deadlock found,通过查看mysql死锁日志“命令: show engine innodb status;”,得到以下内容:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-08-03 15:15:17 140057117718272
*** (1) TRANSACTION:
TRANSACTION 701477003, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 171 lock struct(s), heap size 24784, 10001 row lock(s), undo log entries 5000
MySQL thread id 1136049, OS thread handle 140057034860288, query id 753710384 172.30.184.173 zykj updating
DELETE FROM `tablename_202308`
        WHERE `date` = 20230803 AND `app_code` = 'APPCODE' AND `referer` = 'happy'
         AND `target_type` IN (  1
         , 6
         , 8
         , 9
         , 12
         , 5
         ) 

        LIMIT 5000

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 234584 page no 6144 n bits 352 index idx_ad_network of table `pre`.`tablename_202308` /* Partition `p14` */ trx id 701477003 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
 0: len 4; hex 0134b293; asc  4  ;;
 1: len 17; hex 3130325f61645f726571756573745f7076; asc 102_ad_request_pv;;
 2: len 1; hex 06; asc  ;;
 3: len 8; hex 176969e363f9f00f; asc  ii c   ;;
 4: len 12; hex 636f6d2e73732e6861707079; asc happy;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 234584 page no 2156 n bits 400 index idx_ad_network of table `pre`.`tablename_202308` /* Partition `p14` */ trx id 701477003 lock_mode X locks rec but not gap waiting
Record lock, heap no 97 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
 0: len 4; hex 0134b293; asc  4  ;;
 1: len 17; hex 3130325f61645f726571756573745f7076; asc 102_ad_request_pv;;
 2: len 1; hex 05; asc  ;;
 3: len 8; hex 176969e3c3f9f003; asc  ii     ;;
 4: len 14; hex 636f6d2e64642e72656164696e67; asc com.dd.reading;;


*** (2) TRANSACTION:
TRANSACTION 701477006, ACTIVE 1 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 64 lock struct(s), heap size 8400, 581 row lock(s), undo log entries 290
MySQL thread id 1136048, OS thread handle 140071716505344, query id 753710360 172.30.184.173 zykj updating
DELETE FROM `tablename_202308`
        WHERE `date` = 20230803 AND `app_code` = 'APPCODE' AND `referer` = 'com.dd.reading'
         AND `target_type` IN (  1
         , 6
         , 8
         , 9
         , 12
         , 5
         ) 

        LIMIT 5000

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 234584 page no 2156 n bits 400 index idx_ad_network of table `pre`.`tablename_202308` /* Partition `p14` */ trx id 701477006 lock_mode X locks rec but not gap
Record lock, heap no 97 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
 0: len 4; hex 0134b293; asc  4  ;;
 1: len 17; hex 3130325f61645f726571756573745f7076; asc 102_ad_request_pv;;
 2: len 1; hex 05; asc  ;;
 3: len 8; hex 176969e3c3f9f003; asc  ii     ;;
 4: len 14; hex 636f6d2e64642e72656164696e67; asc com.dd.reading;;
 
 .... 忽略部分日志
 Record lock, heap no 333 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
 0: len 4; hex 0134b293; asc  4  ;;
 1: len 17; hex 3130325f61645f726571756573745f7076; asc 102_ad_request_pv;;
 2: len 1; hex 05; asc  ;;
 3: len 8; hex 176969e3c3f9f41d; asc  ii     ;;
 4: len 14; hex 636f6d2e64642e72656164696e67; asc com.dd.reading;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 234584 page no 6144 n bits 352 index idx_ad_network of table `pre`.`tablename_202308` /* Partition `p14` */ trx id 701477006 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
 0: len 4; hex 0134b293; asc  4  ;;
 1: len 17; hex 3130325f61645f726571756573745f7076; asc 102_ad_request_pv;;
 2: len 1; hex 06; asc  ;;
 3: len 8; hex 176969e363f9f00f; asc  ii c   ;;
 4: len 12; hex 636f6d2e73732e6861707079; asc happy;;

*** WE ROLL BACK TRANSACTION (2)

表单和索引结构


CREATE TABLE `tablename_202308` (
  `ad_rev_record_id` bigint unsigned NOT NULL COMMENT '主键',
  `app_code` varchar(100) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT '',
  `target_id` varchar(100) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT '指标id',
  `target_type` tinyint unsigned NOT NULL DEFAULT '0' COMMENT '指标类型',
  `ad_rev_target_id` bigint unsigned NOT NULL COMMENT 'xx主键',
  `this_qid` varchar(200) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT '渠道',
  `this_lid` varchar(200) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT '',
  `brand` varchar(50) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT '品牌',
  `ad_network` tinyint unsigned NOT NULL DEFAULT '0' COMMENT '广告网络:',
  `referer` varchar(100) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT '',
  `app_ad_id` varchar(200) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT '()',
  `adcdn_ad_id` varchar(200) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT '',
  `date` int unsigned NOT NULL DEFAULT '0' COMMENT '日期,格式:yyyyMMdd',
  `value` decimal(14,6) NOT NULL DEFAULT '0.000000' COMMENT '结果',
  `create_by` bigint unsigned NOT NULL DEFAULT '0' COMMENT '创建者',
  `update_by` bigint unsigned NOT NULL DEFAULT '0' COMMENT '更新者',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
  PRIMARY KEY (`ad_rev_record_id`,`referer`),
  KEY `idx_date_type` (`date`,`target_type`),
  KEY `idx_ad_rev_target_id` (`ad_rev_target_id`),
  KEY `idx_ad_network` (`date`,`target_id`,`ad_network`),
  KEY `idx_qid` (`date`,`target_id`,`this_qid`,`this_lid`),
  KEY `idx_ad` (`date`,`target_id`,`app_ad_id`,`ad_network`,`adcdn_ad_id`),
  KEY `idx_ad_qid` (`date`,`target_id`,`app_ad_id`,`this_qid`,`ad_network`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci COMMENT='精简结果表'
/*!50100 PARTITION BY KEY (referer)
PARTITIONS 20 */

原因分析

从mysql的死锁日志可以看出mysql的删除流程中,会涉及到索引的锁定,索引的删除;
mysql的锁是加在如下位置:

RECORD LOCKS space id 234584 page no 6144 n bits 352 index idx_ad_network of table `pre`.`tablename_202308` /* Partition `p14` */ trx id 701477003 lock_mode X locks rec but not gap

space id 234584 page no 6144 n bits 352 index,
因为我们是事务操作(先删除,后插入),在事务提交前,锁会一直存在。

在我们的case中,两个事务a和b分别持有记录锁A(space id 234584 page no 6144 n bits 352 )和锁B(space id 234584 page no 2156 n bits 400),而又分别去请求锁B(space id 234584 page no 2156 n bits 400)和锁A(space id 234584 page no 6144 n bits 352 ),导致了死锁问题的产生。

重点看下索引:KEY idx_ad_network (date,target_id,ad_network),和删除语句

WHERE `date` = 20230803 AND `app_code` = 'APPCODE' AND `referer` = ?
         AND `target_type` IN (  1
         , 6
         , 8
         , 9
         , 12
         , 5
         )

该索引(idx_ad_network)不能区分出target_type,所以mysql在删除时选择了锁定所有索引页,而多个线程删除时有顺序的差异,最终导致死锁的产生了。

解决方案

在索引中添加target_type字段,那么mysql在删除锁定分页时,可以根据target_type进行区分,不必锁定所有索引页,来避免死锁产生。
如下更改索引后,多线程并发删除不再有死锁的异常(实际场景时app_code固定值,所有没有添加到索引中。)

CREATE TABLE `tablename_202308` (
  .... 忽略表字段信息
  PRIMARY KEY (`ad_rev_record_id`,`referer`),
  KEY `idx_ad_rev_target_id` (`date`,`referer`,`target_type`,`ad_rev_target_id`),
  KEY `idx_ad_network` (`date`,`referer`,`target_type`,`target_id`,`ad_network`),
  KEY `idx_qid` (`date`,`referer`,`target_type`,`target_id`,`this_qid`,`this_lid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci COMMENT='精简结果表'
/*!50100 PARTITION BY KEY (referer)
PARTITIONS 30 */

你可能感兴趣的:(mysql,数据库)