(13) Large numbers of DB deadlocks in production

1. Deadlock log

2018-10-23T07:16:23.919555+08:00 478808 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.

2018-10-23T07:16:23.919573+08:00 478808 [Note] InnoDB: 

*** (1) TRANSACTION:

TRANSACTION 638350242, ACTIVE 0 sec starting index read

mysql tables in use 1, locked 1

LOCK WAIT 18 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2

MySQL thread id 482727, OS thread handle 139949379430144, query id 1038382163 10.205.72.161 ucp updating

update ccp_order_info_1 set bill_push_status = 1 where partner_id = '80640511' and  order_code in

     (  'UCP181023071619030500' , 'UCP181023071613030491', 'UCP181023071604030490', 'UCP181023071558030489' )

2018-10-23T07:16:23.919607+08:00 478808 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 383 page no 9655 n bits 104 index PRIMARY of table `ccp`.`ccp_order_info_1` /* Partition `p201810` */ trx id 638350242 lock_mode X locks rec but not gap waiting

Record lock, heap no 32 PHYSICAL RECORD: n_fields 86; compact format; info bits 0

 0: len 8; hex fd67828666135000; asc  g  f P ;;

 1: len 4; hex 5bce5a3d; asc [ Z=;;

 2: len 6; hex 0000260c73a3; asc   & s ;;

 3: len 7; hex 28000000230937; asc (   # 7;;

 4: len 21; hex 554350313831303233303731363133303330343931; asc UCP181023071613030491;;

  ... // remaining 81 fields omitted

2018-10-23T07:16:23.921701+08:00 478808 [Note] InnoDB: *** (2) TRANSACTION:

TRANSACTION 638350243, ACTIVE 0 sec fetching rows

mysql tables in use 1, locked 1

14 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1

MySQL thread id 478808, OS thread handle 139949392209664, query id 1038382165 10.205.72.157 ucp updating

UPDATE ccp_order_info_1 SET order_Status=1,error_Flag='1',push_time='2018-10-23 07:16:23.91'  WHERE partner_id = '80640511' And ORDER_ID =         

9036334691091959808

2018-10-23T07:16:23.921748+08:00 478808 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 383 page no 9655 n bits 112 index PRIMARY of table `ccp`.`ccp_order_info_1` /* Partition `p201810` */ trx id 638350243 lock_mode X locks rec but not gap

Record lock, heap no 32 PHYSICAL RECORD: n_fields 86; compact format; info bits 0

 0: len 8; hex fd67828666135000; asc  g  f P ;;

 1: len 4; hex 5bce5a3d; asc [ Z=;;

 2: len 6; hex 0000260c73a3; asc   & s ;;

 3: len 7; hex 28000000230937; asc (   # 7;;

 4: len 21; hex 554350313831303233303731363133303330343931; asc UCP181023071613030491;;

 ... // remaining 81 fields omitted

2018-10-23T07:16:23.923431+08:00 478808 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 383 page no 9655 n bits 104 index PRIMARY of table `ccp`.`ccp_order_info_1` /* Partition `p201810` */ trx id 638350243 lock_mode X locks rec but not gap waiting

Record lock, heap no 4 PHYSICAL RECORD: n_fields 86; compact format; info bits 0

 0: len 8; hex fd67a62566135000; asc  g %f P ;;

 1: len 4; hex 5bce5a34; asc [ Z4;;

 2: len 6; hex 0000260c73a2; asc   & s ;;

 3: len 7; hex 27000040030785; asc '  @   ;;

 4: len 21; hex 554350313831303233303731363034303330343930; asc UCP181023071604030490;;

 ... // remaining 81 fields omitted

2018-10-23T07:16:23.925041+08:00 478808 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)

The above is the log of a deadlock between two transactions. The key lines are analyzed below:

1. *** (1) TRANSACTION: the information for the first transaction starts here.

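2. LOCK WAIT 18 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2

LOCK WAIT means the transaction is currently waiting for a lock. It holds 18 lock structures (in a 1136-byte lock heap) and 6 row locks, and has already written 2 undo log entries.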

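3. MySQL thread id 482727, OS thread handle 139949379430144, query id 1038382163 10.205.72.161 ucp updating

The MySQL thread ID is 482727, the OS thread handle is 139949379430144 and the query ID is 1038382163; the client host is 10.205.72.161, the user is ucp, and the thread state is updating.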
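Since production logged many deadlocks, it can help to extract these header values in bulk instead of reading each log by eye. The following is a minimal Python sketch, assuming the exact text format quoted in this log (MySQL 5.7 style); `parse_trx_header` is a hypothetical helper, not part of any MySQL tool, and the host/user/state split is assumed from this log's layout.

```python
import re
from typing import Dict, Union

# Patterns for the two per-transaction header lines quoted in the log above
# (assumed MySQL 5.7 text format; other versions may differ).
LOCKS_RE = re.compile(
    r"(?P<lock_wait>LOCK WAIT )?(?P<lock_structs>\d+) lock struct\(s\), "
    r"heap size (?P<heap_size>\d+), (?P<row_locks>\d+) row lock\(s\)"
    r"(?:, undo log entries (?P<undo_entries>\d+))?"
)
THREAD_RE = re.compile(
    r"MySQL thread id (?P<thread_id>\d+), OS thread handle (?P<os_thread>\d+), "
    r"query id (?P<query_id>\d+) (?P<host>\S+) (?P<user>\S+) (?P<state>.+)"
)

def parse_trx_header(locks_line: str, thread_line: str) -> Dict[str, Union[int, str, bool]]:
    """Hypothetical helper: extract lock counters and thread info for one transaction."""
    m1 = LOCKS_RE.search(locks_line)
    m2 = THREAD_RE.search(thread_line)
    if m1 is None or m2 is None:
        raise ValueError("unrecognized transaction header")
    info: Dict[str, Union[int, str, bool]] = {
        "lock_wait": m1.group("lock_wait") is not None,  # True if the trx is blocked
        "lock_structs": int(m1.group("lock_structs")),
        "heap_size": int(m1.group("heap_size")),
        "row_locks": int(m1.group("row_locks")),
        "undo_entries": int(m1.group("undo_entries") or 0),
    }
    for key in ("thread_id", "os_thread", "query_id"):
        info[key] = int(m2.group(key))
    for key in ("host", "user", "state"):
        info[key] = m2.group(key)
    return info

trx1 = parse_trx_header(
    "LOCK WAIT 18 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2",
    "MySQL thread id 482727, OS thread handle 139949379430144, "
    "query id 1038382163 10.205.72.161 ucp updating",
)
trx2 = parse_trx_header(
    "14 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1",
    "MySQL thread id 478808, OS thread handle 139949392209664, "
    "query id 1038382165 10.205.72.157 ucp updating",
)
```

Comparing the two dictionaries side by side shows at a glance which transaction is blocked (only transaction 1 has LOCK WAIT) and how much work each has done.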

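4. update ccp_order_info_1 set bill_push_status = 1 where partner_id = '80640511' and  order_code in  (  'UCP181023071619030500' , 'UCP181023071613030491', 'UCP181023071604030490', 'UCP181023071558030489' )

The SQL statement the transaction is currently executing. As the following lines show, it is blocked because it needs a lock it cannot get.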

5. 2018-10-23T07:16:23.919607+08:00 478808 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:

The lock the statement is requesting, which cannot be granted immediately.

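6. RECORD LOCKS space id 383 page no 9655 n bits 104 index PRIMARY of table `ccp`.`ccp_order_info_1` /* Partition `p201810` */ trx id 638350242 lock_mode X locks rec but not gap waiting

The requested row lock: tablespace ID 383, page 9655, on the PRIMARY (clustered) index of table `ccp`.`ccp_order_info_1`, partition p201810, requested by transaction 638350242. "n bits 104" is the size of the lock bitmap for that page (one bit per record heap number), not a byte offset. The lock type "lock_mode X locks rec but not gap waiting" means an exclusive lock on the record only (not a gap lock) that is still waiting to be granted.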
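The fields of a RECORD LOCKS line can be extracted the same way. A minimal sketch, assuming the line format shown in this log; `parse_record_lock` is a hypothetical helper, not a MySQL API.

```python
import re
from typing import Dict, Optional, Union

# One "RECORD LOCKS ..." line, in the format quoted in this log (assumed MySQL 5.7).
RECORD_LOCK_RE = re.compile(
    r"RECORD LOCKS space id (?P<space_id>\d+) page no (?P<page_no>\d+) "
    r"n bits (?P<n_bits>\d+) index (?P<index>\S+) of table (?P<table>\S+)"
    r"(?: /\* Partition `(?P<partition>[^`]+)` \*/)? "
    r"trx id (?P<trx_id>\d+) lock_mode (?P<mode>.+?)(?P<waiting> waiting)?$"
)

def parse_record_lock(line: str) -> Optional[Dict[str, Union[int, str, bool, None]]]:
    """Hypothetical helper: split a RECORD LOCKS line into its fields (None if no match)."""
    m = RECORD_LOCK_RE.search(line.strip())
    if m is None:
        return None
    return {
        "space_id": int(m.group("space_id")),
        "page_no": int(m.group("page_no")),
        "n_bits": int(m.group("n_bits")),   # size of the lock bitmap, not an offset
        "index": m.group("index"),
        "table": m.group("table"),
        "partition": m.group("partition"),  # None for non-partitioned tables
        "trx_id": int(m.group("trx_id")),
        "mode": m.group("mode"),
        "waiting": m.group("waiting") is not None,
    }

wanted = parse_record_lock(
    "RECORD LOCKS space id 383 page no 9655 n bits 104 index PRIMARY of table "
    "`ccp`.`ccp_order_info_1` /* Partition `p201810` */ trx id 638350242 "
    "lock_mode X locks rec but not gap waiting"
)
held = parse_record_lock(
    "RECORD LOCKS space id 383 page no 9655 n bits 112 index PRIMARY of table "
    "`ccp`.`ccp_order_info_1` /* Partition `p201810` */ trx id 638350243 "
    "lock_mode X locks rec but not gap"
)
```

Parsing transaction 1's waiting lock and transaction 2's held lock this way makes it easy to see that both are on the same page (space 383, page 9655) of the same PRIMARY index, with the same lock mode.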

7. Record lock, heap no 32 PHYSICAL RECORD: n_fields 86; compact format; info bits 0

 0: len 8; hex fd67828666135000; asc  g  f P ;;

 1: len 4; hex 5bce5a3d; asc [ Z=;;

 2: len 6; hex 0000260c73a3; asc   & s ;;

 3: len 7; hex 28000000230937; asc (   # 7;;

 4: len 21; hex 554350313831303233303731363133303330343931; asc UCP181023071613030491;;

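 ... // remaining 81 fields omitted

The physical record (the actual row data) that the requested lock covers. Field 4 shows that transaction 1 is requesting the row lock on the row with order_code = 'UCP181023071613030491'.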
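The other hex fields can be decoded too. The field lengths (8, 4, 6 and 7 bytes for fields 0 to 3) suggest that the primary key has two columns, an 8-byte signed BIGINT (assumed here to be order_id) and a 4-byte column assumed to be a TIMESTAMP, followed by InnoDB's hidden 6-byte DB_TRX_ID and 7-byte DB_ROLL_PTR columns. A minimal sketch, assuming InnoDB's on-disk encoding (big-endian, with the sign bit flipped for signed integers). The meanings of fields 0 and 1 are our assumption; the log does not name them.

```python
from datetime import datetime, timezone

def decode_signed_int(hex_str: str) -> int:
    """InnoDB stores signed integers big-endian with the sign bit flipped."""
    raw = bytes.fromhex(hex_str)
    bits = 8 * len(raw)
    value = int.from_bytes(raw, "big") ^ (1 << (bits - 1))  # undo the sign-bit flip
    if value >= 1 << (bits - 1):                          # reinterpret as two's complement
        value -= 1 << bits
    return value

def decode_timestamp(hex_str: str) -> datetime:
    """Assumed 4-byte TIMESTAMP: unsigned big-endian seconds since the Unix epoch (UTC)."""
    seconds = int.from_bytes(bytes.fromhex(hex_str), "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def decode_ascii(hex_str: str) -> str:
    return bytes.fromhex(hex_str).decode("ascii")

# Fields of the record at heap no 32 (the row transaction 1 is waiting for)
order_id = decode_signed_int("fd67828666135000")  # field 0 -> 9036334691091959808
created = decode_timestamp("5bce5a3d")            # field 1 -> 2018-10-22 23:16:13 UTC
order_code = decode_ascii("554350313831303233303731363133303330343931")  # field 4
```

Under these assumptions, field 0 decodes to 9036334691091959808, the same ORDER_ID used in transaction 2's UPDATE, and field 1 decodes to 07:16:13 at +08:00, consistent with the time embedded in the order code UCP181023071613030491.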

8. 2018-10-23T07:16:23.921701+08:00 478808 [Note] InnoDB: *** (2) TRANSACTION:

Next, the information for the second transaction. From the log above, transaction 2 is executing UPDATE ccp_order_info_1 SET order_Status=1,error_Flag='1',push_time='2018-10-23 07:16:23.91' WHERE partner_id = '80640511' And ORDER_ID = 9036334691091959808, which updates a single row by order_id.

9. 2018-10-23T07:16:23.921748+08:00 478808 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

The locks transaction 2 already holds:

RECORD LOCKS space id 383 page no 9655 n bits 112 index PRIMARY of table `ccp`.`ccp_order_info_1` /* Partition `p201810` */ trx id 638350243 lock_mode X locks rec but not gap

The lock transaction 2 holds is exactly the one transaction 1 is waiting for: the PRIMARY index record (heap no 32) of the row with order_code = 'UCP181023071613030491'.

10. 2018-10-23T07:16:23.923431+08:00 478808 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

The lock transaction 2 is requesting.

Transaction 2 is requesting the PRIMARY index record of row UCP181023071604030490. Because transaction 1 already holds that lock, MySQL immediately detects a deadlock. InnoDB resolves it by rolling back one of the transactions; the log line WE ROLL BACK TRANSACTION (2) shows that it chose transaction 2.
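The MySQL manual says InnoDB tries to roll back small transactions, where size is measured by the number of rows inserted, updated or deleted. Below is a simplified Python model of that rule, using the counters from this log. Breaking ties on lock structs is our own assumption, and InnoDB's real victim selection involves more than this.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrxSummary:
    label: str
    undo_entries: int  # "undo log entries N": roughly the rows modified so far
    lock_structs: int  # "N lock struct(s)"

def pick_victim(cycle: List[TrxSummary]) -> TrxSummary:
    """Simplified model: roll back the 'smallest' transaction in the deadlock cycle.

    Primary criterion: rows modified (undo entries); tie-breaker (an assumption): lock structs.
    """
    return min(cycle, key=lambda t: (t.undo_entries, t.lock_structs))

trx1 = TrxSummary("(1)", undo_entries=2, lock_structs=18)
trx2 = TrxSummary("(2)", undo_entries=1, lock_structs=14)
victim = pick_victim([trx1, trx2])  # transaction 2: 1 undo entry versus 2
```

This matches the WE ROLL BACK TRANSACTION (2) line in the log.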

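Why does transaction 2 request the row lock on UCP181023071604030490? And where does the log show that transaction 1 already holds the row lock on UCP181023071604030490?

The deadlock log does not print the row locks transaction 1 currently holds, so we can only infer them from the following lines: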

LOCK WAIT 18 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2

MySQL thread id 482727, OS thread handle 139949379430144, query id 1038382163 10.205.72.161 ucp updating

update ccp_order_info_1 set bill_push_status = 1 where partner_id = '80640511' and  order_code in

     (  'UCP181023071619030500' , 'UCP181023071613030491', 'UCP181023071604030490', 'UCP181023071558030489' )

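"undo log entries 2" means transaction 1 has created 2 undo entries, presumably from updating two rows. That matches the last two entries of the IN list (UCP181023071604030490 and UCP181023071558030489), so we consider the inference credible. As for the second question, why transaction 2 requests the row lock on UCP181023071604030490: transaction 2 must contain another statement that updates that row by order_id. After reviewing the code with the project team, we confirmed that it updates multiple rows one at a time in a for loop, which matches the inference. This completes the analysis of the deadlock.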
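The log does in fact contain direct evidence for this inference. Assuming, as the field lengths suggest, that fields 0 and 1 are the two primary-key columns, field 2 (6 bytes) is InnoDB's hidden DB_TRX_ID column: the ID of the last transaction that modified the row. A minimal decoding sketch using the field 2 values of the two records in the log:

```python
def decode_trx_id(hex_str: str) -> int:
    """Hidden DB_TRX_ID column: 6-byte unsigned big-endian ID of the row's last writer."""
    return int.from_bytes(bytes.fromhex(hex_str), "big")

TRX1, TRX2 = 638350242, 638350243  # transaction IDs from the deadlock log

# Field 2 of each record in the log, keyed by the order code in field 4
field2 = {
    "UCP181023071613030491": "0000260c73a3",  # heap no 32
    "UCP181023071604030490": "0000260c73a2",  # heap no 4
}
last_writer = {code: decode_trx_id(h) for code, h in field2.items()}
```

Row UCP181023071604030490 was last modified by 638350242 (transaction 1) and row UCP181023071613030491 by 638350243 (transaction 2). Both transactions are still active, and InnoDB treats an uncommitted change as an implicit exclusive lock held by its writer. This confirms that transaction 1 holds the lock transaction 2 is waiting for.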

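Deadlock cause analysis:

1. Transaction 1 updates several rows through order_no (the secondary index on order_code). Its locking order is:

Row UCP181023071558030489: lock order_no_index first, then primary_index (the primary key): both granted

Row UCP181023071604030490: lock order_no_index first, then primary_index (the primary key): both granted

Row UCP181023071613030491: lock order_no_index first, then primary_index (the primary key): order_no_index granted, primary_index waiting

Row UCP181023071619030500: lock order_no_index first, then primary_index (the primary key): not reached

2. Transaction 2 updates several rows in a loop by primary key ID. Its locking order is:

Row UCP181023071613030491: lock primary_index (the primary key), granted; then request the primary key lock on UCP181023071604030490, which transaction 1 holds. Each transaction now holds a lock the other needs, so a deadlock occurs.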
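The two locking orders above can be replayed in a toy lock manager to watch the cycle form. This is a simplified Python model, not InnoDB's implementation: every lock is exclusive, a transaction releases its locks only when its plan finishes, a blocked transaction retries on its next turn, and a deadlock is reported when following the waits-for edges leads back to the requester. The interleaving in `SCHEDULE` is our reconstruction of the sequence described above.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (index name, order code): one index record that gets X-locked

def trx1_plan(order_codes: List[str]) -> List[Key]:
    """Transaction 1 walks order_no_index in ascending order: secondary record, then primary."""
    plan: List[Key] = []
    for code in sorted(order_codes):
        plan += [("order_no_index", code), ("PRIMARY", code)]
    return plan

def find_deadlock(plans: Dict[str, List[Key]], schedule: List[str]) -> Optional[List[str]]:
    """Replay lock requests in `schedule` order; return the waits-for cycle if one forms."""
    owner: Dict[Key, str] = {}
    pos = {trx: 0 for trx in plans}
    waits_for: Dict[str, str] = {}
    for trx in schedule:
        if pos[trx] == len(plans[trx]):
            continue  # already committed
        key = plans[trx][pos[trx]]
        holder = owner.get(key)
        if holder is not None and holder != trx:
            waits_for[trx] = holder
            chain, cur = [trx], holder
            while cur != trx and cur in waits_for:  # follow waits-for edges
                chain.append(cur)
                cur = waits_for[cur]
            if cur == trx:
                return chain  # back at the requester: deadlock
            continue
        waits_for.pop(trx, None)
        owner[key] = trx
        pos[trx] += 1
        if pos[trx] == len(plans[trx]):  # commit: release all of this trx's locks
            for k in [k for k, t in owner.items() if t == trx]:
                del owner[k]
    return None

CODES = ["UCP181023071619030500", "UCP181023071613030491",
         "UCP181023071604030490", "UCP181023071558030489"]
SCHEDULE = ["trx1"] * 4 + ["trx2"] + ["trx1"] * 2 + ["trx2"] + ["trx1"] * 2 + ["trx2"] * 2

# Transaction 2 as in the log: ...0491 first, then ...0490
as_logged = find_deadlock({
    "trx1": trx1_plan(CODES),
    "trx2": [("PRIMARY", "UCP181023071613030491"), ("PRIMARY", "UCP181023071604030490")],
}, SCHEDULE)
# Transaction 2 sorting its rows by order code, the same order transaction 1 uses
same_order = find_deadlock({
    "trx1": trx1_plan(CODES),
    "trx2": [("PRIMARY", c) for c in sorted(["UCP181023071613030491", "UCP181023071604030490"])],
}, SCHEDULE)
```

With transaction 2 locking UCP181023071613030491 before UCP181023071604030490, the replay reports the cycle `["trx2", "trx1"]`. If transaction 2 locks the same rows in ascending order_code order instead, as transaction 1 does, it simply waits until transaction 1 commits and both finish.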

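Solutions:

1. Update the same table through one unique field consistently: either always order_id or always order_no.

2. When a transaction updates multiple rows, sort them first and lock them in that order, using the same sort key in every code path, so that all transactions acquire locks in the same order and deadlocks are avoided.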
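For solution 2, the choice of sort key matters as much as the sorting itself. Decoding field 0 of the two records in the log (assumed to be order_id, as discussed above) shows that for these two rows, order_id order and order_code order disagree. A loop sorted by order_id would therefore still lock them in the opposite order to transaction 1's order_code index scan. A minimal sketch; `OrderRow` and `lock_order` are hypothetical names:

```python
from typing import List, NamedTuple

class OrderRow(NamedTuple):
    order_code: str
    order_id: int

def decode_signed_bigint(hex_str: str) -> int:
    """InnoDB stores a signed BIGINT big-endian with the sign bit flipped."""
    value = int.from_bytes(bytes.fromhex(hex_str), "big") ^ (1 << 63)
    return value - (1 << 64) if value >= 1 << 63 else value

def lock_order(rows: List[OrderRow], key: str) -> List[str]:
    """Order in which a loop of single-row UPDATEs sorted by `key` would lock the rows."""
    return [row.order_code for row in sorted(rows, key=lambda row: getattr(row, key))]

# The two contended rows; order_id is field 0 of each record in the deadlock log
rows = [
    OrderRow("UCP181023071613030491", decode_signed_bigint("fd67828666135000")),
    OrderRow("UCP181023071604030490", decode_signed_bigint("fd67a62566135000")),
]
by_id = lock_order(rows, "order_id")      # ...0491 first: opposite to transaction 1
by_code = lock_order(rows, "order_code")  # ...0490 first: same as transaction 1's scan
```

In practice, either sort by order_code in the loop or, following solution 1, make both code paths update by order_id.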

For more on MySQL locking, see He Dengcheng's article: http://hedengcheng.com/?p=771
