两个及以上的事务各自持有对方需要的锁,导致双方都无法继续执行,只能互相等待(循环等待)。因为双方都在等待对方释放锁,而各自又无法释放锁,从而导致了死循环。这就是死锁(dead lock)。
当两个事务锁定了多张表的多条记录(如 UPDATE 或者 SELECT … FOR UPDATE),且它们各自锁定的顺序相反时,就会出现死锁。当 sql 语句锁定了多行记录和间隙时,也容易出现死锁。
关于 InnoDB 锁的类型和兼容性,可参考 https://blog.csdn.net/lamp_yang_3533/article/details/129291575
产生死锁的四个条件:
死锁检测功能 innodb_deadlock_detect
默认是开启的。当出现死锁时,InnoDB 就会检测到状态,并及时回滚导致死锁的多个事务中的一个。
mysql> show variables like 'innodb_deadlock_detect';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| innodb_deadlock_detect | ON |
+------------------------+-------+
1 row in set, 1 warning (0.00 sec)
如果关闭了死锁检测功能,InnoDB 就依赖锁等待超时时间 innodb_lock_wait_timeout
,当锁等待超时(默认 50 秒)后,对事务进行回滚。
mysql> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50 |
+--------------------------+-------+
1 row in set, 1 warning (0.00 sec)
说明: 以下示例基于 MySQL 8.0,如果是之前的版本会有所不同。
CREATE TABLE `t` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`name` char(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '' COMMENT '用户名称',
`department_id` tinyint(1) unsigned NOT NULL DEFAULT '0' COMMENT '用户所属的部门id',
`status` tinyint(1) NOT NULL DEFAULT '1' COMMENT '用户的状态: 1启用, 2禁用',
`created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`updated_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`),
KEY `department_id_index` (`department_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='用户表';
# 插入部分测试数据
INSERT t (`id`, `name`, `department_id`, `status`) VALUES
('1', 'test', '8', '1'),
('2', 'test', '1', '1'),
('3', 'test', '8', '1'),
('7', 'test', '2', '1'),
('8', 'test', '2', '1'),
('9', 'test', '8', '1');
首先,在第一个连接中,开启事务 A。
# 事务 A
BEGIN;
UPDATE t SET department_id=3 WHERE id BETWEEN 3 AND 7;
由于在 MySQL 8.0 中,只要持有锁(不管是否出现锁争用),locks 表中就会记录数据。
故可以先简单看一下事务中锁的信息。
SELECT
trx_mysql_thread_id AS sql_thread_id,trx_id, OBJECT_NAME as `table`,
INDEX_NAME as `index`, LOCK_DATA as data, LOCK_MODE as mode,
LOCK_STATUS as status, LOCK_TYPE as type
FROM performance_schema.data_locks as lk
LEFT JOIN information_schema.innodb_trx as trx ON trx.trx_id=lk.ENGINE_TRANSACTION_ID;
+---------------+--------+-------+---------+------+------+---------+--------+
| sql_thread_id | trx_id | table | index | data | mode | status | type |
+---------------+--------+-------+---------+------+------+---------+--------+
| 30 | 28220 | t | NULL | NULL | IX | GRANTED | TABLE |
| 30 | 28220 | t | PRIMARY | 3 | X | GRANTED | RECORD |
| 30 | 28220 | t | PRIMARY | 7 | X | GRANTED | RECORD |
| 30 | 28220 | t | PRIMARY | 8 | X | GRANTED | RECORD |
+---------------+--------+-------+---------+------+------+---------+--------+
4 rows in set (0.00 sec)
然后,在第二个连接中,开启事务 B。
# 事务 B
BEGIN;
UPDATE t SET status=2 WHERE id=9;
继续看一下事务中锁的信息。
+---------------+--------+-------+---------+------+------+---------+--------+
| sql_thread_id | trx_id | table | index | data | mode | status | type |
+---------------+--------+-------+---------+------+------+---------+--------+
| 33 | 28226 | t | NULL | NULL | IX | GRANTED | TABLE |
| 33 | 28226 | t | PRIMARY | 9 | X | GRANTED | RECORD |
| 30 | 28220 | t | NULL | NULL | IX | GRANTED | TABLE |
| 30 | 28220 | t | PRIMARY | 3 | X | GRANTED | RECORD |
| 30 | 28220 | t | PRIMARY | 7 | X | GRANTED | RECORD |
| 30 | 28220 | t | PRIMARY | 8 | X | GRANTED | RECORD |
+---------------+--------+-------+---------+------+------+---------+--------+
6 rows in set (0.00 sec)
还未出现锁争用的情况。
然后,在事务 B 中,继续执行下面的 SQL。
UPDATE t SET status=2 WHERE id=3;
查看事务中的锁信息,如下:
+---------------+--------+-------+---------+------+------+---------+--------+
| sql_thread_id | trx_id | table | index | data | mode | status | type |
+---------------+--------+-------+---------+------+------+---------+--------+
| 33 | 28234 | t | NULL | NULL | IX | GRANTED | TABLE |
| 33 | 28234 | t | PRIMARY | 9 | X | GRANTED | RECORD |
| 33 | 28234 | t | PRIMARY | 3 | X | WAITING | RECORD |
| 30 | 28229 | t | NULL | NULL | IX | GRANTED | TABLE |
| 30 | 28229 | t | PRIMARY | 3 | X | GRANTED | RECORD |
| 30 | 28229 | t | PRIMARY | 7 | X | GRANTED | RECORD |
| 30 | 28229 | t | PRIMARY | 8 | X | GRANTED | RECORD |
+---------------+--------+-------+---------+------+------+---------+--------+
7 rows in set (0.00 sec)
可发现,目前已经出现资源竞争,事务 B 处于锁等待(wait)状态。
最后,在事务 A 中,继续执行下面的 SQL。
UPDATE t SET status=2 WHERE id=9;
出现双方循环等待,立即产生死锁,事务报错。
事务 B 报错信息如下:
[Err] 1213 - Deadlock found when trying to get lock; try restarting transaction
再来看看事务中锁的信息。
+---------------+--------+-------+---------+------+------+---------+--------+
| sql_thread_id | trx_id | table | index | data | mode | status | type |
+---------------+--------+-------+---------+------+------+---------+--------+
| 30 | 28229 | t | NULL | NULL | IX | GRANTED | TABLE |
| 30 | 28229 | t | PRIMARY | 3 | X | GRANTED | RECORD |
| 30 | 28229 | t | PRIMARY | 7 | X | GRANTED | RECORD |
| 30 | 28229 | t | PRIMARY | 8 | X | GRANTED | RECORD |
| 30 | 28229 | t | PRIMARY | 9 | X | GRANTED | RECORD |
+---------------+--------+-------+---------+------+------+---------+--------+
5 rows in set (0.00 sec)
由于死锁,事务 B 被 InnoDB 回滚了。故现在只剩下事务 A 的锁,它可以正常提交事务了。
这里,我们利用下面语句查看最新监测到的死锁信息。
SHOW ENGINE INNODB STATUS\G;
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-02-28 17:25:14 0x45c
*** (1) TRANSACTION:
TRANSACTION 28234, ACTIVE 50 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 33, OS thread handle 11904, query id 557 localhost 127.0.0.1 root updating
UPDATE t SET status=2 WHERE id=3
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 4 n bits 80 index PRIMARY of table `mytest`.`t` trx id 28234 lock_mode X locks rec but not gap waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 8; compact format; info bits 0
0: len 4; hex 00000003; asc ;;
1: len 6; hex 000000006e45; asc nE;;
2: len 7; hex 01000001260281; asc & ;;
3: len 10; hex 74657374202020202020; asc test ;;
4: len 1; hex 03; asc ;;
5: len 1; hex 81; asc ;;
6: len 5; hex 99af790294; asc y ;;
7: len 5; hex 99af791609; asc y ;;
*** (2) TRANSACTION:
TRANSACTION 28229, ACTIVE 65 sec starting index read
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 2
MySQL thread id 30, OS thread handle 1116, query id 574 localhost 127.0.0.1 root updating
UPDATE t SET status=2 WHERE id=9
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 6 page no 4 n bits 80 index PRIMARY of table `mytest`.`t` trx id 28229 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 8; compact format; info bits 0
0: len 4; hex 00000003; asc ;;
1: len 6; hex 000000006e45; asc nE;;
2: len 7; hex 01000001260281; asc & ;;
3: len 10; hex 74657374202020202020; asc test ;;
4: len 1; hex 03; asc ;;
5: len 1; hex 81; asc ;;
6: len 5; hex 99af790294; asc y ;;
7: len 5; hex 99af791609; asc y ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 4 n bits 80 index PRIMARY of table `mytest`.`t` trx id 28229 lock_mode X locks rec but not gap waiting
Record lock, heap no 7 PHYSICAL RECORD: n_fields 8; compact format; info bits 0
0: len 4; hex 00000009; asc ;;
1: len 6; hex 000000006e4a; asc nJ;;
2: len 7; hex 02000001160280; asc ;;
3: len 10; hex 74657374202020202020; asc test ;;
4: len 1; hex 08; asc ;;
5: len 1; hex 82; asc ;;
6: len 5; hex 99af790294; asc y ;;
7: len 5; hex 99af791618; asc y ;;
*** WE ROLL BACK TRANSACTION (1)
死锁出现的前提条件是锁等待,或者说是锁资源的争用。
不仅需要对死锁问题进行排查处理,还需要对长时间的锁等待问题,进行排查和分析。
锁等待会阻塞当前的线程,甚至对其他的线程造成影响,降低了系统的并发性能,线程资源和其持有的锁迟迟不能释放,不仅造成了资源的浪费,还有可能导致服务器崩溃。
下面是一些常常会用到的分析手段。
show processlist;
可以显示哪些线程正在运行。如果当前用户有 SUPER 权限,就可以看到所有线程。
如果有线程正在 UPDATE 或者 INSERT 某张表,那么进程的 state 为 Updating 或者 Sending data。
show processlist;
只会列出前 100 条数据,如果想列出所有结果,可以使用 show full processlist;
select connection_id();
获取。select database();
获取。关于 state 的常见状态值如下:
mysql> show full processlist;
+----+-----------------+-----------------+--------+---------+-------+------------------------+----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+-----------------+--------+---------+-------+------------------------+----------------------------------+
| 4 | event_scheduler | localhost | NULL | Daemon | 11355 | Waiting on empty queue | NULL |
| 8 | root | localhost:63023 | NULL | Sleep | 11271 | | NULL |
| 13 | root | localhost:63066 | mytest | Sleep | 10490 | | NULL |
| 20 | root | localhost:63458 | mytest | Sleep | 143 | | NULL |
| 23 | root | localhost:63470 | mytest | Sleep | 9971 | | NULL |
| 24 | root | localhost:63480 | mytest | Sleep | 8050 | | NULL |
| 25 | root | localhost:63485 | mytest | Sleep | 9583 | | NULL |
| 27 | root | localhost:63504 | mytest | Sleep | 30 | | NULL |
| 30 | root | localhost:63601 | mytest | Sleep | 63 | | NULL |
| 32 | root | localhost:63891 | mytest | Query | 0 | starting | show full processlist |
| 33 | root | localhost:63965 | mytest | Query | 34 | updating | UPDATE t SET status=2 WHERE id=9 |
+----+-----------------+-----------------+--------+---------+-------+------------------------+----------------------------------+
11 rows in set (0.00 sec)
当然,也可直接用下面的 SQL 语句查表。
select * from information_schema.processlist;
注意:从 MySQL 8.0 开始,data_locks 取代了 innodb_locks,data_lock_waits 取代了 innodb_lock_waits,并移到了数据库 performance_schema。
information_shcema.innodb_trx 表存储了当前的事务(进行中的)信息。
SELECT * FROM information_schema.innodb_trx;
performance_schema.data_locks 存储了当前的锁信息。
SELECT * FROM performance_schema.data_locks;
performance_schema.data_lock_waits 存储了当前的锁等待信息。
SELECT * FROM performance_schema.data_lock_waits;
information_schema.processlist 存储了当前的连接信息。
select * from information_schema.processlist;
除了单独查看上面几张表的详细信息外,还可以联表查询需要的字段,这样会更直观。
# 查看线程和锁的信息
SELECT
trx_mysql_thread_id AS sql_thread_id,trx_id, BLOCKING_ENGINE_TRANSACTION_ID AS blocked_by_trx_id,
OBJECT_NAME as `table`, INDEX_NAME as `index`, LOCK_DATA as data, LOCK_MODE as mode, LOCK_STATUS as status, LOCK_TYPE as type,
trx_state,trx_started,trx_wait_started,trx_query
FROM performance_schema.data_locks as lk
LEFT JOIN information_schema.innodb_trx as trx ON trx.trx_id=lk.ENGINE_TRANSACTION_ID
LEFT JOIN performance_schema.data_lock_waits as lkw ON lkw.REQUESTING_ENGINE_TRANSACTION_ID=trx.trx_id;
+---------------+--------+-------------------+-------+---------+------+------+---------+--------+-----------+---------------------+---------------------+----------------------------------+
| sql_thread_id | trx_id | blocked_by_trx_id | table | index | data | mode | status | type | trx_state | trx_started | trx_wait_started | trx_query |
+---------------+--------+-------------------+-------+---------+------+------+---------+--------+-----------+---------------------+---------------------+----------------------------------+
| 33 | 28248 | 28239 | t | NULL | NULL | IX | GRANTED | TABLE | LOCK WAIT | 2023-02-28 20:01:15 | 2023-02-28 20:01:15 | UPDATE t SET status=2 WHERE id=9 |
| 33 | 28248 | NULL | t | PRIMARY | 9 | X | WAITING | RECORD | LOCK WAIT | 2023-02-28 20:01:15 | 2023-02-28 20:01:15 | UPDATE t SET status=2 WHERE id=9 |
| 30 | 28239 | NULL | t | NULL | NULL | IX | GRANTED | TABLE | RUNNING | 2023-02-28 19:04:41 | NULL | NULL |
| 30 | 28239 | NULL | t | PRIMARY | 9 | X | GRANTED | RECORD | RUNNING | 2023-02-28 19:04:41 | NULL | NULL |
+---------------+--------+-------------------+-------+---------+------+------+---------+--------+-----------+---------------------+---------------------+----------------------------------+
4 rows in set (0.00 sec)
show engine innodb status
查看最近一次检测到的死锁信息(LATEST DETECTED DEADLOCK)。
show engine innodb status\G;
如果死锁问题很频繁,想要查看更多的死锁信息。可以开启 innodb_print_all_deadlocks
选项,将所有的死锁信息记录到 MySQL 服务器的错误日志。
show status like '%lock%';
mysql> show status like '%lock%';
+------------------------------------------+--------+
| Variable_name | Value |
+------------------------------------------+--------+
| Com_lock_instance | 0 |
| Com_lock_tables | 0 |
| Com_unlock_instance | 0 |
| Com_unlock_tables | 0 |
| Handler_external_lock | 26 |
| Innodb_row_lock_current_waits | 1 |
| Innodb_row_lock_time | 596291 |
| Innodb_row_lock_time_avg | 37268 |
| Innodb_row_lock_time_max | 51895 |
| Innodb_row_lock_waits | 16 |
| Key_blocks_not_flushed | 0 |
| Key_blocks_unused | 6698 |
| Key_blocks_used | 0 |
| Locked_connects | 0 |
| Performance_schema_locker_lost | 0 |
| Performance_schema_metadata_lock_lost | 0 |
| Performance_schema_rwlock_classes_lost | 0 |
| Performance_schema_rwlock_instances_lost | 0 |
| Performance_schema_table_lock_stat_lost | 0 |
| Table_locks_immediate | 409 |
| Table_locks_waited | 0 |
+------------------------------------------+--------+
21 rows in set (0.00 sec)
可重点关注下面几个参数:
如果锁等待的总次数和锁等待的平均时长比较大,可以考虑排查优化。
查看哪些表是打开的,正在被线程使用。
in_use 表示有多少线程正在使用某张表,可以用来过滤。
show open tables where in_use>0;
show open tables from db_name where in_use>0;
一般只用于查看哪些表被很多线程打开使用。
select ... for update
和 共享锁(S)select ... lock in share mode
。