Ⅰ、 show engine innodb status\G
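1.1 A worked example
The post introducing locks already mentioned this command. This time we turn on one more parameter so that its output shows the locks in more detail: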
(root@localhost) [(none)]> set global innodb_status_output_locks=1;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [test]> delete from l where a = 2;
Query OK, 1 row affected (0.00 sec)
(root@localhost) [test]> update l set b = b + 1 where a = 4;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
(root@localhost) [test]> show engine innodb status\G
...
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 30217412, ACTIVE 37 sec
2 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 355, OS thread handle 140483080300288, query id 1263 localhost root starting
show engine innodb status
TABLE LOCK table `test`.`l` trx id 30217412 lock mode IX
RECORD LOCKS space id 1358 page no 3 n bits 72 index PRIMARY of table `test`.`l` trx id 30217412 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 80000002; asc ;;
1: len 6; hex 000001cd14c4; asc ;;
2: len 7; hex 2400000fc21499; asc $ ;;
3: len 4; hex 80000004; asc ;;
4: len 4; hex 80000006; asc ;;
5: len 4; hex 80000008; asc ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000004; asc ;;
1: len 6; hex 000001cd14c4; asc ;;
2: len 7; hex 2400000fc214c8; asc $ ;;
3: len 4; hex 80000007; asc ;;
4: len 4; hex 80000008; asc ;;
5: len 4; hex 8000000a; asc ;;
...
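Analysis:
- TABLE LOCK ... lock mode IX: an intention exclusive lock (intention locks are always table-level locks)
- RECORD LOCKS: record locks
-->space id: the tablespace ID
-->page no: the page number within the tablespace. Numbering starts at 0, so page no 3 is the fourth page. A table's records are first written to its fourth page, which is also the root page of the clustered index
-->index PRIMARY: the lock is on the primary key (the clustered index)
-->lock_mode: the lock mode (X here)
-->locks rec but not gap: a lock on the record itself only, without a gap lock; we'll skip this for now
-->heap no 2 PHYSICAL RECORD: n_fields 6: the locked physical record has heap no 2 and contains 6 fields
-->compact format: the record uses the compact record format. InnoDB prints this for every non-REDUNDANT row format, so DYNAMIC rows show "compact" too
-->info bits: 0 means the record is not delete-marked. 32 (0x20) is the delete-mark flag. That is why the row removed by the DELETE (heap no 2) shows 32, while the row changed by the UPDATE (heap no 3) still shows 0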
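To pull these fields out of the status text in a script, you could start from a minimal Python sketch like the one below. The regular expressions assume the exact 5.7 line layout shown above, and other versions may print these lines differently. The function names are made up for illustration, and only the delete-mark bit (0x20) of the info bits is decoded.

```python
import re

# Header line that opens each record-lock block in SHOW ENGINE INNODB STATUS
RECORD_LOCKS_RE = re.compile(
    r"RECORD LOCKS space id (?P<space>\d+) page no (?P<page>\d+) n bits (?P<nbits>\d+) "
    r"index (?P<index>\S+) of table (?P<table>\S+) trx id (?P<trx>\d+) lock_mode (?P<mode>.+)$"
)
# One line per locked record under that header
RECORD_RE = re.compile(
    r"Record lock, heap no (?P<heap>\d+) PHYSICAL RECORD: n_fields (?P<nfields>\d+); "
    r"(?P<fmt>\w+) format; info bits (?P<info>\d+)"
)

DELETE_MARK_FLAG = 0x20  # 32 in the info bits = record is delete-marked


def parse_record_locks_header(line):
    """Return space/page/nbits/index/table/trx/mode as a dict, or None."""
    m = RECORD_LOCKS_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    for key in ("space", "page", "nbits", "trx"):
        d[key] = int(d[key])
    return d


def parse_record_line(line):
    """Return heap_no, n_fields, format and the delete-mark flag, or None."""
    m = RECORD_RE.search(line)
    if m is None:
        return None
    return {
        "heap_no": int(m.group("heap")),
        "n_fields": int(m.group("nfields")),
        "format": m.group("fmt"),
        "delete_marked": bool(int(m.group("info")) & DELETE_MARK_FLAG),
    }


header = parse_record_locks_header(
    "RECORD LOCKS space id 1358 page no 3 n bits 72 index PRIMARY of table "
    "`test`.`l` trx id 30217412 lock_mode X locks rec but not gap"
)
print(header["space"], header["page"], header["index"], header["mode"])
# 1358 3 PRIMARY X locks rec but not gap
```

The TABLE LOCK line uses "lock mode" (with a space) rather than "lock_mode", so it simply does not match the record-lock pattern.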
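Q: The table has four columns, so why does this record have 6 fields?
- A table without a primary key gets an extra hidden row_id column. This table has a primary key, which plays that role, so there is no row_id here
- The 6-byte field is the transaction ID and the 7-byte field is the rollback pointer; these two are hidden columns. Together with the primary key (field 0) and the other three columns (fields 3 to 5), that makes 6 fields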
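The hex values can be decoded by hand. Here is a small Python sketch, under these assumptions: the non-key columns are signed 4-byte INTs, which matches "len 4" and the 8000000x pattern. The roll pointer layout (1-bit insert flag, 7-bit rollback segment id, 4-byte undo page number, 2-byte offset) is my reading of the InnoDB source, not something this output shows.

```python
def decode_int_field(hex_str):
    """Signed INT column: InnoDB stores it big-endian with the sign bit flipped."""
    bits = len(hex_str) * 4
    sign = 1 << (bits - 1)
    value = int(hex_str, 16) ^ sign
    # after flipping, a set top bit means the original value was negative
    return value - (1 << bits) if value & sign else value


def decode_trx_id(hex_str):
    """DB_TRX_ID: 6-byte unsigned ID of the transaction that last changed the row."""
    return int(hex_str, 16)


def decode_roll_ptr(hex_str):
    """DB_ROLL_PTR (7 bytes): insert flag | rollback segment id | undo page no | offset."""
    v = int(hex_str, 16)
    return {
        "is_insert": bool((v >> 55) & 0x1),
        "rseg_id": (v >> 48) & 0x7F,
        "undo_page_no": (v >> 16) & 0xFFFFFFFF,
        "offset": v & 0xFFFF,
    }


# heap no 2 from the first output above
print([decode_int_field(h) for h in ("80000002", "80000004", "80000006", "80000008")])  # [2, 4, 6, 8]
print(decode_trx_id("000001cd14c4"))  # 30217412
print(decode_roll_ptr("2400000fc21499"))
```

The decoded transaction ID 30217412 is exactly the transaction in the output. In the next example, the same field reads 000001cd14ee, which decodes to 30217454, the transaction that holds the lock.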
1.2 While we're at it, let's analyze a lock wait
session1:
(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [test]> delete from l where a = 2;
Query OK, 1 row affected (0.00 sec)
session2:
(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [test]> select * from l where a=2 for update;
(hangs)
session3:
...
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421958478909040, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 30217455, ACTIVE 1741 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 396, OS thread handle 140483215816448, query id 2340 localhost root statistics
select * from l where a=2 for update
------- TRX HAS BEEN WAITING 27 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1358 page no 3 n bits 72 index PRIMARY of table `test`.`l` trx id 30217455 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 80000002; asc ;;
1: len 6; hex 000001cd14ee; asc ;;
2: len 7; hex 230000013d27d5; asc # =' ;;
3: len 4; hex 80000004; asc ;;
4: len 4; hex 80000006; asc ;;
5: len 4; hex 80000008; asc ;;
------------------
TABLE LOCK table `test`.`l` trx id 30217455 lock mode IX
RECORD LOCKS space id 1358 page no 3 n bits 72 index PRIMARY of table `test`.`l` trx id 30217455 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 80000002; asc ;;
1: len 6; hex 000001cd14ee; asc ;;
2: len 7; hex 230000013d27d5; asc # =' ;;
3: len 4; hex 80000004; asc ;;
4: len 4; hex 80000006; asc ;;
5: len 4; hex 80000008; asc ;;
---TRANSACTION 30217454, ACTIVE 1821 sec
2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 355, OS thread handle 140483080300288, query id 2339 localhost root
TABLE LOCK table `test`.`l` trx id 30217454 lock mode IX
RECORD LOCKS space id 1358 page no 3 n bits 72 index PRIMARY of table `test`.`l` trx id 30217454 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 80000002; asc ;;
1: len 6; hex 000001cd14ee; asc ;;
2: len 7; hex 230000013d27d5; asc # =' ;;
3: len 4; hex 80000004; asc ;;
4: len 4; hex 80000006; asc ;;
5: len 4; hex 80000008; asc ;;
...
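- Find LOCK WAIT
LOCK WAIT 2 lock struct(s), heap size 1136, 2 row lock(s): two lock structs, the IX table lock and the record lock being waited for
- Find TRX HAS BEEN WAITING 27 SEC FOR THIS LOCK TO BE GRANTED
The transaction is waiting for a lock on the record whose primary key is 2, and the lock type is exclusive (X)
- Look further down for the transaction holding the lock on record 2. Its MySQL thread id, 355, identifies the connection
This 355 is the Id column of show processlist, as we can confirm from session1: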
(root@localhost) [test]> show processlist;
+-----+------+-----------+------+---------+------+----------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+------+-----------+------+---------+------+----------+------------------+
| 355 | root | localhost | test | Query | 0 | starting | show processlist |
| 396 | root | localhost | test | Sleep | 1321 | | NULL |
+-----+------+-----------+------+---------+------+----------+------------------+
2 rows in set (0.00 sec)
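Note that performance_schema.threads numbers threads differently: its thread_id is a separate counter, and the column that matches the processlist Id is processlist_id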
(root@localhost) [test]> select thread_id,processlist_id,thread_os_id from performance_schema.threads where processlist_id is not NULL;
+-----------+----------------+--------------+
| thread_id | processlist_id | thread_os_id |
+-----------+----------------+--------------+
| 27 | 1 | 10574 |
| 381 | 355 | 18745 |
| 422 | 396 | 10592 |
+-----------+----------------+--------------+
3 rows in set (0.00 sec)
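The three columns are, respectively, the internal thread ID (auto-incremented), the Id shown by show processlist, and the operating-system thread ID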
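The manual procedure in 1.2 is: find the waiting record lock, then find who holds a lock on the same space/page/heap no, and read its thread id. A toy Python model of that matching step might look like this. The class names are hypothetical. The sketch assumes the lock lines have already been parsed and de-duplicated, because the waiting lock is printed twice in the real output. It also ignores lock-mode compatibility, so it would wrongly report two shared locks as blocking each other.

```python
from dataclasses import dataclass, field


@dataclass
class RecLock:
    space: int
    page_no: int
    heap_no: int
    waiting: bool  # True if the RECORD LOCKS line ends with "waiting"


@dataclass
class Trx:
    trx_id: int
    thread_id: int  # "MySQL thread id" = Id column of SHOW PROCESSLIST
    locks: list = field(default_factory=list)


def find_blockers(trxs):
    """Pair every waiting record lock with the transactions holding a lock on the same record."""
    holders = {}
    for trx in trxs:
        for lk in trx.locks:
            if not lk.waiting:
                holders.setdefault((lk.space, lk.page_no, lk.heap_no), []).append(trx)
    pairs = []
    for trx in trxs:
        for lk in trx.locks:
            if lk.waiting:
                for holder in holders.get((lk.space, lk.page_no, lk.heap_no), []):
                    if holder.trx_id != trx.trx_id:
                        pairs.append((trx.thread_id, holder.thread_id))
    return pairs


# The two transactions from the output above
waiter = Trx(30217455, 396, [RecLock(1358, 3, 2, waiting=True)])
holder = Trx(30217454, 355, [RecLock(1358, 3, 2, waiting=False)])
print(find_blockers([waiter, holder]))  # [(396, 355)]
```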
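Ⅱ、Something simpler: wasn't the above a bit too hardcore?
2.1 A SQL script built on three tables
Repeat the earlier steps: one session opens a transaction, deletes record 2 and does not commit, while another session queries record 2 with for update
(These information_schema tables are deprecated in 5.7 and removed in 8.0, where performance_schema.data_lock_waits takes over; the warning in the output below is most likely the deprecation notice.)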
(root@localhost) [(none)]> SELECT
-> r.trx_id waiting_trx_id,
-> r.trx_mysql_thread_id waiting_thread,
-> r.trx_query wating_query,
-> b.trx_id blocking_trx_id,
-> b.trx_mysql_thread_id blocking_thread,
-> b.trx_query blocking_query
-> FROM
-> information_schema.innodb_lock_waits w
-> INNER JOIN
-> information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
-> INNER JOIN
-> information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;
+----------------+----------------+--------------------------------------+-----------------+-----------------+----------------+
| waiting_trx_id | waiting_thread | wating_query | blocking_trx_id | blocking_thread | blocking_query |
+----------------+----------------+--------------------------------------+-----------------+-----------------+----------------+
| 30217455 | 396 | select * from l where a=2 for update | 30217454 | 355 | NULL |
+----------------+----------------+--------------------------------------+-----------------+-----------------+----------------+
1 row in set, 1 warning (0.02 sec)
2.2 Even simpler: go through the sys schema
The sys schema is only bundled from MySQL 5.7 on, but on 5.6 you can install it yourself
(root@localhost) [(none)]> select * from sys.innodb_lock_waits\G
*************************** 1. row ***************************
wait_started: 2018-06-03 00:52:01
wait_age: 00:00:14
wait_age_secs: 14
locked_table: `test`.`l`
locked_index: PRIMARY
locked_type: RECORD
waiting_trx_id: 30217455
waiting_trx_started: 2018-06-03 00:11:13
waiting_trx_age: 00:41:02
waiting_trx_rows_locked: 5
waiting_trx_rows_modified: 0
waiting_pid: 396
waiting_query: select * from l where a=2 for update
waiting_lock_id: 30217455:1358:3:2
waiting_lock_mode: X
blocking_trx_id: 30217454
blocking_pid: 355
blocking_query: NULL
blocking_lock_id: 30217454:1358:3:2
blocking_lock_mode: X
blocking_trx_started: 2018-06-03 00:09:53
blocking_trx_age: 00:42:22
blocking_trx_rows_locked: 1
blocking_trx_rows_modified: 1
sql_kill_blocking_query: KILL QUERY 355
sql_kill_blocking_connection: KILL 355
1 row in set, 3 warnings (0.09 sec)
tips:
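- waiting_lock_id: 30217455:1358:3:2 has the form trx_id:space:page_no:heap_no; the other fields are simple enough to need no explanation
- Why is blocking_query NULL when waiting_query is known?
Because the blocking statement has already finished executing; its transaction just hasn't committed yet
In production you usually can't see blocking_query
Even show engine innodb status\G only tells you which record's lock is being waited for, not which SQL statement caused the problem
- What is the difference between the KILL QUERY and KILL at the bottom?
KILL QUERY kills the statement currently running on the connection, while KILL kills the whole connection. Here the blocker is idle (blocking_query is NULL), so only KILL, which rolls back its open transaction, actually releases the lock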
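Because the lock_id encodes the record position, you can tell whether two lock ids point at the same record by comparing everything except the transaction ID. A minimal Python sketch (the names are illustrative); it only accepts the four-part record-lock form shown above:

```python
from typing import NamedTuple


class LockId(NamedTuple):
    trx_id: int
    space: int
    page_no: int
    heap_no: int


def parse_lock_id(lock_id):
    """Split a record-lock id of the form trx_id:space:page_no:heap_no."""
    parts = lock_id.split(":")
    if len(parts) != 4:
        raise ValueError(f"not a record lock id: {lock_id!r}")
    return LockId(*(int(p) for p in parts))


def same_record(a, b):
    """Two lock ids refer to the same record if space, page and heap no match."""
    return a[1:] == b[1:]


waiting = parse_lock_id("30217455:1358:3:2")
blocking = parse_lock_id("30217454:1358:3:2")
print(waiting.heap_no, same_record(waiting, blocking))  # 2 True
```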
Ⅲ、Lock wait timeout
While simulating the lock wait above, the following error came up:
(root@localhost) [test]> select * from l where a=2 for update;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
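This is a lock wait timeout, which developers often confuse with a deadlock. InnoDB detects a deadlock and reports it immediately as ERROR 1213 (Deadlock found when trying to get lock). A lock wait timeout only means the waiter gave up after waiting too long
Locks are held for the lifetime of a transaction: all of a transaction's locks are released only when it commits (or rolls back). That can't be avoided, but a parameter controls how long a waiter will wait: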
(root@localhost) [test]> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50 |
+--------------------------+-------+
1 row in set (0.00 sec)
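The default is 50 s; I'd suggest setting it to around 3 s. Note that by default a timeout rolls back only the statement that timed out, not the whole transaction (see innodb_rollback_on_timeout)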
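The behaviour can be mimicked with an ordinary Python lock. This is only an analogy, not how InnoDB is implemented: a threading.Lock stands in for the X record lock, and the wait_timeout argument plays the role of innodb_lock_wait_timeout, scaled down to fractions of a second. In MySQL itself the value can be set per session, for example with SET SESSION innodb_lock_wait_timeout = 3;

```python
import threading
import time


def session1(row_lock, acquired, hold_seconds):
    # DELETE ... WHERE a = 2: takes the row lock and keeps it until COMMIT
    with row_lock:
        acquired.set()
        time.sleep(hold_seconds)  # the transaction stays open


def session2(row_lock, wait_timeout):
    # SELECT ... FOR UPDATE: waits for the row lock, but at most wait_timeout seconds
    if row_lock.acquire(timeout=wait_timeout):
        row_lock.release()
        return "granted"
    return "ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction"


def simulate(hold_seconds, wait_timeout):
    row_lock = threading.Lock()
    acquired = threading.Event()
    t = threading.Thread(target=session1, args=(row_lock, acquired, hold_seconds))
    t.start()
    acquired.wait()  # make sure session1 holds the lock first
    result = session2(row_lock, wait_timeout)
    t.join()
    return result


print(simulate(hold_seconds=0.5, wait_timeout=0.1))  # times out
print(simulate(hold_seconds=0.2, wait_timeout=2.0))  # granted once session1 "commits"
```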
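Ⅳ、Digging into heap no
Records in an InnoDB page are kept in logical order
Within a page, the first inserted record gets heap no 2 and each later insert gets the next heap no, so the heap no follows the order records were placed in the heap. The logical (key) order of the records, however, is maintained by pointers linking each record to the next
heap no reflects insertion order, i.e. when a record was placed in the page, so a record lock is located by space->page_no->heap_no
Even a page with no records contains two pseudo-records generated by InnoDB: infimum (min) and supremum (max). Infimum has heap_no 0 and supremum has heap_no 1, so user records start at heap_no 2
Supremum can be locked; infimum normally is not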
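The split between insertion order (heap no) and key order (next pointers) can be shown with a toy page in Python. It is a simplified model with hypothetical names. It ignores the page directory, record headers, and reuse of free space after deletes, all of which the real InnoDB page has.

```python
INFIMUM, SUPREMUM = 0, 1  # heap no of the two pseudo-records


class Page:
    """Toy page: heap no = insertion order, next pointers = key order."""

    def __init__(self):
        self.key = {INFIMUM: None, SUPREMUM: None}  # heap_no -> key
        self.next = {INFIMUM: SUPREMUM}              # heap_no -> heap_no of next record
        self.next_heap_no = 2                         # user records start at 2

    def insert(self, key):
        heap_no = self.next_heap_no
        self.next_heap_no += 1
        self.key[heap_no] = key
        # walk from infimum to the last record whose key is smaller than the new key
        prev = INFIMUM
        while self.next[prev] != SUPREMUM and self.key[self.next[prev]] < key:
            prev = self.next[prev]
        self.next[heap_no] = self.next[prev]
        self.next[prev] = heap_no
        return heap_no

    def scan(self):
        """Follow the next pointers: returns (key, heap_no) in key order."""
        out, cur = [], self.next[INFIMUM]
        while cur != SUPREMUM:
            out.append((self.key[cur], cur))
            cur = self.next[cur]
        return out


page = Page()
for k in (4, 2, 6):
    page.insert(k)
print(page.scan())  # [(2, 3), (4, 2), (6, 4)]
```

Key 4 was inserted first and gets heap no 2, yet a scan returns it second: heap no records when a row arrived, and the linked list records where it sorts.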
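Ⅴ、How InnoDB manages locks
- Each transaction has one lock object per page (not per record), managed as a lock bitmap; the bitmap is per page
When a record in the page is locked, the bit at position heap_no in the bitmap is set to 1, i.e. heap_no is the bit index. So InnoDB locks do use memory, but lock storage is not managed one lock at a time (in MySQL, about 30 bytes is enough for one page's locks, although most sources online say 100)
- No lock escalation (same as Oracle)
SQL Server has lock escalation. SQL Server keeps one lock object per lock, while InnoDB keeps one lock object per page, so in terms of lock memory, Oracle …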
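A toy version of the per-page bitmap, as a Python sketch: the class name is hypothetical. The "one struct per transaction, page and lock mode" granularity is a simplification of InnoDB's real lock_t. The n_bits=72 comes from the "n bits 72" in the first output.

```python
class RecLockStruct:
    """Toy record-lock object: one per (transaction, page, lock mode), one bit per heap no."""

    def __init__(self, trx_id, space, page_no, n_bits):
        self.trx_id = trx_id
        self.space = space
        self.page_no = page_no
        self.bitmap = bytearray((n_bits + 7) // 8)  # n_bits rounded up to whole bytes

    def set_lock(self, heap_no):
        self.bitmap[heap_no // 8] |= 1 << (heap_no % 8)

    def is_locked(self, heap_no):
        return bool(self.bitmap[heap_no // 8] & (1 << (heap_no % 8)))

    def locked_heap_nos(self):
        return [h for h in range(len(self.bitmap) * 8) if self.is_locked(h)]


# The first example: one struct on page 3 covers both rows the transaction touched
lk = RecLockStruct(trx_id=30217412, space=1358, page_no=3, n_bits=72)
lk.set_lock(2)  # DELETE ... WHERE a = 2
lk.set_lock(3)  # UPDATE ... WHERE a = 4
print(lk.locked_heap_nos(), len(lk.bitmap))  # [2, 3] 9
```

This fits the first output's "2 lock struct(s), ... 2 row lock(s)": one IX table lock plus a single record-lock struct with two bits set.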
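Supplement: a full-table update in SQL Server vs. InnoDB
SQL Server: one lock object per record
Suppose a per-record lock object takes 10 bytes and the table has 3,000,000 (300w) pages with 100 records per page:
InnoDB: 300 MB (3,000,000 pages * 100 bytes per page lock object / 1000 / 1000)
SQL Server: 3 GB (3,000,000 pages * 100 records * 10 bytes / 1000 / 1000 = 3000 MB)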
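The same arithmetic as a few lines of Python. The per-object sizes are the assumptions used above, not measured values: 100 bytes per InnoDB page-lock object and 10 bytes per SQL Server row lock.

```python
def lock_memory_bytes(pages, rows_per_page, bytes_per_object, one_object_per_row):
    """Total lock memory when every row of every page is locked."""
    objects = pages * rows_per_page if one_object_per_row else pages
    return objects * bytes_per_object


PAGES = 3_000_000
ROWS_PER_PAGE = 100

innodb = lock_memory_bytes(PAGES, ROWS_PER_PAGE, 100, one_object_per_row=False)
sqlserver = lock_memory_bytes(PAGES, ROWS_PER_PAGE, 10, one_object_per_row=True)
print(innodb // 1000 // 1000, "MB")     # 300 MB
print(sqlserver // 1000 // 1000, "MB")  # 3000 MB, about 3 GB
```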
tips:
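- SQL Server lock escalation
When a single statement acquires 5000 (the default threshold) row locks on one table, they are escalated to a table lock. Lock escalation isn't entirely bad: it does shrink memory use, at the cost of concurrency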