Architecture
- Master Thread
Responsible for flushing in-memory data back to disk and keeping it consistent: flushing dirty pages, merging the insert buffer, reclaiming undo pages, and so on.
- IO Thread
InnoDB relies heavily on AIO (asynchronous I/O) to handle I/O requests. Since version 1.0.x there are 4 read threads and 4 write threads, configured via the innodb_read_io_threads and innodb_write_io_threads parameters (see the sketch after this list).
- Purge Thread
Cleans up undo pages that are no longer needed after their transactions have committed.
- Page Cleaner Thread
Flushes dirty pages.
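Neither I/O-thread parameter is dynamic, so here is a minimal sketch of checking them from the client, plus the configuration-file entries that would change them (the values shown are only illustrative assumptions, not recommendations):

```sql
-- Inspect the background I/O thread configuration; both variables are read-only
-- at runtime and can only be changed in the configuration file plus a restart.
SHOW VARIABLES LIKE 'innodb_%io_threads';

-- Illustrative my.cnf entries (values are an assumption, not a recommendation):
-- [mysqld]
-- innodb_read_io_threads  = 8
-- innodb_write_io_threads = 8
```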
Memory
InnoDB's memory consists of the following regions (their configured sizes can be checked as sketched after this list):
- Redo log buffer (redo_log_buffer)
- Additional memory pool (innodb_additional_mem_pool_size)
- Buffer pool (innodb_buffer_pool), which in turn holds:
  - Data pages
  - Insert buffer
  - Lock info
  - Index pages
  - Adaptive hash index
  - Data dictionary information
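A short sketch of checking the configured sizes from the client; note that innodb_additional_mem_pool_size was removed in MySQL 5.7.4, so that query may return an empty set on newer servers:

```sql
-- Configured sizes of the main InnoDB memory regions, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';          -- buffer pool
SHOW VARIABLES LIKE 'innodb_log_buffer_size';           -- redo log buffer
SHOW VARIABLES LIKE 'innodb_additional_mem_pool_size';  -- removed in MySQL 5.7.4
```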
Buffer pool management strategies
- LRU List
- Free List (pages that are not yet in use; when a new page is needed InnoDB takes one from here first, and only evicts from the tail of the LRU list once the free list is empty)

LRU stands for Least Recently Used: frequently and recently used pages sit near the front of the list, and pages are evicted from its tail. InnoDB refines the plain algorithm with a midpoint position: a newly read page is inserted at the midpoint rather than at the head, an approach known as the midpoint insertion strategy. The midpoint sits at 5/8 of the list by default and is controlled by innodb_old_blocks_pct.
The pages in front of the midpoint are usually called hot data. With a naive LRU, a query that has to scan the whole table would pull every page of that table into the LRU list even though each page is used only once, evicting the genuinely hot pages in the process.
By default InnoDB treats the front 5/8 of the list as hot data and the trailing 3/8 as transient data; if you can estimate your workload's access pattern, you can tune the parameter so that your hot pages are not evicted.
InnoDB also provides innodb_old_blocks_time (default 1000 ms) to protect the hot region: a page read into the old part of the list is promoted to the hot part only when it is accessed again after staying there for at least that many milliseconds. Both parameters can be adjusted at runtime, as sketched below.
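A minimal sketch of inspecting and changing the two parameters; the values set here are their documented defaults, shown only to illustrate the syntax:

```sql
-- Current values of the midpoint-related parameters.
SHOW VARIABLES LIKE 'innodb_old_blocks%';

-- Both are dynamic; the values below are the documented defaults.
SET GLOBAL innodb_old_blocks_pct  = 37;    -- old sublist is about 3/8 of the LRU list
SET GLOBAL innodb_old_blocks_time = 1000;  -- ms a page must stay "old" before promotion
```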
InnoDB's runtime state can be inspected with SHOW ENGINE INNODB STATUS or, in newer versions, with SELECT * FROM information_schema.INNODB_BUFFER_POOL_STATS. The following is an example of the former:
=====================================
2020-06-08 19:49:26 0x48a8 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 59 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 729 srv_active, 0 srv_shutdown, 26208 srv_idle
srv_master_thread log flush and writes: 26937
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1468
OS WAIT ARRAY INFO: signal count 1465
RW-shared spins 0, rounds 2908, OS waits 1454
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 2908.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 3785630
Purge done for trx's n:o < 3785630 undo n:o < 0 state: running but idle
History list length 10
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 283213221646960, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283213221646088, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283213221645216, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283213221644344, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283213221643472, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283213221642600, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283213221641728, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283213221640856, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283213221639984, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: wait Windows aio (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: wait Windows aio (write thread)
I/O thread 9 state: wait Windows aio (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
1684 OS file reads, 11510 OS file writes, 6523 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.41 writes/s, 0.31 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 51, seg size 53, 0 merges
merged operations:
insert 0, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 0 buffer(s)
Hash table size 34679, node heap has 4 buffer(s)
1.15 hash searches/s, 1.07 non-hash searches/s
---
LOG
---
Log sequence number 820345990
Log flushed up to 820345990
Pages flushed up to 820345990
Last checkpoint at 820345981
0 pending log flushes, 0 pending chkp writes
4364 log i/o's done, 0.20 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137297920
Dictionary memory allocated 705343
Buffer pool size 8192
Free buffers 6976
Database pages 1207
Old database pages 461
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 1118, created 89, written 6385
0.00 reads/s, 0.00 creates/s, 0.17 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 1207, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=4284, Main thread ID=6668, state: sleeping
Number of rows inserted 1704, updated 1428, deleted 0, read 1006480
0.00 inserts/s, 0.07 updates/s, 0.00 deletes/s, 39.32 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
The Buffer pool size line shows 8192 pages in total, i.e. a buffer pool of 8192 * 16 KB = 128 MB. Note that Free buffers plus Database pages does not add up to Buffer pool size, because part of the buffer pool is also handed to the adaptive hash index, lock information, the insert buffer and so on, and those pages are not managed by the LRU list. Another figure of interest is Buffer pool hit rate, the buffer pool hit ratio; in this report it is 1000 / 1000, i.e. 100%, and as a rule of thumb it should not fall below 95%.
InnoDB has supported page compression since version 1.0.x (the unzip_LRU figures in the output above relate to compressed pages).
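The figures discussed above can also be read from information_schema.INNODB_BUFFER_POOL_STATS instead of parsing the monitor output; a minimal sketch, with column names as in MySQL 5.6+:

```sql
-- One row per buffer pool instance.
SELECT POOL_SIZE,                 -- total pages, here 8192 (8192 * 16 KB = 128 MB)
       FREE_BUFFERS,              -- pages on the free list
       DATABASE_PAGES,            -- pages on the LRU list
       OLD_DATABASE_PAGES,        -- pages in the old sublist
       MODIFIED_DATABASE_PAGES,   -- dirty pages
       HIT_RATE                   -- buffer pool hit rate
FROM information_schema.INNODB_BUFFER_POOL_STATS;
```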
- Flush List
Once a page on the LRU list has been modified it is called a dirty page: the version in the buffer pool no longer matches the one on disk, and the change has to be written back through the checkpoint mechanism. Every dirty page is put on the flush list while also remaining on the LRU list; Modified db pages in the output above is the current number of dirty pages (it can also be watched through a status counter, as sketched below).
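The dirty page count is exported as a status counter as well, so it can be monitored without parsing the report; a short sketch:

```sql
-- Current number of dirty pages and the cumulative number of pages flushed.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';
```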
Redo log buffer
SHOW VARIABLES LIKE 'innodb_log_buffer_size'
shows the size of the redo log buffer. Its contents are flushed to the redo log files in the following cases (see the sketch after this list):
- by the master thread, roughly once per second
- when a transaction commits
- when less than half of the redo log buffer is free
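A short sketch of checking the buffer size from the client; innodb_flush_log_at_trx_commit is an adjacent parameter, not covered above, that controls the flush performed at commit:

```sql
-- Redo log buffer size in bytes, and the per-commit flush behaviour.
SHOW VARIABLES LIKE 'innodb_log_buffer_size';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';  -- 1 = write and fsync at every commit
```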
Additional memory pool
In InnoDB, this memory is managed in a heap-like fashion. For example, each frame buffer in the buffer pool (innodb_buffer_pool) has a corresponding buffer control block that records information such as LRU position, locks and waiters; the memory for these control structures is allocated from the additional memory pool.
The checkpoint mechanism
To bridge the speed gap between CPU and disk, page operations are performed in the buffer pool: after a DML (Data Manipulation Language) statement runs, the affected page becomes a dirty page that eventually has to be written to disk. If every change were written to disk immediately, performance would be poor whenever updates concentrate on a few records, and a failure during the write could still lose data. For this reason most transactional databases adopt a write-ahead log strategy: when a transaction commits, the redo log is written first and the data pages are modified afterwards, which guarantees durability.
A checkpoint exists to mark how much of the redo log may be overwritten and to write the corresponding in-memory pages to disk. A checkpoint is forced when the buffer pool runs short of space or when the redo log would otherwise become unusable. Exactly when a checkpoint fires, which dirty pages are chosen and how many of them are flushed is a rather involved decision.
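How far the checkpoint lags behind the current log sequence number can be read from the LOG section above; as a sketch, assuming the log_lsn_* counters of information_schema.INNODB_METRICS are available (they may need to be enabled first), the same figures can also be queried directly:

```sql
-- Enable the LSN-related counters (wildcards are accepted), then read them.
SET GLOBAL innodb_monitor_enable = 'log_lsn%';
SELECT NAME, COUNT
FROM information_schema.INNODB_METRICS
WHERE NAME IN ('log_lsn_current', 'log_lsn_last_checkpoint');
-- In the LOG section above: 820345990 - 820345981 = 9 bytes of redo not yet checkpointed.
```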
In InnoDB there are two kinds of checkpoint:
- sharp checkpoint
- fuzzy checkpoint
A sharp checkpoint happens when the database is shut down: all dirty pages are flushed back to disk. This is the default shutdown behaviour, corresponding to innodb_fast_shutdown=1 (see the sketch below). While the database is running, however, InnoDB normally avoids flushing all dirty pages at once and instead flushes only a portion of them each time, which keeps performance up and is called a fuzzy checkpoint.
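innodb_fast_shutdown is dynamic, so the behaviour can be chosen right before shutting the server down; a minimal sketch:

```sql
-- 0 = slow shutdown (full purge and change buffer merge), 1 = default fast shutdown.
SHOW VARIABLES LIKE 'innodb_fast_shutdown';
SET GLOBAL innodb_fast_shutdown = 0;
```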
A fuzzy checkpoint occurs in one of four situations:
- master thread checkpoint
- flush_lru_list checkpoint
- async/sync flush checkpoint
- dirty page too much checkpoint
The master thread is implemented differently across versions, so we skip the details here. The master thread checkpoint flushes a portion of the dirty pages to disk asynchronously, roughly every second or every ten seconds, and other operations can still proceed while it runs.
flush_lru_list checkpoint: InnoDB needs to keep roughly 100 free pages available in the LRU list, so this checkpoint fires when free pages run short. Before MySQL 5.6 the check was performed in the user query thread and therefore blocked user operations; since 5.6 it is done by a dedicated page cleaner thread, and the number of pages to keep available is controlled by innodb_lru_scan_depth (default 1024, see the sketch below).
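A minimal sketch of inspecting and adjusting the page cleaner's scan depth; the value set here is just the documented default:

```sql
-- Number of LRU-list pages the page cleaner scans per buffer pool instance.
SHOW VARIABLES LIKE 'innodb_lru_scan_depth';
SET GLOBAL innodb_lru_scan_depth = 1024;
```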
async/sync flush checkpoint: fired when the redo log is about to become unusable; a portion of the dirty pages is flushed to disk.
dirty page too much checkpoint: fired when the share of dirty pages in the buffer pool exceeds innodb_max_dirty_pages_pct; a portion of the dirty pages is then flushed to disk. Before InnoDB 1.0.x the default for this parameter was 90, afterwards 75.
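A minimal sketch of checking and adjusting the threshold; the value set here is the post-1.0.x default mentioned above:

```sql
-- Dirty-page percentage at which InnoDB starts flushing more aggressively.
SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct%';
SET GLOBAL innodb_max_dirty_pages_pct = 75;
```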