Preparing a large file
Use the following Python script to generate a batch of SQL statements:
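# Generate large.sql: 997 INSERT statements wrapped in a single transaction.
with open('large.sql', 'w') as fp:
    fp.write('BEGIN TRANSACTION;\n')
    for i in range(3, 1000):
        # Columns are (id, dept, emp_id); emp_id points at the previous id.
        fp.write('insert into department values(%d, "test", %d);\n' % (i, i - 1))
    fp.write('END TRANSACTION;\n')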
Then import it into small.db:
PS G:\code-2\sqlite3> ./sqlite3 small.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> .read large.sql
B-tree analysis
The root page (page 2) of the department table has now changed from a leaf node to an interior node:
PS G:\code-2\sqlite3> ./showdb small.db 2bdCCC
Pagesize: 4096
Available pages: 1..7
Header on btree page 2:
000: 05 5 table interior node
001: 00 00 0 Offset to first freeblock
003: 00 04 4 Number of cells on this page
005: 0f e8 4072 Offset to cell content area
007: 00 0 Fragmented byte count
008: 00 00 00 07 7 Right child
Cell[0]:
ffa: 00 00 00 03 left child page:: 3
ffe: 82 07 rowid: 263
Cell[1]:
ff4: 00 00 00 04 left child page:: 4
ff8: 83 77 rowid: 503
Cell[2]:
fee: 00 00 00 05 left child page:: 5
ff2: 85 67 rowid: 743
Cell[3]:
fe8: 00 00 00 06 left child page:: 6
fec: 87 57 rowid: 983
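This shows that a table b-tree is a B+ tree: the interior node stores no row data, only rowid keys and child page numbers. The file is still not large enough, though, because there is only one interior node and every other page is a leaf.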
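To make the header fields in the dump above less magical, here is a minimal sketch of how they could be decoded by hand. The function name `parse_btree_header` and the dictionary keys are my own choices, not anything from showdb; the field layout follows the SQLite file format description.

```python
import struct

# Page type byte -> meaning, as listed in the SQLite file format.
PAGE_TYPES = {
    0x02: "index interior",
    0x05: "table interior",
    0x0A: "index leaf",
    0x0D: "table leaf",
}

def parse_btree_header(page: bytes) -> dict:
    """Decode the b-tree page header at offset 0 of `page`.

    Hypothetical helper. On page 1 the b-tree header starts at
    offset 100 (after the database header), which is not handled here.
    """
    page_type, first_freeblock, n_cells, content_start, frag = struct.unpack(
        ">BHHHB", page[:8]
    )
    header = {
        "type": PAGE_TYPES.get(page_type, "unknown"),
        "first_freeblock": first_freeblock,
        "n_cells": n_cells,
        # A stored value of 0 is interpreted as 65536.
        "content_start": content_start or 65536,
        "fragmented_bytes": frag,
    }
    # Only interior pages carry the 4-byte right-child pointer.
    if page_type in (0x02, 0x05):
        header["right_child"] = struct.unpack(">I", page[8:12])[0]
    return header

# The 12 header bytes of page 2 from the dump above.
sample = bytes.fromhex("05 00 00 00 04 0f e8 00 00 00 00 07")
print(parse_btree_header(sample))
```

Fed the twelve bytes from the dump, it reports a table interior page with 4 cells, a content area starting at 4072, and right child 7, matching what showdb printed.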
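I increased the data volume to ten times the original and, emmm, nothing changed: there was still only one interior node. So let's work out how many rows it takes to get more than one. According to the official documentation, the fan-out k is roughly the page size divided by the cell size. An interior cell is cheap: a 4-byte page number and a variable-length rowid, plus the 2-byte cell pointer, which comes to about 8 bytes at the current rowid range. The interior page header is 12 bytes, and a full leaf holds about 262 records...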
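The estimate above can be written out as a few lines of arithmetic. This is only a back-of-the-envelope sketch: the constant names are mine, the 262 rows per leaf is the figure observed for this particular table, and the 2-byte rowid varint is an assumption that holds only for rowids between 128 and 16383.

```python
PAGE_SIZE = 4096
INTERIOR_HEADER = 12   # b-tree header of an interior page
CELL_POINTER = 2       # one entry in the cell pointer array
CHILD_PAGE = 4         # left child page number
ROWID_VARINT = 2       # assumption: rowids in 128..16383 need 2 bytes
ROWS_PER_LEAF = 262    # observed for this table, not a general constant

def interior_fanout(page_size=PAGE_SIZE, rowid_bytes=ROWID_VARINT):
    """Approximate number of cells that fit on one interior table page."""
    cell_cost = CELL_POINTER + CHILD_PAGE + rowid_bytes
    return (page_size - INTERIOR_HEADER) // cell_cost

def rows_before_root_split(rows_per_leaf=ROWS_PER_LEAF):
    """Rough row count one interior root can address.

    The +1 accounts for the right-child pointer stored in the header.
    """
    return (interior_fanout() + 1) * rows_per_leaf

print(interior_fanout())          # cells per interior page
print(rows_before_root_split())   # rows before a second interior level
```

With these assumptions a single interior page holds about 510 cells, so the table needs on the order of 130,000 rows before the root has to split. Larger rowids use 3-byte varints, so the real threshold is somewhat lower.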
After a good deal of effort, the split finally happened.
At this point it is clear what the two kinds of table b-tree page are for. An interior node holds an array of {left child page number + rowid}, while a leaf node holds {rowid + payload (in record format)}.
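As a sanity check on the interior cell layout, the cells from the page 2 dump can be decoded by hand. The sketch below assumes the standard SQLite varint encoding (big-endian, 7 bits per byte, with the ninth byte contributing all 8 bits); the function names are my own.

```python
def decode_varint(buf: bytes, offset: int = 0):
    """Return (value, bytes_consumed) for the SQLite varint at buf[offset:].

    The value is returned unsigned; a 9-byte varint would need a
    two's-complement conversion to represent a negative rowid.
    """
    value = 0
    for i in range(8):
        byte = buf[offset + i]
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:          # high bit clear: this was the last byte
            return value, i + 1
    # A ninth byte, if reached, contributes all 8 of its bits.
    value = (value << 8) | buf[offset + 8]
    return value, 9

def decode_interior_cell(cell: bytes):
    """Split a table-interior cell into (left_child_page, rowid)."""
    left_child = int.from_bytes(cell[:4], "big")
    rowid, _ = decode_varint(cell, 4)
    return left_child, rowid

# Cell[0] of page 2: 00 00 00 03 | 82 07
print(decode_interior_cell(bytes.fromhex("000000038207")))
```

For Cell[0] this gives left child page 3 and rowid 263, the same values showdb reported.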
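Adding indexes
To avoid unnecessary noise, I originally created the table without a primary key, because a primary key implicitly creates an index named sqlite_autoindex_TABLE_N, which is inserted into sqlite_schema. Now add an index on the dept column: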
PS G:\code-2\sqlite3> ./sqlite3 small.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> .sch
CREATE TABLE department(id int, dept char(30), emp_id int);
CREATE INDEX depId_index on department(id);
sqlite> CREATE INDEX depName_index on department(dept);
sqlite> select * from sqlite_master;
table|department|department|2|CREATE TABLE department(id int, dept char(30), emp_id int)
index|depId_index|department|6333|CREATE INDEX depId_index on department(id)
index|depName_index|department|9263|CREATE INDEX depName_index on department(dept)
sqlite> .qu
The newly created index depName_index has root page 9263:
Analysing an index interior node
PS G:\code-2\sqlite3> ./showdb small.db 9263bdccc
Pagesize: 4096
Available pages: 1..13876
Header on btree page 9263:
000: 02 2 index interior node
001: 00 00 0 Offset to first freeblock
003: 00 19 25 Number of cells on this page
005: 0d f5 3573 Offset to cell content area
007: 00 0 Fragmented byte count
008: 00 00 35 d0 13776 Right child
Cell[0]:
feb: 00 00 24 ed left child page:: 9453
fef: 10 payload-size: 16
ff0: 03 record-header-size: 3
ff1: 21 typecode[0]: 33 - text(10)
ff2: 03 typecode[1]: 3 - int24
ff3: 74 65 73 74 31 33 31 34 35 data[0]: 'test131453'
ffd: 02 01 7b data[1]: 131451
Cell[1]:
fd6: 00 00 24 ee left child page:: 9454
fda: 10 payload-size: 16
fdb: 03 record-header-size: 3
fdc: 21 typecode[0]: 33 - text(10)
fdd: 03 typecode[1]: 3 - int24
fde: 74 65 73 74 31 36 37 31 32 data[0]: 'test167126'
fe8: 02 8c d4 data[1]: 167124
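Here the key is dept, and it maps to the rowid, which is data[1]. Recall that a rowid showed up in every table b-tree page: the original table effectively already has an index, and that index is the rowid. Creating a new index just builds an association between the indexed column and the corresponding rowid.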
It is also visible that index pages use a B-tree (the interior cells carry the key payload themselves), which differs from table b-tree pages.
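The typecode lines in the dump follow the record format's serial type rules, which can be sketched as a small lookup. The function name `serial_type_info` and the kind labels are my own; the mapping itself is the one given in the SQLite file format description.

```python
def serial_type_info(code: int):
    """Return (kind, size_in_bytes) for a record-format serial type code."""
    fixed = {
        0: ("null", 0), 1: ("int8", 1), 2: ("int16", 2), 3: ("int24", 3),
        4: ("int32", 4), 5: ("int48", 6), 6: ("int64", 8), 7: ("float64", 8),
        8: ("zero", 0), 9: ("one", 0),   # constants 0 and 1 take no data bytes
    }
    if code in fixed:
        return fixed[code]
    if code in (10, 11):
        raise ValueError("serial types 10 and 11 are reserved")
    if code % 2 == 0:
        return ("blob", (code - 12) // 2)
    return ("text", (code - 13) // 2)

# Cell[0] of page 9263: typecodes 0x21 and 0x03, then the rowid bytes 02 01 7b.
print(serial_type_info(0x21))                                    # dept column
print(serial_type_info(0x03))                                    # rowid column
print(int.from_bytes(bytes.fromhex("02017b"), "big", signed=True))
```

Typecode 33 decodes to a 10-byte text value and the three rowid bytes decode to 131451, which is the rowid queried in the session that follows.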
PS G:\code-2\sqlite3> ./sqlite3 small.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> select * from department where rowid = 131451;
131453|test131453|131452
sqlite>
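Note: if a table is declared WITHOUT ROWID, it consists only of index pages and must have a primary key. On disk the primary key is stored first, followed by the other columns. Any further index created on such a table maps to the primary key and uses it to locate the row indirectly.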
PS G:\code-2\sqlite3> ./sqlite3 test2.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> create table department(id int primary key, dept char(30), emp_id int) without rowid;
sqlite> insert into department values(1, "test", -1);
sqlite> insert into department values(2, "test", 1);
sqlite> .qu
PS G:\code-2\sqlite3> ./showdb test2.db 2bdccc
Pagesize: 4096
Available pages: 1..2
Header on btree page 2:
000: 0a 10 index leaf
001: 00 00 0 Offset to first freeblock
003: 00 02 2 Number of cells on this page
005: 0f ec 4076 Offset to cell content area
007: 00 0 Fragmented byte count
Cell[0]:
ff6: 09 payload-size: 9
ff7: 04 record-header-size: 4
ff8: 09 typecode[0]: 9 - one
ff9: 15 typecode[1]: 21 - text(4)
ffa: 01 typecode[2]: 1 - int8
ffb: 74 65 73 74 data[1]: 'test'
fff: ff data[2]: -1
Cell[1]:
fec: 09 payload-size: 9
fed: 04 record-header-size: 4
fee: 01 typecode[0]: 1 - int8
fef: 15 typecode[1]: 21 - text(4)
ff0: 09 typecode[2]: 9 - one
ff1: 02 data[0]: 2
ff2: 74 65 73 74 data[1]: 'test'
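Fragments, freeblocks, and free pages
Fragments: an UPDATE can grow or shrink the content of a record. Shrinking is the ordinary case, but what happens when it grows? The details probably require reading the source code; my guess is that it first checks the freeblocks, then the fragmented byte count, and only then decides whether to move the cells around as a whole.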
PS G:\code-2\sqlite3> ./showdb test2.db 2bdccc
Pagesize: 4096
Available pages: 1..5
Header on btree page 2:
000: 0a 10 index leaf
001: 0f ec 4076 Offset to first freeblock
003: 00 02 2 Number of cells on this page
005: 0d eb 3563 Offset to cell content area
007: 03 3 Fragmented byte count
Cell[0]:
fdd: 0e payload-size: 14
fde: 04 record-header-size: 4
fdf: 01 typecode[0]: 1 - int8
fe0: 19 typecode[1]: 25 - text(6)
fe1: 03 typecode[2]: 3 - int24
fe2: 03 data[0]: 3
fe3: 74 65 73 74 33 33 data[1]: 'test33'
fe9: 00 82 35 data[2]: 33333
Cell[1]:
deb: 8f 5b payload-size: 2011 (489 local, 1522 overflow)
ded: 05 record-header-size: 5
dee: 03 typecode[0]: 3 - int24
def: 9f 2d typecode[1]: 4013 - text(2000)
df1: 03 typecode[2]: 3 - int24
df2: 06 f7 49 data[0]: 456521
fd6: 00 00 00 05 overflow-page: 5
PS G:\code-2\sqlite3> ./sqlite3 test2.db
sqlite> update department set dept='testlonglonglonglonglong' where id = 3;
sqlite> .qu
PS G:\code-2\sqlite3> ./showdb test2.db 2bdccc
Pagesize: 4096
Available pages: 1..5
Header on btree page 2:
000: 0a 10 index leaf
001: 00 00 0 Offset to first freeblock
003: 00 02 2 Number of cells on this page
005: 0d eb 3563 Offset to cell content area
007: 05 5 Fragmented byte count
Cell[0]:
fdf: 20 payload-size: 32
fe0: 04 record-header-size: 4
fe1: 01 typecode[0]: 1 - int8
fe2: 3d typecode[1]: 61 - text(24)
fe3: 03 typecode[2]: 3 - int24
fe4: 03 data[0]: 3
fe5: 74 65 73 74 6c 6f 6e 67 6c data[1]: 'testlonglonglonglonglon...'
ffd: 00 82 35 data[2]: 33333
Cell[1]:
deb: 8f 5b payload-size: 2011 (489 local, 1522 overflow)
ded: 05 record-header-size: 5
dee: 03 typecode[0]: 3 - int24
def: 9f 2d typecode[1]: 4013 - text(2000)
df1: 03 typecode[2]: 3 - int24
df2: 06 f7 49 data[0]: 456521
df5: ... 481 bytes of content ...
fd6: 00 00 00 05 overflow-page: 5
Freeblocks: deleting a record that is not the topmost one (the cell at the start of the cell content area) leaves a free block behind.
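Freeblocks on a page form a singly linked list that starts at the "Offset to first freeblock" header field. A minimal sketch of walking that list, assuming the layout from the SQLite file format (each freeblock begins with a 2-byte offset of the next freeblock and a 2-byte size that includes this 4-byte header); `walk_freeblocks` is a name I made up, and the demo page is synthetic.

```python
def walk_freeblocks(page: bytes):
    """Return [(offset, size), ...] for every freeblock on a b-tree page.

    Hypothetical helper; assumes the b-tree header starts at offset 0,
    so it does not apply to page 1 as-is.
    """
    blocks = []
    offset = int.from_bytes(page[1:3], "big")   # header: first freeblock
    while offset != 0:
        next_offset = int.from_bytes(page[offset:offset + 2], "big")
        size = int.from_bytes(page[offset + 2:offset + 4], "big")
        blocks.append((offset, size))
        # Freeblocks are chained in ascending order; stop on anything else
        # rather than loop forever on a corrupt page.
        if next_offset != 0 and next_offset <= offset:
            break
        offset = next_offset
    return blocks

# Synthetic page with two freeblocks, built only for illustration.
demo = bytearray(4096)
demo[0] = 0x0A                                   # index leaf
demo[1:3] = (4000).to_bytes(2, "big")            # first freeblock at 4000
demo[4000:4004] = (4050).to_bytes(2, "big") + (20).to_bytes(2, "big")
demo[4050:4054] = (0).to_bytes(2, "big") + (10).to_bytes(2, "big")
print(walk_freeblocks(bytes(demo)))
```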
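Free pages: these appear when so many records are deleted that entire pages become empty. Going back to the large file from before, small.db, now delete a large amount of data: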
PS G:\code-2\sqlite3> ./sqlite3 small.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> delete from department where id >= 10001;
sqlite> .qu
PS G:\code-2\sqlite3> ./showdb small.db dbheader
Pagesize: 4096
Available pages: 1..13876
000: 53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00 SQLite format 3.
010: 10 00 01 01 00 40 20 20 00 00 03 ed 00 00 36 34 .....@ ......64
020: 00 00 35 72 00 00 35 cf 00 00 00 03 00 00 00 04 ..5r..5.........
030: 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 ................
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 ed ................
060: 00 2e 34 20 00 ..4 .
Decoded:
010: 10 00 4096 Database page size
012: 01 1 File format write version
013: 01 1 File format read version
014: 00 0 Reserved space at end of page
018: 00 00 03 ed 1005 File change counter
01c: 00 00 36 34 13876 Size of database in pages
020: 00 00 35 72 13682 Page number of first freelist page
024: 00 00 35 cf 13775 Number of freelist pages
028: 00 00 00 03 3 Schema cookie
02c: 00 00 00 04 4 Schema format version
030: 00 00 00 00 0 Default page cache size
034: 00 00 00 00 0 Largest auto-vac root page
038: 00 00 00 01 1 Text encoding
03c: 00 00 00 00 0 User version
040: 00 00 00 00 0 Incremental-vacuum mode
044: 00 00 00 00 0 Application ID
048: 00 00 00 00 0 meta[8]
04c: 00 00 00 00 0 meta[9]
050: 00 00 00 00 0 meta[10]
054: 00 00 00 00 0 meta[11]
058: 00 00 00 00 0 meta[12]
05c: 00 00 03 ed 1005 Change counter for version number
060: 00 2e 34 20 3028000 SQLite version number
PS G:\code-2\sqlite3>
Walk the linked list of freelist trunk pages:
PS G:\code-2\sqlite3> ./showdb small.db 13682tr
Pagesize: 4096
Available pages: 1..13876
Decode of freelist trunk page 13682:
000: 00 00 34 0b 13323 Next freelist trunk page
004: 00 00 02 29 553 Number of entries on this page
Decode of freelist trunk page 13323:
000: 00 00 22 20 8736 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 8736:
000: 00 00 12 84 4740 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 4740:
000: 00 00 2f d6 12246 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 12246:
000: 00 00 1f b1 8113 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 8113:
000: 00 00 0d 3e 3390 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 3390:
000: 00 00 2b a0 11168 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 11168:
000: 00 00 1d 43 7491 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 7491:
000: 00 00 1c 73 7283 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 7283:
000: 00 00 27 6b 10091 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 10091:
000: 00 00 26 05 9733 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 9733:
000: 00 00 24 9e 9374 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 9374:
000: 00 00 00 2e 46 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Decode of freelist trunk page 46:
000: 00 00 00 00 0 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
Look at the free pages recorded in one particular trunk page:
PS G:\code-2\sqlite3> ./showdb small.db 46td
Pagesize: 4096
Available pages: 1..13876
Decode of freelist trunk page 46:
000: 00 00 00 00 0 Next freelist trunk page
004: 00 00 03 f8 1016 Number of entries on this page
[0] 47 [1] 6360 [2] 48 [3] 49 [4] 6361
[5] 50 [6] 51 [7] 52 [8] 53 [9] 54
[10] 6363 [11] 55 [12] 56 [13] 6364 [14] 57
[15] 58 [16] 6365 [17] 59 [18] 60 [19] 61
[20] 6366 [21] 62 [22] 63 [23] 6367 [24] 64
[25] 65 [26] 6368 [27] 66 [28] 67 [29] 6369
[30] 68 [31] 69 [32] 6370 [33] 70 [34] 71
[35] 72 [36] 6371 [37] 73 [38] 74 [39] 6372
[40] 75 [41] 76 [42] 6373 [43] 77 [44] 78
[45] 6374 [46] 79 [47] 80 [48] 6375 [49] 81
[50] 82 [51] 83 [52] 6376 [53] 84 [54] 85
[55] 6377 [56] 86 [57] 87 [58] 6378 [59] 88
[60] 89 [61] 90 [62] 6379 [63] 91 [64] 92
[65] 6380 [66] 93 [67] 94 [68] 6381 [69] 95
[70] 96 [71] 97 [72] 6382 [73] 98 [74] 99
[75] 6383 [76] 100 [77] 101 [78] 6384 [79] 102
[80] 103 [81] 104 [82] 6385 [83] 105 [84] 106
[85] 6386 [86] 107 [87] 108 [88] 6387 [89] 109
[90] 110 [91] 111 [92] 6388 [93] 112 [94] 113
[95] 6389 [96] 114 [97] 115 [98] 6390 [99] 116
However, the data inside these pages may not have been wiped.
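The same freelist walk can be sketched in a few lines, reading the two header fields at offsets 32 and 36 and then following the trunk pages. `read_freelist` is a hypothetical helper that takes the whole database image as bytes; the demo image below is synthetic, not small.db.

```python
import struct

def read_freelist(db: bytes):
    """Return (declared_count, [page numbers]) for the freelist of `db`."""
    page_size = struct.unpack(">H", db[16:18])[0]
    if page_size == 1:                 # the value 1 encodes 65536
        page_size = 65536
    trunk, declared = struct.unpack(">II", db[32:40])
    pages, seen = [], set()
    while trunk != 0 and trunk not in seen:   # `seen` guards against cycles
        seen.add(trunk)
        base = (trunk - 1) * page_size        # pages are numbered from 1
        next_trunk, n_leaves = struct.unpack(">II", db[base:base + 8])
        leaves = struct.unpack(">%dI" % n_leaves,
                               db[base + 8:base + 8 + 4 * n_leaves])
        pages.append(trunk)                   # the trunk itself is free too
        pages.extend(leaves)
        trunk = next_trunk
    return declared, pages

# Synthetic 4-page image with 512-byte pages: trunk page 2 lists leaves 3 and 4.
img = bytearray(512 * 4)
img[16:18] = struct.pack(">H", 512)
img[32:40] = struct.pack(">II", 2, 3)
img[512:528] = struct.pack(">IIII", 0, 2, 3, 4)
print(read_freelist(bytes(img)))
```

The declared count includes the trunk pages themselves. In the dump above, 14 trunk pages plus 553 + 13 × 1016 leaf entries add up to exactly the 13775 freelist pages reported in the header.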
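Overflow pages
An overflow page is used when the payload for a given key is too large to fit in one page. The calculation is a bit involved; see this function in showdb.c: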
/*
** Compute the local payload size given the total payload size and
** the page size.
*/
static i64 localPayload(i64 nPayload, char cType){
  i64 maxLocal;
  i64 minLocal;
  i64 surplus;
  i64 nLocal;
  if( cType==13 ){
    /* Table leaf */
    maxLocal = pagesize-35;
    minLocal = (pagesize-12)*32/255-23;
  }else{
    maxLocal = (pagesize-12)*64/255-23;
    minLocal = (pagesize-12)*32/255-23;
  }
  if( nPayload>maxLocal ){
    surplus = minLocal + (nPayload-minLocal)%(pagesize-4);
    if( surplus<=maxLocal ){
      nLocal = surplus;
    }else{
      nLocal = minLocal;
    }
  }else{
    nLocal = nPayload;
  }
  return nLocal;
}
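Here the page header is taken at its maximum size, 12 bytes, and the per-cell header is 23 bytes long.
35 = 12 + 23; // That seems to be the idea, though I am not sure exactly how the cell part is calculated.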
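To check the formula against the numbers showdb prints, here is a direct Python port of localPayload. It is only a sketch: it assumes the usable size equals the page size (no reserved bytes, as in the header dump above), and `local_payload` is my own name for it.

```python
def local_payload(n_payload: int, page_type: int, page_size: int = 4096) -> int:
    """Bytes of an n_payload-byte payload that stay on the b-tree page.

    page_type 13 is a table leaf; anything else is treated as an index page,
    mirroring the cType check in showdb.c.
    """
    min_local = (page_size - 12) * 32 // 255 - 23
    if page_type == 13:
        max_local = page_size - 35
    else:
        max_local = (page_size - 12) * 64 // 255 - 23
    if n_payload <= max_local:
        return n_payload                 # fits entirely, no overflow page
    surplus = min_local + (n_payload - min_local) % (page_size - 4)
    return surplus if surplus <= max_local else min_local

# A 2011-byte payload on an index leaf (page type 10) with 4096-byte pages.
local = local_payload(2011, 10)
print(local, 2011 - local)
```

For a 2011-byte payload on an index leaf this yields 489 local bytes and 1522 overflow bytes, which is exactly the split showdb reports for the 2000-character row.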
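So let's try inserting about 2 KB of data into the index, since the largest payload an index cell may keep locally is only about 25% of the page (64/255), i.e. roughly 1 KB. First prepare the SQL statement: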
# Write a single INSERT whose dept value is "1000" + "1001" + ... + "1499" (2000 characters).
with open('large.sql', 'w') as fp:
    s = ''
    for i in range(1000, 1500):
        s += '%d' % i
    fp.write('insert into department values(%d, "%s", %d);\n' % (456521, s, 456520))
Then import it into test2.db and query:
PS G:\code-2\sqlite3> ./sqlite3 .\test2.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> .sch
CREATE TABLE department(id int primary key, dept char(30), emp_id int) without rowid;
CREATE INDEX depName_index on department(dept);
sqlite> .read large.sql
sqlite> select * from department;
3|test33|33333
456521|1000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126212631264126512661267126812691270127112721273127412751276127712781279128012811282128312841285128612871288128912901291129212931294129512961297129812991300130113021303130413051306130713081309131013111312131313141315131613171318131913201321132213231324132513261327132813291330133113321333133413351336133713381339134013411342134313441345134613471348134913501351135213531354135513561357135813591360136113621363136413651366136713681369137013711372137313741375137613771378137913801381138213831384138513861387138813891390139113921393139413951396139713981399140014011402140314041405140614071408140914101411141214131414141514161417141814191420142114221423142414251426142714281429143014311432143314341435143614371438143914401441144214431444144514461447144814491450145114521453145414551456145714581459146014611462146314641465146614671468146914701471147214731474147514761477147814791480148114821483148414851486148714881489149014911492149314941495149614971
4981499|456520
sqlite> select length(dept) from department where id = 456521;
2000
sqlite> .qu
Inspecting page 2 shows that the cell has overflowed:
PS G:\code-2\sqlite3> ./showdb test2.db 2bdccc
Pagesize: 4096
Available pages: 1..5
Header on btree page 2:
000: 0a 10 index leaf
001: 0f ec 4076 Offset to first freeblock
003: 00 02 2 Number of cells on this page
005: 0d eb 3563 Offset to cell content area
007: 03 3 Fragmented byte count
Cell[0]:
fdd: 0e payload-size: 14
fde: 04 record-header-size: 4
fdf: 01 typecode[0]: 1 - int8
fe0: 19 typecode[1]: 25 - text(6)
fe1: 03 typecode[2]: 3 - int24
fe2: 03 data[0]: 3
fe3: 74 65 73 74 33 33 data[1]: 'test33'
fe9: 00 82 35 data[2]: 33333
Cell[1]:
deb: 8f 5b payload-size: 2011 (489 local, 1522 overflow)
ded: 05 record-header-size: 5
dee: 03 typecode[0]: 3 - int24
def: 9f 2d typecode[1]: 4013 - text(2000)
df1: 03 typecode[2]: 3 - int24
df2: 06 f7 49 data[0]: 456521
df5: ... 481 bytes of content ...
fd6: 00 00 00 05 overflow-page: 5
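This is a little odd. SQLite derives both the total length of the string and the length stored on this page purely by calculation, and the overflow page number is an optional field placed after the local content. That greatly restricts how a payload fraction could be implemented: the lengths are in effect hard-coded results of the formula, so the scheme is not extensible.
The structure of an overflow page is much simpler: next-page-number + content. In other words, it is a linked list.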
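Following that linked list is straightforward. Below is a minimal sketch that reassembles the overflow part of a payload from a database image held in memory; `read_overflow_chain` is a hypothetical helper, it assumes no reserved bytes at the end of each page, and the demo image uses a tiny made-up page size purely to keep the example short.

```python
def read_overflow_chain(db: bytes, first_page: int, n_bytes: int,
                        page_size: int = 4096) -> bytes:
    """Collect n_bytes of overflow content starting at page `first_page`."""
    out = bytearray()
    page = first_page
    while page != 0 and len(out) < n_bytes:
        base = (page - 1) * page_size                    # pages are 1-based
        next_page = int.from_bytes(db[base:base + 4], "big")
        take = min(page_size - 4, n_bytes - len(out))    # 4 bytes go to the link
        out += db[base + 4:base + 4 + take]
        page = next_page
    return bytes(out)

# Synthetic image: 16-byte "pages", overflow chain 2 -> 3 carrying 15 bytes.
img = bytearray(16 * 3)
img[16:20] = (3).to_bytes(4, "big")      # page 2 links to page 3
img[20:32] = b"ABCDEFGHIJKL"             # 12 content bytes on page 2
img[32:36] = (0).to_bytes(4, "big")      # page 3 ends the chain
img[36:39] = b"MNO"                      # remaining 3 bytes
print(read_overflow_chain(bytes(img), 2, 15, page_size=16))
```

For the row above, the call would start at overflow page 5 and ask for 1522 bytes; appending the result to the 489 local bytes gives back the full 2011-byte payload.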