Large File Analysis (Part 1)

Preparing a large file

Use the following Python script to generate a batch of SQL statements:

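# Generate INSERT statements wrapped in a single transaction.
with open('large.sql', 'w') as fp:
    fp.write('BEGIN TRANSACTION;\n')
    for i in range(3, 1000):
        # Single quotes are the standard SQL string-literal syntax.
        fp.write("insert into department values(%d, 'test', %d);\n" % (i, i - 1))
    fp.write('END TRANSACTION;\n')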

Then import it into small.db:

PS G:\code-2\sqlite3> ./sqlite3 small.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> .read large.sql

B-tree analysis

At this point the root page of the department table (page 2) has changed from a leaf page to an interior page:

PS G:\code-2\sqlite3> ./showdb small.db 2bdCCC
Pagesize: 4096
Available pages: 1..7
Header on btree page 2:
 000: 05                    5  table interior node
 001: 00 00                 0  Offset to first freeblock
 003: 00 04                 4  Number of cells on this page
 005: 0f e8              4072  Offset to cell content area
 007: 00                    0  Fragmented byte count
 008: 00 00 00 07           7  Right child
Cell[0]:
 ffa: 00 00 00 03                left child page:: 3
 ffe: 82 07                      rowid: 263
Cell[1]:
 ff4: 00 00 00 04                left child page:: 4
 ff8: 83 77                      rowid: 503
Cell[2]:
 fee: 00 00 00 05                left child page:: 5
 ff2: 85 67                      rowid: 743
Cell[3]:
 fe8: 00 00 00 06                left child page:: 6
 fec: 87 57                      rowid: 983

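This shows that a table b-tree is a B+ tree: interior nodes store no row data, only keys and child pointers. The file is still not large enough, though, because there is only one interior node and every other page is a leaf.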

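Increasing the data tenfold changed nothing: there was still only one interior node. So let's work out how much data is needed to produce more than one. According to the official documentation, the fan-out k is roughly the page size divided by the cell size. An interior cell is cheap: a 4-byte page number and a variable-length rowid, plus 2 bytes for the cell pointer, which comes to 8 bytes at the moment. The page header takes 12 bytes, and a leaf holds 262 records when full...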
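The estimate above can be sketched as a few lines of arithmetic. This is a rough calculation, not SQLite's actual split logic: it assumes every rowid fits in a 2-byte varint (rowids above 16383 need 3 bytes, so the real fan-out is somewhat lower), and it reuses the 262 rows-per-leaf figure observed for this particular table. The constant names are made up for illustration.

```python
PAGE_SIZE = 4096        # from the showdb output ("Pagesize: 4096")
INTERIOR_HEADER = 12    # header size of an interior b-tree page
CELL_BYTES = 4 + 2 + 2  # child page number + 2-byte varint rowid (assumed) + cell pointer
ROWS_PER_LEAF = 262     # figure observed above for this table

# How many cells fit on one interior page.
cells_per_interior = (PAGE_SIZE - INTERIOR_HEADER) // CELL_BYTES

# Each cell has a left child; the header adds one more (the right child).
children_per_interior = cells_per_interior + 1

# Rough number of rows one interior root can address before it must split.
rows_before_root_splits = children_per_interior * ROWS_PER_LEAF

print(cells_per_interior, children_per_interior, rows_before_root_splits)
```

Under these assumptions a single interior page can address on the order of 130,000 rows, which would explain why a tenfold increase (about 10,000 rows) was nowhere near enough.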

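After some effort, the split finally happened.
By now the purpose of the two kinds of table b-tree page is clear. An interior node stores an array of {rowid + left child page number}, while a leaf node stores {rowid + payload (that is, the record format)}.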
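As a minimal sketch of the interior cell layout just described, the following decodes the four cells from the page 2 dump above. The function names are my own, not part of showdb, and the rowid is treated as unsigned for simplicity.

```python
def read_varint(buf, pos=0):
    """Decode a SQLite varint at buf[pos]; return (value, bytes_used)."""
    value = 0
    for i in range(8):
        byte = buf[pos + i]
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:          # high bit clear: this was the last byte
            return value, i + 1
    # A ninth byte, if present, contributes all 8 of its bits.
    value = (value << 8) | buf[pos + 8]
    return value, 9


def parse_table_interior_cell(cell):
    """Return (left_child_page, rowid) for a table interior cell."""
    left_child = int.from_bytes(cell[:4], 'big')
    rowid, _ = read_varint(cell, 4)
    return left_child, rowid


# Raw cell bytes copied from the showdb dump of page 2.
raw_cells = ['000000038207', '000000048377', '000000058567', '000000068757']
parsed = [parse_table_interior_cell(bytes.fromhex(h)) for h in raw_cells]
for left_child, rowid in parsed:
    print(left_child, rowid)
```

Each pair reads as "rows with rowid up to this value live under this child page"; anything larger than the last key goes to the right child recorded in the page header.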

Adding an index

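To avoid unnecessary noise, I did not declare a primary key when I first created the table, because that implicitly creates an index named sqlite_autoindex_TABLE_N and records it in sqlite_schema. Now add an index on the dept column: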

PS G:\code-2\sqlite3> ./sqlite3 small.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> .sch
CREATE TABLE department(id int, dept char(30), emp_id int);
CREATE INDEX depId_index on department(id);
sqlite> CREATE INDEX depName_index on department(dept);
sqlite> select * from sqlite_master;
table|department|department|2|CREATE TABLE department(id int, dept char(30), emp_id int)
index|depId_index|department|6333|CREATE INDEX depId_index on department(id)
index|depName_index|department|9263|CREATE INDEX depName_index on department(dept)
sqlite> .qu

The newly created index depName_index has root page 9263:

Index interior node analysis

PS G:\code-2\sqlite3> ./showdb small.db 9263bdccc
Pagesize: 4096
Available pages: 1..13876
Header on btree page 9263:
 000: 02                    2  index interior node
 001: 00 00                 0  Offset to first freeblock
 003: 00 19                25  Number of cells on this page
 005: 0d f5              3573  Offset to cell content area
 007: 00                    0  Fragmented byte count
 008: 00 00 35 d0       13776  Right child
Cell[0]:
 feb: 00 00 24 ed                left child page:: 9453
 fef: 10                         payload-size: 16
 ff0: 03                         record-header-size: 3
 ff1: 21                         typecode[0]: 33 - text(10)
 ff2: 03                         typecode[1]: 3 - int24
 ff3: 74 65 73 74 31 33 31 34 35 data[0]: 'test131453'
 ffd: 02 01 7b                   data[1]: 131451
Cell[1]:
 fd6: 00 00 24 ee                left child page:: 9454
 fda: 10                         payload-size: 16
 fdb: 03                         record-header-size: 3
 fdc: 21                         typecode[0]: 33 - text(10)
 fdd: 03                         typecode[1]: 3 - int24
 fde: 74 65 73 74 31 36 37 31 32 data[0]: 'test167126'
 fe8: 02 8c d4                   data[1]: 167124

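Here the key is dept, and it maps to the rowid, which is data[1]. Recall that a rowid always appears in table b-tree pages: the original table effectively already has an index, and that index is the rowid. Creating a new index simply builds an association between the indexed column and the corresponding rowid.
In addition, index pages evidently form a B-tree (interior cells carry the key payload itself), which differs from table b-tree pages.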

PS G:\code-2\sqlite3> ./sqlite3 small.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> select * from department where rowid = 131451;
131453|test131453|131452
sqlite>
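The typecode lines in the index cell dump earlier in this section follow the record format's serial type rules. Here is a small sketch of that mapping; the function name is my own, and the table reflects the serial types as I understand them from the file format documentation.

```python
def serial_type(code):
    """Map a record serial type code to (description, content length in bytes)."""
    fixed = {
        0: ('NULL', 0), 1: ('int8', 1), 2: ('int16', 2), 3: ('int24', 3),
        4: ('int32', 4), 5: ('int48', 6), 6: ('int64', 8), 7: ('float64', 8),
        8: ('constant 0', 0), 9: ('constant 1', 0),
    }
    if code in fixed:
        return fixed[code]
    if code in (10, 11):
        raise ValueError('serial types 10 and 11 are reserved')
    if code % 2 == 0:
        return ('blob', (code - 12) // 2)   # even codes >= 12
    return ('text', (code - 13) // 2)       # odd codes >= 13


# Codes taken from the dumps in this article.
print(serial_type(33))    # typecode[0] of the index cell
print(serial_type(3))     # typecode[1] of the index cell

# data[1] of Cell[0] is a big-endian 24-bit integer: the rowid.
rowid = int.from_bytes(bytes.fromhex('02017b'), 'big', signed=True)
print(rowid)
```

Types 8 and 9 occupy no data bytes at all, which is why a dump can list a typecode of 9 without a matching data line.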

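Note: if a table is declared WITHOUT ROWID, it consists only of index b-tree pages and must have a primary key. On disk the primary key is stored first, followed by the other columns. Any further index created on such a table likewise maps to the primary key, which is then used to find the row indirectly.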

PS G:\code-2\sqlite3> ./sqlite3 test2.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> create table department(id int primary key, dept char(30), emp_id int) without rowid;
sqlite> insert into department values(1, "test", -1);
sqlite> insert into department values(2, "test", 1);
sqlite> .qu

PS G:\code-2\sqlite3> ./showdb test2.db 2bdccc
Pagesize: 4096
Available pages: 1..2
Header on btree page 2:
 000: 0a                   10  index leaf
 001: 00 00                 0  Offset to first freeblock
 003: 00 02                 2  Number of cells on this page
 005: 0f ec              4076  Offset to cell content area
 007: 00                    0  Fragmented byte count
Cell[0]:
 ff6: 09                         payload-size: 9
 ff7: 04                         record-header-size: 4
 ff8: 09                         typecode[0]: 9 - one
 ff9: 15                         typecode[1]: 21 - text(4)
 ffa: 01                         typecode[2]: 1 - int8
 ffb: 74 65 73 74                data[1]: 'test'
 fff: ff                         data[2]: -1
Cell[1]:
 fec: 09                         payload-size: 9
 fed: 04                         record-header-size: 4
 fee: 01                         typecode[0]: 1 - int8
 fef: 15                         typecode[1]: 21 - text(4)
 ff0: 09                         typecode[2]: 9 - one
 ff1: 02                         data[0]: 2
 ff2: 74 65 73 74                data[1]: 'test'
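The two WITHOUT ROWID properties noted above (a primary key is mandatory, and there is no rowid to fall back on) can be checked quickly with Python's built-in sqlite3 module. This is a sketch against an in-memory database; the table name `bad` is made up, and the exact error messages depend on the SQLite version, so the code only records them.

```python
import sqlite3

con = sqlite3.connect(':memory:')

# A WITHOUT ROWID table with no PRIMARY KEY should be rejected.
try:
    con.execute('CREATE TABLE bad(id int, dept char(30)) WITHOUT ROWID')
    missing_pk_error = None
except sqlite3.OperationalError as exc:
    missing_pk_error = str(exc)

# The same schema as test2.db in this article.
con.execute('CREATE TABLE department(id int PRIMARY KEY, dept char(30), '
            'emp_id int) WITHOUT ROWID')
con.execute("INSERT INTO department VALUES (1, 'test', -1)")

# There is no hidden rowid column to select.
try:
    con.execute('SELECT rowid FROM department')
    rowid_error = None
except sqlite3.OperationalError as exc:
    rowid_error = str(exc)

print(missing_pk_error)
print(rowid_error)
```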

Fragments, freeblocks, and free pages

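Fragments: an UPDATE can shrink or grow the stored content. Shrinking is unremarkable, but what about growing? The details probably require reading the code. My guess is that SQLite first checks the freeblocks, then the fragmented byte count, and finally relocates things on the page depending on what it finds.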

PS G:\code-2\sqlite3> ./showdb test2.db 2bdccc
Pagesize: 4096
Available pages: 1..5
Header on btree page 2:
 000: 0a                   10  index leaf
 001: 0f ec              4076  Offset to first freeblock
 003: 00 02                 2  Number of cells on this page
 005: 0d eb              3563  Offset to cell content area
 007: 03                    3  Fragmented byte count
Cell[0]:
 fdd: 0e                         payload-size: 14
 fde: 04                         record-header-size: 4
 fdf: 01                         typecode[0]: 1 - int8
 fe0: 19                         typecode[1]: 25 - text(6)
 fe1: 03                         typecode[2]: 3 - int24
 fe2: 03                         data[0]: 3
 fe3: 74 65 73 74 33 33          data[1]: 'test33'
 fe9: 00 82 35                   data[2]: 33333
Cell[1]:
 deb: 8f 5b                      payload-size: 2011 (489 local, 1522 overflow)
 ded: 05                         record-header-size: 5
 dee: 03                         typecode[0]: 3 - int24
 def: 9f 2d                      typecode[1]: 4013 - text(2000)
 df1: 03                         typecode[2]: 3 - int24
 df2: 06 f7 49                   data[0]: 456521
 fd6: 00 00 00 05                overflow-page: 5


PS G:\code-2\sqlite3> ./sqlite3 test2.db
sqlite> update department set dept='testlonglonglonglonglong' where id = 3;
sqlite> .qu
PS G:\code-2\sqlite3> ./showdb test2.db 2bdccc
Pagesize: 4096
Available pages: 1..5
Header on btree page 2:
 000: 0a                   10  index leaf
 001: 00 00                 0  Offset to first freeblock
 003: 00 02                 2  Number of cells on this page
 005: 0d eb              3563  Offset to cell content area
 007: 05                    5  Fragmented byte count
Cell[0]:
 fdf: 20                         payload-size: 32
 fe0: 04                         record-header-size: 4
 fe1: 01                         typecode[0]: 1 - int8
 fe2: 3d                         typecode[1]: 61 - text(24)
 fe3: 03                         typecode[2]: 3 - int24
 fe4: 03                         data[0]: 3
 fe5: 74 65 73 74 6c 6f 6e 67 6c data[1]: 'testlonglonglonglonglon...'
 ffd: 00 82 35                   data[2]: 33333
Cell[1]:
 deb: 8f 5b                      payload-size: 2011 (489 local, 1522 overflow)
 ded: 05                         record-header-size: 5
 dee: 03                         typecode[0]: 3 - int24
 def: 9f 2d                      typecode[1]: 4013 - text(2000)
 df1: 03                         typecode[2]: 3 - int24
 df2: 06 f7 49                   data[0]: 456521
 df5:                            ... 481 bytes of content ...
 fd6: 00 00 00 05                overflow-page: 5
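To compare the two dumps field by field, the page header can be decoded with a few lines of Python. This is a sketch with names of my own choosing; it assumes the buffer starts at the b-tree header (on page 1 that is offset 100, after the database header), and it does not handle the special case where a content offset of 0 means 65536.

```python
import struct

PAGE_TYPES = {2: 'index interior', 5: 'table interior',
              10: 'index leaf', 13: 'table leaf'}


def parse_btree_header(buf):
    """Decode the 8-byte (leaf) or 12-byte (interior) b-tree page header."""
    kind, freeblock, ncell, content, frag = struct.unpack('>BHHHB', buf[:8])
    header = {
        'type': PAGE_TYPES.get(kind, 'unknown'),
        'first_freeblock': freeblock,
        'cell_count': ncell,
        'content_start': content,
        'fragmented_bytes': frag,
    }
    if kind in (2, 5):  # interior pages carry a right-child pointer
        header['right_child'] = struct.unpack('>I', buf[8:12])[0]
    return header


# Header bytes copied from the dumps before and after the UPDATE.
before = parse_btree_header(bytes.fromhex('0a0fec00020deb03'))
after = parse_btree_header(bytes.fromhex('0a000000020deb05'))
print(before)
print(after)
```

In the second dump the new 33-byte cell occupies 0xfdf through 0xfff, the freeblock pointer is cleared, and the fragment count rises from 3 to 5. That is consistent with the 2 bytes at 0xfdd and 0xfde (where the old cell began) being left over as a fragment, though I have not confirmed this against the source code.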

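Freeblocks: deleting a record that is not the topmost one in the cell content area produces a free block.
Free pages: when so many records are deleted that an entire page becomes empty, the page becomes a free page. Take the same large file as before, small.db, and delete a great deal of data: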

PS G:\code-2\sqlite3> ./sqlite3 small.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> delete from department where id >= 10001;
sqlite> .qu
PS G:\code-2\sqlite3> ./showdb small.db dbheader
Pagesize: 4096
Available pages: 1..13876
 000: 53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00 SQLite format 3.
 010: 10 00 01 01 00 40 20 20 00 00 03 ed 00 00 36 34 .....@  ......64
 020: 00 00 35 72 00 00 35 cf 00 00 00 03 00 00 00 04 ..5r..5.........
 030: 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 ................
 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 ed ................
 060: 00 2e 34 20 00                                  ..4 .
Decoded:
 010: 10 00              4096  Database page size
 012: 01                    1  File format write version
 013: 01                    1  File format read version
 014: 00                    0  Reserved space at end of page
 018: 00 00 03 ed        1005  File change counter
 01c: 00 00 36 34       13876  Size of database in pages
 020: 00 00 35 72       13682  Page number of first freelist page
 024: 00 00 35 cf       13775  Number of freelist pages
 028: 00 00 00 03           3  Schema cookie
 02c: 00 00 00 04           4  Schema format version
 030: 00 00 00 00           0  Default page cache size
 034: 00 00 00 00           0  Largest auto-vac root page
 038: 00 00 00 01           1  Text encoding
 03c: 00 00 00 00           0  User version
 040: 00 00 00 00           0  Incremental-vacuum mode
 044: 00 00 00 00           0  Application ID
 048: 00 00 00 00           0  meta[8]
 04c: 00 00 00 00           0  meta[9]
 050: 00 00 00 00           0  meta[10]
 054: 00 00 00 00           0  meta[11]
 058: 00 00 00 00           0  meta[12]
 05c: 00 00 03 ed        1005  Change counter for version number
 060: 00 2e 34 20     3028000  SQLite version number
PS G:\code-2\sqlite3>

Walk the linked list of freelist trunk pages:

PS G:\code-2\sqlite3> ./showdb small.db 13682tr
Pagesize: 4096
Available pages: 1..13876
Decode of freelist trunk page 13682:
 000: 00 00 34 0b       13323  Next freelist trunk page
 004: 00 00 02 29         553  Number of entries on this page
Decode of freelist trunk page 13323:
 000: 00 00 22 20        8736  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 8736:
 000: 00 00 12 84        4740  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 4740:
 000: 00 00 2f d6       12246  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 12246:
 000: 00 00 1f b1        8113  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 8113:
 000: 00 00 0d 3e        3390  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 3390:
 000: 00 00 2b a0       11168  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 11168:
 000: 00 00 1d 43        7491  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 7491:
 000: 00 00 1c 73        7283  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 7283:
 000: 00 00 27 6b       10091  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 10091:
 000: 00 00 26 05        9733  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 9733:
 000: 00 00 24 9e        9374  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 9374:
 000: 00 00 00 2e          46  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
Decode of freelist trunk page 46:
 000: 00 00 00 00           0  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
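The same walk can be sketched in Python by reading the file directly: the database header gives the first trunk page and the total freelist count, and each trunk page starts with the next trunk page number and an entry count. The function name and the throwaway demo database are my own; the demo disables auto-vacuum on the assumption that freed pages then stay in the file.

```python
import os
import sqlite3
import struct
import tempfile


def walk_freelist(path):
    """Return (count_from_header, [(trunk_page, [leaf_pages]), ...])."""
    with open(path, 'rb') as f:
        header = f.read(100)
        page_size = struct.unpack('>H', header[16:18])[0]
        if page_size == 1:            # the value 1 encodes 65536
            page_size = 65536
        trunk, total = struct.unpack('>II', header[32:40])
        trunks = []
        while trunk != 0:
            f.seek((trunk - 1) * page_size)   # pages are numbered from 1
            page = f.read(page_size)
            next_trunk, n = struct.unpack('>II', page[:8])
            leaves = list(struct.unpack('>%dI' % n, page[8:8 + 4 * n]))
            trunks.append((trunk, leaves))
            trunk = next_trunk
    return total, trunks


# Demo on a throwaway database: fill a table, then delete everything.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, 'demo.db')
    con = sqlite3.connect(db_path)
    con.execute('PRAGMA auto_vacuum = NONE')
    con.execute('CREATE TABLE t(a, b)')
    con.executemany('INSERT INTO t VALUES (?, ?)',
                    ((i, 'x' * 200) for i in range(5000)))
    con.commit()
    con.execute('DELETE FROM t')
    con.commit()
    con.close()

    header_count, trunks = walk_freelist(db_path)
    walked_count = sum(1 + len(leaves) for _, leaves in trunks)
    print(header_count, walked_count)
```

The showdb output above passes the same consistency check: 553 + 13 × 1016 = 13761 leaf entries plus 14 trunk pages gives 13775, the freelist count in the database header. A 4096-byte trunk page has room for 1022 entries, but SQLite avoids the last six for compatibility with old versions, hence 1016.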

Inspect the free pages listed in one particular trunk page:

PS G:\code-2\sqlite3> ./showdb small.db 46td
Pagesize: 4096
Available pages: 1..13876
Decode of freelist trunk page 46:
 000: 00 00 00 00           0  Next freelist trunk page
 004: 00 00 03 f8        1016  Number of entries on this page
    [0]      47    [1]    6360    [2]      48    [3]      49    [4]    6361
    [5]      50    [6]      51    [7]      52    [8]      53    [9]      54
   [10]    6363   [11]      55   [12]      56   [13]    6364   [14]      57
   [15]      58   [16]    6365   [17]      59   [18]      60   [19]      61
   [20]    6366   [21]      62   [22]      63   [23]    6367   [24]      64
   [25]      65   [26]    6368   [27]      66   [28]      67   [29]    6369
   [30]      68   [31]      69   [32]    6370   [33]      70   [34]      71
   [35]      72   [36]    6371   [37]      73   [38]      74   [39]    6372
   [40]      75   [41]      76   [42]    6373   [43]      77   [44]      78
   [45]    6374   [46]      79   [47]      80   [48]    6375   [49]      81
   [50]      82   [51]      83   [52]    6376   [53]      84   [54]      85
   [55]    6377   [56]      86   [57]      87   [58]    6378   [59]      88
   [60]      89   [61]      90   [62]    6379   [63]      91   [64]      92
   [65]    6380   [66]      93   [67]      94   [68]    6381   [69]      95
   [70]      96   [71]      97   [72]    6382   [73]      98   [74]      99
   [75]    6383   [76]     100   [77]     101   [78]    6384   [79]     102
   [80]     103   [81]     104   [82]    6385   [83]     105   [84]     106
   [85]    6386   [86]     107   [87]     108   [88]    6387   [89]     109
   [90]     110   [91]     111   [92]    6388   [93]     112   [94]     113
   [95]    6389   [96]     114   [97]     115   [98]    6390   [99]     116

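Note, however, that the old data in these pages may not have been wiped (SQLite only overwrites deleted content when secure_delete is enabled).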

Overflow pages

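An overflow page comes into play when the payload for a key is too large to be stored entirely on its b-tree page. The calculation is a little involved; see this function in showdb.c: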

/*
** Compute the local payload size given the total payload size and
** the page size.
*/
static i64 localPayload(i64 nPayload, char cType){
  i64 maxLocal;
  i64 minLocal;
  i64 surplus;
  i64 nLocal;
  if( cType==13 ){
    /* Table leaf */
    maxLocal = pagesize-35;
    minLocal = (pagesize-12)*32/255-23;
  }else{
    maxLocal = (pagesize-12)*64/255-23;
    minLocal = (pagesize-12)*32/255-23;
  }
  if( nPayload>maxLocal ){
    surplus = minLocal + (nPayload-minLocal)%(pagesize-4);
    if( surplus<=maxLocal ){
      nLocal = surplus;
    }else{
      nLocal = minLocal;
    }
  }else{
    nLocal = nPayload;
  }
  return nLocal;
}

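Here the page header is taken at its maximum size, 12 bytes, and the per-cell overhead is taken as 23 bytes.
35 = 12 + 23; // that seems to be the idea, though I am not sure exactly how the cell part is computed.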
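To check the formula against the numbers in this article, here is a direct Python port of localPayload. It is a sketch that assumes a 4096-byte page with no reserved space (showdb uses the raw page size), and the function and parameter names are my own.

```python
def local_payload(n_payload, page_type, page_size=4096):
    """Port of showdb.c localPayload(): bytes of payload kept on the b-tree page."""
    if page_type == 13:                       # table leaf
        max_local = page_size - 35
    else:                                     # index pages
        max_local = (page_size - 12) * 64 // 255 - 23
    min_local = (page_size - 12) * 32 // 255 - 23

    if n_payload <= max_local:
        return n_payload                      # fits entirely, no overflow
    surplus = min_local + (n_payload - min_local) % (page_size - 4)
    return surplus if surplus <= max_local else min_local


# The 2011-byte payload seen in the dumps of test2.db, on an index leaf (type 10).
local = local_payload(2011, 10)
print(local, 2011 - local)
```

For an index page these limits work out to 1002 bytes maximum and 489 bytes minimum, so a 2011-byte payload keeps 489 bytes locally and pushes the remaining 1522 bytes to overflow, matching the "489 local, 1522 overflow" line in the showdb output.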

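So let's try inserting about 2 KB of data into an index b-tree. After all, the largest payload an index page keeps locally is only about 25% of the page, roughly 1 KB. First prepare the SQL statement: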

# Build a 2000-character string: "1000" through "1499" concatenated.
with open('large.sql', 'w') as fp:
    s = ''.join(str(i) for i in range(1000, 1500))
    fp.write("insert into department values(%d, '%s', %d);\n" % (456521, s, 456520))

Then import it into test2.db and query:

PS G:\code-2\sqlite3> ./sqlite3 .\test2.db
SQLite version 3.28.0 2019-04-16 19:49:53
Enter ".help" for usage hints.
sqlite> .sch
CREATE TABLE department(id int primary key, dept char(30), emp_id int) without rowid;
CREATE INDEX depName_index on department(dept);
sqlite> .read large.sql
sqlite> select * from department;
3|test33|33333
456521||456520
sqlite> select length(dept) from department where id = 456521;
2000
sqlite> .qu

Looking at the contents of page 2, we can see that the payload has overflowed:

PS G:\code-2\sqlite3> ./showdb test2.db 2bdccc
Pagesize: 4096
Available pages: 1..5
Header on btree page 2:
 000: 0a                   10  index leaf
 001: 0f ec              4076  Offset to first freeblock
 003: 00 02                 2  Number of cells on this page
 005: 0d eb              3563  Offset to cell content area
 007: 03                    3  Fragmented byte count
Cell[0]:
 fdd: 0e                         payload-size: 14
 fde: 04                         record-header-size: 4
 fdf: 01                         typecode[0]: 1 - int8
 fe0: 19                         typecode[1]: 25 - text(6)
 fe1: 03                         typecode[2]: 3 - int24
 fe2: 03                         data[0]: 3
 fe3: 74 65 73 74 33 33          data[1]: 'test33'
 fe9: 00 82 35                   data[2]: 33333
Cell[1]:
 deb: 8f 5b                      payload-size: 2011 (489 local, 1522 overflow)
 ded: 05                         record-header-size: 5
 dee: 03                         typecode[0]: 3 - int24
 def: 9f 2d                      typecode[1]: 4013 - text(2000)
 df1: 03                         typecode[2]: 3 - int24
 df2: 06 f7 49                   data[0]: 456521
 df5:                            ... 481 bytes of content ...
 fd6: 00 00 00 05                overflow-page: 5

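This is a little odd. SQLite obtains the length of the string, and the number of bytes stored on this page, purely by computation and comparison; the overflow page number is an optional field stored right after the local content. This severely constrains how the payload fraction can be implemented: the lengths are effectively hard-coded by the formula, which leaves no room for extension.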

The layout of an overflow page is much simpler: a next-page number followed by content. In other words, it is a linked list.
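Following that linked list can be sketched as below. To keep the example self-contained it works on a hypothetical in-memory dict mapping page numbers to raw page bytes rather than on a real database file, and the tiny 16-byte page size in the demo is purely for illustration.

```python
def read_overflow_chain(pages, first_page, n_bytes, page_size=4096):
    """Collect n_bytes of overflow content, starting at first_page.

    pages is a dict {page_number: raw_page_bytes}, a stand-in for the file.
    """
    out = bytearray()
    page_no = first_page
    while page_no != 0 and len(out) < n_bytes:
        page = pages[page_no]
        next_page = int.from_bytes(page[:4], 'big')   # 0 marks the last page
        take = min(page_size - 4, n_bytes - len(out))
        out += page[4:4 + take]
        page_no = next_page
    return bytes(out)


# Synthetic two-page chain: page 5 -> page 7 -> end.
content = bytes(range(20))
demo_pages = {
    5: (7).to_bytes(4, 'big') + content[:12],
    7: (0).to_bytes(4, 'big') + content[12:] + bytes(4),  # padded to page size
}
recovered = read_overflow_chain(demo_pages, 5, len(content), page_size=16)
print(recovered == content)
```

For the cell in test2.db, the 1522 overflow bytes fit within a single 4096-byte overflow page (4092 bytes of content per page), so the chain there is just page 5 with a next-page number of 0.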
