2020-12-04 How DB files are stored on disk

How a query reads the raw data

Reference: https://serverfault.com/questions/395472/mysql-innodb-ext3-block-size

Filesystem block size should not have a bad impact on InnoDB. I'm not talking about tiny bits of CPU-bound performance, since the filesystem overhead there is vanishingly small. What you should worry about is IO performance.

When MySQL needs to read an InnoDB page from disk, it accesses the inode structure for the file. An ext3 inode contains references to 15 blocks. The first 12 point to data blocks directly. The remaining 3 point to indirect blocks that contain further block references: the 13th is single-indirect, the 14th double-indirect and the 15th triple-indirect.

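So if an InnoDB page lies within the first $12 \times 4\,\text{KB} = 48\,\text{KB}$ of the file, it will be fetched in 2 IO operations: 1 for the inode and a second for the data block. If it lies within the first $(12 + 1024) \times 4\,\text{KB} \approx 4.2\,\text{MB}$, it takes 3 ops; within $(12 + 1024 + 1024^2) \times 4\,\text{KB} \approx 4\,\text{GB}$, 4 ops; and within $(12 + 1024 + 1024^2 + 1024^3) \times 4\,\text{KB} \approx 4\,\text{TB}$, 5 ops.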

1024 is the number of 4-byte block references that fit in a 4 KB block.
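To make the read counts above concrete, here is a minimal Python sketch that counts the cold-cache reads needed to reach a byte offset. It assumes the classic ext3 layout (12 direct pointers, then single-, double- and triple-indirect pointers), 4-byte block references, one read for the inode and no caching or readahead; the function name `ext3_reads_for_offset` is hypothetical.

```python
def ext3_reads_for_offset(offset: int, block_size: int = 4096) -> int:
    """Disk reads (inode + indirect blocks + data block) needed to reach the
    block holding byte `offset`, assuming a cold cache and ext3-style mapping."""
    refs_per_block = block_size // 4      # 4-byte references: 1024 per 4 KiB block
    block_index = offset // block_size    # logical block number inside the file
    reads = 1                             # reading the inode itself

    # Number of data blocks reachable at each level: direct, single, double, triple.
    levels = [12, refs_per_block, refs_per_block ** 2, refs_per_block ** 3]
    for depth, capacity in enumerate(levels):
        if block_index < capacity:
            # depth 0 = direct pointer; depth k = k indirect blocks read first
            return reads + depth + 1
        block_index -= capacity
    raise ValueError("offset is beyond what the inode can address")
```

With the default 4 KiB blocks, `ext3_reads_for_offset(100 * 16384)` (InnoDB page 100) returns 3 and an offset of 5 GiB returns 5; passing `block_size=8192` drops the 5 GiB case to 4, which is the modest gain for very large files that the answer mentions.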


Readahead (and preallocation for writes) and caching reduce this count, allowing several blocks to be read or written at once.

A 4 KB block size is the same as the usual Linux memory page size, which makes page caching easier to implement.
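A quick way to compare the two sizes on a given Unix machine is sketched below. It relies on Python's `mmap.PAGESIZE` and `os.statvfs(path).f_bsize`; the helper names are hypothetical, and `f_bsize` reports the filesystem block size, not InnoDB's page size.

```python
import mmap
import os


def memory_page_size() -> int:
    # Virtual-memory page size in bytes (commonly 4096 on x86-64 Linux).
    return mmap.PAGESIZE


def fs_block_size(path: str = ".") -> int:
    # Block size of the filesystem that holds `path` (Unix only).
    return os.statvfs(path).f_bsize


if __name__ == "__main__":
    print("memory page size:", memory_page_size())
    print("fs block size   :", fs_block_size("."))
```

If both numbers print 4096, the filesystem blocks line up one-to-one with the kernel's cache pages, which is the situation the answer describes.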

When an InnoDB page is written for the first time, ext3 will preallocate 8 sequential blocks (32 KB) and write 4 of them (16 KB, one InnoDB page); the other 4 will be discarded (or used for one more page). All later changes to this page are stored in the same blocks.

The only benefit of reducing the block size is saving disk space, since one block is the minimal unit of data stored on disk.
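The space saving comes from the unused tail of a file's last block. A minimal sketch of that arithmetic, ignoring inode and indirect-block metadata (function names are hypothetical):

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Bytes a file occupies when space is handed out in whole blocks
    (metadata such as the inode and indirect blocks is ignored)."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def slack_bytes(file_size: int, block_size: int) -> int:
    """Allocated but unused bytes in the file's last block."""
    return allocated_bytes(file_size, block_size) - file_size


for bs in (1024, 2048, 4096):
    print(f"block={bs:>4}: 100-byte file wastes {slack_bytes(100, bs)} bytes, "
          f"16 KiB page wastes {slack_bytes(16 * 1024, bs)} bytes")
```

A 100-byte file wastes 924, 1948 and 3996 bytes respectively, while a 16 KiB InnoDB page wastes nothing at any of these sizes, so for InnoDB data files a smaller block size saves very little.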

Increasing it (there are some kernel patches to do this) will improve performance for very large files, but not as much as you might think. Matching it to the InnoDB page size makes no sense, since in the vast majority of cases the data blocks for one InnoDB page lie sequentially on disk and are read or written in a single operation.

How InnoDB maintains its files


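In MySQL InnoDB, files are called spaces, and the file that holds a table is called a tablespace. A tablespace is divided into multiple pages (normally 16 KiB each). Every page has a unique 32-bit page number, which is really its position counted in pages from the start of the tablespace; its byte offset is the page number multiplied by the page size. Here is the key point: the pages that hold the data are organized as a B+tree, and each page corresponds to a node of that B+tree. In a non-leaf node, the page stores index keys and child-node pointers (the children's page numbers); in a leaf node, it stores index keys and the records themselves.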
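A minimal sketch of the relationship between a page number and its byte position, assuming the default 16 KiB page (`innodb_page_size=16K`); the helper names are hypothetical.

```python
from typing import Tuple

PAGE_SIZE = 16 * 1024  # default InnoDB page size in bytes


def page_to_offset(page_no: int, page_size: int = PAGE_SIZE) -> int:
    # Byte offset where page `page_no` starts inside the tablespace.
    return page_no * page_size


def offset_to_page(offset: int, page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    # (page number, byte position inside that page) for a tablespace offset.
    return divmod(offset, page_size)
```

For example, `page_to_offset(1)` is 16384 and `offset_to_page(3 * 16384 + 100)` is `(3, 100)`: the page number counts pages, not bytes.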

Reference: https://blog.jcole.us/2013/01/03/the-basics-of-innodb-space-file-layout/

InnoDB’s data storage model uses “spaces”, often called “tablespaces” in the context of MySQL, and sometimes called “file spaces” in InnoDB itself. A space may consist of multiple actual files at the operating system level (e.g. ibdata1, ibdata2, etc.) but it is just a single logical file — multiple physical files are just treated as though they were physically concatenated together.
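Treating several physical files as one concatenated logical file amounts to finding which file a logical offset falls into. A minimal sketch, where `locate` and the file sizes are hypothetical and files are assumed to be concatenated in order:

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate(logical_offset: int, file_sizes: List[int]) -> Tuple[int, int]:
    """Map an offset in the logical space to (file index, offset within that
    file), as if the files were physically concatenated in order."""
    starts = [0] + list(accumulate(file_sizes))[:-1]  # start offset of each file
    if not 0 <= logical_offset < sum(file_sizes):
        raise ValueError("offset outside the space")
    idx = bisect_right(starts, logical_offset) - 1  # last file starting at or before it
    return idx, logical_offset - starts[idx]
```

With two hypothetical 12 MiB files, `locate(1000 * 16384, [12 * 2**20, 12 * 2**20])` returns `(1, 3801088)`: page 1000 sits 3,801,088 bytes into the second file, even though its page number only describes its position in the logical space.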

Each space in InnoDB is assigned a 32-bit integer space ID, which is used in many different places to refer to the space. InnoDB always has a “system space”, which is always assigned the space ID of 0. The system space is used for various special bookkeeping that InnoDB requires. Through MySQL, InnoDB currently only supports additional spaces in the form of “file per table” spaces, which create an .ibd file for each MySQL table. Internally, this .ibd file is actually a fully functional space which could contain multiple tables, but in the implementation with MySQL, they will only contain a single table.

Pages
Each space is divided into pages, normally 16 KiB each (this can differ for two reasons: if the compile-time define UNIV_PAGE_SIZE is changed, or if InnoDB compression is used). Each page within a space is assigned a 32-bit integer page number, often called “offset”, which is actually just the page’s offset from the beginning of the space (not necessarily the file, for multi-file spaces). So, page 0 is located at file offset 0, page 1 at file offset 16384, and so on. (The astute may remember that InnoDB has a limit of 64TiB of data; this is actually a limit per space, and is due primarily to the page number being a 32-bit integer combined with the default page size: $2^{32} \times 16\,\text{KiB} = 64\,\text{TiB}$.)
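The 64 TiB figure is plain arithmetic on the 32-bit page number, so the same formula gives the per-space ceiling for other page sizes. This sketch only does that arithmetic; the page sizes in the loop are illustrative and the function name is hypothetical.

```python
KIB = 1024
TIB = 1024 ** 4


def max_space_bytes(page_size: int, page_number_bits: int = 32) -> int:
    # A page number with `page_number_bits` bits can address 2**bits distinct pages.
    return (2 ** page_number_bits) * page_size


for size_kib in (4, 8, 16, 32, 64):
    print(f"{size_kib:>2} KiB pages -> {max_space_bytes(size_kib * KIB) // TIB} TiB per space")
```

This prints 16, 32, 64, 128 and 256 TiB respectively: doubling the page size doubles the ceiling because the number of addressable pages stays fixed at $2^{32}$.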

B+tree example

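A toy version of the page-based B+tree described in the InnoDB section: pages live in a dict keyed by page number, non-leaf pages hold (key, child page number) pairs, leaf pages hold (key, record) pairs, and a lookup descends from the root one page at a time. This is a simplified in-memory model, not InnoDB's on-disk page format; the page numbers, keys and records are made up, and each internal key is assumed to be the smallest key stored under its child.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class InternalPage:
    keys: List[int]      # keys[i] = smallest key stored under children[i]
    children: List[int]  # child pointers are page numbers, not memory addresses


@dataclass
class LeafPage:
    keys: List[int]      # sorted index keys
    records: List[str]   # records[i] belongs to keys[i]


Page = Union[InternalPage, LeafPage]


def lookup(pages: Dict[int, Page], root_page_no: int, key: int) -> Optional[str]:
    """Descend from the root page to a leaf, then search the leaf for `key`."""
    page = pages[root_page_no]  # each step stands for reading one page by its number
    while isinstance(page, InternalPage):
        # rightmost child whose smallest key <= key; clamp to the leftmost child
        i = max(bisect_right(page.keys, key) - 1, 0)
        page = pages[page.children[i]]
    i = bisect_left(page.keys, key)
    if i < len(page.keys) and page.keys[i] == key:
        return page.records[i]
    return None


# Illustrative two-level tree: root page 3 points at leaf pages 4 and 5.
pages: Dict[int, Page] = {
    3: InternalPage(keys=[1, 100], children=[4, 5]),
    4: LeafPage(keys=[1, 20, 50], records=["row-1", "row-20", "row-50"]),
    5: LeafPage(keys=[100, 150], records=["row-100", "row-150"]),
}

print(lookup(pages, 3, 50))   # row-50
print(lookup(pages, 3, 150))  # row-150
print(lookup(pages, 3, 60))   # None
```

Because child pointers are page numbers, every level of the descent turns into one page read at offset `page_no * page_size`, which is how the B+tree view connects back to the file layout.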
