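1. 2020-12-28: Study the pager and B-tree material and work out how to locate a page.
2. 2020-12-29: Continue with the pager and look up the data stored in a page.
3. 2021-01-14: Learn the detailed format of every pager field.
4. 2021-01-18: Finish hashing of data pages.
This post is a set of notes for studying the sqlite3 source code and how pages are constructed.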
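The B-tree and the pager are connected, and both play a key role in transactions and locking. A connection can hold several database objects: one main database plus any attached databases. Each database object has one B-tree object, and each B-tree has one pager object. The pager is a core module of SQLite and takes on several roles. As the transaction manager, it implements the ACID properties through concurrency control and crash recovery and is responsible for atomic commit and rollback. As the page manager, it reads and writes data pages from and to the file and manages file space. As the log manager, it writes log records to the journal file. As the lock manager, it makes sure a transaction locks the database file before it accesses any data page, which provides concurrency control. In essence, the pager module implements durable storage and atomic transactions.
Figure 1 shows that the pager module consists of four submodules: transaction management, lock management, logging, and caching. The transaction submodule is built on the other three, so the core work of the pager is really done by the cache, the log manager, and the lock manager.
The B-tree module sits upstream of the pager module. Before it accesses a database file, the B-tree module creates a pager object and operates on the file through it. The pager module uses the pager object to track file-lock information, journal state, database state, and so on. For the same file, one process may have several pager objects, and these objects are independent of each other. In shared-cache mode, each database file has only one pager object, which all connections share.
In the newer amalgamation release, everything that used to live in pager.h has been merged into sqlite3.c, and some API signatures changed along the way. Start with the Pager structure: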
struct Pager {
sqlite3_vfs *pVfs; /* OS functions to use for IO */
u8 exclusiveMode; /* Boolean. True if locking_mode==EXCLUSIVE */
u8 journalMode; /* One of the PAGER_JOURNALMODE_* values */
u8 useJournal; /* Use a rollback journal on this file */
u8 noSync; /* Do not sync the journal if true */
u8 fullSync; /* Do extra syncs of the journal for robustness */
u8 extraSync; /* sync directory after journal delete */
u8 syncFlags; /* SYNC_NORMAL or SYNC_FULL otherwise */
u8 walSyncFlags; /* See description above */
u8 tempFile; /* zFilename is a temporary or immutable file */
u8 noLock; /* Do not lock (except in WAL mode) */
u8 readOnly; /* True for a read-only database */
u8 memDb; /* True to inhibit all file I/O */
/************************************************************************
The following block contains those class members that change during
routine operation. Class members not in this block are either fixed
when the pager is first created or else only change when there is a
significant mode change (such as changing the page_size, locking_mode,
or the journal_mode). From another view, these class members describe
the "state" of the pager, while other class members describe the
"configuration" of the pager.
*/
u8 eState; /* Pager state (OPEN, READER, WRITER_LOCKED..) */
u8 eLock; /* Current lock held on database file */
u8 changeCountDone; /* Set after incrementing the change-counter */
u8 setMaster; /* True if a m-j name has been written to jrnl */
u8 doNotSpill; /* Do not spill the cache when non-zero */
u8 subjInMemory; /* True to use in-memory sub-journals */
u8 bUseFetch; /* True to use xFetch() */
u8 hasHeldSharedLock; /* True if a shared lock has ever been held */
Pgno dbSize; /* Number of pages in the database */
Pgno dbOrigSize; /* dbSize before the current transaction */
Pgno dbFileSize; /* Number of pages in the database file */
Pgno dbHintSize; /* Value passed to FCNTL_SIZE_HINT call */
int errCode; /* One of several kinds of errors */
int nRec; /* Pages journalled since last j-header written */
u32 cksumInit; /* Quasi-random value added to every checksum */
u32 nSubRec; /* Number of records written to sub-journal */
Bitvec *pInJournal; /* One bit for each page in the database file */
sqlite3_file *fd; /* File descriptor for database */
sqlite3_file *jfd; /* File descriptor for main journal */
sqlite3_file *sjfd; /* File descriptor for sub-journal */
i64 journalOff; /* Current write offset in the journal file */
i64 journalHdr; /* Byte offset to previous journal header */
sqlite3_backup *pBackup; /* Pointer to list of ongoing backup processes */
PagerSavepoint *aSavepoint; /* Array of active savepoints */
int nSavepoint; /* Number of elements in aSavepoint[] */
u32 iDataVersion; /* Changes whenever database content changes */
char dbFileVers[16]; /* Changes whenever database file changes */
int nMmapOut; /* Number of mmap pages currently outstanding */
sqlite3_int64 szMmap; /* Desired maximum mmap size */
PgHdr *pMmapFreelist; /* List of free mmap page headers (pDirty) */
/*
End of the routinely-changing class members
***************************************************************************/
u16 nExtra; /* Add this many bytes to each in-memory page */
i16 nReserve; /* Number of unused bytes at end of each page */
u32 vfsFlags; /* Flags for sqlite3_vfs.xOpen() */
u32 sectorSize; /* Assumed sector size during rollback */
int pageSize; /* Number of bytes in a page */
Pgno mxPgno; /* Maximum allowed size of the database */
i64 journalSizeLimit; /* Size limit for persistent journal files */
char *zFilename; /* Name of the database file */
char *zJournal; /* Name of the journal file */
int (*xBusyHandler)(void*); /* Function to call when busy */
void *pBusyHandlerArg; /* Context argument for xBusyHandler */
int aStat[4]; /* Total cache hits, misses, writes, spills */
#ifdef SQLITE_TEST
int nRead; /* Database pages read */
#endif
void (*xReiniter)(DbPage*); /* Call this routine when reloading pages */
int (*xGet)(Pager*,Pgno,DbPage**,int); /* Routine to fetch a patch */
#ifdef SQLITE_HAS_CODEC
void *(*xCodec)(void*,void*,Pgno,int); /* Routine for en/decoding data */
void (*xCodecSizeChng)(void*,int,int); /* Notify of page size changes */
void (*xCodecFree)(void*); /* Destructor for the codec */
void *pCodec; /* First argument to xCodec... methods */
#endif
char *pTmpSpace; /* Pager.pageSize bytes of space for tmp use */
PCache *pPCache; /* Pointer to page cache object */
#ifndef SQLITE_OMIT_WAL
Wal *pWal; /* Write-ahead log used by "journal_mode=wal" */
char *zWal; /* File name for write-ahead log */
#endif
};
Next, look at sqlite3PagerOpen():
int sqlite3PagerOpen(
sqlite3_vfs *pVfs, /* The virtual file system to use */
Pager **ppPager, /* OUT: Return the Pager structure here */
const char *zFilename, /* Name of the database file to open */
int nExtra, /* Extra bytes append to each in-memory page */
int flags, /* Flags: whether to use a journal and/or an in-memory database */
int vfsFlags, /* flags passed through to sqlite3_vfs.xOpen() */
void (*xReinit)(DbPage*) /* Function to reinitialize pages */
)
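This function is the key to opening a pager. The first argument is the VFS, taken from the pVfs member of the sqlite3 *db connection. The second receives the new Pager. The third is the database file name. The fourth (nExtra) can be 0 when no extra per-page space is needed. For the fifth and sixth, the source contains: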
static const int flags =
SQLITE_OPEN_READWRITE |
SQLITE_OPEN_CREATE |
SQLITE_OPEN_EXCLUSIVE |
SQLITE_OPEN_DELETEONCLOSE |
SQLITE_OPEN_TEMP_DB;
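The flags argument is usually 0; vfsFlags is either the bitmask above or SQLITE_OPEN_MAIN_DB.
A pager can also be obtained through sqlite3BtreeOpen(), which calls sqlite3PagerOpen() internally. Btree *p = db->aDb[0].pBt is then the B-tree handle of the main database, through which the (possibly shared) pager is reached. sqlite3BtreeOpen() is declared as follows: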
SQLITE_PRIVATE int sqlite3BtreeOpen(
sqlite3_vfs *pVfs, /* VFS to use with this b-tree */
const char *zFilename, /* Name of database file to open */
sqlite3 *db, /* Associated database connection */
Btree **ppBtree, /* Return open Btree* here */
int flags, /* Flags */
int vfsFlags /* Flags passed through to VFS open */
);
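Its parameters are much like those of sqlite3PagerOpen(). ppBtree receives the Btree handle, from which the pager of the database is reached as described above; the rest is essentially the same.
Once the pager is available, call sqlite3PagerPagecount() to get the number of pages in the database.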
/*
** Read the first N bytes from the beginning of the file into memory
** that pDest points to.
**
** If the pager was opened on a transient file (zFilename==""), or
** opened on a file less than N bytes in size, the output buffer is
** zeroed and SQLITE_OK returned. The rationale for this is that this
** function is used to read database headers, and a new transient or
** zero sized database has a header than consists entirely of zeroes.
**
** If any IO error apart from SQLITE_IOERR_SHORT_READ is encountered,
** the error code is returned to the caller and the contents of the
** output buffer undefined.
*/
SQLITE_PRIVATE int sqlite3PagerReadFileheader(Pager *pPager, int N, unsigned char *pDest){
int rc = SQLITE_OK;
memset(pDest, 0, N);
assert( isOpen(pPager->fd) || pPager->tempFile );
/* This routine is only called by btree immediately after creating
** the Pager object. There has not been an opportunity to transition
** to WAL mode yet.
*/
assert( !pagerUseWal(pPager) );
if( isOpen(pPager->fd) ){
IOTRACE(("DBHDR %p 0 %d\n", pPager, N))
rc = sqlite3OsRead(pPager->fd, pDest, N, 0);
if( rc==SQLITE_IOERR_SHORT_READ ){
rc = SQLITE_OK;
}
}
return rc;
}
/*
** This function may only be called when a read-transaction is open on
** the pager. It returns the total number of pages in the database.
**
** However, if the file is between 1 and <page-size> bytes in size, then
** this is considered a 1 page file.
*/
SQLITE_PRIVATE void sqlite3PagerPagecount(Pager *pPager, int *pnPage){
assert( pPager->eState>=PAGER_READER );
assert( pPager->eState!=PAGER_WRITER_FINISHED );
*pnPage = (int)pPager->dbSize;
}
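sqlite3PagerPagecount() reports how many pages the database contains. Each page is 4096 bytes by default (older versions used 1024 bytes), and sqlite3PagerSetPagesize() can be used to change the page size.
A concrete example: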
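To make the relation between file size, page size and page count concrete, here is a minimal sketch. toy_page_count() is a hypothetical helper written for this post, not an SQLite function; it only assumes the rule stated in the source comment above, namely that a file of 1 to page-size bytes counts as one page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of SQLite: derive the number of pages
** from a file size in bytes by rounding up to whole pages. A file of
** 1..page_size bytes therefore counts as one page, and an empty file
** has zero pages. */
static uint32_t toy_page_count(int64_t file_size, uint32_t page_size){
  assert( file_size>=0 && page_size>0 );
  return (uint32_t)((file_size + page_size - 1) / page_size);
}
```

With a 4096-byte page, toy_page_count(1, 4096) and toy_page_count(4096, 4096) both give 1, while 4097 bytes already needs 2 pages.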
void selectdb(Pager *pager, sqlite3 *db){
  int nPage = 0;
  Btree *p;
  (void)pager; /* unused: the pager is reached through the Btree below */
  if( SQLITE_OK!=sqlite3BtreeOpen(db->pVfs, "book.db", db, &db->aDb[0].pBt, 0, SQLITE_OPEN_MAIN_DB) ){
    printf("failed!\n");
    return; /* do not touch the Btree if opening failed */
  }
  p = db->aDb[0].pBt; /* read the handle only after sqlite3BtreeOpen() has set it */
  /* Per the source comment above, a read transaction must be open here */
  sqlite3PagerPagecount(p->pBt->pPager, &nPage);
  printf("%d\n", nPage);
}
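Why not hand db->aDb[0].pBt straight to sqlite3PagerPagecount()? Because the Btree handle stored in db does not hold the Pager itself. Its pBt member (a BtShared) holds the pPager pointer, hence p->pBt->pPager. The details can be seen in the struct definitions reachable from db.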
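The pointer chain can be pictured with a small mock. The Toy* types below are hypothetical stand-ins that keep only the members named in this post (pBt, pPager, pageSize, dbSize); the real SQLite structs contain many more fields.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-ins for Pager, BtShared and Btree. */
typedef struct ToyPager { unsigned dbSize; } ToyPager;
typedef struct ToyBtShared { ToyPager *pPager; unsigned pageSize; } ToyBtShared;
typedef struct ToyBtree { ToyBtShared *pBt; } ToyBtree;

/* Follow handle -> shared part -> pager, as p->pBt->pPager does. */
static unsigned toy_page_count_of(const ToyBtree *p){
  assert( p!=0 && p->pBt!=0 && p->pBt->pPager!=0 );
  return p->pBt->pPager->dbSize;
}
```

Two ToyBtree handles that point at the same ToyBtShared see the same pager, which is the situation described for shared-cache mode.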
/*
** Return the currently defined page size
*/
SQLITE_PRIVATE int sqlite3BtreeGetPageSize(Btree *p){
return p->pBt->pageSize;
}
sqlite3BtreeGetPageSize() returns the currently defined page size, which is 4096 unless it has been changed.
/*
** Return the pager associated with a BTree. This routine is used for
** testing and debugging only.
*/
SQLITE_PRIVATE Pager *sqlite3BtreePager(Btree *p){
return p->pBt->pPager;
}
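sqlite3BtreePager() returns the pager that belongs to a Btree.
The next step is to get at the data of each page. To follow what comes next, first look at the PgHdr structure: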
/*
** Every page in the cache is controlled by an instance of the following
** structure.
*/
struct PgHdr {
sqlite3_pcache_page *pPage; /* Pcache object page handle */
void *pData; /* Page data */
void *pExtra; /* Extra content */
PCache *pCache; /* PRIVATE: Cache that owns this page */
PgHdr *pDirty; /* Transient list of dirty sorted by pgno */
Pager *pPager; /* The pager this page is part of */
Pgno pgno; /* Page number for this page */
#ifdef SQLITE_CHECK_PAGES
u32 pageHash; /* Hash of page content */
#endif
u16 flags; /* PGHDR flags defined below */
/**********************************************************************
** Elements above, except pCache, are public. All that follow are
** private to pcache.c and should not be accessed by other modules.
** pCache is grouped with the public elements for efficiency.
*/
i16 nRef; /* Number of users of this page */
PgHdr *pDirtyNext; /* Next element in list of dirty pages */
PgHdr *pDirtyPrev; /* Previous element in list of dirty pages */
/* NB: pDirtyNext and pDirtyPrev are undefined if the
** PgHdr object is not dirty */
};
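This structure describes each page in the cache. Except for pCache, the members above the divider are public; everything after it is private to pcache.c and should not be accessed by other modules. pCache is grouped with the public members for efficiency.
Prototype: SQLITE_PRIVATE int sqlite3PagerWrite(DbPage*);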
/*
** Mark a data page as writeable. This routine must be called before
** making changes to a page. The caller must check the return value
** of this function and be careful not to change any page data unless
** this routine returns SQLITE_OK.
**
** The difference between this function and pager_write() is that this
** function also deals with the special case where 2 or more pages
** fit on a single disk sector. In this case all co-resident pages
** must have been written to the journal file before returning.
**
** If an error occurs, SQLITE_NOMEM or an IO error code is returned
** as appropriate. Otherwise, SQLITE_OK.
*/
SQLITE_PRIVATE int sqlite3PagerWrite(PgHdr *pPg){
Pager *pPager = pPg->pPager;
assert( (pPg->flags & PGHDR_MMAP)==0 );
assert( pPager->eState>=PAGER_WRITER_LOCKED );
assert( assert_pager_state(pPager) );
if( (pPg->flags & PGHDR_WRITEABLE)!=0 && pPager->dbSize>=pPg->pgno ){
if( pPager->nSavepoint ) return subjournalPageIfRequired(pPg);
return SQLITE_OK;
}else if( pPager->errCode ){
return pPager->errCode;
}else if( pPager->sectorSize > (u32)pPager->pageSize ){
assert( pPager->tempFile==0 );
return pagerWriteLargeSector(pPg);
}else{
return pager_write(pPg);
}
}
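sqlite3PagerWrite() marks a data page as writable. It must be called before the page is changed; it does not modify the page content itself.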
/* Dispatch all page fetch requests to the appropriate getter method.
*/
SQLITE_PRIVATE int sqlite3PagerGet(
Pager *pPager, /* The pager open on the database file */
Pgno pgno, /* Page number to fetch */
DbPage **ppPage, /* Write a pointer to the page here */
int flags /* PAGER_GET_XXX flags */
){
return pPager->xGet(pPager, pgno, ppPage, flags);
}
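xGet is a function pointer with several implementations. The most common one is getPageNormal(). When the page is not already cached it calls readDbPage(), which in turn calls sqlite3OsRead() to read the page through the open file handle.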
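The read path boils down to turning a page number into a byte offset and reading one page from there. The sketch below uses plain stdio instead of the SQLite VFS; toy_page_offset() and toy_read_page() are hypothetical names, and zero-filling a short read is an assumption modeled on how sqlite3PagerReadFileheader() above treats SQLITE_IOERR_SHORT_READ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Byte offset of page pgno in the file. Page numbers start at 1. */
static int64_t toy_page_offset(uint32_t pgno, uint32_t page_size){
  assert( pgno>=1 );
  return (int64_t)(pgno - 1) * (int64_t)page_size;
}

/* Read page pgno from fp into buf (page_size bytes). Bytes past the end
** of the file are zero-filled. Returns 0 on success, -1 on I/O error. */
static int toy_read_page(FILE *fp, uint32_t pgno, uint32_t page_size,
                         unsigned char *buf){
  size_t got;
  if( fseek(fp, (long)toy_page_offset(pgno, page_size), SEEK_SET)!=0 ){
    return -1;
  }
  got = fread(buf, 1, page_size, fp);
  if( ferror(fp) ) return -1;
  memset(buf + got, 0, page_size - got);
  return 0;
}
```

Page 1 therefore starts at offset 0, and page 2 of a 4096-byte-page file starts at offset 4096.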
/*
** Return a pointer to the data for the specified page.
*/
SQLITE_PRIVATE void *sqlite3PagerGetData(DbPage *pPg){
assert( pPg->nRef>0 || pPg->pPager->memDb );
return pPg->pData;
}
sqlite3PagerGetData() returns a pointer to the data of the given page.
struct sqlite3_context {
Mem *pOut; /* The return value is stored here */
FuncDef *pFunc; /* Pointer to function information */
Mem *pMem; /* Memory cell used to store aggregate context */
Vdbe *pVdbe; /* The VM that owns this context */
int iOp; /* Instruction number of OP_Function */
int isError; /* Error code returned by the function. */
u8 skipFlag; /* Skip accumulator loading if true */
u8 argc; /* Number of arguments */
sqlite3_value *argv[1]; /* Argument set */
};
struct sqlite3_pcache_page {
void *pBuf; /* The content of the page */
void *pExtra; /* Extra information associated with the page */
};
/*
** Try to obtain a page from the cache.
**
** This routine returns a pointer to an sqlite3_pcache_page object if
** such an object is already in cache, or if a new one is created.
** This routine returns a NULL pointer if the object was not in cache
** and could not be created.
**
** The createFlags should be 0 to check for existing pages and should
** be 3 (not 1, but 3) to try to create a new page.
**
** If the createFlag is 0, then NULL is always returned if the page
** is not already in the cache. If createFlag is 1, then a new page
** is created only if that can be done without spilling dirty pages
** and without exceeding the cache size limit.
**
** The caller needs to invoke sqlite3PcacheFetchFinish() to properly
** initialize the sqlite3_pcache_page object and convert it into a
** PgHdr object. The sqlite3PcacheFetch() and sqlite3PcacheFetchFinish()
** routines are split this way for performance reasons. When separated
** they can both (usually) operate without having to push values to
** the stack on entry and pop them back off on exit, which saves a
** lot of pushing and popping.
*/
SQLITE_PRIVATE sqlite3_pcache_page *sqlite3PcacheFetch(
PCache *pCache, /* Obtain the page from this cache */
Pgno pgno, /* Page number to obtain */
int createFlag /* If true, create page if it does not exist already */
){
int eCreate;
sqlite3_pcache_page *pRes;
assert( pCache!=0 );
assert( pCache->pCache!=0 );
assert( createFlag==3 || createFlag==0 );
assert( pCache->eCreate==((pCache->bPurgeable && pCache->pDirty) ? 1 : 2) );
/* eCreate defines what to do if the page does not exist.
** 0 Do not allocate a new page. (createFlag==0)
** 1 Allocate a new page if doing so is inexpensive.
** (createFlag==1 AND bPurgeable AND pDirty)
** 2 Allocate a new page even it doing so is difficult.
** (createFlag==1 AND !(bPurgeable AND pDirty)
*/
eCreate = createFlag & pCache->eCreate;
assert( eCreate==0 || eCreate==1 || eCreate==2 );
assert( createFlag==0 || pCache->eCreate==eCreate );
assert( createFlag==0 || eCreate==1+(!pCache->bPurgeable||!pCache->pDirty) );
pRes = sqlite3GlobalConfig.pcache2.xFetch(pCache->pCache, pgno, eCreate);
pcacheTrace(("%p.FETCH %d%s (result: %p)\n",pCache,pgno,
createFlag?" create":"",pRes));
return pRes;
}
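The interface for fetching a page is sqlite3PcacheFetch(). It calls into the pcache1 plug-in through sqlite3GlobalConfig.pcache2.xFetch(). When the requested page is not in the cache, the third argument passed down, eCreate, selects the policy for creating a cache page. createFlag should be 0 to only check for an existing page, and 3 (not 1, but 3) to try to create a new one. With createFlag 0, NULL is always returned if the page is not already cached. With an effective value of 1, a new page is created only if that can be done without spilling dirty pages and without exceeding the cache size limit.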
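Why the caller passes 3 rather than 1 becomes clear from the line eCreate = createFlag & pCache->eCreate. The sketch below replays that masking in isolation; toy_cache_policy() and toy_effective_create() are hypothetical helpers that only mirror the expressions visible in the asserts of sqlite3PcacheFetch() above.

```c
#include <assert.h>

/* Hypothetical mirror of the value kept in PCache.eCreate: 1 when the
** cache is purgeable and has dirty pages, otherwise 2. */
static int toy_cache_policy(int bPurgeable, int hasDirty){
  return (bPurgeable && hasDirty) ? 1 : 2;
}

/* createFlag must be 0 or 3. ANDing it with the cache policy yields
** 0 (do not allocate), 1 (allocate only if cheap) or 2 (allocate even
** if that is difficult). */
static int toy_effective_create(int createFlag, int cachePolicy){
  assert( createFlag==0 || createFlag==3 );
  assert( cachePolicy==1 || cachePolicy==2 );
  return createFlag & cachePolicy;
}
```

Since 3 has both low bits set, the mask passes the policy through unchanged, whereas a createFlag of 1 would turn policy 2 into 0 and silently disable allocation.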
/* An instance of the BtreePayload object describes the content of a single
** entry in either an index or table btree.
**
** Index btrees (used for indexes and also WITHOUT ROWID tables) contain
** an arbitrary key and no data. These btrees have pKey,nKey set to the
** key and the pData,nData,nZero fields are uninitialized. The aMem,nMem
** fields give an array of Mem objects that are a decomposition of the key.
** The nMem field might be zero, indicating that no decomposition is available.
**
** Table btrees (used for rowid tables) contain an integer rowid used as
** the key and passed in the nKey field. The pKey field is zero.
** pData,nData hold the content of the new entry. nZero extra zero bytes
** are appended to the end of the content when constructing the entry.
** The aMem,nMem fields are uninitialized for table btrees.
**
** Field usage summary:
**
** Table BTrees Index Btrees
**
** pKey always NULL encoded key
** nKey the ROWID length of pKey
** pData data not used
** aMem not used decomposed key value
** nMem not used entries in aMem
** nData length of pData not used
** nZero extra zeros after pData not used
**
** This object is used to pass information into sqlite3BtreeInsert(). The
** same information used to be passed as five separate parameters. But placing
** the information into this object helps to keep the interface more
** organized and understandable, and it also helps the resulting code to
** run a little faster by using fewer registers for parameter passing.
*/
struct BtreePayload {
const void *pKey; /* Key content for indexes. NULL for tables */
sqlite3_int64 nKey; /* Size of pKey for indexes. PRIMARY KEY for tabs */
const void *pData; /* Data for tables. */
sqlite3_value *aMem; /* First of nMem value in the unpacked pKey */
u16 nMem; /* Number of aMem[] value. Might be zero */
int nData; /* Size of pData. 0 if none. */
int nZero; /* Extra zero data appended after pData,nData */
};
/*
** Argument pCsr must be a cursor opened for writing on an
** INTKEY table currently pointing at a valid table entry.
** This function modifies the data stored as part of that entry.
**
** Only the data content may only be modified, it is not possible to
** change the length of the data stored. If this function is called with
** parameters that attempt to write past the end of the existing data,
** no modifications are made and SQLITE_CORRUPT is returned.
*/
SQLITE_PRIVATE int sqlite3BtreePutData(BtCursor *pCsr, u32 offset, u32 amt, void *z){
int rc;
assert( cursorOwnsBtShared(pCsr) );
assert( sqlite3_mutex_held(pCsr->pBtree->db->mutex) );
assert( pCsr->curFlags & BTCF_Incrblob );
rc = restoreCursorPosition(pCsr);
if( rc!=SQLITE_OK ){
return rc;
}
assert( pCsr->eState!=CURSOR_REQUIRESEEK );
if( pCsr->eState!=CURSOR_VALID ){
return SQLITE_ABORT;
}
/*
** Read part of the payload for the row at which that cursor pCur is currently
** pointing. "amt" bytes will be transferred into pBuf[]. The transfer
** begins at "offset".
**
** pCur can be pointing to either a table or an index b-tree.
** If pointing to a table btree, then the content section is read. If
** pCur is pointing to an index b-tree then the key section is read.
**
** For sqlite3BtreePayload(), the caller must ensure that pCur is pointing
** to a valid row in the table. For sqlite3BtreePayloadChecked(), the
** cursor might be invalid or might need to be restored before being read.
**
** Return SQLITE_OK on success or an error code if anything goes
** wrong. An error is returned if "offset+amt" is larger than
** the available payload.
*/
SQLITE_PRIVATE int sqlite3BtreePayload(BtCursor *pCur, u32 offset, u32 amt, void *pBuf){
assert( cursorHoldsMutex(pCur) );
assert( pCur->eState==CURSOR_VALID );
assert( pCur->iPage>=0 && pCur->pPage );
assert( pCur->ix<pCur->pPage->nCell );
return accessPayload(pCur, offset, amt, (unsigned char*)pBuf, 0);
}
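The SQLite database header occupies the first 100 bytes of the database file. Every valid SQLite database file begins with the following 16 bytes (in hex): 53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00. The header fields are described below.
Page header format
The page header holds the information used to manage a page and normally sits at the start of the page. For page 1 of the database file, the page header starts at byte 100, because the first 100 bytes are the file header.
The first 16 bytes are the header string, which is fixed to "SQLite format 3".
0x1000: page size, 0x1000 = 4096 bytes.
0x01: file format version (write), value 1.
0x01: file format version (read), value 1.
0x40: the maximum space a single cell may use in a B-tree interior page. 0x40 = 64, i.e. 25%.
0x20: the minimum space a single cell uses in a B-tree interior page. 0x20 = 32, i.e. 12.5%.
0x20: the minimum space a single cell uses in a B-tree leaf page. 0x20 = 32, i.e. 12.5%.
0x00000005: file change counter. The file has been modified 5 times so far: one table creation and four record inserts.
4 bytes starting at 0x20: head pointer of the free-page list. The current value is 0, which means the list is empty.
4 bytes starting at 0x24: number of free pages in the file. The current value is 0.
4 bytes starting at 0x28: schema version. The current value is 0x00000001. From then on, the value is incremented each time the sqlite_master table is modified.
4 bytes starting at 0x38: text encoding. Here it is 0x00000001, which means UTF-8.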
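The header fields listed above can be decoded with a few lines of C. This is a sketch under stated assumptions: ToyDbHeader, toy_get4() and toy_parse_header() are hypothetical names, the byte offsets (16 for the page size, 18 and 19 for the versions, 21 to 23 for the payload fractions, 24, 32, 36, 40 and 56 for the 4-byte fields) follow the published SQLite file format, and all multi-byte values are big-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the header fields discussed in this post. */
typedef struct ToyDbHeader {
  uint32_t pageSize;          /* offset 16, 2 bytes; stored 1 means 65536 */
  uint8_t writeVersion;       /* offset 18 */
  uint8_t readVersion;        /* offset 19 */
  uint8_t maxEmbedFrac;       /* offset 21 */
  uint8_t minEmbedFrac;       /* offset 22 */
  uint8_t leafFrac;           /* offset 23 */
  uint32_t changeCounter;     /* offset 24 (0x18) */
  uint32_t firstFreelistPage; /* offset 32 (0x20) */
  uint32_t nFreelistPages;    /* offset 36 (0x24) */
  uint32_t schemaCookie;      /* offset 40 (0x28) */
  uint32_t textEncoding;      /* offset 56 (0x38); 1 means UTF-8 */
} ToyDbHeader;

/* Decode a 4-byte big-endian integer. */
static uint32_t toy_get4(const unsigned char *p){
  return ((uint32_t)p[0]<<24) | ((uint32_t)p[1]<<16)
       | ((uint32_t)p[2]<<8) | (uint32_t)p[3];
}

/* Parse the first 100 bytes of a database file. Returns 0 on success,
** -1 if the 16-byte magic string (including its NUL) does not match. */
static int toy_parse_header(const unsigned char *h, ToyDbHeader *out){
  assert( h!=0 && out!=0 );
  if( memcmp(h, "SQLite format 3", 16)!=0 ) return -1;
  out->pageSize = ((uint32_t)h[16]<<8) | (uint32_t)h[17];
  if( out->pageSize==1 ) out->pageSize = 65536;
  out->writeVersion = h[18];
  out->readVersion = h[19];
  out->maxEmbedFrac = h[21];
  out->minEmbedFrac = h[22];
  out->leafFrac = h[23];
  out->changeCounter = toy_get4(h + 24);
  out->firstFreelistPage = toy_get4(h + 32);
  out->nFreelistPages = toy_get4(h + 36);
  out->schemaCookie = toy_get4(h + 40);
  out->textEncoding = toy_get4(h + 56);
  return 0;
}
```

Feeding it the first 100 bytes of the sample file described above should yield a page size of 4096, a change counter of 5 and a text encoding of 1.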
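In pcache, a PCache object serves as the connection handle, and each cached page is represented by a PgHdr.
Reading a page (assume the page number is P)
(1) Look in the page cache.
Hash the page number to locate the bucket, then follow PgHdr1.pNext and compare entries one by one until the wanted page is found. If it is found, increment PgHdr.nRef and return the page to the calling module.
(2) If the page is not in the page cache, take a free slot, or allocate a new slot as long as the slot threshold PCache1.nMax is not exceeded.
(3) If no slot is available, pick a slot to reuse (the page held by that slot has to be released; the choice is made by LRU).
(4) If the page in the reused slot is dirty, write it to the file first (in WAL mode, the dirty page is written to the log file before it is flushed).
(5) Load the page.
If the offset of page P is smaller than the file size, read the page from the file into the slot, set PgHdr.nRef to 1, and return. If the offset of page P is beyond the current file size, zero the slot content and likewise set PgHdr.nRef to 1.
Updating a page
Assume the page has already been read into memory. Before the B-tree module writes data to a page, it must call sqlite3PagerWrite() to make the page writable; otherwise the pager module does not know that the page is about to be modified. The pager module takes a RESERVED lock on the database file and creates a journal file. If the lock cannot be obtained, SQLITE_BUSY is returned. The pager then copies the original content of the page to the journal file. If the journal already contains that page, no copy is made and the page is only marked dirty. The dirty mark is cleared once the page has been written to the file.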
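Steps (1), (2) and (5) can be sketched as a tiny hash-bucket cache. Everything here is a toy under stated assumptions: ToyPage, ToyCache and toy_fetch() are hypothetical names, the bucket count and page size are arbitrary, a new slot is simply zero-filled instead of being read from a file, and the LRU reuse and dirty-page write-back of steps (3) and (4) are not modeled.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NHASH 8       /* number of hash buckets (arbitrary) */
#define TOY_PAGE_SIZE 64  /* tiny page size for the sketch */

typedef struct ToyPage ToyPage;
struct ToyPage {
  unsigned pgno;                      /* page number, starting at 1 */
  int nRef;                           /* number of users of this page */
  ToyPage *pNext;                     /* next page in the same bucket */
  unsigned char data[TOY_PAGE_SIZE];  /* page content */
};

typedef struct ToyCache {
  ToyPage *apHash[TOY_NHASH];  /* bucket heads */
  int nPage;                   /* slots currently allocated */
  int nMax;                    /* slot threshold */
} ToyCache;

/* Look the page up in its bucket. On a hit, bump nRef and return it.
** On a miss, allocate a zeroed slot if the threshold allows, otherwise
** return NULL (a real cache would try to recycle a slot here). */
static ToyPage *toy_fetch(ToyCache *c, unsigned pgno){
  ToyPage *p;
  unsigned h;
  assert( c!=0 && pgno>=1 );
  h = pgno % TOY_NHASH;
  for(p=c->apHash[h]; p; p=p->pNext){
    if( p->pgno==pgno ){
      p->nRef++;
      return p;
    }
  }
  if( c->nPage>=c->nMax ) return 0;
  p = calloc(1, sizeof(*p));  /* zero-filled, like a page past EOF */
  if( p==0 ) return 0;
  p->pgno = pgno;
  p->nRef = 1;
  p->pNext = c->apHash[h];
  c->apHash[h] = p;
  c->nPage++;
  return p;
}
```

Fetching the same page twice returns the same slot with nRef equal to 2, and pages 1 and 9 land in the same bucket and are told apart by walking pNext.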
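The rule "copy the original to the journal once, then only mark the page dirty" can be shown with an in-memory toy. ToyPager, toy_make_writable() and toy_rollback() are hypothetical names; locking, the journal file and syncing are left out, and the inJournal array merely plays the role that the pInJournal bit vector has in the Pager struct.

```c
#include <assert.h>
#include <string.h>

#define TOY_NPAGE 4       /* pages in the toy database */
#define TOY_PAGE_SIZE 16  /* bytes per toy page */

typedef struct ToyPager {
  unsigned char db[TOY_NPAGE][TOY_PAGE_SIZE];      /* current page images */
  unsigned char journal[TOY_NPAGE][TOY_PAGE_SIZE]; /* saved originals */
  unsigned char inJournal[TOY_NPAGE];              /* 1 if original saved */
  unsigned char dirty[TOY_NPAGE];                  /* 1 if page modified */
  int nRec;                                        /* originals journalled */
} ToyPager;

/* Call before changing page pgno: save the original image the first
** time only, then mark the page dirty. */
static void toy_make_writable(ToyPager *p, unsigned pgno){
  assert( p!=0 && pgno>=1 && pgno<=TOY_NPAGE );
  if( !p->inJournal[pgno-1] ){
    memcpy(p->journal[pgno-1], p->db[pgno-1], TOY_PAGE_SIZE);
    p->inJournal[pgno-1] = 1;
    p->nRec++;
  }
  p->dirty[pgno-1] = 1;
}

/* Undo all changes by restoring every journalled original. */
static void toy_rollback(ToyPager *p){
  unsigned i;
  assert( p!=0 );
  for(i=0; i<TOY_NPAGE; i++){
    if( p->inJournal[i] ){
      memcpy(p->db[i], p->journal[i], TOY_PAGE_SIZE);
    }
    p->inJournal[i] = 0;
    p->dirty[i] = 0;
  }
  p->nRec = 0;
}
```

Because the second call for the same page skips the copy, the journal keeps the content from before the transaction, which is what makes the rollback correct.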