Tokyo Cabinet TCHDB Source Reading: Annotated Code for tchdbput and Related Functions

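tchdbput writes one record to the data file. The underlying write routine, tchdbputimpl, supports several modes for a key that already exists: overwrite, keep the existing record, append, add an integer, add a floating-point number, or pass the old value to a callback and store whatever it returns. tchdbput itself always uses overwrite mode (it passes HDBPDOVER). It writes directly to the data file on disk. Its asynchronous counterpart, tchdbputasync, instead keeps the record in the drp (delayed record pool) for the time being and writes it back to disk at a suitable moment; I will analyze that function later. Below is the code of tchdbput with my annotations, kept purely as a personal study note: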

```c
/* Store a record into a hash database object. */
bool tchdbput(TCHDB *hdb, const void *kbuf, int ksiz, const void *vbuf, int vsiz){
  assert(hdb && kbuf && ksiz >= 0 && vbuf && vsiz >= 0);
  if(!HDBLOCKMETHOD(hdb, false)) return false;  // acquire the method lock in shared mode
  uint8_t hash;
  // Compute two hash values from the key: bidx indexes the bucket array, and hash
  // (the secondary hash) locates the record inside the collision tree. The secondary
  // hash also does much of the work of keeping that tree reasonably balanced.
  uint64_t bidx = tchdbbidx(hdb, kbuf, ksiz, &hash);
  if(hdb->fd < 0 || !(hdb->omode & HDBOWRITER)){  // the file must be open, and in writer mode
    tchdbsetecode(hdb, TCEINVALID, __FILE__, __LINE__, __func__);
    HDBUNLOCKMETHOD(hdb);
    return false;
  }
  if(hdb->async && !tchdbflushdrp(hdb)){  // in async mode, flush the delayed record pool first
    HDBUNLOCKMETHOD(hdb);
    return false;
  }
  if(!HDBLOCKRECORD(hdb, bidx, true)){  // take the writer lock for this bucket's records
    HDBUNLOCKMETHOD(hdb);
    return false;
  }
  if(hdb->zmode){  // compression enabled: compress the value with the configured codec, then store it
    char *zbuf;
    if(hdb->opts & HDBTDEFLATE){
      zbuf = _tc_deflate(vbuf, vsiz, &vsiz, _TCZMRAW);
    } else if(hdb->opts & HDBTBZIP){
      zbuf = _tc_bzcompress(vbuf, vsiz, &vsiz);
    } else if(hdb->opts & HDBTTCBS){
      zbuf = tcbsencode(vbuf, vsiz, &vsiz);
    } else {
      zbuf = hdb->enc(vbuf, vsiz, &vsiz, hdb->encop);  // user-supplied codec
    }
    if(!zbuf){
      tchdbsetecode(hdb, TCEMISC, __FILE__, __LINE__, __func__);
      HDBUNLOCKRECORD(hdb, bidx);
      HDBUNLOCKMETHOD(hdb);
      return false;
    }
    bool rv = tchdbputimpl(hdb, kbuf, ksiz, bidx, hash, zbuf, vsiz, HDBPDOVER);
    TCFREE(zbuf);
    HDBUNLOCKRECORD(hdb, bidx);
    HDBUNLOCKMETHOD(hdb);
    if(hdb->dfunit > 0 && hdb->dfcnt > hdb->dfunit &&
       !tchdbdefrag(hdb, hdb->dfunit * HDBDFRSRAT + 1)) rv = false;
    return rv;
  }
  bool rv = tchdbputimpl(hdb, kbuf, ksiz, bidx, hash, vbuf, vsiz, HDBPDOVER);  // write the record (overwrite mode)
  HDBUNLOCKRECORD(hdb, bidx);
  HDBUNLOCKMETHOD(hdb);
  // auto defragmentation: once the counter dfcnt exceeds the unit dfunit, run an incremental defrag
  if(hdb->dfunit > 0 && hdb->dfcnt > hdb->dfunit &&
     !tchdbdefrag(hdb, hdb->dfunit * HDBDFRSRAT + 1)) rv = false;
  return rv;
}
```
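
tchdbbidx itself is not listed in this post. The idea in the comment above (one key, two hashes) can be illustrated with a small standalone sketch: one pass over the key front to back produces the bucket index, a second pass back to front produces the 8-bit secondary hash, so the two values are not simply copies of each other. The function name sketch_bidx is hypothetical, and the seeds (19780211, 751) and multipliers (37, 31) are illustrative; I believe tchdb.c uses similar multiplicative hashes, but check the source before relying on the details.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for tchdbbidx: derive a bucket index and an 8-bit
   secondary hash from the same key. Constants are illustrative assumptions. */
static uint64_t sketch_bidx(const char *kbuf, int ksiz, uint64_t bnum, uint8_t *hp){
  assert(kbuf && ksiz >= 0 && bnum > 0 && hp);
  uint64_t idx = 19780211;                                          /* seed for the bucket index */
  uint32_t hash = 751;                                              /* seed for the secondary hash */
  const unsigned char *fp = (const unsigned char *)kbuf;            /* forward cursor */
  const unsigned char *rp = (const unsigned char *)kbuf + ksiz;     /* backward cursor */
  for(int i = 0; i < ksiz; i++){
    idx = idx * 37 + fp[i];          /* bucket index: scan the key front to back */
    hash = (hash * 31) ^ *--rp;      /* secondary hash: scan the key back to front */
  }
  *hp = (uint8_t)hash;               /* keep only the low 8 bits, like the uint8_t hash in tchdbput */
  return idx % bnum;                 /* reduce to a slot in the bucket array */
}
```

The same key always maps to the same (bidx, hash) pair, which is what lets tchdbput and the lookup functions walk to the same collision tree and compare the same secondary hash.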

In the function above, the record is actually written by calling tchdbputimpl, which is listed below:

```c
/* Store a record.
   `hdb' specifies the hash database object.
   `kbuf' specifies the pointer to the region of the key.
   `ksiz' specifies the size of the region of the key.
   `bidx' specifies the index of the bucket array.
   `hash' specifies the hash value for the collision tree.
   `vbuf' specifies the pointer to the region of the value.
   `vsiz' specifies the size of the region of the value.
   `dmode' specifies behavior when the key overlaps.
   If successful, the return value is true, else, it is false. */
static bool tchdbputimpl(TCHDB *hdb, const char *kbuf, int ksiz, uint64_t bidx, uint8_t hash,
                         const char *vbuf, int vsiz, int dmode){
  assert(hdb && kbuf && ksiz >= 0);
  if(hdb->recc) tcmdbout(hdb->recc, kbuf, ksiz);  // drop the key from the record cache, it is about to go stale
  off_t off = tchdbgetbucket(hdb, bidx);  // file offset of the collision-tree root stored in this bucket
  // File offset of the link field (the parent's left or right link, or 0 for the bucket
  // itself) that points at the record currently being examined
  off_t entoff = 0;
  TCHREC rec;  // in-memory copy of the record currently being examined
  char rbuf[HDBIOBUFSIZ];
  while(off > 0){
    rec.off = off;
    // Read the record at `off' (the first pass reads the tree root). It is read straight
    // from the mmap-ed region when possible, otherwise with pread.
    if(!tchdbreadrec(hdb, &rec, rbuf)) return false;
    if(hash > rec.hash){  // compare the secondary hashes first
      off = rec.left;
      entoff = rec.off + (sizeof(uint8_t) + sizeof(uint8_t));  // left link follows magic + hash
    } else if(hash < rec.hash){
      off = rec.right;
      entoff = rec.off + (sizeof(uint8_t) + sizeof(uint8_t)) +
        (hdb->ba64 ? sizeof(uint64_t) : sizeof(uint32_t));  // right link follows the left link
    } else {  // equal secondary hashes: compare the keys
      if(!rec.kbuf && !tchdbreadrecbody(hdb, &rec)) return false;
      int kcmp = tcreckeycmp(kbuf, ksiz, rec.kbuf, rec.ksiz);
      if(kcmp > 0){
        off = rec.left;
        TCFREE(rec.bbuf);
        rec.kbuf = NULL;
        rec.bbuf = NULL;
        entoff = rec.off + (sizeof(uint8_t) + sizeof(uint8_t));
      } else if(kcmp < 0){
        off = rec.right;
        TCFREE(rec.bbuf);
        rec.kbuf = NULL;
        rec.bbuf = NULL;
        entoff = rec.off + (sizeof(uint8_t) + sizeof(uint8_t)) +
          (hdb->ba64 ? sizeof(uint64_t) : sizeof(uint32_t));
      } else {  // same key: the record already exists, handle it according to dmode
        bool rv;
        int nvsiz;
        char *nvbuf;
        HDBPDPROCOP *procptr;
        switch(dmode){
          case HDBPDKEEP:  // keep the existing record and report TCEKEEP
            tchdbsetecode(hdb, TCEKEEP, __FILE__, __LINE__, __func__);
            TCFREE(rec.bbuf);
            return false;
          case HDBPDCAT:  // append the new value to the existing one
            if(vsiz < 1){
              TCFREE(rec.bbuf);
              return true;
            }
            if(!rec.vbuf && !tchdbreadrecbody(hdb, &rec)){
              TCFREE(rec.bbuf);
              return false;
            }
            nvsiz = rec.vsiz + vsiz;
            if(rec.bbuf){
              TCREALLOC(rec.bbuf, rec.bbuf, rec.ksiz + nvsiz);
              memcpy(rec.bbuf + rec.ksiz + rec.vsiz, vbuf, vsiz);
              rec.kbuf = rec.bbuf;
              rec.vbuf = rec.kbuf + rec.ksiz;
              rec.vsiz = nvsiz;
            } else {
              TCMALLOC(rec.bbuf, nvsiz + 1);
              memcpy(rec.bbuf, rec.vbuf, rec.vsiz);
              memcpy(rec.bbuf + rec.vsiz, vbuf, vsiz);
              rec.vbuf = rec.bbuf;
              rec.vsiz = nvsiz;
            }
            rv = tchdbwriterec(hdb, &rec, bidx, entoff);
            TCFREE(rec.bbuf);
            return rv;
          case HDBPDADDINT:  // add an int; the existing value must itself be exactly one int
            if(rec.vsiz != sizeof(int)){
              tchdbsetecode(hdb, TCEKEEP, __FILE__, __LINE__, __func__);
              TCFREE(rec.bbuf);
              return false;
            }
            if(!rec.vbuf && !tchdbreadrecbody(hdb, &rec)){
              TCFREE(rec.bbuf);
              return false;
            }
            int lnum;
            memcpy(&lnum, rec.vbuf, sizeof(lnum));
            if(*(int *)vbuf == 0){  // adding 0: just hand the current value back through vbuf
              TCFREE(rec.bbuf);
              *(int *)vbuf = lnum;
              return true;
            }
            lnum += *(int *)vbuf;
            rec.vbuf = (char *)&lnum;
            *(int *)vbuf = lnum;  // the new total is also returned to the caller through vbuf
            rv = tchdbwriterec(hdb, &rec, bidx, entoff);
            TCFREE(rec.bbuf);
            return rv;
          case HDBPDADDDBL:  // add a double; the existing value must itself be exactly one double
            if(rec.vsiz != sizeof(double)){
              tchdbsetecode(hdb, TCEKEEP, __FILE__, __LINE__, __func__);
              TCFREE(rec.bbuf);
              return false;
            }
            if(!rec.vbuf && !tchdbreadrecbody(hdb, &rec)){
              TCFREE(rec.bbuf);
              return false;
            }
            double dnum;
            memcpy(&dnum, rec.vbuf, sizeof(dnum));
            if(*(double *)vbuf == 0.0){
              TCFREE(rec.bbuf);
              *(double *)vbuf = dnum;
              return true;
            }
            dnum += *(double *)vbuf;
            rec.vbuf = (char *)&dnum;
            *(double *)vbuf = dnum;
            rv = tchdbwriterec(hdb, &rec, bidx, entoff);
            TCFREE(rec.bbuf);
            return rv;
          case HDBPDPROC:  // let a callback compute the new value from the existing one
            if(!rec.vbuf && !tchdbreadrecbody(hdb, &rec)){
              TCFREE(rec.bbuf);
              return false;
            }
            // the caller stores a pointer to the callback descriptor just before the key buffer
            procptr = *(HDBPDPROCOP **)((char *)kbuf - sizeof(procptr));
            nvbuf = procptr->proc(rec.vbuf, rec.vsiz, &nvsiz, procptr->op);
            TCFREE(rec.bbuf);
            if(nvbuf == (void *)-1){  // (void *)-1: remove the record
              return tchdbremoverec(hdb, &rec, rbuf, bidx, entoff);
            } else if(nvbuf){  // non-NULL: store the returned value
              rec.kbuf = kbuf;
              rec.ksiz = ksiz;
              rec.vbuf = nvbuf;
              rec.vsiz = nvsiz;
              rv = tchdbwriterec(hdb, &rec, bidx, entoff);
              TCFREE(nvbuf);
              return rv;
            }
            tchdbsetecode(hdb, TCEKEEP, __FILE__, __LINE__, __func__);  // NULL: leave the record unchanged
            return false;
          default:
            break;
        }
        // HDBPDOVER (default): replace the old value with the new one
        TCFREE(rec.bbuf);
        rec.ksiz = ksiz;
        rec.vsiz = vsiz;
        rec.kbuf = kbuf;
        rec.vbuf = vbuf;
        return tchdbwriterec(hdb, &rec, bidx, entoff);
      }
    }
  }
  // Reaching this point means the key was not found, so a new record must be written
  if(!vbuf){  // nothing to insert
    tchdbsetecode(hdb, TCENOREC, __FILE__, __LINE__, __func__);
    return false;
  }
  if(!HDBLOCKDB(hdb)) return false;  // database-wide lock: the free block pool and header will change
  // Size of the fixed record header: magic number (1 byte) + hash value (1 byte) +
  // left link (4 or 8 bytes) + right link (4 or 8 bytes) + padding size (2 bytes)
  rec.rsiz = hdb->ba64 ?
    sizeof(uint8_t) * 2 + sizeof(uint64_t) * 2 + sizeof(uint16_t) :
    sizeof(uint8_t) * 2 + sizeof(uint32_t) * 2 + sizeof(uint16_t);
  // The key size is stored as a variable-length number: each byte carries 7 bits of the
  // value and the 8th (sign) bit flags whether more bytes follow. Add its length here.
  if(ksiz < (1U << 7)){
    rec.rsiz += 1;
  } else if(ksiz < (1U << 14)){
    rec.rsiz += 2;
  } else if(ksiz < (1U << 21)){
    rec.rsiz += 3;
  } else if(ksiz < (1U << 28)){
    rec.rsiz += 4;
  } else {
    rec.rsiz += 5;
  }
  // The value size uses the same encoding
  if(vsiz < (1U << 7)){
    rec.rsiz += 1;
  } else if(vsiz < (1U << 14)){
    rec.rsiz += 2;
  } else if(vsiz < (1U << 21)){
    rec.rsiz += 3;
  } else if(vsiz < (1U << 28)){
    rec.rsiz += 4;
  } else {
    rec.rsiz += 5;
  }
  // Look in the free block pool for a large enough free region; if there is none,
  // the record is marked to be appended at the end of the file
  if(!tchdbfbpsearch(hdb, &rec)){
    HDBUNLOCKDB(hdb);
    return false;
  }
  // These fields end up in the on-disk record. In the file, the 2-byte padding size sits
  // right after the right link, followed by the key size, value size, key and value;
  // after the value comes a padding region so that the next record starts on an
  // aligned boundary.
  rec.hash = hash;
  rec.left = 0;
  rec.right = 0;
  rec.ksiz = ksiz;
  rec.vsiz = vsiz;
  rec.psiz = 0;
  rec.kbuf = kbuf;
  rec.vbuf = vbuf;
  if(!tchdbwriterec(hdb, &rec, bidx, entoff)){  // write the record to the data file
    HDBUNLOCKDB(hdb);
    return false;
  }
  // Update the record count in the file header. At least the header is always mmap-ed,
  // so copying into the mapped region is enough (TCHTOILL converts to little-endian).
  hdb->rnum++;
  uint64_t llnum = hdb->rnum;
  llnum = TCHTOILL(llnum);
  memcpy(hdb->map + HDBRNUMOFF, &llnum, sizeof(llnum));
  HDBUNLOCKDB(hdb);
  return true;
}
```
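
The dmode switch in tchdbputimpl boils down to a few ways of merging the stored value with the incoming one. The following is a minimal in-memory sketch of four of them (overwrite, keep, append, add-int), assuming plain heap buffers: it does no file I/O or locking, leaves out HDBPDADDDBL and HDBPDPROC, and does not write the sum back through vbuf the way HDBPDADDINT does. The names put_mode, PD_*, value_t and apply_mode are hypothetical, not Tokyo Cabinet API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names echoing the HDBPD* modes handled in tchdbputimpl. */
enum put_mode { PD_OVER, PD_KEEP, PD_CAT, PD_ADDINT };

/* A stored value held in a heap buffer (hypothetical type). */
typedef struct { char *buf; int size; } value_t;

/* Merge the incoming value into `old' according to `mode'. Returns false when
   the stored value is left untouched as a failure (a KEEP conflict, or a size
   mismatch in ADDINT; tchdbputimpl reports both as TCEKEEP). */
static bool apply_mode(value_t *old, const char *vbuf, int vsiz, enum put_mode mode){
  assert(old && vbuf && vsiz >= 0);
  switch(mode){
    case PD_KEEP:                          /* key exists: refuse to overwrite */
      return false;
    case PD_CAT: {                         /* append the new bytes to the old ones */
      if(vsiz < 1) return true;            /* nothing to append, like the vsiz < 1 check */
      char *nbuf = realloc(old->buf, old->size + vsiz);
      if(!nbuf) return false;
      memcpy(nbuf + old->size, vbuf, vsiz);
      old->buf = nbuf;
      old->size += vsiz;
      return true;
    }
    case PD_ADDINT: {                      /* both sides must be exactly one int */
      int cur, add;
      if(old->size != (int)sizeof(int) || vsiz != (int)sizeof(int)) return false;
      memcpy(&cur, old->buf, sizeof(cur)); /* memcpy avoids unaligned reads */
      memcpy(&add, vbuf, sizeof(add));
      cur += add;
      memcpy(old->buf, &cur, sizeof(cur));
      return true;
    }
    case PD_OVER:
    default: {                             /* replace the old value outright */
      char *nbuf = malloc(vsiz > 0 ? vsiz : 1);
      if(!nbuf) return false;
      memcpy(nbuf, vbuf, vsiz);
      free(old->buf);
      old->buf = nbuf;
      old->size = vsiz;
      return true;
    }
  }
}
```

For example, with a stored value "abc", PD_CAT with "de" leaves "abcde", while PD_KEEP returns false and leaves "abc" alone, the same outcome tchdbputimpl reports as TCEKEEP.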

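The rec.rsiz computation near the end of tchdbputimpl can be checked in isolation. Below is a minimal sketch assuming a generic 7-bit-group encoding (low group first, high bit set on every byte except the last). It produces the same byte counts as the if/else chains above, although Tokyo Cabinet's own byte values may differ. The helpers vnum_size, vnum_encode and rec_header_size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes a size occupies at 7 bits per byte; same thresholds as tchdbputimpl. */
static int vnum_size(uint32_t num){
  if(num < (1U << 7)) return 1;
  if(num < (1U << 14)) return 2;
  if(num < (1U << 21)) return 3;
  if(num < (1U << 28)) return 4;
  return 5;
}

/* Generic 7-bit-group encoder (assumption, not TC's exact byte layout):
   low groups first, high bit set while more bytes follow. `out' needs 5 bytes. */
static int vnum_encode(uint32_t num, unsigned char *out){
  assert(out);
  int len = 0;
  do {
    unsigned char group = num & 0x7f;
    num >>= 7;
    out[len++] = num > 0 ? (unsigned char)(group | 0x80) : group;
  } while(num > 0);
  return len;
}

/* Fixed header (magic + hash + left + right + padding size) plus both size fields. */
static int rec_header_size(bool ba64, uint32_t ksiz, uint32_t vsiz){
  int ptr = ba64 ? (int)sizeof(uint64_t) : (int)sizeof(uint32_t);
  return 2 * (int)sizeof(uint8_t) + 2 * ptr + (int)sizeof(uint16_t)
         + vnum_size(ksiz) + vnum_size(vsiz);
}
```

With 32-bit links, a 5-byte key and a 10-byte value give a 12-byte fixed header plus one byte per size field, 14 bytes in total; the key and value bytes and the alignment padding come on top of that.
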
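During the collision-tree descent above, the secondary hash is compared first, and the keys are compared only when the hashes are equal; together they decide where the record is written. As a result, records that share a secondary hash are not grouped into one subtree: depending on their keys, they can end up interleaved with records whose secondary hashes differ. So the tree does not form neat per-hash clusters, but it is not unordered either. It is an ordinary (unbalanced) binary search tree ordered by the pair (secondary hash, key), with larger values going left, and lookups follow exactly the same path. Because the secondary hash is close to random, the shape of the tree depends much less on the order in which keys are inserted than it would if keys alone were compared, which probably helps keep the collision tree reasonably balanced.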
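
The descent can be modelled in memory to see both effects at once. This is a minimal sketch with hypothetical names (node_t, rec_cmp, tree_put, tree_check, tree_free): nodes stand in for on-disk records, strcmp stands in for tcreckeycmp (whose exact ordering I have not verified; any consistent total order behaves the same way here), and the pointer-to-pointer `link` plays the role of entoff, always addressing the slot that points at the node being examined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a record in one bucket's collision tree. */
typedef struct node {
  uint8_t hash;              /* secondary hash */
  const char *key;           /* NUL-terminated key (not copied) */
  struct node *left, *right;
} node_t;

/* Positive means (hash, key) ranks above n and descends to the left, like
   hash > rec.hash or kcmp > 0. strcmp replaces tcreckeycmp (assumption). */
static int rec_cmp(uint8_t hash, const char *key, const node_t *n){
  if(hash != n->hash) return hash > n->hash ? 1 : -1;
  return strcmp(key, n->key);
}

/* Find (hash, key) or attach a new leaf. `link' mirrors entoff: it addresses the
   bucket slot or the parent's left/right field, so linking is a single store. */
static node_t *tree_put(node_t **bucket, uint8_t hash, const char *key){
  assert(bucket && key);
  node_t **link = bucket;
  while(*link){
    int c = rec_cmp(hash, key, *link);
    if(c > 0){
      link = &(*link)->left;
    } else if(c < 0){
      link = &(*link)->right;
    } else {
      return *link;          /* existing key: tchdbputimpl applies dmode here */
    }
  }
  node_t *n = calloc(1, sizeof(*n));
  if(!n) return NULL;
  n->hash = hash;
  n->key = key;
  *link = n;
  return n;
}

/* An in-order walk (left, node, right) must see strictly decreasing (hash, key). */
static int tree_check(const node_t *n, const node_t **prev){
  if(!n) return 1;
  if(!tree_check(n->left, prev)) return 0;
  if(*prev && rec_cmp(n->hash, n->key, *prev) >= 0) return 0;
  *prev = n;
  return tree_check(n->right, prev);
}

static void tree_free(node_t *n){
  if(!n) return;
  tree_free(n->left);
  tree_free(n->right);
  free(n);
}
```

Inserting (5,"m"), (7,"a"), (3,"z"), (5,"x"), (5,"b") in that order places (5,"x") under the hash-7 node and (5,"b") under the hash-3 node, so same-hash records are indeed scattered; tree_check still confirms that an in-order walk visits every record in strictly decreasing (hash, key) order.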
