swish-e search engine: source code analysis (7)

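The previous parts covered compression of the word entries and related steps. This part begins the analysis of how the index file is written.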

2.5 Writing the index file

2.5.1 Analysis of write_index_file

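The basic flow is:

  • coalesce_all_word_locations sorts each word's locations by metaID and filenum;
  • sort_words sorts the words themselves;
  • write_header writes the index file header;
  • write_index writes the word information to the index file.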

2.5.2 Analysis of coalesce_all_word_locations

void coalesce_all_word_locations(SWISH * sw, IndexFILE * indexf)
{
    int i;
    ENTRY *epi;

    for (i = 0; i < VERYBIGHASHSIZE; i++)
    {
        if ((epi = sw->Index->hashentries[i]))
        {
            while (epi)
            {
                coalesce_word_locations(sw, epi);
                epi = epi->next;
            }
        }
    }
}

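The function walks every word entry in the hash table, bucket by bucket, and calls coalesce_word_locations on each entry to merge its location information.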
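The traversal pattern described here (an array of buckets, each holding a singly linked chain of entries) can be sketched in isolation. This is only a toy illustration, not swish-e code: ToyEntry, TOY_HASH_SIZE, toy_process_entry and toy_process_all are hypothetical names standing in for ENTRY, VERYBIGHASHSIZE, coalesce_word_locations and coalesce_all_word_locations, and the per-entry work is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HASH_SIZE 8   /* hypothetical; stands in for VERYBIGHASHSIZE */

/* Hypothetical stand-in for swish-e's ENTRY: one word in a bucket chain. */
typedef struct ToyEntry {
    const char *word;
    int visited;              /* set by the per-entry callback below */
    struct ToyEntry *next;    /* next entry in the same bucket */
} ToyEntry;

/* Stand-in for coalesce_word_locations: here it only marks the entry. */
static void toy_process_entry(ToyEntry *e)
{
    e->visited = 1;
}

/* Visit every entry in every bucket and return how many were visited. */
static int toy_process_all(ToyEntry *buckets[TOY_HASH_SIZE])
{
    int count = 0;

    assert(buckets != NULL);

    for (int i = 0; i < TOY_HASH_SIZE; i++) {
        /* Empty buckets are NULL, so the inner loop is simply skipped. */
        for (ToyEntry *e = buckets[i]; e != NULL; e = e->next) {
            toy_process_entry(e);
            count++;
        }
    }
    return count;
}
```

Because an empty bucket is just a NULL head pointer, the inner loop alone handles both cases. The extra if in the original function is harmless but not strictly required.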

2.5.3 Analysis of sortChunkLocations

Inside coalesce_word_locations, the locations in the entry's current chunk are first sorted by sortChunkLocations.

static void sortChunkLocations(ENTRY * e)
{
    int i, j, k, filenum, metaID, frequency;
    unsigned char flag;
    unsigned char *ptmp, *ptmp2, *compressed_data;
    int *pi = NULL;
    LOCATION *l, *prev = NULL, **lp;

    /* Very trivial case */
    if (!e)
        return;

    if (!e->currentChunkLocationList)
        return;

    /* Get the number of locations in chunk */
    for (i = 0, l = e->currentChunkLocationList; l; i++)
        l = *(LOCATION **) l;   /* Get next location */

    /* Compute array wide: one record holds metaID, filenum
       and a pointer to the LOCATION */
    j = 2 * sizeof(int) + sizeof(void *);

    /* Compute array size (record width times number of locations) */
    ptmp = (void *) emalloc(j * i);

    /* Build an array with the elements to compare and pointers to data */
    for (l = e->currentChunkLocationList, ptmp2 = ptmp; l; )
    {
        pi = (int *) ptmp2;

        compressed_data = (unsigned char *) l;
        /* Jump next offset */
        compressed_data += sizeof(LOCATION *);

        metaID = uncompress2(&compressed_data);
        uncompress_location_values(&compressed_data, &flag, &filenum, &frequency);

        /* Store the metaID and filenum read from the LOCATION in the array */
        pi[0] = metaID;
        pi[1] = filenum;

        /* ptmp2 now points at the slot that holds the pointer */
        ptmp2 += 2 * sizeof(int);
        lp = (LOCATION **) ptmp2;
        /* Store the LOCATION pointer */
        *lp = l;
        ptmp2 += sizeof(void *);

        /* The next-location pointer sits at the start of the LOCATION structure */
        /* Get next location */
        l = *(LOCATION **) l;
    }

    /*
     * After the loop above, the array is laid out as:
     * -----------------------------------------------------------
     * | metaID of L1 | filenum of L1 | pointer to LOCATION L1   |
     * -----------------------------------------------------------
     * |<-------------- record for one LOCATION ---------------->|
     */

    /* Sort them: quicksort the array by metaID and filenum */
    swish_qsort(ptmp, i, j, &icomp2);

    /* Store results: relink the list in sorted order */
    for (k = 0, ptmp2 = ptmp; k < i; k++)
    {
        ptmp2 += 2 * sizeof(int);
        l = *(LOCATION **) ptmp2;
        if (!k)
            e->currentChunkLocationList = l;
        else
            prev->next = l;
        ptmp2 += sizeof(void *);
        prev = l;
    }
    l->next = NULL;

    /* Free the memory of the array */
    efree(ptmp);
}
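The technique used by sortChunkLocations (copy the sort keys and a pointer for each list node into a flat array, sort the array, then relink the nodes in array order) can be tried out in a small standalone sketch. The sketch makes several assumptions that differ from swish-e: ToyLoc, ToyKey, toy_cmp and toy_sort_locations are hypothetical names, the keys are plain struct fields rather than compressed bytes, a struct replaces the hand-computed record width, and the standard qsort replaces swish_qsort with a comparator that is only a guess at what icomp2 does (ascending metaID, then ascending filenum).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, uncompressed stand-in for swish-e's LOCATION. */
typedef struct ToyLoc {
    struct ToyLoc *next;   /* next pointer first, as in LOCATION */
    int metaID;
    int filenum;
} ToyLoc;

/* One sort record: the two keys plus a pointer back to the node. */
typedef struct {
    int metaID;
    int filenum;
    ToyLoc *loc;
} ToyKey;

/* Assumed ordering: ascending metaID, then ascending filenum. */
static int toy_cmp(const void *a, const void *b)
{
    const ToyKey *x = a;
    const ToyKey *y = b;

    if (x->metaID != y->metaID)
        return (x->metaID > y->metaID) - (x->metaID < y->metaID);
    return (x->filenum > y->filenum) - (x->filenum < y->filenum);
}

/* Sort a singly linked list by (metaID, filenum); returns the new head. */
static ToyLoc *toy_sort_locations(ToyLoc *head)
{
    size_t n = 0, k = 0;
    ToyLoc *l;
    ToyKey *keys;

    for (l = head; l; l = l->next)      /* count the nodes */
        n++;
    if (n < 2)
        return head;                    /* nothing to reorder */

    keys = malloc(n * sizeof *keys);
    if (!keys)
        return head;                    /* leave the list unsorted */

    for (l = head; l; l = l->next, k++) {   /* build the key array */
        keys[k].metaID = l->metaID;
        keys[k].filenum = l->filenum;
        keys[k].loc = l;
    }
    assert(k == n);

    qsort(keys, n, sizeof *keys, toy_cmp);

    for (k = 0; k + 1 < n; k++)         /* relink in sorted order */
        keys[k].loc->next = keys[k + 1].loc;
    keys[n - 1].loc->next = NULL;

    head = keys[0].loc;
    free(keys);
    return head;
}
```

Sorting an array of small records instead of the list itself keeps the nodes in place: only the next pointers change. Note that the standard qsort does not guarantee a stable order for records with equal keys.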

 

 
