An Introduction to the Okapi BM25 Algorithm

Adapted from the English-language Wikipedia (wikipedia.org); this post is mainly a translation of that article.


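BM25 (BM stands for "best matching") is a ranking function that information retrieval systems use to score documents against a given query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others. BM25 was first implemented in the Okapi system, which is why it is also called Okapi BM25.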

  

BM25 is a bag-of-words model. A bag-of-words model considers only how often each word occurs in a document and ignores sentence structure and grammatical relationships: the document is treated as a bag of words, and the order of the words inside the bag does not matter. BM25 is not a single function, but a whole family of scoring functions with slightly different components and parameters. One of the most prominent instantiations is given below.
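To make the bag-of-words idea concrete, here is a minimal sketch. The function name `bag_of_words` and the lowercase, whitespace-based tokenization are assumptions made for illustration only; a real retrieval system would use a proper tokenizer.

```python
from collections import Counter


def bag_of_words(text):
    """Return the word counts of `text`, discarding word order.

    Hypothetical helper: tokenization is a naive lowercase whitespace split.
    """
    return Counter(text.lower().split())


a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# The two sentences mean different things, but they contain the same
# words the same number of times, so their bags are identical.
print(a == b)
print(a["the"])
```

Because only the counts survive, any scoring function built on this representation, BM25 included, cannot distinguish the two sentences.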

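Given a query $Q$ containing the keywords $q_1, \ldots, q_n$, the BM25 score of a document $D$ is:

$$
\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}
$$

Here $\text{IDF}(q_i)$ is the inverse document frequency described in the previous post, "TF-IDF"; $f(q_i, D)$ is the term frequency (TF) from the same post, that is, how often $q_i$ occurs in $D$; $|D|$ is the length of document $D$ in words; and avgdl is the average document length over the whole corpus. $k_1$ and $b$ are free parameters, usually chosen, in absence of an advanced optimization, as $k_1 \in [1.2, 2.0]$ and $b = 0.75$.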
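The BM25 score described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function name `bm25_scores`, the default `k1=1.5`, and the pre-tokenized toy corpus are all hypothetical. The IDF used here is the smoothed form ln(1 + (N - n + 0.5) / (n + 0.5)) that is commonly paired with BM25, where N is the number of documents and n is the number of documents containing the term; it may differ from the IDF definition in the TF-IDF post.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document in `docs` (lists of tokens).

    Assumes `docs` is non-empty. The IDF variant is an assumption:
    ln(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query_terms:
            f = tf[q]
            if f == 0:
                continue  # a term absent from the document contributes 0
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            # Length normalization: documents longer than average are penalized.
            denom = f + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * f * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats are pets".split(),
]
scores = bm25_scores(["cat"], docs)
print(scores)
```

In this toy corpus the first two documents each contain "cat" once, but the second is shorter, so the length normalization controlled by `b` gives it the higher score. The third document does not contain the exact token "cat" and scores 0.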
