ES relevance scoring algorithm

Algorithm overview
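1. The relevance score algorithm computes how closely a text stored in an index matches the search text.
2. ES uses the term frequency/inverse document frequency algorithm, TF/IDF for short. (Since version 5.0 the default similarity is BM25, a refinement of TF/IDF that builds on the same three factors.)
3. Term frequency: how many times each term of the search text appears in the field text. The more often it appears, the more relevant the document.
4. Inverse document frequency: how many times each term of the search text appears across all documents in the index. The more often it appears, the less that term contributes to relevance.
Example: in an index, hello appears 1000 times across all documents and world appears 100 times. When searching for hello world, a document that contains world scores higher than one that contains only hello, because world is the rarer term.
5. Field-length norm: the length of the field. The longer the field, the weaker the relevance.
Example: one document has a field that contains the search term and is only 10 characters long, and another document has a field that contains the search term but is 1000 characters long. The shorter one is more relevant.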
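
The three factors above can be sketched in a few lines of Python. This is only an illustration, assuming the classic Lucene-style formulas (square root for tf, 1 + ln(numDocs / (docFreq + 1)) for idf, 1 / sqrt(length) for the norm). The function names, the total of 10,000 documents, and the reading of "appears 1000 times" as "appears in 1000 documents" are assumptions made for the example, and field length is counted in terms rather than characters.

```python
import math


def term_frequency(freq: int) -> float:
    """tf: grows with the number of occurrences in the field, but sub-linearly."""
    return math.sqrt(freq)


def inverse_document_frequency(doc_freq: int, num_docs: int) -> float:
    """idf: the more documents contain the term, the smaller its weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    """norm: the longer the field, the smaller the weight of a single match."""
    return 1.0 / math.sqrt(num_terms)


NUM_DOCS = 10_000  # assumed index size, not taken from the original post

# hello is found in 1000 documents, world in 100 (the example from the list above)
idf_hello = inverse_document_frequency(1000, NUM_DOCS)
idf_world = inverse_document_frequency(100, NUM_DOCS)

# a short field (10 terms) versus a long field (1000 terms)
norm_short = field_length_norm(10)
norm_long = field_length_norm(1000)

print("idf hello:", idf_hello, "idf world:", idf_world)
print("norm short:", norm_short, "norm long:", norm_long)
```

With these numbers world gets a larger idf than hello, and the short field gets a larger norm than the long one, which matches the two examples in the list.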

Viewing the score of a specific document

GET /web/info/2/_explain
{
  "query": {
    "match": {
      "content": "in"
    }
  }
}

Result

{
  "_index": "web",
  "_type": "info",
  "_id": "2",
  "matched": true,
  "explanation": {
    "value": 0.2876821,
    "description": "weight(content:in in 0) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 0.2876821,
        "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
        "details": [
          {
            "value": 0.2876821,
            "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
            "details": [
              {
                "value": 1,
                "description": "docFreq",
                "details": []
              },
              {
                "value": 1,
                "description": "docCount",
                "details": []
              }
            ]
          },
          {
            "value": 1,
            "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
            "details": [
              {
                "value": 1,
                "description": "termFreq=1.0",
                "details": []
              },
              {
                "value": 1.2,
                "description": "parameter k1",
                "details": []
              },
              {
                "value": 0.75,
                "description": "parameter b",
                "details": []
              },
              {
                "value": 5,
                "description": "avgFieldLength",
                "details": []
              },
              {
                "value": 5,
                "description": "fieldLength",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}
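
The numbers in this explanation can be checked by hand. The sketch below simply plugs the values from the output (docFreq, docCount, termFreq, k1, b, fieldLength, avgFieldLength) into the two formulas quoted in the "description" strings. The function names are made up for the example, and the natural logarithm is assumed for log.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    """idf as described in the output: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, k1: float, b: float,
                 field_length: float, avg_field_length: float) -> float:
    """tfNorm as described in the output:
    (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


# Values copied from the explanation above
idf = bm25_idf(doc_freq=1, doc_count=1)
tf_norm = bm25_tf_norm(freq=1.0, k1=1.2, b=0.75,
                       field_length=5, avg_field_length=5)

# The final score is the product of the two parts
score = idf * tf_norm

print(round(idf, 7), tf_norm, round(score, 7))
```

Because fieldLength equals avgFieldLength and termFreq is 1, tfNorm comes out as 1, so the score equals the idf value of about 0.2876821 reported in the result.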
