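Combined ranking: tuning scores with the function score query

Scoring and sorting

By default, Elasticsearch sorts results by each document's relevance score.
You can also sort by specifying one or more fields.
Sorting by the relevance score (_score) alone cannot satisfy certain requirements:
- It gives no finer control over how relevance drives the ordering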

function score query

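The function score query
- After the query finishes, it re-scores every matching document through a series of functions and sorts by the newly computed score
It provides several built-in scoring functions:
- weight: sets a simple, non-normalized weight for each document
- field value factor: uses a numeric field's value to modify _score, for example taking "popularity" or "number of likes" into account
- random score: gives each user a different, randomly scored result ordering
- decay functions: score by the value of a field; the closer it is to a given value, the higher the score
- script score: a custom script gives full control over the scoring logic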
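To make the decay idea concrete, here is a minimal Python sketch of a Gaussian decay curve. It assumes the usual parameterization (origin, scale, offset, decay); the function name and the sample values are illustrative and are not taken from the requests in these notes.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Return 1.0 at the origin and `decay` at a distance of offset + scale."""
    # Values within `offset` of the origin are treated as distance 0.
    distance = max(0.0, abs(value - origin) - offset)
    # Choose sigma^2 so that the curve equals `decay` when distance == scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


if __name__ == "__main__":
    for value in (100, 105, 110, 130):
        print(value, round(gauss_decay(value, origin=100, scale=10), 4))
```

The closer the field value is to the origin, the closer the result is to 1, which is the "nearer means higher score" behaviour described in the list above.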

# Index three blog posts that are identical except for their vote counts
DELETE blogs
PUT /blogs/_doc/1
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   0
}

PUT /blogs/_doc/2
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   100
}

PUT /blogs/_doc/3
{
  "title":   "About popularity",
  "content": "In this post we will talk about...",
  "votes":   1000000
}


# Default field_value_factor: new score = _score * votes, so the post with 0 votes scores 0
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes"
      }
    }
  }
}

# log1p modifier: new score = _score * log10(1 + votes), which dampens very large vote counts
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p"
      }
    }
  }
}


# factor scales the field value first: new score = _score * log10(1 + 0.1 * votes)
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p" ,
        "factor": 0.1
      }
    }
  }
}


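# boost_mode sum adds the function result to _score instead of multiplying; max_boost caps the function result at 3
POST /blogs/_search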
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query":    "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p" ,
        "factor": 0.1
      },
      "boost_mode": "sum",
      "max_boost": 3
    }
  }
}
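
The arithmetic behind the four field_value_factor requests above can be sketched in Python. This is a simplified model, not Elasticsearch's implementation: it covers only the "none" and "log1p" modifiers and the "multiply" and "sum" boost modes, and the query score of 0.5 is a made-up value for illustration.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Scale the field value by `factor`, then apply the modifier."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(1.0 + scaled)
    raise ValueError(f"unsupported modifier in this sketch: {modifier}")


def function_score(query_score, func_score, boost_mode="multiply", max_boost=float("inf")):
    """Combine the query score with the function result, capped by max_boost."""
    capped = min(func_score, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "sum":
        return query_score + capped
    raise ValueError(f"unsupported boost_mode in this sketch: {boost_mode}")


QUERY_SCORE = 0.5  # hypothetical _score, identical for all three posts

if __name__ == "__main__":
    for votes in (0, 100, 1000000):
        raw = function_score(QUERY_SCORE, field_value_factor(votes))
        smoothed = function_score(QUERY_SCORE, field_value_factor(votes, modifier="log1p"))
        summed = function_score(
            QUERY_SCORE,
            field_value_factor(votes, factor=0.1, modifier="log1p"),
            boost_mode="sum",
            max_boost=3,
        )
        print(votes, raw, round(smoothed, 4), round(summed, 4))
```

Without a modifier the post with one million votes overwhelms everything else, while log1p compresses the range and max_boost bounds how much the votes can contribute.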

POST /blogs/_search
{
  "query": {
    "function_score": {
      "random_score": {
        "seed": 911119
      }
    }
  }
}
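
The point of passing a seed is that the random ordering becomes repeatable. The Python sketch below illustrates that property only. It seeds Python's own generator from the seed and the document id, which is a stand-in chosen for this example and not how Elasticsearch derives its random scores.

```python
import random


def random_score(seed, doc_id):
    """Deterministic pseudo-random score in [0, 1) for a (seed, document) pair."""
    # Stand-in for illustration: seed Python's PRNG with "seed:doc_id".
    return random.Random(f"{seed}:{doc_id}").random()


def rank(doc_ids, seed):
    """Order documents by their seeded random score, highest first."""
    return sorted(doc_ids, key=lambda doc_id: random_score(seed, doc_id), reverse=True)


DOC_IDS = ["1", "2", "3"]

if __name__ == "__main__":
    print(rank(DOC_IDS, 911119))
    print(rank(DOC_IDS, 911119))  # same seed, same order
    print(rank(DOC_IDS, 42))      # another seed may give another order
```

Using a per-user value as the seed is one way to give each user their own ordering that stays stable across page loads.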

term&phrase suggester

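What are search suggestions

Modern search engines usually offer a suggest-as-you-type feature.
It auto-completes or corrects the input while the user is typing. Helping users enter more precise keywords improves how well documents match in the search that follows.
On Google, for example, typing first triggers auto-completion; once the input reaches a certain length and can no longer be completed, for instance because a word is misspelled, similar words or sentences are suggested instead.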

elasticsearch suggester api

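In Elasticsearch, this kind of feature is implemented by the Suggester API.
How it works: the input text is split into tokens, then similar terms are looked up in the index's term dictionary and returned.
Elasticsearch provides four kinds of suggesters for different use cases:
- term & phrase suggester
- completion & context suggester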

term suggester

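A suggester is a special kind of search. “text” holds the text supplied when the suggester is called, usually whatever the user typed into the UI.
In the example, the user's input “lucen” is a misspelling.
The suggester looks the term up in the specified “body” field and, when it cannot be found, returns suggested values.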


term suggester - missing mode

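Searching for “lucen rock”
- Every suggestion carries a score. Similarity is measured with the Levenshtein edit distance: how many single-character changes it takes to turn one term into another. Many optional parameters control how fuzzy the match may be, for example “max_edits”
Suggestion modes
- missing - no suggestions if the term already exists in the index
- popular - suggest terms that occur more frequently
- always - suggest whether or not the term exists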
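
The edit distance and the three suggestion modes can be sketched in a few lines of Python. This is a simplified model under stated assumptions: plain Levenshtein distance, a hand-counted document-frequency table for some terms of the sample articles used in this section, and only the max_edits and prefix_length parameters. The real term suggester has more parameters and its own scoring.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def suggest(term, doc_freq, mode="missing", max_edits=2, prefix_length=1):
    """Return candidate terms, closest first, then most frequent first."""
    own_freq = doc_freq.get(term, 0)
    if mode == "missing" and own_freq > 0:
        return []  # the term exists in the index, so nothing is suggested
    candidates = []
    for candidate, freq in doc_freq.items():
        if candidate == term:
            continue
        if candidate[:prefix_length] != term[:prefix_length]:
            continue  # the first prefix_length characters must match
        if mode == "popular" and freq <= own_freq:
            continue  # only terms that occur in more documents
        distance = levenshtein(term, candidate)
        if distance <= max_edits:
            candidates.append((distance, -freq, candidate))
    return [candidate for _, _, candidate in sorted(candidates)]


# Document frequencies counted by hand from the six sample articles (subset of terms).
DOC_FREQ = {"lucene": 2, "elasticsearch": 3, "rocks": 2, "rock": 1,
            "elk": 2, "stack": 2, "elastic": 1}

if __name__ == "__main__":
    for mode in ("missing", "popular", "always"):
        print(mode, suggest("lucen", DOC_FREQ, mode), suggest("rock", DOC_FREQ, mode))
    print(suggest("hocks", DOC_FREQ, "always", prefix_length=0))
```

In this model “lucen” gets a suggestion in every mode because it is not in the index, while “rock” only gets one in popular and always mode, since it exists but “rocks” occurs in more documents.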
  

phrase suggester

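The phrase suggester adds some extra logic on top of the term suggester.
Some of its parameters
- suggest mode: missing, popular, always
- max errors: the maximum number of terms that may be misspelled
- confidence: a threshold that limits which suggestions are returned, 1 by default. Only candidates that score higher than the input phrase's score multiplied by this factor are returned; with 0, the top candidates are returned regardless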


DELETE articles

# Index sample documents for the term and phrase suggester examples
POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{  "body": "elasticsearch is rock solid"}


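# Check the tokens: the standard analyzer does no stemming, so "rocks" and "rock" stay distinct terms
POST _analyze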
{
  "analyzer": "standard",
  "text": ["Elk stack  rocks rock"]
}

# missing: only suggest for terms that are not in the index ("lucen"), not for "rock"
POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}


# popular: also suggest terms that occur in more documents than the input term
POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}


# always: suggest for every term, whether or not it exists in the index
POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}


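# prefix_length defaults to 1; setting it to 0 lets the first character differ, so "hocks" can match "rocks"
POST /articles/_search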
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length":0,
        "sort": "frequency"
      }
    }
  }
}


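# Phrase suggester: tolerate up to 2 misspelled terms; confidence 0 returns the top candidates regardless of score
POST /articles/_search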
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors":2,
        "confidence":0,
        "direct_generator":[{
          "field":"body",
          "suggest_mode":"always"
        }],
        "highlight": {
          "pre_tag": "",
          "post_tag": ""
        }
      }
    }
  }
}

Auto-completion and context-based suggestions

the completion suggester

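The completion suggester provides auto-complete: every character the user types immediately sends a query to the backend to look up matches.
This puts strict demands on performance. Elasticsearch therefore uses a different data structure instead of the inverted index: the analyzed data is encoded into an FST (finite state transducer) and stored alongside the index. ES loads the whole FST into memory, so lookups are fast.
An FST only supports prefix lookups.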
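
The prefix-only restriction can be illustrated with a small in-memory index in Python. Note the substitution: this sketch uses a sorted list with binary search, not an FST, and lowercasing stands in for the analyzer. The class name is made up for the example, and results come back in alphabetical order, whereas the real suggester ranks by weight.

```python
import bisect


class PrefixIndex:
    """Sorted-list stand-in for an FST: supports prefix lookups only."""

    def __init__(self, inputs):
        # Keep the lowercased key next to the original text, sorted by key.
        self._entries = sorted((text.lower(), text) for text in inputs)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, size=5):
        key = prefix.lower()
        # Binary search for the first entry that could start with the prefix.
        start = bisect.bisect_left(self._keys, key)
        results = []
        for analyzed, original in self._entries[start:]:
            if not analyzed.startswith(key):
                break  # entries are sorted, so no later entry can match
            results.append(original)
            if len(results) == size:
                break
        return results


TITLES = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
]

if __name__ == "__main__":
    index = PrefixIndex(TITLES)
    print(index.complete("elk "))
    print(index.complete("stack"))  # "stack" never starts an entry, so nothing matches
```

Because matching starts at the beginning of the stored input, a word in the middle of a title (such as "stack") cannot be completed unless it is indexed as a separate input.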

Steps for using the completion suggester

Define the mapping, using the “completion” type
Index the data
Run a “suggest” query to get suggestions


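What is the context suggester

An extension of the completion suggester
It lets you add more context information to the search. For example, for the input “star”:
- coffee-related: suggest “starbucks”
- movie-related: suggest “star wars”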

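Implementing the context suggester

Two types of context can be defined
- category - an arbitrary string
- geo - geographic location information
Steps for implementing a context suggester
- Define a custom mapping
- Index the data, adding context information to each document
- Run the suggestion query together with a context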
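
The effect of a category context can be modelled as a filter applied on top of prefix matching. The Python sketch below is only a model of that behaviour. The `Suggestion` type and function name are hypothetical, and the entries mirror the “star wars” and “starbucks” example used in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    text: str      # the suggestion input
    category: str  # the category context attached at index time


def suggest_in_context(entries, prefix, category):
    """Return inputs that start with the prefix and carry the given category."""
    key = prefix.lower()
    return [
        entry.text
        for entry in entries
        if entry.category == category and entry.text.lower().startswith(key)
    ]


ENTRIES = [
    Suggestion("star wars", "movies"),
    Suggestion("starbucks", "coffee"),
]

if __name__ == "__main__":
    print(suggest_in_context(ENTRIES, "sta", "coffee"))
    print(suggest_in_context(ENTRIES, "sta", "movies"))
```

The same prefix yields different suggestions depending on the category passed with the query, which is what the context adds to plain completion.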

Precision and recall

Precision
- completion > phrase > term
Recall
- term > phrase > completion
Performance
- completion > phrase > term

# Recreate the index with a completion field
DELETE articles
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion":{
        "type": "completion"
      }
    }
  }
}

POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}


# Ask for completions of the prefix "elk "
POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}


# Create a completion field that carries a category context
DELETE comments
PUT comments
PUT comments/_mapping
{
  "properties": {
    "comment_autocomplete":{
      "type": "completion",
      "contexts":[{
        "type":"category",
        "name":"comment_category"
      }]
    }
  }
}

# Index suggestions, each tagged with a category
POST comments/_doc
{
  "comment":"I love the star war movies",
  "comment_autocomplete":{
    "input":["star wars"],
    "contexts":{
      "comment_category":"movies"
    }
  }
}

POST comments/_doc
{
  "comment":"Where can I find a Starbucks",
  "comment_autocomplete":{
    "input":["starbucks"],
    "contexts":{
      "comment_category":"coffee"
    }
  }
}


# Only suggestions in the "coffee" category are returned for the prefix "sta"
POST comments/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "prefix": "sta",
      "completion":{
        "field":"comment_autocomplete",
        "contexts":{
          "comment_category":"coffee"
        }
      }
    }
  }
}
