Elasticsearch search: the best_fields strategy

Requirements

1) Search for documents whose title or content contains "java solution";
2) We want the most similar documents, whether the closest match is in the title or in the content;

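The best_fields strategy

The best_fields strategy: a result should be a document in which a single field matches as many of the keywords as possible, rather than one in which many fields each match only a few keywords.

dis_max syntax: of the several queries, simply take the score of the highest-scoring query as the document's score.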
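
To make the difference concrete, here is a minimal Python sketch that contrasts keeping the best per-field score (what dis_max does) with summing the per-field scores. The per-field scores are made-up illustrative numbers, not Elasticsearch output, and the helper names are hypothetical.

```python
# Illustrative per-field relevance scores (assumed values, not real Elasticsearch output).
# doc_a: each field matches one keyword; doc_b: one field matches both keywords.
field_scores = {
    "doc_a": {"title": 0.26, "content": 0.26},
    "doc_b": {"title": 0.52, "content": 0.0},
}


def best_field_score(scores):
    """dis_max without tie_breaker: keep only the highest per-field score."""
    return max(scores.values())


def summed_score(scores):
    """Alternative for comparison: add up the scores of all fields."""
    return sum(scores.values())


for doc_id, scores in field_scores.items():
    print(doc_id, "best:", best_field_score(scores), "sum:", summed_score(scores))
```

With these assumed numbers, summing leaves the two documents tied, while the best-field score ranks doc_b first, which is the behavior the best_fields strategy asks for.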

Data preparation

As we can see:
Document 1: the title and the content each contain part of "java solution";
Document 2: the title contains "java solution", while the content contains no part of "java solution".


POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "title" : "solution 1 ","content":" why java ?" }
{ "index": { "_id": 2 }}
{ "title" : "java  solution ","content":" why program ? " }

Implementing the best_fields strategy with dis_max

GET /forum/article/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {"match": {
          "title": "java solution"
        }},
        {"match": {
          "content": "java solution"
        }}
      ]
    }
  }
}

-- The response is as follows
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.51623213,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.51623213,
        "_source": {
          "title": "java  solution ",
          "content": " why program ? "
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "title": "solution 1 ",
          "content": " why java ?"
        }
      }
    ]
  }
}
We can see that document 2 scores clearly higher than document 1, because its title alone matches both keywords.

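Refining dis_max with tie_breaker

What is tie_breaker

tie_breaker takes the scores of the other queries into account as well: the score of each query other than the best one is multiplied by tie_breaker and added to the best score.

Why use tie_breaker

On its own, dis_max takes only the score of the highest-scoring query and ignores the scores of all other queries. This all-or-nothing approach can produce scores that do not reflect how well a document matches overall when other queries also match, so to get more accurate results it is better to take the other queries into account too.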


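1. A post, doc1: the title contains java; the content contains neither java nor solution;
2. A post, doc2: the content contains solution; the title contains neither keyword;
3. A post, doc3: the title contains java and the content contains solution;
4. With plain dis_max, the search may rank doc1 and doc2 ahead of doc3, instead of ranking doc3 first as we expect;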
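
The scenario above can be sketched in Python. The formula is the one dis_max applies when tie_breaker is set: the best clause score plus tie_breaker times the sum of the other clause scores. The per-clause scores below are assumptions picked for illustration (they are not taken from an explain output), and dis_max_score and rank are hypothetical helper names.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the sum of the remaining clause scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Assumed per-clause scores as (title score, content score); 0.0 means no match.
clause_scores = {
    "doc1": (0.2876821, 0.0),          # title matches "java"
    "doc2": (0.0, 0.25811607),         # content matches "solution"
    "doc3": (0.25811607, 0.25811607),  # title matches "java", content matches "solution"
}


def rank(tie_breaker):
    """Return doc ids ordered by descending dis_max score."""
    return sorted(
        clause_scores,
        key=lambda doc_id: dis_max_score(clause_scores[doc_id], tie_breaker),
        reverse=True,
    )


print(rank(0.0))  # plain dis_max
print(rank(0.3))  # dis_max with tie_breaker
```

Under these assumed scores, plain dis_max puts doc1 first and leaves doc3 tied with doc2, while a tie_breaker of 0.3 lifts doc3 to about 0.3355 and moves it to the top.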

POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "title" : "java","content":"this is program" }
{ "index": { "_id": 2 }}
{ "title" : "python","content":"solution is" }
{ "index": { "_id": 3 }}
{ "title" : "java program","content":"solution is" }


GET /forum/article/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {"match": {
          "title": "java solution"
        }},
        {"match": {
          "content": "java solution"
        }}
      ],
      "tie_breaker": 0.3
    }
  }
}

The response is as follows:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.3355509,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "3",
        "_score": 0.3355509,
        "_source": {
          "title": "java program",
          "content": "solution is"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "java",
          "content": "this is program"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.25811607,
        "_source": {
          "title": "python",
          "content": "solution is"
        }
      }
    ]
  }
}
We can see that doc3 now has the highest score.

Implementing dis_max + tie_breaker with multi_match


GET /forum/article/_search
{
  "query": {
    "multi_match": {
        "query":                "java solution",
        "type":                 "best_fields", 
        "fields":               [ "title", "content" ],
        "tie_breaker":          0.3
    }
  } 
}

-- The response is as follows

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.3355509,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "3",
        "_score": 0.3355509,
        "_source": {
          "title": "java program",
          "content": "solution is"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "java",
          "content": "this is program"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.25811607,
        "_source": {
          "title": "python",
          "content": "solution is"
        }
      }
    ]
  }
}

We can see that the response is exactly the same as the one from the dis_max query above.
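
The equivalence can be made explicit with a small sketch. For type best_fields, multi_match generates one match query per field and wraps them in a dis_max query. The hypothetical helper below performs that expansion on plain Python dicts; it does not call Elasticsearch and ignores other multi_match options such as operator or per-field boosts.

```python
import json


def expand_best_fields(multi_match):
    """Rewrite a best_fields multi_match body as the equivalent dis_max body."""
    if multi_match.get("type", "best_fields") != "best_fields":
        raise ValueError("only type best_fields is handled in this sketch")
    dis_max = {
        "queries": [
            {"match": {field: multi_match["query"]}}
            for field in multi_match["fields"]
        ]
    }
    # Carry tie_breaker over only when the multi_match body sets it.
    if "tie_breaker" in multi_match:
        dis_max["tie_breaker"] = multi_match["tie_breaker"]
    return {"dis_max": dis_max}


query = expand_best_fields({
    "query": "java solution",
    "type": "best_fields",
    "fields": ["title", "content"],
    "tie_breaker": 0.3,
})
print(json.dumps(query, indent=2))
```

The printed body has the same shape as the dis_max request used in the tie_breaker section.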
