1) Search for documents whose title or content contains "java solution";
2) Find the most relevant documents, whether the best match is in the title or in the content.
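best fields strategy: a document should rank highly when a single field matches as many of the keywords as possible, not when many fields each match only a few of them.
dis_max query: among several queries, simply use the score of the highest-scoring query as the document's score.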
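To make the difference concrete, here is a minimal Python sketch that compares two ways of combining per-field clause scores: a plain sum (a simplification of how several should clauses accumulate score) and dis_max, which keeps only the best clause. The clause scores are made-up numbers for illustration, not values Elasticsearch would compute.

```python
def sum_score(clause_scores):
    """Add up every clause score (simplified should-style combination)."""
    return sum(clause_scores)


def dis_max_score(clause_scores):
    """Keep only the highest clause score, as dis_max does without tie_breaker."""
    return max(clause_scores, default=0.0)


# Hypothetical per-field scores, as [title_score, content_score].
# doc_a: both query terms hit the title, nothing hits the content.
# doc_b: one term hits the title, the other hits the content.
docs = {
    "doc_a": [0.50, 0.00],
    "doc_b": [0.25, 0.25],
}

for name, scores in docs.items():
    print(name, "sum:", sum_score(scores), "dis_max:", dis_max_score(scores))
```

Summing ties the two documents at 0.5, while dis_max ranks doc_a (one field matching both terms) above doc_b, which is exactly the best fields preference described above.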
In the test data below:
Document 1: the title and the content each contain part of "java solution";
Document 2: the title contains "java solution", while the content contains no part of it.
POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "title" : "solution 1 ","content":" why java ?" }
{ "index": { "_id": 2 }}
{ "title" : "java solution ","content":" why program ? " }
GET /forum/article/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "java solution" } },
        { "match": { "content": "java solution" } }
      ]
    }
  }
}
-- Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.51623213,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.51623213,
        "_source": {
          "title": "java solution ",
          "content": " why program ? "
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "title": "solution 1 ",
          "content": " why java ?"
        }
      }
    ]
  }
}
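Document 2 scores clearly higher than document 1: its best-matching field (title) contains both keywords, whereas each of document 1's fields contains only one.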
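A quick arithmetic check on the response above: document 2's score is, to rounding, exactly twice document 1's. That is consistent with document 2's winning clause (the title match) hitting two terms while document 1's winning clause hits only one. This is an inference from the numbers alone; adding `"explain": true` to the search request shows the actual per-clause breakdown.

```python
# Scores copied from the response above.
doc1_score = 0.25811607
doc2_score = 0.51623213

# Ratio of the best-clause scores of the two documents.
ratio = doc2_score / doc1_score
print(round(ratio, 6))  # 2.0
```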
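Using tie_breaker to take the other queries' scores into account
dis_max only takes the score of the highest-scoring query and ignores the other queries entirely. This all-or-nothing approach can produce inaccurate scores when the other queries also matter, so for more accurate results it is better to take them into account as well.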
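The Elasticsearch dis_max documentation describes the combined score as follows: take the best matching clause's score, then add the score of every other matching clause multiplied by tie_breaker. Written as a formula, with $s_i$ the score of clause $i$ and $t$ the tie_breaker (between 0 and 1, default 0):

```latex
\text{score} = \max_i s_i + t \cdot \Big( \sum_i s_i - \max_i s_i \Big), \qquad 0 \le t \le 1
```

With $t = 0$ this reduces to plain dis_max; with $t = 1$ every matching clause counts fully and the score becomes a plain sum.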
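For example, consider three posts:
1. doc1: the title contains java, and the content contains neither java nor solution;
2. doc2: the content contains solution, and the title contains neither keyword;
3. doc3: the title contains java, and the content contains solution;
4. In the final search, doc1 and doc2 may rank ahead of doc3, instead of doc3 ranking first as we would expect (doc3 is the only post that matches both keywords).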
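The scenario above can be sketched in Python using the tie_breaker formula (best clause score plus tie_breaker times the other clause scores). The per-field scores are hypothetical [title, content] values, not computed by Elasticsearch; they were picked to be in the same range as the response further down.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the sum of the other clause scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Hypothetical [title_score, content_score] pairs for the three posts.
docs = {
    "doc1": [0.2876821, 0.0],          # "java" in title only
    "doc2": [0.0, 0.25811607],         # "solution" in content only
    "doc3": [0.25811607, 0.25811607],  # "java" in title, "solution" in content
}


def rank(tie_breaker):
    """Return doc ids sorted by descending dis_max score."""
    return sorted(docs, key=lambda d: dis_max_score(docs[d], tie_breaker), reverse=True)


print(rank(0.0))  # doc1 first: doc3 gets no credit for its second matching field
print(rank(0.3))  # ['doc3', 'doc1', 'doc2']
```

With tie_breaker 0.3, doc3 scores 1.3 × 0.25811607 ≈ 0.3355509, which is the value Elasticsearch returns for doc3 in the response below. That suggests, though only `explain` can confirm it, that both of doc3's clauses scored about 0.258 on its shard.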
POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "title" : "java","content":"this is program" }
{ "index": { "_id": 2 }}
{ "title" : "python","content":"solution is" }
{ "index": { "_id": 3 }}
{ "title" : "java program","content":"solution is" }
GET /forum/article/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "java solution" } },
        { "match": { "content": "java solution" } }
      ],
      "tie_breaker": 0.3
    }
  }
}
-- Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.3355509,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "3",
        "_score": 0.3355509,
        "_source": {
          "title": "java program",
          "content": "solution is"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "java",
          "content": "this is program"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.25811607,
        "_source": {
          "title": "python",
          "content": "solution is"
        }
      }
    ]
  }
}
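doc3 now has the highest score. The same search can also be written as a multi_match query of type best_fields with the same tie_breaker: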
GET /forum/article/_search
{
  "query": {
    "multi_match": {
      "query": "java solution",
      "type": "best_fields",
      "fields": [ "title", "content" ],
      "tie_breaker": 0.3
    }
  }
}
-- Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.3355509,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "3",
        "_score": 0.3355509,
        "_source": {
          "title": "java program",
          "content": "solution is"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "java",
          "content": "this is program"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.25811607,
        "_source": {
          "title": "python",
          "content": "solution is"
        }
      }
    ]
  }
}
The results are exactly the same as those of the dis_max query above.
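The identical results are expected: the Elasticsearch multi_match documentation describes the best_fields type as generating one match query per field and wrapping them in a dis_max query. The sketch below performs that rewrite on plain Python dicts. `best_fields_to_dis_max` is a hypothetical helper for illustration, not part of any Elasticsearch client, and it deliberately ignores other multi_match options such as per-field boosts (`title^2`) or `operator`.

```python
import json


def best_fields_to_dis_max(body):
    """Rewrite a best_fields multi_match request body into a dis_max body.

    Hypothetical helper: only handles query, fields and tie_breaker.
    """
    mm = body["query"]["multi_match"]
    # best_fields is the default multi_match type.
    if mm.get("type", "best_fields") != "best_fields":
        raise ValueError("only best_fields is handled here")
    dis_max = {
        # One match query per field, all searching the same text.
        "queries": [{"match": {field: mm["query"]}} for field in mm["fields"]],
    }
    if "tie_breaker" in mm:
        dis_max["tie_breaker"] = mm["tie_breaker"]
    return {"query": {"dis_max": dis_max}}


multi_match_body = {
    "query": {
        "multi_match": {
            "query": "java solution",
            "type": "best_fields",
            "fields": ["title", "content"],
            "tie_breaker": 0.3,
        }
    }
}

print(json.dumps(best_fields_to_dis_max(multi_match_body), indent=2))
```

The printed body has the same structure as the dis_max request with `"tie_breaker": 0.3` used earlier, which is why both searches return the same scores.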