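In many real-world location searches, we need to boost the results that fall within a particular region, so that they score higher and appear at the top of the returned results. For example, consider the following use cases:
For the two cases above, we may need to mark out a special region. We can draw the region we want with a Polygon and boost the search results that fall inside it.
We can do this with a compound query provided by Elasticsearch: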
{
  "query": {
    "bool": {
      "must": [
        <query for the search area>
      ],
      "should": [
        <query that boosts the region overlapping the search area>
      ]
    }
  }
}
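If you are not familiar with compound queries, please refer to my earlier article “开始使用Elasticsearch (2)” (Getting started with Elasticsearch (2)).
Before doing this exercise, you may want to read my earlier article “Elasticsearch:如何制作 GeoJSON 文件并进行地理位置搜索” (how to create a GeoJSON file and run geo-location searches). There I describe in detail how to import data and how to use GeoJSON to draw a boundary. For today's exercise, we use the following data: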
POST my_locations/_bulk
{ "index" : { "_id" : "3" } }
{ "location" : [ -104.06876, 39.77462 ], "name": "C" }
{ "index" : { "_id" : "4" } }
{ "location" : [ -103.59538, 38.5718 ], "name": "D" }
{ "index" : { "_id" : "5" } }
{ "location" : [ -104.94538, 38.16629 ], "name": "E" }
{ "index" : { "_id" : "1" } }
{ "location" : [ -105.38369, 40.11067 ], "name": "A" }
{ "index" : { "_id" : "6" } }
{ "location" : [ -107.99602, 39.17918 ], "name": "F" }
{ "index" : { "_id" : "2" } }
{ "location" : [ -104.34051, 40.03688 ], "name": "B" }
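Note that geo queries such as geo_polygon only work if `location` is mapped as a geo type; with dynamic mapping alone, a two-number array like the ones above would be indexed as plain numbers. The mapping step is not shown in this article, so the following is a minimal sketch under the assumption that `location` is a geo_point and that the index is created before the bulk request is run:

```console
PUT my_locations
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}
```

With a mapping like this, each array in the bulk request is read as a [longitude, latitude] pair.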
Run the bulk command above, then create the corresponding index pattern. As in the earlier article, we also create a GeoJSON file for display purposes:
simple.json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [-106.10465, 40.16875],
            [-106.0736, 39.33315],
            [-105.142, 39.16482],
            [-103.85329, 39.18889],
            [-103.52723, 39.77609],
            [-104.17935, 40.27545],
            [-105.17305, 40.33465],
            [-106.10465, 40.16875]
          ]
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [-109.07025, 41.00014],
            [-109.07025, 36.99584],
            [-102.02114, 36.99584],
            [-102.02114, 41.00014],
            [-109.07025, 41.00014]
          ]
        ]
      }
    }
  ]
}
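We can draw the corresponding boundaries as described in the article “Elasticsearch:如何制作 GeoJSON 文件并进行地理位置搜索”:
As the figure above shows, documents A, B and C lie inside the defined Polygon, while D, E and F lie outside it. Our requirements are:
Based on these requirements, we can run the following search: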
GET my_locations/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "geo_shape": {
            "location": {
              "shape": {
                "type": "polygon",
                "coordinates": [
                  [
                    [-109.07025, 41.00014],
                    [-109.07025, 36.99584],
                    [-102.02114, 36.99584],
                    [-102.02114, 41.00014],
                    [-109.07025, 41.00014]
                  ]
                ]
              }
            }
          }
        }
      ],
      "should": [
        {
          "geo_polygon": {
            "location": {
              "points": [
                [-106.10465, 40.16875],
                [-106.0736, 39.33315],
                [-105.142, 39.16482],
                [-103.85329, 39.18889],
                [-103.52723, 39.77609],
                [-104.17935, 40.27545],
                [-105.17305, 40.33465],
                [-106.10465, 40.16875]
              ]
            }
          }
        }
      ]
    }
  }
}
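Note that in must we use the coordinates of the rectangle from the GeoJSON file simple.json, while in should we use the coordinates of the polygon. Since a rectangle can be seen as a special case of a polygon, we describe it as a polygon in a geo_shape query; the boosted region in should is searched with geo_polygon. For the rectangle, you could of course also use geo_bounding_box.
The search results are as follows: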
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "location" : [ -104.06876, 39.77462 ],
          "name" : "C"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "location" : [ -105.38369, 40.11067 ],
          "name" : "A"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "location" : [ -104.34051, 40.03688 ],
          "name" : "B"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.0,
        "_source" : {
          "location" : [ -103.59538, 38.5718 ],
          "name" : "D"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.0,
        "_source" : {
          "location" : [ -104.94538, 38.16629 ],
          "name" : "E"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.0,
        "_source" : {
          "location" : [ -107.99602, 39.17918 ],
          "name" : "F"
        }
      }
    ]
  }
}
The results show that documents A, B and C score higher (1.0 versus 0.0) and are ranked first.
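For comparison, here is a sketch of the same request with the rectangle in must expressed as a geo_bounding_box, using the top-left and bottom-right corners of the rectangle from simple.json. The should clause is unchanged. The scores may differ from the ones shown above, since the two query types do not necessarily score documents the same way:

```console
GET my_locations/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "geo_bounding_box": {
            "location": {
              "top_left": { "lat": 41.00014, "lon": -109.07025 },
              "bottom_right": { "lat": 36.99584, "lon": -102.02114 }
            }
          }
        }
      ],
      "should": [
        {
          "geo_polygon": {
            "location": {
              "points": [
                [-106.10465, 40.16875],
                [-106.0736, 39.33315],
                [-105.142, 39.16482],
                [-103.85329, 39.18889],
                [-103.52723, 39.77609],
                [-104.17935, 40.27545],
                [-105.17305, 40.33465],
                [-106.10465, 40.16875]
              ]
            }
          }
        }
      ]
    }
  }
}
```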
If we do not use boosting:
GET my_locations/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "geo_shape": {
            "location": {
              "shape": {
                "type": "polygon",
                "coordinates": [
                  [
                    [-109.07025, 41.00014],
                    [-109.07025, 36.99584],
                    [-102.02114, 36.99584],
                    [-102.02114, 41.00014],
                    [-109.07025, 41.00014]
                  ]
                ]
              }
            }
          }
        }
      ]
    }
  }
}
The search results are:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "location" : [ -104.06876, 39.77462 ],
          "name" : "C"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.0,
        "_source" : {
          "location" : [ -103.59538, 38.5718 ],
          "name" : "D"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.0,
        "_source" : {
          "location" : [ -104.94538, 38.16629 ],
          "name" : "E"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "location" : [ -105.38369, 40.11067 ],
          "name" : "A"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.0,
        "_source" : {
          "location" : [ -107.99602, 39.17918 ],
          "name" : "F"
        }
      },
      {
        "_index" : "my_locations",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "location" : [ -104.34051, 40.03688 ],
          "name" : "B"
        }
      }
    ]
  }
}
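Here all six documents have a score of 0.0, so A, B and C are not necessarily ranked first.
One thing to note: in recent versions, geo_shape is the recommended replacement for geo_polygon, but in practice I found that a geo_shape search does not assign any score (the score is 0), whereas geo_bounding_box and geo_polygon do produce a score. For this use case, I recommend using them for scoring.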
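If you prefer to keep geo_shape for the boosted region, one possible approach (not part of the original exercise, so treat it as a sketch) is to wrap it in a constant_score query, which gives every matching document a score equal to its boost. The boost value 2.0 below is an arbitrary choice:

```console
GET my_locations/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "geo_shape": {
            "location": {
              "shape": {
                "type": "polygon",
                "coordinates": [
                  [
                    [-109.07025, 41.00014],
                    [-109.07025, 36.99584],
                    [-102.02114, 36.99584],
                    [-102.02114, 41.00014],
                    [-109.07025, 41.00014]
                  ]
                ]
              }
            }
          }
        }
      ],
      "should": [
        {
          "constant_score": {
            "filter": {
              "geo_shape": {
                "location": {
                  "shape": {
                    "type": "polygon",
                    "coordinates": [
                      [
                        [-106.10465, 40.16875],
                        [-106.0736, 39.33315],
                        [-105.142, 39.16482],
                        [-103.85329, 39.18889],
                        [-103.52723, 39.77609],
                        [-104.17935, 40.27545],
                        [-105.17305, 40.33465],
                        [-106.10465, 40.16875]
                      ]
                    ]
                  }
                }
              }
            },
            "boost": 2.0
          }
        }
      ]
    }
  }
}
```

Because the score comes from constant_score rather than from the wrapped query, this does not depend on how geo_shape itself scores matches.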