Elasticsearch Location Search - Spring, Hadoop, Spark, BI, ML - CSDN Blog

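In Elasticsearch (ES), geographic locations are supported through the geo_point data type. Location data must supply a latitude and a longitude; if the coordinates are invalid, ES rejects the new document. This type supports distance calculations, range queries, and so on. Under the hood, the index is implemented with Geohash.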

1. Creating the index

Use PUT to create an index named cn_large_cities with a mapping type called city:

      {
    "mappings":{
        "city":{
            "properties":{
                "city":{"type":"string"},
                "state":{"type":"string"},
                "location":{"type":"geo_point"}}}}}

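The geo_point type must be declared explicitly; ES cannot infer it from the data. In ES, location data can be expressed in three forms, as a string, an object, or an array: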

      # String: "lat,lon"
      "location":"40.715,-74.011"

      # Object
      "location":{
        "lat":40.715,
        "lon":-74.011}

      # Array: [lon, lat]
      "location":[-74.011,40.715]
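
The three forms differ mainly in coordinate order, which is easy to get wrong. As a minimal sketch (the function name `parse_geo_point` is hypothetical and not part of any Elasticsearch client), the following Python normalizes each form to a `(lat, lon)` pair. The range check is an assumption about what counts as an invalid coordinate; it is not taken from the ES source.

```python
def parse_geo_point(value):
    """Normalize a geo_point given as object, string, or array to (lat, lon)."""
    if isinstance(value, dict):
        # Object form: {"lat": ..., "lon": ...}
        lat, lon = float(value["lat"]), float(value["lon"])
    elif isinstance(value, str):
        # String form: "lat,lon"
        lat_text, lon_text = value.split(",")
        lat, lon = float(lat_text), float(lon_text)
    elif isinstance(value, (list, tuple)) and len(value) == 2:
        # Array form: [lon, lat] (note the reversed order)
        lon, lat = float(value[0]), float(value[1])
    else:
        raise ValueError("unsupported geo_point format: %r" % (value,))
    # Assumed validity rule: latitude in [-90, 90], longitude in [-180, 180].
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range: lat=%s, lon=%s" % (lat, lon))
    return lat, lon


# All three calls describe the same point.
print(parse_geo_point("40.715,-74.011"))
print(parse_geo_point({"lat": 40.715, "lon": -74.011}))
print(parse_geo_point([-74.011, 40.715]))
```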

POST the following 5 test documents:

      {"city":"Beijing", "state":"BJ","location":{"lat":"39.91667", "lon":"116.41667"}}

{"city":"Shanghai", "state":"SH","location":{"lat":"34.50000", "lon":"121.43333"}}

{"city":"Xiamen", "state":"FJ","location":{"lat":"24.46667", "lon":"118.10000"}}

{"city":"Fuzhou", "state":"FJ","location":{"lat":"26.08333", "lon":"119.30000"}}

{"city":"Guangzhou", "state":"GD","location":{"lat":"23.16667", "lon":"113.23333"}}
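
Posting the documents one at a time works. As an alternative, they could be sent in a single request through the `_bulk` endpoint. The sketch below only builds the newline-delimited request body and does not contact a cluster. The helper name `build_bulk_body` is made up for illustration, and the body is assumed to be POSTed to `/cn_large_cities/city/_bulk`, so that the empty `index` action takes the index and type from the URL.

```python
import json

CITIES = [
    {"city": "Beijing", "state": "BJ", "location": {"lat": "39.91667", "lon": "116.41667"}},
    {"city": "Shanghai", "state": "SH", "location": {"lat": "34.50000", "lon": "121.43333"}},
    {"city": "Xiamen", "state": "FJ", "location": {"lat": "24.46667", "lon": "118.10000"}},
    {"city": "Fuzhou", "state": "FJ", "location": {"lat": "26.08333", "lon": "119.30000"}},
    {"city": "Guangzhou", "state": "GD", "location": {"lat": "23.16667", "lon": "113.23333"}},
]


def build_bulk_body(docs):
    """Return an NDJSON bulk body: one action line followed by one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line; target comes from the URL
        lines.append(json.dumps(doc))            # document source
    return "\n".join(lines) + "\n"               # a bulk body must end with a newline


body = build_bulk_body(CITIES)
print(body)
```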

List all documents:

      curl -XGET "http://localhost:9200/cn_large_cities/city/_search?pretty=true"

All 5 documents are returned, each with a score of 1.

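2. Location filters

ES has 4 location-related filters:

  • geo_distance: finds locations within a given distance of a center point
  • geo_bounding_box: finds locations inside a rectangular area
  • geo_distance_range: finds locations whose distance from a center point lies between a min and a max
  • geo_polygon: finds locations inside a polygon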

geo_distance

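This filter matches the locations that lie inside a circle of the given radius around the center point:

[Figure: search area of the geo_distance filter]

Here is an example query: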

      {
        "query":{
          "filtered":{
            "filter":{
              "geo_distance":{
                "distance":"1km",
                "location":{
                  "lat":40.715,
                  "lon":-73.988}}}}}}

The following query finds the cities within 500 km of Xiamen:

      {
    "query":{
        "filtered":{
          "filter":{
            "geo_distance" :{
                "distance" :"500km",
                "location" :{
                    "lat" :24.46667,
                    "lon" :118.10000}}}}}}
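
When the center point or radius changes often, it can help to generate the request body rather than edit JSON by hand. The following is a minimal sketch; `geo_distance_query` is a hypothetical helper, not a client-library function, and it simply reproduces the `filtered` query structure used in this article.

```python
import json


def geo_distance_query(lat, lon, distance, field="location"):
    """Build a filtered query keeping documents within `distance` of (lat, lon)."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,              # e.g. "500km"
                        field: {"lat": lat, "lon": lon},   # center point
                    }
                }
            }
        }
    }


# Cities within 500 km of Xiamen, as in the query above.
xiamen_500km = geo_distance_query(24.46667, 118.10000, "500km")
print(json.dumps(xiamen_500km, indent=2))
```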

geo_distance_range

      {
        "query":{
          "filtered":{
            "filter":{
              "geo_distance_range":{
                "gte":"1km",
                "lt":"2km",
                "location":{
                  "lat":40.715,
                  "lon":-73.988}}}}}}

geo_bounding_box

      {
        "query":{
          "filtered":{
            "filter":{
              "geo_bounding_box":{
                "location":{
                  "top_left":{
                    "lat":40.8,
                    "lon":-74.0},
                  "bottom_right":{
                    "lat":40.715,
                    "lon":-73.0}}}}}}}
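
The containment test behind a bounding box is just two range comparisons. The sketch below illustrates the idea in Python. It is not how ES evaluates the filter internally, `in_bounding_box` is a hypothetical name, and it assumes inclusive edges and a box that does not cross the 180° meridian.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """Return True if (lat, lon) lies inside the rectangle given by its two corners."""
    # Latitude decreases from top to bottom, longitude increases from left to right.
    lat_ok = bottom_right["lat"] <= lat <= top_left["lat"]
    lon_ok = top_left["lon"] <= lon <= bottom_right["lon"]
    return lat_ok and lon_ok


# Corners taken from the geo_bounding_box example above.
top_left = {"lat": 40.8, "lon": -74.0}
bottom_right = {"lat": 40.715, "lon": -73.0}

print(in_bounding_box(40.73, -73.9, top_left, bottom_right))  # point inside the box
print(in_bounding_box(40.70, -73.9, top_left, bottom_right))  # point south of the box
```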

3. Sorting by distance

Next, we sort the cities by their distance from Xiamen:

      {
  "sort" :[
      {
          "_geo_distance" :{
              "location" :{
                    "lat" :24.46667,
                    "lon" :118.10000}, 
              "order" :"asc",
              "unit" :"km"}}
  ],
  "query":{
    "filtered" :{
        "query" :{
            "match_all" :{}}}}}

The results come back in the order Xiamen, Fuzhou, Guangzhou, and so on, which matches what we would expect:

      {
  "took":8,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0},
  "hits":{
    "total":5,
    "max_score":null,
    "hits":[
      {
        "_index":"cn_large_cities",
        "_type":"city",
        "_id":"AVaiSGXXjL0tfmRppc_p",
        "_score":null,
        "_source":{
          "city":"Xiamen",
          "state":"FJ",
          "location":{
            "lat":"24.46667",
            "lon":"118.10000"}},
        "sort":[0]},
      {
        "_index":"cn_large_cities",
        "_type":"city",
        "_id":"AVaiSSuNjL0tfmRppc_r",
        "_score":null,
        "_source":{
          "city":"Fuzhou",
          "state":"FJ",
          "location":{
            "lat":"26.08333",
            "lon":"119.30000"}},
        "sort":[216.61105485607183]},
      {
        "_index":"cn_large_cities",
        "_type":"city",
        "_id":"AVaiSd02jL0tfmRppc_s",
        "_score":null,
        "_source":{
          "city":"Guangzhou",
          "state":"GD",
          "location":{
            "lat":"23.16667",
            "lon":"113.23333"}},
        "sort":[515.9964950041397]},
      {
        "_index":"cn_large_cities",
        "_type":"city",
        "_id":"AVaiR7_5jL0tfmRppc_o",
        "_score":null,
        "_source":{
          "city":"Shanghai",
          "state":"SH",
          "location":{
            "lat":"34.50000",
            "lon":"121.43333"}},
        "sort":[1161.512141925948]},
      {
        "_index":"cn_large_cities",
        "_type":"city",
        "_id":"AVaiRwLUjL0tfmRppc_n",
        "_score":null,
        "_source":{
          "city":"Beijing",
          "state":"BJ",
          "location":{
            "lat":"39.91667",
            "lon":"116.41667"}},
        "sort":[1725.4543712286697]}
    ]}}

The sort field in each hit is the distance in kilometers. Adding from and size restricts the result to the single nearest city:

      {

  "from":0,
  "size":1,
  "sort" :[
      {
          "_geo_distance" :{
              "location" :{
                    "lat" :24.46667,
                    "lon" :118.10000}, 
              "order" :"asc",
              "unit" :"km"}}
  ],
  "query":{
    "filtered" :{
        "query" :{
            "match_all" :{}}}}}
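
The distances reported in the sort field can be sanity-checked outside ES. The sketch below uses the haversine formula and assumes a spherical Earth with a mean radius of 6371 km. ES may use a slightly different radius or formula, so the values should agree only to within a fraction of a percent.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not necessarily the value ES uses


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# (lat, lon) of the five test documents.
CITIES = {
    "Beijing": (39.91667, 116.41667),
    "Shanghai": (34.50000, 121.43333),
    "Xiamen": (24.46667, 118.10000),
    "Fuzhou": (26.08333, 119.30000),
    "Guangzhou": (23.16667, 113.23333),
}
XIAMEN = CITIES["Xiamen"]

# Sort city names by ascending distance from Xiamen, like "order": "asc".
ranked = sorted(CITIES, key=lambda name: haversine_km(*XIAMEN, *CITIES[name]))
for name in ranked:
    print(name, round(haversine_km(*XIAMEN, *CITIES[name]), 1), "km")
```

Taking `ranked[:1]` corresponds to the `"size":1` variant of the query.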

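4. Geo aggregations

ES provides 3 kinds of location aggregations:

  • geo_distance: aggregates by distance from a given center point
  • geohash_grid: aggregates by Geohash cell
  • geo_bounds: aggregates by area, returning the rectangle that covers the points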

4.1 geo_distance aggregation

The following query aggregates by distance from Xiamen and returns two buckets, 0-500 km and 500-8000 km:

      {
    "query":{
        "filtered":{
            "filter":{
                "geo_distance" :{
                    "distance" :"10000km",
                    "location" :{
                        "lat" :24.46667,
                        "lon" :118.10000}}}}},
    "aggs":{
        "per_ring":{
            "geo_distance":{
                "field":"location",
                "unit":"km",
                "origin":{
                    "lat" :24.46667,
                    "lon" :118.10000},
                "ranges":[
                    {"from":0, "to":500},
                    {"from":500, "to":8000}
                ]}}}}

The aggregation returns the following result:

      "aggregations": {
    "per_ring":{
      "buckets":[
        {
          "key":"*-500.0",
          "from":0,
          "from_as_string":"0.0",
          "to":500,
          "to_as_string":"500.0",
          "doc_count":2},
        {
          "key":"500.0-8000.0",
          "from":500,
          "from_as_string":"500.0",
          "to":8000,
          "to_as_string":"8000.0",
          "doc_count":3}
      ]}}

As shown, 2 cities lie within 0-500 km of Xiamen and 3 lie within 500-8000 km.
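
The bucket counts can be reproduced by hand from the distances returned in the sort field in section 3. The sketch below treats each range as including `from` and excluding `to`, which is how range-style aggregations in ES behave. The helper name `bucket_counts` is hypothetical.

```python
# Distances (km) from Xiamen, copied from the "sort" values in the section 3 result.
DISTANCES_KM = {
    "Xiamen": 0.0,
    "Fuzhou": 216.61105485607183,
    "Guangzhou": 515.9964950041397,
    "Shanghai": 1161.512141925948,
    "Beijing": 1725.4543712286697,
}
RANGES = [(0, 500), (500, 8000)]


def bucket_counts(distances, ranges):
    """Count values per range, with the lower bound inclusive and the upper bound exclusive."""
    values = list(distances)
    return {
        (lower, upper): sum(1 for d in values if lower <= d < upper)
        for lower, upper in ranges
    }


counts = bucket_counts(DISTANCES_KM.values(), RANGES)
print(counts)
```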

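4.2 geohash_grid aggregation

This aggregation groups geo_point values by the geohash cell they fall into. The cell size is controlled by the precision property, which is the number of times the grid is subdivided, that is, the length of the geohash string.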

      {
    "query":{
        "filtered":{
            "filter":{
                "geo_distance" :{
                    "distance" :"10000km",
                    "location" :{
                        "lat" :24.46667,
                        "lon" :118.10000}}}}},
    "aggs":{
        "grid_agg":{
            "geohash_grid":{
                "field":"location",
                "precision":2}}}}

The aggregation result is:

      "aggregations": {
    "grid_agg":{
      "buckets":[
        {
          "key":"ws",
          "doc_count":3},
        {
          "key":"wx",
          "doc_count":1},
        {
          "key":"ww",
          "doc_count":1}
      ]}}

As shown, 3 cities fall into the cell whose geohash is ws. Raising the precision to 5 gives the following result:

      "aggregations": {
    "grid_agg":{
      "buckets":[
        {
          "key":"wx4g1",
          "doc_count":1},
        {
          "key":"wwnk7",
          "doc_count":1},
        {
          "key":"wssu6",
          "doc_count":1},
        {
          "key":"ws7gp",
          "doc_count":1},
        {
          "key":"ws0eb",
          "doc_count":1}
      ]}}
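
To make the cell keys less opaque, here is a minimal Python sketch of the standard geohash encoding. It is not Elasticsearch's own code, and `geohash_encode` is a hypothetical helper name. The algorithm alternately halves the longitude and latitude ranges and emits one base-32 character per 5 bits, so a precision-2 key is simply the first two characters of a longer key. Applied to the five test cities at precision 2, it groups them the same way as the first result above.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision):
    """Encode (lat, lon) in degrees as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


CITIES = {
    "Beijing": (39.91667, 116.41667),
    "Shanghai": (34.50000, 121.43333),
    "Xiamen": (24.46667, 118.10000),
    "Fuzhou": (26.08333, 119.30000),
    "Guangzhou": (23.16667, 113.23333),
}

# Count cities per precision-2 cell, mimicking the geohash_grid buckets.
cells = Counter(geohash_encode(lat, lon, 2) for lat, lon in CITIES.values())
print(dict(cells))
```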

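4.3 geo_bounds aggregation

This aggregation computes the smallest area that covers every geo_point in the query results. It returns the smallest rectangle containing all of the locations: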

      {
    "query":{
        "filtered":{
            "filter":{
                "geo_distance" :{
                    "distance" :"10000km",
                    "location" :{
                        "lat" :24.46667,
                        "lon" :118.10000}}}}},
    "aggs":{
        "map-zoom":{
            "geo_bounds":{
                "field":"location"}}}}

The result is:

      "aggregations": {
    "map-zoom":{
      "bounds":{
        "top_left":{
          "lat":39.91666993126273,
          "lon":113.2333298586309},
        "bottom_right":{
          "lat":23.16666992381215,
          "lon":121.43332997336984}}}}

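In other words, the rectangle spanned by these two points contains every city within 10000 km of Xiamen. If we reduce the distance to 500 km, the rectangle covering the matching cities becomes: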

      "aggregations": {
    "map-zoom":{
      "bounds":{
        "top_left":{
          "lat":26.083329990506172,
          "lon":118.0999999679625},
        "bottom_right":{
          "lat":24.46666999720037,
          "lon":119.29999999701977}}}}
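
The rectangle is simply the extreme latitudes and longitudes of the matched points, which the following sketch reproduces. `geo_bounds` here is a hypothetical Python helper, not the ES implementation, and it assumes the points do not straddle the 180° meridian. The small differences from the ES output, such as 39.91666993 versus 39.91667, come from how ES stores coordinates.

```python
def geo_bounds(points):
    """Return the smallest lat/lon rectangle covering all (lat, lon) points."""
    points = list(points)
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},       # northernmost, westernmost
        "bottom_right": {"lat": min(lats), "lon": max(lons)},   # southernmost, easternmost
    }


CITIES = {
    "Beijing": (39.91667, 116.41667),
    "Shanghai": (34.50000, 121.43333),
    "Xiamen": (24.46667, 118.10000),
    "Fuzhou": (26.08333, 119.30000),
    "Guangzhou": (23.16667, 113.23333),
}

all_bounds = geo_bounds(CITIES.values())
# Only Xiamen and Fuzhou lie within 500 km of Xiamen.
near_bounds = geo_bounds([CITIES["Xiamen"], CITIES["Fuzhou"]])
print(all_bounds)
print(near_bounds)
```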

5. References

An illustrated explanation of how MongoDB geospatial indexes are implemented: http://blog.nosqlfan.com/html/1811.html
Geo-point datatype: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html
