Elasticsearch: range data types and range-based aggregations (new in the 7.4 release)

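Elasticsearch provides a family of range data types. The following types are currently supported:

integer_range A range of signed 32-bit integers with a minimum value of -2^31 and a maximum value of 2^31-1.
float_range A range of single-precision 32-bit IEEE 754 floating point values.
long_range A range of signed 64-bit integers with a minimum value of -2^63 and a maximum value of 2^63-1.
double_range A range of double-precision 64-bit IEEE 754 floating point values.
date_range A range of date values represented as unsigned 64-bit integer milliseconds elapsed since the system epoch.
ip_range A range of IP values supporting IPv4 or IPv6 (or mixed) addresses.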
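To make the integer bounds listed above concrete, here is a small Python sketch (plain arithmetic, not Elasticsearch code). The helper name signed_bounds is made up for this illustration.

```python
def signed_bounds(bits: int) -> tuple[int, int]:
    """Return (min, max) of a two's-complement signed integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

# Limits that apply to the endpoints of integer_range and long_range values.
INTEGER_RANGE_BOUNDS = signed_bounds(32)
LONG_RANGE_BOUNDS = signed_bounds(64)

print(INTEGER_RANGE_BOUNDS)
print(LONG_RANGE_BOUNDS)
```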

 

Searching range data types

Here is a simple example of these data types in action. First, create an index named range_index and define its mapping at the same time:

PUT range_index
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "properties": {
      "expected_attendees": {
        "type": "integer_range"
      },
      "time_frame": {
        "type": "date_range", 
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

Next, index a document into it:

PUT range_index/_doc/1?refresh
{
  "expected_attendees" : { 
    "gte" : 10,
    "lte" : 20
  },
  "time_frame" : { 
    "gte" : "2015-10-31 12:00:00", 
    "lte" : "2015-11-01"
  }
}

The document above contains two range values, matching the integer_range and date_range fields defined in the mapping.

We can now run a term query against the integer_range field expected_attendees:

GET range_index/_search
{
  "query": {
    "term": {
      "expected_attendees": {
        "value": "10"
      }
    }
  }
}

The result:

    "hits" : [
      {
        "_index" : "range_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "expected_attendees" : {
            "gte" : 10,
            "lte" : 20
          },
          "time_frame" : {
            "gte" : "2015-10-31 12:00:00",
            "lte" : "2015-11-01"
          }
        }
      }
    ]

The document matches because 10 falls inside the 10-20 range stored in it. To confirm that the search really filters, run another one:

GET range_index/_search
{
  "query": {
    "term": {
      "expected_attendees": {
        "value": "40"
      }
    }
  }
}

Because 40 lies outside the 10-20 range, this search returns no hits.
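The matching rule behind these two term queries can be sketched in a few lines of Python. This is only a model of the behaviour shown above, not how Elasticsearch implements it; the function name term_matches_range is hypothetical, and only inclusive gte/lte bounds are handled.

```python
def term_matches_range(value: int, stored: dict) -> bool:
    """True if `value` lies inside the stored range (inclusive gte/lte bounds only)."""
    return stored["gte"] <= value <= stored["lte"]

doc_range = {"gte": 10, "lte": 20}  # expected_attendees of document 1

print(term_matches_range(10, doc_range))  # True: 10 is the lower bound
print(term_matches_range(40, doc_range))  # False: 40 is outside 10-20
```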

In the same way, we can search against the date range:

GET range_index/_search
{
  "query" : {
    "range" : {
      "time_frame" : { 
        "gte" : "2015-10-31",
        "lte" : "2015-11-01",
        "relation" : "within" 
      }
    }
  }
}

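With "relation": "within", a document matches only when its stored range lies entirely inside the query range. The time_frame of our document falls inside the queried interval, so the result is: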

    "hits" : [
      {
        "_index" : "range_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "expected_attendees" : {
            "gte" : 10,
            "lte" : 20
          },
          "time_frame" : {
            "gte" : "2015-10-31 12:00:00",
            "lte" : "2015-11-01"
          }
        }
      }
    ]

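Conversely, if we search an interval that lies outside the stored time range (no relation is given here, so the default, intersects, applies):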

GET range_index/_search
{
  "query": {
    "range": {
      "time_frame": {
        "gte": "2017-10-31",
        "lte": "2018-11-01"
      }
    }
  }
}

the result is empty:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
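The relation parameter of a range query accepts intersects (the default), within and contains. The following Python sketch models the three relations on plain (start, end) tuples. It assumes the endpoints have already been resolved to concrete instants and does not model how Elasticsearch parses or rounds partial dates; relation_holds is a hypothetical helper name.

```python
from datetime import datetime

def relation_holds(doc, query, relation="intersects"):
    """doc and query are (start, end) tuples with inclusive endpoints."""
    d_lo, d_hi = doc
    q_lo, q_hi = query
    if relation == "intersects":   # the two ranges share at least one point
        return d_lo <= q_hi and q_lo <= d_hi
    if relation == "within":       # the document's range lies inside the query's
        return q_lo <= d_lo and d_hi <= q_hi
    if relation == "contains":     # the document's range encloses the query's
        return d_lo <= q_lo and q_hi <= d_hi
    raise ValueError(f"unknown relation: {relation}")

# time_frame of document 1, with endpoints assumed to resolve to these instants.
doc = (datetime(2015, 10, 31, 12), datetime(2015, 11, 1))

query_2015 = (datetime(2015, 10, 31), datetime(2015, 11, 1))
query_2017 = (datetime(2017, 10, 31), datetime(2018, 11, 1))

print(relation_holds(doc, query_2015, "within"))  # True
print(relation_holds(doc, query_2017))            # False
```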

Aggregations on range data types

This section demonstrates aggregations on range fields, a feature introduced in the Elasticsearch 7.4 release.


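Aggregating on a range field makes it easier to count how many ranges overlap a given bucket. For example, a date histogram aggregation on a range field can count the phone calls in progress during a particular minute, or the number of employees on leave on a given day.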

 

Preparing the data

We will reuse the sports dataset from an earlier article. First, create the index and its mapping:

PUT sports
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "birthdate": {
        "type": "date",
        "format": "date_optional_time"
      },
      "goals": {
        "type": "integer"
      },
      "location": {
        "type": "geo_point"
      },
      "name": {
        "type": "keyword"
      },
      "rating": {
        "type": "integer"
      },
      "role": {
        "type": "keyword"
      },
      "score_weight": {
        "type": "float"
      },
      "sport": {
        "type": "keyword"
      },
      "age_range": {
        "type": "integer_range"
      }
    }
  }
}

Note the age_range field above: its type is integer_range. We use the Bulk API provided by Elasticsearch to load the following data:
 

POST _bulk
{"index":{"_index":"sports"}}
{"name":"Michael", "birthdate":"1989-10-1", "sport":"Football", "rating": ["5", "4"],  "location":"46.22,-68.45","goals": "43","score_weight":"3","role":"midfielder","age": 30, "age_range": {"gte": 27, "lte": 30}  }
{"index":{"_index":"sports"}}
{"name":"Bob", "birthdate":"1989-11-2", "sport":"Football", "rating": ["3", "4"],  "location":"45.21,-68.35", "goals": "54","score_weight":"2", "role":"forward", "age": 30, "age_range": {"gte": 27, "lte": 30} }
{"index":{"_index":"sports"}}
{"name":"Jim", "birthdate":"1988-10-3", "sport":"Football", "rating": ["3", "2"],  "location":"45.16,-63.58", "goals": "73", "score_weight":"2", "role":"forward", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Joe", "birthdate":"1992-5-20", "sport":"Basketball", "rating": ["4", "3"],  "location":"45.22,-68.53", "goals": "848", "score_weight":"3", "role":"midfielder", "age": 27, "age_range": {"gte": 27, "lte": 30}  }
{"index":{"_index":"sports"}}
{"name":"Tim", "birthdate":"1992-2-28", "sport":"Basketball", "rating": ["3", "3"],  "location":"46.22,-68.85", "goals": "942", "score_weight":"2","role":"forward", "age": 27, "age_range": {"gte": 27, "lte": 30} }
{"index":{"_index":"sports"}}
{"name":"Alfred", "birthdate":"1990-9-9", "sport":"Football", "rating": ["2", "2"],  "location":"45.12,-68.35", "goals": "53", "score_weight":"4", "role":"defender", "age": 29, "age_range": {"gte": 27, "lte": 30} }
{"index":{"_index":"sports"}}
{"name":"Jeff", "birthdate":"1990-4-1", "sport":"Hockey", "rating": ["2", "3"], "location":"46.12,-68.55", "goals": "93","score_weight":"3","role":"midfielder", "age": 29, "age_range": {"gte": 27, "lte": 30} }
{"index":{"_index":"sports"}}
{"name":"Will", "birthdate":"1988-3-1", "sport":"Hockey", "rating": ["4", "4"], "location":"46.25,-84.25", "goals": "124", "score_weight":"2", "role":"forward", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Mick", "birthdate":"1989-10-1", "sport":"Football", "rating": ["3", "4"],  "location":"46.22,-68.45","goals": "56","score_weight":"3", "role":"midfielder", "age": 30, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Pong", "birthdate":"1989-11-2", "sport":"Basketball", "rating": ["1", "3"],  "location":"45.21,-68.35","goals": "1483","score_weight":"2", "role":"forward", "age": 30, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Ray", "birthdate":"1988-10-3", "sport":"Football", "rating": ["2", "2"],  "location":"45.16,-63.58","goals": "84", "score_weight":"3", "role":"midfielder", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Ping", "birthdate":"1992-5-20", "sport":"Basketball", "rating": ["4", "3"],  "location":"45.22,-68.53","goals": "1328", "score_weight":"2", "role":"forward", "age": 27, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Duke", "birthdate":"1992-2-28", "sport":"Hockey", "rating": ["5", "2"],  "location":"46.22,-68.85", "goals": "218", "score_weight":"2", "role":"forward", "age": 27, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Hal", "birthdate":"1990-9-9", "sport":"Hockey", "rating": ["4", "2"],  "location":"45.12,-68.35","goals": "148", "score_weight":"3", "role":"midfielder", "age": 29, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Charge", "birthdate":"1990-4-1", "sport":"Football", "rating": ["3", "2"], "location":"44.19,-82.55","goals": "34", "score_weight":"4", "role":"defender", "age": 29, "age_range": {"gte": 27, "lte": 30}}
{"index":{"_index":"sports"}}
{"name":"Barry", "birthdate":"1988-3-1", "sport":"Football", "rating": ["5", "2"], "location":"36.45,-79.15", "score_weight":"4", "role":"defender", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Bank", "birthdate":"1988-3-1", "sport":"Handball", "rating": ["6", "4"], "location":"46.25,-54.53", "goals": "150", "score_weight":"4", "role":"defender", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Bingo", "birthdate":"1988-3-1", "sport":"Handball", "rating": ["10", "7"], "location":"46.25,-68.55", "goals": "143", "score_weight":"3", "role":"midfielder", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"James", "birthdate":"1988-3-1", "sport":"Basketball", "rating": ["10", "8"], "location":"41.25,-69.55", "goals": "1284", "score_weight":"2", "role":"forward", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Wayne", "birthdate":"1988-3-1", "sport":"Hockey", "rating": ["10", "10"], "location":"46.21,-68.55", "goals": "113", "score_weight":"3", "role":"midfielder", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Brady", "birthdate":"1988-3-1", "sport":"Handball", "rating": ["10", "10"], "location":"63.24,-84.55", "goals": "443", "score_weight":"2", "role":"forward", "age": 31, "age_range": {"gte": 31, "lte": 32} }
{"index":{"_index":"sports"}}
{"name":"Lewis", "birthdate":"1988-3-1", "sport":"Football", "rating": ["10", "10"], "location":"56.25,-74.55", "goals": "49", "score_weight":"3", "role":"midfielder", "age": 31, "age_range": {"gte": 31, "lte": 32} }
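The bulk body above is newline-delimited JSON: each document is preceded by an action line, and the body must end with a newline. If you would rather generate such a body than type it by hand, the following Python sketch shows one way to do it. The helper build_bulk_body and the shortened player records are illustrative assumptions, and sending the body to a cluster is left out.

```python
import json

def build_bulk_body(index: str, docs: list[dict]) -> str:
    """Serialize docs as NDJSON: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the Bulk API expects a trailing newline

# Two shortened records in the shape of the data above (most fields omitted).
players = [
    {"name": "Michael", "age": 30, "age_range": {"gte": 27, "lte": 30}},
    {"name": "Jim", "age": 31, "age_range": {"gte": 31, "lte": 32}},
]

body = build_bulk_body("sports", players)
print(body, end="")
```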

Note that the data defines two age brackets, 27-30 and 31-32, stored in the age_range field.

First, let's run a histogram aggregation on the age field:

GET sports/_search
{
  "size": 0,
  "aggs": {
    "age_distogram": {
      "histogram": {
        "field": "age",
        "interval": 1
      }
    }
  }
}

This buckets the documents by age to show the age distribution. The result is:

  "aggregations" : {
    "age_distogram" : {
      "buckets" : [
        {
          "key" : 27.0,
          "doc_count" : 4
        },
        {
          "key" : 28.0,
          "doc_count" : 0
        },
        {
          "key" : 29.0,
          "doc_count" : 4
        },
        {
          "key" : 30.0,
          "doc_count" : 4
        },
        {
          "key" : 31.0,
          "doc_count" : 10
        }
      ]
    }
  }
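The bucketing behind this result can be reproduced with a short Python sketch: each value goes into the bucket whose key is floor(value / interval) * interval, and empty buckets between the smallest and largest key are reported with a count of 0 (which is why key 28 appears). The ages list is condensed from the bulk data, and the function name histogram is just an illustrative choice.

```python
from collections import Counter

def histogram(values, interval):
    """Bucket integers by floor(value / interval) * interval, filling gaps with 0."""
    counts = Counter((v // interval) * interval for v in values)
    lo, hi = min(counts), max(counts)
    return {key: counts.get(key, 0) for key in range(lo, hi + interval, interval)}

# Ages of the 22 players in the bulk data.
ages = [27] * 4 + [29] * 4 + [30] * 4 + [31] * 10

age_buckets = histogram(ages, 1)
print(age_buckets)  # {27: 4, 28: 0, 29: 4, 30: 4, 31: 10}
```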

The same distribution can also be charted in Kibana:

[Figure: Kibana visualization of the document count per age]

The chart shows how the documents are distributed across the individual ages.

Let's take a closer look at one of the documents:

        "_source" : {
          "name" : "Michael",
          "birthdate" : "1989-10-1",
          "sport" : "Football",
          "rating" : [
            "5",
            "4"
          ],
          "location" : "46.22,-68.45",
          "goals" : "43",
          "score_weight" : "3",
          "role" : "midfielder",
          "age" : 30,
          "age_range" : {
            "gte" : 27,
            "lte" : 30
          }
        }

The document contains a field named age_range, which gives the age bracket the player belongs to. We can aggregate on this field as well:

GET sports/_search
{
  "size": 0,
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age_range",
        "interval": 3
      }
    }
  }
}

Here the aggregation runs on age_range. The result is:

  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
        {
          "key" : 27.0,
          "doc_count" : 12
        },
        {
          "key" : 30.0,
          "doc_count" : 22
        }
      ]
    }
  }

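The result contains two buckets. Call the 27-30 bracket range 1 (12 documents) and the 31-32 bracket range 2 (10 documents). The bucket with key 27 has a doc_count of 12: with an interval of 3 it covers ages 27 up to, but not including, 30, so it overlaps every document in range 1 and none in range 2. The bucket with key 30 has a doc_count of 22, which is the total number of documents. Why is that?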


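The bucket with key 30 covers ages 30 up to, but not including, 33. It overlaps range 1 (27-30) through its upper bound 30, and it also overlaps range 2 (31-32). A range is counted in every bucket it overlaps, so this bucket counts the documents of range 1 and range 2 together, which is the whole index.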
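The counting rule can be checked with a small Python model: every range adds one to each bucket it overlaps. This is only a sketch that reproduces the numbers above, assuming integer endpoints and inclusive gte/lte bounds; range_histogram is a hypothetical name, not an Elasticsearch API.

```python
from collections import Counter

def range_histogram(ranges, interval):
    """Count each inclusive (gte, lte) range once in every bucket it overlaps."""
    counts = Counter()
    for gte, lte in ranges:
        first = (gte // interval) * interval  # bucket holding the lower bound
        last = (lte // interval) * interval   # bucket holding the upper bound
        for key in range(first, last + interval, interval):
            counts[key] += 1
    return dict(sorted(counts.items()))

# 12 documents carry 27-30 and 10 carry 31-32, as in the bulk data.
age_ranges = [(27, 30)] * 12 + [(31, 32)] * 10

buckets = range_histogram(age_ranges, 3)
print(buckets)  # {27: 12, 30: 22}
```

Note that the doc_count values add up to 34, more than the 22 documents in the index, because a range that spans several buckets is counted in each of them.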

Now let's set the interval to 2 and look at the result again:

GET sports/_search
{
  "size": 0,
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age_range",
        "interval": 2
      }
    }
  }
}

The result is:

  "aggregations" : {
    "age_histogram" : {
      "buckets" : [
        {
          "key" : 26.0,
          "doc_count" : 12
        },
        {
          "key" : 28.0,
          "doc_count" : 12
        },
        {
          "key" : 30.0,
          "doc_count" : 22
        },
        {
          "key" : 32.0,
          "doc_count" : 10
        }
      ]
    }
  }

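The first bucket (key 26) covers ages 26-27. Since 27 belongs to the 27-30 bracket (range 1), which holds 12 documents, the count is 12. The bucket with key 28 covers 28-29; 29 also belongs to range 1, so the count is again 12. The bucket with key 30 covers 30-31: 30 belongs to range 1 and 31 to the 31-32 bracket (range 2), so the count is the sum of both, 22. Finally, the bucket with key 32 covers 32-33; 32 belongs to range 2, which holds only 10 documents, so the count is 10.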
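Another way to read this result is to list, for each age bracket, the buckets it lands in. The Python sketch below does that for an interval of 2; bucket_keys is a made-up helper, and it assumes integer endpoints with inclusive bounds.

```python
def bucket_keys(gte, lte, interval):
    """Keys of all buckets [key, key + interval) touched by an inclusive range."""
    first = (gte // interval) * interval
    last = (lte // interval) * interval
    return list(range(first, last + interval, interval))

range1_keys = bucket_keys(27, 30, 2)  # the 27-30 bracket, 12 documents
range2_keys = bucket_keys(31, 32, 2)  # the 31-32 bracket, 10 documents

print(range1_keys)  # [26, 28, 30]
print(range2_keys)  # [30, 32]
```

Only key 30 appears in both lists, which is why it is the only bucket whose count (22) combines the two brackets.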

 
