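Besides search, Elasticsearch also provides statistical analysis over the data, and its APIs can express complex aggregation queries. Each type of aggregation has its own purpose and output. To make them easier to understand, they are usually grouped into three categories.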
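Bucket: every bucket is associated with a key and a document criterion. A bucket aggregation returns a list of buckets, each holding the set of documents that meet its criterion.
Metric: aggregations that compute some metric over a set of documents.
Pipeline: aggregations that run a second round of aggregation over the output of other aggregations or their metrics.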
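A bucket is similar to grouping (GROUP BY) in a database: documents that meet the same condition are placed in one group. Elasticsearch provides many kinds of bucket aggregations, such as range, geo-based, sampler, and terms.
Let's look at a few practical examples.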
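The request below searches the kibana_sample_data_flights index, groups the documents by DestCountry in an aggregation named flight_dest, and returns only the top 5 buckets, ordered by document count.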
GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "DestCountry",
        "size": 5
      }
    }
  }
}
The response lists the matching documents first, under hits, followed by the flight_dest buckets under aggregations.
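When only the buckets are of interest, the document hits can be suppressed by setting the top-level size to 0, which is separate from the size inside terms. The following is a minimal sketch in Python that only builds and prints the request body; no client library is assumed, and sending the body to the cluster is left out.

```python
import json

# Request body for GET /kibana_sample_data_flights/_search
body = {
    "size": 0,  # top-level size: return no document hits
    "aggs": {
        "flight_dest": {
            "terms": {
                "field": "DestCountry",
                "size": 5,  # terms size: return at most 5 buckets
            }
        }
    },
}

# The printed JSON can be pasted into the Kibana console as the request body.
print(json.dumps(body, indent=2))
```

With this body the hits array comes back empty and only the aggregations section carries data.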
The next request splits AvgTicketPrice into three tiers: below 500, from 500 up to but not including 1000, and 1000 or above.
GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "AvgTicketPrice",
        "ranges": [
          {
            "to": 500
          },
          {
            "from": 500,
            "to": 1000
          },
          {
            "from": 1000
          }
        ]
      }
    }
  }
}
The bucket keys in the aggregation result can also be given custom names.
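As a sketch of custom bucket keys, each entry in ranges can carry a key property, and that string then replaces the generated key in the response. The key names cheap, average and expensive are made up for illustration; the Python code only builds the request body and does not contact a cluster.

```python
import json

# Request body for GET /kibana_sample_data_flights/_search
body = {
    "size": 0,  # skip document hits, keep only the buckets
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "AvgTicketPrice",
                "ranges": [
                    # "key" names are illustrative, not from the original article
                    {"key": "cheap", "to": 500},
                    {"key": "average", "from": 500, "to": 1000},
                    {"key": "expensive", "from": 1000},
                ],
            }
        }
    },
}

bucket_keys = [r["key"] for r in body["aggs"]["price_ranges"]["range"]["ranges"]]
print(bucket_keys)
print(json.dumps(body, indent=2))
```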
Query the flights whose destination is IT and group them into the three ticket-price tiers.
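One way to express this is a top-level query next to the aggregation, so that the range buckets are computed only over the matching documents. The sketch below assumes DestCountry is an exact-match (keyword) field, as the term filter used elsewhere in this article suggests; it builds the request body in Python without sending it.

```python
import json

# Request body for GET /kibana_sample_data_flights/_search
body = {
    "size": 0,
    # The query narrows the document set before any bucket is built.
    "query": {"term": {"DestCountry": "IT"}},
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "AvgTicketPrice",
                "ranges": [
                    {"to": 500},
                    {"from": 500, "to": 1000},
                    {"from": 1000},
                ],
            }
        }
    },
}

print(json.dumps(body, indent=2))
```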
Aggregation over date ranges, using date_range:
GET /user_info_2/_search
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "update_date",
        "ranges": [
          {
            "to": "2020-05-01 00:00:00"
          },
          {
            "from": "2020-05-02 00:00:00",
            "to": "2020-08-01 00:00:00"
          },
          {
            "from": "2020-08-02 00:00:00"
          }
        ]
      }
    }
  }
}
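The result is shown below. The documents dated exactly 2020-05-01 00:00:00 and 2020-08-01 00:00:00 fall into no bucket, because the to bound is exclusive and the next range only starts a day later.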
{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "8",
        "_score": 1,
        "_source": {
          "age": "20",
          "update_date": "2020-05-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "9",
        "_score": 1,
        "_source": {
          "name": "赵六",
          "update_date": "2020-08-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "10",
        "_score": 1,
        "_source": {
          "age": null,
          "update_date": "2020-11-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "李四",
          "age": 29,
          "address": "中国南京市建邺区",
          "tel": "13901234568",
          "update_date": "2020-01-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "update_date": "2020-01-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "王五",
          "age": 30,
          "address": "中国北京市朝阳区",
          "tel": "13901234567",
          "update_date": "2020-03-01 00:00:00"
        }
      }
    ]
  },
  "aggregations": {
    "range": {
      "buckets": [
        {
          "key": "*-2020-05-01 00:00:00",
          "to": 1588291200000,
          "to_as_string": "2020-05-01 00:00:00",
          "doc_count": 3
        },
        {
          "key": "2020-05-02 00:00:00-2020-08-01 00:00:00",
          "from": 1588377600000,
          "from_as_string": "2020-05-02 00:00:00",
          "to": 1596240000000,
          "to_as_string": "2020-08-01 00:00:00",
          "doc_count": 0
        },
        {
          "key": "2020-08-02 00:00:00-*",
          "from": 1596326400000,
          "from_as_string": "2020-08-02 00:00:00",
          "doc_count": 1
        }
      ]
    }
  }
}
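The bucket counts above can be reproduced locally to make the boundary rule visible: a range includes its from value and excludes its to value. The sketch below is a plain Python simulation over the six update_date values from the response, not a call to Elasticsearch; the function name bucket_counts is made up for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def parse(text):
    """Parse a timestamp string, passing None through for open-ended bounds."""
    return datetime.strptime(text, FMT) if text is not None else None


# update_date values of the six documents, in response order (ids 8, 9, 10, 2, 1, 3)
update_dates = [
    "2020-05-01 00:00:00",
    "2020-08-01 00:00:00",
    "2020-11-01 00:00:00",
    "2020-01-01 00:00:00",
    "2020-01-01 00:00:00",
    "2020-03-01 00:00:00",
]

ranges = [
    {"to": "2020-05-01 00:00:00"},
    {"from": "2020-05-02 00:00:00", "to": "2020-08-01 00:00:00"},
    {"from": "2020-08-02 00:00:00"},
]


def bucket_counts(values, ranges):
    """Count values per range: from is inclusive, to is exclusive."""
    parsed = [parse(v) for v in values]
    counts = []
    for r in ranges:
        lo, hi = parse(r.get("from")), parse(r.get("to"))
        counts.append(
            sum(1 for v in parsed if (lo is None or v >= lo) and (hi is None or v < hi))
        )
    return counts


counts = bucket_counts(update_dates, ranges)
print(counts)  # [3, 0, 1]
```

The counts add up to 4 rather than 6 because the two documents dated exactly on 2020-05-01 and 2020-08-01 sit in the gaps between the ranges.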
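The filter aggregation runs an aggregation over the result set left after a filter condition is applied.
The request below aggregates only the documents whose DestCountry is AU and computes the average of DistanceMiles.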
GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "flight_Miles": {
      "filter": {
        "term": {
          "DestCountry": "AU"
        }
      },
      "aggs": {
        "avg_miles": {
          "avg": {
            "field": "DistanceMiles"
          }
        }
      }
    }
  }
}
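Because the filter lives inside the aggregation rather than in the top-level query, several filtered subsets can be compared in one request. The sketch below builds such a body in Python for two destinations; the aggregation names flights_au and flights_it and the helper avg_miles_for are hypothetical, and nothing is sent to a cluster.

```python
import json


def avg_miles_for(country):
    """Filter aggregation for one destination country with an avg sub-aggregation."""
    return {
        "filter": {"term": {"DestCountry": country}},
        "aggs": {"avg_miles": {"avg": {"field": "DistanceMiles"}}},
    }


# Request body for GET /kibana_sample_data_flights/_search
body = {
    "size": 0,  # only the aggregation results are needed
    "aggs": {
        "flights_au": avg_miles_for("AU"),  # hypothetical aggregation name
        "flights_it": avg_miles_for("IT"),  # hypothetical aggregation name
    },
}

print(json.dumps(body, indent=2))
```

Each named aggregation would come back with its own doc_count and its own avg_miles value.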
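The missing aggregation counts the documents that lack a field; a field whose value is null also counts as missing.
The request below counts the documents in the user_info_2 index that have no age.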
GET /user_info_2/_search
{
  "aggs": {
    "without_age": {
      "missing": {
        "field": "age"
      }
    }
  }
}
The count is 2: one document has no age field, and another has age set to null.
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "9",
        "_score": 1,
        "_source": {
          "name": "赵六"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "8",
        "_score": 1,
        "_source": {
          "age": "20"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "10",
        "_score": 1,
        "_source": {
          "age": null
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "李四",
          "age": 29,
          "address": "中国南京市建邺区",
          "tel": "13901234568"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "张三",
          "age": 28,
          "address": "中国南京市鼓楼区",
          "tel": "13901234567"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "王五",
          "age": 30,
          "address": "中国北京市朝阳区",
          "tel": "13901234567"
        }
      }
    ]
  },
  "aggregations": {
    "without_age": {
      "doc_count": 2
    }
  }
}
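The same count can be checked by hand against the six _source objects in the response. The sketch below is a local Python simulation of the rule described above (absent field or null value), not a call to Elasticsearch; the helper name is_missing is made up for illustration.

```python
# The six _source objects from the response, in the same order (ids 9, 8, 10, 2, 1, 3)
sources = [
    {"name": "赵六"},  # no age field at all
    {"age": "20"},
    {"age": None},  # JSON null
    {"name": "李四", "age": 29, "address": "中国南京市建邺区", "tel": "13901234568"},
    {"name": "张三", "age": 28, "address": "中国南京市鼓楼区", "tel": "13901234567"},
    {"name": "王五", "age": 30, "address": "中国北京市朝阳区", "tel": "13901234567"},
]


def is_missing(doc, field):
    """True when the field is absent or explicitly null."""
    return doc.get(field) is None


without_age = sum(1 for doc in sources if is_missing(doc, "age"))
print(without_age)  # 2
```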
The histogram aggregation computes statistics over fixed-size intervals.
GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "test": {
      "histogram": {
        "field": "AvgTicketPrice",
        "interval": 100
      }
    }
  }
}
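To see which bucket a given price lands in, the histogram rule can be sketched locally: each value is rounded down to the nearest multiple of the interval, and that multiple becomes the bucket key. The Python code below is a simulation with made-up prices, not data from the flights index, and the function name bucket_key is only illustrative.

```python
import math
from collections import Counter


def bucket_key(value, interval, offset=0.0):
    """Round a value down to the start of its interval."""
    return math.floor((value - offset) / interval) * interval + offset


# Made-up ticket prices, for illustration only
prices = [99.5, 100.0, 250.75, 299.99, 1199.0]

counts = Counter(bucket_key(p, 100) for p in prices)
buckets = sorted(counts.items())
print(buckets)  # [(0.0, 1), (100.0, 1), (200.0, 2), (1100.0, 1)]
```

The sketch lists only the buckets that received at least one value.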
The query result is as follows: