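Aggregation is one of Elasticsearch's most powerful features and comes up constantly in day-to-day development, much as count, sum, max, min, group by, and having do when you work with MySQL.
When learning ES aggregations, you don't need to master every type up front. Get fluent with the ones commonly used in development first; specialized types such as IP address aggregation or latitude/longitude aggregation can wait until you actually run into them. Learn breadth first, then depth.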
PUT /pigg/_doc/1
{
"name": "老亚瑟",
"age": 30,
"sex": "男",
"group": "日落圣殿",
"tag":["战士", "坦克"],
"date": "2019-12-26",
"friend": "安琪拉"
}
PUT /pigg/_doc/2
{
"name": "安琪拉",
"age": 16,
"sex": "女",
"group": "日落圣殿",
"tag":["法师"],
"date": "2019-01-01",
"friend": ""
}
PUT /pigg/_doc/3
{
"name": "凯",
"age": 28,
"sex": "男",
"group": "长城守卫军",
"tag":["战士"],
"date": "2020-01-01"
}
PUT /pigg/_doc/4
{
"name": "盾山",
"age": 38,
"sex": "男",
"group": "长城守卫军",
"tag":["辅助", "坦克"],
"date": "2020-02-02"
}
PUT /pigg/_doc/5
{
"name": "百里守约",
"age": 18,
"sex": "男",
"group": "长城守卫军",
"tag":["射手"],
"date": "2020-03-03"
}
PUT /pigg/_doc/6
{
"name": "李元芳",
"age": 15,
"sex": "男",
"group": "长安",
"tag":["刺客"],
"date": "2020-03-23"
}
PUT /pigg/_doc/7
{
"name": "陈咬金",
"age": 40,
"sex": "男",
"group": "长安",
"tag":["战士", "坦克"]
}
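For metric aggregations, see my earlier post "Elasticsearch笔记(五) 指标聚合 SQL DSL JavaAPI" (Elasticsearch Notes 5: metric aggregations in SQL, DSL, and the Java API).
A bucket aggregation treats a condition as a bucket: every document that satisfies the condition is assigned to that bucket.
For example, given a pile of colored balloons, a bucket aggregation on color produces bucket 1: red balloons, bucket 2: yellow balloons, bucket 3: blue balloons.
That example should give a rough idea of what bucket aggregations do. The field you aggregate on should preferably be of type keyword. A text field also works, but only after enabling its fielddata setting, which loads the field's terms into heap memory and noticeably hurts performance.
The terms bucket aggregation is similar to GROUP BY in SQL. The example below runs a terms aggregation on each hero's faction (the group field).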
GET /pigg/_search
{
"size": 0,
"aggs": {
"group_by_group": {
"terms": {
"field": "group.keyword"
}
}
}
}
The result:
"buckets" : [
{
"key" : "长城守卫军",
"doc_count" : 3
},
{
"key" : "日落圣殿",
"doc_count" : 2
},
{
"key" : "长安",
"doc_count" : 2
}
]
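To make the GROUP BY analogy concrete, here is a minimal pure-Python sketch of what the terms aggregation computes. It does not call Elasticsearch; the `heroes` list and the `terms_buckets` helper are illustrative names, and the ordering rule (largest bucket first, ties broken by key) is an assumption chosen to match the result above.

```python
from collections import Counter

# Only the fields needed here; values copied from the sample documents.
heroes = [
    {"name": "老亚瑟", "group": "日落圣殿"},
    {"name": "安琪拉", "group": "日落圣殿"},
    {"name": "凯", "group": "长城守卫军"},
    {"name": "盾山", "group": "长城守卫军"},
    {"name": "百里守约", "group": "长城守卫军"},
    {"name": "李元芳", "group": "长安"},
    {"name": "陈咬金", "group": "长安"},
]


def terms_buckets(docs, field):
    """Count documents per distinct field value, like SQL GROUP BY + COUNT."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Largest bucket first; ties are broken by key in ascending order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


buckets = terms_buckets(heroes, "group")
for bucket in buckets:
    print(bucket)
```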
An aggs block can contain nested aggs. Example: count the men and women in each faction separately.
GET /pigg/_search
{
"size": 0,
"aggs": {
"group_by_group": {
"terms": {
"field": "group.keyword"
},
"aggs": {
"group_by_sex": {
"terms": {
"field": "sex.keyword"
}
}
}
}
}
}
The result:
"buckets" : [
{
"key" : "长城守卫军",
"doc_count" : 3,
"group_by_sex" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "男",
"doc_count" : 3
}
]
}
},
{
"key" : "日落圣殿",
"doc_count" : 2,
"group_by_sex" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "女",
"doc_count" : 1
},
{
"key" : "男",
"doc_count" : 1
}
]
}
},
{
"key" : "长安",
"doc_count" : 2,
"group_by_sex" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "男",
"doc_count" : 2
}
]
}
}
]
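The nesting can be pictured as a two-level grouping: first by faction, then by sex inside each faction. The sketch below reproduces the counts in plain Python; `rows` and `nested_terms` are hypothetical names used only for this illustration.

```python
from collections import Counter, defaultdict

# (group, sex) pairs copied from the seven sample documents.
rows = [
    ("日落圣殿", "男"),
    ("日落圣殿", "女"),
    ("长城守卫军", "男"),
    ("长城守卫军", "男"),
    ("长城守卫军", "男"),
    ("长安", "男"),
    ("长安", "男"),
]


def nested_terms(pairs):
    """Outer bucket per group; inner Counter per sex within that group."""
    outer = defaultdict(Counter)
    for group, sex in pairs:
        outer[group][sex] += 1
    return {group: dict(inner) for group, inner in outer.items()}


nested = nested_terms(rows)
for group, by_sex in nested.items():
    print(group, by_sex)
```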
GET /pigg/_search
{
"size": 0,
"aggs": {
"count_of_changan": {
"filter": {
"term": {
"group.keyword": "长安"
}
}
}
}
}
The filter aggregation above returns the same count as this _count request:
GET /pigg/_count
{
"query": {
"bool": {
"filter": {
"term": {
"group.keyword": "长安"
}
}
}
}
}
GET /pigg/_search
{
"size":0,
"aggs":{
"avg_age_of_changan":{
"filter":{
"term":{
"group.keyword":"长安"
}
},
"aggs":{
"avg_age":{
"avg":{
"field":"age"
}
}
}
}
}
}
The result:
"aggregations" : {
"avg_age_of_changan" : {
"doc_count" : 2,
"avg_age" : {
"value" : 27.5
}
}
}
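A filter aggregation with a nested metric amounts to "keep the matching documents, then compute the metric on what is left". The following sketch checks the 27.5 figure by hand; `rows` and `filter_avg` are illustrative names, not part of any Elasticsearch client.

```python
# (group, age) pairs copied from the sample documents.
rows = [
    ("日落圣殿", 30),
    ("日落圣殿", 16),
    ("长城守卫军", 28),
    ("长城守卫军", 38),
    ("长城守卫军", 18),
    ("长安", 15),
    ("长安", 40),
]


def filter_avg(pairs, group):
    """Single-bucket filter on group, followed by an avg metric on age."""
    ages = [age for g, age in pairs if g == group]
    avg = sum(ages) / len(ages) if ages else None
    return {"doc_count": len(ages), "avg_age": avg}


changan = filter_avg(rows, "长安")
print(changan)
```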
The filters aggregation defines several buckets, each with its own filter; a document is placed in every bucket whose filter it matches.
GET /pigg/_search
{
"aggs":{
"count_of_tag":{
"filters":{
"filters":{
"tag_战士":{
"term":{
"tag.keyword":"战士"
}
},
"tag_刺客":{
"term":{
"tag.keyword":"刺客"
}
}
}
}
}
}
}
The result:
"aggregations" : {
"count_of_tag" : {
"buckets" : {
"tag_刺客" : {
"doc_count" : 1
},
"tag_战士" : {
"doc_count" : 3
}
}
}
}
GET /pigg/_search
{
"size": 0,
"aggs":{
"count_of_tag":{
"filters":{
"filters":{
"姓李":{
"prefix":{
"name.keyword":"李"
}
},
"姓陈":{
"prefix":{
"name.keyword":"陈"
}
},
"年龄>=20":{
"range":{
"age":{
"gte":20
}
}
}
}
}
}
}
}
The result:
"aggregations" : {
"count_of_tag" : {
"buckets" : {
"姓李" : {
"doc_count" : 1
},
"姓陈" : {
"doc_count" : 1
},
"年龄>=20" : {
"doc_count" : 4
}
}
}
}
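One way to think about the filters aggregation is as a dictionary of named predicates, each counted independently, so a single document may land in more than one bucket (陈咬金 matches both "姓陈" and "年龄>=20"). This is a rough Python sketch under that reading; `heroes`, `named_filters`, and `filters_agg` are hypothetical names.

```python
# (name, age) pairs copied from the sample documents.
heroes = [
    ("老亚瑟", 30),
    ("安琪拉", 16),
    ("凯", 28),
    ("盾山", 38),
    ("百里守约", 18),
    ("李元芳", 15),
    ("陈咬金", 40),
]

# Bucket name -> predicate, mirroring the prefix and range filters above.
named_filters = {
    "姓李": lambda name, age: name.startswith("李"),
    "姓陈": lambda name, age: name.startswith("陈"),
    "年龄>=20": lambda name, age: age >= 20,
}


def filters_agg(docs, filters):
    """Count, for each named filter, the documents that satisfy it."""
    return {
        bucket: sum(1 for name, age in docs if predicate(name, age))
        for bucket, predicate in filters.items()
    }


counts = filters_agg(heroes, named_filters)
print(counts)
```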
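The range aggregation first divides a field's values into intervals; each document goes into the bucket whose interval contains its field value.
Intervals are defined with "from" and "to" and are closed on the left and open on the right: from 30 to 40 includes 30 but excludes 40.
Example: a range aggregation on age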
GET /pigg/_search
{
"aggs": {
"age_rang": {
"range": {
"field": "age",
"missing": 0,
"ranges": [
{
"to": 30
},
{
"from": 30,
"to": 40
},
{
"from": 40
}
]
}
}
}
}
The result:
"buckets" : [
{
"key" : "*-30.0",
"to" : 30.0,
"doc_count" : 4
},
{
"key" : "30.0-40.0",
"from" : 30.0,
"to" : 40.0,
"doc_count" : 2
},
{
"key" : "40.0-*",
"from" : 40.0,
"doc_count" : 1
}
]
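The half-open rule is easy to get wrong at the boundaries, so here is a small Python sketch that applies it to the sample ages. The names `ages`, `ranges`, and `range_agg` are illustrative; `None` stands for an open end.

```python
# Ages copied from the seven sample documents.
ages = [30, 16, 28, 38, 18, 15, 40]

# (from, to) pairs; None means the interval is unbounded on that side.
ranges = [(None, 30), (30, 40), (40, None)]


def range_agg(values, intervals):
    """Count values per interval, with from inclusive and to exclusive."""
    result = []
    for lower, upper in intervals:
        count = sum(
            1
            for v in values
            if (lower is None or v >= lower) and (upper is None or v < upper)
        )
        result.append(count)
    return result


range_counts = range_agg(ages, ranges)
print(range_counts)
```

Age 30 falls in the second bucket and age 40 in the third, which is exactly what "from inclusive, to exclusive" implies.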
GET /pigg/_search
{
"size":0,
"aggs":{
"range":{
"date_range":{
"field":"date",
"format": "yyyy-MM-dd",
"ranges":[
{
"from":"now-7d/d",
"to":"now"
}
]
}
}
}
}
The result:
"aggregations" : {
"range" : {
"buckets" : [
{
"key" : "2020-03-16-2020-03-23",
"from" : 1.5843168E12,
"from_as_string" : "2020-03-16",
"to" : 1.584954696781E12,
"to_as_string" : "2020-03-23",
"doc_count" : 1
}
]
}
}
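The date math expression "now-7d/d" means "seven days ago, rounded down to the start of that day". The sketch below imitates this in Python with a fixed `now` on 2020-03-23, an assumption chosen to match the day the result above was produced; `dates` and `count_last_seven_days` are hypothetical names, and all times are treated as UTC.

```python
from datetime import datetime, timedelta, timezone

# The "date" values of the six sample documents that have one.
dates = [
    "2019-12-26",
    "2019-01-01",
    "2020-01-01",
    "2020-02-02",
    "2020-03-03",
    "2020-03-23",
]


def count_last_seven_days(date_strings, now):
    """Count dates in [now-7d rounded down to midnight, now)."""
    # "now-7d/d": subtract seven days, then truncate to the start of the day.
    start = (now - timedelta(days=7)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    parsed = [
        datetime.strptime(s, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        for s in date_strings
    ]
    # from is inclusive, to is exclusive.
    return sum(1 for d in parsed if start <= d < now)


# Assumed "current time" for the illustration.
now = datetime(2020, 3, 23, 9, 11, 36, tzinfo=timezone.utc)
recent = count_last_seven_days(dates, now)
print(recent)
```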
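The date_histogram aggregation buckets documents on a date field by a time interval. The available interval units are year, quarter, month, week, day, hour, minute, and second. Older Elasticsearch releases accepted fractional values for the units other than year, quarter, and month; current releases do not support fractional time values, so write 90m rather than 1.5h.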
GET /pigg/_search
{
"size":0,
"aggs":{
"dates":{
"date_histogram":{
"field":"date",
"interval":"month",
"format":"yyyy-MM-dd"
}
}
}
}
The response:
"dates" : {
"buckets" : [
{
"key_as_string" : "2019-01-01",
"key" : 1546300800000,
"doc_count" : 1
},
{
"key_as_string" : "2019-02-01",
"key" : 1548979200000,
"doc_count" : 0
},
{
"key_as_string" : "2019-03-01",
"key" : 1551398400000,
"doc_count" : 0
},
{
"key_as_string" : "2019-04-01",
"key" : 1554076800000,
"doc_count" : 0
},
{
"key_as_string" : "2019-05-01",
"key" : 1556668800000,
"doc_count" : 0
},
{
"key_as_string" : "2019-06-01",
"key" : 1559347200000,
"doc_count" : 0
},
{
"key_as_string" : "2019-07-01",
"key" : 1561939200000,
"doc_count" : 0
},
{
"key_as_string" : "2019-08-01",
"key" : 1564617600000,
"doc_count" : 0
},
{
"key_as_string" : "2019-09-01",
"key" : 1567296000000,
"doc_count" : 0
},
{
"key_as_string" : "2019-10-01",
"key" : 1569888000000,
"doc_count" : 0
},
{
"key_as_string" : "2019-11-01",
"key" : 1572566400000,
"doc_count" : 0
},
{
"key_as_string" : "2019-12-01",
"key" : 1575158400000,
"doc_count" : 1
},
{
"key_as_string" : "2020-01-01",
"key" : 1577836800000,
"doc_count" : 1
},
{
"key_as_string" : "2020-02-01",
"key" : 1580515200000,
"doc_count" : 1
},
{
"key_as_string" : "2020-03-01",
"key" : 1583020800000,
"doc_count" : 2
}
]
}
GET /pigg/_search
{
"size":0,
"aggs":{
"dates":{
"date_histogram":{
"field":"date",
"interval":"month",
"format":"yyyy-MM",
"min_doc_count":1
}
}
}
}
The response now contains only the months with at least one person:
"dates" : {
"buckets" : [
{
"key_as_string" : "2019-01",
"key" : 1546300800000,
"doc_count" : 1
},
{
"key_as_string" : "2019-12",
"key" : 1575158400000,
"doc_count" : 1
},
{
"key_as_string" : "2020-01",
"key" : 1577836800000,
"doc_count" : 1
},
{
"key_as_string" : "2020-02",
"key" : 1580515200000,
"doc_count" : 1
},
{
"key_as_string" : "2020-03",
"key" : 1583020800000,
"doc_count" : 2
}
]
}
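The difference between the two responses comes down to whether empty months between the earliest and latest date are kept. The following Python sketch mimics a monthly histogram with a `min_doc_count` parameter; `dates` and `month_histogram` are illustrative names, and the gap filling is written by hand rather than taken from any Elasticsearch client.

```python
from collections import Counter
from datetime import date

# The "date" values of the six sample documents that have one.
dates = [
    "2019-12-26",
    "2019-01-01",
    "2020-01-01",
    "2020-02-02",
    "2020-03-03",
    "2020-03-23",
]


def month_histogram(date_strings, min_doc_count=0):
    """Bucket dates by calendar month, optionally dropping sparse buckets."""
    months = [date.fromisoformat(s).replace(day=1) for s in date_strings]
    counts = Counter(months)
    buckets = []
    current, last = min(months), max(months)
    while current <= last:
        n = counts.get(current, 0)
        if n >= min_doc_count:
            buckets.append((current.strftime("%Y-%m"), n))
        # Move to the first day of the next month.
        current = date(
            current.year + current.month // 12, current.month % 12 + 1, 1
        )
    return buckets


all_months = month_histogram(dates)
non_empty = month_histogram(dates, min_doc_count=1)
print(len(all_months), non_empty)
```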
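Count only the data from January 2020 onward. The query narrows the documents first, and the aggregation then runs on what the query matched: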
GET /pigg/_search
{
"size":0,
"query": {
"bool": {
"filter": {
"range": {
"date": {
"gte": "2020-01-01"
}
}
}
}
},
"aggs":{
"dates":{
"date_histogram":{
"field":"date",
"interval":"month",
"format":"yyyy-MM",
"min_doc_count":1
}
}
}
}
The response:
"dates" : {
"buckets" : [
{
"key_as_string" : "2020-01",
"key" : 1577836800000,
"doc_count" : 1
},
{
"key_as_string" : "2020-02",
"key" : 1580515200000,
"doc_count" : 1
},
{
"key_as_string" : "2020-03",
"key" : 1583020800000,
"doc_count" : 2
}
]
}
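As a quick sanity check of "query first, aggregate second", this small Python sketch filters the sample dates and then counts them per month. `dates` and `monthly_counts_since` are hypothetical names for the illustration.

```python
from collections import Counter

# The "date" values of the six sample documents that have one.
dates = [
    "2019-12-26",
    "2019-01-01",
    "2020-01-01",
    "2020-02-02",
    "2020-03-03",
    "2020-03-23",
]


def monthly_counts_since(date_strings, since):
    """Keep dates >= since, then count the survivors per yyyy-MM."""
    # ISO yyyy-MM-dd strings sort chronologically, so string comparison works.
    kept = [s for s in date_strings if s >= since]
    return sorted(Counter(s[:7] for s in kept).items())


since_2020 = monthly_counts_since(dates, "2020-01-01")
print(since_2020)
```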
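Count the people who have no friend value.
老亚瑟 has friend = "安琪拉" and 安琪拉 has friend = "". An empty string is still a value, so that document counts as having the field.
# There are 7 people in total and 2 of them have a friend value, so the result is 5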
POST /pigg/_search?size=0
{
"aggs" : {
"account_without_friend" : {
"missing" : { "field" : "friend.keyword" }
}
}
}
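The distinction between "field absent" and "field present but empty" can be shown with plain dictionaries. In this sketch, which only models the behaviour described above, `heroes` and `missing_count` are illustrative names.

```python
# Only documents 1 and 2 carry a "friend" key; document 2 has an empty string.
heroes = [
    {"name": "老亚瑟", "friend": "安琪拉"},
    {"name": "安琪拉", "friend": ""},
    {"name": "凯"},
    {"name": "盾山"},
    {"name": "百里守约"},
    {"name": "李元芳"},
    {"name": "陈咬金"},
]


def missing_count(docs, field):
    """Count documents where the field is absent or explicitly None."""
    # An empty string is a real value, so it does not count as missing.
    return sum(1 for doc in docs if doc.get(field) is None)


without_friend = missing_count(heroes, "friend")
print(without_friend)
```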
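Field collapsing is very convenient: it groups the hits by a field and returns only the top document of each group, and the documents expanded beneath that top hit can be fetched in the same request through inner_hits.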
GET /pigg/_search
{
"collapse": {
"field": "group.keyword",
"inner_hits":{
"name": "old_age",
"size": 2,
"sort": [{"age": "desc"}]
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
The result shows that the factions are ordered by the age of their oldest member. 陈咬金 (40) and 老亚瑟 (30) are each the oldest in their own faction; because 40 > 30, 长安 ranks above 日落圣殿.
"hits" : [
{
"_index" : "pigg",
"_type" : "_doc",
"_id" : "7",
"_score" : null,
"_source" : {
"name" : "陈咬金",
"age" : 40,
"sex" : "男",
"group" : "长安",
"tag" : [
"战士",
"坦克"
]
},
"fields" : {
"group.keyword" : [
"长安"
]
},
"sort" : [
40
],
"inner_hits" : {
"old_age" : {
"hits" : {
"total" : 2,
"max_score" : null,
"hits" : [
{
"_index" : "pigg",
"_type" : "_doc",
"_id" : "7",
"_score" : null,
"_source" : {
"name" : "陈咬金",
"age" : 40,
"sex" : "男",
"group" : "长安",
"tag" : [
"战士",
"坦克"
]
},
"sort" : [
40
]
},
{
"_index" : "pigg",
"_type" : "_doc",
"_id" : "6",
"_score" : null,
"_source" : {
"name" : "李元芳",
"age" : 15,
"sex" : "男",
"group" : "长安",
"tag" : [
"刺客"
],
"date" : "2020-03-23"
},
"sort" : [
15
]
}
]
}
}
}
},
{
"_index" : "pigg",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "盾山",
"age" : 38,
"sex" : "男",
"group" : "长城守卫军",
"tag" : [
"辅助",
"坦克"
],
"date" : "2020-02-02"
},
"fields" : {
"group.keyword" : [
"长城守卫军"
]
},
"sort" : [
38
],
"inner_hits" : {
"old_age" : {
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [
{
"_index" : "pigg",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "盾山",
"age" : 38,
"sex" : "男",
"group" : "长城守卫军",
"tag" : [
"辅助",
"坦克"
],
"date" : "2020-02-02"
},
"sort" : [
38
]
},
{
"_index" : "pigg",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "凯",
"age" : 28,
"sex" : "男",
"group" : "长城守卫军",
"tag" : [
"战士"
],
"date" : "2020-01-01"
},
"sort" : [
28
]
}
]
}
}
}
},
{
"_index" : "pigg",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "老亚瑟",
"age" : 30,
"sex" : "男",
"group" : "日落圣殿",
"tag" : [
"战士",
"坦克"
],
"date" : "2019-12-26",
"friend" : "安琪拉"
},
"fields" : {
"group.keyword" : [
"日落圣殿"
]
},
"sort" : [
30
],
"inner_hits" : {
"old_age" : {
"hits" : {
"total" : 2,
"max_score" : null,
"hits" : [
{
"_index" : "pigg",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "老亚瑟",
"age" : 30,
"sex" : "男",
"group" : "日落圣殿",
"tag" : [
"战士",
"坦克"
],
"date" : "2019-12-26",
"friend" : "安琪拉"
},
"sort" : [
30
]
},
{
"_index" : "pigg",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "安琪拉",
"age" : 16,
"sex" : "女",
"group" : "日落圣殿",
"tag" : [
"法师"
],
"date" : "2019-01-01",
"friend" : ""
},
"sort" : [
16
]
}
]
}
}
}
}
]
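The collapse behaviour above can be approximated as: sort all documents by age descending, keep the first document seen for each faction, and attach that faction's two oldest members as inner hits. The sketch below follows that reading in plain Python; `heroes` and `collapse_by_group` are hypothetical names, not an Elasticsearch API.

```python
# (name, age, group) triples copied from the sample documents.
heroes = [
    ("老亚瑟", 30, "日落圣殿"),
    ("安琪拉", 16, "日落圣殿"),
    ("凯", 28, "长城守卫军"),
    ("盾山", 38, "长城守卫军"),
    ("百里守约", 18, "长城守卫军"),
    ("李元芳", 15, "长安"),
    ("陈咬金", 40, "长安"),
]


def collapse_by_group(docs, inner_size=2):
    """One top hit per group, ordered by age desc, plus inner hits per group."""
    by_age = sorted(docs, key=lambda d: d[1], reverse=True)
    collapsed = []
    seen = set()
    for name, age, group in by_age:
        if group in seen:
            continue
        seen.add(group)
        # Inner hits: the oldest members of the same group, also by age desc.
        inner = [n for n, _, g in by_age if g == group][:inner_size]
        collapsed.append({"group": group, "top": name, "inner_hits": inner})
    return collapsed


collapsed = collapse_by_group(heroes)
for hit in collapsed:
    print(hit)
```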