1、两个核心概念:bucket和metric
bucket:一个数据分组
Metric:就是对一个bucket执行的某种聚合分析的操作,比如说求平均值,求最大值,求最小值
2、聚合介绍及下钻分析
统计哪种颜色的电视销量最高
GET /tvs/sales/_search
{
"size" : 0,
"aggs" : {
"popular_colors" : {
"terms" : {
"field" : "color"
}
}
}
}
size:只获取聚合结果,而不要执行聚合的原始数据
aggs:固定语法,要对一份数据执行分组聚合操作
popular_colors:就是对每个aggs,都要起一个名字,这个名字是随机的,你随便取什么都ok
terms:根据字段的值进行分组
field:根据指定的字段的值进行分组
GET /tvs/sales/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
这个第二个aggs内部,同样取个名字,执行一个metric操作,avg,对之前的每个bucket中的数据的指定的field求一个平均值
{
"took": 28,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "红色",
"doc_count": 4,
"avg_price": {
"value": 3250
}
},
{
"key": "绿色",
"doc_count": 2,
"avg_price": {
"value": 2100
}
}
}
]
}
}
}
在计算出每各颜色中的有多少台及评均价格后,再计算里面分别是些什么牌子的,每个牌子的平均价格是多少
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"group_color": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
},
"group_brand":{
"terms": {
"field": "brand"
},
"aggs": {
"brand_avg": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
3、Sun max min avg的用法
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": {"avg": {"field": "price"}
},
"max_price":{ "max": { "field": "price"}
},
"min_price":{"min": {"field": "price"}
},
"sum_price":{ "sum": {"field": "price"}}
}
}
}
}
4、histogram 通过interval指定区间聚合查询
划分范围,02000,20004000,buckets
histogram:类似于terms,也是进行bucket分组操作,接收一个field,按照这个field的值的各个范围区间,进行bucket分组操作
"histogram":{
"field": "price",
"interval": 2000
},
Order:对指定的聚合进行排序,一般放在想要进行排序的上一层
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"his_price": {
"histogram": {
"field": "price",
"interval": 2000,
"order": {
"sum_price": "asc"
}
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
5、date histogram,按照我们指定的某个date类型的日期field,以及日期interval,按照一定的日期间隔,去划分bucket
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"group_date": {
"date_histogram": {
"field": "sold_date",
"interval": "month",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds":{
"min":"2016-01-01",
"max":"2017-12-31"
}
}
}
}
}
min_doc_count:0即使某个日期interval,2017-01-01~2017-01-31中,一条数据都没有,那么这个区间也是要返回的,默认不会过滤掉这个区间的,1过虑
extended_bounds,min,max:划分bucket的时候,会限定在这个起始日期,和截止日期内
6、分析之统计每季度每个品牌的销售额
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"group_date": {
"date_histogram": {
"field": "sold_date",
"interval": "quarter",
"format": "yyyy-MM-dd",
"min_doc_count": 1,
"extended_bounds":{
"min":"2016-01-01",
"max":"2017-12-31"
}
},
"aggs": {
"group_brand": {
"terms": {
"field": "brand"
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
}
}
},
"sum_quarter":{
"sum": {
"field": "price"
}
}
}
}
}
}
Quarter:按季度
7、单个品牌与所有品牌销量对比
GET /tvs/sales/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "长虹"
}
}
},
"aggs": {
"single_avg": {
"avg": {
"field": "price"
}
},
"all_qu":{
"global": {},
"aggs": {
"all_avg": {
"avg": {
"field": "price"
}
}
}
}
}
}
8、bucket filter:对查询出来的进行多时间聚合分析
GET /tvs/sales/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "长虹"
}
}
},
"aggs": {
"recent_150d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-150d"
}
}
},
"aggs": {
"recent_150d_avg_price": {
"avg": {
"field": "price"
}
}
}
},
"recent_140d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-140d"
}
}
},
"aggs": {
"recent_140d_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}