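During development and verification, Elasticsearch does not report how many buckets an aggregation produced, which makes reconciling data tedious. Here are a couple of workarounds: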
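The first is to count on the client side. After the Java code sends the query and gets the response back, buckets is returned as an array, so the size of that array is the number of aggregation buckets. I know this solution may get flamed.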
GET cn_order*/_search
{
"size":0,
"aggregations": {
"groupby": {
"terms": {
"script": {
"inline": "doc['order_id'].value+'-split-'+doc['merchant_id'].value"
},
"size": 200
},
"aggregations": {
"marketFee": {
"sum": {
"field": "market_fee"
}
}
}
}
}
}
The advantage of using terms with a script is that the same query shape works for both single-key and multi-key aggregation. Sample response:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 90,
"successful": 90,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"groupby": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"doc_count": 2,
"marketFee": {
"value": 4.2
},
"key": "1437002_76-split-123"
},
{
"doc_count": 1,
"marketFee": {
"value": 2.1
},
"key": "1437002_77-split-234"
},
{
"doc_count": 1,
"marketFee": {
"value": 2.1
},
"key": "123_7759-split-345"
}
]
}
}
}
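The client-side count works the same way in any language. Below is a minimal Python sketch (not the Java code mentioned above), assuming the response has already been received as JSON text. The helper name count_buckets is made up for illustration, and the sample is a trimmed copy of the response above.

```python
import json

# Trimmed copy of the sample response; only the fields used here are kept.
response_text = """
{
  "aggregations": {
    "groupby": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"doc_count": 2, "marketFee": {"value": 4.2}, "key": "1437002_76-split-123"},
        {"doc_count": 1, "marketFee": {"value": 2.1}, "key": "1437002_77-split-234"},
        {"doc_count": 1, "marketFee": {"value": 2.1}, "key": "123_7759-split-345"}
      ]
    }
  }
}
"""


def count_buckets(response, agg_name):
    """Return how many buckets were returned for one named aggregation."""
    return len(response["aggregations"][agg_name]["buckets"])


response = json.loads(response_text)
bucket_count = count_buckets(response, "groupby")
print(bucket_count)
```

This only counts the buckets that were actually returned, so it is bounded by the size set on the terms aggregation.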
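Each key is built in the format written in the script, doc['order_id'].value+'-split-'+doc['merchant_id'].value. The quoted name inside each pair of square brackets is the field to group by.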
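To make the key format concrete, here is a small Python sketch that builds the script string from a list of field names and splits a returned key back into its parts. The helper names build_key_script and split_key are hypothetical, and the sketch assumes the separator '-split-' never occurs inside a field value.

```python
SEPARATOR = "-split-"


def build_key_script(fields, separator=SEPARATOR):
    """Build a script that concatenates the doc values of the given fields."""
    parts = ["doc['%s'].value" % field for field in fields]
    # Join the per-field expressions with +'<separator>'+ between them.
    return ("+'%s'+" % separator).join(parts)


def split_key(key, separator=SEPARATOR):
    """Split a composite bucket key back into one value per field."""
    return key.split(separator)


# Multi-key script, identical to the one used in the query above.
script = build_key_script(["order_id", "merchant_id"])
# Single-key script: no separator is emitted.
single_script = build_key_script(["order_id"])

order_id, merchant_id = split_key("1437002_76-split-123")
print(script)
print(order_id, merchant_id)
```

With one field the helper returns just doc['order_id'].value, which is the single-key case; with several fields it produces the multi-key form.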
The second is a workaround inside the query itself: construct a per-key counter in the result that is always 1, then sum that counter. Here is the query:
GET cn_energy_charge_bill*/_search
{
"size": 0,
"aggregations": {
"keycount": {
"sum_bucket": {
"buckets_path": "groupby>uniqueId"
}
},
"groupby": {
"terms": {
"script": {
"inline": "doc['order_id'].value+'-split-'+doc['merchant_id'].value"
},
"size": 200
},
"aggregations": {
"marketFee": {
"sum": {
"field": "market_fee"
}
},
"uniqueId": {
"cardinality":{
"script": {
"inline": "doc['order_id'].value+'-split-'+doc['merchant_id'].value"
}
}
}
}
}
}
}
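If the request is assembled in code rather than typed by hand, one way to keep the two copies of the script identical is to build the body from a single string. This is a rough Python sketch: the function name build_bucket_count_body and its parameters are hypothetical, and it only builds the JSON body without sending it anywhere.

```python
import json


def build_bucket_count_body(key_script, terms_size=200):
    """Build a request body: terms buckets plus a sum_bucket over a per-bucket cardinality."""
    script = {"inline": key_script}
    return {
        "size": 0,
        "aggregations": {
            # Sibling pipeline aggregation: sums uniqueId over the groupby buckets.
            "keycount": {"sum_bucket": {"buckets_path": "groupby>uniqueId"}},
            "groupby": {
                "terms": {"script": script, "size": terms_size},
                "aggregations": {
                    "marketFee": {"sum": {"field": "market_fee"}},
                    # Same script as the terms key, so each bucket yields 1.
                    "uniqueId": {"cardinality": {"script": script}},
                },
            },
        },
    }


key_script = "doc['order_id'].value+'-split-'+doc['merchant_id'].value"
body = build_bucket_count_body(key_script)
print(json.dumps(body, indent=2))
```

Passing the script once avoids the terms key and the cardinality key drifting apart, which would break the assumption that every bucket contributes exactly 1.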
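The key point is that there is an extra aggregation outside the bucket aggregation that counts the buckets. Inside the bucket aggregation, add a sub-aggregation on the same key and deduplicate it with cardinality; since each bucket holds exactly one key, this sub-aggregation always returns 1. Then sum the "uniqueId" values with a second aggregation (sum_bucket), and that sum is the number of buckets. The result looks like this: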
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 90,
"successful": 90,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"keycount": {
"value": 3
},
"groupby": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"doc_count": 2,
"marketFee": {
"value": 4.2
},
"key": "1437002_76-split-123",
"uniqueId": {
"value": 1
}
},
{
"doc_count": 1,
"marketFee": {
"value": 2.1
},
"key": "1437002_77-split-234",
"uniqueId": {
"value": 1
}
},
{
"doc_count": 1,
"marketFee": {
"value": 2.1
},
"key": "123_7759-split-345",
"uniqueId": {
"value": 1
}
}
]
}
}
}
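The value of keycount is the number of buckets, and you can also see that uniqueId is 1 in each bucket. The one drawback is that the size of the terms aggregation must be set large enough. My query sets it to 200 and the result has only 3 buckets, so the bucket count is 3. But if there were more than 200 buckets, only 200 would be returned and the count would be 200. In other words, the count is based on the buckets actually returned, so to count all buckets, size must be at least as large as the total number of buckets the aggregation produces.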
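One way to notice that the count has been capped is to look at sum_other_doc_count, which the terms aggregation returns as the number of documents that fell into buckets not included in the response. The Python sketch below uses a hypothetical helper bucket_count_status. The first input mirrors the result above; the second is invented purely to illustrate truncation.

```python
def bucket_count_status(terms_agg):
    """Return (returned_bucket_count, truncated) for a terms aggregation result."""
    returned = len(terms_agg["buckets"])
    # A non-zero value means some documents belong to buckets that were cut off by size.
    truncated = terms_agg.get("sum_other_doc_count", 0) > 0
    return returned, truncated


# Shape of the result shown above: 3 buckets, nothing left out.
complete = {
    "sum_other_doc_count": 0,
    "buckets": [{"key": "a"}, {"key": "b"}, {"key": "c"}],
}

# Invented example: 200 buckets returned, 57 documents left in other buckets.
capped = {
    "sum_other_doc_count": 57,
    "buckets": [{"key": str(i)} for i in range(200)],
}

complete_status = bucket_count_status(complete)
capped_status = bucket_count_status(capped)
print(complete_status, capped_status)
```

When truncated is true, keycount is only a lower bound, and the query needs a larger size before the count can be trusted.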