In ES 6.3.2, aggregations fall into four families: bucket aggregations, metrics aggregations, pipeline aggregations and matrix aggregations. The sections below walk through several common ones.
Computing the average of a field:
// Query condition: documents whose grade is 高二 or 高三 (Grade 11 / Grade 12)
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
// Script parameters
Map<String, Object> params = new HashMap<>();
params.put("correction", 2);
// Inline painless value script: multiply each value by params.correction
Script script = new Script(ScriptType.INLINE, "painless", "_value * params.correction", params);
// Average aggregation
AvgAggregationBuilder avg = AggregationBuilders
.avg("avg_score")
.field("score")
.missing(0)
.script(script);
The code above is equivalent to the following request:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
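},
"aggs": {
"avg_score": {
"avg": {
"field": "score",
"missing": 0,
"script": {
"lang": "painless",
"source": "_value * params.correction",
"params": {
"correction": 2
}
}
}
}
}
}
The keys in the request above mean:
query: the query condition, which limits the set of documents that are aggregated
aggs: the aggregation definitions
avg_score: the aggregation name; the result is returned under this name
avg: the aggregation type, which computes an average
field: the field to average
missing: the value used for documents that have no value for the field
script: the script
lang: the script language, here painless
source: the script body; _value is the current field value and params.correction reads the correction entry from the parameter map
params: the script parameters; correction is 2, so every value is multiplied by 2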
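To see what missing and the value script do to the number that comes back, here is a plain-Java model of the avg_score aggregation. It is not the Elasticsearch API; the class name is hypothetical, and it assumes each list entry is one matching document's score, with null meaning the field is absent. Note how the document without a score pulls the average down: ignoring it would give (160 + 180 + 140) / 3 = 160.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of the avg_score aggregation; not the Elasticsearch API.
class AvgWithScriptSketch {

    // scores: one entry per matching document; null means the document has no "score"
    static double avg(List<Double> scores, double missing, double correction) {
        if (scores.isEmpty()) {
            return Double.NaN; // the sketch's own choice when nothing matches
        }
        double sum = 0;
        for (Double score : scores) {
            // Present values go through "_value * params.correction";
            // absent values use "missing" (0 here, so scaling it would change nothing)
            sum += (score != null) ? score * correction : missing;
        }
        return sum / scores.size();
    }

    public static void main(String[] args) {
        List<Double> scores = Arrays.asList(80.0, 90.0, null, 70.0);
        System.out.println(avg(scores, 0, 2)); // 120.0 = (160 + 180 + 0 + 140) / 4
    }
}
```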
Parsing the result in Java:
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation aggregation = map.get("avg_score");
InternalAvg internalAvg = (InternalAvg) aggregation;
double avgScore = internalAvg.getValue();
Maximum of a field across a set of documents
Java code:
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
MaxAggregationBuilder max = AggregationBuilders
.max("max_score")
.field("score")
.missing(0);
Equivalent request:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
},
"aggs": {
"max_score": {
"max": {
"field": "score",
"missing": 0
}
}
}
}
Parsing the result in Java:
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation aggregation = map.get("max_score");
InternalMax internalMax = (InternalMax) aggregation;
double maxScore = internalMax.getValue();
Minimum of a field across a set of documents
Java code:
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
MinAggregationBuilder min = AggregationBuilders
.min("min_score")
.field("score")
.missing(0);
Equivalent request:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
},
"aggs": {
"min_score": {
"min": {
"field": "score",
"missing": 0
}
}
}
}
Parsing the result in Java:
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation aggregation = map.get("min_score");
InternalMin internalMin = (InternalMin) aggregation;
double minScore = internalMin.getValue();
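The missing(0) setting matters more for min than for most metrics: a single document without a score makes the minimum 0. The plain-Java sketch below shows the difference. It is not the Elasticsearch API; the class name is hypothetical and null stands for a document that lacks the field.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical plain-Java model of min with and without a missing value.
class MinWithMissingSketch {

    // missing == null: documents without a score are skipped;
    // otherwise they contribute the missing value to the comparison
    static OptionalDouble min(List<Double> scores, Double missing) {
        return scores.stream()
                .map(s -> s != null ? s : missing)
                .filter(v -> v != null)
                .mapToDouble(Double::doubleValue)
                .min(); // empty when there is nothing to compare
    }

    public static void main(String[] args) {
        List<Double> scores = Arrays.asList(80.0, null, 70.0);
        System.out.println(min(scores, null).getAsDouble()); // 70.0
        System.out.println(min(scores, 0.0).getAsDouble());  // 0.0
    }
}
```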
Sum of a field across a set of documents
Java code:
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
SumAggregationBuilder sum = AggregationBuilders
.sum("sum_score")
.field("score")
.missing(0);
Equivalent request:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
},
"aggs": {
"sum_score": {
"sum": {
"field": "score",
"missing": 0
}
}
}
}
Parsing the result in Java:
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation aggregation = map.get("sum_score");
InternalSum internalSum = (InternalSum) aggregation;
double sumScore = internalSum.getValue();
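Counting the values of a field across a set of documents (value_count). Because missing(0) is set, a document without a score is counted as having the value 0, so every matching document contributes at least one value.
Java code: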
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
ValueCountAggregationBuilder count = AggregationBuilders
.count("count_score")
.field("score")
.missing(0);
Equivalent request:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
},
"aggs": {
"count_score": {
"value_count": {
"field": "score",
"missing": 0
}
}
}
}
Parsing the result in Java:
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation aggregation = map.get("count_score");
InternalValueCount internalValueCount = (InternalValueCount) aggregation;
long countValue = internalValueCount.getValue();
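value_count counts values, not documents, so multi-valued fields and the missing setting both change the result. The plain-Java sketch below models that. It is not the Elasticsearch API; the class name is hypothetical, and each inner list is one document's score values, with an empty list meaning the field is absent.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java model of value_count semantics.
class ValueCountSketch {

    // missing == null: documents without the field add nothing;
    // otherwise each such document counts as having exactly one value
    static long valueCount(List<List<Double>> docs, Double missing) {
        long count = 0;
        for (List<Double> values : docs) {
            if (values.isEmpty()) {
                if (missing != null) {
                    count++;
                }
            } else {
                count += values.size(); // a multi-valued field contributes every value
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<Double>> docs = Arrays.asList(
                Arrays.asList(80.0),
                Arrays.asList(90.0, 95.0),
                Collections.<Double>emptyList());
        System.out.println(valueCount(docs, null)); // 3
        System.out.println(valueCount(docs, 0.0));  // 4
    }
}
```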
ES can compute several metrics in one request, for example the average, maximum and minimum together.
Java code:
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
MinAggregationBuilder min = AggregationBuilders
.min("min_score")
.field("score")
.missing(0);
MaxAggregationBuilder max = AggregationBuilders
.max("max_score")
.field("score")
.missing(0);
AvgAggregationBuilder avg = AggregationBuilders
.avg("avg_score")
.field("score")
.missing(0);
SearchResponse response = ES_DAO.getTransportClient()
.prepareSearch("student")
.setTypes("student")
.setSearchType(SearchType.QUERY_THEN_FETCH)
.setQuery(queryBuilder)
.addAggregation(avg)
.addAggregation(max)
.addAggregation(min)
.setExplain(false)
.execute()
.actionGet();
Equivalent request, with several aggregations in one aggs object:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
},
"aggs": {
"max_score": {
"max": {
"field": "score",
"missing": 0
}
},
"min_score": {
"min": {
"field": "score",
"missing": 0
}
},
"avg_score": {
"avg": {
"field": "score",
"missing": 0
}
}
}
}
Parsing the results in Java:
// Maximum
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation maxAgg = map.get("max_score");
InternalMax internalMax = (InternalMax) maxAgg;
double maxValue = internalMax.getValue();
System.out.println(maxValue);
// Minimum
Aggregation minAgg = map.get("min_score");
InternalMin internalMin = (InternalMin) minAgg;
double minValue = internalMin.getValue();
System.out.println(minValue);
// Average
Aggregation avgAgg = map.get("avg_score");
InternalAvg internalAvg = (InternalAvg) avgAgg;
double avgValue = internalAvg.getValue();
System.out.println(avgValue);
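The results of all the single-value metrics aggregations above extend the abstract class InternalNumericMetricsAggregation.SingleValue, for example InternalMax, InternalMin and InternalAvg.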
ES can group documents by one or more fields and count each group (terms aggregation). Java code:
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
// Group by sex: documents without sex go into a "未知" (unknown) bucket; return at most 3 buckets, ordered by doc count descending
TermsAggregationBuilder sexTerm = AggregationBuilders.terms("sex_count").field("sex")
.missing("未知")
.size(3)
.order(BucketOrder.count(false));
// Group by age: return at most 100 buckets, ordered by doc count descending
TermsAggregationBuilder ageTerm = AggregationBuilders.terms("age_count").field("age")
.size(100)
.order(BucketOrder.count(false));
// The age grouping is a sub-aggregation of the sex grouping
sexTerm.subAggregation(ageTerm);
// Run the search with the aggregation
SearchResponse response = ES_DAO.getTransportClient()
.prepareSearch("student")
.setTypes("student")
.setSearchType(SearchType.QUERY_THEN_FETCH)
.setQuery(queryBuilder)
.addAggregation(sexTerm)
.setExplain(false)
.execute()
.actionGet();
// Parse the grouped results (sex is a keyword field -> StringTerms, age is numeric -> LongTerms)
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation sexAgg = map.get("sex_count");
StringTerms stringTerms = (StringTerms) sexAgg;
for (Terms.Bucket sexBucket : stringTerms.getBuckets()) {
String sex = sexBucket.getKeyAsString();
long count = sexBucket.getDocCount();
System.out.println("sex bucket: " + sex + " " + count);
Map<String, Aggregation> subAggMap = sexBucket.getAggregations().getAsMap();
for (Map.Entry<String, Aggregation> entry : subAggMap.entrySet()) {
Aggregation ageAgg = entry.getValue();
LongTerms terms = (LongTerms) ageAgg;
for (Terms.Bucket bucket : terms.getBuckets()) {
Long age = (Long) bucket.getKey();
long ageCount = bucket.getDocCount();
System.out.println("sex/age bucket: " + sex + " " + age + " " + ageCount);
}
}
}
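Note: a terms aggregation returns only 10 buckets by default. When a field has many distinct values, set the number of buckets explicitly with size(n); otherwise every bucket beyond the top 10 is silently left out of the response, which looks like missing data.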
The equivalent Kibana request:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
},
"aggs": {
"sex_count": {
"terms": {
"shard_min_doc_count": 0,
"field": "sex",
"size": 3,
"missing": "未知",
"show_term_doc_count_error": false,
"min_doc_count": 1,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
},
"aggregations": {
"age_count": {
"terms": {
"shard_min_doc_count": 0,
"field": "age",
"size": 100,
"show_term_doc_count_error": false,
"min_doc_count": 1,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
}
}
}
}
}
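To make the size note concrete, here is a single-node, plain-Java sketch of a terms aggregation with size(n): count each value, sort by doc count descending with ties broken by key ascending (as in the generated order above), and keep the first n buckets. It is not the Elasticsearch implementation and ignores the per-shard collection that can make real terms counts approximate; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-node model of terms + size(n); not the Elasticsearch implementation.
class TermsSizeSketch {

    static Map<String, Long> topTerms(List<String> values, int size) {
        // Count documents per value; TreeMap keeps keys in ascending order
        Map<String, Long> counts = new TreeMap<>();
        for (String value : values) {
            counts.merge(value, 1L, Long::sum);
        }
        // Stable sort by count descending, so equal counts keep ascending key order
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        // Keep only the first `size` buckets; the rest are not returned at all
        Map<String, Long> buckets = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries.subList(0, Math.min(size, entries.size()))) {
            buckets.put(e.getKey(), e.getValue());
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "a", "c", "a", "b", "d");
        System.out.println(topTerms(values, 2)); // {a=3, b=2}; c and d are dropped
    }
}
```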
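The date_histogram aggregation is a multi-bucket aggregation similar to the regular histogram, but it can only be applied to date values. Because Elasticsearch stores dates internally as long values, a regular histogram can also be used on dates, but it is less accurate. The main difference between the two APIs is that here the interval can be given as a date/time expression. Time-based data needs this special support because time-based intervals do not always have a fixed length. Java code: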
// Query condition
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
// Group by day of create_time, with buckets ordered by date ascending
DateHistogramAggregationBuilder dateHis = AggregationBuilders
.dateHistogram("date_histogram")
.field("create_time")
.interval(1000 * 60 * 60 * 24)
.minDocCount(1)
.order(BucketOrder.key(true));
// Run the search with the aggregation
SearchResponse response = ES_DAO.getTransportClient()
.prepareSearch("student")
.setTypes("student")
.setSearchType(SearchType.QUERY_THEN_FETCH)
.setQuery(queryBuilder)
.addAggregation(dateHis)
.setExplain(false)
.execute()
.actionGet();
// Parse the date buckets
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation dateAgg = map.get("date_histogram");
InternalDateHistogram dateHistogram = (InternalDateHistogram) dateAgg;
for (InternalDateHistogram.Bucket entry : dateHistogram.getBuckets()) {
DateTime date = (DateTime) entry.getKey();
long count = entry.getDocCount();
System.out.println(date + " " + count);
}
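interval: the bucket interval in milliseconds; 1000 * 60 * 60 * 24 (86400000) means one bucket per day
minDocCount: the minimum number of documents a bucket needs in order to be returned. It defaults to 0, in which case days without data are returned with a count of 0. Keep 0 if you need a continuous series; otherwise set the minimum you need (the example uses 1 to drop empty days).
order: the sort order; BucketOrder.key(true) sorts by key, i.e. by date, ascending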
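The effect of interval and minDocCount can be modeled in a few lines of plain Java. This is a sketch, not the Elasticsearch API: it assumes create_time values are UTC epoch milliseconds, uses offset 0, and the class name is hypothetical. With minDocCount 0, empty days between the first and last non-empty day appear with a count of 0; with 1 they are dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of a fixed daily date histogram (UTC epoch millis, offset 0).
class DayHistogramSketch {
    static final long DAY_MS = 1000L * 60 * 60 * 24; // 86400000, the interval used above

    static Map<Long, Long> histogram(long[] timestamps, long minDocCount) {
        // Round each timestamp down to the start of its day and count per day
        TreeMap<Long, Long> counts = new TreeMap<>();
        for (long ts : timestamps) {
            long key = Math.floorDiv(ts, DAY_MS) * DAY_MS;
            counts.merge(key, 1L, Long::sum);
        }
        // Walk every day from the first to the last bucket in ascending key order,
        // emitting empty days only when minDocCount allows a count of 0
        Map<Long, Long> buckets = new LinkedHashMap<>();
        if (counts.isEmpty()) {
            return buckets;
        }
        for (long key = counts.firstKey(); key <= counts.lastKey(); key += DAY_MS) {
            long count = counts.getOrDefault(key, 0L);
            if (count >= minDocCount) {
                buckets.put(key, count);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] createTimes = {1_000L, 5_000L, 2 * DAY_MS + 10};
        System.out.println(histogram(createTimes, 0)); // {0=2, 86400000=0, 172800000=1}
        System.out.println(histogram(createTimes, 1)); // {0=2, 172800000=1}
    }
}
```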
The Kibana request equivalent to the DateHistogramAggregationBuilder above:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
},
"aggs": {
"date_histogram": {
"date_histogram": {
"field": "create_time",
"offset": 0,
"interval": 86400000,
"keyed": false,
"min_doc_count": 1,
"order": {
"_key": "asc"
}
}
}
}
}
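interval: besides a number of milliseconds, the request also accepts date/time expressions such as minute, hour, day, week, month, quarter and year (or forms such as 1d and 1M). Calendar units such as month and year cannot be written as a fixed number of milliseconds; in Java they are set with dateHistogramInterval(DateHistogramInterval.MONTH) and the other DateHistogramInterval constants.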
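Why a month cannot be expressed as a fixed millisecond interval is easy to check with java.time: calendar-month buckets start on the 1st and have different widths. This plain-Java sketch is not the Elasticsearch implementation; it assumes UTC (so there are no DST shifts) and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.ZoneOffset;

// Hypothetical plain-Java model of calendar-month bucketing in UTC.
class CalendarMonthSketch {
    static final long DAY_MS = 86_400_000L;

    // Key of the month bucket containing epochMillis: midnight UTC on the 1st of that month
    static long monthKey(long epochMillis) {
        LocalDate date = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
        return date.withDayOfMonth(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Width of a month bucket in milliseconds; it depends on the month
    static long monthLengthMs(int year, int month) {
        return YearMonth.of(year, month).lengthOfMonth() * DAY_MS;
    }

    public static void main(String[] args) {
        System.out.println(monthLengthMs(2018, 2)); // 2419200000 (28 days)
        System.out.println(monthLengthMs(2018, 3)); // 2678400000 (31 days)
    }
}
```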
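The histogram aggregation can be applied to numeric values extracted from documents. It dynamically builds fixed-size buckets (the size is also called the interval) over those values. For example, if documents have a field holding a price (a number), this aggregation can be configured to build buckets with interval 5 (for a price, that could represent $5). When the aggregation runs, each document's price field is evaluated and rounded down to its closest bucket: if the price is 32 and the bucket size is 5, the rounded value is 30, so the document "falls into" the bucket with key 30. The rounding function is: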
bucket_key = Math.floor((value - offset) / interval) * interval + offset
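The rounding function above translates directly into a Java helper, which makes it easy to check which bucket a value lands in. The class name is hypothetical; the formula is the one quoted above.

```java
// The histogram rounding function written as a plain-Java helper (hypothetical class name).
class HistogramKeySketch {

    // bucket_key = Math.floor((value - offset) / interval) * interval + offset
    static double bucketKey(double value, double interval, double offset) {
        return Math.floor((value - offset) / interval) * interval + offset;
    }

    public static void main(String[] args) {
        System.out.println(bucketKey(32, 5, 0));     // 30.0: price 32 falls into bucket 30
        System.out.println(bucketKey(87.3, 0.5, 0)); // 87.0: score buckets of width 0.5
        System.out.println(bucketKey(-1.2, 5, 0));   // -5.0: floor rounds negatives down, not toward zero
    }
}
```

Using Math.floor rather than a cast to long is what keeps negative values in the bucket below them.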
Java code:
// Query condition
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
// Histogram aggregation: score buckets of width 0.5, ordered by key (score) descending
HistogramAggregationBuilder histogram = AggregationBuilders
.histogram("score_histogram")
.field("score")
.interval(0.5)
.order(BucketOrder.key(false))
.minDocCount(0);
// Run the search with the aggregation
SearchResponse response = ES_DAO.getTransportClient()
.prepareSearch("student")
.setTypes("student")
.setSearchType(SearchType.QUERY_THEN_FETCH)
.setQuery(queryBuilder)
.addAggregation(histogram)
.setExplain(false)
.execute()
.actionGet();
// Parse the histogram buckets
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation scoreHistogram = map.get("score_histogram");
InternalHistogram internalHistogram = (InternalHistogram) scoreHistogram;
for (InternalHistogram.Bucket entry : internalHistogram.getBuckets()) {
double key = (Double) entry.getKey();
long count = entry.getDocCount();
System.out.println(key + " " + count);
}
The equivalent Kibana request:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
},
"size": 1,
"aggs": {
"score_histogram": {
"histogram": {
"field": "score",
"offset": 0.0,
"interval": 0.5,
"keyed": false,
"min_doc_count": 0,
"order": {
"_key": "desc"
}
}
}
}
}
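The missing aggregation is a field-data-based single-bucket aggregation. It creates one bucket containing every document in the current document set that lacks a value for the field (effectively, documents missing the field or having the configured null value set). It is typically combined with other field-based bucket aggregations, such as range, to report on the documents that could not be placed in any other bucket because they have no value for the field. In short, it counts the documents whose value for a given field is empty.
Java code: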
// Query condition
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
.filter(QueryBuilders.termsQuery("grade", "高二", "高三"));
// Collect the documents that have no value for sex
MissingAggregationBuilder missingAggregationBuilder = AggregationBuilders
.missing("sex_miss")
.field("sex");
// Run the search with the aggregation
SearchResponse response = ES_DAO.getTransportClient()
.prepareSearch("student")
.setTypes("student")
.setSearchType(SearchType.QUERY_THEN_FETCH)
.setQuery(queryBuilder)
.addAggregation(missingAggregationBuilder)
.setExplain(false)
.execute()
.actionGet();
// Parse the missing bucket
Map<String, Aggregation> map = response.getAggregations().getAsMap();
Aggregation aggregation = map.get("sex_miss");
InternalMissing internalMissing = (InternalMissing) aggregation;
String name = internalMissing.getName();
long count = internalMissing.getDocCount();
System.out.println(name + " " + count);
The equivalent Kibana request:
{
"query": {
"bool": {
"adjust_pure_negative": true,
"filter": [{
"terms": {
"boost": 1.0,
"grade": ["高三", "高二"]
}
}],
"boost": 1.0
}
},
"size": 0,
"aggs": {
"sex_miss": {
"missing": {
"field": "sex"
}
}
}
}
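The sex_miss bucket's doc count can be modeled in plain Java as "documents whose field is absent or null". This is a sketch, not the Elasticsearch API: it represents each document as a Map, assumes a JSON null is treated like an absent field (no null_value configured in the mapping), and the class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the missing bucket's doc count.
class MissingBucketSketch {

    // Counts documents with no value for `field`: key absent or mapped to null
    static long missingCount(List<Map<String, Object>> docs, String field) {
        long count = 0;
        for (Map<String, Object> doc : docs) {
            if (doc.get(field) == null) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper that builds a one-field document
    static Map<String, Object> doc(String field, Object value) {
        Map<String, Object> d = new HashMap<>();
        d.put(field, value);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = Arrays.asList(
                doc("sex", "M"),
                doc("sex", null),
                Collections.<String, Object>emptyMap());
        System.out.println(missingCount(docs, "sex")); // 2
    }
}
```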