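Aggregation analysis is an important database feature: it computes aggregate values over the data set returned by a query, such as the maximum or minimum of a field (or of a computed expression), a sum, or an average. ES, which serves as both a search engine and a data store, provides powerful aggregation capabilities as well.
Aggregations that compute values such as the maximum, minimum, sum, or average over a data set are called metric aggregations in ES. Relational databases, besides aggregate functions, can also group the queried rows with group by and then compute metrics on each group. In ES, group by corresponds to bucketing, and this kind of aggregation is called a bucket aggregation.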
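The difference between the two kinds of aggregation can be illustrated without a cluster. The following is only an in-memory analogy written with Java streams, not the ES API; the class MetricVsBucket, the Doc type, and the sample values are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class MetricVsBucket {
    /** Hypothetical document with a name and a score. */
    static final class Doc {
        final String name;
        final double score;

        Doc(String name, double score) {
            this.name = name;
            this.score = score;
        }
    }

    /** Metric aggregation: one set of numbers (count, sum, avg, max, min) for the whole data set. */
    static DoubleSummaryStatistics metrics(List<Doc> docs) {
        return docs.stream().collect(Collectors.summarizingDouble(d -> d.score));
    }

    /** Bucket aggregation: documents are split into buckets by name; each bucket reports its document count. */
    static Map<String, Long> buckets(List<Doc> docs) {
        return docs.stream()
                .collect(Collectors.groupingBy(d -> d.name, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(new Doc("a", 60), new Doc("b", 80), new Doc("a", 70));
        System.out.println(metrics(docs));
        System.out.println(buckets(docs));
    }
}
```

A metric aggregation collapses the data set into numbers, while a bucket aggregation only partitions it; metrics become per-group values once they are computed inside each bucket.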
Aggregation reference
With TransportClient, the aggregation part of a search request is built with AggregationBuilder objects.
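// Imports (package names assume the ES 5.x Java API)
import java.util.concurrent.ExecutionException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Aggregation handling; client (a TransportClient), index, and type are defined elsewhere
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// Count of values
AbstractAggregationBuilder valueCountAggregationBuilder = AggregationBuilders.count("count").field("name");
// Sum, average, maximum, minimum
AbstractAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("sum").field("score");
AbstractAggregationBuilder avgAggregationBuilder = AggregationBuilders.avg("avg").field("score");
AbstractAggregationBuilder maxAggregationBuilder = AggregationBuilders.max("max").field("score");
AbstractAggregationBuilder minAggregationBuilder = AggregationBuilders.min("min").field("score");
// Register all five metrics as top-level aggregations
sourceBuilder.aggregation(valueCountAggregationBuilder).aggregation(sumAggregationBuilder).aggregation(avgAggregationBuilder)
        .aggregation(maxAggregationBuilder).aggregation(minAggregationBuilder);
try {
    // Build the search request for the index
    SearchRequest searchRequest = new SearchRequest(index);
    searchRequest.types(type);
    searchRequest.source(sourceBuilder);
    // Execute the search and block until the response arrives
    SearchResponse response = client.search(searchRequest).get();
    System.out.println(response);
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}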
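Grouping relies on a concept called sub-aggregation: after a first aggregation has grouped the documents, the groups form an intermediate table, and a second aggregation runs on each group. For example, group by name and compute each person's maximum, minimum, total, and average score.
Grouping uses a TermsAggregationBuilder object. For example, to build a grouping: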
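// Terms.Order.aggregation(path, asc) needs the name of a sub-aggregation; to sort buckets by their own key, use Terms.Order.term
TermsAggregationBuilder aggregation = AggregationBuilders.terms("name").field("name").order(Terms.Order.term(true));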
The order() method sorts the buckets, and the size() method controls how many buckets are returned (10 by default).
aggregation.size(10);
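To make the effect of order and size concrete, here is a rough in-memory sketch of "sort the buckets by key, then keep the first size of them". It is an analogy written with Java collections only; the class BucketOrderAndSize and the method topBuckets are hypothetical, and a real cluster computes terms buckets per shard before merging them, so its results can differ from this simplified picture.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class BucketOrderAndSize {
    /** Returns at most {@code size} buckets (key to document count), sorted by key ascending. */
    static LinkedHashMap<String, Long> topBuckets(List<String> names, int size) {
        // Bucket the values and count the members of each bucket
        Map<String, Long> counts = names.stream()
                .collect(Collectors.groupingBy(n -> n, Collectors.counting()));
        // Sort by bucket key, then cut the list off after `size` buckets
        LinkedHashMap<String, Long> result = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .limit(size)
                .forEach(e -> result.put(e.getKey(), e.getValue()));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topBuckets(Arrays.asList("c", "a", "b", "a"), 2));
    }
}
```

Buckets beyond size are dropped, so a grouping with many distinct values needs a larger size to show all of them.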
Aggregating after grouping:
Equivalent SQL:
select sum(score), avg(score),count(name),max(score), min(score) from table group by name
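Pay attention to key points 1, 2, and 3 in the code: if the aggregations are not passed in this way, the result is a different one.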
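// Imports (package names assume the ES 5.x Java API)
import java.util.concurrent.ExecutionException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Aggregation handling; client (a TransportClient), index, and type are defined elsewhere
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// Count of values
AbstractAggregationBuilder valueCountAggregationBuilder = AggregationBuilders.count("count").field("name");
// Sum, average, maximum, minimum
AbstractAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("sum").field("score");
AbstractAggregationBuilder avgAggregationBuilder = AggregationBuilders.avg("avg").field("score");
AbstractAggregationBuilder maxAggregationBuilder = AggregationBuilders.max("max").field("score");
AbstractAggregationBuilder minAggregationBuilder = AggregationBuilders.min("min").field("score");
// The metrics are no longer added at the top level; they become sub-aggregations below
// sourceBuilder.aggregation(valueCountAggregationBuilder).aggregation(sumAggregationBuilder).aggregation(avgAggregationBuilder)
// .aggregation(maxAggregationBuilder).aggregation(minAggregationBuilder);
// Key point 1: grouping (buckets sorted by key, ascending)
TermsAggregationBuilder aggregation = AggregationBuilders.terms("name").field("name").order(Terms.Order.term(true));
// Key point 2: sub-aggregations, computed inside each bucket
aggregation.subAggregation(valueCountAggregationBuilder).subAggregation(sumAggregationBuilder).subAggregation(avgAggregationBuilder)
        .subAggregation(maxAggregationBuilder).subAggregation(minAggregationBuilder);
// Key point 3: add only the terms aggregation to sourceBuilder
sourceBuilder.aggregation(aggregation);
try {
    // Build the search request for the index
    SearchRequest searchRequest = new SearchRequest(index);
    searchRequest.types(type);
    searchRequest.source(sourceBuilder);
    SearchResponse response = client.search(searchRequest).get();
    System.out.println(response);
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}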
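What the sub-aggregation computes can be pictured as "group first, then compute the metrics inside each group". The sketch below reproduces that with Java streams on a few in-memory rows; it is an analogy rather than the ES API, and the class GroupedStats, the Doc type, and the sample rows are hypothetical.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class GroupedStats {
    /** Hypothetical document with a name and a score. */
    static final class Doc {
        final String name;
        final double score;

        Doc(String name, double score) {
            this.name = name;
            this.score = score;
        }
    }

    /** Groups by name (the bucket), then computes count, sum, avg, max, min per bucket (the sub-aggregations). */
    static Map<String, DoubleSummaryStatistics> statsByName(List<Doc> docs) {
        return docs.stream().collect(
                Collectors.groupingBy(d -> d.name, TreeMap::new, Collectors.summarizingDouble(d -> d.score)));
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("Xiao Ming", 60), new Doc("Xiao Hong", 80), new Doc("Xiao Ming", 70),
                new Doc("Xiao Hong", 90), new Doc("Xiao Ming", 80));
        statsByName(docs).forEach((name, stats) -> System.out.println(name + " -> " + stats));
    }
}
```

If the metrics were computed on the whole list instead of inside groupingBy, there would be a single result for all rows, which is the "different result" that attaching them at the top level produces.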
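Grouping on several fields uses one more object, Script; for details, see the scripting section of the official ES documentation.
A terms aggregation groups on one field at a time, so grouping on several fields is done here with a hand-written script. The idea:
1. Take the values of the grouping fields and join them with a special separator into a single string field,
for example by combining the values of name, age, and sex.
id | name      | age | sex    | score |
1  | Xiao Ming | 3   | male   | 60    |
2  | Xiao Hong | 2   | female | 80    |
1  | Xiao Ming | 3   | male   | 70    |
2  | Xiao Hong | 2   | female | 90    |
1  | Xiao Ming | 3   | male   | 80    |
After aggregating on the combined value:
id | name            | sum | avg | max | min |
1  | Xiao Ming3male  | 210 | 70  | 80  | 60  |
2  | Xiao Hong2female | 170 | 85  | 90  | 80  |
2. Run the ES single-field aggregation, then split the combined name again with name.split("") and process the resulting list. This roundabout route is what implements multi-field grouping.
id | name      | age | sex    | sum | avg | max | min |
1  | Xiao Ming | 3   | male   | 210 | 70  | 80  | 60  |
2  | Xiao Hong | 2   | female | 170 | 85  | 90  | 80  |
SQL equivalent of what the single-field grouping approach achieves:
select sum(score), avg(score),max(score), min(score) from table group by name, age , sex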
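The join-then-split idea can be tried on the sample rows above without a cluster. The following is a minimal in-memory sketch using Java streams, not the ES API; the class CompositeKeyGrouping, the Row type, and in particular the separator "|" are assumptions made for illustration (the sketch assumes the separator never occurs in the field values).

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class CompositeKeyGrouping {
    /** Hypothetical separator; it must not occur in any of the grouped values. */
    static final String SEPARATOR = "|";

    /** Hypothetical row matching the columns of the sample table. */
    static final class Row {
        final String name;
        final int age;
        final String sex;
        final double score;

        Row(String name, int age, String sex, double score) {
            this.name = name;
            this.age = age;
            this.sex = sex;
            this.score = score;
        }
    }

    /** Step 1: join the grouping fields into one string key. */
    static String compositeKey(Row r) {
        return r.name + SEPARATOR + r.age + SEPARATOR + r.sex;
    }

    /** Single-field grouping on the combined key, with the metrics computed per group. */
    static Map<String, DoubleSummaryStatistics> statsByNameAgeSex(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                CompositeKeyGrouping::compositeKey, TreeMap::new, Collectors.summarizingDouble(r -> r.score)));
    }

    /** Step 2: split the key back into its fields; the separator is quoted because split takes a regex. */
    static String[] splitKey(String key) {
        return key.split(Pattern.quote(SEPARATOR), -1);
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("Xiao Ming", 3, "male", 60), new Row("Xiao Hong", 2, "female", 80),
                new Row("Xiao Ming", 3, "male", 70), new Row("Xiao Hong", 2, "female", 90),
                new Row("Xiao Ming", 3, "male", 80));
        statsByNameAgeSex(rows).forEach((key, stats) ->
                System.out.println(Arrays.toString(splitKey(key)) + " -> " + stats));
    }
}
```

The split only recovers the original fields reliably when the separator is non-empty and absent from the data, which is why the choice of separator matters.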
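// Imports (package names assume the ES 5.x Java API)
import java.util.HashMap;
import java.util.concurrent.ExecutionException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Aggregation handling; client (a TransportClient), index, and type are defined elsewhere
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// Sum, average, maximum, minimum
AbstractAggregationBuilder sumAggregationBuilder = AggregationBuilders.sum("sum").field("score");
AbstractAggregationBuilder avgAggregationBuilder = AggregationBuilders.avg("avg").field("score");
AbstractAggregationBuilder maxAggregationBuilder = AggregationBuilders.max("max").field("score");
AbstractAggregationBuilder minAggregationBuilder = AggregationBuilders.min("min").field("score");
// The metrics are not added at the top level; they become sub-aggregations below
// sourceBuilder.aggregation(valueCountAggregationBuilder).aggregation(sumAggregationBuilder).aggregation(avgAggregationBuilder)
// .aggregation(maxAggregationBuilder).aggregation(minAggregationBuilder);
// Separator placed between the field values so the combined key can be split again later;
// a value that never occurs in the data keeps the split unambiguous
String SEPARATOR = "";
// Key point 1: grouping (buckets sorted by key, ascending)
TermsAggregationBuilder aggregation = AggregationBuilders.terms("name").field("name").order(Terms.Order.term(true));
// Script that joins name, age, and sex; each separator is emitted as a quoted string literal
String scriptStr = "doc['name'].value + '" + SEPARATOR + "' + doc['age'].value + '" + SEPARATOR + "' + doc['sex'].value";
Script script = new Script(ScriptType.INLINE, Script.DEFAULT_SCRIPT_LANG, scriptStr, new HashMap<>());
// Key point 2: set the script on the terms aggregation and attach the sub-aggregations
aggregation.script(script).subAggregation(sumAggregationBuilder).subAggregation(avgAggregationBuilder)
        .subAggregation(maxAggregationBuilder).subAggregation(minAggregationBuilder);
// Key point 3: add the terms aggregation to sourceBuilder
sourceBuilder.aggregation(aggregation);
try {
    // Build the search request for the index
    SearchRequest searchRequest = new SearchRequest(index);
    searchRequest.types(type);
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest).get();
    System.out.println(searchResponse);
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}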
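Concatenating the script text by hand is easy to get wrong, since every separator has to be wrapped in its own pair of quotes and plus signs. One possible way to avoid that is a small helper that generates the expression for any number of fields. The class KeyScriptBuilder and the method buildKeyScript are hypothetical, the "|" in the usage is an assumed separator, and the sketch does no escaping, so it assumes that neither the separator nor the field names contain a single quote or a backslash.

```java
final class KeyScriptBuilder {
    /**
     * Builds a script expression of the form
     * doc['f1'].value + 'SEP' + doc['f2'].value + ...
     * No escaping is performed on the separator or the field names.
     */
    static String buildKeyScript(String separator, String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                // Emit the separator as a quoted string literal between two field accesses
                sb.append(" + '").append(separator).append("' + ");
            }
            sb.append("doc['").append(fields[i]).append("'].value");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildKeyScript("|", "name", "age", "sex"));
    }
}
```

The returned string would take the place of scriptStr when the Script object is created; the rest of the request stays the same.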