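1. Implement deduplicated queries with pagination
2. Deduplicate by aifile.oid and sort by create_time
Approach 1: a terms aggregation with a top_hits sub-aggregation. DSL source: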
GET /aipage/_search
{
  "query": {
    "match": {
      "status": 0
    }
  },
  "sort": [
    {
      "create_time": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "target_oid": {
      "terms": {
        "field": "aifile.oid",
        "size": 10 // number of distinct values (buckets) returned after deduplication
      },
      "aggs": {
        "rated": {
          "top_hits": {
            "sort": [{
              "create_time": {"order": "desc"}
            }],
            "size": 10
          }
        }
      }
    }
  },
  "size": 0,
  "from": 0
}
The size parameter of the terms node sets how many terms (buckets) are returned; the default is 10.
Result:
......
"aggregations": {
"target_oid": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 105410,
"buckets": [
{
"key": "3ABE5618-37D3-447E-91B3-EDA1F8AE4156",
"doc_count": 745,
"rated": {
......
}
},
{
"key": "980AD126-BAD7-45F0-9037-ACEE21DEFCCC",
"doc_count": 624,
"rated": {
......
}
}
]
}
}
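As you can see, the deduplicated results come back as buckets (two are shown here, which is what size=2 would return). Each key is an aifile.oid value, and the rated list holds the original documents for that key.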
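To make the bucket logic concrete, here is a minimal in-memory sketch of what the terms + top_hits combination computes. It is plain Java with no Elasticsearch client; the Doc class, the sample data and the method names are assumptions made for illustration, and real Elasticsearch computes this per shard and then merges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TermsTopHitsSketch {
    // Hypothetical document: only the two fields the query above relies on.
    static final class Doc {
        final String oid;
        final long createTime;
        Doc(String oid, long createTime) { this.oid = oid; this.createTime = createTime; }
    }

    // Groups docs by oid (the terms buckets), orders buckets by doc count descending,
    // keeps the first termsSize buckets, and inside each bucket keeps the newest
    // topHitsSize docs (top_hits sorted by create_time desc).
    static Map<String, List<Doc>> dedup(List<Doc> docs, int termsSize, int topHitsSize) {
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        for (Doc d : docs) {
            groups.computeIfAbsent(d.oid, k -> new ArrayList<>()).add(d);
        }
        List<Map.Entry<String, List<Doc>>> buckets = new ArrayList<>(groups.entrySet());
        // The sort is stable, so buckets with equal counts keep first-seen order.
        buckets.sort((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()));

        Map<String, List<Doc>> result = new LinkedHashMap<>();
        int limit = Math.min(termsSize, buckets.size());
        for (Map.Entry<String, List<Doc>> e : buckets.subList(0, limit)) {
            List<Doc> hits = new ArrayList<>(e.getValue());
            hits.sort(Comparator.comparingLong((Doc d) -> d.createTime).reversed());
            result.put(e.getKey(), hits.subList(0, Math.min(topHitsSize, hits.size())));
        }
        return result;
    }

    // Sample data: three docs for A, two for B, one for C.
    static Map<String, List<Doc>> demo() {
        List<Doc> docs = Arrays.asList(
            new Doc("A", 100L), new Doc("B", 300L), new Doc("A", 200L),
            new Doc("C", 50L), new Doc("A", 150L), new Doc("B", 250L));
        return dedup(docs, 2, 1);
    }
}
```

With termsSize = 2 and topHitsSize = 1, demo() keeps only the two most frequent oid values and the newest document of each, which is the "one row per key" shape the query is after.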
Sorting with order
order specifies how the returned buckets are sorted; the default is by doc_count, descending.
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "gender",
"order" : { "_count" : "asc" }
}
}
}
}
Buckets can also be sorted alphabetically by term, using _key (called _term before Elasticsearch 6.0):
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "gender",
"order" : { "_key" : "asc" }
}
}
}
}
You can also have order refer to a single-value metric sub-aggregation and sort by it.
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "gender",
"order" : { "avg_height" : "desc" }
},
"aggs" : {
"avg_height" : { "avg" : { "field" : "height" } }
}
}
}
}
Multi-value metric aggregations are supported as well, but you have to name the value to sort on:
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "gender",
"order" : { "height_stats.avg" : "desc" }
},
"aggs" : {
"height_stats" : { "stats" : { "field" : "height" } }
}
}
}
}
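The ordering options above boil down to choosing a comparator over the buckets. The sketch below shows that idea in plain Java; the Bucket class and the sample numbers are made up for illustration and are not part of any Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BucketOrderSketch {
    // Hypothetical bucket: term, doc count, and one metric value (avg_height).
    static final class Bucket {
        final String key;
        final long docCount;
        final double avgHeight;
        Bucket(String key, long docCount, double avgHeight) {
            this.key = key;
            this.docCount = docCount;
            this.avgHeight = avgHeight;
        }
    }

    // Corresponds to ordering by document count, ascending.
    static final Comparator<Bucket> COUNT_ASC = (a, b) -> Long.compare(a.docCount, b.docCount);
    // Corresponds to ordering alphabetically by term, ascending.
    static final Comparator<Bucket> KEY_ASC = (a, b) -> a.key.compareTo(b.key);
    // Corresponds to ordering by the avg_height metric, descending.
    static final Comparator<Bucket> AVG_HEIGHT_DESC = (a, b) -> Double.compare(b.avgHeight, a.avgHeight);

    // Returns the bucket keys in the order produced by the given comparator.
    static List<String> sortedKeys(List<Bucket> buckets, Comparator<Bucket> order) {
        List<Bucket> copy = new ArrayList<>(buckets);
        copy.sort(order);
        List<String> keys = new ArrayList<>();
        for (Bucket b : copy) {
            keys.add(b.key);
        }
        return keys;
    }

    static List<Bucket> sample() {
        return Arrays.asList(
            new Bucket("female", 3, 165.0),
            new Bucket("male", 5, 178.0),
            new Bucket("unknown", 1, 170.0));
    }
}
```

For the sample buckets, COUNT_ASC yields unknown, female, male, while AVG_HEIGHT_DESC yields male, unknown, female.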
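min_doc_count and shard_min_doc_count
The aggregated field may contain many low-frequency terms. If they make up a large share of all terms, they cause a lot of unnecessary computation.
Setting min_doc_count and shard_min_doc_count defines a minimum document count, so that only terms matching at least that many documents are recorded and returned.
As the names suggest:
min_doc_count: filters the final, merged result
shard_min_doc_count: filters what each shard computes and returns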
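The difference between the two thresholds is easiest to see with numbers. The following is a simplified model, assuming each shard just reports a term-to-count map and the coordinating node sums them; it ignores shard_size and everything else real Elasticsearch does, and all names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MinDocCountSketch {
    // shardMinDocCount is applied to each shard's local counts before merging;
    // minDocCount is applied to the merged counts afterwards.
    static Map<String, Long> aggregate(List<Map<String, Long>> shards,
                                       long shardMinDocCount, long minDocCount) {
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> shard : shards) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                if (e.getValue() >= shardMinDocCount) {
                    merged.merge(e.getKey(), e.getValue(), Long::sum);
                }
            }
        }
        merged.values().removeIf(count -> count < minDocCount);
        return merged;
    }

    // Builds one shard's local counts for the terms a, b and c.
    static Map<String, Long> shard(long a, long b, long c) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("a", a);
        counts.put("b", b);
        counts.put("c", c);
        return counts;
    }

    // Two shards: globally a=9, b=2, c=3.
    static List<Map<String, Long>> sampleShards() {
        return Arrays.asList(shard(5, 1, 2), shard(4, 1, 1));
    }
}
```

With shardMinDocCount = 0 and minDocCount = 2 all three terms survive. Raising shardMinDocCount to 2 drops b entirely (it never reaches 2 on any single shard, even though its global count is 2) and undercounts c as 2, which is the trade-off for doing less work per shard.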
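A terms aggregation normally works on a single field. It has to put the terms into a hash table, so aggregating over several fields at once would cost on the order of n^2 memory.
For multiple fields, ES nevertheless offers two options:
1. Use a script to combine the fields.
2. Use copy_to to copy both fields into a new field, then run an ordinary single-field aggregation on the new field.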
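As a rough illustration of the first option, the sketch below joins two field values into one combined key and counts the combined keys, which is the effect a field-combining script has on a terms aggregation. The separator, the two-element array layout and the sample values are assumptions for this example only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinedFieldSketch {
    // Each doc is a two-element array {fieldA, fieldB}; the combined key
    // "fieldA|fieldB" plays the role of the script's return value.
    static Map<String, Long> countCombined(List<String[]> docs) {
        Map<String, Long> counts = new TreeMap<>();
        for (String[] doc : docs) {
            counts.merge(doc[0] + "|" + doc[1], 1L, Long::sum);
        }
        return counts;
    }

    static List<String[]> sample() {
        return Arrays.asList(
            new String[] {"US", "web"},
            new String[] {"US", "app"},
            new String[] {"US", "web"},
            new String[] {"DE", "web"});
    }
}
```

The result has one bucket per distinct pair, so the number of buckets grows with the number of value combinations rather than with the number of values in either field.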
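Sub-aggregations can be computed in one of two ways, depth-first or breadth-first.
ES uses depth-first by default, but you can switch to breadth-first manually, for example: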
{
"aggs" : {
"actors" : {
"terms" : {
"field" : "actors",
"size" : 10,
"collect_mode" : "breadth_first"
},
"aggs" : {
"costars" : {
"terms" : {
"field" : "actors",
"size" : 5
}
}
}
}
}
}
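A toy model can show why breadth-first helps here. It assumes that depth-first builds the costar buckets for every actor before pruning to the top size, while breadth-first prunes first and only then builds costar buckets for the survivors. The class, the counter and the sample films are hypothetical, and the actual Elasticsearch collectors are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CollectModeSketch {
    static final class Result {
        final Map<String, Map<String, Integer>> top; // actor -> costar -> count
        final int subBucketsBuilt;                   // costar buckets that were materialized
        Result(Map<String, Map<String, Integer>> top, int subBucketsBuilt) {
            this.top = top;
            this.subBucketsBuilt = subBucketsBuilt;
        }
    }

    static Result run(List<List<String>> films, int size, boolean breadthFirst) {
        // Top-level terms: how many films each actor appears in.
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> cast : films) {
            for (String actor : cast) {
                counts.merge(actor, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((x, y) -> Integer.compare(counts.get(y), counts.get(x)));
        List<String> top = ranked.subList(0, Math.min(size, ranked.size()));

        // Breadth-first prunes before building sub-buckets; depth-first builds them for everyone.
        Collection<String> candidates = breadthFirst ? top : counts.keySet();
        Map<String, Map<String, Integer>> costars = new HashMap<>();
        int built = 0;
        for (List<String> cast : films) {
            for (String actor : cast) {
                if (!candidates.contains(actor)) {
                    continue;
                }
                Map<String, Integer> sub = costars.computeIfAbsent(actor, k -> new TreeMap<>());
                for (String costar : cast) {
                    if (sub.merge(costar, 1, Integer::sum) == 1) {
                        built++; // first time this (actor, costar) bucket is created
                    }
                }
            }
        }
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (String actor : top) {
            result.put(actor, costars.get(actor));
        }
        return new Result(result, built);
    }

    static List<List<String>> sample() {
        return Arrays.asList(
            Arrays.asList("A", "B"), Arrays.asList("A", "C"),
            Arrays.asList("A", "D"), Arrays.asList("B", "C"));
    }
}
```

Both modes return the same top actors and costars for the sample; only the number of sub-buckets held along the way differs, which is where the memory saving comes from when the field has many distinct values.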
The missing parameter specifies how documents that lack the field are handled:
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tags",
"missing": "N/A"
}
}
}
}
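In effect, missing supplies a stand-in term for documents that have no value, which would otherwise be left out of the buckets. A small in-memory sketch of that behavior follows, with null standing for "no value in this document"; the method name and the sample values are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MissingValueSketch {
    // Counts terms; a null value means the document has no value for the field.
    // If missing is null, such documents are skipped; otherwise they are
    // counted under the missing term.
    static Map<String, Long> termCounts(List<String> values, String missing) {
        Map<String, Long> counts = new TreeMap<>();
        for (String value : values) {
            String key = (value != null) ? value : missing;
            if (key == null) {
                continue;
            }
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    static List<String> sample() {
        return Arrays.asList("red", null, "blue", null, "red");
    }
}
```

Calling termCounts(sample(), "N/A") adds an N/A bucket with count 2, while termCounts(sample(), null) returns only the red and blue buckets.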
Approach 2: field collapsing with collapse
GET /aipage/_search
{
  "query": {
    "match": {
      "status": 0
    }
  },
  "sort": [
    {
      "create_time": {
        "order": "desc"
      }
    }
  ],
  "collapse": {
    "field": "aifile.oid"
  },
  "size": 2,
  "from": 0
}
Note: size is 2 here rather than 0; with size 0 no hits would be returned.
Response:
{
......
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": [
{
......
"aifile": {
"oid": "C545E593-4D24-43C1-929F-49B45BE575A4",
......
},
......
},
"fields": {
"aifile.oid": [
"C545E593-4D24-43C1-929F-49B45BE575A4"
]
},
"sort": [
1569185780000
]
},
{
......
"aifile": {
"oid": "F1D1DCD0-37EA-4A33-8A8D-A4E94AE61CBC",
......
},
......
},
"fields": {
"aifile.oid": [
"F1D1DCD0-37EA-4A33-8A8D-A4E94AE61CBC"
]
},
"sort": [
1569184367000
]
}
]
}
}
Compared with approach 1, approach 2:
is simpler;
performs much better.
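Conceptually, collapse keeps the first hit per key in sort order, and from/size then page over those collapsed hits. The plain-Java sketch below models that on an in-memory list; the Doc class and the sample data are hypothetical, and it says nothing about how Elasticsearch executes the request across shards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollapseSketch {
    // Hypothetical document with the collapse key and the sort field.
    static final class Doc {
        final String oid;
        final long createTime;
        Doc(String oid, long createTime) { this.oid = oid; this.createTime = createTime; }
    }

    // Sorts by create_time descending, keeps the first doc seen for each oid,
    // then returns the slice [from, from + size) of the collapsed list.
    static List<Doc> collapsePage(List<Doc> docs, int from, int size) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingLong((Doc d) -> d.createTime).reversed());

        Set<String> seen = new HashSet<>();
        List<Doc> collapsed = new ArrayList<>();
        for (Doc d : sorted) {
            if (seen.add(d.oid)) { // add() returns true only for the first doc of an oid
                collapsed.add(d);
            }
        }
        int start = Math.min(from, collapsed.size());
        int end = Math.min(start + size, collapsed.size());
        return new ArrayList<>(collapsed.subList(start, end));
    }

    static List<Doc> sample() {
        return Arrays.asList(
            new Doc("A", 100L), new Doc("B", 300L), new Doc("A", 200L),
            new Doc("C", 50L), new Doc("A", 150L), new Doc("B", 250L));
    }
}
```

For the sample, collapsePage(sample(), 0, 2) returns the newest B and A documents, and collapsePage(sample(), 2, 2) returns the single remaining C document as the second page.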
Java implementation
1) Count the distinct values
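// Imports assume the Elasticsearch 2.x transport client (Settings.settingsBuilder(), TransportClient.builder()).
import java.net.InetSocketAddress;
import java.util.List;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityBuilder;

public class EsTest {
    public static void main(String[] args) {
        Settings settings = Settings.settingsBuilder()
                .put("cluster.name", "elasticsearch") // cluster name
                .put("client.transport.ignore_cluster_name", true) // skip cluster-name validation, so the client connects even if the name is wrong
                .build();
        TransportClient client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress("101.10.32.1", 9300)));
        // The cardinality aggregation returns an approximate count of distinct uid values.
        CardinalityBuilder cardinalityBuilder = AggregationBuilders.cardinality("uid_aggs").field("uid");
        SearchRequestBuilder request = client.prepareSearch("user_onoffline_log")
                .setTypes("logs")
                .setSearchType(SearchType.QUERY_THEN_FETCH)
                .setQuery(QueryBuilders.boolQuery()
                        .must(QueryBuilders.termQuery("uid", "")))
                .addAggregation(cardinalityBuilder)
                .setSize(1);
        SearchResponse response = request.execute().actionGet();
        // Typed list, so the loop variable below compiles without a cast.
        List<Aggregation> aggregationList = response.getAggregations().asList();
        for (Aggregation aggregation : aggregationList) {
            System.out.println(aggregation.getProperty("value"));
        }
        client.close();
    }
}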
2) Return the deduplicated documents
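// Imports assume the Elasticsearch 2.x transport client (Settings.settingsBuilder(), TransportClient.builder()).
import java.net.InetSocketAddress;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.tophits.TopHits;
import org.elasticsearch.search.sort.SortOrder;

public class EsTest {
    public static void main(String[] args) {
        Settings settings = Settings.settingsBuilder()
                .put("cluster.name", "elasticsearch") // cluster name
                .put("client.transport.ignore_cluster_name", true) // skip cluster-name validation, so the client connects even if the name is wrong
                .build();
        TransportClient client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress("101.10.32.1", 9300)));
        // One bucket per uid; inside each bucket keep only the most recent document.
        AggregationBuilder aggregationBuilder = AggregationBuilders
                .terms("uid_aggs").field("uid").size(10000)
                .subAggregation(AggregationBuilders.topHits("uid_top")
                        .addSort("offline_time", SortOrder.DESC)
                        .setSize(1));
        SearchRequestBuilder request = client.prepareSearch("user_onoffline_log")
                .setTypes("logs")
                .setSearchType(SearchType.QUERY_THEN_FETCH)
                .setQuery(QueryBuilders.boolQuery()
                        .must(QueryBuilders.termQuery("uid", "")))
                .addAggregation(aggregationBuilder)
                .setSize(1);
        SearchResponse response = request.execute().actionGet();
        Terms uidTerms = response.getAggregations().get("uid_aggs");
        for (Terms.Bucket entry : uidTerms.getBuckets()) {
            TopHits top = entry.getAggregations().get("uid_top");
            for (SearchHit hit : top.getHits()) {
                System.out.println(hit.getSource());
            }
        }
        client.close();
    }
}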