ES的聚合
我们还有一个需求需要完成:允许管理者在职员目录中进行一些分析。 Elasticsearch有一个功能叫做聚合(aggregations),它允许你在数据上生成复杂的分析统计。它很像SQL中的GROUP BY
但是功能更强大。
举个例子,让我们找到所有职员中最大的共同点(兴趣爱好)是什么:
GET /megacorp/employee/_search
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
暂时先忽略语法只看查询结果:
{ "took": 229, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 1, "hits": [ { "_index": "megacorp", "_type": "employee", "_id": "1", "_score": 1, "_source": { "first_name": "John", "last_name": "Smith", "age": 32, "about": "I like to collect rock albums", "interests": [ "music" ] } }, { "_index": "megacorp", "_type": "employee", "_id": "2", "_score": 1, "_source": { "first_name": "Lily", "last_name": "Smith", "age": 29, "about": "I like to go shopping!", "interests": [ "music" ] } }, { "_index": "megacorp", "_type": "employee", "_id": "3", "_score": 1, "_source": { "first_name": "Tom", "last_name": "Smith", "age": 18, "about": "I like to play basketball!", "interests": [ "music" ] } } ] }, "aggregations": { "all_interests": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "music", "doc_count": 3 } ] } } }
我们可以看到两个职员对音乐有兴趣,一个喜欢林学,一个喜欢运动。这些数据并没有被预先计算好,它们是实时的从匹配查询语句的文档中动态计算生成的。如果我们想知道所有名为"Tom"的人最大的共同点(兴趣爱好),我们只需要增加合适的语句既可:
GET /megacorp/employee/_search
{
"query": {
"match": {
"first_name": "Tom"
}
},
"aggs": {
"all_interests": {
"terms": {
"field": "interests"
}
}
}
}
all_interests
聚合已经变成只包含和查询语句相匹配的文档了:
{ "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.30685282, "hits": [ { "_index": "megacorp", "_type": "employee", "_id": "3", "_score": 0.30685282, "_source": { "first_name": "Tom", "last_name": "Smith", "age": 18, "about": "I like to play basketball!", "interests": [ "music" ] } } ] }, "aggregations": { "all_interests": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "music", "doc_count": 1 } ] } } }
聚合也允许分级汇总。例如,让我们统计每种兴趣下职员的平均年龄:
GET /megacorp/employee/_search
{
"aggs" : {
"all_interests" : {
"terms" : { "field" : "interests" },
"aggs" : {
"avg_age" : {
"avg" : { "field" : "age" }
}
}
}
}
}
"aggregations": { "all_interests": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "music", "doc_count": 3, "avg_age": { "value": 26.333333333333332, "value_as_string": "26.333333333333332" } } ] } }
该聚合结果比之前的聚合结果要更加丰富。我们依然得到了兴趣以及数量(指具有该兴趣的员工人数)的列表,但是现在每个兴趣额外拥有avg_age
字段来显示具有该兴趣员工的平均年龄。
即使你还不理解语法,但你也可以大概感觉到通过这个特性可以完成相当复杂的聚合工作,你可以处理任何类型的数据。