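Earlier we ignored the hits._score and max_score fields in the search results. Both are relative measures of how well a document matches the search query: the higher the score, the better the match. Queries do not always need to produce scores, though, especially when they are only used to filter the document set. Elasticsearch detects these cases and automatically optimizes query execution so that it does not compute useless scores. The bool query accepts a filter clause for this purpose, and a range query can be placed inside it, for example: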
// Keep only documents with 20000 <= balance <= 30000
GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
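For readers who send this request from code rather than from the Kibana console, the same body can be assembled as a plain dictionary. The following is a minimal Python sketch, not an official client API: the helper names balance_range_query and in_range are hypothetical, and in_range only mimics locally the inclusive gte/lte bounds instead of calling Elasticsearch.

```python
import json


def balance_range_query(low, high):
    """Build the bool query shown above (hypothetical helper).

    The range condition sits in the filter clause, so it narrows the
    document set without contributing to the score.
    """
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {
                    "range": {"balance": {"gte": low, "lte": high}}
                },
            }
        }
    }


def in_range(balance, low, high):
    """Local stand-in for the range filter: both bounds are inclusive."""
    return low <= balance <= high


body = balance_range_query(20000, 30000)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET /bank/_search in the console, or sent with whatever HTTP client you already use.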
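Aggregations let you group data and compute statistics over it (roughly like GROUP BY and the aggregate functions in SQL). In Elasticsearch, a search response keeps the aggregation results separate from the search hits (the hits array). You can run a query and multiple aggregations in a single request and get back both (or either) sets of results in one shot, using a concise API and avoiding extra network round trips.
The following example groups the accounts by state: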
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
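The aggs key specifies the aggregations to run. Setting "size": 0 returns no search hits, so the response shows only the aggregation results. By default a terms aggregation returns the top 10 buckets, ordered by document count in descending order. The response is: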
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1000,
"max_score": 0,
// empty because size is 0
"hits": []
},
"aggregations": {
"group_by_state": {
"doc_count_error_upper_bound": 20,
"sum_other_doc_count": 770,
"buckets": [
{
"key": "ID",
"doc_count": 27
},
// 8 buckets omitted
{
"key": "MO",
"doc_count": 20
}
]
}
}
}
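To make the top-N cutoff and the sum_other_doc_count field more concrete, here is a small Python sketch that mimics the shape of a terms aggregation on an in-memory list. It is only an illustration under simplifying assumptions: the function name terms_aggregation and the sample values are hypothetical, and there are no shards, so doc_count_error_upper_bound is not modeled.

```python
from collections import Counter


def terms_aggregation(values, size=10):
    """Mimic the shape of a terms aggregation on an in-memory list.

    Returns the `size` most frequent values as buckets, plus the number
    of documents that fall outside the returned buckets.
    """
    counts = Counter(values)
    top = counts.most_common(size)
    buckets = [{"key": key, "doc_count": n} for key, n in top]
    shown = sum(n for _, n in top)
    return {"buckets": buckets, "sum_other_doc_count": len(values) - shown}


# Hypothetical data, not taken from the bank index.
states = ["ID"] * 3 + ["TX"] * 2 + ["MO"] + ["AL"]
result = terms_aggregation(states, size=2)
print(result)
```

The same arithmetic applies to the real response: hits.total is 1000 and sum_other_doc_count is 770, so the ten returned buckets together cover 230 accounts.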
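In the returned result, aggregations holds the aggregation results (note that it is separate from hits). It shows, for example, that 27 accounts are in the state "ID" (Idaho). The next example computes the average account balance per state: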
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
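This request nests an average_balance aggregation inside the group_by_state aggregation. Nesting like this is a common pattern, and aggregations can be nested arbitrarily to extract the summaries you need. The aggregation results are shown below (unrelated parts of the response are omitted):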
"aggregations": {
"group_by_state": {
"doc_count_error_upper_bound": 20,
"sum_other_doc_count": 770,
"buckets": [
{
"key": "ID",
"doc_count": 27,
"average_balance": {
"value": 24368.777777777777
}
},
// 8 buckets omitted
{
"key": "MO",
"doc_count": 20,
"average_balance": {
"value": 24151.8
}
}
]
}
}
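Once the response has been parsed into a dictionary, the nested metric is read from each bucket under the name given in the request. The Python sketch below shows one possible way to do that; the function name state_averages is hypothetical, and the sample dictionary contains only the two buckets printed above, not the eight omitted ones.

```python
def state_averages(aggregations):
    """Map each state key to (doc_count, average balance)."""
    buckets = aggregations["group_by_state"]["buckets"]
    return {
        b["key"]: (b["doc_count"], b["average_balance"]["value"])
        for b in buckets
    }


# Only the two buckets shown in the response; the omitted ones are left out.
aggregations = {
    "group_by_state": {
        "doc_count_error_upper_bound": 20,
        "sum_other_doc_count": 770,
        "buckets": [
            {"key": "ID", "doc_count": 27,
             "average_balance": {"value": 24368.777777777777}},
            {"key": "MO", "doc_count": 20,
             "average_balance": {"value": 24151.8}},
        ],
    }
}

for state, (count, average) in state_averages(aggregations).items():
    print(f"{state}: {count} accounts, average balance {average:.2f}")
```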
To sort the states by average account balance in descending order:
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword",
"order": {
"average_balance": "desc"
}
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
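As a rough mental model of what this request computes, the Python sketch below performs the same three steps on a handful of in-memory records: group by state, average the balance in each group, and sort the groups by that average in descending order. The function name and the sample accounts are hypothetical, and the sketch ignores sharding and the default limit of 10 buckets.

```python
from collections import defaultdict


def average_balance_by_state(accounts):
    """Group accounts by state, then sort groups by average balance (desc)."""
    balances = defaultdict(list)
    for account in accounts:
        balances[account["state"]].append(account["balance"])
    buckets = [
        {
            "key": state,
            "doc_count": len(values),
            "average_balance": sum(values) / len(values),
        }
        for state, values in balances.items()
    ]
    buckets.sort(key=lambda b: b["average_balance"], reverse=True)
    return buckets


# Hypothetical records, not taken from the bank index.
accounts = [
    {"state": "ID", "balance": 1000},
    {"state": "ID", "balance": 3000},
    {"state": "MO", "balance": 5000},
    {"state": "TX", "balance": 500},
]

for bucket in average_balance_by_state(accounts):
    print(bucket)
```

Note that ordering by a sub-aggregation also changes which buckets are returned: the top 10 are then the states with the highest average balance, not the states with the most accounts.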
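The next example first groups accounts into the age brackets 20-29, 30-39, and 40-49 (in a range aggregation, from is inclusive and to is exclusive), then groups each bracket by gender, and finally computes the average account balance for each gender within each bracket: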
GET /bank/_search
{
"size": 0,
"aggs": {
"group_by_age": {
"range": {
"field": "age",
"ranges": [
{
"from": 20,
"to": 30
},
{
"from": 30,
"to": 40
},
{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
}
}
The aggregation results are:
"aggregations": {
"group_by_age": {
"buckets": [
{
"key": "20.0-30.0",
"from": 20,
"to": 30,
"doc_count": 451,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "M",
"doc_count": 232,
"average_balance": {
"value": 27374.05172413793
}
},
{
"key": "F",
"doc_count": 219,
"average_balance": {
"value": 25341.260273972603
}
}
]
}
},
{
"key": "30.0-40.0",
"from": 30,
"to": 40,
"doc_count": 504,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "F",
"doc_count": 253,
"average_balance": {
"value": 25670.869565217392
}
},
{
"key": "M",
"doc_count": 251,
"average_balance": {
"value": 24288.239043824702
}
}
]
}
},
{
"key": "40.0-50.0",
"from": 40,
"to": 50,
"doc_count": 45,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "M",
"doc_count": 24,
"average_balance": {
"value": 26474.958333333332
}
},
{
"key": "F",
"doc_count": 21,
"average_balance": {
"value": 27992.571428571428
}
}
]
}
}
]
}
}
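A two-level response like this is often easier to work with as flat rows. The Python sketch below walks the nested buckets and yields one row per age bracket and gender, then checks that the gender counts add up to the doc_count of each bracket. The helper names (gender_bucket, age_bucket, flatten) are hypothetical and exist only to rebuild the response above compactly; the numbers are copied from it.

```python
def gender_bucket(key, doc_count, average):
    """Hypothetical helper that rebuilds one gender bucket."""
    return {"key": key, "doc_count": doc_count,
            "average_balance": {"value": average}}


def age_bucket(low, high, doc_count, genders):
    """Hypothetical helper that rebuilds one age-range bucket."""
    return {"key": f"{low:.1f}-{high:.1f}", "from": low, "to": high,
            "doc_count": doc_count,
            "group_by_gender": {"buckets": genders}}


def flatten(aggregations):
    """Yield (age_range, gender, doc_count, average_balance) rows."""
    for age in aggregations["group_by_age"]["buckets"]:
        for gender in age["group_by_gender"]["buckets"]:
            yield (age["key"], gender["key"], gender["doc_count"],
                   gender["average_balance"]["value"])


aggregations = {"group_by_age": {"buckets": [
    age_bucket(20, 30, 451, [
        gender_bucket("M", 232, 27374.05172413793),
        gender_bucket("F", 219, 25341.260273972603),
    ]),
    age_bucket(30, 40, 504, [
        gender_bucket("F", 253, 25670.869565217392),
        gender_bucket("M", 251, 24288.239043824702),
    ]),
    age_bucket(40, 50, 45, [
        gender_bucket("M", 24, 26474.958333333332),
        gender_bucket("F", 21, 27992.571428571428),
    ]),
]}}

rows = list(flatten(aggregations))
for row in rows:
    print(row)

# Every account has a gender, so the inner counts sum to the outer count.
for age in aggregations["group_by_age"]["buckets"]:
    inner = sum(g["doc_count"] for g in age["group_by_gender"]["buckets"])
    assert inner == age["doc_count"]
```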
Beyond the aggregations shown here, Elasticsearch offers many more to explore.