接续上篇,本篇介绍elasticsearch聚合查询,使用python库elasticsearch-dsl进行聚合查询操作。
7.3、聚合查询
高阶概念
- Buckets(桶/集合):满足特定条件的文档的集合
-
Metrics(指标):对桶内的文档进行统计计算(例如最小值,求和,最大值等)
- 新建一张测试表
1 PUT cars 2 { 3 "mappings": { 4 "transactions":{ 5 "properties": { 6 "price":{ 7 "type": "integer" 8 }, 9 "color":{ 10 "type": "text", 11 "fielddata": true 12 }, 13 "make":{ 14 "type": "text", 15 "fielddata": true 16 }, 17 "sold":{ 18 "type": "date", 19 "format": "yyyy-MM-dd" 20 } 21 } 22 } 23 } 24 }
插入数据
1 POST /cars/transactions/_bulk 2 { "index": {"_index": "cars", "_type": "transactions"}} 3 { "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" } 4 { "index": {"_index": "cars", "_type": "transactions"}} 5 { "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" } 6 { "index": {"_index": "cars", "_type": "transactions"}} 7 { "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" } 8 { "index": {"_index": "cars", "_type": "transactions"}} 9 { "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" } 10 { "index": {"_index": "cars", "_type": "transactions"}} 11 { "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" } 12 { "index": {"_index": "cars", "_type": "transactions"}} 13 { "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" } 14 { "index": {"_index": "cars", "_type": "transactions"}} 15 { "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" } 16 { "index": {"_index": "cars", "_type": "transactions"}} 17 { "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
- 查询哪个颜色的汽车销量最好(按颜色分类)
1 GET cars/transactions/_search 2 { 3 "size": 0, 4 "aggs": { 5 "popular_colors": { 6 "terms": { 7 "field": "color" 8 } 9 } 10 } 11 }
1 s = Search(index='cars') 2 a = A("terms", field="color") 3 s.aggs.bucket("popular_color", a) 4 response = s.execute()
或者
1 s.aggs.bucket("popular_color", "terms", field="color")
- 查询每种颜色车的平均价格
1 GET cars/transactions/_search 2 { 3 "size": 0, 4 "aggs": { 5 "colors": { 6 "terms": { 7 "field": "color" 8 }, 9 "aggs": { 10 "avg_price": { 11 "avg": { 12 "field": "price" 13 } 14 } 15 } 16 } 17 } 18 }
1 s = Search(index='cars') 2 a1 = A("terms", field="color") 3 a2 = A("avg", field="price") 4 s.aggs.bucket("colors", a1).metric("avg_price", a2) 5 response = s.execute()
或者
1 s = Search(index='cars') 2 s.aggs.bucket("colors", "terms", field="color").metric("avg_price", "avg", field="price") 3 response = s.execute()
- 先按颜色分,再按品牌分,再求每种品牌的均价
1 GET cars/transactions/_search 2 { 3 "size": 0, 4 "aggs": { 5 "colors": { 6 "terms": { 7 "field": "color" 8 }, 9 "aggs": { 10 "make": { 11 "terms": { 12 "field": "make" 13 }, 14 "aggs": { 15 "avg_price": { 16 "avg": { 17 "field": "price" 18 } 19 } 20 } 21 } 22 } 23 } 24 } 25 }
1 s = Search(index='cars') 2 s.aggs.bucket("colors", "terms", field="color") 3 s.aggs["colors"].bucket("make", "terms", field="make") 4 s.aggs["colors"].aggs["make"].metric("avg_price", "avg", field="price") 5 response = s.execute()
- 先按颜色分,再按品牌分,再求每种品牌的最高和最低价
1 GET cars/transactions/_search 2 { 3 "size": 0, 4 "aggs": { 5 "colors": { 6 "terms": { 7 "field": "color" 8 }, 9 "aggs": { 10 "make": { 11 "terms": { 12 "field": "make" 13 }, 14 "aggs": { 15 "min_price": { 16 "min": { 17 "field": "price" 18 } 19 }, 20 "max_price": { 21 "max": { 22 "field": "price" 23 } 24 } 25 } 26 } 27 } 28 } 29 } 30 }
1 s = Search(index='cars') 2 s.aggs.bucket("colors", "terms", field="color") 3 s.aggs["colors"].bucket("make", "terms", field="make") 4 s.aggs["colors"].aggs["make"].metric("min_price", "min", field="price") 5 s.aggs["colors"].aggs["make"].metric("max_price", "max", field="price") 6 response = s.execute()
- 未完待续...
- 新建一张测试表