ElasticSearch-聚合

什么是聚合

每个聚合都是一个或者多个桶和零个或者多个指标的组合。

桶(Buckets)

满足特定条件的文档的集合。

指标(Metrics)

对桶内的文档进行统计计算。

聚合语法结构

"aggregation" : {
    "" : {
        "aggregation_type" : {
            
        }
        [, "meta" : {[]}]?
        [, "aggregation" : {[]}]?
    }
    [, "" : {...}]*
}

Bucket Aggregation-分桶

Filter Aggregation -- 过滤分桶
Filters Aggregation -- 过滤分桶
Date Histogram Aggregation -- 按照日期自动划分桶
Date Range Aggregation -- 给定日期范围划分
Histogram Aggregation -- 直方图划分桶
Range Aggregation -- 给定范围划分桶
IP Range Aggregation -- 按照给定ip范围分桶
Terms Aggregatioon -- 按照最多的词条分桶
Geo Distance Aggregation -- 按地理位置指定的中心点园环分桶
GeoHash grid Aggregation -- 按geohash单元分桶

Metrics Aggregation-指标

Avg Aggregation -- 平均值
Max Aggregation -- 最大值
Min Aggregation -- 最小值
Sum Aggregation -- 求和
Cardinality Aggregation -- 基数(去重值)
Percentiles Aggregation -- 百分位
Percentile Ranks Aggregation -- 百分位排名
Stats Aggregation -- 统计(包含min、max、sum、avg)
Geo Bounds Aggregation -- 地理坐标边框
Geo Centroid Aggregation -- 图心

初始化数据

DELETE cars
PUT cars
{
  "mappings": {
    "transactions": {
      "properties": {
        "price": {
          "type":"long"
        },
        "color": {
          "type":"keyword"
        },
        "make": {
          "type":"keyword"
        },
        "sold": {
          "type":"date"
        }
      }
    }
  }
}
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

select count(color) from cars group by color;

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            }
        }
    }
}

每种颜色汽车的平均价格是多少?

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": { 
            "avg_price": { 
               "avg": {
                  "field": "price" 
               }
            }
         }
      }
   }
}

每个颜色的汽车制造商的分布

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": {
            "avg_price": { 
               "avg": {
                  "field": "price"
               }
            },
            "make": { 
                "terms": {
                    "field": "make" 
                }
            }
         }
      }
   }
}

为每个汽车生成商计算最低和最高的价格

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": {
            "avg_price": { "avg": { "field": "price" }
            },
            "make" : {
                "terms" : {
                    "field" : "make"
                },
                "aggs" : { 
                    "min_price" : { "min": { "field": "price"} }, 
                    "max_price" : { "max": { "field": "price"} } 
                }
            }
         }
      }
   }
}

直方图

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs":{
      "price":{
         "histogram":{ 
            "field": "price",
            "interval": 20000
         },
         "aggs":{
            "price_sum": {
               "sum": { 
                 "field" : "price"
               }
             }
         }
      }
   }
}

最受欢迎 10 种汽车以及它们的平均售价、标准差

GET /cars/transactions/_search
{
  "size" : 0,
  "aggs": {
    "makes": {
      "terms": {
        "field": "make",
        "size": 10
      },
      "aggs": {
        "stats": {
          "extended_stats": {
            "field": "price"
          }
        }
      }
    }
  }
}

时间条形图

今年每月销售多少台汽车?
这只股票最近 12 小时的价格是多少?
我们网站上周每小时的平均响应延迟时间是多少?

每月销售多少台汽车

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "sales": {
         "date_histogram": {
            "field": "sold",
            "interval": "month", 
            "format": "yyyy-MM-dd" 
         }
      }
   }
}

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "sales": {
         "date_histogram": {
            "field": "sold",
            "interval": "month",
            "format": "yyyy-MM-dd",
            "min_doc_count" : 0, 
            "extended_bounds" : { 
                "min" : "2014-01-01",
                "max" : "2014-12-31"
            }
         }
      }
   }
}

同时按季度、按每个汽车品牌计算销售总额,以便可以找出哪种品牌最赚钱:

间间隔从 month 改成了 quarter

GET /cars/transactions/_search
{
   "size" : 0,
   "aggs": {
      "sales": {
         "date_histogram": {
            "field": "sold",
            "interval": "quarter", 
            "format": "yyyy-MM-dd",
            "min_doc_count" : 0,
            "extended_bounds" : {
                "min" : "2014-01-01",
                "max" : "2014-12-31"
            }
         },
         "aggs": {
            "per_make_sum": {
               "terms": {
                  "field": "make"
               },
               "aggs": {
                  "sum_price": {
                     "sum": { "field": "price" } 
                  }
               }
            },
            "total_sum": {
               "sum": { "field": "price" } 
            }
         }
      }
   }
}

查询某一个范围的聚合

GET /cars/transactions/_search
{
  "size": 0, 
    "query" : {
        "match" : {
            "make" : "ford"
        }
    },
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color"
            }
        }
    }
}

全局桶,query只是对single_avg_price起作用

GET /cars/transactions/_search
{
    "size" : 0,
    "query" : {
        "match" : {
            "make" : "ford"
        }
    },
    "aggs" : {
        "single_avg_price": {
            "avg" : { "field" : "price" } 
        },
        "all": {
            "global" : {}, 
            "aggs" : {
                "avg_price": {
                    "avg" : { "field" : "price" } 
                }

            }
        }
    }
}

过滤

GET /cars/transactions/_search
{
    "size" : 0,
    "query" : {
        "constant_score": {
            "filter": {
                "range": {
                    "price": {
                        "gte": 10000
                    }
                }
            }
        }
    },
    "aggs" : {
        "single_avg_price": {
            "avg" : { "field" : "price" }
        }
    }
}

可以指定一个过滤桶,当文档满足过滤桶的条件时,将其加入到桶内

不过滤搜索结果,对聚合结果进行过滤

GET /cars/transactions/_search
{
   "size" : 0,
   "query":{
      "match": {
         "make": "ford"
      }
   },
   "aggs":{
      "recent_sales": {
         "filter": { 
            "range": {
               "sold": {
                  "from": "now-1M"
               }
            }
         },
         "aggs": {
            "average_price":{
               "avg": {
                  "field": "price" 
               }
            }
         }
      }
   }
}

只过滤搜索结果,不过滤聚合结果(注意hits.total的变化)

GET /cars/transactions/_search
{
    "size" : 0,
    "query": {
        "match": {
            "make": "ford"
        }
    },
    "post_filter": {    
        "term" : {
            "color" : "green"
        }
    },
    "aggs" : {
        "all_colors": {
            "terms" : { "field" : "color" }
        }
    }
}

内置排序

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order": {
                "_count" : "asc"
              }
            }
        }
    }
}

_count

按文档数排序。对 terms 、 histogram 、 date_histogram 有效。

_term

按词项的字符串值的字母顺序排序。只在 terms 内使用。

_key

按每个桶的键值数值排序(理论上与 _term 类似)。 只在 histogram 和 date_histogram 内使用。

按照汽车颜色创建一个销售条状图表,但按照汽车平均售价的升序进行排序

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order": {
                "avg_price" : "asc" 
              }
            },
            "aggs": {
                "avg_price": {
                    "avg": {"field": "price"} 
                }
            }
        }
    }
}

多值度量排序

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order": {
                "stats.variance" : "asc" 
              }
            },
            "aggs": {
                "stats": {
                    "extended_stats": {"field": "price"}
                }
            }
        }
    }
}

嵌套度量

stats 度量是 red_green_cars 聚合的子节点,而 red_green_cars 又是 colors 聚合的子节点

嵌套路径上的每个桶都必须是单值的,度量用尖括号( > )嵌套起来

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "colors" : {
            "histogram" : {
              "field" : "price",
              "interval": 20000,
              "order": {
                "red_green_cars>stats.variance" : "asc" 
              }
            },
            "aggs": {
                "red_green_cars": {
                    "filter": { "terms": {"color": ["red", "green"]}}, 
                    "aggs": {
                        "stats": {"extended_stats": {"field" : "price"}} 
                    }
                }
            }
        }
    }
}

ES有两种近似算法( cardinality 和 percentiles ),它们会提供准确但不是 100% 精确的结果。 以牺牲一点小小的估算错误为代价,这些算法可以为我们换来高速的执行效率和极小的内存消耗。

近似计算的结果会在毫秒内返回,而“完全正确”的结果就可能需要几秒,甚至无法返回。

销售汽车颜色的数量

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "distinct_colors" : {
            "cardinality" : {
              "field" : "color"
            }
        }
    }
}

precision_threshold 接受 0–40,000 之间的数字,更大的值还是会被当作 40,000 来处理

GET /cars/transactions/_search
{
    "size" : 0,
    "aggs" : {
        "distinct_colors" : {
            "cardinality" : {
              "field" : "color",
              "precision_threshold" : 100
            }
        }
    }
}

百分位度量

DELETE website
PUT website
{
  "mappings": {
    "logs": {
      "properties": {
        "latency": {
          "type":"long"
        },
        "zone": {
          "type":"keyword"
        },
        "timestamp": {
          "type":"date"
        }
      }
    }
  }
}

POST /website/logs/_bulk

{ "index": {}}
{ "latency" : 100, "zone" : "US", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 80, "zone" : "US", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 99, "zone" : "US", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 102, "zone" : "US", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 75, "zone" : "US", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 82, "zone" : "US", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 100, "zone" : "EU", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 280, "zone" : "EU", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 155, "zone" : "EU", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 623, "zone" : "EU", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 380, "zone" : "EU", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 319, "zone" : "EU", "timestamp" : "2014-10-29" }

查看平均响应延迟时间,百分位时间

GET /website/logs/_search
{
    "size" : 0,
    "aggs" : {
        "load_times" : {
            "percentiles" : {
                "field" : "latency" 
            }
        },
        "avg_load_time" : {
            "avg" : {
                "field" : "latency" 
            }
        }
    }
}

查看延迟时间跟地理位置的延迟是否有关的百分位

GET /website/logs/_search
{
    "size" : 0,
    "aggs" : {
        "zones" : {
            "terms" : {
                "field" : "zone" 
            },
            "aggs" : {
                "load_times" : {
                    "percentiles" : { 
                      "field" : "latency",
                      "percents" : [50, 95.0, 99.0] 
                    }
                },
                "load_avg" : {
                    "avg" : {
                        "field" : "latency"
                    }
                }
            }
        }
    }
}

查看某一个值属于哪个百分位(percentile_ranks)

GET /website/logs/_search
{
    "size" : 0,
    "aggs" : {
        "zones" : {
            "terms" : {
                "field" : "zone"
            },
            "aggs" : {
                "load_times" : {
                    "percentile_ranks" : {
                      "field" : "latency",
                      "values" : [210, 800] 
                    }
                }
            }
        }
    }
}

你可能感兴趣的:(ElasticSearch-聚合)