Elasticsearch Aggregations

Elasticsearch aggregation definition

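Aggregations produce summarized data based on a search query. They are built from simple building blocks, themselves called aggregations, which can be combined and nested to build complex summaries of the data.
The basic syntax is as follows: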

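"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}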
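To make the nesting concrete, here is a small Python sketch that assembles a request body following this syntax. make_agg is a hypothetical helper introduced only for illustration (it is not part of any Elasticsearch client); the aggregation names and fields match the examples later in this article.

```python
import json


def make_agg(agg_type, body, sub_aggs=None, meta=None):
    """Build one named-aggregation entry: {agg_type: body, ["meta"], ["aggregations"]}."""
    agg = {agg_type: body}
    if meta is not None:
        agg["meta"] = meta  # optional metadata, returned unchanged in the response
    if sub_aggs:
        agg["aggregations"] = sub_aggs  # optional nested (sub-)aggregations
    return agg


# Two sibling aggregations; the first nests a metric aggregation inside a bucket one.
request_body = {
    "size": 0,
    "aggregations": {
        "group_by_make": make_agg(
            "terms",
            {"field": "make.keyword"},
            sub_aggs={"avg_price": make_agg("avg", {"field": "price"})},
        ),
        "max_price": make_agg("max", {"field": "price"}),
    },
}

print(json.dumps(request_body, indent=2))
```

The examples below use the key "aggs", which Elasticsearch accepts as a short form of "aggregations".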

Elasticsearch aggregation categories

Elasticsearch divides aggregations into four main categories:

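  • Bucket: bucketing aggregations, similar to GROUP BY in SQL
  • Metric: metric aggregations, which compute values such as the maximum, minimum or average
  • Pipeline: pipeline aggregations, which run further analysis on the output of other aggregations
  • Matrix: matrix aggregations, which operate on several fields and produce a matrix result (e.g. matrix_stats)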

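First, prepare the sample data. (The requests in this article put the transactions mapping type in the URL, as in /cars/transactions/_bulk. This works on 6.x and 7.x, but 7.x returns the "[types removal]" deprecation warning seen in the responses below; types were removed in 8.x, where the equivalent paths are /cars/_bulk and /cars/_search.)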

POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

Metric aggregations

Metric aggregations fall into two groups, single-value and multi-value:

  • Single-value: output a single result
      min, max, avg, sum
      cardinality
  • Multi-value: output several results
      stats, extended stats
      percentile, percentile rank
      top hits

min,max,avg,sum

Example:

GET /cars/transactions/_search
{
  "size": 0, // do not return the matching documents
  "aggs": {
    "price_max": {
      "max": {
        "field": "price"
      }
    },
    "price_min": {
      "min": {
        "field": "price"
      }
    },
    "avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "sum_price": {
      "sum": {
        "field": "price"
      }
    }
  }
}
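As a cross-check, the following Python sketch computes the same four values locally over the price field of the eight sample documents. It only mirrors what the aggregations compute; it does not call Elasticsearch, which reports these values as floating-point numbers (e.g. 80000.0).

```python
# Prices of the eight sample documents, in bulk-insert order.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]

price_max = max(prices)              # "max" aggregation
price_min = min(prices)              # "min" aggregation
sum_price = sum(prices)              # "sum" aggregation
avg_price = sum_price / len(prices)  # "avg" aggregation = sum / number of values

print(price_max, price_min, avg_price, sum_price)  # 80000 10000 26500.0 212000
```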

cardinality

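Cardinality: the cardinality of a set, i.e. the number of distinct values, similar to COUNT(DISTINCT ...) in SQL. Elasticsearch computes it approximately (based on the HyperLogLog++ algorithm); counts below the precision_threshold setting (3000 by default) are expected to be close to exact.
Example: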

GET /cars/transactions/_search
{
  "size": 0, // do not return the matching documents
  "aggs": {
    "count_of_make": {
      "cardinality": {
        "field": "make.keyword"
      }
    }
  }
}
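Locally, the exact equivalent is the size of the set of values. The Python sketch below computes it for the make and color fields of the sample data; keep in mind that Elasticsearch's cardinality aggregation is an approximation, so on large data sets the two can differ slightly.

```python
# make and color of the eight sample documents, in bulk-insert order.
makes = ["honda", "honda", "ford", "toyota", "toyota", "honda", "bmw", "ford"]
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]

# Exact distinct count, the SQL COUNT(DISTINCT ...) equivalent.
count_of_make = len(set(makes))
count_of_color = len(set(colors))

print(count_of_make, count_of_color)  # 4 3
```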

stats,extended stats


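  • stats: returns a set of statistics for a numeric field: min, max, avg, sum and count
  • extended stats: an extension of stats that adds more statistics, such as the variance and the standard deviation

Example: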

GET /cars/transactions/_search
{
  "size": 0,
  "aggs":{
    "stats_price":{
      "stats": {
        "field": "price"
      }
    }
  }
  
}
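The request above only uses stats; replacing "stats" with "extended_stats" returns the additional fields. The Python sketch below computes both locally for the sample prices to show what the fields mean. It covers only a subset of the extended_stats fields and assumes (as far as I know, this matches Elasticsearch) that "variance" is the population variance, dividing by the count, and that std_deviation_bounds is avg ± sigma × std_deviation with sigma defaulting to 2.

```python
import math

# Prices of the eight sample documents.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]


def stats(values):
    """The five fields returned by the "stats" aggregation."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


def extended_stats(values, sigma=2.0):
    """stats plus some "extended_stats" fields (population variance)."""
    result = stats(values)
    avg = result["avg"]
    sum_of_squares = sum(v * v for v in values)
    variance = sum_of_squares / result["count"] - avg * avg  # E[x^2] - (E[x])^2
    std_deviation = math.sqrt(variance)
    result.update(
        sum_of_squares=sum_of_squares,
        variance=variance,
        std_deviation=std_deviation,
        std_deviation_bounds={
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    )
    return result


print(stats(prices))
print(extended_stats(prices))
```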

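Percentile, Percentile Rank

  • Percentile: percentile statistics; returns the value below which a given percentage of the data falls (by default the 1st, 5th, 25th, 50th, 75th, 95th and 99th percentiles)
  • Percentile Rank: the reverse; for given values, returns the percentage of the data that falls below each value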
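Neither has a request example here, so below is a small local Python sketch of the two ideas on the sample prices. It computes exact values, interpolating linearly between neighbouring ranks (one common convention, the same as NumPy's default) and counting values at or below the given value for the rank. Elasticsearch instead estimates both with the TDigest algorithm by default, so its numbers can differ, especially on a data set this small. In a request they are written as "percentiles": {"field": "price"} and "percentile_ranks": {"field": "price", "values": [...]}.

```python
import math

# Prices of the eight sample documents.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]


def percentile(values, p):
    """Exact p-th percentile (0-100), linear interpolation between closest ranks."""
    data = sorted(values)
    rank = p / 100 * (len(data) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (rank - lower)


def percentile_rank(values, value):
    """Percentage of values that are less than or equal to `value`."""
    return 100 * sum(1 for v in values if v <= value) / len(values)


print([percentile(prices, p) for p in (25, 50, 75)])  # [14250.0, 20000.0, 26250.0]
print(percentile_rank(prices, 20000))                 # 62.5
```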

Top Hits

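Top Hits: usually used as a sub-aggregation after bucketing, to fetch the top matching documents in each bucket, i.e. the detailed records.

For example, group by car make and fetch the two most expensive transactions in each group: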

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "group_by_make": {
      "terms": {
        "field": "make.keyword"
      },
      "aggs": {
        "top_data": {
          "top_hits": {
            "size": 2,
            "_source": [
              "price",
              "color",
              "make"
            ],
            "sort": [
              {
                "price": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_make" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "honda",
          "doc_count" : 3,
          "top_data" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "cars",
                  "_type" : "transactions",
                  "_id" : "js_K120B6sb1aJIMtJKa",
                  "_score" : null,
                  "_source" : {
                    "color" : "red",
                    "price" : 20000,
                    "make" : "honda"
                  },
                  "sort" : [
                    20000
                  ]
                },
                {
                  "_index" : "cars",
                  "_type" : "transactions",
                  "_id" : "ks_K120B6sb1aJIMtJKa",
                  "_score" : null,
                  "_source" : {
                    "color" : "red",
                    "price" : 20000,
                    "make" : "honda"
                  },
                  "sort" : [
                    20000
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "ford",
          "doc_count" : 2,
          "top_data" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "cars",
                  "_type" : "transactions",
                  "_id" : "j8_K120B6sb1aJIMtJKa",
                  "_score" : null,
                  "_source" : {
                    "color" : "green",
                    "price" : 30000,
                    "make" : "ford"
                  },
                  "sort" : [
                    30000
                  ]
                },
                {
                  "_index" : "cars",
                  "_type" : "transactions",
                  "_id" : "lM_K120B6sb1aJIMtJKa",
                  "_score" : null,
                  "_source" : {
                    "color" : "blue",
                    "price" : 25000,
                    "make" : "ford"
                  },
                  "sort" : [
                    25000
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "toyota",
          "doc_count" : 2,
          "top_data" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "cars",
                  "_type" : "transactions",
                  "_id" : "kM_K120B6sb1aJIMtJKa",
                  "_score" : null,
                  "_source" : {
                    "color" : "blue",
                    "price" : 15000,
                    "make" : "toyota"
                  },
                  "sort" : [
                    15000
                  ]
                },
                {
                  "_index" : "cars",
                  "_type" : "transactions",
                  "_id" : "kc_K120B6sb1aJIMtJKa",
                  "_score" : null,
                  "_source" : {
                    "color" : "green",
                    "price" : 12000,
                    "make" : "toyota"
                  },
                  "sort" : [
                    12000
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "bmw",
          "doc_count" : 1,
          "top_data" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "cars",
                  "_type" : "transactions",
                  "_id" : "k8_K120B6sb1aJIMtJKa",
                  "_score" : null,
                  "_source" : {
                    "color" : "red",
                    "price" : 80000,
                    "make" : "bmw"
                  },
                  "sort" : [
                    80000
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
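The following Python sketch reproduces the same grouping locally: it buckets the sample documents by make, sorts each bucket by price in descending order and keeps the first two, which is what the terms + top_hits combination above does. top_hits_by is a hypothetical helper, not part of any Elasticsearch client; the order among equal prices (the two 20000 honda documents) is not meaningful.

```python
from collections import defaultdict

# price/color/make of the eight sample documents ("sold" omitted).
docs = [
    {"price": 10000, "color": "red", "make": "honda"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 30000, "color": "green", "make": "ford"},
    {"price": 15000, "color": "blue", "make": "toyota"},
    {"price": 12000, "color": "green", "make": "toyota"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 80000, "color": "red", "make": "bmw"},
    {"price": 25000, "color": "blue", "make": "ford"},
]


def top_hits_by(docs, group_field, sort_field, size):
    """Group docs by group_field, then keep the `size` docs with the largest sort_field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc)
    return {
        key: sorted(group, key=lambda d: d[sort_field], reverse=True)[:size]
        for key, group in groups.items()
    }


top_data = top_hits_by(docs, "make", "price", 2)
for make, hits in top_data.items():
    print(make, [hit["price"] for hit in hits])
```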

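Bucket aggregations

Bucket aggregations divide the documents matched by the query into logical groups: documents that satisfy a given rule go into the same bucket, and each bucket is associated with a key.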

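This is analogous to GROUP BY in MySQL.

The simplest bucketing strategy is terms, which buckets documents directly by term. On a text field, terms would bucket by the analyzed (tokenized) terms rather than the original values, and this requires fielddata to be enabled on that field.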

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color.keyword"
      }
    },
    "group_by_make": {
      "terms": {
        "field": "make.keyword"
      }
    }
  }
}

Note: without .keyword, the request fails with the following error:

"error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [color] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
...

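Since Elasticsearch 5.x, dynamic mapping maps a string field as a text field with a built-in keyword sub-field (e.g. color.keyword), which can be used for exact-match queries and aggregations.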
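What the terms aggregation computes can be sketched locally in Python as a count per distinct value. The sketch assumes the default ordering (largest doc_count first) and breaks ties by key in ascending order, which matches the bucket order in the responses in this article; size=10 mirrors the terms aggregation's default of returning the top 10 buckets.

```python
from collections import Counter

# color and make of the eight sample documents, in bulk-insert order.
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]
makes = ["honda", "honda", "ford", "toyota", "toyota", "honda", "bmw", "ford"]


def terms(values, size=10):
    """One bucket per distinct value, largest doc_count first, ties broken by key."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


group_by_color = terms(colors)
group_by_make = terms(makes)
print(group_by_color)
print(group_by_make)
```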

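Range, Date Range

  • Range: defines the buckets by specified numeric ranges; each range includes its from value and excludes its to value
  • Date Range: defines the buckets by specified date ranges

Example: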

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "range_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "to": 20000
          },
          {
            "from": 20000,
            "to": 30000
          },
          {
            "from":50000
          }
        ]
      }
    }
  }
}

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "range_price" : {
      "buckets" : [
        {
          "key" : "*-20000.0",
          "to" : 20000.0,
          "doc_count" : 3
        },
        {
          "key" : "20000.0-30000.0",
          "from" : 20000.0,
          "to" : 30000.0,
          "doc_count" : 3
        },
        {
          "key" : "50000.0-*",
          "from" : 50000.0,
          "doc_count" : 1
        }
      ]
    }
  }
}
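The counts follow from the rule that each range includes its from value and excludes its to value. Here is a local Python sketch of that rule on the sample prices; note that the 30000 ford transaction falls into none of the three ranges, because 30000 is excluded from the second range and is below the third.

```python
# Prices of the eight sample documents.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]

# Same ranges as the request; a missing bound means "unbounded".
ranges = [{"to": 20000}, {"from": 20000, "to": 30000}, {"from": 50000}]


def fmt(bound):
    """Format a bound like the keys in the response, e.g. 20000 -> '20000.0'."""
    return "*" if bound is None else str(float(bound))


def range_buckets(values, ranges):
    buckets = []
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        # "from" is inclusive, "to" is exclusive.
        count = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        buckets.append({"key": f"{fmt(low)}-{fmt(high)}", "doc_count": count})
    return buckets


range_price = range_buckets(prices, ranges)
print(range_price)
```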

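Histogram, Date Histogram

  • Histogram: splits the data into buckets of a fixed interval
  • Date Histogram: a histogram over date values, one of the most commonly used aggregations in time-series analysis

Example:

GET /cars/transactions/_search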
{
  "size": 0,
  "aggs": {
    "hist_price": {
      "histogram": {
        "field": "price",
        "interval": 20000, 
        "extended_bounds": 
          {
            "min": 10000,
            "max": 80000
          }
      }
    }
  }
}

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "hist_price" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 3
        },
        {
          "key" : 20000.0,
          "doc_count" : 4
        },
        {
          "key" : 40000.0,
          "doc_count" : 0
        },
        {
          "key" : 60000.0,
          "doc_count" : 0
        },
        {
          "key" : 80000.0,
          "doc_count" : 1
        }
      ]
    }
  }
}
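Each document lands in the bucket whose key is floor(price / interval) × interval, which is why the 10000 transaction is counted under key 0. Empty buckets between the lowest and highest keys are returned as well (the histogram's min_doc_count defaults to 0), and extended_bounds can widen that key range further; with this data it adds nothing, since the prices already span 10000 to 80000. A local Python sketch of those rules, ignoring the offset parameter and assuming an integer interval:

```python
import math

# Prices of the eight sample documents.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]


def histogram(values, interval, extended_bounds=None):
    """Fixed-interval buckets keyed by floor(value / interval) * interval."""
    def key_of(v):
        return math.floor(v / interval) * interval

    counts = {}
    for v in values:
        key = key_of(v)
        counts[key] = counts.get(key, 0) + 1

    low, high = min(counts), max(counts)
    if extended_bounds is not None:
        low = min(low, key_of(extended_bounds["min"]))
        high = max(high, key_of(extended_bounds["max"]))

    # Emit every key from low to high, including empty buckets.
    return [
        {"key": float(key), "doc_count": counts.get(key, 0)}
        for key in range(low, high + interval, interval)
    ]


hist_price = histogram(prices, 20000, extended_bounds={"min": 10000, "max": 80000})
print(hist_price)
```

For example, extended_bounds of {"min": 0, "max": 120000} would add two trailing empty buckets, 100000 and 120000.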

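Bucket + Metric aggregations

A bucket aggregation can contain sub-aggregations that analyze each bucket further. A sub-aggregation can be either a bucket or a metric aggregation, which is what makes Elasticsearch aggregations so powerful.

(1) Bucketing within buckets

GET /cars/transactions/_search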
{
  "size": 0,
  "aggs": {
    "group_by_make": {
      "terms": {
        "field": "make.keyword"
      },
      "aggs": {
        "range_price": {
          "range": {
            "field": "price",
            "ranges": [
              {
                "to": 20000
              },
              {
                "from": 20000,
                "to": 30000
              },
              {
                "from": 50000
              }
            ]
          }
        }
      }
    }
  }
}

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_make" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "honda",
          "doc_count" : 3,
          "range_price" : {
            "buckets" : [
              {
                "key" : "*-20000.0",
                "to" : 20000.0,
                "doc_count" : 1
              },
              {
                "key" : "20000.0-30000.0",
                "from" : 20000.0,
                "to" : 30000.0,
                "doc_count" : 2
              },
              {
                "key" : "50000.0-*",
                "from" : 50000.0,
                "doc_count" : 0
              }
            ]
          }
        },
        {
          "key" : "ford",
          "doc_count" : 2,
          "range_price" : {
            "buckets" : [
              {
                "key" : "*-20000.0",
                "to" : 20000.0,
                "doc_count" : 0
              },
              {
                "key" : "20000.0-30000.0",
                "from" : 20000.0,
                "to" : 30000.0,
                "doc_count" : 1
              },
              {
                "key" : "50000.0-*",
                "from" : 50000.0,
                "doc_count" : 0
              }
            ]
          }
        },
        {
          "key" : "toyota",
          "doc_count" : 2,
          "range_price" : {
            "buckets" : [
              {
                "key" : "*-20000.0",
                "to" : 20000.0,
                "doc_count" : 2
              },
              {
                "key" : "20000.0-30000.0",
                "from" : 20000.0,
                "to" : 30000.0,
                "doc_count" : 0
              },
              {
                "key" : "50000.0-*",
                "from" : 50000.0,
                "doc_count" : 0
              }
            ]
          }
        },
        {
          "key" : "bmw",
          "doc_count" : 1,
          "range_price" : {
            "buckets" : [
              {
                "key" : "*-20000.0",
                "to" : 20000.0,
                "doc_count" : 0
              },
              {
                "key" : "20000.0-30000.0",
                "from" : 20000.0,
                "to" : 30000.0,
                "doc_count" : 0
              },
              {
                "key" : "50000.0-*",
                "from" : 50000.0,
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}
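Locally, the same two-level result is a range count computed separately inside each make bucket. The Python sketch below also shows why ford has doc_count 2 but only one document across its range sub-buckets: its 30000 transaction is not covered by any of the ranges.

```python
from collections import defaultdict

# (price, make) of the eight sample documents.
docs = [(10000, "honda"), (20000, "honda"), (30000, "ford"), (15000, "toyota"),
        (12000, "toyota"), (20000, "honda"), (80000, "bmw"), (25000, "ford")]

# Same ranges as the request, as (from, to); None means unbounded.
ranges = [(None, 20000), (20000, 30000), (50000, None)]


def in_range(value, low, high):
    # "from" is inclusive, "to" is exclusive.
    return (low is None or value >= low) and (high is None or value < high)


# First level: terms on make.
prices_by_make = defaultdict(list)
for price, make in docs:
    prices_by_make[make].append(price)

# Second level: range counts inside each make bucket.
range_price_by_make = {
    make: [sum(1 for p in prices if in_range(p, low, high)) for low, high in ranges]
    for make, prices in prices_by_make.items()
}
print(range_price_by_make)
# {'honda': [1, 2, 0], 'ford': [0, 1, 0], 'toyota': [2, 0, 0], 'bmw': [0, 0, 1]}
```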

(2) Metric analysis within buckets

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "group_by_make": {
      "terms": {
        "field": "make.keyword"
      },
      "aggs": {
        "stats_price":{
          "stats": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_make" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "honda",
          "doc_count" : 3,
          "stats_price" : {
            "count" : 3,
            "min" : 10000.0,
            "max" : 20000.0,
            "avg" : 16666.666666666668,
            "sum" : 50000.0
          }
        },
        {
          "key" : "ford",
          "doc_count" : 2,
          "stats_price" : {
            "count" : 2,
            "min" : 25000.0,
            "max" : 30000.0,
            "avg" : 27500.0,
            "sum" : 55000.0
          }
        },
        {
          "key" : "toyota",
          "doc_count" : 2,
          "stats_price" : {
            "count" : 2,
            "min" : 12000.0,
            "max" : 15000.0,
            "avg" : 13500.0,
            "sum" : 27000.0
          }
        },
        {
          "key" : "bmw",
          "doc_count" : 1,
          "stats_price" : {
            "count" : 1,
            "min" : 80000.0,
            "max" : 80000.0,
            "avg" : 80000.0,
            "sum" : 80000.0
          }
        }
      ]
    }
  }
}
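The per-make statistics above can be reproduced locally with a short Python sketch: group the sample prices by make, then compute the same five stats fields in each group. Python's float division gives the same 16666.666666666668 average for honda as the response.

```python
from collections import defaultdict

# (price, make) of the eight sample documents.
docs = [(10000, "honda"), (20000, "honda"), (30000, "ford"), (15000, "toyota"),
        (12000, "toyota"), (20000, "honda"), (80000, "bmw"), (25000, "ford")]


def stats(values):
    """The five fields returned by the "stats" aggregation."""
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}


# Bucket by make (terms), then run the metric inside each bucket.
prices_by_make = defaultdict(list)
for price, make in docs:
    prices_by_make[make].append(price)

stats_price_by_make = {make: stats(prices) for make, prices in prices_by_make.items()}
for make, s in stats_price_by_make.items():
    print(make, s)
```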

Sorting in aggregations

Group by make and sort the buckets by average price in descending order:

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "group_by_make": {
      "terms": {
        "field": "make.keyword",
        "order": {
          "avg_price": "desc"
        }
      },
      "aggs": {
        "avg_price":{
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 26,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_make" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "bmw",
          "doc_count" : 1,
          "avg_price" : {
            "value" : 80000.0
          }
        },
        {
          "key" : "ford",
          "doc_count" : 2,
          "avg_price" : {
            "value" : 27500.0
          }
        },
        {
          "key" : "honda",
          "doc_count" : 3,
          "avg_price" : {
            "value" : 16666.666666666668
          }
        },
        {
          "key" : "toyota",
          "doc_count" : 2,
          "avg_price" : {
            "value" : 13500.0
          }
        }
      ]
    }
  }
}
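Ordering buckets by a sub-aggregation amounts to computing the metric in each bucket and sorting the buckets on it. A local Python sketch on the sample data follows; besides a sub-aggregation name, the terms order option also accepts "_count" and "_key".

```python
from collections import defaultdict

# (price, make) of the eight sample documents.
docs = [(10000, "honda"), (20000, "honda"), (30000, "ford"), (15000, "toyota"),
        (12000, "toyota"), (20000, "honda"), (80000, "bmw"), (25000, "ford")]

prices_by_make = defaultdict(list)
for price, make in docs:
    prices_by_make[make].append(price)

# Sub-aggregation "avg_price" computed per bucket.
avg_price = {make: sum(p) / len(p) for make, p in prices_by_make.items()}

# Equivalent of "order": {"avg_price": "desc"}.
group_by_make = sorted(avg_price.items(), key=lambda kv: kv[1], reverse=True)
for make, avg in group_by_make:
    print(make, avg)
```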

 
