Learning Elasticsearch: Aggregation Queries with Bucket Aggregations

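Preface

Besides search, Elasticsearch also provides features for statistical analysis of data, and its APIs can be combined into complex queries. Each type of aggregation has its own purpose and output. To make them easier to understand, they are usually grouped into three broad categories.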

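The three categories of aggregations

Bucketing (bucket aggregations)

Each bucket is associated with a key and a document criterion. A bucket aggregation returns a list of buckets, and each bucket is the set of documents that meet its criterion.

Metric (metric aggregations)

Aggregations that compute metrics over a set of documents.

Pipeline (pipeline aggregations)

Aggregations that take the output of other aggregations, or their associated metrics, and aggregate it a second time.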

Bucket Aggregations

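A bucket is similar to a GROUP BY in a database: documents that meet the same criterion are placed in one group. Elasticsearch provides many kinds of bucket aggregations, for example range, geo-based ones, sampler, and terms.

Let's look at a few practical examples.

Terms Aggregation

The request below searches the kibana_sample_data_flights index and aggregates on DestCountry. The aggregation is named flight_dest, and size limits the result to the top 5 buckets.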

GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "DestCountry",
        "size": 5
      }
    }
  }
}

The response lists the matching documents first and the flight_dest buckets last, under aggregations.
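
To make the grouping concrete, here is a minimal Python sketch of what a terms aggregation computes. It runs locally on a made-up list of documents rather than against Elasticsearch, the helper name terms_buckets is hypothetical, and breaking ties by key is an assumption of this sketch.

```python
from collections import Counter

def terms_buckets(docs, field, size):
    """Mimic a terms aggregation: count documents per distinct field value."""
    counts = Counter(doc[field] for doc in docs if doc.get(field) is not None)
    # Largest doc_count first; ties are broken by key (assumption of this sketch).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": n} for k, n in ordered[:size]]

# Hypothetical sample data, not taken from kibana_sample_data_flights.
flights = [{"DestCountry": c} for c in ["IT", "US", "IT", "CN", "US", "IT", "AU"]]
top = terms_buckets(flights, "DestCountry", size=2)
print(top)
```

With size=2 only the two most frequent countries are returned, which is the role that "size": 5 plays in the request above.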

Range Aggregation

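Split the documents into three tiers by AvgTicketPrice: below 500, from 500 up to (but not including) 1000, and 1000 or above. In each range, from is inclusive and to is exclusive.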

GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "AvgTicketPrice",
        "ranges": [
          {
            "to": 500
          },
          {
            "from": 500,
            "to": 1000
          },
          {
            "from": 1000
          }
        ]
      }
    }
  }
}

The response contains one bucket per range, each with its doc_count.
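
The boundary behaviour is easy to get wrong, so here is a small local sketch of how values are assigned to ranges (from inclusive, to exclusive). The function range_buckets and the sample prices are hypothetical; only the three tiers come from the request above.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        buckets.append({"from": lo, "to": hi, "doc_count": count})
    return buckets

# Same three tiers as the request above; the prices are made-up sample values.
tiers = [{"to": 500}, {"from": 500, "to": 1000}, {"from": 1000}]
prices = [499.99, 500, 750.5, 1000, 1200]
result = range_buckets(prices, tiers)
print([b["doc_count"] for b in result])
```

A price of exactly 500 lands in the middle tier and a price of exactly 1000 lands in the top tier.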

The keys in the aggregation result can also be given custom names, for example by setting a key on each range.
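
As a rough sketch of one way to name the buckets, the Python snippet below builds a request body that sets key on every range. The names cheap, average and expensive are illustrative choices, "size": 0 is an addition that suppresses the document hits, and the original request may have looked different.

```python
import json

# Request body for a range aggregation whose buckets carry custom keys.
# The key names below are illustrative, not taken from the original article.
body = {
    "size": 0,  # addition in this sketch: return only the aggregation, no hits
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "AvgTicketPrice",
                "ranges": [
                    {"key": "cheap", "to": 500},
                    {"key": "average", "from": 500, "to": 1000},
                    {"key": "expensive", "from": 1000},
                ],
            }
        }
    },
}

# Paste the printed JSON after: GET /kibana_sample_data_flights/_search
print(json.dumps(body, indent=2))
```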

To group only the flights whose destination is IT into the three price tiers, combine a query on DestCountry with the range aggregation.
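
A request of this shape could be sketched as follows. The term query is an assumption (the original may have used a different query type), and the aggregation name price_ranges is reused from the earlier example.

```python
import json

# The query narrows the document set first; the aggregation then runs
# only over the documents that match it.
body = {
    "query": {"term": {"DestCountry": "IT"}},  # assumed query type
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "AvgTicketPrice",
                "ranges": [
                    {"to": 500},
                    {"from": 500, "to": 1000},
                    {"from": 1000},
                ],
            }
        }
    },
}

# Paste the printed JSON after: GET /kibana_sample_data_flights/_search
print(json.dumps(body, indent=2))
```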

Date Range Aggregation

An aggregation over date ranges.

GET /user_info_2/_search
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "update_date",
        
        "ranges": [
          {
            "to": "2020-05-01 00:00:00"
          },
          {
            "from": "2020-05-02 00:00:00",
            "to": "2020-08-01 00:00:00"
          },
          {
            "from": "2020-08-02 00:00:00"
          }
        ]
      }
    }
  }
}

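Query result. Each to bound is exclusive and the ranges leave one-day gaps, so documents 8 (2020-05-01) and 9 (2020-08-01) fall into no bucket: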

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "8",
        "_score": 1,
        "_source": {
          "age": "20",
          "update_date": "2020-05-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "9",
        "_score": 1,
        "_source": {
          "name": "赵六",
          "update_date": "2020-08-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "10",
        "_score": 1,
        "_source": {
          "age": null,
          "update_date": "2020-11-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "李四",
          "age": 29,
          "address": "中国南京市建邺区",
          "tel": "13901234568",
          "update_date": "2020-01-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "update_date": "2020-01-01 00:00:00"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "王五",
          "age": 30,
          "address": "中国北京市朝阳区",
          "tel": "13901234567",
          "update_date": "2020-03-01 00:00:00"
        }
      }
    ]
  },
  "aggregations": {
    "range": {
      "buckets": [
        {
          "key": "*-2020-05-01 00:00:00",
          "to": 1588291200000,
          "to_as_string": "2020-05-01 00:00:00",
          "doc_count": 3
        },
        {
          "key": "2020-05-02 00:00:00-2020-08-01 00:00:00",
          "from": 1588377600000,
          "from_as_string": "2020-05-02 00:00:00",
          "to": 1596240000000,
          "to_as_string": "2020-08-01 00:00:00",
          "doc_count": 0
        },
        {
          "key": "2020-08-02 00:00:00-*",
          "from": 1596326400000,
          "from_as_string": "2020-08-02 00:00:00",
          "doc_count": 1
        }
      ]
    }
  }
}
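
The bucket counts in this response can be checked offline. The sketch below re-applies the three ranges to the update_date values listed in the hits, treating from as inclusive and to as exclusive. The helper names parse and in_range are hypothetical, and time zones are ignored.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def parse(s):
    return datetime.strptime(s, FMT)

# update_date of the six documents in the response above, keyed by _id.
update_dates = {
    "8": "2020-05-01 00:00:00",
    "9": "2020-08-01 00:00:00",
    "10": "2020-11-01 00:00:00",
    "2": "2020-01-01 00:00:00",
    "1": "2020-01-01 00:00:00",
    "3": "2020-03-01 00:00:00",
}
ranges = [
    {"to": "2020-05-01 00:00:00"},
    {"from": "2020-05-02 00:00:00", "to": "2020-08-01 00:00:00"},
    {"from": "2020-08-02 00:00:00"},
]

def in_range(value, r):
    """`from` is inclusive, `to` is exclusive."""
    lo, hi = r.get("from"), r.get("to")
    return (lo is None or value >= parse(lo)) and (hi is None or value < parse(hi))

doc_counts = [
    sum(in_range(parse(d), r) for d in update_dates.values()) for r in ranges
]
unbucketed = sorted(
    (doc_id for doc_id, d in update_dates.items()
     if not any(in_range(parse(d), r) for r in ranges)),
    key=int,
)
print(doc_counts)   # [3, 0, 1]
print(unbucketed)   # ['8', '9']
```

Only four of the six documents are counted. If every document should land in a bucket, the from of each range would need to equal the to of the previous one.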

Filter Aggregation

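Runs an aggregation over the set of documents that pass a filter.

The request below restricts the aggregation to documents whose DestCountry is AU and computes the average of DistanceMiles over them.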

GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "flight_Miles": {
      "filter": {
        "term": {
          "DestCountry": "AU"
        }
      },
      "aggs": {
        "avg_miles": {
          "avg": {
            "field": "DistanceMiles"
          }
        }
      }
    }
  }
}

The response reports the number of documents that passed the filter as the doc_count of flight_Miles, together with the avg_miles value.
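
The two steps, filter first and then compute the metric, can be illustrated with a small local sketch. The function filter_avg and the sample flights are hypothetical; the returned dictionary only imitates the shape of the flight_Miles entry in the response.

```python
def filter_avg(docs, filter_field, filter_value, metric_field):
    """Keep documents matching the filter, then average one of their fields."""
    matched = [d for d in docs if d.get(filter_field) == filter_value]
    values = [d[metric_field] for d in matched if d.get(metric_field) is not None]
    avg = sum(values) / len(values) if values else None
    return {"doc_count": len(matched), "avg_miles": {"value": avg}}

# Made-up sample documents, not taken from kibana_sample_data_flights.
flights = [
    {"DestCountry": "AU", "DistanceMiles": 1000.0},
    {"DestCountry": "AU", "DistanceMiles": 3000.0},
    {"DestCountry": "IT", "DistanceMiles": 500.0},
]
result = filter_avg(flights, "DestCountry", "AU", "DistanceMiles")
print(result)
```

The IT flight never reaches the average, because the sub-aggregation only sees documents inside the filter bucket.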

Missing Aggregation

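Counts the documents in which a field is missing. A field whose value is null also counts as missing.

In the user_info_2 index, count the documents that have no age: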

GET /user_info_2/_search
{
  "aggs": {
    "without_age": {
      "missing": {
        "field": "age"
      }
    }
  }
}

The count is 2: one document has no age field at all, and one has age set to null.

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "9",
        "_score": 1,
        "_source": {
          "name": "赵六"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "8",
        "_score": 1,
        "_source": {
          "age": "20"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "10",
        "_score": 1,
        "_source": {
          "age": null
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "李四",
          "age": 29,
          "address": "中国南京市建邺区",
          "tel": "13901234568"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "张三",
          "age": 28,
          "address": "中国南京市鼓楼区",
          "tel": "13901234567"
        }
      },
      {
        "_index": "user_info_2",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "王五",
          "age": 30,
          "address": "中国北京市朝阳区",
          "tel": "13901234567"
        }
      }
    ]
  },
  "aggregations": {
    "without_age": {
      "doc_count": 2
    }
  }
}
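
The count of 2 can be reproduced from the _source values in this response. In the minimal sketch below, missing_count is a hypothetical helper that treats an absent field and an explicit null (None in Python) the same way; fields other than name and age are omitted.

```python
def missing_count(docs, field):
    """Count documents where the field is absent or explicitly null."""
    return sum(1 for d in docs if d.get(field) is None)

# name and age of the six documents in the response above.
sources = [
    {"name": "赵六"},                # _id 9: no age field
    {"age": "20"},                   # _id 8
    {"age": None},                   # _id 10: age is null
    {"name": "李四", "age": 29},     # _id 2
    {"name": "张三", "age": 28},     # _id 1
    {"name": "王五", "age": 30},     # _id 3
]
without_age = missing_count(sources, "age")
print(without_age)  # 2
```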

Histogram Aggregation

The histogram aggregation groups documents into fixed-width intervals and counts the documents in each one.

GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "test": {
      "histogram": {
        "field": "AvgTicketPrice",
        "interval": 100
      }
    }
  }
}

The response contains one bucket per 100-wide price interval, keyed by the lower bound of the interval.
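
Each value is assigned to a bucket by rounding it down to a multiple of the interval. The sketch below applies that rule, key = floor((value - offset) / interval) * interval + offset, to a few made-up prices. The helper histogram_buckets is hypothetical, and unlike Elasticsearch's default output it does not fill the gaps between populated buckets with empty ones.

```python
import math
from collections import Counter

def histogram_buckets(values, interval, offset=0.0):
    """Group values into fixed-width buckets keyed by their lower bound."""
    keys = Counter(
        math.floor((v - offset) / interval) * interval + offset for v in values
    )
    # Only populated buckets are returned (simplification of this sketch).
    return [{"key": k, "doc_count": n} for k, n in sorted(keys.items())]

# Made-up ticket prices, bucketed with the same interval as the request above.
prices = [99.99, 100.0, 150.25, 199.99, 420.0]
buckets = histogram_buckets(prices, interval=100)
print(buckets)
```

A price of 99.99 belongs to the bucket with key 0, while 100.0 already belongs to the bucket with key 100.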
