ES Study Notes 4: Filtering and Aggregations in Elasticsearch

6. Filtering

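 So far we have ignored the _score field of each hit and the hits.max_score field in search responses. Both are relative measures of how well a document matches the search query: the higher the score, the better the match. Queries do not always need to produce scores, though, especially when they are only used to filter the document set. Elasticsearch detects these cases and automatically optimizes query execution so that useless scores are not computed. The bool query accepts a filter clause for this purpose, and a range query is a typical clause to put inside it, for example: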

// Keep only documents with 20000 <= balance <= 30000
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
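The request above can also be built programmatically. The following is a minimal Python sketch, not part of the original notes: the helper names balance_range_query and apply_range_filter are hypothetical, and the second function only emulates the inclusive gte/lte bounds on an in-memory list instead of calling Elasticsearch.

```python
import json


def balance_range_query(low, high):
    """Build the bool query body shown above (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                # must runs in query context and contributes to _score
                "must": {"match_all": {}},
                # filter runs in filter context: yes/no match, no scoring
                "filter": {"range": {"balance": {"gte": low, "lte": high}}},
            }
        }
    }


def apply_range_filter(docs, low, high):
    """Local emulation of the range filter: both bounds are inclusive."""
    return [doc for doc in docs if low <= doc["balance"] <= high]


body = balance_range_query(20000, 30000)
print(json.dumps(body, indent=2))

# Made-up sample accounts, only for illustrating the bounds.
sample = [{"balance": 19999}, {"balance": 20000}, {"balance": 30000}, {"balance": 30001}]
kept = apply_range_filter(sample, 20000, 30000)
print(kept)
```

Sending the body to a cluster requires an HTTP client or the official Elasticsearch client, which is left out here on purpose.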

7. Aggregations

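 Aggregations let you group data and compute statistics over it (roughly comparable to GROUP BY and aggregate functions in SQL). When Elasticsearch runs a search, the response keeps the aggregation results separate from the search hits (the actual hits array). You can run a query and several aggregations in one request and get both results (or either one) back in a single response, which keeps the API concise and avoids extra network round trips.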

 The following request groups the accounts by the state field:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

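Here aggs specifies the aggregations. To make the aggregation results easier to read, "size": 0 tells Elasticsearch to return no documents in the hits array. A terms aggregation returns the top 10 buckets by default. The response is: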

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    // the hits array is empty because of "size": 0
    "hits": []
  },
  "aggregations": {
    "group_by_state": {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets": [
        {
          "key": "ID",
          "doc_count": 27
        },
        // 8 buckets omitted...
        {
          "key": "MO",
          "doc_count": 20
        }
      ]
    }
  }
}
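To make the buckets and sum_other_doc_count fields more concrete, here is a small Python sketch that emulates a terms aggregation on an in-memory list. It is an illustration under assumptions, not how Elasticsearch is implemented: the function name terms_aggregation is hypothetical, ties are broken by key in ascending order as an assumption of this sketch, and doc_count_error_upper_bound is not modeled because it comes from merging per-shard results.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Count documents per distinct value and keep the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Highest doc_count first; ties broken by key (assumption of this sketch).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {
        # Documents that fell into buckets beyond the top `size`.
        "sum_other_doc_count": sum(count for _, count in ordered[size:]),
        "buckets": [{"key": key, "doc_count": count} for key, count in ordered[:size]],
    }


# Made-up sample documents.
docs = [
    {"state": "ID"}, {"state": "ID"}, {"state": "ID"},
    {"state": "MO"}, {"state": "MO"},
    {"state": "TX"},
]
result = terms_aggregation(docs, "state", size=2)
print(result)
```

With size=2 the single TX document is not returned as a bucket and is only counted in sum_other_doc_count, which mirrors the 770 documents outside the top 10 states in the response above.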

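In the Elasticsearch response, aggregations holds the aggregation results (note that it is separate from the search hits in hits). For example, 27 users live in the state "ID" (Idaho). As another example, the following request computes the average account balance per state: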

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

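The request above nests an average_balance aggregation inside the group_by_state aggregation. Nesting like this is very common, and aggregations can be nested arbitrarily to extract the summaries you need. The aggregation part of the response is shown below (unrelated parts omitted):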

  "aggregations": {
    "group_by_state": {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets": [
        {
          "key": "ID",
          "doc_count": 27,
          "average_balance": {
            "value": 24368.777777777777
          }
        },
        // 8 buckets omitted
        {
          "key": "MO",
          "doc_count": 20,
          "average_balance": {
            "value": 24151.8
          }
        }
      ]
    }
  }
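The nesting can be read as "for each state bucket, compute a metric over the documents in that bucket". A minimal Python sketch of that idea on made-up data follows. The function name avg_balance_by_state is hypothetical, and the tie-break on key is an assumption of the sketch.

```python
from collections import defaultdict


def avg_balance_by_state(docs):
    """Group documents by state, then compute the average balance per group."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc["state"]].append(doc["balance"])
    buckets = [
        {
            "key": state,
            "doc_count": len(values),
            # The nested metric is evaluated only over this bucket's documents.
            "average_balance": {"value": sum(values) / len(values)},
        }
        for state, values in balances.items()
    ]
    # Same ordering as a plain terms aggregation: largest bucket first.
    buckets.sort(key=lambda bucket: (-bucket["doc_count"], bucket["key"]))
    return buckets


# Made-up sample documents.
docs = [
    {"state": "ID", "balance": 20000},
    {"state": "ID", "balance": 30000},
    {"state": "MO", "balance": 24000},
]
buckets = avg_balance_by_state(docs)
print(buckets)
```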

To sort the states by average account balance in descending order:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
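One consequence of ordering by a nested metric is that the returned buckets are the states with the highest averages, not the states with the most documents. The Python sketch below illustrates this on made-up data. The function name top_states_by_average is hypothetical and the code only emulates the ordering locally.

```python
from collections import defaultdict
from statistics import mean


def top_states_by_average(docs, size=10):
    """Rank states by average balance, highest first, and keep `size` of them."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["state"]].append(doc["balance"])
    averages = {state: mean(values) for state, values in grouped.items()}
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:size]


# Made-up sample documents: ID has the most documents but the lowest average.
docs = [
    {"state": "ID", "balance": 10000},
    {"state": "ID", "balance": 20000},
    {"state": "ID", "balance": 30000},
    {"state": "MO", "balance": 40000},
    {"state": "TX", "balance": 25000},
]
top = top_states_by_average(docs, size=2)
print(top)
```

Here ID is dropped from the top 2 even though it is the largest bucket by document count.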

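The following request first buckets the accounts by age range (20-29, 30-39, 40-49), then groups each range by gender, and finally computes the average account balance for each combination of age range and gender: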

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

The aggregation part of the response is:

  "aggregations": {
    "group_by_age": {
      "buckets": [
        {
          "key": "20.0-30.0",
          "from": 20,
          "to": 30,
          "doc_count": 451,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 232,
                "average_balance": {
                  "value": 27374.05172413793
                }
              },
              {
                "key": "F",
                "doc_count": 219,
                "average_balance": {
                  "value": 25341.260273972603
                }
              }
            ]
          }
        },
        {
          "key": "30.0-40.0",
          "from": 30,
          "to": 40,
          "doc_count": 504,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "F",
                "doc_count": 253,
                "average_balance": {
                  "value": 25670.869565217392
                }
              },
              {
                "key": "M",
                "doc_count": 251,
                "average_balance": {
                  "value": 24288.239043824702
                }
              }
            ]
          }
        },
        {
          "key": "40.0-50.0",
          "from": 40,
          "to": 50,
          "doc_count": 45,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 24,
                "average_balance": {
                  "value": 26474.958333333332
                }
              },
              {
                "key": "F",
                "doc_count": 21,
                "average_balance": {
                  "value": 27992.571428571428
                }
              }
            ]
          }
        }
      ]
    }
  }
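The three levels above can be emulated locally to see how the buckets are formed. In the Python sketch below, each range includes its from value and excludes its to value, which is why an age of 30 lands in the 30-40 bucket. The function name range_then_gender and the sample data are made up for illustration, and the output shape is simplified compared to the real response.

```python
def range_then_gender(docs, ranges):
    """Bucket by age range [low, high), then by gender, then average the balances."""
    result = []
    for low, high in ranges:
        in_range = [doc for doc in docs if low <= doc["age"] < high]
        balances_by_gender = {}
        for doc in in_range:
            balances_by_gender.setdefault(doc["gender"], []).append(doc["balance"])
        result.append({
            "key": f"{float(low)}-{float(high)}",
            "doc_count": len(in_range),
            # Simplified: gender -> average balance within this age range.
            "group_by_gender": {
                gender: sum(values) / len(values)
                for gender, values in balances_by_gender.items()
            },
        })
    return result


# Made-up sample documents.
docs = [
    {"age": 29, "gender": "M", "balance": 1000},
    {"age": 30, "gender": "F", "balance": 2000},
    {"age": 30, "gender": "F", "balance": 4000},
    {"age": 50, "gender": "M", "balance": 9000},  # outside every range
]
age_buckets = range_then_gender(docs, [(20, 30), (30, 40), (40, 50)])
print(age_buckets)
```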

Beyond the aggregations shown here, Elasticsearch offers many more that are worth exploring.
