26. Elasticsearch in Depth: Aggregation-Based Data Analysis

1. Two core concepts: bucket and metric
bucket: a group of documents
metric: an aggregate computation run over the documents in a bucket, such as the average, the maximum or the minimum
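To make the two-step idea concrete, here is a minimal pure-Python sketch that first buckets documents by a field and then reduces each bucket to a single number with a metric. It does not talk to Elasticsearch. The documents and the helpers `group_into_buckets` and `apply_metric` are hypothetical illustrations.

```python
# Hypothetical sample documents (not the real tvs/sales data).
docs = [
    {"color": "red", "brand": "changhong", "price": 1000},
    {"color": "red", "brand": "tcl", "price": 2000},
    {"color": "green", "brand": "samsung", "price": 3000},
    {"color": "red", "brand": "changhong", "price": 3000},
    {"color": "green", "brand": "tcl", "price": 1200},
]

def group_into_buckets(documents, field):
    """Bucket step: split documents into groups keyed by a field value."""
    buckets = {}
    for doc in documents:
        buckets.setdefault(doc[field], []).append(doc)
    return buckets

def apply_metric(buckets, metric, field):
    """Metric step: reduce the field values of each bucket to one number."""
    return {key: metric([d[field] for d in group]) for key, group in buckets.items()}

def avg(values):
    return sum(values) / len(values)

buckets = group_into_buckets(docs, "color")
print(apply_metric(buckets, avg, "price"))  # {'red': 2000.0, 'green': 2100.0}
print(apply_metric(buckets, max, "price"))  # {'red': 3000, 'green': 3000}
```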
2. Introduction to aggregations and drill-down analysis
Find out which TV color sells best (number of sales per color):

GET /tvs/sales/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            }
        }
    }
}

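size: 0 means only the aggregation results are returned, not the raw documents the aggregation ran over (hits stays empty).
aggs: fixed keyword that declares the aggregations (grouping operations) to run on the data.
popular_colors: every aggregation must be given a name. The name is arbitrary, so call it whatever you like. It is the key under which the result appears in the response's aggregations object.
terms: groups documents by the values of a field.
field: the field whose values are used for grouping.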
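A rough pure-Python sketch of what the terms aggregation above computes: one bucket per distinct value of the field, each with a doc_count. The ordering (doc_count descending, ties broken by key ascending) and the default of 10 buckets mirror the terms aggregation's usual defaults. The real aggregation is also distributed across shards, which this sketch ignores. The data and the `terms_agg` helper are hypothetical.

```python
from collections import Counter

# Hypothetical sales documents; only the "color" field matters here.
docs = [{"color": c} for c in ["red", "green", "red", "blue", "red", "green", "blue", "red"]]

def terms_agg(documents, field, size=10):
    """Single-shard sketch of a terms aggregation."""
    counts = Counter(doc[field] for doc in documents)
    # Most documents first; equal counts are ordered by key.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": n} for k, n in ordered[:size]]

for bucket in terms_agg(docs, "color"):
    print(bucket)
```

With this data, red (4 documents) comes first, followed by blue and green (2 each, ordered by key).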

GET /tvs/sales/_search
{
   "size" : 0,
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": { 
            "avg_price": { 
               "avg": {
                  "field": "price" 
               }
            }
         }
      }
   }
}

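Inside the second, nested aggs, the aggregation is again given a name (avg_price) and runs a metric, avg. This metric computes the average of the specified field (price) over the documents in each bucket produced by the outer terms aggregation. A sample response: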

{
  "took": 28,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "红色",
          "doc_count": 4,
          "avg_price": {
            "value": 3250
          }
        },
        {
          "key": "绿色",
          "doc_count": 2,
          "avg_price": {
            "value": 2100
          }
        }
      ]
    }
  }
}
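The nested terms + avg request above can be mimicked in plain Python to see how each bucket ends up carrying its own avg_price. This is a sketch on hypothetical data. The `terms_with_avg` helper is illustrative and simply reproduces the response shape (key, doc_count, avg_price.value).

```python
from collections import defaultdict

# Hypothetical sample documents (not the real tvs/sales data).
docs = [
    {"color": "red", "price": 1000},
    {"color": "red", "price": 2000},
    {"color": "green", "price": 3000},
    {"color": "red", "price": 3000},
    {"color": "green", "price": 1200},
]

def terms_with_avg(documents, group_field, metric_field):
    """Outer terms buckets, each carrying a nested avg metric."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [
        {
            "key": key,
            "doc_count": len(values),
            "avg_price": {"value": sum(values) / len(values)},
        }
        for key, values in groups.items()
    ]
    # Most documents first, ties broken by key.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return {"colors": {"buckets": buckets}}

for bucket in terms_with_avg(docs, "color", "price")["colors"]["buckets"]:
    print(bucket)
```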

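After computing how many TVs of each color were sold and their average price, drill down one level further: within each color, find out which brands were sold and what each brand's average price is.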

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "group_brand":{
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "brand_avg": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
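The two-level drill-down (color, then brand inside each color, with an average at each level) can be sketched in plain Python as nested grouping. This is an illustration on hypothetical data, not the Elasticsearch implementation. The result keys mirror the aggregation names in the request (avg_price, group_brand, brand_avg).

```python
from collections import defaultdict

# Hypothetical sample documents.
docs = [
    {"color": "red", "brand": "changhong", "price": 1000},
    {"color": "red", "brand": "changhong", "price": 3000},
    {"color": "red", "brand": "tcl", "price": 2000},
    {"color": "green", "brand": "tcl", "price": 1200},
    {"color": "green", "brand": "samsung", "price": 3000},
]

def avg(values):
    return sum(values) / len(values)

def drill_down(documents):
    """group_color -> avg_price plus group_brand -> brand_avg."""
    by_color = defaultdict(list)
    for doc in documents:
        by_color[doc["color"]].append(doc)
    result = {}
    for color, color_docs in by_color.items():
        # Second-level buckets are built only from this color's documents.
        by_brand = defaultdict(list)
        for doc in color_docs:
            by_brand[doc["brand"]].append(doc["price"])
        result[color] = {
            "doc_count": len(color_docs),
            "avg_price": avg([d["price"] for d in color_docs]),
            "group_brand": {
                brand: {"doc_count": len(prices), "brand_avg": avg(prices)}
                for brand, prices in by_brand.items()
            },
        }
    return result

report = drill_down(docs)
print(report["red"])
```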

3. Using sum, max, min and avg

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "max_price": { "max": { "field": "price" } },
        "min_price": { "min": { "field": "price" } },
        "sum_price": { "sum": { "field": "price" } }
      }
    }
  }
}
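The four sibling metrics are all computed independently over the same bucket. A small pure-Python equivalent on hypothetical data follows. Elasticsearch also offers a `stats` aggregation that returns count, min, max, avg and sum in one go. The `color_price_metrics` helper is only an illustration.

```python
from collections import defaultdict

# Hypothetical sample documents.
docs = [
    {"color": "red", "price": 1000},
    {"color": "red", "price": 2000},
    {"color": "red", "price": 3000},
    {"color": "green", "price": 1200},
    {"color": "green", "price": 3000},
]

def color_price_metrics(documents):
    """Per color bucket: avg, max, min and sum of price."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["color"]].append(doc["price"])
    return {
        color: {
            "avg_price": sum(p) / len(p),
            "max_price": max(p),
            "min_price": min(p),
            "sum_price": sum(p),
        }
        for color, p in groups.items()
    }

print(color_price_metrics(docs)["green"])
# {'avg_price': 2100.0, 'max_price': 3000, 'min_price': 1200, 'sum_price': 4200}
```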

4. histogram: bucketing by fixed-size value ranges specified with interval

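Divide the values into ranges, e.g. 0~2000, 2000~4000, ..., with each range becoming a bucket.
histogram: like terms, it is a bucket aggregation. It takes a field and puts each document into the bucket for the interval range that the field's value falls into.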
"histogram": {
  "field": "price",
  "interval": 2000
}
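Elasticsearch documents the histogram bucket key as floor((value - offset) / interval) * interval + offset, so with interval 2000 a price of 1999 lands in bucket 0 and 2000 lands in bucket 2000 (ranges are [key, key + interval)). A pure-Python sketch on hypothetical prices follows. It returns only non-empty buckets, whereas Elasticsearch may also return empty ones (such as 6000 here) depending on min_doc_count. The `histogram_key` and `histogram` helpers are illustrative.

```python
import math
from collections import Counter

prices = [1000, 1999, 2000, 2500, 3000, 4500, 8000]  # hypothetical

def histogram_key(value, interval, offset=0):
    """Lower bound of the bucket a value falls into: [key, key + interval)."""
    return math.floor((value - offset) / interval) * interval + offset

def histogram(values, interval):
    counts = Counter(histogram_key(v, interval) for v in values)
    return [{"key": k, "doc_count": counts[k]} for k in sorted(counts)]

for bucket in histogram(prices, 2000):
    print(bucket)
```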

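order: sorts the buckets by the specified aggregation. It is placed in the bucket aggregation one level above the aggregation you sort by (below, it sits in histogram and sorts by the sub-aggregation sum_price).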

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "his_price": {
      "histogram": {
        "field": "price",
        "interval": 2000,
        "order": {
          "sum_price": "asc"
        }
      },
      "aggs": {
        "sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
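Ordering by a sub-aggregation means each bucket's metric is computed first and the buckets are then sorted by that value instead of by key. A pure-Python sketch with hypothetical prices follows. The `histogram_ordered_by_sum` helper and its `direction` argument are illustrative, and the data avoids ties so the order is unambiguous.

```python
import math
from collections import defaultdict

prices = [1000, 1500, 2600, 4200, 4800, 6100]  # hypothetical

def histogram_ordered_by_sum(values, interval, direction="asc"):
    """Histogram buckets with a sum_price sub-metric, sorted by that sum."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for v in values:
        key = math.floor(v / interval) * interval
        sums[key] += v
        counts[key] += 1
    buckets = [
        {"key": k, "doc_count": counts[k], "sum_price": {"value": sums[k]}}
        for k in sums
    ]
    buckets.sort(key=lambda b: b["sum_price"]["value"], reverse=(direction == "desc"))
    return buckets

print([b["key"] for b in histogram_ordered_by_sum(prices, 2000)])  # [0, 2000, 6000, 4000]
```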

5. date histogram: splits documents into buckets by a date field we specify and a date interval, i.e. one bucket per fixed period of time

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_date": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2016-01-01",
          "max": "2017-12-31"
        }
      }
    }
  }
}

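min_doc_count: with 0, a date interval is returned even if it contains no documents at all (for example, 2017-01-01 ~ 2017-01-31 with no sales still gets a bucket with doc_count 0), so it is not filtered out. With 1, such empty buckets are filtered out.
extended_bounds (min, max): forces buckets to be created across at least the range from min to max, so that empty periods at the start and end of that range are also returned when min_doc_count is 0. It only extends the range and does not filter: documents dated outside min/max still get their own buckets.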
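A pure-Python sketch of monthly bucketing with min_doc_count and extended_bounds, on hypothetical dates. It assumes plain calendar months and ignores time zones, which Elasticsearch does handle. The `monthly_histogram` helper and its `bounds` tuple (standing in for extended_bounds) are illustrative.

```python
from collections import Counter
from datetime import date

# Hypothetical sold_date values.
sold_dates = [date(2016, 5, 12), date(2016, 5, 30), date(2016, 7, 1), date(2017, 1, 20)]

def month_start(d):
    return date(d.year, d.month, 1)

def next_month(d):
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)

def monthly_histogram(dates, min_doc_count=0, bounds=None):
    """Monthly buckets; bounds=(min, max) plays the role of extended_bounds."""
    counts = Counter(month_start(d) for d in dates)
    edges = list(counts)
    if bounds is not None:
        # Only widens the range; buckets that contain data are never dropped.
        edges += [month_start(bounds[0]), month_start(bounds[1])]
    if not edges:
        return []
    current, last = min(edges), max(edges)
    buckets = []
    while current <= last:
        n = counts.get(current, 0)
        if n >= min_doc_count:
            buckets.append({"key_as_string": current.isoformat(), "doc_count": n})
        current = next_month(current)
    return buckets

full = monthly_histogram(sold_dates, 0, (date(2016, 1, 1), date(2017, 12, 31)))
print(len(full))                            # 24 monthly buckets, most of them empty
print(monthly_histogram(sold_dates, 1))     # only the 3 non-empty months
```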
6. Analysis: sales revenue of each brand per quarter

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_date": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count": 1,
        "extended_bounds": {
          "min": "2016-01-01",
          "max": "2017-12-31"
        }
      },
      "aggs": {
        "group_brand": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "sum_price": {
              "sum": {
                "field": "price"
              }
            }
          }
        },
        "sum_quarter":{
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

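quarter: the interval "quarter" buckets by calendar quarter. Within each quarter, group_brand > sum_price gives each brand's revenue, and sum_quarter gives the quarter's total across all brands. Because min_doc_count is 1, quarters without sales are dropped, so extended_bounds has no visible effect here (it only matters with min_doc_count 0).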
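A pure-Python sketch of the quarterly report on hypothetical sales, where a quarter's key is its first day. Only quarters that contain sales appear, which matches min_doc_count 1. The `quarter_start` and `quarterly_brand_sales` helpers are illustrative.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales documents.
sales = [
    {"sold_date": date(2016, 5, 1), "brand": "changhong", "price": 1000},
    {"sold_date": date(2016, 6, 20), "brand": "tcl", "price": 2000},
    {"sold_date": date(2016, 8, 3), "brand": "changhong", "price": 3000},
    {"sold_date": date(2016, 9, 9), "brand": "changhong", "price": 1500},
]

def quarter_start(d):
    """First day of the calendar quarter containing d."""
    return date(d.year, (d.month - 1) // 3 * 3 + 1, 1)

def quarterly_brand_sales(docs):
    """Per quarter: total revenue (sum_quarter) and revenue per brand."""
    report = defaultdict(lambda: {"sum_quarter": 0, "group_brand": defaultdict(int)})
    for doc in docs:
        q = report[quarter_start(doc["sold_date"]).isoformat()]
        q["sum_quarter"] += doc["price"]
        q["group_brand"][doc["brand"]] += doc["price"]
    return report

for quarter, data in sorted(quarterly_brand_sales(sales).items()):
    print(quarter, data["sum_quarter"], dict(data["group_brand"]))
```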
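7. Comparing a single brand's average price with that of all brands
The query restricts the search to the brand 长虹, so single_avg is computed over that brand's documents only. The global aggregation ignores the query and runs its sub-aggregation (all_avg) over every document in /tvs/sales, which gives the baseline to compare against.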

GET /tvs/sales/_search
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "长虹"
      }
    }
  },
  "aggs": {
    "single_avg": {
      "avg": {
        "field": "price"
      }
    },
    "all_qu":{
      "global": {},
      "aggs": {
        "all_avg": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
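The difference between a normal metric and one under global can be sketched in plain Python. The first average sees only the documents that match the query, while the global one sees all documents. The data and the `single_vs_global` helper are hypothetical. `avg` returns None for an empty bucket, mirroring the null that the avg aggregation returns when there are no values.

```python
# Hypothetical sample documents.
docs = [
    {"brand": "changhong", "price": 1000},
    {"brand": "changhong", "price": 3000},
    {"brand": "tcl", "price": 2500},
    {"brand": "samsung", "price": 5500},
]

def avg(values):
    return sum(values) / len(values) if values else None

def single_vs_global(documents, brand):
    matched = [d for d in documents if d["brand"] == brand]  # plays the role of the term query
    return {
        "single_avg": avg([d["price"] for d in matched]),            # sees query results only
        "all_qu": {"all_avg": avg([d["price"] for d in documents])}, # global ignores the query
    }

print(single_vs_global(docs, "changhong"))  # {'single_avg': 2000.0, 'all_qu': {'all_avg': 3000.0}}
```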

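8. bucket filter: analysing the query results over several time windows
Each filter aggregation narrows its bucket to the documents that match the filter, on top of the query (here, brand 长虹). The avg nested inside it then runs only on that bucket. With two filters using different ranges (now-150d and now-140d), one request returns the average price over both the last 150 days and the last 140 days.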

GET /tvs/sales/_search 
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "长虹"
      }
    }
  },
  "aggs": {
    "recent_150d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-150d"
          }
        }
      },
      "aggs": {
        "recent_150d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "recent_140d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-140d"
          }
        }
      },
      "aggs": {
        "recent_140d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
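A pure-Python sketch of the two filter buckets, on hypothetical sales that are assumed to be already restricted to one brand by the query. "now" is fixed here so the result is reproducible, whereas Elasticsearch resolves now-150d against the current time. The `recent_avg` helper is illustrative.

```python
from datetime import datetime, timedelta

now = datetime(2017, 6, 1)  # fixed "now" for reproducibility (assumption)

# Hypothetical sales, already matching the brand query.
sales = [
    {"sold_date": datetime(2017, 5, 20), "price": 2000},
    {"sold_date": datetime(2017, 1, 10), "price": 1000},  # 142 days before now
    {"sold_date": datetime(2016, 11, 1), "price": 3000},
]

def avg(values):
    return sum(values) / len(values) if values else None

def recent_avg(docs, days, now):
    """filter: sold_date >= now - <days> days, then avg(price) inside that bucket."""
    cutoff = now - timedelta(days=days)
    window = [d["price"] for d in docs if d["sold_date"] >= cutoff]
    return {"doc_count": len(window), "avg_price": avg(window)}

print(recent_avg(sales, 150, now))  # {'doc_count': 2, 'avg_price': 1500.0}
print(recent_avg(sales, 140, now))  # {'doc_count': 1, 'avg_price': 2000.0}
```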
