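Elasticsearch official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html
PUT index_name/type_name/id
Here `id` is the document ID; the examples use the sequential values 1, 2, 3.
Add the sample data below; the later query steps are based on it.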
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
PUT /megacorp/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
PUT /megacorp/employee/3
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
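- Get all documents: `GET /megacorp/employee/_search`
- Get the documents whose last_name is Smith: `GET /megacorp/employee/_search?q=last_name:Smith`
When querying with Elasticsearch-head, note that the request must use POST in order to carry a query expression in the body.
The following operations mainly use Postman to query the data.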
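Since some clients do not send a body with GET, the same search can be issued as POST. The following is a minimal sketch that only builds (and does not send) such a request with Python's standard library; the host `http://localhost:9200` and the helper name `build_search_request` are assumptions made for illustration.

```python
import json
import urllib.request

# Assumed local node address; adjust to your own cluster.
ES_HOST = "http://localhost:9200"

def build_search_request(index, doc_type, query_body):
    """Build a POST _search request carrying a JSON query body (not sent here)."""
    url = f"{ES_HOST}/{index}/{doc_type}/_search"
    data = json.dumps(query_body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request(
    "megacorp", "employee", {"query": {"match": {"last_name": "Smith"}}}
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it, provided a node is reachable at the assumed address.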
match: matching
Using a query expression, the documents whose last_name is Smith are retrieved as follows:
```json
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
```
Use match to run a full-text search for all employees who like rock climbing:
```json
GET /megacorp/employee/_search
{
"query":{
"match":{
"about":"rock climbing"
}
}
}
```
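In the response, _score indicates how closely each result matches. By default the result list is sorted by _score in descending order.
match returns the documents whose text contains rock or climbing.
match_phrase: phrase search
To match the whole phrase rock climbing, replace match in the DSL from section 2.2.1 with match_phrase to run a phrase search.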
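The difference between the two queries can be sketched locally on the three sample `about` values. This is only an illustration: `tokenize` is a crude stand-in for a real analyzer (lowercase plus whitespace split), no relevance score is computed, and the function names are hypothetical.

```python
# The "about" field of the three sample employees, keyed by document id.
ABOUT = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}

def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(query):
    """Ids of documents containing at least one query term (OR semantics)."""
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in ABOUT.items() if terms & set(tokenize(text))]

def match_phrase(query):
    """Ids of documents containing the query terms adjacent and in order."""
    phrase = tokenize(query)
    n = len(phrase)
    hits = []
    for doc_id, text in ABOUT.items():
        tokens = tokenize(text)
        if any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits

print(match("rock climbing"))         # documents 1 and 2
print(match_phrase("rock climbing"))  # document 1 only
```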
range: a filter used for range queries
Search for employees whose last name is Smith and who are older than 30:
```json
GET /megacorp/employee/_search
{
"query":{
"bool":{
"must":{
"match":{
"last_name":"Smith"
}
},
"filter":{
"range":{
"age":{"gt":30}
}
}
}
}
}
```
regexp: regular-expression matching
{
"query":{
"regexp":{
"last_name":"Smith"
}
}
}
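Aggregations are placed under the top-level parameter aggs. Each aggregation can be given any name we like, in this example all_interests (finding the most popular interests among the employees). Finally, define the type of the single bucket, terms (that is, which field the aggregation groups on).
PS: a text field must have fielddata=true set before it can be used in an aggregation (see 2.2 for details).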
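As a rough sketch of what the all_interests aggregation described above could look like, the block below writes the request body as a Python dict and then reproduces the bucket counts locally from the three sample employees. The local counting is only a stand-in for the server-side terms aggregation, and the variable names are illustrative.

```python
from collections import Counter

# Request body for the aggregation described above (it would be sent to
# GET /megacorp/employee/_search).
all_interests_request = {
    "aggs": {
        "all_interests": {
            "terms": {"field": "interests"}
        }
    }
}

# Local stand-in for the terms aggregation over the three sample employees.
interests = [
    ["sports", "music"],   # employee 1
    ["music"],             # employee 2
    ["forestry"],          # employee 3
]
counts = Counter(tag for doc in interests for tag in doc)
print(counts.most_common())
```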
_bulk: add data in batches
The data describes car transactions: model, manufacturer, price, sale date, and so on.
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
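PS: each action and each document must be on its own single line, and the request body must end with a newline; otherwise an error is returned.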
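The newline-delimited layout is easy to get wrong by hand, so here is a minimal sketch that generates a bulk body from a list of documents. The helper name `build_bulk_body` is hypothetical, and only two of the sample documents are shown.

```python
import json

def build_bulk_body(docs):
    """Return an NDJSON _bulk body: one action line and one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line, id auto-generated
        lines.append(json.dumps(doc))            # document source line
    # The body must end with a newline character.
    return "\n".join(lines) + "\n"

cars = [
    {"price": 10000, "color": "red", "make": "honda", "sold": "2014-10-28"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
]
body = build_bulk_body(cars)
print(body, end="")
```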
_mapping operation
Running an aggregation right away produces an error:
"type": "illegal_argument_exception", "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [color] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
We therefore use the _mapping operation to set fielddata to true on color, the field we want to aggregate on. To avoid the same error later, we also set fielddata to true on make.
POST /cars/_mapping
{
"properties": {
"color": {
"type": "text",
"fielddata": true
},
"make": {
"type": "text",
"fielddata": true
}
}
}
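Aggregate to find the best-selling car color.
Setting size to 0 speeds up the query. If it is not set, hits["hits"] also returns the matching documents (the first 10 by default).
GET /cars/transactions/_search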
{
"size":0,
"aggs":{
"popular_color":{
"terms":{
"field":"color"
}
}
}
}
Response:
{
"hits": {
"total": {
"value": 8,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"popular_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "red",
"doc_count": 4
},
{
"key": "blue",
"doc_count": 2
},
{
"key": "green",
"doc_count": 2
}
]
}
}
}
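Usually an application needs more sophisticated metrics over the documents, for example the average price of the cars of each color.
To get this, nest the metric inside the bucket; the metric is then computed over the documents in that bucket. Take the avg (average) metric as an example:
GET /cars/transactions/_search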
{
"size":0,
"aggs":{
"colors":{
"terms":{
"field":"color"
},
"aggs":{
"avg_price":{
"avg":{
"field":"price"
}
}
}
}
}
}
Here colors and avg_price are names we chose ourselves; they appear in the response (below).
{
...
"hits": {
"total": {
"value": 8,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"colors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "red",
"doc_count": 4,
"avg_price": {
"value": 32500.0
}
},
{
"key": "blue",
"doc_count": 2,
"avg_price": {
"value": 20000.0
}
},
{
"key": "green",
"doc_count": 2,
"avg_price": {
"value": 21000.0
}
}
]
}
}
}
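To check the averages in the response above by hand, the nested bucket-plus-metric idea can be sketched locally: group the eight sample transactions by color, then average the prices within each group. The function name is illustrative and nothing here talks to Elasticsearch.

```python
from collections import defaultdict

# (price, color, make, sold) for the eight sample transactions.
CARS = [
    (10000, "red", "honda", "2014-10-28"),
    (20000, "red", "honda", "2014-11-05"),
    (30000, "green", "ford", "2014-05-18"),
    (15000, "blue", "toyota", "2014-07-02"),
    (12000, "green", "toyota", "2014-08-19"),
    (20000, "red", "honda", "2014-11-05"),
    (80000, "red", "bmw", "2014-01-01"),
    (25000, "blue", "ford", "2014-02-12"),
]

def avg_price_by_color(cars):
    """Bucket by color (terms), then average the prices in each bucket (avg)."""
    buckets = defaultdict(list)
    for price, color, _make, _sold in cars:
        buckets[color].append(price)
    return {color: sum(prices) / len(prices) for color, prices in buckets.items()}

print(avg_price_by_color(CARS))
```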
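The aggregations above all operate on the whole index by default (that is, all documents are queried before aggregating, and the query part is omitted). To aggregate for one particular car manufacturer only, do the following: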
GET /cars/transactions/_search
{
"query" : {
"match" : {
"make" : "ford"
}
},
"aggs" : {
"colors" : {
"terms" : {
"field" : "color"
}
}
}
}
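Count the distribution of car manufacturers for each color. This only requires adding one more nested level to the earlier average-price query.
GET /cars/transactions/_search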
{
"size":0,
"aggs":{
"colors":{
"terms":{
"field":"color"
},
"aggs":{
"avg_price":{
"avg":{
"field":"price"
}
},
"producer":{
"terms":{
"field":"make"
}
}
}
}
}
}
Response (partial):
{
...
"aggregations": {
"colors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "red",
"doc_count": 4,
"producer": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "honda",
"doc_count": 3
},
{
"key": "bmw",
"doc_count": 1
}
]
},
"avg_price": {
"value": 32500.0
}
},
...
]
}
}
}
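The response shows that:
There are four red cars.
The average price of a red car is $32,500.
Three of them were made by Honda and one by BMW.
Add one more nested level to the previous query
to compute the maximum and minimum price (max, min) for each manufacturer: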
{
"size":0,
"aggs":{
"colors":{
"terms":{
"field":"color"
},
"aggs":{
"avg_price":{
"avg":{
"field":"price"}},
"make":{
"terms":{
"field":"make"
},
"aggs":{
"min_price":{
"min":{
"field":"price"}},
"max_price":{
"max":{
"field":"price"}}
}
}
}
}
}
}
Response (only part of the result is shown: red cars):
{
...
"aggregations": {
"colors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "red",
"doc_count": 4,
"avg_price": {
"value": 32500.0
},
"make": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "honda",
"doc_count": 3,
"max_price": {
"value": 20000.0
},
"min_price": {
"value": 10000.0
}
},
{
"key": "bmw",
"doc_count": 1,
"max_price": {
"value": 80000.0
},
"min_price": {
"value": 80000.0
}
}
]
}
}
]
}
}
}
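Deeply nested responses like the one above are usually flattened before further use. The sketch below walks the color and make buckets and yields one row per pair; the function name is hypothetical, and the input is a trimmed copy of the red bucket shown above.

```python
def flatten_color_make(aggregations):
    """Yield (color, make, min_price, max_price) rows from the nested buckets."""
    for color_bucket in aggregations["colors"]["buckets"]:
        for make_bucket in color_bucket["make"]["buckets"]:
            yield (
                color_bucket["key"],
                make_bucket["key"],
                make_bucket["min_price"]["value"],
                make_bucket["max_price"]["value"],
            )

# Trimmed copy of the red bucket from the response above.
response_aggs = {
    "colors": {"buckets": [{
        "key": "red", "doc_count": 4, "avg_price": {"value": 32500.0},
        "make": {"buckets": [
            {"key": "honda", "doc_count": 3,
             "max_price": {"value": 20000.0}, "min_price": {"value": 10000.0}},
            {"key": "bmw", "doc_count": 1,
             "max_price": {"value": 80000.0}, "min_price": {"value": 80000.0}},
        ]},
    }]}
}
rows = list(flatten_color_make(response_aggs))
for row in rows:
    print(row)
```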
The most common tool for time-series analysis is date_histogram:
GET /cars/transactions/_search
{
"size":0,
"aggs":{
"sales":{
"date_histogram":{
"field":"sold",
"interval":"month",
"format":"yyyy-MM-dd",
"min_doc_count":0,
"extended_bounds":{
"min":"2014-01-01",
"max":"2014-12-01"
}
}
}
}
}
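Here min_doc_count set to 0 forces empty buckets to be returned (depending on the Elasticsearch version, empty buckets may be omitted by default).
Compute the total sales per quarter (quarter) and per car make, to find out which make earns the most.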
GET /cars/transactions/_search
{
"size":0,
"aggs":{
"sales":{
"date_histogram":{
"field":"sold",
"interval":"quarter",
"format":"yyyy-MM-dd",
"min_doc_count":0,
"extended_bounds":{
"min":"2014-01-01",
"max":"2014-12-01"
}
},
"aggs":{
"per_make_sum":{
"terms":{
"field":"make"
},
"aggs":{
"sum_price":{
"sum":{
"field":"price"}
}
}
},
"total_sum":{
"sum":{
"field":"price"}
}
}
}
}
}
Response of the per-quarter aggregation:
{
"aggregations": {
"sales": {
"buckets": [
{
"key_as_string": "2014-01-01",
"key": 1388534400000,
"doc_count": 2,
"per_make_sum": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "bmw",
"doc_count": 1,
"sum_price": {
"value": 80000.0
}
},
{
"key": "ford",
"doc_count": 1,
"sum_price": {
"value": 25000.0
}
}
]
},
"total_sum": {
"value": 105000.0
}
},
{
"key_as_string": "2014-04-01",
"key": 1396310400000,
"doc_count": 1,
"per_make_sum": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "ford",
"doc_count": 1,
"sum_price": {
"value": 30000.0
}
}
]
},
"total_sum": {
"value": 30000.0
}
},
{
"key_as_string": "2014-07-01",
"key": 1404172800000,
"doc_count": 2,
"per_make_sum": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "toyota",
"doc_count": 2,
"sum_price": {
"value": 27000.0
}
}
]
},
"total_sum": {
"value": 27000.0
}
},
{
"key_as_string": "2014-10-01",
"key": 1412121600000,
"doc_count": 3,
"per_make_sum": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "honda",
"doc_count": 3,
"sum_price": {
"value": 50000.0
}
}
]
},
"total_sum": {
"value": 50000.0
}
}
]
}
}
}
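The quarterly totals above can be cross-checked with a small local sketch that assigns each sale to the first day of its calendar quarter and sums the prices per make. This assumes plain calendar quarters and ignores time zones; the function names are illustrative.

```python
from collections import defaultdict
from datetime import date

# (price, make, sold) for the eight sample transactions.
SALES = [
    (10000, "honda", "2014-10-28"),
    (20000, "honda", "2014-11-05"),
    (30000, "ford", "2014-05-18"),
    (15000, "toyota", "2014-07-02"),
    (12000, "toyota", "2014-08-19"),
    (20000, "honda", "2014-11-05"),
    (80000, "bmw", "2014-01-01"),
    (25000, "ford", "2014-02-12"),
]

def quarter_start(day):
    """First day of the calendar quarter containing `day`."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)

def sales_per_quarter(sales):
    """Map each quarter start (ISO string) to {make: summed price}."""
    totals = defaultdict(lambda: defaultdict(int))
    for price, make, sold in sales:
        quarter = quarter_start(date.fromisoformat(sold)).isoformat()
        totals[quarter][make] += price
    return {quarter: dict(per_make) for quarter, per_make in sorted(totals.items())}

quarters = sales_per_quarter(SALES)
for quarter, per_make in quarters.items():
    print(quarter, per_make, "total:", sum(per_make.values()))
```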
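Sometimes we need to search a subset of an index but aggregate over all the data. In that case a regular aggregation gives the information about the subset, and the global bucket (global) gives the information about everything.
e.g. compare the average price of Ford cars with the average price of all cars: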
GET /cars/transactions/_search
{
"size":0,
"query":{
"match":{
"make":"ford"
}
},
"aggs":{
"fold_ave_price":{
"avg":{
"field":"price"}
},
"all":{
"global":{
},
"aggs":{
"ave_price":{
"avg":{
"field":"price"}
}
}
}
}
}
Response:
{
"aggregations": {
"all": {
"doc_count": 8,
"ave_price": {
"value": 26500.0
}
},
"fold_ave_price": {
"value": 27500.0
}
}
}
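The two numbers in this response can be reproduced with a short local sketch: the query-scoped average only sees the Ford documents, while the global bucket ignores the query and sees every document. The function name and the returned dict layout are illustrative, not the server's response format.

```python
# (price, make) for the eight sample transactions.
CARS = [
    (10000, "honda"), (20000, "honda"), (30000, "ford"), (15000, "toyota"),
    (12000, "toyota"), (20000, "honda"), (80000, "bmw"), (25000, "ford"),
]

def average(values):
    return sum(values) / len(values)

def query_vs_global(cars, make):
    """Average price within the query scope and across all documents."""
    in_query = [price for price, car_make in cars if car_make == make]
    everything = [price for price, _ in cars]  # what a global bucket would see
    return {"query_avg": average(in_query), "global_avg": average(everything)}

print(query_vs_global(CARS, "ford"))
```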
A filter bucket (filter) can be specified; documents that satisfy its condition are added to the bucket.
gte: greater than or equal
GET /cars/transactions/_search
{
"size":0,
"query":{
"constant_score":{
"filter":{
"range":{
"price":{
"gte":10000
}
}
}
}
},
"aggs":{
"average_price":{
"avg":{
"field":"price"
}
}
}
}
GET /cars/transactions/_search
{
"size":0,
"query":{
"match":{
"make":"ford"
}
},
"aggs":{
"recent_sales":{
"filter":{
"range":{
"sold":{
"gte":"now-1M"
}
}
},
"aggs":{
"average_price":{
"avg":{
"field":"price"
}
}
}
}
}
}
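A bool filter consists of three parts: must, should, and must_not.
- must: all clauses must match; equivalent to AND.
- must_not: none of the clauses may match; equivalent to NOT.
- should: at least one clause must match; equivalent to OR.
The query below finds Ford cars priced at 10000 dollars or more (gte) whose color is not green: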
GET /cars/transactions/_search
{
"query":{
"bool":{
"must":[
{
"term":{
"make":"ford"}},
{
"range":{
"price":{
"gte":10000}}}
],
"must_not":[
{
"term":{
"color":"green"}}
]
}
}
}
As the response shows, one record is returned:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 2.2809339,
"hits": [
{
"_index": "cars",
"_type": "transactions",
"_id": "wpHHsHUB1ZhgFrAFKd18",
"_score": 2.2809339,
"_source": {
"price": 25000,
"color": "blue",
"make": "ford",
"sold": "2014-02-12"
}
}
]
}
}
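The clause logic can be sketched locally with plain predicates. This is a simplification that follows the description above: in Elasticsearch the handling of should depends on minimum_should_match and on which other clauses are present, and clauses also contribute to scoring, which is ignored here. The helper name `bool_match` is hypothetical.

```python
FIELDS = ("price", "color", "make", "sold")
ROWS = [
    (10000, "red", "honda", "2014-10-28"),
    (20000, "red", "honda", "2014-11-05"),
    (30000, "green", "ford", "2014-05-18"),
    (15000, "blue", "toyota", "2014-07-02"),
    (12000, "green", "toyota", "2014-08-19"),
    (20000, "red", "honda", "2014-11-05"),
    (80000, "red", "bmw", "2014-01-01"),
    (25000, "blue", "ford", "2014-02-12"),
]
CARS = [dict(zip(FIELDS, row)) for row in ROWS]

def bool_match(doc, must=(), should=(), must_not=()):
    """Return True when doc satisfies the combined clauses."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # Simplification: when should clauses are given, at least one has to match.
    if should and not any(clause(doc) for clause in should):
        return False
    return True

hits = [
    car for car in CARS
    if bool_match(
        car,
        must=[lambda d: d["make"] == "ford", lambda d: d["price"] >= 10000],
        must_not=[lambda d: d["color"] == "green"],
    )
]
print(hits)
```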
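post_filter is executed after the query. This behavior can be used to apply additional filters to the search results without affecting other operations such as aggregations.
e.g. a user searches for cars and can also filter them by color: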
{
"size":0,
"query":{
"match":{
"make":"ford"
}
},
"post_filter":{
"term":{
"color":"green"
}
}
}
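The ordering of the phases can be sketched locally. In this illustration the per-color count stands in for an aggregation (the request above does not contain one); it is computed on everything the query matched, and only the returned hits are narrowed afterwards. The function name is hypothetical.

```python
from collections import Counter

# (price, color, make) for the eight sample transactions.
CARS = [
    (10000, "red", "honda"), (20000, "red", "honda"), (30000, "green", "ford"),
    (15000, "blue", "toyota"), (12000, "green", "toyota"), (20000, "red", "honda"),
    (80000, "red", "bmw"), (25000, "blue", "ford"),
]

def search_with_post_filter(cars, make, color):
    """Return (hits, color_counts) for a make query with a color post-filter."""
    matched = [car for car in cars if car[2] == make]       # query phase
    color_counts = Counter(car[1] for car in matched)       # sees all query matches
    hits = [car for car in matched if car[1] == color]      # post-filter: hits only
    return hits, color_counts

hits, color_counts = search_with_post_filter(CARS, "ford", "green")
print(hits)
print(dict(color_counts))
```

The counts still include the blue Ford even though the hit list only contains the green one.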
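Aggregations accept an order object, which allows sorting by one of the following values:
- _count: sort by document count. Valid for terms, histogram, and date_histogram.
- _term: sort alphabetically by the string value of the term. Only used within terms.
- _key: sort by the numeric value of each bucket's key (conceptually similar to _term). Only used within histogram and date_histogram.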
{
"size":0,
"aggs":{
"colors":{
"terms":{
"field":"color",
"order":{
"_count":"asc"
}
}
}
}
}
e.g. build a bar chart of sales by car color, but sort the bars by average price in ascending order:
GET /cars/transactions/_search
{
"size":0,
"aggs":{
"colors":{
"terms":{
"field":"color",
"order":{
"avg_price":"asc"
}
},
"aggs":{
"avg_price":{
"avg":{
"field":"price"}
}
}
}
}
}
Response:
{
"aggregations": {
"colors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "blue",
"doc_count": 2,
"avg_price": {
"value": 20000.0
}
},
{
"key": "green",
"doc_count": 2,
"avg_price": {
"value": 21000.0
}
},
{
"key": "red",
"doc_count": 4,
"avg_price": {
"value": 32500.0
}
}
]
}
}
}
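Using extended_stats outputs all of the metric values in the result (see the response). This example computes the metrics for price and then sorts the buckets by the variance of price.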
GET /cars/transactions/_search
{
"size":0,
"aggs":{
"colors":{
"terms":{
"field":"color",
"order":{
"stats.variance":"asc"
}
},
"aggs":{
"stats":{
"extended_stats":{
"field":"price"
}
}
}
}
}
}
Response (partial):
{
"aggregations": {
"colors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "blue",
"doc_count": 2,
"stats": {
"count": 2,
"min": 15000.0,
"max": 25000.0,
"avg": 20000.0,
"sum": 40000.0,
"sum_of_squares": 8.5E8,
"variance": 2.5E7,
"variance_population": 2.5E7,
"variance_sampling": 5.0E7,
"std_deviation": 5000.0,
"std_deviation_population": 5000.0,
"std_deviation_sampling": 7071.067811865475,
"std_deviation_bounds": {
"upper": 30000.0,
"lower": 10000.0,
"upper_population": 30000.0,
"lower_population": 10000.0,
"upper_sampling": 34142.13562373095,
"lower_sampling": 5857.86437626905
}
}
},
{
"key": "green",
"doc_count": 2,
"stats": {
"count": 2,
"min": 12000.0,
"max": 30000.0,
"avg": 21000.0,
"sum": 42000.0,
"sum_of_squares": 1.044E9,
"variance": 8.1E7,
"variance_population": 8.1E7,
"variance_sampling": 1.62E8,
"std_deviation": 9000.0,
"std_deviation_population": 9000.0,
"std_deviation_sampling": 12727.922061357855,
"std_deviation_bounds": {
"upper": 39000.0,
"lower": 3000.0,
"upper_population": 39000.0,
"lower_population": 3000.0,
"upper_sampling": 46455.84412271571,
"lower_sampling": -4455.844122715709
}
}
}
]
}
}
}
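The variance values and the resulting bucket order can be checked locally. The sketch assumes that the `variance` field corresponds to the population variance, which is consistent with the response above where `variance` equals `variance_population`; the function name is illustrative.

```python
from collections import defaultdict
from statistics import pvariance

# (price, color) for the eight sample transactions.
CARS = [
    (10000, "red"), (20000, "red"), (30000, "green"), (15000, "blue"),
    (12000, "green"), (20000, "red"), (80000, "red"), (25000, "blue"),
]

def colors_by_variance(cars):
    """Return (color, population variance of price) pairs, ascending by variance."""
    prices = defaultdict(list)
    for price, color in cars:
        prices[color].append(price)
    pairs = [(color, pvariance(values)) for color, values in prices.items()]
    return sorted(pairs, key=lambda pair: pair[1])

ordered = colors_by_variance(CARS)
for color, variance in ordered:
    print(color, variance)
```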
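To sort on a deeper metric, such as one inside a grandchild or great-grandchild bucket, chain the path with angle brackets (>), like this: my_bucket>another_bucket>metric
e.g. build a histogram of car prices, but order it by the variance of the red and green (not blue) cars in each bucket: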
{
"size":0,
"aggs":{
"colors":{
"histogram":{
"field":"price",
"interval":20000,
"order":{
"red_green_cars>stats.variance":"asc"
}
},
"aggs":{
"red_green_cars":{
"filter":{
"terms":{
"color":["red","green"]}},
"aggs":{
"stats":{
"extended_stats":{
"field":"price"}
}
}
}
}
}
}
}
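To make the path syntax concrete, the sketch below resolves such an order path against a single bucket of a response: names separated by `>` descend into nested aggregations, and the part after `.` picks one property of a multi-value metric. The helper `resolve_order_path` and the sample bucket are hand-built for illustration and are not actual server output.

```python
def resolve_order_path(bucket, path):
    """Follow 'agg>agg>metric.property' inside one bucket of a response."""
    *agg_names, last = path.split(">")
    metric_name, _, prop = last.partition(".")
    node = bucket
    for name in agg_names:
        node = node[name]              # descend into the nested aggregation
    metric = node[metric_name]
    # Single-value metrics expose "value"; multi-value ones need a property.
    return metric[prop] if prop else metric["value"]

# Hand-built bucket for prices below 20000 (red 10000 and green 12000 remain
# after the red/green filter, giving a population variance of 1000000).
sample_bucket = {
    "key": 0.0,
    "doc_count": 3,
    "red_green_cars": {
        "doc_count": 2,
        "stats": {"count": 2, "avg": 11000.0, "variance": 1000000.0},
    },
}

print(resolve_order_path(sample_bucket, "red_green_cars>stats.variance"))
print(resolve_order_path({"avg_price": {"value": 20000.0}}, "avg_price"))
```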