Elasticsearch is a document-oriented search and analytics engine.
(1) Quickly check the cluster's health
GET /_cat/health?v
(2) Quickly list the indices in the cluster
GET /_cat/indices?v
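(3) Create and delete an index
# create the index
PUT /my_index001?pretty
# delete the index
DELETE /my_index001?pretty
(4) CRUD operations on product documents
Method 1: insert a document and let Elasticsearch generate its _id automatically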
POST /my_index001/_doc
{
"name":"奶粉",
"dec":"婴幼儿奶粉",
"price":270,
"producer":"奶粉producer",
"tags":["婴幼儿","二段"]
}
Method 2: insert a document with a manually specified _id
PUT /my_index001/_doc/1
{
"name":"奶粉",
"dec":"中老年奶粉",
"price":250,
"producer":"中老年奶粉producer",
"tags":["中老年","补钙"]
}
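To make the difference between the two methods concrete, here is a minimal in-memory sketch in Python. It does not talk to Elasticsearch: TinyIndex, index_doc, and the create_only flag are hypothetical names that only mimic the behaviour of the requests above. Indexing without an id generates one, indexing with an existing id overwrites the document and bumps its version, and only create-only indexing (op_type=create) rejects an existing id.

```python
import uuid


class TinyIndex:
    """Toy stand-in for one index; illustrative only, not an Elasticsearch client."""

    def __init__(self):
        self.docs = {}      # _id -> _source
        self.versions = {}  # _id -> _version

    def index_doc(self, source, doc_id=None, create_only=False):
        # Like POST /index/_doc: no id supplied, so one is generated.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        if doc_id in self.docs and create_only:
            # Mirrors op_type=create, which fails on an existing id.
            raise ValueError(f"version conflict: document [{doc_id}] already exists")
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = dict(source)
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return {"_id": doc_id, "_version": self.versions[doc_id], "result": result}


index = TinyIndex()
auto = index.index_doc({"name": "奶粉", "price": 270})
first = index.index_doc({"name": "奶粉", "price": 250}, doc_id="1")
second = index.index_doc({"name": "奶粉", "price": 260}, doc_id="1")
print(auto["_id"], first["result"], second["result"], second["_version"])
```

The second call with doc_id="1" succeeds and reports an update, which is why a plain PUT to an existing id silently replaces the stored document.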
Query all documents
Retrieve every document in the index my_index001:
GET my_index001/_search
{
"query": {
"match_all": {}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index001",
"_type" : "_doc",
"_id" : "fXbW9nQBgKKpTl0QYML0",
"_score" : 1.0,
"_source" : {
"name" : "奶粉",
"dec" : "婴幼儿奶粉",
"price" : 270,
"producer" : "奶粉producer",
"tags" : [
"婴幼儿",
"二段"
]
}
},
{
"_index" : "my_index001",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "奶粉",
"dec" : "中老年奶粉",
"price" : 250,
"producer" : "中老年奶粉producer",
"tags" : [
"中老年",
"补钙"
]
}
}
]
}
}
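In the response, the document with _id=fXbW9nQBgKKpTl0QYML0 was inserted with method 1, where no ID was given and Elasticsearch generated one for us. The document with _id=1 was inserted with method 2, where we specified the ID ourselves. Note that PUT /my_index001/_doc/1 does not fail when the ID already exists: it replaces the stored document and increments its _version. To get an error for an existing ID, index with op_type=create (PUT /my_index001/_doc/1?op_type=create).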
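Fields in the response:
took: how long the search took, in milliseconds
timed_out: whether the search timed out; here it did not
_shards: how many shards the search ran on. If the index has several shards, a search request is sent to every primary shard (or to one of its replica shards instead)
hits.total: the number of matching documents, 2 here
hits.max_score: the highest _score among the results. The score measures how relevant a document is to the search: the more relevant the document, the higher its score
hits.hits: the detailed data of the documents that matched the search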
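The fields listed above can be read out of a response with a few lines of Python. This is a minimal sketch that works on an already-parsed response dict; summarize_search is a hypothetical helper name, and the sample is a trimmed copy of the response shown earlier. It also accepts the object form of hits.total ({"value": ..., "relation": ...}) that Elasticsearch 7.x and later return instead of a bare integer.

```python
def summarize_search(response):
    """Pull the commonly inspected fields out of a _search response dict."""
    hits = response["hits"]
    total = hits["total"]
    # 7.x and later wrap the count in an object; 6.x returns a bare integer.
    if isinstance(total, dict):
        total = total["value"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": total,
        "max_score": hits["max_score"],
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


sample = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_id": "fXbW9nQBgKKpTl0QYML0", "_score": 1.0},
            {"_id": "1", "_score": 1.0},
        ],
    },
}
summary = summarize_search(sample)
print(summary)
```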
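Modify a product: update specific fields of a given document (the request below uses the 6.x endpoint; from 7.0 on the equivalent is POST /my_index001/_update/1)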
POST /my_index001/_doc/1/_update
{
"doc" : {
"price" : 280
}
}
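Replace a document: this does not update individual fields; it replaces the whole document with _id=1 by the new document in the request body. Note: when updating this way, the body must include every existing field, including the ones that do not change, otherwise those fields are lost.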
POST /my_index001/_doc/1
{
"price" : 280
}
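The difference between the two requests can be sketched on plain Python dicts. partial_update and replace are hypothetical helpers, not client calls, and the merge here is shallow, which is a simplification that is enough for flat documents like the ones in these notes.

```python
def partial_update(source, doc):
    """Mimic an _update request with a "doc" body: merge fields into the stored _source."""
    merged = dict(source)
    merged.update(doc)
    return merged


def replace(source, new_source):
    """Mimic indexing to an existing _id: the stored _source is discarded."""
    return dict(new_source)


stored = {"name": "奶粉", "dec": "中老年奶粉", "price": 250}
after_update = partial_update(stored, {"price": 280})
after_replace = replace(stored, {"price": 280})
print(after_update)
print(after_replace)
```

After the partial update the name and dec fields are still present; after the replacement only price remains, which is the field loss the note above warns about.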
Delete a document
DELETE /my_index001/_doc/1
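A bool query combines one or more query clauses. There are four clause types: two of them affect scoring (must and should) and two do not (must_not and filter).
Relevance is not exclusive to full-text search. It also applies to yes|no clauses: the more clauses a document matches, the higher its relevance score. When several query clauses are merged into one compound query, such as a bool query, the score computed by each clause is combined into the overall relevance score.
bool compound query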
GET /my_index001/_search
{
"query": {
"bool": {
"must": [
{"match": {
"name": "奶粉"
}}
],
"must_not": [
{"match": {
"name": "牙膏"
}}
],
"filter": {
"range": {
"price": {
"gte": 200,
"lte": 300
}
}
},
"should": [
{"match": {
"producer": "producer"
}}
]
}
}
}
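The bool query above uses must, must_not, should, and filter. The four clauses can appear side by side in any order. If a bool query has no must clause (and no filter clause), at least one of the should clauses must match; when must or filter is present, the should clauses are optional and only raise the score of documents that match them.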
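The matching rules of the four clause types can be sketched in Python on in-memory documents. This is only a model of which documents are selected, not of scoring: term_in, price_between, and bool_matches are hypothetical helpers, term_in uses a plain substring test in place of a real match clause, and the three sample documents are made up for illustration.

```python
def term_in(field, text):
    """Hypothetical stand-in for a match clause: substring test on one field."""
    def clause(doc):
        value = doc.get(field, "")
        if isinstance(value, list):
            return any(text in item for item in value)
        return text in value
    return clause


def price_between(low, high):
    """Stand-in for the range filter on price (gte and lte are inclusive)."""
    return lambda doc: low <= doc["price"] <= high


def bool_matches(doc, must=(), must_not=(), filter_=(), should=()):
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must or filter clause, at least one should clause has to match.
    if should and not must and not filter_:
        return any(clause(doc) for clause in should)
    return True


docs = [
    {"name": "奶粉", "price": 270, "producer": "奶粉producer"},
    {"name": "奶粉", "price": 350, "producer": "奶粉producer"},
    {"name": "牙膏", "price": 250, "producer": "牙膏producer"},
]
selected = [
    d for d in docs
    if bool_matches(
        d,
        must=[term_in("name", "奶粉")],
        must_not=[term_in("name", "牙膏")],
        filter_=[price_between(200, 300)],
        should=[term_in("producer", "producer")],
    )
]
print(selected)
```

Only the first document is selected: the second falls outside the price filter and the third fails the must clause.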
The structure of a query affects relevance scoring
Query:
GET /my_index001/_search
{
"query": {
"bool": {
"should": [
{"match": {"tags": "老人"}},
{"match": {"tags": "婴儿"}},
{"bool": {
"should": [
{"match": {
"tags": "中年"
}}
]
}}
]
}
}
}
Response:
"hits" : {
"total" : 3,
"max_score" : 2.345461,
"hits" : [
{
"_index" : "my_index001",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.345461,
"_source" : {
"name" : "奶粉",
"dec" : "老人奶粉",
"price" : 240,
"producer" : "老人奶粉producer",
"tags" : [
"老人"
]
}
},
{
"_index" : "my_index001",
"_type" : "_doc",
"_id" : "fXbW9nQBgKKpTl0QYML0",
"_score" : 2.345461,
"_source" : {
"name" : "奶粉",
"dec" : "婴儿奶粉",
"price" : 270,
"producer" : "奶粉producer",
"tags" : [
"婴儿"
]
}
},
{
"_index" : "my_index001",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.4779618,
"_source" : {
"name" : "奶粉",
"dec" : "中年奶粉",
"price" : 240,
"producer" : "中年奶粉producer",
"tags" : [
"中年",
"帮助睡眠"
]
}
}
]
}
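In the query above, the documents with tags="老人" and tags="婴儿" both score 2.345461, while the document with tags="中年" scores 1.4779618. The first two conditions sit at the same level of the query, whereas the condition that matches the last document is nested one level below them, as the query shows. The last document's tags field also holds two values, and a longer field lowers the score through field-length normalization, so the nesting level may not be the only factor here.
Controlling boosting per clause to adjust the score of matches on a keyword
Query: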
GET /my_index001/_search
{
"query": {
"bool": {
"should": [
{"match": {
"tags": {
"query": "老人",
"boost":4
}
}},
{"match": {
"tags": {
"query": "婴儿",
"boost":1
}
}}
]
}
}
}
Response:
"hits" : [
{
"_index" : "my_index001",
"_type" : "_doc",
"_id" : "2",
"_score" : 9.381844,
"_source" : {
"name" : "奶粉",
"dec" : "老人奶粉",
"price" : 240,
"producer" : "老人奶粉producer",
"tags" : [
"老人"
]
}
},
{
"_index" : "my_index001",
"_type" : "_doc",
"_id" : "fXbW9nQBgKKpTl0QYML0",
"_score" : 2.345461,
"_source" : {
"name" : "奶粉",
"dec" : "婴儿奶粉",
"price" : 270,
"producer" : "奶粉producer",
"tags" : [
"婴儿"
]
}
}
]
}
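The document matched by the keyword with boost 4 scores 9.381844, which is 4 × 2.345461, and the document matched by the keyword with boost 1 scores 2.345461. If both boosts are set to 4, both documents score 9.381844.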
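The relationship between the two scores can be checked with a one-line calculation. The sketch below assumes the boost acts as a plain multiplier on the clause's score, which is consistent with the numbers in the responses above; apply_boost is a hypothetical helper.

```python
def apply_boost(clause_score, boost):
    """Assumed model: the boost acts as a plain multiplier on the clause's score."""
    return clause_score * boost


unboosted = 2.345461  # score both documents received before any boost was set
print(apply_boost(unboosted, 4), apply_boost(unboosted, 1))
```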
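All the queries above are full-text searches. A full-text search breaks the search string into terms and looks each one up in the inverted index; a document is returned as soon as it matches any one of those terms. A phrase search, by contrast, requires the field text to contain the terms of the search string exactly as given, adjacent and in the same order, before the document counts as a match and is returned. The syntax is: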
GET /my_index001/_search
{
"query": {
"match_phrase": {
"dec": "老人奶粉"
}
}
}
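The contrast between match and match_phrase can be sketched in Python. The analyze function is an assumption that only covers Chinese text: it emits one token per character, which is how the default standard analyzer treats CJK input. The match and match_phrase functions are simplified models, not the real query implementations.

```python
def analyze(text):
    """Assumed analysis for CJK text: one token per character."""
    return [ch for ch in text if not ch.isspace()]


def match(field_text, query):
    """match: at least one query token occurs somewhere in the field."""
    field_tokens = set(analyze(field_text))
    return any(token in field_tokens for token in analyze(query))


def match_phrase(field_text, query):
    """match_phrase: all query tokens occur consecutively and in order."""
    field_tokens, query_tokens = analyze(field_text), analyze(query)
    n = len(query_tokens)
    return any(
        field_tokens[i:i + n] == query_tokens
        for i in range(len(field_tokens) - n + 1)
    )


for dec in ["老人奶粉", "婴儿奶粉", "中年奶粉"]:
    print(dec, match(dec, "老人奶粉"), match_phrase(dec, "老人奶粉"))
```

Under this model every dec value matches the full-text query because they all share the tokens 奶 and 粉, while only the document whose dec is exactly 老人奶粉 matches the phrase.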
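Before running a terms aggregation on a text field, the field's fielddata attribute must be set to true; otherwise the request fails with an exception (action_request_validation_exception in the case recorded here; Elasticsearch's error message for this situation states that fielddata is disabled on text fields by default). Keyword and numeric fields do not need this setting. The syntax for setting fielddata on a field of an index is: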
PUT /my_index001/_mapping/_doc
{
"properties": {
"name":{
"type": "text",
"fielddata": true
}
}
}
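Count the products under each name. Note that Chinese text needs the IK analyzer, which has to be installed and configured separately: the default standard analyzer built into Elasticsearch splits Chinese text into individual characters.
Aggregation syntax: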
GET /my_index001/_search
{
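"size": 0, // without size: 0, the search hits (10 by default) are returned along with the aggregation
"aggs": {
"group_by_name": { // name under which the aggregation result is returned
"terms": {
"field": "name" // field to aggregate on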
}
}
}
}
Response:
"aggregations" : {
"group_by_name" : { // aggregation name, matching the one in the request
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ // aggregation result
{
"key" : "奶粉",
"doc_count" : 3
},
{
"key" : "裤子",
"doc_count" : 1
}
]
}
}
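What the terms aggregation computes can be sketched in Python with a Counter. This assumes each name is indexed as a single term (as with a suitable analyzer or a keyword field); terms_agg is a hypothetical helper and the four documents are made up to reproduce the counts in the response above.

```python
from collections import Counter


def terms_agg(docs, field):
    """Count documents per distinct value of one field, largest bucket first."""
    counts = Counter(doc[field] for doc in docs)
    # A terms aggregation orders buckets by doc_count descending by default.
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


docs = [
    {"name": "奶粉", "price": 240},
    {"name": "奶粉", "price": 270},
    {"name": "裤子", "price": 100},
    {"name": "奶粉", "price": 240},
]
buckets = terms_agg(docs, "name")
print(buckets)
```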
Count the products under each tag among the documents whose name matches "奶粉":
GET /my_index001/_search
{
"size": 0,
"query": {
"match": {
"name": "奶粉"
}
},
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
}
}
}
}
Group first, then compute an average for each group: the average price of the products under each name:
GET /my_index001/_search
{
"size": 0,
"aggs": {
"group_by_name": {
"terms": {
"field": "name"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
Compute the average price for each name and sort the buckets by that average price in ascending order:
GET /my_index001/_search
{
"size": 0,
"aggs": {
"group_by_name": {
"terms": {
"field": "name",
"order": {
"avg_price": "asc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
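The grouping, averaging, and ordering in the request above can be sketched in Python. avg_price_by_name is a hypothetical helper, the sample prices are made up, and each name is assumed to be a single term.

```python
from collections import defaultdict


def avg_price_by_name(docs, ascending=True):
    """Group documents by name, average their prices, order buckets by the average."""
    prices = defaultdict(list)
    for doc in docs:
        prices[doc["name"]].append(doc["price"])
    buckets = [
        {
            "key": name,
            "doc_count": len(values),
            "avg_price": {"value": sum(values) / len(values)},
        }
        for name, values in prices.items()
    ]
    # Equivalent of "order": {"avg_price": "asc"} on the terms aggregation.
    buckets.sort(key=lambda bucket: bucket["avg_price"]["value"], reverse=not ascending)
    return buckets


docs = [
    {"name": "奶粉", "price": 240},
    {"name": "奶粉", "price": 270},
    {"name": "奶粉", "price": 240},
    {"name": "裤子", "price": 100},
]
buckets = avg_price_by_name(docs)
print(buckets)
```

The ordering key refers to the sub-aggregation by its name, avg_price, in the same way the order clause in the request does.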
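Group by the specified price ranges, then group by name within each range (set the inner terms field to tags to group by tag instead), and finally compute the average price of each group: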
GET /my_index001/_search
{
"size": 0,
"aggs": {
"group_by_price": {
"range": {
"field": "price",
"ranges": [
{
"from": 0,
"to": 100
},
{
"from": 100,
"to": 200
},
{
"from": 200,
"to": 300
}
]
},
"aggs": {
"group_by_name": {
"terms": {
"field": "name"
},
"aggs": {
"average_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
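The nested aggregation above can be sketched in Python as well. In a range aggregation the from value is inclusive and the to value is exclusive, so a price of exactly 300 falls outside all three ranges. range_then_terms is a hypothetical helper and the sample documents are made up for illustration.

```python
def range_then_terms(docs, ranges):
    """Bucket by price range, then by name, and average the price per inner bucket."""
    result = []
    for low, high in ranges:
        # "from" is inclusive and "to" is exclusive.
        in_range = [d for d in docs if low <= d["price"] < high]
        by_name = {}
        for d in in_range:
            by_name.setdefault(d["name"], []).append(d["price"])
        result.append({
            "from": low,
            "to": high,
            "doc_count": len(in_range),
            "group_by_name": [
                {"key": name, "doc_count": len(p), "average_price": sum(p) / len(p)}
                for name, p in by_name.items()
            ],
        })
    return result


docs = [
    {"name": "奶粉", "price": 240},
    {"name": "奶粉", "price": 270},
    {"name": "裤子", "price": 100},
    {"name": "奶粉", "price": 300},
]
buckets = range_then_terms(docs, [(0, 100), (100, 200), (200, 300)])
for bucket in buckets:
    print(bucket)
```

The document priced at 100 lands in the 100 to 200 bucket rather than the 0 to 100 one, and the document priced at 300 is not counted anywhere; adding a range without a to value would catch it.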