1.本文elasticsearch版本是 7.2.1;
2.文档查询语句叫做 DSL, domain structure language, 领域特定语言;dsl,参见 Query DSL | Elasticsearch Guide [7.2] | Elastic
3.elasticsearch 基于json 提供了完整的查询 DSL 语句(Domain Specific Language-领域特定语言)来定义查询,把查询的dsl看做一种查询的抽象语法树,包含两种类型的查询;
4.查询子句的行为不同,具体取决于它们是用于查询上下文还是过滤器上下文。
5.查询dsl api分类(包括但不限于,本文仅列出常用查询api);
6.本文es文档数据来源于 content-elasticsearch-deep-dive/accounts.json at master · linuxacademy/content-elasticsearch-deep-dive · GitHubMyles Elastic Certified Engineer Course. Contribute to linuxacademy/content-elasticsearch-deep-dive development by creating an account on GitHub.https://github.com/linuxacademy/content-elasticsearch-deep-dive/blob/master/sample_data/accounts.json,批量导入,参见 2.elasticsearch文档批量操作-bulk api_PacosonSWJTU的博客-CSDN博客
中的 “【2】bulk 批量导入样本数据章节”;
7.elasticsearch查询 dsl api 是否计算文档分数统计表:
0)match_all 查询所有文档,给定文档分数为1.0 (不计算文档分数,给定1.0);
1)分页查询银行账户索引文档;
// 【sql】
Select firstname, balance
From bank
Order by balance desc
Limit 5 offset 0/5
// elasticsearch match_all api
Post localhost:9200/bank/_search
{
"_source":["firstname", "balance"]
,"query":{
"match_all":{}
}
, "sort":[
{
"balance":"desc"
}
]
, "from":5/0 // 文档偏移量为5 或者 0
, "size":5 // 每页5个文档
}
match匹配结果:返回与提供的文本、数字、日期或布尔值匹配的文档。
1)match 匹配非字符串类型字段,就是精确匹配;
2)查询余额等于 49568 的客户信息文档
Post localhost:9200/bank/_search
{
"_source":["firstname", "balance"]
,"query":{
"match":{
"balance":49568
}
}
}
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0, // 评分默认为1,没有计算文档评分
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "168",
"_score": 1.0, // 评分默认为1,没有计算文档评分
"_source": {
"firstname": "Carissa",
"balance": 49568
}
}
]
}
}
0)match匹配字符串类型字段,就是模糊匹配;
1)查询地址包含 Kings 的银行客户文档
Post localhost:9200/bank/_search
{
"_source":["firstname", "balance","address"]
,"query":{
"match":{
"address":"Kings"
}
}
}
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 5.9908285,// 显然计算了文档评分
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "20",
"_score": 5.9908285,
"_source": {
"firstname": "Elinor",
"address": "282 Kings Place",
"balance": 16418
}
},
{
"_index": "bank",
"_type": "account",
"_id": "722",
"_score": 5.9908285,
"_source": {
"firstname": "Roberts",
"address": "305 Kings Hwy",
"balance": 27256
}
}
]
}
}
【小结】match的字段类型是字符串时 (注意match的字段类型是非字符串,则是精确匹配)
1)match_phrase短语匹配:把需要匹配的值当做一个整体单词(不分词)进行检索;
2)match 全文检索例子(查询结果有19个文档)
Post localhost:9200/bank/_search
{
"_source":["firstname", "balance","address"]
,"query":{
"match":{
"address":"mill lane"
}
}
}
【解析】
3)match_phrase:短语匹配例子 (查询结果只有1个文档)
Post localhost:9200/bank/_search
{
"_source":["firstname", "balance","address"]
,"query":{
"match_phrase":{
"address":"mill lane"
}
}
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 9.507477,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "136",
"_score": 9.507477,
"_source": {
"firstname": "Winnie",
"address": "198 Mill Lane",
"balance": 45801
}
}
]
}
}
【解析】match_phrase:把 mill lane 当做一个整体,不分词,送入es查询包含 mill lane的文档;
1)场景: 查询出 state或address字段包含 mill 的文档;
Post localhost:9200/bank/_search
{
"_source":["firstname", "balance","address","state"]
,"query":{
"multi_match":{
"query":"mill"
, "fields":["state", "address"]
}
}
}
// 查询结果
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 5.4032025,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 5.4032025,
"_source": {
"firstname": "Forbes",
"address": "990 Mill Road",
"balance": 19648,
"state": "AK"
}
},
......
}
1)场景: 查询出 state或address字段包含 mill 或 KY的文档;
{
"_source":["firstname", "balance","address","state"]
,"query":{
"multi_match":{
"query":"mill KY"
, "fields":["state", "address"]
}
}
}
1)bool 用来做复合查询;
2)bool 复合查询可以包含的查询子句
1)场景: 查询gender等于M,state等于KY,且 age不等于28,或者 lastname等于 Hancock的文档;
Post localhost:9200/bank/_search
{
"query":{
"bool":{
"must":[
{"match":{"gender":"M"}}
,{"match":{"state":"KY"}}
]
,"must_not":[
{"match":{"age":"28"}}
]
,"should":[
{"match":{"lastname":"Hancock"}}
]
}
}
}
// 查询结果
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 6,
"relation": "eq"
},
"max_score": 11.173532,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "640",
"_score": 11.173532, // 计算文档得分
"_source": {
"account_number": 640,
"balance": 35596,
"firstname": "Candace",
"lastname": "Hancock",
"age": 25,
"gender": "M",
"address": "574 Riverdale Avenue",
"employer": "Animalia",
"email": "[email protected]",
"city": "Blandburg",
"state": "KY"
}
},
............................
1)filter 不计算文档分数 ;
1)不用filter的查询(计算分数)
场景:查询年龄大于等于18,且小于等于30,且address包含mill的文档;
注意: match:{"address":"mill"} 表示的是 address包含mill,并不是address等于mill;
Post localhost:9200/bank/_search
{
"query":{
"bool":{
"must":[
{
"range":{
"age":{"gte":18,"lte":30}
}
}
, {
"match":{"address":"mill"}
}
]
}
}
}
// 查询结果
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 6.4032025,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 6.4032025,// 计算分数
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "[email protected]",
"city": "Lopezo",
"state": "AK"
}
}
]
}
}
1)用filter过滤器查询,就不会计算文档的相关性分数;
2)场景:查询年龄大于等于18,且小于等于30,且address包含mill的文档;
查询结果的分数为0.0,显然 filter不会计算分数;
Post localhost:9200/bank/_search
{
"query":{
"bool":{
"filter":[
{
"range":{
"age":{"gte":18,"lte":30}
}
}
, {
"match":{"address":"mill"}
}
]
}
}
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 0.0, // 没有计算得分
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "[email protected]",
"city": "Lopezo",
"state": "AK"
}
}
]
}
}
1)场景:查询 gender等于M,且state等于KY,且age不等于28,或者 lastname等于 Hancock,且 age在25到30之间的文档;
Post localhost:9200/bank/_search
{
"query":{
"bool":{
"must":[
{"match":{"gender":"M"}}
,{"match":{"state":"KY"}}
]
,"must_not":[
{"match":{"age":"28"}}
]
,"should":[
{"match":{"lastname":"Hancock"}}
]
, "filter":{
"range":{
"age":{"gte":"25","lte":"30"}
}
}
}
}
}
// 查询结果
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 11.173532,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "640",
"_score": 11.173532,
"_source": {
"account_number": 640,
"balance": 35596,
"firstname": "Candace",
"lastname": "Hancock",
"age": 25,
"gender": "M",
"address": "574 Riverdale Avenue",
"employer": "Animalia",
"email": "[email protected]",
"city": "Blandburg",
"state": "KY"
}
}
]
}
}
2)如何使得上述bool查询中的must,should 不计算分数呢 ?将其嵌套在 filter 里面,如下:
post localhost:9200/bank/_search
{
"query":{
"bool":{
"filter":{
"bool":{
"must":[
{"match":{"gender":"M"}}
,{"match":{"state":"KY"}}
]
,"must_not":[
{"match":{"age":"28"}}
]
,"should":[
{"match":{"lastname":"Hancock"}}
]
, "filter":{
"range":{
"age":{"gte":"25","lte":"30"}
}
}
}
}
}
}
}
// 查询结果
{
"took": 368,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "640",
"_score": 0.0, // 显然没有计算文档分数
"_source": {
"account_number": 640,
"balance": 35596,
"firstname": "Candace",
"lastname": "Hancock",
"age": 25,
"gender": "M",
"address": "574 Riverdale Avenue",
"employer": "Animalia",
"email": "[email protected]",
"city": "Blandburg",
"state": "KY"
}
}
]
}
}
1)定义:
2)注意:
3)term 与 match 区别:
1)场景1:查询 address 等于 574 Riverdale Avenue 的文档 ;
es文档中的address被分词了,而term查询对 574 Riverdale Avenue 进行精确查询(不分词),所以查无记录。
{
"query":{
"term":{
"address":"574 Riverdale Avenue"
}
}
}
// 为空。
2)场景2:查询 address.keyword 等于 574 Riverdale Avenue 的文档 (查询 address不分词时 等于 574 Riverdale Avenue 的文档 )
{
"query":{
"term":{
"address.keyword":"574 Riverdale Avenue" // 这里匹配的是 address的keyword属性
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 6.5032897,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "640",
"_score": 6.5032897,// 显然 term 要评分
"_source": {
"account_number": 640,
"balance": 35596,
"firstname": "Candace",
"lastname": "Hancock",
"age": 25,
"gender": "M",
"address": "574 Riverdale Avenue",
"employer": "Animalia",
"email": "[email protected]",
"city": "Blandburg",
"state": "KY"
}
}
]
}
}
【结果分析】
1)address.keyword 查无记录;因为是精确匹配(不分词);
场景:查询地址等于 574 Riverdale 的文档;
{
"query":{
"term":{
"address.keyword":"574 Riverdale"
}
}
}
// 查无记录
2)match_phrase 是部分匹配,包含 574 Riverdale 即可(有值);
{
"query":{
"match_phrase":{
"address":"574 Riverdale"
}
}
}
// 查询结果
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 12.492344,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "640",
"_score": 12.492344,
"_source": {
"account_number": 640,
"balance": 35596,
"firstname": "Candace",
"lastname": "Hancock",
"age": 25,
"gender": "M",
"address": "574 Riverdale Avenue",
"employer": "Animalia",
"email": "[email protected]",
"city": "Blandburg",
"state": "KY"
}
}
]
}
}
【小结】
1)constant_score 不计算分数;
2)constant_score 参数有2个:filter 和 boost;
场景:查询年龄在大于等于25,且小于等于30的文档;
post localhost:9200/bank/_search
{
"query":{
"constant_score":{
"filter":{
"range":{
"age":{"gte":"25","lte":"30"}
}
}
}
}
}
// 查询结果
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 273,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "13",
"_score": 1.0, // 显然,文档分数默认为1.0
"_source": {
"account_number": 13,
"balance": 32838,
"firstname": "Nanette",
"lastname": "Bates",
"age": 28,
"gender": "F",
"address": "789 Madison Street",
"employer": "Quility",
"email": "[email protected]",
"city": "Nogal",
"state": "VA"
}
},
......
}
参考自: Constant score query | Elasticsearch Guide [7.2] | Elastichttps://www.elastic.co/guide/en/elasticsearch/reference/7.2/query-dsl-constant-score-query.html