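The difference between queries and filters
Elasticsearch provides a full JSON-based DSL for defining queries. The query DSL consists of two kinds of clauses:
Leaf query clauses: look for a particular value in a particular field, for example the match, term, or range queries. These queries can be used on their own.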
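Compound query clauses: wrap other leaf or compound query clauses, either to combine multiple queries logically (such as the bool or dis_max queries) or to alter their behavior (such as the constant_score query; the not query of early releases was removed in Elasticsearch 5.0).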
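Query: checks how well the content matches the condition and computes the _score meta-field to express that relevance. A search request enters query context through the query parameter.
Filter: does not compute a score; it simply decides whether a document matches. Filters are mostly used on structured data.
Frequently used filters are cached automatically by Elasticsearch to improve performance.
A query clause runs in filter context whenever it is passed to a filter parameter, for example the filter of a bool query or the filter of a constant_score query.
As an example, the query below matches documents that satisfy all of the following conditions:
1. The title field contains the words "重生甜俏妻逆袭"
2. The author field contains "帘半卷"
3. The amount field is 100
4. The ConsumeTime field contains a date from 2018-01-01 onward
Request: POST http://127.0.0.1:9200/xy_order/_search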
{
"query":{
"bool":{
"must":[
{"match":{"title":"重生甜俏妻逆袭"}},
{"match":{"author":"帘半卷"}}
],
"filter":[
{"term":{"amount":100}},
{"range":{"ConsumeTime":{"gte":"2018-01-01"}}}
]
}
}
}
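The query parameter indicates query context. The bool and match clauses are used in query context, so they determine how well each document matches and contribute to its score.
The filter parameter indicates filter context. The term and range clauses inside it filter out documents that do not match, without affecting the score.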
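To make the split between the two contexts concrete, here is a minimal Python sketch that assembles the request body above as a plain dictionary. The helper name build_order_query is hypothetical; only the standard json module is used, and nothing is sent to a cluster.

```python
import json


def build_order_query(title, author, amount, since):
    """Assemble a bool query: scoring clauses in must, non-scoring ones in filter."""
    return {
        "query": {
            "bool": {
                # Query context: these clauses contribute to _score.
                "must": [
                    {"match": {"title": title}},
                    {"match": {"author": author}},
                ],
                # Filter context: yes/no matching only, no effect on _score.
                "filter": [
                    {"term": {"amount": amount}},
                    {"range": {"ConsumeTime": {"gte": since}}},
                ],
            }
        }
    }


body = build_order_query("重生甜俏妻逆袭", "帘半卷", 100, "2018-01-01")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Keeping the exact-value conditions under filter is a design choice: they do not need a relevance score, and Elasticsearch can cache them.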
Full-text search
Match query: accepts text, numbers, or dates, analyzes the input, and builds a query from it, for example:
{
"match":{
"message":"this is a test"
}
}
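Note that message is the field name; any field name (including _all) can be used instead.
There are three types of match query: boolean, phrase, and phrase_prefix. Other full-text queries include the multi-field query (multi_match), the Lucene-syntax query (query_string), and the simplified query (simple_query_string).
Boolean
The default match query type: the text is analyzed and a boolean query is built from it. The operator parameter can be set to or or and to control the boolean clauses (default: or). The minimum number of optional should clauses that must match can be set with the minimum_should_match parameter.
The analyzer parameter controls which analyzer performs the analysis on the text. It defaults to the analyzer explicitly defined in the field mapping, or to the default search analyzer.
The lenient parameter can be set to true to ignore exceptions caused by data-type mismatches, such as querying a numeric field with a text query string. It defaults to false.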
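As a sketch of how these options fit together, the long form of the match query places them next to the query text. The requests below are written as Python dictionaries; the message field and the sample text are illustrative only.

```python
import json

# Long form of a match query: the options sit beside the query text.
match_all_terms = {
    "query": {
        "match": {
            "message": {
                "query": "this is a test",
                "operator": "and",  # every analyzed term must match
                "lenient": True,    # ignore data-type mismatch errors
            }
        }
    }
}

# With the default "or" operator, minimum_should_match sets how many
# of the generated should clauses have to match.
match_some_terms = {
    "query": {
        "match": {
            "message": {
                "query": "this is a test",
                "minimum_should_match": 2,
            }
        }
    }
}

print(json.dumps(match_all_terms, indent=2))
print(json.dumps(match_some_terms, indent=2))
```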
Phrase
The match_phrase query analyzes the text and creates a phrase query from it, for example:
GET xy_order/_search
{
"query": {
"match_phrase":{
"cbid":"8228721504057103"
}
}
}
A phrase query matches terms up to a configurable slop, which defaults to 0.
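As a small illustration of slop, the sketch below writes a match_phrase request as a Python dictionary. The message field and the sample text are assumptions for illustration, not part of the xy_order index.

```python
import json

# match_phrase in its long form, with an explicit slop.
# Assumption: a document whose message field holds "this is a test".
# The query "this test" skips two words, so the last term has to move
# two positions; a slop of 2 allows that, while the default of 0 does not.
phrase_with_slop = {
    "query": {
        "match_phrase": {
            "message": {
                "query": "this test",
                "slop": 2,
            }
        }
    }
}

print(json.dumps(phrase_with_slop, indent=2))
```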
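The analyzer parameter controls which analyzer performs the analysis on the text. It defaults to the analyzer defined in the field mapping, or to the default search analyzer, for example: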
GET xy_order/_search
{
"query": {
"match_phrase":{
"cbid":{
"query":"8228721504057103","analyzer": "standard"
}
}
}
}
Phrase prefix
The match_phrase_prefix query allows prefix matching on the last term of the text, for example:
GET xy_order/_search
{
"query": {
"match_phrase_prefix":{"ServerIp":"10.236"}
}
}
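It also accepts a max_expansions parameter, which controls how many terms the last term's prefix will be expanded to. Setting it to an acceptable value is recommended to keep the query execution time under control. For example: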
GET xy_order/_search
{
"query": {
"match_phrase_prefix":{"ServerIp":{"query":"10.236", "max_expansions": 10}}
}
}
Multi-match query
Builds on the match query to allow queries across multiple fields:
GET bookcenter/_search
{
"query": {
"multi_match": {
"query": "少年",
"fields": ["desc","content"]
}
}
}
Fields can be specified with wildcards:
GET bookcenter/_search
{
"query": {
"multi_match": {
"query": "法师",
"fields": ["desc","*tent"]
}
}
}
Result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.9635944,
"hits": [
{
"_index": "bookcenter",
"_type": "qidian",
"_id": "3",
"_score": 0.9635944,
"_source": {
"title": "全职法师",
"author": "乱",
"desc": "心潮澎湃,无限幻想,迎风挥击千层浪,少年不败热血!",
"category": "东方玄幻",
"content": "一觉醒来,世界大变熟悉的高中传授的是魔法,告诉大家要成为一名出色的魔法师。居住的都市之外游荡着袭击人类的魔物妖兽,虎视眈眈.崇尚科学的世界变成了崇尚魔法,偏偏有着一样以学渣看待自己的老师,一样目光异样的同学,一样社会底层挣扎的爸爸,一样纯美却不能走路的非血缘妹妹……不过,莫凡发现绝大多数人都只能够主修一系魔法,自己却是全系全能法师!"
}
}
]
}
}
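To see which fields a pattern such as "*tent" would cover, the following sketch approximates the expansion locally with Python's fnmatch module. Elasticsearch resolves the pattern on the server against the index mapping; the helper expand_field_patterns is hypothetical and only mimics the simple * wildcard, using the field names visible in the document above.

```python
from fnmatch import fnmatchcase

# Field names taken from the bookcenter document shown above.
index_fields = ["title", "author", "desc", "category", "content"]


def expand_field_patterns(patterns, fields):
    """Return the fields, in their original order, that match any pattern."""
    return [f for f in fields if any(fnmatchcase(f, p) for p in patterns)]


expanded = expand_field_patterns(["desc", "*tent"], index_fields)
print(expanded)
```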
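How the multi_match query is executed internally depends on the type parameter, which can be set to one of the following values:
best_fields: (default) finds documents that match any field, but uses the _score of the best-matching field
most_fields: finds documents that match any field and combines the _score from each field
cross_fields: treats fields that share the same analyzer as one big field and looks for each word in any of those fields
phrase: runs a match_phrase query on each field and uses the _score of the best-matching field
phrase_prefix: runs a match_phrase_prefix query on each field and uses the _score of the best-matching field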
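A minimal sketch of passing the type parameter, assuming a hypothetical helper build_multi_match that only accepts the five values listed above (newer Elasticsearch releases may accept additional types):

```python
import json

# The type values described in the list above.
MULTI_MATCH_TYPES = {
    "best_fields",
    "most_fields",
    "cross_fields",
    "phrase",
    "phrase_prefix",
}


def build_multi_match(text, fields, match_type="best_fields"):
    """Build a multi_match request body, rejecting unknown type values."""
    if match_type not in MULTI_MATCH_TYPES:
        raise ValueError(f"unsupported multi_match type: {match_type!r}")
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": match_type,
            }
        }
    }


request = build_multi_match("法师", ["desc", "content"], "most_fields")
print(json.dumps(request, ensure_ascii=False, indent=2))
```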
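Term-level queries
Full-text queries analyze the query string before executing, whereas term-level queries operate only on the exact terms stored in the inverted index
These queries are usually used for structured data such as numbers, dates, and enums, rather than for full-text fields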
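The difference can be illustrated with a toy inverted index. The analyze function below is a deliberately rough stand-in for a real analyzer (lowercase, split on non-word characters), and every name in the sketch is hypothetical; it shows the idea, not how Elasticsearch is implemented.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def build_inverted_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def match_query(index, text):
    """Full-text style: analyze the query string, then OR the resulting terms."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits


def term_query(index, value):
    """Term-level style: look the value up exactly as given, with no analysis."""
    return set(index.get(value, set()))


docs = {1: "Quick Brown Fox", 2: "lazy dog"}
index = build_inverted_index(docs)

print(match_query(index, "Quick"))  # query is analyzed to "quick", finds doc 1
print(term_query(index, "Quick"))   # exact term "Quick" is not in the index
print(term_query(index, "quick"))   # exact term "quick" finds doc 1
```

This is why a term query against an analyzed text field can unexpectedly return nothing, while the same value works on numeric, date, or keyword-like fields.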
1. Term query
Finds documents that contain the exact given term in the specified field
GET xy_order/_search
{
"query": {
"term": {
"cbid": {
"value": "4513669803378203"
}
}
}
}
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "xy_order",
"_type": "order",
"_id": "4513669803378203",
"_score": 1,
"_source": {
"YwGuid": 800159902781,
"AppId": 36,
"AreaId": 1,
"AppType": 111,
"cbid": 4513669803378203,
"ItemId": 25419177209132988,
"Uuid": 3279,
"Amount": 3,
"iscount": 1,
"valueofAmount": 3,
"OrderId": "9bb713f10b8669dc5cfc4fcb6e732c00",
"Subject": "爆萌小仙:扑倒冰山冷上神",
"DeviceUid": "171840941",
"UserIp": "10.62.21.173",
"ServerIp": "10.226.143.222",
"ConsumeTime": 1514736047000,
"CreateTime": 0
}
}
]
}
}
2. Terms query
Filters documents whose field matches any of the provided terms, for example:
GET xy_order/_search
{
"query": {
"constant_score": {
"filter": {"terms": {
"cbid": [
"4513669803378203",
"8228721504057103"
]
}},
"boost": 1.2
}
}
}
Result:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.2,
"hits": [
{
"_index": "xy_order",
"_type": "order",
"_id": "4513669803378203",
"_score": 1.2,
"_source": {
"YwGuid": 800159902781,
"AppId": 36,
"AreaId": 1,
"AppType": 111,
"cbid": 4513669803378203,
"ItemId": 25419177209132988,
"Uuid": 3279,
"Amount": 3,
"iscount": 1,
"valueofAmount": 3,
"OrderId": "9bb713f10b8669dc5cfc4fcb6e732c00",
"Subject": "爆萌小仙:扑倒冰山冷上神",
"DeviceUid": "171840941",
"UserIp": "10.62.21.173",
"ServerIp": "10.226.143.222",
"ConsumeTime": 1514736047000,
"CreateTime": 0
}
},
{
"_index": "xy_order",
"_type": "order",
"_id": "8228721504057103",
"_score": 1.2,
"_source": {
"YwGuid": 1039914285,
"AppId": 36,
"AreaId": 1,
"AppType": 111,
"cbid": 8228721504057103,
"ItemId": 25430287753426924,
"Uuid": 437,
"Amount": 5,
"iscount": 1,
"valueofAmount": 5,
"OrderId": "e0c07410ed29dcb26ef3d0c021a4e20b",
"Subject": "重生甜俏妻逆袭",
"DeviceUid": "183635853",
"UserIp": "10.242.15.141",
"ServerIp": "10.236.16.173",
"ConsumeTime": 1514822297000,
"CreateTime": 0
}
}
]
}
}
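3. Range query
Matches documents whose field value (date, number, or string) falls within a given range
For example, find documents whose Amount is between 10 and 20 (the data set is fairly large, so the number of results is limited):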
GET xy_order/_search
{
"query": {
"range": {
"Amount": {
"gte": 10,
"lte": 20
}
}
},
"from": 0
, "size": 2
}
Result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 755,
"max_score": 1,
"hits": [
{
"_index": "xy_order",
"_type": "order",
"_id": "5392592004069701",
"_score": 1,
"_source": {
"YwGuid": 2351842566,
"AppId": 36,
"AreaId": 3,
"AppType": 111,
"cbid": 5392592004069701,
"ItemId": 14475631957226180,
"Uuid": 78,
"Amount": 10,
"iscount": 0,
"valueofAmount": 0,
"OrderId": "16fa7c7f5fa91624ced9a050ba34dd99",
"Subject": "风流太子穿越妃",
"DeviceUid": "868090023415690",
"UserIp": "223.89.154.49",
"ServerIp": "101.226.103.77",
"ConsumeTime": 1514737983000,
"CreateTime": 0
}
},
{
"_index": "xy_order",
"_type": "order",
"_id": "7519820503924203",
"_score": 1,
"_source": {
"YwGuid": 493458659,
"AppId": 36,
"AreaId": 3,
"AppType": 111,
"cbid": 7519820503924203,
"ItemId": 25418846504780308,
"Uuid": 466,
"Amount": 10,
"iscount": 1,
"valueofAmount": 5,
"OrderId": "8e69a792d8ac936adfff67fc3fb28da3",
"Subject": "网游之召唤王",
"DeviceUid": "865121031827908",
"UserIp": "125.69.125.205",
"ServerIp": "101.226.103.77",
"ConsumeTime": 1514738295000,
"CreateTime": 0
}
}
]
}
}
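The range query accepts the following parameters:
gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
boost: sets the boost value of the query; defaults to 1.0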
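The four bounds can be read as ordinary comparisons. The following sketch evaluates them locally on a list of numbers; in_range is a hypothetical helper for illustration and the sample amounts are made up.

```python
import operator

# Each range parameter corresponds to one comparison operator.
RANGE_OPERATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def in_range(value, bounds):
    """True if value satisfies every bound; boost only affects scoring, so it is skipped."""
    return all(
        RANGE_OPERATORS[name](value, limit)
        for name, limit in bounds.items()
        if name in RANGE_OPERATORS
    )


amounts = [3, 5, 10, 14, 20, 100]
inclusive = [a for a in amounts if in_range(a, {"gte": 10, "lte": 20})]
exclusive = [a for a in amounts if in_range(a, {"gt": 10, "lt": 20})]
print(inclusive)
print(exclusive)
```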
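Compound queries
Constant score query
This query wraps another query and gives every document that matches the filter a constant score equal to the query boost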
GET xy_order/_search
{
"query": {
"constant_score": {
"filter": {"term": {
"Amount": 100
}},
"boost": 1.0
}
}
, "from": 0
, "size": 2
}
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 1,
"hits": [
{
"_index": "xy_order",
"_type": "order",
"_id": "6338525304454703",
"_score": 1,
"_source": {
"YwGuid": 120005257736,
"AppId": 36,
"AreaId": 3,
"AppType": 135,
"cbid": 6338525304454703,
"ItemId": 6338525304454703,
"Uuid": 0,
"Amount": 100,
"iscount": 0,
"valueofAmount": 100,
"OrderId": "9bf4eb7f138f54605386442f1798b1a6",
"Subject": "贞观小农民",
"DeviceUid": "866486024724605",
"UserIp": "220.166.141.46",
"ServerIp": "101.226.103.77",
"ConsumeTime": 1514739482000,
"CreateTime": 0
}
},
{
"_index": "xy_order",
"_type": "order",
"_id": "8542213204453803",
"_score": 1,
"_source": {
"YwGuid": 120002974277,
"AppId": 36,
"AreaId": 3,
"AppType": 135,
"cbid": 8542213204453803,
"ItemId": 8542213204453803,
"Uuid": 0,
"Amount": 100,
"iscount": 0,
"valueofAmount": 100,
"OrderId": "53d39f8542717ea87d7949b9a905e067",
"Subject": "重生农女:妙手空间猎世子",
"DeviceUid": "868256021922124",
"UserIp": "223.81.196.70",
"ServerIp": "101.226.103.77",
"ConsumeTime": 1514784659000,
"CreateTime": 0
}
}
]
}
}
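Bool query
Matches documents that match boolean combinations of other queries. The bool query maps to Lucene's BooleanQuery. It is built from one or more boolean clauses, each with a typed occurrence:
1. must: the clause must appear in matching documents and contributes to the score
2. filter: the clause must appear in matching documents, but its score is ignored
3. should: the clause should appear in matching documents. If the bool query has no must or filter clause, a document must match at least one should clause. The minimum number of should clauses to match can be set with the minimum_should_match parameter
4. must_not: the clause must not appear in matching documents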
Older releases of the bool query also support a disable_coord parameter (default: false); it was removed in Elasticsearch 6.0
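The bool query takes a more-matches-is-better approach: the scores of every matching must or should clause are added together to give the final _score of each document, for example: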
GET xy_order/_search
{
"query": {
"bool": {
"must": [
{"term": {
"ServerIp": {
"value": "101.226.103.77"
}
}}
]
, "must_not": [
{"term": {
"AreaId": {
"value": 5
}
}}
]
, "filter": {
"range": {
"Amount": {
"gte": 10,
"lte": 200
}
}
}
}
}
, "from": 0
, "size": 2
}
Result:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 332,
"max_score": 0.4822159,
"hits": [
{
"_index": "xy_order",
"_type": "order",
"_id": "25382584000584301",
"_score": 0.4822159,
"_source": {
"YwGuid": 801025145875,
"AppId": 36,
"AreaId": 4,
"AppType": 111,
"cbid": 25382584000584300,
"ItemId": 16574121561281348,
"Uuid": 533,
"Amount": 14,
"iscount": 1,
"valueofAmount": 14,
"OrderId": "f58ac0fbfdd35209801dd961e3ded854",
"Subject": "重生之嫡女谋嫁",
"DeviceUid": "ea4ce27785e403d7e847710ec5aaad36",
"UserIp": "68.43.179.51",
"ServerIp": "101.226.103.77",
"ConsumeTime": 1514736119000,
"CreateTime": 0
}
},
{
"_index": "xy_order",
"_type": "order",
"_id": "25113666000167002",
"_score": 0.4822159,
"_source": {
"YwGuid": 800218465088,
"AppId": 36,
"AreaId": 3,
"AppType": 111,
"cbid": 25113666000167000,
"ItemId": 11386956139303862,
"Uuid": 286,
"Amount": 10,
"iscount": 1,
"valueofAmount": 10,
"OrderId": "0937b1098aa84b5ead4e4ab8f794a2d6",
"Subject": "邪王宠妻:医妃休想出墙",
"DeviceUid": "862823036270694",
"UserIp": "61.158.149.245",
"ServerIp": "101.226.103.77",
"ConsumeTime": 1514736190000,
"CreateTime": 0
}
}
]
}
}
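The way the four clause types combine can be sketched as a toy scorer. Everything here is hypothetical: bool_score and term are illustration helpers, and the score 0.5 is an invented value rather than anything Lucene would compute. The sketch only mirrors the rules described above: must and should add to the score, filter and must_not only include or exclude.

```python
def term(field, value, score):
    """Toy scoring clause: returns a fixed score on match, None otherwise."""
    return lambda doc: score if doc.get(field) == value else None


def bool_score(doc, must=(), should=(), filters=(), must_not=(),
               minimum_should_match=None):
    """Return the combined score, or None when the document does not match.

    must and should hold scoring clauses (callables returning a score or None);
    filters and must_not hold plain predicates.
    """
    if any(pred(doc) for pred in must_not):
        return None
    if not all(pred(doc) for pred in filters):
        return None
    must_scores = [clause(doc) for clause in must]
    if any(s is None for s in must_scores):
        return None
    should_scores = [s for s in (clause(doc) for clause in should) if s is not None]
    if minimum_should_match is None:
        # should is optional unless there is no must or filter clause.
        minimum_should_match = 0 if (must or filters) else 1
    if should and len(should_scores) < minimum_should_match:
        return None
    return sum(must_scores) + sum(should_scores)


must_clauses = [term("ServerIp", "101.226.103.77", 0.5)]
must_not_clauses = [lambda d: d.get("AreaId") == 5]
filter_clauses = [lambda d: 10 <= d.get("Amount", 0) <= 200]

doc = {"ServerIp": "101.226.103.77", "AreaId": 4, "Amount": 14}
score = bool_score(doc, must=must_clauses, must_not=must_not_clauses,
                   filters=filter_clauses)
print(score)
```

In this sketch, as in the result above, every hit ends up with the same score because only the single must clause contributes to it.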
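Dis max query
This query produces the union of the documents matched by its subqueries and scores each document with the maximum score that any subquery gave it. When tie_breaker is set, the scores of the other matching subqueries, multiplied by tie_breaker, are added on top
It maps to Lucene's DisjunctionMaxQuery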
GET xy_order/_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{"term": {
"Amount": {
"value": 100
}
}},
{"term": {
"AreaId": {
"value": 5
}
}}
]
}
}
, "from": 0
, "size": 5
}
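The scoring rule of dis_max can be written down in a few lines. The sketch below is a simplified model with a hypothetical helper dis_max_score and invented subquery scores; it follows the usual description of DisjunctionMaxQuery: the best score, plus tie_breaker times the remaining scores, multiplied by boost.

```python
def dis_max_score(sub_scores, tie_breaker=0.0, boost=1.0):
    """Combine the scores of the subqueries that matched one document.

    Returns None when no subquery matched.
    """
    if not sub_scores:
        return None
    best = max(sub_scores)
    others = sum(sub_scores) - best
    return boost * (best + tie_breaker * others)


# A document matched by both subqueries, with made-up scores 1.0 and 0.5.
combined = dis_max_score([1.0, 0.5], tie_breaker=0.7, boost=1.2)
# With the default tie_breaker of 0, only the best score counts.
best_only = dis_max_score([1.0, 0.5])
print(combined, best_only)
```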
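Boosting query
Can be used to effectively demote results that match a given query. Unlike the NOT clause (must_not) of the bool query, the boosting query still selects documents that contain the undesirable terms, but reduces their overall score: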
GET xy_order/_search
{
"query": {
"boosting": {
"positive": {
"term": {
"AreaId": {
"value": "1"
}
}
},
"negative": {
"term": {
"cbid": {
"value": "8824770204890803"
}
}
}
, "negative_boost": 0.2
}
}
, "from": 0
, "size": 2
}
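As a rough model of the effect, a document that matches the positive query keeps its score, and if it also matches the negative query, that score is multiplied by negative_boost. The helper boosting_score and the score values below are hypothetical.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of one document under a boosting query.

    positive_score is None when the positive query does not match,
    in which case the document is not returned at all.
    """
    if positive_score is None:
        return None
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


demoted = boosting_score(1.0, matches_negative=True, negative_boost=0.2)
untouched = boosting_score(1.0, matches_negative=False, negative_boost=0.2)
print(demoted, untouched)
```

With the request above, the order whose cbid is 8824770204890803 would still be returned if its AreaId is 1, only ranked lower than the others.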
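Elasticsearch highlighting
Highlighting in Elasticsearch comes from Lucene and lets you highlight the matched search terms in one or more fields. Three highlighters are supported: highlighter (plain), fast-vector-highlighter, and postings-highlighter. The first is the default. For example: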
GET qidian_book/abstract/_search
{
"query": {
"term": {
"desc": {
"value": "mofashi"
}
}
},
"highlight": {
"fields": {"desc": {}}
}
}
Result:
{
"took": 39,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.3973951,
"hits": [
{
"_index": "qidian_book",
"_type": "abstract",
"_id": "1",
"_score": 0.3973951,
"_source": {
"title": "dao mu bi ji",
"author": "tang jia san shao",
"desc": "yi jiao xing lai,shi jie da bian shu xi de gao zhong chuan shou de shi mo fa, gao su da jia yao cheng wei yi ming chu se de mofashi, ju zhu de du shi zhi wai you dan zhe xi ji ren lei de mowu yao sou, zi ji yao cheng wei quan xiao zhu mu de mofashi"
},
"highlight": {
"desc": [
"da bian shu xi de gao zhong chuan shou de shi mo fa, gao su da jia yao cheng wei yi ming chu se de mofashi",
"de du shi zhi wai you dan zhe xi ji ren lei de mowu yao sou, zi ji yao cheng wei quan xiao zhu mu de mofashi"
]
}
}
]
}
}
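The result contains the highlighted term mofashi. Highlighting needs the actual content of the field. If the field is stored (store set to true in the field mapping), the stored value is used; otherwise Elasticsearch loads _source and extracts the relevant field from it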
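By default the matched terms are wrapped in <em> and </em>. As a small sketch, the request below (written as a Python dictionary) overrides those markers with the pre_tags and post_tags options; the choice of <b> tags is only an example.

```python
import json

# Same term query as above, with custom highlight markers.
highlight_request = {
    "query": {"term": {"desc": {"value": "mofashi"}}},
    "highlight": {
        "pre_tags": ["<b>"],    # inserted before each highlighted term
        "post_tags": ["</b>"],  # inserted after each highlighted term
        "fields": {"desc": {}},
    },
}

print(json.dumps(highlight_request, indent=2))
```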
fast-vector-highlighter
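highlighter is the plain highlighter; compared with it, fast-vector-highlighter has the following characteristics:
1. It is faster, especially for large fields (for example, larger than 1 MB)
2. It can be customized with boundary_chars, boundary_max_scan, and fragment_offset
3. It requires term_vector to be set to with_positions_offsets, which increases the size of the index
4. It can combine matches from multiple fields into one result
5. It can assign different weights to matches at different positions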
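A minimal sketch of what using the fast vector highlighter involves, written as Python dictionaries. It assumes the desc field from the earlier example; the mapping fragment covers only that one field, because the surrounding mapping layout differs between Elasticsearch versions.

```python
import json

# Field mapping fragment: term vectors with positions and offsets are
# required by the fast vector highlighter and make the index larger.
desc_field_mapping = {
    "type": "text",
    "term_vector": "with_positions_offsets",
}

# Search request that asks for the fast vector highlighter on desc.
fvh_request = {
    "query": {"term": {"desc": {"value": "mofashi"}}},
    "highlight": {
        "fields": {
            "desc": {"type": "fvh"},
        }
    },
}

print(json.dumps(desc_field_mapping, indent=2))
print(json.dumps(fvh_request, indent=2))
```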