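The intervals query uses matching rules that are applied to the terms of a specified field.
These rules produce intervals, the minimal spans of text that satisfy them; intervals can in turn be combined or filtered by parent rules.
intervals query example
// Request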
GET software/_search
{
"query": {
"intervals": {
"desc": {
"all_of": {
"ordered": true,
"intervals": [
{
"match": {
"query": "distributed search",
"max_gaps": 0,
"ordered": true
}
},
{
"any_of": {
"intervals": [
{
"match": {
"query": "analytics engine"
}
},
{
"match": {
"query": "Elastic Stack"
}
}
]
}
}
]
}
}
}
}
}
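Top-level parameters for the intervals query
No. | Parameter | Description |
---|---|---|
1 | <field> | (Required, rule object) The field to search. Its value is a rule object that matches documents based on terms, their order, and their proximity to one another |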
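Valid rules include:
No. | Rule | Description |
---|---|---|
1 | match | Matches analyzed text |
2 | prefix | Matches terms that start with a specified string |
3 | wildcard | Matches terms using a wildcard pattern |
4 | fuzzy | Matches terms similar to a given term, within an edit distance |
5 | all_of | Returns matches that span a combination of other rules |
6 | any_of | Matches documents that satisfy any of its sub-rules |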
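The match rule matches analyzed text.
Parameters:
No. | Parameter | Description |
---|---|---|
1 | query | (Required, string) The text to search for |
2 | max_gaps | (Optional, integer) Maximum number of positions between the matching terms; defaults to -1. If unspecified or set to -1, the gap is unrestricted; if set to 0, the terms must appear next to each other |
3 | ordered | (Optional, Boolean) If true, the matching terms must appear in the specified order; defaults to false |
4 | analyzer | (Optional, string) Analyzer used to analyze the query text; defaults to the analyzer of the field being searched |
5 | filter | (Optional, rule object) An interval filter |
6 | use_field | (Optional, string) If specified, intervals are matched from this field rather than the top-level field, and the query text is analyzed with this field's search analyzer |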
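The prefix rule matches terms that start with a specified string. If the prefix expands to more than 128 terms, Elasticsearch returns an error;
setting the index_prefixes option in the field mapping avoids this limit.
Parameters:
No. | Parameter | Description |
---|---|---|
1 | prefix | (Required, string) The string that matching terms must start with |
2 | analyzer | (Optional, string) Analyzer used to normalize the prefix; defaults to the analyzer of the top-level field |
3 | use_field | (Optional, string) If specified, intervals are matched from this field rather than the top-level field |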
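As a minimal sketch of the prefix rule, the requests below assume the software index and the sample documents used in this article, with desc mapped as a default text field. The prefix value distri is only an illustrative choice. The second request shows where index_prefixes would go in a field mapping; software_prefixed is a hypothetical index name, and the min_chars and max_chars values are illustrative rather than recommended settings.

```json
// Match terms in desc that start with "distri" (for example "distributed")
GET software/_search
{
  "query": {
    "intervals": {
      "desc": {
        "prefix": {
          "prefix": "distri"
        }
      }
    }
  }
}

// Hypothetical index: enable index_prefixes on desc so that prefix rules
// are not subject to the 128-term expansion limit
PUT software_prefixed
{
  "mappings": {
    "properties": {
      "desc": {
        "type": "text",
        "index_prefixes": {
          "min_chars": 2,
          "max_chars": 10
        }
      }
    }
  }
}
```

Documents would have to be indexed into the new index after the mapping is created for the prefix sub-field to be populated.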
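The wildcard rule matches terms using a wildcard pattern. If the pattern expands to more than 128 terms, Elasticsearch returns an error.
Parameters:
No. | Parameter | Description |
---|---|---|
1 | pattern | (Required, string) The wildcard pattern. Two wildcard operators are supported: ? matches any single character; * matches zero or more characters, including an empty one |
2 | analyzer | (Optional, string) Analyzer used to normalize the pattern; defaults to the analyzer of the top-level field |
3 | use_field | (Optional, string) If specified, intervals are matched from this field rather than the top-level field |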
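A minimal sketch of the wildcard rule, again assuming the software index and sample documents from this article with the default analyzer on desc. The pattern analy* is an illustrative choice; it is intended to expand to terms such as analytics.

```json
// Match terms in desc that fit the wildcard pattern "analy*"
GET software/_search
{
  "query": {
    "intervals": {
      "desc": {
        "wildcard": {
          "pattern": "analy*"
        }
      }
    }
  }
}
```

A pattern that begins with * or ? has to be checked against many terms, so it is more likely to run into the 128-term limit described above.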
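The fuzzy rule matches terms that are similar to a given term, within an edit distance. If the fuzzy expansion matches more than 128 terms, Elasticsearch returns an error.
Parameters:
No. | Parameter | Description |
---|---|---|
1 | term | (Required, string) The term to match |
2 | prefix_length | (Optional, integer) Number of leading characters left unchanged when creating expansions; defaults to 0 |
3 | transpositions | (Optional, Boolean) Whether edits include transpositions of two adjacent characters (ab -> ba); defaults to true |
4 | fuzziness | (Optional, string) Maximum edit distance allowed for matching; defaults to auto |
5 | analyzer | (Optional, string) Analyzer used to normalize the term; defaults to the analyzer of the top-level field |
6 | use_field | (Optional, string) If specified, intervals are matched from this field rather than the top-level field |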
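A minimal sketch of the fuzzy rule, assuming the software index and sample documents from this article. The misspelled term distribted is an illustrative choice; it is one inserted character away from distributed, so it falls within the edit distance that fuzziness auto allows for a term of this length. Setting prefix_length to 2 keeps the leading di fixed and narrows the expansion.

```json
// Match terms in desc within the allowed edit distance of "distribted"
GET software/_search
{
  "query": {
    "intervals": {
      "desc": {
        "fuzzy": {
          "term": "distribted",
          "fuzziness": "auto",
          "prefix_length": 2
        }
      }
    }
  }
}
```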
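The all_of rule returns matches that span a combination of other rules.
Parameters:
No. | Parameter | Description |
---|---|---|
1 | intervals | (Required, array of rule objects) The rules to combine; every rule must produce a match in a document for the document to match overall |
2 | max_gaps | (Optional, integer) Maximum number of positions between the matches; defaults to -1. If unspecified or set to -1, the gap is unrestricted; if set to 0, the matches must appear next to each other |
3 | ordered | (Optional, Boolean) If true, the matches must appear in the specified order; defaults to false |
4 | filter | (Optional, rule object) An interval filter |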
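The any_of rule matches documents that satisfy any of its sub-rules.
Parameters:
No. | Parameter | Description |
---|---|---|
1 | intervals | (Required, array of rule objects) The rules, any one of which may match |
2 | filter | (Optional, rule object) An interval filter |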
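The filter rule returns intervals based on a query. In the table below, "query" refers to the rule that carries the filter and "filter" to the rule given inside it.
Parameters:
No. | Parameter | Description |
---|---|---|
1 | after | (Optional, query object) The query's interval follows the filter's interval |
2 | before | (Optional, query object) The query's interval precedes the filter's interval |
3 | contained_by | (Optional, query object) The query's interval is contained by the filter's interval |
4 | containing | (Optional, query object) The query's interval contains the filter's interval |
5 | not_contained_by | (Optional, query object) The query's interval is not contained by the filter's interval |
6 | not_containing | (Optional, query object) The query's interval does not contain the filter's interval |
7 | not_overlapping | (Optional, query object) The query's interval does not overlap the filter's interval |
8 | overlapping | (Optional, query object) The query's interval overlaps the filter's interval |
9 | script | (Optional, script object) Script used to return matching documents; it must return a Boolean value |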
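As a small illustration of how a filter attaches to a rule, the sketch below uses containing, the positive counterpart of not_containing. It assumes the software index and sample documents from this article; the filter term analytics is an illustrative choice. The interval matched for distributed engine is kept only if it contains the term analytics.

```json
// Keep "distributed engine" intervals (at most 3 gaps) that contain "analytics"
GET software/_search
{
  "query": {
    "intervals": {
      "desc": {
        "match": {
          "query": "distributed engine",
          "max_gaps": 3,
          "filter": {
            "containing": {
              "match": {
                "query": "analytics"
              }
            }
          }
        }
      }
    }
  }
}
```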
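// The query below uses a filter rule and imposes two constraints:
// 1. The two terms of the query on the desc field may be at most 3 positions apart (max_gaps)
// 2. The interval matched for 'distributed engine' must not contain the term 'redis'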
POST software/_search
{
"query": {
"intervals":{
"desc":{
"match":{
"query":"distributed engine",
"max_gaps": 3,
"filter":{
"not_containing":{
"match":{
"query": "redis"
}
}
}
}
}
}
}
}
// Response; vary the conditions to test the different cases
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.19999999,
"hits" : [
{
"_index" : "software",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.19999999,
"_source" : {
"title" : "elasticsearch",
"desc" : "Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack"
}
},
{
"_index" : "software",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.19999999,
"_source" : {
"title" : "elasticsearch",
"desc" : "distributed search and analytics engine at the heart of the Elastic Stack"
}
}
]
}
}
// The interval matched for 'distributed engine' must come before 'redis'
GET software/_search
{
"query": {
"intervals":{
"desc":{
"match":{
"query":"distributed engine",
"max_gaps": 3,
"filter":{
"before":{
"match":{
"query": "redis"
}
}
}
}
}
}
}
}
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.19999999,
"hits" : [
{
"_index" : "software",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.19999999,
"_source" : {
"title" : "elasticsearch",
"desc" : "distributed search redis analytics engine redis"
}
}
]
}
}
// Filter with a script: the interval must start after position 1, end before position 10, and contain exactly 3 gaps
GET software/_search
{
"query": {
"intervals":{
"desc":{
"match":{
"query":"distributed engine",
"filter":{
"script":{
"source":"interval.start > 1 && interval.end < 10 && interval.gaps == 3"
}
}
}
}
}
}
}
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.19999999,
"hits" : [
{
"_index" : "software",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.19999999,
"_source" : {
"title" : "elasticsearch",
"desc" : "Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack"
}
}
]
}
}
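Minimization
The intervals query always minimizes intervals so that queries run in linear time. This can sometimes produce surprising results, particularly when the max_gaps parameter or a filter is used. For example, the following query searches for library API with a contained_by filter on code: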
GET software/_search
{
"query": {
"intervals":{
"desc":{
"match":{
"query":"library API",
"filter":{
"contained_by":{
"match":{
"query":"code"
}
}
}
}
}
}
}
}
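The query above does not match the text but rather a code library and API that can easily be used: the interval produced for code covers a single term, so it cannot contain the longer interval produced for library API. Changing contained_by to after makes the query match.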
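For reference, the after variant described above can be written as follows. This is the same request with only the filter key changed, assuming the software index and sample documents from this article.

```json
// "library API" must occur after the term "code"
GET software/_search
{
  "query": {
    "intervals": {
      "desc": {
        "match": {
          "query": "library API",
          "filter": {
            "after": {
              "match": {
                "query": "code"
              }
            }
          }
        }
      }
    }
  }
}
```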
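Another restriction concerns any_of rules whose sub-rules overlap. When a shorter phrase matches, a longer phrase that starts with it can never match, which leads to surprising results when combined with max_gaps. Consider the following query: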
GET software/_search
{
"query": {
"intervals": {
"desc": {
"all_of": {
"intervals": [
{
"match": {
"query": "add"
}
},
{
"any_of": {
"intervals": [
{
"match": {
"query": "search"
}
},
{
"match": {
"query": "search capabilities"
}
}
]
}
},
{
"match": {
"query": "to"
}
}
],
"max_gaps": 0,
"ordered": true
}
}
}
}
}
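By the minimization rule, the query above does not match add search capabilities to: the any_of rule yields only the interval for search, because the longer search capabilities interval starts at the same position and is minimized away. In that case the query needs to be rewritten so that every alternative is spelled out at the top level: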
GET software/_search
{
"query": {
"intervals": {
"desc": {
"any_of": {
"intervals": [
{
"match": {
"query": "add search capabilities to",
"max_gaps": 0,
"ordered": true
}
},
{
"match": {
"query": "add search to",
"max_gaps": 0,
"ordered": true
}
}
]
}
}
}
}
}
// Response; both queries above returned the same result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.3333333,
"hits" : [
{
"_index" : "software",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.3333333,
"_source" : {
"title" : "lucene",
"desc" : "Lucene is not a complete application, but rather a code library and API that can easily be used to add search capabilities to applications"
}
}
]
}
}
The documents indexed for the queries above are as follows:
PUT software/_doc/1
{
"title":"elasticsearch",
"desc":"Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack"
}
PUT software/_doc/2
{
"title":"redis",
"desc":"Redis is an open source, in-memory data structure store, used as a database, cache and message broker"
}
PUT software/_doc/3
{
"title":"Lucene",
"desc":"Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities"
}
PUT software/_doc/4
{
"title":"elasticsearch",
"desc":"distributed search and analytics engine at the heart of the Elastic Stack"
}
PUT software/_doc/5
{
"title":"elasticsearch",
"desc":"distributed search redis analytics engine redis"
}
PUT software/_doc/6
{
"title":"lucene",
"desc":"Lucene is not a complete application, but rather a code library and API that can easily be used to add search capabilities to applications"
}