41. Elasticsearch, a First Look at Search Engines: Tokenizing the Query String

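1. Query string tokenization
(1) The query string must be tokenized with the same analyzer that was used when the index was built

(2) The query string treats exact value fields and full text fields differently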

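Suppose we have a document with a field whose value is: hello you and me, and an inverted index is built for it. We want to search the index that holds this document, and the search text is hello me. By default, ES analyzes the query string with the same analyzer that was used to build the inverted index for the corresponding field, which covers both tokenization and normalization. Only then can the search work correctly.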

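In one sentence: whatever analyzer you used when building the index, you must use the same one at query time. Otherwise the tokenization rules differ, and the search may fail to find the results you expect.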
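The rule can be illustrated with a toy inverted index in Python. This is a minimal sketch, not how Elasticsearch is implemented: `analyze_lower` and `analyze_raw` are hypothetical stand-ins for two different analyzers, and the mixed-case text is made up so that the effect of normalization becomes visible.

```python
from collections import defaultdict

def analyze_lower(text):
    """Hypothetical analyzer: split on whitespace, then lowercase (normalization)."""
    return [t.lower() for t in text.split()]

def analyze_raw(text):
    """Hypothetical analyzer: split on whitespace only, no normalization."""
    return text.split()

def build_index(docs, analyzer):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index

def search(index, query, analyzer):
    """Return the ids of documents that match any term of the analyzed query."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits

docs = {1: "Hello you and me"}
index = build_index(docs, analyze_lower)  # index time: lowercasing analyzer

same = search(index, "Hello Me", analyze_lower)  # same analyzer at query time
diff = search(index, "Hello Me", analyze_raw)    # different analyzer at query time
print(same, diff)
```

With the same analyzer on both sides, the query terms become hello and me and find the document. With the non-normalizing analyzer, the query terms stay Hello and Me, which do not exist in the index, so nothing is found.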

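2. Answering the question left over from earlier
Key point: fields have different types; some are full text and some are exact value

Take the website example we practiced with earlier:
post_date date: exact value
_all full text: tokenization, normalization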

Demo1

GET /_search?q=2017
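Why does this return three results?

Because the search runs against the _all field. For each document, the values of all fields are concatenated into one large string, which is then tokenized.

The inverted index is built as follows: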

	doc1	doc2	doc3
2017	*	*	*
01	*	*	*
02		*
03			*

On _all this is a full-text search: looking up 2017 in the inverted index matches 3 documents, so 3 results are returned.

GET /_search?q=2017-01-01
Why does this also return three results?

On _all, the query string 2017-01-01 is tokenized with the same analyzer that was used to build the inverted index:
2017
01
01

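On _all this is a full-text search: looking up 2017 in the inverted index matches 3 documents, and 01 only matches documents that are already in that set. The union of the matches, with duplicates removed, is still 3 documents, so 3 results are returned.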
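Demo1 can be reproduced with a small Python model. This is a sketch under stated assumptions: `analyze` is only a rough stand-in for the standard analyzer, and each document is reduced to its post_date field, since the other fields of the website example are not listed here.

```python
import re
from collections import defaultdict

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Assumption: only post_date is modeled; the other fields are omitted.
docs = {
    "doc1": {"post_date": "2017-01-01"},
    "doc2": {"post_date": "2017-01-02"},
    "doc3": {"post_date": "2017-01-03"},
}

# _all: concatenate every field value into one string, then analyze it.
all_index = defaultdict(set)
for doc_id, fields in docs.items():
    all_text = " ".join(str(value) for value in fields.values())
    for term in analyze(all_text):
        all_index[term].add(doc_id)

def search_all(query):
    """Analyze the query the same way and union the posting lists of its terms."""
    hits = set()
    for term in analyze(query):
        hits |= all_index.get(term, set())
    return sorted(hits)

r1 = search_all("2017")
r2 = search_all("2017-01-01")
print(r1)
print(r2)
```

Both queries return all three documents in this model, because the term 2017 alone already covers every document and the union with the other terms cannot shrink the result.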

Demo2

GET /_search?q=post_date:2017-01-01
Why does this return only one result?

A date field is indexed as an exact value when the inverted index is built

	doc1	doc2	doc3
2017-01-01	*
2017-01-02		*
2017-01-03			*

post_date:2017-01-01 matches only the first entry, so only the first document is returned
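The exact-value table can be modeled the same way. The sketch below is only an illustration of that table, not a description of how Elasticsearch stores dates internally, and it does not try to reproduce the post_date:2017 case that follows. The names `exact_index` and `search_exact` are hypothetical.

```python
from collections import defaultdict

post_dates = {
    "doc1": "2017-01-01",
    "doc2": "2017-01-02",
    "doc3": "2017-01-03",
}

# Exact value: the whole value is indexed as a single term, with no tokenization.
exact_index = defaultdict(set)
for doc_id, value in post_dates.items():
    exact_index[value].add(doc_id)

def search_exact(value):
    """Look up the query value as-is; only an identical term matches."""
    return sorted(exact_index.get(value, set()))

result = search_exact("2017-01-01")
print(result)
```

Because the value is never split into 2017, 01 and 01, the other two documents share no term with the query and are not returned.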

GET /_search?q=post_date:2017
Why does this also return one result?
This is not explained here, because it comes from an optimization made in ES 5.2 and later

3. Testing the analyzer

GET /_analyze
{
  "analyzer": "standard",
  "text": "2017-01-01"
}

Response

{
  "tokens": [
    {
      "token": "2017",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<NUM>",
      "position": 0
    },
    {
      "token": "01",
      "start_offset": 5,
      "end_offset": 7,
      "type": "<NUM>",
      "position": 1
    },
    {
      "token": "01",
      "start_offset": 8,
      "end_offset": 10,
      "type": "<NUM>",
      "position": 2
    }
  ]
}

The text is split into three tokens, 2017, 01 and 01, each of type <NUM> (numeric)
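To make the offsets and positions in the response easier to read, here is a rough Python approximation of what the analyzer did to this input. It is an assumption-laden sketch that only handles simple ASCII text; the real standard analyzer follows Unicode text segmentation rules and is considerably more involved.

```python
import re

def analyze(text):
    """Approximate the _analyze response for simple ASCII input."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        token = match.group().lower()
        tokens.append({
            "token": token,
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "type": "<NUM>" if token.isdigit() else "<ALPHANUM>",
            "position": position,           # ordinal of the token in the stream
        })
    return {"tokens": tokens}

result = analyze("2017-01-01")
for token in result["tokens"]:
    print(token)
```

The hyphens act only as separators: they produce no tokens of their own, which is why the offsets jump from 4 to 5 and from 7 to 8.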
