14. Elasticsearch: mapping syntax for creating, updating, and searching documents

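1. Basic query string syntax

GET /test_index/test_type/_search?q=test_field:test     // documents whose test_field contains "test"

GET /test_index/test_type/_search?q=+test_field:test    // test_field must contain "test"; same result as above

GET /test_index/test_type/_search?q=-test_field:test    // test_field must not contain "test"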
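When these requests are built in code rather than typed into a console, the + prefix needs some care: in a URL query string a bare + is commonly decoded as a space, so it is safer to percent-encode it as %2B. The sketch below builds the three search paths with Python's standard library. The helper name query_string_path and its mode argument are hypothetical, introduced only for this illustration.

```python
from urllib.parse import quote

# Prefix characters used by the query string syntax shown above.
PREFIXES = {"contains": "", "must": "+", "must_not": "-"}

def query_string_path(index, doc_type, field, value, mode="contains"):
    """Build a _search path with a q= parameter (hypothetical helper)."""
    q = f"{PREFIXES[mode]}{field}:{value}"
    # Keep ':' readable; '+' is encoded as %2B so it is not read as a space.
    return f"/{index}/{doc_type}/_search?q={quote(q, safe=':')}"

if __name__ == "__main__":
    for mode in PREFIXES:
        print(query_string_path("test_index", "test_type", "test_field", "test", mode))
```

Only the path is built here; sending it to a cluster is left to whatever HTTP client is already in use.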

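2. How the _all metadata field works and what it is for

GET /test_index/test_type/_search?q=test

This searches every field at once: a document is returned if any one of its fields contains the keyword. Does Elasticsearch run a separate search against each field of the document? No.

When a document with several fields is indexed, Elasticsearch concatenates the values of all its fields into one long string, stores that string as the value of the _all metadata field, and indexes it.

A search that does not name a field then runs against _all by default, which contains the values of every field. (This applies to Elasticsearch 5.x; _all was deprecated in 6.0 and removed in 7.0.)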
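The concatenation step can be sketched in a few lines of Python. This is only a toy model: the function names build_all_field and tokenize and the sample document are made up for the illustration, and the real _all field is built and analyzed inside Elasticsearch.

```python
def build_all_field(doc):
    """Join every field value into one string, as described for _all."""
    return " ".join(str(value) for value in doc.values())

def tokenize(text):
    """Very rough stand-in for analysis: lowercase and split on whitespace."""
    return text.lower().split()

if __name__ == "__main__":
    doc = {"name": "jack", "age": 26, "address": "guangzhou"}  # hypothetical document
    all_value = build_all_field(doc)
    print(all_value)            # the single string that would be indexed as _all
    print(tokenize(all_value))  # the terms a search without a field name would match
```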

For example, given this document:

{
  "name": "jack",
  "age": 26,
  "email": "[email protected]",
  "address": "guangzhou"
}

the string "jack 26 [email protected] guangzhou" becomes the value of its _all field, which is then analyzed and added to the inverted index.

3. First look at the search engine: what a mapping is, shown by example

Insert a few documents and let Elasticsearch create the index automatically:

PUT /website/article/1
{
  "post_date": "2017-01-01",
  "title": "my first article",
  "content": "this is my first article in this website",
  "author_id": 11400
}

PUT /website/article/2
{
  "post_date": "2017-01-02",
  "title": "my second article",
  "content": "this is my second article in this website",
  "author_id": 11400
}

PUT /website/article/3
{
  "post_date": "2017-01-03",
  "title": "my third article",
  "content": "this is my third article in this website",
  "author_id": 11400
}

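Try a few searches:

GET /website/article/_search?q=2017                     // 3 hits

GET /website/article/_search?q=2017-01-01               // 3 hits

GET /website/article/_search?q=post_date:2017-01-01     // 1 hit

GET /website/article/_search?q=post_date:2017           // 1 hit

Looking at the mapping that Elasticsearch created automatically leads to the question of what a mapping is:

A mapping is the data structure and related settings created, automatically or by hand, for a type in an index.

With dynamic mapping, Elasticsearch creates the index, the type, and the type's mapping for us. The mapping records the data type of each field, how the field is analyzed, and similar settings.

View the mapping:

GET /website/_mapping/article

Response: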

{
  "website": {
    "mappings": {
      "article": {
        "properties": {
          "author_id": {
            "type": "long"
          },
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "post_date": {
            "type": "date"
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

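The results differ because dynamic mapping assigned different data types to different fields, and each data type is analyzed and searched differently. That is why a search against _all behaves so differently from a search against post_date. (Sections 4 to 6 cover the background; section 7 gives the full answer.)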
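To make dynamic mapping more concrete, here is a small Python sketch that guesses a field type from a JSON value, in the spirit of what Elasticsearch does. It is a simplification under stated assumptions: infer_field_type and infer_mapping are hypothetical names, only the yyyy-MM-dd date shape is recognized, and real date detection in Elasticsearch accepts more formats.

```python
from datetime import datetime

def infer_field_type(value):
    """Guess an Elasticsearch-style field type for one JSON value (toy rules)."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")   # assumption: only this format
            return "date"
        except ValueError:
            return "text"
    raise TypeError(f"unsupported value: {value!r}")

def infer_mapping(doc):
    """Apply the rule to every top-level field of a document."""
    return {field: infer_field_type(value) for field, value in doc.items()}

if __name__ == "__main__":
    article = {
        "post_date": "2017-01-01",
        "title": "my first article",
        "content": "this is my first article in this website",
        "author_id": 11400,
    }
    print(infer_mapping(article))
```

For the article documents, this yields the same four types that the automatically created mapping reports.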

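4. Exact values versus full text

1. Exact value matching

When 2017-01-01 is treated as an exact value, it can only be found by searching for exactly 2017-01-01.

Searching for just 01 finds nothing.

2. Full-text search

(1) Abbreviation vs. full name: cn vs. china

(2) Word forms: like, liked, likes

(3) Case: Tom vs. tom

(4) Synonyms: like vs. love

Full-text search does not simply match one complete value. The value is split into words (analysis), and matches can also be found through abbreviations, tenses, case differences, synonyms, and so on.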

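5. The core idea of the inverted index, briefly

doc1: I really liked my small dogs, and I think my mom also liked them.

doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.

Split both documents into words and build a first, naive inverted index (only some of the words are shown):

word    doc1    doc2
liked   *       *
small   *
dogs    *       *
mom     *       *

This is the simplest possible way to build an inverted index.

Searching for "mother like little dog" returns nothing, because none of those four words appears in the index in exactly that form.

Is that the result we want? Certainly not. To us, mother and mom are synonyms. like and liked mean the same thing, one in the present tense and one in the past. little and small are synonyms. dog and dogs are the same word, singular and plural.

Normalization: while building the inverted index, Elasticsearch processes each word produced by the split so that later searches are more likely to find related documents.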

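word    doc1    doc2    normalized to
liked   *       *       like
small   *               little
dogs    *       *       dog
mom     *       *       mom

If the query is normalized the same way (mother --> mom), the search for "mother like little dog" now returns both doc1 and doc2.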
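The effect of normalization can be reproduced with a toy inverted index in Python. This is a sketch, not how Lucene stores its index: the NORMALIZE table is a hand-written stand-in for real stemmers and synonym filters, and all function names are made up for the example.

```python
import re
from collections import defaultdict

DOCS = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# Toy normalization table; a real analyzer uses stemmers and synonym filters.
NORMALIZE = {"liked": "like", "small": "little", "dogs": "dog", "mother": "mom"}

def terms(text, normalize):
    """Lowercase, split into words, and optionally normalize each word."""
    words = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(w, w) for w in words] if normalize else words

def build_index(docs, normalize):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in terms(text, normalize):
            index[term].add(doc_id)
    return index

def search(index, query, normalize):
    """Return ids of documents containing at least one query term."""
    hits = set()
    for term in terms(query, normalize):
        hits |= index.get(term, set())
    return hits

if __name__ == "__main__":
    query = "mother like little dog"
    print(search(build_index(DOCS, False), query, False))  # no normalization
    print(search(build_index(DOCS, True), query, True))    # with normalization
```

Note that the same normalization is applied to the documents and to the query; applying it to only one side would not help.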

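6. Analyzers

(1) What is an analyzer

An analyzer splits text into words and normalizes them, which improves recall.

Given a sentence, it breaks the sentence into individual words and normalizes each one (tense, singular and plural, and so on).

Recall: how many of the relevant results a search is able to find. Normalization raises it by letting more related documents match.

An analyzer is made of three parts:

character filter: preprocesses the text before it is tokenized, most commonly by stripping HTML tags (hello wrapped in tags --> hello) or by replacing characters, & --> and (I&you --> I and you)

tokenizer: splits the text into tokens, hello you and me --> hello, you, and, me

token filter: lowercase, stop words, synonyms; for example dogs --> dog, liked --> like, Tom --> tom, a/the/an --> removed, mother --> mom, small --> little

The analyzer matters a great deal: text goes through all of these steps, and only the final result is used to build the inverted index.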
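The three stages can be chained in a short Python sketch. It only illustrates the order of the steps: the stop word list and the REPLACEMENTS table are hand-written assumptions taken from the examples in this section, and the function names do not correspond to any Elasticsearch API.

```python
import re

STOP_WORDS = {"a", "an", "the"}
# Toy stemming/synonym table built from the examples in this section.
REPLACEMENTS = {"dogs": "dog", "liked": "like", "mother": "mom", "small": "little"}

def char_filter(text):
    """Strip HTML tags and spell out '&' before tokenizing."""
    text = re.sub(r"<[^>]+>", "", text)
    return text.replace("&", " and ")

def tokenizer(text):
    """Split the text into word tokens."""
    return re.findall(r"\w+", text)

def token_filter(tokens):
    """Lowercase, drop stop words, then apply the replacement table."""
    lowered = (t.lower() for t in tokens)
    kept = (t for t in lowered if t not in STOP_WORDS)
    return [REPLACEMENTS.get(t, t) for t in kept]

def analyze(text):
    """Run the three stages in order."""
    return token_filter(tokenizer(char_filter(text)))

if __name__ == "__main__":
    print(analyze("<b>Tom</b> liked the small dogs & mother"))
```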

(2) Built-in analyzers

Text: Set the shape to semi-transparent by calling set_trans(5)

standard analyzer (the default): set, the, shape, to, semi, transparent, by, calling, set_trans, 5

simple analyzer: set, the, shape, to, semi, transparent, by, calling, set, trans

whitespace analyzer: Set, the, shape, to, semi-transparent, by, calling, set_trans(5)

language analyzer (language-specific, for example english): set, shape, semi, transpar, call, set_tran, 5
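The first three analyzers can be imitated with regular expressions closely enough to reproduce the token lists above. These are rough approximations checked only against this one sentence, not the actual Lucene implementations, and the function names are hypothetical. The english analyzer is left out because it depends on a stemmer.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"

def whitespace_like(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()

def simple_like(text):
    """Split on anything that is not a letter, then lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def standard_like(text):
    """Approximation: runs of letters, digits and underscores, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", text)]

if __name__ == "__main__":
    for fn in (standard_like, simple_like, whitespace_like):
        print(fn.__name__, fn(TEXT))
```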

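7. Answering the question raised in section 3

(1) How the query string is analyzed

The query string must be analyzed with the same analyzer that was used when the field was indexed.

The query string treats exact-value fields and full-text fields differently:

date: exact value

_all: full text

Suppose a document has a field whose value is "hello you and me", and an inverted index has been built for it.

We search that index with the text "hello me". This search text is the query string.

By default, Elasticsearch analyzes and normalizes the query string with the same analyzer that was used to build the inverted index of the field being searched. Only then can the search match correctly.

Key point: depending on its type, a field is either full text or an exact value.

post_date, type date: exact value

_all: full text, analyzed and normalized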

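(2) The puzzle left over from the mapping example, solved

GET /_search?q=2017-01-01

No field is named, so the search runs against _all, which is full text. The query string is analyzed the same way the field was when the inverted index was built, which splits it into 2017, 01 and 01. All three documents contain 2017, so all three are returned.

GET /_search?q=post_date:2017-01-01

post_date is indexed as an exact value, so only the one document with exactly that date is returned.

GET /_search?q=post_date:2017

This returns one hit. It is the result of an optimization introduced in Elasticsearch 5.2 and later for date fields, which are still indexed as exact values. The details are left for a later section.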
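The first two cases can be imitated with a toy model in Python: a full-text search over the joined field values, and an exact comparison on post_date. This is an assumption-laden sketch with made-up function names, and it deliberately does not model the post_date:2017 case, which depends on how Elasticsearch handles date fields internally.

```python
import re

ARTICLES = {
    1: {"post_date": "2017-01-01", "title": "my first article"},
    2: {"post_date": "2017-01-02", "title": "my second article"},
    3: {"post_date": "2017-01-03", "title": "my third article"},
}

def analyze(text):
    """Toy full-text analysis: lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())

def search_all(query):
    """Full text: match if any query term occurs in the joined field values."""
    query_terms = set(analyze(query))
    hits = []
    for doc_id, doc in ARTICLES.items():
        all_terms = set(analyze(" ".join(str(v) for v in doc.values())))
        if query_terms & all_terms:
            hits.append(doc_id)
    return hits

def search_exact(field, value):
    """Exact value: the whole value must be equal."""
    return [doc_id for doc_id, doc in ARTICLES.items() if doc[field] == value]

if __name__ == "__main__":
    print(search_all("2017-01-01"))                 # every document contains 2017
    print(search_exact("post_date", "2017-01-01"))  # only the first document
```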

(3) Testing an analyzer: the request below shows how a piece of text is analyzed

GET /_analyze
{
  "analyzer": "standard",
  "text": "Text to analyze"
}

(4) Mapping data types

string/text

byte, short, integer, long

float, double

boolean

date

How dynamic mapping infers the type from a value:

true or false --> boolean

123 --> long

123.45 --> double

2017-01-01 --> date

"hello world" --> string/text

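8. Summary

(1) When data is inserted directly, Elasticsearch creates the index automatically, along with the type and its mapping.

(2) The mapping automatically defines the data type of each field.

(3) Depending on its data type (for example text or date), a field is treated either as an exact value or as full text.

(4) An exact value is added to the inverted index as a single term, the whole value (for example post_date:2017-01-01). Full text goes through analysis and normalization (tense, synonyms, case) before it is added to the inverted index.

(5) Whether a field is an exact value or full text also determines how a search against it behaves, and that behavior mirrors how the inverted index was built. A search on an exact-value field matches against the whole value; a search on a full-text field analyzes and normalizes the query string before looking it up in the inverted index.

(6) You can rely on dynamic mapping to create the mapping and choose data types automatically, or you can create the mapping for an index and type by hand in advance and configure each field yourself: data type, indexing behavior, analyzer, and so on.

A mapping is the metadata of a type in an index. Every type has its own mapping, which determines the data types, how the inverted index is built, and how searches behave.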

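9. How a field is indexed

analyzed: the value is analyzed before it is indexed

not_analyzed: the value is indexed as is, without analysis

no: the value is not indexed and cannot be searched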
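The difference between the three settings comes down to which terms end up in the inverted index. A minimal Python sketch, assuming a very crude analysis step (lowercase and split on non-word characters) and a hypothetical helper named index_terms:

```python
import re

def index_terms(value, index="analyzed"):
    """Return the terms that would go into the inverted index (toy model)."""
    if index == "analyzed":
        return re.findall(r"\w+", value.lower())   # split and lowercase
    if index == "not_analyzed":
        return [value]                             # the whole value, untouched
    if index == "no":
        return []                                  # nothing indexed, not searchable
    raise ValueError(f"unknown index option: {index}")

if __name__ == "__main__":
    for option in ("analyzed", "not_analyzed", "no"):
        print(option, index_terms("Hello World", option))
```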

(1) Modifying a mapping

A mapping can be defined by hand when the index is created, and mappings for new fields can be added later, but the mapping of an existing field cannot be updated.

PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "author_id": {
          "type": "long"
        },
        "title": {
          "type": "text",
          "analyzer": "english"
        },
        "content": {
          "type": "text"
        },
        "post_date": {
          "type": "date"
        },
        "publisher_id": {
          "type": "text",
          "index": "not_analyzed"
        }
      }
    }
  }
}

Sending the request again with a different type for an existing field fails:

PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "author_id": {
          "type": "text"
        }
      }
    }
  }
}

The request is rejected because the index already exists:

{
  "error": {
    "root_cause": [
      {
        "type": "index_already_exists_exception",
        "reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
        "index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
        "index": "website"
      }
    ],
    "type": "index_already_exists_exception",
    "reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
    "index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
    "index": "website"
  },
  "status": 400
}

Adding a new field is allowed:

PUT /website/_mapping/article
{
  "properties": {
    "new_field": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
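The rule "new fields may be added, existing fields may not be changed" can be modeled in Python as a merge that rejects conflicts. This is a simplified sketch: add_fields is a hypothetical helper, the ValueError stands in for whatever error Elasticsearch returns, and the check here is stricter than the real one, which may accept some compatible changes.

```python
def add_fields(existing, new_fields):
    """Return a merged properties dict; reject changes to existing fields."""
    merged = dict(existing)
    for name, definition in new_fields.items():
        if name in merged and merged[name] != definition:
            raise ValueError(f"cannot update mapping of existing field '{name}'")
        merged[name] = definition
    return merged

if __name__ == "__main__":
    properties = {"author_id": {"type": "long"}, "post_date": {"type": "date"}}
    properties = add_fields(properties, {"new_field": {"type": "keyword"}})
    print(sorted(properties))
    try:
        add_fields(properties, {"author_id": {"type": "text"}})
    except ValueError as err:
        print("rejected:", err)
```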

(2) Testing how a field in an existing mapping is analyzed

GET /website/_analyze
{
  "field": "content",
  "text": "my-dogs"
}

(3) Special types

{ "tags": [ "tag1", "tag2" ] }    // multi-value field

null, [], [null]    // empty field

Object type

When a field holds nested objects, Elasticsearch may transform the data internally.

For example:

{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}

After Elasticsearch processes it:

{
  "name": [jack],
  "age": [27],
  "join_date": [2017-01-01],
  "address.country": [china],
  "address.province": [guangdong],
  "address.city": [guangzhou]
}

Another example, with an array of objects:

{
  "authors": [
    { "age": 26, "name": "Jack White" },
    { "age": 55, "name": "Tom Jones" },
    { "age": 39, "name": "Kitty Smith" }
  ]
}

After Elasticsearch processes it:

{
  "authors.age": [26, 55, 39],
  "authors.name": [jack, white, tom, jones, kitty, smith]
}
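The flattening shown in both examples can be reproduced with a short recursive function. This Python sketch is an illustration only: flatten is a made-up name, and splitting strings on whitespace after lowercasing is an assumption standing in for real text analysis.

```python
def flatten(doc, prefix=""):
    """Flatten nested objects into dotted field names with lists of values."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the object and merge its dotted fields.
                for sub_name, sub_values in flatten(item, f"{name}.").items():
                    flat.setdefault(sub_name, []).extend(sub_values)
            elif isinstance(item, str):
                # Assumption: strings are lowercased and split on whitespace.
                flat.setdefault(name, []).extend(item.lower().split())
            else:
                flat.setdefault(name, []).append(item)
    return flat

if __name__ == "__main__":
    doc = {
        "authors": [
            {"age": 26, "name": "Jack White"},
            {"age": 55, "name": "Tom Jones"},
            {"age": 39, "name": "Kitty Smith"},
        ]
    }
    print(flatten(doc))
```

One consequence visible in the output: once the values are flattened, the link between a particular age and a particular name within the same author object is no longer recorded.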
