First, a brief description of what a mapping is.
Insert a few documents and let ES create the index for us automatically:
PUT /website/_doc/1
{
"post_date": "2017-01-01",
"title": "my first article",
"content": "this is my first article in this website",
"author_id": 11400
}
PUT /website/_doc/2
{
"post_date": "2017-01-02",
"title": "my second article",
"content": "this is my second article in this website",
"author_id": 11400
}
PUT /website/_doc/3
{
"post_date": "2017-01-03",
"title": "my third article",
"content": "this is my third article in this website",
"author_id": 11400
}
View the mapping:
GET /website/_mapping
{
"website" : {
"mappings" : {
"properties" : {
"author_id" : {
"type" : "long"
},
"content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"post_date" : {
"type" : "date"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
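The mapping above was generated automatically when the data was inserted; a mapping can also be created manually. A mapping is the data structure and related configuration that is set up, automatically or manually, for a type in an index.
Below is a manually created mapping: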
PUT /test_mapping
{
"mappings" : {
"properties" : {
"author_id" : {
"type" : "long"
},
"content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"post_date" : {
"type" : "date"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
1. Exact match versus full-text search
(1) exact value
The entire value of a field must match for the corresponding document to be returned.
Example:
GET /website/_search?q=post_date:2017
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
GET /website/_search?q=post_date:2017-01-01
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "website",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"post_date" : "2017-01-01",
"title" : "my first article",
"content" : "this is my first article in this website",
"author_id" : 11400
}
}
]
}
}
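(2) full text
Unlike exact value, full text does not simply match one complete value. The value is split into terms (tokenized) before matching, and matches can also be made through abbreviations, tenses, letter case, synonyms, and so on.
Example: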
GET /website/_search?q=title:article
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.087011375,
"hits" : [
{
"_index" : "website",
"_type" : "doc",
"_id" : "1",
"_score" : 0.087011375,
"_source" : {
"post_date" : "2017-01-01",
"title" : "my first article",
"content" : "this is my first article in this website",
"author_id" : 11400
}
},
{
"_index" : "website",
"_type" : "doc",
"_id" : "2",
"_score" : 0.087011375,
"_source" : {
"post_date" : "2017-01-02",
"title" : "my second article",
"content" : "this is my second in this website",
"author_id" : 11400
}
},
{
"_index" : "website",
"_type" : "doc",
"_id" : "3",
"_score" : 0.087011375,
"_source" : {
"post_date" : "2017-01-03",
"title" : "my third article",
"content" : "this is my third in this website",
"author_id" : 11400
}
}
]
}
}
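The difference between the two behaviors can be sketched in a few lines of Python. This is only a conceptual model: the function names `index_exact`, `index_full_text`, and `matches` are hypothetical, and it does not reproduce how ES actually analyzes text or handles date fields internally.

```python
# Conceptual model only; not how Elasticsearch is implemented.

def index_exact(value):
    """Exact value: the whole value is stored as a single term."""
    return {value}

def index_full_text(value):
    """Full text: the value is lowercased and split into word terms."""
    return set(value.lower().split())

def matches(terms, query_term):
    """A document matches when the query term is one of its indexed terms."""
    return query_term in terms

date_terms = index_exact("2017-01-01")
title_terms = index_full_text("my first article")

print(matches(date_terms, "2017"))        # only part of the value is given
print(matches(date_terms, "2017-01-01"))  # the whole value is given
print(matches(title_terms, "article"))    # a single word of the title is given
```

In this model a partial value never matches an exact-value field, while any single word matches a full-text field.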
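2. Core principles of the inverted index
The following shows, in simplified form, how an inverted index is built. In practice the process is far more complex.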
doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
Tokenize the text and build a first version of the inverted index:
word    doc1  doc2
I       *     *
really  *
liked   *     *
my      *     *
small   *
dogs    *     *
and     *
think   *
mom     *     *
also    *
them    *
He            *
never         *
any           *
so            *
hope          *
that          *
will          *
not           *
expect        *
me            *
to            *
him           *
Searching for mother like little dog returns no results, because none of the query terms appears in the index:
mother
like
little
dog
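This is clearly not what we want. For example, mother and mom mean the same thing, yet the documents cannot be found. ES can nevertheless find them: when it builds the inverted index, it also processes each token after splitting, so that later searches are more likely to find related documents. Typical steps are tense conversion, singular/plural conversion, synonym conversion, and case conversion. This process is called normalization, and which steps are applied depends on the analyzer configured for the field.
mother -> mom
liked -> like
small -> little
dogs -> dog
Rebuild the inverted index after normalization: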
word    doc1  doc2
I       *     *
really  *
like    *     *
my      *     *
little  *
dog     *     *
and     *
think   *
mom     *     *
also    *
them    *
He            *
never         *
any           *
so            *
hope          *
that          *
will          *
not           *
expect        *
me            *
to            *
him           *
Query: mother like little dog, after tokenization and normalization:
mother -> mom
like -> like
little -> little
dog -> dog
Both doc1 and doc2 are returned:
doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
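The walk-through above can be reproduced with a minimal Python sketch. The `NORMALIZE` table simply hard-codes the four conversions from the example; it is an assumption made for illustration, since a real analyzer would use stemmers and synonym filters. Unlike the tables above, this sketch also lowercases every word.

```python
import re
from collections import defaultdict

# Hypothetical normalization table copied from the example above.
NORMALIZE = {"mother": "mom", "liked": "like", "small": "little", "dogs": "dog"}

def analyze(text):
    """Lowercase the text, split it into words, and normalize each word."""
    words = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(word, word) for word in words]

def build_inverted_index(docs):
    """Map each normalized term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain at least one query term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}
index = build_inverted_index(docs)
print(sorted(search(index, "mother like little dog")))
```

The key point is that the same `analyze` function is applied both when indexing and when searching, which is why the query terms line up with the indexed terms.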
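3. Further summary of mapping
(1) When data is inserted directly into ES, ES automatically creates the index, along with the type and its mapping.
(2) The mapping automatically defines the data type of each field.
(3) Depending on the data type (for example text versus date), a field may be an exact value or full text.
(4) For an exact value, the whole value is put into the inverted index as a single term, without being split. Full text goes through tokenization and normalization (tense conversion, synonym conversion, case conversion) before it is put into the inverted index.
(5) Whether a field is an exact value or full text also determines how searches against it behave, and that behavior stays consistent with how the inverted index was built. An exact value search matches against the whole value, while a full text search tokenizes and normalizes the query before looking it up in the inverted index.
(6) You can rely on ES dynamic mapping to create the mapping automatically, including the data types. You can also create the mapping for an index and type manually in advance and configure each field yourself, including its data type, indexing behavior, analyzer, and so on.
A mapping is essentially the metadata of a type in an index. It determines the data types, how the inverted index is built, and how searches behave.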
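4. Core mapping data types and dynamic mapping
(1) Core data types
text (called string in older ES versions): string type
byte: byte type
short: short integer type
integer: integer type
long: long integer type
float: floating-point type
boolean: boolean type
date: date type
There are also more complex cases such as arrays and objects, but underneath they are still stored as fields of the core types.
(2) dynamic mapping
true or false -> boolean
123 -> long
123.45 -> float
"2017-01-01" -> date
"hello world" -> text (with a keyword sub-field)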
(3) View the mapping
GET /{index}/_mapping
GET /test/_mapping
{
"test" : {
"mappings" : {
"properties" : {
"field1" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"field2" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
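The dynamic mapping rules listed in (2) can be approximated with a small Python function. This is a simplified sketch, not the actual ES logic: `guess_type` is a hypothetical name, and the date check only recognizes the yyyy-MM-dd form used in this article, whereas ES supports more date formats.

```python
import re

# Simplifying assumption: only strings shaped like yyyy-MM-dd count as dates.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def guess_type(value):
    """Return the field type that the rules above would pick for a JSON value."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if DATE_PATTERN.match(value) else "text"
    raise ValueError(f"unsupported value in this sketch: {value!r}")

for sample in (True, 123, 123.45, "2017-01-01", "hello world"):
    print(repr(sample), "->", guess_type(sample))
```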
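5. Creating and modifying a mapping manually, and controlling whether string fields are analyzed
Note: a mapping can only be defined manually when the index is created, or extended by adding mappings for new fields. The mapping of an existing field cannot be updated.
# Create the index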
PUT /website
{
"mappings": {
"properties": {
"author_id": {
"type": "long"
},
"title": {
"type": "text",
"analyzer": "standard"
},
"content": {
"type": "text"
},
"post_date": {
"type": "date"
},
"publisher_id": {
"type": "keyword"
}
}
}
}
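# Try to modify a field's mapping by re-sending PUT /website; the request is rejected because the index already exists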
PUT /website
{
"mappings": {
"properties": {
"author_id": {
"type": "text"
}
}
}
}
{
"error": {
"root_cause": [
{
"type": "resource_already_exists_exception",
"reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
"index_uuid": "5xLohnJITHqCwRYInmBFmA",
"index": "website"
}
],
"type": "resource_already_exists_exception",
"reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
"index_uuid": "5xLohnJITHqCwRYInmBFmA",
"index": "website"
},
"status": 400
}
# Add a new field to the mapping
PUT /website/_mapping
{
"properties": {
"new_field": {
"type": "text"
}
}
}
{
"acknowledged" : true
}
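The rule stated in the note for this section (new fields may be added, existing fields may not change type) can be modeled in Python as follows. This is only an illustrative sketch: `add_fields` is a hypothetical helper that works on plain dictionaries, and its error message is invented for the example rather than copied from ES.

```python
def add_fields(existing, update):
    """Return a copy of `existing` properties with the fields from `update` merged in.

    Raises ValueError when `update` tries to change the type of an existing field.
    """
    merged = dict(existing)
    for name, definition in update.items():
        if name in merged and merged[name]["type"] != definition["type"]:
            raise ValueError(
                f"field [{name}] cannot be changed from "
                f"[{merged[name]['type']}] to [{definition['type']}]"
            )
        merged[name] = definition
    return merged

properties = {"author_id": {"type": "long"}, "title": {"type": "text"}}

# Adding a new field is allowed.
properties = add_fields(properties, {"new_field": {"type": "text"}})
print(sorted(properties))

# Changing the type of an existing field is rejected.
try:
    add_fields(properties, {"author_id": {"type": "text"}})
except ValueError as error:
    print("rejected:", error)
```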
6. Complex mapping types and the underlying structure of object data
(1) multivalue field
{
"tags": ["tag1", "tag2"]
}
(2) empty field
null, []
(3) object field
PUT /test/_create/1
{
"address": {
"country": "china",
"province": "guangdong",
"city": "guangzhou"
},
"name": "jack",
"age": 27,
"join_date": "2017-01-01"
}
GET /test/_mapping
{
"test" : {
"mappings" : {
"properties" : {
"address" : {
"properties" : {
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"country" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"province" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"age" : {
"type" : "long"
},
"join_date" : {
"type" : "date"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
GET /test/_doc/1
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"address" : {
"country" : "china",
"province" : "guangdong",
"city" : "guangzhou"
},
"name" : "jack",
"age" : 27,
"join_date" : "2017-01-01"
}
}
Note: indexing works the same way as for string fields, and the data types of the values in a field must not be mixed.
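As for the underlying structure of object data mentioned in the heading of this section, an object field can be thought of as being flattened into dotted field names such as address.city. The Python sketch below illustrates that idea on the document from the example; `flatten` is a hypothetical helper, and the sketch ignores arrays of objects.

```python
def flatten(source, prefix=""):
    """Flatten nested dictionaries into a single level with dotted field names."""
    flat = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the inner object, extending the dotted prefix.
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat

doc = {
    "address": {"country": "china", "province": "guangdong", "city": "guangzhou"},
    "name": "jack",
    "age": 27,
    "join_date": "2017-01-01",
}

for field, value in flatten(doc).items():
    print(field, "=", value)
```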