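A Mapping is similar to a schema definition in a database. It serves the following purposes:
A Mapping maps a JSON document into the flat format that Lucene requires
Each Mapping belongs to one Type of an index (since 7.0 an index has only one type, _doc)
Simple types
Complex types - objects and nested objects
Special types
Type inference errors: dynamic mapping can guess wrong. Here a geographic location is inferred as an object with two float fields instead of a geo_point:
# Insert data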
PUT b_test/_doc/1
{
"startLocation": {
"lat": 32.004287,
"lon": 118.779369
}
}
# View the mapping
GET b_test/_mapping
{
"b_test" : {
"mappings" : {
"properties" : {
"startLocation" : {
"properties" : {
"lat" : {
"type" : "float"
},
"lon" : {
"type" : "float"
}
}
}
}
}
}
}
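The correct approach: define the field as geo_point before writing any data
# Create the index first (delete b_test beforehand if it already exists)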
PUT b_test
{
"mappings": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
# Then insert the data
PUT b_test/_doc/1
{
"location": {
"lat": 32.004287,
"lon": 118.779369
}
}
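JSON type | Elasticsearch type |
---|---|
String | date if it matches a date format; otherwise text with a keyword sub-field |
Boolean | boolean |
Floating-point number | float |
Integer | long |
Object | object |
Array | Determined by the type of the first non-null value |
Null | Ignored |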
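The inference rules in the table can be sketched in a few lines of Python. This is only an illustration of the rules, not how Elasticsearch implements them: the function name infer_es_type is hypothetical, and the regular expression is a simplified stand-in for the default date detection that covers ISO 8601 style strings only.

```python
import re

# Assumption: a simplified approximation of date detection (ISO 8601 only).
_ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$"
)


def infer_es_type(value):
    """Return the type dynamic mapping would likely pick, or None if ignored."""
    if value is None:
        return None  # null values are ignored
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        # A non-date string becomes text (with a keyword sub-field).
        return "date" if _ISO_DATE.match(value) else "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        for item in value:  # the first non-null element decides
            inferred = infer_es_type(item)
            if inferred is not None:
                return inferred
        return None
    raise TypeError(f"not a JSON value: {value!r}")
```

Note that the string "true" is inferred as text, not boolean, which matches the isAdmin field in the example that follows.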
Add a document
# Write a document, then view the Mapping
PUT mapping_test/_doc/1
{
"firstName": "Chan",
"lastName": "Jackie",
"loginDate": "2018-07-24T10:29:48.103Z"
}
# View the Mapping
GET mapping_test/_mapping
{
"mapping_test" : {
"mappings" : {
"properties" : {
"firstName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"loginDate" : {
"type" : "date"
}
}
}
}
}
**Here we can see that ES automatically mapped the loginDate string to the date type.**
# Delete the index
DELETE mapping_test
# Dynamic mapping infers the field types
PUT mapping_test/_doc/1
{
"uid":"123",
"isVip": false,
"isAdmin":"true",
"age":19,
"heigh":180
}
# View the Mapping
GET mapping_test/_mapping
{
"mapping_test" : {
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"heigh" : {
"type" : "long"
},
"isAdmin" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"isVip" : {
"type" : "boolean"
},
"uid" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
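Two cases:
1. Adding a new field
2. For an existing field, once data has been written, the field definition can no longer be modified
Reasons:
1. Changing a field's data type would make data that has already been indexed unsearchable.
2. Adding a new field has no such effect.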
dynamic setting | true | false | strict |
---|---|---|---|
Document can be indexed | √ | √ | x |
Field can be indexed | √ | x | x |
Mapping is updated | √ | x | x |
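The three rows of the table can be read as a small decision procedure. The sketch below models it in plain Python. The names index_document and StrictDynamicMappingError are hypothetical, and a set of field names stands in for the real mapping.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for the HTTP 400 error returned under strict."""


def index_document(mapped_fields, doc, dynamic=True):
    """Simulate writing doc into an index whose mapping knows mapped_fields.

    mapped_fields is a set of field names and is updated in place when
    dynamic is True. Returns the stored source and the searchable fields.
    """
    unknown = [name for name in doc if name not in mapped_fields]
    if unknown and dynamic == "strict":
        # strict: the whole document is rejected
        raise StrictDynamicMappingError(f"unmapped fields: {unknown}")
    if dynamic is True:
        # true: the mapping is updated and the new fields are indexed
        mapped_fields.update(unknown)
    # false: the document is stored, but unknown fields are not indexed
    searchable = sorted(name for name in doc if name in mapped_fields)
    return {"_source": dict(doc), "searchable": searchable}
```

With dynamic set to false the new field still appears in _source when the document is fetched by id. It just cannot be searched.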
# Write a document that contains a new field; the default Mapping supports dynamic
PUT mapping_test/_doc/1
{
"dynamicTest":"test"
}
# Query
GET /mapping_test/_search
{
"version": true,
"query": {
"match": {
"dynamicTest": "test"
}
}
}
# Change dynamic to false
PUT mapping_test/_mapping
{
"dynamic": false
}
# Add anotherField
PUT mapping_test/_doc/10
{
"anotherField":"test"
}
# View the data
GET mapping_test/_doc/10
# The field cannot be searched because dynamic is false
GET /mapping_test/_search
{
"version": true,
"query": {
"match": {
"anotherField": "test"
}
}
}
# Change dynamic to strict
PUT mapping_test/_mapping
{
"dynamic": "strict"
}
# Writing the data fails with HTTP code 400
PUT mapping_test/_doc/10
{
"lastField":"test"
}
PUT /your_index
{
"mappings": {
"properties": {
// fields
}
}
}
#e.g.
PUT /scheduler_driver_intention_info
{
"mappings": {
"properties": {
"intentionId": {
"type": "long"
},
"productId": {
"type": "long"
},
"driverId": {
"type": "long"
},
"cooperationType": {
"type": "integer"
},
"startLocation": {
"type": "geo_point"
},
"truckType": {
"type": "long"
},
"truckLength": {
"type": "double"
},
"startCityList": {
"type": "nested",
"properties": {
"districtId": {
"type": "integer"
},
"districtName": {
"type": "keyword"
},
"cityId": {
"type": "integer"
},
"cityName": {
"type": "keyword"
},
"generalizationFlag": {
"type": "boolean"
}
}
},
"endCityList": {
"type": "nested",
"properties": {
"districtId": {
"type": "integer"
},
"districtName": {
"type": "keyword"
},
"cityId": {
"type": "integer"
},
"cityName": {
"type": "keyword"
},
"generalizationFlag": {
"type": "boolean"
}
}
},
"startExcludeCityList": {
"type": "nested",
"properties": {
"districtId": {
"type": "integer"
},
"districtName": {
"type": "keyword"
}
}
},
"endExcludeCityList": {
"type": "nested",
"properties": {
"districtId": {
"type": "integer"
},
"districtName": {
"type": "keyword"
}
}
},
"createTime": {
"type": "long"
},
"createTimeStr": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"updateTime": {
"type": "long"
},
"updateTimeStr": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"valid": {
"type": "boolean"
}
}
}
}
# Add a new field to the index
PUT your_index/_mapping
{
"properties": {
"aaa": {
"type": "text"
}
}
}
To reduce typing and the chance of errors, you can follow these steps
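Index - controls whether the current field is indexed. The default is true; if set to false, the field cannot be searched
# Set index to false (delete b_test beforehand if it already exists)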
PUT b_test
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text"
},
"mobile":{
"type": "text",
"index": false
}
}
}
}
PUT /b_test/_doc/1
{
"firstName": "zhang",
"lastName": "san",
"mobile": "123"
}
# Returns an error: mobile is not indexed, so it cannot be searched
GET /b_test/_search
{
"query": {
"match": {
"mobile": "123"
}
}
}
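1. Index Options has four levels (docs, freqs, positions, offsets) that control what the inverted index records
2. Text fields record positions by default; other types default to docs
3. The more that is recorded, the more storage space is used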
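To make the four levels concrete, the following sketch builds a toy posting list and records progressively more detail per level. It is a simplified model and not Lucene's storage format. The function name build_postings and the whitespace-style tokenizer are assumptions made for illustration.

```python
import re

LEVELS = ("docs", "freqs", "positions", "offsets")


def build_postings(docs, level="positions"):
    """docs maps doc id to text. Returns {term: {doc_id: recorded_info}}."""
    if level not in LEVELS:
        raise ValueError(f"unknown index_options level: {level}")
    depth = LEVELS.index(level)
    postings = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            # docs: only the fact that the term occurs in the document
            entry = postings.setdefault(match.group(), {}).setdefault(doc_id, {})
            if depth >= 1:  # freqs: also the term frequency
                entry["freq"] = entry.get("freq", 0) + 1
            if depth >= 2:  # positions: also the token position
                entry.setdefault("positions", []).append(position)
            if depth >= 3:  # offsets: also start and end character offsets
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings
```

Each level carries everything the previous one did plus one more list, which is why storage grows with the level.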
# Create the Mapping explicitly
PUT b_test
{
"mappings": {
"properties": {
"name":{
"type": "text"
},
"address":{
"type": "text"
},
"phone_num":{
"type": "text",
"index": false
},
"bio":{
"type": "text",
"index_options": "offsets"
}
}
}
}
# Create the Mapping explicitly
PUT b_test
{
"mappings": {
"properties": {
"firstName":{
"type": "text"
},
"lastName":{
"type": "text"
},
"mobile":{
"type": "keyword",
"null_value": "NULL"
}
}
}
}
PUT /b_test/_doc/1
{
"firstName": "zhang",
"lastName": "san",
"mobile": null
}
# Query; note that NULL is uppercase
GET b_test/_search?q=mobile:NULL
GET b_test/_search
{
"query": {
"match": {
"mobile": "NULL"
}
}
}
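The effect of null_value can be modeled as a substitution that happens only on the indexing side. This sketch uses a hypothetical helper, index_terms, and assumes a keyword field. The stored _source is not touched, and an empty array is not replaced because it contains no explicit null.

```python
def index_terms(source_value, null_value=None):
    """Return the terms that would be indexed for one keyword field value."""
    values = source_value if isinstance(source_value, list) else [source_value]
    terms = []
    for value in values:
        if value is None:
            # An explicit null is indexed as null_value, if one is configured.
            if null_value is not None:
                terms.append(null_value)
        else:
            terms.append(value)
    return terms
```

For the document above, index_terms(None, "NULL") yields the single term NULL, which is why the query for mobile:NULL finds it.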
# Create the Mapping explicitly
PUT b_test
{
"mappings": {
"properties": {
"firstName":{
"type": "text",
"copy_to": "fullName"
},
"lastName":{
"type": "text",
"copy_to": "fullName"
}
}
}
}
PUT /b_test/_doc/1
{
"firstName": "zhang",
"lastName": "san"
}
# Query
GET b_test/_search?q=fullName:(zhang san)
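A rough way to picture copy_to: at index time the values of the source fields are also fed into the target field, while the document's _source stays as written. The helper below is a hypothetical sketch of that idea, with a plain dict standing in for the index.

```python
def apply_copy_to(source, copy_to):
    """copy_to maps a source field to its target field.

    Returns the values indexed per field; source itself is not modified.
    """
    indexed = {name: [value] for name, value in source.items()}
    for field, target in copy_to.items():
        if field in source:
            # The value is indexed under the target field as well.
            indexed.setdefault(target, []).append(source[field])
    return indexed
```

In this model fullName receives both zhang and san, so a search on fullName matches, but fetching the document will not show a fullName field.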
ES has no dedicated array type, but any field can contain multiple values of the same type
PUT b_test/_doc/1
{
"firstName": "Chan",
"lastName": "zhangsan"
}
PUT b_test/_doc/2
{
"firstName": "Chan",
"lastName": ["zhangsan","lisi"]
}
# Query
GET /b_test/_search
{
"query": {
"term": {
"lastName.keyword": "zhangsan"
}
}
}
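1NF - every attribute is atomic (no repeating groups)
2NF - eliminates partial functional dependencies of non-prime attributes on the key
3NF - eliminates transitive functional dependencies of non-prime attributes on the key
BCNF - eliminates partial and transitive functional dependencies of prime attributes on the key
Object type: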
PUT /blog
{
"mappings": {
"properties": {
"content": {
"type": "text"
},
"time": {
"type": "date"
},
"user": {
"properties": {
"city": {
"type": "text"
},
"userid": {
"type": "long"
},
"username": {
"type": "keyword"
}
}
}
}
}
}
# Insert a blog entry
PUT blog/_doc/1
{
"content":"I like Elasticsearch",
"time":"2021-10-19T00:00:00",
"user":{
"userid":1,
"username":"Jack",
"city":"Shanghai"
}
}
# Query the blog entry
POST blog/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "Elasticsearch"
}
},
{
"match": {
"user.username": "Jack"
}
}
]
}
}
}
Array of objects:
# Mapping for movies
PUT my_movies
{
"mappings" : {
"properties" : {
"actors" : {
"properties" : {
"first_name" : {
"type" : "keyword"
},
"last_name" : {
"type" : "keyword"
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
# Write one movie
POST my_movies/_doc/1
{
"title":"Speed",
"actors":[
{
"first_name":"Keanu",
"last_name":"Reeves"
},
{
"first_name":"Dennis",
"last_name":"Hopper"
}
]
}
# Query the movie
POST my_movies/_search
{
"query": {
"bool": {
"must": [
{"match": {"actors.first_name": "Keanu"}},
{"match": {"actors.last_name": "Hopper"}}
]
}
}
}
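Question: why does the search return a result we do not want? Because the array of objects is flattened when stored, the link between each actor's first_name and last_name is lost, so Keanu and Hopper match even though they belong to different actors.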
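The flattening can be reproduced in a few lines. This sketch uses hypothetical helpers, flatten_object_array and matches_all, to show the multi-valued fields the document ends up with and why the two match clauses are both satisfied.

```python
def flatten_object_array(field, objects):
    """Flatten an array of objects into multi-valued dotted fields."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_all(flat, conditions):
    """True if every condition value appears somewhere in its flattened field."""
    return all(value in flat.get(path, []) for path, value in conditions.items())


actors = [
    {"first_name": "Keanu", "last_name": "Reeves"},
    {"first_name": "Dennis", "last_name": "Hopper"},
]
flat = flatten_object_array("actors", actors)
# flat now holds two independent value lists, one per inner field
wrong_hit = matches_all(
    flat, {"actors.first_name": "Keanu", "actors.last_name": "Hopper"}
)
```

Here wrong_hit is True: each clause is checked against its own value list, and nothing records which first name went with which last name.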
Nested Data Type
PUT my_movies
{
"mappings" : {
"properties" : {
"actors" : {
"type": "nested",
"properties" : {
"first_name" : {"type" : "keyword"},
"last_name" : {"type" : "keyword"}
}},
"title" : {
"type" : "text",
"fields" : {"keyword":{"type":"keyword","ignore_above":256}}
}
}
}
}
# Write one movie
POST my_movies/_doc/1
{
"title":"Speed",
"actors":[
{
"first_name":"Keanu",
"last_name":"Reeves"
},
{
"first_name":"Dennis",
"last_name":"Hopper"
}
]
}
# Nested query
POST my_movies/_search
{
"query": {
"bool": {
"must": [
{"match": {"title": "Speed"}},
{
"nested": {
"path": "actors",
"query": {
"bool": {
"must": [
{"match": {
"actors.first_name": "Keanu"
}},
{"match": {
"actors.last_name": "Hopper"
}}
]
}
}
}
}
]
}
}
}
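With the nested type each object in the array is indexed as its own hidden document, so all conditions of a nested query must hold within a single object. A minimal model of that matching rule, using the hypothetical helper nested_match:

```python
def nested_match(objects, conditions):
    """True if at least one object satisfies every condition on its own."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )


actors = [
    {"first_name": "Keanu", "last_name": "Reeves"},
    {"first_name": "Dennis", "last_name": "Hopper"},
]
cross_actor = nested_match(actors, {"first_name": "Keanu", "last_name": "Hopper"})
same_actor = nested_match(actors, {"first_name": "Keanu", "last_name": "Reeves"})
```

In this model cross_actor is False and same_actor is True, so the nested query above (Keanu plus Hopper) would be expected to return no hits.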
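An inverted index is well suited to search
Structure of an inverted index
(1) The list of documents that contain the term
(2) The number of documents that contain the term, from which IDF (inverse document frequency) is derived
(3) The number of times the term appears in each document: TF (term frequency)
(4) The position of the term within each document
(5) The length of each document: length norm
(6) The average length of all documents that contain the term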
word doc1 doc2
dog * *
hello *
you *
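As a rough illustration of the statistics listed above, the sketch below builds a toy inverted index that keeps the document list, term frequency, and document length, and derives an IDF value from the document count. The function names and the two sample documents are hypothetical, and the IDF formula is one common variant (the one used by BM25), chosen here only as an example.

```python
import math
from collections import Counter


def build_index(docs):
    """docs maps doc id to text. Returns (index, lengths).

    index:   {term: {doc_id: term frequency}}
    lengths: {doc_id: number of tokens}
    """
    index = {}
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index.setdefault(term, {})[doc_id] = tf
    return index, lengths


def idf(index, term, total_docs):
    """BM25-style inverse document frequency: rarer terms score higher."""
    df = len(index.get(term, {}))  # number of documents containing the term
    return math.log(1 + (total_docs - df + 0.5) / (df + 0.5))


# Hypothetical sample documents
docs = {"doc1": "hello dog", "doc2": "you dog dog"}
index, lengths = build_index(docs)
```

A term that appears in every document, such as dog here, gets a lower IDF than one that appears in only one document.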
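Benefits of an immutable inverted index
(1) No locks are needed, which improves concurrency and avoids locking problems
(2) The data never changes, so it stays in the OS cache as long as there is enough cache memory
(3) The filter cache stays resident in memory because the data does not change
(4) It can be compressed, saving CPU and IO overhead
Drawback of an immutable inverted index: the whole index has to be rebuilt every time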
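1. In ES 5.x, always check whether a numeric field actually needs range queries. A field that looks numeric but is only used for exact matching with Term or Terms queries should be defined as keyword. A typical example is the HTTP status code commonly seen when indexing web logs.
2. If a RangeQuery produces a large result set and has to be ANDed with other conditions whose result sets are much smaller, upgrade to ES 5.4+. That version introduced IndexOrDocValuesQuery at the lower level, which can greatly speed up RangeQuery in this scenario.
3. Rebuilding an index with reindex
A field's settings cannot be modified. To change a field or the number of primary shards, create a new index with the new mapping, read the data out in batches, and write it into the new index with the bulk API. For the batch reads, the scroll API is recommended, and the reindex can run in multiple concurrent threads: each scroll queries the data for one specified date range and hands it to one thread.
Suppose the old index is named old_index and the new one new_index, and a Java application is already using old_index. Stopping the application, pointing it at new_index, and restarting it would cause downtime and reduce availability. Instead, give the Java application an alias that points to the old index and have it work through the index alias, which at this point actually resolves to the old my_index
①PUT /{my_index}/_alias/{alias_index}
②Create a new index and adjust its mapping
③Use the scroll API to read the data out in batches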
GET /my_index/_search?scroll=1m
{
"query": {
"match_all": {}
},
"sort": ["_doc"],
"size": 1
}
④Use the bulk API to write each batch returned by scroll into the new index
POST /_bulk
{ "index": { "_index": "my_index_new", "_id": "2" }}
{ "content": "xxx" }
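⑤Repeat the loop: read one batch after another and write each batch into the new index with the bulk API
⑥Switch alias_index over to my_index_new. The Java application then uses the data in the new index directly through the alias, without being stopped: zero downtime, high availability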
POST /_aliases
{
"actions": [
{ "remove": { "index": "my_index", "alias": "alias_index" }},
{ "add": { "index": "my_index_new", "alias": "alias_index" }}
]
}
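Steps ④ and ⑥ mostly come down to building two request bodies. The sketch below shows one way to build them with only the standard library. The helper names are hypothetical, and sending the requests (for example with an HTTP client) is deliberately left out. The hits are assumed to have the _id and _source keys that a search response returns.

```python
import json


def build_bulk_body(hits, target_index):
    """Turn one batch of scroll hits into an NDJSON body for POST /_bulk."""
    lines = []
    for hit in hits:
        # Action line, followed by the document itself
        lines.append(json.dumps({"index": {"_index": target_index, "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
    # Every line, including the last, must end with a newline
    return "".join(line + "\n" for line in lines)


def build_alias_swap(alias, old_index, new_index):
    """Body for POST /_aliases that moves alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

Putting the remove and add actions in a single _aliases request means the alias is switched in one step, so the application never sees a moment where it points at nothing.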