Main topics: modifying and customizing analyzers, a brief look at the root object, and dynamic mapping
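1. Modifying and customizing analyzers
1.1. The default analyzer: standard
standard tokenizer: splits text on word boundaries
standard token filter: does nothing (it was removed in Elasticsearch 7.0)
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, and it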
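The pipeline listed above can be imitated in a few lines of Python. This is only a rough sketch: the regex is a crude stand-in for the real Unicode word segmentation, the stop-word set is an assumed subset of the English list, and the function name standard_analyze is made up for illustration.

```python
import re

# Assumption: a small subset of English stop words, for illustration only.
ENGLISH_STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "the"}

def standard_analyze(text, stopwords=None):
    """Rough imitation of the standard analyzer pipeline."""
    # Tokenizer: split on word boundaries (simplified to runs of word characters).
    tokens = re.findall(r"\w+", text)
    # lowercase token filter: convert every token to lowercase.
    tokens = [t.lower() for t in tokens]
    # stop token filter: only active when a stop-word set is supplied,
    # mirroring the fact that it is disabled by default.
    if stopwords:
        tokens = [t for t in tokens if t not in stopwords]
    return tokens

print(standard_analyze("A dog is in THE house"))
print(standard_analyze("A dog is in THE house", ENGLISH_STOPWORDS))
```

With no stop words every lowercased token is kept; with the stop-word set only dog and house remain.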
1.2. Changing the analyzer settings
Enable the English stop-word token filter:
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"es_std": {
"type": "standard",
"stopwords": "_english_"
}
}
}
}
}
Try running the two requests below and compare the results:
GET /my_index/_analyze
{
"analyzer": "standard",
"text": "a dog is in the house"
}
GET /my_index/_analyze
{
"analyzer": "es_std",
"text":"a dog is in the house"
}
1.3. Building your own custom analyzer
PUT /my_index
{
"settings": {
"analysis": {
"char_filter": { // custom char_filter that converts the & character to and
"&_to_and": {
"type": "mapping",
"mappings": [
"&=> and"
]
}
},
"filter": { // custom stop words
"my_stopwords": {
"type": "stop",
"stopwords": [
"the", // the and a are the stop words
"a"
]
}
},
"analyzer": { // custom analyzer
"my_analyzer": {
"type": "custom",
"char_filter": [
"html_strip",
"&_to_and"
],
"tokenizer": "standard",
"filter": [
"lowercase",
"my_stopwords"
]
}
}
}
}
}
Test it and look at the result:
GET /my_index/_analyze
{
"text": "tom&jerry are a friend in the house, , HAHA!!",
"analyzer": "my_analyzer"
}
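To make the order of the stages easier to follow, here is a minimal Python sketch of what my_analyzer does to the test sentence: character filters first, then the tokenizer, then the token filters. It is an approximation under stated assumptions: the tag-stripping regex and the word-splitting regex are simplifications, and the function name simulate_my_analyzer is hypothetical. The real _analyze response also carries offsets and positions, which are omitted here.

```python
import re

MY_STOPWORDS = {"the", "a"}

def simulate_my_analyzer(text):
    """Approximate the char_filter -> tokenizer -> filter chain of my_analyzer."""
    # char_filter 1: html_strip (crudely drop anything that looks like a tag).
    text = re.sub(r"<[^>]+>", " ", text)
    # char_filter 2: &_to_and (replace every & with and).
    text = text.replace("&", "and")
    # tokenizer: standard (simplified to runs of word characters).
    tokens = re.findall(r"\w+", text)
    # filter 1: lowercase.
    tokens = [t.lower() for t in tokens]
    # filter 2: my_stopwords.
    return [t for t in tokens if t not in MY_STOPWORDS]

print(simulate_my_analyzer("tom&jerry are a friend in the house, , HAHA!!"))
```

Because the character filter runs before tokenization, tom&jerry becomes the single token tomandjerry in this sketch.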
Use the custom analyzer in a mapping:
PUT /my_index/_mapping
{
"properties": {
"content": { // use the custom analyzer for the content field
"type": "text",
"analyzer": "my_analyzer"
}
}
}
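2. root object
2.1. The root object concept
The root object is the mapping JSON that belongs to a type (from Elasticsearch 7.0 on, where mapping types are deprecated, to the index). It contains properties, metadata fields (_id, _source, _type), settings (such as analyzer), and other settings (such as include_in_all).
PUT /my_index
{
"mappings": {
"properties": {}
}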
}
2.2. properties
The main per-field settings are type, index, and analyzer.
PUT /my_index/_mapping/
{
"properties": {
"title": {
"type": "text"
}
}
}
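2.3. _source
Benefits:
(1) A query returns the complete document directly; there is no need to fetch the document ID first and then send a second request for the document.
(2) Partial updates are implemented on top of _source.
(3) Reindexing works directly from _source, without having to query a database (or other external storage) and modify the data.
(4) The fields that are returned can be customized based on _source.
(5) Debugging queries is easier, because _source is directly visible.
If none of these benefits are needed, _source can be disabled: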
PUT /my_index/_mapping
{
"_source": {
"enabled": false
}
}
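Benefits (2) and (4) are easier to picture with a small sketch. The Python below is not how Elasticsearch is implemented; it only illustrates the idea that a partial update needs the stored original document to merge into, and that field filtering is a projection of that stored document. The function names partial_update and source_filter are hypothetical.

```python
import copy

def partial_update(stored_source, partial_doc):
    """Merge a partial document into the stored _source (objects merge recursively)."""
    merged = copy.deepcopy(stored_source)
    for key, value in partial_doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)
        else:
            merged[key] = value
    return merged

def source_filter(source, includes):
    """Return only the requested top-level fields of the stored _source."""
    return {k: v for k, v in source.items() if k in includes}

stored = {"title": "my article", "address": {"province": "guangdong"}}
updated = partial_update(stored, {"address": {"city": "guangzhou"}})
print(updated)
print(source_filter(updated, ["title"]))
```

Without the stored _source there would be nothing to merge into or project from, which is why disabling it gives up these features.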
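2.4. _all
_all packs the values of all fields into a single _all field and indexes it. When a search does not specify any field, it runs against the _all field. Note that _all was deprecated in Elasticsearch 6.0 and removed in 7.0, so the requests in this section only apply to older versions.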
PUT /my_index/_mapping/my_type3
{
"_all": {"enabled": false}
}
include_in_all can also be set at the field level to control whether that field's value is included in the _all field:
PUT /my_index/_mapping/my_type4
{
"properties": {
"my_field": {
"type": "text",
"include_in_all": false
}
}
}
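Conceptually, _all behaves like one extra field built from the other field values. The following Python sketch shows that idea only; the helper build_all_field and the sample document are made up for illustration, and the real field is analyzed and indexed rather than stored as a plain string.

```python
def build_all_field(doc, excluded=()):
    """Concatenate the field values of a document, skipping excluded fields."""
    parts = []
    for name, value in doc.items():
        if name in excluded:
            continue  # plays the role of include_in_all: false
        parts.append(str(value))
    return " ".join(parts)

doc = {"title": "my article", "my_field": "hidden", "views": 3}
print(build_all_field(doc))
print(build_all_field(doc, excluded={"my_field"}))
```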
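3. Customizing the dynamic mapping strategy
3.1. Setting the dynamic policy
true: an unknown field is dynamically mapped
false: an unknown field is ignored (it is not indexed, but it still appears in _source)
strict: an unknown field causes an error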
PUT /my_index
{
"mappings": {
"dynamic": "strict",
"properties": {
"title": {
"type": "text"
},
"address": {
"type": "object",
"dynamic": "true"
}
}
}
}
Trying to index a document with a content field fails with an error saying that content is not allowed:
PUT /my_index/_doc/1
{
"title": "my article",
"content": "this is my article",
"address": {
"province": "guangdong",
"city": "guangzhou"
}
}
The address field has no such problem: it is set to dynamic, so new sub-fields can be added on the fly.
PUT /my_index/_doc/1
{
"title": "my article",
"address": {
"province": "guangdong",
"city": "guangzhou"
}
}
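The behavior of the two requests above can be reproduced with a small Python simulation. It is a sketch under stated assumptions: StrictMappingError and apply_dynamic are hypothetical names, type guessing is reduced to text versus object, and an object without its own dynamic setting inherits the policy of its parent.

```python
class StrictMappingError(Exception):
    """Stands in for the error raised when a strict mapping rejects a field."""

# Mirror of the mapping above: strict at the root, dynamic inside address.
MAPPING = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "address": {"type": "object", "dynamic": "true"},
    },
}

def apply_dynamic(mapping, doc, inherited="true", path=""):
    """Walk a document and apply the dynamic policy of each mapping level."""
    policy = str(mapping.get("dynamic", inherited)).lower()
    props = mapping.setdefault("properties", {})
    for field, value in doc.items():
        full_name = path + field
        if field not in props:
            if policy == "strict":
                raise StrictMappingError(f"field [{full_name}] is not allowed")
            if policy == "false":
                continue  # ignored: the field is not added to the mapping
            # policy "true": add the field (type guessing is simplified here)
            props[field] = {"type": "object"} if isinstance(value, dict) else {"type": "text"}
        if isinstance(value, dict):
            apply_dynamic(props[field], value, policy, full_name + ".")

address = {"province": "guangdong", "city": "guangzhou"}
try:
    apply_dynamic(MAPPING, {"title": "my article", "content": "this is my article", "address": address})
except StrictMappingError as err:
    print("rejected:", err)

apply_dynamic(MAPPING, {"title": "my article", "address": address})
print(sorted(MAPPING["properties"]["address"]["properties"]))
```

The first document is rejected because of content; the second one is accepted and adds province and city under address.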
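3.2. Customizing dynamic mapping behavior
(1) date_detection
By default, Elasticsearch recognizes dates in certain formats, such as yyyy-MM-dd. If the first value that arrives for a field is 2017-01-01, the field is dynamically mapped as date, and a later value such as "hello world" then causes an error. You can turn off date_detection for an index and, where needed, explicitly map individual fields as date yourself.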
PUT /my_index/_mapping
{
"date_detection": false
}
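The failure mode described in (1) can be sketched in Python. This is a simplified model, not the actual detection logic: only the yyyy-MM-dd format is checked, the fallback type is reduced to text, and the names looks_like_date and index_values are hypothetical.

```python
from datetime import datetime

def looks_like_date(value):
    """Simplified check: only the yyyy-MM-dd format is recognized here."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False

def index_values(values, date_detection=True):
    """The first value fixes the field type; later values must fit that type."""
    field_type = None
    for value in values:
        if field_type is None:
            field_type = "date" if date_detection and looks_like_date(value) else "text"
        elif field_type == "date" and not looks_like_date(value):
            raise ValueError(f"failed to parse {value!r} as a date")
    return field_type

try:
    index_values(["2017-01-01", "hello world"])
except ValueError as err:
    print("error:", err)

print(index_values(["2017-01-01", "hello world"], date_detection=False))
```

With detection on, the second value fails; with date_detection turned off, the field stays a string type and both values are accepted.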
(2) Defining your own dynamic mapping template (type level)
PUT my_index
{
"mappings": {
"dynamic_templates": [
{
"longs_as_strings": {
"match_mapping_type": "string",
"match": "long_*",
"unmatch": "*_text",
"mapping": {
"type": "long"
}
}
}
]
}
}
Index a document:
PUT my_index/_doc/1
{
"long_num": "5",
"long_text": "foo"
}
long_num is mapped as long.
long_text matches the unmatch pattern, so it gets the default string mapping.
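The matching rules of the longs_as_strings template can be traced with a short Python sketch. It assumes simple wildcard semantics (via fnmatch) for match and unmatch, reduces the default string mapping to text, and uses the made-up helper names json_type and resolve_mapping.

```python
from fnmatch import fnmatchcase

# The template from the request above, as a plain dict.
TEMPLATE = {
    "match_mapping_type": "string",
    "match": "long_*",
    "unmatch": "*_text",
    "mapping": {"type": "long"},
}

def json_type(value):
    """Name the detected JSON type of a value (simplified)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "object"

def resolve_mapping(field, value, template=TEMPLATE):
    """Apply the template if all of its conditions hold, else fall back to a default."""
    if (json_type(value) == template["match_mapping_type"]
            and fnmatchcase(field, template["match"])
            and not fnmatchcase(field, template["unmatch"])):
        return template["mapping"]
    return {"type": "text"}  # simplified stand-in for the default string mapping

print(resolve_mapping("long_num", "5"))
print(resolve_mapping("long_text", "foo"))
```

long_num satisfies match and avoids unmatch, so it gets the template mapping; long_text is excluded by unmatch and falls through to the default.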
For more options, see the official documentation:
Dynamic templates | Elasticsearch Reference [7.6] | Elastic https://www.elastic.co/guide/en/elasticsearch/reference/7.6/dynamic-templates.html