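This example targets Elasticsearch 6.x.
In 7.x, mapping types were removed from the request syntax, but a type still exists internally with the default value _doc.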
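The version difference mostly affects the shape of the mappings object. As a rough sketch (the helper name build_create_index_body and its parameters are hypothetical and not part of any Elasticsearch client), the body could be assembled as follows, nesting the properties under a type name such as the pap used in the request that follows for 6.x, and dropping that level for 7.x:

```python
import json


def build_create_index_body(properties, major_version, type_name="pap",
                            shards=2, replicas=0):
    """Build a create-index body. Hypothetical helper, for illustration only."""
    mappings = {"properties": properties}
    if major_version < 7:
        # 6.x expects the properties nested under a mapping type name.
        mappings = {type_name: mappings}
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": mappings,
    }


# A two-field subset of the full mapping, just to keep the example short.
properties = {
    "id": {"type": "keyword", "store": True},
    "title": {"type": "text", "analyzer": "ik_max_word"},
}

body_6x = build_create_index_body(properties, major_version=6)
body_7x = build_create_index_body(properties, major_version=7)
print(json.dumps(body_7x, indent=2))
```

Either body would be sent as the payload of PUT paper, depending on the cluster version.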
PUT paper
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 2
  },
  "mappings": {
    "pap": {
      "properties": {
        "linkCount": {
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "pubDate": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text"
        },
        "publish_date": {
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
          "ignore_malformed": true,
          "type": "date"
        },
        "source": {
          "copy_to": "fullcontent",
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "summary": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets",
          "analyzer": "ik_max_word"
        },
        "title": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets",
          "analyzer": "ik_max_word"
        },
        "url": {
          "type": "text"
        },
        "viewCount": {
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "year": {
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "columnName": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets"
        },
        "doi": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text"
        },
        "downloadCount": {
          "fielddata": true,
          "store": true,
          "type": "text"
        },
        "enTitle": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "analyzer": "ik_max_word"
        },
        "id": {
          "store": true,
          "type": "keyword"
        },
        "journal": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets",
          "analyzer": "ik_max_word"
        },
        "keyWords": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "keyword"
        },
        "authors": {
          "properties": {
            "author": {
              "copy_to": "fullcontent",
              "type": "keyword"
            },
            "doctorId": {
              "store": true,
              "type": "keyword"
            },
            "hospitalName": {
              "copy_to": "fullcontent",
              "store": true,
              "type": "text",
              "similarity": "BM25",
              "index_options": "offsets",
              "analyzer": "ik_max_word"
            },
            "doctor_hcoid": {
              "store": true,
              "type": "keyword"
            },
            "doctor_hcpid": {
              "store": true,
              "type": "keyword"
            },
            "institution": {
              "copy_to": "fullcontent",
              "store": true,
              "type": "text",
              "similarity": "BM25",
              "index_options": "offsets",
              "analyzer": "ik_max_word"
            },
            "url": {
              "type": "text"
            }
          }
        },
        "fullcontent": {
          "copy_to": "fullcontent",
          "store": true,
          "type": "text",
          "similarity": "BM25",
          "index_options": "offsets",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}
"number_of_replicas": 0, // number of replica copies of each primary shard
"number_of_shards": 2 // number of primary shards
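Overview of field types
| Category          | Subcategory       | Types                                  |
|-------------------|-------------------|----------------------------------------|
| Core types        | String            | text, keyword                          |
|                   | Integer           | integer, long, short, byte             |
|                   | Floating point    | double, float, half_float, scaled_float |
|                   | Boolean           | boolean                                |
|                   | Date              | date                                   |
|                   | Range             | range                                  |
|                   | Binary            | binary                                 |
| Complex types     | Array             | array                                  |
|                   | Object            | object                                 |
|                   | Nested            | nested                                 |
| Geo types         | Geo point         | geo_point                              |
|                   | Geo shape         | geo_shape                              |
| Specialized types | IP                | ip                                     |
|                   | Completion (autocomplete suggestions) | completion         |
|                   | Token count       | token_count                            |
|                   | Attachment        | attachment                             |
|                   | Percolator        | percolator                             |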
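Example:
"summary": {
  "type": "text",
  "analyzer": "ik_max_word"
}
This mapping declares summary as a text field analyzed with ik_max_word, which segments the text at the finest granularity and emits every word combination it can find. For coarser-grained segmentation, with fewer and longer tokens, use ik_smart instead:
"summary": {
  "type": "text",
  "analyzer": "ik_smart"
}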
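One way to see the difference between the two analyzers is to run the same text through the _analyze API with each of them. The sketch below only builds the two request bodies. It assumes the IK analysis plugin is installed on the cluster, and the helper name build_analyze_body and the sample sentence are made up for illustration.

```python
def build_analyze_body(analyzer, text):
    """Body for the _analyze API: run `text` through the named analyzer."""
    return {"analyzer": analyzer, "text": text}


# Arbitrary sample text, chosen only for illustration.
sample = "中华人民共和国国歌"

# Each body would be sent as the payload of: POST _analyze
bodies = {
    name: build_analyze_body(name, sample)
    for name in ("ik_max_word", "ik_smart")
}
```

Comparing the two responses should show ik_max_word returning more tokens, many of them overlapping, than ik_smart returns for the same input.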
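Example:
"cnName": {
  "store": true,
  "type": "keyword"
},
This mapping declares cnName as a string field of type keyword. A keyword value is not analyzed; the whole value is indexed as a single term, which makes the field suitable for exact-match lookups and for group-by style aggregations.
"linkCount": {
  "fielddata": true,
  "store": true,
  "type": "text"
}
When a field of type text has to support group-by (aggregations), fielddata must be set to true.
By default Elasticsearch keeps a copy of the original document in _source; "store": true additionally stores the field's value on its own, so it can be retrieved without loading _source.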
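To make the exact-match and group-by use cases concrete, here is a minimal sketch of the corresponding request bodies. The helper names are hypothetical. The last helper shows a commonly used alternative to fielddata, which is not what the mapping above does: keep the field as text and add a keyword sub-field (named raw here by assumption) to aggregate on, since fielddata is built in heap memory and can be expensive.

```python
def build_exact_match_query(field, value):
    """term query: match documents whose keyword value equals `value` exactly."""
    return {"query": {"term": {field: value}}}


def build_group_by_query(field, top_n=10):
    """terms aggregation: count documents per distinct value of `field`."""
    agg_name = "by_" + field.replace(".", "_")
    return {
        "size": 0,  # skip the hits and return only the aggregation buckets
        "aggs": {agg_name: {"terms": {"field": field, "size": top_n}}},
    }


def text_with_keyword_subfield(subfield="raw"):
    """Mapping for a text field with a keyword sub-field (alternative to fielddata)."""
    return {"type": "text", "fields": {subfield: {"type": "keyword"}}}


# Group by a keyword field directly.
cn_name_groups = build_group_by_query("cnName")

# Group by a text field through its hypothetical keyword sub-field.
link_count_mapping = text_with_keyword_subfield()
link_count_groups = build_group_by_query("linkCount.raw")
```

The query bodies would be sent to the index's _search endpoint; the sub-field mapping would replace the fielddata-based definition of linkCount if that alternative were chosen.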
"publish_date": {
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
"ignore_malformed": true,
"type": "date"
}
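ignore_malformed: accepts true or false; the default is false. With false, a value that does not match the declared format causes the whole document to be rejected. Set it to true to skip the malformed value and still index the rest of the document.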
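To get a feel for which values the format string accepts, the following is a rough client-side analogue in Python. It is not how Elasticsearch parses dates and its leniency differs in places; the function name parse_publish_date is hypothetical. It tries the three alternatives in order and returns None for a malformed value, which is roughly the case that ignore_malformed governs.

```python
from datetime import datetime, timezone

# Python equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
_PATTERNS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_publish_date(value):
    """Return a UTC datetime, or None if the value matches none of the formats."""
    text = str(value).strip()
    for pattern in _PATTERNS:
        try:
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    # Last alternative: epoch_millis, i.e. milliseconds since 1970-01-01 UTC.
    try:
        return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        return None
```

A pre-indexing check like this can be used to log or repair bad dates instead of silently dropping them.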
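Example:
"summary": {
  "copy_to": "fullcontent",
  "store": true,
  "type": "text",
  "similarity": "BM25",
  "index_options": "offsets",
  "analyzer": "ik_max_word"
},
The "similarity": "BM25" setting in this mapping keeps a search term that occurs very frequently in the field from dominating the score. (BM25 has been the default similarity since Elasticsearch 5.0, so the setting here is explicit rather than required.)
For example, suppose we search for fire fox and two articles come back: doc1 with a score of 15 and doc2 with a score of 10. doc1 may be a very long article about fires, while doc2 is a tutorial on using the Firefox browser. We clearly prefer the latter, and this is the situation in which a similarity model in the mapping helps.
For the theory behind BM25, see https://www.elastic.co/guide/cn/elasticsearch/guide/current/pluggable-similarites.html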
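The effect described above can be checked numerically. The sketch below implements the classic BM25 per-term formula with the usual defaults k1 = 1.2 and b = 0.75; constant factors may differ between Lucene versions, and the corpus statistics are made up for illustration. It compares a short document with a few occurrences of the term against a very long document with many occurrences.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, docs_with_term,
                    k1=1.2, b=0.75):
    """Score of one query term in one document under classic BM25."""
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Documents longer than average are penalised, shorter ones are boosted.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # The tf component saturates: it can never exceed (k1 + 1).
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Made-up corpus statistics, for illustration only.
stats = dict(avg_doc_len=500, doc_count=1000, docs_with_term=50)

short_doc = bm25_term_score(tf=3, doc_len=200, **stats)     # short tutorial
long_doc = bm25_term_score(tf=15, doc_len=5000, **stats)    # very long article

print(short_doc > long_doc)
```

With these numbers the short document scores higher even though the term occurs five times as often in the long one, because term frequency saturates and the score is normalized by document length.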