PUT /employee
{
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": 1,
"max_result_window": "10000",
"number_of_replicas": 0
}
},
"mappings": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"create_date": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"address": {
"type": "keyword"
},
"desc": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"leader": {
"type": "object"
},
"car": {
"type": "nested",
"properties": {
"brand": {
"type": "keyword",
"ignore_above": 256
},
"number": {
"type": "keyword",
"ignore_above": 256
},
"make": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
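Delete the index:
DELETE /employee
Add a new field to an existing mapping (new fields can be added, but the type of an existing field cannot be changed in place):
PUT /employee/_mapping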
{
"properties":{
"salary":{
"type":"double"
}
}
}
GET /employee/_mapping
Result:
{
"employee" : {
"mappings" : {
"properties" : {
"address" : {
"type" : "keyword"
},
"age" : {
"type" : "integer"
},
"car" : {
"type" : "nested",
"properties" : {
"brand" : {
"type" : "keyword",
"ignore_above" : 256
},
"make" : {
"type" : "keyword",
"ignore_above" : 256
},
"number" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"create_date" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"desc" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"id" : {
"type" : "long"
},
"leader" : {
"type" : "object"
},
"name" : {
"type" : "keyword"
},
"salary" : {
"type" : "double"
}
}
}
}
}
PUT /employee/_doc/1
{
"id": 1,
"name": "张小勇",
"age": "45",
"leader": {
"name": "马某",
"age": 40,
"depart": "营销部"
},
"car": [{
"brand": "丰田",
"make": "日本",
"number": "粤A12345"
},
{
"brand": "奔驰",
"make": "德国",
"number": "粤A9999"
}
],
"address": "浙江杭州西湖阿里马巴巴高薪技术开发区110号",
"desc": "长相不丑,擅长营销...."
}
PUT /employee/_doc/2
{
"id": 2,
"name": "张某龙",
"age": "40",
"leader": {
"name": "马x腾",
"age": 40,
"depart": "研发部"
},
"car": [{
"brand": "奔驰",
"make": "中国北京",
"number": "粤A8888"
},
{
"brand": "华晨宝马",
"make": "中国",
"number": "粤A9999"
}
],
"address": "广东省深圳市南山区某讯大厦1号",
"desc": "长相帅气,高大威猛,人中龙凤,擅长写代码"
}
PUT /employee/_doc/1
{
"id": 1,
"address": "浙江杭州西湖阿里马巴巴高薪技术开发区111号"
}
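Overwriting update: the POST below behaves the same as the PUT above. The whole document is replaced, much like deleting it and adding it again, so any field missing from the body is lost.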
POST /employee/_doc/1
{
"id": 1,
"address": "浙江杭州西湖阿里马巴巴高薪技术开发区112号"
}
Partial update: only the given fields are changed, and all other fields and values stay as they were.
POST /employee/_update/1
{
"doc": {
"id": 1,
"address": "浙江杭州西湖阿里马巴巴高薪技术开发区110号"
}
}
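The difference between the overwriting write and the partial update can be modelled with plain dictionaries. This is a minimal sketch, not Elasticsearch code. `index_doc`, `update_doc` and `deep_merge` are hypothetical helpers. The sketch also assumes that a partial update merges nested objects key by key and replaces every other value, arrays included.

```python
import copy


def index_doc(store, doc_id, source):
    """Models PUT/POST /employee/_doc/<id>: the stored source is replaced wholesale."""
    store[doc_id] = copy.deepcopy(source)


def deep_merge(target, patch):
    """Merges patch into target: nested objects are merged key by key,
    any other value (including arrays) replaces the old one."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)
    return target


def update_doc(store, doc_id, partial):
    """Models POST /employee/_update/<id> with a "doc" body."""
    deep_merge(store[doc_id], partial)


store = {}
index_doc(store, "1", {"id": 1, "name": "张小勇", "address": "110号"})
update_doc(store, "1", {"address": "112号"})
print(store["1"])  # name survives the partial update
index_doc(store, "1", {"id": 1, "address": "111号"})
print(store["1"])  # name is gone after the overwrite
```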
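Query string syntax:
Syntax: q=field:text. If no field is given, the search uses index.query.default_field, which defaults to * (all eligible fields). The _all field of older versions was removed in 7.0.
+: the clause must match, similar to must. In a raw URL, encode + as %2B, because an unencoded + in a query string is usually decoded as a space.
-: the clause must not match, similar to must_not.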
GET /employee/_search
GET /employee/_search?q=id:1
GET /employee/_search?q=+id:1
GET /employee/_search?q=-id:1
GET /employee/_search?q=-id:1&q=name:赵老哥&sort=id:desc
GET /employee/_search?from=0&size=2
Query DSL version: omitted.
DELETE /employee/_doc/1
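The default analyzer in Elasticsearch is the standard analyzer. It splits English text into words and Chinese text into single characters, and it lowercases the tokens.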
POST /_analyze
{
"analyzer": "standard",
"text": "我是一个中国人"
}
The text above is split into 我|是|一|个|中|国|人, i.e. one token per character.
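The following rough Python approximation shows why the standard analyzer splits Chinese into single characters. It is an assumption-laden sketch, not the real tokenizer, which follows the Unicode UAX #29 word-boundary rules. Here, runs of ASCII letters and digits become one lowercased token, and every CJK ideograph in the basic U+4E00 to U+9FFF block becomes a token of its own. `standard_like_tokens` is a hypothetical name.

```python
import re

# Assumption: a crude stand-in for the standard analyzer on mixed text.
# ASCII words/numbers stay whole; each CJK ideograph is its own token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def standard_like_tokens(text):
    """Tokenize roughly like the standard analyzer and lowercase the result."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


print(standard_like_tokens("我是一个中国人"))  # one token per character
print(standard_like_tokens("Hello World"))      # ['hello', 'world']
```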
Return only selected fields with _source filtering (wildcards such as car.* are allowed):
GET /employee/_search
{
"_source": ["id","name","leader.name","car.*"],
"query": {
"match_all": {}
}
}
No condition: return all documents
GET /employee/_search
{
"query": {
"match_all": {}
}
}
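match on a text field
The query string is analyzed, and a document matches as soon as any token of the query also appears among the field's tokens (the default operator is OR).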
GET /employee/_search
{
"_source": ["id","name","desc"],
"query": {
"match": {
"desc": "长相帅气"
}
}
}
Result:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.5948997,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.5948997,
"_source" : {
"name" : "张某龙",
"id" : 2,
"desc" : "长相帅气,高大威猛,人中龙凤,擅长写代码"
}
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.49274418,
"_source" : {
"name" : "张小勇",
"id" : 1,
"desc" : "长相不丑,擅长营销...."
}
}
]
}
}
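The OR semantics of match on a text field can be sketched with the same simplified tokenizer. That tokenizer approximates the standard analyzer and is not the real one. Relevance scoring is left out, so the sketch only decides whether a document matches. `match_or` is a hypothetical helper.

```python
import re

# Simplified stand-in for the standard analyzer (an assumption, see above).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def tokens(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


def match_or(query, field_value):
    """True if at least one query token also occurs in the field (operator OR)."""
    return bool(set(tokens(query)) & set(tokens(field_value)))


docs = {
    "1": "长相不丑,擅长营销....",
    "2": "长相帅气,高大威猛,人中龙凤,擅长写代码",
}
hits = [doc_id for doc_id, desc in docs.items() if match_or("长相帅气", desc)]
print(hits)  # ['1', '2']: document 1 shares the tokens 长 and 相
```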
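match on a keyword field
match analyzes the query with the field's own analyzer. A keyword field is not tokenized, so the whole query text is used as one term and only matches a value that is exactly equal to it.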
GET /employee/_search
{
"_source": ["id","name"],
"query": {
"match": {
"name": "张小勇"
}
}
}
Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931471,
"_source" : {
"name" : "张小勇",
"id" : 1
}
}
]
}
}
match_phrase on a text field: matches the complete phrase. All query tokens must appear in the field, adjacent and in the same order.
GET /employee/_search
{
"_source": ["id","name","desc"],
"query": {
"match_phrase": {
"desc": "长相帅"
}
}
}
Result:
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.9220042,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.9220042,
"_source" : {
"name" : "张某龙",
"id" : 2,
"desc" : "长相帅气,高大威猛,人中龙凤,擅长写代码"
}
}
]
}
}
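match_phrase also requires the query tokens to appear consecutively and in order. Below is a minimal sketch under the same simplified-tokenizer assumption. Slop is fixed at 0 and scoring is ignored. `match_phrase` here is a hypothetical Python function, not the Elasticsearch API.

```python
import re

# Simplified stand-in for the standard analyzer (an assumption).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def tokens(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


def match_phrase(query, field_value):
    """True if the query tokens occur in the field consecutively and in order (slop 0)."""
    q, f = tokens(query), tokens(field_value)
    if not q:
        return False
    # Slide a window of len(q) tokens over the field's token list.
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


docs = {
    "1": "长相不丑,擅长营销....",
    "2": "长相帅气,高大威猛,人中龙凤,擅长写代码",
}
print([doc_id for doc_id, desc in docs.items() if match_phrase("长相帅", desc)])  # ['2']
```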
match_phrase on a keyword field: an exact match is required, because keyword fields are not tokenized.
GET /employee/_search
{
"_source": ["id","name","desc"],
"query": {
"match_phrase": {
"address": "深圳市"
}
}
}
Although the address contains "深圳市", the query returns no hits:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
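Search is often imprecise. We frequently need to return the right results even when the user's query contains some errors, and this is what fuzzy matching is for.
Classic Levenshtein distance: a kind of edit distance, defined as the minimum number of single-character edits needed to turn one string into the other:
Substitute a character: dog -> dop
Insert a character: an -> ant
Delete a character: your -> you
Damerau–Levenshtein distance: an extension of the Levenshtein distance. It counts swapping two adjacent characters as one edit, whereas the classic Levenshtein distance counts the swap as two edits.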
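Both distances can be computed with the usual dynamic-programming table. In this minimal sketch, `levenshtein` counts substitutions, insertions and deletions. `osa_distance` implements the restricted Damerau–Levenshtein variant (optimal string alignment), which also counts a swap of two adjacent characters as one edit. Whether Lucene's automata match this restricted variant exactly is not claimed here.

```python
def levenshtein(a, b):
    """Classic Levenshtein distance: substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein: an adjacent swap also costs one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ab" -> "ba".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(levenshtein("ab", "ba"), osa_distance("ab", "ba"))  # 2 1
```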
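Parameters:
fuzziness: the maximum edit distance. It is either a number 0, 1 or 2 (0 means no errors are tolerated) or the string AUTO, which chooses the distance from the term length. The fuzzy query defaults to AUTO.
prefix_length: the number of leading characters that must match the query exactly and are never edited; default 0
max_expansions: the maximum number of terms the query expands to, taken per shard; a lower value reduces recall; default 50
transpositions: whether swapping two adjacent characters (ab -> ba) counts as one edit, i.e. whether the Damerau–Levenshtein distance is used. It is enabled by default; set transpositions=false to use the classic Levenshtein distance.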
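The AUTO setting picks the edit distance from the term length. This small sketch assumes the documented default thresholds AUTO:3,6: lengths 0 to 2 must match exactly, 3 to 5 allow one edit, and longer terms allow two. `auto_fuzziness` is a hypothetical helper name, and length is counted in characters.

```python
def auto_fuzziness(term, low=3, high=6):
    """Edit distance chosen by fuzziness AUTO:low,high (default AUTO:3,6)."""
    n = len(term)  # length in characters (code points)
    if n < low:
        return 0   # short terms must match exactly
    if n < high:
        return 1   # medium terms allow one edit
    return 2       # long terms allow two edits, the maximum


for word in ["长相", "dog", "search"]:
    print(word, auto_fuzziness(word))
```

Under AUTO, the two-character term "长相" gets distance 0, so it can only match an identical indexed term.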
GET /employee/_search
{
"query": {
"fuzzy": {
"desc": {
"value": "长相",
"fuzziness": 0,
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true
}
}
}
}
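The query above returns no hits because fuzzy does not analyze the query text. The whole string "长相" is treated as a single term, while the standard analyzer indexed desc as single characters, so no indexed term is "长相".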
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
With fuzziness set to 1, one edit is allowed. Dropping one character turns "长相" into "长" or "相", and both of these are indexed terms:
GET /employee/_search
{
"query": {
"fuzzy": {
"desc": {
"value": "长相",
"fuzziness": 1,
"prefix_length": 0,
"max_expansions": 50,
"transpositions": true
}
}
}
}
Now the result is:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.0,
"_source" : {
"id" : 2,
"name" : "张某龙",
"age" : "40",
"leader" : {
"name" : "马x腾",
"age" : 40,
"depart" : "研发部"
},
"car" : [
{
"brand" : "奔驰",
"make" : "中国北京",
"number" : "粤A8888"
},
{
"brand" : "华晨宝马",
"make" : "中国",
"number" : "粤A9999"
}
],
"address" : "广东省深圳市南山区某讯大厦1号",
"desc" : "长相帅气,高大威猛,人中龙凤,擅长写代码"
}
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.0,
"_source" : {
"id" : 1,
"name" : "张小勇",
"age" : "45",
"leader" : {
"name" : "马某",
"age" : 40,
"depart" : "营销部"
},
"car" : [
{
"brand" : "丰田",
"make" : "日本",
"number" : "粤A12345"
},
{
"brand" : "奔驰",
"make" : "德国",
"number" : "粤A9999"
}
],
"address" : "浙江杭州西湖阿里马巴巴高薪技术开发区110号",
"desc" : "长相不丑,擅长营销...."
}
}
]
}
}
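The two fuzzy results above can be reproduced by expanding "长相" against the terms of the desc field. This is a simplified model with several assumptions. The vocabulary comes from the same approximate single-character tokenizer used earlier. The query value itself is not analyzed, as with fuzzy. Plain Levenshtein distance is used without transpositions. `fuzzy_expand` is a hypothetical helper.

```python
import re

# Simplified stand-in for the standard analyzer (an assumption).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def tokens(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_expand(value, vocabulary, fuzziness, prefix_length=0):
    """Indexed terms within `fuzziness` edits of `value`; `value` itself is not analyzed."""
    prefix = value[:prefix_length]
    return {t for t in vocabulary
            if t.startswith(prefix) and levenshtein(value, t) <= fuzziness}


descs = ["长相不丑,擅长营销....", "长相帅气,高大威猛,人中龙凤,擅长写代码"]
vocabulary = {t for d in descs for t in tokens(d)}
print(fuzzy_expand("长相", vocabulary, 0))  # set(): "长相" is not an indexed term
print(fuzzy_expand("长相", vocabulary, 1))  # the single-character terms 长 and 相
```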
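Note: fuzzy does not analyze the query text. match can also perform fuzzy matching: it analyzes the query first and then applies fuzziness to each resulting token: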
GET /employee/_search
{
"query": {
"match": {
"desc": {
"query": "长相",
"fuzziness": 1
}
}
}
}
Result:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.49274418,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.49274418,
"_source" : {
"id" : 1,
"name" : "张小勇",
"age" : "45",
"leader" : {
"name" : "马某",
"age" : 40,
"depart" : "营销部"
},
"car" : [
{
"brand" : "丰田",
"make" : "日本",
"number" : "粤A12345"
},
{
"brand" : "奔驰",
"make" : "德国",
"number" : "粤A9999"
}
],
"address" : "浙江杭州西湖阿里马巴巴高薪技术开发区110号",
"desc" : "长相不丑,擅长营销...."
}
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.38656068,
"_source" : {
"id" : 2,
"name" : "张某龙",
"age" : "40",
"leader" : {
"name" : "马x腾",
"age" : 40,
"depart" : "研发部"
},
"car" : [
{
"brand" : "奔驰",
"make" : "中国北京",
"number" : "粤A8888"
},
{
"brand" : "华晨宝马",
"make" : "中国",
"number" : "粤A9999"
}
],
"address" : "广东省深圳市南山区某讯大厦1号",
"desc" : "长相帅气,高大威猛,人中龙凤,擅长写代码"
}
}
]
}
}
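Search usually needs a "search suggestion" (or "autocomplete") feature. While the user is typing, the input is completed or corrected, which improves matching precision and the overall search experience.
Term Suggester: suggests or corrects each individual term, without considering the relationships between the terms of a phrase.
Phrase Suggester: builds on the Term Suggester and also considers the relationships between terms, such as whether they co-occur in the same source text, how close they are, and their frequencies. It reportedly has a pitfall: the returned suggestion is not necessarily contained in any document.
Completion Suggester: autocomplete, supporting three kinds of lookup: prefix, fuzzy and regex. Its main use case is "Auto Completion", where a request is sent to the backend on every keystroke, so the backend must respond very quickly when users type fast.
For this reason it uses a different data structure from the other two suggesters. Instead of the inverted index, the analyzed input is encoded into an FST stored alongside the index. For an open index, ES loads the whole FST into memory, so prefix lookups are very fast. However, an FST supports only prefix lookups, which is the main limitation of the Completion Suggester.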
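The prefix lookup that the FST makes fast can be illustrated with a sorted in-memory list and binary search. Every string that starts with a given prefix sits in one contiguous run after the prefix's insertion point. This sketch swaps in a much simpler structure than an FST and is only conceptual. `PrefixSuggester` is a hypothetical class.

```python
import bisect


class PrefixSuggester:
    """Hypothetical stand-in for an FST: suggestions kept sorted in memory."""

    def __init__(self, suggestions):
        self._items = sorted(suggestions)

    def suggest(self, prefix, size=5):
        # All strings sharing `prefix` form one contiguous run starting here.
        start = bisect.bisect_left(self._items, prefix)
        out = []
        for item in self._items[start:]:
            if len(out) >= size or not item.startswith(prefix):
                break
            out.append(item)
        return out


suggester = PrefixSuggester([
    "长相不丑,擅长营销....",
    "长相帅气,高大威猛,人中龙凤,擅长写代码",
])
print(suggester.suggest("长相"))    # both suggestions
print(suggester.suggest("长相帅"))  # only the second one
print(suggester.suggest("帅气"))    # []: not a prefix of any suggestion
```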
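In summary, completion suggestions have these characteristics:
a. They are served from memory rather than from the inverted index, so they are very fast.
b. They require the dedicated completion field type.
c. They only support prefix suggestions.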
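prefix query: prefix-based suggestions, the most commonly used kind of suggestion query.
prefix: the user's search input
field: the field that holds the suggestions
size: number of suggestions to return (default 5)
skip_duplicates: whether to filter out duplicate suggestions (default false)
fuzzy query: fuzzy matching of the prefix
fuzziness: the allowed edit distance (default AUTO)
transpositions: if true, a transposition counts as one change instead of two (default true)
min_length: minimum input length before fuzzy suggestions are returned (default 3)
prefix_length: minimum length of the input that is not checked for fuzzy alternatives (default 1)
unicode_aware: if true, all measurements (fuzzy edit distance, transpositions and lengths) are in Unicode code points instead of bytes. This is slightly slower than raw bytes, so it defaults to false.
regex query: the prefix can be expressed as a regular expression; not recommended.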
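The fuzzy options can be read as three rules. A suggestion matches when some prefix of it is within `fuzziness` edits of the input. The first `prefix_length` characters must match exactly. Inputs shorter than `min_length` are matched exactly. This is a minimal sketch of that reading, not the Lucene implementation. It counts characters (as with unicode_aware set to true) and uses plain Levenshtein distance without transpositions. `fuzzy_prefix_match` is a hypothetical name.

```python
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_prefix_match(user_input, suggestion, fuzziness=1, prefix_length=1, min_length=3):
    """True if some prefix of `suggestion` is within `fuzziness` edits of `user_input`."""
    if len(user_input) < min_length:
        fuzziness = 0  # input too short: only exact prefixes count
    if suggestion[:prefix_length] != user_input[:prefix_length]:
        return False  # the first prefix_length characters are never fuzzed
    return any(levenshtein(user_input, suggestion[:k]) <= fuzziness
               for k in range(len(suggestion) + 1))


s = "长相帅气,高大威猛,人中龙凤,擅长写代码"
print(fuzzy_prefix_match("长像帅", s))                # True: one substitution away from 长相帅
print(fuzzy_prefix_match("长像", s))                  # False: shorter than min_length
print(fuzzy_prefix_match("长像", s, min_length=1))    # True
```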
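Re-create the index with a completion sub-field on the field that needs autocomplete, then re-insert the data. Delete the existing index first, because PUT /employee fails if the index already exists.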
PUT /employee
{
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": 1,
"max_result_window": "10000",
"number_of_replicas": 0
}
},
"mappings": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"create_date": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"address": {
"type": "keyword"
},
"desc": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"completion": {
"type": "completion"
}
}
},
"leader": {
"type": "object"
},
"car": {
"type": "nested",
"properties": {
"brand": {
"type": "keyword",
"ignore_above": 256
},
"number": {
"type": "keyword",
"ignore_above": 256
},
"make": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
a. Prefix search: the request below only returns suggestions that start with "长相"
GET employee/_search?pretty
{
"_source": ["name","desc"],
"suggest": {
"descCompletion": {
"prefix": "长相",
"completion": {
"field": "desc.completion"
}
}
}
}
Result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"descCompletion" : [
{
"text" : "长相",
"offset" : 0,
"length" : 2,
"options" : [
{
"text" : "长相不丑,擅长营销....",
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "张小勇",
"desc" : "长相不丑,擅长营销...."
}
},
{
"text" : "长相帅气,高大威猛,人中龙凤,擅长写代码",
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "张某龙",
"desc" : "长相帅气,高大威猛,人中龙凤,擅长写代码"
}
}
]
}
]
}
}
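b. Fuzzy prefix matching
This is awkward to demonstrate with this Chinese sample data, so the details are omitted for now. Note that the edit distance cannot exceed 2, so fuzziness is set to 2 here.
GET employee/_search?pretty
{
"_source": ["name", "desc"],
"suggest": {
"descCompletion": {
"prefix": "长",
"completion": {
"field": "desc.completion",
"skip_duplicates": true,
"fuzzy": {
"fuzziness": 2,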
"transpositions": false,
"min_length": 1,
"prefix_length": 1
}
}
}
}
}
Result:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"descCompletion" : [
{
"text" : "长",
"offset" : 0,
"length" : 1,
"options" : [
{
"text" : "长相不丑,擅长营销....",
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "张小勇",
"desc" : "长相不丑,擅长营销...."
}
},
{
"text" : "长相帅气,高大威猛,人中龙凤,擅长写代码",
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "张某龙",
"desc" : "长相帅气,高大威猛,人中龙凤,擅长写代码"
}
}
]
}
]
}
}
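c. Regex prefix matching: searching with the regex "长*" returns results, but "长相*" returns none. In a regular expression, * means "zero or more of the preceding character", not "anything". So "长*" also matches an empty prefix and therefore every suggestion, and "长相*" means "长" followed by zero or more "相". The missing "长相" term is not the cause: a completion field uses the simple analyzer by default rather than the standard one, and the prefix query "长相" in (a) does return results.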
GET employee/_search?pretty
{
"suggest": {
"descCompletion": {
"regex": "长*",
"completion": {
"field": "desc.completion",
"size": 10
}
}
}
}
Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"descCompletion" : [
{
"text" : "长*",
"offset" : 0,
"length" : 2,
"options" : [
{
"text" : "长相不丑,擅长营销....",
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 1,
"name" : "张小勇",
"age" : "45",
"leader" : {
"name" : "马某",
"age" : 40,
"depart" : "营销部"
},
"car" : [
{
"brand" : "丰田",
"make" : "日本",
"number" : "粤A12345"
},
{
"brand" : "奔驰",
"make" : "德国",
"number" : "粤A9999"
}
],
"address" : "浙江杭州西湖阿里马巴巴高薪技术开发区110号",
"desc" : "长相不丑,擅长营销...."
}
},
{
"text" : "长相帅气,高大威猛,人中龙凤,擅长写代码",
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : 2,
"name" : "张某龙",
"age" : "40",
"leader" : {
"name" : "马x腾",
"age" : 40,
"depart" : "研发部"
},
"car" : [
{
"brand" : "奔驰",
"make" : "中国北京",
"number" : "粤A8888"
},
{
"brand" : "华晨宝马",
"make" : "中国",
"number" : "粤A9999"
}
],
"address" : "广东省深圳市南山区某讯大厦1号",
"desc" : "长相帅气,高大威猛,人中龙凤,擅长写代码"
}
}
]
}
]
}
}
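Python's `re.match` is anchored at the start, like a prefix, so it is enough to show how * behaves. Lucene's regular-expression syntax differs in other details, so treat this only as an illustration of the operator. `regex_prefix` is a hypothetical helper, and the third suggestion is made up for the demonstration.

```python
import re

suggestions = [
    "长相不丑,擅长营销....",
    "长相帅气,高大威猛,人中龙凤,擅长写代码",
    "擅长写代码",  # hypothetical extra suggestion that does not start with 长
]


def regex_prefix(pattern, items):
    """Items whose beginning matches `pattern` (re.match only anchors at the start)."""
    return [s for s in items if re.match(pattern, s)]


print(re.match("长*", "擅长写代码").group() == "")  # True: 长* matched zero characters
print(regex_prefix("长*", suggestions))    # all three, including "擅长写代码"
print(regex_prefix("长相*", suggestions))  # the two suggestions that start with 长
```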
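a. Filtering in the Completion Suggester is implemented with context mappings.
b. When indexing into or querying a context-enabled completion field, a context must be provided.
c. Adding context mappings increases the index size of the completion field, and the completion index is held entirely on the heap.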
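Points a to c can be pictured with a toy model in which each context value owns its own sorted suggestion list. A context is mandatory on both write and read. A suggestion indexed under several contexts is stored once per context, which is one way to see why the index grows. This is a hypothetical sketch, not a description of how Elasticsearch stores contexts internally. The class name and the department values used as contexts are illustrative only.

```python
import bisect
from collections import defaultdict


class ContextPrefixSuggester:
    """Toy model: one sorted suggestion list per context value (hypothetical)."""

    def __init__(self):
        self._by_context = defaultdict(list)

    def add(self, text, contexts):
        if not contexts:
            raise ValueError("a context is required when indexing")
        for ctx in contexts:
            # The text is stored once per context, so more contexts mean a bigger index.
            bisect.insort(self._by_context[ctx], text)

    def suggest(self, prefix, contexts, size=5):
        if not contexts:
            raise ValueError("a context is required when querying")
        hits = set()
        for ctx in contexts:
            items = self._by_context.get(ctx, [])
            i = bisect.bisect_left(items, prefix)
            while i < len(items) and items[i].startswith(prefix):
                hits.add(items[i])
                i += 1
        return sorted(hits)[:size]


s = ContextPrefixSuggester()
s.add("长相不丑,擅长营销....", ["营销部"])
s.add("长相帅气,高大威猛,人中龙凤,擅长写代码", ["研发部"])
print(s.suggest("长相", ["研发部"]))  # only the suggestion filed under 研发部
```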