起因:在项目开发过程中,要使用到搜索 引擎来对一些关键字实现逆向查询,如果仅用模糊搜索,那么搜索的时间会根据数据量的增大而增大,对比之下就学了elasticsearch,也记录一下,常常回顾。
1. Elasticsearch进行基础数据搭建及QueryString查询
1.1. 基础数据的准备
所有的数据都要在index这个容器下
就要录入document,_id:是文档的id,他和你内容数据id是没有关系
-
比如说操作商品数据(product,variants,images,tags)还是以商品id作为业务主键
/_doc/1/_update #我们不能为这个_id再保存一个关系到数据库,所以一般_id和我们的业务主键一致
比如图片路径不用索引:index:false
比如需要精确查询的内容:type:keyword
一定要手工创建mapping,设置你需要的类型和分词器
POST /index_customer/_mapping
{
"properties": {
"id": {
"type": "long"
},
"age": {
"type": "integer"
},
"username": {
"type": "keyword"
},
"nickname": {
"type": "text",
"analyzer": "ik_max_word"
},
"consume": {
"type": "float"
},
"desc": {
"type": "text",
"analyzer": "ik_max_word"
},
"sex": {
"type": "byte"
},
"borthday": {
"type": "date"
},
"faceimg": {
"type": "text",
"index": false
}
}
}
# 回忆一下分词器的概念
POST /index_customer/_analyze
{
"field": "nickname",
"text": "今晚路上的车非常通畅,我提前到公司了"
}
录入数据
# POST /index_customer/_doc/1001
{
"id": 1001,
"age": 24,
"username": "kingliu",
"nickname": "飞上天空做太阳",
"consume": 1289.88,
"desc": "我在艾编程学习java和vue,学习到了很多知识",
"sex": 1,
"birthday": "1996-12-24",
"faceimg": "https://www.icodingedu.com/img/customers/1001/logo.png"
}
# POST /index_customer/_doc/1002
{
"id": 1002,
"age": 26,
"username": "lucywang",
"nickname": "夜空中最亮的星",
"consume": 6699.88,
"desc": "我在艾编程学习VIP课程,非常打动我",
"sex": 0,
"birthday": "1994-02-24",
"faceimg": "https://www.icodingedu.com/img/customers/1002/logo.png"
}
# POST /index_customer/_doc/1003
{
"id": 1003,
"age": 30,
"username": "kkstar",
"nickname": "照亮你的星",
"consume": 7799.66,
"desc": "老师们授课风格不同,但结果却是异曲同工,讲的很不错,值得推荐",
"sex": 1,
"birthday": "1990-12-02",
"faceimg": "https://www.icodingedu.com/img/customers/1003/logo.png"
}
# POST /index_customer/_doc/1004
{
"id": 1004,
"age": 31,
"username": "alexwang",
"nickname": "骑着老虎看日出",
"consume": 9799.66,
"desc": "课程内容充实,有料,很有吸引力,赞一个",
"sex": 1,
"birthday": "1989-05-09",
"faceimg": "https://www.icodingedu.com/img/customers/1004/logo.png"
}
# POST /index_customer/_doc/1005
{
"id": 1005,
"age": 32,
"username": "jsonzhang",
"nickname": "我是你的神话",
"consume": 12789.66,
"desc": "需要抽时间把所有内容都学个遍,太给力料",
"sex": 1,
"birthday": "1988-07-19",
"faceimg": "https://www.icodingedu.com/img/customers/1005/logo.png"
}
# POST /index_customer/_doc/1006
{
"id": 1006,
"age": 27,
"username": "abbyli",
"nickname": "好甜的棉花糖",
"consume": 10789.86,
"desc": "还不错,内容超过我的预期值,钱花的值",
"sex": 0,
"birthday": "1993-10-11",
"faceimg": "https://www.icodingedu.com/img/customers/1006/logo.png"
}
# POST /index_customer/_doc/1007
{
"id": 1007,
"age": 33,
"username": "jacktian",
"nickname": "船长jack",
"consume": 9789.86,
"desc": "一直想抽时间学习,这下有时间整了,给力",
"sex": 1,
"birthday": "1987-09-16",
"faceimg": "https://www.icodingedu.com/img/customers/1007/logo.png"
}
# POST /index_customer/_doc/1008
{
"id": 1008,
"age": 23,
"username": "feifei",
"nickname": "我爱篮球",
"consume": 6689.86,
"desc": "虽然又一些不太懂,但只要有时间,相信一定能学好的",
"sex": 1,
"birthday": "1997-04-18",
"faceimg": "https://www.icodingedu.com/img/customers/1008/logo.png"
}
# POST /index_customer/_doc/1009
{
"id": 1009,
"age": 25,
"username": "daisyzhang",
"nickname": "一起看日出",
"consume": 6680,
"desc": "学习起来还是很有意思的,值得我多次学习",
"sex": 0,
"birthday": "1995-03-27",
"faceimg": "https://www.icodingedu.com/img/customers/1009/logo.png"
}
# POST /index_customer/_doc/1010
{
"id": 1010,
"age": 29,
"username": "ethenhe",
"nickname": "旋风小子",
"consume": 6699.99,
"desc": "课程严谨,知识丰富,讲解到位,编程给力",
"sex": 1,
"birthday": "1991-06-22",
"faceimg": "https://www.icodingedu.com/img/customers/1010/logo.png"
}
# POST /index_customer/_doc/1011
{
"id": 1011,
"age": 27,
"username": "lydiazheng",
"nickname": "跳芭蕾的小妮",
"consume": 8899.99,
"desc": "自己一直都没有动力去系统学习,这次有了老师和大家的督促,非常不错,终于坚持下来了",
"sex": 0,
"birthday": "1993-08-27",
"faceimg": "https://www.icodingedu.com/img/customers/1011/logo.png"
}
# POST /index_customer/_doc/1012
{
"id": 1012,
"age": 37,
"username": "draglong",
"nickname": "飞龙在天",
"consume": 18899.99,
"desc": "技术宅,就喜欢研究技术,和大家碰撞感觉很好",
"sex": 1,
"birthday": "1983-11-22",
"faceimg": "https://www.icodingedu.com/img/customers/1012/logo.png"
}
QueryString的查询方式
/index_customer/_search?q=desc:艾编程&q=age:24
2. Elasticsearch的DSL查询语法
QueryString实际业务中使用非常少,一旦需要复杂的查询条件,就需要DSL来进行查询
- Domain Specific Language
- 特定领域的语言
- 内容是基于json
- 查询灵活方便,有利于复杂查询
POST /index_customer/_doc/_search
{
"query": {
"match": {
"desc": "艾编程"
}
}
}
POST /index_customer/_doc/_search #判断字段是否存在,如果存在将字段相关的所有数据展示
{
"query": {
"exists": {
"field": "desc"
}
}
}
3. DSL查询所有数据及分页&文档ID搜索
# 查询所有数据并进行filter过滤
POST /index_customer/_doc/_search
{
"query": {
"match_all":{}
},
"_source": ["id","nickname","age"]
}
# 分页 from从第几个开始,0开始节点
POST /index_customer/_doc/_search
{
"query": {
"match_all":{}
},
"_source": ["id","nickname","age"],
"from": 0,
"size": 5
}
DSL进行文档ID的精确搜索
GET /index_customer/_doc/1001
GET /index_customer/_doc/1002?_source=id,nickname,age
# 我们要通过DSL一次查询出多个ID
POST /index_customer/_search
{
"query":{
"ids": {
"type": "_doc",
"values": ["1001","1010","1012"]
}
},
"_source": ["id","username","nickname","desc"]
}
4. DSL的term&math&terms匹配查询
# term查询,将条件里的词不进行分词来查询
POST /index_customer/_search
{
"query":{
"term": {
"desc": "艾编程"
}
}
}
# match查询,将条件里的词使用这个field的分词器进行分词后和field里的分词逐一匹配
POST /index_customer/_search
{
"query":{
"match": {
"desc": "艾编程"
}
}
}
# terms查询,将条件里的词不进行分词来查询,但可以设置多个
POST /index_customer/_search
{
"query":{
"terms": {
"desc": ["艾编程","编程"]
}
}
}
5. DSL的match相关搜索查询及boost权重设置
5.1. math_phrase
match对查询条件分词后只要有匹配,哪怕是一个词,也会返回
math_phrase:分词结果必须都包含,并且是相邻(搜索比较严格)
如果有这么一篇文章
# springboot elasticsearch dsl
1、springboot 安装使用
2、elasticsearch 安装使用
3、es的DSL查询使用
# 其实你心里的想法是springboot下如何操作elasticsearch进行DSL查询
短语匹配
# query里的词必须都匹配,而且还要相邻
# slop: 允许词语间跳过的数量,默认可以不写,不写是挨着的没有跳数
{
"query": {
"match_phrase": {
"desc": {
"query": "艾编程 知识",
"slop": 9
}
}
}
}
5.2. mach扩展:operator
- or:搜索的分词内容,只要有一个词语匹配就展示
- and:搜索的分词内容必须全部匹配
POST /index_customer/_search
{
"query": {
"match": {
"desc": {
"query": "我在艾编程",
"operator": "and"
}
}
}
}
# or就是常规match查询
{
"query":{
"match": {
"desc": "艾编程"
}
}
}
5.3. mach扩展:minimum_should_match
minimum_should_match:最低匹配精度,配置有两种,一种是占比:60%,只要有任意 6个词就展示出结果 6/10个分词=60%,还有一种绝对值:可以直接写数字:6,至少要6个词都匹配
POST /index_customer/_search
{
"query": {
"match": {
"desc": {
"query": "我在艾编程学习",
"minimum_should_match": "40%"
}
}
}
}
5.4. multi_match
POST /index_customer/_search
{
"query": {
"multi_match": {
"query": "我在艾编程学习神话",
"fields": [
"desc",
"nickname"
]
}
}
}
5.5. 修改默认的排序权重
权重,为某个字段设置查询权重,权重越高,文档相关性越高_score越高,查询结果约靠前
# "nickname^10" 代表在这个列上如果查询到数据,就会将相关性放大10倍
POST /index_customer/_search
{
"query": {
"multi_match": {
"query": "我在艾编程学习神话",
"fields": [
"desc",
"nickname^10"
]
}
}
}
6. DSL之布尔查询&为指定语句加权
布尔查询就是bool,其实就是多条件组合查询,可以在不同列上进行
- must:查询必须匹配所有搜索条件,类似and
- should:查询只要匹配一个搜索条件即可,类似or
- must_not:不匹配搜索条件的内容,一个都不要满足,类似not
这里要注意的点
# must、should、must_not虽然语法可以并列
# must和should同时写,只有must生效
# must和must_not同时写,会在must符合后进行must_not条件剔除
业务场景描述:
desc里有:日出,并且username是jacktian或nickname里有学习的数据
# 解决方案一
{
"query": {
"bool": {
"must": [
{
"match": {
"desc": "学习"
}
}
],
"should": [
{
"match": {
"nickname": "日出"
}
},
{
"term": {
"username": "jacktian"
}
}
],
"minimum_should_match": 1
}
}
}
# 方案二
# desc='学习' and (nickname='日出' or username='jacktian')
{
"query": {
"bool": {
"must": [
{
"match": {
"desc": "学习"
}
},
{
"bool": {
"should": [
{
"match": {
"nickname": "日出"
}
},
{
"term": {
"username": "jacktian"
}
}
]
}
}
]
}
}
}
# 方案三
# (desc='学习' and nickname='日出') or (desc='学习' and username='jacktian')
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"nickname": {
"query": "日出",
"boost": 10
}
}
},
{
"match": {
"desc": "学习"
}
}
]
}
},
{
"bool": {
"must": [
{
"match": {
"desc": "学习"
}
},
{
"term": {
"username": "jacktian"
}
}
]
}
}
]
}
}
}
# "boost": 10 就是给字段增加权重的方式
7. DSL搜索过滤器&搜索排序&搜索结果高亮
- query:是根据用户搜索条件检索匹配记录
- post_filter:用于查询后,对结果数据进行筛选
# gte : 大于等于
# lte : 小于等于
# gt : 大于
# lt : 小于
POST /index_customer/_search
{
"query": {
"match": {
"desc": "我在艾编程学习"
}
},
"post_filter": {
"range": {
"consume": {
"gt": 10000,
"lt": 20000
}
}
}
}
# 过滤也可以在query的结果基础上再做其他field的分词
POST /index_customer/_search
{
"query": {
"match": {
"desc": "我在艾编程学习"
}
},
"post_filter": {
"match": {
"nickname": "太阳"
}
}
}
排序
POST /index_customer/_search
{
"query": {
"match": {
"desc": "我在艾编程学习"
}
},
"sort": [
{
"consume": "desc"
},
{
"age": "asc"
}
]
}
# 注意:sort是不能给text排序,但可以给keyword排序,因为text会做分词所以无法给多个分词进行排序,keyword不会分词所以可以
# 一般也不会给文本类型进行排序,doc默认排序规格是按照:文档搜索的相关度打分_score,按照分数降序
# 权重是影响_score分数的因子
结果高亮显现
# 高亮其实就是把我们搜索匹配到的关键字进行颜色突出显示,html
POST /index_customer/_search
{
"query": {
"match": {
"desc": "我在艾编程学习"
}
},
"highlight": {
"pre_tags": [
""
],
"post_tags": [
""
],
"fields": {
"desc": {}
}
}
}
8. 其他DSL查询方式
# prefix : 根据前缀去查询
# fuzzy : 搜索引擎自动纠错的功能,icodinng.com 把帮我们纠错成icoding.com
# wildcard : 占位符查询
? : 1个字符
* : N个字符
{
"query": {
"wildcard": {
"desc": "?艾编程*"
}
}
}
不要以为每天把功能完成了就行了,这种思想是要不得的,互勉~!