Day407&408&409.ES -谷粒商城

ES

一、基本概念

mysql用作持久化存储,ES用作检索

  • index索引

类比mysql的数据库概念

  • Type类型

类比mysql的数据表概念

  • Document文档

类比mysql的记录概念

index库>type表>document文档

Day407&408&409.ES -谷粒商城_第1张图片

  • 为什么ES搜索快?倒排索引

Day407&408&409.ES -谷粒商城_第2张图片

检索:
1 检索“红海特工行动”:先对检索词分词,再用词条去倒排索引中查出命中的记录,然后计算相关性得分。例如3号记录命中了2个词条,而3号本身只有3个词条,得分2/3,所以3号最匹配
2 检索“红海行动”:同理,分词后查倒排索引,再按相关性得分排序返回
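
举一个与上图无关的小例子帮助理解(仅示意):假设依次保存了两条记录,1号“小米手机”(分词为 小米/手机),2号“小米官方旗舰店”(分词为 小米/官方/旗舰店),则倒排索引大致为:

小米   → 1,2
手机   → 1
官方   → 2
旗舰店 → 2

检索“小米手机”时先分词为 小米/手机,再到倒排索引中查出包含这些词条的记录:1号命中2个词条且总共只有2个词条(2/2),2号只命中1个词条且共有3个词条(1/3),所以1号相关性得分更高、排在前面。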

关系型数据库中两个数据表是独立的,即使它们里面有相同名称的列也不影响使用,但ES中不是这样的。
elasticsearch是基于Lucene开发的搜索引擎,而ES中不同type下名称相同的field最终在Lucene中的处理方式是一样的。

• 两个不同type下的两个user_name,在ES同一个索引下其实被认为是同一个field,你必须在两个不同的type中定义相同的field映射。
否则,不同type中的相同字段名称就会在处理中出现冲突的情况,导致Lucene处理效率下降。去掉type就是为了提高ES处理数据的效率。


Elasticsearch 7.x
URL中的type参数为可选。比如,索引一个文档不再要求提供文档类型。


Elasticsearch 8.x
不再支持URL中的type参数。


解决:
将索引从多类型迁移到单类型,每种类型文档一个独立索引
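
7.x 开始推荐统一用 _doc 作为URL中的“类型”(它只是端点名,不再是真正的type),例如在一个新索引下保存文档(newindex 仅为示意):

PUT newindex/_doc/1
{
  "name": "achang"
}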

二、Docker安装ES

1、docker中安装elasticsearch

下载elasticsearch(存储和检索)和kibana(可视化检索)

docker pull elasticsearch:7.4.2
docker pull kibana:7.4.2

注意版本要统一


2、配置

# 将docker里的目录挂载到linux的/mydata/elasticsearch目录中,修改/mydata下的内容就可以改掉docker里的
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data

# es可以被远程任何机器访问
echo "http.host: 0.0.0.0" >/mydata/elasticsearch/config/elasticsearch.yml

# 递归更改权限,es需要访问
chmod -R 777 /mydata/elasticsearch/

3、启动Elastic search

# 9200是用户交互(HTTP)端口 9300是集群节点间通信端口
# -e discovery.type=single-node 指定单节点运行
# -e ES_JAVA_OPTS 指定占用的内存大小,生产时可以设置32G
sudo docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e  "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v  /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2 

查看是否启动成功

docker ps



4、安装kibana

  • 拉取kibana,注意版本对应
docker pull kibana:7.4.2
  • 启动kibana
sudo docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.109.101:9200  -p 5601:5601 -d kibana:7.4.2

5、测试

  • 查看elasticsearch版本信息:http://192.168.109.101:9200

Day407&408&409.ES -谷粒商城_第3张图片

  • 显示elasticsearch节点信息:http://192.168.109.101:9200/_cat/nodes

Day407&408&409.ES -谷粒商城_第4张图片

127.0.0.1 14 92 29 0.48 0.96 0.60 dilm * 4fe4e202abf1
#    4fe4e202abf1代表上面的结点 *代表是主节点
  • 访问Kibana:http://192.168.109.101:5601/app/kibana

Day407&408&409.ES -谷粒商城_第5张图片


6、初步检索

Day407&408&409.ES -谷粒商城_第6张图片

  • _CAT
GET /_cat/nodes     #查看所有节点

127.0.0.1 15 93 8 0.18 0.55 0.52 dilm * 4fe4e202abf1
GET /_cat/health    #查看es健康状况

1633079094 09:04:54 elasticsearch green 1 1 3 3 0 0 0 0 - 100.0%
# 注:green表示健康值正常
GET /_cat/master    #查看主节点

Y9zawKrWSQWvFBx0wVi94g 127.0.0.1 127.0.0.1 4fe4e202abf1
# 主节点唯一编号
# 虚拟机地址
GET /_cat/indices   #查看所有索引,等价于mysql数据库的show databases

green  open .kibana_task_manager_1   DhtDmKrsRDOUHPJm1EFVqQ 1 0 2 3 40.8kb 40.8kb
green  open .apm-agent-configuration vxzRbo9sQ1SvMtGkx6aAHQ 1 0 0 0   230b   230b
green  open .kibana_1                rdJ5pejQSKWjKxRtx-EIkQ 1 0 5 1 18.2kb 18.2kb
#这3个索引是kibana创建的
  • PUT

必须携带id

Day407&408&409.ES -谷粒商城_第7张图片

#索引一个文档
#保存一个数据,保存在哪个索引的哪个类型下(哪个数据库的哪张表下),保存时用唯一标识指定

put /achang/user/1  #这里的1是指定了id为1
{
     
  "name":"achang",
  "age":"18"
}


{
     
  "_index" : "achang", #表明该数据在哪个数据库下
  "_type" : "user",    #表明该数据在哪个类型下
  "_id" : "1",         #表明被保存数据的id
  "_version" : 1,      #被保存数据的版本
  "result" : "created",#这里是创建了一条数据,如果重新put一条数据,则该状态会变为updated,并且版本号也会发生变化。
  "_shards" : {
      #分片,集群的情况下
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,       #并发控制字段,每次更新都会+1,用来做乐观锁
  "_primary_term" : 1  #主分片重新分配,如重启,就会变化
}

Day407&408&409.ES -谷粒商城_第8张图片

  • GET
get /achang/user/1

{
     
  "_index" : "achang",
  "_type" : "user",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
      #真正的数据
    "name" : "achang",
    "age" : "20"
  }
}
  • 乐观锁

通过“if_seq_no=1&if_primary_term=1”,当序列号匹配的时候,才进行修改,否则不修改。

#如下两个请求并发发出
put /achang/user/1?if_seq_no=1&if_primary_term=1
{
     
  "name" : "achang1"
}

put /achang/user/1?if_seq_no=1&if_primary_term=1
{
     
  "name" : "achang2"
}

#再次查询,发现name被改成了achang1
get /achang/user/1
{
     
  "_index" : "achang",
  "_type" : "user",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
     
    "name" : "achang1"
  }
}
  • _update
POST customer/external/1/_update
{
    "doc":{
        "name":"111"
    }
}
#或者(不带_update,直接POST提交数据)
POST customer/external/1
{
    "name":"222"
}
#或者
PUT customer/external/1
{
    "name":"222"
}
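
补充:带_update重复提交相同的数据时,ES检测到内容没有变化,不会做任何操作,响应的result为noop、版本号不变,大致如下(示意,字段值以实际为准):

{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 2,
  "result" : "noop",
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}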

不同

  • POST带_update时会对比源文档数据,如果内容相同则不做任何操作(result为noop),文档version、seq_no都不增加
  • POST不带_update、以及PUT操作,总会重新保存数据并增加version版本

看场景

  • 对于大并发更新,不带_update,省去对比源数据的开销
  • 对于大并发查询、偶尔更新的场景,带_update,先对比再更新,避免无意义的重新索引和重新计算
  • 删除文档或索引
DELETE customer/external/1
DELETE customer

#注:elasticsearch并没有提供删除类型的操作,只提供了删除索引和文档的操作。
#实例:删除整个customer索引数据
#删除前,查看所有的索引
GET /_cat/indices
green  open .kibana_task_manager_1   DhtDmKrsRDOUHPJm1EFVqQ 1 0 2 0 31.3kb 31.3kb
green  open .apm-agent-configuration vxzRbo9sQ1SvMtGkx6aAHQ 1 0 0 0   283b   283b
green  open .kibana_1                rdJ5pejQSKWjKxRtx-EIkQ 1 0 8 3 28.8kb 28.8kb
yellow open customer                 mG9XiCQISPmfBAmL1BPqIw 1 1 9 1  8.6kb  8.6kb

#删除 “customer”索引
DELETE /customer
#响应
{
    "acknowledged": true
}


#删除后,再查看所有的索引 GET /_cat/indices
green open .kibana_task_manager_1   DhtDmKrsRDOUHPJm1EFVqQ 1 0 2 0 31.3kb 31.3kb
green open .apm-agent-configuration vxzRbo9sQ1SvMtGkx6aAHQ 1 0 0 0   283b   283b
green open .kibana_1                rdJ5pejQSKWjKxRtx-EIkQ 1 0 8 3 28.8kb 28.8kb
  • ES的批量操作——bulk
#批量导入数据
POST /customer/external/_bulk
{"index":{"_id":"1"}}   #两行为一个整体
{"name":"a"}            #真正的数据
{"index":{"_id":"2"}}   #两行为一个整体
{"name":"b"}            #真正的数据

#语法格式:
POST /xxxxx/xxxxx/_bulk
{action:{metadata}}\n
{request body}\n
{action:{metadata}}\n
{request body}\n

这里的批量操作,当某一条执行失败时,其他的数据仍然能够接着执行,也就是说彼此之间是独立的。

bulk api依次按顺序执行所有的action(动作)。如果某一个动作因任何原因失败,它将继续处理它后面剩余的动作。当bulk api返回时,它将提供每个动作的状态(与发送的顺序相同),所以您可以检查某个指定的动作是否失败了。

#实例1: 执行多条数据
POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}
#保存操作,指定了索引、id,真正的数据为name:xxx

#执行结果
{
     
  "took" : 318,  #花费了多少ms
  "errors" : false, #没有发生任何错误
  "items" : [ #每个数据的结果
    {
     
      "index" : {
      #保存
        "_index" : "customer", #索引
        "_type" : "external", #类型
        "_id" : "1", #文档
        "_version" : 1, #版本
        "result" : "created", #创建
        "_shards" : {
     
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201 #新建完成
      }
    },
    {
     
      "index" : {
      #第二条记录
        "_index" : "customer",
        "_type" : "external",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
     
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
#实例2:对于整个索引执行批量操作
POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}    #删除操作
{"create":{"_index":"website","_type":"blog","_id":"123"}}    #保存操作,下面一行是数据
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}                 #保存操作,下面一行是数据
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}    #更新操作,下面一行是数据
{"doc":{"title":"my updated blog post"}}
#每条操作都指定:操作类型、索引、类型、id


#运行结果:
{
     
  "took" : 414,
  "errors" : false,
  "items" : [
    {
     
      "delete" : {
     
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
     
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
     
      "create" : {
     
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "result" : "created",
        "_shards" : {
     
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
     
      "index" : {
     
        "_index" : "website",
        "_type" : "blog",
        "_id" : "AOpgO3wB3UIR4wi8SrO8",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
     
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
     
      "update" : {
     
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "result" : "updated",
        "_shards" : {
     
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 3,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}
  • 样本测试数据

准备了一份顾客银行账户信息的虚构的JSON文档样本。每个文档都有下列的schema(模式)。

{
     
	"account_number": 1,
	"balance": 39225,
	"firstname": "Amber",
	"lastname": "Duke",
	"age": 32,
	"gender": "M",
	"address": "880 Holmes Lane",
	"employer": "Pyrami",
	"email": "[email protected]",
	"city": "Brogan",
	"state": "IL"
}

https://gitee.com/xlh_blog/common_content/blob/master/es%E6%B5%8B%E8%AF%95%E6%95%B0%E6%8D%AE.json;导入测试数据

POST bank/account/_bulk
#上面的数据

Day407&408&409.ES -谷粒商城_第9张图片

get /_cat/indices #刚导入了1000条



  • 让Docker每次启动都自动启动ES

sudo docker update 【实例ID】 --restart=always

[root@s1 elasticsearch]# sudo docker ps -a
CONTAINER ID   IMAGE                 COMMAND                  CREATED       STATUS       PORTS                                                                                  NAMES
5c43fff82773   kibana:7.4.2          "/usr/local/bin/dumb…"   2 hours ago   Up 2 hours   0.0.0.0:5601->5601/tcp, :::5601->5601/tcp                                              kibana
4fe4e202abf1   elasticsearch:7.4.2   "/usr/local/bin/dock…"   2 hours ago   Up 2 hours   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 0.0.0.0:9300->9300/tcp, :::9300->9300/tcp   elasticsearch
879b641ebe6c   redis                 "docker-entrypoint.s…"   11 days ago   Up 2 hours   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp                                              redis
b2b889f90cd9   mysql:5.7             "docker-entrypoint.s…"   11 days ago   Up 2 hours   0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp                                   mysql
[root@s1 elasticsearch]# sudo docker update 5c4 --restart=always
5c4
[root@s1 elasticsearch]# sudo docker update 4fe --restart=always
4fe

三、进阶检索

官方API

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-your-data.html

1、Search API

  • 通过REST request uri 发送搜索参数 (uri +检索参数);
  • 通过REST request body 来发送它们(uri+请求体);
  • 请求参数方式检索

检索bank索引中查询全部,并按account_number升序排序;

命中了1000条数据,但ES默认分页,只返回前10条

GET bank/_search?q=*&sort=account_number:asc

# q=* 查询所有
# sort 排序字段
# asc升序 

检索bank下所有信息,包括type和docs

GET bank/_search

返回格式

Day407&408&409.ES -谷粒商城_第10张图片

took – 花费多少ms搜索
timed_out – 是否超时
_shards – 多少分片被搜索了,以及多少成功/失败的搜索分片
max_score –文档相关性最高得分
hits.total.value - 多少匹配文档被找到
hits.sort - 结果的排序key,没有的话按照score排序
hits._score - 相关得分 (not applicable when using match_all)
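
一个精简后的返回体骨架大致如下(数值仅为示意):

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1000, "relation" : "eq" },
    "max_score" : null,
    "hits" : [
      { "_index" : "bank", "_type" : "account", "_id" : "0", "_score" : null, "_source" : { ... }, "sort" : [ 0 ] }
    ]
  }
}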

  • uri+请求体进行检索
GET /bank/_search
{
     
  "query": {
      "match_all": {
     } },
  "sort": [
    {
      "account_number": "asc" },
    {
     "balance":"desc"}
  ]
}

2、Query DSL

我们把GET请求体中的这种查询语句叫做Query DSL

  • 基本语法格式

    Elasticsearch提供了一个可以执行查询的Json风格的DSL(domain-specific language领域特定语言)。这个被称为Query DSL,该查询语言非常全面。

    • 典型结构
    QUERY_NAME:{
       ARGUMENT:VALUE,
       ARGUMENT:VALUE,
        ...
    }

    如果针对于某个字段,那么它的结构如下:

    {
      QUERY_NAME:{
         FIELD_NAME:{
           ARGUMENT:VALUE,
           ARGUMENT:VALUE,...
          }
       }
    }
    
    • 示例
    GET bank/_search
    {
           
      "query": {
             #查询形式
        "match_all": {
           } #查询所有
      },
      "from": 0, #开始位置
      "size": 5, #显示数
      "_source":["balance"],#返回部分字段
      "sort": [ #排序
        {
           
          "account_number": {
           
            "order": "desc"
          }
        }
      ]
    }
    
    #   _source为要返回的字段
    

3、match匹配查询

  • 基本类型(非字符串),精确控制

    GET bank/_search
    {
           
      "query": {
           
        "match": {
           
          "account_number": "999"
        }
      }
    }
    

查询结果

{
     
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "999",
        "_score" : 1.0,
        "_source" : {
     
          "account_number" : 999,
          "balance" : 6087,
          "firstname" : "Dorothy",
          "lastname" : "Barron",
          "age" : 22,
          "gender" : "F",
          "address" : "499 Laurel Avenue",
          "employer" : "Xurban",
          "email" : "[email protected]",
          "city" : "Belvoir",
          "state" : "CA"
        }
      }
    ]
  }
}

  • 字符串,全文检索
GET bank/_search
{
     
  "query": {
     
    "match": {
     
      "address": "kings" #字符串
    }
  }
}

全文检索,最终会按照评分进行排序,会对检索条件进行分词匹配

{
     
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 5.9908285,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "20",
        "_score" : 5.9908285,
        "_source" : {
     
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place", #分词匹配
          "employer" : "Scentric",
          "email" : "[email protected]",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "722",
        "_score" : 5.9908285,
        "_source" : {
     
          "account_number" : 722,
          "balance" : 27256,
          "firstname" : "Roberts",
          "lastname" : "Beasley",
          "age" : 34,
          "gender" : "F",
          "address" : "305 Kings Hwy",#分词匹配
          "employer" : "Quintity",
          "email" : "[email protected]",
          "city" : "Hayden",
          "state" : "PA"
        }
      }
    ]
  }
}

4、match_phrase 【短句匹配】

  • match_phrase

将需要匹配的值当成一整个单词(不分词)进行检索

match是分词后只要包含mill或road其中一个词就查出来,我们现在要整个短语都包含、顺序一致才查出

GET bank/_search
{
     
  "query": {
     
    "match_phrase": {
     
      "address": "mill road"
    }
  }
}

查出address中包含短语mill road的所有记录,并给出相关性得分

{
     
  "took" : 50,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 8.926605,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 8.926605,
        "_source" : {
     
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "[email protected]",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}

  • match_phrase和match的区别,观察如下实例
GET bank/_search
{
     
  "query": {
     
    "match_phrase": {
     
      "address": "990 Mill"
    }
  }
}

结果

{
     
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 10.806405,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 10.806405,
        "_source" : {
     
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road", #
          "employer" : "Pheast",
          "email" : "[email protected]",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}

使用match对keyword子字段(address.keyword)进行匹配

GET bank/_search
{
     
  "query": {
     
    "match": {
     
      "address.keyword": "990 Mill"
    }
  }
}

查询结果,一条也未匹配到

{
     
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ] #
  }
}

修改匹配条件为“990 Mill Road”

GET bank/_search
{
     
  "query": {
     
    "match": {
     
      "address.keyword": "990 Mill Road"
    }
  }
}

查询结果:匹配到一条数据

{
     
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 6.5032897,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 6.5032897,
        "_source" : {
     
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road", #
          "employer" : "Pheast",
          "email" : "[email protected]",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}


文本字段使用keyword子字段匹配时,匹配条件必须是该字段的完整值,属于精确匹配。

match_phrase是做短语匹配,只要文本中包含该短语(分词后顺序一致),就能匹配到。


5、multi_match【多字段匹配】

多个字段之间是或的关系:state或者address中包含mill即可,并且在查询过程中,会对查询条件进行分词。

GET bank/_search
{
     
  "query": {
     
    "multi_match": {
     
      "query": "mill",
      "fields": [
        "state",
        "address"
      ]
    }
  }
}

查询结果:

{
     
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 5.4032025,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 5.4032025,
        "_source" : {
     
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "[email protected]",
          "city" : "Lopezo",
          "state" : "AK"
        }
      },
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 5.4032025,
        "_source" : {
     
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "[email protected]",
          "city" : "Urie",
          "state" : "IL"
        }
      },
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "345",
        "_score" : 5.4032025,
        "_source" : {
     
          "account_number" : 345,
          "balance" : 9812,
          "firstname" : "Parker",
          "lastname" : "Hines",
          "age" : 38,
          "gender" : "M",
          "address" : "715 Mill Avenue",
          "employer" : "Baluba",
          "email" : "[email protected]",
          "city" : "Blackgum",
          "state" : "KY"
        }
      },
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "472",
        "_score" : 5.4032025,
        "_source" : {
     
          "account_number" : 472,
          "balance" : 25571,
          "firstname" : "Lee",
          "lastname" : "Long",
          "age" : 32,
          "gender" : "F",
          "address" : "288 Mill Street",
          "employer" : "Comverges",
          "email" : "[email protected]",
          "city" : "Movico",
          "state" : "MT"
        }
      }
    ]
  }
}

6、bool用来做复合查询

复合语句可以合并任何其他查询语句,包括复合语句。

这也就意味着,复合语句之间可以互相嵌套,可以表达非常复杂的逻辑。

  • must:必须匹配must所列举的所有条件
  • must_not:必须不匹配must_not所列举的所有条件
  • should:应该满足should所列举的条件;满足最好,不满足也可以,满足了得分更高

  • must 必须是指定的情况

实例:查询gender=m,并且address=mill的数据

GET bank/_search
{
     
   "query":{
     
        "bool":{
     
             "must":[
              {
     "match":{
     "address":"mill"}},
              {
     "match":{
     "gender":"M"}}
             ]
         }
    }
}

结果

{
     
  "took" : 83,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 6.0824604,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 6.0824604,
        "_source" : {
     
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "[email protected]",
          "city" : "Lopezo",
          "state" : "AK"
        }
      },
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 6.0824604,
        "_source" : {
     
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",#
          "address" : "198 Mill Lane",#
          "employer" : "Neteria",
          "email" : "[email protected]",
          "city" : "Urie",
          "state" : "IL"
        }
      },
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "345",
        "_score" : 6.0824604,
        "_source" : {
     
          "account_number" : 345,
          "balance" : 9812,
          "firstname" : "Parker",
          "lastname" : "Hines",
          "age" : 38,
          "gender" : "M",#
          "address" : "715 Mill Avenue",#
          "employer" : "Baluba",
          "email" : "[email protected]",
          "city" : "Blackgum",
          "state" : "KY"
        }
      }
    ]
  }
}
  • must_not 必须不是指定的情况

实例:查询gender=m,并且address=mill的数据,但是age不等于38的

GET bank/_search
{
     
  "query": {
     
    "bool": {
     
      "must": [
        {
      "match": {
      "gender": "M" }},
        {
      "match": {
     "address": "mill"}}
      ],
      "must_not": [
        {
      "match": {
      "age": "38" }}
      ]
   }
  }
}

结果

{
     
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 6.0824604,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 6.0824604,
        "_source" : {
     
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,#
          "gender" : "M", #
          "address" : "990 Mill Road", #
          "employer" : "Pheast",
          "email" : "[email protected]",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}
  • should

应该达到should列举的条件,如果到达会增加相关文档的评分,并不会改变查询的结果。

如果query中只有should且只有一种匹配规则,那么should的条件就会被作为默认匹配条件而去改变查询结果。
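
例如,当bool里只有should一个条件时(示意),不满足该条件的文档会被过滤掉,此时should实际上起到了筛选作用:

GET bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "lane" } }
      ]
    }
  }
}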

实例:匹配lastName应该等于Wallace的数据

GET bank/_search
{
     
  "query": {
     
    "bool": {
     
      "must": [
        {
     
          "match": {
     
            "gender": "M"
          }
        },
        {
     
          "match": {
     
            "address": "mill"
          }
        }
      ],
      "must_not": [
        {
     
          "match": {
     
            "age": "18"
          }
        }
      ],
      "should": [
        {
     
          "match": {
     
            "lastname": "Wallace"
          }
        }
      ]
    }
  }
}

查询结果:能够看到相关度越高,得分也越高

{
     
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 12.585751,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 12.585751,
        "_source" : {
     
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",#
          "age" : 28,#
          "gender" : "M",#
          "address" : "990 Mill Road",#
          "employer" : "Pheast",
          "email" : "[email protected]",
          "city" : "Lopezo",
          "state" : "AK"
        }
      },
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 6.0824604,
        "_source" : {
     
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",#
          "age" : 38,#
          "gender" : "M",#
          "address" : "198 Mill Lane",#
          "employer" : "Neteria",
          "email" : "[email protected]",
          "city" : "Urie",
          "state" : "IL"
        }
      },
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "345",
        "_score" : 6.0824604,
        "_source" : {
     
          "account_number" : 345,
          "balance" : 9812,
          "firstname" : "Parker",
          "lastname" : "Hines",#
          "age" : 38,#
          "gender" : "M",#
          "address" : "715 Mill Avenue",#
          "employer" : "Baluba",
          "email" : "[email protected]",
          "city" : "Blackgum",
          "state" : "KY"
        }
      }
    ]
  }
}

7、Filter【结果过滤】

上面的must和should会影响相关性得分,而must_not仅仅是一个filter,不贡献得分;把must改写成filter,该条件同样不再贡献得分。

如果只有filter条件的话,我们会发现所有文档的得分都是0。

并不是所有的查询都需要产生分数,特别是那些仅用于filtering(过滤)的文档。为了不计算分数,elasticsearch会自动检查场景并且优化查询的执行,不参与评分会更快。另外,一个key要匹配多个值时可以用terms(见本节末尾的补充示例)。

GET bank/_search
{
     
  "query": {
     
    
    "bool": {
     
      "must": [
        {
      "match": {
     "address": "mill" } }
      ],
      "filter": {
        #query.bool.filter
        "range": {
     
          "balance": {
     
            "gte": "10000",
            "lte": "20000"
          }
        }
      }
    }
    
  }
}

这里先是查询所有匹配address=mill的文档,然后再根据10000<=balance<=20000进行过滤查询结果

{
     
  "took" : 37,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 5.4032025, #
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 5.4032025,
        "_source" : {
     
          "account_number" : 970,
          "balance" : 19648, #
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road", #
          "employer" : "Pheast",
          "email" : "[email protected]",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}

boolean查询中,must, should 和must_not 元素都被称为查询子句

文档是否符合每个“must”或“should”子句中的标准,决定了文档的“相关性得分”。

得分越高,文档越符合您的搜索条件

默认情况下,Elasticsearch返回根据这些相关性得分排序的文档。

“must_not”子句中的条件被视为“过滤器”。 它影响文档是否包含在结果中, 但不影响文档的评分方式。 还可以显式地指定任意过滤器来包含或排除基于结构化数据的文档。

filter在使用过程中,并不会计算相关性得分:

GET bank/_search
{
     
  "query": {
     
    "bool": {
     
      "must": [
        {
     
          "match": {
     
            "address": "mill"
          }
        }
      ],
      "filter": {
     
        "range": {
     
          "balance": {
     
            "gte": "10000",
            "lte": "20000"
          }
        }
      }
    }
  }
}
#查询结果:
{
     
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 213,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "20",
        "_score" : 0.0,
        "_source" : {
     
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "[email protected]",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "37",
        "_score" : 0.0,
        "_source" : {
     
          "account_number" : 37,
          "balance" : 18612,
          "firstname" : "Mcgee",
          "lastname" : "Mooney",
          "age" : 39,
          "gender" : "M",
          "address" : "826 Fillmore Place",
          "employer" : "Reversus",
          "email" : "[email protected]",
          "city" : "Tooleville",
          "state" : "OK"
        }
      },
        #省略。。。

能看到所有文档的“_score” : 0.0
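
补充:前面提到的terms可以对同一个字段一次匹配多个精确值,例如(示意):

GET bank/_search
{
  "query": {
    "terms": {
      "age": [ 28, 38 ]
    }
  }
}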


8、term

和match一样。匹配某个属性的值。

全文检索字段(text字符串等)用match, 其他 非text字段匹配用term

不要使用term来进行文本字段查询 es默认存储text值时用分词分析,所以要搜索text值,使用match

https://www.elastic.co/guide/en/elasticsearch/reference/7.6/query-dsl-term-query.html
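
对非text字段(比如age)用term做精确匹配,例如(示意):

GET bank/_search
{
  "query": {
    "term": {
      "age": 28
    }
  }
}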

  • 字段.keyword:要与字段的完整值一一匹配,属于精确匹配

Day407&408&409.ES -谷粒商城_第11张图片

  • match_phrase:短语匹配,文本中包含该短语即可;而对text字段直接用term做精确匹配,往往匹配不到(见下例)

Day407&408&409.ES -谷粒商城_第12张图片

GET bank/_search
{
     
  "query": {
     
    "term": {
     
      "address": "mill Road"
    }
  }
}

查询结果:

# 一条也没有匹配到
{
     
  "took" : 6,.
  
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

而更换为match匹配时,能够匹配到32个文档

GET bank/_search
{
     
  "query": {
     
    "match": {
     
      "address": "mill Road"
    }
  }
}

结果:

{
     
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 32,
      "relation" : "eq"
    },
    "max_score" : 8.926605,
    "hits" : [
      {
     
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 8.926605,
        "_source" : {
     
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "[email protected]",
          "city" : "Lopezo",
          "state" : "AK"
        }
      },
		#省略.....
      }
    ]
  }
}

9、Aggregation(聚合)

聚合提供了从数据中分组和提取数据的能力。最简单的聚合方法大致等于SQL的Group by和SQL聚合函数。

在elasticsearch中,执行搜索时可以同时返回hits(命中结果)和聚合结果,也就是把响应中的命中结果与聚合分析结果分开返回的能力。

这是非常强大且高效的:你可以在一次请求中执行查询和多个聚合,一次性得到各自的返回结果,使用一次简洁、简化的API来避免多次网络往返。

aggs:执行聚合。聚合语法如下:

"aggs":{
      # 聚合
    "aggs_name这次聚合的名字,方便展示在结果集中":{
     
        "AGG_TYPE聚合的类型(avg,term,terms)":{
     }
     }
}

terms:看值的可能性分布

avg:看值的分布平均

  • 例:搜索address中包含mill的所有人的年龄分布以及平均年龄,但不显示这些人的详情
# 搜索address包含mill的所有人的年龄分布以及平均年龄、平均薪资
GET bank/_search
{
     
  "query": {
      # 查询出包含mill的
    "match": {
     
      "address": "Mill"
    }
  },
  "aggs": {
      #基于查询聚合
    "ageAgg": {
       # 聚合的名字,随便起
      "terms": {
      # 看值的可能性分布
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      
      "avg": {
      # 看age值的平均
        "field": "age"
      }
    },
    "balanceAvg": {
     
      "avg": {
      # 看balance的平均
        "field": "balance"
      }
    }
  },
  "size": 0  # 不看详情,只看聚合结果
}

查询结果:

{
     
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 4, // 命中4条
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
     
    "ageAgg" : {
      // 第一个聚合的结果
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
     
          "key" : 38,
          "doc_count" : 2
        },
        {
     
          "key" : 28,
          "doc_count" : 1
        },
        {
     
          "key" : 32,
          "doc_count" : 1
        }
      ]
    },
    "ageAvg" : {
      // 第二个聚合的结果
      "value" : 34.0
    },
    "balanceAvg" : {
      // 第三个聚合的结果
      "value" : 25208.0
    }
  }
}
  • 子聚合

按照年龄聚合,并且求这些年龄段的这些人的平均薪资

写到一个聚合里是基于上个聚合进行子聚合。

下面求每个age分布的平均balance

GET bank/_search
{
     
  "query": {
     
    "match_all": {
     } #查询所有
  },
  "aggs": {
     
    "ageAgg": {
     
      "terms": {
      # 看分布
        "field": "age", #字段
        "size": 100 #数量
      },
      "aggs": {
      # 与terms并列 【子聚合】
        "ageAvg": {
      #平均
          "avg": {
     
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

输出结果

{
     
  "took" : 49,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
     
    "ageAgg" : {
     
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
     
          "key" : 31,
          "doc_count" : 61,
          "ageAvg" : {
     
            "value" : 28312.918032786885
          }
        },
        {
     
          "key" : 39,
          "doc_count" : 60,
          "ageAvg" : {
     
            "value" : 25269.583333333332
          }
        },
        {
     
          "key" : 26,
          "doc_count" : 59,
          "ageAvg" : {
     
            "value" : 23194.813559322032
          }
        },
        {
     
          "key" : 32,
          "doc_count" : 52,
          "ageAvg" : {
     
            "value" : 23951.346153846152
          }
        },
        {
     
          "key" : 35,
          "doc_count" : 52,
          "ageAvg" : {
     
            "value" : 22136.69230769231
          }
        },
        {
     
          "key" : 36,
          "doc_count" : 52,
          "ageAvg" : {
     
            "value" : 22174.71153846154
          }
        },
        {
     
          "key" : 22,
          "doc_count" : 51,
          "ageAvg" : {
     
            "value" : 24731.07843137255
          }
        },
        {
     
          "key" : 28,
          "doc_count" : 51,
          "ageAvg" : {
     
            "value" : 28273.882352941175
          }
        },
        {
     
          "key" : 33,
          "doc_count" : 50,
          "ageAvg" : {
     
            "value" : 25093.94
          }
        },
        {
     
          "key" : 34,
          "doc_count" : 49,
          "ageAvg" : {
     
            "value" : 26809.95918367347
          }
        },
        {
     
          "key" : 30,
          "doc_count" : 47,
          "ageAvg" : {
     
            "value" : 22841.106382978724
          }
        },
        {
     
          "key" : 21,
          "doc_count" : 46,
          "ageAvg" : {
     
            "value" : 26981.434782608696
          }
        },
        {
     
          "key" : 40,
          "doc_count" : 45,
          "ageAvg" : {
     
            "value" : 27183.17777777778
          }
        },
        {
     
          "key" : 20,
          "doc_count" : 44,
          "ageAvg" : {
     
            "value" : 27741.227272727272
          }
        },
        {
     
          "key" : 23,
          "doc_count" : 42,
          "ageAvg" : {
     
            "value" : 27314.214285714286
          }
        },
        {
     
          "key" : 24,
          "doc_count" : 42,
          "ageAvg" : {
     
            "value" : 28519.04761904762
          }
        },
        {
     
          "key" : 25,
          "doc_count" : 42,
          "ageAvg" : {
     
            "value" : 27445.214285714286
          }
        },
        {
     
          "key" : 37,
          "doc_count" : 42,
          "ageAvg" : {
     
            "value" : 27022.261904761905
          }
        },
        {
     
          "key" : 27,
          "doc_count" : 39,
          "ageAvg" : {
     
            "value" : 21471.871794871793
          }
        },
        {
     
          "key" : 38,
          "doc_count" : 39,
          "ageAvg" : {
     
            "value" : 26187.17948717949
          }
        },
        {
     
          "key" : 29,
          "doc_count" : 35,
          "ageAvg" : {
     
            "value" : 29483.14285714286
          }
        }
      ]
    }
  }
}

复杂子聚合:

查出所有年龄分布,并且这些年龄段中M的平均薪资和F的平均薪资以及这个年龄段的总体平均薪资

GET bank/_search
{
     
  "query": {
     
    "match_all": {
     }
  },
  "aggs": {
     
    "ageAgg": {
     
      "terms": {
       #  看age分布
        "field": "age",
        "size": 100
      },
      "aggs": {
      # 子聚合
        "genderAgg": {
     
          "terms": {
      # 看gender分布
            "field": "gender.keyword" # 注意这里,文本字段应该用.keyword
          },
          "aggs": {
      # 子聚合
            "balanceAvg": {
     
              "avg": {
      # 性别的平均薪资
                "field": "balance"
              }
            }
          }
        },
        "ageBalanceAvg": {
     
          "avg": {
      #age分布的平均(男女)
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

输出结果:

{
     
  "took" : 119,
  "timed_out" : false,
  "_shards" : {
     
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
     
    "total" : {
     
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
     
    "ageAgg" : {
     
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
     
          "key" : 31,
          "doc_count" : 61,
          "genderAgg" : {
     
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
     
                "key" : "M",
                "doc_count" : 35,
                "balanceAvg" : {
     
                  "value" : 29565.628571428573
                }
              },
              {
     
                "key" : "F",
                "doc_count" : 26,
                "balanceAvg" : {
     
                  "value" : 26626.576923076922
                }
              }
            ]
          },
          "ageBalanceAvg" : {
     
            "value" : 28312.918032786885
          }
        }
      ]
        .......//省略其他
    }
  }
}
  • nested对象聚合
GET articles/_search
{
     
  "size": 0, 
  "aggs": {
     
    "nested": {
     
      "nested": {
     
        "path": "payment"
      },
      "aggs": {
     
        "amount_avg": {
     
          "avg": {
     
            "field": "payment.amount"
          }
        }
      }
    }
  }
}

10、Mapping

映射(mapping)定义了文档及其字段是如何被存储和检索的

  • 字段类型

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-types.html

Day407&408&409.ES -谷粒商城_第13张图片

  • 映射

查看mapping信息(对应文档的类型),类似mysql每个字段的类型

ES会自动猜测映射的类型

Day407&408&409.ES -谷粒商城_第14张图片

GET bank/_mapping
{
     
  "bank" : {
     
    "mappings" : {
     
      "properties" : {
     
        "account_number" : {
     
          "type" : "long" # long类型
        },
        "address" : {
     
          "type" : "text", # 文本类型,会进行全文检索,进行分词
          "fields" : {
     
            "keyword" : {
      # addrss.keyword
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
     
          "type" : "long"
        },
        "balance" : {
     
          "type" : "long"
        },
        "city" : {
     
          "type" : "text",
          "fields" : {
     
            "keyword" : {
     
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
     
          "type" : "text",
          "fields" : {
     
            "keyword" : {
     
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
     
          "type" : "text",
          "fields" : {
     
            "keyword" : {
     
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
     
          "type" : "text",
          "fields" : {
     
            "keyword" : {
     
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
     
          "type" : "text",
          "fields" : {
     
            "keyword" : {
     
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
     
          "type" : "text",
          "fields" : {
     
            "keyword" : {
     
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
     
          "type" : "text",
          "fields" : {
     
            "keyword" : {
     
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
  • 新版本改变

    • ElasticSearch7-去掉type概念

    关系型数据库中两个数据表是独立的,即使它们里面有相同名称的列也不影响使用,但ES中不是这样的。elasticsearch是基于Lucene开发的搜索引擎,而ES中不同type下名称相同的field最终在Lucene中的处理方式是一样的。

    两个不同type下的两个user_name,在ES同一个索引下其实被认为是同一个field,你必须在两个不同的type中定义相同的field映射。否则,不同type中的相同字段名称就会在处理中出现冲突的情况,导致Lucene处理效率下降。

    去掉type就是为了提高ES处理数据的效率
    Elasticsearch 7.x URL中的type参数为可选。

    比如,索引一个文档不再要求提供文档类型。

    • Elasticsearch 8.x 不再支持URL中的type参数

    将索引从多类型迁移到单类型,每种类型文档一个独立索引 将已存在的索引下的类型数据,全部迁移到指定位置即可。详见数据迁移

    Specifying types in requests is deprecated. For instance, indexing a document no longer requires a document type. The new index APIs are PUT {index}/_doc/{id} in case of explicit ids and POST {index}/_doc for auto-generated ids. Note that in 7.0, _doc is a permanent part of the path, and represents the endpoint name rather than the document type.
    The include_type_name parameter in the index creation, index template, and mapping APIs will default to false. Setting the parameter at all will result in a deprecation warning.
    The default mapping type is removed.
    Elasticsearch 8.x

    Specifying types in requests is no longer supported.
    The include_type_name parameter is removed.

  • 创建索引并指定映射

PUT /my_index
{
     
  "mappings": {
     
    "properties": {
     
      "age": {
     
        "type": "integer"
      },
      "email": {
     
        "type": "keyword" # 指定为keyword
      },
      "name": {
     
        "type": "text" # 全文检索。保存时候分词,检索时候进行分词匹配
      }
    }
  }
}

输出:

{
     
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}
  • 查看映射
get /my_index

输出

{
     
  "my_index" : {
     
    "aliases" : {
      },
    "mappings" : {
     
      "properties" : {
     
        "age" : {
     
          "type" : "integer"
        },
        "email" : {
     
          "type" : "keyword"
        },
        "name" : {
     
          "type" : "text"
        }
      }
    },
    "settings" : {
     
      "index" : {
     
        "creation_date" : "1633158082897",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "2luZR2cQQl2U0JIQ9P4z5A",
        "version" : {
     
          "created" : "7040299"
        },
        "provided_name" : "my_index"
      }
    }
  }
}
  • 添加新的字段映射
PUT /my_index/_mapping
{
     
  "properties": {
     
    "employee-id": {
     
      "type": "keyword",
      "index": false # 字段不能被检索。检索
    }
  }
}
#这里的 “index”: false,表明新增的字段不能被检索,只是一个冗余字段。
  • 不能更新映射

对于已经存在的字段映射,我们不能更新。更新必须创建新的索引,进行数据迁移。

  • 数据迁移

    先创建新索引(如new_bank)的正确映射

    然后使用如下方式进行数据迁移

    • 6.0以后写法
    POST _reindex
    {
           
      "source":{
           
          "index":"bank" #老索引
       },
      "dest":{
           
          "index":"new_bank" #新索引
       }
    }
    
    • 老版本写法

    Day407&408&409.ES -谷粒商城_第15张图片

    POST _reindex
    {
           
      "source":{
           
          "index":"bank",  #老索引
          "type":"account" #具体的类型:
       },
      "dest":{
           
          "index":"new_bank" #新索引
       }
    }
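
    以本文的bank索引为例,完整迁移流程大致如下(new_bank及其映射仅为示意,字段按实际需要指定):

    #1. 创建带正确映射的新索引(不再有type)
    PUT /new_bank
    {
      "mappings": {
        "properties": {
          "account_number": { "type": "long" },
          "balance": { "type": "long" },
          "address": { "type": "text" },
          "age": { "type": "integer" }
        }
      }
    }

    #2. 执行数据迁移
    POST _reindex
    {
      "source": { "index": "bank", "type": "account" },
      "dest": { "index": "new_bank" }
    }

    #3. 验证迁移结果
    GET new_bank/_search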
    

四、分词

  • 一个tokenizer(分词器)接收一个字符流,将之分割为独立的tokens(词元,通常是独立的单词),然后输出tokens流。

  • 例如:whitespace tokenizer遇到空白字符时分割文本。它会将文本"Quick brown fox!"分割为[Quick,brown,fox!]

  • 该tokenizer(分词器)还负责记录各个terms(词条)的顺序或position位置(用于phrase短语和word proximity词近邻查询),以及term(词条)所代表的原始word(单词)的start(起始)和end(结束)的character offsets(字符串偏移量)(用于高亮显示搜索的内容)。

  • elasticsearch提供了很多内置的分词器(标准分词器),可以用来构建custom analyzers(自定义分词器)。

  • 关于分词器: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis.html

POST _analyze
{
     
  "analyzer": "standard", #使用标准分词器
  "text": "The 2 Brown-Foxes bone." #需要分析的文本
}

执行结果:

{
     
  "tokens" : [
    {
     
      "token" : "the",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "",
      "position" : 0
    },
    {
     
      "token" : "2",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "",
      "position" : 1
    },
    {
     
      "token" : "brown",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "",
      "position" : 2
    },
    {
     
      "token" : "foxes",
      "start_offset" : 12,
      "end_offset" : 17,
      "type" : "",
      "position" : 3
    },
    {
     
      "token" : "bone",
      "start_offset" : 18,
      "end_offset" : 22,
      "type" : "",
      "position" : 4
    }
  ]
}

  • 对于中文,我们需要安装额外的分词器
    • 安装ik分词器

      • github地址:https://github.com/medcl/elasticsearch-analysis-ik/releases,找到你对应的版本

      • 所有的语言分词,默认使用的都是“Standard Analyzer”,但是这些分词器针对于中文的分词,并不友好。为此需要安装中文的分词器。

      • 在前面安装的elasticsearch时,我们已经将elasticsearch容器的“/usr/share/elasticsearch/plugins”目录,映射到宿主机的“ /mydata/elasticsearch/plugins”目录下,所以比较方便的做法就是下载“/elasticsearch-analysis-ik-7.4.2.zip”文件,然后解压到目录ik下即可。安装完毕后,需要重启elasticsearch容器

      • 确认是否安装好了分词器

        • 通过 docker ps 查看ES容器的id

        • 通过 docker exec -it ES容器id /bin/bash 进入容器内部

        • 通过插件,查看是否安装成功

        • Day407&408&409.ES -谷粒商城_第16张图片

        • 退出ES容器

          exit;
          
        • 重启ES

          docker restart elasticsearch
          


    • 测试分词器

      • ik_smart 智能分词
      GET _analyze
      {
               
         "analyzer": "ik_smart", 
         "text":"我是阿昌"
      }
      
      {
               
        "tokens" : [
          {
               
            "token" : "我",
            "start_offset" : 0,
            "end_offset" : 1,
            "type" : "CN_CHAR",
            "position" : 0
          },
          {
               
            "token" : "是",
            "start_offset" : 1,
            "end_offset" : 2,
            "type" : "CN_CHAR",
            "position" : 1
          },
          {
               
            "token" : "阿昌",
            "start_offset" : 2,
            "end_offset" : 4,
            "type" : "CN_WORD",
            "position" : 2
          }
        ]
      }
      
      
      • ik_max_word 最大组合
      GET _analyze
      {
               
         "analyzer": "ik_max_word", 
         "text":"我是温州人"
      }
      
      {
               
        "tokens" : [
          {
               
            "token" : "我",
            "start_offset" : 0,
            "end_offset" : 1,
            "type" : "CN_CHAR",
            "position" : 0
          },
          {
               
            "token" : "是",
            "start_offset" : 1,
            "end_offset" : 2,
            "type" : "CN_CHAR",
            "position" : 1
          },
          {
               
            "token" : "温州人",
            "start_offset" : 2,
            "end_offset" : 5,
            "type" : "CN_WORD",
            "position" : 2
          },
          {
               
            "token" : "温州",
            "start_offset" : 2,
            "end_offset" : 4,
            "type" : "CN_WORD",
            "position" : 3
          },
          {
               
            "token" : "人",
            "start_offset" : 4,
            "end_offset" : 5,
            "type" : "CN_CHAR",
            "position" : 4
          }
        ]
      }
      
      
    • 自定义词库

      修改/mydata/elasticsearch/plugins/ik/config中的IKAnalyzer.cfg.xml

      
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
      <properties>
              <comment>IK Analyzer 扩展配置</comment>
              <!--用户可以在这里配置自己的扩展字典 -->
              <entry key="ext_dict"></entry>
              <!--用户可以在这里配置自己的扩展停止词字典-->
              <entry key="ext_stopwords"></entry>
              <!--用户可以在这里配置远程扩展字典 -->
              <!-- <entry key="remote_ext_dict">words_location</entry> -->
              <!--用户可以在这里配置远程扩展停止词字典-->
              <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
      </properties>
      

      修改完成后,需要重启elasticsearch容器,否则修改不生效。

      docker restart elasticsearch

      更新完成后,es只会对于新增的数据用更新分词。历史数据是不会重新分词的。如果想要历史数据重新分词,需要执行:

      POST my_index/_update_by_query?conflicts=proceed
      

五、安装Nginx

通过nginx来为es提供远程的自定义分词

  • 随便启动一个nginx实例,只是为了复制出配置
docker run -p80:80 --name nginx -d nginx:1.10
  • 将容器内的配置文件拷贝到/usr/local/nginx/conf/ 下
docker container cp nginx:/etc/nginx . #别忘了点,且前面有一个空格

Day407&408&409.ES -谷粒商城_第17张图片

  • 停止nginx容器,并删除
docker stop nginx  #停止nginx容器
docker rm nginx    #删除nginx镜像
  • 将从容器中复制出来的nginx文件夹改名为conf,并移动到/mydata/nginx/目录下

Day407&408&409.ES -谷粒商城_第18张图片

  • 在nginx文件夹中创建如下
mkdir -p /mydata/nginx/html
mkdir -p /mydata/nginx/logs
  • 最后nginx文件夹中有的文件夹为

Day407&408&409.ES -谷粒商城_第19张图片

  • 再次启动nginx,并指定我们上面的三个目录做docker挂载
docker run -p 80:80 --name nginx \
 -v /mydata/nginx/html:/usr/share/nginx/html \
 -v /mydata/nginx/logs:/var/log/nginx \
 -v /mydata/nginx/conf/:/etc/nginx \
 -d nginx:1.10
  • 创建“/mydata/nginx/html/index.html”文件,测试是否能够正常访问
touch index.html

Day407&408&409.ES -谷粒商城_第20张图片

  • 访问:http://nginx所在主机的IP:80/index.html

Day407&408&409.ES -谷粒商城_第21张图片

  • 安装好nginx,把Nginx当做tomcat来用
mkdir /mydata/nginx/html/es
cd /mydata/nginx/html/es/
vim fenci.txt

#在里面输入
阿昌
乔碧罗殿下
#保存操作
  • 测试http://192.168.109.101/es/fenci.txt,乱码问题我们先不管

Day407&408&409.ES -谷粒商城_第22张图片

  • 此时再配置上面的es,ik远程分词器的地址

<entry key="remote_ext_dict">http://192.168.109.101/es/fenci.txt</entry>

Day407&408&409.ES -谷粒商城_第23张图片

  • 重启es
docker restart elasticsearch
  • 再次测试

Day407&408&409.ES -谷粒商城_第24张图片

  • 设置nginx开机自动启动
docker update nginx --restart=always

六、elasticsearch-Rest-Client

1、9300: TCP

spring-data-elasticsearch:transport-api.jar;

  • springboot版本不同,transport-api.jar不同,不能适配es版本
  • 7.x已经不建议使用,8以后就要废弃

2、9200: HTTP

有诸多包

  • jestClient: 非官方,更新慢;
  • RestTemplate:模拟HTTP请求,ES很多操作需要自己封装,麻烦;
  • HttpClient:同上;
  • Elasticsearch-Rest-Client:官方RestClient,封装了ES操作,API层次分明,上手简单;
  • 最终选择Elasticsearch-Rest-Client(elasticsearch-rest-high-level-client)

七、SpringBoot整合ElasticSearch

  • 搭建elasticsearch模块

Day407&408&409.ES -谷粒商城_第25张图片

Day407&408&409.ES -谷粒商城_第26张图片

  • 引入依赖,改变我们项目springboot版本2.2.1.RELEASE
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>
<dependency>
    <groupId>com.achang.achangmall</groupId>
    <artifactId>achangmall-common</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>

Day407&408&409.ES -谷粒商城_第27张图片

  • 查询我们springboot版本里面,对应的es的版本控制为6.8.4

Day407&408&409.ES -谷粒商城_第28张图片

  • 我们需要修改成我们es所对应的版本
<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.4.2</elasticsearch.version>
</properties>
  • 配置
spring:
  cloud:
    nacos:
      discovery:
        server-addr: localhost:8848
  application:
    name: achangmall-search
server:
  port: 12000
  • com.achang.achangmall.search.conf.ElasticsearchConfig配置类

官方建议把requestOptions创建成单实例

@Configuration
public class ElasticsearchConfig {

    //RequestOptions单实例,所有请求共用(后面调用client时作为第二个参数传入)
    public static final RequestOptions COMMON_OPTIONS;
    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestClientBuilder builder = RestClient.builder(new HttpHost("192.168.109.101", 9200, "http"));
        RestHighLevelClient client = new RestHighLevelClient(builder);
        return client;
    }
}
  • 因为引入的Common的依赖,所以我们需要排除掉数据源
@SpringBootApplication(exclude = DataSourceAutoConfiguration.class)
  • 在测试类中测试RestHighLevelClient是否注入成功
package com.achang.achangmall;

import com.achang.achangmall.search.AchangmallSearchApplication;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@SpringBootTest(classes = AchangmallSearchApplication.class)
@RunWith(SpringRunner.class)
public class AchangmallSearchApplicationTests {
     

    @Autowired
    private RestHighLevelClient client;

    @Test
    public void contextLoads() {
     
        System.out.println(client);
    }

}

Day407&408&409.ES -谷粒商城_第29张图片


  • 编写测试类

官方API文档:

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.x/java-rest-high.html

  • 保存方式分为同步异步,异步方式多了个listener回调

  • 设置索引

@Test
public void test1() throws Exception{
     
    IndexRequest indexRequest = new IndexRequest("users");//存储索引
    indexRequest.id("1");//id
    //        indexRequest.source("username","achang","age",18,"gender","男");

    User user = new User();
    user.setUsername("achang");
    user.setAge(18);
    user.setGender("男");
    String json = JSON.toJSONString(user);//转换成json字符串
    indexRequest.source(json, XContentType.JSON);

    //执行操作
    IndexResponse response = client.index(indexRequest, ElasticsearchConfig.COMMON_OPTIONS);

    
    //IndexResponse[index=users,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":2,"successful":1,"failed":0}]
    System.out.println(response);
}
  • 基本的crud操作可以参考官方文档如上

  • 这里测试一个复杂查询

官方文档:https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search.html

@Test
public void find() throws IOException {
     
    // 1 创建检索请求
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // 构造检索条件
    //        sourceBuilder.query();
    //        sourceBuilder.from();
    //        sourceBuilder.size();
    //        sourceBuilder.aggregation();
    sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
    System.out.println(sourceBuilder.toString());

    searchRequest.source(sourceBuilder);

    // 2 执行检索
    SearchResponse response = client.search(searchRequest, ElasticsearchConfig.COMMON_OPTIONS);
    // 3 分析响应结果
    System.out.println(response.toString());
}
//在上面检索的基础上增加聚合分析(与上面的find方法二选一保留,避免同名方法冲突)
@Test
public void findWithAgg() throws IOException {
     
    // 1 创建检索请求
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // 构造检索条件
    //        sourceBuilder.query();
    //        sourceBuilder.from();
    //        sourceBuilder.size();
    //        sourceBuilder.aggregation();
    sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
    //AggregationBuilders工具类构建AggregationBuilder
    // 构建第一个聚合条件:按照年龄的值分布
    TermsAggregationBuilder agg1 = AggregationBuilders.terms("agg1").field("age").size(10);// 聚合名称
    // 参数为AggregationBuilder
    sourceBuilder.aggregation(agg1);
    // 构建第二个聚合条件:平均薪资
    AvgAggregationBuilder agg2 = AggregationBuilders.avg("agg2").field("balance");
    sourceBuilder.aggregation(agg2);

    System.out.println("检索条件"+sourceBuilder.toString());

    searchRequest.source(sourceBuilder);

    // 2 执行检索
    SearchResponse response = client.search(searchRequest, ElasticsearchConfig.COMMON_OPTIONS);
    // 3 分析响应结果
    System.out.println(response.toString());
    SearchHits hits = response.getHits();
    SearchHit[] hits1 = hits.getHits();
    for (SearchHit hit : hits1) {
     
        hit.getId();
        hit.getIndex();
        String sourceAsString = hit.getSourceAsString();
        Account account = JSON.parseObject(sourceAsString, Account.class);//将json转成对应bean对象
        System.out.println(account);
    }

    //获取检索到的聚合信息(聚合结果挂在整个响应上,在hit循环外获取一次即可)
    Aggregations aggregations = response.getAggregations();
    Terms agg1 = aggregations.get("agg1");//年龄分布,terms聚合
    for (Terms.Bucket bucket : agg1.getBuckets()) {
        System.out.println("年龄:" + bucket.getKeyAsString() + " 人数:" + bucket.getDocCount());
    }
    Avg agg2 = aggregations.get("agg2");//平均薪资,avg聚合
    System.out.println("平均薪资:" + agg2.getValue());
}
