Getting Started with Elasticsearch

Elasticsearch is built on the open-source library Lucene. You cannot use Lucene directly, though: you have to write code against its interfaces yourself. Elasticsearch wraps Lucene and exposes a REST API, so it works out of the box.

1. Installation

Environment

#lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.5 (Final)
Release:    6.5
Codename:   Final

#uname -a
Linux CDVM-213010030 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Dependencies

Elasticsearch requires Java 8 or later.

1.1 Installing Java

Download the JDK from the Java SE Development Kit 8 - Downloads page.

[elsearch@CDVM-213010030 tmp]$ ls /tmp/installed/
jdk-8u121-linux-x64.rpm  jdk-8u121-linux-x64.tar.gz
[elsearch@CDVM-213010030 tmp]$ rpm -ivh jdk-8u121-linux-x64.rpm
[elsearch@CDVM-213010030 tmp]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

1.2 Installing Elasticsearch

Download the latest version of Elasticsearch: https://www.elastic.co/guide/...

Unpack the archive:

# tar zxvf elasticsearch-5.2.0.tar.gz -C /opt

Start Elasticsearch by running the elasticsearch script:

# cd /opt/elasticsearch-5.2.0/bin
# ./elasticsearch

If you run it as the root account, the following error appears:

[root@CDVM-213010030 elasticsearch-5.2.0]# ./bin/elasticsearch
[2017-02-08T14:22:45,125][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-5.2.0.jar:5.2.0]
    ... 6 more

This restriction is a deliberate security measure: because Elasticsearch can accept and execute user-supplied scripts, it refuses to run as root. Create a dedicated user to run Elasticsearch instead.

Create an elsearch group and an elsearch user:

groupadd elsearch
useradd elsearch -g elsearch -p elsearch

Change the owner and group of the Elasticsearch directory and everything in it to elsearch:elsearch:

chown -R elsearch:elsearch  /opt/elasticsearch-5.2.0

Switch to the elsearch user and start it again:

# su elsearch
$ cd elasticsearch-5.2.0/bin
$ ./elasticsearch

Startup prints output like the following:

[elsearch@CDVM-213010030 elasticsearch-5.2.0]$ ./bin/elasticsearch
[2017-02-08T15:22:52,677][WARN ][o.e.b.JNANatives         ] unable to install syscall filter: 
java.lang.UnsupportedOperationException: seccomp unavailable: requires kernel 3.5+ with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER compiled in
    at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:350) ~[elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:638) ~[elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:215) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:99) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:110) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:203) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.2.0.jar:5.2.0]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.2.0.jar:5.2.0]
[2017-02-08T15:22:52,768][INFO ][o.e.n.Node               ] [] initializing ...
[2017-02-08T15:22:52,838][INFO ][o.e.e.NodeEnvironment    ] [qm6aUUo] using [1] data paths, mounts [[/ (/dev/vda)]], net usable_space [184gb], net total_space [196.8gb], spins? [possibly], types [ext4]
[2017-02-08T15:22:52,839][INFO ][o.e.e.NodeEnvironment    ] [qm6aUUo] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-02-08T15:22:52,840][INFO ][o.e.n.Node               ] node name [qm6aUUo] derived from node ID [qm6aUUoUScO_S16Sod_7Bw]; set [node.name] to override
[2017-02-08T15:22:52,841][INFO ][o.e.n.Node               ] version[5.2.0], pid[22947], build[24e05b9/2017-01-24T19:52:35.800Z], OS[Linux/2.6.32-431.el6.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_121/25.121-b13]
[2017-02-08T15:22:53,573][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [aggs-matrix-stats]
[2017-02-08T15:22:53,573][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [ingest-common]
[2017-02-08T15:22:53,573][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [lang-expression]
[2017-02-08T15:22:53,573][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [lang-groovy]
[2017-02-08T15:22:53,573][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [lang-mustache]
[2017-02-08T15:22:53,573][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [lang-painless]
[2017-02-08T15:22:53,573][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [percolator]
[2017-02-08T15:22:53,574][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [reindex]
[2017-02-08T15:22:53,574][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [transport-netty3]
[2017-02-08T15:22:53,574][INFO ][o.e.p.PluginsService     ] [qm6aUUo] loaded module [transport-netty4]
[2017-02-08T15:22:53,574][INFO ][o.e.p.PluginsService     ] [qm6aUUo] no plugins loaded
[2017-02-08T15:22:55,217][INFO ][o.e.n.Node               ] initialized
[2017-02-08T15:22:55,218][INFO ][o.e.n.Node               ] [qm6aUUo] starting ...
[2017-02-08T15:22:55,288][WARN ][i.n.u.i.MacAddressUtil   ] Failed to find a usable hardware address from the network interfaces; using random bytes: 24:be:3c:15:2d:4a:1d:e3
[2017-02-08T15:22:55,336][INFO ][o.e.t.TransportService   ] [qm6aUUo] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2017-02-08T15:22:55,342][WARN ][o.e.b.BootstrapChecks    ] [qm6aUUo] max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2017-02-08T15:22:55,342][WARN ][o.e.b.BootstrapChecks    ] [qm6aUUo] system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2017-02-08T15:22:58,384][INFO ][o.e.c.s.ClusterService   ] [qm6aUUo] new_master {qm6aUUo}{qm6aUUoUScO_S16Sod_7Bw}{qojdwy-UQmODn4nlWEboQA}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-02-08T15:22:58,401][INFO ][o.e.h.HttpServer         ] [qm6aUUo] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2017-02-08T15:22:58,401][INFO ][o.e.n.Node               ] [qm6aUUo] started
[2017-02-08T15:22:58,424][INFO ][o.e.g.GatewayService     ] [qm6aUUo] recovered [0] indices into cluster_state

Note: Elasticsearch 5.x's system call filter (seccomp) requires a Linux kernel of 3.5 or later; on older kernels, such as the CentOS 6 kernel used here, the filter must be disabled (see Error 2 below).

To start Elasticsearch as a background daemon:

./elasticsearch -d

Edit the configuration file config/elasticsearch.yml:

# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.system_call_filter: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 10.213.10.30
#network.host: 127.0.0.1
#
# Set a custom port for HTTP:
#
http.port: 9200

By default, Elasticsearch only accepts connections from the local machine. To allow remote access, edit config/elasticsearch.yml in the installation directory, uncomment network.host, set it to 0.0.0.0, and restart Elasticsearch.

Error 1

ERROR: bootstrap checks failed
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2017-02-08T16:13:57,827][INFO ][o.e.n.Node               ] [qm6aUUo] stopping ...
[2017-02-08T16:13:57,925][INFO ][o.e.n.Node               ] [qm6aUUo] stopped
[2017-02-08T16:13:57,925][INFO ][o.e.n.Node               ] [qm6aUUo] closing ...
[2017-02-08T16:13:57,946][INFO ][o.e.n.Node               ] [qm6aUUo] closed

Solution:
Apply the setting temporarily:
$ sudo sysctl -w vm.max_map_count=262144
Or modify /etc/sysctl.conf, adding "vm.max_map_count = 262144";
the setting takes effect as soon as it is reloaded:
[root@CDVM-213010030 elasticsearch-5.2.0]# sysctl -p
[root@CDVM-213010030 elasticsearch-5.2.0]# sysctl -a | grep max_map_count
vm.max_map_count = 262144

Error 2

ERROR: bootstrap checks failed
system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2017-02-08T16:39:43,123][INFO ][o.e.n.Node               ] [qm6aUUo] stopping ...
[2017-02-08T16:39:43,136][INFO ][o.e.n.Node               ] [qm6aUUo] stopped
[2017-02-08T16:39:43,136][INFO ][o.e.n.Node               ] [qm6aUUo] closing ...
[2017-02-08T16:39:43,242][INFO ][o.e.n.Node               ] [qm6aUUo] closed

Cause:

CentOS 6's kernel does not support seccomp, while ES 5.2.0 defaults bootstrap.system_call_filter to true and checks for seccomp at startup. The bootstrap check fails, and Elasticsearch refuses to start.

Solution:

Set bootstrap.system_call_filter to false in elasticsearch.yml, under the Memory section:
bootstrap.memory_lock: false
bootstrap.system_call_filter: false

See the related issues for details.

Error 3

max file descriptors [4096] for elasticsearch process likely too low, increase to at least [65536]

Solution:

Edit /etc/security/limits.conf, adding or modifying the following lines (the leading * applies the limit to all users):
* hard nofile 65536
* soft nofile 65536

503 Error

# curl -X PUT 'localhost:9200/weather'
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}

Solution:

In elasticsearch.yml, set
node.name: node-1
and
cluster.initial_master_nodes: ["node-1"]
then restart Elasticsearch. (cluster.initial_master_nodes is an Elasticsearch 7.x setting.)

It works after a restart:

# supervisorctl restart elasticSearch
# curl -X PUT '127.0.0.1:9200/weather'
{"acknowledged":true,"shards_acknowledged":true,"index":"weather"}

The full configuration, for reference:

# ------------------------------------ Node ------------------------------------
node.name: node-1
# ---------------------------------- Network -----------------------------------
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
http.port: 9200
# --------------------------------- Discovery ----------------------------------
cluster.initial_master_nodes: ["node-1"]
# ---------------------------------- Gateway -----------------------------------
gateway.recover_after_nodes: 1
# ---------------------------------- Various -----------------------------------
http.cors.enabled: true
http.cors.allow-origin: "*"

1.3 Testing

$ curl http://10.213.10.30:9200/?pretty
{
  "name" : "qm6aUUo",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "UXrjeTP6SmmOIZOZ4j9I4w",
  "version" : {
    "number" : "5.2.0",
    "build_hash" : "24e05b9",
    "build_date" : "2017-01-24T19:52:35.800Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.0"
  },
  "tagline" : "You Know, for Search"
}

See the official installation documentation for more.

2. Basic Concepts

2.1 Near Real-Time (NRT)

Elasticsearch is a near real-time search platform: there is a slight delay (typically one second) between the time a document is indexed and the time it becomes searchable.

2.2 Cluster

A cluster is one or more nodes that together hold all of your data and provide indexing and search across it. A cluster is identified by a unique name, which defaults to "elasticsearch". The name matters, because a node can only join a cluster by specifying that cluster's name. Setting the name explicitly is good practice in production, but the default is fine for testing and development.

Note that a cluster containing a single node is perfectly valid. You can also run multiple clusters, distinguished by name.

2.3 Node

A node is a single server in your cluster; as part of the cluster, it stores your data and participates in the cluster's indexing and search. Like a cluster, a node is identified by a name, which by default is a random Marvel character name assigned at startup. The name matters for administration, since it is how you work out which servers on your network correspond to which nodes in the Elasticsearch cluster.

A node joins a specific cluster by configuring that cluster's name. By default every node joins a cluster called "elasticsearch", which means that if you start several nodes on your network, and assuming they can discover one another, they will automatically form and join a cluster named "elasticsearch".

A cluster can have as many nodes as you like. And if no Elasticsearch node is currently running on your network, starting one will by default create and join a single-node cluster called "elasticsearch".

2.4 Index

Elasticsearch stores data in one or more indices. In relational-database terms, an index is comparable to a database instance. The basic unit an index stores and retrieves is the document.

An index is a collection of documents with similar characteristics. For example, you might have one index for customer data, another for a product catalog, and another for order data. An index is identified by a name (which must be all lowercase), and that name is used whenever you index, search, update, or delete its documents. You can create as many indices as you want in a cluster.

Elasticsearch indexes every field, processing them into an inverted index; queries then go straight to that index.

So Index is the top-level unit of data management in Elasticsearch: the equivalent of a single database. Each Index (that is, each database) must have a lowercase name.
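To make the inverted-index idea concrete, here is a minimal Python sketch. The sample documents and the whitespace tokenizer are illustrative assumptions, not Elasticsearch's actual analyzer:

```python
from collections import defaultdict

# Two tiny sample documents (illustrative only).
docs = {
    1: "elasticsearch is a search engine",
    2: "lucene is a search library",
}

# The inverted index maps each term to the set of document IDs containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():          # naive whitespace tokenizer
        inverted[term].add(doc_id)

def search(term):
    """Look the term up directly in the index instead of scanning every document."""
    return sorted(inverted.get(term, set()))

print(search("search"))  # [1, 2]: both documents contain "search"
```

A query is now a dictionary lookup rather than a full scan, which is why lookups stay fast as the document count grows.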

The following command lists every Index on the current node.

$ curl -X GET 'http://localhost:9200/_cat/indices?v'

2.5 Type

Documents can be grouped. Within the weather Index, for instance, they could be grouped by city (Beijing and Shanghai) or by weather (sunny and rainy). Such a grouping is called a Type: a virtual, logical grouping used to filter Documents.

Within an index you can define one or more types. A type is a logical category or partition of your index whose semantics are entirely up to you. Typically, a type is defined for documents that share a set of fields. Say you run a blogging platform and store all of its data in one index: you might define one type for user data, another for blog posts, and another for comments.

You can think of a type as the relational-database equivalent of a table.

That analogy, however, conflicts with the following guidance:

Different Types should have similar schemas; an id field, for example, cannot be a string in one group and a number in another. This is one way Types differ from relational tables. Data of entirely different natures (such as products and logs) should go into two separate Indices, not two Types within one Index (even though the latter is possible).

The following command lists the Types contained in each Index.

$ curl 'localhost:9200/_mapping?pretty=true'

As planned, Elasticsearch 6.x allows only one Type per Index, and 7.x removes Types entirely.

2.6 Document

Each individual record in an Index is called a Document, and many Documents make up an Index.

A document is the basic unit of information that can be indexed. For example, you can have a document for a single customer, a document for a single product, and a document for a single order, and you can store as many documents as you want in an index/type.

Note that physically, a document always lives inside an index.

A Document is expressed in JSON; here is an example.

{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}

Documents in the same Index are not required to share a schema, but it is best if they do, since that improves search efficiency.

2.7 Shards and Replicas

An index can store far more data than fits on a single node's hardware. For example, an index of a billion documents taking up 1 TB of disk may not fit on any one node's disk, or a single node may be too slow to serve search requests over it.

To solve this, Elasticsearch can subdivide an index into pieces called shards. When you create an index, you can specify how many shards you want. Each shard is itself a fully functional, independent "index" that can be placed on any node in the cluster.

Sharding matters for two main reasons:

  • It lets you horizontally split/scale your content volume
  • It lets you run distributed, parallel operations across shards (which sit on multiple nodes), increasing performance and throughput

How a shard is placed, and how its documents are aggregated back into search results, is managed entirely by Elasticsearch and is transparent to you as a user.

In a network or cloud environment, failure can strike at any time. A failover mechanism for when a shard or node goes offline or disappears is very useful and strongly recommended. For this, Elasticsearch lets you make one or more copies of each shard; these copies are called replica shards, or simply replicas.

Replication matters for two main reasons:

  • It provides high availability if a shard or node fails. For this reason, a replica shard is never placed on the same node as its primary shard.
  • It scales your search volume/throughput, since searches can run on all replicas in parallel

In summary, each index can be split into multiple shards, and an index can be replicated zero times (no replicas) or more. Once replicated, each index has primary shards (the originals) and replica shards (copies of the primaries). The numbers of shards and replicas can be specified when the index is created. After creation, you can change the number of replicas at any time, but not the number of shards.

By default, Elasticsearch gives each index 5 primary shards and 1 replica. This means that if your cluster has at least two nodes, an index will have 5 primary shards and 5 replica shards (one complete copy), for a total of 10 shards.
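The shard arithmetic above, together with the routing rule Elasticsearch uses to pick a primary shard for each document (shard_num = hash(_routing) % number_of_primary_shards, where _routing defaults to the document Id), can be sketched in Python. crc32 here is a stand-in for the murmur3 hash Elasticsearch actually uses:

```python
import zlib

# The 5.x defaults described above.
primaries = 5
replicas = 1          # replica copies per primary shard

total_shards = primaries * (1 + replicas)
print(total_shards)   # 10, as in the text above

def route(doc_id, num_primaries=primaries):
    """Pick the primary shard for a document Id (crc32 stands in for murmur3)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primaries
```

The modulo in the routing rule is also why the number of primary shards cannot change after index creation: changing it would re-route existing documents to different shards.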

3. Creating and Deleting an Index

3.1 Creating an Index

$ curl -X PUT 'localhost:9200/weather'

The server returns a JSON object whose acknowledged field indicates the operation succeeded.

{
  "acknowledged":true,
  "shards_acknowledged":true
}

3.2 Deleting an Index

$ curl -X DELETE 'localhost:9200/weather'

4. Data Operations

4.1 Adding Records

There are two ways to add a record.

The first is a POST request that does not specify an Id. For example, to add a person record to /accounts/person:

$ curl -X POST 'localhost:9200/accounts/person' -d '
{
  "user": "李四",
  "title": "工程师",
  "desc": "系统管理"
}'

The code above sends a POST request to /accounts/person to add a record. In the JSON object the server returns, the _id field is a random string.

{
  "_index":"accounts",
  "_type":"person",
  "_id":"AV3qGfrC6jMbsbXb6k1p",
  "_version":1,
  "result":"created",
  "_shards":{"total":2,"successful":1,"failed":0},
  "created":true
}

The second is a PUT request to a path like /accounts/person/1, where the trailing 1 is the record's Id. It does not have to be a number; any string (such as abc) works.

$ curl -X PUT 'localhost:9200/accounts/person/1' -d '
{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}' 

The JSON object the server returns includes the Index, Type, Id, Version, and other information.

{
  "_index":"accounts",
  "_type":"person",
  "_id":"1",
  "_version":1,
  "result":"created",
  "_shards":{"total":2,"successful":1,"failed":0},
  "created":true
}
Note: if you run the command above without first creating the Index (accounts in this example), Elasticsearch does not report an error; it simply creates the specified Index. So type carefully, and don't misspell the Index name.

4.2 Retrieving a Record

Send a GET request to /Index/Type/Id to retrieve a record.

$ curl 'localhost:9200/accounts/person/1?pretty=true'

The code above retrieves the record /accounts/person/1; the URL parameter pretty=true asks for a human-readable response.

In the returned data, the found field indicates the query succeeded, and the _source field holds the original record.

{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "user" : "张三",
    "title" : "工程师",
    "desc" : "数据库管理"
  }
}

If the Id is wrong, no data is found, and the found field is false.

$ curl 'localhost:9200/accounts/person/abc?pretty=true'

{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "abc",
  "found" : false
}

4.3 Deleting a Record

To delete a record, send a DELETE request.

$ curl -X DELETE 'localhost:9200/accounts/person/1'

4.4 Updating a Record

To update a record, send the data again with a PUT request.

$ curl -X PUT 'localhost:9200/accounts/person/1' -d '
{
    "user" : "张三",
    "title" : "工程师",
    "desc" : "数据库管理,软件开发"
}' 

{
  "_index":"accounts",
  "_type":"person",
  "_id":"1",
  "_version":2,
  "result":"updated",
  "_shards":{"total":2,"successful":1,"failed":0},
  "created":false
}

In the code above, we changed the original data from "数据库管理" to "数据库管理,软件开发". In the returned result, several fields have changed.

"_version" : 2,
"result" : "updated",
"created" : false

As you can see, the record's Id is unchanged, but the version has gone from 1 to 2, the operation type (result) from created to updated, and the created field to false, because this time the record is not new.
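A toy Python sketch of these created/updated/_version semantics, with an in-memory dict standing in for the index (an illustration, not Elasticsearch's actual implementation):

```python
# In-memory "index": doc Id -> {"_version": n, "_source": body}.
store = {}

def index_doc(doc_id, body):
    """Mimic a PUT to /index/type/id: create on first write, bump _version after."""
    created = doc_id not in store
    version = 1 if created else store[doc_id]["_version"] + 1
    store[doc_id] = {"_version": version, "_source": body}
    return {"_id": doc_id, "_version": version,
            "result": "created" if created else "updated"}

r1 = index_doc("1", {"user": "张三", "desc": "数据库管理"})
r2 = index_doc("1", {"user": "张三", "desc": "数据库管理,软件开发"})
print(r1["result"], r1["_version"])  # created 1
print(r2["result"], r2["_version"])  # updated 2
```

The second write to the same Id replaces the whole document and increments the version, which is exactly the behavior shown in the responses above.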

5. Querying Data

5.1 Returning All Records

A GET request to /Index/Type/_search returns all records.

$ curl 'localhost:9200/accounts/person/_search'

{
  "took":2,
  "timed_out":false,
  "_shards":{"total":5,"successful":5,"failed":0},
  "hits":{
    "total":2,
    "max_score":1.0,
    "hits":[
      {
        "_index":"accounts",
        "_type":"person",
        "_id":"AV3qGfrC6jMbsbXb6k1p",
        "_score":1.0,
        "_source": {
          "user": "李四",
          "title": "工程师",
          "desc": "系统管理"
        }
      },
      {
        "_index":"accounts",
        "_type":"person",
        "_id":"1",
        "_score":1.0,
        "_source": {
          "user" : "张三",
          "title" : "工程师",
          "desc" : "数据库管理,软件开发"
        }
      }
    ]
  }
}

In the result above, the took field is the time the operation took (in milliseconds), timed_out indicates whether the request timed out, and hits holds the matching records.

The response also provides the following information about the search request:

  • took – how long it took Elasticsearch to run the query, in milliseconds
  • timed_out – whether or not the search request timed out
  • _shards – how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped.
  • max_score – the score of the most relevant document found
  • hits.total.value - how many matching documents were found
  • hits.sort - the document’s sort position (when not sorting by relevance score)
  • hits._score - the document’s relevance score (not applicable when using match_all)
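A short Python sketch of pulling these fields out of a response. The body below is a trimmed, illustrative version of the 5.x-style response above; note that in 7.x, hits.total becomes an object like {"value": 2, "relation": "eq"} rather than a plain number:

```python
import json

# Trimmed search response in the 5.x shape shown above (values illustrative).
raw = """
{
  "took": 2,
  "timed_out": false,
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {"_id": "1", "_score": 1.0, "_source": {"user": "张三"}}
    ]
  }
}
"""
response = json.loads(raw)

took_ms = response["took"]               # query time in milliseconds
total_hits = response["hits"]["total"]   # number of matching documents
sources = [h["_source"] for h in response["hits"]["hits"]]
print(took_ms, total_hits, sources)
```

In client code you usually iterate over hits.hits and read each hit's _source, as the list comprehension does here.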

5.2 Paginating Results

Each search request is self-contained: Elasticsearch does not maintain any state information across requests. To page through the search hits, specify the from and size parameters in your request.

For example, the following request gets hits 10 through 19:

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}

The from field sets the offset; the size field sets how many results to return per page.
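The from/size arithmetic can be sketched as a small Python helper (page_params is a hypothetical name; pages are numbered from 0 here):

```python
def page_params(page, size=10):
    """Return the from/size body fields for the given page of results."""
    return {"from": page * size, "size": size}

print(page_params(1))  # {'from': 10, 'size': 10} -> hits 10 through 19
```

So the request above, which fetches hits 10 through 19, corresponds to page_params(1) with the default size of 10.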

Now that you’ve seen how to submit a basic search request, you can start to construct queries that are a bit more interesting than match_all.

5.3 Matching Individual Terms

To search for specific terms within a field, you can use a match query. For example, the following request searches the address field to find customers whose addresses contain mill or lane:

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

The query above searches for "mill" or "lane": an "or" relationship.

To search for multiple keywords with an "and" relationship, you must use a bool query.

$ curl 'localhost:9200/bank/_search'  -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

Note: Elasticsearch queries are unusual in that they use their own query syntax, and the query above requires the GET request to carry a body.

5.4 Matching a Phrase

To perform a phrase search rather than matching individual terms, you use match_phrase instead of match. For example, the following request only matches addresses that contain the phrase mill lane:

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}

5.5 Combining Multiple Query Criteria: Matching and Excluding

To construct more complex queries, you can use a bool query to combine multiple query criteria. You can designate criteria as required (must match), desirable (should match), or undesirable (must not match).

For example, the following request searches the bank index for accounts that belong to customers who are 40 years old, but excludes anyone who lives in Idaho (ID):

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

Each must, should, and must_not element in a Boolean query is referred to as a query clause. How well a document meets the criteria in each must or should clause contributes to the document’s relevance score. The higher the score, the better the document matches your search criteria. By default, Elasticsearch returns documents ranked by these relevance scores.

The criteria in a must_not clause is treated as a filter. It affects whether or not the document is included in the results, but does not contribute to how documents are scored. You can also explicitly specify arbitrary filters to include or exclude documents based on structured data.

For example, the following request uses a range filter to limit the results to accounts with a balance between $20,000 and $30,000 (inclusive).

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

See Start searching for more.

6. Deleting Documents

6.1 Delete

Request

DELETE /<index>/_doc/<_id>

Description

You use DELETE to remove a document from an index. You must specify the index name and document ID.

Examples

Delete the JSON document 1 from the twitter index:

DELETE /twitter/_doc/1

The API returns the following result:

{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "_doc",
    "_id" : "1",
    "_version" : 2,
    "_primary_term": 1,
    "_seq_no": 5,
    "result": "deleted"
}

6.2 Delete by query

Request

POST /<index>/_delete_by_query

Deletes documents that match the specified query.

POST /twitter/_delete_by_query
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}

Response body

The JSON response looks like this:

{
  "took" : 147,
  "timed_out": false,
  "total": 119,
  "deleted": 119,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures" : [ ]
}
Documents with a version equal to 0 cannot be deleted using delete by query because internal versioning does not support 0 as a valid version number.

REST APIs

References

Elasticsearch official documentation
Elasticsearch guide
Mastering Elasticsearch
全文搜索引擎 Elasticsearch 入门教程 (Getting Started with the Elasticsearch Full-Text Search Engine, in Chinese)
