Author: ReyCG
Source: ReyCG's blog, https://www.cnblogs.com/reycg-blog/
Contents
- Introduction
- Overview
- High REST Client
- Getting started
- Compatibility
- Javadoc
- Maven configuration
- Dependencies
- Initialization
- Document APIs
- Index API
- GET API
- Exists API
- Delete API
- Update API
- Bulk API
- Multi-Get API
- Conclusion
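Introduction
In my spare time I write Python crawlers to collect data for my mini program; at work I still have to get through the tasks my manager assigns, which means Java, my main trade.
And so a new requirement arrived: the group is upgrading Elasticsearch from 2.2 to 6.3. Our earlier projects used Spring Data Elasticsearch to create, read, update and delete ES data, and a jump of that size ran into all sorts of API incompatibilities. On top of that, version constraints from the rest of our Spring stack meant Spring Data Elasticsearch was no longer an option.
That left the Java REST client API that Elasticsearch itself provides. A craftsman who wants to do good work must first sharpen his tools, so the first step is to understand thoroughly what each API is for and where its limits are. While learning the APIs I collected the documentation in one place, both for my own reference and in the hope that it helps others who need this part of the client.
Note: this guide applies to Elasticsearch 6.3 only.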
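Overview
The REST client comes in two parts:
- Java Low Level REST Client
- The official low-level ES client. It talks to the Elasticsearch cluster over HTTP and is compatible with all ES versions.
- Java High Level REST Client
- The official high-level ES client. It is built on the low-level client and exposes API-specific methods.
See the official documentation for details.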
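High REST Client
The high-level client is built on the low-level client. Its main purpose is to expose API-specific methods that take a request object as an argument and return a response object; the details of building the request and parsing the response are handled by the client.
Every API can be called synchronously or asynchronously. What is the difference?
- A synchronous call blocks until the response comes back.
- An asynchronous method has the Async suffix in its name (for example indexAsync) and takes a listener argument; the listener is called once the request returns a result or fails.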
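The difference between the two calling styles can be sketched in plain Java without the Elasticsearch client. Everything in this block (SketchListener, SyncAsyncSketch and its methods) is a hypothetical stand-in written for illustration, not the client's API; it only mimics the "block for the result" versus "get called back later" contract.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for a callback type; this is NOT the Elasticsearch ActionListener.
interface SketchListener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

class SyncAsyncSketch {
    // Synchronous style: the calling thread blocks until the result is ready.
    static <T> T execute(Supplier<T> call) {
        return call.get();
    }

    // Asynchronous style: returns at once; the listener is notified later
    // with either the result or the exception.
    static <T> void executeAsync(Supplier<T> call, SketchListener<T> listener) {
        CompletableFuture.supplyAsync(call).whenComplete((result, error) -> {
            if (error == null) {
                listener.onResponse(result);
            } else {
                // supplyAsync wraps the original exception; unwrap it if possible.
                Throwable cause = error.getCause() != null ? error.getCause() : error;
                listener.onFailure(cause instanceof Exception
                        ? (Exception) cause : new RuntimeException(cause));
            }
        });
    }

    // Demo helper: starts an asynchronous call and waits for its callback.
    static String runAndWait(Supplier<String> call) {
        AtomicReference<String> outcome = new AtomicReference<>("no callback");
        CountDownLatch done = new CountDownLatch(1);
        executeAsync(call, new SketchListener<String>() {
            @Override
            public void onResponse(String response) {
                outcome.set("ok:" + response);
                done.countDown();
            }

            @Override
            public void onFailure(Exception e) {
                outcome.set("failed:" + e.getMessage());
                done.countDown();
            }
        });
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return outcome.get();
    }
}
```

In real code the caller would normally not wait on a latch; it is used here only so that the demo can observe which callback fired.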
Getting started
Compatibility
- Java 1.8
- Depends on the Elasticsearch core project
Javadoc
Available in English only
Maven configuration
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>6.3.2</version>
</dependency>
Dependencies
org.elasticsearch.client:elasticsearch-rest-client
org.elasticsearch:elasticsearch
Initialization
A RestHighLevelClient instance is built from a REST low-level client builder:
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(
new HttpHost("localhost", 9200, "http"),
new HttpHost("localhost", 9201, "http")));
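The high-level client relies on the low-level client to execute requests, and the low-level client maintains a pool of connections and threads for them. Therefore, once you have finished with the high-level client, close it so that the low-level client can release those resources promptly: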
client.close();
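One common way to make sure close() is always reached, even when a request throws, is try-with-resources. The sketch below uses a hypothetical FakeClient that implements java.io.Closeable so that it runs without the Elasticsearch library; whether it fits depends on how long your client lives, since a client is usually shared for the lifetime of the application rather than created per request.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

class CloseSketch {
    // Records what happened, so the order of events can be inspected.
    static final List<String> events = new ArrayList<>();

    // Hypothetical stand-in for a client that holds resources.
    static class FakeClient implements Closeable {
        String ping() {
            events.add("request");
            return "pong";
        }

        @Override
        public void close() {
            events.add("closed");
        }
    }

    // The client is closed when the try block exits, normally or with an exception.
    static String useClient(boolean fail) {
        try (FakeClient client = new FakeClient()) {
            if (fail) {
                throw new IllegalStateException("request failed");
            }
            return client.ping();
        }
    }
}
```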
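Document APIs
The high-level REST client supports the following document APIs:
- Single-document APIs
- Index API
- Get API
- Delete API
- Update API
- Multi-document APIs
- Bulk API
- Multi-Get API
Index API
IndexRequest
An IndexRequest takes the index, the type and the document id, followed by the document source, given here as a JSON string: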
IndexRequest request = new IndexRequest(
"posts",
"doc",
"1");
String jsonString = "{" +
"\"user\":\"kimchy\"," +
"\"postDate\":\"2013-01-30\"," +
"\"message\":\"trying out Elasticsearch\"" +
"}";
request.source(jsonString, XContentType.JSON);
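Document Source
Besides the JSON string shown above, the document source can be supplied in the following forms:
- Map: converted to JSON automatically
Map<String, Object> jsonMap = new HashMap<>();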
jsonMap.put("user", "kimchy");
jsonMap.put("postDate", new Date());
jsonMap.put("message", "trying out Elasticsearch");
IndexRequest indexRequest = new IndexRequest("posts", "doc", "1")
.source(jsonMap);
- XContentBuilder: a helper class built into Elasticsearch for generating JSON content
XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject();
{
builder.field("user", "kimchy");
builder.timeField("postDate", new Date());
builder.field("message", "trying out Elasticsearch");
}
builder.endObject();
IndexRequest indexRequest = new IndexRequest("posts", "doc", "1")
.source(builder);
- Object key-pairs: alternating field names and values, converted to JSON
IndexRequest indexRequest = new IndexRequest("posts", "doc", "1")
.source("user", "kimchy",
"postDate", new Date(),
"message", "trying out Elasticsearch");
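Synchronous execution
IndexResponse indexResponse = client.index(request);
Asynchronous execution
As noted earlier, an asynchronous call needs a listener; for the Index API its type is ActionListener<IndexResponse>.
client.indexAsync(request, listener);
The asynchronous method returns immediately. When the index operation completes, the ActionListener is called back:
- on success, onResponse is called
- on failure, onFailure is called
ActionListener<IndexResponse> listener = new ActionListener<IndexResponse>() {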
@Override
public void onResponse(IndexResponse indexResponse) {
}
@Override
public void onFailure(Exception e) {
}
};
IndexResponse
Whether the call was synchronous or asynchronous, a successful call yields an IndexResponse object. What does it contain? See the code below:
String index = indexResponse.getIndex();
String type = indexResponse.getType();
String id = indexResponse.getId();
long version = indexResponse.getVersion();
if (indexResponse.getResult() == DocWriteResponse.Result.CREATED) {
// the document was created for the first time
} else if (indexResponse.getResult() == DocWriteResponse.Result.UPDATED) {
// an existing document was rewritten
}
ReplicationResponse.ShardInfo shardInfo = indexResponse.getShardInfo();
if (shardInfo.getTotal() != shardInfo.getSuccessful()) {
// fewer shards succeeded than the total number of shards
}
if (shardInfo.getFailed() > 0) {
for (ReplicationResponse.ShardInfo.Failure failure : shardInfo.getFailures()) {
String reason = failure.reason();
// handle the individual shard failure
}
}
A version conflict during indexing raises an ElasticsearchException:
IndexRequest request = new IndexRequest("posts", "doc", "1")
.source("field", "value")
.version(1);
try {
IndexResponse response = client.index(request);
} catch(ElasticsearchException e) {
if (e.status() == RestStatus.CONFLICT) {
}
}
The same conflict is raised when opType is set to create and a document with the same index, type and id already exists:
IndexRequest request = new IndexRequest("posts", "doc", "1")
.source("field", "value")
.opType(DocWriteRequest.OpType.CREATE);
try {
IndexResponse response = client.index(request);
} catch(ElasticsearchException e) {
if (e.status() == RestStatus.CONFLICT) {
}
}
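The two conflict cases above follow the usual optimistic-concurrency pattern. The block below is a minimal in-memory sketch of that pattern, not Elasticsearch code: VersionedStoreSketch and ConflictException are hypothetical names, and the version numbering (starting at 1 and incrementing on each write) is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedStoreSketch {
    // Thrown where the real client would report a CONFLICT status.
    static class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    private final Map<String, Long> versions = new HashMap<>();
    private final Map<String, String> sources = new HashMap<>();

    // Writes a document. expectedVersion == null means "no version check";
    // otherwise the write only succeeds if the stored version matches.
    long index(String id, String source, Long expectedVersion) {
        Long current = versions.get(id);
        if (expectedVersion != null && !expectedVersion.equals(current)) {
            throw new ConflictException(
                    "version conflict: current=" + current + ", provided=" + expectedVersion);
        }
        long next = current == null ? 1L : current + 1;
        versions.put(id, next);
        sources.put(id, source);
        return next;
    }

    // Create-only write: fails if a document with the same id already exists.
    long create(String id, String source) {
        if (versions.containsKey(id)) {
            throw new ConflictException("document already exists: " + id);
        }
        return index(id, source, null);
    }

    String source(String id) {
        return sources.get(id);
    }
}
```

A caller that hits the conflict typically re-reads the document, reapplies its change and retries with the fresh version.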
GET API
Get Request
Every GetRequest requires the following three arguments: index, type and document id.
GetRequest getRequest = new GetRequest(
"posts",
"doc",
"1");
Optional arguments
All of the following arguments are optional. The list is not exhaustive; see the official documentation for the full set.
Disable source retrieval (enabled by default)
getRequest.fetchSourceContext(FetchSourceContext.DO_NOT_FETCH_SOURCE);
Include specific fields in the source
String[] includes = new String[]{"message", "*Date"};
String[] excludes = Strings.EMPTY_ARRAY;
FetchSourceContext fetchSourceContext =
new FetchSourceContext(true, includes, excludes);
getRequest.fetchSourceContext(fetchSourceContext);
Exclude specific fields from the source
String[] includes = Strings.EMPTY_ARRAY;
String[] excludes = new String[]{"message"};
FetchSourceContext fetchSourceContext =
new FetchSourceContext(true, includes, excludes);
getRequest.fetchSourceContext(fetchSourceContext);
Disable realtime retrieval (realtime is true by default)
getRequest.realtime(false);
Version
getRequest.version(2);
Version type
getRequest.versionType(VersionType.EXTERNAL);
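To make the include and exclude patterns concrete, here is a plain-Java sketch of how wildcard patterns such as "*Date" might select top-level fields from a source map. SourceFilterSketch is a hypothetical helper for illustration; it handles only top-level field names and the * wildcard, and it is not how the server implements source filtering.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class SourceFilterSketch {
    // Matches a field name against a pattern in which '*' stands for any characters.
    static boolean matches(String pattern, String field) {
        // Quote the literal parts and turn each '*' into ".*".
        String regex = ("\\Q" + pattern + "\\E").replace("*", "\\E.*\\Q");
        return Pattern.matches(regex, field);
    }

    static boolean matchesAny(String[] patterns, String field) {
        for (String pattern : patterns) {
            if (matches(pattern, field)) {
                return true;
            }
        }
        return false;
    }

    // Keeps a field if it matches an include pattern (or no includes are given)
    // and matches no exclude pattern.
    static Map<String, Object> filter(Map<String, Object> source,
                                      String[] includes, String[] excludes) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String field = entry.getKey();
            boolean included = includes.length == 0 || matchesAny(includes, field);
            if (included && !matchesAny(excludes, field)) {
                result.put(field, entry.getValue());
            }
        }
        return result;
    }
}
```

With the sample document from the Index API section (user, postDate, message), the includes {"message", "*Date"} keep postDate and message, while the excludes {"message"} keep user and postDate.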
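Synchronous execution
GetResponse getResponse = client.get(getRequest);
Asynchronous execution
This works the same way as for the Index API, except that the listener receives a GetResponse. The code is omitted here.
Get Response
The returned GetResponse gives access to the requested document, both its metadata and its source fields: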
String index = getResponse.getIndex();
String type = getResponse.getType();
String id = getResponse.getId();
if (getResponse.isExists()) {
long version = getResponse.getVersion();
String sourceAsString = getResponse.getSourceAsString();
Map<String, Object> sourceAsMap = getResponse.getSourceAsMap();
byte[] sourceAsBytes = getResponse.getSourceAsBytes();
} else {
}
If the request specifies a document version that does not match the version of the existing document, a version conflict is raised:
try {
GetRequest request = new GetRequest("posts", "doc", "1").version(2);
GetResponse getResponse = client.get(request);
} catch (ElasticsearchException exception) {
if (exception.status() == RestStatus.CONFLICT) {
}
}
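Exists API
The Exists API returns true if the document exists and false otherwise.
Exists Request
The request is a GetRequest, used just as in the Get API, and it supports the same optional arguments. Because exists() only returns true or false, it is advisable to turn off fetching of _source and of any stored fields so that the request stays lightweight.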
GetRequest getRequest = new GetRequest(
"posts",
"doc",
"1");
getRequest.fetchSourceContext(new FetchSourceContext(false));
getRequest.storedFields("_none_");
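Synchronous execution
boolean exists = client.exists(getRequest);
Asynchronous execution
This mirrors the Index API, so only the code is shown here. See the official documentation for details.
ActionListener<Boolean> listener = new ActionListener<Boolean>() {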
@Override
public void onResponse(Boolean exists) {
}
@Override
public void onFailure(Exception e) {
}
};
client.existsAsync(getRequest, listener);
Delete API
See the official documentation.
Delete Request
A DeleteRequest requires the following arguments: index, type and document id.
DeleteRequest request = new DeleteRequest(
"posts",
"doc",
"1");
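Optional arguments
Timeout (as a TimeValue or as a String)
request.timeout(TimeValue.timeValueMinutes(2));
request.timeout("2m");
Refresh policy (as a WriteRequest.RefreshPolicy or as a String)
request.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
request.setRefreshPolicy("wait_for");
Version
request.version(2);
Version type
request.versionType(VersionType.EXTERNAL);
Synchronous execution
DeleteResponse deleteResponse = client.delete(request);
Asynchronous execution
ActionListener<DeleteResponse> listener = new ActionListener<DeleteResponse>() {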
@Override
public void onResponse(DeleteResponse deleteResponse) {
}
@Override
public void onFailure(Exception e) {
}
};
client.deleteAsync(request, listener);
Delete Response
The DeleteResponse provides information about the executed operation, as the code shows:
String index = deleteResponse.getIndex();
String type = deleteResponse.getType();
String id = deleteResponse.getId();
long version = deleteResponse.getVersion();
ReplicationResponse.ShardInfo shardInfo = deleteResponse.getShardInfo();
if (shardInfo.getTotal() != shardInfo.getSuccessful()) {
}
if (shardInfo.getFailed() > 0) {
for (ReplicationResponse.ShardInfo.Failure failure : shardInfo.getFailures()) {
String reason = failure.reason();
}
}
It can also be used to check whether the document was found:
DeleteRequest request = new DeleteRequest("posts", "doc", "does_not_exist");
DeleteResponse deleteResponse = client.delete(request);
if (deleteResponse.getResult() == DocWriteResponse.Result.NOT_FOUND) {
}
A version conflict also raises an ElasticsearchException:
try {
DeleteRequest request = new DeleteRequest("posts", "doc", "1").version(2);
DeleteResponse deleteResponse = client.delete(request);
} catch (ElasticsearchException exception) {
if (exception.status() == RestStatus.CONFLICT) {
}
}
Update API
Update Request
An UpdateRequest requires the following arguments: index, type and document id.
UpdateRequest request = new UpdateRequest(
"posts",
"doc",
"1");
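Updates with a script
See the official documentation.
Updates with a partial document
When a partial document is used for the update, it is merged into the existing document.
The partial document can be supplied in the following forms.
JSON string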
UpdateRequest request = new UpdateRequest("posts", "doc", "1");
String jsonString = "{" +
"\"updated\":\"2017-01-01\"," +
"\"reason\":\"daily update\"" +
"}";
request.doc(jsonString, XContentType.JSON);
Map
Map<String, Object> jsonMap = new HashMap<>();
jsonMap.put("updated", new Date());
jsonMap.put("reason", "daily update");
UpdateRequest request = new UpdateRequest("posts", "doc", "1")
.doc(jsonMap);
XContentBuilder object
XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject();
{
builder.timeField("updated", new Date());
builder.field("reason", "daily update");
}
builder.endObject();
UpdateRequest request = new UpdateRequest("posts", "doc", "1")
.doc(builder);
Object key-pairs
UpdateRequest request = new UpdateRequest("posts", "doc", "1")
.doc("updated", new Date(),
"reason", "daily update");
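Upserts
If the document does not exist yet, the upsert method defines content that will be inserted as a new document:
UpdateRequest request = new UpdateRequest("posts", "doc", "1")
.doc("updated", new Date(),
"reason", "daily update")
.upsert("created", new Date());
The upsert method accepts the same document formats as the doc method (String, Map, XContentBuilder or Object key-pairs).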
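The merge and upsert behaviour described above can be sketched with plain maps. UpdateSketch below is a hypothetical in-memory model, not the client API: it assumes nested objects are merged recursively, that the upsert content is stored as-is when the document is missing, and that an update which changes nothing is reported as a no-op.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

class UpdateSketch {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    // Merges the partial document into the target; nested maps are merged
    // recursively. Returns true if the target was changed.
    @SuppressWarnings("unchecked")
    static boolean merge(Map<String, Object> target, Map<String, Object> partial) {
        boolean changed = false;
        for (Map.Entry<String, Object> entry : partial.entrySet()) {
            String key = entry.getKey();
            Object oldValue = target.get(key);
            Object newValue = entry.getValue();
            if (oldValue instanceof Map && newValue instanceof Map) {
                changed |= merge((Map<String, Object>) oldValue, (Map<String, Object>) newValue);
            } else if (!target.containsKey(key) || !Objects.equals(oldValue, newValue)) {
                target.put(key, newValue);
                changed = true;
            }
        }
        return changed;
    }

    // Returns "created", "updated" or "noop". With no upsert content, updating
    // a missing document fails.
    String update(String id, Map<String, Object> partial, Map<String, Object> upsert) {
        Map<String, Object> existing = docs.get(id);
        if (existing == null) {
            if (upsert == null) {
                throw new IllegalStateException("document missing: " + id);
            }
            docs.put(id, new LinkedHashMap<>(upsert));
            return "created";
        }
        return merge(existing, partial) ? "updated" : "noop";
    }

    Map<String, Object> get(String id) {
        return docs.get(id);
    }
}
```

Note that in this model the partial document is not applied on the creating call; only the upsert content is stored.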
Optional arguments
Timeout
request.timeout(TimeValue.timeValueSeconds(1));
request.timeout("1s");
Refresh policy
request.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
request.setRefreshPolicy("wait_for");
Number of retries when a version conflict occurs
request.retryOnConflict(3);
Enable source retrieval (disabled by default)
request.fetchSource(true);
Include specific fields
String[] includes = new String[]{"updated", "r*"};
String[] excludes = Strings.EMPTY_ARRAY;
request.fetchSource(new FetchSourceContext(true, includes, excludes));
Exclude specific fields
String[] includes = Strings.EMPTY_ARRAY;
String[] excludes = new String[]{"updated"};
request.fetchSource(new FetchSourceContext(true, includes, excludes));
Version
request.version(2);
Disable noop detection
request.detectNoop(false);
Run the script whether or not the document exists
request.scriptedUpsert(true);
Use the partial document as the upsert document if the document does not exist yet
request.docAsUpsert(true);
Synchronous execution
UpdateResponse updateResponse = client.update(request);
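Asynchronous execution
Only the code is shown here; see the official documentation for details.
ActionListener<UpdateResponse> listener = new ActionListener<UpdateResponse>() {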
@Override
public void onResponse(UpdateResponse updateResponse) {
}
@Override
public void onFailure(Exception e) {
}
};
client.updateAsync(request, listener);
Update Response
String index = updateResponse.getIndex();
String type = updateResponse.getType();
String id = updateResponse.getId();
long version = updateResponse.getVersion();
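if (updateResponse.getResult() == DocWriteResponse.Result.CREATED) {
// the document was created for the first time (upsert)
} else if (updateResponse.getResult() == DocWriteResponse.Result.UPDATED) {
// the document was updated
} else if (updateResponse.getResult() == DocWriteResponse.Result.DELETED) {
// the document was deleted
} else if (updateResponse.getResult() == DocWriteResponse.Result.NOOP) {
// the update changed nothing, so no operation was executed
}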
If source retrieval was enabled in the UpdateRequest through fetchSource, the response contains the source of the updated document:
GetResult result = updateResponse.getGetResult();
if (result.isExists()) {
String sourceAsString = result.sourceAsString();
Map<String, Object> sourceAsMap = result.sourceAsMap();
byte[] sourceAsBytes = result.source();
} else {
}
Shard failures can be checked as well:
ReplicationResponse.ShardInfo shardInfo = updateResponse.getShardInfo();
if (shardInfo.getTotal() != shardInfo.getSuccessful()) {
}
if (shardInfo.getFailed() > 0) {
for (ReplicationResponse.ShardInfo.Failure failure : shardInfo.getFailures()) {
String reason = failure.reason();
}
}
If the document does not exist when the UpdateRequest is executed, the response carries a 404 status code and an ElasticsearchException is thrown:
UpdateRequest request = new UpdateRequest("posts", "type", "does_not_exist")
.doc("field", "value");
try {
UpdateResponse updateResponse = client.update(request);
} catch (ElasticsearchException e) {
if (e.status() == RestStatus.NOT_FOUND) {
}
}
A version conflict also raises an ElasticsearchException:
UpdateRequest request = new UpdateRequest("posts", "doc", "1")
.doc("field", "value")
.version(1);
try {
UpdateResponse updateResponse = client.update(request);
} catch(ElasticsearchException e) {
if (e.status() == RestStatus.CONFLICT) {
}
}
Bulk API
Bulk Request
A BulkRequest can execute multiple index, update and delete operations in a single request.
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("posts", "doc", "1")
.source(XContentType.JSON,"field", "foo"));
request.add(new IndexRequest("posts", "doc", "2")
.source(XContentType.JSON,"field", "bar"));
request.add(new IndexRequest("posts", "doc", "3")
.source(XContentType.JSON,"field", "baz"));
Different operation types can also be added to the same BulkRequest:
BulkRequest request = new BulkRequest();
request.add(new DeleteRequest("posts", "doc", "3"));
request.add(new UpdateRequest("posts", "doc", "2")
.doc(XContentType.JSON,"other", "test"));
request.add(new IndexRequest("posts", "doc", "4")
.source(XContentType.JSON,"field", "baz"));
Optional arguments
Timeout
request.timeout(TimeValue.timeValueMinutes(2));
request.timeout("2m");
Refresh policy
request.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
request.setRefreshPolicy("wait_for");
Number of shard copies that must be active before the bulk operation proceeds
request.waitForActiveShards(2);
request.waitForActiveShards(ActiveShardCount.ALL);
request.waitForActiveShards(ActiveShardCount.DEFAULT);
request.waitForActiveShards(ActiveShardCount.ONE);
Synchronous execution
BulkResponse bulkResponse = client.bulk(request);
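Asynchronous execution
This mirrors the Get API and the other APIs above, so only the code is shown.
ActionListener<BulkResponse> listener = new ActionListener<BulkResponse>() {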
@Override
public void onResponse(BulkResponse bulkResponse) {
}
@Override
public void onFailure(Exception e) {
}
};
client.bulkAsync(request, listener);
Bulk Response
The BulkResponse contains information about the executed operations and lets you iterate over the result of each one:
for (BulkItemResponse bulkItemResponse : bulkResponse) {
DocWriteResponse itemResponse = bulkItemResponse.getResponse();
if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.INDEX
|| bulkItemResponse.getOpType() == DocWriteRequest.OpType.CREATE) {
IndexResponse indexResponse = (IndexResponse) itemResponse;
} else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.UPDATE) {
UpdateResponse updateResponse = (UpdateResponse) itemResponse;
} else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.DELETE) {
DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
}
}
The bulk response also offers a convenient method for checking whether one or more operations failed:
if (bulkResponse.hasFailures()) {
}
In that case, iterate over all the operation results, check whether each one failed and retrieve the corresponding failure:
for (BulkItemResponse bulkItemResponse : bulkResponse) {
if (bulkItemResponse.isFailed()) {
BulkItemResponse.Failure failure = bulkItemResponse.getFailure();
}
}
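Bulk Processor
BulkProcessor is a utility class that simplifies use of the Bulk API. It needs the following components:
- RestHighLevelClient: executes the BulkRequest and retrieves the BulkResponse
- BulkProcessor.Listener: called before and after each BulkRequest is executed, and when a BulkRequest fails
The BulkProcessor.builder method is then used to build a new BulkProcessor: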
BulkProcessor.Listener listener = new BulkProcessor.Listener() {
@Override
public void beforeBulk(long executionId, BulkRequest request) {
}
@Override
public void afterBulk(long executionId, BulkRequest request,
BulkResponse response) {
}
@Override
public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
}
};
BulkProcessor bulkProcessor =
BulkProcessor.builder(client::bulkAsync, listener).build();
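BulkProcessor.Builder provides several methods to configure how the BulkProcessor handles the execution of requests:
BulkProcessor.Builder builder = BulkProcessor.builder(client::bulkAsync, listener);
// flush a new bulk request after this many actions (default 1000, -1 disables it)
builder.setBulkActions(500);
// flush once the added actions reach this size (default 5 MB, -1 disables it)
builder.setBulkSize(new ByteSizeValue(1L, ByteSizeUnit.MB));
// number of concurrent requests allowed (default 1; 0 allows only a single request at a time)
builder.setConcurrentRequests(0);
// flush any pending BulkRequest when this interval passes (not set by default)
builder.setFlushInterval(TimeValue.timeValueSeconds(10L));
// constant backoff: wait 1 second between attempts and retry up to 3 times
builder.setBackoffPolicy(BackoffPolicy
.constantBackoff(TimeValue.timeValueSeconds(1L), 3));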
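The constant backoff policy configured above (a fixed wait, a bounded number of retries) can be illustrated with a small plain-Java retry loop. ConstantBackoffSketch is a hypothetical helper, not the BackoffPolicy class, and it retries on any RuntimeException, which is a simplification chosen for the example rather than a description of which failures the BulkProcessor retries.

```java
import java.util.function.Supplier;

class ConstantBackoffSketch {
    // Calls the supplier; on failure waits delayMillis and tries again,
    // up to maxRetries additional attempts after the first one.
    static <T> T callWithRetries(Supplier<T> call, long delayMillis, int maxRetries) {
        int retries = 0;
        while (true) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (retries >= maxRetries) {
                    throw e; // retries exhausted: report the last failure
                }
                retries++;
                try {
                    Thread.sleep(delayMillis); // the same delay before every retry
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

With a delay of 1000 ms and maxRetries of 3, a call that keeps failing is attempted four times in total before the failure is reported.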
Once the BulkProcessor is created, requests can be added to it:
IndexRequest one = new IndexRequest("posts", "doc", "1").
source(XContentType.JSON, "title",
"In which order are my Elasticsearch queries executed?");
IndexRequest two = new IndexRequest("posts", "doc", "2")
.source(XContentType.JSON, "title",
"Current status and upcoming changes in Elasticsearch");
IndexRequest three = new IndexRequest("posts", "doc", "3")
.source(XContentType.JSON, "title",
"The Future of Federated Search in Elasticsearch");
bulkProcessor.add(one);
bulkProcessor.add(two);
bulkProcessor.add(three);
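What the processor does with added requests can be pictured as a batcher that flushes when a threshold is reached. The sketch below is a simplified plain-Java model under stated assumptions: BatcherSketch is a hypothetical class, actions are plain strings, only the action-count and byte-size thresholds are modelled, and the time-based flush interval, concurrency and retries are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatcherSketch {
    private final int maxActions;
    private final long maxBytes;
    private final Consumer<List<String>> flusher;
    private List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private boolean closed = false;

    BatcherSketch(int maxActions, long maxBytes, Consumer<List<String>> flusher) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
        this.flusher = flusher;
    }

    // Buffers an action and flushes as soon as either threshold is reached.
    void add(String action) {
        if (closed) {
            throw new IllegalStateException("batcher is closed");
        }
        pending.add(action);
        pendingBytes += action.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    // Hands the buffered actions to the flusher as one batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>();
        pendingBytes = 0;
        flusher.accept(batch);
    }

    // Flushes what is left and refuses further actions.
    void close() {
        flush();
        closed = true;
    }
}
```

For example, with maxActions set to 2, adding three actions produces one batch of two immediately, and the third action is sent only when close() (or flush()) is called.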
When the BulkProcessor executes a bulk request, it calls the BulkProcessor.Listener for it. The listener provides the following methods to access the BulkRequest and the BulkResponse:
BulkProcessor.Listener listener = new BulkProcessor.Listener() {
@Override
public void beforeBulk(long executionId, BulkRequest request) {
int numberOfActions = request.numberOfActions();
logger.debug("Executing bulk [{}] with {} requests",
executionId, numberOfActions);
}
@Override
public void afterBulk(long executionId, BulkRequest request,
BulkResponse response) {
if (response.hasFailures()) {
logger.warn("Bulk [{}] executed with failures", executionId);
} else {
logger.debug("Bulk [{}] completed in {} milliseconds",
executionId, response.getTook().getMillis());
}
}
@Override
public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
logger.error("Failed to execute bulk", failure);
}
};
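Once all requests have been added to the BulkProcessor, its instance must be closed using one of two methods.
awaitClose() waits until all requests have been processed or the given waiting time has elapsed; it returns true if all bulk requests completed and false if the time ran out first:
boolean terminated = bulkProcessor.awaitClose(30L, TimeUnit.SECONDS);
close() closes the BulkProcessor immediately:
bulkProcessor.close();
Both methods flush the requests already added to the processor before closing it, and both prevent any new request from being added.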
Multi-Get API
The multiGet API executes multiple get requests in parallel in a single HTTP request.
Multi-Get Request
A MultiGetRequest is created empty; MultiGetRequest.Item instances are then added to it to specify what to fetch:
MultiGetRequest request = new MultiGetRequest();
request.add(new MultiGetRequest.Item(
"index",
"type",
"example_id"));
request.add(new MultiGetRequest.Item("index", "type", "another_id"));
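Optional arguments
multiGet supports the same optional arguments as the Get API; they can be set on each Item.
See the official documentation.
Synchronous execution
Once the MultiGetRequest is built, multiGet can be executed synchronously: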
MultiGetResponse response = client.multiGet(request);
Asynchronous execution
As with the asynchronous calls above, this uses the listener mechanism.
ActionListener<MultiGetResponse> listener = new ActionListener<MultiGetResponse>() {
@Override
public void onResponse(MultiGetResponse response) {
}
@Override
public void onFailure(Exception e) {
}
};
client.multiGetAsync(request, listener);
Multi-Get Response
The getResponses method of MultiGetResponse returns the MultiGetItemResponse objects in the same order as the requests.
A MultiGetItemResponse holds a GetResponse if the get succeeded, or a MultiGetResponse.Failure if it failed:
MultiGetItemResponse firstItem = response.getResponses()[0];
assertNull(firstItem.getFailure());
GetResponse firstGet = firstItem.getResponse();
String index = firstItem.getIndex();
String type = firstItem.getType();
String id = firstItem.getId();
if (firstGet.isExists()) {
long version = firstGet.getVersion();
String sourceAsString = firstGet.getSourceAsString();
Map<String, Object> sourceAsMap = firstGet.getSourceAsMap();
byte[] sourceAsBytes = firstGet.getSourceAsBytes();
} else {
}
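The shape of a multi-get result (one item per request, in request order, each holding either a response or a failure) can be modelled in a few lines of plain Java. MultiGetSketch and ItemResult are hypothetical names for this illustration; the in-memory "indices" map stands in for a cluster, and a missing index is the only failure modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MultiGetSketch {
    // One entry per requested document: either a lookup result or a failure.
    static class ItemResult {
        final String index;
        final String id;
        final String source;  // null when the document does not exist
        final String failure; // null when the lookup itself succeeded

        ItemResult(String index, String id, String source, String failure) {
            this.index = index;
            this.id = id;
            this.source = source;
            this.failure = failure;
        }

        boolean isFailed() {
            return failure != null;
        }

        boolean exists() {
            return failure == null && source != null;
        }
    }

    // indices maps index name -> (document id -> source).
    // Each request is a two-element array {index, id}.
    static List<ItemResult> multiGet(Map<String, Map<String, String>> indices,
                                     List<String[]> requests) {
        List<ItemResult> results = new ArrayList<>();
        for (String[] request : requests) {
            String index = request[0];
            String id = request[1];
            Map<String, String> docs = indices.get(index);
            if (docs == null) {
                results.add(new ItemResult(index, id, null, "no such index"));
            } else {
                results.add(new ItemResult(index, id, docs.get(id), null));
            }
        }
        return results;
    }
}
```

A failed item does not abort the whole call; the caller inspects each item in turn, much like the per-item checks on MultiGetItemResponse.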
If the index targeted by a sub-request does not exist, getFailure returns a failure that holds the exception:
assertNull(missingIndexItem.getResponse());
Exception e = missingIndexItem.getFailure().getFailure();
ElasticsearchException ee = (ElasticsearchException) e;
assertThat(e.getMessage(),
containsString("reason=no such index"));
For the handling of version conflicts, see the official documentation:
MultiGetRequest request = new MultiGetRequest();
request.add(new MultiGetRequest.Item("index", "type", "example_id")
.version(1000L));
MultiGetResponse response = client.multiGet(request);
MultiGetItemResponse item = response.getResponses()[0];
assertNull(item.getResponse());
Exception e = item.getFailure().getFailure();
ElasticsearchException ee = (ElasticsearchException) e;
assertThat(e.getMessage(),
containsString("version conflict, current version [1] is "
+ "different than the one provided [1000]"));
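Conclusion
This article covers only the getting-started and document API parts of the Java High Level REST Client. The next article will cover the search APIs, so stay tuned.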