Document APIs
This section describes the following CRUD APIs.
Part I: Single document APIs
1 Index API
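The index API allows you to index a typed JSON document into a specific index and make it searchable.
Generate JSON document
There are several ways to generate a JSON document:
- Manually (do it yourself), using a native byte[] or a String
- Using a Map that will be automatically converted to its JSON equivalent
- Using a third-party library such as Jackson to serialize your beans
- Using the built-in helper XContentFactory.jsonBuilder()
Internally, each type is converted to byte[] (so a String is converted to a byte[]). Therefore, if the object is already in this form, use it as is. jsonBuilder is a highly optimized JSON generator that directly constructs a byte[].
Details of each way to generate a JSON document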
Manually
Note that you must encode dates according to the date format.
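As a minimal sketch of that date-encoding step, the helper below turns a java.util.Date into the kind of string used in the hand-written document that follows. It assumes the postDate field expects a plain yyyy-MM-dd value interpreted in UTC; the class name DateEncodingSketch and the encode method are hypothetical names used only for illustration.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

class DateEncodingSketch {
    // Formats a java.util.Date as yyyy-MM-dd in UTC.
    // The target format is an assumption; use whatever format your mapping expects.
    static String encode(Date date) {
        LocalDate day = date.toInstant().atZone(ZoneOffset.UTC).toLocalDate();
        return DateTimeFormatter.ISO_LOCAL_DATE.format(day);
    }

    public static void main(String[] args) {
        Date postDate = Date.from(LocalDate.of(2013, 1, 30).atStartOfDay(ZoneOffset.UTC).toInstant());
        String json = "{\"postDate\":\"" + encode(postDate) + "\"}";
        System.out.println(json); // prints {"postDate":"2013-01-30"}
    }
}
```

Formatting the date yourself is only needed on this manual path; the Map and jsonBuilder approaches shown later accept a Date object directly.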
String json = "{" +
"\"user\":\"kimchy\"," +
"\"postDate\":\"2013-01-30\"," +
"\"message\":\"trying out Elasticsearch\"" +
"}";
Using Map
Map<String, Object> json = new HashMap<String, Object>();
json.put("user","kimchy");
json.put("postDate",new Date());
json.put("message","trying out Elasticsearch");
Serialize your beans
You can use Jackson to serialize your beans to JSON. Add Jackson Databind to your project, then use ObjectMapper to serialize your beans:
import com.fasterxml.jackson.databind.*;
// instance a json mapper
ObjectMapper mapper = new ObjectMapper(); // create once, reuse
// generate json
byte[] json = mapper.writeValueAsBytes(yourbeaninstance);
Use Elasticsearch helpers
import static org.elasticsearch.common.xcontent.XContentFactory.*;
XContentBuilder builder = jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "trying out Elasticsearch")
.endObject();
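Note: you can also add arrays with the startArray(String) and endArray() methods. The field method accepts many object types: you can pass numbers, dates and even other XContentBuilder objects directly.
If you need to see the generated JSON content, you can use the string() method: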
String json = builder.string();
Usage example
The following example indexes a JSON document into an index called twitter, under a type called tweet, with id 1:
import static org.elasticsearch.common.xcontent.XContentFactory.*;
IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
.setSource(jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "trying out Elasticsearch")
.endObject()
)
.get();
You can also index your document as a JSON String. In that case you do not have to give an id:
String json = "{" +
"\"user\":\"kimchy\"," +
"\"postDate\":\"2013-01-30\"," +
"\"message\":\"trying out Elasticsearch\"" +
"}";
IndexResponse response = client.prepareIndex("twitter", "tweet")
.setSource(json)
.get();
The IndexResponse object gives you a report:
// Index name
String _index = response.getIndex();
// Type name
String _type = response.getType();
// Document ID (generated or not)
String _id = response.getId();
// Version (if it's the first time you index this document, you will get: 1)
long _version = response.getVersion();
// Status of the operation (a RestStatus value)
RestStatus status = response.status();
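For more information on the index operation, see the REST index docs.
Operation threading
When the operation executes on the same node (that is, on a shard allocated on the node that receives the call), the index API lets you set its threading model. The operation can run either on a different thread or on the calling thread. By default, operationThreaded is set to true, which means the operation runs on a different thread.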
2 Get API
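The get API allows you to get a typed JSON document from an index based on its id. The following example gets the document with id 1 from an index called twitter, under a type called tweet: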
GetResponse response = client.prepareGet("twitter", "tweet", "1").get();
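For more information on the get operation, see the REST get docs.
Operation threading
When the operation executes on the same node (that is, on a shard allocated on the node that receives the call), the get API lets you set its threading model. The operation can run either on a different thread or on the calling thread. By default, operationThreaded is set to true, which means the operation runs on a different thread. The following example sets it to false: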
GetResponse response = client.prepareGet("twitter", "tweet", "1")
.setOperationThreaded(false)
.get();
3 Delete API
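The delete API allows you to delete a typed JSON document from an index based on its id. The following example deletes the document with id 1 from an index called twitter, under a type called tweet: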
DeleteResponse response = client.prepareDelete("twitter", "tweet", "1").get();
For more information on the delete operation, see the REST delete docs.
Delete By Query API
The delete by query API deletes the set of documents matched by a query:
BulkIndexByScrollResponse response =
DeleteByQueryAction.INSTANCE.newRequestBuilder(client)
.filter(QueryBuilders.matchQuery("gender", "male")) // query
.source("persons") // index
.get(); // execute the operation
long deleted = response.getDeleted(); // number of deleted documents
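Because this can be a long-running operation, you may want to run it asynchronously. To do so, call execute instead of get and provide a listener like this: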
DeleteByQueryAction.INSTANCE.newRequestBuilder(client)
.filter(QueryBuilders.matchQuery("gender", "male")) //query
.source("persons") //index
.execute(new ActionListener<BulkIndexByScrollResponse>() { //listener
@Override
public void onResponse(BulkIndexByScrollResponse response) {
long deleted = response.getDeleted(); //number of deleted documents
}
@Override
public void onFailure(Exception e) {
// Handle the exception
}
});
Update API
Either create an UpdateRequest and send it to the client:
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
.startObject()
.field("gender", "male")
.endObject());
client.update(updateRequest).get();
Or use the prepareUpdate() method:
client.prepareUpdate("ttl", "doc", "1")
.setScript(new Script("ctx._source.gender = \"male\"", ScriptService.ScriptType.INLINE, null, null)) // 1
.get();
client.prepareUpdate("ttl", "doc", "1")
.setDoc(jsonBuilder() //2
.startObject()
.field("gender", "male")
.endObject())
.get();
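Note 1: Your script. It can also be the name of a locally stored script file, in which case you need to use ScriptService.ScriptType.FILE.
Note 2: The document that will be merged into the existing one.
Note: you cannot provide both script and doc.
Update by script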
UpdateRequest updateRequest = new UpdateRequest("ttl", "doc", "1")
.script(new Script("ctx._source.gender = \"male\""));
client.update(updateRequest).get();
Update by merging documents
The update API also accepts a partial document, which is merged into the existing document:
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
.doc(jsonBuilder()
.startObject()
.field("gender", "male")
.endObject());
client.update(updateRequest).get();
Upsert
Upsert is also supported. If the document does not exist, the content of the upsert element is used to index a new document:
IndexRequest indexRequest = new IndexRequest("index", "type", "1")
.source(jsonBuilder()
.startObject()
.field("name", "Joe Smith")
.field("gender", "male")
.endObject());
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
.doc(jsonBuilder()
.startObject()
.field("gender", "male")
.endObject())
.upsert(indexRequest); //1
client.update(updateRequest).get();
Note 1: If the document does not exist, the one in indexRequest will be added. The document then looks like this:
{
"name" : "Joe Smith",
"gender": "male"
}
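If the document index/type/1 already exists (here with the name "Joe Dalton"), it looks like this after the operation, with the gender field added by the update request:
{
"name" : "Joe Dalton",
"gender": "male"
}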
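The two outcomes above can be reproduced with plain maps. This is only a sketch of the merge and upsert semantics, not Elasticsearch's implementation: the in-memory store, the class name UpsertSketch and the update method are hypothetical, and the merge is shallow (top-level fields only).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class UpsertSketch {
    // Hypothetical in-memory stand-in for an index: document id -> source fields.
    final Map<String, Map<String, Object>> store = new HashMap<>();

    // If the id is missing, store a copy of the upsert document;
    // otherwise merge the partial doc into the existing source.
    Map<String, Object> update(String id, Map<String, Object> doc, Map<String, Object> upsert) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            existing = new LinkedHashMap<>(upsert);
            store.put(id, existing);
        } else {
            // Shallow merge: new fields are added, same-named fields are overwritten.
            existing.putAll(doc);
        }
        return existing;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("gender", "male");
        Map<String, Object> upsert = new LinkedHashMap<>();
        upsert.put("name", "Joe Smith");
        upsert.put("gender", "male");

        // Case 1: no document with id 1 yet, so the upsert document is stored.
        UpsertSketch missing = new UpsertSketch();
        System.out.println(missing.update("1", doc, upsert)); // prints {name=Joe Smith, gender=male}

        // Case 2: a document already exists, so only the partial doc is merged in.
        UpsertSketch present = new UpsertSketch();
        Map<String, Object> joe = new LinkedHashMap<>();
        joe.put("name", "Joe Dalton");
        present.store.put("1", joe);
        System.out.println(present.update("1", doc, upsert)); // prints {name=Joe Dalton, gender=male}
    }
}
```

Note that in the second case the upsert document is ignored entirely, which is why the name stays "Joe Dalton".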
Part II: Multi-document APIs
The multi get API retrieves a list of documents based on their index, type and id:
MultiGetResponse multiGetItemResponses = client.prepareMultiGet()
.add("twitter", "tweet", "1") // get by a single id
.add("twitter", "tweet", "2", "3", "4")//or by a list of ids for the same index / type
.add("another", "type", "foo") //you can also get from another index
.get();
for (MultiGetItemResponse itemResponse : multiGetItemResponses) {
//iterate over the result set
GetResponse response = itemResponse.getResponse();
if (response.isExists()) { // you can check if the document exists
String json = response.getSourceAsString(); //access to the _source field
}
}
For more information on the multi get operation, see the REST multi get docs.
1 Bulk API
The bulk API allows you to index and delete several documents in a single request. Here is a sample usage:
import static org.elasticsearch.common.xcontent.XContentFactory.*;
BulkRequestBuilder bulkRequest = client.prepareBulk();
// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
.setSource(jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "trying out Elasticsearch")
.endObject()
)
);
bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
.setSource(jsonBuilder()
.startObject()
.field("user", "kimchy")
.field("postDate", new Date())
.field("message", "another post")
.endObject()
)
);
BulkResponse bulkResponse = bulkRequest.get();
if (bulkResponse.hasFailures()) {
// process failures by iterating through each bulk response item
}
Using Bulk Processor
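BulkProcessor offers a simple interface that flushes bulk operations automatically, based on the number or size of requests, or after a given period.
Use it as follows: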
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;
BulkProcessor bulkProcessor = BulkProcessor.builder(
client, // 1
new BulkProcessor.Listener() {
@Override
public void beforeBulk(long executionId,
BulkRequest request) { ... } //2
@Override
public void afterBulk(long executionId,
BulkRequest request,
BulkResponse response) { ... } //3
@Override
public void afterBulk(long executionId,
BulkRequest request,
Throwable failure) { ... }//4
})
.setBulkActions(10000)//5
.setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))//6
.setFlushInterval(TimeValue.timeValueSeconds(5))//7
.setConcurrentRequests(1) //8
.setBackoffPolicy(
BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(100), 3))//9
.build();
Note 1: Add your Elasticsearch client.
Note 2: This method is called just before the bulk is executed. For example, you can see the number of actions with request.numberOfActions().
Note 3: This method is called after the bulk has executed. For example, you can check for failed requests with response.hasFailures().
Note 4: This method is called when the bulk failed and raised a Throwable.
Note 5: Execute the bulk every 10000 requests.
Note 6: Flush the bulk every 5 MB.
Note 7: Flush the bulk every 5 seconds, whatever the number of requests.
Note 8: Set the number of concurrent requests. A value of 0 means that only a single request will be allowed to be executed. A value of 1 means 1 concurrent request is allowed to be executed while accumulating new bulk requests.
Note 9: Set a custom backoff policy which will initially wait for 100ms, increase exponentially and retry up to three times. A retry is attempted whenever one or more bulk item requests have failed with an EsRejectedExecutionException, which indicates that there were too few compute resources available for processing the request. To disable backoff, pass BackoffPolicy.noBackoff().
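As a rough illustration of how the setBulkActions and setBulkSize settings above interact, the sketch below buffers requests and flushes as soon as either the action count or the accumulated size reaches its limit. It is not the BulkProcessor implementation: the class FlushThresholdSketch, its add method and the explicit byte-size argument are hypothetical, and the time-based flush, concurrency and backoff are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

class FlushThresholdSketch {
    private final int maxActions;
    private final long maxBytes;
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    int flushCount = 0; // number of flushes performed so far

    FlushThresholdSketch(int maxActions, long maxBytes) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
    }

    // Buffers one request and flushes if either threshold has been reached.
    void add(String request, long sizeInBytes) {
        buffer.add(request);
        bufferedBytes += sizeInBytes;
        if (buffer.size() >= maxActions || bufferedBytes >= maxBytes) {
            flush();
        }
    }

    // Stands in for sending the bulk: counts the flush and resets the buffer.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushCount++;
        buffer.clear();
        bufferedBytes = 0;
    }

    int pending() {
        return buffer.size();
    }

    public static void main(String[] args) {
        FlushThresholdSketch sketch = new FlushThresholdSketch(3, 100);
        sketch.add("a", 10);
        sketch.add("b", 10);
        sketch.add("c", 10);    // third action reaches maxActions
        sketch.add("big", 150); // a single large request reaches maxBytes
        System.out.println(sketch.flushCount); // prints 2
    }
}
```

Whichever limit is hit first triggers the flush, so a few very large documents can flush long before the action count is reached.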
By default, BulkProcessor:
sets bulkActions to 1000
sets bulkSize to 5mb
does not set flushInterval
sets concurrentRequests to 1, which means an asynchronous execution of the flush operation.
sets backoffPolicy to an exponential backoff with 8 retries and a start delay of 50ms. The total wait time is roughly 5.1 seconds.
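The total wait time of roughly 5.1 seconds quoted above can be checked with a short calculation. The sketch below assumes that the delay before retry i (counting from 0) is start + 10 * (floor(e^(0.8 * i)) - 1) milliseconds. This formula is an assumption, chosen because it reproduces the quoted total, so verify it against the BackoffPolicy source of your Elasticsearch version. The class and method names are hypothetical.

```java
class BackoffSketch {
    // Delay in milliseconds before the given retry (0-based), under the assumed formula.
    static long delayMillis(int startMillis, int retry) {
        return startMillis + 10L * ((int) Math.exp(0.8d * retry) - 1);
    }

    // Sum of the delays over all retries.
    static long totalDelayMillis(int startMillis, int retries) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            total += delayMillis(startMillis, i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Default policy described above: start delay of 50ms, 8 retries.
        for (int i = 0; i < 8; i++) {
            System.out.println("retry " + i + ": " + delayMillis(50, i) + " ms");
        }
        System.out.println("total: " + totalDelayMillis(50, 8) + " ms"); // prints total: 5190 ms
    }
}
```

Under this assumption the individual delays are 50, 60, 80, 150, 280, 580, 1250 and 2740 ms, so most of the waiting happens in the last two retries.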
Add requests
Once the BulkProcessor is created, you can add requests to it:
bulkProcessor.add(new IndexRequest("twitter", "tweet", "1").source(/* your doc here */));
bulkProcessor.add(new DeleteRequest("twitter", "tweet", "2"));
Closing the Bulk Processor
When all documents have been loaded into the BulkProcessor, it can be closed with the awaitClose or close method:
bulkProcessor.awaitClose(10, TimeUnit.MINUTES);
or
bulkProcessor.close();
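Both methods flush any remaining documents and disable all other scheduled flushes if they were scheduled by setting flushInterval. If concurrent requests were enabled, the awaitClose method waits for up to the specified timeout for all bulk requests to complete and then returns true; if the specified waiting time elapses before all bulk requests complete, it returns false.
The close method does not wait for any remaining bulk requests to complete and exits immediately.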
Using Bulk Processor in tests
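If you are running tests with Elasticsearch and using the BulkProcessor to populate your dataset, you had better set the number of concurrent requests to 0, so that the flush operation of the bulk is executed synchronously: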
BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() { /* Listener methods */ })
.setBulkActions(10000)
.setConcurrentRequests(0)
.build();
// Add your requests
bulkProcessor.add(/* Your requests */);
// Flush any remaining requests
bulkProcessor.flush();
// Or close the bulkProcessor if you don't need it anymore
bulkProcessor.close();
// Refresh your indices
client.admin().indices().prepareRefresh().get();
// Now you can start searching!
client.prepareSearch().get();