elasticsearch (5): Using bulk operations from Java, and things to watch out for

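1. A BulkRequest executes multiple index, update, or delete operations in a single request.

Different kinds of operations can be mixed: index, update, and delete operations may all appear in the same request.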

BulkRequest bulkRequest = new BulkRequest();
bulkRequest.add(new DeleteRequest("posts", "doc", "300"));
bulkRequest.add(new UpdateRequest("posts", "doc", "2").doc(XContentType.JSON,"other", "test").fetchSource(true));
bulkRequest.add(new IndexRequest("posts", "doc", "4").source(XContentType.JSON,"field", "baz"));
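Under the hood, the client sends such a request to the `_bulk` REST endpoint as newline-delimited JSON: an action/metadata line per operation, followed by a source line for index and update (update wraps the partial document in `"doc"`), and no source line for delete. The helper class `BulkBodySketch` below is hypothetical and only renders that layout for the three operations above, assuming values need no JSON escaping and leaving out the `fetchSource(true)` option; it is not how the Java client builds the body.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: renders the NDJSON body layout used by the _bulk endpoint.
// Values are assumed to need no JSON escaping; this is not Elasticsearch code.
final class BulkBodySketch {
    private final List<String> lines = new ArrayList<>();

    // Action/metadata line, e.g. {"index":{"_index":"posts","_type":"doc","_id":"4"}}
    private static String meta(String action, String index, String type, String id) {
        return "{\"" + action + "\":{\"_index\":\"" + index + "\",\"_type\":\"" + type
                + "\",\"_id\":\"" + id + "\"}}";
    }

    // index: metadata line followed by the full document source
    BulkBodySketch index(String index, String type, String id, String sourceJson) {
        lines.add(meta("index", index, type, id));
        lines.add(sourceJson);
        return this;
    }

    // update: metadata line followed by the partial document wrapped in "doc"
    BulkBodySketch update(String index, String type, String id, String partialJson) {
        lines.add(meta("update", index, type, id));
        lines.add("{\"doc\":" + partialJson + "}");
        return this;
    }

    // delete: metadata line only, no source line
    BulkBodySketch delete(String index, String type, String id) {
        lines.add(meta("delete", index, type, id));
        return this;
    }

    // Every line, including the last one, is terminated by a newline.
    String build() {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = new BulkBodySketch()
                .delete("posts", "doc", "300")
                .update("posts", "doc", "2", "{\"other\":\"test\"}")
                .index("posts", "doc", "4", "{\"field\":\"baz\"}")
                .build();
        System.out.print(body);
    }
}
```

Three operations produce five lines here, because the delete contributes only its metadata line.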

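2. Apart from adding the individual per-document requests with methods such as BulkRequest add(IndexRequest request), the general parameters of a BulkRequest are set the same way as for single-document operations: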

bulkRequest.timeout(TimeValue.timeValueMinutes(2));
bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);

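Note that a setting for an individual document operation belongs on that operation's own request inside the add call, for example asking an update operation to return the updated document source with .fetchSource(true):

bulkRequest.add(new UpdateRequest("posts", "doc", "2").doc(XContentType.JSON,"other", "test").fetchSource(true));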
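The bulk-level settings shown above apply to the whole request and end up as query parameters on `POST /_bulk`: a two-minute timeout is sent as `timeout=2m`, and `RefreshPolicy.WAIT_UNTIL` as `refresh=wait_for` (wait until the changes are visible to search instead of forcing a refresh). The sketch below is hypothetical: its nested enum only imitates `WriteRequest.RefreshPolicy`, and the time formatting handles just whole minutes and seconds.

```java
import java.time.Duration;

// Hypothetical sketch of how bulk-level settings map to _bulk query parameters.
// The enum imitates WriteRequest.RefreshPolicy; it is not the Elasticsearch class.
final class BulkParamsSketch {
    enum RefreshPolicy {
        NONE("false"), IMMEDIATE("true"), WAIT_UNTIL("wait_for");

        final String value;

        RefreshPolicy(String value) {
            this.value = value;
        }
    }

    // Whole minutes render as "Nm"; anything else falls back to seconds ("Ns").
    static String timeValue(Duration d) {
        long seconds = d.getSeconds();
        return seconds % 60 == 0 ? (seconds / 60) + "m" : seconds + "s";
    }

    static String bulkPath(Duration timeout, RefreshPolicy refresh) {
        return "/_bulk?timeout=" + timeValue(timeout) + "&refresh=" + refresh.value;
    }

    public static void main(String[] args) {
        // Mirrors timeout(TimeValue.timeValueMinutes(2)) and WAIT_UNTIL from the text
        System.out.println(bulkPath(Duration.ofMinutes(2), RefreshPolicy.WAIT_UNTIL));
    }
}
```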

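3. A BulkResponse receives the result of the execution. It contains information about the executed operations and can be iterated to inspect the result of each one: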

for (BulkItemResponse bulkItemResponse : bulkResponse) { 
    DocWriteResponse itemResponse = bulkItemResponse.getResponse(); 

    if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.INDEX
            || bulkItemResponse.getOpType() == DocWriteRequest.OpType.CREATE) { 
        IndexResponse indexResponse = (IndexResponse) itemResponse;

    } else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.UPDATE) { 
        UpdateResponse updateResponse = (UpdateResponse) itemResponse;

    } else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.DELETE) { 
        DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
    }
}
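One detail the loop above glosses over: for a failed item, `bulkItemResponse.getResponse()` returns null, so the failure should be checked before the response is used. The items also come back in the same order in which the requests were added. The in-memory model below is hypothetical (`Item` only stands in for `BulkItemResponse`) and shows the "check failure first, then dispatch on op type" order using a plain `switch`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for BulkItemResponse: it only mimics the rule that a
// failed item carries a failure status and no response. Not Elasticsearch code.
final class ItemDispatchSketch {
    enum OpType { INDEX, CREATE, UPDATE, DELETE }

    static final class Item {
        final String id;
        final OpType opType;
        final String result;          // plays the role of getResponse().getResult(); null if failed
        final Integer failureStatus;  // plays the role of getFailure().getStatus(); null on success

        Item(String id, OpType opType, String result, Integer failureStatus) {
            this.id = id;
            this.opType = opType;
            this.result = result;
            this.failureStatus = failureStatus;
        }

        boolean isFailed() {
            return failureStatus != null;
        }
    }

    static List<String> describe(List<Item> items) {
        List<String> out = new ArrayList<>();
        for (Item item : items) {
            // Check for failure first: a failed item has no response to cast.
            if (item.isFailed()) {
                out.add(item.id + ": failed with status " + item.failureStatus);
                continue;
            }
            switch (item.opType) {
                case INDEX:
                case CREATE:
                    out.add(item.id + ": index/create, result " + item.result);
                    break;
                case UPDATE:
                    out.add(item.id + ": update, result " + item.result);
                    break;
                case DELETE:
                    out.add(item.id + ": delete, result " + item.result);
                    break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("300", OpType.DELETE, "NOT_FOUND", null),
                new Item("2", OpType.UPDATE, null, 404),   // e.g. update of a missing document
                new Item("4", OpType.INDEX, "CREATED", null));
        for (String line : describe(items)) {
            System.out.println(line);
        }
    }
}
```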

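Note: bulkItemResponse.getOpType() returns the operation type of the request that was passed to add, not the operation that was actually performed on the document. For example, suppose the request contains: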

bulkRequest.add(new IndexRequest("posts", "doc2", "1").source(XContentType.JSON,"field", "foo"));

If the document does not exist, it is created, and the following code runs; in this case treating the item as a successful creation is correct:

if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.INDEX || bulkItemResponse.getOpType() == DocWriteRequest.OpType.CREATE) {
     IndexResponse indexResponse = (IndexResponse) itemResponse;
     System.out.println("document id=" + indexResponse.getId() + " created successfully");
     System.out.println("document id=" + indexResponse.getId() + " operation result: " + indexResponse.getResult());
}

But if the document already exists, the existing document is updated (not created). The code above still runs, while the check

bulkItemResponse.getOpType() == DocWriteRequest.OpType.UPDATE

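returns false. So to find out which operation was actually performed on the document, use:

DocWriteResponse itemResponse = bulkItemResponse.getResponse();
IndexResponse indexResponse = (IndexResponse) itemResponse;
DocWriteResponse.Result result = indexResponse.getResult();

Both itemResponse.getResult() and indexResponse.getResult() return the actual outcome, for example CREATED or UPDATED.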
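The distinction can be modeled with a plain map. In the hypothetical sketch below, the request's op type is always INDEX, and only the returned result tells whether the id was new (CREATED) or an existing document was overwritten (UPDATED). It mimics the behavior described above and is not Elasticsearch code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one index: it only imitates how an INDEX
// operation reports its outcome. Not Elasticsearch code.
final class IndexResultSketch {
    private final Map<String, String> docs = new HashMap<>();

    // The op type is always INDEX; the result reveals what actually happened.
    String index(String id, String source) {
        String previous = docs.put(id, source);
        return previous == null ? "CREATED" : "UPDATED";
    }

    public static void main(String[] args) {
        IndexResultSketch posts = new IndexResultSketch();
        System.out.println(posts.index("1", "{\"field\":\"foo\"}")); // first write of id 1
        System.out.println(posts.index("1", "{\"field\":\"bar\"}")); // same id again
    }
}
```

Running `main` prints CREATED and then UPDATED, although both calls are the same "index" operation.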

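4. If the Elasticsearch server has no document with id 1, the following request creates a document with id 1 automatically.

Likewise, if the posts index does not exist, it is created automatically from the index/type/id of the request, together with the document (automatic index creation is enabled by default):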

bulkRequest.add(new IndexRequest("posts", "doc", "1").source(XContentType.JSON,"field", "foo"));

However, if the posts index already contains documents of type doc, a request like the following fails:

bulkRequest.add(new IndexRequest("posts", "doc2", "1").source(XContentType.JSON,"field", "foo"));

The error message is:

Rejecting mapping update to [posts] as the final mapping would have more than 1 type: [doc2, doc]

Reason: an index created in Elasticsearch 6.0 or later can contain only one mapping type, so a document of a second type cannot be created automatically in posts.

In code, you can detect this kind of invalid operation as follows:

if (bulkItemResponse.getFailure() != null) {
    BulkItemResponse.Failure failure = bulkItemResponse.getFailure();
    if (failure.getStatus() == RestStatus.BAD_REQUEST) {
        System.out.println("document id=" + bulkItemResponse.getId() + ": invalid request!");
        continue;
    }
}

For an IndexRequest that should create the document but must not update it if it already exists, use the following setting:

bulkRequest.add(new IndexRequest("posts", "doc", "5").source(XContentType.JSON,"field", "foo").opType(DocWriteRequest.OpType.CREATE));

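That is, add .opType(DocWriteRequest.OpType.CREATE). If the document already exists, the item then fails with failure.getStatus() == RestStatus.CONFLICT, and this case has to be handled so that the code does not throw: a failed item's getResponse() is null.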

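if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.INDEX || bulkItemResponse.getOpType() == DocWriteRequest.OpType.CREATE) {
    // A failed item has no response, so handle every failure before casting
    if (bulkItemResponse.isFailed()) {
        if (bulkItemResponse.getFailure().getStatus() == RestStatus.CONFLICT) {
            System.out.println("document id=" + bulkItemResponse.getId() + " conflicts with an existing document");
        } else {
            System.out.println("document id=" + bulkItemResponse.getId() + " failed: " + bulkItemResponse.getFailureMessage());
        }
        continue;
    }
    IndexResponse indexResponse = (IndexResponse) itemResponse;
    System.out.println("document id=" + indexResponse.getId() + " indexed successfully");
    System.out.println("document id=" + indexResponse.getId() + " operation result: " + itemResponse.getResult());
}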
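The difference between INDEX and CREATE can be summarized by the per-item HTTP status codes: INDEX returns 201 for a new document and 200 when it overwrites an existing one, while CREATE returns 409 (CONFLICT) for an existing id and leaves that document untouched. The map-based model below is a hypothetical illustration of those semantics, not the server's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model contrasting OpType.INDEX and OpType.CREATE on one index.
// The returned numbers stand for the per-item HTTP status codes. Not Elasticsearch code.
final class CreateConflictSketch {
    enum OpType { INDEX, CREATE }

    private final Map<String, String> docs = new HashMap<>();

    int write(OpType opType, String id, String source) {
        boolean exists = docs.containsKey(id);
        if (opType == OpType.CREATE && exists) {
            return 409; // CONFLICT: the existing document is not modified
        }
        docs.put(id, source);
        return exists ? 200 : 201; // 200 = existing document overwritten, 201 = created
    }

    String source(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        CreateConflictSketch posts = new CreateConflictSketch();
        System.out.println(posts.write(OpType.CREATE, "5", "{\"field\":\"foo\"}"));
        System.out.println(posts.write(OpType.CREATE, "5", "{\"field\":\"bar\"}"));
        System.out.println(posts.write(OpType.INDEX, "5", "{\"field\":\"baz\"}"));
    }
}
```

The three calls print 201, 409 and 200: the second CREATE is rejected, while the INDEX request silently replaces the document.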

 

5. For delete operations, without an extra check the following if branch is always entered, even when the document does not exist:

if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.DELETE) { 
    DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
}

So to detect a document that does not exist, check the result:

if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.DELETE) {
    DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
    if (deleteResponse.getResult() == DocWriteResponse.Result.NOT_FOUND) {
        System.out.println("document id=" + deleteResponse.getId() + " not found, nothing deleted!");
    } else {
        System.out.println("document id=" + deleteResponse.getId() + " deleted successfully");
    }
}
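Deleting a missing document is not reported as a failed item: the item still carries a normal response, and only its result (NOT_FOUND instead of DELETED) reveals that nothing was removed. The set-based model below is a hypothetical illustration of that behavior, not Elasticsearch code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of delete results on one index. Not Elasticsearch code.
final class DeleteResultSketch {
    private final Set<String> ids;

    DeleteResultSketch(String... existingIds) {
        ids = new HashSet<>(Arrays.asList(existingIds));
    }

    // A missing id is not an error: the outcome is simply NOT_FOUND,
    // which is why the DELETE branch runs in both cases.
    String delete(String id) {
        return ids.remove(id) ? "DELETED" : "NOT_FOUND";
    }

    public static void main(String[] args) {
        DeleteResultSketch posts = new DeleteResultSketch("1", "3");
        System.out.println(posts.delete("3"));   // existing document
        System.out.println(posts.delete("300")); // id that was never indexed
    }
}
```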

6. Complete code example:

package com.example.elasticsearch.document;

import org.apache.http.HttpHost;
import org.elasticsearch.action.DocWriteRequest;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.support.WriteRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.rest.RestStatus;

/**
 * Created with IntelliJ IDEA.
 *
 * @Author: Weichang Zhong
 * @Date: 2018/11/7
 * @Time: 16:26
 * @Description:
 */
public class SynBulkRequest {

    public static void main(String[] args) {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("127.0.0.1", 9200, "http")
                )
        )) {

            BulkRequest bulkRequest = new BulkRequest();
            bulkRequest.add(new IndexRequest("posts", "doc", "5").source(XContentType.JSON,"field", "foo").opType(DocWriteRequest.OpType.CREATE));
            bulkRequest.add(new IndexRequest("posts2000", "doc", "2").source(XContentType.JSON,"field", "bar"));
            bulkRequest.add(new IndexRequest("posts", "doc", "3").source(XContentType.JSON,"field", "baz"));

            bulkRequest.add(new DeleteRequest("posts", "doc", "300"));
            bulkRequest.add(new UpdateRequest("posts", "doc", "2").doc(XContentType.JSON,"other", "test").fetchSource(true));
            bulkRequest.add(new IndexRequest("posts", "doc", "4").source(XContentType.JSON,"field", "baz"));

            bulkRequest.timeout(TimeValue.timeValueMinutes(2));
            bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
            BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);

            for (BulkItemResponse bulkItemResponse : bulkResponse) {
                // Handle failed items first: their getResponse() is null
                if (bulkItemResponse.isFailed()) {
                    BulkItemResponse.Failure failure = bulkItemResponse.getFailure();
                    System.out.println(failure.getCause());
                    if (failure.getStatus() == RestStatus.BAD_REQUEST) {
                        System.out.println("document id=" + bulkItemResponse.getId() + ": invalid request!");
                    } else if (failure.getStatus() == RestStatus.CONFLICT) {
                        System.out.println("document id=" + bulkItemResponse.getId() + " conflicts with an existing document");
                    } else {
                        System.out.println("document id=" + bulkItemResponse.getId() + " failed with status " + failure.getStatus());
                    }
                    continue;
                }

                DocWriteResponse itemResponse = bulkItemResponse.getResponse();

                if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.INDEX || bulkItemResponse.getOpType() == DocWriteRequest.OpType.CREATE) {
                    IndexResponse indexResponse = (IndexResponse) itemResponse;
                    System.out.println("document id=" + indexResponse.getId() + " indexed successfully");
                    System.out.println("document id=" + indexResponse.getId() + " operation result: " + itemResponse.getResult());
                } else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.UPDATE) {
                    UpdateResponse updateResponse = (UpdateResponse) itemResponse;
                    System.out.println("document id=" + updateResponse.getId() + " updated successfully");
                    // getGetResult() is populated because the update request used fetchSource(true)
                    System.out.println("document id=" + updateResponse.getId() + " source: " + updateResponse.getGetResult().sourceAsString());
                } else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.DELETE) {
                    DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
                    if (deleteResponse.getResult() == DocWriteResponse.Result.NOT_FOUND) {
                        System.out.println("document id=" + deleteResponse.getId() + " not found, nothing deleted!");
                    } else {
                        System.out.println("document id=" + deleteResponse.getId() + " deleted successfully");
                    }
                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

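Note: a bulk request cannot contain get operations. BulkRequest only accepts write requests (index, create, update, delete), and GetRequest is not one of them, so there is no matching add overload and the following code does not compile. To read several documents in one round trip, use a multi get request (MultiGetRequest) instead: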

bulkRequest.add(new GetRequest("posts", "doc", "22"));

 
