Fixing data loss when writing to ES with Bulk operations

        private static void output(SearchHit[] hits) {
            BulkRequestBuilder prepareBulk = writeClient.prepareBulk();
            for (SearchHit sh : hits) {
                prepareBulk.add(writeClient
                        .prepareIndex(writeIndex, sh.getType(), sh.getId())
                        .setSource(sh.getSource()));
            }
            if (prepareBulk.numberOfActions() >= 1) {
                prepareBulk.get();
            }
        }

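The code above uses the prepareBulk() and prepareIndex() methods. When processing millions of documents, a varying number of documents was lost on every run. Switching to the API below and also tuning the bulk configuration solved the problem: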
 

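        private static void output1(SearchHit[] hits) {
            // Build the bulk request directly instead of going through the prepare* builders.
            BulkRequest bulkRequest = new BulkRequest();
            for (SearchHit sh : hits) {
                IndexRequest indexRequest = new IndexRequest(writeIndex, sh.getType(), sh.getId())
                        .source(sh.getSource());
                bulkRequest.add(indexRequest);
            }
            if (bulkRequest.numberOfActions() >= 1) {
                // With the transport client, bulk() returns a future; the BulkResponse
                // it yields should be inspected for per-item failures.
                writeClient.bulk(bulkRequest);
            }
        }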

References for tuning bulk operations:

https://blog.csdn.net/liyantianmin/article/details/77935636

https://blog.csdn.net/aa5305123/article/details/86542105

Set the number of replicas to 0 for the initial indexing
threadpool.index.queue_size: 1000
indices.memory.index_buffer_size: 20%
index.translog.durability: async
index.translog.flush_threshold_size: 600MB
index.translog.flush_threshold_ops: 500000
threadpool.bulk.queue_size: 1000

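When bulk indexing puts too much pressure on the cluster, http://10.69.46.34:9200//_nodes/stats?pretty shows a large number of rejections under bulk, and printing the BulkResponse returned by each bulk call in the code also reveals many EsRejectedExecutionException errors. This was the main cause of my data loss.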

 "bulk" : {
          "threads" : 4,
          "queue" : 0,
          "active" : 0,
          "rejected" : 1337077,
          "largest" : 4,
          "completed" : 22935863
        }
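
Since rejected items are silently dropped unless the response is checked, one option is to resend only the rejected items after a short pause. The following is a minimal sketch of that idea in plain Java, not the code used in this post: BulkRetry, sendWithRetry and the sender function are hypothetical names. In a real client the sender would presumably wrap writeClient.bulk(bulkRequest).actionGet() and collect the items for which BulkItemResponse.isFailed() returns true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: resends only the items that a bulk call reports as rejected. */
public class BulkRetry {

    /**
     * Sends the items through sender, which must return the sub-list of items
     * that were rejected. Rejected items are retried with exponential backoff.
     *
     * @return the items still rejected after maxRetries retries (empty if all were accepted)
     */
    public static <T> List<T> sendWithRetry(List<T> items,
                                            Function<List<T>, List<T>> sender,
                                            int maxRetries,
                                            long initialBackoffMillis) {
        List<T> pending = new ArrayList<>(items);
        long backoff = initialBackoffMillis;
        // Attempt 0 is the initial send; attempts 1..maxRetries are retries.
        for (int attempt = 0; attempt <= maxRetries && !pending.isEmpty(); attempt++) {
            if (attempt > 0) {
                try {
                    Thread.sleep(backoff); // give the bulk queue time to drain
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return pending; // stop retrying, report what is still unsent
                }
                backoff *= 2;
            }
            pending = new ArrayList<>(sender.apply(pending));
        }
        return pending;
    }
}
```

Whatever sendWithRetry returns should be logged or persisted by the caller, so that documents that were never accepted are at least visible instead of disappearing.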

After applying the bulk tuning settings above, the rejected count dropped to 0 and bulk writes no longer lose data:

"bulk" : {
          "threads" : 4,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 4,
          "completed" : 149784
        },
