Elasticsearch insert source code analysis, plus some thoughts

Official API: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-bulk.html

A colleague from ES support once told me that my setup was already asynchronous: it would automatically validate size and count and then submit the data, rather than working synchronously. I was busy at the time and never verified it. When I finally found time to dig in, it turned out that colleague had it wrong.
The bulk API of the ES REST client is synchronous by default, not asynchronous; bulkAsync is the asynchronous one.
So I walked through the code again to straighten out my own thinking.
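To see the difference concretely, here is a minimal sketch; the index name, the document, and the pre-built RestHighLevelClient are my own assumptions, not taken from the official docs:

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class BulkSyncVsAsync {
    static void demo(RestHighLevelClient client) throws Exception {
        BulkRequest request = new BulkRequest();
        request.add(new IndexRequest("posts").id("1").source(XContentType.JSON, "field", "value"));

        // Synchronous: bulk() blocks the calling thread until the response comes back.
        BulkResponse syncResponse = client.bulk(request, RequestOptions.DEFAULT);

        // Asynchronous: bulkAsync() returns immediately; the listener fires on completion.
        client.bulkAsync(request, RequestOptions.DEFAULT, new ActionListener<BulkResponse>() {
            @Override
            public void onResponse(BulkResponse bulkResponse) { /* handle success */ }

            @Override
            public void onFailure(Exception e) { /* handle failure */ }
        });
    }
}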

But that raises a question: why go asynchronous at all? It comes down to how ES ingests data. Every request submitted to ES eventually produces segment files, so submitting 1000 documents as 1000 requests performs completely differently from submitting them as one request of 1000. ES does merge segments, but merging has its own performance cost.
I haven't read that part of the code yet, so for reference see this article:
https://blog.csdn.net/jiaojiao521765146514/article/details/83753215
I personally ran into a version of this problem: an upstream caller could not split its large payloads into smaller requests, which caused occasional traffic spikes. I do have a retry mechanism, but in the long run that clearly does not scale, and that is what prompted the investigation below.
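To make the cost difference concrete, a hedged sketch (index name and documents are invented; this reuses the client from the sketch above) contrasting 1000 single index calls with one bulk call:

// 1000 separate requests: 1000 network round trips, and far more segment/refresh churn to absorb.
for (int i = 0; i < 1000; i++) {
    client.index(new IndexRequest("posts").id(String.valueOf(i))
            .source(XContentType.JSON, "n", i), RequestOptions.DEFAULT);
}

// One bulk request: a single round trip carrying all 1000 documents.
BulkRequest bulk = new BulkRequest();
for (int i = 0; i < 1000; i++) {
    bulk.add(new IndexRequest("posts").id(String.valueOf(i))
            .source(XContentType.JSON, "n", i));
}
client.bulk(bulk, RequestOptions.DEFAULT);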

Before the source analysis, let me share a thought on ES query optimization. ES queries run on an inverted index. I used to assume the inverted index was simply loaded into memory, spilling to disk when it didn't fit, and that never felt quite right. After checking the Lucene docs, I was impressed: the term dictionary is huge, so a term index is built on top of it.
That term index is organized as a sorted tree structure (Lucene actually uses an FST rather than a literal binary tree, but the intuition is the same), which brings lookup down to O(log N).
And log N means that no matter how large the data grows, the number of comparisons stays tiny.
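A toy illustration of the log N idea (a deliberate simplification, as noted above; real term indexes are far more compact structures):

import java.util.Arrays;

public class TermIndexDemo {
    public static void main(String[] args) {
        // a sorted stand-in for a term dictionary; real ones hold millions of terms
        String[] terms = {"apple", "banana", "cherry", "durian", "elastic", "lucene", "search"};
        // binary search touches at most ~log2(N) entries instead of scanning all N
        int pos = Arrays.binarySearch(terms, "lucene");
        System.out.println("found at " + pos); // prints: found at 5
    }
}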

On that note, I recommend GeekTime's algorithms course and The Illustrated Guide to Algorithms (漫画算法).
As for Introduction to Algorithms: I'm in awe, I just can't get through it. And of course the other big use of algorithm problems is interviews, haha. Wishing myself luck on those.

One more addition on caching: ES caches are split into the query cache, the request cache, and fielddata.
So it pays to merge frequently used indices where possible to raise hit rates, and to give the caches more memory.
That way, what gets evicted each time are the low-frequency indices.
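For example, the shard request cache can be hinted per search. A sketch (index and field names are invented; note that by default only size-0, aggregation-style requests are cached):

SearchRequest searchRequest = new SearchRequest("logs-merged");
searchRequest.source(new SearchSourceBuilder().size(0)
        .aggregation(AggregationBuilders.terms("by_level").field("level")));
searchRequest.requestCache(true); // opt this request into the shard request cache
SearchResponse resp = client.search(searchRequest, RequestOptions.DEFAULT);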

/**
 * @deprecated If creating a new HLRC ReST API call, consider creating new actions instead of reusing server actions. The Validation
 * layer has been added to the ReST client, and requests should extend {@link Validatable} instead of {@link ActionRequest}.
 */
@Deprecated
protected final <Req extends ActionRequest, Resp> Resp performRequest(Req request,
                                                                       CheckedFunction<Req, Request, IOException> requestConverter,
                                                                       RequestOptions options,
                                                                       CheckedFunction<Response, Resp, IOException> responseConverter,
                                                                       Set<Integer> ignores) throws IOException {

    // the request is validated here, before anything goes over the wire
    ActionRequestValidationException validationException = request.validate();
    if (validationException != null && validationException.validationErrors().isEmpty() == false) {
        throw validationException;
    }
    return internalPerformRequest(request, requestConverter, options, responseConverter, ignores);
}
@Override
// note: this is the subclass (BulkRequest) override; the parent is ActionRequest
public ActionRequestValidationException validate() {
    ActionRequestValidationException validationException = null;
    if (requests.isEmpty()) {
        validationException = addValidationError("no requests added", validationException);
    }
    for (DocWriteRequest<?> request : requests) {
        // We first check if refresh has been set
        if (((WriteRequest<?>) request).getRefreshPolicy() != RefreshPolicy.NONE) {
            validationException = addValidationError(
                    "RefreshPolicy is not supported on an item request. Set it on the BulkRequest instead.", validationException);
        }
        // this validate() is delegated to each item request (IndexRequest, UpdateRequest, ...)
        ActionRequestValidationException ex = ((WriteRequest<?>) request).validate();
        if (ex != null) {
            if (validationException == null) {
                validationException = new ActionRequestValidationException();
            }
            validationException.addValidationErrors(ex.validationErrors());
        }
    }

    return validationException;
}

In my case all the item requests are IndexRequests, whose validate() is mostly basic field checks, nothing too deep:

@Override
public ActionRequestValidationException validate() {
    ActionRequestValidationException validationException = super.validate();
    if (source == null) {
        validationException = addValidationError("source is missing", validationException);
    }
    if (Strings.isEmpty(type())) {
        validationException = addValidationError("type is missing", validationException);
    }
    if (contentType == null) {
        validationException = addValidationError("content type is missing", validationException);
    }
    final long resolvedVersion = resolveVersionDefaults();
    if (opType() == OpType.CREATE) {
        if (versionType != VersionType.INTERNAL) {
            validationException = addValidationError("create operations only support internal versioning. use index instead",
                validationException);
            return validationException;
        }

        if (resolvedVersion != Versions.MATCH_DELETED) {
            validationException = addValidationError("create operations do not support explicit versions. use index instead",
                validationException);
            return validationException;
        }

        if (ifSeqNo != UNASSIGNED_SEQ_NO || ifPrimaryTerm != UNASSIGNED_PRIMARY_TERM) {
            validationException = addValidationError("create operations do not support compare and set. use index instead",
                validationException);
            return validationException;
        }
    }

    if (opType() != OpType.INDEX && id == null) {
        addValidationError("an id is required for a " + opType() + " operation", validationException);
    }

    validationException = DocWriteRequest.validateSeqNoBasedCASParams(this, validationException);

    if (id != null && id.getBytes(StandardCharsets.UTF_8).length > 512) {
        validationException = addValidationError("id is too long, must be no longer than 512 bytes but was: " +
                        id.getBytes(StandardCharsets.UTF_8).length, validationException);
    }

    if (id == null && (versionType == VersionType.INTERNAL && resolvedVersion == Versions.MATCH_ANY) == false) {
        validationException = addValidationError("an id must be provided if version type or value are set", validationException);
    }

    if (pipeline != null && pipeline.isEmpty()) {
        validationException = addValidationError("pipeline cannot be an empty string", validationException);
    }


    return validationException;
}
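So in practice the validation trips before any network call. A sketch (index name is invented):

BulkRequest bad = new BulkRequest();
bad.add(new IndexRequest("posts").id("1")); // note: no source() set

// client.bulk(bad, RequestOptions.DEFAULT) throws ActionRequestValidationException
// (roughly "Validation Failed: 1: source is missing; 2: content type is missing;")
// before any HTTP request is sent.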

Then comes
internalPerformRequest(request, requestConverter, options, responseConverter, ignores);
Following it down to the code that actually performs the call: although it uses the async client, the trailing get() turns it into a blocking call.

private Response performRequest(final NodeTuple<Iterator<Node>> nodeTuple,
                                final InternalRequest request,
                                Exception previousException) throws IOException {
    RequestContext context = request.createContextForNextAttempt(nodeTuple.nodes.next(), nodeTuple.authCache);
    HttpResponse httpResponse;
    try {
        // the client is a CloseableHttpAsyncClient, but the trailing get() blocks, so the call is effectively synchronous
        httpResponse = client.execute(context.requestProducer, context.asyncResponseConsumer, context.context, null).get();
    } catch(Exception e) {
        RequestLogger.logFailedRequest(logger, request.httpRequest, context.node, e);
        onFailure(context.node);
        Exception cause = extractAndWrapCause(e);
        addSuppressedException(previousException, cause);
        if (nodeTuple.nodes.hasNext()) {
            return performRequest(nodeTuple, request, cause);
        }
        if (cause instanceof IOException) {
            throw (IOException) cause;
        }
        if (cause instanceof RuntimeException) {
            throw (RuntimeException) cause;
        }
        throw new IllegalStateException("unexpected exception type: must be either RuntimeException or IOException", cause);
    }
    ResponseOrResponseException responseOrResponseException = convertResponse(request, context.node, httpResponse);
    if (responseOrResponseException.responseException == null) {
        return responseOrResponseException.response;
    }
    addSuppressedException(previousException, responseOrResponseException.responseException);
    if (nodeTuple.nodes.hasNext()) {
        return performRequest(nodeTuple, request, responseOrResponseException.responseException);
    }
    throw responseOrResponseException.responseException;
}

The get() call above is the key line: the asynchronous client is being driven synchronously.

A supplement before the next step.

BulkProcessor flushes along three dimensions:
1. how many actions have been added
2. the total payload size in bytes
3. a periodic flush interval

Usage goes through BulkProcessor:

public static BulkProcessor getBulkProcessor(RestHighLevelClient restHighLevelClient) {

    BiConsumer<BulkRequest, ActionListener<BulkResponse>> bulkConsumer =
            (request, bulkListener) -> restHighLevelClient.bulkAsync(request, RequestOptions.DEFAULT, bulkListener);

    return BulkProcessor.builder(bulkConsumer, new BulkProcessor.Listener() {

        @Override
        public void beforeBulk(long executionId, BulkRequest bulkRequest) {
        }

        @Override
        public void afterBulk(long executionId, BulkRequest bulkRequest, BulkResponse bulkResponse) {
        }

        @Override
        public void afterBulk(long executionId, BulkRequest bulkRequest, Throwable throwable) {
        }
    }).setBulkActions(5000)
            .setFlushInterval(TimeValue.timeValueSeconds(10))
            .build();

}
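Usage then looks roughly like this (index and field names are my own; this assumes the method above plus a ready restHighLevelClient). add() only buffers; the processor decides when to flush:

BulkProcessor processor = getBulkProcessor(restHighLevelClient);
for (int i = 0; i < 100_000; i++) {
    processor.add(new IndexRequest("posts").id(String.valueOf(i))
            .source(XContentType.JSON, "n", i));
}
// flushes whatever is still buffered and waits up to 30s for in-flight bulks to finish
processor.awaitClose(30, TimeUnit.SECONDS);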


public static Builder builder(BiConsumer<BulkRequest, ActionListener<BulkResponse>> consumer, Listener listener) {
    Objects.requireNonNull(consumer, "consumer");
    Objects.requireNonNull(listener, "listener");
    // a scheduled-task thread pool that drives the periodic flush and the retries
    final ScheduledThreadPoolExecutor scheduledThreadPoolExecutor = Scheduler.initScheduler(Settings.EMPTY);
    return new Builder(consumer, listener,
        buildScheduler(scheduledThreadPoolExecutor),
            () -> Scheduler.terminate(scheduledThreadPoolExecutor, 10, TimeUnit.SECONDS));
}

static ScheduledThreadPoolExecutor initScheduler(Settings settings) {
    // core pool size of 1
    final ScheduledThreadPoolExecutor scheduler = new SafeScheduledThreadPoolExecutor(1,
            EsExecutors.daemonThreadFactory(settings, "scheduler"), new EsAbortPolicy());
    // after shutdown, do NOT run delayed tasks that are still queued
    scheduler.setExecuteExistingDelayedTasksAfterShutdownPolicy(false);
    // after shutdown, do NOT keep running periodic tasks
    scheduler.setContinueExistingPeriodicTasksAfterShutdownPolicy(false);
    // remove tasks from the work queue immediately when they are cancelled
    scheduler.setRemoveOnCancelPolicy(true);
    return scheduler;
}

// ES thread names share this prefix, so it is easy to spot ES threads just by name
public static String threadName(Settings settings, String namePrefix) {
    if (Node.NODE_NAME_SETTING.exists(settings)) {
        return threadName(Node.NODE_NAME_SETTING.get(settings), namePrefix);
    } else {
        // TODO this should only be allowed in tests
        return threadName("", namePrefix);
    }
}
public static String threadName(final String nodeName, final String namePrefix) {
    // TODO missing node names should only be allowed in tests
    return "elasticsearch" + (nodeName.isEmpty() ? "" : "[") + nodeName + (nodeName.isEmpty() ? "" : "]") + "[" + namePrefix + "]";
}
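Tracing the concatenation above, the resulting names look like:

// threadName("node-1", "scheduler") -> "elasticsearch[node-1][scheduler]"
// threadName("",       "scheduler") -> "elasticsearch[scheduler]"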

static class EsThreadFactory implements ThreadFactory {

    final ThreadGroup group;
    final AtomicInteger threadNumber = new AtomicInteger(1);
    final String namePrefix;

    EsThreadFactory(String namePrefix) {
        this.namePrefix = namePrefix;
        SecurityManager s = System.getSecurityManager();
        group = (s != null) ? s.getThreadGroup() :
                Thread.currentThread().getThreadGroup();
    }

    @Override
    public Thread newThread(Runnable r) {
        // mark as a daemon thread
        Thread t = new Thread(group, r,
                namePrefix + "[T#" + threadNumber.getAndIncrement() + "]",
                0);
        t.setDaemon(true);
        return t;
    }

}
// The custom rejection policy force-puts a runnable back onto the queue if it is marked forceExecution;
// BulkProcessor does not use that path here. My guess is that this avoids pile-ups under high concurrency.
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
    if (r instanceof AbstractRunnable && ((AbstractRunnable) r).isForceExecution()) {
        BlockingQueue<Runnable> queue = executor.getQueue();
        if (!(queue instanceof SizeBlockingQueue)) {
            throw new IllegalStateException("forced execution, but expected a size queue");
        } else {
            try {
                ((SizeBlockingQueue<Runnable>) queue).forcePut(r);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("forced execution, but got interrupted", e);
            }
        }
    } else {
        this.rejected.inc();
        throw new EsRejectedExecutionException("rejected execution of " + r + " on " + executor, executor.isShutdown());
    }
}

The key part of builder:

return new Builder(consumer, listener,
    buildScheduler(scheduledThreadPoolExecutor),
    () -> Scheduler.terminate(scheduledThreadPoolExecutor, 10, TimeUnit.SECONDS));

I find buildScheduler and terminate here genuinely interesting: Scheduler.terminate is what shuts the thread pool down.

Calling bulkProcessor.add(new IndexRequest()) leads into internalAdd:

private void internalAdd(DocWriteRequest<?> request) {
    //bulkRequest and instance swapping is not threadsafe, so execute the mutations under a lock.
    //once the bulk request is ready to be shipped swap the instance reference unlock and send the local reference to the handler.
    Tuple<BulkRequest, Long> bulkRequestToExecute = null;
    lock.lock();
    try {
        // if the scheduler thread pool has already been closed, throw right away
        ensureOpen();
        // bulkRequest.add appends to an internal request list; the surrounding lock guarantees thread safety
        bulkRequest.add(request);
        // this is where the bulkSize / bulkActions limits are checked,
        // so a bulk only ships once enough data has accumulated, same spirit as a thread pool spinning up extra threads. Neat.
        bulkRequestToExecute = newBulkRequestIfNeeded();
    } finally {
        lock.unlock();
    }
    //execute sending the local reference outside the lock to allow handler to control the concurrency via it's configuration.
    if (bulkRequestToExecute != null) {
        execute(bulkRequestToExecute.v1(), bulkRequestToExecute.v2());
    }
}


private Tuple<BulkRequest, Long> newBulkRequestIfNeeded() {
    ensureOpen();
    if (!isOverTheLimit()) {
        return null;
    }
    final BulkRequest bulkRequest = this.bulkRequest;
    this.bulkRequest = bulkRequestSupplier.get();
    return new Tuple<>(bulkRequest, executionIdGen.incrementAndGet());
}

private boolean isOverTheLimit() {
    if (bulkActions != -1 && bulkRequest.numberOfActions() >= bulkActions) {
        return true;
    }
    if (bulkSize != -1 && bulkRequest.estimatedSizeInBytes() >= bulkSize) {
        return true;
    }
    return false;
}

Next comes execute. The "aspect" here is refreshingly direct, exactly the kind I like: plain static code rather than AOP magic.
Good work.

public void execute(BulkRequest bulkRequest, long executionId) {
    Runnable toRelease = () -> {};
    boolean bulkRequestSetupSuccessful = false;
    try {
        // the beforeBulk "aspect"
        listener.beforeBulk(executionId, bulkRequest);
        // the semaphore caps how many bulk requests may be in flight concurrently
        semaphore.acquire();
        toRelease = semaphore::release;
        // a latch so that, when concurrentRequests == 0, the caller waits for this bulk to finish,
        // making each bulk return its success or failure synchronously
        CountDownLatch latch = new CountDownLatch(1);
        // the actual submission
        retry.withBackoff(consumer, bulkRequest, ActionListener.runAfter(new ActionListener<BulkResponse>() {
            @Override
            public void onResponse(BulkResponse response) {
                listener.afterBulk(executionId, bulkRequest, response);
            }

            @Override
            public void onFailure(Exception e) {
                listener.afterBulk(executionId, bulkRequest, e);
            }
        }, () -> {
            semaphore.release();
            latch.countDown();
        }));
        bulkRequestSetupSuccessful = true;
        if (concurrentRequests == 0) {
            latch.await();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        logger.info(() -> new ParameterizedMessage("Bulk request {} has been cancelled.", executionId), e);
        listener.afterBulk(executionId, bulkRequest, e);
    } catch (Exception e) {
        logger.warn(() -> new ParameterizedMessage("Failed to execute bulk request {}.", executionId), e);
        listener.afterBulk(executionId, bulkRequest, e);
    } finally {
        if (bulkRequestSetupSuccessful == false) {  // if we fail on client.bulk() release the semaphore
            toRelease.run();
        }
    }
}
// note: the method below lives in the retry handler, an outer class relative to the listener above
// careful with the scheduler here: it only drives failure retries, not the 10-second periodic flush
public void execute(BulkRequest bulkRequest) {
    this.currentBulkRequest = bulkRequest;
    consumer.accept(bulkRequest, this);
}


// and its callbacks below
@Override
public void onResponse(BulkResponse bulkItemResponses) {
    if (!bulkItemResponses.hasFailures()) {
        // we're done here, include all responses
        addResponses(bulkItemResponses, (r -> true));
        finishHim();
    } else {
        if (canRetry(bulkItemResponses)) {
            addResponses(bulkItemResponses, (r -> !r.isFailed()));
            retry(createBulkRequestForRetry(bulkItemResponses));
        } else {
            addResponses(bulkItemResponses, (r -> true));
            finishHim();
        }
    }
}

@Override
public void onFailure(Exception e) {
    if (e instanceof RemoteTransportException && ((RemoteTransportException) e).status() == RETRY_STATUS && backoff.hasNext()) {
        retry(currentBulkRequest);
    } else {
        try {
            listener.onFailure(e);
        } finally {
            if (retryCancellable != null) {
                retryCancellable.cancel();
            }
        }
    }
}

Following this down lands in the lambda we passed in earlier: restHighLevelClient.bulkAsync(request, RequestOptions.DEFAULT, bulkListener).

Not much different from before; it is the same CloseableHttpAsyncClient underneath.
So what actually changes compared to the synchronous path? The listener: instead of blocking on get(), a FutureCallback handles the outcome.

client.execute(context.requestProducer, context.asyncResponseConsumer, context.context, new FutureCallback<HttpResponse>() {
    @Override
    public void completed(HttpResponse httpResponse) {
        try {
            ResponseOrResponseException responseOrResponseException = convertResponse(request, context.node, httpResponse);
            if (responseOrResponseException.responseException == null) {
                listener.onSuccess(responseOrResponseException.response);
            } else {
                if (nodeTuple.nodes.hasNext()) {
                    listener.trackFailure(responseOrResponseException.responseException);
                    performRequestAsync(nodeTuple, request, listener);
                } else {
                    listener.onDefinitiveFailure(responseOrResponseException.responseException);
                }
            }
        } catch(Exception e) {
            listener.onDefinitiveFailure(e);
        }
    }

    @Override
    public void failed(Exception failure) {
        try {
            RequestLogger.logFailedRequest(logger, request.httpRequest, context.node, failure);
            onFailure(context.node);
            if (nodeTuple.nodes.hasNext()) {
                listener.trackFailure(failure);
                performRequestAsync(nodeTuple, request, listener);
            } else {
                listener.onDefinitiveFailure(failure);
            }
        } catch(Exception e) {
            listener.onDefinitiveFailure(e);
        }
    }

    @Override
    public void cancelled() {
        listener.onDefinitiveFailure(new ExecutionException("request was cancelled", null));
    }
});

Now the question: we have seen that add() triggers the size and count dimensions.
But how does the time-based flush dimension get triggered?

.setBulkActions(5000)
.setFlushInterval(TimeValue.timeValueSeconds(10))
.build();

(Note that if setBulkSize is never called, it defaults to 5MB.) Recall from the analysis above that the flush task's Cancellable can report that it cannot be cancelled (cancel() returning false), so subsequent add() calls keep working. The periodic flush itself is wired up when build() invokes the constructor:

BulkProcessor(BiConsumer<BulkRequest, ActionListener<BulkResponse>> consumer, BackoffPolicy backoffPolicy, Listener listener,
              int concurrentRequests, int bulkActions, ByteSizeValue bulkSize, @Nullable TimeValue flushInterval,
              Scheduler scheduler, Runnable onClose, Supplier<BulkRequest> bulkRequestSupplier) {
    this.bulkActions = bulkActions;
    this.bulkSize = bulkSize.getBytes();
    this.bulkRequest = bulkRequestSupplier.get();
    this.bulkRequestSupplier = bulkRequestSupplier;
    this.bulkRequestHandler = new BulkRequestHandler(consumer, backoffPolicy, listener, scheduler, concurrentRequests);
    // Start period flushing task after everything is setup
    this.cancellableFlushTask = startFlushTask(flushInterval, scheduler);
    this.onClose = onClose;
}

// the periodic flush task is started here
private Scheduler.Cancellable startFlushTask(TimeValue flushInterval, Scheduler scheduler) {
    if (flushInterval == null) {
        return new Scheduler.Cancellable() {
            @Override
            public boolean cancel() {
                return false;
            }

            @Override
            public boolean isCancelled() {
                return true;
            }
        };
    }
    final Runnable flushRunnable = scheduler.preserveContext(new Flush());
    return scheduler.scheduleWithFixedDelay(flushRunnable, flushInterval, ThreadPool.Names.GENERIC);
}

From the walkthrough above I got the conclusions I was after:
1. the three flush dimensions
2. how concurrency is enabled
3. how errors are retried
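Putting it all together, a fuller configuration sketch covering all three dimensions plus concurrency and retry (the values are illustrative, not recommendations; listener stands for a BulkProcessor.Listener like the one shown earlier):

BulkProcessor processor = BulkProcessor.builder(
        (request, bulkListener) -> restHighLevelClient.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
        listener)
    .setBulkActions(5000)                                // dimension 1: flush after 5000 actions
    .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))  // dimension 2: flush once the payload reaches 5MB
    .setFlushInterval(TimeValue.timeValueSeconds(10))    // dimension 3: flush every 10s no matter what
    .setConcurrentRequests(1)                            // semaphore permits: one bulk in flight while the next buffers
    .setBackoffPolicy(BackoffPolicy.exponentialBackoff(  // retry rejected (429) bulks with exponential backoff
            TimeValue.timeValueMillis(100), 3))
    .build();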
