Flink bulk writes to Elasticsearch fail with "I/O reactor status: STOPPED"

The error stack trace is as follows:

java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
        at org.apache.http.util.Asserts.check(Asserts.java:46)
        at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
        at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
        at org.elasticsearch.client.RestClient.lambda$performRequestAsync$0(RestClient.java:327)
        at org.elasticsearch.client.Cancellable.runIfNotCancelled(Cancellable.java:81)
        at org.elasticsearch.client.RestClient.performRequestAsync(RestClient.java:325)
        at org.elasticsearch.client.RestClient.performRequestAsync(RestClient.java:314)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequestAsync(RestHighLevelClient.java:1653)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAsync(RestHighLevelClient.java:1614)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAsyncAndParseEntity(RestHighLevelClient.java:1580)
        at org.elasticsearch.client.RestHighLevelClient.bulkAsync(RestHighLevelClient.java:509)
        at org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge.lambda$createBulkProcessorBuilder$0(Elasticsearch7ApiCallBridge.java:83)
        at org.elasticsearch.action.bulk.Retry$RetryHandler.execute(Retry.java:205)
        at org.elasticsearch.action.bulk.Retry.withBackoff(Retry.java:59)
        at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:62)
        at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:455)
        at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:389)
        at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:361)
        at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:347)
        at org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7BulkProcessorIndexer.add(Elasticsearch7BulkProcessorIndexer.java:72)
        at org.leaf.MyElasticsearchSinkFunction.process(MyElasticsearchSinkFunction.java:27)
        at org.leaf.MyElasticsearchSinkFunction.process(MyElasticsearchSinkFunction.java:12)
        at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.invoke(ElasticsearchSinkBase.java:310)
        at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:641)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:616)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:596)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:730)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:708)
        at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
        at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$S

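I searched Baidu over and over, and every answer said the RestClient gets killed under high concurrency, but none explained how. After digging in, I found the cause was a bug in my own code. I had set a failure handler on the Elasticsearch sink, so when the ES connection failed, the error went straight into that failure handler, and my code never rethrew it. Flink therefore considered the job healthy and kept sending data, even though the ES RestClient had already been shut down the first time the exception occurred (the underlying client closed itself because of that exception).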
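One way to avoid this silent failure, sketched for the legacy Flink Elasticsearch 7 connector that appears in the stack trace above (it needs flink-connector-elasticsearch7 on the classpath), is a failure handler that always rethrows, so the sink task fails and Flink can restart it. The class name FailFastFailureHandler is hypothetical; ActionRequestFailureHandler and its onFailure signature belong to the connector.

```java
import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.elasticsearch.action.ActionRequest;

/** Hypothetical handler that never swallows errors: every bulk failure fails the sink task. */
final class FailFastFailureHandler implements ActionRequestFailureHandler {

    private static final long serialVersionUID = 1L;

    @Override
    public void onFailure(ActionRequest action, Throwable failure,
                          int restStatusCode, RequestIndexer indexer) throws Throwable {
        // Log here if you like, but always rethrow. If the client has already shut down,
        // every later request would only fail again with "I/O reactor status: STOPPED".
        throw failure;
    }
}
```

Register it with esSinkBuilder.setFailureHandler(new FailFastFailureHandler()), where esSinkBuilder is your ElasticsearchSink.Builder. Flink's default handler, NoOpFailureHandler, already fails the sink on every error, so simply not installing a handler that swallows exceptions has the same effect. If only bulk-queue rejections should be retried, the connector also ships RetryRejectedExecutionFailureHandler, which re-adds requests rejected with EsRejectedExecutionException and rethrows everything else.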

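A job running in Flink should have checkpointing enabled and a restart strategy configured, so that when it hits an exception it cannot handle, Flink restarts it.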
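A minimal sketch of that setup with the Flink 1.x DataStream API (requires Flink on the classpath). The class name, the 60-second checkpoint interval, and the 3 attempts with a 10-second delay are illustrative assumptions, not recommended values.

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/** Hypothetical helper that builds an environment with checkpointing and a restart strategy. */
final class JobEnvironment {

    static StreamExecutionEnvironment create() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint every 60 s (illustrative). With checkpointing on, the legacy
        // Elasticsearch sink flushes pending bulk requests on each checkpoint by default.
        env.enableCheckpointing(60_000L);
        // On task failure, restart up to 3 times with 10 s between attempts (illustrative).
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));
        return env;
    }
}
```

When a failed task is restarted, the sink's open() runs again and creates a fresh client, which is what gets the job out of the STOPPED state.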

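Further investigation shows the problem is also related to httpasyncclient. In the Elasticsearch High Level REST Client library, bulk inserts are ultimately issued through the asynchronous API. In Flink, if the I/O thread count of the underlying HTTP client is not configured, it defaults to the number of CPU cores on the server.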
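To see why that default matters, the arithmetic can be sketched in plain Java with no Flink or Elasticsearch dependency. It assumes each sink subtask owns one client whose dispatcher-thread count equals the core count, and it counts only dispatcher threads, so the result is a lower bound. The class and method names and the figure of 16 subtasks are hypothetical.

```java
/**
 * Hypothetical estimate: a lower bound on the HTTP I/O dispatcher threads that
 * Elasticsearch sink subtasks create inside one TaskManager JVM (one client per subtask).
 */
final class IoThreadEstimate {

    private IoThreadEstimate() {}

    /**
     * @param sinkSubtasks       sink subtasks running in the same TaskManager JVM
     * @param ioThreadsPerClient dispatcher threads per client (httpasyncclient defaults to the core count)
     */
    static long dispatcherThreads(int sinkSubtasks, int ioThreadsPerClient) {
        if (sinkSubtasks < 0 || ioThreadsPerClient < 1) {
            throw new IllegalArgumentException("sinkSubtasks must be >= 0 and ioThreadsPerClient >= 1");
        }
        return (long) sinkSubtasks * ioThreadsPerClient;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int sinkSubtasks = 16; // hypothetical number of sink subtasks on one TaskManager
        System.out.printf("default (%d cores): at least %d dispatcher threads%n",
                cores, dispatcherThreads(sinkSubtasks, cores));
        System.out.printf("capped at 1 per client: at least %d dispatcher threads%n",
                dispatcherThreads(sinkSubtasks, 1));
    }
}
```

For example, 16 sink subtasks on a 32-core machine already imply at least 16 × 32 = 512 dispatcher threads, before counting any other thread in the JVM.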

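Because every parallel instance of the Elasticsearch sink creates its own client, the sink's parallelism in a Flink job cannot be set too high; otherwise the following error is reported: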

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
   at java.lang.Thread.start0(Native Method) ~[?:?]
   at java.lang.Thread.start(Thread.java:801) [?:?]
   at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:554) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:670) ~[lucene-core-8.5.1.jar:8.5.1 edb9fc409398f2c3446883f9f80595c884d245d0 - ivera - 2020-04-08 08:55:42]
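A sketch of capping the I/O thread count per sink subtask through the connector's RestClientFactory hook (requires flink-connector-elasticsearch7, which brings in the Elasticsearch REST client and httpasyncclient). The class and helper names are hypothetical; setRestClientFactory, setHttpClientConfigCallback, setDefaultIOReactorConfig and IOReactorConfig.custom().setIoThreadCount belong to those libraries.

```java
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.http.impl.nio.reactor.IOReactorConfig;

/** Hypothetical helper for tuning the HTTP client used by each Elasticsearch sink subtask. */
final class EsSinkTuning {

    /**
     * Caps the I/O dispatcher threads of the async HTTP client that each sink subtask
     * creates (without this, httpasyncclient uses one per available core).
     */
    static <T> void limitIoThreads(ElasticsearchSink.Builder<T> builder, int ioThreads) {
        // RestClientFactory is Serializable, so this lambda (capturing only an int) can be shipped to tasks.
        builder.setRestClientFactory(restClientBuilder ->
                restClientBuilder.setHttpClientConfigCallback(httpClientBuilder ->
                        httpClientBuilder.setDefaultIOReactorConfig(
                                IOReactorConfig.custom()
                                        .setIoThreadCount(ioThreads)
                                        .build())));
    }
}
```

Call limitIoThreads(esSinkBuilder, 2) before esSinkBuilder.build(); the value 2 is only an example. Fewer dispatcher threads also means less I/O parallelism per client, so choose it together with the sink parallelism rather than in isolation.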

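You should also check parameters such as the thread stack size, the operating system's maximum number of processes, and the JVM heap size.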
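For the JVM side of these checks, a small stdlib-only report can be run inside the TaskManager JVM or adapted into a log statement. It reads the live and peak thread counts, the maximum heap, and any -Xss flag passed on the command line; the class name is hypothetical. The OS limit on processes and threads (for example ulimit -u on Linux) lives outside the JVM and is not read here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Optional;

/** Hypothetical report of JVM numbers worth checking when "unable to create native thread" appears. */
final class JvmThreadReport {

    private JvmThreadReport() {}

    /** Live threads in this JVM right now (daemon and non-daemon). */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Highest live-thread count since the JVM started (unless the peak was reset). */
    static int peakThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getPeakThreadCount();
    }

    /** Maximum heap the JVM will try to use, in MiB (very large if no limit is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** The first -Xss argument found on the command line, if any. */
    static Optional<String> stackSizeFlag() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.startsWith("-Xss"))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println("cores:        " + Runtime.getRuntime().availableProcessors());
        System.out.println("live threads: " + liveThreads());
        System.out.println("peak threads: " + peakThreads());
        System.out.println("max heap MiB: " + maxHeapMiB());
        System.out.println("-Xss flag:    " + stackSizeFlag().orElse("(not set, platform default)"));
    }
}
```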
