Learning the ES Client

Versions: opensearch-rest-high-level-client-2.3.0.jar, httpcore-nio-4.4.11.jar, httpasyncclient-4.1.4.jar

Problem Background

The index-initialization logic listens for messages from the big-data team and writes them to ES asynchronously (org.opensearch.client.RestHighLevelClient#bulkAsync). Even at a very low QPS, the service's CPU was nearly maxed out. Investigation showed that messages were consumed quickly while ES writes were the bottleneck; because the writes are asynchronous, requests piled up inside the service, memory ran short, GC became frequent, and CPU usage spiked. So: how is the ES client initialized? What kind of queue does the async path use, and why doesn't it block when it backs up? Studying the client with these questions in mind pays off twice over.

ES Client Initialization Flow

[Figure 1: ES client initialization flow]

  1. Create the async HTTP client
    1. customizeRequestConfig callback
    2. customizeHttpClient callback
  2. Create the IOReactor: ConnectingIOReactor
  3. Create the connection manager
  4. Create the HTTP async client: InternalHttpAsyncClient

Creating the IOReactor

Default implementation: DefaultConnectingIOReactor

  1. requestQueue: the SessionRequest queue, i.e. the queue of requests from the client to establish a session with the server ("SessionRequest interface represents a request to establish a new connection (or session) to a remote host")
  2. threadFactory: the dispatcher thread factory; fallback thread-naming pattern: "I/O dispatcher"
  3. dispatchers: the dispatcher processors (BaseIOReactor)
    1. eventDispatch (IOEventDispatch): the IOEvent dispatcher; events: connected/inputReady/outputReady/timeout/disconnected
    2. exceptionHandler: the I/O exception handler; decides whether the I/O reactor keeps running after an exception; exception types: IOException/RuntimeException
  4. workers: worker threads that bridge BaseIOReactor and IOEventDispatch; each spins in a loop processing selected data
    1. this.selector.select
    2. if the status is SHUT_DOWN, exit the loop
    3. if the status is SHUTTING_DOWN, closeSessions and closeNewChannels
    4. processEvents: handle the events; BaseIOReactor handles only read/write events — acceptable and connectable events are ignored, which a client never needs to handle anyway
    5. validate: validate active channels
    6. processClosedSessions: handle closed sessions
    7. if the status is ACTIVE: org.apache.http.impl.nio.reactor.AbstractIOReactor#processNewChannels — if new channels exist, register them with the current selector (with SelectionKey.OP_READ, to listen for channel events), wrap each registered SelectionKey in an IOSessionImpl, add it to org.apache.http.impl.nio.reactor.AbstractIOReactor#sessions, and attach the IOSessionImpl to the newly registered SelectionKey
    8. if the status is no longer ACTIVE and sessions is empty, finish the shutdown: "Exit select loop if graceful shutdown has been completed"
    9. if interestOpsQueueing is enabled: processPendingInterestOps (disabled by default)
  5. threads: the cache of Worker threads, named via threadFactory
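
The worker spin loop above can be sketched with plain java.nio. This is a deliberately stripped-down illustration, not the httpcore-nio implementation; the class name, the status values, and the loop body are invented for the sketch:

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Minimal sketch of the worker loop: select, check shutdown status, process
// selected keys, repeat. Real BaseIOReactor dispatches each key to the
// IOEventDispatch and also validates channels and closed sessions.
class MiniWorkerLoop {
    enum Status { ACTIVE, SHUT_DOWN }

    private final Selector selector;
    private volatile Status status = Status.ACTIVE;
    private int processedEvents = 0;

    MiniWorkerLoop() {
        try {
            this.selector = Selector.open();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    void shutdown() {
        status = Status.SHUT_DOWN;
        selector.wakeup(); // unblock select() so the loop observes the status
    }

    int run() {
        try {
            while (true) {
                selector.select(100);                  // step 1: block on select
                if (status == Status.SHUT_DOWN) break; // step 2: terminate the loop
                for (SelectionKey key : selector.selectedKeys()) {
                    processedEvents++;                 // step 4: dispatch read/write here
                }
                selector.selectedKeys().clear();
            }
            selector.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return processedEvents;
    }
}
```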

Creating the Connection Manager (connmgr)

Implementation: PoolingNHttpClientConnectionManager

  1. ioreactor: DefaultConnectingIOReactor
  2. configData
  3. pool: the connection pool, implementation CPool
    1. connFactory: implementation InternalConnectionFactory
    2. addressResolver
    3. sessionRequestCallback: implementation InternalSessionRequestCallback; when a session request (org.apache.http.nio.reactor.ConnectingIOReactor#connect) completes, it processes the waiting requests in the pending queue
    4. routeToPool: mapping from route to per-route pool
    5. leasingRequests: new LinkedList<>()
    6. pending: new HashSet<>()
    7. leased: new HashSet<>()
    8. available: new LinkedList<>()
    9. completedRequests: new ConcurrentLinkedQueue<>()
    10. maxPerRoute: new HashMap<>()
    11. defaultMaxPerRoute: 2
    12. maxTotal: 20
  4. iosessionFactoryRegistry: RegistryBuilder.create()…build()
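
With defaultMaxPerRoute = 2 and maxTotal = 20, the pool opens a new connection only while both caps have headroom. A minimal sketch of that dual-cap admission check (simplified; the real CPool counts leased plus pending per route and globally, which is what the two parameters stand for here):

```java
// Simplified version of the dual-cap check: a connection may be opened for a
// route only if the per-route cap and the global cap both still have room.
class PoolLimits {
    static final int DEFAULT_MAX_PER_ROUTE = 2;
    static final int MAX_TOTAL = 20;

    static boolean mayOpenConnection(int allocatedForRoute, int allocatedTotal) {
        return allocatedForRoute < DEFAULT_MAX_PER_ROUTE
                && allocatedTotal < MAX_TOTAL;
    }
}
```

With the defaults, as few as two slow in-flight requests to a single route are enough to stop new connections for it, which is why requests start queuing so early.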

Creating the HTTP Async Client

Implementation: InternalHttpAsyncClient

  1. connmgr: the connection manager
  2. connManagerShared: false by default — "Defines the connection manager is to be shared by multiple client instances"
  3. threadFactory
    1. with connManagerShared false (the default): the configured thread factory, falling back to Executors.defaultThreadFactory()
    2. otherwise null
  4. eventHandler
    1. with connManagerShared false (the default): the configured event handler, falling back to HttpAsyncRequestExecutor
    2. otherwise null
  5. reactorThread
    1. if threadFactory and eventHandler are both non-null: an anonymous thread created via threadFactory; the thread uses connmgr to run the internal I/O event dispatch (new InternalIODispatch(eventHandler))
    2. otherwise null
  6. status: initialized to INACTIVE
  7. exec: implementation MainClientExec (the route is planned in the prepare phase, org.apache.http.impl.nio.client.MainClientExec#prepare: routePlanner = new DefaultRoutePlanner(schemePortResolver = DefaultSchemePortResolver.INSTANCE))

Creating the Rest Client

  1. client: the HTTP async client
  2. nodeSelector: defaults to org.opensearch.client.NodeSelector#ANY
  3. chunkedEnabled: empty by default
  4. nodes: the cluster nodes, HttpHost.create(serverCluster), e.g. https://my-es0-cluster.com
  5. nodeTuple: mapping of nodes to authCache
  6. compressionEnabled: if chunkedEnabled is set, takes its configured value; otherwise false

Starting the HTTP Async Client

Implementation: InternalHttpAsyncClient

Starting the async client:
  1. Move the client status from INACTIVE to ACTIVE
  2. If reactorThread is non-null, start the reactorThread thread
    1. the ioreactor (DefaultConnectingIOReactor) runs the internal event dispatch (InternalIODispatch): DefaultConnectingIOReactor.AbstractMultiworkerIOReactor.execute(InternalIODispatch)
    2. by default one BaseIOReactor is created per CPU core (n), and the n BaseIOReactors are paired with the InternalIODispatch to form n Workers
    3. iterate over the Workers and start them
    4. spin-wait on select events: selector.select (on macOS the default implementation is KQueueSelectorImpl)
    5. DefaultConnectingIOReactor processes the select events (covered in the next section): org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor#processEvents
    6. iterate over the workers and, if any has failed, handle the exception
  3. At this point the client is created and initialized, and the RestClient is returned
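
Steps 1 and 2 above (the one-shot INACTIVE-to-ACTIVE transition followed by launching the reactor thread) can be sketched like this; the status enum and all names are ours, not the library's, and the dispatch loop body is a stand-in:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the start() sequence: flip the status INACTIVE -> ACTIVE exactly
// once, then launch the reactor thread that would run the dispatch loop.
class AsyncClientStartSketch {
    enum Status { INACTIVE, ACTIVE }

    final AtomicReference<Status> status = new AtomicReference<>(Status.INACTIVE);
    volatile boolean reactorRan = false;

    boolean start() {
        // only the first caller performs the transition and starts the reactor
        if (!status.compareAndSet(Status.INACTIVE, Status.ACTIVE)) {
            return false;
        }
        Thread reactorThread = new Thread(() -> reactorRan = true);
        reactorThread.start();
        try {
            reactorThread.join(); // the real reactor thread runs until shutdown
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return true;
    }
}
```
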

DefaultConnectingIOReactor Processing select Events

org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor#processEvents

  1. Process session requests: org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor#processSessionRequests
  2. Iterate over selectedKeys and process the data: org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor#processEvent
    1. if the key isConnectable, establish the connection
    2. if the connection has not yet completed (and was not cut short by e.g. a timeout, an I/O exception, an explicit release, or an explicit cancel), wrap the session request (SessionRequestImpl) held by the session handle (SessionRequestHandle) into a ChannelEntry and add it to the BaseIOReactor's newChannels
    3. wake the selector so the channel data gets processed, i.e. the logic of step 2.c in "Starting the HTTP Async Client"
  3. Process timeouts: org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor#processTimeouts

BaseIOReactor Processing select Events

  1. Run the parent-class event processing: org.apache.http.impl.nio.reactor.AbstractIOReactor#processEvents
  2. Iterate over the SelectionKey set and process read/write (and other) events
  3. Read/write data is always handed to eventDispatch (i.e. InternalIODispatch, handler = HttpAsyncRequestExecutor); for example org.apache.http.impl.nio.reactor.BaseIOReactor#readable calls back eventDispatch.inputReady

ES Client Thread Design

[Figure 2: ES client thread design]

Async Write Flow: bulkAsync

[Figure 3: bulkAsync write flow]

RouteSpecificPool

  1. available: linked list of reusable pool entries
  2. leased: set of leased entries, moved over from the reusable list
  3. pending: map from pending session requests to their callbacks (the key set matches CPool.pending; each value is the Future used to obtain the connection)

Methods

  1. getFree: returns a free entry if one exists, marking the LeaseRequest as completed (isDone = true) and moving the entry from CPool's available to leased

CPool

  1. maxTotal: global cap on pool resources
  2. available: linked list of reusable pool entries
  3. leased: set of leased entries, moved over from the reusable list; it contains the union of the per-RouteSpecificPool leased sets
  4. pending: set of pending session requests — session requests created to open a connection when the route pool's allocated count is below maxPerRoute and maxTotal still has headroom
  5. leasingRequests: linked list of in-flight lease requests; a LeaseRequest is appended here when it has not completed and processPendingRequest could not complete it either
  6. completedRequests: queue of completed requests; a LeaseRequest is appended here once it completes
  7. totalUsed = maxTotal - pending - leased

Methods

  1. fireCallbacks: drains the completedRequests queue
  2. release: releases a leased resource

LeaseRequest

  1. future: invoked as the state machine advances; e.g. on completion, PoolingNHttpClientConnectionManager.FutureCallback#completed -> DefaultClientExchangeHandlerImpl#requestConnection -> connectionAllocated -> the LeaseRequest's data is written to the ES server, awaiting the response

Summary

  1. leased: connection resources currently leased. Released via callback when the response completes, e.g. org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl#responseCompleted / org.apache.http.impl.nio.client.MainClientExec#responseCompleted. Some shutdown and error paths also release connections, e.g. org.apache.http.impl.nio.client.AbstractClientExchangeHandler#discardConnection
  2. available: connections available for lease; if one exists it is taken directly and moved to leased
  3. leasingRequests: requests waiting to lease a connection; once a connection becomes available the request is processed and moved to leased or pending
  4. completedRequests: requests whose lease has completed and whose server connection is established, waiting for the client request to be written out and the server to respond, after which the connection is leased out again or released
  5. pending: requests whose lease has completed and whose server connection is still being established; on completion, org.apache.http.nio.pool.AbstractNIOConnPool#requestCompleted is called back to lease the connection and write the client request to the server; on success the entry moves to leased, otherwise the resource is released and moves to available
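
The transitions in this summary can be modeled as a tiny state machine. This is a deliberately simplified sketch with strings standing in for pool entries; the real CPool also tracks routes, futures, and callbacks:

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

// Minimal model of the CPool collections and the lease/release transitions.
class MiniPool {
    final LinkedList<String> available = new LinkedList<>();
    final Set<String> leased = new HashSet<>();
    final LinkedList<String> leasingRequests = new LinkedList<>(); // unbounded!

    // lease: take a free entry if one exists, otherwise queue the request
    boolean lease(String request) {
        if (!available.isEmpty()) {
            leased.add(available.removeFirst());
            return true;
        }
        leasingRequests.addLast(request); // this is where the backlog accumulates
        return false;
    }

    // release: return a leased entry, then service the next queued request
    void release(String entry) {
        leased.remove(entry);
        available.addLast(entry);
        if (!leasingRequests.isEmpty()) {
            leasingRequests.removeFirst();
            leased.add(available.removeFirst());
        }
    }
}
```

Note that nothing bounds leasingRequests: a lease that cannot be served is simply queued, which is the behavior at the heart of the incident below.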

Inferring the Cause

Given the background, some readers may already see the cause: if messages keep arriving and are produced faster than the asynchronous ES writes can drain them, the backlog accumulates in leasingRequests, an unbounded linked list. GC becomes frequent, the objects cannot be reclaimed, CPU spikes, and the service's concurrency and throughput drop.
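
A toy simulation makes the failure mode concrete: if each tick produces more lease requests than the writes can drain, the unbounded list grows linearly. The tick model and numbers are illustrative, not measured:

```java
import java.util.LinkedList;

// Per tick, `producedPerTick` requests arrive but only `drainedPerTick` can be
// written to ES; the remainder piles up in an unbounded list, just like
// CPool.leasingRequests does in the real client.
class LeaseBacklogSim {
    static int backlogAfter(int ticks, int producedPerTick, int drainedPerTick) {
        LinkedList<Integer> leasingRequests = new LinkedList<>();
        for (int t = 0; t < ticks; t++) {
            for (int i = 0; i < producedPerTick; i++) {
                leasingRequests.add(i);
            }
            for (int i = 0; i < drainedPerTick && !leasingRequests.isEmpty(); i++) {
                leasingRequests.removeFirst();
            }
        }
        return leasingRequests.size(); // grows linearly when produced > drained
    }
}
```
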

Reproducing the Problem

The code is simple: 100 threads concurrently issue asynchronous ES writes. Because the writes are asynchronous nothing blocks, so submission finishes quickly; once all 10,000 submissions are done, take a heap dump manually.

// 100 worker threads submitting asynchronous ES writes
int size = 100;
ExecutorService es = Executors.newFixedThreadPool(size);
Thread.sleep(10 * 1000); // give the service time to finish starting up
for (int i = 0; i < 10000; i++) {
  int finalI = i;
  es.submit(() -> {
    try {
      // asynchronous ES write; returns immediately without waiting for ES
      indexWriterService.indexDelete(docMessage);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
    System.out.println(System.currentTimeMillis() + ":" + finalI);
  });
}
Thread.sleep(Integer.MAX_VALUE); // keep the JVM alive so the heap can be dumped

Problem Analysis

Heap dump command

jmap -dump:live,format=b,file=dump.hprof 30652

Analyze the dump file with the IBM tool; the tool jar can be downloaded from IBM's site. The command is below (JDK 17 forces a pile of extra flags; on earlier JDKs you can just run the tool directly):

java 
# flags forced on us by JDK 17
--add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/sun.net.util=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang.reflect=ALL-UNNAMED --add-opens java.base/java.text=ALL-UNNAMED --add-opens java.desktop/java.awt.font=ALL-UNNAMED --add-opens java.desktop/sun.swing=ALL-UNNAMED 
# on JDK 8 just run the tool jar below directly
-Xmx4G -jar ha457.jar

In the analysis results, 72% of the memory usage is flagged as suspect, and 42% of it matches our inference exactly; the highlighted entry is the object path PoolingNHttpClientConnectionManager -> CPool -> LinkedList -> Node (LeaseRequest). Case closed.

What about the remaining 30%? The same tool traces 23% of it to the environment configuration data loaded by our Spring framework.

During load testing we also used arthas to dump the production service, which confirmed the inference as well.

Solutions

  1. Put a bound on the unbounded queue. There are many ways to do this — for example, add a request counter: on arrival, check the in-flight count; increment when the request is admitted, decrement on completion or error.
  2. Lower the lease timeout (configured via org.apache.http.client.config.RequestConfig.Builder#connectionRequestTimeout, which defaults to Long's maximum value) — not recommended. Timed-out requests no longer occupy leasingRequests; they move straight to completedRequests and fireCallbacks. But a flood of timeout exceptions still consumes system resources, so the cure can cost more than the disease; the reproduction in this article hit exactly that exception — see note 1 for the stack trace.
  3. Limit request admission using connection-pool statistics (org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager#getStats). The ES client does not expose any pool-related methods, so this requires implementing the callback org.opensearch.client.RestClientBuilder.HttpClientConfigCallback#customizeHttpClient and passing in a pool you construct yourself (see note 2); if you take this route, the official fallback construction logic (org.apache.http.impl.nio.client.HttpAsyncClientBuilder#build) is a good reference. As an example, we added a simple monitoring probe (see note 3) that reports the per-route backlog: pool_status=[leased: 0; pending: 6561 (this includes the leasingRequests count); available: 0; max: 2]
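
As a sketch of option 1, a Semaphore can bound the in-flight asynchronous writes: acquire a permit before calling bulkAsync and release it in the response/failure callback. The class name and the wiring are ours, not part of the client API:

```java
import java.util.concurrent.Semaphore;

// Bounds the number of in-flight asynchronous writes. Acquire a permit before
// submitting to bulkAsync; release it inside the onResponse/onFailure callback.
// When the cap is hit, the caller backs off instead of letting requests pile
// up in the unbounded leasingRequests list.
class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    boolean tryAcquire() {      // call before submitting the async write
        return permits.tryAcquire();
    }

    void release() {            // call from both the success and failure callbacks
        permits.release();
    }
}
```

A rejected request can be retried later or pushed back to the message queue, which keeps the backlog off the JVM heap and lets the broker absorb the burst instead.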

Note 1:

  at org.opensearch.client.RestHighLevelClient$1.onFailure(RestHighLevelClient.java:1966) ~[opensearch-rest-high-level-client-2.3.0.jar:2.3.0]
	at org.opensearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:707) ~[opensearch-rest-client-2.3.0.jar:2.3.0]
	at org.opensearch.client.RestClient$1.failed(RestClient.java:450) ~[opensearch-rest-client-2.3.0.jar:2.3.0]
	at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:137) ~[httpcore-4.4.11.jar:4.4.11]
	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.executionFailed(DefaultClientExchangeHandlerImpl.java:101) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:426) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.connectionRequestFailed(AbstractClientExchangeHandler.java:348) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.access$100(AbstractClientExchangeHandler.java:62) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.impl.nio.client.AbstractClientExchangeHandler$1.failed(AbstractClientExchangeHandler.java:392) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:137) ~[httpcore-4.4.11.jar:4.4.11]
	at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$1.failed(PoolingNHttpClientConnectionManager.java:316) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:137) ~[httpcore-4.4.11.jar:4.4.11]
	at org.apache.http.nio.pool.AbstractNIOConnPool.fireCallbacks(AbstractNIOConnPool.java:503) ~[httpcore-nio-4.4.11.jar:4.4.11]
	at org.apache.http.nio.pool.AbstractNIOConnPool.requestTimeout(AbstractNIOConnPool.java:633) ~[httpcore-nio-4.4.11.jar:4.4.11]
	at org.apache.http.nio.pool.AbstractNIOConnPool$InternalSessionRequestCallback.timeout(AbstractNIOConnPool.java:894) ~[httpcore-nio-4.4.11.jar:4.4.11]
	at org.apache.http.impl.nio.reactor.SessionRequestImpl.timeout(SessionRequestImpl.java:183) ~[httpcore-nio-4.4.11.jar:4.4.11]
	at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processTimeouts(DefaultConnectingIOReactor.java:210) ~[httpcore-nio-4.4.11.jar:4.4.11]
	at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:155) ~[httpcore-nio-4.4.11.jar:4.4.11]
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) ~[httpcore-nio-4.4.11.jar:4.4.11]
	at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.util.concurrent.TimeoutException: Connection lease request time out
	at org.apache.http.nio.pool.AbstractNIOConnPool.processPendingRequest(AbstractNIOConnPool.java:411) ~[httpcore-nio-4.4.11.jar:4.4.11]
	at org.apache.http.nio.pool.AbstractNIOConnPool.processNextPendingRequest(AbstractNIOConnPool.java:391) ~[httpcore-nio-4.4.11.jar:4.4.11]
	at org.apache.http.nio.pool.AbstractNIOConnPool.requestTimeout(AbstractNIOConnPool.java:629) ~[httpcore-nio-4.4.11.jar:4.4.11]
	... 8 more

Note 2:

RestClient.builder(HttpHost.create("myServerCluster")).setHttpClientConfigCallback(httpClientBuilder -> {
    httpClientBuilder.setMaxConnTotal(apolloConfig.getEsMaxConnectTotal());
    httpClientBuilder.setMaxConnPerRoute(apolloConfig.getEsMaxConnectPerRoute());
    ConnectingIOReactor ioreactor = IOReactorUtils.create(
        defaultIOReactorConfig != null ? defaultIOReactorConfig : IOReactorConfig.DEFAULT, threadFactory);
    // build the pool ourselves so we can read its stats later
    PoolingNHttpClientConnectionManager poolingmgr = new PoolingNHttpClientConnectionManager(
        ioreactor,
        RegistryBuilder.<SchemeIOSessionStrategy>create()
            .register("http", NoopIOSessionStrategy.INSTANCE)
            .register("https", null) // supply a real SSL session strategy here for https
            .build());
    httpClientBuilder.setConnectionManager(poolingmgr);
    return httpClientBuilder;
});

Note 3:

PoolingNHttpClientConnectionManager poolingMgr = esClientFactory.getPoolingmgr();
if (poolingMgr != null) {
  for (HttpRoute route : poolingMgr.getRoutes()) {
    System.out.println("pool_status=" + poolingMgr.getStats(route));
  }
}
