Notes on an Elasticsearch slow-connection incident

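Yesterday morning the developers reported that they could not connect to Elasticsearch. The ES cluster consists of 1 master node and 3 slave nodes. I logged in to the master node: the process was still running, but nothing was listening on port 9200. The last entries in the log were: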

[2020-04-27T09:59:48,063][ERROR][i.n.u.c.D.rejectedExecution] [ES-001] Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
    at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:867) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:328) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:321) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:778) ~[?:?]
    at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:768) ~[?:?]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:432) ~[?:?]
    at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:112) ~[?:?]
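
The symptom above (process alive, nothing listening on 9200) can be confirmed with a simple TCP connect test. The following is only a minimal sketch: the helper name port_is_listening and the host 127.0.0.1 are assumptions for illustration, not part of the original troubleshooting.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts connections on the target port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_listening("127.0.0.1", 9200) on the master node would return False in the state described here, even though the Java process is still visible in the process list.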

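I then restarted the ES master node. Once it had started successfully I waited for the shards to recover, but monitoring showed that shard recovery was very slow and connections to the cluster were very slow as well. The cluster has 404 primary shards in total, and after a whole morning only 40% had recovered. I kept looking for the cause and tried, for example, changing this parameter: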

cluster.routing.allocation.node_concurrent_recoveries: 10    # default is 2
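
As a rough sketch, the same setting can also be changed at runtime through the cluster settings API, and recovery progress can be read from the cluster health API. The base URL, the lack of authentication, the choice of the "persistent" scope and the helper names below are all assumptions; the original post does not say how the parameter was applied or how progress was monitored.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # assumption: local node, no auth or TLS


def build_recovery_settings(concurrent_recoveries: int) -> dict:
    """Build the request body for PUT /_cluster/settings."""
    return {
        "persistent": {
            "cluster.routing.allocation.node_concurrent_recoveries": concurrent_recoveries
        }
    }


def put_cluster_settings(body: dict, base_url: str = ES_URL) -> dict:
    """Send the settings body to the cluster and return the parsed reply."""
    req = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def active_shards_percent(base_url: str = ES_URL) -> float:
    """Read the percentage of active shards from GET /_cluster/health."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return float(json.load(resp)["active_shards_percent_as_number"])
```

For example, put_cluster_settings(build_recovery_settings(10)) would request the value used above, and polling active_shards_percent() gives a figure comparable to the 40% mentioned earlier.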

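This only raised the number of shards recovering concurrently; the overall speed was just as slow. Going back through the logs from the 26th, I found many entries like the following: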

[2020-04-26T19:26:01,823][DEBUG][o.e.a.s.TransportSearchAction] [ES-001] [37660] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [ES-004][172.28.231.28:9300][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [37660]
[2020-04-26T19:26:01,835][DEBUG][o.e.a.s.TransportSearchAction] [ES-001] [161303] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [ES-002][172.28.231.25:9300][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [161303]
[2020-04-26T19:26:01,856][DEBUG][o.e.a.s.TransportSearchAction] [ES-001] [37649] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [ES-004][172.28.231.28:9300][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [37649]
[2020-04-27T02:00:48,750][DEBUG][o.e.a.s.TransportSearchAction] [ES-001] [78812] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [ES-003][172.28.231.17:9300][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [78812]
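
To see which nodes the failed fetch phases point to, the RemoteTransportException lines can be tallied per remote node. This is only an illustrative sketch that assumes the log format shown above; the function name count_fetch_failures_by_node is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# ...RemoteTransportException: [ES-004][172.28.231.28:9300][indices:data/read/search[phase/fetch/id]]
REMOTE_NODE = re.compile(
    r"RemoteTransportException: \[([^\]]+)\]\[[^\]]+\]"
    r"\[indices:data/read/search\[phase/fetch/id\]\]"
)


def count_fetch_failures_by_node(lines):
    """Count fetch-phase RemoteTransportException entries per remote node name."""
    counts = Counter()
    for line in lines:
        match = REMOTE_NODE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Fed with the excerpt above, the counter shows the failures spread across ES-002, ES-003 and ES-004, that is, across all of the slave nodes rather than a single one.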

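Cause:

Elasticsearch had been started twice on every slave node: only one process was listening on the port, but two processes were running.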
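
A duplicated start like this can be spotted by counting the Elasticsearch JVMs on each node. The sketch below parses the output of ps and assumes the JVM command line contains the main class org.elasticsearch.bootstrap.Elasticsearch, which is what Elasticsearch versions of that period use as far as I know; the helper names are hypothetical.

```python
import subprocess

# Assumption: the ES JVM is started with this main class on its command line.
ES_MAIN_CLASS = "org.elasticsearch.bootstrap.Elasticsearch"


def find_es_pids(ps_output: str) -> list:
    """Extract PIDs of Elasticsearch JVMs from `ps -eo pid,args` output."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header row and any line without a numeric PID column.
        if len(parts) == 2 and parts[0].isdigit() and ES_MAIN_CLASS in parts[1]:
            pids.append(int(parts[0]))
    return pids


def running_es_pids() -> list:
    """Run ps on the local machine and return the Elasticsearch PIDs."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return find_es_pids(result.stdout)
```

If running_es_pids() returns more than one PID on a node that is meant to run a single instance, that node is in the duplicated state described above.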

Solution:

Kill all ES processes and restart Elasticsearch; the cluster then returned to normal.
