Elasticsearch 5.4 cluster timeout

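There are four nodes, two of which were newly added. The two old nodes form a cluster without any problem. After adding the two new nodes, the cluster no longer forms, whether all four nodes are configured as one cluster: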

# --------------------------------- Discovery ----------------------------------

#

# Pass an initial list of hosts to perform discovery when new node is started:

# The default list of hosts is ["127.0.0.1", "[::1]"]

#

discovery.zen.ping.unicast.hosts: ["10.96.91.208","10.96.91.209","10.96.91.210","10.96.91.211"]

#

# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):

#

discovery.zen.minimum_master_nodes: 3

#

# For more information, consult the zen discovery module documentation.

#

---------------------

A two-node cluster made of one old node and one new node was tried as well:

# --------------------------------- Discovery ----------------------------------

#

# Pass an initial list of hosts to perform discovery when new node is started:

# The default list of hosts is ["127.0.0.1", "[::1]"]

#

discovery.zen.ping.unicast.hosts: ["10.96.91.208","10.96.91.210"]

#

# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):

#

discovery.zen.minimum_master_nodes: 2

#

# For more information, consult the zen discovery module documentation.

---------------------
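Both values of discovery.zen.minimum_master_nodes above follow the majority formula quoted in the configuration comment (master-eligible nodes / 2 + 1, with integer division). A minimal sketch to check the arithmetic; the function name quorum is only illustrative and not part of Elasticsearch:

```shell
# quorum N: majority of N master-eligible nodes (integer division).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

quorum 4   # four-node cluster  -> prints 3
quorum 2   # two-node cluster   -> prints 2
```

So the setting itself is consistent in both configurations. Note that with two nodes and a quorum of 2, both nodes must be able to reach each other before a master can be elected.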

Both setups fail with the following error:

[2017-10-11T13:30:38,240][WARN ][o.e.n.Node ] [node-03] timed out while waiting for initial discovery state - timeout: 30s

[2017-10-11T13:30:38,254][INFO ][o.e.h.n.Netty4HttpServerTransport] [node-03] publish_address {10.96.91.210:9200}, bound_addresses {10.96.91.210:9200}

[2017-10-11T13:30:38,259][INFO ][o.e.n.Node              ] [node-03] started

[2017-10-11T13:30:41,301][WARN ][o.e.d.z.ZenDiscovery    ] [node-03] failed to connect to master [{node-01}{VwK2Mm2hSDy4avASCpZt5w}{PMslvo9XSRWYESBXqPwz1w}{10.96.91.208}{10.96.91.208:9300}], retrying...

org.elasticsearch.transport.ConnectTransportException: [node-01][10.96.91.208:9300] connect_timeout[30s]

    at org.elasticsearch.transport.netty4.Netty4Transport.connectToChannels(Netty4Transport.java:361) ~[?:?]

    at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:549) ~[elasticsearch-5.4.3.jar:5.4.3]

    at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:473) ~[elasticsearch-5.4.3.jar:5.4.3]

    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:315) ~[elasticsearch-5.4.3.jar:5.4.3]

    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:302) ~[elasticsearch-5.4.3.jar:5.4.3]

    at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:468) [elasticsearch-5.4.3.jar:5.4.3]

    at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:420) [elasticsearch-5.4.3.jar:5.4.3]

    at org.elasticsearch.discovery.zen.ZenDiscovery.access$4100(ZenDiscovery.java:83) [elasticsearch-5.4.3.jar:5.4.3]

    at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1197) [elasticsearch-5.4.3.jar:5.4.3]

    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.3.jar:5.4.3]

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]

    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]

Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: 10.96.91.208/10.96.91.208:9300

    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ~[?:?]

    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[?:?]

    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[?:?]

    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[?:?]

    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[?:?]

    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) ~[?:?]

    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]

    ... 1 more

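The log shows that this is a network problem: node-03 times out while connecting to the master node-01 on its transport port, 10.96.91.208:9300.

Troubleshooting the network

Network interface configuration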

cd /etc/sysconfig/network/

more ifcfg-eth0

Routing configuration

more routes

DNS resolver configuration

more /etc/resolv.conf

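These settings are essentially identical on all four servers, so the configuration is not the cause.

Next, check ping and traceroute.

ping works fine.

traceroute output differs between the servers: one hop returns no reply (an empty hop). This points to a firewall.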
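ping uses ICMP, so it can succeed while a firewall still drops TCP traffic to the transport port 9300 that appears in the log. A minimal sketch of a direct TCP check, assuming bash and the coreutils timeout command are available; the helper names check_port and check_cluster_ports are hypothetical, and the addresses are the ones from discovery.zen.ping.unicast.hosts:

```shell
# check_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
# Relies on bash's built-in /dev/tcp redirection.
check_port() {
    timeout "${3:-5}" bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Try the Elasticsearch transport port on every node in the unicast list.
check_cluster_ports() {
    for host in 10.96.91.208 10.96.91.209 10.96.91.210 10.96.91.211; do
        if check_port "$host" 9300; then
            echo "$host:9300 reachable"
        else
            echo "$host:9300 NOT reachable"
        fi
    done
}
```

Running check_cluster_ports on each server in turn shows which node pairs cannot reach each other on 9300.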

Check the firewall status

chkconfig --list|grep fire

Stop the firewall

cd /etc/init.d/

./SuSEfirewall2_setup stop

./SuSEfirewall2_init stop

Disable the firewall at boot

chkconfig SuSEfirewall2_setup off

chkconfig SuSEfirewall2_init off

With that, the problem is solved.
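To confirm that the nodes have actually joined, the cluster can be queried over HTTP. A minimal sketch, assuming curl is installed and the HTTP port is 9200 as in the publish_address log line; the helper names and the ES_HOST and ES_PORT variables are illustrative only:

```shell
# Build a URL for the Elasticsearch HTTP API.
# ES_HOST and ES_PORT are hypothetical overrides; the defaults come from the log above.
es_url() {
    printf 'http://%s:%s/%s\n' "${ES_HOST:-10.96.91.210}" "${ES_PORT:-9200}" "$1"
}

# _cat/nodes prints one line per node that has joined the cluster.
cluster_nodes() {
    curl -s --max-time 10 "$(es_url '_cat/nodes?v')"
}

# _cluster/health reports number_of_nodes and the overall status.
cluster_health() {
    curl -s --max-time 10 "$(es_url '_cluster/health?pretty')"
}
```

After the firewall is stopped and the nodes are restarted, cluster_nodes should list every node from the unicast host list.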
