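The setup has four nodes, two of them newly added. The two old nodes form a cluster with each other without any problem. Once the two new nodes are involved, clustering fails in both configurations that were tried. First, with all four nodes in one cluster: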
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["10.96.91.208","10.96.91.209","10.96.91.210","10.96.91.211"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
---------------------
Second, with a two-node cluster (one old node paired with one new node):
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["10.96.91.208","10.96.91.210"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 2
#
# For more information, consult the zen discovery module documentation.
---------------------
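The two minimum_master_nodes values above follow the formula quoted in the config comment (total number of master-eligible nodes / 2 + 1, with integer division). A minimal sketch of that arithmetic; the function name is hypothetical and not part of Elasticsearch:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size from the config comment: N / 2 + 1 (integer division)."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1
```

Calling minimum_master_nodes(4) gives 3 and minimum_master_nodes(2) gives 2, which matches both configurations, so the quorum setting itself is not the cause of the failure.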
Both configurations fail with the following error:
[2017-10-11T13:30:38,240][WARN ][o.e.n.Node ] [node-03] timed out while waiting for initial discovery state - timeout: 30s
[2017-10-11T13:30:38,254][INFO ][o.e.h.n.Netty4HttpServerTransport] [node-03] publish_address {10.96.91.210:9200}, bound_addresses {10.96.91.210:9200}
[2017-10-11T13:30:38,259][INFO ][o.e.n.Node ] [node-03] started
[2017-10-11T13:30:41,301][WARN ][o.e.d.z.ZenDiscovery ] [node-03] failed to connect to master [{node-01}{VwK2Mm2hSDy4avASCpZt5w}{PMslvo9XSRWYESBXqPwz1w}{10.96.91.208}{10.96.91.208:9300}], retrying...
org.elasticsearch.transport.ConnectTransportException: [node-01][10.96.91.208:9300] connect_timeout[30s]
at org.elasticsearch.transport.netty4.Netty4Transport.connectToChannels(Netty4Transport.java:361) ~[?:?]
at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:549) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:473) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:315) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:302) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:468) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:420) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.discovery.zen.ZenDiscovery.access$4100(ZenDiscovery.java:83) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1197) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.3.jar:5.4.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: 10.96.91.208/10.96.91.208:9300
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ~[?:?]
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[?:?]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[?:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
... 1 more
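The log points to a network problem: node-03 times out (connect_timeout[30s]) while connecting to the master node-01 at 10.96.91.208:9300.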
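When the log is long, the unreachable targets can be pulled out of it automatically. The sketch below assumes the message format seen in the 5.4.3 log above ("ConnectTransportException: [node][host:port] connect_timeout[...]"); other versions may word it differently, and the function name is hypothetical.

```python
import re

# Pattern modeled on the ConnectTransportException line in the log above.
CONNECT_TIMEOUT = re.compile(
    r"ConnectTransportException: "
    r"\[(?P<node>[^\]]+)\]"
    r"\[(?P<host>[^:\]]+):(?P<port>\d+)\] "
    r"connect_timeout\[(?P<timeout>[^\]]+)\]"
)

def find_connect_timeouts(log_text):
    """Return (node, host, port, timeout) for every connect timeout in log_text."""
    return [
        (m.group("node"), m.group("host"), int(m.group("port")), m.group("timeout"))
        for m in CONNECT_TIMEOUT.finditer(log_text)
    ]
```

Applied to the log above, this would report node-01 at 10.96.91.208 on port 9300, the transport port that the later checks should focus on.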
Troubleshooting the network
Network interface configuration:
cd /etc/sysconfig/network/
more ifcfg-eth0
Routing configuration:
more routes
DNS resolver configuration (the default gateway itself is defined in the routes file shown above):
more /etc/resolv.conf
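These settings are essentially identical on all four servers, so the network configuration is not the problem.
Next, check with ping and traceroute.
ping works without problems.
traceroute, however, gives different output: one hop returns no reply. This suggests that a firewall is involved.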
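A successful ping only proves that ICMP gets through; the error in the log is a TCP connect timeout on port 9300. A minimal sketch for testing that port directly, using only the Python standard library. The function names and the default timeout are assumptions made for illustration:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def unreachable_hosts(hosts, port, timeout=3.0):
    """Return the hosts from the list that do not accept TCP connections on port."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

For example, running unreachable_hosts(["10.96.91.208", "10.96.91.209", "10.96.91.210", "10.96.91.211"], 9300) on each node shows which peers are blocked on the transport port even though they answer ping.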
Check the firewall status:
chkconfig --list|grep fire
Stop the firewall:
cd /etc/init.d/
./SuSEfirewall2_setup stop
./SuSEfirewall2_init stop
Disable the firewall at boot:
chkconfig SuSEfirewall2_setup off
chkconfig SuSEfirewall2_init off
With that, the problem is solved.