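The constructor takes care of setup. It stores the thread pool, the transport service, the cluster name, and the hosts provider (the source of the other nodes' addresses, which the resolveHostsLists() call further down refers to as configuredHosts). It reads the number of concurrent connects and the host resolve timeout from the settings, and builds the scaling thread pool used for resolving and connecting. It also registers a request handler for the action internal:discovery/zen/unicast, which processes ping requests sent by other nodes in the same cluster.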
public UnicastZenPing(Settings settings, ThreadPool threadPool, TransportService transportService,
UnicastHostsProvider unicastHostsProvider, PingContextProvider contextProvider) {
super(settings);
this.threadPool = threadPool;
this.transportService = transportService;
this.clusterName = ClusterName.CLUSTER_NAME_SETTING.get(settings);
this.hostsProvider = unicastHostsProvider;
this.contextProvider = contextProvider;
final int concurrentConnects = DISCOVERY_ZEN_PING_UNICAST_CONCURRENT_CONNECTS_SETTING.get(settings);
resolveTimeout = DISCOVERY_ZEN_PING_UNICAST_HOSTS_RESOLVE_TIMEOUT.get(settings);
logger.debug(
"using concurrent_connects [{}], resolve_timeout [{}]",
concurrentConnects,
resolveTimeout);
transportService.registerRequestHandler(ACTION_NAME, ThreadPool.Names.SAME, UnicastPingRequest::new,
new UnicastPingRequestHandler());
final ThreadFactory threadFactory = EsExecutors.daemonThreadFactory(settings, "[unicast_connect]");
unicastZenPingExecutorService = EsExecutors.newScaling(
nodeName() + "/" + "unicast_connect",
0,
concurrentConnects,
60,
TimeUnit.SECONDS,
threadFactory,
threadPool.getThreadContext());
}
A node calls ping() to collect information about the other nodes, which it then uses when taking part in the master election.
@Override
public void ping(final Consumer<PingCollection> resultsConsumer, final TimeValue duration) {
ping(resultsConsumer, duration, duration);
}
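Note the first parameter, a Consumer. It is one of the functional interfaces introduced in Java 8 and can be thought of here as a pointer to a function. The resultsConsumer passed in is actually response::complete, created in the caller pingAndWait() shown below; it is the callback invoked when a ping round finishes.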
private ZenPing.PingCollection pingAndWait(TimeValue timeout) {
final CompletableFuture<ZenPing.PingCollection> response = new CompletableFuture<>();
try {
zenPing.ping(response::complete, timeout);
} catch (Exception ex) {
// logged later
response.completeExceptionally(ex);
}
try {
return response.get();
} catch (InterruptedException e) {
logger.trace("pingAndWait interrupted");
return new ZenPing.PingCollection();
} catch (ExecutionException e) {
logger.warn("Ping execution failed", e);
return new ZenPing.PingCollection();
}
}
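The callback wiring above can be tried in isolation. The following is a minimal sketch, not Elasticsearch code: fakePing() is a hypothetical stand-in for zenPing.ping(), and a String replaces PingCollection. It only shows how response::complete acts as a Consumer that unblocks the waiting caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.function.Consumer;

class PingAndWaitSketch {
    // Hypothetical stand-in for zenPing.ping(): delivers its result to the callback from another thread.
    static void fakePing(Consumer<String> resultsConsumer) {
        new Thread(() -> resultsConsumer.accept("pong")).start();
    }

    // Blocks until the callback has completed the future, in the spirit of pingAndWait().
    static String pingAndWait() {
        CompletableFuture<String> response = new CompletableFuture<>();
        // response::complete is usable as a Consumer<String>: accept(v) simply calls response.complete(v).
        fakePing(response::complete);
        try {
            return response.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        } catch (ExecutionException e) {
            return "";
        }
    }
}
```

Calling PingAndWaitSketch.pingAndWait() blocks on response.get() until the other thread invokes the consumer.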
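Continuing with the three-argument ping() method: the first thing it does is call resolveHostsLists() to resolve the configured hosts into the seed nodes to ping.
final List<DiscoveryNode> seedNodes;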
try {
seedNodes = resolveHostsLists(
unicastZenPingExecutorService,
logger,
configuredHosts,
limitPortCounts,
transportService,
UNICAST_NODE_PREFIX,
resolveTimeout);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
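Inside resolveHostsLists(), the hosts are streamed and mapped to one Callable per host, each of which calls transportService.addressesFromString() to turn the host string into an array of TransportAddress. The Callables are submitted together through invokeAll() to the thread pool built in the constructor, bounded by resolveTimeout. This is a blocking call with a timeout, not a scheduled task.
final List<Callable<TransportAddress[]>> callables =
hosts
.stream()
.map(hn -> (Callable<TransportAddress[]>) () -> transportService.addressesFromString(hn, limitPortCounts))
.collect(Collectors.toList());
final List<Future<TransportAddress[]>> futures =
executorService.invokeAll(callables, resolveTimeout.nanos(), TimeUnit.NANOSECONDS);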
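Next, the local node's publish address and bound addresses are fetched from transportService and collected into localAddresses. The code then iterates over the futures built above, reads each resolved address array, wraps every address that is not one of the local addresses in a DiscoveryNode, and returns the list. Lookups that were cancelled by the timeout or that failed are only logged.
final List<DiscoveryNode> discoveryNodes = new ArrayList<>();
final Set<TransportAddress> localAddresses = new HashSet<>();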
localAddresses.add(transportService.boundAddress().publishAddress());
localAddresses.addAll(Arrays.asList(transportService.boundAddress().boundAddresses()));
// ExecutorService#invokeAll guarantees that the futures are returned in the iteration order of the tasks so we can associate the
// hostname with the corresponding task by iterating together
final Iterator<String> it = hosts.iterator();
for (final Future<TransportAddress[]> future : futures) {
final String hostname = it.next();
if (!future.isCancelled()) {
assert future.isDone();
try {
final TransportAddress[] addresses = future.get();
logger.trace("resolved host [{}] to {}", hostname, addresses);
for (int addressId = 0; addressId < addresses.length; addressId++) {
final TransportAddress address = addresses[addressId];
// no point in pinging ourselves
if (localAddresses.contains(address) == false) {
discoveryNodes.add(
new DiscoveryNode(
nodeId_prefix + hostname + "_" + addressId + "#",
address,
emptyMap(),
emptySet(),
Version.CURRENT.minimumCompatibilityVersion()));
}
}
} catch (final ExecutionException e) {
assert e.getCause() != null;
final String message = "failed to resolve host [" + hostname + "]";
logger.warn(message, e.getCause());
}
} else {
logger.warn("timed out after [{}] resolving host [{}]", resolveTimeout, hostname);
}
}
return discoveryNodes;
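The resolution pattern used here (one Callable per host, invokeAll() with a timeout, then walking hosts and futures in lockstep) can be sketched with plain JDK classes. This is an illustration under stated assumptions, not the Elasticsearch implementation: resolve() is a hypothetical resolver in which the host "slow" outlives the timeout and "bad" fails, and addresses are plain strings.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class ResolveWithTimeoutSketch {
    // Hypothetical resolver: "slow" simulates a lookup that exceeds the timeout, "bad" one that fails.
    static String resolve(String host) throws Exception {
        if (host.equals("slow")) {
            Thread.sleep(5_000);
        }
        if (host.equals("bad")) {
            throw new IllegalArgumentException("cannot resolve " + host);
        }
        return host + ":9300";
    }

    // Returns host -> address for every host resolved within the timeout, in input order.
    static Map<String, String> resolveAll(List<String> hosts, long timeoutMillis) {
        Map<String, String> resolved = new LinkedHashMap<>();
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, hosts.size()));
        try {
            List<Callable<String>> callables = hosts.stream()
                .map(h -> (Callable<String>) () -> resolve(h))
                .collect(Collectors.toList());
            // invokeAll returns the futures in task order and cancels tasks that are still running at the timeout.
            List<Future<String>> futures = executor.invokeAll(callables, timeoutMillis, TimeUnit.MILLISECONDS);
            Iterator<String> it = hosts.iterator();
            for (Future<String> future : futures) {
                String host = it.next();
                if (future.isCancelled()) {
                    continue; // timed out
                }
                try {
                    resolved.put(host, future.get());
                } catch (ExecutionException e) {
                    // resolution failed; skip this host
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return resolved;
    }
}
```

With the hosts a, slow, bad, b and a 500 ms timeout, only a and b end up in the result: the slow lookup is cancelled and the failed one is skipped.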
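That is the end of resolveHostsLists(). Back in ping(), its result is the seedNodes collection. The addresses of all master-eligible nodes in the current cluster state are then added to the seeds as well (seedAddresses in the snippet below).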
final DiscoveryNodes nodes = contextProvider.clusterState().nodes();
for (ObjectCursor<DiscoveryNode> masterNode : nodes.getMasterNodes().values()) {
seedAddresses.add(masterNode.value.getAddress());
}
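Next, a ConnectionProfile is built (a single-channel profile of type TransportRequestOptions.Type.REG, with requestDuration as both the connect and the handshake timeout), followed by a PingingRound. The PingingRound receives the id of this round, the seed addresses, the resultsConsumer callback, the local node, and the ConnectionProfile. It is the abstraction of one round of pinging and is registered in activePingingRounds.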
final ConnectionProfile connectionProfile =
ConnectionProfile.buildSingleChannelProfile(TransportRequestOptions.Type.REG, requestDuration, requestDuration);
final PingingRound pingingRound = new PingingRound(pingingRoundIdGenerator.incrementAndGet(), seedAddresses, resultsConsumer,
nodes.getLocalNode(), connectionProfile);
activePingingRounds.put(pingingRound.id(), pingingRound);
With the round set up, the next step is to build the task that does the work and run it on the thread pool.
final AbstractRunnable pingSender = new AbstractRunnable() {
@Override
public void onFailure(Exception e) {
if (e instanceof AlreadyClosedException == false) {
logger.warn("unexpected error while pinging", e);
}
}
@Override
protected void doRun() throws Exception {
sendPings(requestDuration, pingingRound);
}
};
threadPool.generic().execute(pingSender);
threadPool.schedule(TimeValue.timeValueMillis(scheduleDuration.millis() / 3), ThreadPool.Names.GENERIC, pingSender);
threadPool.schedule(TimeValue.timeValueMillis(scheduleDuration.millis() / 3 * 2), ThreadPool.Names.GENERIC, pingSender);
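As shown, pingSender is built and run three times on the generic thread pool: once immediately, once after one third of scheduleDuration, and once after two thirds (roughly 1s apart with the default 3s ping timeout). Each run sends ping requests to all of the nodes to ping.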
threadPool.schedule(scheduleDuration, ThreadPool.Names.GENERIC, new AbstractRunnable() {
@Override
protected void doRun() throws Exception {
finishPingingRound(pingingRound);
}
@Override
public void onFailure(Exception e) {
logger.warn("unexpected error while finishing pinging round", e);
}
});
protected void finishPingingRound(PingingRound pingingRound) {
pingingRound.close();
}
Finally, the scheduled task above closes the round once scheduleDuration has elapsed.
Next, let's see what sendPings() does.
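The timing of a round (send now, at one third, at two thirds, then finish at the full duration) can be sketched with a JDK ScheduledExecutorService. This is a simplified illustration, not the ThreadPool class used by Elasticsearch; the event strings and the runRound() helper are assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PingScheduleSketch {
    // Runs one simulated round and returns the recorded events in execution order.
    static List<String> runRound(long durationMillis) {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch finished = new CountDownLatch(1);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Runnable pingSender = () -> events.add("send");

        scheduler.execute(pingSender);                                                      // immediately
        scheduler.schedule(pingSender, durationMillis / 3, TimeUnit.MILLISECONDS);          // after 1/3
        scheduler.schedule(pingSender, durationMillis / 3 * 2, TimeUnit.MILLISECONDS);      // after 2/3
        scheduler.schedule(() -> {                                                          // close the round
            events.add("finish");
            finished.countDown();
        }, durationMillis, TimeUnit.MILLISECONDS);

        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return events;
    }
}
```

Because the sketch uses a single scheduler thread, the tasks run strictly in order of their delays, so a round always records three sends followed by one finish.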
protected void sendPings(final TimeValue timeout, final PingingRound pingingRound) {
final ClusterState lastState = contextProvider.clusterState();
final UnicastPingRequest pingRequest = new UnicastPingRequest(pingingRound.id(), timeout, createPingResponse(lastState));
List<TransportAddress> temporalAddresses = temporalResponses.stream().map(pingResponse -> {
assert clusterName.equals(pingResponse.clusterName()) :
"got a ping request from a different cluster. expected " + clusterName + " got " + pingResponse.clusterName();
return pingResponse.node().getAddress();
}).collect(Collectors.toList());
final Stream<TransportAddress> uniqueAddresses = Stream.concat(pingingRound.getSeedAddresses().stream(),
temporalAddresses.stream()).distinct();
// resolve what we can via the latest cluster state
final Set<DiscoveryNode> nodesToPing = uniqueAddresses
.map(address -> {
DiscoveryNode foundNode = lastState.nodes().findByAddress(address);
if (foundNode != null && transportService.nodeConnected(foundNode)) {
return foundNode;
} else {
return new DiscoveryNode(
address.toString(),
address,
emptyMap(),
emptySet(),
Version.CURRENT.minimumCompatibilityVersion());
}
}).collect(Collectors.toSet());
nodesToPing.forEach(node -> sendPingRequestToNode(node, timeout, pingingRound, pingRequest));
}
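sendPings() builds the pingRequest for this round. It carries the round id (since the sender runs three times, three requests share the same id), the timeout, and a PingResponse created from the local node's data and its current view of the cluster. It then collects the addresses of the nodes in the same cluster that recently pinged this node, which are kept in temporalResponses.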
private PingResponse createPingResponse(ClusterState clusterState) {
DiscoveryNodes discoNodes = clusterState.nodes();
return new PingResponse(discoNodes.getLocalNode(), discoNodes.getMasterNode(), clusterState);
}
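These addresses are merged with the round's seed addresses and deduplicated. Each address is resolved to the DiscoveryNode from the latest cluster state if that node is known and connected, and wrapped in a placeholder DiscoveryNode otherwise. Finally, sendPingRequestToNode() is called for every node in the resulting set, passing the pingingRound and the pingRequest.
Next, look at the sendPingRequestToNode() method.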
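The merge-and-deduplicate step can be sketched with plain streams. This is an illustration only: addresses and nodes are strings, knownNodes is a hypothetical stand-in for the lookup in the latest cluster state, and a sorted set is used purely to make the output order predictable.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NodesToPingSketch {
    // Merges seed and temporal addresses, drops duplicates, and maps each address
    // to a known node name or to a placeholder when the address is unknown.
    static Set<String> nodesToPing(List<String> seedAddresses,
                                   List<String> temporalAddresses,
                                   Map<String, String> knownNodes) {
        return Stream.concat(seedAddresses.stream(), temporalAddresses.stream())
            .distinct()
            .map(address -> knownNodes.getOrDefault(address, "placeholder@" + address))
            .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

An address that appears both as a seed and as a recent pinger is contacted only once, which is the point of the distinct() call.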
private void sendPingRequestToNode(final DiscoveryNode node, TimeValue timeout, final PingingRound pingingRound,
final UnicastPingRequest pingRequest) {
submitToExecutor(new AbstractRunnable() {
@Override
protected void doRun() throws Exception {
Connection connection = null;
if (transportService.nodeConnected(node)) {
try {
// concurrency can still cause disconnects
connection = transportService.getConnection(node);
} catch (NodeNotConnectedException e) {
logger.trace("[{}] node [{}] just disconnected, will create a temp connection", pingingRound.id(), node);
}
}
if (connection == null) {
connection = pingingRound.getOrConnect(node);
}
logger.trace("[{}] sending to {}", pingingRound.id(), node);
transportService.sendRequest(connection, ACTION_NAME, pingRequest,
TransportRequestOptions.builder().withTimeout((long) (timeout.millis() * 1.25)).build(),
getPingResponseHandler(pingingRound, node));
}
@Override
public void onFailure(Exception e) {
if (e instanceof ConnectTransportException || e instanceof AlreadyClosedException) {
// can't connect to the node - this is more common path!
logger.trace(() -> new ParameterizedMessage("[{}] failed to ping {}", pingingRound.id(), node), e);
} else if (e instanceof RemoteTransportException) {
// something went wrong on the other side
logger.debug(() -> new ParameterizedMessage(
"[{}] received a remote error as a response to ping {}", pingingRound.id(), node), e);
} else {
logger.warn(() -> new ParameterizedMessage("[{}] failed send ping to {}", pingingRound.id(), node), e);
}
}
@Override
public void onRejection(Exception e) {
// The RejectedExecutionException can come from the fact unicastZenPingExecutorService is at its max down in sendPings
// But don't bail here, we can retry later on after the send ping has been scheduled.
logger.debug("Ping execution rejected", e);
}
});
}
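As shown, a task is submitted to the executor. It reuses an existing connection to the target node or opens a temporary one, and then uses transportService to send the ping request to the action "internal:discovery/zen/unicast" on that node. A response handler is passed along to process the reply (or the failure). Its implementation follows.
protected TransportResponseHandler<UnicastPingResponse> getPingResponseHandler(final PingingRound pingingRound,
final DiscoveryNode node) {
return new TransportResponseHandler<UnicastPingResponse>() {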
@Override
public UnicastPingResponse read(StreamInput in) throws IOException {
return new UnicastPingResponse(in);
}
@Override
public String executor() {
return ThreadPool.Names.SAME;
}
@Override
public void handleResponse(UnicastPingResponse response) {
logger.trace("[{}] received response from {}: {}", pingingRound.id(), node, Arrays.toString(response.pingResponses));
if (pingingRound.isClosed()) {
if (logger.isTraceEnabled()) {
logger.trace("[{}] skipping received response from {}. already closed", pingingRound.id(), node);
}
} else {
Stream.of(response.pingResponses).forEach(pingingRound::addPingResponseToCollection);
}
}
@Override
public void handleException(TransportException exp) {
if (exp instanceof ConnectTransportException || exp.getCause() instanceof ConnectTransportException ||
exp.getCause() instanceof AlreadyClosedException) {
// ok, not connected...
logger.trace(() -> new ParameterizedMessage("failed to connect to {}", node), exp);
} else if (closed == false) {
logger.warn(() -> new ParameterizedMessage("failed to send ping to [{}]", node), exp);
}
}
};
}
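If the pingingRound has not been closed, the handler iterates over the pingResponses in the reply and calls pingingRound::addPingResponseToCollection on each one, which stores it in the pingCollection unless it describes the local node.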
public void addPingResponseToCollection(PingResponse pingResponse) {
if (localNode.equals(pingResponse.node()) == false) {
pingCollection.addPing(pingResponse);
}
}
Back to the task that wraps up the ping round:
threadPool.schedule(scheduleDuration, ThreadPool.Names.GENERIC, new AbstractRunnable() {
@Override
protected void doRun() throws Exception {
finishPingingRound(pingingRound);
}
@Override
public void onFailure(Exception e) {
logger.warn("unexpected error while finishing pinging round", e);
}
});
protected void finishPingingRound(PingingRound pingingRound) {
pingingRound.close();
}
@Override
public void close() {
List<Connection> toClose = null;
synchronized (this) {
if (closed.compareAndSet(false, true)) {
activePingingRounds.remove(id);
toClose = new ArrayList<>(tempConnections.values());
tempConnections.clear();
}
}
if (toClose != null) {
// we actually closed
try {
pingListener.accept(pingCollection);
} finally {
IOUtils.closeWhileHandlingException(toClose);
}
}
}
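In the end, close() calls pingListener.accept(pingCollection). The pingListener is the response::complete reference passed in when the round was constructed, so this call completes the future and ends the ping. The compareAndSet on closed ensures that this happens only once.
The constructor section mentioned UnicastPingRequestHandler, the handler for ping requests from other nodes. Here is its logic.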
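The close-once behaviour can be sketched independently of Elasticsearch. In this simplified example the class name, the add() method, and the use of strings as responses are assumptions for illustration; only the compareAndSet guard and the single listener call mirror the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

class CloseOnceSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final List<String> collected = new ArrayList<>();
    private final Consumer<List<String>> listener;

    CloseOnceSketch(Consumer<List<String>> listener) {
        this.listener = listener;
    }

    // Responses arriving after close() are ignored.
    synchronized void add(String response) {
        if (closed.get() == false) {
            collected.add(response);
        }
    }

    // Only the first caller flips the flag and notifies the listener; later calls are no-ops.
    void close() {
        List<String> snapshot = null;
        synchronized (this) {
            if (closed.compareAndSet(false, true)) {
                snapshot = new ArrayList<>(collected);
            }
        }
        if (snapshot != null) {
            listener.accept(snapshot);
        }
    }
}
```

Calling close() twice notifies the listener once, with the responses gathered before the first call.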
class UnicastPingRequestHandler implements TransportRequestHandler<UnicastPingRequest> {
@Override
public void messageReceived(UnicastPingRequest request, TransportChannel channel, Task task) throws Exception {
if (closed) {
throw new AlreadyClosedException("node is shutting down");
}
if (request.pingResponse.clusterName().equals(clusterName)) {
channel.sendResponse(handlePingRequest(request));
} else {
throw new IllegalStateException(
String.format(
Locale.ROOT,
"mismatched cluster names; request: [%s], local: [%s]",
request.pingResponse.clusterName().value(),
clusterName.value()));
}
}
}
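This code is pleasantly short: it rejects requests while the node is shutting down or when the cluster names differ, and delegates the actual work to handlePingRequest(), shown next.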
private UnicastPingResponse handlePingRequest(final UnicastPingRequest request) {
assert clusterName.equals(request.pingResponse.clusterName()) :
"got a ping request from a different cluster. expected " + clusterName + " got " + request.pingResponse.clusterName();
temporalResponses.add(request.pingResponse);
// add to any ongoing pinging
activePingingRounds.values().forEach(p -> p.addPingResponseToCollection(request.pingResponse));
threadPool.schedule(TimeValue.timeValueMillis(request.timeout.millis() * 2), ThreadPool.Names.SAME,
() -> temporalResponses.remove(request.pingResponse));
List<PingResponse> pingResponses = CollectionUtils.iterableAsArrayList(temporalResponses);
pingResponses.add(createPingResponse(contextProvider.clusterState()));
return new UnicastPingResponse(request.id, pingResponses.toArray(new PingResponse[pingResponses.size()]));
}
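temporalResponses holds the PingResponse carried in each pingRequest received from other nodes. The incoming PingResponse is also added to every round in activePingingRounds. A scheduled task removes it from temporalResponses again after twice the request timeout, so the data stays fresh. Finally, the contents of temporalResponses plus a PingResponse describing the current node are returned to the sender of the ping request.
This brings us back to the step analyzed earlier, processing the returned response. Its final piece is addPing():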
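The "remember the sender for a while, then forget it" pattern can be sketched with a queue and a scheduler. Everything here is a simplified assumption: strings stand in for PingResponse, handlePing() and demo() are hypothetical names, and a JDK scheduler replaces the Elasticsearch thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TemporalResponsesSketch {
    private final Queue<String> temporalResponses = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Records the sender, schedules its removal after 2 x timeout, and replies with
    // everything currently remembered plus the local node's own info.
    List<String> handlePing(String senderInfo, String localInfo, long timeoutMillis) {
        temporalResponses.add(senderInfo);
        scheduler.schedule(() -> temporalResponses.remove(senderInfo), timeoutMillis * 2, TimeUnit.MILLISECONDS);
        List<String> reply = new ArrayList<>(temporalResponses);
        reply.add(localInfo);
        return reply;
    }

    List<String> snapshot() {
        return new ArrayList<>(temporalResponses);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }

    // Two pings in quick succession, then a pause long enough for both entries to expire.
    static List<String> demo() {
        TemporalResponsesSketch node = new TemporalResponsesSketch();
        List<String> out = new ArrayList<>();
        out.add(node.handlePing("node-A", "local", 50).toString());
        out.add(node.handlePing("node-B", "local", 50).toString());
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        out.add(node.snapshot().toString());
        node.shutdown();
        return out;
    }
}
```

In demo(), the second reply already includes node-A, which is how a pinging node learns about peers it has not contacted itself; after the expiry both entries are gone.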
public synchronized boolean addPing(PingResponse ping) {
PingResponse existingResponse = pings.get(ping.node());
if (existingResponse == null || existingResponse.id() <= ping.id()) {
pings.put(ping.node(), ping);
return true;
}
return false;
}
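This is where ping results are collected. If a result for the same node already exists, the one with the larger (or equal) id wins, that is, the more recent result is kept.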
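The keep-the-newest rule is easy to check in isolation. The sketch below is a simplified assumption: nodes are strings, only the id of each result is stored, and idFor() and size() are helper names added for the example.

```java
import java.util.HashMap;
import java.util.Map;

class PingCollectionSketch {
    private final Map<String, Long> pings = new HashMap<>();

    // Stores the ping unless a result with a strictly larger id already exists for the node.
    synchronized boolean addPing(String node, long id) {
        Long existingId = pings.get(node);
        if (existingId == null || existingId <= id) {
            pings.put(node, id);
            return true;
        }
        return false;
    }

    synchronized Long idFor(String node) {
        return pings.get(node);
    }

    synchronized int size() {
        return pings.size();
    }
}
```

Adding ids 5, 3, and 7 for the same node accepts the first and third and rejects the second, leaving a single entry with id 7.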