Preface
This article takes a look at the MembershipProtocol of scalecube-cluster.
MembershipProtocol
scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocol.java
/**
* Cluster Membership Protocol component responsible for managing information about existing members
* of the cluster.
*/
public interface MembershipProtocol {
/**
* Starts running cluster membership protocol. After started it begins to receive and send cluster
* membership messages
*/
Mono<Void> start();
/** Stops running cluster membership protocol and releases occupied resources. */
void stop();
/** Listens to changes in cluster membership. */
Flux<MembershipEvent> listen();
/**
* Returns list of all members of the joined cluster. This will include all cluster members
* including local member.
*
* @return all members in the cluster (including local one)
*/
Collection<Member> members();
/**
* Returns list of all cluster members of the joined cluster excluding local member.
*
* @return all members in the cluster (excluding local one)
*/
Collection<Member> otherMembers();
/**
* Returns local cluster member which corresponds to this cluster instance.
*
* @return local member
*/
Member member();
/**
* Returns the cluster member with the given id, or an empty Optional if no member with such id
* exists in the joined cluster.
*
* @return member by id
*/
Optional<Member> member(String id);
/**
* Returns the cluster member with the given address, or an empty Optional if no member with such
* address exists in the joined cluster.
*
* @return member by address
*/
Optional<Member> member(Address address);
}
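- The MembershipProtocol interface defines the start, stop, listen, members, otherMembers and member methods; member() returns the local member, while member(id) and member(address) look up any member of the joined cluster.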
MembershipProtocolImpl
scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java
public final class MembershipProtocolImpl implements MembershipProtocol {
private static final Logger LOGGER = LoggerFactory.getLogger(MembershipProtocolImpl.class);
private enum MembershipUpdateReason {
FAILURE_DETECTOR_EVENT,
MEMBERSHIP_GOSSIP,
SYNC,
INITIAL_SYNC,
SUSPICION_TIMEOUT
}
// Qualifiers
public static final String SYNC = "sc/membership/sync";
public static final String SYNC_ACK = "sc/membership/syncAck";
public static final String MEMBERSHIP_GOSSIP = "sc/membership/gossip";
private final Member localMember;
// Injected
private final Transport transport;
private final MembershipConfig config;
private final List<Address> seedMembers;
private final FailureDetector failureDetector;
private final GossipProtocol gossipProtocol;
private final MetadataStore metadataStore;
private final CorrelationIdGenerator cidGenerator;
// State
private final Map<String, MembershipRecord> membershipTable = new HashMap<>();
private final Map<String, Member> members = new HashMap<>();
// Subject
private final FluxProcessor<MembershipEvent, MembershipEvent> subject =
DirectProcessor.<MembershipEvent>create().serialize();
private final FluxSink<MembershipEvent> sink = subject.sink();
// Disposables
private final Disposable.Composite actionsDisposables = Disposables.composite();
// Scheduled
private final Scheduler scheduler;
private final Map<String, Disposable> suspicionTimeoutTasks = new HashMap<>();
/**
* Creates a new instance of the cluster membership protocol with the given transport and config.
*
* @param localMember local cluster member
* @param transport cluster transport
* @param failureDetector failure detector
* @param gossipProtocol gossip protocol
* @param metadataStore metadata store
* @param config membership config parameters
* @param scheduler scheduler
* @param cidGenerator correlation id generator
*/
public MembershipProtocolImpl(
Member localMember,
Transport transport,
FailureDetector failureDetector,
GossipProtocol gossipProtocol,
MetadataStore metadataStore,
MembershipConfig config,
Scheduler scheduler,
CorrelationIdGenerator cidGenerator) {
this.transport = Objects.requireNonNull(transport);
this.config = Objects.requireNonNull(config);
this.failureDetector = Objects.requireNonNull(failureDetector);
this.gossipProtocol = Objects.requireNonNull(gossipProtocol);
this.metadataStore = Objects.requireNonNull(metadataStore);
this.localMember = Objects.requireNonNull(localMember);
this.scheduler = Objects.requireNonNull(scheduler);
this.cidGenerator = Objects.requireNonNull(cidGenerator);
// Prepare seeds
seedMembers = cleanUpSeedMembers(config.getSeedMembers());
// Init membership table with local member record
membershipTable.put(localMember.id(), new MembershipRecord(localMember, ALIVE, 0));
// fill in the table of members with local member
members.put(localMember.id(), localMember);
actionsDisposables.addAll(
Arrays.asList(
// Listen to incoming SYNC and SYNC ACK requests from other members
transport
.listen() //
.publishOn(scheduler)
.subscribe(this::onMessage, this::onError),
// Listen to events from failure detector
failureDetector
.listen()
.publishOn(scheduler)
.subscribe(this::onFailureDetectorEvent, this::onError),
// Listen to membership gossips
gossipProtocol
.listen()
.publishOn(scheduler)
.subscribe(this::onMembershipGossip, this::onError)));
}
@Override
public Flux<MembershipEvent> listen() {
return subject.onBackpressureBuffer();
}
@Override
public Mono<Void> start() {
// Make initial sync with all seed members
return Mono.create(
sink -> {
// In case no members at the moment just schedule periodic sync
if (seedMembers.isEmpty()) {
schedulePeriodicSync();
sink.success();
return;
}
// If seed addresses are specified in config - send initial sync to those nodes
LOGGER.debug("Making initial Sync to all seed members: {}", seedMembers);
//noinspection unchecked
Mono<Message>[] syncs =
seedMembers
.stream()
.map(
address -> {
String cid = cidGenerator.nextCid();
return transport
.requestResponse(prepareSyncDataMsg(SYNC, cid), address)
.filter(this::checkSyncGroup);
})
.toArray(Mono[]::new);
// Process initial SyncAck
Flux.mergeDelayError(syncs.length, syncs)
.take(1)
.timeout(Duration.ofMillis(config.getSyncTimeout()), scheduler)
.publishOn(scheduler)
.flatMap(message -> onSyncAck(message, true))
.doFinally(
s -> {
schedulePeriodicSync();
sink.success();
})
.subscribe(
null,
ex -> LOGGER.info("Exception on initial SyncAck, cause: {}", ex.toString()));
});
}
@Override
public void stop() {
// Stop accepting requests, events and sending sync
actionsDisposables.dispose();
// Cancel remove members tasks
for (String memberId : suspicionTimeoutTasks.keySet()) {
Disposable future = suspicionTimeoutTasks.get(memberId);
if (future != null && !future.isDisposed()) {
future.dispose();
}
}
suspicionTimeoutTasks.clear();
// Stop publishing events
sink.complete();
}
@Override
public Collection<Member> members() {
return new ArrayList<>(members.values());
}
@Override
public Collection<Member> otherMembers() {
return new ArrayList<>(members.values())
.stream()
.filter(member -> !member.equals(localMember))
.collect(Collectors.toList());
}
@Override
public Member member() {
return localMember;
}
@Override
public Optional<Member> member(String id) {
return Optional.ofNullable(members.get(id));
}
@Override
public Optional<Member> member(Address address) {
return new ArrayList<>(members.values())
.stream()
.filter(member -> member.address().equals(address))
.findFirst();
}
//......
}
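- MembershipProtocolImpl implements the MembershipProtocol interface. It defines the MembershipUpdateReason enum (FAILURE_DETECTOR_EVENT, MEMBERSHIP_GOSSIP, SYNC, INITIAL_SYNC, SUSPICION_TIMEOUT), which is passed along to record why the membership table is being updated.
- The MembershipProtocolImpl constructor subscribes to three streams on its scheduler: transport.listen() is handled by onMessage, failureDetector.listen() by onFailureDetectorEvent, and gossipProtocol.listen() by onMembershipGossip.
- When seedMembers is empty, start only calls schedulePeriodicSync, which runs doSync every syncInterval milliseconds. Otherwise it sends a SYNC to every seed member via transport.requestResponse, keeps only replies from the same sync group (checkSyncGroup), takes the first one that arrives within syncTimeout and hands it to onSyncAck with onStart = true, so its records are applied with the INITIAL_SYNC reason. Whether or not this succeeds, doFinally then calls schedulePeriodicSync and completes the returned Mono. stop disposes actionsDisposables (the three subscriptions and the periodic sync task), disposes every pending future in suspicionTimeoutTasks, and completes the event sink.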
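The initial sync in start fans out to every seed, tolerates failed seeds, and keeps only the first acceptable reply within the timeout. Below is a minimal sketch of that control flow using only the JDK. It assumes CompletableFuture in place of Reactor's Flux.mergeDelayError(...).take(1).timeout(...), and plain strings with a hypothetical "<syncGroup>:" prefix in place of SyncData messages; it is not scalecube's code.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.*;

// Sketch only: CompletableFuture stands in for Reactor's
// Flux.mergeDelayError(...).take(1).timeout(...); replies are plain strings with a
// hypothetical "<syncGroup>:" prefix instead of SyncData messages.
class InitialSyncSketch {

  /** First reply accepted by `accept`, or empty if all fail, none match, or time runs out. */
  static CompletableFuture<Optional<String>> firstAccepted(
      List<CompletableFuture<String>> requests, Predicate<String> accept, long timeoutMs) {
    CompletableFuture<Optional<String>> result = new CompletableFuture<>();
    List<CompletableFuture<Void>> handled = new ArrayList<>();
    for (CompletableFuture<String> request : requests) {
      // A failed request never completes `result`: errors are delayed, not fatal.
      handled.add(
          request.thenAccept(
              reply -> {
                if (accept.test(reply)) {
                  result.complete(Optional.of(reply)); // only the first complete() wins
                }
              }));
    }
    // Once every request has settled without an accepted reply, give up.
    CompletableFuture.allOf(handled.toArray(new CompletableFuture<?>[0]))
        .whenComplete((ignored, error) -> result.complete(Optional.empty()));
    return result.completeOnTimeout(Optional.empty(), timeoutMs, TimeUnit.MILLISECONDS);
  }

  /** Mirrors start(): try the seeds once, then schedule periodic sync whatever happened. */
  static CompletableFuture<Void> start(
      List<CompletableFuture<String>> seedSyncs,
      String syncGroup,
      long timeoutMs,
      Consumer<String> onSyncAck,
      Runnable schedulePeriodicSync) {
    if (seedSyncs.isEmpty()) {
      schedulePeriodicSync.run();
      return CompletableFuture.completedFuture(null);
    }
    return firstAccepted(seedSyncs, r -> r.startsWith(syncGroup + ":"), timeoutMs)
        .thenAccept(ack -> ack.ifPresent(onSyncAck))
        .whenComplete((ignored, error) -> schedulePeriodicSync.run());
  }
}
```

For example, with one seed that fails, one that answers from another sync group and one that answers from groupA, firstAccepted completes with the groupA reply; in both branches of start, schedulePeriodicSync runs exactly once.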
onMessage
scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java
public final class MembershipProtocolImpl implements MembershipProtocol {
//......
private void onMessage(Message message) {
if (checkSyncGroup(message)) {
if (SYNC.equals(message.qualifier())) {
onSync(message).subscribe(null, this::onError);
}
if (SYNC_ACK.equals(message.qualifier())) {
if (message.correlationId() == null) { // filter out initial sync
onSyncAck(message, false).subscribe(null, this::onError);
}
}
}
}
private boolean checkSyncGroup(Message message) {
if (message.data() instanceof SyncData) {
SyncData syncData = message.data();
return config.getSyncGroup().equals(syncData.getSyncGroup());
}
return false;
}
/** Merges incoming SYNC data and sends the merged data back with SYNC_ACK. */
private Mono<Void> onSync(Message syncMsg) {
return Mono.defer(
() -> {
LOGGER.debug("Received Sync: {}", syncMsg);
return syncMembership(syncMsg.data(), false)
.doOnSuccess(
avoid -> {
Message message = prepareSyncDataMsg(SYNC_ACK, syncMsg.correlationId());
Address address = syncMsg.sender();
transport
.send(address, message)
.subscribe(
null,
ex ->
LOGGER.debug(
"Failed to send SyncAck: {} to {}, cause: {}",
message,
address,
ex.toString()));
});
});
}
private Mono<Void> onSyncAck(Message syncAckMsg, boolean onStart) {
return Mono.defer(
() -> {
LOGGER.debug("Received SyncAck: {}", syncAckMsg);
return syncMembership(syncAckMsg.data(), onStart);
});
}
private Mono<Void> syncMembership(SyncData syncData, boolean onStart) {
return Mono.defer(
() -> {
MembershipUpdateReason reason =
onStart ? MembershipUpdateReason.INITIAL_SYNC : MembershipUpdateReason.SYNC;
return Mono.whenDelayError(
syncData
.getMembership()
.stream()
.filter(r1 -> !r1.equals(membershipTable.get(r1.id())))
.map(r1 -> updateMembership(r1, reason))
.toArray(Mono[]::new));
});
}
//......
}
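- onMessage first uses checkSyncGroup to make sure the message carries SyncData for this node's syncGroup, then dispatches on message.qualifier(): SYNC goes to onSync and SYNC_ACK goes to onSyncAck. A SYNC_ACK that carries a correlationId is skipped here, because it is the reply to the initial sync in start and is handled through that transport.requestResponse call.
- onSync calls syncMembership and, on success, sends a SYNC_ACK with the same correlationId, carrying the local membership table, back to the sender. onSyncAck also calls syncMembership but sends nothing back.
- syncMembership calls updateMembership for every record in syncData's membership that differs from the corresponding entry in membershipTable, using INITIAL_SYNC as the reason during start and SYNC otherwise.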
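The dispatch in onMessage and the filter in syncMembership are small enough to restate as plain functions. The sketch below assumes string qualifiers and groups and a simple Rec record in place of Message, SyncData and MembershipRecord, with Rec's value equality standing in for MembershipRecord.equals; it illustrates the decisions, not the reactive plumbing.

```java
import java.util.*;
import java.util.stream.*;

// Sketch only: plain strings and a small record stand in for Message, SyncData and
// MembershipRecord; record equality is assumed to match MembershipRecord.equals.
class SyncMessageSketch {
  static final String SYNC = "sc/membership/sync";
  static final String SYNC_ACK = "sc/membership/syncAck";

  enum Action { ON_SYNC, ON_SYNC_ACK, IGNORE }

  record Rec(String id, String status, int incarnation) {}

  /** onMessage's dispatch, with checkSyncGroup folded in (null group = not SyncData). */
  static Action route(String qualifier, String correlationId, String msgGroup, String localGroup) {
    if (msgGroup == null || !msgGroup.equals(localGroup)) {
      return Action.IGNORE; // not sync data, or a different sync group
    }
    if (SYNC.equals(qualifier)) {
      return Action.ON_SYNC; // merge, then answer with SYNC_ACK
    }
    if (SYNC_ACK.equals(qualifier) && correlationId == null) {
      return Action.ON_SYNC_ACK; // answer to a periodic doSync: merge only
    }
    // A SYNC_ACK with a correlation id answers the initial requestResponse in start().
    return Action.IGNORE;
  }

  /** syncMembership only hands records that differ from the local table to updateMembership. */
  static List<Rec> changedRecords(Map<String, Rec> table, List<Rec> incoming) {
    return incoming.stream()
        .filter(r1 -> !r1.equals(table.get(r1.id())))
        .collect(Collectors.toList());
  }
}
```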
onFailureDetectorEvent
scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java
public final class MembershipProtocolImpl implements MembershipProtocol {
//......
/** Merges FD updates and processes them. */
private void onFailureDetectorEvent(FailureDetectorEvent fdEvent) {
MembershipRecord r0 = membershipTable.get(fdEvent.member().id());
if (r0 == null) { // member already removed
return;
}
if (r0.status() == fdEvent.status()) { // status not changed
return;
}
LOGGER.debug("Received status change on failure detector event: {}", fdEvent);
if (fdEvent.status() == ALIVE) {
// TODO: Consider to make more elegant solution
// Alive won't override SUSPECT so issue instead extra sync with member to force it spread
// alive with inc + 1
Message syncMsg = prepareSyncDataMsg(SYNC, null);
Address address = fdEvent.member().address();
transport
.send(address, syncMsg)
.subscribe(
null,
ex ->
LOGGER.debug(
"Failed to send {} to {}, cause: {}", syncMsg, address, ex.toString()));
} else {
MembershipRecord record =
new MembershipRecord(r0.member(), fdEvent.status(), r0.incarnation());
updateMembership(record, MembershipUpdateReason.FAILURE_DETECTOR_EVENT)
.subscribe(null, this::onError);
}
}
//......
}
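- onFailureDetectorEvent ignores the event if the member has already been removed from membershipTable or its status has not changed. If the new status is ALIVE, it sends a SYNC to fdEvent.member().address(): an ALIVE record with the same incarnation cannot override SUSPECT, so the member is prompted to spread ALIVE itself with its incarnation increased by one. For any other status it builds a new MembershipRecord with the detector's status and the current incarnation and calls updateMembership with MembershipUpdateReason.FAILURE_DETECTOR_EVENT.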
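The branching in onFailureDetectorEvent can be expressed as a pure decision function. In this sketch the Kind and Decision types are hypothetical helpers, and Rec stands in for MembershipRecord; sending the SYNC and calling updateMembership are left to the caller.

```java
import java.util.*;

// Sketch only: a small record stands in for MembershipRecord; Kind and Decision are
// hypothetical types that name the three outcomes of onFailureDetectorEvent.
class FailureDetectorEventSketch {
  enum Status { ALIVE, SUSPECT, DEAD }
  enum Kind { IGNORE, SEND_SYNC, UPDATE }

  record Rec(String id, Status status, int incarnation) {}
  record Decision(Kind kind, Rec updated) {}

  static Decision decide(Map<String, Rec> table, String memberId, Status fdStatus) {
    Rec r0 = table.get(memberId);
    if (r0 == null || r0.status() == fdStatus) {
      return new Decision(Kind.IGNORE, null); // already removed, or status unchanged
    }
    if (fdStatus == Status.ALIVE) {
      // ALIVE at the same incarnation cannot override SUSPECT, so send the member a SYNC
      // instead; seeing itself suspected, it refutes with incarnation + 1.
      return new Decision(Kind.SEND_SYNC, null);
    }
    // Any other detector status is applied at the stored incarnation.
    return new Decision(Kind.UPDATE, new Rec(r0.id(), fdStatus, r0.incarnation()));
  }
}
```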
onMembershipGossip
scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java
public final class MembershipProtocolImpl implements MembershipProtocol {
//......
/** Merges received membership gossip (not spreading gossip further). */
private void onMembershipGossip(Message message) {
if (MEMBERSHIP_GOSSIP.equals(message.qualifier())) {
MembershipRecord record = message.data();
LOGGER.debug("Received membership gossip: {}", record);
updateMembership(record, MembershipUpdateReason.MEMBERSHIP_GOSSIP)
.subscribe(null, this::onError);
}
}
//......
}
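- onMembershipGossip handles only messages whose qualifier is MEMBERSHIP_GOSSIP and passes the MembershipRecord they carry to updateMembership with MembershipUpdateReason.MEMBERSHIP_GOSSIP. It does not re-spread the record: updateMembership skips spreadMembershipGossip for this reason.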
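All of the paths so far (SYNC merges, failure detector events and gossip) end in updateMembership, whose first step is r1.isOverrides(r0). That method is not shown in this article, so the sketch below assumes SWIM-style precedence rules: DEAD is final, a higher incarnation wins, and at equal incarnation SUSPECT beats ALIVE. Check MembershipRecord in scalecube-cluster 2.2.5 for the actual rules.

```java
import java.util.*;

// Sketch only: assumed SWIM-style precedence, not the library's MembershipRecord.isOverrides.
class OverrideRulesSketch {
  enum Status { ALIVE, SUSPECT, DEAD }

  record Rec(String id, Status status, int incarnation) {}

  /** Does the incoming record r1 override the stored record r0 (null if unknown)? */
  static boolean overrides(Rec r1, Rec r0) {
    if (r0 == null) {
      return r1.status() == Status.ALIVE; // unknown members can only be introduced as ALIVE
    }
    if (!r1.id().equals(r0.id())) {
      throw new IllegalArgumentException("records belong to different members");
    }
    if (r0.status() == Status.DEAD) {
      return false; // DEAD is final
    }
    if (r1.status() == Status.DEAD) {
      return true; // DEAD wins over any live state
    }
    if (r1.incarnation() == r0.incarnation()) {
      // Same incarnation: only a move from ALIVE to SUSPECT counts as news.
      return r1.status() == Status.SUSPECT && r0.status() == Status.ALIVE;
    }
    return r1.incarnation() > r0.incarnation(); // otherwise the newer incarnation wins
  }
}
```

Under these assumed rules an ALIVE record never overrides SUSPECT at the same incarnation, which is exactly the situation the TODO in onFailureDetectorEvent works around by sending a SYNC.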
updateMembership
scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java
public final class MembershipProtocolImpl implements MembershipProtocol {
//......
/**
* Try to update membership table with the given record.
*
* @param r1 new membership record which compares with existing r0 record
* @param reason indicating the reason for updating membership table
*/
private Mono<Void> updateMembership(MembershipRecord r1, MembershipUpdateReason reason) {
return Mono.defer(
() -> {
Objects.requireNonNull(r1, "Membership record can't be null");
// Get current record
MembershipRecord r0 = membershipTable.get(r1.id());
// Check if new record r1 overrides existing membership record r0
if (!r1.isOverrides(r0)) {
return Mono.empty();
}
// If received updated for local member then increase incarnation and spread Alive gossip
if (r1.member().id().equals(localMember.id())) {
int currentIncarnation = Math.max(r0.incarnation(), r1.incarnation());
MembershipRecord r2 =
new MembershipRecord(localMember, r0.status(), currentIncarnation + 1);
membershipTable.put(localMember.id(), r2);
LOGGER.debug(
"Local membership record r0: {}, but received r1: {}, "
+ "spread with increased incarnation r2: {}",
r0,
r1,
r2);
spreadMembershipGossip(r2)
.subscribe(
null,
ex -> {
// no-op
});
return Mono.empty();
}
// Update membership
if (r1.isDead()) {
membershipTable.remove(r1.id());
} else {
membershipTable.put(r1.id(), r1);
}
// Schedule/cancel suspicion timeout task
if (r1.isSuspect()) {
scheduleSuspicionTimeoutTask(r1);
} else {
cancelSuspicionTimeoutTask(r1.id());
}
// Emit membership and regardless of result spread gossip
return emitMembershipEvent(r0, r1)
.doFinally(
s -> {
// Spread gossip (unless already gossiped)
if (reason != MembershipUpdateReason.MEMBERSHIP_GOSSIP
&& reason != MembershipUpdateReason.INITIAL_SYNC) {
spreadMembershipGossip(r1)
.subscribe(
null,
ex -> {
// no-op
});
}
});
});
}
private Mono<Void> spreadMembershipGossip(MembershipRecord record) {
return Mono.defer(
() -> {
Message msg = Message.withData(record).qualifier(MEMBERSHIP_GOSSIP).build();
LOGGER.debug("Spread membership: {} with gossip", msg);
return gossipProtocol
.spread(msg)
.doOnError(
ex ->
LOGGER.debug(
"Failed to spread membership: {} with gossip, cause: {}",
msg,
ex.toString()))
.then();
});
}
private void scheduleSuspicionTimeoutTask(MembershipRecord record) {
long suspicionTimeout =
ClusterMath.suspicionTimeout(
config.getSuspicionMult(), membershipTable.size(), config.getPingInterval());
suspicionTimeoutTasks.computeIfAbsent(
record.id(),
id ->
scheduler.schedule(
() -> onSuspicionTimeout(id), suspicionTimeout, TimeUnit.MILLISECONDS));
}
private void onSuspicionTimeout(String memberId) {
suspicionTimeoutTasks.remove(memberId);
MembershipRecord record = membershipTable.get(memberId);
if (record != null) {
LOGGER.debug("Declare SUSPECTED member {} as DEAD by timeout", record);
MembershipRecord deadRecord =
new MembershipRecord(record.member(), DEAD, record.incarnation());
updateMembership(deadRecord, MembershipUpdateReason.SUSPICION_TIMEOUT)
.subscribe(null, this::onError);
}
}
private void cancelSuspicionTimeoutTask(String memberId) {
Disposable future = suspicionTimeoutTasks.remove(memberId);
if (future != null && !future.isDisposed()) {
future.dispose();
}
}
private Mono<Void> emitMembershipEvent(MembershipRecord r0, MembershipRecord r1) {
return Mono.defer(
() -> {
final Member member = r1.member();
if (r1.isDead()) {
members.remove(member.id());
// removed
return Mono.fromRunnable(
() -> {
Map<String, String> metadata = metadataStore.removeMetadata(member);
sink.next(MembershipEvent.createRemoved(member, metadata));
});
}
if (r0 == null && r1.isAlive()) {
members.put(member.id(), member);
// added
return metadataStore
.fetchMetadata(member)
.doOnSuccess(
metadata -> {
metadataStore.updateMetadata(member, metadata);
sink.next(MembershipEvent.createAdded(member, metadata));
})
.onErrorResume(TimeoutException.class, e -> Mono.empty())
.then();
}
if (r0 != null && r0.incarnation() < r1.incarnation()) {
// updated
return metadataStore
.fetchMetadata(member)
.doOnSuccess(
metadata1 -> {
Map<String, String> metadata0 =
metadataStore.updateMetadata(member, metadata1);
sink.next(MembershipEvent.createUpdated(member, metadata0, metadata1));
})
.onErrorResume(TimeoutException.class, e -> Mono.empty())
.then();
}
return Mono.empty();
});
}
//......
}
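- updateMembership first checks whether the incoming record r1 overrides the stored record r0 (r1.isOverrides(r0)) and does nothing if it does not. If r1 is about the local member, it refutes it: it stores a record with r0's status and an incarnation of max(r0, r1) + 1, spreads that record with spreadMembershipGossip, and stops there. Otherwise it removes the member from membershipTable if r1 is DEAD and stores r1 if not; it schedules a suspicion timeout if r1 is SUSPECT and cancels any pending one if not; finally it calls emitMembershipEvent and, regardless of the result, spreads r1 with spreadMembershipGossip unless the reason is MEMBERSHIP_GOSSIP or INITIAL_SYNC.
- scheduleSuspicionTimeoutTask computes suspicionTimeout with ClusterMath.suspicionTimeout(suspicionMult, membershipTable.size(), pingInterval) and, if suspicionTimeoutTasks has no task for record.id() yet, schedules onSuspicionTimeout to run after that delay. onSuspicionTimeout removes the task from suspicionTimeoutTasks and, if the member is still in membershipTable, declares it DEAD at the same incarnation via updateMembership with MembershipUpdateReason.SUSPICION_TIMEOUT. cancelSuspicionTimeoutTask likewise removes the task from suspicionTimeoutTasks and disposes its future.
- emitMembershipEvent keeps members and metadataStore in step with the table and publishes events to the sink: for a DEAD record it removes the member, calls metadataStore.removeMetadata(member) and emits a removed event; for a newly seen ALIVE member it adds the member, fetches its metadata, stores it with metadataStore.updateMetadata and emits an added event; for a known member whose incarnation increased it fetches and stores the new metadata and emits an updated event with the old and new metadata. A TimeoutException while fetching metadata is swallowed, so no event is emitted in that case.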
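To make the ordering in updateMembership and emitMembershipEvent concrete, here is a synchronous sketch of what happens after the override check has passed. It is an assumption-laden model, not the library code: Rec stands in for MembershipRecord, a Set stands in for suspicionTimeoutTasks, a list collects what would be passed to spreadMembershipGossip, and metadata fetching (including the swallowed TimeoutException) is omitted. The suspicionTimeoutMs helper assumes ClusterMath.suspicionTimeout computes suspicionMult * ceil(log2(clusterSize)) * pingInterval; that formula is not shown in this article.

```java
import java.util.*;

// Sketch only: records, enums and collections stand in for MembershipRecord, the reactive
// event sink and suspicionTimeoutTasks; metadata fetching (and its timeouts) is left out.
class UpdateMembershipSketch {
  enum Status { ALIVE, SUSPECT, DEAD }
  enum Reason { FAILURE_DETECTOR_EVENT, MEMBERSHIP_GOSSIP, SYNC, INITIAL_SYNC, SUSPICION_TIMEOUT }
  enum Event { ADDED, UPDATED, REMOVED, NONE }

  record Rec(String id, Status status, int incarnation) {}

  final String localId;
  final Map<String, Rec> table = new HashMap<>();
  final Set<String> suspicionTasks = new HashSet<>(); // ids with a pending suspicion timeout
  final List<Rec> spread = new ArrayList<>(); // records handed to spreadMembershipGossip

  UpdateMembershipSketch(String localId) {
    this.localId = localId;
    table.put(localId, new Rec(localId, Status.ALIVE, 0));
  }

  /** Applies r1, assuming the caller already checked that r1 overrides the stored record. */
  Event apply(Rec r1, Reason reason) {
    Rec r0 = table.get(r1.id());
    if (r1.id().equals(localId)) {
      // Refutation: keep our own status, move past any incarnation seen so far, spread it.
      Rec r2 = new Rec(localId, r0.status(), Math.max(r0.incarnation(), r1.incarnation()) + 1);
      table.put(localId, r2);
      spread.add(r2);
      return Event.NONE;
    }
    if (r1.status() == Status.DEAD) {
      table.remove(r1.id());
    } else {
      table.put(r1.id(), r1);
    }
    if (r1.status() == Status.SUSPECT) {
      suspicionTasks.add(r1.id()); // like computeIfAbsent: at most one task per member
    } else {
      suspicionTasks.remove(r1.id());
    }
    Event event = classify(r0, r1);
    // Records that arrived by gossip, or during the initial sync, are not re-spread.
    if (reason != Reason.MEMBERSHIP_GOSSIP && reason != Reason.INITIAL_SYNC) {
      spread.add(r1);
    }
    return event;
  }

  /** The three branches of emitMembershipEvent. */
  static Event classify(Rec r0, Rec r1) {
    if (r1.status() == Status.DEAD) return Event.REMOVED;
    if (r0 == null && r1.status() == Status.ALIVE) return Event.ADDED;
    if (r0 != null && r0.incarnation() < r1.incarnation()) return Event.UPDATED;
    return Event.NONE;
  }

  /** Assumed formula: suspicionMult * ceil(log2(clusterSize)) * pingInterval. */
  static long suspicionTimeoutMs(int suspicionMult, int clusterSize, int pingIntervalMs) {
    int ceilLog2 = 32 - Integer.numberOfLeadingZeros(clusterSize - 1);
    return (long) suspicionMult * ceilLog2 * pingIntervalMs;
  }
}
```

Walking through it: member b joins ALIVE (ADDED, spread), becomes SUSPECT at the same incarnation (no event, timeout task registered), refutes with incarnation 1 via gossip (UPDATED, task cancelled, not re-spread), and is finally declared DEAD by the timeout (REMOVED). A SUSPECT record about the local member at incarnation 3 is answered with ALIVE at incarnation 4. Under the assumed formula, suspicionMult = 5 in an 8-node cluster with a 1000 ms ping interval gives 15000 ms.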
schedulePeriodicSync
scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java
public final class MembershipProtocolImpl implements MembershipProtocol {
//......
private void schedulePeriodicSync() {
int syncInterval = config.getSyncInterval();
actionsDisposables.add(
scheduler.schedulePeriodically(
this::doSync, syncInterval, syncInterval, TimeUnit.MILLISECONDS));
}
private void doSync() {
Optional<Address> addressOptional = selectSyncAddress();
if (!addressOptional.isPresent()) {
return;
}
Address address = addressOptional.get();
Message message = prepareSyncDataMsg(SYNC, null);
LOGGER.debug("Send Sync: {} to {}", message, address);
transport
.send(address, message)
.subscribe(
null,
ex ->
LOGGER.debug(
"Failed to send Sync: {} to {}, cause: {}", message, address, ex.toString()));
}
private Optional<Address> selectSyncAddress() {
List<Address> addresses =
Stream.concat(seedMembers.stream(), otherMembers().stream().map(Member::address))
.collect(Collectors.collectingAndThen(Collectors.toSet(), ArrayList::new));
Collections.shuffle(addresses);
if (addresses.isEmpty()) {
return Optional.empty();
} else {
int i = ThreadLocalRandom.current().nextInt(addresses.size());
return Optional.of(addresses.get(i));
}
}
private Message prepareSyncDataMsg(String qualifier, String cid) {
List<MembershipRecord> membershipRecords = new ArrayList<>(membershipTable.values());
SyncData syncData = new SyncData(membershipRecords, config.getSyncGroup());
return Message.withData(syncData)
.qualifier(qualifier)
.correlationId(cid)
.sender(localMember.address())
.build();
}
//......
}
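- schedulePeriodicSync uses scheduler.schedulePeriodically to run doSync every syncInterval milliseconds and adds the task to actionsDisposables, so stop cancels it. doSync calls selectSyncAddress, which picks a random address from the deduplicated union of the seed members and the other known members; it then builds a SYNC message carrying the whole membershipTable and the syncGroup with prepareSyncDataMsg and sends it with transport.send. Because no correlationId is set, the peer's SYNC_ACK reply is processed by onMessage.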
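selectSyncAddress boils down to "pick any known address uniformly at random". The sketch below shows that with plain JDK collections; the string addresses and the injected Random are assumptions made for the example, not part of the library.

```java
import java.util.*;
import java.util.stream.*;

// Sketch only: addresses are plain "host:port" strings, and the Random is injected
// so the choice can be reproduced; neither detail comes from the library.
class SyncTargetSketch {
  /** Seeds plus the addresses of known non-local members, without duplicates. */
  static List<String> candidates(List<String> seeds, Collection<String> otherMemberAddresses) {
    return Stream.concat(seeds.stream(), otherMemberAddresses.stream())
        .distinct()
        .collect(Collectors.toList());
  }

  /** Picks one candidate uniformly at random, or nothing when the node knows no one. */
  static Optional<String> selectSyncAddress(
      List<String> seeds, Collection<String> otherMemberAddresses, Random random) {
    List<String> addresses = candidates(seeds, otherMemberAddresses);
    if (addresses.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(addresses.get(random.nextInt(addresses.size())));
  }
}
```

The original both shuffles the list and then draws a random index; either step alone already yields a uniformly random choice, so the sketch keeps only the random index. Outside Reactor, ScheduledExecutorService.scheduleAtFixedRate could play the role of scheduler.schedulePeriodically for running such a selection every syncInterval.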
Summary
- The MembershipProtocol interface defines the start, stop, listen, members, otherMembers and member methods. MembershipProtocolImpl implements it and defines the MembershipUpdateReason enum (FAILURE_DETECTOR_EVENT, MEMBERSHIP_GOSSIP, SYNC, INITIAL_SYNC, SUSPICION_TIMEOUT).
- The MembershipProtocolImpl constructor routes transport.listen() to onMessage, failureDetector.listen() to onFailureDetectorEvent and gossipProtocol.listen() to onMembershipGossip. start sends an initial SYNC to the seed members (if there are any) via transport.requestResponse, applies the first matching SYNC_ACK through onSyncAck, and then schedules doSync every syncInterval; stop disposes the subscriptions, the periodic sync and the pending suspicionTimeoutTasks futures.
- onMessage checks the sync group with checkSyncGroup, then runs onSync for SYNC and onSyncAck for SYNC_ACK. onSync runs syncMembership and replies to the sender with SYNC_ACK on success; onSyncAck runs syncMembership without replying. syncMembership calls updateMembership for each changed record in syncData's membership.
- onFailureDetectorEvent checks whether the member's status has changed: a change to ALIVE triggers a SYNC to fdEvent.member().address(), and any other change goes through updateMembership with MembershipUpdateReason.FAILURE_DETECTOR_EVENT. onMembershipGossip passes MEMBERSHIP_GOSSIP messages to updateMembership with MembershipUpdateReason.MEMBERSHIP_GOSSIP.
- updateMembership ignores records that do not override the stored one. A record about the local member is refuted by spreading the local record with a higher incarnation; otherwise DEAD records are removed from membershipTable, SUSPECT records get a suspicion timeout task (any other status cancels it), and finally emitMembershipEvent runs and the record is spread with spreadMembershipGossip (except for MEMBERSHIP_GOSSIP and INITIAL_SYNC updates).
- schedulePeriodicSync registers doSync to run every syncInterval; doSync uses selectSyncAddress to pick a random member address as the SYNC target, builds the sync message with prepareSyncDataMsg and sends it with transport.send.
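In short, the start method of MembershipProtocolImpl schedules the doSync task (run every syncInterval), which sends all membership records in a SYNC message to a randomly selected member. When onMessage receives a SYNC it runs syncMembership and replies with SYNC_ACK on success; when it receives a SYNC_ACK it also runs syncMembership. onFailureDetectorEvent and onMembershipGossip both feed into updateMembership, which updates membershipTable and, when necessary, spreads the change with spreadMembershipGossip.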