聊聊scalecube-cluster的MembershipProtocol

本文主要研究一下scalecube-cluster的MembershipProtocol

MembershipProtocol

scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocol.java

/**
 * Cluster Membership Protocol component responsible for managing information about existing members
 * of the cluster.
 */
public interface MembershipProtocol {

  /**
   * Starts running cluster membership protocol. After started it begins to receive and send cluster
   * membership messages
   */
  Mono start();

  /** Stops running cluster membership protocol and releases occupied resources. */
  void stop();

  /** Listen changes in cluster membership. */
  Flux listen();

  /**
   * Returns list of all members of the joined cluster. This will include all cluster members
   * including local member.
   *
   * @return all members in the cluster (including local one)
   */
  Collection members();

  /**
   * Returns list of all cluster members of the joined cluster excluding local member.
   *
   * @return all members in the cluster (excluding local one)
   */
  Collection otherMembers();

  /**
   * Returns local cluster member which corresponds to this cluster instance.
   *
   * @return local member
   */
  Member member();

  /**
   * Returns cluster member with given id or null if no member with such id exists at joined
   * cluster.
   *
   * @return member by id
   */
  Optional member(String id);

  /**
   * Returns cluster member by given address or null if no member with such address exists at joined
   * cluster.
   *
   * @return member by address
   */
  Optional member(Address address);
}
  • MembershipProtocol接口定义了start、stop、listen、members、otherMembers、member方法

MembershipProtocolImpl

scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java

public final class MembershipProtocolImpl implements MembershipProtocol {

  private static final Logger LOGGER = LoggerFactory.getLogger(MembershipProtocolImpl.class);

  private enum MembershipUpdateReason {
    FAILURE_DETECTOR_EVENT,
    MEMBERSHIP_GOSSIP,
    SYNC,
    INITIAL_SYNC,
    SUSPICION_TIMEOUT
  }

  // Qualifiers

  public static final String SYNC = "sc/membership/sync";
  public static final String SYNC_ACK = "sc/membership/syncAck";
  public static final String MEMBERSHIP_GOSSIP = "sc/membership/gossip";

  private final Member localMember;

  // Injected

  private final Transport transport;
  private final MembershipConfig config;
  private final List
seedMembers; private final FailureDetector failureDetector; private final GossipProtocol gossipProtocol; private final MetadataStore metadataStore; private final CorrelationIdGenerator cidGenerator; // State private final Map membershipTable = new HashMap<>(); private final Map members = new HashMap<>(); // Subject private final FluxProcessor subject = DirectProcessor.create().serialize(); private final FluxSink sink = subject.sink(); // Disposables private final Disposable.Composite actionsDisposables = Disposables.composite(); // Scheduled private final Scheduler scheduler; private final Map suspicionTimeoutTasks = new HashMap<>(); /** * Creates new instantiates of cluster membership protocol with given transport and config. * * @param localMember local cluster member * @param transport cluster transport * @param failureDetector failure detector * @param gossipProtocol gossip protocol * @param metadataStore metadata store * @param config membership config parameters * @param scheduler scheduler * @param cidGenerator correlation id generator */ public MembershipProtocolImpl( Member localMember, Transport transport, FailureDetector failureDetector, GossipProtocol gossipProtocol, MetadataStore metadataStore, MembershipConfig config, Scheduler scheduler, CorrelationIdGenerator cidGenerator) { this.transport = Objects.requireNonNull(transport); this.config = Objects.requireNonNull(config); this.failureDetector = Objects.requireNonNull(failureDetector); this.gossipProtocol = Objects.requireNonNull(gossipProtocol); this.metadataStore = Objects.requireNonNull(metadataStore); this.localMember = Objects.requireNonNull(localMember); this.scheduler = Objects.requireNonNull(scheduler); this.cidGenerator = Objects.requireNonNull(cidGenerator); // Prepare seeds seedMembers = cleanUpSeedMembers(config.getSeedMembers()); // Init membership table with local member record membershipTable.put(localMember.id(), new MembershipRecord(localMember, ALIVE, 0)); // fill in the table of members with local member members.put(localMember.id(), localMember); actionsDisposables.addAll( Arrays.asList( // Listen to incoming SYNC and SYNC ACK requests from other members transport .listen() // .publishOn(scheduler) .subscribe(this::onMessage, this::onError), // Listen to events from failure detector failureDetector .listen() .publishOn(scheduler) .subscribe(this::onFailureDetectorEvent, this::onError), // Listen to membership gossips gossipProtocol .listen() .publishOn(scheduler) .subscribe(this::onMembershipGossip, this::onError))); } @Override public Flux listen() { return subject.onBackpressureBuffer(); } @Override public Mono start() { // Make initial sync with all seed members return Mono.create( sink -> { // In case no members at the moment just schedule periodic sync if (seedMembers.isEmpty()) { schedulePeriodicSync(); sink.success(); return; } // If seed addresses are specified in config - send initial sync to those nodes LOGGER.debug("Making initial Sync to all seed members: {}", seedMembers); //noinspection unchecked Mono[] syncs = seedMembers .stream() .map( address -> { String cid = cidGenerator.nextCid(); return transport .requestResponse(prepareSyncDataMsg(SYNC, cid), address) .filter(this::checkSyncGroup); }) .toArray(Mono[]::new); // Process initial SyncAck Flux.mergeDelayError(syncs.length, syncs) .take(1) .timeout(Duration.ofMillis(config.getSyncTimeout()), scheduler) .publishOn(scheduler) .flatMap(message -> onSyncAck(message, true)) .doFinally( s -> { schedulePeriodicSync(); sink.success(); }) .subscribe( null, ex -> LOGGER.info("Exception on initial SyncAck, cause: {}", ex.toString())); }); } @Override public void stop() { // Stop accepting requests, events and sending sync actionsDisposables.dispose(); // Cancel remove members tasks for (String memberId : suspicionTimeoutTasks.keySet()) { Disposable future = suspicionTimeoutTasks.get(memberId); if (future != null && !future.isDisposed()) { future.dispose(); } } suspicionTimeoutTasks.clear(); // Stop publishing events sink.complete(); } @Override public Collection members() { return new ArrayList<>(members.values()); } @Override public Collection otherMembers() { return new ArrayList<>(members.values()) .stream() .filter(member -> !member.equals(localMember)) .collect(Collectors.toList()); } @Override public Member member() { return localMember; } @Override public Optional member(String id) { return Optional.ofNullable(members.get(id)); } @Override public Optional member(Address address) { return new ArrayList<>(members.values()) .stream() .filter(member -> member.address().equals(address)) .findFirst(); } //...... }
  • MembershipProtocolImpl实现了MembershipProtocol接口;它定义了MembershipUpdateReason枚举(FAILURE_DETECTOR_EVENT、MEMBERSHIP_GOSSIP、SYNC、INITIAL_SYNC、SUSPICION_TIMEOUT)
  • MembershipProtocolImpl的构造器监听了transport.listen()触发onMessage方法;监听了failureDetector.listen()触发onFailureDetectorEvent方法;监听了gossipProtocol.listen()触发onMembershipGossip方法
  • MembershipProtocolImpl的start方法在seedMembers.isEmpty()的时候会执行schedulePeriodicSync方法即每隔syncInterval执行doSync方法;当seedMembers不为空时则遍历seedMembers通过transport.requestResponse发送SYNC,执行成功则触发onSyncAck;stop方法则挨个销毁suspicionTimeoutTasks的future

onMessage

scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java

public final class MembershipProtocolImpl implements MembershipProtocol {
  //......

  private void onMessage(Message message) {
    if (checkSyncGroup(message)) {
      if (SYNC.equals(message.qualifier())) {
        onSync(message).subscribe(null, this::onError);
      }
      if (SYNC_ACK.equals(message.qualifier())) {
        if (message.correlationId() == null) { // filter out initial sync
          onSyncAck(message, false).subscribe(null, this::onError);
        }
      }
    }
  }

  private boolean checkSyncGroup(Message message) {
    if (message.data() instanceof SyncData) {
      SyncData syncData = message.data();
      return config.getSyncGroup().equals(syncData.getSyncGroup());
    }
    return false;
  }

  /** Merges incoming SYNC data, merges it and sending back merged data with SYNC_ACK. */
  private Mono onSync(Message syncMsg) {
    return Mono.defer(
        () -> {
          LOGGER.debug("Received Sync: {}", syncMsg);
          return syncMembership(syncMsg.data(), false)
              .doOnSuccess(
                  avoid -> {
                    Message message = prepareSyncDataMsg(SYNC_ACK, syncMsg.correlationId());
                    Address address = syncMsg.sender();
                    transport
                        .send(address, message)
                        .subscribe(
                            null,
                            ex ->
                                LOGGER.debug(
                                    "Failed to send SyncAck: {} to {}, cause: {}",
                                    message,
                                    address,
                                    ex.toString()));
                  });
        });
  }

  private Mono onSyncAck(Message syncAckMsg, boolean onStart) {
    return Mono.defer(
        () -> {
          LOGGER.debug("Received SyncAck: {}", syncAckMsg);
          return syncMembership(syncAckMsg.data(), onStart);
        });
  }

  private Mono syncMembership(SyncData syncData, boolean onStart) {
    return Mono.defer(
        () -> {
          MembershipUpdateReason reason =
              onStart ? MembershipUpdateReason.INITIAL_SYNC : MembershipUpdateReason.SYNC;
          return Mono.whenDelayError(
              syncData
                  .getMembership()
                  .stream()
                  .filter(r1 -> !r1.equals(membershipTable.get(r1.id())))
                  .map(r1 -> updateMembership(r1, reason))
                  .toArray(Mono[]::new));
        });
  }

  //......

}
  • onMessage方法首先通过checkSyncGroup检查一下是不是该syncGroup的消息;之后根据message.qualifier()是SYNC则执行onSync,是SYNC_ACK则执行onSyncAck
  • onSync方法则执行syncMembership,成功时向sender返回SYNC_ACK信息;onSyncAck方法也是调用syncMembership,只不过没有再向sender返回信息
  • syncMembership方法则根据syncData的membership来挨个执行updateMembership方法

onFailureDetectorEvent

scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java

public final class MembershipProtocolImpl implements MembershipProtocol {
  //......

  /** Merges FD updates and processes them. */
  private void onFailureDetectorEvent(FailureDetectorEvent fdEvent) {
    MembershipRecord r0 = membershipTable.get(fdEvent.member().id());
    if (r0 == null) { // member already removed
      return;
    }
    if (r0.status() == fdEvent.status()) { // status not changed
      return;
    }
    LOGGER.debug("Received status change on failure detector event: {}", fdEvent);
    if (fdEvent.status() == ALIVE) {
      // TODO: Consider to make more elegant solution
      // Alive won't override SUSPECT so issue instead extra sync with member to force it spread
      // alive with inc + 1
      Message syncMsg = prepareSyncDataMsg(SYNC, null);
      Address address = fdEvent.member().address();
      transport
          .send(address, syncMsg)
          .subscribe(
              null,
              ex ->
                  LOGGER.debug(
                      "Failed to send {} to {}, cause: {}", syncMsg, address, ex.toString()));
    } else {
      MembershipRecord record =
          new MembershipRecord(r0.member(), fdEvent.status(), r0.incarnation());
      updateMembership(record, MembershipUpdateReason.FAILURE_DETECTOR_EVENT)
          .subscribe(null, this::onError);
    }
  }

  //......

}
  • onFailureDetectorEvent方法根据FailureDetectorEvent判断该MembershipRecord的状态是否有变化,如果变为ALIVE则往fdEvent.member().address()发送SYNC信息;否则使用MembershipUpdateReason.FAILURE_DETECTOR_EVENT来updateMembership

onMembershipGossip

scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java

public final class MembershipProtocolImpl implements MembershipProtocol {
  //......

  /** Merges received membership gossip (not spreading gossip further). */
  private void onMembershipGossip(Message message) {
    if (MEMBERSHIP_GOSSIP.equals(message.qualifier())) {
      MembershipRecord record = message.data();
      LOGGER.debug("Received membership gossip: {}", record);
      updateMembership(record, MembershipUpdateReason.MEMBERSHIP_GOSSIP)
          .subscribe(null, this::onError);
    }
  }

  //......

}
  • onMembershipGossip方法则针对message.qualifier()为MEMBERSHIP_GOSSIP的消息使用MembershipUpdateReason.MEMBERSHIP_GOSSIP来updateMembership

updateMembership

scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java

public final class MembershipProtocolImpl implements MembershipProtocol {
  //......

  /**
   * Try to update membership table with the given record.
   *
   * @param r1 new membership record which compares with existing r0 record
   * @param reason indicating the reason for updating membership table
   */
  private Mono updateMembership(MembershipRecord r1, MembershipUpdateReason reason) {
    return Mono.defer(
        () -> {
          Objects.requireNonNull(r1, "Membership record can't be null");
          // Get current record
          MembershipRecord r0 = membershipTable.get(r1.id());

          // Check if new record r1 overrides existing membership record r0
          if (!r1.isOverrides(r0)) {
            return Mono.empty();
          }

          // If received updated for local member then increase incarnation and spread Alive gossip
          if (r1.member().id().equals(localMember.id())) {
            int currentIncarnation = Math.max(r0.incarnation(), r1.incarnation());
            MembershipRecord r2 =
                new MembershipRecord(localMember, r0.status(), currentIncarnation + 1);

            membershipTable.put(localMember.id(), r2);

            LOGGER.debug(
                "Local membership record r0: {}, but received r1: {}, "
                    + "spread with increased incarnation r2: {}",
                r0,
                r1,
                r2);

            spreadMembershipGossip(r2)
                .subscribe(
                    null,
                    ex -> {
                      // on-op
                    });
            return Mono.empty();
          }

          // Update membership
          if (r1.isDead()) {
            membershipTable.remove(r1.id());
          } else {
            membershipTable.put(r1.id(), r1);
          }

          // Schedule/cancel suspicion timeout task
          if (r1.isSuspect()) {
            scheduleSuspicionTimeoutTask(r1);
          } else {
            cancelSuspicionTimeoutTask(r1.id());
          }

          // Emit membership and regardless of result spread gossip
          return emitMembershipEvent(r0, r1)
              .doFinally(
                  s -> {
                    // Spread gossip (unless already gossiped)
                    if (reason != MembershipUpdateReason.MEMBERSHIP_GOSSIP
                        && reason != MembershipUpdateReason.INITIAL_SYNC) {
                      spreadMembershipGossip(r1)
                          .subscribe(
                              null,
                              ex -> {
                                // no-op
                              });
                    }
                  });
        });
  }

  private Mono spreadMembershipGossip(MembershipRecord record) {
    return Mono.defer(
        () -> {
          Message msg = Message.withData(record).qualifier(MEMBERSHIP_GOSSIP).build();
          LOGGER.debug("Spead membreship: {} with gossip", msg);
          return gossipProtocol
              .spread(msg)
              .doOnError(
                  ex ->
                      LOGGER.debug(
                          "Failed to spread membership: {} with gossip, cause: {}",
                          msg,
                          ex.toString()))
              .then();
        });
  }

  private void scheduleSuspicionTimeoutTask(MembershipRecord record) {
    long suspicionTimeout =
        ClusterMath.suspicionTimeout(
            config.getSuspicionMult(), membershipTable.size(), config.getPingInterval());
    suspicionTimeoutTasks.computeIfAbsent(
        record.id(),
        id ->
            scheduler.schedule(
                () -> onSuspicionTimeout(id), suspicionTimeout, TimeUnit.MILLISECONDS));
  }

  private void onSuspicionTimeout(String memberId) {
    suspicionTimeoutTasks.remove(memberId);
    MembershipRecord record = membershipTable.get(memberId);
    if (record != null) {
      LOGGER.debug("Declare SUSPECTED member {} as DEAD by timeout", record);
      MembershipRecord deadRecord =
          new MembershipRecord(record.member(), DEAD, record.incarnation());
      updateMembership(deadRecord, MembershipUpdateReason.SUSPICION_TIMEOUT)
          .subscribe(null, this::onError);
    }
  }

  private void cancelSuspicionTimeoutTask(String memberId) {
    Disposable future = suspicionTimeoutTasks.remove(memberId);
    if (future != null && !future.isDisposed()) {
      future.dispose();
    }
  }

  private Mono emitMembershipEvent(MembershipRecord r0, MembershipRecord r1) {
    return Mono.defer(
        () -> {
          final Member member = r1.member();

          if (r1.isDead()) {
            members.remove(member.id());
            // removed
            return Mono.fromRunnable(
                () -> {
                  Map metadata = metadataStore.removeMetadata(member);
                  sink.next(MembershipEvent.createRemoved(member, metadata));
                });
          }

          if (r0 == null && r1.isAlive()) {
            members.put(member.id(), member);
            // added
            return metadataStore
                .fetchMetadata(member)
                .doOnSuccess(
                    metadata -> {
                      metadataStore.updateMetadata(member, metadata);
                      sink.next(MembershipEvent.createAdded(member, metadata));
                    })
                .onErrorResume(TimeoutException.class, e -> Mono.empty())
                .then();
          }

          if (r0 != null && r0.incarnation() < r1.incarnation()) {
            // updated
            return metadataStore
                .fetchMetadata(member)
                .doOnSuccess(
                    metadata1 -> {
                      Map metadata0 =
                          metadataStore.updateMetadata(member, metadata1);
                      sink.next(MembershipEvent.createUpdated(member, metadata0, metadata1));
                    })
                .onErrorResume(TimeoutException.class, e -> Mono.empty())
                .then();
          }

          return Mono.empty();
        });
  }

  //......

}
  • updateMembership会对比传入的MembershipRecord与本地的localMember,如果是需要更新localMember则执行spreadMembershipGossip,之后根据MembershipRecord的状态做不同处理,比如isDead则从membershipTable移除,比如isSuspect则执行scheduleSuspicionTimeoutTask,否则执行cancelSuspicionTimeoutTask,最后执行emitMembershipEvent及spreadMembershipGossip
  • scheduleSuspicionTimeoutTask方法计算suspicionTimeout然后注册一个SuspicionTimeout的延时任务如果suspicionTimeoutTasks没有该record.id()的task的话;onSuspicionTimeout首先将该task从suspicionTimeoutTasks移除,然后使用MembershipUpdateReason.SUSPICION_TIMEOUT来updateMembership;cancelSuspicionTimeoutTask方法也是将该task从suspicionTimeoutTasks移除,并dispose该future
  • emitMembershipEvent方法这里主要是更新member在metadataStore的Metadata,如果是isDead则执行metadataStore.removeMetadata(member),其他的则看情况执行metadataStore.updateMetadata(member, metadata)

schedulePeriodicSync

scalecube-cluster-2.2.5/cluster/src/main/java/io/scalecube/cluster/membership/MembershipProtocolImpl.java

public final class MembershipProtocolImpl implements MembershipProtocol {
  //......

  private void schedulePeriodicSync() {
    int syncInterval = config.getSyncInterval();
    actionsDisposables.add(
        scheduler.schedulePeriodically(
            this::doSync, syncInterval, syncInterval, TimeUnit.MILLISECONDS));
  }

  private void doSync() {
    Optional
addressOptional = selectSyncAddress(); if (!addressOptional.isPresent()) { return; } Address address = addressOptional.get(); Message message = prepareSyncDataMsg(SYNC, null); LOGGER.debug("Send Sync: {} to {}", message, address); transport .send(address, message) .subscribe( null, ex -> LOGGER.debug( "Failed to send Sync: {} to {}, cause: {}", message, address, ex.toString())); } private Optional
selectSyncAddress() { List
addresses = Stream.concat(seedMembers.stream(), otherMembers().stream().map(Member::address)) .collect(Collectors.collectingAndThen(Collectors.toSet(), ArrayList::new)); Collections.shuffle(addresses); if (addresses.isEmpty()) { return Optional.empty(); } else { int i = ThreadLocalRandom.current().nextInt(addresses.size()); return Optional.of(addresses.get(i)); } } private Message prepareSyncDataMsg(String qualifier, String cid) { List membershipRecords = new ArrayList<>(membershipTable.values()); SyncData syncData = new SyncData(membershipRecords, config.getSyncGroup()); return Message.withData(syncData) .qualifier(qualifier) .correlationId(cid) .sender(localMember.address()) .build(); } //...... }
  • schedulePeriodically会注册doSync任务每隔syncInterval执行;doSync方法首先调用selectSyncAddress随机选择一个member来作为发送SYNC的目标,之后通过prepareSyncDataMsg构造sync消息,然后通过transport.send发送

小结

  • MembershipProtocol接口定义了start、stop、listen、members、otherMembers、member方法;MembershipProtocolImpl实现了MembershipProtocol接口;它定义了MembershipUpdateReason枚举(FAILURE_DETECTOR_EVENT、MEMBERSHIP_GOSSIP、SYNC、INITIAL_SYNC、SUSPICION_TIMEOUT)
  • MembershipProtocolImpl的构造器监听了transport.listen()触发onMessage方法;监听了failureDetector.listen()触发onFailureDetectorEvent方法;监听了gossipProtocol.listen()触发onMembershipGossip方法;MembershipProtocolImpl的start方法在seedMembers.isEmpty()的时候会执行schedulePeriodicSync方法即每隔syncInterval执行doSync方法;当seedMembers不为空时则遍历seedMembers通过transport.requestResponse发送SYNC,执行成功则触发onSyncAck;stop方法则挨个销毁suspicionTimeoutTasks的future
  • onMessage方法首先通过checkSyncGroup检查一下是不是该syncGroup的消息;之后根据message.qualifier()是SYNC则执行onSync,是SYNC_ACK则执行onSyncAck;onSync方法则执行syncMembership,成功时向sender返回SYNC_ACK信息;onSyncAck方法也是调用syncMembership,只不过没有再向sender返回信息;syncMembership方法则根据syncData的membership来挨个执行updateMembership方法
  • onFailureDetectorEvent方法根据FailureDetectorEvent判断该MembershipRecord的状态是否有变化,如果变为ALIVE则往fdEvent.member().address()发送SYNC信息;否则使用MembershipUpdateReason.FAILURE_DETECTOR_EVENT来updateMembership;onMembershipGossip方法则针对message.qualifier()为MEMBERSHIP_GOSSIP的消息使用MembershipUpdateReason.MEMBERSHIP_GOSSIP来updateMembership
  • updateMembership会对比传入的MembershipRecord与本地的localMember,如果是需要更新localMember则执行spreadMembershipGossip,之后根据MembershipRecord的状态做不同处理,比如isDead则从membershipTable移除,比如isSuspect则执行scheduleSuspicionTimeoutTask,否则执行cancelSuspicionTimeoutTask,最后执行emitMembershipEvent及spreadMembershipGossip
  • schedulePeriodically会注册doSync任务每隔syncInterval执行;doSync方法首先调用selectSyncAddress随机选择一个member来作为发送SYNC的目标,之后通过prepareSyncDataMsg构造sync消息,然后通过transport.send发送
MembershipProtocolImpl的start方法会注册doSync任务( 每隔syncInterval执行),该任务会发送SYNC消息给随机选择出来的member,来sync全量的membershipRecords;onMessage方法接收到SYNC消息时执行syncMembership并在成功时返回SYNC_ACK,接收到SYNC_ACK时也是执行syncMembership;onFailureDetectorEvent及onMembershipGossip方法都会触发updateMembership方法来更新membershipTable必要是进行spreadMembershipGossip

doc

  • MembershipProtocol

你可能感兴趣的:(聊聊scalecube-cluster的MembershipProtocol)