Nacos Source Code Study: Raft

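I. The Raft Algorithm
Raft reaches consensus through an elected leader. A server in a Raft cluster is either a leader or a follower, and during an election (when the leader is unavailable) it can also be a candidate. The leader is responsible for replicating the log to the followers. It tells the followers it is alive by sending heartbeat messages at regular intervals. Each follower has a timeout (typically between 150 and 300 ms) within which it expects a heartbeat from the leader. The timeout is reset whenever a heartbeat arrives. If no heartbeat arrives in time, the follower changes its state to candidate and starts a leader election.
Note: consensus algorithms such as Raft come up throughout distributed middleware: Nacos, Kafka, RocketMQ, Flink, Pulsar and so on (some of these use Raft directly, others rely on a related consensus or coordination service).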
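The follower timeout described above can be sketched in a few lines. This is a minimal illustration under assumptions, not Nacos code: the class name `ElectionTimerSketch`, its fields, and the `tick` method are all hypothetical, and the 150 to 300 ms range is simply the typical range quoted above.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical follower-side timer; not taken from the Nacos code base. */
final class ElectionTimerSketch {
    enum State { FOLLOWER, CANDIDATE, LEADER }

    static final long MIN_TIMEOUT_MS = 150;
    static final long MAX_TIMEOUT_MS = 300;

    State state = State.FOLLOWER;
    long remainingMs = randomTimeout();

    static long randomTimeout() {
        // nextLong(origin, bound) excludes the bound, hence the + 1
        return ThreadLocalRandom.current().nextLong(MIN_TIMEOUT_MS, MAX_TIMEOUT_MS + 1);
    }

    /** A heartbeat from the leader restarts the countdown. */
    void onHeartbeat() {
        state = State.FOLLOWER;
        remainingMs = randomTimeout();
    }

    /** Called periodically with the time elapsed since the previous call. */
    void tick(long elapsedMs) {
        if (state == State.LEADER) {
            return; // a leader does not wait for heartbeats
        }
        remainingMs -= elapsedMs;
        if (remainingMs <= 0) {
            state = State.CANDIDATE; // no heartbeat in time: start an election
            remainingMs = randomTimeout();
        }
    }
}
```

The timeout is randomized so that followers rarely time out at the same moment, which keeps most elections down to a single candidate.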

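1. Every machine that registers with Nacos sends a registration request (send registRequest) to the configured server address.
2. The server that receives the request forwards it to the leader node.
3. On receipt, the leader stores the request in a local in-memory Map, localAllInfoMap.
4. The leader periodically walks localAllInfoMap and sends a heartbeat to each machine recorded there.
5. If a machine does not answer the heartbeat, the leader retries a few times; if there is still no answer, it removes that machine from the Map.
6. Each machine also sends beatRequest to the Nacos server, so once the network recovers it is registered in Nacos again.
7. In cluster mode the servers send requests to one another and use the Raft algorithm to determine the Leader node.
8. The leader periodically sends beatLeaderRequest to the follower nodes. This also helps performance, because only the leader has to send requests to the other servers.
9. If the heartbeat exchange with a server fails (a follower hears nothing from the leader before its timeout expires), an election is triggered.
10. Followers also send requests to the Leader at intervals, but less often than the Leader sends heartbeats. Both choices are there to improve performance.
11. Details of the Raft algorithm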
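Before the details, here is a minimal sketch of steps 3 to 5 of the list above: keep registered machines in a map, count consecutive missed heartbeats, and drop a machine once the retries are used up. Only the name `localAllInfoMap` comes from the list; the class `HeartbeatRegistrySketch`, the retry budget of 3, and the `ping` callback are assumptions made for illustration and are not real Nacos classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical leader-side registry; names follow the list above, not real Nacos classes. */
final class HeartbeatRegistrySketch {
    static final int MAX_RETRIES = 3; // assumed retry budget

    // machine address -> number of consecutive failed heartbeats
    final Map<String, Integer> localAllInfoMap = new ConcurrentHashMap<>();

    void register(String address) {
        localAllInfoMap.put(address, 0);
    }

    /**
     * One heartbeat round. ping returns true when the machine answered.
     * A machine is removed once its failures exceed MAX_RETRIES,
     * i.e. after the first miss plus MAX_RETRIES failed retries.
     */
    void heartbeatRound(Predicate<String> ping) {
        for (String address : localAllInfoMap.keySet()) {
            if (ping.test(address)) {
                localAllInfoMap.put(address, 0); // answered: reset the counter
            } else {
                int failures = localAllInfoMap.merge(address, 1, Integer::sum);
                if (failures > MAX_RETRIES) {
                    localAllInfoMap.remove(address);
                }
            }
        }
    }
}
```

A machine that was removed simply calls `register` again when its own beatRequest gets through, which corresponds to step 6.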
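RaftCommands.beat() (RaftController.beat() in the version listed below) handles /v1/ns/raft/beat requests.
The HTTP endpoint that receives heartbeat packets: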

@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/raft")
public class RaftController {
......

@NeedAuth
@RequestMapping(value = "/beat", method = RequestMethod.POST)
public JSONObject beat(HttpServletRequest request, HttpServletResponse response) throws Exception {
    String entity = new String(IoUtils.tryDecompress(request.getInputStream()), "UTF-8");
    String value = URLDecoder.decode(entity, "UTF-8");
    value = URLDecoder.decode(value, "UTF-8");

    // Parse the heartbeat packet
    JSONObject json = JSON.parseObject(value);
    JSONObject beat = JSON.parseObject(json.getString("beat"));

    // Process the heartbeat and return this node's own information as the response
    RaftPeer peer = raftCore.receivedBeat(beat);
    return JSON.parseObject(JSON.toJSONString(peer));
}

......

}
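The handler above decompresses the request body and then URL-decodes it twice, presumably because the sender encodes it twice; that is an inference from the handler, not something shown in this post. The sketch below reproduces that round trip with the JDK only. `BeatPayloadSketch` and its `encode` method are hypothetical, and gzip stands in for whatever `IoUtils.tryDecompress` actually accepts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; it only mirrors the decoding order used in beat(). */
final class BeatPayloadSketch {
    /** Sender side (assumed): URL-encode twice, then gzip. */
    static byte[] encode(String json) throws IOException {
        String once = URLEncoder.encode(json, "UTF-8");
        String twice = URLEncoder.encode(once, "UTF-8");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(twice.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Receiver side: gunzip, then URL-decode twice, in the same order as beat(). */
    static String decode(byte[] body) throws IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(body))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                plain.write(buffer, 0, read);
            }
        }
        String entity = new String(plain.toByteArray(), StandardCharsets.UTF_8);
        String value = URLDecoder.decode(entity, "UTF-8");
        return URLDecoder.decode(value, "UTF-8");
    }
}
```

Decoding only once would leave sequences such as `%7B` in the text, and the JSON parser would then fail on it.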

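HeartBeat.receivedBeat() (RaftCore.receivedBeat() in the version listed below) processes the heartbeat packet:

1. If the node receiving the heartbeat is not in the Follower role, switch it to Follower and set its voteFor to the Leader's ip.
2. Reset the local node's heartbeat timeout and election timeout.
3. Call PeerSet.makeLeader() so that this node updates its Leader. (In other words, the Leader uses heartbeats to tell the other nodes to update their Leader.)
4. Check the Datums:
Iterate over the datums in the request. If the Follower does not have a datumKey, or its local timestamp is older, collect that datumKey.
For every 50 datumKeys collected (the final batch may hold fewer), send a request to the Leader's /v1/ns/raft/get path with those datumKeys as the parameter and fetch the corresponding up-to-date Datum objects.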
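The collection rule in step 4 can be sketched on its own, without the HTTP call. This is a simplified illustration: `StaleKeyBatchSketch` and `staleKeyBatches` are hypothetical names, and plain maps of key to timestamp stand in for the Datum objects and the heartbeat JSON.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper showing only the "collect stale keys in batches of 50" rule. */
final class StaleKeyBatchSketch {
    static final int BATCH_SIZE = 50;

    /**
     * @param local datumKey -> timestamp held by this follower
     * @param beat  datumKey -> timestamp announced in the leader's heartbeat
     * @return batches of at most BATCH_SIZE keys that must be fetched from the leader
     */
    static List<List<String>> staleKeyBatches(Map<String, Long> local, Map<String, Long> beat) {
        List<List<String>> batches = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        for (Map.Entry<String, Long> entry : beat.entrySet()) {
            Long localTimestamp = local.get(entry.getKey());
            boolean upToDate = localTimestamp != null && localTimestamp >= entry.getValue();
            if (upToDate) {
                continue; // local copy is at least as new: nothing to fetch
            }
            batch.add(entry.getKey());
            if (batch.size() == BATCH_SIZE) {
                batches.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            batches.add(batch); // last, partially filled batch
        }
        return batches;
    }
}
```

Each returned batch would then become one request to /v1/ns/raft/get, with the keys joined by commas.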

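Iterate over these Datum objects. What follows is similar to what RaftCore.onPublish() does:
1. Call RaftStore#write to serialize the Datum to JSON and write it to the cacheFile.
2. Put the Datum into RaftCore's datums collection, keyed by the datum's key.
3. Reset the local node's election timeout.
4. Update the local node's term.
5. Persist the local node's term to the properties file.
6. Call notifier.addTask(datum, Notifier.ApplyAction.CHANGE); to notify the corresponding RaftListener.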
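A stripped-down sketch of steps 1, 2 and 6 follows, together with the timestamp check that guards them. Everything here is a stand-in chosen for illustration: `DatumApplySketch`, its nested `Record` class, the `persisted` list that plays the role of the cache file, and the `Consumer` that plays the role of the listener are all hypothetical. Steps 3 to 5 (timeout and term handling) are left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

/** Hypothetical, in-memory imitation of "apply a fetched datum if it is newer". */
final class DatumApplySketch {
    /** Simplified stand-in for a Datum: key, value and a version timestamp. */
    static final class Record {
        final String key;
        final String value;
        final long timestamp;

        Record(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Consumer<String> listener;
    final Map<String, Record> datums = new ConcurrentHashMap<>();
    final List<String> persisted = new ArrayList<>(); // stands in for the cache file

    DatumApplySketch(Consumer<String> listener) {
        this.listener = listener;
    }

    /** Returns true if the incoming record replaced the local one. */
    boolean apply(Record incoming) {
        lock.lock();
        try {
            Record old = datums.get(incoming.key);
            if (old != null && incoming.timestamp <= old.timestamp) {
                return false; // local copy is at least as new: ignore
            }
            persisted.add(incoming.key);        // step 1: write to disk (simulated)
            datums.put(incoming.key, incoming); // step 2: update the in-memory map
            listener.accept(incoming.key);      // step 6: notify the listener
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

Persisting before updating the map means a failed write leaves the in-memory state untouched.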

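RaftCore.deleteDatum(String key) removes stale Datums:
Remove the Datum for key from the datums collection;
RaftStore.delete() deletes the file for this Datum on disk;
notifier.addTask(deleted, Notifier.ApplyAction.DELETE) notifies the corresponding RaftListener of the DELETE event.

The local node's RaftPeer is returned as the HTTP response.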

@Component
public class RaftCore {
......

public RaftPeer receivedBeat(JSONObject beat) throws Exception {
    final RaftPeer local = peers.local();
    // Parse the information of the node that sent the heartbeat
    final RaftPeer remote = new RaftPeer();
    remote.ip = beat.getJSONObject("peer").getString("ip");
    remote.state = RaftPeer.State.valueOf(beat.getJSONObject("peer").getString("state"));
    remote.term.set(beat.getJSONObject("peer").getLongValue("term"));
    remote.heartbeatDueMs = beat.getJSONObject("peer").getLongValue("heartbeatDueMs");
    remote.leaderDueMs = beat.getJSONObject("peer").getLongValue("leaderDueMs");
    remote.voteFor = beat.getJSONObject("peer").getString("voteFor");

    // If the heartbeat was not sent by a leader node, throw an exception
    if (remote.state != RaftPeer.State.LEADER) {
        Loggers.RAFT.info("[RAFT] invalid state from master, state: {}, remote peer: {}",
            remote.state, JSON.toJSONString(remote));
        throw new IllegalArgumentException("invalid state from master, state: " + remote.state);
    }

    // If the local term is greater than the heartbeat's term, reject the heartbeat
    if (local.term.get() > remote.term.get()) {
        Loggers.RAFT.info("[RAFT] out of date beat, beat-from-term: {}, beat-to-term: {}, remote peer: {}, and leaderDueMs: {}"
            , remote.term.get(), local.term.get(), JSON.toJSONString(remote), local.leaderDueMs);
        throw new IllegalArgumentException("out of date beat, beat-from-term: " + remote.term.get()
            + ", beat-to-term: " + local.term.get());
    }

    // If the current node is not a follower, make it a follower
    if (local.state != RaftPeer.State.FOLLOWER) {
        Loggers.RAFT.info("[RAFT] make remote as leader, remote peer: {}", JSON.toJSONString(remote));
        // mk follower
        local.state = RaftPeer.State.FOLLOWER;
        local.voteFor = remote.ip;
    }

    final JSONArray beatDatums = beat.getJSONArray("datums");
    // Reset the election timeout (leaderDueMs) and the heartbeat timeout (heartbeatDueMs)
    local.resetLeaderDue();
    local.resetHeartbeatDue();

    // Update leader info: set remote as the new leader and refresh the previous leader's peer info
    peers.makeLeader(remote);

    // Put every key held by the current node into a map, all with value 0
    Map<String, Integer> receivedKeysMap = new HashMap<String, Integer>(datums.size());
    for (Map.Entry<String, Datum> entry : datums.entrySet()) {
        receivedKeysMap.put(entry.getKey(), 0);
    }

    // Check the datum list received in the heartbeat
    List<String> batch = new ArrayList<String>();
    if (!switchDomain.isSendBeatOnly()) {
        int processedCount = 0;
        Loggers.RAFT.info("[RAFT] received beat with {} keys, RaftCore.datums' size is {}, remote server: {}, term: {}, local term: {}",
            beatDatums.size(), datums.size(), remote.ip, remote.term, local.term);
        for (Object object : beatDatums) {
            processedCount = processedCount + 1;

            JSONObject entry = (JSONObject) object;
            String key = entry.getString("key");
            final String datumKey;
            // Build the datumKey (restore the prefix, which was stripped from the key when it was sent)
            if (KeyBuilder.matchServiceMetaKey(key)) {
                datumKey = KeyBuilder.detailServiceMetaKey(key);
            } else if (KeyBuilder.matchInstanceListKey(key)) {
                datumKey = KeyBuilder.detailInstanceListkey(key);
            } else {
                // ignore corrupted key:
                continue;
            }

            // Version (timestamp) of the received key
            long timestamp = entry.getLong("timestamp");

            // Mark the received key as 1 in the map of local keys
            receivedKeysMap.put(datumKey, 1);

            try {
                // The key exists locally, the local version is not older than the received one, and entries remain: skip it
                if (datums.containsKey(datumKey) && datums.get(datumKey).timestamp.get() >= timestamp && processedCount < beatDatums.size()) {
                    continue;
                }

                // The key is missing locally or the local version is older: add it to batch so the data can be fetched next
                if (!(datums.containsKey(datumKey) && datums.get(datumKey).timestamp.get() >= timestamp)) {
                    batch.add(datumKey);
                }

                // Only fetch once batch holds 50 keys or all entries have been processed
                if (batch.size() < 50 && processedCount < beatDatums.size()) {
                    continue;
                }

                String keys = StringUtils.join(batch, ",");

                if (batch.size() <= 0) {
                    continue;
                }

                Loggers.RAFT.info("get datums from leader: {}, batch size is {}, processedCount is {}, datums' size is {}, RaftCore.datums' size is {}"
                    , getLeader().ip, batch.size(), processedCount, beatDatums.size(), datums.size());

                // Fetch the data for these keys
                // update datum entry
                String url = buildURL(remote.ip, API_GET) + "?keys=" + URLEncoder.encode(keys, "UTF-8");
                HttpClient.asyncHttpGet(url, null, null, new AsyncCompletionHandler<Integer>() {
                    @Override
                    public Integer onCompleted(Response response) throws Exception {
                        if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
                            return 1;
                        }

                        List<Datum> datumList = JSON.parseObject(response.getResponseBody(), new TypeReference<List<Datum>>() {
                        });

                        // Update local data
                        for (Datum datum : datumList) {
                            OPERATE_LOCK.lock();
                            try {
                                Datum oldDatum = getDatum(datum.key);

                                if (oldDatum != null && datum.timestamp.get() <= oldDatum.timestamp.get()) {
                                    Loggers.RAFT.info("[NACOS-RAFT] timestamp is smaller than that of mine, key: {}, remote: {}, local: {}",
                                        datum.key, datum.timestamp, oldDatum.timestamp);
                                    continue;
                                }

                                raftStore.write(datum);

                                if (KeyBuilder.matchServiceMetaKey(datum.key)) {
                                    Datum<Service> serviceDatum = new Datum<>();
                                    serviceDatum.key = datum.key;
                                    serviceDatum.timestamp.set(datum.timestamp.get());
                                    serviceDatum.value = JSON.parseObject(JSON.toJSONString(datum.value), Service.class);
                                    datum = serviceDatum;
                                }

                                if (KeyBuilder.matchInstanceListKey(datum.key)) {
                                    Datum<Instances> instancesDatum = new Datum<>();
                                    instancesDatum.key = datum.key;
                                    instancesDatum.timestamp.set(datum.timestamp.get());
                                    instancesDatum.value = JSON.parseObject(JSON.toJSONString(datum.value), Instances.class);
                                    datum = instancesDatum;
                                }

                                datums.put(datum.key, datum);
                                notifier.addTask(datum.key, ApplyAction.CHANGE);

                                local.resetLeaderDue();

                                if (local.term.get() + 100 > remote.term.get()) {
                                    getLeader().term.set(remote.term.get());
                                    local.term.set(getLeader().term.get());
                                } else {
                                    local.term.addAndGet(100);
                                }

                                raftStore.updateTerm(local.term.get());

                                Loggers.RAFT.info("data updated, key: {}, timestamp: {}, from {}, local term: {}",
                                    datum.key, datum.timestamp, JSON.toJSONString(remote), local.term);

                            } catch (Throwable e) {
                                Loggers.RAFT.error("[RAFT-BEAT] failed to sync datum from leader, key: {} {}", datum.key, e);
                            } finally {
                                OPERATE_LOCK.unlock();
                            }
                        }
                        TimeUnit.MILLISECONDS.sleep(200);
                        return 0;
                    }
                });

                batch.clear();
            } catch (Exception e) {
                Loggers.RAFT.error("[NACOS-RAFT] failed to handle beat entry, key: {}", datumKey);
            }
        }

        // A key that exists locally but is absent from the received list was deleted on the leader, so delete it locally too
        List<String> deadKeys = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : receivedKeysMap.entrySet()) {
            if (entry.getValue() == 0) {
                deadKeys.add(entry.getKey());
            }
        }

        for (String deadKey : deadKeys) {
            try {
                deleteDatum(deadKey);
            } catch (Exception e) {
                Loggers.RAFT.error("[NACOS-RAFT] failed to remove entry, key={} {}", deadKey, e);
            }
        }
    }

    return local;
}

}
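The tail of receivedBeat() uses a small marking trick to find deleted keys: every local key starts at 0, every key seen in the heartbeat is set to 1, and whatever is still 0 at the end no longer exists on the leader. Isolated from the rest, the idea looks roughly like this. `DeadKeySketch` and `deadKeys` are hypothetical names, and the result is sorted only to make the output deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper isolating the 0/1 marking used to detect deleted keys. */
final class DeadKeySketch {
    static List<String> deadKeys(Set<String> localKeys, List<String> beatKeys) {
        // Start with every local key marked 0 ("not seen in this heartbeat")
        Map<String, Integer> receivedKeysMap = new HashMap<>(localKeys.size());
        for (String key : localKeys) {
            receivedKeysMap.put(key, 0);
        }
        // Every key announced by the leader is marked 1
        for (String key : beatKeys) {
            receivedKeysMap.put(key, 1);
        }
        // Keys still at 0 exist locally but not on the leader
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : receivedKeysMap.entrySet()) {
            if (entry.getValue() == 0) {
                dead.add(entry.getKey());
            }
        }
        Collections.sort(dead);
        return dead;
    }
}
```

In the real method each dead key is then passed to deleteDatum(). Keys that appear only in the heartbeat are marked 1 as well, so they are never treated as dead.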

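Summary
When building its own Raft, Nacos made a few changes:

    Change 1:

        The leader never times out. Every time it sends a heartbeat it resets its own timer, so the timer does not expire unless the node goes down. This avoids frequent communication between nodes. The heartbeat also resets the other nodes to follower, which avoids a long-lived dual-leader situation.

    Change 2:

        The election does not use a two-phase election, which simplifies the protocol. Short-lived partitions are handled by adding 100 to the term on every data change.

Characteristics:

    1. The term changes in two places: (1) leader election, +1; (2) data update, +100.

    2. Only the leader may send heartbeats.

    3. In data synchronization, an update is applied only if the sender's term is greater than or equal to the local term.

    4. In an election, the initiator's term must be greater than the local term.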
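The four characteristics boil down to a handful of comparisons on the term counter. The sketch below collects them in one place. `TermRulesSketch` and its method names are hypothetical. The constants 1 and 100 and the comparisons come from the list above, and `onDatumSynced` mirrors the term branch inside the receivedBeat() listing.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical summary of the term rules; not a Nacos class. */
final class TermRulesSketch {
    static final long DATA_UPDATE_INCREMENT = 100;

    final AtomicLong term = new AtomicLong();

    /** Characteristic 1 (1): starting an election raises the term by 1. */
    long onStartElection() {
        return term.incrementAndGet();
    }

    /** Characteristic 1 (2): a data update raises the term by 100. */
    long onDataUpdate() {
        return term.addAndGet(DATA_UPDATE_INCREMENT);
    }

    /** Characteristic 3: a heartbeat is accepted only if its term is not behind ours. */
    boolean acceptsBeat(long remoteTerm) {
        return remoteTerm >= term.get();
    }

    /** Characteristic 4: a vote is granted only to a strictly larger term. */
    boolean acceptsVote(long candidateTerm) {
        return candidateTerm > term.get();
    }

    /** Follower side after syncing a datum, following the branch in receivedBeat(). */
    long onDatumSynced(long remoteTerm) {
        if (term.get() + DATA_UPDATE_INCREMENT > remoteTerm) {
            term.set(remoteTerm); // close enough: adopt the leader's term
        } else {
            term.addAndGet(DATA_UPDATE_INCREMENT); // far behind: catch up in steps of 100
        }
        return term.get();
    }
}
```

Because a data update moves the term by 100 while an election moves it by only 1, a node that missed writes during a short partition ends up with a term well below that of the nodes that kept up, so its vote requests are rejected under characteristic 4.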

As for the dual-leader issue, the project will avoid it in the later 1.4 release; I asked about this and it was confirmed.
