衣舞晨风

【Elasticsearch源码】分片恢复分析

带着疑问学源码，第七篇：Elasticsearch 分片恢复分析
代码分析基于：https://github.com/jiankunking/elasticsearch
Elasticsearch 8.0.0-SNAPSHOT

目的

在看源码之前先梳理一下，自己对于分片恢复的疑问点：

网上对于ElasticSearch分片恢复的逻辑说法一抓一把，网上说的对不对？新版本中有没有更新？
在分片恢复的时候，如果收到Api _forcemerge请求，这时候，会如何处理?(因为副本恢复的第一节点是复制segment文件)
- 这部分等看/_forcemerge api的时候，再解答一下。
分片恢复的第二阶段是同步translog,这一步会不会加锁？不加锁的话，如何确保是同步完成了？

如果说看源码有捷径的话，那么找到网上一篇写的比较权威的源码分析文章跟着看，那不失为一种好方法。
下面源码分析部分将参考腾讯云的:Elasticsearch 底层系列之分片恢复解析，一边参考，一边印证。

源码分析

目标节点请求恢复

先找到分片恢复的入口:IndicesClusterStateService.createOrUpdateShards

在这里会判断本地节点是否在routingNodes中，如果在，说明本地节点有分片创建或更新的需求，否则跳过。

private void createOrUpdateShards(final ClusterState state) {
        // 节点到索引分片的映射关系，主要用于分片分配、均衡决策
        // 具体的内容可以看下:https://jiankunking.com/elasticsearch-cluster-state.html
        RoutingNode localRoutingNode = state.getRoutingNodes().node(state.nodes().getLocalNodeId());
        if (localRoutingNode == null) {
            return;
        }

        DiscoveryNodes nodes = state.nodes();
        RoutingTable routingTable = state.routingTable();

        for (final ShardRouting shardRouting : localRoutingNode) {
            ShardId shardId = shardRouting.shardId();
            // failedShardsCache:https://github.com/jiankunking/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java#L116
            // 恢复过程中失败的碎片列表；我们跟踪这些碎片，以防止在每次集群状态更新时重复恢复这些碎片
            if (failedShardsCache.containsKey(shardId) == false) {
                AllocatedIndex indexService = indicesService.indexService(shardId.getIndex());
                assert indexService != null : "index " + shardId.getIndex() + " should have been created by createIndices";
                Shard shard = indexService.getShardOrNull(shardId.id());
                if (shard == null) { // shard不存在则需创建
                    assert shardRouting.initializing() : shardRouting + " should have been removed by failMissingShards";
                    createShard(nodes, routingTable, shardRouting, state);
                } else { // 存在则更新
                    updateShard(nodes, shardRouting, shard, routingTable, state);
                }
            }
        }
    }

副本分片恢复走的是createShard分支，在该方法中，首先获取shardRouting的类型，如果恢复类型为PEER，说明该分片需要从远端获取，则需要找到源节点，然后调用IndicesService.createShard：

RecoverySource的Type有以下几种：

EMPTY_STORE,
EXISTING_STORE,//主分片本地恢复
PEER,//副分片从远处主分片恢复
SNAPSHOT,//从快照恢复
LOCAL_SHARDS//从本节点其它分片恢复(shrink时)

createShard代码如下:

private void createShard(DiscoveryNodes nodes, RoutingTable routingTable, ShardRouting shardRouting, ClusterState state) {
        assert shardRouting.initializing() : "only allow shard creation for initializing shard but was " + shardRouting;

        DiscoveryNode sourceNode = null;
        // 如果恢复方式是peer，则会找到shard所在的源节点进行恢复
        if (shardRouting.recoverySource().getType() == Type.PEER)  {
            sourceNode = findSourceNodeForPeerRecovery(logger, routingTable, nodes, shardRouting);
            if (sourceNode == null) {
                logger.trace("ignoring initializing shard {} - no source node can be found.", shardRouting.shardId());
                return;
            }
        }

        try {
            final long primaryTerm = state.metadata().index(shardRouting.index()).primaryTerm(shardRouting.id());
            logger.debug("{} creating shard with primary term [{}]", shardRouting.shardId(), primaryTerm);
            indicesService.createShard(
                    shardRouting,
                    recoveryTargetService,
                    new RecoveryListener(shardRouting, primaryTerm),
                    repositoriesService,
                    failedShardHandler,
                    this::updateGlobalCheckpointForShard,
                    retentionLeaseSyncer,
                    nodes.getLocalNode(),
                    sourceNode);
        } catch (Exception e) {
            failAndRemoveShard(shardRouting, true, "failed to create shard", e, state);
        }
    }

/**
     * Finds the routing source node for peer recovery, return null if its not found. Note, this method expects the shard
     * routing to *require* peer recovery, use {@link ShardRouting#recoverySource()} to check if its needed or not.
     */
    private static DiscoveryNode findSourceNodeForPeerRecovery(Logger logger, RoutingTable routingTable, DiscoveryNodes nodes,
                                                               ShardRouting shardRouting) {
        DiscoveryNode sourceNode = null;
        if (!shardRouting.primary()) {
            ShardRouting primary = routingTable.shardRoutingTable(shardRouting.shardId()).primaryShard();
            // only recover from started primary, if we can't find one, we will do it next round
            if (primary.active()) {
                // 找到primary shard所在节点
                sourceNode = nodes.get(primary.currentNodeId());
                if (sourceNode == null) {
                    logger.trace("can't find replica source node because primary shard {} is assigned to an unknown node.", primary);
                }
            } else {
                logger.trace("can't find replica source node because primary shard {} is not active.", primary);
            }
        } else if (shardRouting.relocatingNodeId() != null) {
            // 找到搬迁的源节点
            sourceNode = nodes.get(shardRouting.relocatingNodeId());
            if (sourceNode == null) {
                logger.trace("can't find relocation source node for shard {} because it is assigned to an unknown node [{}].",
                    shardRouting.shardId(), shardRouting.relocatingNodeId());
            }
        } else {
            throw new IllegalStateException("trying to find source node for peer recovery when routing state means no peer recovery: " +
                shardRouting);
        }
        return sourceNode;
    }

源节点的确定分两种情况，如果当前shard本身不是primary shard，则源节点为primary shard所在节点，否则，如果当前shard正在搬迁中(从其他节点搬迁到本节点)，则源节点为数据搬迁的源头节点。得到源节点后调用IndicesService.createShard，在该方法中调用方法IndexShard.startRecovery开始恢复。

public void startRecovery(RecoveryState recoveryState, PeerRecoveryTargetService recoveryTargetService,
                              PeerRecoveryTargetService.RecoveryListener recoveryListener, RepositoriesService repositoriesService,
                              Consumer mappingUpdateConsumer,
                              IndicesService indicesService) {
        // TODO: Create a proper object to encapsulate the recovery context
        // all of the current methods here follow a pattern of:
        // resolve context which isn't really dependent on the local shards and then async
        // call some external method with this pointer.
        // with a proper recovery context object we can simply change this to:
        // startRecovery(RecoveryState recoveryState, ShardRecoverySource source ) {
        //     markAsRecovery("from " + source.getShortDescription(), recoveryState);
        //     threadPool.generic().execute()  {
        //           onFailure () { listener.failure() };
        //           doRun() {
        //                if (source.recover(this)) {
        //                  recoveryListener.onRecoveryDone(recoveryState);
        //                }
        //           }
        //     }}
        // }
        assert recoveryState.getRecoverySource().equals(shardRouting.recoverySource());
        switch (recoveryState.getRecoverySource().getType()) {
            case EMPTY_STORE:
            case EXISTING_STORE:
                executeRecovery("from store", recoveryState, recoveryListener, this::recoverFromStore);
                break;
            case PEER:
                try {
                    markAsRecovering("from " + recoveryState.getSourceNode(), recoveryState);
                    recoveryTargetService.startRecovery(this, recoveryState.getSourceNode(), recoveryListener);
                } catch (Exception e) {
                    failShard("corrupted preexisting index", e);
                    recoveryListener.onRecoveryFailure(recoveryState,
                        new RecoveryFailedException(recoveryState, null, e), true);
                }
                break;
            case SNAPSHOT:
                final String repo = ((SnapshotRecoverySource) recoveryState.getRecoverySource()).snapshot().getRepository();
                executeRecovery("from snapshot",
                    recoveryState, recoveryListener, l -> restoreFromRepository(repositoriesService.repository(repo), l));
                break;
            case LOCAL_SHARDS:
                final IndexMetadata indexMetadata = indexSettings().getIndexMetadata();
                final Index resizeSourceIndex = indexMetadata.getResizeSourceIndex();
                final List startedShards = new ArrayList<>();
                final IndexService sourceIndexService = indicesService.indexService(resizeSourceIndex);
                final Set requiredShards;
                final int numShards;
                if (sourceIndexService != null) {
                    requiredShards = IndexMetadata.selectRecoverFromShards(shardId().id(),
                        sourceIndexService.getMetadata(), indexMetadata.getNumberOfShards());
                    for (IndexShard shard : sourceIndexService) {
                        if (shard.state() == IndexShardState.STARTED && requiredShards.contains(shard.shardId())) {
                            startedShards.add(shard);
                        }
                    }
                    numShards = requiredShards.size();
                } else {
                    numShards = -1;
                    requiredShards = Collections.emptySet();
                }

                if (numShards == startedShards.size()) {
                    assert requiredShards.isEmpty() == false;
                    executeRecovery("from local shards", recoveryState, recoveryListener,
                        l -> recoverFromLocalShards(mappingUpdateConsumer,
                            startedShards.stream().filter((s) -> requiredShards.contains(s.shardId())).collect(Collectors.toList()), l));
                } else {
                    final RuntimeException e;
                    if (numShards == -1) {
                        e = new IndexNotFoundException(resizeSourceIndex);
                    } else {
                        e = new IllegalStateException("not all required shards of index " + resizeSourceIndex
                            + " are started yet, expected " + numShards + " found " + startedShards.size() + " can't recover shard "
                            + shardId());
                    }
                    throw e;
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown recovery source " + recoveryState.getRecoverySource());
        }
    }

对于恢复类型为PEER的任务，恢复动作的真正执行者为PeerRecoveryTargetService.doRecovery。在该方法中，首先调用getStartRecoveryRequest获取shard的metadataSnapshot，该结构中包含shard的段信息，如syncid、checksum、doc数等，然后封装为StartRecoveryRequest，通过RPC发送到源节点：

private void doRecovery(final long recoveryId, final StartRecoveryRequest preExistingRequest) {
        final String actionName;
        final TransportRequest requestToSend;
        final StartRecoveryRequest startRequest;
        final RecoveryState.Timer timer;
        try (RecoveryRef recoveryRef = onGoingRecoveries.getRecovery(recoveryId)) {
            if (recoveryRef == null) {
                logger.trace("not running recovery with id [{}] - can not find it (probably finished)", recoveryId);
                return;
            }
            final RecoveryTarget recoveryTarget = recoveryRef.target();
            timer = recoveryTarget.state().getTimer();
            if (preExistingRequest == null) {
                try {
                    final IndexShard indexShard = recoveryTarget.indexShard();
                    indexShard.preRecovery();
                    assert recoveryTarget.sourceNode() != null : "can not do a recovery without a source node";
                    logger.trace("{} preparing shard for peer recovery", recoveryTarget.shardId());
                    indexShard.prepareForIndexRecovery();
                    final long startingSeqNo = indexShard.recoverLocallyUpToGlobalCheckpoint();
                    assert startingSeqNo == UNASSIGNED_SEQ_NO || recoveryTarget.state().getStage() == RecoveryState.Stage.TRANSLOG :
                        "unexpected recovery stage [" + recoveryTarget.state().getStage() + "] starting seqno [ " + startingSeqNo + "]";
                    // 构造recovery request 
                    startRequest = getStartRecoveryRequest(logger, clusterService.localNode(), recoveryTarget, startingSeqNo);
                    requestToSend = startRequest;
                    actionName = PeerRecoverySourceService.Actions.START_RECOVERY;
                } catch (final Exception e) {
                    // this will be logged as warning later on...
                    logger.trace("unexpected error while preparing shard for peer recovery, failing recovery", e);
                    onGoingRecoveries.failRecovery(recoveryId,
                        new RecoveryFailedException(recoveryTarget.state(), "failed to prepare shard for recovery", e), true);
                    return;
                }
                logger.trace("{} starting recovery from {}", startRequest.shardId(), startRequest.sourceNode());
            } else {
                startRequest = preExistingRequest;
                requestToSend = new ReestablishRecoveryRequest(recoveryId, startRequest.shardId(), startRequest.targetAllocationId());
                actionName = PeerRecoverySourceService.Actions.REESTABLISH_RECOVERY;
                logger.trace("{} reestablishing recovery from {}", startRequest.shardId(), startRequest.sourceNode());
            }
        }
        // 向源节点发送请求，请求恢复
        transportService.sendRequest(startRequest.sourceNode(), actionName, requestToSend,
                new RecoveryResponseHandler(startRequest, timer));
    }

/**
     * Prepare the start recovery request.
     *
     * @param logger         the logger
     * @param localNode      the local node of the recovery target
     * @param recoveryTarget the target of the recovery
     * @param startingSeqNo  a sequence number that an operation-based peer recovery can start with.
     *                       This is the first operation after the local checkpoint of the safe commit if exists.
     * @return a start recovery request
     */
    public static StartRecoveryRequest getStartRecoveryRequest(Logger logger, DiscoveryNode localNode,
                                                               RecoveryTarget recoveryTarget, long startingSeqNo) {
        final StartRecoveryRequest request;
        logger.trace("{} collecting local files for [{}]", recoveryTarget.shardId(), recoveryTarget.sourceNode());

        Store.MetadataSnapshot metadataSnapshot;
        try {
            metadataSnapshot = recoveryTarget.indexShard().snapshotStoreMetadata();
            // Make sure that the current translog is consistent with the Lucene index; otherwise, we have to throw away the Lucene index.
            try {
                final String expectedTranslogUUID = metadataSnapshot.getCommitUserData().get(Translog.TRANSLOG_UUID_KEY);
                final long globalCheckpoint = Translog.readGlobalCheckpoint(recoveryTarget.translogLocation(), expectedTranslogUUID);
                assert globalCheckpoint + 1 >= startingSeqNo : "invalid startingSeqNo " + startingSeqNo + " >= " + globalCheckpoint;
            } catch (IOException | TranslogCorruptedException e) {
                logger.warn(new ParameterizedMessage("error while reading global checkpoint from translog, " +
                    "resetting the starting sequence number from {} to unassigned and recovering as if there are none", startingSeqNo), e);
                metadataSnapshot = Store.MetadataSnapshot.EMPTY;
                startingSeqNo = UNASSIGNED_SEQ_NO;
            }
        } catch (final org.apache.lucene.index.IndexNotFoundException e) {
            // happens on an empty folder. no need to log
            assert startingSeqNo == UNASSIGNED_SEQ_NO : startingSeqNo;
            logger.trace("{} shard folder empty, recovering all files", recoveryTarget);
            metadataSnapshot = Store.MetadataSnapshot.EMPTY;
        } catch (final IOException e) {
            if (startingSeqNo != UNASSIGNED_SEQ_NO) {
                logger.warn(new ParameterizedMessage("error while listing local files, resetting the starting sequence number from {} " +
                    "to unassigned and recovering as if there are none", startingSeqNo), e);
                startingSeqNo = UNASSIGNED_SEQ_NO;
            } else {
                logger.warn("error while listing local files, recovering as if there are none", e);
            }
            metadataSnapshot = Store.MetadataSnapshot.EMPTY;
        }
        logger.trace("{} local file count [{}]", recoveryTarget.shardId(), metadataSnapshot.size());
        request = new StartRecoveryRequest(
            recoveryTarget.shardId(),
            recoveryTarget.indexShard().routingEntry().allocationId().getId(),
            recoveryTarget.sourceNode(),
            localNode,
            metadataSnapshot,
            recoveryTarget.state().getPrimary(),
            recoveryTarget.recoveryId(),
            startingSeqNo);
        return request;
    }

注意，请求的发送是异步的。

源节点处理恢复请求

源节点接收到请求后会调用恢复的入口函数PeerRecoverySourceService.messageReceived#recover:

class StartRecoveryTransportRequestHandler implements TransportRequestHandler {
        @Override
        public void messageReceived(final StartRecoveryRequest request, final TransportChannel channel, Task task) throws Exception {
            recover(request, new ChannelActionListener<>(channel, Actions.START_RECOVERY, request));
        }
    }

recover方法根据request得到shard并构造RecoverySourceHandler对象，然后调用handler.recoverToTarget进入恢复的执行体：

private void recover(StartRecoveryRequest request, ActionListener listener) {
        final IndexService indexService = indicesService.indexServiceSafe(request.shardId().getIndex());
        final IndexShard shard = indexService.getShard(request.shardId().id());

        final ShardRouting routingEntry = shard.routingEntry();

        if (routingEntry.primary() == false || routingEntry.active() == false) {
            throw new DelayRecoveryException("source shard [" + routingEntry + "] is not an active primary");
        }

        if (request.isPrimaryRelocation() && (routingEntry.relocating() == false ||
            routingEntry.relocatingNodeId().equals(request.targetNode().getId()) == false)) {
            logger.debug("delaying recovery of {} as source shard is not marked yet as relocating to {}",
                request.shardId(), request.targetNode());
            throw new DelayRecoveryException("source shard is not marked yet as relocating to [" + request.targetNode() + "]");
        }

        RecoverySourceHandler handler = ongoingRecoveries.addNewRecovery(request, shard);
        logger.trace("[{}][{}] starting recovery to {}", request.shardId().getIndex().getName(), request.shardId().id(),
            request.targetNode());
        handler.recoverToTarget(ActionListener.runAfter(listener, () -> ongoingRecoveries.remove(shard, handler)));
    }


/**
     * performs the recovery from the local engine to the target
     */
    public void recoverToTarget(ActionListener listener) {
        addListener(listener);
        final Closeable releaseResources = () -> IOUtils.close(resources);
        try {
            cancellableThreads.setOnCancel((reason, beforeCancelEx) -> {
                final RuntimeException e;
                if (shard.state() == IndexShardState.CLOSED) { // check if the shard got closed on us
                    e = new IndexShardClosedException(shard.shardId(), "shard is closed and recovery was canceled reason [" + reason + "]");
                } else {
                    e = new CancellableThreads.ExecutionCancelledException("recovery was canceled reason [" + reason + "]");
                }
                if (beforeCancelEx != null) {
                    e.addSuppressed(beforeCancelEx);
                }
                IOUtils.closeWhileHandlingException(releaseResources, () -> future.onFailure(e));
                throw e;
            });
            final Consumer onFailure = e -> {
                assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[onFailure]");
                IOUtils.closeWhileHandlingException(releaseResources, () -> future.onFailure(e));
            };

            final SetOnce retentionLeaseRef = new SetOnce<>();

            runUnderPrimaryPermit(() -> {
                final IndexShardRoutingTable routingTable = shard.getReplicationGroup().getRoutingTable();
                ShardRouting targetShardRouting = routingTable.getByAllocationId(request.targetAllocationId());
                if (targetShardRouting == null) {
                    logger.debug("delaying recovery of {} as it is not listed as assigned to target node {}", request.shardId(),
                        request.targetNode());
                    throw new DelayRecoveryException("source node does not have the shard listed in its state as allocated on the node");
                }
                assert targetShardRouting.initializing() : "expected recovery target to be initializing but was " + targetShardRouting;
                retentionLeaseRef.set(
                    shard.getRetentionLeases().get(ReplicationTracker.getPeerRecoveryRetentionLeaseId(targetShardRouting)));
            }, shardId + " validating recovery target ["+ request.targetAllocationId() + "] registered ",
                shard, cancellableThreads, logger);
            // 获取一个保留锁，使得translog不被清理
            final Closeable retentionLock = shard.acquireHistoryRetentionLock();
            resources.add(retentionLock);
            final long startingSeqNo;
            // 判断是否可以从SequenceNumber恢复
            // 除了异常检测和版本号检测，主要在shard.hasCompleteHistoryOperations()方法中判断请求的序列号是否小于主分片节点的localCheckpoint，
            // 以及translog中的数据是否足以恢复(有可能因为translog数据太大或者过期删除而无法恢复)
            final boolean isSequenceNumberBasedRecovery
                = request.startingSeqNo() != SequenceNumbers.UNASSIGNED_SEQ_NO
                && isTargetSameHistory()
                && shard.hasCompleteHistoryOperations("peer-recovery", request.startingSeqNo())
                && ((retentionLeaseRef.get() == null && shard.useRetentionLeasesInPeerRecovery() == false) ||
                   (retentionLeaseRef.get() != null && retentionLeaseRef.get().retainingSequenceNumber() <= request.startingSeqNo()));
            // NB check hasCompleteHistoryOperations when computing isSequenceNumberBasedRecovery, even if there is a retention lease,
            // because when doing a rolling upgrade from earlier than 7.4 we may create some leases that are initially unsatisfied. It's
            // possible there are other cases where we cannot satisfy all leases, because that's not a property we currently expect to hold.
            // Also it's pretty cheap when soft deletes are enabled, and it'd be a disaster if we tried a sequence-number-based recovery
            // without having a complete history.

            if (isSequenceNumberBasedRecovery && retentionLeaseRef.get() != null) {
                // all the history we need is retained by an existing retention lease, so we do not need a separate retention lock
                retentionLock.close();
                logger.trace("history is retained by {}", retentionLeaseRef.get());
            } else {
                // all the history we need is retained by the retention lock, obtained before calling shard.hasCompleteHistoryOperations()
                // and before acquiring the safe commit we'll be using, so we can be certain that all operations after the safe commit's
                // local checkpoint will be retained for the duration of this recovery.
                logger.trace("history is retained by retention lock");
            }

            final StepListener sendFileStep = new StepListener<>();
            final StepListener prepareEngineStep = new StepListener<>();
            final StepListener sendSnapshotStep = new StepListener<>();
            final StepListener finalizeStep = new StepListener<>();
            // 若可以基于序列号进行恢复，则获取开始的序列号
            if (isSequenceNumberBasedRecovery) {
                // 如果基于SequenceNumber恢复，则startingSeqNo取值为恢复请求中的序列号，
                // 从请求的序列号开始快照translog。否则取值为0，快照完整的translog。
                logger.trace("performing sequence numbers based recovery. starting at [{}]", request.startingSeqNo());
                // 获取开始序列号
                startingSeqNo = request.startingSeqNo();
                if (retentionLeaseRef.get() == null) {
                    createRetentionLease(startingSeqNo, sendFileStep.map(ignored -> SendFileResult.EMPTY));
                } else {
                    // 发送的文件设置为空
                    sendFileStep.onResponse(SendFileResult.EMPTY);
                }
            } else {
                final Engine.IndexCommitRef safeCommitRef;
                try {
                    // Releasing a safe commit can access some commit files.
                    safeCommitRef = acquireSafeCommit(shard);
                    resources.add(safeCommitRef);
                } catch (final Exception e) {
                    throw new RecoveryEngineException(shard.shardId(), 1, "snapshot failed", e);
                }

                // Try and copy enough operations to the recovering peer so that if it is promoted to primary then it has a chance of being
                // able to recover other replicas using operations-based recoveries. If we are not using retention leases then we
                // conservatively copy all available operations. If we are using retention leases then "enough operations" is just the
                // operations from the local checkpoint of the safe commit onwards, because when using soft deletes the safe commit retains
                // at least as much history as anything else. The safe commit will often contain all the history retained by the current set
                // of retention leases, but this is not guaranteed: an earlier peer recovery from a different primary might have created a
                // retention lease for some history that this primary already discarded, since we discard history when the global checkpoint
                // advances and not when creating a new safe commit. In any case this is a best-effort thing since future recoveries can
                // always fall back to file-based ones, and only really presents a problem if this primary fails before things have settled
                // down.
                startingSeqNo = Long.parseLong(safeCommitRef.getIndexCommit().getUserData().get(SequenceNumbers.LOCAL_CHECKPOINT_KEY)) + 1L;
                logger.trace("performing file-based recovery followed by history replay starting at [{}]", startingSeqNo);

                try {
                    final int estimateNumOps = estimateNumberOfHistoryOperations(startingSeqNo);
                    final Releasable releaseStore = acquireStore(shard.store());
                    resources.add(releaseStore);
                    sendFileStep.whenComplete(r -> IOUtils.close(safeCommitRef, releaseStore), e -> {
                        try {
                            IOUtils.close(safeCommitRef, releaseStore);
                        } catch (final IOException ex) {
                            logger.warn("releasing snapshot caused exception", ex);
                        }
                    });

                    final StepListener deleteRetentionLeaseStep = new StepListener<>();
                    runUnderPrimaryPermit(() -> {
                            try {
                                // If the target previously had a copy of this shard then a file-based recovery might move its global
                                // checkpoint backwards. We must therefore remove any existing retention lease so that we can create a
                                // new one later on in the recovery.
                                shard.removePeerRecoveryRetentionLease(request.targetNode().getId(),
                                    new ThreadedActionListener<>(logger, shard.getThreadPool(), ThreadPool.Names.GENERIC,
                                        deleteRetentionLeaseStep, false));
                            } catch (RetentionLeaseNotFoundException e) {
                                logger.debug("no peer-recovery retention lease for " + request.targetAllocationId());
                                deleteRetentionLeaseStep.onResponse(null);
                            }
                        }, shardId + " removing retention lease for [" + request.targetAllocationId() + "]",
                        shard, cancellableThreads, logger);
                    // 第一阶段
                    deleteRetentionLeaseStep.whenComplete(ignored -> {
                        assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[phase1]");
                        phase1(safeCommitRef.getIndexCommit(), startingSeqNo, () -> estimateNumOps, sendFileStep);
                    }, onFailure);

                } catch (final Exception e) {
                    throw new RecoveryEngineException(shard.shardId(), 1, "sendFileStep failed", e);
                }
            }
            assert startingSeqNo >= 0 : "startingSeqNo must be non negative. got: " + startingSeqNo;

            sendFileStep.whenComplete(r -> {
                assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[prepareTargetForTranslog]");
                // 等待phase1执行完毕，主分片节点通知副分片节点启动此分片的Engine:prepareTargetForTranslog          
                // 该方法会阻塞处理，直到分片 Engine 启动完毕。
                // 待副分片启动Engine 完毕，就可以正常接收写请求了。
                // 注意，此时phase2尚未开始，此分片的恢复流程尚未结束。
                // 等待当前操作处理完成后，以startingSeqNo为起始点，对translog做快照，开始执行phase2：
                // For a sequence based recovery, the target can keep its local translog
                prepareTargetForTranslog(estimateNumberOfHistoryOperations(startingSeqNo), prepareEngineStep);
            }, onFailure);

            prepareEngineStep.whenComplete(prepareEngineTime -> {
                assert Transports.assertNotTransportThread(RecoverySourceHandler.this + "[phase2]");
                /*
                 * add shard to replication group (shard will receive replication requests from this point on) now that engine is open.
                 * This means that any document indexed into the primary after this will be replicated to this replica as well
                 * make sure to do this before sampling the max sequence number in the next step, to ensure that we send
                 * all documents up to maxSeqNo in phase2.
                 */
                runUnderPrimaryPermit(() -> shard.initiateTracking(request.targetAllocationId()),
                    shardId + " initiating tracking of " + request.targetAllocationId(), shard, cancellableThreads, logger);

                final long endingSeqNo = shard.seqNoStats().getMaxSeqNo();
                logger.trace("snapshot for recovery; current size is [{}]", estimateNumberOfHistoryOperations(startingSeqNo));
                final Translog.Snapshot phase2Snapshot = shard.newChangesSnapshot("peer-recovery", startingSeqNo, Long.MAX_VALUE, false);
                resources.add(phase2Snapshot);
                retentionLock.close();

                // we have to capture the max_seen_auto_id_timestamp and the max_seq_no_of_updates to make sure that these values
                // are at least as high as the corresponding values on the primary when any of these operations were executed on it.
                final long maxSeenAutoIdTimestamp = shard.getMaxSeenAutoIdTimestamp();
                final long maxSeqNoOfUpdatesOrDeletes = shard.getMaxSeqNoOfUpdatesOrDeletes();
                final RetentionLeases retentionLeases = shard.getRetentionLeases();
                final long mappingVersionOnPrimary = shard.indexSettings().getIndexMetadata().getMappingVersion();
                // 第二阶段，发送translog
                phase2(startingSeqNo, endingSeqNo, phase2Snapshot, maxSeenAutoIdTimestamp, maxSeqNoOfUpdatesOrDeletes,
                    retentionLeases, mappingVersionOnPrimary, sendSnapshotStep);

            }, onFailure);

            // Recovery target can trim all operations >= startingSeqNo as we have sent all these operations in the phase 2
            final long trimAboveSeqNo = startingSeqNo - 1;
            sendSnapshotStep.whenComplete(r -> finalizeRecovery(r.targetLocalCheckpoint, trimAboveSeqNo, finalizeStep), onFailure);

            finalizeStep.whenComplete(r -> {
                final long phase1ThrottlingWaitTime = 0L; // TODO: return the actual throttle time
                final SendSnapshotResult sendSnapshotResult = sendSnapshotStep.result();
                final SendFileResult sendFileResult = sendFileStep.result();
                final RecoveryResponse response = new RecoveryResponse(sendFileResult.phase1FileNames, sendFileResult.phase1FileSizes,
                    sendFileResult.phase1ExistingFileNames, sendFileResult.phase1ExistingFileSizes, sendFileResult.totalSize,
                    sendFileResult.existingTotalSize, sendFileResult.took.millis(), phase1ThrottlingWaitTime,
                    prepareEngineStep.result().millis(), sendSnapshotResult.sentOperations, sendSnapshotResult.tookTime.millis());
                try {
                    future.onResponse(response);
                } finally {
                    IOUtils.close(resources);
                }
            }, onFailure);
        } catch (Exception e) {
            IOUtils.closeWhileHandlingException(releaseResources, () -> future.onFailure(e));
        }
    }


    /**
     * Checks if we have a completed history of operations since the given starting seqno (inclusive).
     * This method should be called after acquiring the retention lock; See {@link #acquireHistoryRetentionLock()}
     */
    public boolean hasCompleteHistoryOperations(String reason, long startingSeqNo)  {
        return getEngine().hasCompleteOperationHistory(reason, startingSeqNo);
    }

从上面的代码可以看出，恢复主要分两个阶段，第一阶段恢复segment文件，第二阶段发送translog。这里有个关键的地方，在恢复前，首先需要获取translogView及segment snapshot，translogView的作用是保证当前时间点到恢复结束时间段的translog不被删除，segment snapshot的作用是保证当前时间点之前的segment文件不被删除。接下来看看两阶段恢复的具体执行逻辑。phase1:

/**
     * Perform phase1 of the recovery operations. Once this {@link IndexCommit}
     * snapshot has been performed no commit operations (files being fsync'd)
     * are effectively allowed on this index until all recovery phases are done
     * 
     * Phase1 examines the segment files on the target node and copies over the
     * segments that are missing. Only segments that have the same size and
     * checksum can be reused
     */
    void phase1(IndexCommit snapshot, long startingSeqNo, IntSupplier translogOps, ActionListener listener) {
        cancellableThreads.checkForCancel();
        //拿到shard的存储信息
        final Store store = shard.store();
        try {
            StopWatch stopWatch = new StopWatch().start();
            final Store.MetadataSnapshot recoverySourceMetadata;
            try {
                // 拿到snapshot的metadata
                recoverySourceMetadata = store.getMetadata(snapshot);
            } catch (CorruptIndexException | IndexFormatTooOldException | IndexFormatTooNewException ex) {
                shard.failShard("recovery", ex);
                throw ex;
            }
            for (String name : snapshot.getFileNames()) {
                final StoreFileMetadata md = recoverySourceMetadata.get(name);
                if (md == null) {
                    logger.info("Snapshot differs from actual index for file: {} meta: {}", name, recoverySourceMetadata.asMap());
                    throw new CorruptIndexException("Snapshot differs from actual index - maybe index was removed metadata has " +
                            recoverySourceMetadata.asMap().size() + " files", name);
                }
            }
            // 如果syncid相等，再继续比较下文档数，如果都相同则不用恢复
            if (canSkipPhase1(recoverySourceMetadata, request.metadataSnapshot()) == false) {
                final List phase1FileNames = new ArrayList<>();
                final List phase1FileSizes = new ArrayList<>();
                final List phase1ExistingFileNames = new ArrayList<>();
                final List phase1ExistingFileSizes = new ArrayList<>();

                // Total size of segment files that are recovered
                long totalSizeInBytes = 0;
                // Total size of segment files that were able to be re-used
                long existingTotalSizeInBytes = 0;

                // Generate a "diff" of all the identical, different, and missing
                // segment files on the target node, using the existing files on
                // the source node
                // 找出target和source有差别的segment
                // https://github.com/jiankunking/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/store/Store.java#L971
                final Store.RecoveryDiff diff = recoverySourceMetadata.recoveryDiff(request.metadataSnapshot());
                for (StoreFileMetadata md : diff.identical) {
                    phase1ExistingFileNames.add(md.name());
                    phase1ExistingFileSizes.add(md.length());
                    existingTotalSizeInBytes += md.length();
                    if (logger.isTraceEnabled()) {
                        logger.trace("recovery [phase1]: not recovering [{}], exist in local store and has checksum [{}]," +
                                        " size [{}]", md.name(), md.checksum(), md.length());
                    }
                    totalSizeInBytes += md.length();
                }
                List phase1Files = new ArrayList<>(diff.different.size() + diff.missing.size());
                phase1Files.addAll(diff.different);
                phase1Files.addAll(diff.missing);
                for (StoreFileMetadata md : phase1Files) {
                    if (request.metadataSnapshot().asMap().containsKey(md.name())) {
                        logger.trace("recovery [phase1]: recovering [{}], exists in local store, but is different: remote [{}], local [{}]",
                            md.name(), request.metadataSnapshot().asMap().get(md.name()), md);
                    } else {
                        logger.trace("recovery [phase1]: recovering [{}], does not exist in remote", md.name());
                    }
                    phase1FileNames.add(md.name());
                    phase1FileSizes.add(md.length());
                    totalSizeInBytes += md.length();
                }

                logger.trace("recovery [phase1]: recovering_files [{}] with total_size [{}], reusing_files [{}] with total_size [{}]",
                    phase1FileNames.size(), new ByteSizeValue(totalSizeInBytes),
                    phase1ExistingFileNames.size(), new ByteSizeValue(existingTotalSizeInBytes));
                final StepListener sendFileInfoStep = new StepListener<>();
                final StepListener sendFilesStep = new StepListener<>();
                final StepListener createRetentionLeaseStep = new StepListener<>();
                final StepListener cleanFilesStep = new StepListener<>();
                cancellableThreads.checkForCancel();
                recoveryTarget.receiveFileInfo(phase1FileNames, phase1FileSizes, phase1ExistingFileNames,
                        phase1ExistingFileSizes, translogOps.getAsInt(), sendFileInfoStep);

                // 将需要恢复的文件发送到target node
                sendFileInfoStep.whenComplete(r ->
                    sendFiles(store, phase1Files.toArray(new StoreFileMetadata[0]), translogOps, sendFilesStep), listener::onFailure);

                sendFilesStep.whenComplete(r -> createRetentionLease(startingSeqNo, createRetentionLeaseStep), listener::onFailure);

                createRetentionLeaseStep.whenComplete(retentionLease ->
                    {
                        final long lastKnownGlobalCheckpoint = shard.getLastKnownGlobalCheckpoint();
                        assert retentionLease == null || retentionLease.retainingSequenceNumber() - 1 <= lastKnownGlobalCheckpoint
                            : retentionLease + " vs " + lastKnownGlobalCheckpoint;
                        // Establishes new empty translog on the replica with global checkpoint set to lastKnownGlobalCheckpoint. We want
                        // the commit we just copied to be a safe commit on the replica, so why not set the global checkpoint on the replica
                        // to the max seqno of this commit? Because (in rare corner cases) this commit might not be a safe commit here on
                        // the primary, and in these cases the max seqno would be too high to be valid as a global checkpoint.
                        cleanFiles(store, recoverySourceMetadata, translogOps, lastKnownGlobalCheckpoint, cleanFilesStep);
                    },
                    listener::onFailure);

                final long totalSize = totalSizeInBytes;
                final long existingTotalSize = existingTotalSizeInBytes;
                cleanFilesStep.whenComplete(r -> {
                    final TimeValue took = stopWatch.totalTime();
                    logger.trace("recovery [phase1]: took [{}]", took);
                    listener.onResponse(new SendFileResult(phase1FileNames, phase1FileSizes, totalSize, phase1ExistingFileNames,
                        phase1ExistingFileSizes, existingTotalSize, took));
                }, listener::onFailure);
            } else {
                logger.trace("skipping [phase1] since source and target have identical sync id [{}]", recoverySourceMetadata.getSyncId());

                // but we must still create a retention lease
                final StepListener createRetentionLeaseStep = new StepListener<>();
                createRetentionLease(startingSeqNo, createRetentionLeaseStep);
                createRetentionLeaseStep.whenComplete(retentionLease -> {
                    final TimeValue took = stopWatch.totalTime();
                    logger.trace("recovery [phase1]: took [{}]", took);
                    listener.onResponse(new SendFileResult(Collections.emptyList(), Collections.emptyList(), 0L, Collections.emptyList(),
                        Collections.emptyList(), 0L, took));
                }, listener::onFailure);

            }
        } catch (Exception e) {
            throw new RecoverFilesRecoveryException(request.shardId(), 0, new ByteSizeValue(0L), e);
        }
    }

从上面代码可以看出，phase1的具体逻辑是，首先拿到待恢复shard的metadataSnapshot从而得到recoverySourceSyncId，根据request拿到recoveryTargetSyncId，比较两边的syncid，如果相同再比较源和目标的文档数，如果也相同，说明在当前提交点之前源和目标的shard对应的segments都相同，因此不用恢复segment文件(canSkipPhase1方法中比对的)。如果两边的syncid不同，说明segment文件有差异，则需要找出所有有差异的文件进行恢复。通过比较recoverySourceMetadata和recoveryTargetSnapshot的差异性，可以找出所有有差别的segment文件。这块逻辑如下：

/**
         * Returns a diff between the two snapshots that can be used for recovery. The given snapshot is treated as the
         * recovery target and this snapshot as the source. The returned diff will hold a list of files that are:
         * 
         * identical: they exist in both snapshots and they can be considered the same ie. they don't need to be recovered
         * different: they exist in both snapshots but their they are not identical
         * missing: files that exist in the source but not in the target
         * 
         * This method groups file into per-segment files and per-commit files. A file is treated as
         * identical if and on if all files in it's group are identical. On a per-segment level files for a segment are treated
         * as identical iff:
         * 
         * all files in this segment have the same checksum
         * all files in this segment have the same length
         * the segments {@code .si} files hashes are byte-identical Note: This is a using a perfect hash function,
         * The metadata transfers the {@code .si} file content as it's hash
         * 
         * 
         * The {@code .si} file contains a lot of diagnostics including a timestamp etc. in the future there might be
         * unique segment identifiers in there hardening this method further.
         * 

         * The per-commit files handles very similar. A commit is composed of the {@code segments_N} files as well as generational files
         * like deletes ({@code _x_y.del}) or field-info ({@code _x_y.fnm}) files. On a per-commit level files for a commit are treated
         * as identical iff:
         * 

         * all files belonging to this commit have the same checksum
         * all files belonging to this commit have the same length
         * the segments file {@code segments_N} files hashes are byte-identical Note: This is a using a perfect hash function,
         * The metadata transfers the {@code segments_N} file content as it's hash
         * 
         * 
         * NOTE: this diff will not contain the {@code segments.gen} file. This file is omitted on recovery.
         */
        public RecoveryDiff recoveryDiff(MetadataSnapshot recoveryTargetSnapshot) {
            final List identical = new ArrayList<>();// 相同的file 
            final List different = new ArrayList<>();// 不同的file
            final List missing = new ArrayList<>();// 缺失的file
            final Map> perSegment = new HashMap<>();
            final List perCommitStoreFiles = new ArrayList<>();

            for (StoreFileMetadata meta : this) {
                if (IndexFileNames.OLD_SEGMENTS_GEN.equals(meta.name())) { // legacy
                    continue; // we don't need that file at all
                }
                final String segmentId = IndexFileNames.parseSegmentName(meta.name());
                final String extension = IndexFileNames.getExtension(meta.name());
                if (IndexFileNames.SEGMENTS.equals(segmentId) ||
                        DEL_FILE_EXTENSION.equals(extension) || LIV_FILE_EXTENSION.equals(extension)) {
                    // only treat del files as per-commit files fnm files are generational but only for upgradable DV
                    perCommitStoreFiles.add(meta);
                } else {
                    perSegment.computeIfAbsent(segmentId, k -> new ArrayList<>()).add(meta);
                }
            }
            final ArrayList identicalFiles = new ArrayList<>();
            for (List segmentFiles : Iterables.concat(perSegment.values(), Collections.singleton(perCommitStoreFiles))) {
                identicalFiles.clear();
                boolean consistent = true;
                for (StoreFileMetadata meta : segmentFiles) {
                    StoreFileMetadata storeFileMetadata = recoveryTargetSnapshot.get(meta.name());
                    if (storeFileMetadata == null) {
                        consistent = false;
                        missing.add(meta);// 该segment在target node中不存在，则加入到missing
                    } else if (storeFileMetadata.isSame(meta) == false) {
                        consistent = false;
                        different.add(meta);// 存在但不相同，则加入到different
                    } else {
                        identicalFiles.add(meta);// 存在且相同
                    }
                }
                if (consistent) {
                    identical.addAll(identicalFiles);
                } else {
                    // make sure all files are added - this can happen if only the deletes are different
                    different.addAll(identicalFiles);
                }
            }
            RecoveryDiff recoveryDiff = new RecoveryDiff(Collections.unmodifiableList(identical),
                Collections.unmodifiableList(different), Collections.unmodifiableList(missing));
            assert recoveryDiff.size() == this.metadata.size() - (metadata.containsKey(IndexFileNames.OLD_SEGMENTS_GEN) ? 1 : 0)
                    : "some files are missing recoveryDiff size: [" + recoveryDiff.size() + "] metadata size: [" +
                      this.metadata.size() + "] contains  segments.gen: [" + metadata.containsKey(IndexFileNames.OLD_SEGMENTS_GEN) + "]";
            return recoveryDiff;
        }

这里将所有的segment file分为三类：identical(相同)、different(不同)、missing(target缺失)。然后将different和missing的segment files作为第一阶段需要恢复的文件发送到target node。发送完segment files后，源节点还会向目标节点发送消息以通知目标节点清理临时文件，然后也会发送消息通知目标节点打开引擎准备接收translog。

第二阶段的逻辑比较简单，只需将translog view到当前时间之间的所有translog发送给源节点即可。

第二阶段使用当前translog的快照，而不获取写锁(但是，translog快照是translog的时间点视图)。然后，它将每个translog操作发送到目标节点，以便将其重播到新的shard中。

    /**
     * Perform phase two of the recovery process.
     * 
     * Phase two uses a snapshot of the current translog *without* acquiring the write lock (however, the translog snapshot is
     * point-in-time view of the translog). It then sends each translog operation to the target node so it can be replayed into the new
     * shard.
     *
     * @param startingSeqNo              the sequence number to start recovery from, or {@link SequenceNumbers#UNASSIGNED_SEQ_NO} if all
     *                                   ops should be sent
     * @param endingSeqNo                the highest sequence number that should be sent
     * @param snapshot                   a snapshot of the translog
     * @param maxSeenAutoIdTimestamp     the max auto_id_timestamp of append-only requests on the primary
     * @param maxSeqNoOfUpdatesOrDeletes the max seq_no of updates or deletes on the primary after these operations were executed on it.
     * @param listener                   a listener which will be notified with the local checkpoint on the target.
     */
    void phase2(
            final long startingSeqNo,
            final long endingSeqNo,
            final Translog.Snapshot snapshot,
            final long maxSeenAutoIdTimestamp,
            final long maxSeqNoOfUpdatesOrDeletes,
            final RetentionLeases retentionLeases,
            final long mappingVersion,
            final ActionListener listener) throws IOException {
        if (shard.state() == IndexShardState.CLOSED) {
            throw new IndexShardClosedException(request.shardId());
        }
        logger.trace("recovery [phase2]: sending transaction log operations (from [" + startingSeqNo + "] to [" + endingSeqNo + "]");
        final StopWatch stopWatch = new StopWatch().start();
        final StepListener sendListener = new StepListener<>();
        final OperationBatchSender sender = new OperationBatchSender(startingSeqNo, endingSeqNo, snapshot, maxSeenAutoIdTimestamp,
            maxSeqNoOfUpdatesOrDeletes, retentionLeases, mappingVersion, sendListener);
        sendListener.whenComplete(
            ignored -> {
                final long skippedOps = sender.skippedOps.get();
                final int totalSentOps = sender.sentOps.get();
                final long targetLocalCheckpoint = sender.targetLocalCheckpoint.get();
                assert snapshot.totalOperations() == snapshot.skippedOperations() + skippedOps + totalSentOps
                    : String.format(Locale.ROOT, "expected total [%d], overridden [%d], skipped [%d], total sent [%d]",
                    snapshot.totalOperations(), snapshot.skippedOperations(), skippedOps, totalSentOps);
                stopWatch.stop();
                final TimeValue tookTime = stopWatch.totalTime();
                logger.trace("recovery [phase2]: took [{}]", tookTime);
                listener.onResponse(new SendSnapshotResult(targetLocalCheckpoint, totalSentOps, tookTime));
            }, listener::onFailure);
        sender.start();
    }

目标节点开始恢复

接收segment

对应上一小节源节点恢复的第一阶段，源节点将所有有差异的segment发送给目标节点，目标节点接收到后会将segment文件落盘。segment files的写入函数为RecoveryTarget.writeFileChunk:

真正执行的位置：MultiFileWriter.innerWriteFileChunk

public void writeFileChunk(StoreFileMetaData fileMetaData, long position, BytesReference content, boolean lastChunk, int totalTranslogOps) throws IOException {
    final Store store = store();
    final String name = fileMetaData.name();
    ... ...
    if (position == 0) {
        indexOutput = openAndPutIndexOutput(name, fileMetaData, store);
    } else {
        indexOutput = getOpenIndexOutput(name); // 加一层前缀，组成临时文件
    }
    ... ...
    while((scratch = iterator.next()) != null) { 
        indexOutput.writeBytes(scratch.bytes, scratch.offset, scratch.length); // 写临时文件
    }
    ... ...
    store.directory().sync(Collections.singleton(temporaryFileName));  // 这里会调用fsync落盘
}

打开引擎

经过上面的过程，目标节点完成了追数据的第一步。接收完segment后，目标节点打开shard对应的引擎准备接收translog，注意，这里打开引擎后，正在恢复的shard便可进行写入、删除(操作包括primary shard同步的请求和translog中的操作命令)。打开引擎的逻辑如下：

 /**
     * Opens the engine on top of the existing lucene engine and translog.
     * The translog is kept but its operations won't be replayed.
     */
    public void openEngineAndSkipTranslogRecovery() throws IOException {
        assert routingEntry().recoverySource().getType() == RecoverySource.Type.PEER : "not a peer recovery [" + routingEntry() + "]";
        recoveryState.validateCurrentStage(RecoveryState.Stage.TRANSLOG);
        loadGlobalCheckpointToReplicationTracker();
        innerOpenEngineAndTranslog(replicationTracker);
        getEngine().skipTranslogRecovery();
    }

private void innerOpenEngineAndTranslog(LongSupplier globalCheckpointSupplier) throws IOException {
        assert Thread.holdsLock(mutex) == false : "opening engine under mutex";
        if (state != IndexShardState.RECOVERING) {
            throw new IndexShardNotRecoveringException(shardId, state);
        }
        final EngineConfig config = newEngineConfig(globalCheckpointSupplier);

        // we disable deletes since we allow for operations to be executed against the shard while recovering
        // but we need to make sure we don't loose deletes until we are done recovering
        config.setEnableGcDeletes(false);// 恢复过程中不删除translog
        updateRetentionLeasesOnReplica(loadRetentionLeases());
        assert recoveryState.getRecoverySource().expectEmptyRetentionLeases() == false || getRetentionLeases().leases().isEmpty()
            : "expected empty set of retention leases with recovery source [" + recoveryState.getRecoverySource()
            + "] but got " + getRetentionLeases();
        synchronized (engineMutex) {
            assert currentEngineReference.get() == null : "engine is running";
            verifyNotClosed();
            // we must create a new engine under mutex (see IndexShard#snapshotStoreMetadata).
            final Engine newEngine = engineFactory.newReadWriteEngine(config);// 创建engine
            onNewEngine(newEngine);
            currentEngineReference.set(newEngine);
            // We set active because we are now writing operations to the engine; this way,
            // we can flush if we go idle after some time and become inactive.
            active.set(true);
        }
        // time elapses after the engine is created above (pulling the config settings) until we set the engine reference, during
        // which settings changes could possibly have happened, so here we forcefully push any config changes to the new engine.
        onSettingsChanged();
        assert assertSequenceNumbersInCommit();
        recoveryState.validateCurrentStage(RecoveryState.Stage.TRANSLOG);
    }

接收并重放translog

打开引擎后，便可以根据translog中的命令进行相应的回放动作，回放的逻辑和正常的写入、删除类似，这里需要根据translog还原出操作类型和操作数据，并根据操作数据构建相应的数据对象，然后再调用上一步打开的engine执行相应的操作，这块逻辑如下：

IndexShard#runTranslogRecovery
=>
IndexShard#applyTranslogOperation

// 重放translog快照中的translog操作到当前引擎。
// 在成功回放每个translog操作后，会通知回调onOperationRecovered。
 /**
     * Replays translog operations from the provided translog {@code snapshot} to the current engine using the given {@code origin}.
     * The callback {@code onOperationRecovered} is notified after each translog operation is replayed successfully.
     */
    int runTranslogRecovery(Engine engine, Translog.Snapshot snapshot, Engine.Operation.Origin origin,
                            Runnable onOperationRecovered) throws IOException {
        int opsRecovered = 0;
        Translog.Operation operation;
        while ((operation = snapshot.next()) != null) {
            try {
                logger.trace("[translog] recover op {}", operation);
                Engine.Result result = applyTranslogOperation(engine, operation, origin);
                switch (result.getResultType()) {
                    case FAILURE:
                        throw result.getFailure();
                    case MAPPING_UPDATE_REQUIRED:
                        throw new IllegalArgumentException("unexpected mapping update: " + result.getRequiredMappingUpdate());
                    case SUCCESS:
                        break;
                    default:
                        throw new AssertionError("Unknown result type [" + result.getResultType() + "]");
                }

                opsRecovered++;
                onOperationRecovered.run();
            } catch (Exception e) {
                // TODO: Don't enable this leniency unless users explicitly opt-in
                if (origin == Engine.Operation.Origin.LOCAL_TRANSLOG_RECOVERY && ExceptionsHelper.status(e) == RestStatus.BAD_REQUEST) {
                    // mainly for MapperParsingException and Failure to detect xcontent
                    logger.info("ignoring recovery of a corrupt translog entry", e);
                } else {
                    throw ExceptionsHelper.convertToRuntime(e);
                }
            }
        }
        return opsRecovered;
    }

private Engine.Result applyTranslogOperation(Engine engine, Translog.Operation operation,
                                                 Engine.Operation.Origin origin) throws IOException {
        // If a translog op is replayed on the primary (eg. ccr), we need to use external instead of null for its version type.
        final VersionType versionType = (origin == Engine.Operation.Origin.PRIMARY) ? VersionType.EXTERNAL : null;
        final Engine.Result result;
        switch (operation.opType()) {// 还原出操作类型及操作数据并调用engine执行相应的动作
            case INDEX:
                final Translog.Index index = (Translog.Index) operation;
                // we set canHaveDuplicates to true all the time such that we de-optimze the translog case and ensure that all
                // autoGeneratedID docs that are coming from the primary are updated correctly.
                result = applyIndexOperation(engine, index.seqNo(), index.primaryTerm(), index.version(),
                    versionType, UNASSIGNED_SEQ_NO, 0, index.getAutoGeneratedIdTimestamp(), true, origin,
                    new SourceToParse(shardId.getIndexName(), index.id(), index.source(),
                        XContentHelper.xContentType(index.source()), index.routing()));
                break;
            case DELETE:
                final Translog.Delete delete = (Translog.Delete) operation;
                result = applyDeleteOperation(engine, delete.seqNo(), delete.primaryTerm(), delete.version(), delete.id(),
                    versionType, UNASSIGNED_SEQ_NO, 0, origin);
                break;
            case NO_OP:
                final Translog.NoOp noOp = (Translog.NoOp) operation;
                result = markSeqNoAsNoop(engine, noOp.seqNo(), noOp.primaryTerm(), noOp.reason(), origin);
                break;
            default:
                throw new IllegalStateException("No operation defined for [" + operation + "]");
        }
        return result;
    }

通过上面的步骤，translog的重放完毕，此后需要做一些收尾的工作，包括，refresh让回放后的最新数据可见，打开translog gc：

 /**
     * perform the last stages of recovery once all translog operations are done.
     * note that you should still call {@link #postRecovery(String)}.
     */
    public void finalizeRecovery() {
        recoveryState().setStage(RecoveryState.Stage.FINALIZE);
        Engine engine = getEngine();
        engine.refresh("recovery_finalization");
        engine.config().setEnableGcDeletes(true);
    }

到这里，replica shard恢复的两个阶段便完成了，由于此时shard还处于INITIALIZING状态，还需通知master节点启动已恢复的shard：

IndicesClusterStateService#RecoveryListener

        @Override
        public void onRecoveryDone(final RecoveryState state, ShardLongFieldRange timestampMillisFieldRange) {
            shardStateAction.shardStarted(
                    shardRouting,
                    primaryTerm,
                    "after " + state.getRecoverySource(),
                    timestampMillisFieldRange,
                    SHARD_STATE_ACTION_LISTENER);
        }

至此，shard recovery的所有流程都已完成。

小结

疑问点1：网上说的对不对？新版本中有没有更新？

到这里完整跟着腾讯云的文档走了一遍，主体的流程在Elasticsearch 8.0.0-SNAPSHOT中基本一样，只是部分方法有些调整。

所以下面这个流程还是正确的：

ES副本分片恢复主要涉及恢复的目标节点和源节点，目标节点即故障恢复的节点，源节点为提供恢复的节点。目标节点向源节点发送分片恢复请求，源节点接收到请求后主要分两阶段来处理。第一阶段，对需要恢复的shard创建snapshot，然后根据请求中的metadata对比如果 syncid 相同且 doc 数量相同则跳过，否则对比shard的segment文件差异，将有差异的segment文件发送给target node。第二阶段，为了保证target node数据的完整性，需要将本地的translog发送给target node，且对接收到的translog进行回放。整体流程如下图所示：

疑问点2：在分片恢复的时候，如果收到Api _forcemerge请求，这时候，会如何处理?

这部分等看/_forcemerge api的时候，再解答一下。

疑问点3：片恢复的第二阶段是同步translog,这一步会不会加锁？不加锁的话，如何确保是同步完成了？

完整性

首先，phase1阶段，保证了存量的历史数据可以恢复到从分片。phase1阶段完成后，从分片引擎打开，可以正常处理index、delete请求，而translog覆盖完了整个phase1阶段，因此在phase1阶段中的index/delete操作都将被记录下来，在phase2阶段进行translog回放时，副本分片正常的index和delete操作和translog是并行执行的，这就保证了恢复开始之前的数据、恢复中的数据都会完整的写入到副本分片，保证了数据的完整性。如下图所示：

一致性

由于phase1阶段完成后，从分片便可正常处理写入操作，而此时从分片的写入和phase2阶段的translog回放时并行执行的，如果translog的回放慢于正常的写入操作，那么可能会导致老的数据后写入，造成数据不一致。ES为了保证数据的一致性在进行写入操作时，会比较当前写入的版本和lucene文档版本号，如果当前版本更小，说明是旧数据则不会将文档写入lucene。相关代码如下：

// https://github.com/jiankunking/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java#L993
final OpVsLuceneDocStatus opVsLucene = compareOpToLuceneDocBasedOnSeqNo(index);
if (opVsLucene == OpVsLuceneDocStatus.OP_STALE_OR_EQUAL) {
    plan = IndexingStrategy.processAsStaleOp(index.version(), 0);
}

拓展:如何保证副分片和主分片一致

整理自:《Elasticsearch源码解析与优化实战》

索引恢复过程的一个难点在于如何维护主副分片的一致性。假设副分片恢复期间一直有写操作，如何实现一致呢？
我们先看看早期的做法：在2.0版本之前，副分片恢复要经历三个阶段。

phase1：将主分片的 Lucene 做快照，发送到 target。期间不阻塞索引操作，新增数据写到主分片的translog。
phase2：将主分片translog做快照，发送到target重放，期间不阻塞索引操作。
phase3：为主分片加写锁，将剩余的translog发送到target。此时数据量很小，写入过程的阻塞很短。

从理论上来说，只要流程上允许将写操作阻塞一段时间，实现主副一致是比较容易的。

但是后来(从2.0版本开始)，也就是引入translog.view概念的同时，phase3被删除。

phase3被删除，这个阶段是重放操作(operations)，同时防止新的写入 Engine。这是不必要的，因为自恢复开始，标准的 index 操作会发送所有的操作到正在恢复中的分片。重放恢复开始时获取的view中的所有操作足够保证不丢失任何操作。

阻塞写操作的phase3被删除，恢复期间没有任何写阻塞过程。接下来需要处理的就是解决phase1和phase2之间的写操作与phase2重放操作之间的时序和冲突问题。在副分片节点，phase1结束后，假如新增索引操作和 translog 重放操作并发执行，因为时序的关系会出现新老数据交替。如何实现主副分片一致呢？

假设在第一阶段执行期间，有客户端索引操作要求将docA的内容写为1，主分片执行了这个操作，而副分片由于尚未就绪所以没有执行。第二阶段期间客户端索引操作要求写 docA 的内容为2，此时副分片已经就绪，先执行将docA写为2的新增请求，然后又收到了从主分片所在节点发送过来的translog重复写docA为1的请求该如何处理？具体流程如下图所示。

答案是在写流程中做异常处理，通过版本号来过滤掉过期操作。写操作有三种类型：索引新文档、更新、删除。索引新文档不存在冲突问题，更新和删除操作采用相同的处理机制。每个操作都有一个版本号，这个版本号就是预期doc版本，它必须大于当前Lucene中的doc版本号，否则就放弃本次操作。对于更新操作来说，预期版本号是Lucene doc版本号+1。主分片节点写成功后新数据的版本号会放到写副本的请求中，这个请求中的版本号就是预期版本号。

这样，时序上存在错误的操作被忽略，对于特定doc，只有最新一次操作生效，保证了主副分片一致。

我们分别看一下写操作三种类型的处理机制。

1．索引新文档

不存在冲突问题，不需要处理。

2．更新

判断本次操作的版本号是否小于Lucene中doc的版本号，如果小于，则放弃本次操作。

Index、Delete都继承自Operation，每个Operation都有一个版本号，这个版本号就是doc版本号。对于副分片的写流程来说，正常情况下是主分片写成功后，相应doc写入的版本号被放到转发写副分片的请求中。对于更新来说，就是通过主分片将原doc版本号+1后转发到副分片实现的。在对比版本号的时候：

expectedVersion = 写副分片请求中的 version = 写主分片成功后的version

通过下面的方法判断当前操作的版本号是否低于Lucene中的版本号：

// VersionType#isVersionConflictForWrites
 EXTERNAL((byte) 1) {
        @Override
        public boolean isVersionConflictForWrites(long currentVersion, long expectedVersion, boolean deleted) {
            if (currentVersion == Versions.NOT_FOUND) {
                return false;
            }
            if (expectedVersion == Versions.MATCH_ANY) {
                return true;
            }
            if (currentVersion >= expectedVersion) {
                return true;
            }
            return false;
        }
    }

如果translog重放的操作在写一条"老"数据，则compareOpToLuceneDocBasedOnSeqNo会返回OpVsLuceneDocStatus.OP_STALE_OR_EQUAL。

private OpVsLuceneDocStatus compareOpToLuceneDocBasedOnSeqNo(final Operation op) throws IOException {
        assert op.seqNo() != SequenceNumbers.UNASSIGNED_SEQ_NO : "resolving ops based on seq# but no seqNo is found";
        final OpVsLuceneDocStatus status;
        VersionValue versionValue = getVersionFromMap(op.uid().bytes());
        assert incrementVersionLookup();
        if (versionValue != null) {
            status = compareOpToVersionMapOnSeqNo(op.id(), op.seqNo(), op.primaryTerm(), versionValue);
        } else {
            // load from index
            assert incrementIndexVersionLookup();
            try (Searcher searcher = acquireSearcher("load_seq_no", SearcherScope.INTERNAL)) {
                final DocIdAndSeqNo docAndSeqNo = VersionsAndSeqNoResolver.loadDocIdAndSeqNo(searcher.getIndexReader(), op.uid());
                if (docAndSeqNo == null) {
                    status = OpVsLuceneDocStatus.LUCENE_DOC_NOT_FOUND;
                } else if (op.seqNo() > docAndSeqNo.seqNo) {
                    status = OpVsLuceneDocStatus.OP_NEWER;
                } else if (op.seqNo() == docAndSeqNo.seqNo) {
                    assert localCheckpointTracker.hasProcessed(op.seqNo()) :
                        "local checkpoint tracker is not updated seq_no=" + op.seqNo() + " id=" + op.id();
                    status = OpVsLuceneDocStatus.OP_STALE_OR_EQUAL;
                } else {
                    status = OpVsLuceneDocStatus.OP_STALE_OR_EQUAL;
                }
            }
        }
        return status;
    }

副分片在InternalEngine#index函数中通过plan判断是否写到Lucene：

// non-primary mode (i.e., replica or recovery)
final IndexingStrategy plan = indexingStrategyForOperation(index);

在 indexingStrategyForOperation函数中，plan的最终结果就是plan =IndexingStrategy.processButSkipLucene，后面会跳过写Lucene和translog的逻辑。

3．删除

判断本次操作中的版本号是否小于Lucene中doc的版本号，如果小于，则放弃本次操作。

通过compareOpToLuceneDocBasedOnSeqNo方法判断本次操作是否小于Lucenne中doc的版本号，与Index操作时使用相同的比较函数。

类似的，在InternalEngine#delete函数中判断是否写到Lucene：

final DeletionStrategy plan = deletionStrategyForOperation(delete);

如果translog重放的是一个"老"的删除操作，则compareOpToLuceneDocBasedOnSeqNo会返回OpVsLuceneDocStatus.OP_STALE_OR_EQUAL。

plan的最终结果就是plan=DeletionStrategy.processButSkipLucene，后面会跳过Lucene删除的逻辑。

  /**
     * the status of the current doc version in lucene, compared to the version in an incoming
     * operation
     */
    enum OpVsLuceneDocStatus {
        /** the op is more recent than the one that last modified the doc found in lucene*/
        OP_NEWER,
        /** the op is older or the same as the one that last modified the doc found in lucene*/
        OP_STALE_OR_EQUAL,
        /** no doc was found in lucene */
        LUCENE_DOC_NOT_FOUND
    }

你可能感兴趣的:(ElasticSearch,elasticsearch,分片,恢复,shard,recovery,源码,分析)

探索NebulaGraph：一个开源分布式图数据库的技术解析一休哥助手数据库分布式系统开源分布式数据库
1.介绍NebulaGraph的定位和用途NebulaGraph是一款开源的分布式图数据库，专注于存储和处理大规模图数据。它的主要定位是为了解决图数据存储和分析的问题，能够处理节点和边数量巨大、结构复杂的图结构数据。NebulaGraph被设计用来应对各种领域的图数据挑战，包括社交网络分析、推荐系统、网络安全监测等。无论是从数据量还是计算复杂度上，NebulaGraph都能够应对各种挑战，为用户提
C语言如何生成随机数？(过程逐步分析) 祁同伟. #C语言 c语言
先给大家分享一个查阅函数的网站：cplusplus.com-TheC++ResourcesNetwork我们通过一道题讲解：实现1-100的猜数字游戏先将代码大框架罗列出来：voidmenu(){printf("**********1.play***********\n");printf("**********0.eixt***********\n");}voidgame(){}voidtest(
前端实现埋点&监控 Cipher_Y 前端
前端实现埋点&监控实现埋点功能的意义主要体现在以下几个方面：数据采集：埋点是数据采集领域（尤其是用户行为数据采集领域）的术语，它针对特定用户行为或事件进行捕获、处理和发送的相关技术及其实施过程。通过埋点，可以收集到用户在应用中的所有行为数据，例如页面浏览、按钮点击、表单提交等。数据分析：采集的数据可以帮助业务人员分析网站或者App的使用情况、用户行为习惯等，是后续建立用户画像、用户行为路径等数据产
分片文件异步合并上传零三零等哈来 java spring 前端
对于大文件，为了避免上传导致网络带宽不够用，还有避免内存溢出，我们采用分片异步上传。controller层，在前端对文件进行分片，先计算文件md5码，方便后续文件秒传，然后再计算可以分成多少个分片，得到分片总数以及当前分片下标。@RequestMapping("/uploadFile")@SentinelResource(value="uploadFile",blockHandler="uploa
数据库数值函数详解 web安全工具库数据库 oracle jvm
各类资料学习下载合集https://pan.quark.cn/s/8c91ccb5a474数值函数是数据库中用于处理数值数据的函数，可以用于执行各种数学运算、统计计算等。数值函数在数据分析及处理时非常重要，能够帮助我们进行数据的聚合、计算和转换。在本篇博客中，我们将详细介绍常用的数据库数值函数，并通过Python和SQLite进行示例，帮助您理解和应用这些函数。1.数值函数的基本概念数值函数是用于
加州CA 65测试（Proposition 65）的深度解读南京速跃检测技术服务有限公司学习方法创业创新
以下是关于加州CA65测试（Proposition65）的深度解读，结合法规核心、测试范围及合规影响进行结构化分析：一、法规背景与核心要求1.法规起源-名称：《1986年加州安全饮用水和有毒物质执行法》（SafeDrinkingWaterandToxicEnforcementAct），简称CA65或Prop65。-目的：保护加州居民免受致癌、致畸或生殖毒性化学物质的暴露风险，要求企业提供清晰警告标
数据安全新纪元——多方安全计算与MySQL结合的隐私预算管理深度解析墨夶数据库学习资料1 安全 mysql android
在当今数字化时代，数据已成为企业最宝贵的资产之一。然而，随着数据泄露事件频发，如何确保数据的安全性和隐私性成为了亟待解决的问题。传统的加密技术虽然能在一定程度上保护静态数据，但在动态数据分析过程中却显得力不从心。为了解决这一难题，隐私计算作为一种新兴的技术应运而生，它允许在不解密原始数据的前提下进行有效的计算和分析。本文将深入探讨如何利用多方安全计算（MPC）与关系型数据库MySQL相结合的方式实
股票市场的量化交易策略如何应对市场情绪变化？云策量化程序化炒股量化软件量化交易量化炒股 QMT 股票交易 PTrade 量化交易股票投资 deepseek
推荐阅读：《程序化炒股：如何申请官方交易接口权限？个人账户可以申请吗？》股票市场的量化交易策略如何应对市场情绪变化？在股票市场中，量化交易策略是一种基于数学模型和算法的交易方式，它通过分析历史数据来预测未来价格走势，并据此制定交易决策。然而，市场情绪的变化对股票价格有着不可忽视的影响。本文将探讨量化交易策略如何应对市场情绪的变化，并提供一些具体的代码示例。一、市场情绪的重要性市场情绪是指投资者对市
Deepseek 个性化决策输出 meisongqing DeepSeek 个性化
Deepseek个性化决策输出：基于用户画像的定制化内容生成在教育场景中，通过构建动态用户画像与智能决策模型，教育数字人可基于学生水平实时调整讲解深度，实现精准化、个性化的学习支持。以下是核心实现框架与关键步骤：1.用户画像构建：多维度数据融合数据采集：显性数据：年龄、学科成绩、测试结果、学习时长、知识点掌握进度。隐性数据：交互行为（如答题犹豫时间、回放次数）、情绪识别（语音/表情分析）、认知负荷
“统计视角看世界”专栏阅读引导赛卡统计视角看世界信息可视化数据分析
根据文章主题和逻辑关系，我为您设计以下阅读引导方案：1.六西格玛基础2.帕累托图3.直方图4.散点图基础5.散点图高阶6.多变量可视化7.密度图进阶8.回归分析配套文字说明：入门基石（必读）《1.六西格玛遇上Python》→方法论总纲，建议优先精读基础三剑客（可并行）├─《2.帕累托图》→重点数据排序与决策├─《3.直方图》→数据分布核心工具└─《4.散点图》→数据探索第一视角高阶应用链（递进学习
UI/UX设计服务行业分析 LPiling ui ux
行业现状UI（用户界面）设计关注用户与产品交互的界面设计，包括软件、应用程序、网站或任何数字产品的视觉和操作元素的集合，旨在提供用户友好的界面，使用户能够轻松地使用产品并实现他们的目标。UX（用户体验）设计则更为宏观，关注用户与产品交互过程中的全部体验，包括使用前、使用中和使用后的感受，目标是优化产品的功能性、可用性、易用性，确保用户在使用产品的过程中有良好的体验。近年来，随着技术的不断进步和用户
Kubernetes 资源管理实战：合理配置 CPU 与内存请求和限制 XMYX-0 K8S kubernetes 容器
文章目录Kubernetes资源管理实战：合理配置CPU与内存请求和限制理解Kubernetes中的资源请求与限制资源请求（Requests）资源限制（Limits）单位解析案例分析：20GB服务器与两个服务的内存配置是否有必要设置如此高的内存限制？如何合理配置？补充知识点：监控与自动扩缩容监控工具自动扩缩容（Autoscaling）总结Kubernetes资源管理实战：合理配置CPU与内存请求和
c++介绍进程和线程区别此刻我在家里喂猪呢 c++c++
进程是程序运行的实例，是操作系统分配的资源的基本单位，每个进程有自己独立的地址空间，数据，代码段，相互独立。特点：独立性：进程之间的资源相互独立，一个进程的崩溃不会影响其他进程。资源分配单位：每个进程有独立的内存空间，文件句柄，全局变量。进程间通信复杂：由于进程之间相互独立，进程通信需要额外的进制（如管道，消息队列，信号号，信号量，共享内存等）。进程切换开销大：切换进程时，操作系统要保存和恢复寄存
程序员晋升架构师实战指南甘苦人生职业规划职场和发展
以下是为程序员量身定制的晋升架构师实战指南，结合行业案例与可落地路径，助你完成技术跃迁：一、晋升路径拆解（从Code到Architecture）程序员→高级工程师核心任务：独立完成模块开发（需求分析+方案设计+编码实现）技术重点：掌握1-2门核心语言（如Java/Go）、熟悉主流框架（SpringCloud/Dubbo）案例：主导用户中心模块开发，通过缓存优化将接口响应时间从800ms降至150m
2024MathorCup数学建模之——MathorCup奖杯”获得者经验思路分享美赛数学建模数学建模
一、经验分享1.工具选择：顺手即可。Matlab和Python都是比较主流的选择，二者的应用场合各有不同。Python在数据分析、深度学习方面的优势愈发明显，而Matlab更适合进行物理仿真和数值计算。不过随着Python社区不断发展，其功能也愈发全面与强大，因此我们比较推荐学有余力的情况下可以更早接触Python。2.模型算法：多多益善。不一定要精通所有的算法，但是手上至少要准备一些常用的算法（
Spring Boot 整合 RabbitMQ：注解声明队列与交换机详解 Cloud_. java-rabbitmq spring boot rabbitmq MQ 消息队列
RabbitMQ作为一款高性能的消息中间件，在分布式系统中广泛应用。SpringBoot通过spring-boot-starter-amqp提供了对RabbitMQ的无缝集成，开发者可以借助注解快速声明队列、交换机及绑定规则，极大简化了配置流程。本文将通过代码示例和原理分析，详细介绍如何用注解实现RabbitMQ的集成，并深入解析交换机的作用与类型。一、环境准备1.添加依赖在pom.xml中引入S
量子化学仿真软件：Quantum Espresso_（8）.dos.x模块使用 kkchenjj 分子动力学2 分子动力学仿真模拟模拟仿真人工智能
dos.x模块使用在量子化学仿真软件中，dos.x模块用于计算和分析能态密度（DensityofStates,DOS）。能态密度是描述材料电子结构的重要物理量，可以提供关于材料能带结构、电子态分布和电子性质的详细信息。本节将详细介绍如何使用dos.x模块进行能态密度的计算和分析。1.基本概念1.1能态密度（DOS）定义能态密度（DOS）是指单位能量区间内的量子态数。在固体物理中，DOS可以描述材料
java:实现设置窗体背景颜色为淡蓝色（附带源码） Katie。 Java 实战项目 java 信息可视化开发语言
一、项目简介在桌面应用开发中，窗体背景颜色作为界面设计的重要组成部分，不仅影响整体美观，还能传递特定的情感和品牌信息。本项目旨在使用JavaSwing简单实现将窗体背景颜色设置为淡蓝色效果。该示例展示了如何创建一个基本的JFrame，并通过调用其内容面板的setBackground()方法，设置背景颜色为淡蓝色（RGB值173,216,230）。通过本项目，初学者可以了解Swing基本组件的使用方
使用欧拉法数值求解微分方程的 Python 实现神经网络15044 python 深度学习算法 python 开发语言
编写函数y=Eular(x,h)，使用欧拉法数值求解微分方程初值为函数Eular(x,h)中Cx为计算结束时微分方程x的值，h为计算步长再编写脚本，通过调用函数分别以不同步长(例如h=1.0，h=0.5，h=0.25)计算y(3)，并分析步长和误差之间的关系。以下是使用欧拉法数值求解微分方程的Python实现。假设我们要求解的微分方程是dydx=f(x,y)\frac{dy}{dx}=f(x,y)
若依框架二次开发——启动 RuoYi-Cloud 微服务项目 bjzhang75 项目开发实践微服务若依
文章目录前期准备第一步：拉取RuoYi-Cloud项目源码第二步：初始化数据库1.创建数据库2.导入数据第三步：配置Nacos并启用持久化1.下载并解压Nacos2.启动Nacos3.访问Nacos控制台第四步：安装并运行Redis1.安装Redis2.启动Redis第五步：修改后端配置第六步：启动后端服务第七步：启动前端项目1.进入前端项目目录2.安装前端依赖3.启动前端第八步：访问系统总结Ru
燃爆！程序员如何借助 AI 大模型冲破编程效率枷锁？（以DeepSeek，ChatGPT为例）羑悻的小杀马特. AI学习 chatgpt deepseek AI大模型开发语言
AI大模型已成为程序员提升效率的有力助手。本文聚焦DeepSeek和ChatGPT，探讨程序员如何借其冲破编程效率枷锁。在代码编写阶段，它们能快速生成基础框架、实现特定功能及复杂算法代码；调试时，精准分析错误并给出优化建议；文档生成方面，为函数、类及项目文档助力。程序员需掌握高效交互技巧，结合自身经验，合理利用AI大模型，全面提升编程效率，开启高效编程新境界。目录一·本篇背景：二、AI大模型简介2
Axios源码深度剖析 - XHR篇 IT博客技术分享 ajax node.js javascript
Axios源码深度剖析-XHR篇#Axios源码深度剖析-XHR篇[axios](https://github.com/axios/axios)是一个基于Promise的http请求库，可以用在浏览器和node.js中，目前在github上有42K的star数##分析axios-目录-[axios项目目录结构](#axios项目目录结构)-[名词解释](#名词解释)-[axios内部的运作流程图]
远程视像搬运小车控制系统设计(源码+万字报告+实物) 炳烛之明科技 stm32 嵌入式硬件单片机
目录第1章绪论11.1研究目的及意义11.2国内外研究现状21.3主要研究内容3第2章系统的总体结构42.1总体方案设计42.2功能需求分析42.2.1技术路线42.3单片机型号选择5第3章系统的硬件部分设计63.1系统总体设计63.2系统的主要功能模块设计63.2.1蜂鸣器电路模块设计63.2.2YX4055AM驱动电路模块设计73.2.3按键电路模块设计73.2.4颜色识别传感器模块设计83.
k8s--集群内的pod调用集群外的服务 IT艺术家-rookie k8s与docker容器技术 kubernetes 容器云原生
关于如何让同一个局域网内的Kubernetes服务的Pod访问同一局域网中的电脑上的服务。可能的解决方案包括使用ClusterIP、NodePort、HeadlessService、HostNetwork、ExternalIPs，或者直接使用Pod网络。每种方法都有不同的适用场景，需要逐一分析。例如，ClusterIP是默认的，只能在集群内部访问，所以可能需要其他方式。NodePort会在每个节点
Python 单例模式的 5 种实现方式：深入解析与最佳实践做测试的小薄测试高阶 python 单例模式自动化测试测试框架
单例模式（SingletonPattern）是一种经典的设计模式，其核心思想是确保一个类在整个程序运行期间只有一个实例，并提供一个全局访问点。这种模式在许多场景中非常有用，例如全局配置管理、日志记录器、数据库连接池等。然而，Python的灵活性使得实现单例模式有多种方式，每种方法都有其特点和适用场景。本文将详细介绍Python中实现单例模式的5种常见方法，并深入分析它们的优缺点以及适用场景，帮助您
SAP库龄计算报表（源码） SAP 的寒 SAP精品资源 ABAP
一个简单的库龄计算报表，根据移动类型来判断最后移动日期，包含批次和非批次库存。*&---------------------------------------------------------------------**&ReportZMMR_016*&---------------------------------------------------------------------**
硬核项目 KV 存储，轻松拿捏面试官！程序员老舅 C++Linux后端 KV存储 C++C++后端开发 Redis 内存索引 C++数据结构
硬核项目KV存储，轻松拿捏面试官！在简历上如何写这个项目？项目概述基于Bitcask模型，兼容Redis数据结构和协议的高性能KV存储引擎设计细节采用Key/Value的数据模型，实现数据存储和检索的快速、稳定、高效存储模型：采用Bitcask存储模型，具备高吞吐量和低读写放大的特征持久化：实现了数据的持久化，确保数据的可靠性和可恢复性索引：多种内存索引结构，高效、快速数据访问并发控制：使用锁机制
如何使用Langchain加载AZLyrics网页到可用文档格式 dgay_hua langchain python
##技术背景介绍在处理歌词数据时，尤其是从网页上获取歌词文本内容，用于自然语言处理或文本分析是常见的需求。AZLyrics是一个提供歌词的主要平台，为我们提供了大量的歌词数据。如果我们可以将这些网页内容自动加载到结构化的文档格式中，将极大地提升我们处理和分析歌词的效率。##核心原理解析Langchain提供了一种简单的方式来将网页内容转换为可用的文档格式。通过使用其文档加载器（DocumentLo
【星闪开发连载】WS63E模块的雷达功能浅析神一样的老师星闪技术 OpenHarmony 物联网
目录引言功能简介程序分析操作步骤简单测试结语引言WS63E星闪模块有个特色功能就是雷达运动感知，检测物体是否有运动，作用距离不超过6米。hi3863芯片本身不带雷达功能，是模块提供的相关功能。海思还有个WS63星闪模块，没有雷达感知能力。功能简介从开发板的图片上可以看到，右下角有个安装雷达天线的地方，使用使用1代IPEX接口。润和的套件里面没有带天线，从我的测试看没有天线，其实雷达功能是不正常的。
Radiance Fields from VGGSfM和Mast3r:两种先进3D重建方法的比较与分析 2401_87458718 3d
VGGSfM和Mast3r:3D场景重建的新方向在计算机视觉和3D重建领域,如何从2D图像重建3D场景一直是一个充满挑战的研究课题。近年来,随着深度学习技术的发展,一些新的方法被提出并取得了显著的进展。本文将重点介绍两种最新的基于深度学习的3D重建方法:VGGSfM和Mast3r,并通过GaussianSplatting技术对它们的性能进行全面比较和分析。VGGSfM:基于视觉几何的深度结构运动恢
PHP，安卓，UI，java，linux视频教程合集 cocos2d-x小菜 java UI PHP android linux
╔-----------------------------------╗┆
各表中的列名必须唯一。在表 'dbo.XXX' 中多次指定了列名 'XXX'。 bozch .net .net mvc
在.net mvc5中，在执行某一操作的时候，出现了如下错误：各表中的列名必须唯一。在表 'dbo.XXX' 中多次指定了列名 'XXX'。经查询当前的操作与错误内容无关，经过对错误信息的排查发现，事故出现在数据库迁移上。回想过去：在迁移之前已经对数据库进行了添加字段操作，再次进行迁移插入XXX字段的时候，就会提示如上错误。 &
Java 对象大小的计算 e200702084 java
Java对象的大小如何计算一个对象的大小呢？
Mybatis Spring 171815164 mybatis
ApplicationContext ac = new ClassPathXmlApplicationContext("applicationContext.xml"); CustomerService userService = (CustomerService) ac.getBean("customerService"); Customer cust
JVM 不稳定参数 g21121 jvm
-XX 参数被称为不稳定参数，之所以这么叫是因为此类参数的设置很容易引起JVM 性能上的差异，使JVM 存在极大的不稳定性。当然这是在非合理设置的前提下，如果此类参数设置合理讲大大提高JVM 的性能及稳定性。可以说“不稳定参数”
用户自动登录网站永夜-极光用户
1.目标:实现用户登录后,再次登录就自动登录,无需用户名和密码 2.思路:将用户的信息保存为cookie 每次用户访问网站,通过filter拦截所有请求,在filter中读取所有的cookie,如果找到了保存登录信息的cookie,那么在cookie中读取登录信息,然后直接
centos7 安装后失去win7的引导记录程序员是怎么炼成的操作系统
1.使用root身份(必须)打开 /boot/grub2/grub.cfg 2.找到 ### BEGIN /etc/grub.d/30_os-prober ### 在后面添加 menuentry "Windows 7 (loader) (on /dev/sda1)" {
Oracle 10g 官方中文安装帮助文档以及Oracle官方中文教程文档下载 aijuans oracle
Oracle 10g 官方中文安装帮助文档下载：http://download.csdn.net/tag/Oracle%E4%B8%AD%E6%96%87API%EF%BC%8COracle%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3%EF%BC%8Coracle%E5%AD%A6%E4%B9%A0%E6%96%87%E6%A1%A3 Oracle 10g 官方中文教程
JavaEE开源快速开发平台G4Studio_V3.2发布了無為子 AOP oracle mysql javaee G4Studio
我非常高兴地宣布,今天我们最新的JavaEE开源快速开发平台G4Studio_V3.2版本已经正式发布。大家可以通过如下地址下载。访问G4Studio网站 http://www.g4it.org G4Studio_V3.2版本变更日志功能新增 (1).新增了系统右下角滑出提示窗口功能。 (2).新增了文件资源的Zip压缩和解压缩
Oracle常用的单行函数应用技巧总结百合不是茶日期函数转换函数(核心)数字函数通用函数(核心)字符函数
单行函数; 字符函数,数字函数,日期函数,转换函数(核心),通用函数(核心) 一:字符函数: .UPPER(字符串) 将字符串转为大写 .LOWER (字符串) 将字符串转为小写 .INITCAP(字符串) 将首字母大写 .LENGTH (字符串) 字符串的长度 .REPLACE(字符串,'A','_') 将字符串字符A转换成_
Mockito异常测试实例 bijian1013 java 单元测试 mockito
Mockito异常测试实例： package com.bijian.study; import static org.mockito.Mockito.mock; import static org.mockito.Mockito.when; import org.junit.Assert; import org.junit.Test; import org.mockito.
GA与量子恒道统计 Bill_chen JavaScript 浏览器百度 Google 防火墙
前一阵子，统计**网址时，Google Analytics（GA）和量子恒道统计（也称量子统计），数据有较大的偏差，仔细找相关资料研究了下，总结如下：为何GA和量子网站统计（量子统计前身为雅虎统计）结果不同？首先：没有一种网站统计工具能保证百分之百的准确出现该问题可能有以下几个原因：（1）不同的统计分析系统的算法机制不同；（2）统计代码放置的位置和前后
【Linux命令三】Top命令 bit1129 linux命令
Linux的Top命令类似于Windows的任务管理器，可以查看当前系统的运行情况，包括CPU、内存的使用情况等。如下是一个Top命令的执行结果： top - 21:22:04 up 1 day, 23:49, 1 user, load average: 1.10, 1.66, 1.99 Tasks: 202 total, 4 running, 198 sl
spring四种依赖注入方式白糖_ spring
平常的java开发中，程序员在某个类中需要依赖其它类的方法，则通常是new一个依赖类再调用类实例的方法，这种开发存在的问题是new的类实例不好统一管理，spring提出了依赖注入的思想，即依赖类不由程序员实例化，而是通过spring容器帮我们new指定实例并且将实例注入到需要该对象的类中。依赖注入的另一种说法是“控制反转”，通俗的理解是：平常我们new一个实例，这个实例的控制权是我
angular.injector boyitech AngularJS AngularJS API
angular.injector 描述: 创建一个injector对象, 调用injector对象的方法可以获得angular的service, 或者用来做依赖注入. 使用方法: angular.injector(modules, [strictDi]) 参数详解: Param Type Details mod
java-同步访问一个数组Integer[10]，生产者不断地往数组放入整数1000，数组满时等待；消费者不断地将数组里面的数置零，数组空时等待 bylijinnan Integer
public class PC { /** * 题目：生产者-消费者。 * 同步访问一个数组Integer[10]，生产者不断地往数组放入整数1000，数组满时等待；消费者不断地将数组里面的数置零，数组空时等待。 */ private static final Integer[] val=new Integer[10]; private static
使用Struts2.2.1配置 Chen.H apache spring Web xml struts
Struts2.2.1 需要如下 jar包: commons-fileupload-1.2.1.jar commons-io-1.3.2.jar commons-logging-1.0.4.jar freemarker-2.3.16.jar javassist-3.7.ga.jar ognl-3.0.jar spring.jar struts2-core-2.2.1.jar struts2-sp
[职业与教育]青春之歌 comsci 教育
每个人都有自己的青春之歌............但是我要说的却不是青春... 大家如果在自己的职业生涯没有给自己以后创业留一点点机会,仅仅凭学历和人脉关系,是难以在竞争激烈的市场中生存下去的.... &nbs
oracle连接(join)中使用using关键字 daizj JOIN oracle sql using
在oracle连接(join)中使用using关键字 34. View the Exhibit and examine the structure of the ORDERS and ORDER_ITEMS tables. Evaluate the following SQL statement: SELECT oi.order_id, product_id, order_date FRO
NIO示例 daysinsun nio
NIO服务端代码： public class NIOServer { private Selector selector; public void startServer(int port) throws IOException { ServerSocketChannel serverChannel = ServerSocketChannel.open(
C语言学习homework1 dcj3sjt126com c homework
0、课堂练习做完 1、使用sizeof计算出你所知道的所有的类型占用的空间。 int x; sizeof(x); sizeof(int); # include <stdio.h> int main(void) { int x1; char x2; double x3; float x4; printf(&quo
select in order by , mysql排序 dcj3sjt126com mysql
If i select like this: SELECT id FROM users WHERE id IN(3,4,8,1); This by default will select users in this order 1,3,4,8, I would like to select them in the same order that i put IN() values so:
页面校验-新建项目 fanxiaolong 页面校验
$(document).ready( function() { var flag = true; $('#changeform').submit(function() { var projectScValNull = true; var s =""; var parent_id = $("#parent_id").v
Ehcache（02）——ehcache.xml简介 234390216 ehcache ehcache.xml 简介
ehcache.xml简介 ehcache.xml文件是用来定义Ehcache的配置信息的，更准确的来说它是定义CacheManager的配置信息的。根据之前我们在《Ehcache简介》一文中对CacheManager的介绍我们知道一切Ehcache的应用都是从CacheManager开始的。在不指定配置信
junit 4.11中三个新功能 jackyrong java
junit 4.11中两个新增的功能，首先是注解中可以参数化，比如 import static org.junit.Assert.assertEquals; import java.util.Arrays; import org.junit.Test; import org.junit.runner.RunWith; import org.junit.runn
国外程序员爱用苹果Mac电脑的10大理由 php教程分享 windows PHP unix Microsoft perl
Mac 在国外很受欢迎，尤其是在设计/web开发/IT 人员圈子里。普通用户喜欢 Mac 可以理解，毕竟 Mac 设计美观，简单好用，没有病毒。那么为什么专业人士也对 Mac 情有独钟呢？从个人使用经验来看我想有下面几个原因： 1、Mac OS X 是基于 Unix 的这一点太重要了，尤其是对开发人员，至少对于我来说很重要，这意味着Unix 下一堆好用的工具都可以随手捡到。如果你是个 wi
位运算、异或的实际应用 wenjinglian 位运算
一．位操作基础，用一张表描述位操作符的应用规则并详细解释。二．常用位操作小技巧，有判断奇偶、交换两数、变换符号、求绝对值。三．位操作与空间压缩，针对筛素数进行空间压缩。 &n
weblogic部署项目出现的一些问题（持续补充中……） Everyday都不同 weblogic部署失败
好吧，weblogic的问题确实…… 问题一： org.springframework.beans.factory.BeanDefinitionStoreException: Failed to read candidate component class: URL [zip:E:/weblogic/user_projects/domains/base_domain/serve
tomcat7性能调优（01） toknowme tomcat7
Tomcat优化： 1、最大连接数最大线程等设置 <Connector port="8082" protocol="HTTP/1.1" useBodyEncodingForURI="t
PO VO DAO DTO BO TO概念与区别 xp9802 java DAO 设计模式 bean 领域模型
O/R Mapping 是 Object Relational Mapping（对象关系映射）的缩写。通俗点讲，就是将对象与关系数据库绑定，用对象来表示关系数据。在O/R Mapping的世界里，有两个基本的也是重要的东东需要了解，即VO，PO。它们的关系应该是相互独立的，一个VO可以只是PO的部分，也可以是多个PO构成，同样也可以等同于一个PO（指的是他们的属性）。这样，PO独立出来，数据持

【Elasticsearch源码】 分片恢复分析

目的