Receive the bulk request -> decide whether indices need to be auto-created -> process the bulk request (parse the requests -> build a per-shard map -> resolve the shardId for each request -> execute the indexing) -> write to the primary -> write to the replicas
A bulk request bundles many individual request instances into a single BulkRequest. The entry point is org.elasticsearch.rest.action.bulk.RestBulkAction: each HTTP request builds one BulkRequest object, and BulkRequest.add parses the submitted text.
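For reference, the text that BulkRequest.add parses is the standard bulk body: newline-delimited JSON, where each action/metadata line may be followed by a source line (the index name, type and ids below are purely illustrative):

POST /_bulk
{ "index": { "_index": "logs", "_type": "doc", "_id": "1" } }
{ "message": "hello" }
{ "delete": { "_index": "logs", "_type": "doc", "_id": "2" } }

Each parsed action line becomes one IndexRequest/DeleteRequest/UpdateRequest inside the BulkRequest.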
Processing path:
RestBulkAction ->TransportBulkAction ->TransportShardBulkAction
TransportShardBulkAction sits in an inheritance hierarchy: the main entry point is TransportAction, while the concrete logic lives in its subclass (TransportReplicationAction) and grand-subclass (TransportShardBulkAction).
TransportShardBulkAction < TransportReplicationAction < TransportAction
RestBulkAction:
bulkRequest.add(request.content(), defaultIndex, defaultType, defaultRouting, defaultFields, null, allowExplicitIndex);
//the client here is actually a NodeClient; the NodeClient forwards the request to TransportBulkAction
client.bulk(bulkRequest, new RestBuilderListener<BulkResponse>(channel){…})
What the server receives is a BulkRequest instance. With that instance in hand, the setting action.auto_create_index decides whether indices may be created automatically (they may, by default). If auto-creation is allowed, all requests are walked to collect the indices and types they reference; any index/type that does not yet exist in the cluster is created (the creation process itself is not covered here), and only after that does the real bulk execution begin.
TransportBulkAction extends HandledTransportAction, which means this class also acts as the request handler.
HandledTransportAction:
@Override
public final void messageReceived(final Request request, final TransportChannel channel, Task task) throws Exception {
// We already got the task created on the netty layer - no need to create it again on the transport layer
execute(task, request, new ActionListener<Response>() {
public class TransportBulkAction extends HandledTransportAction<BulkRequest, BulkResponse> {
·························
protected void doExecute(final BulkRequest bulkRequest, final ActionListener<BulkResponse> listener) {
·······
if (autoCreateIndex.needToCheck()) {
// Keep track of all unique indices and all unique types per index for the create index requests:
final Map<String, Set<String>> indicesAndTypes = new HashMap<>();
for (ActionRequest request : bulkRequest.requests) {
if (request instanceof DocumentRequest) {
DocumentRequest req = (DocumentRequest) request;
Set<String> types = indicesAndTypes.get(req.index());
if (types == null) {
indicesAndTypes.put(req.index(), types = new HashSet<>());
}
types.add(req.type());
} else {
throw new ElasticsearchException("Parsed unknown request in bulk actions: " + request.getClass().getSimpleName());
}
}
final AtomicInteger counter = new AtomicInteger(indicesAndTypes.size());
ClusterState state = clusterService.state();
for (Map.Entry<String, Set<String>> entry : indicesAndTypes.entrySet()) {
final String index = entry.getKey();
if (autoCreateIndex.shouldAutoCreate(index, state)) {
//decide whether to auto-create the index: if it does not exist, create it via createIndexAction.execute; otherwise fall through to executeBulk to process the requests
CreateIndexRequest createIndexRequest = new CreateIndexRequest(bulkRequest);
createIndexRequest.index(index);
for (String type : entry.getValue()) {
createIndexRequest.mapping(type);
}
createIndexRequest.cause("auto(bulk api)");
createIndexRequest.masterNodeTimeout(bulkRequest.timeout());
createIndexAction.execute(createIndexRequest, new ActionListener<CreateIndexResponse>() {
@Override
public void onResponse(CreateIndexResponse result) {
if (counter.decrementAndGet() == 0) {
try {
executeBulk(bulkRequest, startTime, listener, responses);
} catch (Throwable t) {
listener.onFailure(t);
}
}
}
@Override
public void onFailure(Throwable e) {
if (!(ExceptionsHelper.unwrapCause(e) instanceof IndexAlreadyExistsException)) {
// fail all requests involving this index, if create didnt work
for (int i = 0; i < bulkRequest.requests.size(); i++) {
ActionRequest request = bulkRequest.requests.get(i);
if (request != null && setResponseFailureIfIndexMatches(responses, i, request, index, e)) {
bulkRequest.requests.set(i, null);
}
}
}
if (counter.decrementAndGet() == 0) {
try {
executeBulk(bulkRequest, startTime, listener, responses);
} catch (Throwable t) {
listener.onFailure(t);
}
}
}
});
} else {
if (counter.decrementAndGet() == 0) {
executeBulk(bulkRequest, startTime, listener, responses);
}
}
}
} else {
executeBulk(bulkRequest, startTime, listener, responses);
}
}
The bulk execution flow:
Execution then enters the bulk flow through the executeBulk method, which iterates over bulkRequest.requests twice.
First it checks whether the cluster has blocked write operations; if it is blocked, a ClusterBlockException is raised (the TODO below notes that a timeout-based wait could be used instead):
// TODO use timeout to wait here if its blocked…
clusterState.blocks().globalBlockedRaiseException(ClusterBlockLevel.WRITE);
In the first pass, if an item is an IndexRequest, IndexRequest.process is called, mainly to resolve fields such as timestamp, routing, id and parent.
for (int i = 0; i < bulkRequest.requests.size(); i++) {
ActionRequest request = bulkRequest.requests.get(i);
//the request can only be null because we set it to null in the previous step, so it gets ignored
if (request == null) {
continue;
}
DocumentRequest documentRequest = (DocumentRequest) request;
if (addFailureIfIndexIsUnavailable(documentRequest, bulkRequest, responses, i, concreteIndices, metaData)) {
continue;
}
String concreteIndex = concreteIndices.resolveIfAbsent(documentRequest);
if (request instanceof IndexRequest) {
IndexRequest indexRequest = (IndexRequest) request;
MappingMetaData mappingMd = null;
if (metaData.hasIndex(concreteIndex)) {
mappingMd = metaData.index(concreteIndex).mappingOrDefault(indexRequest.type());
}
try {
indexRequest.process(metaData, mappingMd, allowIdGeneration, concreteIndex);
} catch (ElasticsearchParseException | RoutingMissingException e) {
BulkItemResponse.Failure failure = new BulkItemResponse.Failure(concreteIndex, indexRequest.type(),
indexRequest.id(), e);
BulkItemResponse bulkItemResponse = new BulkItemResponse(i, "index", failure);
responses.set(i, bulkItemResponse);
// make sure the request gets never processed again
bulkRequest.requests.set(i, null);
}
} else if (request instanceof DeleteRequest) {
try {
TransportDeleteAction.resolveAndValidateRouting(metaData, concreteIndex, (DeleteRequest)request);
} catch(RoutingMissingException e) {
BulkItemResponse.Failure failure = new BulkItemResponse.Failure(concreteIndex, documentRequest.type(),
documentRequest.id(), e);
BulkItemResponse bulkItemResponse = new BulkItemResponse(i, "delete", failure);
responses.set(i, bulkItemResponse);
// make sure the request gets never processed again
bulkRequest.requests.set(i, null);
}
} else if (request instanceof UpdateRequest) {
try {
TransportUpdateAction.resolveAndValidateRouting(metaData, concreteIndex, (UpdateRequest)request);
} catch(RoutingMissingException e) {
BulkItemResponse.Failure failure = new BulkItemResponse.Failure(concreteIndex, documentRequest.type(),
documentRequest.id(), e);
BulkItemResponse bulkItemResponse = new BulkItemResponse(i, "update", failure);
responses.set(i, bulkItemResponse);
// make sure the request gets never processed again
bulkRequest.requests.set(i, null);
}
} else {
throw new AssertionError("request type not supported: [" + request.getClass().getName() + "]");
}
}
process then performs a series of resolution steps: it resolves the routing, assigns a timestamp (the current time if none was provided), and, when
this.allowIdGeneration = this.settings.getAsBoolean("action.bulk.action.allow_id_generation", true);
is true, auto-generates a base64 UUID as the id field and sets the request's opType to CREATE. In other words, when the id is auto-generated by ES, the operation is by default a document create rather than a document update.
public void process(MetaData metaData, @Nullable MappingMetaData mappingMd, boolean allowIdGeneration, String concreteIndex) {
// resolve the routing if needed
routing(metaData.resolveIndexRouting(routing, index));
// resolve timestamp if provided externally
if (timestamp != null) {
timestamp = MappingMetaData.Timestamp.parseStringTimestamp(timestamp,
mappingMd != null ? mappingMd.timestamp().dateTimeFormatter() : TimestampFieldMapper.Defaults.DATE_TIME_FORMATTER,
getVersion(metaData, concreteIndex));
}
// extract values if needed
if (mappingMd != null) {
MappingMetaData.ParseContext parseContext = mappingMd.createParseContext(id, routing, timestamp);
if (parseContext.shouldParse()) {
XContentParser parser = null;
try {
parser = XContentHelper.createParser(source);
mappingMd.parse(parser, parseContext);
if (parseContext.shouldParseId()) {
id = parseContext.id();
}
if (parseContext.shouldParseRouting()) {
if (routing != null && !routing.equals(parseContext.routing())) {
throw new MapperParsingException("The provided routing value [" + routing + "] doesn't match the routing key stored in the document: [" + parseContext.routing() + "]");
}
routing = parseContext.routing();
}
if (parseContext.shouldParseTimestamp()) {
timestamp = parseContext.timestamp();
if (timestamp != null) {
timestamp = MappingMetaData.Timestamp.parseStringTimestamp(timestamp, mappingMd.timestamp().dateTimeFormatter(), getVersion(metaData, concreteIndex));
}
}
} catch (MapperParsingException e) {
throw e;
} catch (Exception e) {
throw new ElasticsearchParseException("failed to parse doc to extract routing/timestamp/id", e);
} finally {
if (parser != null) {
parser.close();
}
}
}
// might as well check for routing here
if (mappingMd.routing().required() && routing == null) {
throw new RoutingMissingException(concreteIndex, type, id);
}
if (parent != null && !mappingMd.hasParentField()) {
throw new IllegalArgumentException("Can't specify parent if no parent field has been configured");
}
} else {
if (parent != null) {
throw new IllegalArgumentException("Can't specify parent if no parent field has been configured");
}
}
// generate id if not already provided and id generation is allowed
if (allowIdGeneration) {
if (id == null) {
id(Strings.base64UUID());
// since we generate the id, change it to CREATE
opType(IndexRequest.OpType.CREATE);
autoGeneratedId = true;
}
}
// generate timestamp if not provided, we always have one post this stage...
if (timestamp == null) {
String defaultTimestamp = TimestampFieldMapper.Defaults.DEFAULT_TIMESTAMP;
if (mappingMd != null && mappingMd.timestamp() != null) {
// If we explicitly ask to reject null timestamp
if (mappingMd.timestamp().ignoreMissing() != null && mappingMd.timestamp().ignoreMissing() == false) {
throw new TimestampParsingException("timestamp is required by mapping");
}
defaultTimestamp = mappingMd.timestamp().defaultTimestamp();
}
if (defaultTimestamp.equals(TimestampFieldMapper.Defaults.DEFAULT_TIMESTAMP)) {
timestamp = Long.toString(System.currentTimeMillis());
} else {
timestamp = MappingMetaData.Timestamp.parseStringTimestamp(defaultTimestamp, mappingMd.timestamp().dateTimeFormatter(), getVersion(metaData, concreteIndex));
}
}
}
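As a rough illustration of what that auto-generated id looks like: a 16-byte value rendered as URL-safe, unpadded base64 (22 characters). The sketch below uses java.util.UUID purely as a stand-in; the real Strings.base64UUID() uses Elasticsearch's own (time-based) UUID generator.

import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class AutoIdSketch {
    // Hypothetical stand-in: encode a random UUID's 16 bytes as URL-safe base64 without padding.
    static String base64Uuid() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        // prints a 22-character URL-safe id, roughly the shape ES generates
        System.out.println(base64Uuid());
    }
}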
The second pass sorts the items by shard, building the following data structure:
// first, go over all the requests and create a ShardId -> Operations mapping
Map<ShardId, List<BulkItemRequest>> requestsByShard = new HashMap<>();
for (int i = 0; i < bulkRequest.requests.size(); i++) {
ActionRequest request = bulkRequest.requests.get(i);
if (request instanceof IndexRequest) {
IndexRequest indexRequest = (IndexRequest) request;
String concreteIndex = concreteIndices.getConcreteIndex(indexRequest.index());
// resolve the shardId each request should go to: if the request carries a routing value it is hashed directly; otherwise the id is hashed and mapped onto a shard (see generateShardId below)
ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, indexRequest.type(), indexRequest.id(), indexRequest.routing()).shardId();
List<BulkItemRequest> list = requestsByShard.get(shardId);
if (list == null) {
list = new ArrayList<>();
requestsByShard.put(shardId, list);
}
list.add(new BulkItemRequest(i, request));
} else if (request instanceof DeleteRequest) {
DeleteRequest deleteRequest = (DeleteRequest) request;
String concreteIndex = concreteIndices.getConcreteIndex(deleteRequest.index());
ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, deleteRequest.type(), deleteRequest.id(), deleteRequest.routing()).shardId();
List<BulkItemRequest> list = requestsByShard.get(shardId);
if (list == null) {
list = new ArrayList<>();
requestsByShard.put(shardId, list);
}
list.add(new BulkItemRequest(i, request));
} else if (request instanceof UpdateRequest) {
UpdateRequest updateRequest = (UpdateRequest) request;
String concreteIndex = concreteIndices.getConcreteIndex(updateRequest.index());
ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, updateRequest.type(), updateRequest.id(), updateRequest.routing()).shardId();
List<BulkItemRequest> list = requestsByShard.get(shardId);
if (list == null) {
list = new ArrayList<>();
requestsByShard.put(shardId, list);
}
list.add(new BulkItemRequest(i, request));
}
}
With the shardId resolved, the requests are grouped by shard so that each shard's batch can be processed separately. The shardId itself comes from generateShardId:
private int generateShardId(ClusterState clusterState, String index,
String type, String id, @Nullable String routing) {
IndexMetaData indexMetaData = clusterState.metaData().index(index);
if (indexMetaData == null) {
throw new IndexNotFoundException(index);
}
final Version createdVersion = indexMetaData.getCreationVersion();
final HashFunction hashFunction = indexMetaData.getRoutingHashFunction();
final boolean useType = indexMetaData.getRoutingUseType();
final int hash;
if (routing == null) {
if (!useType) {
hash = hash(hashFunction, id);
} else {
hash = hash(hashFunction, type, id);
}
} else {
hash = hash(hashFunction, routing);
}
if (createdVersion.onOrAfter(Version.V_2_0_0_beta1)) {
return MathUtils.mod(hash, indexMetaData.getNumberOfShards());
} else {
return Math.abs(hash % indexMetaData.getNumberOfShards());
}
}
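Stripped of the version/legacy-hash handling, the routing rule boils down to "hash the routing value if present, otherwise the id, then take a positive modulo of the shard count". A self-contained sketch (String.hashCode() stands in for the real Murmur3-based HashFunction and is for illustration only):

public class ShardRoutingSketch {
    // Simplified stand-in for generateShardId: hash the routing value if present,
    // otherwise the document id, then take a positive modulo of the shard count.
    static int shardId(String id, String routing, int numberOfShards) {
        String effectiveRouting = routing != null ? routing : id;
        int hash = effectiveRouting.hashCode();           // illustrative; ES uses Murmur3
        return Math.floorMod(hash, numberOfShards);       // like MathUtils.mod(hash, numberOfShards)
    }

    public static void main(String[] args) {
        System.out.println(shardId("doc-1", null, 5));      // routed by id
        System.out.println(shardId("doc-1", "user-42", 5)); // explicit routing wins
    }
}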
With the ShardId, the BulkRequest and each List<BulkItemRequest> in hand, the map is traversed and each entry is wrapped into a BulkShardRequest: essentially a BulkRequest-like object built for the data belonging to one shard, carrying the consistencyLevel and timeout settings.
for (Map.Entry<ShardId, List<BulkItemRequest>> entry : requestsByShard.entrySet()) {
final ShardId shardId = entry.getKey();
final List<BulkItemRequest> requests = entry.getValue();
BulkShardRequest bulkShardRequest = new BulkShardRequest(bulkRequest, shardId, bulkRequest.refresh(), requests.toArray(new BulkItemRequest[requests.size()]));
bulkShardRequest.consistencyLevel(bulkRequest.consistencyLevel());
bulkShardRequest.timeout(bulkRequest.timeout());
//the shardBulkAction here is TransportShardBulkAction
shardBulkAction.execute(bulkShardRequest, new ActionListener<BulkShardResponse>() {
@Override
public void onResponse(BulkShardResponse bulkShardResponse) {
for (BulkItemResponse bulkItemResponse : bulkShardResponse.getResponses()) {
// we may have no response if item failed
if (bulkItemResponse.getResponse() != null) {
bulkItemResponse.getResponse().setShardInfo(bulkShardResponse.getShardInfo());
}
responses.set(bulkItemResponse.getItemId(), bulkItemResponse);
}
if (counter.decrementAndGet() == 0) {
finishHim();
}
}
·······························
}
Given the inheritance chain TransportShardBulkAction < TransportReplicationAction < TransportAction, shardBulkAction.execute is really driven by TransportReplicationAction. The entry point is that class's doExecute method:
/**
* Responsible for routing and retrying failed operations on the primary.
* The actual primary operation is done in {@link PrimaryPhase} on the
* node with primary copy.
*
* Resolves index and shard id for the request before routing it to target node
*/
@Override
protected void doExecute(Task task, Request request, ActionListener<Response> listener) {
new ReroutePhase((ReplicationTask) task, request, listener).run();
}
Note: data is not copied from the primary shard to the replicas; the write is applied to the primary and to each replica separately, so corrupted data on the primary cannot propagate to the replicas.
After a document is indexed it goes through two phases:
write the data to the primary shard
write the data to the replica shards
ReroutePhase first obtains the primary shard from the cluster state; if the primary shard is not active there is a retry mechanism. If the primary lives on the local node the operation is executed directly, otherwise the request is forwarded to the node that holds the shard.
/**
* Responsible for routing and retrying failed operations on the primary.
* The actual primary operation is done in {@link PrimaryPhase} on the
* node with primary copy.
*
* Resolves index and shard id for the request before routing it to target node
*/
final class ReroutePhase extends AbstractRunnable {
@Override
protected void doRun() {
setPhase(task, "routing");
final ClusterState state = observer.observedState();
ClusterBlockException blockException = state.blocks().globalBlockedException(globalBlockLevel());
if (blockException != null) {
handleBlockException(blockException);
return;
}
final String concreteIndex = resolveIndex() ? indexNameExpressionResolver.concreteSingleIndex(state, request) : request.index();
blockException = state.blocks().indexBlockedException(indexBlockLevel(), concreteIndex);
if (blockException != null) {
handleBlockException(blockException);
return;
}
//the request does not carry a shardId yet; the concrete index is passed in so the shardId can be resolved
resolveRequest(state.metaData(), concreteIndex, request);
assert request.shardId() != null : "request shardId must be set in resolveRequest";
IndexShardRoutingTable indexShard = state.getRoutingTable().shardRoutingTable(request.shardId().getIndex(), request.shardId().id());
final ShardRouting primary = indexShard.primaryShard();
//if the primary shard is not active, a retry is scheduled
if (primary == null || primary.active() == false) {
logger.trace("primary shard [{}] is not yet active, scheduling a retry: action [{}], request [{}], cluster state version [{}]", request.shardId(), actionName, request, state.version());
retryBecauseUnavailable(request.shardId(), "primary shard is not active");
return;
}
if (state.nodes().nodeExists(primary.currentNodeId()) == false) {
logger.trace("primary shard [{}] is assigned to an unknown node [{}], scheduling a retry: action [{}], request [{}], cluster state version [{}]", request.shardId(), primary.currentNodeId(), actionName, request, state.version());
retryBecauseUnavailable(request.shardId(), "primary shard isn't assigned to a known node.");
return;
}
final DiscoveryNode node = state.nodes().get(primary.currentNodeId());
taskManager.registerChildTask(task, node.getId());
//if the primary is on the local node, execute directly; otherwise send the request to the node holding the shard
if (primary.currentNodeId().equals(state.nodes().localNodeId())) {
setPhase(task, "waiting_on_primary");
if (logger.isTraceEnabled()) {
logger.trace("send action [{}] on primary [{}] for request [{}] with cluster state version [{}] to [{}] ", transportPrimaryAction, request.shardId(), request, state.version(), primary.currentNodeId());
}
performAction(node, transportPrimaryAction, true);
} else {
if (logger.isTraceEnabled()) {
logger.trace("send action [{}] on primary [{}] for request [{}] with cluster state version [{}] to [{}]", actionName, request.shardId(), request, state.version(), primary.currentNodeId());
}
setPhase(task, "rerouted");
performAction(node, actionName, false);
}
}
private void performAction(final DiscoveryNode node, final String action, final boolean isPrimaryAction) {
transportService.sendRequest(node, action, request, transportOptions, new BaseTransportResponseHandler<Response>() {
......
}}
TransportService.java :
private void sendLocalRequest(long requestId, final String action, final TransportRequest request) {
final DirectResponseChannel channel = new DirectResponseChannel(logger, localNode, action, requestId, adapter, threadPool);
try {
final RequestHandlerRegistry reg = adapter.getRequestHandler(action);
if (reg == null) {
throw new ActionNotFoundTransportException("Action [" + action + "] not found");
}
final String executor = reg.getExecutor();
if (ThreadPool.Names.SAME.equals(executor)) {
//noinspection unchecked
reg.processMessageReceived(request, channel);
} else {
threadPool.executor(executor).execute(new AbstractRunnable() {
@Override
protected void doRun() throws Exception {
//noinspection unchecked
reg.processMessageReceived(request, channel);
}
@Override
public boolean isForceExecution() {
return reg.isForceExecution();
}
@Override
public void onFailure(Throwable e) {
try {
channel.sendResponse(e);
} catch (Throwable e1) {
logger.warn("failed to notify channel of error message for action [" + action + "]", e1);
logger.warn("actual exception", e);
}
}
});
}
} catch (Throwable e) {
try {
channel.sendResponse(e);
} catch (Throwable e1) {
logger.warn("failed to notify channel of error message for action [" + action + "]", e1);
logger.warn("actual exception", e1);
}
}
}
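The executor dispatch in sendLocalRequest is worth calling out: the SAME executor means "run the handler on the calling thread", anything else hands the work to the named thread pool. A plain-Java sketch of that pattern (names are illustrative, not the ES API):

import java.util.concurrent.ExecutorService;

public class ExecutorDispatchSketch {
    // Simplified version of the dispatch above: "same" runs the handler inline,
    // any other executor name submits it to the corresponding thread pool.
    static void dispatch(String executorName, ExecutorService namedPool, Runnable handler) {
        if ("same".equals(executorName)) {
            handler.run();               // like ThreadPool.Names.SAME
        } else {
            namedPool.execute(handler);  // like threadPool.executor(executor).execute(...)
        }
    }
}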
//in the end the request is handled by PrimaryPhase's doRun on the node holding the primary
class PrimaryOperationTransportHandler extends TransportRequestHandler<Request> {
@Override
public void messageReceived(final Request request, final TransportChannel channel) throws Exception {
throw new UnsupportedOperationException("the task parameter is required for this operation");
}
@Override
public void messageReceived(Request request, TransportChannel channel, Task task) throws Exception {
new PrimaryPhase((ReplicationTask) task, request, channel).run();
}
}
/**
* Responsible for performing primary operation locally and delegating to replication action once successful
*
* Note that as soon as we move to replication action, state responsibility is transferred to {@link ReplicationPhase}.
*/
final class PrimaryPhase extends AbstractRunnable {
................
@Override
public void onFailure(Throwable e) {
finishAsFailed(e);
}
@Override
protected void doRun() throws Exception {
setPhase(task, "primary");
// request shardID was set in ReroutePhase
assert request.shardId() != null : "request shardID must be set prior to primary phase";
final ShardId shardId = request.shardId();
final String writeConsistencyFailure = checkWriteConsistency(shardId);
if (writeConsistencyFailure != null) {
finishBecauseUnavailable(shardId, writeConsistencyFailure);
return;
}
final ReplicationPhase replicationPhase;
try {
indexShardReference = getIndexShardOperationsCounter(shardId);
//the primary performs the index write
Tuple<Response, ReplicaRequest> primaryResponse = shardOperationOnPrimary(state.metaData(), request);
if (logger.isTraceEnabled()) {
logger.trace("action [{}] completed on shard [{}] for request [{}] with cluster state version [{}]", transportPrimaryAction, shardId, request, state.version());
}
replicationPhase = new ReplicationPhase(task, primaryResponse.v2(), primaryResponse.v1(), shardId, channel,
indexShardReference);
} catch (Throwable e) {
request.setCanHaveDuplicates();
if (ExceptionsHelper.status(e) == RestStatus.CONFLICT) {
if (logger.isTraceEnabled()) {
logger.trace("failed to execute [{}] on [{}]", e, request, shardId);
}
} else {
if (logger.isDebugEnabled()) {
logger.debug("failed to execute [{}] on [{}]", e, request, shardId);
}
}
finishAsFailed(e);
return;
}
//the primary is done; hand the request over to the replication phase
finishAndMoveToReplication(replicationPhase);
}
/**
* checks whether we can perform a write based on the write consistency setting
* returns *null* if OK to proceed, or a string describing the reason to stop
*/
String checkWriteConsistency(ShardId shardId) {
......
}
/**
* upon success, finish the first phase and transfer responsibility to the {@link ReplicationPhase}
*/
void finishAndMoveToReplication(ReplicationPhase replicationPhase) {
if (finished.compareAndSet(false, true)) {
replicationPhase.run();
} else {
assert false : "finishAndMoveToReplication called but operation is already finished";
}
}
......
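checkWriteConsistency (elided above) is what gates the primary write on the configured WriteConsistencyLevel: it requires a minimum number of active shard copies before proceeding. A simplified sketch of that calculation, assuming the 2.x behaviour where QUORUM only applies when there are more than two copies:

public class WriteConsistencySketch {
    enum WriteConsistencyLevel { ONE, QUORUM, ALL }

    // Simplified: how many active copies (primary + replicas) are required before
    // the primary is allowed to proceed. totalCopies = 1 primary + number_of_replicas.
    static int requiredActiveCopies(WriteConsistencyLevel level, int totalCopies) {
        switch (level) {
            case ONE:    return 1;
            case ALL:    return totalCopies;
            case QUORUM: return totalCopies > 2 ? (totalCopies / 2) + 1 : 1;
            default:     throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        // index with 2 replicas -> 3 copies: quorum needs 2 active copies
        System.out.println(requiredActiveCopies(WriteConsistencyLevel.QUORUM, 3)); // 2
        // index with 1 replica -> 2 copies: the quorum check is effectively skipped
        System.out.println(requiredActiveCopies(WriteConsistencyLevel.QUORUM, 2)); // 1
    }
}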
The primary write: execution then enters shardOperationOnPrimary, which is implemented in the grand-subclass TransportShardBulkAction.
protected Tuple<BulkShardResponse, BulkShardRequest> shardOperationOnPrimary(MetaData metaData, BulkShardRequest request) {
......
//version numbers are what make concurrent modification safe
long[] preVersions = new long[request.items().length];
VersionType[] preVersionTypes = new VersionType[request.items().length];
Translog.Location location = null;
//the request here is a BulkShardRequest and its items are BulkItemRequests; the loop branches on the item type
//into three cases: IndexRequest, DeleteRequest and UpdateRequest
for (int requestIndex = 0; requestIndex < request.items().length; requestIndex++) {
BulkItemRequest item = request.items()[requestIndex];
if (item.request() instanceof IndexRequest) {
IndexRequest indexRequest = (IndexRequest) item.request();
preVersions[requestIndex] = indexRequest.version();
preVersionTypes[requestIndex] = indexRequest.versionType();
try {
WriteResult<IndexResponse> result = shardIndexOperation(request, indexRequest, metaData, indexShard, true);
location = locationToSync(location, result.location);
// add the response
IndexResponse indexResponse = result.response();
setResponse(item, new BulkItemResponse(item.id(), indexRequest.opType().lowercase(), indexResponse));
} catch (Throwable e) {
// rethrow the failure if we are going to retry on primary and let parent failure to handle it
if (retryPrimaryException(e)) {
// restore updated versions...
for (int j = 0; j < requestIndex; j++) {
applyVersion(request.items()[j], preVersions[j], preVersionTypes[j]);
}
throw (ElasticsearchException) e;
}
if (ExceptionsHelper.status(e) == RestStatus.CONFLICT) {
logger.trace("{} failed to execute bulk item (index) {}", e, request.shardId(), indexRequest);
} else {
logger.debug("{} failed to execute bulk item (index) {}", e, request.shardId(), indexRequest);
}
// if its a conflict failure, and we already executed the request on a primary (and we execute it
// again, due to primary relocation and only processing up to N bulk items when the shard gets closed)
// then just use the response we got from the successful execution
if (item.getPrimaryResponse() != null && isConflictException(e)) {
setResponse(item, item.getPrimaryResponse());
} else {
setResponse(item, new BulkItemResponse(item.id(), indexRequest.opType().lowercase(),
new BulkItemResponse.Failure(request.index(), indexRequest.type(), indexRequest.id(), e)));
}
}
} else if (item.request() instanceof DeleteRequest) {
······
} else if (item.request() instanceof UpdateRequest) {
······
}
}
processAfterWrite(request.refresh(), indexShard, location);
BulkItemResponse[] responses = new BulkItemResponse[request.items().length];
BulkItemRequest[] items = request.items();
for (int i = 0; i < items.length; i++) {
responses[i] = items[i].getPrimaryResponse();
}
return new Tuple<>(new BulkShardResponse(request.shardId(), responses), request);
}
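The per-item responses collected here are what the caller eventually receives: a bulk request does not fail as a whole when individual items fail. A minimal client-side sketch of inspecting those failures (assuming the standard Java API of this Elasticsearch version; client and listener wiring omitted):

import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;

public class BulkResponseHandling {
    // Inspect each item of the bulk response; only some items may have failed.
    static void logFailures(BulkResponse bulkResponse) {
        if (!bulkResponse.hasFailures()) {
            return;
        }
        for (BulkItemResponse item : bulkResponse) {
            if (item.isFailed()) {
                System.err.println("item " + item.getItemId() + " on [" + item.getIndex()
                        + "] failed: " + item.getFailureMessage());
            }
        }
    }
}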
The core method nested inside shardIndexOperation is executeIndexRequestOnPrimary; its first step is to obtain the Operation object.
The Engine is the wrapper around Lucene's IndexWriter, Searcher and friends. The Engine.IndexingOperation here is either a Create or an Index instance; think of them as the pending operation for the document to be indexed.
/**
* Execute the given {@link IndexRequest} on a primary shard, throwing a
* {@link RetryOnPrimaryException} if the operation needs to be re-tried.
*/
public static WriteResult<IndexResponse> executeIndexRequestOnPrimary(BulkShardRequest shardRequest, IndexRequest request, IndexShard indexShard, MappingUpdatedAction mappingUpdatedAction) throws Throwable {
//build the indexing operation; it will be carried out later by operation.execute
Engine.IndexingOperation operation = prepareIndexOperationOnPrimary(shardRequest, request, indexShard);
//step two: check whether the index mapping needs a dynamic update, and if so push it to the master
Mapping update = operation.parsedDoc().dynamicMappingsUpdate();
final ShardId shardId = indexShard.shardId();
if (update != null) {
final String indexName = shardId.getIndex();
mappingUpdatedAction.updateMappingOnMasterSynchronously(indexName, request.type(), update);
operation = prepareIndexOperationOnPrimary(shardRequest, request, indexShard);
update = operation.parsedDoc().dynamicMappingsUpdate();
if (update != null) {
throw new RetryOnPrimaryException(shardId,
"Dynamic mappings are not available on the node that holds the primary yet");
}
}
//finally, execute performs the actual indexing work
final boolean created = operation.execute(indexShard);
//once execution finishes, the document's version and related info are available and are written back onto the IndexRequest
// update the version on request so it will happen on the replicas
final long version = operation.version();
request.version(version);
request.versionType(request.versionType().versionTypeForReplicationAndRecovery());
assert request.versionType().validateVersionForWrites(request.version());
return new WriteResult<>(new IndexResponse(shardId.getIndex(), request.type(), request.id(), request.version(), created), operation.getTranslogLocation());
}
On the primary node, the requests inside the BulkItemRequests produced above are walked one by one and each is prepared (this includes generating the uid, handling nested and parent-child types, and so on); based on the opType field discussed earlier, this yields either an INDEX or a CREATE operation.
/**
* Utility method to create either an index or a create operation depending
* on the {@link IndexRequest.OpType} of the request.
*/
public static Engine.IndexingOperation prepareIndexOperationOnPrimary(BulkShardRequest shardRequest, IndexRequest request, IndexShard indexShard) {
SourceToParse sourceToParse = SourceToParse.source(SourceToParse.Origin.PRIMARY, request.source()).index(request.index()).type(request.type()).id(request.id())
.routing(request.routing()).parent(request.parent()).timestamp(request.timestamp()).ttl(request.ttl());
boolean canHaveDuplicates = request.canHaveDuplicates();
if (shardRequest != null) {
canHaveDuplicates |= shardRequest.canHaveDuplicates();
}
if (request.opType() == IndexRequest.OpType.INDEX) {
return indexShard.prepareIndexOnPrimary(sourceToParse, request.version(), request.versionType(), canHaveDuplicates);
} else {
assert request.opType() == IndexRequest.OpType.CREATE : request.opType();
return indexShard.prepareCreateOnPrimary(sourceToParse, request.version(), request.versionType(), canHaveDuplicates, request.autoGeneratedId());
}
}
static Engine.Index prepareIndex(DocumentMapperForType docMapper, SourceToParse source, long version, VersionType versionType, Engine
.Operation.Origin origin, boolean canHaveDuplicates) {
long startTime = System.nanoTime();
/** parse the JSON source into a ParsedDocument */
ParsedDocument doc = docMapper.getDocumentMapper().parse(source);
if (docMapper.getMapping() != null) {
doc.addDynamicMappingsUpdate(docMapper.getMapping());
}
return new Engine.Index(docMapper.getDocumentMapper().uidMapper().term(doc.uid().stringValue()), doc, version, versionType,
origin, startTime, canHaveDuplicates);
}
Let's look at the CREATE path first.
In the end, the indexShard object's create method is called to perform the indexing:
IndexShard.java
public void create(Engine.Create create) {
ensureWriteAllowed(create);
markLastWrite();
create = indexingService.preCreate(create);
try {
if (logger.isTraceEnabled()) {
logger.trace("index [{}][{}]{}", create.type(), create.id(), create.docs());
}
engine().create(create);
create.endTime(System.nanoTime());
} catch (Throwable ex) {
indexingService.postCreate(create, ex);
throw ex;
}
indexingService.postCreate(create);
}
engine() returns the Engine instance; the create call ultimately lands in InternalEngine.innerCreate, which performs the indexing.
Because writes are concurrent, every write takes a lock: synchronized (dirtyLock(create.uid())) keys the lock on the uid, preventing concurrent writes to the same document from producing dirty data.
private void innerCreate(Create create) throws IOException {
/** No version check is needed when all three of the following hold:
 * - index.optimize_auto_generated_id is set to true
 * - the id was auto-generated
 * - create.canHaveDuplicates == false
 * With auto-generated IDs the version check can be skipped entirely, which speeds up ingestion.
 */
if (engineConfig.isOptimizeAutoGenerateId() && create.autoGeneratedId() && !create.canHaveDuplicates()) {
// We don't need to lock because this ID cannot be concurrently updated:
innerCreateNoLock(create, Versions.NOT_FOUND, null);
} else {
synchronized (dirtyLock(create.uid())) {
final long currentVersion;
final VersionValue versionValue;
//if the document's version is not found in the in-memory versionMap, the code below loads the current version from the index on disk
versionValue = versionMap.getUnderLock(create.uid().bytes());
if (versionValue == null) {
currentVersion = loadCurrentVersionFromIndex(create.uid());
} else {
if (engineConfig.isEnableGcDeletes() && versionValue.delete() && (engineConfig.getThreadPool().estimatedTimeInMillis
() - versionValue.time()) > engineConfig.getGcDeletesInMillis()) {
currentVersion = Versions.NOT_FOUND; // deleted, and GC
} else {
currentVersion = versionValue.version();
}
}
innerCreateNoLock(create, currentVersion, versionValue);
}
}
}
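dirtyLock(create.uid()) hands out a lock object selected by the uid's hash from a fixed pool, so writes to different documents rarely contend while writes to the same document always serialize. A simplified stand-in (not the actual InternalEngine implementation):

public class UidLockSketch {
    // Simplified stand-in for InternalEngine's dirtyLock(): a fixed pool of lock objects,
    // selected by the uid's hash. Writes to the same uid always map to the same lock.
    private final Object[] locks;

    UidLockSketch(int concurrency) {
        locks = new Object[concurrency];
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new Object();
        }
    }

    Object lockFor(String uid) {
        return locks[Math.floorMod(uid.hashCode(), locks.length)];
    }

    void write(String uid, Runnable indexOperation) {
        synchronized (lockFor(uid)) {   // like synchronized (dirtyLock(create.uid()))
            indexOperation.run();
        }
    }
}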
By comparing the version carried by the create operation with the version loaded from the index, the engine decides whether this ends up as an update or a create.
The actual index write is then issued, e.g. indexWriter.updateDocuments(create.uid(), create.docs()); the details are left to Lucene.
private void innerCreateNoLock(Create create, long currentVersion, VersionValue versionValue) throws IOException {
// same logic as index
long updatedVersion;
long expectedVersion = create.version();
if (create.versionType().isVersionConflictForWrites(currentVersion, expectedVersion)) {
if (create.origin() == Operation.Origin.RECOVERY) {
return;
} else {
throw new VersionConflictEngineException(shardId, create.type(), create.id(), currentVersion, expectedVersion);
}
}
updatedVersion = create.versionType().updateVersion(currentVersion, expectedVersion);
// if the doc exists
boolean doUpdate = false;
if ((versionValue != null && versionValue.delete() == false) || (versionValue == null && currentVersion != Versions.NOT_FOUND)) {
if (create.origin() == Operation.Origin.RECOVERY) {
return;
} else if (create.origin() == Operation.Origin.REPLICA) {
// #7142: the primary already determined it's OK to index this document, and we confirmed above that the version doesn't
// conflict, so we must also update here on the replica to remain consistent:
doUpdate = true;
} else if (create.origin() == Operation.Origin.PRIMARY && create.autoGeneratedId() && create.canHaveDuplicates() &&
currentVersion == 1 && create.version() == Versions.MATCH_ANY) {
/**
* If bulk index request fails due to a disconnect, unavailable shard etc. then the request is
* retried before it actually fails. However, the documents might already be indexed.
* For autogenerated ids this means that a version conflict will be reported in the bulk request
* although the document was indexed properly.
* To avoid this we have to make sure that the index request is treated as an update and set updatedVersion to 1.
* See also discussion on https://github.com/elasticsearch/elasticsearch/pull/9125
*/
doUpdate = true;
updatedVersion = 1;
} else {
// On primary, we throw DAEE if the _uid is already in the index with an older version:
assert create.origin() == Operation.Origin.PRIMARY;
throw new DocumentAlreadyExistsException(shardId, create.type(), create.id());
}
}
create.updateVersion(updatedVersion);
if (doUpdate) {
if (create.docs().size() > 1) {
indexWriter.updateDocuments(create.uid(), create.docs());
} else {
indexWriter.updateDocument(create.uid(), create.docs().get(0));
}
} else {
if (create.docs().size() > 1) {
indexWriter.addDocuments(create.docs());
} else {
indexWriter.addDocument(create.docs().get(0));
}
}
//write to the translog
Translog.Location translogLocation = translog.add(new Translog.Create(create));
versionMap.putUnderLock(create.uid().bytes(), new VersionValue(updatedVersion, translogLocation));
create.setTranslogLocation(translogLocation);
indexingService.postCreateUnderLock(create);
}
Now the INDEX path:
The uid is used as a term to look up the document's version in the index; whether a version is found decides between add and update. The update is really a delete-then-insert: the old document is marked deleted and the new one is inserted, and documents marked deleted are purged during segment merges.
private boolean innerIndex(Index index) throws IOException {
synchronized (dirtyLock(index.uid())) {
final long currentVersion;
VersionValue versionValue = versionMap.getUnderLock(index.uid().bytes());
if (versionValue == null) {
currentVersion = loadCurrentVersionFromIndex(index.uid());
} else {
if (engineConfig.isEnableGcDeletes() && versionValue.delete() && (engineConfig.getThreadPool().estimatedTimeInMillis() -
versionValue.time()) > engineConfig.getGcDeletesInMillis()) {
currentVersion = Versions.NOT_FOUND; // deleted, and GC
} else {
currentVersion = versionValue.version();
}
}
long updatedVersion;
long expectedVersion = index.version();
if (index.versionType().isVersionConflictForWrites(currentVersion, expectedVersion)) {
if (index.origin() == Operation.Origin.RECOVERY) {
return false;
} else {
throw new VersionConflictEngineException(shardId, index.type(), index.id(), currentVersion, expectedVersion);
}
}
updatedVersion = index.versionType().updateVersion(currentVersion, expectedVersion);
final boolean created;
index.updateVersion(updatedVersion);
if (currentVersion == Versions.NOT_FOUND) {
// document does not exists, we can optimize for create
created = true;
if (index.docs().size() > 1) {
indexWriter.addDocuments(index.docs());
} else {
indexWriter.addDocument(index.docs().get(0));
}
} else {
if (versionValue != null) {
created = versionValue.delete(); // we have a delete which is not GC'ed...
} else {
created = false;
}
if (index.docs().size() > 1) {
indexWriter.updateDocuments(index.uid(), index.docs());
} else {
indexWriter.updateDocument(index.uid(), index.docs().get(0));
}
}
Translog.Location translogLocation = translog.add(new Translog.Index(index));
versionMap.putUnderLock(index.uid().bytes(), new VersionValue(updatedVersion, translogLocation));
index.setTranslogLocation(translogLocation);
indexingService.postIndexUnderLock(index);
return created;
}
}
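Underneath both paths is the plain Lucene IndexWriter: a brand-new document goes through addDocument(), while an existing _uid goes through updateDocument(), which atomically marks the old document as deleted and adds the new one. A minimal standalone Lucene sketch of those two calls, assuming the Lucene 5.x API bundled with this Elasticsearch version (field names are illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class LuceneWriteSketch {
    public static void main(String[] args) throws Exception {
        try (IndexWriter writer = new IndexWriter(new RAMDirectory(),
                new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("_uid", "doc#1", Field.Store.YES)); // uid term, like ES's _uid
            doc.add(new TextField("message", "hello", Field.Store.YES));

            writer.addDocument(doc);                              // create path
            writer.updateDocument(new Term("_uid", "doc#1"), doc); // update = delete old + add new
            writer.commit();
        }
    }
}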
At this point the indexing is done, but the data has only been written to the in-memory indexing buffer and the translog. Refresh and flush happen afterwards to make the data searchable and durable.
If the request asks for a refresh, it is performed immediately; likewise, if the request asks for the translog to be written to storage right away, that is done immediately as well.
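For completeness, that refresh behaviour is requested by the caller on the BulkRequest itself (and read back as bulkRequest.refresh() in the code above). A small usage sketch, assuming the 2.x Java API:

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;

public class BulkRefreshSketch {
    static BulkRequest buildRequest() {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.add(new IndexRequest("logs", "doc", "1").source("{\"message\":\"hello\"}"));
        bulkRequest.refresh(true);   // ask for a refresh of the affected shards once the bulk completes
        return bulkRequest;
    }
}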
The replica write flow:
The replication flow largely mirrors the primary one. ReplicationPhase's doRun method is the entry point; its core method performOnReplica sends the replica operation to each target node.
On each node that holds a replica copy, the request is handled asynchronously and eventually calls shardOperationOnReplica. The overall logic is as follows:
/**
* Responsible for sending replica requests (see {@link AsyncReplicaAction}) to nodes with replica copy, including
* relocating copies
*/
final class ReplicationPhase extends AbstractRunnable {
......
/**
* start sending replica requests to target nodes
*/
@Override
protected void doRun() {
setPhase(task, "replicating");
if (pending.get() == 0) {
doFinish();
return;
}
for (ShardRouting shard : shards) {
if (shard.primary() == false && executeOnReplica == false) {
// If the replicas use shadow replicas, there is no reason to
// perform the action on the replica, so skip it and
// immediately return
// this delays mapping updates on replicas because they have
// to wait until they get the new mapping through the cluster
// state, which is why we recommend pre-defined mappings for
// indices using shadow replicas
continue;
}
if (shard.unassigned()) {
continue;
}
// we index on a replica that is initializing as well since we might not have got the event
// yet that it was started. We will get an exception IllegalShardState exception if its not started
// and that's fine, we will ignore it
// we never execute replication operation locally as primary operation has already completed locally
// hence, we ignore any local shard for replication
if (nodes.localNodeId().equals(shard.currentNodeId()) == false) {
performOnReplica(shard);
}
// send operation to relocating shard
if (shard.relocating()) {
performOnReplica(shard.buildTargetRelocatingShard());
}
}
}
/**
* send replica operation to target node
*/
void performOnReplica(final ShardRouting shard) {
// if we don't have that node, it means that it might have failed and will be created again, in
// this case, we don't have to do the operation, and just let it failover
final String nodeId = shard.currentNodeId();
if (!nodes.nodeExists(nodeId)) {
logger.trace("failed to send action [{}] on replica [{}] for request [{}] due to unknown node [{}]", transportReplicaAction, shard.shardId(), replicaRequest, nodeId);
onReplicaFailure(nodeId, null);
return;
}
if (logger.isTraceEnabled()) {
logger.trace("send action [{}] on replica [{}] for request [{}] to [{}]", transportReplicaAction, shard.shardId(), replicaRequest, nodeId);
}
final DiscoveryNode node = nodes.get(nodeId);
transportService.sendRequest(node, transportReplicaAction, replicaRequest, transportOptions, new EmptyTransportResponseHandler(ThreadPool.Names.SAME) {
.......
});
}
class ReplicaOperationTransportHandler extends TransportRequestHandler<ReplicaRequest> {
@Override
public void messageReceived(final ReplicaRequest request, final TransportChannel channel) throws Exception {
throw new UnsupportedOperationException("the task parameter is required for this operation");
}
@Override
public void messageReceived(ReplicaRequest request, TransportChannel channel, Task task) throws Exception {
new AsyncReplicaAction(request, channel, (ReplicationTask) task).run();
}
}
The actual execution happens in AsyncReplicaAction's doRun:
protected void doRun() throws Exception {
setPhase(task, "replica");
assert request.shardId() != null : "request shardId must be set";
try (Releasable ignored = getIndexShardOperationsCounter(request.shardId())) {
shardOperationOnReplica(request);
if (logger.isTraceEnabled()) {
logger.trace("action [{}] completed on shard [{}] for request [{}]", transportReplicaAction, request.shardId(), request);
}
}
setPhase(task, "finished");
channel.sendResponse(TransportResponse.Empty.INSTANCE);
}
In the replica phase, shardOperationOnReplica performs the same steps: parsing the document source, applying dynamic mapping additions, and finally indexing via operation.execute. On both the primary and the replicas, once a BulkShardRequest (i.e. the batch of data for one shard) has been processed, the translog is flushed.
After the data has been indexed, if the request asks for a refresh it is performed immediately, and if it asks for the translog to be written to storage right away that is done immediately as well.