Receive the bulk request -> decide whether indices need to be auto-created -> process the bulk request (parse the requests -> build a per-shard map -> resolve the shardId for each request -> execute the indexing) -> write to the primary -> write to the replicas
A bulk request bundles many individual request instances into a single BulkRequest. The entry point is org.elasticsearch.rest.action.bulk.RestBulkAction: each HTTP request builds one BulkRequest object, and BulkRequest.add parses the submitted text.
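For reference, the text that BulkRequest.add parses is the standard bulk body: newline-delimited JSON, where each action/metadata line may be followed by a source line (the index name, type and ids below are purely illustrative):

POST /_bulk
{ "index": { "_index": "logs", "_type": "doc", "_id": "1" } }
{ "message": "hello" }
{ "delete": { "_index": "logs", "_type": "doc", "_id": "2" } }

Each parsed action line becomes one IndexRequest/DeleteRequest/UpdateRequest inside the BulkRequest.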
Processing path:
RestBulkAction ->TransportBulkAction ->TransportShardBulkAction
TransportShardBulkAction sits in an inheritance hierarchy: the main entry point is TransportAction, while the concrete logic lives in its subclass (TransportReplicationAction) and grand-subclass (TransportShardBulkAction).
TransportShardBulkAction < TransportReplicationAction < TransportAction
RestBulkAction:
bulkRequest.add(request.content(), defaultIndex, defaultType, defaultRouting, defaultFields, null, allowExplicitIndex);
//the client here is actually a NodeClient; the NodeClient forwards the request to TransportBulkAction
client.bulk(bulkRequest, new RestBuilderListener<BulkResponse>(channel){…})
What the server receives is a BulkRequest instance. With that instance in hand, the setting action.auto_create_index decides whether indices may be created automatically (they may, by default). If auto-creation is allowed, all requests are walked to collect the indices and types they reference; any index/type that does not yet exist in the cluster is created (the creation process itself is not covered here), and only after that does the real bulk execution begin.
TransportBulkAction extends HandledTransportAction, which means this class also acts as the request handler.
HandledTransportAction:
@Override
public final void messageReceived(final Request request, final TransportChannel channel, Task task) throws Exception {
// We already got the task created on the netty layer - no need to create it again on the transport layer
execute(task, request, new ActionListener<Response>() {
public class TransportBulkAction extends HandledTransportAction<BulkRequest, BulkResponse> {
·························
protected void doExecute(final BulkRequest bulkRequest, final ActionListener<BulkResponse> listener) {
·······
if (autoCreateIndex.needToCheck()) {
// Keep track of all unique indices and all unique types per index for the create index requests:
final Map<String, Set<String>> indicesAndTypes = new HashMap<>();
for (ActionRequest request : bulkRequest.requests) {
if (request instanceof DocumentRequest) {
DocumentRequest req = (DocumentRequest) request;
Set<String> types = indicesAndTypes.get(req.index());
if (types == null) {
indicesAndTypes.put(req.index(), types = new HashSet<>());
}
types.add(req.type());
} else {
throw new ElasticsearchException("Parsed unknown request in bulk actions: " + request.getClass().getSimpleName());
}
}
final AtomicInteger counter = new AtomicInteger(indicesAndTypes.size());
ClusterState state = clusterService.state();
for (Map.Entry<String, Set<String>> entry : indicesAndTypes.entrySet()) {
final String index = entry.getKey();
if (autoCreateIndex.shouldAutoCreate(index, state)) {
//decide whether to auto-create the index: if it does not exist, create it via createIndexAction.execute; otherwise fall through to executeBulk to process the requests
CreateIndexRequest createIndexRequest = new CreateIndexRequest(bulkRequest);
createIndexRequest.index(index);
for (String type : entry.getValue()) {
createIndexRequest.mapping(type);
}
createIndexRequest.cause("auto(bulk api)");
createIndexRequest.masterNodeTimeout(bulkRequest.timeout());
createIndexAction.execute(createIndexRequest, new ActionListener<CreateIndexResponse>() {
@Override
public void onResponse(CreateIndexResponse result) {
if (counter.decrementAndGet() == 0) {
try {
executeBulk(bulkRequest, startTime, listener, responses);
} catch (Throwable t) {
listener.onFailure(t);
}
}
}
@Override
public void onFailure(Throwable e) {
if (!(ExceptionsHelper.unwrapCause(e) instanceof IndexAlreadyExistsException)) {
// fail all requests involving this index, if create didnt work
for (int i = 0; i < bulkRequest.requests.size(); i++) {
ActionRequest request = bulkRequest.requests.get(i);
if (request != null && setResponseFailureIfIndexMatches(responses, i, request, index, e)) {
bulkRequest.requests.set(i, null);
}
}
}
if (counter.decrementAndGet() == 0) {
try {
executeBulk(bulkRequest, startTime, listener, responses);
} catch (Throwable t) {
listener.onFailure(t);
}
}
}
});
} else {
if (counter.decrementAndGet() == 0) {
executeBulk(bulkRequest, startTime, listener, responses);
}
}
}
} else {
executeBulk(bulkRequest, startTime, listener, responses);
}
}
The bulk execution flow:
Execution then enters the bulk flow through the executeBulk method, which iterates over bulkRequest.requests twice.
First it checks whether the cluster has blocked write operations; if it is blocked, a ClusterBlockException is raised (the TODO below notes that a timeout-based wait could be used instead):
// TODO use timeout to wait here if its blocked…
clusterState.blocks().globalBlockedRaiseException(ClusterBlockLevel.WRITE);
In the first pass, if an item is an IndexRequest, IndexRequest.process is called, mainly to resolve fields such as timestamp, routing, id and parent.
for (int i = 0; i < bulkRequest.requests.size(); i++) {
ActionRequest request = bulkRequest.requests.get(i);
//the request can only be null because we set it to null in the previous step, so it gets ignored
if (request == null) {
continue;
}
DocumentRequest documentRequest = (DocumentRequest) request;
if (addFailureIfIndexIsUnavailable(documentRequest, bulkRequest, responses, i, concreteIndices, metaData)) {
continue;
}
String concreteIndex = concreteIndices.resolveIfAbsent(documentRequest);
if (request instanceof IndexRequest) {
IndexRequest indexRequest = (IndexRequest) request;
MappingMetaData mappingMd = null;
if (metaData.hasIndex(concreteIndex)) {
mappingMd = metaData.index(concreteIndex).mappingOrDefault(indexRequest.type());
}
try {
indexRequest.process(metaData, mappingMd, allowIdGeneration, concreteIndex);
} catch (ElasticsearchParseException | RoutingMissingException e) {
BulkItemResponse.Failure failure = new BulkItemResponse.Failure(concreteIndex, indexRequest.type(),
indexRequest.id(), e);
BulkItemResponse bulkItemResponse = new BulkItemResponse(i, "index", failure);
responses.set(i, bulkItemResponse);
// make sure the request gets never processed again
bulkRequest.requests.set(i, null);
}
} else if (request instanceof DeleteRequest) {
try {
TransportDeleteAction.resolveAndValidateRouting(metaData, concreteIndex, (DeleteRequest)request);
} catch(RoutingMissingException e) {
BulkItemResponse.Failure failure = new BulkItemResponse.Failure(concreteIndex, documentRequest.type(),
documentRequest.id(), e);
BulkItemResponse bulkItemResponse = new BulkItemResponse(i, "delete", failure);
responses.set(i, bulkItemResponse);
// make sure the request gets never processed again
bulkRequest.requests.set(i, null);
}
} else if (request instanceof UpdateRequest) {
try {
TransportUpdateAction.resolveAndValidateRouting(metaData, concreteIndex, (UpdateRequest)request);
} catch(RoutingMissingException e) {
BulkItemResponse.Failure failure = new BulkItemResponse.Failure(concreteIndex, documentRequest.type(),
documentRequest.id(), e);
BulkItemResponse bulkItemResponse = new BulkItemResponse(i, "update", failure);
responses.set(i, bulkItemResponse);
// make sure the request gets never processed again
bulkRequest.requests.set(i, null);
}
} else {
throw new AssertionError("request type not supported: [" + request.getClass().getName() + "]");
}
}
process then performs a series of resolution steps: it resolves the routing, assigns a timestamp (the current time if none was provided), and, when
this.allowIdGeneration = this.settings.getAsBoolean("action.bulk.action.allow_id_generation", true);
is true, auto-generates a base64 UUID as the id field and sets the request's opType to CREATE. In other words, when the id is auto-generated by ES, the operation is by default a document create rather than a document update.
public void process(MetaData metaData, @Nullable MappingMetaData mappingMd, boolean allowIdGeneration, String concreteIndex) {
// resolve the routing if needed
routing(metaData.resolveIndexRouting(routing, index));
// resolve timestamp if provided externally
if (timestamp != null) {
timestamp = MappingMetaData.Timestamp.parseStringTimestamp(timestamp,
mappingMd != null ? mappingMd.timestamp().dateTimeFormatter() : TimestampFieldMapper.Defaults.DATE_TIME_FORMATTER,
getVersion(metaData, concreteIndex));
}
// extract values if needed
if (mappingMd != null) {
MappingMetaData.ParseContext parseContext = mappingMd.createParseContext(id, routing, timestamp);
if (parseContext.shouldParse()) {
XContentParser parser = null;
try {
parser = XContentHelper.createParser(source);
mappingMd.parse(parser, parseContext);
if (parseContext.shouldParseId()) {
id = parseContext.id();
}
if (parseContext.shouldParseRouting()) {
if (routing != null && !routing.equals(parseContext.routing())) {
throw new MapperParsingException("The provided routing value [" + routing + "] doesn't match the routing key stored in the document: [" + parseContext.routing() + "]");
}
routing = parseContext.routing();
}
if (parseContext.shouldParseTimestamp()) {
timestamp = parseContext.timestamp();
if (timestamp != null) {
timestamp = MappingMetaData.Timestamp.parseStringTimestamp(timestamp, mappingMd.timestamp().dateTimeFormatter(), getVersion(metaData, concreteIndex));
}
}
} catch (MapperParsingException e) {
throw e;
} catch (Exception e) {
throw new ElasticsearchParseException("failed to parse doc to extract routing/timestamp/id", e);
} finally {
if (parser != null) {
parser.close();
}
}
}
// might as well check for routing here
if (mappingMd.routing().required() && routing == null) {
throw new RoutingMissingException(concreteIndex, type, id);
}
if (parent != null && !mappingMd.hasParentField()) {
throw new IllegalArgumentException("Can't specify parent if no parent field has been configured");
}
} else {
if (parent != null) {
throw new IllegalArgumentException("Can't specify parent if no parent field has been configured");
}
}
// generate id if not already provided and id generation is allowed
if (allowIdGeneration) {
if (id == null) {
id(Strings.base64UUID());
// since we generate the id, change it to CREATE
opType(IndexRequest.OpType.CREATE);
autoGeneratedId = true;
}
}
// generate timestamp if not provided, we always have one post this stage...
if (timestamp == null) {
String defaultTimestamp = TimestampFieldMapper.Defaults.DEFAULT_TIMESTAMP;
if (mappingMd != null && mappingMd.timestamp() != null) {
// If we explicitly ask to reject null timestamp
if (mappingMd.timestamp().ignoreMissing() != null && mappingMd.timestamp().ignoreMissing() == false) {
throw new TimestampParsingException("timestamp is required by mapping");
}
defaultTimestamp = mappingMd.timestamp().defaultTimestamp();
}
if (defaultTimestamp.equals(TimestampFieldMapper.Defaults.DEFAULT_TIMESTAMP)) {
timestamp = Long.toString(System.currentTimeMillis());
} else {
timestamp = MappingMetaData.Timestamp.parseStringTimestamp(defaultTimestamp, mappingMd.timestamp().dateTimeFormatter(), getVersion(metaData, concreteIndex));
}
}
}
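As a rough illustration of what that auto-generated id looks like: a 16-byte value rendered as URL-safe, unpadded base64 (22 characters). The sketch below uses java.util.UUID purely as a stand-in; the real Strings.base64UUID() uses Elasticsearch's own (time-based) UUID generator.

import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class AutoIdSketch {
    // Hypothetical stand-in: encode a random UUID's 16 bytes as URL-safe base64 without padding.
    static String base64Uuid() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        // prints a 22-character URL-safe id, roughly the shape ES generates
        System.out.println(base64Uuid());
    }
}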
The second pass sorts the items by shard, building the following data structure:
// first, go over all the requests and create a ShardId -> Operations mapping
Map<ShardId, List<BulkItemRequest>> requestsByShard = new HashMap<>();
for (int i = 0; i < bulkRequest.requests.size(); i++) {
ActionRequest request = bulkRequest.requests.get(i);
if (request instanceof IndexRequest) {
IndexRequest indexRequest = (IndexRequest) request;
String concreteIndex = concreteIndices.getConcreteIndex(indexRequest.index());
// resolve the shardId each request should go to: if the request carries a routing value it is hashed directly; otherwise the id is hashed and mapped onto a shard (see generateShardId below)
ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, indexRequest.type(), indexRequest.id(), indexRequest.routing()).shardId();
List<BulkItemRequest> list = requestsByShard.get(shardId);
if (list == null) {
list = new ArrayList<>();
requestsByShard.put(shardId, list);
}
list.add(new BulkItemRequest(i, request));
} else if (request instanceof DeleteRequest) {
DeleteRequest deleteRequest = (DeleteRequest) request;
String concreteIndex = concreteIndices.getConcreteIndex(deleteRequest.index());
ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, deleteRequest.type(), deleteRequest.id(), deleteRequest.routing()).shardId();
List<BulkItemRequest> list = requestsByShard.get(shardId);
if (list == null) {
list = new ArrayList<>();
requestsByShard.put(shardId, list);
}
list.add(new BulkItemRequest(i, request));
} else if (request instanceof UpdateRequest) {
UpdateRequest updateRequest = (UpdateRequest) request;
String concreteIndex = concreteIndices.getConcreteIndex(updateRequest.index());
ShardId shardId = clusterService.operationRouting().indexShards(clusterState, concreteIndex, updateRequest.type(), updateRequest.id(), updateRequest.routing()).shardId();
List<BulkItemRequest> list = requestsByShard.get(shardId);
if (list == null) {
list = new ArrayList<>();
requestsByShard.put(shardId, list);
}
list.add(new BulkItemRequest(i, request));
}
}
With the shardId resolved, the requests are grouped by shard so that each shard's batch can be processed separately. The shardId itself comes from generateShardId:
private int generateShardId(ClusterState clusterState, String index,
String type, String id, @Nullable String routing) {
IndexMetaData indexMetaData = clusterState.metaData().index(index);
if (indexMetaData == null) {
throw new IndexNotFoundException(index);
}
final Version createdVersion = indexMetaData.getCreationVersion();
final HashFunction hashFunction = indexMetaData.getRoutingHashFunction();
final boolean useType = indexMetaData.getRoutingUseType();
final int hash;
if (routing == null) {
if (!useType) {
hash = hash(hashFunction, id);
} else {
hash = hash(hashFunction, type, id);
}
} else {
hash = hash(hashFunction, routing);
}
if (createdVersion.onOrAfter(Version.V_2_0_0_beta1)) {
return MathUtils.mod(hash, indexMetaData.getNumberOfShards());
} else {
return Math.abs(hash % indexMetaData.getNumberOfShards());
}
}
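Stripped of the version/legacy-hash handling, the routing rule boils down to "hash the routing value if present, otherwise the id, then take a positive modulo of the shard count". A self-contained sketch (String.hashCode() stands in for the real Murmur3-based HashFunction and is for illustration only):

public class ShardRoutingSketch {
    // Simplified stand-in for generateShardId: hash the routing value if present,
    // otherwise the document id, then take a positive modulo of the shard count.
    static int shardId(String id, String routing, int numberOfShards) {
        String effectiveRouting = routing != null ? routing : id;
        int hash = effectiveRouting.hashCode();           // illustrative; ES uses Murmur3
        return Math.floorMod(hash, numberOfShards);       // like MathUtils.mod(hash, numberOfShards)
    }

    public static void main(String[] args) {
        System.out.println(shardId("doc-1", null, 5));      // routed by id
        System.out.println(shardId("doc-1", "user-42", 5)); // explicit routing wins
    }
}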
With the ShardId, the BulkRequest and each List<BulkItemRequest> in hand, the map is traversed and each entry is wrapped into a BulkShardRequest: essentially a BulkRequest-like object built for the data belonging to one shard, carrying the consistencyLevel and timeout settings.
for (Map.Entry<ShardId, List<BulkItemRequest>> entry : requestsByShard.entrySet()) {
final ShardId shardId = entry.getKey();
final List<BulkItemRequest> requests = entry.getValue();
BulkShardRequest bulkShardRequest = new BulkShardRequest(bulkRequest, shardId, bulkRequest.refresh(), requests.toArray(new BulkItemRequest[requests.size()]));
bulkShardRequest.consistencyLevel(bulkRequest.consistencyLevel());
bulkShardRequest.timeout(bulkRequest.timeout());
//the shardBulkAction here is TransportShardBulkAction
shardBulkAction.execute(bulkShardRequest, new ActionListener<BulkShardResponse>() {
@Override
public void onResponse(BulkShardResponse bulkShardResponse) {
for (BulkItemResponse bulkItemResponse : bulkShardResponse.getResponses()) {
// we may have no response if item failed
if (bulkItemResponse.getResponse() != null) {
bulkItemResponse.getResponse().setShardInfo(bulkShardResponse.getShardInfo());
}
responses.set(bulkItemResponse.getItemId(), bulkItemResponse);
}
if (counter.decrementAndGet() == 0) {
finishHim();
}
}
·······························
}
Given the inheritance chain TransportShardBulkAction < TransportReplicationAction < TransportAction, shardBulkAction.execute is really driven by TransportReplicationAction. The entry point is that class's doExecute method:
/**
* Responsible for routing and retrying failed operations on the primary.
* The actual primary operation is done in {@link PrimaryPhase} on the
* node with primary copy.
*
* Resolves index and shard id for the request before routing it to target node
*/
@Override
protected void doExecute(Task task, Request request, ActionListener<Response> listener) {
new ReroutePhase((ReplicationTask) task, request, listener).run();
}
Note: data is not copied from the primary shard to the replicas; the write is applied to the primary and to each replica separately, so corrupted data on the primary cannot propagate to the replicas.
After a document is indexed it goes through two phases:
write the data to the primary shard
write the data to the replica shards
ReroutePhase first obtains the primary shard from the cluster state; if the primary shard is not active there is a retry mechanism. If the primary lives on the local node the operation is executed directly, otherwise the request is forwarded to the node that holds the shard.
/**
* Responsible for routing and retrying failed operations on the primary.
* The actual primary operation is done in {@link PrimaryPhase} on the
* node with primary copy.
*
* Resolves index and shard id for the request before routing it to target node
*/
final class ReroutePhase extends AbstractRunnable {
@Override
protected void doRun() {
setPhase(task, "routing");
final ClusterState state = observer.observedState();
ClusterBlockException blockException = state.blocks().globalBlockedException(globalBlockLevel());
if (blockException != null) {
handleBlockException(blockException);
return;
}
final String concreteIndex = resolveIndex() ? indexNameExpressionResolver.concreteSingleIndex(state, request) : request.index();
blockException = state.blocks().indexBlockedException(indexBlockLevel(), concreteIndex);
if (blockException != null) {
handleBlockException(blockException);
return;
}
//the request does not carry a shardId yet; the concrete index is passed in so the shardId can be resolved
resolveRequest(state.metaData(), concreteIndex, request);
assert request.shardId() != null : "request shardId must be set in resolveRequest";
IndexShardRoutingTable indexShard = state.getRoutingTable().shardRoutingTable(request.shardId().getIndex(), request.shardId().id());
final ShardRouting primary = indexShard.primaryShard();
//if the primary shard is not active, a retry is scheduled
if (primary == null || primary.active() == false) {
logger.trace("primary shard [{}] is not yet active, scheduling a retry: action [{}], request [{}], cluster state version [{}]", request.shardId(), actionName, request, state.version());
retryBecauseUnavailable(request.shardId(), "primary shard is not active");
return;
}
if (state.nodes().nodeExists(primary.currentNodeId()) == false) {
logger.trace("primary shard [{}] is assigned to an unknown node [{}], scheduling a retry: action [{}], request [{}], cluster state version [{}]", request.shardId(), primary.currentNodeId(), actionName, request, state.version());
retryBecauseUnavailable(request.shardId(), "primary shard isn't assigned to a known node.");
return;
}
final DiscoveryNode node = state.nodes().get(primary.currentNodeId());
taskManager.registerChildTask(task, node.getId());
//if the primary is on the local node, execute directly; otherwise send the request to the node holding the shard
if (primary.currentNodeId().equals(state.nodes().localNodeId())) {
setPhase(task, "waiting_on_primary");
if (logger.isTraceEnabled()) {
logger.trace("send action [{}] on primary [{}] for request [{}] with cluster state version [{}] to [{}] ", transportPrimaryAction, request.shardId(), request, state.version(), primary.currentNodeId());
}
performAction(node, transportPrimaryAction, true);
} else {
if (logger.isTraceEnabled()) {
logger.trace("send action [{}] on primary [{}] for request [{}] with cluster state version [{}] to [{}]", actionName, request.shardId(), request, state.version(), primary.currentNodeId());
}
setPhase(task, "rerouted");
performAction(node, actionName, false);
}
}
private void performAction(final DiscoveryNode node, final String action, final boolean isPrimaryAction) {
transportService.sendRequest(node, action, request, transportOptions, new BaseTransportResponseHandler<Response>() {
......
}}
TransportService.java :
private void sendLocalRequest(long requestId, final String action, final TransportRequest request) {
final DirectResponseChannel channel = new DirectResponseChannel(logger, localNode, action, requestId, adapter, threadPool);
try {
final RequestHandlerRegistry reg = adapter.getRequestHandler(action);
if (reg == null) {
throw new ActionNotFoundTransportException("Action [" + action + "] not found");
}
final String executor = reg.getExecutor();
if (ThreadPool.Names.SAME.equals(executor)) {
//noinspection unchecked
reg.processMessageReceived(request, channel);
} else {
threadPool.executor(executor).execute(new AbstractRunnable() {
@Override
protected void doRun() throws Exception {
//noinspection unchecked
reg.processMessageReceived(request, channel);
}
@Override
public boolean isForceExecution() {
return reg.isForceExecution();
}
@Override
public void onFailure(Throwable e) {
try {
channel.sendResponse(e);
} catch (Throwable e1) {
logger.warn("failed to notify channel of error message for action [" + action + "]", e1);
logger.warn("actual exception", e);
}
}
});
}
} catch (Throwable e) {
try {
channel.sendResponse(e);
} catch (Throwable e1) {
logger.warn("failed to notify channel of error message for action [" + action + "]", e1);
logger.warn("actual exception", e1);
}
}
}
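The executor dispatch in sendLocalRequest is worth calling out: the SAME executor means "run the handler on the calling thread", anything else hands the work to the named thread pool. A plain-Java sketch of that pattern (names are illustrative, not the ES API):

import java.util.concurrent.ExecutorService;

public class ExecutorDispatchSketch {
    // Simplified version of the dispatch above: "same" runs the handler inline,
    // any other executor name submits it to the corresponding thread pool.
    static void dispatch(String executorName, ExecutorService namedPool, Runnable handler) {
        if ("same".equals(executorName)) {
            handler.run();               // like ThreadPool.Names.SAME
        } else {
            namedPool.execute(handler);  // like threadPool.executor(executor).execute(...)
        }
    }
}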
//in the end the request is handled by PrimaryPhase's doRun on the node holding the primary
class PrimaryOperationTransportHandler extends TransportRequestHandler<Request> {
@Override
public void messageReceived(final Request request, final TransportChannel channel) throws Exception {
throw new UnsupportedOperationException("the task parameter is required for this operation");
}
@Override
public void messageReceived(Request request, TransportChannel channel, Task task) throws Exception {
new PrimaryPhase((ReplicationTask) task, request, channel).run();
}
}
/**
* Responsible for performing primary operation locally and delegating to replication action once successful
*
* Note that as soon as we move to replication action, state responsibility is transferred to {@link ReplicationPhase}.
*/
final class PrimaryPhase extends AbstractRunnable {
................
@Override
public void onFailure(Throwable e) {
finishAsFailed(e);
}
@Override
protected void doRun() throws Exception {
setPhase(task, "primary");
// request shardID was set in ReroutePhase
assert request.shardId() != null : "request shardID must be set prior to primary phase";
final ShardId shardId = request.shardId();
final String writeConsistencyFailure = checkWriteConsistency(shardId);
if (writeConsistencyFailure != null) {
finishBecauseUnavailable(shardId, writeConsistencyFailure);
return;
}
final ReplicationPhase replicationPhase;
try {
indexShardReference = getIndexShardOperationsCounter(shardId);
//the primary performs the index write
Tuple<Response, ReplicaRequest> primaryResponse = shardOperationOnPrimary(state.metaData(), request);
if (logger.isTraceEnabled()) {
logger.trace("action [{}] completed on shard [{}] for request [{}] with cluster state version [{}]", transportPrimaryAction, shardId, request, state.version());
}
replicationPhase = new ReplicationPhase(task, primaryResponse.v2(), primaryResponse.v1(), shardId, channel,
indexShardReference);
} catch (Throwable e) {
request.setCanHaveDuplicates();
if (ExceptionsHelper.status(e) == RestStatus.CONFLICT) {
if (logger.isTraceEnabled()) {
logger.trace("failed to execute [{}] on [{}]", e, request, shardId);
}
} else {
if (logger.isDebugEnabled()) {
logger.debug("failed to execute [{}] on [{}]", e, request, shardId);
}
}
finishAsFailed(e);
return;
}
//the primary is done; hand the request over to the replication phase
finishAndMoveToReplication(replicationPhase);
}
/**
* checks whether we can perform a write based on the write consistency setting
* returns *null* if OK to proceed, or a string describing the reason to stop
*/
String checkWriteConsistency(ShardId shardId) {
......
}
/**
* upon success, finish the first phase and transfer responsibility to the {@link ReplicationPhase}
*/
void finishAndMoveToReplication(ReplicationPhase replicationPhase) {
if (finished.compareAndSet(false, true)) {
replicationPhase.run();
} else {
assert false : "finishAndMoveToReplication called but operation is already finished";
}
}
......
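checkWriteConsistency (elided above) is what gates the primary write on the configured WriteConsistencyLevel: it requires a minimum number of active shard copies before proceeding. A simplified sketch of that calculation, assuming the 2.x behaviour where QUORUM only applies when there are more than two copies:

public class WriteConsistencySketch {
    enum WriteConsistencyLevel { ONE, QUORUM, ALL }

    // Simplified: how many active copies (primary + replicas) are required before
    // the primary is allowed to proceed. totalCopies = 1 primary + number_of_replicas.
    static int requiredActiveCopies(WriteConsistencyLevel level, int totalCopies) {
        switch (level) {
            case ONE:    return 1;
            case ALL:    return totalCopies;
            case QUORUM: return totalCopies > 2 ? (totalCopies / 2) + 1 : 1;
            default:     throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        // index with 2 replicas -> 3 copies: quorum needs 2 active copies
        System.out.println(requiredActiveCopies(WriteConsistencyLevel.QUORUM, 3)); // 2
        // index with 1 replica -> 2 copies: the quorum check is effectively skipped
        System.out.println(requiredActiveCopies(WriteConsistencyLevel.QUORUM, 2)); // 1
    }
}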
The primary write: execution then enters shardOperationOnPrimary, which is implemented in the grand-subclass TransportShardBulkAction.
protected Tuple<BulkShardResponse, BulkShardRequest> shardOperationOnPrimary(MetaData metaData, BulkShardRequest request) {
......
//version numbers are what make concurrent modification safe
long[] preVersions = new long[request.items().length];
VersionType[] preVersionTypes = new VersionType[request.items().length];
Translog.Location location = null;
//the request here is a BulkShardRequest and its items are BulkItemRequests; the loop branches on the item type
//into three cases: IndexRequest, DeleteRequest and UpdateRequest
for (int requestIndex = 0; requestIndex < request.items().length; requestIndex++) {
BulkItemRequest item = request.items()[requestIndex];
if (item.request() instanceof IndexRequest) {
IndexRequest indexRequest = (IndexRequest) item.request();
preVersions[requestIndex] = indexRequest.version();
preVersionTypes[requestIndex] = indexRequest.versionType();
try {
WriteResult<IndexResponse> result = shardIndexOperation(request, indexRequest, metaData, indexShard, true);
location = locationToSync(location, result.location);
// add the response
IndexResponse indexResponse = result.response();
setResponse(item, new BulkItemResponse(item.id(), indexRequest.opType().lowercase(), indexResponse));
} catch (Throwable e) {
// rethrow the failure if we are going to retry on primary and let parent failure to handle it
if (retryPrimaryException(e)) {
// restore updated versions...
for (int j = 0; j < requestIndex; j++) {
applyVersion(request.items()[j], preVersions[j], preVersionTypes[j]);
}
throw (ElasticsearchException) e;
}
if (ExceptionsHelper.status(e) == RestStatus.CONFLICT) {
logger.trace("{} failed to execute bulk item (index) {}", e, request.shardId(), indexRequest);
} else {
logger.debug("{} failed to execute bulk item (index) {}", e, request.shardId(), indexRequest);
}
// if its a conflict failure, and we already executed the request on a primary (and we execute it
// again, due to primary relocation and only processing up to N bulk items when the shard gets closed)
// then just use the response we got from the successful execution
if (item.getPrimaryResponse() != null && isConflictException(e)) {
setResponse(item, item.getPrimaryResponse());
} else {
setResponse(item, new BulkItemResponse(item.id(), indexRequest.opType().lowercase(),
new BulkItemResponse.Failure(request.index(), indexRequest.type(), indexRequest.id(), e)));
}
}
} else if (item.request() instanceof DeleteRequest) {
······
} else if (item.request() instanceof UpdateRequest) {
······
}
}
processAfterWrite(request.refresh(), indexShard, location);
BulkItemResponse[] responses = new BulkItemResponse[request.items().length];
BulkItemRequest[] items = request.items();
for (int i = 0; i < items.length; i++) {
responses[i] = items[i].getPrimaryResponse();
}
return new Tuple<>(new BulkShardResponse(request.shardId(), responses), request);
}
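The per-item responses collected here are what the caller eventually receives: a bulk request does not fail as a whole when individual items fail. A minimal client-side sketch of inspecting those failures (assuming the standard Java API of this Elasticsearch version; client and listener wiring omitted):

import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;

public class BulkResponseHandling {
    // Inspect each item of the bulk response; only some items may have failed.
    static void logFailures(BulkResponse bulkResponse) {
        if (!bulkResponse.hasFailures()) {
            return;
        }
        for (BulkItemResponse item : bulkResponse) {
            if (item.isFailed()) {
                System.err.println("item " + item.getItemId() + " on [" + item.getIndex()
                        + "] failed: " + item.getFailureMessage());
            }
        }
    }
}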
The core method nested inside shardIndexOperation is executeIndexRequestOnPrimary; its first step is to obtain the Operation object.
The Engine is the wrapper around Lucene's IndexWriter, Searcher and friends. The Engine.IndexingOperation here is either a Create or an Index instance; think of them as the pending operation for the document to be indexed.
/**
* Execute the given {@link IndexRequest} on a primary shard, throwing a
* {@link RetryOnPrimaryException} if the operation needs to be re-tried.
*/
public static WriteResult<IndexResponse> executeIndexRequestOnPrimary(BulkShardRequest shardRequest, IndexRequest request, IndexShard indexShard, MappingUpdatedAction mappingUpdatedAction) throws Throwable {
//build the indexing operation; it will be carried out later by operation.execute
Engine.IndexingOperation operation = prepareIndexOperationOnPrimary(shardRequest, request, indexShard);
//step two: check whether the index mapping needs a dynamic update, and if so push it to the master
Mapping update = operation.parsedDoc().dynamicMappingsUpdate();
final ShardId shardId = indexShard.shardId();
if (update != null) {
final String indexName = shardId.getIndex();
mappingUpdatedAction.updateMappingOnMasterSynchronously(indexName, request.type(), update);
operation = prepareIndexOperationOnPrimary(shardRequest, request, indexShard);
update = operation.parsedDoc().dynamicMappingsUpdate();
if (update != null) {
throw new RetryOnPrimaryException(shardId,
"Dynamic mappings are not available on the node that holds the primary yet");
}
}
//finally, execute performs the actual indexing work
final boolean created = operation.execute(indexShard);
//once execution finishes, the document's version and related info are available and are written back onto the IndexRequest
// update the version on request so it will happen on the replicas
final long version = operation.version();
request.version(version);
request.versionType(request.versionType().versionTypeForReplicationAndRecovery());
assert request.versionType().validateVersionForWrites(request.version());
return new WriteResult<>(new IndexResponse(shardId.getIndex(), request.type(), request.id(), request.version(), created), operation.getTranslogLocation());
}
On the primary node, the requests inside the BulkItemRequests produced above are walked one by one and each is prepared (this includes generating the uid, handling nested and parent-child types, and so on); based on the opType field discussed earlier, this yields either an INDEX or a CREATE operation.
/**
* Utility method to create either an index or a create operation depending
* on the {@link IndexRequest.OpType} of the request.
*/
public static Engine.IndexingOperation prepareIndexOperationOnPrimary(BulkShardRequest shardRequest, IndexRequest request, IndexShard indexShard) {
SourceToParse sourceToParse = SourceToParse.source(SourceToParse.Origin.PRIMARY, request.source()).index(request.index()).type(request.type()).id(request.id())
.routing(request.routing()).parent(request.parent()).timestamp(request.timestamp()).ttl(request.ttl());
boolean canHaveDuplicates = request.canHaveDuplicates();
if (shardRequest != null) {
canHaveDuplicates |= shardRequest.canHaveDuplicates();
}
if (request.opType() == IndexRequest.OpType.INDEX) {
return indexShard.prepareIndexOnPrimary(sourceToParse, request.version(), request.versionType(), canHaveDuplicates);
} else {
assert request.opType() == IndexRequest.OpType.CREATE : request.opType();
return indexShard.prepareCreateOnPrimary(sourceToParse, request.version(), request.versionType(), canHaveDuplicates, request.autoGeneratedId());
}
}
static Engine.Index prepareIndex(DocumentMapperForType docMapper, SourceToParse source, long version, VersionType versionType, Engine
.Operation.Origin origin, boolean canHaveDuplicates) {
long startTime = System.nanoTime();
/** parse the JSON source into a ParsedDocument */
ParsedDocument doc = docMapper.getDocumentMapper().parse(source);
if (docMapper.getMapping() != null) {
doc.addDynamicMappingsUpdate(docMapper.getMapping());
}
return new Engine.Index(docMapper.getDocumentMapper().uidMapper().term(doc.uid().stringValue()), doc, version, versionType,
origin, startTime, canHaveDuplicates);
}
Let's look at the CREATE path first.
In the end, the indexShard object's create method is called to perform the indexing:
IndexShard.java
public void create(Engine.Create create) {
ensureWriteAllowed(create);
markLastWrite();
create = indexingService.preCreate(create);
try {
if (logger.isTraceEnabled()) {
logger.trace("index [{}][{}]{}", create.type(), create.id(), create.docs());
}
engine().create(create);
create.endTime(System.nanoTime());
} catch (Throwable ex) {
indexingService.postCreate(create, ex);
throw ex;
}
indexingService.postCreate(create);
}
engine() returns the Engine instance; the create call ultimately lands in InternalEngine.innerCreate, which performs the indexing.
Because writes are concurrent, every write takes a lock: synchronized (dirtyLock(create.uid())) keys the lock on the uid, preventing concurrent writes to the same document from producing dirty data.
private void innerCreate(Create create) throws IOException {
/** No version check is needed when all three of the following hold:
 * - index.optimize_auto_generated_id is set to true
 * - the id was auto-generated
 * - create.canHaveDuplicates == false
 * With auto-generated IDs the version check can be skipped entirely, which speeds up ingestion.
 */
if (engineConfig.isOptimizeAutoGenerateId() && create.autoGeneratedId() && !create.canHaveDuplicates()) {
// We don't need to lock because this ID cannot be concurrently updated:
innerCreateNoLock(create, Versions.NOT_FOUND, null);
} else {
synchronized (dirtyLock(create.uid())) {
final long currentVersion;
final VersionValue versionValue;
//if the document's version is not found in the in-memory versionMap, the code below loads the current version from the index on disk
versionValue = versionMap.getUnderLock(create.uid().bytes());
if (versionValue == null) {
currentVersion = loadCurrentVersionFromIndex(create.uid());
} else {
if (engineConfig.isEnableGcDeletes() && versionValue.delete() && (engineConfig.getThreadPool().estimatedTimeInMillis
() - versionValue.time()) > engineConfig.getGcDeletesInMillis()) {
currentVersion = Versions.NOT_FOUND; // deleted, and GC
} else {
currentVersion = versionValue.version();
}
}
innerCreateNoLock(create, currentVersion, versionValue);
}
}
}
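dirtyLock(create.uid()) hands out a lock object selected by the uid's hash from a fixed pool, so writes to different documents rarely contend while writes to the same document always serialize. A simplified stand-in (not the actual InternalEngine implementation):

public class UidLockSketch {
    // Simplified stand-in for InternalEngine's dirtyLock(): a fixed pool of lock objects,
    // selected by the uid's hash. Writes to the same uid always map to the same lock.
    private final Object[] locks;

    UidLockSketch(int concurrency) {
        locks = new Object[concurrency];
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new Object();
        }
    }

    Object lockFor(String uid) {
        return locks[Math.floorMod(uid.hashCode(), locks.length)];
    }

    void write(String uid, Runnable indexOperation) {
        synchronized (lockFor(uid)) {   // like synchronized (dirtyLock(create.uid()))
            indexOperation.run();
        }
    }
}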
By comparing the version carried by the create operation with the version loaded from the index, the engine decides whether this ends up as an update or a create.
The actual index write is then issued, e.g. indexWriter.updateDocuments(create.uid(), create.docs()); the details are left to Lucene.
private void innerCreateNoLock(Create create, long currentVersion, VersionValue versionValue) throws IOException {
// same logic as index
long updatedVersion;
long expectedVersion = create.version();
if (create.versionType().isVersionConflictForWrites(currentVersion, expectedVersion)) {
if (create.origin() == Operation.Origin.RECOVERY) {
return;
} else {
throw new VersionConflictEngineException(shardId, create.type(), create.id(), currentVersion, expectedVersion);
}
}
updatedVersion = create.versionType().updateVersion(currentVersion, expectedVersion);
// if the doc exists
boolean doUpdate = false;
if ((versionValue != null && versionValue.delete() == false) || (versionValue == null && currentVersion != Versions.NOT_FOUND)) {
if (create.origin() == Operation.Origin.RECOVERY) {
return;
} else if (create.origin() == Operation.Origin.REPLICA) {
// #7142: the primary already determined it's OK to index this document, and we confirmed above that the version doesn't
// conflict, so we must also update here on the replica to remain consistent:
doUpdate = true;
} else if (create.origin() == Operation.Origin.PRIMARY && create.autoGeneratedId() && create.canHaveDuplicates() &&
currentVersion == 1 && create.version() == Versions.MATCH_ANY) {
/**
* If bulk index request fails due to a disconnect, unavailable shard etc. then the request is
* retried before it actually fails. However, the documents might already be indexed.
* For autogenerated ids this means that a version conflict will be reported in the bulk request
* although the document was indexed properly.
* To avoid this we have to make sure that the index request is treated as an update and set updatedVersion to 1.
* See also discussion on https://github.com/elasticsearch/elasticsearch/pull/9125
*/
doUpdate = true;
updatedVersion = 1;
} else {
// On primary, we throw DAEE if the _uid is already in the index with an older version:
assert create.origin() == Operation.Origin.PRIMARY;
throw new DocumentAlreadyExistsException(shardId, create.type(), create.id());
}
}
create.updateVersion(updatedVersion);
if (doUpdate) {
if (create.docs().size() > 1) {
indexWriter.updateDocuments(create.uid(), create.docs());
} else {
indexWriter.updateDocument(create.uid(), create.docs().get(0));
}
} else {
if (create.docs().size() > 1) {
indexWriter.addDocuments(create.docs());
} else {
indexWriter.addDocument(create.docs().get(0));
}
}
//write to the translog
Translog.Location translogLocation = translog.add(new Translog.Create(create));
versionMap.putUnderLock(create.uid().bytes(), new VersionValue(updatedVersion, translogLocation));
create.setTranslogLocation(translogLocation);
indexingService.postCreateUnderLock(create);
}
Now the INDEX path:
The uid is used as a term to look up the document's version in the index; whether a version is found decides between add and update. The update is really a delete-then-insert: the old document is marked deleted and the new one is inserted, and documents marked deleted are purged during segment merges.
private boolean innerIndex(Index index) throws IOException {
synchronized (dirtyLock(index.uid())) {
final long currentVersion;
VersionValue versionValue = versionMap.getUnderLock(index.uid().bytes());
if (versionValue == null) {
currentVersion = loadCurrentVersionFromIndex(index.uid());
} else {
if (engineConfig.isEnableGcDeletes() && versionValue.delete() && (engineConfig.getThreadPool().estimatedTimeInMillis() -
versionValue.time()) > engineConfig.getGcDeletesInMillis()) {
currentVersion = Versions.NOT_FOUND; // deleted, and GC
} else {
currentVersion = versionValue.version();
}
}
long updatedVersion;
long expectedVersion = index.version();
if (index.versionType().isVersionConflictForWrites(currentVersion, expectedVersion)) {
if (index.origin() == Operation.Origin.RECOVERY) {
return false;
} else {
throw new VersionConflictEngineException(shardId, index.type(), index.id(), currentVersion, expectedVersion);
}
}
updatedVersion = index.versionType().updateVersion(currentVersion, expectedVersion);
final boolean created;
index.updateVersion(updatedVersion);
if (currentVersion == Versions.NOT_FOUND) {
// document does not exists, we can optimize for create
created = true;
if (index.docs().size() > 1) {
indexWriter.addDocuments(index.docs());
} else {
indexWriter.addDocument(index.docs().get(0));
}
} else {
if (versionValue != null) {
created = versionValue.delete(); // we have a delete which is not GC'ed...
} else {
created = false;
}
if (index.docs().size() > 1) {
indexWriter.updateDocuments(index.uid(), index.docs());
} else {
indexWriter.updateDocument(index.uid(), index.docs().get(0));
}
}
Translog.Location translogLocation = translog.add(new Translog.Index(index));
versionMap.putUnderLock(index.uid().bytes(), new VersionValue(updatedVersion, translogLocation));
index.setTranslogLocation(translogLocation);
indexingService.postIndexUnderLock(index);
return created;
}
}
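Underneath both paths is the plain Lucene IndexWriter: a brand-new document goes through addDocument(), while an existing _uid goes through updateDocument(), which atomically marks the old document as deleted and adds the new one. A minimal standalone Lucene sketch of those two calls, assuming the Lucene 5.x API bundled with this Elasticsearch version (field names are illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class LuceneWriteSketch {
    public static void main(String[] args) throws Exception {
        try (IndexWriter writer = new IndexWriter(new RAMDirectory(),
                new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("_uid", "doc#1", Field.Store.YES)); // uid term, like ES's _uid
            doc.add(new TextField("message", "hello", Field.Store.YES));

            writer.addDocument(doc);                              // create path
            writer.updateDocument(new Term("_uid", "doc#1"), doc); // update = delete old + add new
            writer.commit();
        }
    }
}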
At this point the indexing is done, but the data has only been written to the in-memory indexing buffer and the translog. Refresh and flush happen afterwards to make the data searchable and durable.
If the request asks for a refresh, it is performed immediately; likewise, if the request asks for the translog to be written to storage right away, that is done immediately as well.
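For completeness, that refresh behaviour is requested by the caller on the BulkRequest itself (and read back as bulkRequest.refresh() in the code above). A small usage sketch, assuming the 2.x Java API:

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;

public class BulkRefreshSketch {
    static BulkRequest buildRequest() {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.add(new IndexRequest("logs", "doc", "1").source("{\"message\":\"hello\"}"));
        bulkRequest.refresh(true);   // ask for a refresh of the affected shards once the bulk completes
        return bulkRequest;
    }
}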
The replica write flow:
The replication flow largely mirrors the primary one. ReplicationPhase's doRun method is the entry point; its core method performOnReplica sends the replica operation to each target node.
On each node that holds a replica copy, the request is handled asynchronously and eventually calls shardOperationOnReplica. The overall logic is as follows:
/**
* Responsible for sending replica requests (see {@link AsyncReplicaAction}) to nodes with replica copy, including
* relocating copies
*/
final class ReplicationPhase extends AbstractRunnable {
......
/**
* start sending replica requests to target nodes
*/
@Override
protected void doRun() {
setPhase(task, "replicating");
if (pending.get() == 0) {
doFinish();
return;
}
for (ShardRouting shard : shards) {
if (shard.primary() == false && executeOnReplica == false) {
// If the replicas use shadow replicas, there is no reason to
// perform the action on the replica, so skip it and
// immediately return
// this delays mapping updates on replicas because they have
// to wait until they get the new mapping through the cluster
// state, which is why we recommend pre-defined mappings for
// indices using shadow replicas
continue;
}
if (shard.unassigned()) {
continue;
}
// we index on a replica that is initializing as well since we might not have got the event
// yet that it was started. We will get an exception IllegalShardState exception if its not started
// and that's fine, we will ignore it
// we never execute replication operation locally as primary operation has already completed locally
// hence, we ignore any local shard for replication
if (nodes.localNodeId().equals(shard.currentNodeId()) == false) {
performOnReplica(shard);
}
// send operation to relocating shard
if (shard.relocating()) {
performOnReplica(shard.buildTargetRelocatingShard());
}
}
}
/**
* send replica operation to target node
*/
void performOnReplica(final ShardRouting shard) {
// if we don't have that node, it means that it might have failed and will be created again, in
// this case, we don't have to do the operation, and just let it failover
final String nodeId = shard.currentNodeId();
if (!nodes.nodeExists(nodeId)) {
logger.trace("failed to send action [{}] on replica [{}] for request [{}] due to unknown node [{}]", transportReplicaAction, shard.shardId(), replicaRequest, nodeId);
onReplicaFailure(nodeId, null);
return;
}
if (logger.isTraceEnabled()) {
logger.trace("send action [{}] on replica [{}] for request [{}] to [{}]", transportReplicaAction, shard.shardId(), replicaRequest, nodeId);
}
final DiscoveryNode node = nodes.get(nodeId);
transportService.sendRequest(node, transportReplicaAction, replicaRequest, transportOptions, new EmptyTransportResponseHandler(ThreadPool.Names.SAME) {
.......
});
}
class ReplicaOperationTransportHandler extends TransportRequestHandler<ReplicaRequest> {
@Override
public void messageReceived(final ReplicaRequest request, final TransportChannel channel) throws Exception {
throw new UnsupportedOperationException("the task parameter is required for this operation");
}
@Override
public void messageReceived(ReplicaRequest request, TransportChannel channel, Task task) throws Exception {
new AsyncReplicaAction(request, channel, (ReplicationTask) task).run();
}
}
The actual execution happens in AsyncReplicaAction's doRun:
protected void doRun() throws Exception {
setPhase(task, "replica");
assert request.shardId() != null : "request shardId must be set";
try (Releasable ignored = getIndexShardOperationsCounter(request.shardId())) {
shardOperationOnReplica(request);
if (logger.isTraceEnabled()) {
logger.trace("action [{}] completed on shard [{}] for request [{}]", transportReplicaAction, request.shardId(), request);
}
}
setPhase(task, "finished");
channel.sendResponse(TransportResponse.Empty.INSTANCE);
}
In the replica phase, shardOperationOnReplica performs the same steps: parsing the document source, applying dynamic mapping additions, and finally indexing via operation.execute. On both the primary and the replicas, once a BulkShardRequest (i.e. the batch of data for one shard) has been processed, the translog is flushed.
After the data has been indexed, if the request asks for a refresh it is performed immediately, and if it asks for the translog to be written to storage right away that is done immediately as well.