还记得我们在JobScheduler中,在创建任务详情时,会调用一个建造器JobBuilder来创建一个Job,类型是LiteJob。
LiteJob.java
/**
* Lite调度作业.
*
* @author zhangliang
*/
public final class LiteJob implements Job {
@Setter
private ElasticJob elasticJob;
@Setter
private JobFacade jobFacade;
@Override
public void execute(final JobExecutionContext context) throws JobExecutionException {
JobExecutorFactory.getJobExecutor(elasticJob, jobFacade).execute();
}
}
进入到LiteJob,我们可以看到,它继承自quartz中的Job,同时新增了两个属性elasticJob和jobFacade,这个我们后续分析。我们关注的是execute方法。首先通过工厂模式,确定了执行器,我们可以看到有三种执行器,分别是ScriptJobExecutor、SimpleJobExecutor和DataflowJobExecutor,分别对应了三种job类型。由于SimpleJob覆盖了80%的使用场景,我们主要来分析一下SimpleJobExecutor。
SimpleExecutor.java
public final class SimpleJobExecutor extends AbstractElasticJobExecutor {
private final SimpleJob simpleJob;
public SimpleJobExecutor(final SimpleJob simpleJob, final JobFacade jobFacade) {
super(jobFacade);
this.simpleJob = simpleJob;
}
@Override
protected void process(final ShardingContext shardingContext) {
simpleJob.execute(shardingContext);
}
}
这个执行器继承自AbstractElasticJobExecutor,然后里面实现的内容也很简单,子类需要实现父类的方法process,其他的方法在父类中执行。我们重点看一下AbstractElasticJobExecutor这个基础执行器。
AbstractElasticJobExecutor.java
从代码结构看,主要看几个方法,execute()和process(),这边都是前后依赖的,所以我们顺序看一下。
execute()
try {
jobFacade.checkJobExecutionEnvironment();
} catch (final JobExecutionEnvironmentException cause) {
jobExceptionHandler.handleException(jobName, cause);
}
首先检查运行环境信息。跟进去,我们可以发现,检查的内容是本机与注册中心的时间误差秒数是否在允许范围,就是我们配置的max-time-diff-seconds,超过范围就直接抛出异常。
ShardingContexts shardingContexts = jobFacade.getShardingContexts();
if (shardingContexts.isAllowSendJobEvent()) {
jobFacade.postJobStatusTraceEvent(shardingContexts.getTaskId(), State.TASK_STAGING, String.format("Job '%s' execute begin.", jobName));
}
接着,首先获取分片上下文信息。获取分片上下文的具体执行内容为:
@Override
public ShardingContexts getShardingContexts() {
boolean isFailover = configService.load(true).isFailover();
if (isFailover) {
List failoverShardingItems = failoverService.getLocalFailoverItems();
if (!failoverShardingItems.isEmpty()) {
return executionContextService.getJobShardingContext(failoverShardingItems);
}
}
shardingService.shardingIfNecessary();
List shardingItems = shardingService.getLocalShardingItems();
if (isFailover) {
shardingItems.removeAll(failoverService.getLocalTakeOffItems());
}
shardingItems.removeAll(executionService.getDisabledItems(shardingItems));
return executionContextService.getJobShardingContext(shardingItems);
}
首先根据配置判断是否开启失效转移failover,开启表示如果作业在一次任务执行中途宕机,允许将该次未完成的任务在另一作业节点上补偿执行。
下一步,判断是否需要分片。
public void shardingIfNecessary() {
List availableJobInstances = instanceService.getAvailableJobInstances();//获取可用任务节点
if (!isNeedSharding() || availableJobInstances.isEmpty()) {
return;
}
if (!leaderService.isLeaderUntilBlock()) {//判断当前节点是否是主节点,如果主节点正在选举,则阻塞至主节点选举完成后再返回
blockUntilShardingCompleted();//阻塞至分片完成
return;
}
waitingOtherJobCompleted();
LiteJobConfiguration liteJobConfig = configService.load(false);//从配置文件中获取任务配置信息
int shardingTotalCount = liteJobConfig.getTypeConfig().getCoreConfig().getShardingTotalCount();
log.debug("Job '{}' sharding begin.", jobName);
jobNodeStorage.fillEphemeralJobNode(ShardingNode.PROCESSING, "");
resetShardingInfo(shardingTotalCount);//重置分片信息
JobShardingStrategy jobShardingStrategy = JobShardingStrategyFactory.getStrategy(liteJobConfig.getJobShardingStrategyClass());//获取配置文件中的分片策略
jobNodeStorage.executeInTransaction(new PersistShardingInfoTransactionExecutionCallback(jobShardingStrategy.sharding(availableJobInstances, jobName, shardingTotalCount)));//根据不同的策略进行分片,这块后续我们再分析
log.debug("Job '{}' sharding complete.", jobName);
}
分片判断完成后,获取本机的分片项,然后如果开启失效转移,删除本地失效转移项。
言归正传,获取到分片上下文shardingContexts后,判断是否允许发送任务事件。
if (jobFacade.misfireIfRunning(shardingContexts.getShardingItemParameters().keySet())) {//设置任务被错过执行的标记
if (shardingContexts.isAllowSendJobEvent()) {
jobFacade.postJobStatusTraceEvent(shardingContexts.getTaskId(), State.TASK_FINISHED, String.format(
"Previous job '%s' - shardingItems '%s' is still running, misfired job will start after previous job completed.", jobName,
shardingContexts.getShardingItemParameters().keySet()));
}
return;
}
分片项被错过执行,发布任务事件。下一步,准备执行。
try {
jobFacade.beforeJobExecuted(shardingContexts);
} catch (final Throwable cause) {
jobExceptionHandler.handleException(jobName, cause);
}
execute(shardingContexts, JobExecutionEvent.ExecutionSource.NORMAL_TRIGGER);
while (jobFacade.isExecuteMisfired(shardingContexts.getShardingItemParameters().keySet())) {
jobFacade.clearMisfire(shardingContexts.getShardingItemParameters().keySet());
execute(shardingContexts, JobExecutionEvent.ExecutionSource.MISFIRE);
}
jobFacade.failoverIfNecessary();
try {
jobFacade.afterJobExecuted(shardingContexts);
} catch (final Throwable cause) {
jobExceptionHandler.handleException(jobName, cause);
}
先做一些执行前准备(清理上次执行信息),然后执行,判断是否开启失效转移,执行成功后做一些执行后处理。
execute(final ShardingContexts shardingContexts, final JobExecutionEvent.ExecutionSource executionSource)
看代码...
if (shardingContexts.getShardingItemParameters().isEmpty()) {
if (shardingContexts.isAllowSendJobEvent()) {
jobFacade.postJobStatusTraceEvent(shardingContexts.getTaskId(), State.TASK_FINISHED, String.format("Sharding item for job '%s' is empty.", jobName));
}
return;
}
jobFacade.registerJobBegin(shardingContexts);
String taskId = shardingContexts.getTaskId();
if (shardingContexts.isAllowSendJobEvent()) {
jobFacade.postJobStatusTraceEvent(taskId, State.TASK_RUNNING, "");
}
try {
process(shardingContexts, executionSource);
} finally {
// TODO 考虑增加作业失败的状态,并且考虑如何处理作业失败的整体回路
jobFacade.registerJobCompleted(shardingContexts);
if (itemErrorMessages.isEmpty()) {
if (shardingContexts.isAllowSendJobEvent()) {
jobFacade.postJobStatusTraceEvent(taskId, State.TASK_FINISHED, "");
}
} else {
if (shardingContexts.isAllowSendJobEvent()) {
jobFacade.postJobStatusTraceEvent(taskId, State.TASK_ERROR, itemErrorMessages.toString());
}
}
}
主要做一些前期准备和后期的方法。重点是process方法,我们继续看。
process(final ShardingContexts shardingContexts, final JobExecutionEvent.ExecutionSource executionSource)
Collection items = shardingContexts.getShardingItemParameters().keySet();
if (1 == items.size()) {//一个分片,立即执行
int item = shardingContexts.getShardingItemParameters().keySet().iterator().next();
JobExecutionEvent jobExecutionEvent = new JobExecutionEvent(shardingContexts.getTaskId(), jobName, executionSource, item);
process(shardingContexts, item, jobExecutionEvent);
return;
}
final CountDownLatch latch = new CountDownLatch(items.size());//多个分片,使用CountDownLatch,并行执行
for (final int each : items) {
final JobExecutionEvent jobExecutionEvent = new JobExecutionEvent(shardingContexts.getTaskId(), jobName, executionSource, each);
if (executorService.isShutdown()) {
return;
}
executorService.submit(new Runnable() {
@Override
public void run() {
try {
process(shardingContexts, each, jobExecutionEvent);
} finally {
latch.countDown();
}
}
});
}
try {
latch.await();
} catch (final InterruptedException ex) {
Thread.currentThread().interrupt();
}
private void process(final ShardingContexts shardingContexts, final int item, final JobExecutionEvent startEvent)
if (shardingContexts.isAllowSendJobEvent()) {
jobFacade.postJobExecutionEvent(startEvent);
}
log.trace("Job '{}' executing, item is: '{}'.", jobName, item);
JobExecutionEvent completeEvent;
try {
process(new ShardingContext(shardingContexts, item));
completeEvent = startEvent.executionSuccess();
log.trace("Job '{}' executed, item is: '{}'.", jobName, item);
if (shardingContexts.isAllowSendJobEvent()) {
jobFacade.postJobExecutionEvent(completeEvent);
}
} catch (final Throwable cause) {
completeEvent = startEvent.executionFailure(cause);
jobFacade.postJobExecutionEvent(completeEvent);
itemErrorMessages.put(item, ExceptionUtil.transform(cause));
jobExceptionHandler.handleException(jobName, cause);
}