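【Introduction】
The previous post gave an overall introduction to the architecture of xxl-job, a distributed scheduled-task framework. Starting with this post, we turn to the source code to get a basic understanding of how xxl-job is implemented.
【Project Structure】
The project consists of three main modules: xxl-job-admin (the admin center), xxl-job-core (the core library), and xxl-job-executor-samples (examples).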
xxl-job-core
xxl-job-admin
xxl-job-executor-samples
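【Startup Flow】
We start from the Spring configuration file of xxl-job-admin (applicationcontext-xxl-job-admin.xml). It configures three things: the data source, transactions, and the quartz-based job scheduler. The configuration is as follows:
... (omitted here)
... (omitted here)
From this configuration we can see that most of the startup work happens in the init() method of XxlJobDynamicScheduler. The source is as follows: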
public void init() throws Exception {
    // admin registry monitor run
    // Start the auto-registration thread: it loads the executors whose address type is
    // auto-registration and handles automatic registration and discovery of machines
    JobRegistryMonitorHelper.getInstance().start();

    // admin monitor run
    // Start the failed-job log monitor thread
    JobFailMonitorHelper.getInstance().start();

    // admin-server(spring-mvc)
    NetComServerFactory.putService(AdminBiz.class, XxlJobDynamicScheduler.adminBiz);
    NetComServerFactory.setAccessToken(accessToken);

    // init i18n
    initI18n();

    // valid
    Assert.notNull(scheduler, "quartz scheduler is null");
    logger.info(">>>>>>>>> init xxl-job admin success.");
}
The code of JobRegistryMonitorHelper.getInstance().start() is as follows:
public void start(){
    // Create a thread
    registryThread = new Thread(new Runnable() {
        @Override
        public void run() {
            while (!toStop) {
                try {
                    // auto registry group
                    // Load the executors (groups) whose address type is auto-registration
                    List<XxlJobGroup> groupList = XxlJobDynamicScheduler.xxlJobGroupDao.findByAddressType(0);
                    if (CollectionUtils.isNotEmpty(groupList)) {

                        // remove dead address (admin/executor)
                        // Delete registrations that have not been updated within 90 seconds:
                        // no heartbeat for 90 seconds means the machine has a problem, so it is removed
                        XxlJobDynamicScheduler.xxlJobRegistryDao.removeDead(RegistryConfig.DEAD_TIMEOUT);

                        // fresh online address (admin/executor)
                        HashMap<String, List<String>> appAddressMap = new HashMap<String, List<String>>();
                        // Query the machines whose registration was updated within the last 90 seconds
                        List<XxlJobRegistry> list = XxlJobDynamicScheduler.xxlJobRegistryDao.findAll(RegistryConfig.DEAD_TIMEOUT);
                        if (list != null) {
                            // Loop over the registered machines and split them up by executor
                            for (XxlJobRegistry item: list) {
                                // Check whether the RegistryGroup of this registration is RegistType.EXECUTOR;
                                // EXECUTOR means the machine registered itself as an executor node
                                if (RegistryConfig.RegistType.EXECUTOR.name().equals(item.getRegistryGroup())) {
                                    // The registry key is the executor's appName
                                    String appName = item.getRegistryKey();
                                    List<String> registryList = appAddressMap.get(appName);
                                    if (registryList == null) {
                                        registryList = new ArrayList<String>();
                                    }
                                    if (!registryList.contains(item.getRegistryValue())) {
                                        registryList.add(item.getRegistryValue());
                                    }
                                    // Collect the machine addresses, grouped by executor
                                    appAddressMap.put(appName, registryList);
                                }
                            }
                        }

                        // fresh group address
                        // Iterate over the executor list
                        for (XxlJobGroup group: groupList) {
                            // Look up the cluster machine addresses by the executor's appName
                            List<String> registryList = appAddressMap.get(group.getAppName());
                            String addressListStr = null;
                            if (CollectionUtils.isNotEmpty(registryList)) {
                                Collections.sort(registryList);
                                addressListStr = StringUtils.join(registryList, ",");
                            }
                            group.setAddressList(addressListStr);
                            // Write this executor's cluster address list back to the database
                            XxlJobDynamicScheduler.xxlJobGroupDao.update(group);
                        }
                    }
                } catch (Exception e) {
                    logger.error("job registry instance error:{}", e);
                }
                try {
                    TimeUnit.SECONDS.sleep(RegistryConfig.BEAT_TIMEOUT);
                } catch (InterruptedException e) {
                    logger.error("job registry instance error:{}", e);
                }
            }
        }
    });
    registryThread.setDaemon(true);
    // Start the thread
    registryThread.start();
}
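To make the grouping step easier to follow in isolation, here is a minimal, self-contained sketch of the same idea: keep only EXECUTOR registrations, collect their addresses per appName without duplicates, then sort and join them with commas. This is not xxl-job code. RegistryGroupingSketch, RegistryRow and buildAddressLists are hypothetical names made up for illustration, and the database access is replaced by an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in xxl-job.
class RegistryGroupingSketch {

    // Simplified stand-in for a registry row: group, key (appName), value (address).
    static final class RegistryRow {
        final String group;
        final String key;
        final String value;

        RegistryRow(String group, String key, String value) {
            this.group = group;
            this.key = key;
            this.value = value;
        }
    }

    // Returns appName -> sorted, comma-separated, de-duplicated address list.
    static Map<String, String> buildAddressLists(List<RegistryRow> rows) {
        Map<String, List<String>> byApp = new HashMap<>();
        for (RegistryRow row : rows) {
            if (!"EXECUTOR".equals(row.group)) {
                continue; // ignore registrations that are not executors
            }
            List<String> addresses = byApp.computeIfAbsent(row.key, k -> new ArrayList<>());
            if (!addresses.contains(row.value)) {
                addresses.add(row.value);
            }
        }
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : byApp.entrySet()) {
            List<String> addresses = entry.getValue();
            Collections.sort(addresses);
            result.put(entry.getKey(), String.join(",", addresses));
        }
        return result;
    }

    public static void main(String[] args) {
        List<RegistryRow> rows = Arrays.asList(
                new RegistryRow("EXECUTOR", "order-app", "10.0.0.2:9999"),
                new RegistryRow("EXECUTOR", "order-app", "10.0.0.1:9999"),
                new RegistryRow("EXECUTOR", "order-app", "10.0.0.2:9999"), // duplicate
                new RegistryRow("ADMIN", "admin", "10.0.0.9:8080"),
                new RegistryRow("EXECUTOR", "pay-app", "10.0.0.3:9999"));
        Map<String, String> lists = buildAddressLists(rows);
        System.out.println(lists.get("order-app")); // 10.0.0.1:9999,10.0.0.2:9999
        System.out.println(lists.get("pay-app"));   // 10.0.0.3:9999
    }
}
```

An appName with no live registration is simply absent from the result, which corresponds to the null address list written for such an executor in the loop above.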
The code of JobFailMonitorHelper.getInstance().start() is as follows:
public void start(){
    // Create the monitor thread
    monitorThread = new Thread(new Runnable() {
        @Override
        public void run() {
            // monitor
            while (!toStop) {
                try {
                    List<Integer> jobLogIdList = new ArrayList<Integer>();
                    // Drain all jobLogIds currently available in the queue
                    int drainToNum = JobFailMonitorHelper.instance.queue.drainTo(jobLogIdList);
                    if (CollectionUtils.isNotEmpty(jobLogIdList)) {
                        for (Integer jobLogId : jobLogIdList) {
                            if (jobLogId==null || jobLogId==0) {
                                continue;
                            }
                            XxlJobLog log = XxlJobDynamicScheduler.xxlJobLogDao.load(jobLogId);
                            if (log == null) {
                                continue;
                            }
                            // The trigger succeeded, but the JobHandler has not returned a result yet
                            if (IJobHandler.SUCCESS.getCode() == log.getTriggerCode() && log.getHandleCode() == 0) {
                                // job running
                                // Put the JobLogId back into the queue and keep monitoring it
                                JobFailMonitorHelper.monitor(jobLogId);
                                logger.info(">>>>>>>>>>> job monitor, job running, JobLogId:{}", jobLogId);
                            } else if (IJobHandler.SUCCESS.getCode() == log.getHandleCode()) {
                                // job success, pass
                                logger.info(">>>>>>>>>>> job monitor, job success, JobLogId:{}", jobLogId);
                            } else /*if (IJobHandler.FAIL.getCode() == log.getTriggerCode()
                                    || IJobHandler.FAIL.getCode() == log.getHandleCode()
                                    || IJobHandler.FAIL_RETRY.getCode() == log.getHandleCode() )*/ {
                                // job fail,
                                // 1. fail retry
                                XxlJobInfo info = XxlJobDynamicScheduler.xxlJobInfoDao.loadById(log.getJobId());
                                if (log.getExecutorFailRetryCount() > 0) {
                                    JobTriggerPoolHelper.trigger(log.getJobId(), (log.getExecutorFailRetryCount()-1), TriggerTypeEnum.RETRY);
                                    String retryMsg = "<br><br>>>>>>>>>>>>>"+ I18nUtil.getString("jobconf_trigger_type_retry") +"<<<<<<<<<<< <br>";
                                    log.setTriggerMsg(log.getTriggerMsg() + retryMsg);
                                    XxlJobDynamicScheduler.xxlJobLogDao.updateTriggerInfo(log);
                                }
                                // 2. fail alarm
                                failAlarm(info, log);
                                // The job failed: send email and other alerts
                                logger.info(">>>>>>>>>>> job monitor, job fail, JobLogId:{}", jobLogId);
                            }/* else {
                                JobFailMonitorHelper.monitor(jobLogId);
                                logger.info(">>>>>>>>>>> job monitor, job status unknown, JobLogId:{}", jobLogId);
                            }*/
                        }
                    }
                    TimeUnit.SECONDS.sleep(10);
                } catch (Exception e) {
                    logger.error("job monitor error:{}", e);
                }
            }
            // monitor all clear
            List<Integer> jobLogIdList = new ArrayList<Integer>();
            int drainToNum = getInstance().queue.drainTo(jobLogIdList);
            if (jobLogIdList!=null && jobLogIdList.size()>0) {
                for (Integer jobLogId: jobLogIdList) {
                    XxlJobLog log = XxlJobDynamicScheduler.xxlJobLogDao.load(jobLogId);
                    if (ReturnT.FAIL_CODE == log.getTriggerCode()|| ReturnT.FAIL_CODE==log.getHandleCode()) {
                        // job fail,
                        XxlJobInfo info = XxlJobDynamicScheduler.xxlJobInfoDao.loadById(log.getJobId());
                        failAlarm(info, log);
                        logger.info(">>>>>>>>>>> job monitor last, job fail, JobLogId:{}", jobLogId);
                    }
                }
            }
        }
    });
    monitorThread.setDaemon(true);
    monitorThread.start();
}
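The core of this monitor is a drain, classify and re-queue loop. The sketch below reproduces one polling pass under simplifying assumptions: log records are a plain map from jobLogId to {triggerCode, handleCode}, the success code is assumed to be 200, and retry and alarm handling are reduced to returning the failed ids. FailMonitorSketch, classify and pollOnce are hypothetical names, not part of xxl-job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; none of these names exist in xxl-job.
class FailMonitorSketch {

    static final int SUCCESS_CODE = 200; // assumed value of the success code

    enum Outcome { RUNNING, SUCCESS, FAILED }

    // Same three-way decision as the if / else-if / else chain above.
    static Outcome classify(int triggerCode, int handleCode) {
        if (triggerCode == SUCCESS_CODE && handleCode == 0) {
            return Outcome.RUNNING; // triggered, handler result not reported yet
        }
        if (handleCode == SUCCESS_CODE) {
            return Outcome.SUCCESS;
        }
        return Outcome.FAILED;
    }

    // One polling pass: drain the queue, re-queue running jobs, return failed ids.
    static List<Integer> pollOnce(LinkedBlockingQueue<Integer> queue, Map<Integer, int[]> logs) {
        List<Integer> drained = new ArrayList<>();
        queue.drainTo(drained); // non-blocking: takes whatever is queued right now
        List<Integer> failed = new ArrayList<>();
        for (Integer jobLogId : drained) {
            int[] codes = logs.get(jobLogId); // {triggerCode, handleCode}
            if (codes == null) {
                continue; // log record not found
            }
            switch (classify(codes[0], codes[1])) {
                case RUNNING:
                    queue.offer(jobLogId); // keep monitoring in the next pass
                    break;
                case SUCCESS:
                    break;
                case FAILED:
                    failed.add(jobLogId); // the real code retries and/or alarms here
                    break;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        queue.add(1);
        queue.add(2);
        queue.add(3);
        Map<Integer, int[]> logs = new HashMap<>();
        logs.put(1, new int[] {200, 0});   // still running
        logs.put(2, new int[] {200, 200}); // finished successfully
        logs.put(3, new int[] {200, 500}); // handler reported a failure
        System.out.println(pollOnce(queue, logs)); // [3]
        System.out.println(queue);                 // [1]
    }
}
```

Because running jobs go back into the same queue, a job is rechecked on every pass until its handler reports a result.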
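【Summary】
From the analysis above we can conclude the following:
When xxl-job-admin starts, it launches two daemon threads. The first monitors auto-registered machines: it removes registrations that have missed their heartbeat and refreshes each executor's address list, which is how automatic registration and discovery work. The second monitors job execution status: if a job fails, it retries the job when retries remain and sends an alert such as an email.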
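Both helpers share the same skeleton: a daemon thread that loops until a stop flag is set, does one round of work, then sleeps. The sketch below shows that skeleton on its own. MonitorThreadSketch is a hypothetical class, and the stop() method is an assumption added for illustration, since the source quoted above only shows the toStop flag being read.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not xxl-job code.
class MonitorThreadSketch {

    private volatile boolean toStop = false; // volatile so the worker sees the update
    private Thread worker;
    private final AtomicInteger rounds = new AtomicInteger();

    void start(final long intervalMillis) {
        worker = new Thread(new Runnable() {
            @Override
            public void run() {
                while (!toStop) {
                    try {
                        rounds.incrementAndGet(); // placeholder for one round of monitoring
                    } catch (Exception e) {
                        System.err.println("monitor error: " + e);
                    }
                    try {
                        TimeUnit.MILLISECONDS.sleep(intervalMillis);
                    } catch (InterruptedException e) {
                        // woken early by stop(); the loop condition decides whether to exit
                    }
                }
            }
        });
        worker.setDaemon(true); // a daemon thread does not keep the JVM alive
        worker.start();
    }

    // Assumed shutdown hook: set the flag, wake the sleeper, wait for the thread to exit.
    void stop() {
        toStop = true;
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int completedRounds() {
        return rounds.get();
    }

    // Runs the loop until at least targetRounds rounds have completed, then stops it.
    static int runUntil(int targetRounds) {
        MonitorThreadSketch monitor = new MonitorThreadSketch();
        monitor.start(5);
        while (monitor.completedRounds() < targetRounds) {
            Thread.yield();
        }
        monitor.stop();
        return monitor.completedRounds();
    }

    public static void main(String[] args) {
        System.out.println("rounds completed: " + runUntil(3));
    }
}
```

Catching exceptions inside the loop rather than around it matters here: a single failed round is logged and the thread keeps running, which matches the try/catch placement in both helpers.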