This article takes a look at PowerJob's DispatchStrategy.
tech/powerjob/common/enums/DispatchStrategy.java
@Getter
@AllArgsConstructor
public enum DispatchStrategy {

    HEALTH_FIRST(1),
    RANDOM(2);

    private final int v;

    public static DispatchStrategy of(Integer v) {
        if (v == null) {
            return HEALTH_FIRST;
        }
        for (DispatchStrategy ds : values()) {
            if (v.equals(ds.v)) {
                return ds;
            }
        }
        throw new IllegalArgumentException("unknown DispatchStrategy of " + v);
    }
}
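DispatchStrategy defines two enum values, HEALTH_FIRST and RANDOM. Its of method falls back to HEALTH_FIRST when the value is null and throws IllegalArgumentException for an unknown value.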
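As a quick check of the lookup behaviour described above, here is a minimal sketch. It re-declares the enum without Lombok so that it compiles on its own; the wrapper class `DispatchStrategyDemo` is a name made up for this illustration and is not part of PowerJob.

```java
// Illustrative, Lombok-free copy of the enum above; the wrapper class is hypothetical.
class DispatchStrategyDemo {

    enum DispatchStrategy {
        HEALTH_FIRST(1),
        RANDOM(2);

        private final int v;

        DispatchStrategy(int v) {
            this.v = v;
        }

        static DispatchStrategy of(Integer v) {
            // A null value falls back to the default strategy.
            if (v == null) {
                return HEALTH_FIRST;
            }
            for (DispatchStrategy ds : values()) {
                if (v.equals(ds.v)) {
                    return ds;
                }
            }
            throw new IllegalArgumentException("unknown DispatchStrategy of " + v);
        }
    }

    public static void main(String[] args) {
        System.out.println(DispatchStrategy.of(null)); // HEALTH_FIRST
        System.out.println(DispatchStrategy.of(2));    // RANDOM
        try {
            DispatchStrategy.of(99);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());        // unknown DispatchStrategy of 99
        }
    }
}
```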
tech/powerjob/server/remote/worker/WorkerClusterQueryService.java
public List<WorkerInfo> getSuitableWorkers(JobInfoDO jobInfo) {

    List<WorkerInfo> workers = Lists.newLinkedList(getWorkerInfosByAppId(jobInfo.getAppId()).values());

    // a filter returning true means the worker is removed
    workers.removeIf(workerInfo -> filterWorker(workerInfo, jobInfo));

    DispatchStrategy dispatchStrategy = DispatchStrategy.of(jobInfo.getDispatchStrategy());
    switch (dispatchStrategy) {
        case RANDOM:
            Collections.shuffle(workers);
            break;
        case HEALTH_FIRST:
            // descending order: the highest score comes first
            workers.sort((o1, o2) -> o2.getSystemMetrics().calculateScore() - o1.getSystemMetrics().calculateScore());
            break;
        default:
            // do nothing
    }

    // cap the cluster size (0 means unlimited)
    if (!workers.isEmpty() && jobInfo.getMaxWorkerCount() > 0 && workers.size() > jobInfo.getMaxWorkerCount()) {
        workers = workers.subList(0, jobInfo.getMaxWorkerCount());
    }
    return workers;
}
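The getSuitableWorkers method of WorkerClusterQueryService first fetches the WorkerInfo entries for the given appId via getWorkerInfosByAppId, then removes unsuitable workers via filterWorker, and finally orders the remaining workers according to the dispatchStrategy. For RANDOM it shuffles them with Collections.shuffle(workers); for HEALTH_FIRST it sorts them in descending order of the calculateScore result of their systemMetrics. If maxWorkerCount is greater than 0 and there are more workers than that, it truncates the list with subList; otherwise it returns the ordered list as is.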
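The three stages (filter, order, cap) can be sketched in isolation as follows. This is a simplified illustration, not PowerJob code: `Worker`, its fields, and `pick` are hypothetical stand-ins, and a single `timedOut` flag replaces the real filter chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-ins: Worker and pick are hypothetical, not PowerJob types.
class SuitableWorkersSketch {

    static class Worker {
        final String address;
        final int score;
        final boolean timedOut;

        Worker(String address, int score, boolean timedOut) {
            this.address = address;
            this.score = score;
            this.timedOut = timedOut;
        }
    }

    // Filter, order by strategy, then cap by maxWorkerCount (0 means unlimited).
    static List<Worker> pick(List<Worker> all, boolean random, int maxWorkerCount) {
        List<Worker> workers = new ArrayList<>(all);
        workers.removeIf(w -> w.timedOut);
        if (random) {
            Collections.shuffle(workers);
        } else {
            // Descending by score; Integer.compare avoids the overflow risk of subtraction.
            workers.sort((a, b) -> Integer.compare(b.score, a.score));
        }
        if (maxWorkerCount > 0 && workers.size() > maxWorkerCount) {
            workers = workers.subList(0, maxWorkerCount);
        }
        return workers;
    }

    public static void main(String[] args) {
        List<Worker> all = Arrays.asList(
                new Worker("a", 10, false),
                new Worker("b", 30, false),
                new Worker("c", 20, false),
                new Worker("d", 99, true));
        for (Worker w : pick(all, false, 2)) {
            System.out.println(w.address); // prints b, then c
        }
    }
}
```

The sketch uses `Integer.compare` instead of the subtraction in the original comparator. With scores in a small positive range the two behave the same; the comparison form simply cannot overflow.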
private Map<String, WorkerInfo> getWorkerInfosByAppId(Long appId) {
    ClusterStatusHolder clusterStatusHolder = getAppId2ClusterStatus().get(appId);
    if (clusterStatusHolder == null) {
        log.warn("[WorkerManagerService] can't find any worker for app(appId={}) yet.", appId);
        return Collections.emptyMap();
    }
    return clusterStatusHolder.getAllWorkers();
}

public Map<Long, ClusterStatusHolder> getAppId2ClusterStatus() {
    return WorkerClusterManagerService.getAppId2ClusterStatus();
}
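getWorkerInfosByAppId looks up the ClusterStatusHolder for the appId via WorkerClusterManagerService.getAppId2ClusterStatus() and then returns the holder's getAllWorkers. If no holder exists for the appId, it logs a warning and returns an empty map.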
private boolean filterWorker(WorkerInfo workerInfo, JobInfoDO jobInfo) {
    for (WorkerFilter filter : workerFilters) {
        if (filter.filter(workerInfo, jobInfo)) {
            return true;
        }
    }
    return false;
}
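The filterWorker method iterates over workerFilters and returns true as soon as any filter returns true, which means the worker is removed; it returns false only when every filter lets the worker through.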
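The "any filter may veto" semantics are easy to get backwards, so here is a small sketch. The `Filter` interface below is a hypothetical, single-argument simplification of WorkerFilter, and the two example rules are invented purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

class FilterChainSketch {

    // Hypothetical simplification of WorkerFilter: true means "remove this worker".
    interface Filter {
        boolean filter(String workerAddress);
    }

    // Same control flow as filterWorker: the first filter that returns true wins.
    static boolean filterWorker(List<Filter> filters, String workerAddress) {
        for (Filter f : filters) {
            if (f.filter(workerAddress)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two made-up rules: drop addresses in 10.x and drop port 0.
        List<Filter> filters = Arrays.asList(
                addr -> addr.startsWith("10."),
                addr -> addr.endsWith(":0"));
        System.out.println(filterWorker(filters, "192.168.1.5:27777")); // false, worker is kept
        System.out.println(filterWorker(filters, "10.0.0.1:27777"));    // true, worker is removed
    }
}
```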
tech/powerjob/common/model/SystemMetrics.java
public int calculateScore() {
    if (score > 0) {
        return score;
    }
    // Memory is vital to TaskTracker, so we set the multiplier factor as 2.
    double memScore = (jvmMaxMemory - jvmUsedMemory) * 2;
    // Calculate the remaining load of CPU. Multiplier is set as 1.
    double cpuScore = cpuProcessors - cpuLoad;
    // Windows can not fetch CPU load, set cpuScore as 1.
    if (cpuScore > cpuProcessors) {
        cpuScore = 1;
    }
    score = (int) (memScore + cpuScore);
    return score;
}
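The calculateScore method of SystemMetrics combines a memScore and a cpuScore: memScore is the free JVM memory (jvmMaxMemory - jvmUsedMemory) multiplied by 2, and cpuScore is cpuProcessors - cpuLoad, which is reset to 1 when it exceeds cpuProcessors (the case where the CPU load cannot be obtained). The sum is truncated to an int and cached in score, so a positive score is computed only once.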
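To make the arithmetic concrete, the sketch below repeats the same formula as a pure function without the caching. The class and method names are hypothetical, and the input numbers are made-up sample values rather than measurements.

```java
class ScoreSketch {

    // Same arithmetic as calculateScore, without the cached field.
    static int score(double jvmMaxMemory, double jvmUsedMemory, int cpuProcessors, double cpuLoad) {
        double memScore = (jvmMaxMemory - jvmUsedMemory) * 2;
        double cpuScore = cpuProcessors - cpuLoad;
        // A negative load (not obtainable) would push cpuScore above cpuProcessors.
        if (cpuScore > cpuProcessors) {
            cpuScore = 1;
        }
        return (int) (memScore + cpuScore);
    }

    public static void main(String[] args) {
        // memScore = (4.0 - 1.5) * 2 = 5.0, cpuScore = 8 - 2.0 = 6.0
        System.out.println(score(4.0, 1.5, 8, 2.0));  // 11
        // cpuScore = 8 - (-1.0) = 9.0 > 8, so it is reset to 1
        System.out.println(score(4.0, 1.5, 8, -1.0)); // 6
    }
}
```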
public interface WorkerFilter {

    /**
     *
     * @param workerInfo worker info, maybe you need to use your customized info in SystemMetrics#extra
     * @param jobInfoDO job info
     * @return true will remove the worker in process list
     */
    boolean filter(WorkerInfo workerInfo, JobInfoDO jobInfoDO);
}
WorkerFilter defines the filter method used to filter out workers (returning true removes the worker). It has three implementations: DesignatedWorkerFilter, DisconnectedWorkerFilter, and SystemMetricsWorkerFilter.
tech/powerjob/server/extension/defaultimpl/workerfilter/DesignatedWorkerFilter.java
@Slf4j
@Component
public class DesignatedWorkerFilter implements WorkerFilter {

    @Override
    public boolean filter(WorkerInfo workerInfo, JobInfoDO jobInfo) {

        String designatedWorkers = jobInfo.getDesignatedWorkers();

        // no worker is specified, no filter of any
        if (StringUtils.isEmpty(designatedWorkers)) {
            return false;
        }

        Set<String> designatedWorkersSet = Sets.newHashSet(SJ.COMMA_SPLITTER.splitToList(designatedWorkers));

        // keep the worker if its tag or address is designated
        for (String tagOrAddress : designatedWorkersSet) {
            if (tagOrAddress.equals(workerInfo.getTag()) || tagOrAddress.equals(workerInfo.getAddress())) {
                return false;
            }
        }

        return true;
    }
}
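The filter method of DesignatedWorkerFilter reads the job's designatedWorkers. If none are specified, no worker is filtered out. Otherwise it iterates over the comma-separated entries and keeps the worker only when its tag or address matches one of them; a worker that matches nothing is removed.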
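The matching rule can be tried out with the following sketch. It is an assumption-laden simplification: plain `String.split(",")` stands in for `SJ.COMMA_SPLITTER`, a set lookup replaces the loop (equivalent for exact string matches), and the tags and addresses are invented sample values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DesignatedSketch {

    // Returns true when the worker should be removed, mirroring the filter contract.
    static boolean filter(String designatedWorkers, String tag, String address) {
        // Nothing designated: every worker passes.
        if (designatedWorkers == null || designatedWorkers.isEmpty()) {
            return false;
        }
        // String.split is used here in place of the SJ.COMMA_SPLITTER helper.
        Set<String> designated = new HashSet<>(Arrays.asList(designatedWorkers.split(",")));
        return !(designated.contains(tag) || designated.contains(address));
    }

    public static void main(String[] args) {
        System.out.println(filter("", "gpu", "10.0.0.1:27777"));                   // false
        System.out.println(filter("gpu,10.0.0.2:27777", "gpu", "10.0.0.1:27777")); // false
        System.out.println(filter("gpu", "cpu", "10.0.0.1:27777"));                // true
    }
}
```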
tech/powerjob/server/extension/defaultimpl/workerfilter/DisconnectedWorkerFilter.java
@Slf4j
@Component
public class DisconnectedWorkerFilter implements WorkerFilter {

    @Override
    public boolean filter(WorkerInfo workerInfo, JobInfoDO jobInfo) {
        boolean timeout = workerInfo.timeout();
        if (timeout) {
            log.info("[Job-{}] filter worker[{}] due to timeout(lastActiveTime={})", jobInfo.getId(), workerInfo.getAddress(), workerInfo.getLastActiveTime());
        }
        return timeout;
    }
}
The filter method of DisconnectedWorkerFilter delegates to WorkerInfo's timeout method, which mainly checks whether the gap between the current time and lastActiveTime is greater than WORKER_TIMEOUT_MS (60s).
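The timeout check described above amounts to a single comparison. The sketch below is an illustration under assumptions: the body of `WorkerInfo.timeout()` is not shown in this article, so the strict "greater than" follows the description only, and the current time is passed in as a parameter so the example is deterministic.

```java
class TimeoutSketch {

    // 60s, as stated in the text; the constant name is taken from the description.
    static final long WORKER_TIMEOUT_MS = 60_000L;

    // Hypothetical stand-in for WorkerInfo.timeout(), with "now" injected for testability.
    static boolean timeout(long lastActiveTime, long now) {
        return now - lastActiveTime > WORKER_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        long lastActiveTime = 1_000_000L;
        System.out.println(timeout(lastActiveTime, lastActiveTime + 60_000L)); // false
        System.out.println(timeout(lastActiveTime, lastActiveTime + 60_001L)); // true
    }
}
```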
tech/powerjob/server/extension/defaultimpl/workerfilter/SystemMetricsWorkerFilter.java
@Slf4j
@Component
public class SystemMetricsWorkerFilter implements WorkerFilter {

    @Override
    public boolean filter(WorkerInfo workerInfo, JobInfoDO jobInfo) {
        SystemMetrics metrics = workerInfo.getSystemMetrics();
        boolean filter = !metrics.available(jobInfo.getMinCpuCores(), jobInfo.getMinMemorySpace(), jobInfo.getMinDiskSpace());
        if (filter) {
            log.info("[Job-{}] filter worker[{}] because the {} do not meet the requirements", jobInfo.getId(), workerInfo.getAddress(), workerInfo.getSystemMetrics());
        }
        return filter;
    }
}
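The filter method of SystemMetricsWorkerFilter calls available on the worker's SystemMetrics with the job's minCpuCores, minMemorySpace, and minDiskSpace, and removes the worker when its available CPU cores, memory, or disk space do not meet those thresholds.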
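The body of `available` is not shown in this article, so the following is only one plausible shape of such a threshold check, not PowerJob's actual implementation. All names and the use of `>=` are assumptions made for illustration.

```java
class AvailabilitySketch {

    // Hypothetical threshold check: every free resource must reach its minimum.
    static boolean available(double freeCpuCores, double freeMemory, double freeDisk,
                             double minCpuCores, double minMemorySpace, double minDiskSpace) {
        return freeCpuCores >= minCpuCores
                && freeMemory >= minMemorySpace
                && freeDisk >= minDiskSpace;
    }

    // Mirrors the filter contract: true means the worker is removed.
    static boolean filter(double freeCpuCores, double freeMemory, double freeDisk,
                          double minCpuCores, double minMemorySpace, double minDiskSpace) {
        return !available(freeCpuCores, freeMemory, freeDisk, minCpuCores, minMemorySpace, minDiskSpace);
    }

    public static void main(String[] args) {
        System.out.println(filter(6.0, 2.5, 100.0, 2.0, 1.0, 10.0)); // false, worker is kept
        System.out.println(filter(6.0, 0.5, 100.0, 2.0, 1.0, 10.0)); // true, too little memory
    }
}
```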
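To sum up: DispatchStrategy defines two enum values, HEALTH_FIRST and RANDOM. The getSuitableWorkers method of WorkerClusterQueryService first fetches the WorkerInfo entries for the given appId via getWorkerInfosByAppId, then removes unsuitable workers via filterWorker, and finally orders the rest according to the dispatchStrategy: RANDOM shuffles them with Collections.shuffle(workers), while HEALTH_FIRST sorts them in descending order of the calculateScore result of their systemMetrics. If maxWorkerCount is set and exceeded, the list is truncated with subList; otherwise the ordered workers are returned as is.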