DolphinScheduler Notes (1): MasterServer

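1: In the Master module, the core service thread bean MasterSchedulerService is injected by the container and then initialized and started when the Master service starts up.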

MasterServer

/**
* run master server
*/
@PostConstruct  
public void run() throws SchedulerException {
    // init remoting server
    NettyServerConfig serverConfig = new NettyServerConfig();
    serverConfig.setListenPort(masterConfig.getListenPort());
    this.nettyRemotingServer = new NettyRemotingServer(serverConfig);
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_RESPONSE, taskResponseProcessor);
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_ACK, taskAckProcessor);
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_KILL_RESPONSE, new TaskKillResponseProcessor());
    this.nettyRemotingServer.registerProcessor(CommandType.STATE_EVENT_REQUEST, stateEventProcessor);
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_FORCE_STATE_EVENT_REQUEST, taskEventProcessor);
    this.nettyRemotingServer.registerProcessor(CommandType.TASK_WAKEUP_EVENT_REQUEST, taskEventProcessor);
    this.nettyRemotingServer.registerProcessor(CommandType.CACHE_EXPIRE, cacheProcessor);
    this.nettyRemotingServer.start();
    // self tolerant
    this.masterRegistryClient.init();
    this.masterRegistryClient.start();
    this.masterRegistryClient.setRegistryStoppable(this);

    this.eventExecuteService.init();
    this.eventExecuteService.start();
    // initialize MasterSchedulerService
    this.masterSchedulerService.init();
    // start the service thread
    this.masterSchedulerService.start();

    this.scheduler.start();

    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        if (Stopper.isRunning()) {
            close("shutdownHook");
        }
    }));
}
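The registerProcessor(...) calls above effectively build a lookup table from command type to handler, which the server consults for every incoming message. Here is a minimal sketch of that idea that ignores Netty and threading entirely; SketchCommandType, SketchProcessor and ProcessorRegistrySketch are hypothetical names, not DolphinScheduler's API:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical subset of command types, for illustration only.
enum SketchCommandType { TASK_EXECUTE_ACK, TASK_EXECUTE_RESPONSE, CACHE_EXPIRE }

// Hypothetical handler interface: one processor per command type.
interface SketchProcessor {
    String process(String body);
}

class ProcessorRegistrySketch {
    private final Map<SketchCommandType, SketchProcessor> processors =
            new EnumMap<>(SketchCommandType.class);

    // Registers (or replaces) the handler for a command type.
    void registerProcessor(SketchCommandType type, SketchProcessor processor) {
        processors.put(type, processor);
    }

    // Routes an incoming command body to the handler registered for its type.
    String dispatch(SketchCommandType type, String body) {
        SketchProcessor processor = processors.get(type);
        if (processor == null) {
            throw new IllegalArgumentException("no processor registered for " + type);
        }
        return processor.process(body);
    }

    public static void main(String[] args) {
        ProcessorRegistrySketch server = new ProcessorRegistrySketch();
        server.registerProcessor(SketchCommandType.TASK_EXECUTE_ACK, body -> "ack handled: " + body);
        server.registerProcessor(SketchCommandType.CACHE_EXPIRE, body -> "cache expired: " + body);
        System.out.println(server.dispatch(SketchCommandType.TASK_EXECUTE_ACK, "task-1"));
    }
}
```

In the original, TASK_FORCE_STATE_EVENT_REQUEST and TASK_WAKEUP_EVENT_REQUEST are both mapped to taskEventProcessor; with a table like this, one processor serves several command types simply by being registered once per type.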

MasterSchedulerService is a standalone thread bean. The main body of its thread:

/**
* run of MasterSchedulerService
*/
@Override  
public void run() {
    logger.info("master scheduler started");
    while (Stopper.isRunning()) {
        try {
            // check whether this server's CPU load or memory usage has reached the configured limit
            boolean runCheckFlag = OSUtils.checkResource(masterConfig.getMaxCpuLoadAvg(), masterConfig.getReservedMemory());
            if (!runCheckFlag) {
                Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                continue;
            }
            // run one scheduling round
            scheduleProcess();
        } catch (Exception e) {
            logger.error("master scheduler thread error", e);
        }
    }
}
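The loop above can be reduced to a small, testable sketch. It assumes checkResource rejects work when the load average exceeds maxCpuLoadAvg or available memory falls below reservedMemory (both in the same unit); that rule is our reading of the config names, not code taken from OSUtils. An AtomicBoolean stands in for Stopper, and LoadGuardedLoopSketch is a hypothetical name:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class LoadGuardedLoopSketch {

    // Assumed rule: accept new work only if the load average is within the cap
    // and available memory is at least the reserved amount.
    static boolean checkResource(double loadAvg, double maxCpuLoadAvg,
                                 double availableMemory, double reservedMemory) {
        return loadAvg <= maxCpuLoadAvg && availableMemory >= reservedMemory;
    }

    // Polls until `running` is cleared; returns how many scheduling rounds actually ran.
    static int runLoop(AtomicBoolean running, BooleanSupplier resourceOk,
                       Runnable scheduleProcess, long sleepMillis) {
        int rounds = 0;
        while (running.get()) {
            if (!resourceOk.getAsBoolean()) {
                // Overloaded: back off, then re-check on the next iteration.
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                    break;
                }
                continue;
            }
            try {
                scheduleProcess.run();
                rounds++;
            } catch (RuntimeException e) {
                // Like the original loop: log the error and keep the thread alive.
                System.err.println("master scheduler thread error: " + e);
            }
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(checkResource(0.5, 2.0, 4.0, 0.3));
        System.out.println(checkResource(3.5, 2.0, 4.0, 0.3));
    }
}
```

One deliberate difference: the original catches every Exception, including the InterruptedException from Thread.sleep, while this sketch treats an interrupt as a request to stop.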

The scheduleProcess() method

The main job of this method is to wrap a workflow instance (processInstance) in a WorkflowExecuteThread object and hand it to an ExecutorService thread pool for execution.

    /**
     * 1. get command by slot
     * 2. do not handle command if slot is empty
     */
    private void scheduleProcess() throws Exception {

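        // Fetch only one Command at a time (chosen by the slot/sharding algorithm).
        // Fetching Commands by slot this way actually has a problem. Has anyone spotted it? Feel free to leave a comment below.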
        Command command = findOneCommand();
        if (command != null) {
            logger.info("find one command: id: {}, type: {}", command.getId(), command.getCommandType());
            try {
                ProcessInstance processInstance = processService.handleCommand(logger, getLocalAddress(), command);

                if (processInstance != null) {
                    WorkflowExecuteThread workflowExecuteThread = new WorkflowExecuteThread(
                            processInstance
                            , taskResponseService
                            , processService
                            , nettyExecutorManager
                            , processAlertManager
                            , masterConfig
                            , taskTimeoutCheckList
                            , taskRetryCheckList);

                    this.processInstanceExecMaps.put(processInstance.getId(), workflowExecuteThread);
                    if (processInstance.getTimeout() > 0) {
                        this.processTimeoutCheckList.put(processInstance.getId(), processInstance);
                    }
                    logger.info("handle command end, command {} process {} start...",
                            command.getId(), processInstance.getId());
                    masterExecService.execute(workflowExecuteThread);
                }
            } catch (Exception e) {
                logger.error("scan command error ", e);
                processService.moveToErrorCommand(command, e.toString());
            }
        } else {
            // no command found, sleep for 1s
            Thread.sleep(Constants.SLEEP_TIME_MILLIS);
        }
    }
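The slot logic inside findOneCommand() is not shown above. Here is a minimal sketch of one scheduling round, assuming the rule "a master owns a command when command id % master count == its slot" and using hypothetical names (SlotSchedulingSketch, Command, submit): pick one owned command from a page, then record the work in a map keyed by id and hand it to an ExecutorService, as scheduleProcess() does with processInstanceExecMaps and masterExecService.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SlotSchedulingSketch {

    // Hypothetical, trimmed-down command row; only the id matters here.
    static final class Command {
        final int id;
        Command(int id) { this.id = id; }
    }

    // Assumed slot rule: this master owns a command when id % masterCount == slot.
    // Command ids are assumed to be non-negative auto-increment keys.
    static Optional<Command> findOneCommand(List<Command> page, int masterCount, int slot) {
        if (masterCount <= 0) {
            return Optional.empty(); // no live masters known: claim nothing
        }
        for (Command command : page) {
            if (command.id % masterCount == slot) {
                return Optional.of(command);
            }
        }
        return Optional.empty();
    }

    // Records the runnable under the instance id, then hands it to the pool.
    static void submit(Map<Integer, Runnable> execMap, ExecutorService pool,
                       int instanceId, Runnable workflow) {
        execMap.put(instanceId, workflow);
        pool.execute(workflow);
    }

    public static void main(String[] args) throws InterruptedException {
        List<Command> page = Arrays.asList(new Command(10), new Command(11), new Command(12));
        Map<Integer, Runnable> execMap = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // With 3 masters, the master in slot 2 picks command 11 (11 % 3 == 2).
        // For brevity the command id doubles as the instance id here.
        findOneCommand(page, 3, 2).ifPresent(c ->
                submit(execMap, pool, c.id, () -> System.out.println("running instance for command " + c.id)));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Under this assumed rule, as long as the master count and slot assignment stay stable, each master only claims commands in its own slot, so two masters never pick the same command.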

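The main job of WorkflowExecuteThread's run() method is to build the DAG (directed acyclic graph) of the workflow's tasks and to handle a great deal of extremely complex internal business logic, such as task state management. Finally, it splits out the tasks that are ready to submit and sends them to the WorkerServer for execution.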

We will focus only on the main flow and won't dig into the Master's internal business logic.

    private void startProcess() throws Exception {
        if (this.taskInstanceHashMap.size() == 0) {
            isStart = false;
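            // build the DAG from the workflow instance
            buildFlowDag();
            // initialize the task queues. WorkflowExecuteThread works mainly by operating on the
            // various task list fields inside the object; this step initializes those lists.
            initTaskQueue();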
            submitPostNode(null);
            isStart = true;
        }
    }
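buildFlowDag() itself is not shown in the article. As a rough illustration of what building a workflow DAG involves, here is a sketch that assumes tasks are identified by string codes and dependencies are plain edges. It finds the start nodes (presumably what submitPostNode(null) begins from) and uses Kahn's algorithm to reject cycles. DagSketch is a hypothetical class, not DolphinScheduler's DAG implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DagSketch {
    // node code -> direct successors, and node code -> number of predecessors
    private final Map<String, Set<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addNode(String code) {
        successors.putIfAbsent(code, new LinkedHashSet<>());
        inDegree.putIfAbsent(code, 0);
    }

    // Adds from -> to; a repeated edge is ignored so in-degrees stay correct.
    void addEdge(String from, String to) {
        addNode(from);
        addNode(to);
        if (successors.get(from).add(to)) {
            inDegree.merge(to, 1, Integer::sum);
        }
    }

    // Nodes with no predecessors: where a run with no parent node would start.
    List<String> startNodes() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Kahn's algorithm: a full topological order exists only if the graph is acyclic.
    List<String> topologicalOrder() {
        Map<String, Integer> remaining = new HashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>(startNodes());
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.pollFirst();
            order.add(node);
            for (String next : successors.get(node)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.addLast(next);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("workflow definition contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        DagSketch dag = new DagSketch();
        dag.addEdge("A", "B");
        dag.addEdge("A", "C");
        dag.addEdge("B", "D");
        dag.addEdge("C", "D");
        System.out.println(dag.startNodes() + " " + dag.topologicalOrder());
    }
}
```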

Next, let's look at submitPostNode(), the method that submits task nodes:

        Set<String> submitTaskNodeList = DagHelper.parsePostNodes(parentNodeCode, skipTaskNodeList, dag, completeTaskList);
        // 1: a pile of business handling on the task nodes before submission; we skip it here
        List<TaskInstance> taskInstances = new ArrayList<>();
        for (String taskNode : submitTaskNodeList) {
            TaskNode taskNodeObject = dag.getNode(taskNode);
            if (taskInstanceHashMap.containsColumn(taskNodeObject.getCode())) {
                continue;
            }
            TaskInstance task = createTaskInstance(processInstance, taskNodeObject);
            taskInstances.add(task);
        }

        // if previous node success , post node submit
        for (TaskInstance task : taskInstances) {

            if (readyToSubmitTaskQueue.contains(task)) {
                continue;
            }

            if (completeTaskList.containsKey(Long.toString(task.getTaskCode()))) {
                logger.info("task {} has already run success, task id:{}", task.getName(), task.getId());
                continue;
            }
            if (task.getState().typeIsPause() || task.getState().typeIsCancel()) {
                logger.info("task {} stopped, the state is {}, task id:{}", task.getName(), task.getState(), task.getId());
            } else {
                addTaskToStandByList(task);
            }
        }
        // 2: submit the task nodes. Submission mainly works on the readyToSubmitTaskQueue field; tasks that passed the various checks above were added to this queue.
        submitStandByTask();
        // 3: update the workflow instance state
        updateProcessInstanceState();
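The selection step at the top of this snippet can be sketched as follows. This is a simplified model, not DagHelper.parsePostNodes itself: it assumes a task becomes ready only once all of its predecessors have completed and folds that dependency check into one function (in the real code part of it may happen elsewhere). addToStandBy mirrors the "already queued" and "already completed" checks before addTaskToStandByList; pause/cancel handling is left out. PostNodeSketch is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class PostNodeSketch {
    private final Map<String, Set<String>> preds = new LinkedHashMap<>();
    private final Map<String, Set<String>> succs = new LinkedHashMap<>();

    // Adds edge from -> to, registering both nodes in both maps.
    void addEdge(String from, String to) {
        preds.computeIfAbsent(from, k -> new LinkedHashSet<>());
        succs.computeIfAbsent(to, k -> new LinkedHashSet<>());
        preds.computeIfAbsent(to, k -> new LinkedHashSet<>()).add(from);
        succs.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    // parentCode == null: candidates are nodes with no predecessors (workflow start).
    // Otherwise: candidates are the parent's direct successors. A candidate is returned
    // only if it has not completed and all of its predecessors have completed (assumed rule).
    Set<String> parsePostNodes(String parentCode, Set<String> completed) {
        Set<String> candidates = new LinkedHashSet<>();
        if (parentCode == null) {
            for (Map.Entry<String, Set<String>> e : preds.entrySet()) {
                if (e.getValue().isEmpty()) {
                    candidates.add(e.getKey());
                }
            }
        } else {
            candidates.addAll(succs.getOrDefault(parentCode, Collections.emptySet()));
        }
        Set<String> ready = new LinkedHashSet<>();
        for (String node : candidates) {
            if (!completed.contains(node) && completed.containsAll(preds.get(node))) {
                ready.add(node);
            }
        }
        return ready;
    }

    // Skips tasks that are already queued or already completed, then queues the rest.
    static void addToStandBy(Collection<String> tasks, Deque<String> standBy, Set<String> completed) {
        for (String task : tasks) {
            if (standBy.contains(task) || completed.contains(task)) {
                continue;
            }
            standBy.addLast(task);
        }
    }

    public static void main(String[] args) {
        PostNodeSketch dag = new PostNodeSketch();
        dag.addEdge("A", "B");
        dag.addEdge("A", "C");
        dag.addEdge("B", "D");
        dag.addEdge("C", "D");
        Set<String> completed = new HashSet<>();
        System.out.println(dag.parsePostNodes(null, completed)); // [A]
        completed.add("A");
        System.out.println(dag.parsePostNodes("A", completed)); // [B, C]
        completed.add("B");
        System.out.println(dag.parsePostNodes("B", completed)); // [] (D still waits for C)
        Deque<String> standBy = new ArrayDeque<>();
        addToStandBy(Arrays.asList("C", "C", "B"), standBy, completed);
        System.out.println(standBy); // [C]
    }
}
```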

Eventually this ends up calling notifyProcessHostUpdate(TaskInstance):

    // Sends the command body via Netty. I'm rather puzzled why the method has this name; if you understand it, please leave a comment below.
    private void notifyProcessHostUpdate(TaskInstance taskInstance) {
        if (StringUtils.isEmpty(taskInstance.getHost())) {
            return;
        }

        try {
            HostUpdateCommand hostUpdateCommand = new HostUpdateCommand();
            hostUpdateCommand.setProcessHost(NetUtils.getAddr(masterConfig.getListenPort()));
            hostUpdateCommand.setTaskInstanceId(taskInstance.getId());
            Host host = new Host(taskInstance.getHost());
            // this is where the wrapped Netty client is actually called to send the command
            nettyExecutorManager.doExecute(host, hostUpdateCommand.convert2Command());
        } catch (Exception e) {
            logger.error("notify process host update", e);
        }
    }
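To make this send path concrete, here is a sketch of what notifyProcessHostUpdate assembles: it skips tasks without a host, splits the task's host string into address and port, and packs the master address and task instance id into a byte body. The "ip:port" host format, the hand-built JSON body and the Frame type are assumptions for illustration; the real HostUpdateCommand.convert2Command() serialization is not shown in the article, and HostUpdateSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

class HostUpdateSketch {

    // Hypothetical wire frame: target address, a command type, and an opaque byte body.
    static final class Frame {
        final String ip;
        final int port;
        final String type;
        final byte[] body;

        Frame(String ip, int port, String type, byte[] body) {
            this.ip = ip;
            this.port = port;
            this.type = type;
            this.body = body;
        }
    }

    // Returns null when the task has no host yet (mirrors the early return);
    // otherwise parses the assumed "ip:port" host and builds a JSON body by hand.
    static Frame buildHostUpdate(String taskHost, String masterAddr, int taskInstanceId) {
        if (taskHost == null || taskHost.isEmpty()) {
            return null;
        }
        int idx = taskHost.lastIndexOf(':');
        if (idx <= 0 || idx == taskHost.length() - 1) {
            throw new IllegalArgumentException("expected ip:port, got " + taskHost);
        }
        String ip = taskHost.substring(0, idx);
        int port = Integer.parseInt(taskHost.substring(idx + 1));
        // Naive JSON with no escaping: fine for "ip:port" strings and integer ids only.
        String json = "{\"processHost\":\"" + masterAddr + "\",\"taskInstanceId\":" + taskInstanceId + "}";
        return new Frame(ip, port, "HOST_UPDATE", json.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Frame frame = buildHostUpdate("192.168.1.20:1234", "192.168.1.10:5678", 42);
        System.out.println(frame.ip + ":" + frame.port + " <- "
                + new String(frame.body, StandardCharsets.UTF_8));
    }
}
```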

Continuing with NettyExecutorManager.doExecute:

    public void doExecute(final Host host, final Command command) throws ExecuteException {
        /**
         * retry count, default: 3 retries
         */
        int retryCount = 3;
        boolean success = false;
        do {
            try {
                // This is the Netty client object.
                // As you can see, the Netty client's send API basically takes a target Host plus a Command object.
                nettyRemotingClient.send(host, command);
                success = true;
            } catch (Exception ex) {
                logger.error(String.format("send command : %s to %s error", command, host), ex);
                retryCount--;
                ThreadUtils.sleep(100);
            }
        } while (retryCount >= 0 && !success);

        if (!success) {
            throw new ExecuteException(String.format("send command : %s to %s error", command, host));
        }
    }
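One detail in doExecute is easy to misread: retryCount starts at 3 and the loop condition is retryCount >= 0, so a send that keeps failing is attempted four times (one initial attempt plus three retries), with a 100 ms pause after each failure. The sketch below reproduces that loop shape with a pluggable send action; RetrySketch and its parameters are hypothetical names.

```java
class RetrySketch {

    // Same loop shape as doExecute: the counter starts at `retries` and the loop continues
    // while retryCount >= 0, so a permanently failing send is attempted retries + 1 times.
    static int sendWithRetry(Runnable send, int retries, long backoffMillis) {
        int retryCount = retries;
        int attempts = 0;
        boolean success = false;
        do {
            attempts++;
            try {
                send.run();
                success = true;
            } catch (RuntimeException ex) {
                System.err.println("send attempt " + attempts + " failed: " + ex.getMessage());
                retryCount--;
                sleepQuietly(backoffMillis);
            }
        } while (retryCount >= 0 && !success);
        if (!success) {
            throw new IllegalStateException("send failed after " + attempts + " attempts");
        }
        return attempts;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag for callers
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        try {
            sendWithRetry(() -> {
                calls[0]++;
                throw new RuntimeException("connection refused");
            }, 3, 100);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", send() called " + calls[0] + " times");
        }
    }
}
```

With retries = 3, main reports 4 attempts, which matches the behavior of the original loop.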

With that, we have worked out the main logic of MasterServer:

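  1. A Quartz schedule or an operation in the web UI produces a Command record.

  2. MasterSchedulerService periodically polls the Command table, turns a command into a ProcessInstance, wraps it in a WorkflowExecuteThread, and hands it to the thread pool.

  3. Inside WorkflowExecuteThread, the ProcessInstance is further parsed into TaskInstance objects, which are added to the readyToSubmitTaskQueue; NettyExecutorManager then reads this queue and submits each taskInstance to the corresponding WorkerServer.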
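As a recap, the steps above can be compressed into a toy, single-threaded model. It deliberately ignores DAG dependencies, task states, retries and failover, and the round-robin choice of worker is an assumption made only for this sketch (the article does not say how a worker is chosen). MasterFlowSketch is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MasterFlowSketch {

    // commands: command id -> task names of the workflow it triggers (stand-in for a ProcessInstance).
    // Returns worker -> tasks "sent" to it. Round-robin worker choice is an assumption.
    static Map<String, List<String>> run(Map<Integer, List<String>> commands, List<String> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("no workers available");
        }
        // Steps 1-3: every command becomes an instance whose tasks land in the ready queue.
        Deque<String> readyToSubmitTaskQueue = new ArrayDeque<>();
        for (Map.Entry<Integer, List<String>> command : commands.entrySet()) {
            for (String task : command.getValue()) {
                readyToSubmitTaskQueue.addLast("process-" + command.getKey() + "/" + task);
            }
        }
        // Dispatch: drain the queue and "send" each task to a worker.
        Map<String, List<String>> sent = new LinkedHashMap<>();
        int next = 0;
        while (!readyToSubmitTaskQueue.isEmpty()) {
            String task = readyToSubmitTaskQueue.pollFirst();
            String worker = workers.get(next++ % workers.size());
            sent.computeIfAbsent(worker, k -> new ArrayList<>()).add(task);
        }
        return sent;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> commands = new LinkedHashMap<>();
        commands.put(1, Arrays.asList("extract", "load"));
        commands.put(2, Arrays.asList("report"));
        System.out.println(run(commands, Arrays.asList("worker-a:1234", "worker-b:1234")));
    }
}
```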
