仔仔1993

Dolphinscheduler3.0源码分析

前言
1 DolphinScheduler的设计与策略
- 1.1 分布式设计
- - 1.1.1 中心化
  - 1.1.2 去中心化
- 1.2 DophinScheduler架构设计
- 1.3 容错问题
- - 1.3.1 宕机容错
  - 1.3.2 失败重试
- 1.4 远程日志访问
2 DolphinScheduler源码分析
- 2.1 工程模块介绍与配置文件
- - 2.1.1 工程模块介绍
  - 2.1.1 配置文件
- 2.2 Api主要任务操作接口
- 2.3 Quaterz架构与运行流程
- - 2.3.1 概念与架构
  - 2.3.2 初始化与执行流程
  - 2.3.3 集群运转
- 2.4 Master启动与执行流程
- - 2.4.1 概念与执行逻辑
  - 2.4.2 集群与槽（slot）
  - 2.4.3 代码执行流程
- 2.5 Work启动与执行流程
- - 2.5.1 概念与执行逻辑
  - 2.5.2 代码执行流程
- 2.6 rpc交互
- - 2.6.1 Master与Worker交互
  - 2.6.2 其他服务与Master交互
- 2.7 负载均衡算法
- - 2.7.1 加权随机
  - 2.7.2 线性负载
  - 2.7.3 平滑轮询
- 2.8 日志服务
- 2.9 报警
3 后记
- 3.1 make friends
4 参考文献

前言

研究Dolphinscheduler也是机缘巧合，平时负责基于xxl-job二次开发出来的调度平台，因为遇到了并发性能瓶颈，到了不得不优化重构的地步，所以搜索市面上应用较广的调度平台以借鉴优化思路。在阅读完DolphinScheduler代码之后，便生出了将其设计与思考记录下来的念头，这边是此篇文章的来源。因为没有正式生产使用，业务理解不一定透彻，理解可能有偏差，欢迎大家交流讨论。

1 DolphinScheduler的设计与策略

大家能关注DolphinScheduler那么一定对调度系统有了一定的了解，对于调度所涉及的到一些专有名词在这里就不做过多的介绍，重点介绍一下流程定义，流程实例，任务定义，任务实例。（没有作业这个概念确实也很新奇，可能是不想和Quartz的JobDetail重叠）
任务定义：各种类型的任务，是流程定义的关键组成，如sql,shell,spark,mr，python等
任务实例：任务的实例化，标识着具体的任务执行状态
流程定义：一组任务节点通过依赖关系建立的起来的有向无环图（DAG）
流程实例：通过手动或者定时调度生成的流程实例
定时调度：系统采用Quartz 分布式调度器，并同时支持cron表达式可视化的生成

1.1 分布式设计

分布式系统的架构设计基本分为中心化和去中心化两种，各有优劣，凭借各自的业务选择。

1.1.1 中心化

中心化设计比较简单，集群中的节点安装角色可以分为Master和slave两种，如下图：

Master: Master的角色主要负责任务分发并监督Slave的健康状态，可以动态的将任务均衡到Slave上，以致Slave节点不至于“忙死”或”闲死”的状态。
Worker: Worker的角色主要负责任务的执行工作并维护和Master的心跳，以便Master可以分配任务给Slave。
中心化设计存在一些问题。
第一点，一旦Master出现了问题，则群龙无首，整个集群就会崩溃。为了解决这个问题，大多数Master/Slave架构模式都采用了主备Master的设计方案，可以是热备或者冷备，也可以是自动切换或手动切换，而且越来越多的新系统都开始具备自动选举切换Master的能力,以提升系统的可用性。
第二点，如果Scheduler在Master上，虽然可以支持一个DAG中不同的任务运行在不同的机器上，但是会产生Master的过负载。如果Scheduler在Slave上，则一个DAG中所有的任务都只能在某一台机器上进行作业提交，则并行任务比较多的时候，Slave的压力可能会比较大。
xxl-job就是采用这种设计方式，但是存在相应的问题。管理器(admin)宕机集群会崩溃，scheduler在管理器上，管理器负责所有任务的校验和分发，管理器存在过载的风险。需要开发者执行想方案解决。

1.1.2 去中心化

在去中心化设计里，通常没有Master/Slave的概念，所有的角色都是一样的，地位是平等的，去中心化设计的核心设计在于整个分布式系统中不存在一个区别于其他节点的”管理者”，因此不存在单点故障问题。但由于不存在” 管理者”节点所以每个节点都需要跟其他节点通信才得到必须要的机器信息，而分布式系统通信的不可靠性，则大大增加了上述功能的实现难度。实际上，真正去中心化的分布式系统并不多见。反而动态中心化分布式系统正在不断涌出。在这种架构下，集群中的管理者是被动态选择出来的，而不是预置的，并且集群在发生故障的时候，集群的节点会自发的举行"会议"来选举新的"管理者"去主持工作。一般都是基于Raft算法实现的选举策略。动态展示见链接：http://thesecretlivesofdata.com/
DolphinScheduler的去中心化是Master/Worker注册到Zookeeper中，实现Master集群和Worker集群无中心，并使用Zookeeper分布式锁来选举其中的一台Master或Worker为“管理者”来执行任务。

1.2 DophinScheduler架构设计

随手盗用一张官网的系统架构图，可以看到调度系统采用去中心化设计，由UI，API，MasterServer，Zookeeper，WorkServer，Alert，LoggerSever等几部分组成。

Api: API接口层，主要负责处理前端UI层的请求。该服务统一提供RESTful api向外部提供请求服务。接口包括工作流的创建、定义、查询、修改、发布、下线、手工启动、停止、暂停、恢复、从该节点开始执行等等。
MasterServer: MasterServer采用分布式无中心设计理念，MasterServer集成了Quartz，主要负责 DAG 任务切分、任务提交监控，并同时监听其它MasterServer和WorkerServer的健康状态。 MasterServer服务启动时向Zookeeper注册临时节点，通过监听Zookeeper临时节点变化来进行容错处理。
WorkServer:WorkerServer也采用分布式无中心设计理念，WorkerServer主要负责任务的执行和提供日志服务。WorkerServer服务启动时向Zookeeper注册临时节点，并维持心跳。
ZooKeeper: ZooKeeper服务，系统中的MasterServer和WorkerServer节点都通过ZooKeeper来进行集群管理和容错。另外系统还基于ZooKeeper进行事件监听和分布式锁。
Alert:提供告警相关接口，接口主要包括两种类型的告警数据的存储、查询和通知功能。其中通知功能又有邮件通知和SNMP(暂未实现)两种。

1.3 容错问题

容错分为服务宕机容错和任务重试，服务宕机容错又分为Master容错和Worker容错两种情况：

1.3.1 宕机容错

服务容错设计依赖于ZooKeeper的Watcher机制，实现原理如图：

其中Master监控其他Master和Worker的目录，如果监听到remove事件，则会根据具体的业务逻辑进行流程实例容错或者任务实例容错，具体如下所示。
Master容错流程图

ZooKeeper Master容错完成之后则重新由DolphinScheduler中Scheduler线程调度，遍历 DAG 找到”正在运行”和“提交成功”的任务，对”正在运行”的任务监控其任务实例的状态，对”提交成功”的任务需要判断Task Queue中是否已经存在，如果存在则同样监控任务实例的状态，如果不存在则重新提交任务实例。
Worker容错流程图

Master Scheduler线程一旦发现任务实例为” 需要容错”状态，则接管任务并进行重新提交。
**注意：**由于” 网络抖动”可能会使得节点短时间内失去和ZooKeeper的心跳，从而发生节点的remove事件。对于这种情况，我们使用最简单的方式，那就是节点一旦和ZooKeeper发生超时连接，则直接将Master或Worker服务停掉。

1.3.2 失败重试

这里首先要区分任务失败重试、流程失败恢复、流程失败重跑的概念：

任务失败重试是任务级别的，是调度系统自动进行的，比如一个Shell任务设置重试次数为3次，那么在Shell任务运行失败后会自己再最多尝试运行3次。
流程失败恢复是流程级别的，是手动进行的，恢复是从只能从失败的节点开始执行或从当前节点开始执行。
流程失败重跑也是流程级别的，是手动进行的，重跑是从开始节点进行。

接下来说正题，我们将工作流中的任务节点分了两种类型。

一种是业务节点，这种节点都对应一个实际的脚本或者处理语句，比如Shell节点，MR节点、Spark节点、依赖节点等。
还有一种是逻辑节点，这种节点不做实际的脚本或语句处理，只是整个流程流转的逻辑处理，比如子流程节等。

每一个业务节点都可以配置失败重试的次数，当该任务节点失败，会自动重试，直到成功或者超过配置的重试次数。逻辑节点不支持失败重试。但是逻辑节点里的任务支持重试。
如果工作流中有任务失败达到最大重试次数，工作流就会失败停止，失败的工作流可以手动进行重跑操作或者流程恢复操作。

1.4 远程日志访问

由于Web(UI)和Worker不一定在同一台机器上，所以查看日志不能像查询本地文件那样。有两种方案：

将日志放到ES搜索引擎上
通过netty通信获取远程日志信息

介于考虑到尽可能的DolphinScheduler的轻量级性，所以选择了gRPC实现远程访问日志信息。具体代码时间见2.8；

2 DolphinScheduler源码分析

上一章的讲解可能初步看起来还不是很清晰，本章的主要目的是从代码层面一一介绍第一张讲解的功能。关于系统的安装在这里并不会设计，安装运行请大家自行探索。

2.1 工程模块介绍与配置文件

2.1.1 工程模块介绍

dolphinscheduler-alert 告警模块，提供告警服务。
dolphinscheduler-api web应用模块，提供 Rest Api 服务，供 UI 进行调用。
dolphinscheduler-common 通用的常量枚举、工具类、数据结构或者基类
dolphinscheduler-dao 提供数据库访问等操作。
dolphinscheduler-remote 基于netty的客户端、服务端
dolphinscheduler-server 日志与心跳服务
dolphinscheduler-log-server LoggerServer 用于Rest Api通过RPC查看日志
dolphinscheduler-master MasterServer服务，主要负责 DAG 的切分和任务状态的监控
dolphinscheduler-worker WorkerServer服务，主要负责任务的提交、执行和任务状态的更新。
dolphinscheduler-service service模块，包含Quartz、Zookeeper、日志客户端访问服务，便于server模块和api模块调用
dolphinscheduler-ui 前端模块

2.1.1 配置文件

dolphinscheduler-common common.properties

#本地工作目录,用于存放临时文件
data.basedir.path=/tmp/dolphinscheduler
#资源文件存储类型: HDFS,S3,NONE
resource.storage.type=NONE
#资源文件存储路径
resource.upload.path=/dolphinscheduler
#hadoop是否开启kerberos权限
hadoop.security.authentication.startup.state=false
#kerberos配置目录
java.security.krb5.conf.path=/opt/krb5.conf
#kerberos登录用户
[email protected]

#kerberos登录用户keytab
login.user.keytab.path=/opt/hdfs.headless.keytab

#kerberos过期时间,整数,单位为小时
kerberos.expire.time=2
#	如果存储类型为HDFS,需要配置拥有对应操作权限的用户
hdfs.root.user=hdfs
#请求地址如果resource.storage.type=S3,该值类似为: s3a://dolphinscheduler. 如果resource.storage.type=HDFS, 如果 hadoop 配置了 HA,需要复制core-site.xml 和 hdfs-site.xml 文件到conf目录
fs.defaultFS=hdfs://mycluster:8020
aws.access.key.id=minioadmin
aws.secret.access.key=minioadmin
aws.region=us-east-1
aws.endpoint=http://localhost:9000
# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088
#yarn resourcemanager 地址, 如果resourcemanager开启了HA, 输入HA的IP地址(以逗号分隔),如果resourcemanager为单节点, 该值为空即可
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
#如果resourcemanager开启了HA或者没有使用resourcemanager,保持默认值即可. 如果resourcemanager为单节点,你需要将ds1 配置为resourcemanager对应的hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://ds1:19888/ws/v1/history/mapreduce/jobs/%s

# datasource encryption enable
datasource.encryption.enable=false

# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar

#data-quality.error.output.path=/tmp/data-quality-error-data

# Network IP gets priority, default inner outer

# Whether hive SQL is executed in the same session
support.hive.oneSession=false

# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true

# network interface preferred like eth0, default: empty
#dolphin.scheduler.network.interface.preferred=

# network IP gets priority, default: inner outer
#dolphin.scheduler.network.priority.strategy=default

# system env path
#dolphinscheduler.env.path=dolphinscheduler_env.sh

#是否处于开发模式
development.state=false

# rpc port
alert.rpc.port=50052

# Url endpoint for zeppelin RESTful API
zeppelin.rest.url=http://localhost:8080

dolphinscheduler-api application.yaml

server:
  port: 12345
  servlet:
    session:
      timeout: 120m
    context-path: /dolphinscheduler/
  compression:
    enabled: true
    mime-types: text/html,text/xml,text/plain,text/css,text/javascript,application/javascript,application/json,application/xml
  jetty:
    max-http-form-post-size: 5000000

spring:
  application:
    name: api-server
  banner:
    charset: UTF-8
  jackson:
    time-zone: UTC
    date-format: "yyyy-MM-dd HH:mm:ss"
  servlet:
    multipart:
      max-file-size: 1024MB
      max-request-size: 1024MB
  messages:
    basename: i18n/messages
  datasource:
#    driver-class-name: org.postgresql.Driver
#    url: jdbc:postgresql://127.0.0.1:5432/dolphinscheduler
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&serverTimezone=UTC&characterEncoding=UTF-8&autoReconnect=true&useSSL=false&zeroDateTimeBehavior=convertToNull
    username: root
    password: root
    hikari:
      connection-test-query: select 1
      minimum-idle: 5
      auto-commit: true
      validation-timeout: 3000
      pool-name: DolphinScheduler
      maximum-pool-size: 50
      connection-timeout: 30000
      idle-timeout: 600000
      leak-detection-threshold: 0
      initialization-fail-timeout: 1
  quartz:
    auto-startup: false
    job-store-type: jdbc
    jdbc:
      initialize-schema: never
    properties:
      org.quartz.threadPool:threadPriority: 5
      org.quartz.jobStore.isClustered: true
      org.quartz.jobStore.class: org.quartz.impl.jdbcjobstore.JobStoreTX
      org.quartz.scheduler.instanceId: AUTO
      org.quartz.jobStore.tablePrefix: QRTZ_
      org.quartz.jobStore.acquireTriggersWithinLock: true
      org.quartz.scheduler.instanceName: DolphinScheduler
      org.quartz.threadPool.class: org.quartz.simpl.SimpleThreadPool
      org.quartz.jobStore.useProperties: false
      org.quartz.threadPool.makeThreadsDaemons: true
      org.quartz.threadPool.threadCount: 25
      org.quartz.jobStore.misfireThreshold: 60000
      org.quartz.scheduler.makeSchedulerThreadDaemon: true
#      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.StdJDBCDelegate
      org.quartz.jobStore.clusterCheckinInterval: 5000

management:
  endpoints:
    web:
      exposure:
        include: '*'
  metrics:
    tags:
      application: ${spring.application.name}

registry:
  type: zookeeper
  zookeeper:
    namespace: dolphinscheduler
#    connect-string: localhost:2181
    connect-string: 10.255.158.70:2181
    retry-policy:
      base-sleep-time: 60ms
      max-sleep: 300ms
      max-retries: 5
    session-timeout: 30s
    connection-timeout: 9s
    block-until-connected: 600ms
    digest: ~

audit:
  enabled: false

metrics:
  enabled: true

python-gateway:
  # Weather enable python gateway server or not. The default value is true.
  enabled: true
  # The address of Python gateway server start. Set its value to `0.0.0.0` if your Python API run in different
  # between Python gateway server. It could be be specific to other address like `127.0.0.1` or `localhost`
  gateway-server-address: 0.0.0.0
  # The port of Python gateway server start. Define which port you could connect to Python gateway server from
  # Python API side.
  gateway-server-port: 25333
  # The address of Python callback client.
  python-address: 127.0.0.1
  # The port of Python callback client.
  python-port: 25334
  # Close connection of socket server if no other request accept after x milliseconds. Define value is (0 = infinite),
  # and socket server would never close even though no requests accept
  connect-timeout: 0
  # Close each active connection of socket server if python program not active after x milliseconds. Define value is
  # (0 = infinite), and socket server would never close even though no requests accept
  read-timeout: 0

# Override by profile

---
spring:
  config:
    activate:
      on-profile: mysql
  datasource:
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
  quartz:
    properties:
      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.StdJDBCDelegate

dolphinscheduler-master application.yaml

spring:
  banner:
    charset: UTF-8
  application:
    name: master-server
  jackson:
    time-zone: UTC
    date-format: "yyyy-MM-dd HH:mm:ss"
  cache:
    # default enable cache, you can disable by `type: none`
    type: none
    cache-names:
      - tenant
      - user
      - processDefinition
      - processTaskRelation
      - taskDefinition
    caffeine:
      spec: maximumSize=100,expireAfterWrite=300s,recordStats
  datasource:
    #driver-class-name: org.postgresql.Driver
    #url: jdbc:postgresql://127.0.0.1:5432/dolphinscheduler
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&serverTimezone=UTC&characterEncoding=UTF-8&autoReconnect=true&useSSL=false&zeroDateTimeBehavior=convertToNull
    username: root
    password:
    hikari:
      connection-test-query: select 1
      minimum-idle: 5
      auto-commit: true
      validation-timeout: 3000
      pool-name: DolphinScheduler
      maximum-pool-size: 50
      connection-timeout: 30000
      idle-timeout: 600000
      leak-detection-threshold: 0
      initialization-fail-timeout: 1
  quartz:
    job-store-type: jdbc
    jdbc:
      initialize-schema: never
    properties:
      org.quartz.threadPool:threadPriority: 5
      org.quartz.jobStore.isClustered: true
      org.quartz.jobStore.class: org.quartz.impl.jdbcjobstore.JobStoreTX
      org.quartz.scheduler.instanceId: AUTO
      org.quartz.jobStore.tablePrefix: QRTZ_
      org.quartz.jobStore.acquireTriggersWithinLock: true
      org.quartz.scheduler.instanceName: DolphinScheduler
      org.quartz.threadPool.class: org.quartz.simpl.SimpleThreadPool
      org.quartz.jobStore.useProperties: false
      org.quartz.threadPool.makeThreadsDaemons: true
      org.quartz.threadPool.threadCount: 25
      org.quartz.jobStore.misfireThreshold: 60000
      org.quartz.scheduler.makeSchedulerThreadDaemon: true
#      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.StdJDBCDelegate
      org.quartz.jobStore.clusterCheckinInterval: 5000

registry:
  type: zookeeper
  zookeeper:
    namespace: dolphinscheduler
#    connect-string: localhost:2181
    connect-string: 10.255.158.70:2181
    retry-policy:
      base-sleep-time: 60ms
      max-sleep: 300ms
      max-retries: 5
    session-timeout: 30s
    connection-timeout: 9s
    block-until-connected: 600ms
    digest: ~

master:
  listen-port: 5678
  # master fetch command num
  fetch-command-num: 10
  # master prepare execute thread number to limit handle commands in parallel
  pre-exec-threads: 10
  # master execute thread number to limit process instances in parallel
  exec-threads: 100
  # master dispatch task number per batch
  dispatch-task-number: 3
  # master host selector to select a suitable worker, default value: LowerWeight. Optional values include random, round_robin, lower_weight
  host-selector: lower_weight
  # master heartbeat interval, the unit is second
  heartbeat-interval: 10
  # master commit task retry times
  task-commit-retry-times: 5
  # master commit task interval, the unit is millisecond
  task-commit-interval: 1000
  state-wheel-interval: 5
  # master max cpuload avg, only higher than the system cpu load average, master server can schedule. default value -1: the number of cpu cores * 2
  max-cpu-load-avg: -1
  # master reserved memory, only lower than system available memory, master server can schedule. default value 0.3, the unit is G
  reserved-memory: 0.3
  # failover interval, the unit is minute
  failover-interval: 10
  # kill yarn jon when failover taskInstance, default true
  kill-yarn-job-when-task-failover: true

server:
  port: 5679

management:
  endpoints:
    web:
      exposure:
        include: '*'
  metrics:
    tags:
      application: ${spring.application.name}

metrics:
  enabled: true

# Override by profile

---
spring:
  config:
    activate:
      on-profile: mysql
  datasource:
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&serverTimezone=UTC&characterEncoding=UTF-8&autoReconnect=true&useSSL=false&zeroDateTimeBehavior=convertToNull
  quartz:
    properties:
      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.StdJDBCDelegate

dolphinscheduler-worker application.yaml

spring:
  banner:
    charset: UTF-8
  application:
    name: worker-server
  jackson:
    time-zone: UTC
    date-format: "yyyy-MM-dd HH:mm:ss"
  datasource:
    #driver-class-name: org.postgresql.Driver
    #url: jdbc:postgresql://127.0.0.1:5432/dolphinscheduler
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&serverTimezone=UTC&characterEncoding=UTF-8&autoReconnect=true&useSSL=false&zeroDateTimeBehavior=convertToNull
    username: root
    #password: root
    password:
    hikari:
      connection-test-query: select 1
      minimum-idle: 5
      auto-commit: true
      validation-timeout: 3000
      pool-name: DolphinScheduler
      maximum-pool-size: 50
      connection-timeout: 30000
      idle-timeout: 600000
      leak-detection-threshold: 0
      initialization-fail-timeout: 1

registry:
  type: zookeeper
  zookeeper:
    namespace: dolphinscheduler
#    connect-string: localhost:2181
    connect-string: 10.255.158.70:2181
    retry-policy:
      base-sleep-time: 60ms
      max-sleep: 300ms
      max-retries: 5
    session-timeout: 30s
    connection-timeout: 9s
    block-until-connected: 600ms
    digest: ~

worker:
  # worker listener port
  listen-port: 1234
  # worker execute thread number to limit task instances in parallel
  exec-threads: 100
  # worker heartbeat interval, the unit is second
  heartbeat-interval: 10
  # worker host weight to dispatch tasks, default value 100
  host-weight: 100
  # worker tenant auto create
  tenant-auto-create: true
  # worker max cpuload avg, only higher than the system cpu load average, worker server can be dispatched tasks. default value -1: the number of cpu cores * 2
  max-cpu-load-avg: -1
  # worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G
  reserved-memory: 0.3
  # default worker groups separated by comma, like 'worker.groups=default,test'
  groups:
    - default
  # alert server listen host
  alert-listen-host: localhost
  alert-listen-port: 50052

server:
  port: 1235

management:
  endpoints:
    web:
      exposure:
        include: '*'
  metrics:
    tags:
      application: ${spring.application.name}

metrics:
  enabled: true

主要关注数据库，quartz, zookeeper, masker, worker配置

2.2 Api主要任务操作接口

其他业务接口可以不用关注，只需要关注最最主要的流程上线功能接口，此接口可以发散出所有的任务调度相关的代码，接口如下：/dolphinscheduler/projects/{projectCode}/schedules/{id}/online；此接口会将定义的流程提交到Quartz调度框架；代码如下：

public Map<String, Object> setScheduleState(User loginUser,
                                                long projectCode,
                                                Integer id,
                                                ReleaseState scheduleStatus) {
        Map<String, Object> result = new HashMap<>();

        Project project = projectMapper.queryByCode(projectCode);
        // check project auth
        boolean hasProjectAndPerm = projectService.hasProjectAndPerm(loginUser, project, result);
        if (!hasProjectAndPerm) {
            return result;
        }

        // check schedule exists
        Schedule scheduleObj = scheduleMapper.selectById(id);

        if (scheduleObj == null) {
            putMsg(result, Status.SCHEDULE_CRON_NOT_EXISTS, id);
            return result;
        }
        // check schedule release state
        if (scheduleObj.getReleaseState() == scheduleStatus) {
            logger.info("schedule release is already {},needn't to change schedule id: {} from {} to {}",
                    scheduleObj.getReleaseState(), scheduleObj.getId(), scheduleObj.getReleaseState(), scheduleStatus);
            putMsg(result, Status.SCHEDULE_CRON_REALEASE_NEED_NOT_CHANGE, scheduleStatus);
            return result;
        }
        ProcessDefinition processDefinition = processDefinitionMapper.queryByCode(scheduleObj.getProcessDefinitionCode());
        if (processDefinition == null || projectCode != processDefinition.getProjectCode()) {
            putMsg(result, Status.PROCESS_DEFINE_NOT_EXIST, String.valueOf(scheduleObj.getProcessDefinitionCode()));
            return result;
        }
        List<ProcessTaskRelation> processTaskRelations = processTaskRelationMapper.queryByProcessCode(projectCode, scheduleObj.getProcessDefinitionCode());
        if (processTaskRelations.isEmpty()) {
            putMsg(result, Status.PROCESS_DAG_IS_EMPTY);
            return result;
        }
        if (scheduleStatus == ReleaseState.ONLINE) {
            // check process definition release state
            if (processDefinition.getReleaseState() != ReleaseState.ONLINE) {
                logger.info("not release process definition id: {} , name : {}",
                        processDefinition.getId(), processDefinition.getName());
                putMsg(result, Status.PROCESS_DEFINE_NOT_RELEASE, processDefinition.getName());
                return result;
            }
            // check sub process definition release state
            List<Long> subProcessDefineCodes = new ArrayList<>();
            processService.recurseFindSubProcess(processDefinition.getCode(), subProcessDefineCodes);
            if (!subProcessDefineCodes.isEmpty()) {
                List<ProcessDefinition> subProcessDefinitionList =
                        processDefinitionMapper.queryByCodes(subProcessDefineCodes);
                if (subProcessDefinitionList != null && !subProcessDefinitionList.isEmpty()) {
                    for (ProcessDefinition subProcessDefinition : subProcessDefinitionList) {
                        /**
                         * if there is no online process, exit directly
                         */
                        if (subProcessDefinition.getReleaseState() != ReleaseState.ONLINE) {
                            logger.info("not release process definition id: {} , name : {}",
                                    subProcessDefinition.getId(), subProcessDefinition.getName());
                            putMsg(result, Status.PROCESS_DEFINE_NOT_RELEASE, String.valueOf(subProcessDefinition.getId()));
                            return result;
                        }
                    }
                }
            }
        }

        // check master server exists
        List<Server> masterServers = monitorService.getServerListFromRegistry(true);

        if (masterServers.isEmpty()) {
            putMsg(result, Status.MASTER_NOT_EXISTS);
            return result;
        }

        // set status
        scheduleObj.setReleaseState(scheduleStatus);

        scheduleMapper.updateById(scheduleObj);

        try {
            switch (scheduleStatus) {
                case ONLINE:
                    logger.info("Call master client set schedule online, project id: {}, flow id: {},host: {}", project.getId(), processDefinition.getId(), masterServers);
                    setSchedule(project.getId(), scheduleObj);
                    break;
                case OFFLINE:
                    logger.info("Call master client set schedule offline, project id: {}, flow id: {},host: {}", project.getId(), processDefinition.getId(), masterServers);
                    deleteSchedule(project.getId(), id);
                    break;
                default:
                    putMsg(result, Status.SCHEDULE_STATUS_UNKNOWN, scheduleStatus.toString());
                    return result;
            }
        } catch (Exception e) {
            result.put(Constants.MSG, scheduleStatus == ReleaseState.ONLINE ? "set online failure" : "set offline failure");
            throw new ServiceException(result.get(Constants.MSG).toString(), e);
        }

        putMsg(result, Status.SUCCESS);
        return result;
    }

public void setSchedule(int projectId, Schedule schedule) {
        logger.info("set schedule, project id: {}, scheduleId: {}", projectId, schedule.getId());

        quartzExecutor.addJob(ProcessScheduleJob.class, projectId, schedule);
    }

public void addJob(Class<? extends Job> clazz, int projectId, final Schedule schedule) {
        String jobName = this.buildJobName(schedule.getId());
        String jobGroupName = this.buildJobGroupName(projectId);

        Map<String, Object> jobDataMap = this.buildDataMap(projectId, schedule);
        String cronExpression = schedule.getCrontab();
        String timezoneId = schedule.getTimezoneId();

        /**
         * transform from server default timezone to schedule timezone
         * e.g. server default timezone is `UTC`
         * user set a schedule with startTime `2022-04-28 10:00:00`, timezone is `Asia/Shanghai`,
         * api skip to transform it and save into databases directly, startTime `2022-04-28 10:00:00`, timezone is `UTC`, which actually added 8 hours,
         * so when add job to quartz, it should recover by transform timezone
         */
        Date startDate = DateUtils.transformTimezoneDate(schedule.getStartTime(), timezoneId);
        Date endDate = DateUtils.transformTimezoneDate(schedule.getEndTime(), timezoneId);

        lock.writeLock().lock();
        try {

            JobKey jobKey = new JobKey(jobName, jobGroupName);
            JobDetail jobDetail;
            //add a task (if this task already exists, return this task directly)
            if (scheduler.checkExists(jobKey)) {

                jobDetail = scheduler.getJobDetail(jobKey);
                jobDetail.getJobDataMap().putAll(jobDataMap);
            } else {
                jobDetail = newJob(clazz).withIdentity(jobKey).build();

                jobDetail.getJobDataMap().putAll(jobDataMap);

                scheduler.addJob(jobDetail, false, true);

                logger.info("Add job, job name: {}, group name: {}",
                        jobName, jobGroupName);
            }

            TriggerKey triggerKey = new TriggerKey(jobName, jobGroupName);
            /*
             * Instructs the Scheduler that upon a mis-fire
             * situation, the CronTrigger wants to have it's
             * next-fire-time updated to the next time in the schedule after the
             * current time (taking into account any associated Calendar),
             * but it does not want to be fired now.
             */
            CronTrigger cronTrigger = newTrigger()
                    .withIdentity(triggerKey)
                    .startAt(startDate)
                    .endAt(endDate)
                    .withSchedule(
                            cronSchedule(cronExpression)
                                    .withMisfireHandlingInstructionDoNothing()
                                    .inTimeZone(DateUtils.getTimezone(timezoneId))
                    )
                    .forJob(jobDetail).build();

            if (scheduler.checkExists(triggerKey)) {
                // updateProcessInstance scheduler trigger when scheduler cycle changes
                CronTrigger oldCronTrigger = (CronTrigger) scheduler.getTrigger(triggerKey);
                String oldCronExpression = oldCronTrigger.getCronExpression();

                if (!StringUtils.equalsIgnoreCase(cronExpression, oldCronExpression)) {
                    // reschedule job trigger
                    scheduler.rescheduleJob(triggerKey, cronTrigger);
                    logger.info("reschedule job trigger, triggerName: {}, triggerGroupName: {}, cronExpression: {}, startDate: {}, endDate: {}",
                            jobName, jobGroupName, cronExpression, startDate, endDate);
                }
            } else {
                scheduler.scheduleJob(cronTrigger);
                logger.info("schedule job trigger, triggerName: {}, triggerGroupName: {}, cronExpression: {}, startDate: {}, endDate: {}",
                        jobName, jobGroupName, cronExpression, startDate, endDate);
            }

        } catch (Exception e) {
            throw new ServiceException("add job failed", e);
        } finally {
            lock.writeLock().unlock();
        }
    }

2.3 Quaterz架构与运行流程

2.3.1 概念与架构

Quartz 框架主要包括如下几个部分：

SchedulerFactory：任务调度工厂，主要负责管理任务调度器
Scheduler ：任务调度器，主要负责任务调度，以及操作任务的相关接口
Job ：任务接口，实现类包含具体任务业务代码
JobDetail：用于定义作业的实例。
Trigger：任务触发器，主要存放 Job 执行的时间策略。例如多久执行一次，什么时候执行，以什么频率执行等等。
JobBuilder ：用于定义/构建 JobDetail 实例，用于定义作业的实例。
TriggerBuilder ：用于定义/构建触发器实例。
Calendar：Trigger 扩展对象，可以排除或者包含某个指定的时间点（如排除法定节假日）
JobStore：存储作业和任务调度期间的状态
Scheduler 的生命期，从 SchedulerFactory 创建它时开始，到 Scheduler 调用shutdown() 方法时结束；Scheduler 被创建后，可以增加、删除和列举 Job 和 Trigger，以及执行其它与调度相关的操作（如暂停 Trigger）。但是，Scheduler 只有在调用 start() 方法后，才会真正地触发 trigger（即执行 job）

2.3.2 初始化与执行流程

quartz的基本原理就是通过scheduler来调度被JobDetail和Trigger定义的安装Job接口规范实现的自定义任务业务对象，来完成任务的调度。基本逻辑如下图；

代码时序图如下：

基本内容就是初始化任务调度容器scheduler，以及容器所需的线程池，数据交互对象jobStore，任务处理线程quartzSchedulerThread用来处理job接口的具体业务实现类。DolphinScheduler的业务类是ProcessScheduleJob，主要功能就是根据调度信息往commond表中写数据。

2.3.3 集群运转

需要注意的事：

当quartz采用集群形式部署的时候，存储介质不能使用内存的形式，也就是不能使用JobStoreRAM。
quartz集群对于对于需要被调度的Triggers实例的扫描是使用数据库锁TRIGGER_ACCESS来完成的，保障此扫描过程只能被一个quartz实例获取到。代码如下：

 public List<OperableTrigger> acquireNextTriggers(final long noLaterThan, final int maxCount, final long timeWindow)
        throws JobPersistenceException {
        
        String lockName;
        if(isAcquireTriggersWithinLock() || maxCount > 1) { 
            lockName = LOCK_TRIGGER_ACCESS;
        } else {
            lockName = null;
        }
        return executeInNonManagedTXLock(lockName, 
                new TransactionCallback<List<OperableTrigger>>() {
                    public List<OperableTrigger> execute(Connection conn) throws JobPersistenceException {
                        return acquireNextTrigger(conn, noLaterThan, maxCount, timeWindow);
                    }
                },
                new TransactionValidator<List<OperableTrigger>>() {
                    public Boolean validate(Connection conn, List<OperableTrigger> result) throws JobPersistenceException {
                        try {
                            List<FiredTriggerRecord> acquired = getDelegate().selectInstancesFiredTriggerRecords(conn, getInstanceId());
                            Set<String> fireInstanceIds = new HashSet<String>();
                            for (FiredTriggerRecord ft : acquired) {
                                fireInstanceIds.add(ft.getFireInstanceId());
                            }
                            for (OperableTrigger tr : result) {
                                if (fireInstanceIds.contains(tr.getFireInstanceId())) {
                                    return true;
                                }
                            }
                            return false;
                        } catch (SQLException e) {
                            throw new JobPersistenceException("error validating trigger acquisition", e);
                        }
                    }
                });
    }

集群失败实例恢复需要注意的是各个实例恢复各自实例对应的异常实例，因为数据库有调度容器的instanceId信息。代码如下：

 protected void clusterRecover(Connection conn, List<SchedulerStateRecord> failedInstances)
        throws JobPersistenceException {

        if (failedInstances.size() > 0) {

            long recoverIds = System.currentTimeMillis();

            logWarnIfNonZero(failedInstances.size(),
                    "ClusterManager: detected " + failedInstances.size()
                            + " failed or restarted instances.");
            try {
                for (SchedulerStateRecord rec : failedInstances) {
                    getLog().info(
                            "ClusterManager: Scanning for instance \""
                                    + rec.getSchedulerInstanceId()
                                    + "\"'s failed in-progress jobs.");

                    List<FiredTriggerRecord> firedTriggerRecs = getDelegate()
                            .selectInstancesFiredTriggerRecords(conn,
                                    rec.getSchedulerInstanceId());

                    int acquiredCount = 0;
                    int recoveredCount = 0;
                    int otherCount = 0;

                    Set<TriggerKey> triggerKeys = new HashSet<TriggerKey>();

                    for (FiredTriggerRecord ftRec : firedTriggerRecs) {

                        TriggerKey tKey = ftRec.getTriggerKey();
                        JobKey jKey = ftRec.getJobKey();

                        triggerKeys.add(tKey);

                        // release blocked triggers..
                        if (ftRec.getFireInstanceState().equals(STATE_BLOCKED)) {
                            getDelegate()
                                    .updateTriggerStatesForJobFromOtherState(
                                            conn, jKey,
                                            STATE_WAITING, STATE_BLOCKED);
                        } else if (ftRec.getFireInstanceState().equals(STATE_PAUSED_BLOCKED)) {
                            getDelegate()
                                    .updateTriggerStatesForJobFromOtherState(
                                            conn, jKey,
                                            STATE_PAUSED, STATE_PAUSED_BLOCKED);
                        }

                        // release acquired triggers..
                        if (ftRec.getFireInstanceState().equals(STATE_ACQUIRED)) {
                            getDelegate().updateTriggerStateFromOtherState(
                                    conn, tKey, STATE_WAITING,
                                    STATE_ACQUIRED);
                            acquiredCount++;
                        } else if (ftRec.isJobRequestsRecovery()) {
                            // handle jobs marked for recovery that were not fully
                            // executed..
                            if (jobExists(conn, jKey)) {
                                @SuppressWarnings("deprecation")
                                SimpleTriggerImpl rcvryTrig = new SimpleTriggerImpl(
                                        "recover_"
                                                + rec.getSchedulerInstanceId()
                                                + "_"
                                                + String.valueOf(recoverIds++),
                                        Scheduler.DEFAULT_RECOVERY_GROUP,
                                        new Date(ftRec.getScheduleTimestamp()));
                                rcvryTrig.setJobName(jKey.getName());
                                rcvryTrig.setJobGroup(jKey.getGroup());
                                rcvryTrig.setMisfireInstruction(SimpleTrigger.MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY);
                                rcvryTrig.setPriority(ftRec.getPriority());
                                JobDataMap jd = getDelegate().selectTriggerJobDataMap(conn, tKey.getName(), tKey.getGroup());
                                jd.put(Scheduler.FAILED_JOB_ORIGINAL_TRIGGER_NAME, tKey.getName());
                                jd.put(Scheduler.FAILED_JOB_ORIGINAL_TRIGGER_GROUP, tKey.getGroup());
                                jd.put(Scheduler.FAILED_JOB_ORIGINAL_TRIGGER_FIRETIME_IN_MILLISECONDS, String.valueOf(ftRec.getFireTimestamp()));
                                jd.put(Scheduler.FAILED_JOB_ORIGINAL_TRIGGER_SCHEDULED_FIRETIME_IN_MILLISECONDS, String.valueOf(ftRec.getScheduleTimestamp()));
                                rcvryTrig.setJobDataMap(jd);

                                rcvryTrig.computeFirstFireTime(null);
                                storeTrigger(conn, rcvryTrig, null, false,
                                        STATE_WAITING, false, true);
                                recoveredCount++;
                            } else {
                                getLog()
                                        .warn(
                                                "ClusterManager: failed job '"
                                                        + jKey
                                                        + "' no longer exists, cannot schedule recovery.");
                                otherCount++;
                            }
                        } else {
                            otherCount++;
                        }

                        // free up stateful job's triggers
                        if (ftRec.isJobDisallowsConcurrentExecution()) {
                            getDelegate()
                                    .updateTriggerStatesForJobFromOtherState(
                                            conn, jKey,
                                            STATE_WAITING, STATE_BLOCKED);
                            getDelegate()
                                    .updateTriggerStatesForJobFromOtherState(
                                            conn, jKey,
                                            STATE_PAUSED, STATE_PAUSED_BLOCKED);
                        }
                    }

                    getDelegate().deleteFiredTriggers(conn,
                            rec.getSchedulerInstanceId());

                    // Check if any of the fired triggers we just deleted were the last fired trigger
                    // records of a COMPLETE trigger.
                    int completeCount = 0;
                    for (TriggerKey triggerKey : triggerKeys) {

                        if (getDelegate().selectTriggerState(conn, triggerKey).
                                equals(STATE_COMPLETE)) {
                            List<FiredTriggerRecord> firedTriggers =
                                    getDelegate().selectFiredTriggerRecords(conn, triggerKey.getName(), triggerKey.getGroup());
                            if (firedTriggers.isEmpty()) {

                                if (removeTrigger(conn, triggerKey)) {
                                    completeCount++;
                                }
                            }
                        }
                    }

                    logWarnIfNonZero(acquiredCount,
                            "ClusterManager: ......Freed " + acquiredCount
                                    + " acquired trigger(s).");
                    logWarnIfNonZero(completeCount,
                            "ClusterManager: ......Deleted " + completeCount
                                    + " complete triggers(s).");
                    logWarnIfNonZero(recoveredCount,
                            "ClusterManager: ......Scheduled " + recoveredCount
                                    + " recoverable job(s) for recovery.");
                    logWarnIfNonZero(otherCount,
                            "ClusterManager: ......Cleaned-up " + otherCount
                                    + " other failed job(s).");

                    if (!rec.getSchedulerInstanceId().equals(getInstanceId())) {
                        getDelegate().deleteSchedulerState(conn,
                                rec.getSchedulerInstanceId());
                    }
                }
            } catch (Throwable e) {
                throw new JobPersistenceException("Failure recovering jobs: "
                        + e.getMessage(), e);
            }
        }
    }

2.4 Master启动与执行流程

对于DolphinScheduler主要运行逻辑如下图所示，Master和worker基本都包含，主要包含Master，所以放在此章节：

2.4.1 概念与执行逻辑

关键概念：
Quartz相关：
Scheduler（任务调度容器，一般都是StdScheduler实例）。
ProcessScheduleJob；（实现Quarts调度框架的Job接口的业务类，专门生成DolphinScheduler数据库业务表t_ds_commond数据）。
DolphinScheduler相关：
NettyRemotingServer（netty服务端，包含netty服务端serverBootstrap对象与netty服务端业务处理对象serverHandler），
NettyServerHandler：（netty服务端业务处理类：包含各类处理器以及处理器对应的执行线程池）；

TaskPluginManager（任务插件管理器，不同类型的任务以插件的形式管理，在应用服务启动的时候，通过@AutoService加载实现了TaskChannelFactory接口的工厂信息到数据库，通过工厂对象来加载各类TaskChannel实现类到缓存）；

MasterRegistryClient（master操作zk的客户端，封装了master对于zk的所有操作，注册，查询，删除等）；

MasterSchedulerService（扫描服务，包含业务执行线程和work包含的nettyhe护短，负责任务调度业务，slot来控制集群模式下任务不被重复调度，底层实现是zookeeper分布式锁），
WorkflowExecuteThread（真正的业务处理线程，. 通过插槽获取命令commond，执行之前会校验slot的变化，如果变化不执行，关键功能就是构建任务相关的参数，定义，优先级等，然后发送到队列，供队列处理线程消费），
CommonTaskProcessor（普通任务处理器，实现ITaskProcessor接口，根据业务分为普通，依赖，子任务，阻塞，条件任务类型，包含了任务的提交，运行，分发，杀死等业务，通过@AutoService加载的类，根本就是封装了对）。
TaskPriorityQueueImpl（任务队列，负责任务队列的存储控制）；
TaskPriorityQueueConsumer（任务队列消费线程，负责任务的根据负载均衡策略在worker之间分发与执行）；
ServerNodeManager （节点信息控制器，负责节点注册信息更新与槽位（slot）变更，底层实现是zookeeper分布式锁的应用）；

EventExecuteService（事件处理线程，通过缓存起来的任务处理线程，处理每个任务在处理过程中注册在线程事件队列中的事件）。
FailoverExecuteThread（故障转移线程，包含Master和worker的）。
MasterRegistryDataListener（托管在zk管理框架cautor的故障监听器，负责对worker和master注册在zk上的节点的新增和删除）

主节点容错代码如下，业务解释见1.5.1master容错解释：

 private void failoverMasterWithLock(String masterHost) {
        String failoverPath = getFailoverLockPath(NodeType.MASTER, masterHost);
        try {
            registryClient.getLock(failoverPath);
            this.failoverMaster(masterHost);
        } catch (Exception e) {
            LOGGER.error("{} server failover failed, host:{}", NodeType.MASTER, masterHost, e);
        } finally {
            registryClient.releaseLock(failoverPath);
        }
    }
 /**
     * failover master
     * 
     * failover process instance and associated task instance
     *故障转移流程实例和关联的任务实例
     * @param masterHost master host
     */
    private void failoverMaster(String masterHost) {
        if (StringUtils.isEmpty(masterHost)) {
            return;
        }
        Date serverStartupTime = getServerStartupTime(NodeType.MASTER, masterHost);
        long startTime = System.currentTimeMillis();
        List<ProcessInstance> needFailoverProcessInstanceList = processService.queryNeedFailoverProcessInstances(masterHost);
        LOGGER.info("start master[{}] failover, process list size:{}", masterHost, needFailoverProcessInstanceList.size());
        List<Server> workerServers = registryClient.getServerList(NodeType.WORKER);
        for (ProcessInstance processInstance : needFailoverProcessInstanceList) {
            if (Constants.NULL.equals(processInstance.getHost())) {
                continue;
            }

            List<TaskInstance> validTaskInstanceList = processService.findValidTaskListByProcessId(processInstance.getId());
            for (TaskInstance taskInstance : validTaskInstanceList) {
                LOGGER.info("failover task instance id: {}, process instance id: {}", taskInstance.getId(), taskInstance.getProcessInstanceId());
                failoverTaskInstance(processInstance, taskInstance, workerServers);
            }

            if (serverStartupTime != null && processInstance.getRestartTime() != null
                && processInstance.getRestartTime().after(serverStartupTime)) {
                continue;
            }

            LOGGER.info("failover process instance id: {}", processInstance.getId());
            //updateProcessInstance host is null and insert into command
            processInstance.setHost(Constants.NULL);
            processService.processNeedFailoverProcessInstances(processInstance);
        }

        LOGGER.info("master[{}] failover end, useTime:{}ms", masterHost, System.currentTimeMillis() - startTime);
    }

2.4.2 集群与槽（slot）

其实这里的采用zookeer分布式锁准确也不准确，为什么这么说，因为slot是commondId对master列表长度取模来计算的，而master列表长度的刷新是Zookeeper分布式锁来控制，master节点的调度数据扫描是通过slot来控制的，所以说2.4的运行逻辑图中对zk部分的描述不够准确。具体代码如下：
slot刷新：

private void updateMasterNodes() {
        MASTER_SLOT = 0;
        MASTER_SIZE = 0;
        this.masterNodes.clear();
        String nodeLock = Constants.REGISTRY_DOLPHINSCHEDULER_LOCK_MASTERS;
        try {
            registryClient.getLock(nodeLock);
            Collection<String> currentNodes = registryClient.getMasterNodesDirectly();
            List<Server> masterNodes = registryClient.getServerList(NodeType.MASTER);
            syncMasterNodes(currentNodes, masterNodes);
        } catch (Exception e) {
            logger.error("update master nodes error", e);
        } finally {
            registryClient.releaseLock(nodeLock);
        }

    }
/**
     * sync master nodes
     *
     * @param nodes master nodes
     */
    private void syncMasterNodes(Collection<String> nodes, List<Server> masterNodes) {
        masterLock.lock();
        try {
            String addr = NetUtils.getAddr(NetUtils.getHost(), masterConfig.getListenPort());
            this.masterNodes.addAll(nodes);
            this.masterPriorityQueue.clear();
            this.masterPriorityQueue.putList(masterNodes);
            int index = masterPriorityQueue.getIndex(addr);
            if (index >= 0) {
                MASTER_SIZE = nodes.size();
                MASTER_SLOT = index;
            } else {
                logger.warn("current addr:{} is not in active master list", addr);
            }
            logger.info("update master nodes, master size: {}, slot: {}, addr: {}", MASTER_SIZE, MASTER_SLOT, addr);
        } finally {
            masterLock.unlock();
        }
    }

slot应用：

/**
     * 1. get command by slot
     * 2. donot handle command if slot is empty
     */
    /** * 1. 通过插槽获取命令 * 2. 如果插槽为空，则不处理命令 */
    private void scheduleProcess() throws Exception {
        List<Command> commands = findCommands();
        if (CollectionUtils.isEmpty(commands)) {
            //indicate that no command ,sleep for 1s
            Thread.sleep(Constants.SLEEP_TIME_MILLIS);
            return;
        }

        List<ProcessInstance> processInstances = command2ProcessInstance(commands);
        if (CollectionUtils.isEmpty(processInstances)) {
            return;
        }

        for (ProcessInstance processInstance : processInstances) {
            if (processInstance == null) {
                continue;
            }

            WorkflowExecuteThread workflowExecuteThread = new WorkflowExecuteThread(
                    processInstance
                    , processService
                    , nettyExecutorManager
                    , processAlertManager
                    , masterConfig
                    , stateWheelExecuteThread);

            this.processInstanceExecCacheManager.cache(processInstance.getId(), workflowExecuteThread);
            if (processInstance.getTimeout() > 0) {
                stateWheelExecuteThread.addProcess4TimeoutCheck(processInstance);
            }
            workflowExecuteThreadPool.startWorkflow(workflowExecuteThread);
        }
    }
private List<Command> findCommands() {
        int pageNumber = 0;
        int pageSize = masterConfig.getFetchCommandNum();
        List<Command> result = new ArrayList<>();
        if (Stopper.isRunning()) {
            int thisMasterSlot = ServerNodeManager.getSlot();
            int masterCount = ServerNodeManager.getMasterSize();
            if (masterCount > 0) {
                result = processService.findCommandPageBySlot(pageSize, pageNumber, masterCount, thisMasterSlot);
            }
        }
        return result;
    }
@Override
    public List<Command> findCommandPageBySlot(int pageSize, int pageNumber, int masterCount, int thisMasterSlot) {
        if (masterCount <= 0) {
            return Lists.newArrayList();
        }
        return commandMapper.queryCommandPageBySlot(pageSize, pageNumber * pageSize, masterCount, thisMasterSlot);
    }
    
 <select id="queryCommandPageBySlot" resultType="org.apache.dolphinscheduler.dao.entity.Command">
        select *
        from t_ds_command
        where id % #{masterCount} = #{thisMasterSlot}
        order by process_instance_priority, id asc
            limit #{limit} offset #{offset}
    </select>

##槽位检查
 private List<ProcessInstance> command2ProcessInstance(List<Command> commands) {
        List<ProcessInstance> processInstances = Collections.synchronizedList(new ArrayList<>(commands.size()));
        CountDownLatch latch = new CountDownLatch(commands.size());
        for (final Command command : commands) {
            masterPrepareExecService.execute(() -> {
                try {
                    // slot check again
                    SlotCheckState slotCheckState = slotCheck(command);
                    if (slotCheckState.equals(SlotCheckState.CHANGE) || slotCheckState.equals(SlotCheckState.INJECT)) {
                        logger.info("handle command {} skip, slot check state: {}", command.getId(), slotCheckState);
                        return;
                    }
                    ProcessInstance processInstance = processService.handleCommand(logger,
                            getLocalAddress(),
                            command);
                    if (processInstance != null) {
                        processInstances.add(processInstance);
                        logger.info("handle command {} end, create process instance {}", command.getId(), processInstance.getId());
                    }
                } catch (Exception e) {
                    logger.error("handle command error ", e);
                    processService.moveToErrorCommand(command, e.toString());
                } finally {
                    latch.countDown();
                }
            });
        }

        try {
            // make sure to finish handling command each time before next scan
            latch.await();
        } catch (InterruptedException e) {
            logger.error("countDownLatch await error ", e);
        }

        return processInstances;
    }

private SlotCheckState slotCheck(Command command) {
        int slot = ServerNodeManager.getSlot();
        int masterSize = ServerNodeManager.getMasterSize();
        SlotCheckState state;
        if (masterSize <= 0) {
            state = SlotCheckState.CHANGE;
        } else if (command.getId() % masterSize == slot) {
            state = SlotCheckState.PASS;
        } else {
            state = SlotCheckState.INJECT;
        }
        return state;
    }

2.4.3 代码执行流程

代码过于繁琐，此处不再一一粘贴代码解释，了解各个类的功能，自行看代码更加清晰。

2.5 Work启动与执行流程

2.5.1 概念与执行逻辑

NettyRemotingServer（worker包含的netty服务端）
WorkerRegistryClient（zk客户端，封装了worker与zk相关的操作，注册，查询，删除等）
TaskPluginManager（任务插件管理器，封装了插件加载逻辑和任务实际执行业务的抽象）
WorkerManagerThread（任务工作线程生成器，消费netty处理器推进队列的任务信息，并生成任务执行线程提交线程池管理）
TaskExecuteProcessor（Netty任务执行处理器，生成master分发到work的任务信息，并推送到队列）
TaskExecuteThread（任务执行线程）
TaskCallbackService（任务回调线程，与master包含的netty client通信）
AbstractTask（任务实际业务的抽象类，子类包含实际的任务执行业务，SqlTask，DataXTask等）
RetryReportTaskStatusThread（不关注）

2.5.2 代码执行流程

worker节点代码时序图如下：

代码过于繁琐，此处不再一一粘贴代码解释，了解各个类的功能，自行看代码更加清晰。

2.6 rpc交互

因为节点和应用服务之间的rpc通信都是基于netty实现的，netty相关知识不在这里过多的讲解，当前章节只涉及Master与Worker之间的交互模式的设计与实现。
整体设计如下：

2.6.1 Master与Worker交互

Master与worker之间的业务逻辑的交互是基于Netty服务端与客户端来实现Rpc通信的，Master和Worker启动的时候会将自己的Netty服务端信息注册到zk相应的节点上，master的任务分发线程和任务杀死等业务运行时，拉取zk上的worker节点信息，根据负载均衡策略选择一个节点（下章介绍负载均衡），构建netty客户端与worker的netty服务端通信，worker收到master的rpc请求之后会缓存channel信息并处理对应业务，同时callback回调线程会获取缓存的通道来执行回调操作，这样就形成的闭环。任务的执行杀死，以及回调状态处理等操作都是通过netty客户端与服务端绑定的processer处理器来进行的。
master部分具体代码如下：
master启动的时候会初始化nettyserver，注册对应的请求处理器到nettyHandler并启动：

 @PostConstruct
    public void run() throws SchedulerException {
        // init remoting server
        NettyServerConfig serverConfig = new NettyServerConfig();
        serverConfig.setListenPort(masterConfig.getListenPort());
        this.nettyRemotingServer = new NettyRemotingServer(serverConfig);
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_RESPONSE, taskExecuteResponseProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_RUNNING, taskExecuteRunningProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_KILL_RESPONSE, taskKillResponseProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.STATE_EVENT_REQUEST, stateEventProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_FORCE_STATE_EVENT_REQUEST, taskEventProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_WAKEUP_EVENT_REQUEST, taskEventProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.CACHE_EXPIRE, cacheProcessor);

        // logger server
        this.nettyRemotingServer.registerProcessor(CommandType.GET_LOG_BYTES_REQUEST, loggerRequestProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.ROLL_VIEW_LOG_REQUEST, loggerRequestProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.VIEW_WHOLE_LOG_REQUEST, loggerRequestProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.REMOVE_TAK_LOG_REQUEST, loggerRequestProcessor);

        this.nettyRemotingServer.start();

        // install task plugin
        this.taskPluginManager.installPlugin();

        // self tolerant
        this.masterRegistryClient.init();
        this.masterRegistryClient.start();
        this.masterRegistryClient.setRegistryStoppable(this);

        this.masterSchedulerService.init();
        this.masterSchedulerService.start();

        this.eventExecuteService.start();
        this.failoverExecuteThread.start();

        this.scheduler.start();

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            if (Stopper.isRunning()) {
                close("shutdownHook");
            }
        }));
    }
 /**
     * server start
     */
    public void start() {
        if (isStarted.compareAndSet(false, true)) {
            this.serverBootstrap
                    .group(this.bossGroup, this.workGroup)
                    .channel(NettyUtils.getServerSocketChannelClass())
                    .option(ChannelOption.SO_REUSEADDR, true)
                    .option(ChannelOption.SO_BACKLOG, serverConfig.getSoBacklog())
                    .childOption(ChannelOption.SO_KEEPALIVE, serverConfig.isSoKeepalive())
                    .childOption(ChannelOption.TCP_NODELAY, serverConfig.isTcpNoDelay())
                    .childOption(ChannelOption.SO_SNDBUF, serverConfig.getSendBufferSize())
                    .childOption(ChannelOption.SO_RCVBUF, serverConfig.getReceiveBufferSize())
                    .childHandler(new ChannelInitializer<SocketChannel>() {

                        @Override
                        protected void initChannel(SocketChannel ch) {
                            initNettyChannel(ch);
                        }
                    });

            ChannelFuture future;
            try {
                future = serverBootstrap.bind(serverConfig.getListenPort()).sync();
            } catch (Exception e) {
                logger.error("NettyRemotingServer bind fail {}, exit", e.getMessage(), e);
                throw new RemoteException(String.format(NETTY_BIND_FAILURE_MSG, serverConfig.getListenPort()));
            }
            if (future.isSuccess()) {
                logger.info("NettyRemotingServer bind success at port : {}", serverConfig.getListenPort());
            } else if (future.cause() != null) {
                throw new RemoteException(String.format(NETTY_BIND_FAILURE_MSG, serverConfig.getListenPort()), future.cause());
            } else {
                throw new RemoteException(String.format(NETTY_BIND_FAILURE_MSG, serverConfig.getListenPort()));
            }
        }
    }

master的NettyExecutorManager初始化的时候会将NettyRemotingClient也初始化，并且会注册处理worker回调请求的处理器，正真的端口绑定是在获取到执行器端口之后：

 /**
     * constructor
     */
    public NettyExecutorManager() {
        final NettyClientConfig clientConfig = new NettyClientConfig();
        this.nettyRemotingClient = new NettyRemotingClient(clientConfig);
    }
##注册处理worker回调的处理器
    @PostConstruct
    public void init() {
        this.nettyRemotingClient.registerProcessor(CommandType.TASK_EXECUTE_RESPONSE, taskExecuteResponseProcessor);
        this.nettyRemotingClient.registerProcessor(CommandType.TASK_EXECUTE_RUNNING, taskExecuteRunningProcessor);
        this.nettyRemotingClient.registerProcessor(CommandType.TASK_KILL_RESPONSE, taskKillResponseProcessor);
    }
    
 public NettyRemotingClient(final NettyClientConfig clientConfig) {
        this.clientConfig = clientConfig;
        if (NettyUtils.useEpoll()) {
            this.workerGroup = new EpollEventLoopGroup(clientConfig.getWorkerThreads(), new ThreadFactory() {
                private final AtomicInteger threadIndex = new AtomicInteger(0);

                @Override
                public Thread newThread(Runnable r) {
                    return new Thread(r, String.format("NettyClient_%d", this.threadIndex.incrementAndGet()));
                }
            });
        } else {
            this.workerGroup = new NioEventLoopGroup(clientConfig.getWorkerThreads(), new ThreadFactory() {
                private final AtomicInteger threadIndex = new AtomicInteger(0);

                @Override
                public Thread newThread(Runnable r) {
                    return new Thread(r, String.format("NettyClient_%d", this.threadIndex.incrementAndGet()));
                }
            });
        }
        this.callbackExecutor = new ThreadPoolExecutor(5, 10, 1, TimeUnit.MINUTES,
                new LinkedBlockingQueue<>(1000), new NamedThreadFactory("CallbackExecutor", 10),
                new CallerThreadExecutePolicy());
        this.clientHandler = new NettyClientHandler(this, callbackExecutor);

        this.responseFutureExecutor = Executors.newSingleThreadScheduledExecutor(new NamedThreadFactory("ResponseFutureExecutor"));

        this.start();
    }
 /**
     * start
     */
 private void start() {

        this.bootstrap
                .group(this.workerGroup)
                .channel(NettyUtils.getSocketChannelClass())
                .option(ChannelOption.SO_KEEPALIVE, clientConfig.isSoKeepalive())
                .option(ChannelOption.TCP_NODELAY, clientConfig.isTcpNoDelay())
                .option(ChannelOption.SO_SNDBUF, clientConfig.getSendBufferSize())
                .option(ChannelOption.SO_RCVBUF, clientConfig.getReceiveBufferSize())
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, clientConfig.getConnectTimeoutMillis())
                .handler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    public void initChannel(SocketChannel ch) {
                        ch.pipeline()
                                .addLast("client-idle-handler", new IdleStateHandler(Constants.NETTY_CLIENT_HEART_BEAT_TIME, 0, 0, TimeUnit.MILLISECONDS))
                                .addLast(new NettyDecoder(), clientHandler, encoder);
                    }
                });
        this.responseFutureExecutor.scheduleAtFixedRate(ResponseFuture::scanFutureTable, 5000, 1000, TimeUnit.MILLISECONDS);
        isStarted.compareAndSet(false, true);
    }

任务分发代码如下：

/**
     * task dispatch
     *
     * @param context context
     * @return result
     * @throws ExecuteException if error throws ExecuteException
     */
    public Boolean dispatch(final ExecutionContext context) throws ExecuteException {
        /**
         * get executor manager
         */
        ExecutorManager<Boolean> executorManager = this.executorManagers.get(context.getExecutorType());
        if (executorManager == null) {
            throw new ExecuteException("no ExecutorManager for type : " + context.getExecutorType());
        }

        /**
         * host select
         */

        Host host = hostManager.select(context);
        if (StringUtils.isEmpty(host.getAddress())) {
            throw new ExecuteException(String.format("fail to execute : %s due to no suitable worker, "
                            + "current task needs worker group %s to execute",
                    context.getCommand(),context.getWorkerGroup()));
        }
        context.setHost(host);
        executorManager.beforeExecute(context);
        try {
            /**
             * task execute
             */
            return executorManager.execute(context);
        } finally {
            executorManager.afterExecute(context);
        }
    }


/**
     * execute logic
     *
     * @param context context
     * @return result
     * @throws ExecuteException if error throws ExecuteException
     */
    @Override
    public Boolean execute(ExecutionContext context) throws ExecuteException {

        /**
         *  all nodes
         */
        Set<String> allNodes = getAllNodes(context);

        /**
         * fail nodes
         */
        Set<String> failNodeSet = new HashSet<>();

        /**
         *  build command accord executeContext
         */
        Command command = context.getCommand();

        /**
         * execute task host
         */
        Host host = context.getHost();
        boolean success = false;
        while (!success) {
            try {
                doExecute(host, command);
                success = true;
                context.setHost(host);
            } catch (ExecuteException ex) {
                logger.error(String.format("execute command : %s error", command), ex);
                try {
                    failNodeSet.add(host.getAddress());
                    Set<String> tmpAllIps = new HashSet<>(allNodes);
                    Collection<String> remained = CollectionUtils.subtract(tmpAllIps, failNodeSet);
                    if (remained != null && remained.size() > 0) {
                        host = Host.of(remained.iterator().next());
                        logger.error("retry execute command : {} host : {}", command, host);
                    } else {
                        throw new ExecuteException("fail after try all nodes");
                    }
                } catch (Throwable t) {
                    throw new ExecuteException("fail after try all nodes");
                }
            }
        }

        return success;
    }


/**
     * execute logic
     *
     * @param host host
     * @param command command
     * @throws ExecuteException if error throws ExecuteException
     */
    public void doExecute(final Host host, final Command command) throws ExecuteException {
        /**
         * retry count，default retry 3
         */
        int retryCount = 3;
        boolean success = false;
        do {
            try {
                nettyRemotingClient.send(host, command);
                success = true;
            } catch (Exception ex) {
                logger.error(String.format("send command : %s to %s error", command, host), ex);
                retryCount--;
                ThreadUtils.sleep(100);
            }
        } while (retryCount >= 0 && !success);

        if (!success) {
            throw new ExecuteException(String.format("send command : %s to %s error", command, host));
        }
    }

  /**
     * send task
     *
     * @param host host
     * @param command command
     */
    public void send(final Host host, final Command command) throws RemotingException {
        Channel channel = getChannel(host);
        if (channel == null) {
            throw new RemotingException(String.format("connect to : %s fail", host));
        }
        try {
            ChannelFuture future = channel.writeAndFlush(command).await();
            if (future.isSuccess()) {
                logger.debug("send command : {} , to : {} successfully.", command, host.getAddress());
            } else {
                String msg = String.format("send command : %s , to :%s failed", command, host.getAddress());
                logger.error(msg, future.cause());
                throw new RemotingException(msg);
            }
        } catch (Exception e) {
            logger.error("Send command {} to address {} encounter error.", command, host.getAddress());
            throw new RemotingException(String.format("Send command : %s , to :%s encounter error", command, host.getAddress()), e);
        }
    }

Worker部分具体代码如下：
同理woker在启动的时候会初始化nettyserver，注册对应处理器并启动：

/**
     * worker server run
     */
    @PostConstruct
    public void run() {
        // init remoting server
        NettyServerConfig serverConfig = new NettyServerConfig();
        serverConfig.setListenPort(workerConfig.getListenPort());
        this.nettyRemotingServer = new NettyRemotingServer(serverConfig);
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_REQUEST, taskExecuteProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_KILL_REQUEST, taskKillProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_RUNNING_ACK, taskExecuteRunningAckProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_EXECUTE_RESPONSE_ACK, taskExecuteResponseAckProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.PROCESS_HOST_UPDATE_REQUEST, hostUpdateProcessor);

        // logger server
        this.nettyRemotingServer.registerProcessor(CommandType.GET_LOG_BYTES_REQUEST, loggerRequestProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.ROLL_VIEW_LOG_REQUEST, loggerRequestProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.VIEW_WHOLE_LOG_REQUEST, loggerRequestProcessor);
        this.nettyRemotingServer.registerProcessor(CommandType.REMOVE_TAK_LOG_REQUEST, loggerRequestProcessor);

        this.nettyRemotingServer.start();

        // install task plugin
        this.taskPluginManager.installPlugin();

        // worker registry
        try {
            this.workerRegistryClient.registry();
            this.workerRegistryClient.setRegistryStoppable(this);
            Set<String> workerZkPaths = this.workerRegistryClient.getWorkerZkPaths();

            this.workerRegistryClient.handleDeadServer(workerZkPaths, NodeType.WORKER, Constants.DELETE_OP);
        } catch (Exception e) {
            logger.error(e.getMessage(), e);
            throw new RuntimeException(e);
        }

        // task execute manager
        this.workerManagerThread.start();

        // retry report task status
        this.retryReportTaskStatusThread.start();

        /*
         * registry hooks, which are called before the process exits
         */
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            if (Stopper.isRunning()) {
                close("shutdownHook");
            }
        }));
    }

回调线程对象初始化的时候，会将包含的nettyremotingclient一起初始化，并注册好对应的业务处理器：

 public TaskCallbackService() {
        final NettyClientConfig clientConfig = new NettyClientConfig();
        this.nettyRemotingClient = new NettyRemotingClient(clientConfig);
        this.nettyRemotingClient.registerProcessor(CommandType.TASK_EXECUTE_RUNNING_ACK, taskExecuteRunningProcessor);
        this.nettyRemotingClient.registerProcessor(CommandType.TASK_EXECUTE_RESPONSE_ACK, taskExecuteResponseAckProcessor);
    }

回调线程会通过其他执行器中缓存下来的chanel与master的客户端进行通信：

/**
     * send result
     *
     * @param taskInstanceId taskInstanceId
     * @param command command
     */
    public void send(int taskInstanceId, Command command) {
        NettyRemoteChannel nettyRemoteChannel = getRemoteChannel(taskInstanceId);
        if (nettyRemoteChannel != null) {
            nettyRemoteChannel.writeAndFlush(command).addListener(new ChannelFutureListener() {

                @Override
                public void operationComplete(ChannelFuture future) throws Exception {
                    if (future.isSuccess()) {
                        // remove(taskInstanceId);
                        return;
                    }
                }
            });
        }
    }

2.6.2 其他服务与Master交互

以日志服务为例，通前端触发请求日志的接口，通过参数与数据库交互获取到Master的NettyServer信息，然后构建netty客户端与master进行通信获取日志并返回。
具体代码如下：

 public Result<String> queryLog(@ApiIgnore @RequestAttribute(value = Constants.SESSION_USER) User loginUser,
                                   @RequestParam(value = "taskInstanceId") int taskInstanceId,
                                   @RequestParam(value = "skipLineNum") int skipNum,
                                   @RequestParam(value = "limit") int limit) {
        return loggerService.queryLog(taskInstanceId, skipNum, limit);
    }

 /**
     * view log
     *
     * @param taskInstId task instance id
     * @param skipLineNum skip line number
     * @param limit limit
     * @return log string data
     */
    @Override
    @SuppressWarnings("unchecked")
    public Result<String> queryLog(int taskInstId, int skipLineNum, int limit) {

        TaskInstance taskInstance = processService.findTaskInstanceById(taskInstId);

        if (taskInstance == null) {
            return Result.error(Status.TASK_INSTANCE_NOT_FOUND);
        }
        if (StringUtils.isBlank(taskInstance.getHost())) {
            return Result.error(Status.TASK_INSTANCE_HOST_IS_NULL);
        }
        Result<String> result = new Result<>(Status.SUCCESS.getCode(), Status.SUCCESS.getMsg());
        String log = queryLog(taskInstance,skipLineNum,limit);
        result.setData(log);
        return result;
    }

/**
     * query log
     *
     * @param taskInstance  task instance
     * @param skipLineNum skip line number
     * @param limit       limit
     * @return log string data
     */
    private String queryLog(TaskInstance taskInstance, int skipLineNum, int limit) {
        Host host = Host.of(taskInstance.getHost());

        logger.info("log host : {} , logPath : {} , port : {}", host.getIp(), taskInstance.getLogPath(),
                host.getPort());

        StringBuilder log = new StringBuilder();
        if (skipLineNum == 0) {
            String head = String.format(LOG_HEAD_FORMAT,
                    taskInstance.getLogPath(),
                    host,
                    Constants.SYSTEM_LINE_SEPARATOR);
            log.append(head);
        }

        log.append(logClient
                .rollViewLog(host.getIp(), host.getPort(), taskInstance.getLogPath(), skipLineNum, limit));

        return log.toString();
    }

 /**
     * roll view log
     *
     * @param host host
     * @param port port
     * @param path path
     * @param skipLineNum skip line number
     * @param limit limit
     * @return log content
     */
    public String rollViewLog(String host, int port, String path, int skipLineNum, int limit) {
        logger.info("roll view log, host : {}, port : {}, path {}, skipLineNum {} ,limit {}", host, port, path, skipLineNum, limit);
        RollViewLogRequestCommand request = new RollViewLogRequestCommand(path, skipLineNum, limit);
        String result = "";
        final Host address = new Host(host, port);
        try {
            Command command = request.convert2Command();
            Command response = this.client.sendSync(address, command, LOG_REQUEST_TIMEOUT);
            if (response != null) {
                RollViewLogResponseCommand rollReviewLog = JSONUtils.parseObject(
                        response.getBody(), RollViewLogResponseCommand.class);
                return rollReviewLog.getMsg();
            }
        } catch (Exception e) {
            logger.error("roll view log error", e);
        } finally {
            this.client.closeChannel(address);
        }
        return result;
    }

 /**
     * sync send
     *
     * @param host host
     * @param command command
     * @param timeoutMillis timeoutMillis
     * @return command
     */
    public Command sendSync(final Host host, final Command command, final long timeoutMillis) throws InterruptedException, RemotingException {
        final Channel channel = getChannel(host);
        if (channel == null) {
            throw new RemotingException(String.format("connect to : %s fail", host));
        }
        final long opaque = command.getOpaque();
        final ResponseFuture responseFuture = new ResponseFuture(opaque, timeoutMillis, null, null);
        channel.writeAndFlush(command).addListener(future -> {
            if (future.isSuccess()) {
                responseFuture.setSendOk(true);
                return;
            } else {
                responseFuture.setSendOk(false);
            }
            responseFuture.setCause(future.cause());
            responseFuture.putResponse(null);
            logger.error("send command {} to host {} failed", command, host);
        });
        /*
         * sync wait for result
         */
        Command result = responseFuture.waitResponse();
        if (result == null) {
            if (responseFuture.isSendOK()) {
                throw new RemotingTimeoutException(host.toString(), timeoutMillis, responseFuture.getCause());
            } else {
                throw new RemotingException(host.toString(), responseFuture.getCause());
            }
        }
        return result;
    }

nettyclient随着日志业务对象初始化而初始化：

 /**
     * construct client
     */
    public LogClientService() {
        this.clientConfig = new NettyClientConfig();
        this.clientConfig.setWorkerThreads(4);
        this.client = new NettyRemotingClient(clientConfig);
        this.isRunning = true;
    }

2.7 负载均衡算法

master在选择执行器的时候DolphinScheduler提供了三种负载均衡算法，且所有的算法都用到了节点权重：加权随机（random），平滑轮询（roundrobin），线性负载（lowerweight）。
通过配置文件来控制到底使用哪一个负载均衡策略，默认配置是权重策略：host-selector: lower_weight。

@Bean
    public HostManager hostManager() {
        HostSelector selector = masterConfig.getHostSelector();
        HostManager hostManager;
        switch (selector) {
            case RANDOM:
                hostManager = new RandomHostManager();
                break;
            case ROUND_ROBIN:
                hostManager = new RoundRobinHostManager();
                break;
            case LOWER_WEIGHT:
                hostManager = new LowerWeightHostManager();
                break;
            default:
                throw new IllegalArgumentException("unSupport selector " + selector);
        }
        beanFactory.autowireBean(hostManager);
        return hostManager;
    }

2.7.1 加权随机

看代码更好理解：按照全部权重值求和，然后取汇总结果的随机整数，随机整数对原先所有host的权重累差，返回小于零的时候的host，没有就随机返回一个。

  @Override
    public HostWorker doSelect(final Collection<HostWorker> source) {

        List<HostWorker> hosts = new ArrayList<>(source);
        int size = hosts.size();
        int[] weights = new int[size];
        int totalWeight = 0;
        int index = 0;

        for (HostWorker host : hosts) {
            totalWeight += host.getHostWeight();
            weights[index] = host.getHostWeight();
            index++;
        }

        if (totalWeight > 0) {
            int offset = ThreadLocalRandom.current().nextInt(totalWeight);

            for (int i = 0; i < size; i++) {
                offset -= weights[i];
                if (offset < 0) {
                    return hosts.get(i);
                }
            }
        }
        return hosts.get(ThreadLocalRandom.current().nextInt(size));
    }

2.7.2 线性负载

权重计算逻辑：利用注册的cpu占用，内存占用，以及加载因子还有启动时间消耗做计算

private double calculateWeight(double cpu, double memory, double loadAverage, long startTime) {
        double calculatedWeight = cpu * CPU_FACTOR + memory * MEMORY_FACTOR + loadAverage * LOAD_AVERAGE_FACTOR;
        long uptime = System.currentTimeMillis() - startTime;
        if (uptime > 0 && uptime < Constants.WARM_UP_TIME) {
            // If the warm-up is not over, add the weight
            return calculatedWeight * Constants.WARM_UP_TIME / uptime;
        }
        return calculatedWeight;
    }

获取权重最小的节点，并把节点权重置为最大。

/**
     * select
     *
     * @param sources sources
     * @return HostWeight
     */
    @Override
    public HostWeight doSelect(Collection<HostWeight> sources) {
        double totalWeight = 0;
        double lowWeight = 0;
        HostWeight lowerNode = null;
        for (HostWeight hostWeight : sources) {
            totalWeight += hostWeight.getWeight();
            hostWeight.setCurrentWeight(hostWeight.getCurrentWeight() + hostWeight.getWeight());
            if (lowerNode == null || lowWeight > hostWeight.getCurrentWeight()) {
                lowerNode = hostWeight;
                lowWeight = hostWeight.getCurrentWeight();
            }
        }
        lowerNode.setCurrentWeight(lowerNode.getCurrentWeight() + totalWeight);
        return lowerNode;

    }

2.7.3 平滑轮询

此算法不是很好理解，不知理解是否正确，它有一个预热的过程，之前都是取第一个，等到累累计的权重超过最大就整数就开始按权重轮询。

 @Override
    public HostWorker doSelect(Collection<HostWorker> source) {

        List<HostWorker> hosts = new ArrayList<>(source);
        String key = hosts.get(0).getWorkerGroup();
        ConcurrentMap<String, WeightedRoundRobin> map = workGroupWeightMap.get(key);
        if (map == null) {
            workGroupWeightMap.putIfAbsent(key, new ConcurrentHashMap<>());
            map = workGroupWeightMap.get(key);
        }

        int totalWeight = 0;
        long maxCurrent = Long.MIN_VALUE;
        long now = System.currentTimeMillis();
        HostWorker selectedHost = null;
        WeightedRoundRobin selectWeightRoundRobin = null;

        for (HostWorker host : hosts) {
            String workGroupHost = host.getWorkerGroup() + host.getAddress();
            WeightedRoundRobin weightedRoundRobin = map.get(workGroupHost);
            int weight = host.getHostWeight();
            if (weight < 0) {
                weight = 0;
            }

            if (weightedRoundRobin == null) {
                weightedRoundRobin = new WeightedRoundRobin();
                // set weight
                weightedRoundRobin.setWeight(weight);
                map.putIfAbsent(workGroupHost, weightedRoundRobin);
                weightedRoundRobin = map.get(workGroupHost);
            }
            if (weight != weightedRoundRobin.getWeight()) {
                weightedRoundRobin.setWeight(weight);
            }

            long cur = weightedRoundRobin.increaseCurrent();
            weightedRoundRobin.setLastUpdate(now);
            if (cur > maxCurrent) {
                maxCurrent = cur;
                selectedHost = host;
                selectWeightRoundRobin = weightedRoundRobin;
            }

            totalWeight += weight;
        }

        if (!updateLock.get() && hosts.size() != map.size() && updateLock.compareAndSet(false, true)) {
            try {
                ConcurrentMap<String, WeightedRoundRobin> newMap = new ConcurrentHashMap<>(map);
                newMap.entrySet().removeIf(item -> now - item.getValue().getLastUpdate() > RECYCLE_PERIOD);
                workGroupWeightMap.put(key, newMap);
            } finally {
                updateLock.set(false);
            }
        }

        if (selectedHost != null) {
            selectWeightRoundRobin.sel(totalWeight);
            return selectedHost;
        }

        return hosts.get(0);
    }

2.8 日志服务

2.6.2已经介绍不在做过多的说明。

2.9 报警

暂未研究，目测基本就是根据规则筛选数据，然后调用指定类型的报警服务接口做报警操作，比如邮件，微信，短信通知等。

3 后记

3.1 make friends

因为没有正式生产使用，业务理解不一定透彻，理解可能有偏差，欢迎大家交流讨论。图片违规，请加wx:

4 参考文献

https://dolphinscheduler.apache.org/zh-cn/development/architecture-design.html;
https://juejin.cn/post/6844903729406148622;
https://www.w3cschool.cn/quartz_doc/quartz_doc-1xbu2clr.html;

你可能感兴趣的:(java,大数据,大数据,分布式)

JSON 与 AJAX Auscy json ajax 前端
一、JSON（JavaScriptObjectNotation）1.数据类型与语法细节支持的数据类型：基本类型：字符串（需用双引号）、数字、布尔值（true/false）、null。复杂类型：数组（[]）、对象（{}）。严格语法规范：键名必须用双引号包裹（如"name":"张三"）。数组元素用逗号分隔，最后一个元素后不能有多余逗号。数字不能以0开头（如012会被解析为12），不支持八进制/十六进制
JavaScript 树形菜单总结 Auscy microsoft
树形菜单是前端开发中常见的交互组件，用于展示具有层级关系的数据（如文件目录、分类列表、组织架构等）。以下从核心概念、实现方式、常见功能及优化方向等方面进行总结。一、核心概念层级结构：数据以父子嵌套形式存在，如{id:1,children:[{id:2}]}。节点：树形结构的基本单元，包含自身信息及子节点（若有）。展开/折叠：子节点的显示与隐藏切换，是树形菜单的核心交互。递归渲染：因数据层级不固定，
精通Canvas：15款时钟特效代码实现指南烟幕缭绕
本文还有配套的精品资源，点击获取简介：HTML5的Canvas是一个用于绘制矢量图形的API，通过JavaScript实现动态效果。本项目集合了15种不同的时钟特效代码，帮助开发者通过学习绘制圆形、线条、时间更新、旋转、颜色样式设置及动画效果等概念，深化对Canvas的理解和应用。项目中的CSS文件负责时钟的样式设定，而JS文件则包含实现各种特效的逻辑，通过不同的函数或类处理时间更新和动画绘制，提
深入剖析OpenJDK 18 GA源码：Java平台最新发展想法臃肿
本文还有配套的精品资源，点击获取简介：OpenJDK18GA作为Java开发的关键里程碑，提供了诸多新特性和改进。本文章深入探讨了OpenJDK18GA源码，揭示其内部机制，帮助开发者更好地理解和利用这个版本。文章还涵盖了PatternMatching、SealedClasses、Records、JEP395、JEP406和JEP407等特性，以及HotSpot虚拟机、编译器、垃圾收集器、内存模型
Java大厂面试实录：谢飞机的电商场景技术问答（Spring Cloud、MyBatis、Redis、Kafka、AI等）
Java大厂面试实录：谢飞机的电商场景技术问答（SpringCloud、MyBatis、Redis、Kafka、AI等）本文模拟知名互联网大厂Java后端岗位面试流程，以电商业务为主线，由严肃面试官与“水货”程序员谢飞机展开有趣的对话，涵盖SpringCloud、MyBatis、Redis、Kafka、SpringSecurity、AI等热门技术栈，并附详细解析，助力求职者备战大厂面试。故事设定谢
【超硬核】JVM源码解读：Java方法main在虚拟机上解释执行 HeapDump性能社区 java 开发语言后端 jvm
本文由HeapDump性能社区首席讲师鸠摩（马智）授权整理发布第1篇-关于Java虚拟机HotSpot，开篇说的简单点开讲Java运行时，这一篇讲一些简单的内容。我们写的主类中的main()方法是如何被Java虚拟机调用到的？在Java类中的一些方法会被由C/C++编写的HotSpot虚拟机的C/C++函数调用，不过由于Java方法与C/C++函数的调用约定不同，所以并不能直接调用，需要JavaC
算法学习笔记：17.蒙特卡洛算法 ——从原理到实战，涵盖 LeetCode 与考研 408 例题
在计算机科学和数学领域，蒙特卡洛算法（MonteCarloAlgorithm）以其独特的随机抽样思想，成为解决复杂问题的有力工具。从圆周率的计算到金融风险评估，从物理模拟到人工智能，蒙特卡洛算法都发挥着不可替代的作用。本文将深入剖析蒙特卡洛算法的思想、解题思路，结合实际应用场景与Java代码实现，并融入考研408的相关考点，穿插图片辅助理解，帮助你全面掌握这一重要算法。蒙特卡洛算法的基本概念蒙特卡
分布式学习笔记_04_复制模型 NzuCRAS 分布式学习笔记架构后端
常见复制模型使用复制的目的在分布式系统中，数据通常需要被分布在多台机器上，主要为了达到：拓展性：数据量因读写负载巨大，一台机器无法承载，数据分散在多台机器上仍然可以有效地进行负载均衡，达到灵活的横向拓展高容错&高可用：在分布式系统中单机故障是常态，在单机故障的情况下希望整体系统仍然能够正常工作，这时候就需要数据在多台机器上做冗余，在遇到单机故障时能够让其他机器接管统一的用户体验：如果系统客户端分布
Java大厂面试故事：谢飞机的互联网音视频场景技术面试全纪录（Spring Boot、MyBatis、Kafka、Redis、AI等）来旺 Java场景面试宝典 Java Spring Boot MyBatis Kafka Redis 微服务 AI
Java大厂面试故事：谢飞机的互联网音视频场景技术面试全纪录（SpringBoot、MyBatis、Kafka、Redis、AI等）互联网大厂技术面试不仅考察技术深度，更注重业务场景与系统设计能力。本篇以严肃面试官与“水货”程序员谢飞机的对话，带你体验音视频业务场景下的Java面试全过程，涵盖主流技术栈，并附详细答案解析，助你面试无忧。故事场景设定谢飞机是一名有趣但技术基础略显薄弱的程序员，这次应
【前端】jQuery数组合并去重方法总结
在jQuery中合并多个数组并去重，推荐使用原生JavaScript的Set对象（高效简单）或$.unique()（仅适用于DOM元素，不适用于普通数组）。以下是完整解决方案：方法1：使用ES6Set（推荐）//定义多个数组constarr1=[1,2,3];constarr2=[2,3,4];constarr3=[3,4,5];//合并数组并用Set去重constmergedArray=[...
php 高并发下日志量巨大，如何高效采集、存储、分析贵哥的编程之路(热爱分享为后来者) PHP语言经典程序100题 php 开发语言
1.问题背景高并发系统每秒产生大量日志（如访问日志、错误日志、业务日志等）。单机写入、存储、分析能力有限，容易成为瓶颈。需要支持实时采集、分布式存储、快速检索与分析。2.主流架构方案一、分布式日志采集架构[应用服务器(PHP等)]|v[日志采集Agent（如Filebeat、Fluentd、Logstash）]|v[消息队列/缓冲（如Kafka、Redis、RabbitMQ）]|v[日志存储（如E
MySQL Explain 详解：从入门到精通，让你的 SQL 飞起来
引言：为什么Explain是SQL优化的“照妖镜”？在Java开发中，我们常常会遇到数据库性能瓶颈的问题。一条看似简单的SQL语句，在数据量增长到一定规模后，可能会从毫秒级响应变成秒级甚至分钟级响应，直接拖慢整个应用的性能。此时，你是否曾困惑于：为什么这条SQL突然变慢了？索引明明建了，为什么没生效？到底是哪里出了问题？答案就藏在MySQL的EXPLAIN命令里。EXPLAIN就像一面“照妖镜”，
Java特性之设计模式【责任链模式】 Naijia_OvO Java特性 java 设计模式责任链模式
一、责任链模式概述顾名思义，责任链模式（ChainofResponsibilityPattern）为请求创建了一个接收者对象的链。这种模式给予请求的类型，对请求的发送者和接收者进行解耦。这种类型的设计模式属于行为型模式在这种模式中，通常每个接收者都包含对另一个接收者的引用。如果一个对象不能处理该请求，那么它会把相同的请求传给下一个接收者，依此类推主要解决：职责链上的处理者负责处理请求，客户只需要将
日历插件-FullCalendar的详细使用老马聊技术 JavaScript 前端 javascript
一、介绍FullCalendar是一个功能强大、高度可定制的JavaScript日历组件，用于在网页中显示和管理日历事件。它支持多种视图（月、周、日等），可以轻松集成各种框架，并提供丰富的事件处理功能。二、实操案例具体代码如下：FullCalendar日期选择body{font-family:Arial,sans-serif;margin:20px;}#calendar{max-width:900
react-native android 环境搭建
环境：macjava版本：Java11最重要：一定要一定要一定要react涉及到很多的依赖下载，gradle和react相关的，第一次安装环境时有外网环境会快速很多。安装nodejs安装react-nativenpminstallreact-native-clinpminstallreact-native创建一个新项目react-nativeinitfirstReact替换gradle下载源rep
Java 调用 HTTP 接口的 7 种方式：全网最全指南
Java调用HTTP接口的7种方式：全网最全指南在开发过程中，调用HTTP接口是最常见的需求之一。本文将详细介绍Java中7种主流的调用HTTP接口的方式，包括每种工具的优缺点和完整代码实现。1.使用RestTemplateRestTemplate是Spring提供的同步HTTP客户端，适用于传统项目。尽管从Spring5开始被标记为过时，它仍然是许多开发者的首选。示例代码importorg.sp
数字孪生技术为UI前端注入新活力：实现产品设计的沉浸式体验 ui设计前端开发老司机 ui
hello宝子们...我们是艾斯视觉擅长ui设计、前端开发、数字孪生、大数据、三维建模、三维动画10年+经验!希望我的分享能帮助到您!如需帮助可以评论关注私信我们一起探讨!致敬感谢感恩!一、引言：从“平面交互”到“沉浸体验”的UI革命当用户在电商APP中翻看3D家具模型却无法感知其与自家客厅的匹配度，当设计师在2D屏幕上绘制汽车内饰却难以预判实际乘坐体验——传统UI设计的“平面化、静态化、割裂感”
Java三年经验程序员技术栈全景指南：从前端到架构，对标阿里美团全栈要求可曾去过倒悬山 java 前端架构
Java三年经验程序员技术栈全景指南：从前端到架构，对标阿里美团全栈要求三年经验是Java程序员的分水岭，技术栈深度决定你成为“业务码农”还是“架构师候选人”。本文整合阿里、美团、滴滴等大厂招聘要求，为你绘制可落地的进阶路线。一、Java核心：从语法糖到JVM底层三年经验与初级的核心差异在于系统级理解，大厂面试常考以下能力：JVM与性能调优内存模型（堆外内存、元空间）、GC算法（G1/ZGC适用场
RocketMQ 之死信队列 firepation RocketMQ rocketmq
在分布式消息系统中，消息的可靠传递和处理至关重要。然而，由于各种原因（如消息处理失败、消费超时等），一些消息可能无法被正常消费。这些无法被消费的消息如果不加以处理，会影响系统的稳定性和数据一致性。为了解决这一问题，RocketMQ提供了死信队列（DeadLetterQueue，DLQ）机制。本文将深入探讨RocketMQ的死信队列，包括其实现原理、应用场景以及使用示例。什么是死信队列？死信队列是一
javascript高级程序设计第3版——第12章 DOM2与DOM3 weixin_30687587 javascript 数据结构与算法 ViewUI
12章——DOM2与DOM3为了增强D0M1，DOM级规范定义了一些模块。DOM2核心：为不同的DOM类型引入了一些与XML命名空间有关的方法，还定义了以编程方式创建Document实例的方法；DOM2级样式：针对操作元素的样式而开发；其特性总结：1.每个元素都有一个关联的style对象，可用来确定和修改行内样式；2.要确定某个元素的计算样式，可使用getComgetComputedStyle（）
Java设计模式实战：高频场景解析与避坑指南 mckim_ 笔记学习 java 设计模式
引言设计模式是软件开发的基石，但许多开发者面对23种模式时容易陷入“学完就忘”或“滥用模式”的困境。本文从工业级项目视角出发，精选10种高频设计模式，结合真实代码案例与主流框架应用，帮你建立模式思维，拒绝纸上谈兵。一、创建型模式：告别new的暴力美学1.工厂方法模式（FactoryMethod）核心痛点：对象创建逻辑散落各处，难以统一管理。场景案例：电商平台需要支持多种支付方式（支付宝、微信、银联
JavaScript 基础09：Web APIs——日期对象、DOM节点梦想当全栈 JavaScript javascript 前端开发语言
JavaScript基础09：WebAPIs——日期对象、DOM节点进一步学习DOM相关知识，实现可交互的网页特效能够插入、删除和替换元素节点。能够依据元素节点关系查找节点。一、日期对象掌握Date日期对象的使用，动态获取当前计算机的时间。ECMAScript中内置了获取系统时间的对象Date，使用Date时与之前学习的内置对象console和Math不同，它需要借助new关键字才能使用。1.实例
《Java前端开发全栈指南：从Servlet到现代框架实战》
前言在当今Web开发领域，Java依然是后端开发的主力语言，而随着前后端分离架构的普及，Java开发者也需要掌握前端技术栈。本文将全面介绍JavaWeb前端开发的核心技术，包括传统Servlet/JSP体系、现代前端框架集成方案，以及全栈开发的最佳实践。通过本文，您将了解如何构建现代化的JavaWeb应用前端界面。一、JavaWeb前端技术演进1.1传统技术栈Servlet：JavaWeb基础，处
javaSE面试题---语法基础、面向对象、常用类、集合、多线程、文件和IO yang_xiao_wu_ java 面试开发语言 javase java基础多线程文件和IO
目录语法基础1.jdkjrejvm区别2.基本数据类型3.引用数据类型4.自动类型转换、强制类型转换5.常见的运算符6.&和&&区别7.++--在前和在后的区别8.+=有什么作用9.switch..case中switch支持哪些数据类型10.break和continue区别11.while和dowhile区别12.如何生成一个取值范围在[min,max]之间的随机数13.数组的长度如何获取？数组下
JAVA 高频八股文 Day03 Conqueror675 java 开发语言
12.TCP和Http的区别是什么TCP是传输层协议，负责建立可靠的点对点连接，确保数据有序、完整地传输（如铁路轨道）；HTTP是应用层协议，基于TCP构建，定义了Web服务交互的报文格式和规则（如货运订单）。TCP关注数据如何可靠送达，通过三次握手建立连接、流量控制等机制保证传输；HTTP关注传输内容的意义，提供请求/响应语义（GET/POST等）和无状态通信。补充：说一下什么是三次握手四次挥手
JVM字节码加载与存储中的细节
问题引出：为什么Java定义int型变量为32767时使用的是bipush32767，而定义int型变量为32768时使用的是ldc#4？在Java中，如果这样定义int型变量：publicclassTest{publicstaticvoidmain(String[]args){inti=0;intj=5;intk=6;intm=32768;intn=32767;}}变量对应的字节码文件内容是这样
JVM与Spring Boot核心解析 AIHacksCash Java场景面试宝典 Java JVM Spring Boot
我是廖志伟，一名Java开发工程师、《Java项目实战——深入理解大型互联网企业通用技术》（基础篇）、（进阶篇）、（架构篇）清华大学出版社签约作家、Java领域优质创作者、CSDN博客专家、阿里云专家博主、51CTO专家博主、产品软文专业写手、技术文章评审老师、技术类问卷调查设计师、幕后大佬社区创始人、开源项目贡献者。拥有多年一线研发和团队管理经验，研究过主流框架的底层源码(Spring、Spri
HashMap的Get(),Put()源码解析 Ttang23 哈希算法散列表算法
1、什么是HashMap？HashMap是Java中用于存储键值对（Key-Value）的集合类，它实现了Map接口。其核心特点是：无序性：不保证元素的存储顺序，也不保证顺序恒定不变。唯一性：键（Key）不能重复，若插入重复键会覆盖原有值。允许null：允许一个null键和任意数量的null值。非线程安全：相比HashTable，HashMap不支持同步，性能更高。2.核心数据结构：哈希表（Has
Java中的Tomcat，开启Web应用腾飞【基础版】
目录一、Tomcat初登场：揭开神秘面纱（一）啥是Tomcat（二）为啥要有Tomcat二、Tomcat的安装与启动：开启第一步（一）下载Tomcat（二）启动Tomcat三、Tomcat的目录结构：探秘内部布局（一）核心目录介绍（二）目录间的协同工作四、部署JavaWeb应用到Tomcat：让应用上线（一）打包Web应用为WAR文件（二）部署WAR文件到Tomcat五、Tomcat的配置优化：让
Java Web 之 Session 详解艾伦~耶格尔 java 开发语言后端前端 session
在JavaWeb开发中，Session就像网站的专属记忆管家，为每个用户保管着重要的信息和状态，确保用户在网站的旅程顺畅无阻。场景一：想象你去一家大型超市购物，推着购物车挑选商品。这个购物车就如同Session，它记录了你的购物信息，方便你在结账时一次性结算。场景二：你在玩一个在线游戏，登录账号后，你的游戏进度、等级、装备等信息都会被保存在Session中，即使你中途关闭游戏，下次登录时依然可以继
遍历dom 并且存储（将每一层的DOM元素存在数组中）换个号韩国红果果 JavaScript html
数组从0开始！！ var a=[],i=0; for(var j=0;j<30;j++){ a[j]=[];//数组里套数组，且第i层存储在第a[i]中 } function walkDOM(n){ do{ if(n.nodeType!==3)//筛选去除#text类型 a[i].push(n); //con
Android+Jquery Mobile学习系列(9)-总结和代码分享白糖_ JQuery Mobile
目录导航经过一个多月的边学习边练手，学会了Android基于Web开发的毛皮，其实开发过程中用Android原生API不是很多，更多的是HTML/Javascript/Css。个人觉得基于WebView的Jquery Mobile开发有以下优点： 1、对于刚从Java Web转型过来的同学非常适合，只要懂得HTML开发就可以上手做事。 2、jquerym
impala参考资料 dayutianfei impala
记录一些有用的Impala资料 1. 入门资料 >>官网翻译： http://my.oschina.net/weiqingbin/blog?catalog=423691 2. 实用进阶 >>代码&架构分析： Impala/Hive现状分析与前景展望：http
JAVA 静态变量与非静态变量初始化顺序之新解周凡杨 java 静态非静态顺序
今天和同事争论一问题，关于静态变量与非静态变量的初始化顺序，谁先谁后，最终想整理出来！测试代码： import java.util.Map; public class T { public static T t = new T(); private Map map = new HashMap(); public T(){ System.out.println(&quo
跳出iframe返回外层页面 g21121 iframe
在web开发过程中难免要用到iframe，但当连接超时或跳转到公共页面时就会出现超时页面显示在iframe中，这时我们就需要跳出这个iframe到达一个公共页面去。首先跳转到一个中间页，这个页面用于判断是否在iframe中，在页面加载的过程中调用如下代码： <script type="text/javascript"> //<!-- function
JAVA多线程监听JMS、MQ队列 510888780 java多线程
背景：消息队列中有非常多的消息需要处理，并且监听器onMessage（）方法中的业务逻辑也相对比较复杂，为了加快队列消息的读取、处理速度。可以通过加快读取速度和加快处理速度来考虑。因此从这两个方面都使用多线程来处理。对于消息处理的业务处理逻辑用线程池来做。对于加快消息监听读取速度可以使用1.使用多个监听器监听一个队列；2.使用一个监听器开启多线程监听。对于上面提到的方法2使用一个监听器开启多线
第一个SpringMvc例子布衣凌宇 spring mvc
第一步：导入需要的包；第二步：配置web.xml文件 <?xml version="1.0" encoding="UTF-8"?> <web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi=
我的spring学习笔记15-容器扩展点之PropertyOverrideConfigurer aijuans Spring3
PropertyOverrideConfigurer类似于PropertyPlaceholderConfigurer，但是与后者相比，前者对于bean属性可以有缺省值或者根本没有值。也就是说如果properties文件中没有某个bean属性的内容，那么将使用上下文（配置的xml文件）中相应定义的值。如果properties文件中有bean属性的内容，那么就用properties文件中的值来代替上下
通过XSD验证XML antlove xml schema xsd validation SchemaFactory
1. XmlValidation.java package xml.validation; import java.io.InputStream; import javax.xml.XMLConstants; import javax.xml.transform.stream.StreamSource; import javax.xml.validation.Schem
文本流与字符集百合不是茶 PrintWrite()的使用字符集名字别名获取
文本数据的输入输出; 输入;数据流,缓冲流输出;介绍向文本打印格式化的输出PrintWrite(); package 文本流; import java.io.FileNotFound
ibatis模糊查询sqlmap-mapping-**.xml配置 bijian1013 ibatis
正常我们写ibatis的sqlmap-mapping-*.xml文件时，传入的参数都用##标识，如下所示： <resultMap id="personInfo" class="com.bijian.study.dto.PersonDTO"> <res
java jvm常用命令工具——jdb命令(The Java Debugger) bijian1013 java jvm jdb
用来对core文件和正在运行的Java进程进行实时地调试，里面包含了丰富的命令帮助您进行调试，它的功能和Sun studio里面所带的dbx非常相似，但 jdb是专门用来针对Java应用程序的。现在应该说日常的开发中很少用到JDB了，因为现在的IDE已经帮我们封装好了，如使用ECLI
【Spring框架二】Spring常用注解之Component、Repository、Service和Controller注解 bit1129 controller
在Spring常用注解第一步部分【Spring框架一】Spring常用注解之Autowired和Resource注解（http://bit1129.iteye.com/blog/2114084）中介绍了Autowired和Resource两个注解的功能，它们用于将依赖根据名称或者类型进行自动的注入，这简化了在XML中，依赖注入部分的XML的编写，但是UserDao和UserService两个bea
cxf wsdl2java生成代码super出错,构造函数不匹配 bitray super
由于过去对于soap协议的cxf接触的不是很多,所以遇到了也是迷糊了一会.后来经过查找资料才得以解决. 初始原因一般是由于jaxws2.2规范和jdk6及以上不兼容导致的.所以要强制降为jaxws2.1进行编译生成.我们需要少量的修改: 我们原来的代码 wsdl2java com.test.xxx -client http://..... 修改后的代
动态页面正文部分中文乱码排障一例 ronin47
公司网站一部分动态页面，早先使用apache+resin的架构运行，考虑到高并发访问下的响应性能问题，在前不久逐步开始用nginx替换掉了apache。不过随后发现了一个问题，随意进入某一有分页的网页，第一页是正常的（因为静态化过了）；点“下一页”，出来的页面两边正常，中间部分的标题、关键字等也正常，唯独每个标题下的正文无法正常显示。因为有做过系统调整，所以第一反应就是新上
java-54- 调整数组顺序使奇数位于偶数前面 bylijinnan java
import java.util.Arrays; import java.util.Random; import ljn.help.Helper; public class OddBeforeEven { /** * Q 54 调整数组顺序使奇数位于偶数前面 * 输入一个整数数组，调整数组中数字的顺序，使得所有奇数位于数组的前半部分，所有偶数位于数组的后半
从100PV到1亿级PV网站架构演变 cfyme 网站架构
一个网站就像一个人，存在一个从小到大的过程。养一个网站和养一个人一样，不同时期需要不同的方法，不同的方法下有共同的原则。本文结合我自已14年网站人的经历记录一些架构演变中的体会。 1：积累是必不可少的架构师不是一天练成的。 1999年，我作了一个个人主页，在学校内的虚拟空间，参加了一次主页大赛，几个DREAMWEAVER的页面，几个TABLE作布局，一个DB连接，几行PHP的代码嵌入在HTM
[宇宙时代]宇宙时代的GIS是什么？ comsci Gis
我们都知道一个事实，在行星内部的时候，因为地理信息的坐标都是相对固定的，所以我们获取一组GIS数据之后，就可以存储到硬盘中，长久使用。。。但是，请注意，这种经验在宇宙时代是不能够被继续使用的宇宙是一个高维时空
详解create database命令 czmmiao database
完整命令 CREATE DATABASE mynewdb USER SYS IDENTIFIED BY sys_password USER SYSTEM IDENTIFIED BY system_password LOGFILE GROUP 1 ('/u01/logs/my/redo01a.log','/u02/logs/m
几句不中听却不得不认可的话 datageek
1、人丑就该多读书。 2、你不快乐是因为：你可以像猪一样懒，却无法像只猪一样懒得心安理得。 3、如果你太在意别人的看法，那么你的生活将变成一件裤衩，别人放什么屁，你都得接着。 4、你的问题主要在于：读书不多而买书太多，读书太少又特爱思考，还他妈话痨。 5、与禽兽搏斗的三种结局：(1)、赢了，比禽兽还禽兽。(2)、输了，禽兽不如。(3)、平了，跟禽兽没两样。结论：选择正确的对手很重要。 6
1 14:00 PHP中的“syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM”错误 dcj3sjt126com PHP
原文地址：http://www.kafka0102.com/2010/08/281.html 因为需要，今天晚些在本机使用PHP做些测试，PHP脚本依赖了一堆我也不清楚做什么用的库。结果一跑起来，就报出类似下面的错误：“Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM in /home/kafka/test/
xcode6 Auto layout and size classes dcj3sjt126com ios
官方GUI https://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/AutolayoutPG/Introduction/Introduction.html iOS中使用自动布局（一） http://www.cocoachina.com/ind
通过PreparedStatement批量执行sql语句【sql语句相同，值不同】梦见x光 sql 事务批量执行
比如说：我有一个List需要添加到数据库中，那么我该如何通过PreparedStatement来操作呢？ public void addCustomerByCommit(Connection conn , List<Customer> customerList) { String sql = "inseret into customer(id
程序员必知必会----linux常用命令之十【系统相关】 hanqunfeng Linux常用命令
一.linux快捷键 Ctrl+C : 终止当前命令 Ctrl+S : 暂停屏幕输出 Ctrl+Q : 恢复屏幕输出 Ctrl+U : 删除当前行光标前的所有字符 Ctrl+Z : 挂起当前正在执行的进程 Ctrl+L : 清除终端屏幕，相当于clear 二.终端命令 clear : 清除终端屏幕 reset : 重置视窗，当屏幕编码混乱时使用 time com
NGINX IXHONG nginx
pcre 编译安装 nginx conf/vhost/test.conf upstream admin { server 127.0.0.1:8080; } server { listen 80; &
设计模式--工厂模式 kerryg 设计模式
工厂方式模式分为三种： 1、普通工厂模式：建立一个工厂类，对实现了同一个接口的一些类进行实例的创建。 2、多个工厂方法的模式：就是对普通工厂方法模式的改进，在普通工厂方法模式中，如果传递的字符串出错，则不能正确创建对象，而多个工厂方法模式就是提供多个工厂方法，分别创建对象。 3、静态工厂方法模式：就是将上面的多个工厂方法模式里的方法置为静态，
Spring InitializingBean/init-method和DisposableBean/destroy-method mx_xiehd java spring bean xml
1.initializingBean/init-method 实现org.springframework.beans.factory.InitializingBean接口允许一个bean在它的所有必须属性被BeanFactory设置后，来执行初始化的工作，InitialzingBean仅仅指定了一个方法。通常InitializingBean接口的使用是能够被避免的，（不鼓励使用，因为没有必要
解决Centos下vim粘贴内容格式混乱问题 qindongliang1922 centos vim
有时候，我们在向vim打开的一个xml，或者任意文件中，拷贝粘贴的代码时，格式莫名其毛的就混乱了，然后自己一个个再重新，把格式排列好，非常耗时，而且很不爽，那么有没有办法避免呢？答案是肯定的，设置下缩进格式就可以了，非常简单：在用户的根目录下直接vi ~/.vimrc文件然后将set pastetoggle=<F9> 写入这个文件中，保存退出，重新登录，
netty大并发请求问题 tianzhihehe netty
多线程并发使用同一个channel java.nio.BufferOverflowException: null at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:183) ~[na:1.7.0_60-ea] at java.nio.ByteBuffer.put(ByteBuffer.java:832) ~[na:1.7.0_60-ea]
Hadoop NameNode单点问题解决方案之一 AvatarNode wyz2009107220 NameNode
我们遇到的情况 Hadoop NameNode存在单点问题。这个问题会影响分布式平台24*7运行。先说说我们的情况吧。我们的团队负责管理一个1200节点的集群(总大小12PB)，目前是运行版本为Hadoop 0.20，transaction logs写入一个共享的NFS filer(注：NetApp NFS Filer)。经常遇到需要中断服务的问题是给hadoop打补丁。 DataNod