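AI is all the rage. Why do I say that? Last year I took part in my company's campus recruiting, and 95% of the 50 candidates I interviewed said their schools offered artificial intelligence courses (Python, MATLAB, and so on), yet very few had done any concrete applied work, let alone worked in an engineering setting. That alone shows how hot AI has become. The prospects for AI really are good and the market demand is real, but mastering it is nowhere near as easy as typing out a hello world from a textbook. So, to lower the barrier to entry, I have long wanted to build a tool that covers the whole process from data preparation and feature engineering through model training to evaluation, in other words what is usually called a pipeline made up of multiple tasks.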
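In a workflow system, tasks often have complex dependencies on one another. To make sure the pipeline executes correctly, those dependencies have to be resolved, and a DAG combined with topological sorting is a powerful tool for this class of problems. A DAG (Directed Acyclic Graph) is a graph in which every edge has a direction and no cycles exist. If the dependency problem is modeled as a DAG, each dependency becomes a directed edge in the graph. Topological sorting then repeatedly visits and removes the nodes that have no remaining dependencies, which yields an execution order that satisfies every dependency.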
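The "repeatedly remove nodes with no remaining dependencies" idea described above is commonly known as Kahn's algorithm. Below is a minimal sketch of it, not the implementation of any particular scheduler. The function name `topological_order`, the `pipeline` dictionary, and the task names are hypothetical, chosen only to mirror the pipeline stages mentioned earlier.

```python
from collections import deque


def topological_order(dependencies):
    """Return tasks in an order that respects their dependencies.

    `dependencies` maps each task name to the list of tasks it depends on.
    This is a sketch of Kahn's algorithm; all names here are illustrative.
    """
    # Collect every task, including ones that only appear as a dependency.
    tasks = set(dependencies)
    for deps in dependencies.values():
        tasks.update(deps)

    # remaining[t]: number of dependencies of t that have not run yet.
    # dependents[t]: tasks that are waiting on t.
    remaining = {t: 0 for t in tasks}
    dependents = {t: [] for t in tasks}
    for task, deps in dependencies.items():
        for dep in set(deps):
            remaining[task] += 1
            dependents[dep].append(task)

    # Start with the tasks that depend on nothing (sorted for a stable order).
    ready = deque(sorted(t for t in tasks if remaining[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # "Remove" the finished task: each dependent has one fewer blocker.
        for nxt in sorted(dependents[task]):
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    # Tasks left unscheduled mean the graph contains a cycle, so it is not a DAG.
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected; the graph is not a DAG")
    return order


# Hypothetical pipeline: each task lists the tasks it depends on.
pipeline = {
    "data_preparation": [],
    "feature_engineering": ["data_preparation"],
    "model_training": ["feature_engineering"],
    "evaluation": ["model_training", "data_preparation"],
}
print(topological_order(pipeline))
```

A real scheduler would typically run every task in the `ready` queue concurrently rather than one at a time, since none of them depend on each other. Python 3.9+ also ships `graphlib.TopologicalSorter` in the standard library, which covers the same ground.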
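To build a machine learning platform, the key is designing a good DAG workflow scheduling system. Below are some open-source workflow schedulers to choose from as needed: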
- Python-based platform for running directed acyclic graphs (DAGs) of tasks
- Open source container-native workflow engine for getting work done on Kubernetes
- Batch workflow job scheduler created at LinkedIn to run Hadoop jobs.
- Brigade is a tool for running scriptable, automated tasks in the cloud — as part of your Kubernetes cluster.
- An orchestration engine to execute asynchronous long-running business logic developed by Uber Engineering.
- Workflow engine to automate your DevOps use cases.
- Netflix's Conductor is an orchestration engine that runs in the cloud.
- Workflow engine written in Scala and designed for simplicity and scalability. Executes workflows written in WDL or CWL.
- Digdag is a simple tool that helps you to build, run, schedule, and monitor complex pipelines of tasks.
- A high-performance workflow engine for serverless functions on Kubernetes.
- A workflow engine written in Ruby.
- A powerful human-centric Workflow Engine based on the BPMN 2.0 standard.
- Data processing & ETL framework for Ruby
- Workflow service, in OpenStack foundation.
- Workflow Scheduler for Hadoop.
- A distributed Java workflow engine designed to be dead simple.
- Scalable workflow manager by Pinterest.
- Job Scheduler and Runbook Automation.
- Titanoboa is a platform for creating complex workflows on JVM.
- A high-performance, extensible, modular and cross-platform workflow engine.
- Workflow Core is a light weight workflow engine targeting .NET Standard.
- A high performance Java workflow engine.
- A workflow engine for microservices orchestration that's capable of executing BPMN models, developed by the team at Camunda