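As I understand it, Airflow is a task scheduling platform that uses DAGs (directed acyclic graphs) to plan when and how tasks run. For details, see references 1, 2 and 3; I won't repeat them here. However thoroughly you copy the documentation, it is better to get it running first and discover its strengths and weaknesses along the way.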
1. Kubernetes is an open source platform for managing containerized workloads and services; it operates at the container level rather than at the hardware level.
2. It provides you with a framework to run distributed systems resiliently.
3. It takes care of scaling and failover for your application and provides deployment patterns.
4. Service discovery and load balancing.
5. Storage orchestration: Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and more.
6. Automated rollouts and rollbacks.
7. Self-healing: Kubernetes restarts containers that fail, replaces containers, kills containers that don’t respond to your user-defined health check, and doesn’t advertise them to clients until they are ready to serve.
9. Secret and configuration management: Kubernetes lets you store and manage sensitive information, such as passwords, OAuth tokens, and SSH keys.
10. Kubernetes comprises a set of independent, composable control processes that continuously drive the current state towards the provided desired state.
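In short, Kubernetes is a distributed platform for managing containers. When a container fails for whatever reason, Kubernetes restarts it automatically, which avoids application downtime. It also provides load balancing and can scale your application up and down; the other benefits are left for you to explore. The main drawback is startup latency: each new pod needs some time to start, because it has to pull its container image first. For details, see the official Kubernetes documentation.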
Those are the main points; see also the detailed discussion in Airflow on Kubernetes.
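Now for the main event: how to deploy Airflow on K8s. This post proceeds in two steps. First we need a K8s cluster, then we deploy Airflow onto the cluster's worker nodes. Both are described in detail below.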
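According to the installation section of the official documentation, you can use kind (multiple nodes), Minikube (one node), kubeadm (Linux only, multiple nodes), or the binaries (the hard way, Linux only, maybe); see K8s - Install Tools for details. This post uses kind to deploy a K8s cluster with one master node and three worker nodes. The installation steps below are based on macOS; for other operating systems, this article gives the corresponding installation links.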
brew install kind
kind --version
kind version 0.11.1
kubectl is a command-line tool that lets us interact with a K8s cluster, for example to create or edit pods and other K8s components.
brew install kubectl
kubectl version --client ## confirm the installation; it returns the following
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
Set the default namespace as airflow for convenience.
alias k=kubectl
k config set-context --current --namespace=airflow
kind create cluster --name airflow-kind --config kind-cluster.yaml
The kind-cluster.yaml file is configured as follows:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "node=worker_1"
  extraMounts:
  - hostPath: ./data
    containerPath: /tmp/data
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "node=worker_2"
  extraMounts:
  - hostPath: ./data
    containerPath: /tmp/data
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "node=worker_3"
  extraMounts:
  - hostPath: ./data
    containerPath: /tmp/data
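The three worker entries differ only in their node label, so the file can also be generated rather than typed by hand. The following is a minimal sketch, not part of the original setup: the `WORKERS` and `OUT` variables are names introduced here for illustration, and the YAML it writes is meant to match the configuration above.

```shell
set -eu

# Assumptions for illustration: three workers, written to kind-cluster.yaml
# in the current directory (an existing file with that name is overwritten).
WORKERS="${WORKERS:-3}"
OUT="${OUT:-kind-cluster.yaml}"

{
  # Fixed header plus the single control-plane node
  printf '%s\n' 'kind: Cluster' 'apiVersion: kind.x-k8s.io/v1alpha4' 'nodes:' '- role: control-plane'
  # One worker entry per index; only the node label changes
  for i in $(seq 1 "$WORKERS"); do
    cat <<EOF
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "node=worker_${i}"
  extraMounts:
  - hostPath: ./data
    containerPath: /tmp/data
EOF
  done
} > "$OUT"
```

Running it with `WORKERS=5` would produce five labeled workers instead of three; the `kind create cluster` command stays the same.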
kind get clusters
airflow-kind ## confirms that our cluster has been deployed; deployment takes a few minutes, so please be patient.
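Setting `airflow` as the default namespace in the kubectl context does not create that namespace, and `helm install --namespace` does not create it either unless `--create-namespace` is passed. A minimal sketch of creating it once the cluster is up, assuming kubectl's current context points at the kind cluster (the helper name `ensure_namespace` is introduced here for illustration):

```shell
# ensure_namespace NAME: create the namespace if it is missing.
# Rendering the manifest client-side and piping it to "apply" keeps the
# call idempotent, so running it twice is not an error.
ensure_namespace() {
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
}
```

Calling `ensure_namespace airflow` followed by `kubectl get namespaces` should then list `airflow`.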
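As everyone knows, deploying an application on K8s means configuring a whole series of objects for each part of the application (Services, Pods, PVCs, PVs, Deployments and so on), and the process is fairly complex. To keep the configuration of each component flexible, we use Helm to deploy Airflow.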
For an introduction to Helm, see Helm - IBM and the official Helm documentation.
brew install helm
helm version ## check existence of helm
version.BuildInfo{Version:"v3.2.1", GitCommit:"fe51cd1e31e6a202cba7dead9552a6d418ded79a", GitTreeState:"clean", GoVersion:"go1.13.10"}
helm repo add apache-airflow https://airflow.apache.org ## add the official Airflow chart repo
helm repo update ## update the helm repo
helm install airflow apache-airflow/airflow --namespace=airflow --debug ## the namespace is a virtual space for organizing Airflow's resources inside K8s
helm ls -n airflow ## check the Airflow deployment; it returns the following
NAME     NAMESPACE  REVISION  UPDATED                               STATUS    CHART          APP VERSION
airflow  airflow    4         2022-07-28 17:53:31.619956 +0500 +05  deployed  airflow-1.6.0  2.3.0
kubectl get nodes ## check the nodes
NAME                         STATUS   ROLES                  AGE   VERSION
airflow-kind-control-plane   Ready    control-plane,master   22h   v1.21.1
airflow-kind-worker          Ready    <none>                 22h   v1.21.1
airflow-kind-worker2         Ready    <none>                 22h   v1.21.1
airflow-kind-worker3         Ready    <none>                 22h   v1.21.1
kubectl get pods -n airflow ## check the health of the pods inside the K8s cluster; it returns the following pod status
NAME                                 READY   STATUS    RESTARTS   AGE
airflow-postgresql-0                 1/1     Running   0          22h
airflow-scheduler-c4fc586d6-fw2cl    2/2     Running   0          20h
airflow-statsd-7586f9998-kmvrm       1/1     Running   0          22h
airflow-triggerer-69dbf4b6f-hb2n5    1/1     Running   0          20h
airflow-webserver-84fb656485-rm2sj   1/1     Running   0          20h
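Instead of re-running `kubectl get pods` until everything shows Running, the wait can be scripted. This is a minimal sketch, assuming the Deployment names `airflow-scheduler` and `airflow-webserver` implied by the pod names above; the helper name `wait_for_airflow` and the 300-second timeout are choices made here for illustration.

```shell
# wait_for_airflow [NAMESPACE]: block until the scheduler and webserver
# Deployments have finished rolling out, or fail after the timeout.
wait_for_airflow() {
  ns="${1:-airflow}"
  for d in airflow-scheduler airflow-webserver; do
    kubectl rollout status "deployment/$d" --namespace "$ns" --timeout=300s || return 1
  done
}
```

A call such as `wait_for_airflow airflow && echo "Airflow is ready"` only prints the message once both rollouts have completed.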
With the port-forward command below, you can log in to Airflow at localhost:8080. The Airflow login username and password are defined in values.yaml (both admin).
k port-forward svc/airflow-webserver 8080:8080
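Because the port-forward keeps running in the foreground, a quick check from a second terminal can confirm that the web server is reachable before opening the browser. A minimal sketch, assuming curl is installed and that the Airflow webserver exposes its `/health` endpoint on the forwarded port (the helper name `check_airflow_health` is introduced here for illustration):

```shell
# check_airflow_health [URL]: succeed only if the webserver answers /health.
# The default URL assumes the port-forward above is still running.
check_airflow_health() {
  url="${1:-http://localhost:8080/health}"
  # -f: treat HTTP errors as failure; -sS: quiet, but still print errors;
  # --max-time: give up after 5 seconds instead of hanging
  curl -fsS --max-time 5 "$url"
}
```

When the forward is active, `check_airflow_health` should print a small JSON document reporting the status of the metadata database and the scheduler; a non-zero exit code means the web server could not be reached.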
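To sum up, Airflow is now deployed in the K8s cluster. To change Airflow's configuration, for example the Airflow executor, or to add environment variables, you need to modify the corresponding object in the Helm values.yaml; see follow-up part 1.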
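As a rough illustration of what such a change can look like, the sketch below writes a small override file rather than editing the chart's full values.yaml. It assumes the chart's top-level `executor` and `env` keys; the file name `override-values.yaml` and the chosen values are examples made up here, not taken from the original setup.

```shell
# Write only the keys we want to change; everything else keeps the chart defaults.
# Assumed keys: "executor" and "env" from the Airflow chart's values.yaml.
cat > override-values.yaml <<'EOF'
executor: "KubernetesExecutor"
env:
  - name: "AIRFLOW__CORE__LOAD_EXAMPLES"
    value: "True"
EOF

# Show what was written
cat override-values.yaml
```

The file could then be applied with `helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f override-values.yaml`, after which `helm ls -n airflow` should report a higher revision number.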
References:
1. A journey to Airflow on Kubernetes
2. Airflow, the easy way
3. Get started with the Official Airflow Helm Chart