How to Deploy Airflow on Kubernetes

Contents

  • 1 What is Airflow?
  • 2 What is Kubernetes?
  • 3 Why deploy Airflow on K8s?
  • 4 How to deploy Airflow on Kubernetes
    • 4.1 Spin up a K8s cluster
      • 4.1.1. Install kind (for other systems, see [kind Installation](https://kind.sigs.k8s.io/docs/user/quick-start/))
      • 4.1.2. Kubectl
      • 4.1.3. Spin up a k8s cluster using kind with a name of airflow-kind
    • 4.2 Install Airflow
      • 4.2.1. Install Helm and pull Airflow from the official apache-airflow repo
      • 4.2.2 Check Airflow via the UI

1 What is Airflow?

  • It is a data orchestration and scheduling platform.
  • It is a tool for managing your data flows and data operations.
  • It lets you run your tasks in a distributed manner.
  • It manages your task dependencies within a DAG.

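My own understanding is that Airflow is a task-planning platform that uses DAGs (directed acyclic graphs) to define when and how tasks run. See references 1, 2, and 3 for the details; I won't repeat them here. However much one copies from the docs, it is better to get Airflow running first and discover its strengths and weaknesses over time.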

2 What is Kubernetes?

1 It is an open source platform for managing containerized workloads and services, and it operates at the container level rather than at the hardware level.
2 It provides you with a framework to run distributed systems resiliently.
3 It takes care of scaling and failover for your application, and provides deployment patterns.
4 Service discovery and load balancing.
5 Storage orchestration: Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and more.
6 Automated rollouts and rollbacks.
7 Self-healing: Kubernetes restarts containers that fail, replaces containers, kills containers that don't respond to your user-defined health check, and doesn't advertise them to clients until they are ready to serve.
8 Secret and configuration management: Kubernetes lets you store and manage sensitive information, such as passwords, OAuth tokens, and SSH keys.
9 It comprises a set of independent, composable control processes that continuously drive the current state towards the provided desired state.

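In short, Kubernetes is a distributed platform for managing containers. When a container fails for whatever reason, Kubernetes restarts it automatically, which avoids application downtime. It can also load-balance traffic and scale your application up and down; the other benefits are left for you to explore. As for drawbacks, every pod needs some startup time, because it has to pull its container image before it can run. See the official Kubernetes documentation for details.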

3 Why deploy Airflow on K8s?

  • K8s gives you the power to scale the application horizontally
  • to upscale it for peak hours
  • to downscale it at dawn to minimize needless costs
  • Airflow has the KubernetesPodOperator and the KubernetesExecutor for spinning up a pod (task) on a Kubernetes cluster.

Those are the main points; see Airflow on Kubernetes for a more detailed discussion.

4 How to deploy Airflow on Kubernetes

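Now for the main event: how to deploy Airflow on K8s. This post proceeds in two steps. First we need a K8s cluster; then we deploy Airflow onto the cluster's worker nodes. Both steps are described in detail below.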

4.1 Spin up a K8s cluster

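According to the installation section of the official documentation, a cluster can be set up with kind (multiple nodes), Minikube (a single node by default), kubeadm (Linux only, multiple nodes), or from binaries (the hard way; Linux only, maybe). See K8s-Install Tools for the installation details. This post uses kind to deploy a K8s cluster with one master (control-plane) node and three worker nodes. The steps below are for macOS; for other operating systems, the post links to the corresponding installation instructions.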

4.1.1. Install kind (for other systems, see kind Installation)

brew install kind
kind --version
kind version 0.11.1
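kind runs every cluster node as a container, so a container runtime such as Docker has to be available before the cluster can be created. The following is a minimal preflight sketch, not something from the kind documentation: `missing_tools` is a made-up helper name, and the tool list in the usage line is an assumption based on the commands used in this post.

```shell
# Print the name of every listed tool that is not found on PATH.
# `missing_tools` is a hypothetical helper, not part of kind or kubectl.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage (prints nothing when both tools are installed):
# missing_tools docker kind
```

kubectl and helm can be appended to the list once the later sections have installed them.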

4.1.2. Kubectl

kubectl is a command-line tool that lets us interact with a K8s cluster, for example to create or edit pods and other K8s components.

brew install kubectl
kubectl version --client ## confirm the installation; it returns the following
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
 

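Set the default namespace to airflow for convenience. The command edits the current kubectl context, so run it after the cluster in 4.1.3 has been created.

alias k=kubectl
k config set-context --current --namespace=airflow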

4.1.3. Spin up a k8s cluster using kind with a name of airflow-kind

kind create cluster  --name airflow-kind --config kind-cluster.yaml

The kind-cluster.yaml file is configured as follows:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "node=worker_1"
  extraMounts:
  - hostPath: ./data
    containerPath: /tmp/data
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "node=worker_2"
  extraMounts:
  - hostPath: ./data
    containerPath: /tmp/data
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "node=worker_3"
  extraMounts:
  - hostPath: ./data
    containerPath: /tmp/data

kind get clusters
airflow-kind  ## confirms that our cluster has been deployed; deployment takes a few minutes, so please be patient.
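Beyond listing the cluster name, it can help to check that the API server answers and that the `node=worker_N` labels from kind-cluster.yaml were applied. A small sketch under stated assumptions: `check_kind_cluster` is a hypothetical helper name, and it relies on kind naming the kubeconfig context `kind-<cluster name>`.

```shell
# Check that the kind cluster answers, then list the nodes with their
# custom "node" label shown as an extra column.
# `check_kind_cluster` is a hypothetical helper name.
check_kind_cluster() {
  ctx="kind-${1:-airflow-kind}"   # kind prefixes the context name with "kind-"
  kubectl cluster-info --context "$ctx" &&
    kubectl get nodes --context "$ctx" -L node
}

# Usage:
# check_kind_cluster airflow-kind
```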

4.2 Install Airflow

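As everyone knows, deploying an application on K8s means configuring a whole series of resources for each part of the application (Services, Pods, PVCs, PVs, Deployments, and so on), and the process is fairly involved. To keep that configuration manageable and flexible, we use Helm to deploy Airflow.
For an introduction to Helm, see Helm - IBM and the official Helm documentation.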

4.2.1. Install Helm and pull Airflow from the official apache-airflow repo

brew install helm
helm version ## check existence of helm
version.BuildInfo{Version:"v3.2.1", GitCommit:"fe51cd1e31e6a202cba7dead9552a6d418ded79a", GitTreeState:"clean", GoVersion:"go1.13.10"}
helm repo add apache-airflow https://airflow.apache.org  ## add the official Airflow chart repo
helm repo update ## update the helm repos
helm install airflow apache-airflow/airflow --namespace=airflow --create-namespace --debug ## the namespace is a virtual space for organizing the Airflow resources inside K8s; --create-namespace creates it if it does not exist yet

helm ls -n airflow  ## check the Airflow release; it returns the following

NAME   	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART        	APP VERSION
airflow	airflow  	4       	2022-07-28 17:53:31.619956 +0500 +05	deployed	airflow-1.6.0	2.3.0 

kubectl get nodes #check the nodes
NAME                         STATUS   ROLES                  AGE   VERSION
airflow-kind-control-plane   Ready    control-plane,master   22h   v1.21.1
airflow-kind-worker          Ready    <none>                 22h   v1.21.1
airflow-kind-worker2         Ready    <none>                 22h   v1.21.1
airflow-kind-worker3         Ready    <none>                 22h   v1.21.1


kubectl get pods -n airflow ## check the health of the pods inside the K8s cluster; it returns the following pod status
NAME                                 READY   STATUS    RESTARTS   AGE
airflow-postgresql-0                 1/1     Running   0          22h
airflow-scheduler-c4fc586d6-fw2cl    2/2     Running   0          20h
airflow-statsd-7586f9998-kmvrm       1/1     Running   0          22h
airflow-triggerer-69dbf4b6f-hb2n5    1/1     Running   0          20h
airflow-webserver-84fb656485-rm2sj   1/1     Running   0          20h
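Rather than polling the pod list by hand, one option is to let kubectl wait until the rollouts have finished. The sketch below assumes the release is named airflow and that the scheduler and webserver run as Deployments, which is what the pod names above suggest; `wait_for_airflow` is a made-up helper name and the 300s timeout is an arbitrary choice.

```shell
# Block until the scheduler and webserver Deployments have rolled out.
# Deployment names are assumed from the pod listing (release name "airflow").
# `wait_for_airflow` is a hypothetical helper name.
wait_for_airflow() {
  ns="${1:-airflow}"
  for deploy in airflow-scheduler airflow-webserver; do
    kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=300s || return 1
  done
}

# Usage:
# wait_for_airflow airflow
```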

4.2.2 Check Airflow via the UI

With the port-forward command below, you can log in to Airflow at localhost:8080. The login username and password are set in values.yaml (both are admin).

k port-forward svc/airflow-webserver 8080:8080
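The port-forward command keeps running in the foreground, so a quick check from a second terminal can confirm that the webserver answers before you open the browser. This sketch assumes curl is installed and that the Airflow webserver serves its /health endpoint on the forwarded port; `airflow_health` is a made-up helper name.

```shell
# Query the webserver's /health endpoint through the forwarded port.
# -f makes curl exit non-zero on an HTTP error status.
# `airflow_health` is a hypothetical helper name.
airflow_health() {
  curl -fsS "http://localhost:${1:-8080}/health"
}

# Usage (in a second terminal while port-forward is running):
# airflow_health 8080
```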

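To sum up, Airflow is now deployed in the K8s cluster. To change Airflow's configuration, for example the Airflow executor, or to add environment variables, modify the corresponding object in the Helm values.yaml; see follow-up post 1.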
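As a rough idea of what such a change can look like, the sketch below writes a small override file and applies it with helm upgrade. It is an illustration, not the method from the follow-up post: `write_overrides` and the file name override-values.yaml are made up, the executor and environment variable are example choices, and the `executor` and `env` keys are assumed to match the official chart's values.yaml (check them with `helm show values apache-airflow/airflow`).

```shell
# Write a small Helm override file with an example executor and env variable.
# `write_overrides` and the default file name are hypothetical.
write_overrides() {
  cat > "${1:-override-values.yaml}" <<'EOF'
executor: "KubernetesExecutor"
env:
  - name: "AIRFLOW__CORE__LOAD_EXAMPLES"
    value: "True"
EOF
}

# Usage:
# write_overrides override-values.yaml
# helm upgrade --install airflow apache-airflow/airflow -n airflow -f override-values.yaml
```

Keeping overrides in a separate file leaves the chart's default values.yaml untouched, so only the settings that differ need to be tracked.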

References:
1. A journey to Airflow on Kubernetes
2. Airflow, the easy way
3. Get started with the Official Airflow Helm Chart
