Main steps for deploying Airflow with Helm

Airflow version: 2.3.4

Airflow Helm chart version: 1.8.0

Kubernetes version: 1.20

Date: 2023/03/30

Note: this walkthrough has been fully verified end-to-end in a test environment; it has not yet been used in production, but a production rollout is planned.

End result

Only two containers run: the webserver and the scheduler. Logs are mounted to a path on the VM (from which they are automatically collected into the ELK logging platform), and the LocalExecutor is used.

Common commands:

helm install my-airflow --namespace <namespace> ./airflow1.8.0    # install
helm uninstall my-airflow --namespace <namespace>                 # uninstall (takes no chart path)
helm upgrade my-airflow --namespace <namespace> ./airflow1.8.0    # upgrade

Detailed steps

0. Create the Kubernetes secret

kubectl create secret generic my-webserver-secret --from-literal="webserver-secret-key=$(python3 -c 'import secrets; print(secrets.token_hex(16))')" -n <namespace>
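The embedded `python3` one-liner simply generates a random 32-character hex string for the webserver secret key; as a standalone sketch of what it produces:

```python
import secrets

# Same call as the one-liner in the kubectl command above:
# 16 random bytes rendered as 32 lowercase hex characters.
key = secrets.token_hex(16)
print(key)  # e.g. '3f9a...' (random every run)
```

All webserver replicas must share this key, which is why it is stored once in a secret rather than generated per pod.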

1. Edit values.yaml

#1. Webserver secret key
webserverSecretKeySecretName: my-webserver-secret

#2. Disable the bundled postgresql and pgbouncer
postgresql:
  enabled: false
  
pgbouncer:
  enabled: false
  
  
#3. Disable redis
redis:
  enabled: false
  
#4. Executor configuration
executor: "LocalExecutor"

#5. Enable the example DAGs
extraEnv: |
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: 'True'
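`AIRFLOW__CORE__LOAD_EXAMPLES` follows Airflow's standard mapping of config options to environment variables: `AIRFLOW__<SECTION>__<KEY>`, upper-cased. A tiny helper (hypothetical, just for illustration) that builds such names:

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment-variable name Airflow reads for a given
    config section and key: AIRFLOW__<SECTION>__<KEY>, upper-cased."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

print(airflow_env_var("core", "load_examples"))  # AIRFLOW__CORE__LOAD_EXAMPLES
```

Any other `airflow.cfg` option can be injected through `extraEnv` the same way, e.g. `core`/`executor`.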

#6. Node selector and taint tolerations (* = replace with your actual values)
# Select certain nodes for airflow pods.
nodeSelector: 
  *: *
affinity: {}
tolerations: 
- effect: NoSchedule
  key: *
  value: *

#7. Version parameters
# Image tag
# Default airflow tag to deploy
defaultAirflowTag: "2.3.4-python3.7"

# Airflow version (used to make some decisions based on the Airflow version being deployed)
airflowVersion: "2.3.4"


#8. Image configuration (* = replace with your actual values)
images:
  airflow:
    repository: *
    tag: *
    pullPolicy: IfNotPresent

#9. Image pull secret (* = replace with your actual value)
registry:
  secretName: *


#10. Ingress configuration (enable the web ingress and set the host; replace the host with your actual value)
# Ingress configuration
ingress:
  # Enable all ingress resources (deprecated - use ingress.web.enabled and ingress.flower.enabled)
  enabled: ~

  # Configs for the Ingress of the web Service
  web:
    # Enable web ingress resource
    enabled: true # change this

    # Annotations for the web Ingress
    annotations: {}

    # The path for the web Ingress
    path: "/"

    # The pathType for the above path (used only with Kubernetes v1.19 and above)
    pathType: "ImplementationSpecific"

    # The hostname for the web Ingress (Deprecated - renamed to `ingress.web.hosts`)
    host: "airflow-web.<namespace>.svc.za" # adjust to your environment
    # The hostnames or hosts configuration for the web Ingress
    hosts: []
    

#11. Log changes (1): disable the log persistent volume
logs:
  persistence:
    # Enable persistent volume for storing logs
    enabled: false
    # Volume size for logs
    size: 0Gi
#12. Log changes (2): disable the log-groomer sidecar
  logGroomerSidecar:
    # Whether to deploy the Airflow scheduler log groomer sidecar.
    enabled: false

#13. Log changes (3): disable worker log persistence
workers:
  persistence:
    enabled: false

#14. Disable the triggerer service
# Airflow Triggerer Config
triggerer:
  enabled: false
  
  
#15. Configure MySQL (* = replace with your actual values)
data:
  metadataSecretName: ~
  resultBackendSecretName: ~
  brokerUrlSecretName: ~
  metadataConnection:
    user: airflow
    pass: *
    protocol: mysql
    host: *
    port: 3306
    db: airflow_k8s_test
    sslmode: disable
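From these `metadataConnection` values the chart renders Airflow's metadata-DB connection string. Roughly (a sketch with placeholder credentials standing in for the `*` fields, not the chart's exact template logic):

```python
# Assemble the metadata DB URI from the metadataConnection values above.
# 'PASSWORD' and the host are placeholders for the * fields in values.yaml.
conn = {
    "user": "airflow",
    "pass": "PASSWORD",                # placeholder (*)
    "protocol": "mysql",
    "host": "mysql.example.internal",  # placeholder (*)
    "port": 3306,
    "db": "airflow_k8s_test",
}
uri = "{protocol}://{user}:{pass}@{host}:{port}/{db}".format(**conn)
print(uri)  # mysql://airflow:PASSWORD@mysql.example.internal:3306/airflow_k8s_test
```

If the password contains URL-reserved characters, it would need URL-encoding (`urllib.parse.quote_plus`) before being embedded in the URI.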

#16. Disable statsd (debatable)
statsd:
  enabled: false
  

#17. Timezone configuration
# Volumes for all airflow containers
volumes:
- hostPath:
    path: /etc/localtime
  name: vm-localtime
- hostPath:
    path: /etc/timezone
  name: vm-timezone


#18. VolumeMounts for all airflow containers
volumeMounts:
- mountPath: /etc/localtime
  name: vm-localtime
  readOnly: true
- mountPath: /etc/timezone
  name: vm-timezone
  readOnly: true


#19. Container user and group configuration (as needed)
# User and group of airflow user
uid: 0
gid: 0

#20. Adjust the anti-affinity of each service as needed; for multiple services, apply the same pattern as below to each one
  # Select certain nodes for airflow scheduler pods.
  nodeSelector: {}
  affinity: 
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              component: airflow-scheduler # change as needed
          topologyKey: kubernetes.io/zone  # zone-level anti-affinity
        weight: 100
     
     
#21. Set container resource limits as needed
Skipped here.

#22. Replica counts
Recommended: 1 replica each for the webserver and the scheduler.

2. Modify the templates

#1. Mount logs onto the VM
In the scheduler Deployment template, comment out the emptyDir and add the following:
{{- else if not $stateful }}
        - name: logs
          hostPath:
            path: /tmp/airflow-scheduler-log  # host directory for the logs; change as needed
          #emptyDir: {}
{{- else }}


#2. Customize pod names, Service names, etc.
Modify each template file as needed.

3. Adjust the Filebeat log-collection configuration

# If Filebeat collects these logs, update its collection paths as shown below,
# because the scheduler writes logs into directories at several depths.
# Configure log cleanup as needed.
    - /tmp/airflow-scheduler-log/*/*.log
    - /tmp/airflow-scheduler-log/*/*/*.log
    - /tmp/airflow-scheduler-log/*/*/*/*.log
    - /tmp/airflow-scheduler-log/*/*/*/*/*.log
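Each `*` in a glob matches exactly one directory level, which is why the list needs one pattern per depth. The pattern list above can be generated mechanically from the scheduler hostPath:

```python
base = "/tmp/airflow-scheduler-log"  # the scheduler hostPath from step 2

# One glob per directory depth: '*' matches a single path segment,
# so each of the 1..4 levels below the base needs its own pattern.
patterns = [base + "/*" * depth + "/*.log" for depth in range(1, 5)]
for p in patterns:
    print(p)
```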

4. Rebuild the image to fix the log-viewing bug

# Reference: https://blog.csdn.net/weixin_40861707/article/details/119918467

# Symptom: task logs cannot be viewed in the UI; the messages below appear (my take: the second line should fetch from a Service, but it fetches from the pod instead)
*** Log file does not exist: /opt/airflow/logs/dag_id=example_bash_operator/run_id=manual__2023-03-30T07:00:55.479560+00:00/task_id=runme_1/attempt=1.log
*** Fetching from: http://airflow-scheduler-fljs8998-fsj4873:8793/log/dag_id=example_bash_operator/run_id=manual__2023-03-30T07:00:55.479560+00:00/task_id=runme_1/attempt=1.log
*** Failed to fetch log file from worker. [Errno 111] Connection refused
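The host in the "Fetching from" URL is the scheduler pod name (complete with ReplicaSet and pod hash suffixes), not a stable Service name, so the fetch fails. Parsing the URL makes this visible:

```python
from urllib.parse import urlparse

# The URL from the error message above: the host is a pod name,
# which is not resolvable/reachable the way a Service name is.
url = ("http://airflow-scheduler-fljs8998-fsj4873:8793/log/"
       "dag_id=example_bash_operator/"
       "run_id=manual__2023-03-30T07:00:55.479560+00:00/"
       "task_id=runme_1/attempt=1.log")
parsed = urlparse(url)
print(parsed.hostname)  # airflow-scheduler-fljs8998-fsj4873
print(parsed.port)      # 8793
```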



# Fix
Step 1: edit file_task_handler.py in the original image, around line 190
--- before ---
url = os.path.join("http://{ti.hostname}:{worker_log_server_port}/log", log_relative_path).format(
                ti=ti, worker_log_server_port=conf.get('logging', 'WORKER_LOG_SERVER_PORT')
            )
--- before ---

The key change is replacing the pod name with the Service name.

--- after ---
# Derive the Service host from the pod hostname by dropping the last two
# dash-separated hash segments and adding the custom "svc-" prefix,
# e.g. airflow-scheduler-fljs8998-fsj4873 -> svc-airflow-scheduler
service_host = "svc-" + "-".join(ti.hostname.split("-")[:-2])
url = os.path.join("http://" + service_host + ":{worker_log_server_port}/log", log_relative_path).format(
                ti=ti, worker_log_server_port=conf.get('logging', 'WORKER_LOG_SERVER_PORT')
            )
--- after ---
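The rewrite boils down to mapping the pod name to a Service name by dropping the last two dash-separated hash segments and adding the `svc-` prefix (the prefix is this setup's own naming convention from the template customization in step 2). In isolation, with a hypothetical helper name:

```python
def pod_to_service(pod_name: str) -> str:
    """Drop the ReplicaSet and pod hash suffixes from a pod name and
    add the 'svc-' prefix used by the customized Service names here."""
    return "svc-" + "-".join(pod_name.split("-")[:-2])

print(pod_to_service("airflow-scheduler-fljs8998-fsj4873"))  # svc-airflow-scheduler
```

Note this assumes pod names always end in exactly two hash segments, which holds for Deployment-managed pods.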

Step 2: write the Dockerfile
FROM  airflow:2.3.4-python3.7
RUN   rm -f /home/airflow/.local/lib/python3.7/site-packages/airflow/utils/log/file_task_handler.py && rm -f  /home/airflow/.local/lib/python3.7/site-packages/airflow/utils/log/__pycache__/file_task_handler.cpython-37.pyc || echo 123
COPY  --chown=airflow:root  file_task_handler.py  /home/airflow/.local/lib/python3.7/site-packages/airflow/utils/log/file_task_handler.py

Step 3: build and push the image

Step 4: update the image in values.yaml and run helm upgrade
