Your First Airflow DAG

1. Create a directory for DAG files under the airflow home directory

The location is set by the dags_folder option in /root/airflow/airflow.cfg; by default it points to /root/airflow/dags, as in the sketch below.
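
A sketch of the two pieces involved, assuming the default AIRFLOW_HOME of /root/airflow (adjust the paths if your installation differs):

mkdir -p /root/airflow/dags

# in /root/airflow/airflow.cfg
[core]
dags_folder = /root/airflow/dags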

2. Write the DAG script (it is a Python script)

vi demo.py

from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils import dates

# Arguments applied to every task in this DAG
default_args = {
    'owner': 'admin',
    'start_date': dates.days_ago(1),
    'retries': 1,
    'retry_delay': timedelta(seconds=5),
}

# One scheduled run per day
dag = DAG(
    'oug_dags', default_args=default_args, schedule_interval=timedelta(days=1))

# Print the current date
test1 = BashOperator(
    task_id='test1',
    bash_command='date',
    dag=dag,
)

# Sleep for 5 seconds
test3 = BashOperator(
    task_id='test3',
    bash_command='sleep 5',
    dag=dag,
)

# test3 runs after test1 finishes
test1.set_downstream(test3)
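
The same dependency can also be written with Airflow's bit-shift syntax, which is the more common idiom and behaves identically:

# equivalent to test1.set_downstream(test3)
test1 >> test3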

3. Check

python3 demo.py

If it runs without errors, the script itself is fine.
Then confirm with the CLI:

airflow dags list            # check that your DAG now shows up
airflow tasks list oug_dags  # list the tasks under oug_dags

You should see the two tasks, test1 and test3.
Run a quick test of each task:

airflow tasks test oug_dags test1 20170803
airflow tasks test oug_dags test3 20170803

The test output for test1 looks like this:

[root@localhost dags]# airflow tasks test oug_dags test1 20170803
[2021-05-14 18:29:45,617] {dagbag.py:440} INFO - Filling up the DagBag from /root/airflow/dags
[2021-05-14 18:29:45,636] {baseoperator.py:1151} WARNING - Dependency <Task(BashOperator): create_entry_group>, delete_entry_group already registered
[2021-05-14 18:29:45,636] {baseoperator.py:1151} WARNING - Dependency <Task(BashOperator): delete_entry_group>, create_entry_group already registered
[2021-05-14 18:29:45,636] {baseoperator.py:1151} WARNING - Dependency <Task(BashOperator): create_entry_gcs>, delete_entry already registered
[2021-05-14 18:29:45,636] {baseoperator.py:1151} WARNING - Dependency <Task(BashOperator): delete_entry>, create_entry_gcs already registered
[2021-05-14 18:29:45,637] {baseoperator.py:1151} WARNING - Dependency <Task(BashOperator): create_tag>, delete_tag already registered
[2021-05-14 18:29:45,637] {baseoperator.py:1151} WARNING - Dependency <Task(BashOperator): delete_tag>, create_tag already registered
[2021-05-14 18:29:45,639] {baseoperator.py:1151} WARNING - Dependency <Task(_PythonDecoratedOperator): prepare_email>, send_email already registered
[2021-05-14 18:29:45,639] {baseoperator.py:1151} WARNING - Dependency <Task(EmailOperator): send_email>, prepare_email already registered
[2021-05-14 18:29:45,643] {example_kubernetes_executor_config.py:174} WARNING - Could not import DAGs in example_kubernetes_executor_config.py: No module named 'kubernetes'
[2021-05-14 18:29:45,643] {example_kubernetes_executor_config.py:175} WARNING - Install kubernetes dependencies with: pip install apache-airflow['cncf.kubernetes']
[2021-05-14 18:29:45,671] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: oug_dags.test1 2017-08-03T00:00:00+00:00 [None]>
[2021-05-14 18:29:45,674] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: oug_dags.test1 2017-08-03T00:00:00+00:00 [None]>
[2021-05-14 18:29:45,674] {taskinstance.py:1017} INFO -
--------------------------------------------------------------------------------
[2021-05-14 18:29:45,674] {taskinstance.py:1018} INFO - Starting attempt 1 of 2
[2021-05-14 18:29:45,674] {taskinstance.py:1019} INFO -
--------------------------------------------------------------------------------
[2021-05-14 18:29:45,675] {taskinstance.py:1038} INFO - Executing <Task(BashOperator): test1> on 2017-08-03T00:00:00+00:00
[2021-05-14 18:29:45,710] {taskinstance.py:1230} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=admin
AIRFLOW_CTX_DAG_ID=oug_dags
AIRFLOW_CTX_TASK_ID=test1
AIRFLOW_CTX_EXECUTION_DATE=2017-08-03T00:00:00+00:00
[2021-05-14 18:29:45,710] {bash.py:135} INFO - Tmp dir root location:
 /tmp
[2021-05-14 18:29:45,711] {bash.py:158} INFO - Running command: date
[2021-05-14 18:29:45,716] {bash.py:169} INFO - Output:
[2021-05-14 18:29:45,718] {bash.py:173} INFO - 2021年 05月 14日 星期五 18:29:45 CST
[2021-05-14 18:29:45,719] {bash.py:177} INFO - Command exited with return code 0
[2021-05-14 18:29:45,733] {taskinstance.py:1135} INFO - Marking task as SUCCESS. dag_id=oug_dags, task_id=test1, execution_date=20170803T000000, start_date=20210514T102945, end_date=20210514T102945

If all of that works, the DAG is ready to run.

The Airflow scheduler is usually started with:

airflow scheduler -D

The web server, scheduler, and workers must be restarted for a new configuration to take effect.
The new DAG only appears in the web UI and only gets executed while the scheduler is running; once it is stopped, nothing runs.

The -D flag runs the process as a background daemon.
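
For completeness, the web server can be daemonized the same way, and the DAG can be unpaused and triggered from the command line. A sketch (new DAGs start paused by default unless dags_are_paused_at_creation is changed in airflow.cfg):

airflow webserver -D
airflow dags unpause oug_dags
airflow dags trigger oug_dags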

4. Follow-up issues

After the scheduler had been running for a while, the web UI stopped coming up: the error showed that the disk was full of logs. A look at the log directory revealed that the scheduler subdirectory alone had grown to 44 GB. Clearing it out fixed the problem; to keep it from happening again, raise the logging level.

vi airflow.cfg

Set the logging level as shown below (in Airflow 2.x this option officially lives in the [logging] section; a value under [core] is still read but triggers a deprecation warning), then restart the services for it to take effect.

[core]
#logging_level = INFO
logging_level = WARNING

NOTSET < DEBUG < INFO < WARNING < ERROR < CRITICAL

If the level is set to INFO, messages below INFO are suppressed and messages at INFO or above are emitted. In other words, the higher the level, the less detailed the logs. Airflow's default logging_level is INFO (Python's logging module itself defaults to WARNING).
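
The filtering rule is the same one used by Python's standard logging module; a minimal standalone sketch (not Airflow-specific) shows the effect:

import logging

# Only messages at WARNING or above are emitted
logging.basicConfig(level=logging.WARNING)

logging.info("ignored: below the configured level")
logging.warning("printed: at the configured level")
logging.error("printed: above the configured level")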

Note: if logging_level is raised to WARNING or above, it is not only the log files that are affected; command-line output is filtered the same way and only shows messages at or above that level. So if CLI output looks incomplete and there are no errors in the logs, the logging level is probably set too high.
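
Independently of the logging level, periodically clearing old scheduler logs keeps the disk from filling up again. A sketch, assuming the default log location under /root/airflow/logs and a 7-day retention (adjust both to your environment):

# delete scheduler log files older than 7 days (assumed path and retention)
find /root/airflow/logs/scheduler -type f -name '*.log' -mtime +7 -delete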
