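I've just started with Airflow and wanted a local environment to practice on; following the steps in the official docs, it was up and running quickly. So what to do with it? Let's insert some data into a MySQL database! Airflow connects to a SQLite database by default, so the configuration file (airflow.cfg) needs to be changed.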
sql_alchemy_conn = mysql://user:[email protected]:3306/dbname
executor = LocalExecutor
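One thing that can quietly break a connection string like this is a password containing characters such as @ or /, which have to be percent-encoded in a SQLAlchemy-style URL. Below is a minimal sketch using only the standard library; build_conn_uri and the sample credentials are made up for illustration and are not part of Airflow.

```python
from urllib.parse import quote_plus


def build_conn_uri(user, password, host, port, dbname, scheme="mysql"):
    """Assemble a SQLAlchemy-style URI, percent-encoding the credentials."""
    return "{}://{}:{}@{}:{}/{}".format(
        scheme, quote_plus(user), quote_plus(password), host, port, dbname
    )


# Hypothetical credentials, for illustration only.
print(build_conn_uri("user", "p@ss/word", "127.0.0.1", 3306, "dbname"))
```

The resulting string is what would go after sql_alchemy_conn = in the config file.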
Remember to create the database and its tables first, and grant the user the necessary privileges.
airflow initdb
Then start the web server and the scheduler:
airflow webserver -p 8080
airflow scheduler -D
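Next, write a DAG file and put it under the $AIRFLOW_HOME/dags directory (here /Users/iyourcar/airflow/dags/test_airflow/test_airflow.py, as the log further down shows).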
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Jerry'

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.mysql_operator import MySqlOperator

# Defaults applied to every task in the DAG.
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2019, 11, 17),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 1,
    'retry_delay': timedelta(minutes=10),
}

# Run once a day at 20:00.
dag = DAG('test_airflow', default_args=default_args, schedule_interval='00 20 * * *')

insert_data = """insert into person(name,gender,age) values ('欢欢', '女' , 28)
"""

# First task: print the current date.
t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag,
)

# Second task: run the INSERT through the connection whose Conn Id is 'airflow_db'.
t2 = MySqlOperator(
    task_id='test_airflow',
    sql=insert_data,
    mysql_conn_id='airflow_db',
    dag=dag,
)

# t2 runs after t1.
t1.set_downstream(t2)
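The post never shows the person table that the INSERT targets. As a rough sketch, the snippet below assumes three columns whose names come from the INSERT statement, guesses their types, and tries the statement against an in-memory SQLite database instead of MySQL, just to check that the statement and the table shape agree.

```python
import sqlite3

# Assumed schema: only the column names come from the INSERT in the DAG;
# the types are guesses, and SQLite stands in for MySQL here.
CREATE_PERSON = """
CREATE TABLE person (
    name   TEXT,
    gender TEXT,
    age    INTEGER
)
"""

INSERT_PERSON = "insert into person(name,gender,age) values ('欢欢', '女' , 28)"


def try_insert():
    """Create the table in memory, run the INSERT, and return the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(CREATE_PERSON)
        conn.execute(INSERT_PERSON)
        return conn.execute("SELECT name, gender, age FROM person").fetchall()
    finally:
        conn.close()


print(try_insert())
```

On a real MySQL server the table would also need a character set that can store the Chinese values, such as utf8mb4.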
Then came the painful part: the task stayed stuck in the running state, and the log showed that it had hit an error.
*** Reading local file: /Users/iyourcar/airflow/logs/test_airflow/test_airflow/2019-11-18T17:04:19.816358+08:00/1.log
[2019-11-18 17:04:27,675] {taskinstance.py:630} INFO - Dependencies all met for <TaskInstance: test_airflow.test_airflow 2019-11-18T17:04:19.816358+08:00 [queued]>
[2019-11-18 17:04:27,691] {taskinstance.py:630} INFO - Dependencies all met for <TaskInstance: test_airflow.test_airflow 2019-11-18T17:04:19.816358+08:00 [queued]>
[2019-11-18 17:04:27,691] {taskinstance.py:841} INFO -
--------------------------------------------------------------------------------
[2019-11-18 17:04:27,691] {taskinstance.py:842} INFO - Starting attempt 1 of 2
[2019-11-18 17:04:27,691] {taskinstance.py:843} INFO -
--------------------------------------------------------------------------------
[2019-11-18 17:04:27,701] {taskinstance.py:862} INFO - Executing <Task(MySqlOperator): test_airflow> on 2019-11-18T17:04:19.816358+08:00
[2019-11-18 17:04:27,701] {base_task_runner.py:133} INFO - Running: ['airflow', 'run', 'test_airflow', 'test_airflow', '2019-11-18T17:04:19.816358+08:00', '--job_id', '86', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/test_airflow/test_airflow.py', '--cfg_path', '/var/folders/_j/jw067msn29s8756rcw2_fys80000gn/T/tmp3rkzfwu2']
[2019-11-18 17:04:28,179] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow /Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/utils/sqlalchemy.py:40: DeprecationWarning: get: Accessing configuration method 'get' directly from the configuration module is deprecated. Please access the configuration from the 'configuration.conf' object via 'conf.get'
[2019-11-18 17:04:28,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow tz = conf.get("core", "default_timezone")
[2019-11-18 17:04:28,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow /Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/utils/timezone.py:30: DeprecationWarning: get: Accessing configuration method 'get' directly from the configuration module is deprecated. Please access the configuration from the 'configuration.conf' object via 'conf.get'
[2019-11-18 17:04:28,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow tz = conf.get("core", "default_timezone")
[2019-11-18 17:04:28,288] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow [2019-11-18 17:04:28,288] {settings.py:252} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=1639
[2019-11-18 17:04:28,992] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow [2019-11-18 17:04:28,992] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-11-18 17:04:28,993] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow [2019-11-18 17:04:28,992] {dagbag.py:92} INFO - Filling up the DagBag from /Users/iyourcar/airflow/dags/test_airflow/test_airflow.py
[2019-11-18 17:04:29,023] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow [2019-11-18 17:04:29,023] {cli.py:545} INFO - Running <TaskInstance: test_airflow.test_airflow 2019-11-18T17:04:19.816358+08:00 [running]> on host iyourcars-macbook-air-6.local
[2019-11-18 17:04:29,043] {mysql_operator.py:61} INFO - Executing: insert into person(name,gender,age) values ('欢欢', '女' , 28)
[2019-11-18 17:04:29,052] {logging_mixin.py:112} INFO - [2019-11-18 17:04:29,051] {base_hook.py:84} INFO - Using connection to: id: mysql_default. Host: mysql, Port: None, Schema: airflow, Login: root, Password: None, extra: {}
[2019-11-18 17:04:29,052] {taskinstance.py:1058} ERROR - (2005, "Unknown MySQL server host 'mysql' (0)")
Traceback (most recent call last):
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 930, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/operators/mysql_operator.py", line 67, in execute
parameters=self.parameters)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/hooks/dbapi_hook.py", line 159, in run
with closing(self.get_conn()) as conn:
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/hooks/mysql_hook.py", line 116, in get_conn
conn = MySQLdb.connect(**conn_config)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/MySQLdb/__init__.py", line 84, in Connect
return Connection(*args, **kwargs)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/MySQLdb/connections.py", line 171, in __init__
super(Connection, self).__init__(*args, **kwargs2)
MySQLdb._exceptions.OperationalError: (2005, "Unknown MySQL server host 'mysql' (0)")
[2019-11-18 17:04:29,054] {taskinstance.py:1081} INFO - Marking task as UP_FOR_RETRY
[2019-11-18 17:04:29,064] {logging_mixin.py:112} INFO - [2019-11-18 17:04:29,064] {configuration.py:299} WARNING - section/key [smtp/smtp_user] not found in config
[2019-11-18 17:04:32,666] {logging_mixin.py:112} INFO - [2019-11-18 17:04:32,666] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.991545 s
[2019-11-18 17:04:37,687] {logging_mixin.py:112} INFO - [2019-11-18 17:04:37,687] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.990359 s
[2019-11-18 17:04:42,708] {logging_mixin.py:112} INFO - [2019-11-18 17:04:42,708] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.991041 s
[2019-11-18 17:04:47,729] {logging_mixin.py:112} INFO - [2019-11-18 17:04:47,729] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.990174 s
[2019-11-18 17:04:52,749] {logging_mixin.py:112} INFO - [2019-11-18 17:04:52,748] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.990348 s
[2019-11-18 17:04:57,771] {logging_mixin.py:112} INFO - [2019-11-18 17:04:57,771] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.989278 s
[2019-11-18 17:05:02,790] {logging_mixin.py:112} INFO - [2019-11-18 17:05:02,790] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.990203 s
[2019-11-18 17:05:07,808] {logging_mixin.py:112} INFO - [2019-11-18 17:05:07,808] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.990449 s
[2019-11-18 17:05:12,824] {logging_mixin.py:112} INFO - [2019-11-18 17:05:12,824] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.990052 s
[2019-11-18 17:05:17,840] {logging_mixin.py:112} INFO - [2019-11-18 17:05:17,840] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.991262 s
[2019-11-18 17:05:22,855] {logging_mixin.py:112} INFO - [2019-11-18 17:05:22,855] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.990786 s
[2019-11-18 17:05:27,871] {logging_mixin.py:112} INFO - [2019-11-18 17:05:27,871] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.991559 s
[2019-11-18 17:05:32,887] {logging_mixin.py:112} INFO - [2019-11-18 17:05:32,887] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.991841 s
[2019-11-18 17:05:37,986] {logging_mixin.py:112} INFO - [2019-11-18 17:05:37,986] {local_task_job.py:124} WARNING - Time since last heartbeat(0.03 s) < heartrate(5.0 s), sleeping for 4.974319 s
[2019-11-18 17:05:42,988] {logging_mixin.py:112} INFO - [2019-11-18 17:05:42,988] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.991425 s
[2019-11-18 17:05:45,166] {taskinstance.py:1093} ERROR - Failed to send email to: ['[email protected]']
[2019-11-18 17:05:45,167] {taskinstance.py:1094} ERROR - [Errno 60] Operation timed out
Traceback (most recent call last):
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 930, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/operators/mysql_operator.py", line 67, in execute
parameters=self.parameters)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/hooks/dbapi_hook.py", line 159, in run
with closing(self.get_conn()) as conn:
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/hooks/mysql_hook.py", line 116, in get_conn
conn = MySQLdb.connect(**conn_config)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/MySQLdb/__init__.py", line 84, in Connect
return Connection(*args, **kwargs)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/MySQLdb/connections.py", line 171, in __init__
super(Connection, self).__init__(*args, **kwargs2)
MySQLdb._exceptions.OperationalError: (2005, "Unknown MySQL server host 'mysql' (0)")
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1083, in handle_failure
self.email_alert(error)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1305, in email_alert
send_email(self.task.email, subject, html_content)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/utils/email.py", line 55, in send_email
mime_subtype=mime_subtype, mime_charset=mime_charset, **kwargs)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/utils/email.py", line 101, in send_email_smtp
send_MIME_email(smtp_mail_from, recipients, msg, dryrun)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/utils/email.py", line 121, in send_MIME_email
s = smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) if SMTP_SSL else smtplib.SMTP(SMTP_HOST, SMTP_PORT)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/smtplib.py", line 251, in __init__
(code, msg) = self.connect(host, port)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/smtplib.py", line 336, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/smtplib.py", line 307, in _get_socket
self.source_address)
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/socket.py", line 724, in create_connection
raise err
File "/Users/iyourcar/opt/anaconda3/lib/python3.6/socket.py", line 713, in create_connection
sock.connect(sa)
TimeoutError: [Errno 60] Operation timed out
[2019-11-18 17:05:45,179] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow Traceback (most recent call last):
[2019-11-18 17:05:45,179] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/bin/airflow", line 37, in <module>
[2019-11-18 17:05:45,179] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow args.func(args)
[2019-11-18 17:05:45,179] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/utils/cli.py", line 74, in wrapper
[2019-11-18 17:05:45,179] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow return f(*args, **kwargs)
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/bin/cli.py", line 551, in run
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow _run(args, dag, ti)
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/bin/cli.py", line 469, in _run
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow pool=args.pool,
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow return func(*args, **kwargs)
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 930, in _run_raw_task
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow result = task_copy.execute(context=context)
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/operators/mysql_operator.py", line 67, in execute
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow parameters=self.parameters)
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/hooks/dbapi_hook.py", line 159, in run
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow with closing(self.get_conn()) as conn:
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/airflow/hooks/mysql_hook.py", line 116, in get_conn
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow conn = MySQLdb.connect(**conn_config)
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/MySQLdb/__init__.py", line 84, in Connect
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow return Connection(*args, **kwargs)
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow File "/Users/iyourcar/opt/anaconda3/lib/python3.6/site-packages/MySQLdb/connections.py", line 171, in __init__
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow super(Connection, self).__init__(*args, **kwargs2)
[2019-11-18 17:05:45,180] {base_task_runner.py:115} INFO - Job 86: Subtask test_airflow MySQLdb._exceptions.OperationalError: (2005, "Unknown MySQL server host 'mysql' (0)")
[2019-11-18 17:05:47,983] {logging_mixin.py:112} INFO - [2019-11-18 17:05:47,982] {local_task_job.py:103} INFO - Task exited with return code 1
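In a log this long, the entry worth hunting for is the hook's "Using connection to:" message, because it shows which connection id and host the task actually used. Here is a small sketch that pulls those fields out with a regular expression; connection_used is a made-up helper, and the pattern has only been matched against the entry in this log, joined onto one line.

```python
import re

# Pattern for the hook's "Using connection to:" message as it appears in this log.
CONN_LINE = re.compile(
    r"Using connection to: id: (?P<conn_id>\S+)\. "
    r"Host: (?P<host>[^,]+), Port: (?P<port>[^,]+), "
    r"Schema: (?P<schema>[^,]+), Login: (?P<login>[^,]+)"
)


def connection_used(log_text):
    """Return the connection fields from the first matching log entry, or None."""
    match = CONN_LINE.search(log_text)
    return match.groupdict() if match else None


sample = (
    "base_hook.py:84} INFO - Using connection to: id: mysql_default. "
    "Host: mysql, Port: None, Schema: airflow, Login: root, Password: None, extra: {}"
)
print(connection_used(sample))
```

For this log it reports the connection id mysql_default with host mysql, which is exactly the host name the MySQL client then fails to resolve.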
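So the connection settings are wrong, yet the configuration file checked out fine, and neither Baidu nor Google turned up anything about this exception, so in the end I could only poke around on my own. The web UI has a Connections page (under Admin); could that have something to do with it? Let's open it.
There is a mysql entry in the list; click into it.
Fill in the connection details and save. The mysql_conn_id in the .py file is the Conn Id set here. The sql_alchemy_conn setting in airflow.cfg only configures Airflow's own metadata database, while MySqlOperator finds its target server through this connection (mysql_default when no mysql_conn_id is given).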
t2 = MySqlOperator(
    task_id='test_airflow',
    sql=insert_data,
    mysql_conn_id='airflow_db',
    dag=dag,
)
Then I restarted the web server, triggered the task manually, and it actually worked.
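Besides the web UI, Airflow can also pick up a connection from an environment variable named AIRFLOW_CONN_ followed by the upper-cased Conn Id, with the connection URI as its value. The sketch below only shows how that name is derived; conn_env_var is a made-up helper and the URI is a placeholder, not the author's real settings.

```python
import os


def conn_env_var(conn_id):
    """Name of the environment variable Airflow checks for a given conn_id."""
    return "AIRFLOW_CONN_" + conn_id.upper()


# Placeholder URI: replace user, password, host and dbname with real values.
os.environ[conn_env_var("airflow_db")] = "mysql://user:password@127.0.0.1:3306/dbname"
print(conn_env_var("airflow_db"))
```

In practice the variable has to be exported in the shell that starts the scheduler, so that the task processes inherit it; setting it inside a standalone script as above is only for illustration.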