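As the title says: after following what felt like a hundred articles (a slight exaggeration; more like fifty) and falling into countless pits along the way,
I finally had a distributed Airflow cluster with three worker nodes, running CeleryExecutor + RabbitMQ + HAProxy (heaven knows what I went through).
Then, just as I was about to raise a glass to celebrate...
The workers started fine and the webserver started fine, but once the scheduler was started it kept failing with an error like this: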
Traceback (most recent call last):
  File "/home/yurii/Tools/anaconda3/bin/airflow", line 28, in <module>
    args.func(args)
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/bin/cli.py", line 839, in scheduler
    job.run()
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/jobs.py", line 200, in run
    self._execute()
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/jobs.py", line 1309, in _execute
    self._execute_helper(processor_manager)
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/jobs.py", line 1441, in _execute_helper
    self.executor.heartbeat()
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/executors/base_executor.py", line 124, in heartbeat
    self.execute_async(key, command=command, queue=queue)
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 80, in execute_async
    args=[command], queue=queue)
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/celery/app/task.py", line 573, in apply_async
    **dict(self._get_exec_options(), **options)
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 354, in send_task
    reply_to=reply_to or self.oid, **options
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/celery/app/amqp.py", line 310, in publish_task
    **kwargs
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/kombu/messaging.py", line 172, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/kombu/connection.py", line 449, in _ensured
    return fun(*args, **kwargs)
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/kombu/messaging.py", line 188, in _publish
    mandatory=mandatory, immediate=immediate,
  File "/home/yurii/Tools/anaconda3/lib/python3.6/site-packages/librabbitmq/__init__.py", line 122, in basic_publish
    mandatory or False, immediate or False,
TypeError: an integer is required (got type NoneType)
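I tumbled into this pit before I could even put the glass down, and it took a long time to climb out.
I searched every Chinese-language article I could find; they all said much the same thing, and none dug into this particular problem. Then I remembered StackOverflow, and that is where the problem finally got solved.
The fix is roughly as follows
(assuming everything else is in order, i.e. your DAGs are written correctly and the rest of the configuration is right):
Out of the box, this Airflow setup uses the librabbitmq library as the client for talking to the RabbitMQ broker (Celery's amqp:// transport picks librabbitmq whenever it is installed and falls back to the pure-Python amqp client otherwise), and the RabbitMQ core team does not recommend that library.
In other words: if it breaks, don't come to us.
So librabbitmq has to go. Run the following command on every Airflow server: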
pip uninstall librabbitmq
Also make sure the Python library amqp is installed on every Airflow server:
pip install amqp
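To confirm the two commands above took effect on a given node, a small check like the following can be run with the same Python interpreter that Airflow uses. This is only a sketch: the function names amqp_client_status and is_fixed are made up for illustration, and it only checks whether the modules are importable, not whether the scheduler has been restarted.

```python
import importlib.util


def amqp_client_status():
    """Report which AMQP client modules are importable in this environment."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("librabbitmq", "amqp")
    }


def is_fixed(status):
    """The node is in the desired state when amqp is present and librabbitmq is gone."""
    return status["amqp"] and not status["librabbitmq"]


if __name__ == "__main__":
    status = amqp_client_status()
    for name, present in status.items():
        print(f"{name}: {'installed' if present else 'not installed'}")
    print("OK" if is_fixed(status) else "librabbitmq still present or amqp missing")
```

Running it on each of the Airflow servers in turn should report librabbitmq as not installed and amqp as installed before the scheduler is started again.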
With that, the problem is solved.
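A related option, which is not from the StackOverflow answer and which I did not test on this cluster: Celery also accepts a pyamqp:// prefix in the broker URL, which selects the pure-Python amqp client explicitly, whether or not librabbitmq is installed. The helper below is a hypothetical sketch of that rewrite for the broker_url value in airflow.cfg; the function name and the example URLs are placeholders.

```python
def force_pyamqp(broker_url):
    """Rewrite an amqp:// broker URL so Celery uses the pure-Python amqp client.

    URLs with any other scheme are returned unchanged.
    """
    prefix = "amqp://"
    if broker_url.startswith(prefix):
        return "pyamqp://" + broker_url[len(prefix):]
    return broker_url


if __name__ == "__main__":
    # Placeholder credentials and host, for illustration only.
    print(force_pyamqp("amqp://guest:guest@localhost:5672//"))
```

Uninstalling librabbitmq, as described above, is still the fix that actually worked for me; this is just a way to make the client choice explicit in the configuration.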
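Original StackOverflow post: [link]
Lesson learned: for technology that originated abroad, turning to international forums is the sensible way to avoid detours.
Here are some screenshots to show I took this seriously:
Airflow test demo:
Scheduler working normally:
RabbitMQ:
HAProxy load balancing: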