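Airflow: a detailed installation walkthrough

Airflow is a workflow (data pipeline) management tool open-sourced by Airbnb. For usage, see the official documentation at http://pythonhosted.org/airflow/

This post shares the installation procedure and the pitfalls I ran into along the way.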

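Installing Airflow

(To avoid affecting other programs, I did not want to replace the existing Python 2.6.6. The goal is for 2.6 and 2.7 to coexist, with pip, virtualenv, and similar tools installed only under Python 2.7.)

To install a standalone Python 2.7, pass a different directory as --prefix when running configure. make install then installs into that prefix instead of the default /usr/local (with binaries in /usr/local/bin).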

1. Download the Python 2.7.11 source from https://www.python.org/downloads/source/

2. Build and install from source

su - root
cd /usr/local/
tar -zxvf Python-2.7.11.tgz
mv Python-2.7.11 python27
cd python27
./configure --prefix=/usr/local/python27  # change to your own path
make
make install
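
Before moving on, it can help to confirm that the new interpreter really landed under the prefix and that the system python is unchanged. The following is only a sketch: check_python is a hypothetical helper name, and /usr/local/python27 is assumed to be the prefix passed to configure above.

```shell
# check_python PREFIX: succeed only if PREFIX/bin/python exists and is executable.
# check_python is a hypothetical helper; PREFIX is whatever was given to --prefix.
check_python() {
    prefix="$1"
    if [ -x "$prefix/bin/python" ]; then
        # Python 2.7 prints its version on stderr, so merge it into stdout.
        "$prefix/bin/python" -V 2>&1
    else
        echo "no python executable under $prefix/bin" >&2
        return 1
    fi
}

check_python /usr/local/python27 || echo "new install not found"

# The interpreter on PATH should still be the original system python.
if command -v python >/dev/null 2>&1; then
    python -V 2>&1
fi
```

If the second version printed is still the old system version, the two interpreters are coexisting as intended.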

3. Install setuptools (it must be installed into python27; the server cannot reach the Internet, so download the source tarball and install from it)

tar zvxf setuptools-23.1.0.tar.gz
cd setuptools-23.1.0/
/usr/local/python27/python setup.py install
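
4. Install pip (it must be installed into python27; the server cannot reach the Internet, so download the source tarball; the PyPI index can be pointed at the Douban mirror)

tar zvxf pip-8.1.2.tar.gz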
cd pip-8.1.2/
/usr/local/python27/python setup.py install

5. Install virtualenv; for other installation methods see the official site https://virtualenv.pypa.io/en/latest/index.html

tar zvxf virtualenv-15.0.2.tar.gz
cd virtualenv-15.0.2/
/usr/local/python27/python setup.py install
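
virtualenv must also be installed once under Python 2.6; otherwise the virtualenv command cannot be run from the Python 2.6 environment to create a Python 2.7 virtualenv.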

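6. The virtualenv command needs network access when it runs, so a proxy still has to be configured; CCProxy is used here

Download: http://www.ccproxy.com/

Set the following environment variables on the Linux machine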

export https_proxy=xxx.xxx.xxx.xxx:808
export http_proxy=xxx.xxx.xxx.xxx:808
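
A quick way to confirm that the proxy settings are actually visible to child processes such as virtualenv and pip is sketched below. proxy_is_set is a hypothetical helper name introduced only for this illustration.

```shell
# proxy_is_set: succeed only if both proxy variables are exported and non-empty.
# proxy_is_set is a hypothetical helper, not part of any tool.
proxy_is_set() {
    # Run the test in a child shell so that only exported variables are seen.
    sh -c '[ -n "$http_proxy" ] && [ -n "$https_proxy" ]'
}

if proxy_is_set; then
    echo "proxy configured: $http_proxy"
else
    echo "http_proxy/https_proxy are not exported in this shell"
fi
```

Variables that were assigned without export are not passed to child processes, which is why the check runs in a child shell.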

7. Create the isolated environment with virtualenv

virtualenv --python=/usr/local/python27/bin/python airflowenv

After running source airflowenv/bin/activate, the shell uses Python 2.7.
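
To double-check which interpreter the current shell will use, something like the following can be run after activation. It is a sketch: in_virtualenv is a hypothetical helper, and it relies on the VIRTUAL_ENV variable that the activate script exports.

```shell
# in_virtualenv: succeed if a virtualenv is active in this shell.
# The activate script exports VIRTUAL_ENV; in_virtualenv is a hypothetical helper.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    # Expect a path inside airflowenv and a 2.7.x version string.
    command -v python
    python -V 2>&1
else
    echo "no virtualenv active; run: source airflowenv/bin/activate"
fi
```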

8. Install MySQL (not covered here)

9. As root, install mysql-devel: yum install mysql-devel

10. Install MySQL-python: download MySQL-python-1.2.5.zip from the official Python site and unzip it

source airflowenv/bin/activate
cd MySQL-python-1.2.5
python setup.py install

11. Install gevent

source airflowenv/bin/activate
pip install gevent

12. Install Airflow

source airflowenv/bin/activate
export AIRFLOW_HOME=~/airflow  # change to your own path
pip install airflow
# initialize the database
airflow initdb

13. Edit the configuration file: vi $AIRFLOW_HOME/airflow.cfg

This includes adding the MySQL connection and setting the executor; adjust the other parameters as needed

executor = LocalExecutor
sql_alchemy_conn = mysql://username:password@ip:port/dbname
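
After editing, the two values can be read back to make sure they were saved as intended. This is only a sketch: cfg_get and mask_password are hypothetical helpers, and the file location assumes AIRFLOW_HOME as set in step 12 (falling back to ~/airflow).

```shell
# cfg_get FILE KEY: print the value of the first "KEY = value" line in FILE.
# Hypothetical helper; it does not distinguish between ini sections.
cfg_get() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# mask_password: hide the password part of a scheme://user:password@host URL.
mask_password() {
    sed 's#//\([^:]*\):[^@]*@#//\1:***@#'
}

cfg="${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg"
if [ -f "$cfg" ]; then
    echo "executor         = $(cfg_get "$cfg" executor)"
    echo "sql_alchemy_conn = $(cfg_get "$cfg" sql_alchemy_conn | mask_password)"
else
    echo "config file not found: $cfg"
fi
```

Masking the password keeps the check safe to paste into a ticket or chat.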

14. Run airflow initdb again; this time the tables are created in MySQL

15. Install supervisor and use it to start Airflow; if Airflow dies, supervisor restarts it automatically

source airflowenv/bin/activate
pip install supervisor

Edit supervisord.conf to specify the program to start and the log output path

[program:airflow_scheduler]
command=/xxx/airflowenv/bin/airflow scheduler
stdout_logfile=/tmp/airflow_scheduler.log

Start it with the following command

supervisord -c /xxx/xxx/airflow/supervisord.conf
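
For reference, a slightly fuller configuration can be generated as sketched below. The [program:airflow_scheduler] lines for command and stdout_logfile come from the snippet above; the [supervisord] section, its logfile path, and the autostart, autorestart and redirect_stderr settings are assumptions added for illustration, and write_supervisor_conf is a hypothetical helper name. The /xxx path is the same placeholder as above and must be replaced.

```shell
# write_supervisor_conf FILE: write a small example supervisord.conf to FILE.
# Hypothetical helper; paths containing /xxx are placeholders from the text above.
write_supervisor_conf() {
    cat > "$1" <<'EOF'
[supervisord]
logfile=/tmp/supervisord.log

[program:airflow_scheduler]
command=/xxx/airflowenv/bin/airflow scheduler
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/tmp/airflow_scheduler.log
EOF
}

conf=$(mktemp)
write_supervisor_conf "$conf"
cat "$conf"
```

Setting autorestart=true asks supervisord to restart the scheduler whenever it exits, and redirect_stderr=true sends error output to the same log file.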


Problems encountered during installation

1. airflow initdb fails

(airflowenv)xxx@xxx:/xxx/xxx/airflowenv/bin$ airflow initdb

Traceback (most recent call last):
  File "/xxx/xxx/airflowenv/bin/airflow", line 4, in <module>
    from airflow import configuration
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/airflow/__init__.py", line 31, in <module>
    from airflow.models import DAG
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/airflow/models.py", line 56, in <module>
    from airflow import settings, utils
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/airflow/settings.py", line 76, in <module>
    engine = create_engine(SQL_ALCHEMY_CONN, **engine_args)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/sqlalchemy/engine/__init__.py", line 386, in create_engine
    return strategy.create(*args, **kwargs)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 75, in create
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 92, in dbapi
    return __import__('MySQLdb')
ImportError: No module named MySQLdb

 

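The MySQL-python module is missing. Download MySQL-python-1.2.5.zip from the official site, unzip it, and install it inside the activated airflowenv: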

cd MySQL-python-1.2.5

python setup.py install
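
Whether a module is importable can be checked up front, which avoids waiting for a long traceback. The sketch below uses a hypothetical helper, has_module, and assumes it is run inside the activated airflowenv so that python refers to the virtualenv interpreter. It checks MySQLdb and also gevent, which is needed by the webserver.

```shell
# has_module PYTHON MODULE: succeed if the given interpreter can import MODULE.
# has_module is a hypothetical helper, not part of Airflow or pip.
has_module() {
    "$1" -c "import $2" >/dev/null 2>&1
}

for mod in MySQLdb gevent; do
    if has_module python "$mod"; then
        echo "$mod: ok"
    else
        echo "$mod: missing"
    fi
done
```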

 

2. After installing MySQL-python, airflow initdb still fails; the MySQL-python build reports:

 

    _mysql.c:36:23: error: my_config.h: No such file or directory
    _mysql.c:38:19: error: mysql.h: No such file or directory
    _mysql.c:39:26: error: mysqld_error.h: No such file or directory
    _mysql.c:40:20: error: errmsg.h: No such file or directory

 

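The Linux host is missing the mysql-devel package, which provides these MySQL header files. Install it with yum install mysql-devel, or download the mysql-devel RPM manually and install it yourself, then install MySQL-python again.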

 

3. Starting the webserver with airflow webserver -p 8080 fails

Error: class uri 'gevent' invalid or not found:

 

[Traceback (most recent call last):
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/gunicorn/util.py", line 140, in load_class
    mod = import_module('.'.join(components))
  File "/xxx/xxx/software/python27/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 22, in <module>
    raise RuntimeError("You need gevent installed to use this worker.")
RuntimeError: You need gevent installed to use this worker.
]

 

Install gevent with pip: pip install gevent


 

