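Airflow is a data workflow management tool open-sourced by Airbnb. For usage, see the official documentation at http://pythonhosted.org/airflow/
Below I share the installation process and the pitfalls I ran into.
(To avoid affecting other programs, I did not want to replace the existing python2.6.6. The goal here is for 2.6 and 2.7 to coexist, with pip, virtualenv, and the other tools installed only under python27.)
To install a separate python2.7, pass a different directory as the prefix when running configure. make install then installs into that prefix directory rather than the default location under /usr/local (with binaries in /usr/local/bin).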
1. Download the python2.7.11 source from https://www.python.org/downloads/source/
2. Build and install from source
su - root
cd /usr/local/
tar -zxvf Python-2.7.11.tgz
mv Python-2.7.11 python27
cd python27
./configure --prefix=/usr/local/python27  # change to your own path
make
make install
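After make install it can help to confirm that the new interpreter really sits under the prefix and reports 2.7 before anything else is installed against it. The following is only a minimal sketch: check_python27 is a made-up helper name, and the prefix is whatever was passed to configure.

```shell
# Hypothetical helper (not part of Python or airflow): verify that
# PREFIX/bin/python exists and reports a 2.7.x version.
check_python27() {
    prefix=$1
    py="$prefix/bin/python"
    if [ ! -x "$py" ]; then
        echo "no executable interpreter at $py" >&2
        return 1
    fi
    # Python 2.7 prints its version on stderr, so merge the streams.
    version=$("$py" -V 2>&1)
    case "$version" in
        "Python 2.7."*) echo "ok: $version ($py)" ;;
        *) echo "unexpected version: $version" >&2; return 1 ;;
    esac
}
```

With the prefix used in this post it would be called as check_python27 /usr/local/python27.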
3. Install setuptools (it must be installed under python27; the server cannot reach the internet, so download the source tarball)
tar zvxf setuptools-23.1.0.tar.gz
cd setuptools-23.1.0/
/usr/local/python27/bin/python setup.py install
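4. Install pip (it must be installed under python27; the server cannot reach the internet, so download the source tarball) (the PyPI index can be pointed at the Douban mirror)
tar zvxf pip-8.1.2.tar.gz
cd pip-8.1.2/
/usr/local/python27/bin/python setup.py install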
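For machines that can reach a mirror, the remark about pointing pip at the Douban index could look like the sketch below. The mirror URL and the helper name write_pip_conf are assumptions, not taken from the steps in this post; substitute whichever index is reachable from your network.

```shell
# Hypothetical helper: write a pip configuration file that points pip
# at a mirror index. ~/.pip/pip.conf is the legacy per-user location on Linux.
write_pip_conf() {
    conf_dir=$1      # e.g. "$HOME/.pip"
    index_url=$2     # assumed mirror URL, e.g. http://pypi.douban.com/simple
    # Strip scheme and path to get the bare host name for trusted-host.
    host=${index_url#*://}
    host=${host%%/*}
    mkdir -p "$conf_dir"
    cat > "$conf_dir/pip.conf" <<EOF
[global]
index-url = $index_url
trusted-host = $host
EOF
}
```

The trusted-host line matters only for a plain http index; it can be dropped when the mirror is served over https.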
5. Install virtualenv; for other installation methods see the official site https://virtualenv.pypa.io/en/latest/index.html
tar zvxf virtualenv-15.0.2.tar.gz
cd virtualenv-15.0.2/
/usr/local/python27/bin/python setup.py install
virtualenv also has to be installed once under python2.6; otherwise creating a python2.7 virtualenv from python2.6 will not run.
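6. Running the virtualenv command requires network access, so a proxy still has to be set up; ccproxy is used here
Download: http://www.ccproxy.com/
Set the following environment variables on the Linux machine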
export https_proxy=xxx.xxx.xxx.xxx:808
export http_proxy=xxx.xxx.xxx.xxx:808
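The two exports can be wrapped so that both variables always carry the same value. This is a minimal sketch: set_proxy is a made-up name, and the http:// scheme prefix is an assumption (the values above carry no scheme, but some tools expect one).

```shell
# Hypothetical helper: export both proxy variables for the current shell.
set_proxy() {
    proxy_host=$1
    proxy_port=${2:-808}   # 808 is the port used in the exports above
    export http_proxy="http://$proxy_host:$proxy_port"
    export https_proxy="$http_proxy"
}
```

A call such as set_proxy 192.0.2.10 (a placeholder address) sets both variables; pass a second argument if the proxy listens on another port.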
7. Use virtualenv to create an isolated environment
virtualenv --python=/usr/local/python27/bin/python airflowenv
After running source airflowenv/bin/activate, the shell uses python2.7
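The reason activation switches the interpreter is simple: in essence the activate script records the environment directory in VIRTUAL_ENV and puts its bin directory at the front of PATH, so python resolves to the environment's copy. The sketch below imitates only that core behaviour (the real script also adjusts the prompt and defines deactivate); fake_activate is a made-up name.

```shell
# Simplified imitation of what "source ENV/bin/activate" does.
# Not the real activate script: prompt handling and deactivate are omitted.
fake_activate() {
    VIRTUAL_ENV=$1
    export VIRTUAL_ENV
    PATH="$VIRTUAL_ENV/bin:$PATH"
    export PATH
    hash -r 2>/dev/null   # forget previously cached command locations
}
```

After the real activation, command -v python should likewise print a path inside airflowenv/bin.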
8. Install mysql (not covered here)
9. As root, install mysql-devel: yum install mysql-devel
10. Install mysql-python: download MySQL-python-1.2.5.zip from the official Python site and unzip it
source airflowenv/bin/activate
cd MySQL-python-1.2.5
python setup.py install
11. Install gevent
source airflowenv/bin/activate
pip install gevent
12. Install airflow
source airflowenv/bin/activate
export AIRFLOW_HOME=~/airflow  # change to your own path
pip install airflow
# initialize the database
airflow initdb
13. Edit the $AIRFLOW_HOME/airflow.cfg file with vi
This includes adding the mysql connection and setting the executor; adjust the other parameters as needed
executor = LocalExecutor
sql_alchemy_conn = mysql://username:password@ip:port/dbname
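The same two settings can also be applied without opening an editor. This is a sketch, assuming the generated airflow.cfg already contains an executor = line and a sql_alchemy_conn = line; set_cfg_key is a made-up helper, and the connection values stay the placeholders used above.

```shell
# Hypothetical helper: replace the value of "key = ..." in an ini-style file.
# Limitation: the value must not contain '|', '&' or '\', which sed would
# interpret in the replacement text.
set_cfg_key() {
    file=$1 key=$2 value=$3
    tmp_file="$file.tmp.$$"
    sed "s|^$key *=.*|$key = $value|" "$file" > "$tmp_file" && mv "$tmp_file" "$file"
}
```

It would be called as set_cfg_key "$AIRFLOW_HOME/airflow.cfg" executor LocalExecutor, and once more for sql_alchemy_conn with the quoted connection string. Lines that do not start with the key are left untouched.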
14. Run airflow initdb again; this time the tables are created in mysql
15. Install supervisor and use it to start airflow; if airflow dies, supervisor restarts it automatically
source airflowenv/bin/activate
pip install supervisor
Edit the supervisord.conf file to specify the program to start and the log output path
[program:airflow_scheduler]
command=/xxx/airflowenv/bin/airflow scheduler
stdout_logfile=/tmp/airflow_scheduler.log
Start it with the following command
supervisord -c /xxx/xxx/airflow/supervisord.conf
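A slightly fuller supervisord.conf can be generated with a here-document. This is a sketch under assumptions: write_supervisor_conf is a made-up helper, the [supervisord] section and the environment, autostart, autorestart and redirect_stderr lines are additions to the fragment above, and the paths are passed as arguments because the ones in this post are elided.

```shell
# Hypothetical helper: write a minimal supervisord.conf for the scheduler.
write_supervisor_conf() {
    conf=$1 venv=$2 airflow_home=$3
    cat > "$conf" <<EOF
[supervisord]
logfile=/tmp/supervisord.log

[program:airflow_scheduler]
command=$venv/bin/airflow scheduler
environment=AIRFLOW_HOME="$airflow_home"
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/tmp/airflow_scheduler.log
EOF
}
```

autorestart=true asks supervisor to restart the process whenever it exits, which is the behaviour described in step 15; the environment line passes AIRFLOW_HOME to the scheduler, since supervisor does not read the shell profile.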
1. airflow initdb fails with an error
(airflowenv)[email protected]:/xxx/xxx/airflowenv/bin$ airflow initdb
Traceback (most recent call last):
  File "/xxx/xxx/airflowenv/bin/airflow", line 4, in <module>
    from airflow import configuration
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/airflow/__init__.py", line 31, in <module>
    from airflow.models import DAG
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/airflow/models.py", line 56, in <module>
    from airflow import settings, utils
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/airflow/settings.py", line 76, in <module>
    engine = create_engine(SQL_ALCHEMY_CONN, **engine_args)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/sqlalchemy/engine/__init__.py", line 386, in create_engine
    return strategy.create(*args, **kwargs)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 75, in create
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 92, in dbapi
    return __import__('MySQLdb')
ImportError: No module named MySQLdb
The mysql-python module is missing. Download MySQL-python-1.2.5.zip from the official site, unzip it, then run:
cd MySQL-python-1.2.5
python setup.py install
2. An error is reported when installing mysql-python and then running airflow initdb:
_mysql.c:36:23: error: my_config.h: No such file or directory
_mysql.c:38:19: error: mysql.h: No such file or directory
_mysql.c:39:26: error: mysqld_error.h: No such file or directory
_mysql.c:40:20: error: errmsg.h: No such file or directory
The Linux machine is missing the mysql-devel package. Install it with yum install mysql-devel, or download the mysql-devel rpm manually and install it yourself
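Before rebuilding MySQL-python it is quick to confirm that the headers named in the errors are now present. A sketch; have_headers is a made-up helper, and /usr/include/mysql in the usage note is an assumption about where mysql-devel places its headers on your distribution.

```shell
# Hypothetical helper: succeed only if every named header exists in DIR.
have_headers() {
    dir=$1
    shift
    missing=0
    for header in "$@"; do
        if [ ! -f "$dir/$header" ]; then
            echo "missing: $dir/$header" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the four files from the error output: have_headers /usr/include/mysql my_config.h mysql.h mysqld_error.h errmsg.h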
3. Running airflow webserver -p 8080 to start the webserver fails
Error: class uri 'gevent' invalid or not found:
[Traceback (most recent call last):
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/gunicorn/util.py", line 140, in load_class
    mod = import_module('.'.join(components))
  File "/xxx/xxx/software/python27/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/xxx/xxx/airflowenv/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 22, in <module>
    raise RuntimeError("You need gevent installed to use this worker.")
RuntimeError: You need gevent installed to use this worker.
]
Install gevent with pip: pip install gevent
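Both this error and the first one come down to a module missing from the virtualenv, which can be checked directly before starting airflow. A sketch; module_available is a made-up helper that simply asks the given interpreter to import the module.

```shell
# Hypothetical helper: succeed if INTERPRETER can import MODULE.
module_available() {
    interpreter=$1 module=$2
    "$interpreter" -c "import $module" >/dev/null 2>&1
}
```

With the environment activated, module_available python gevent && echo "gevent ok" and module_available python MySQLdb || echo "MySQLdb missing" cover the two modules that caused trouble here.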