from impala.dbapi import connect
Running this fails because connect cannot be found. Installing only impyla with pip is not enough.
https://github.com/cloudera/impyla
The project page lists the packages it depends on:
Required:
Python 2.6+ or 3.3+
six, bit_array
thrift
Optional:
thrift_sasl==0.2.1 for hive and/or Kerberos support
pandas for conversion to DataFrame objects; but see the Ibis project instead
sqlalchemy for the SQLAlchemy engine
pytest for running tests; unittest2 for testing on Python 2.6
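Before fighting with pip, it can help to see which of these packages the current interpreter can already find. The following is only a minimal sketch: the import names are my assumptions (in particular "bitarray" for the bit_array entry), and missing_modules is a hypothetical helper, not part of impyla.

```python
from importlib.util import find_spec

# Import names are assumptions; "bitarray" is guessed for the "bit_array" entry.
REQUIRED = ["six", "bitarray", "thrift"]
OPTIONAL = ["thrift_sasl", "pandas", "sqlalchemy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```

Run it with the same interpreter that runs the project, otherwise the result describes a different environment.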
The hard part is thrift.
Installing it directly with pip fails (the build that breaks is thriftpy):
ERROR: Complete output from command /Users/didi/.conda/envs/19july/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/private/var/folders/2w/tt1p_4td3yq9xlbl7c2t4jn00000gn/T/pip-install-ogzftbd1/thriftpy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/2w/tt1p_4td3yq9xlbl7c2t4jn00000gn/T/pip-wheel-_dsc9rzz --python-tag cp37:
ERROR: running bdist_wheel
The [wheel] section is deprecated. Use [bdist_wheel] instead.
......
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for thriftpy
......
note: 'curexc_value' declared here
PyObject *curexc_value;
^
thriftpy/transport/cybase.c:3189:22: error: no member named 'exc_traceback' in 'struct _ts'; did you mean 'curexc_traceback'?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
Fix: pip install cython
After that, pip install thriftpy goes through.
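Then another error: ModuleNotFoundError: No module named 'thrift_sasl'
This package is in the Optional list above, but apparently it is not optional here; it is required too. Installing sasl fails as well: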
ERROR: Complete output from command /Users/didi/.conda/envs/19july/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/private/var/folders/2w/tt1p_4td3yq9xlbl7c2t4jn00000gn/T/pip-install-cw5r7bt2/sasl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/2w/tt1p_4td3yq9xlbl7c2t4jn00000gn/T/pip-wheel-2b5qj55e --python-tag cp37:
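thrift_sasl failed too. At this point I switched to installing with conda and found that it resolved many of the dependencies on its own.
However, when I first installed impala (impyla) with conda, it did not pull in the dependencies.
One last point: see the project in the left-hand panel? Packages must be installed into the project's own environment. I used to install everything into base, which is actually wrong.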
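To check which environment a script is really running in, something like the sketch below can be dropped at the top of it. It assumes conda has set the CONDA_DEFAULT_ENV variable on activation, and falls back to the last path component of sys.prefix otherwise; active_env_name is a hypothetical helper name.

```python
import os
import sys


def active_env_name(environ=os.environ, prefix=sys.prefix):
    """Return the conda env name if set, else the last component of the prefix."""
    name = environ.get("CONDA_DEFAULT_ENV")
    if name:
        return name
    return os.path.basename(prefix.rstrip(os.sep))


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("environment:", active_env_name())
```

If this prints base while the IDE shows a project environment, the packages are going into the wrong place.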
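Once everything was in place, I got a hiveserver2 error, then an execution error inside hive, impala async=True ...
But clever me had already figured out that this was not a package installation problem. Python 3.7 introduced a change which made async a reserved keyword: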
>>> from impala.dbapi import connect
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "../.venv/lib/python3.7/site-packages/impala/dbapi.py", line 28, in <module>
import impala.hiveserver2 as hs2
File "../.venv/lib/python3.7/site-packages/impala/hiveserver2.py", line 340
async=True)
^
SyntaxError: invalid syntax
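The same failure can be reproduced without impyla. This small sketch only uses the standard library: keyword.iskeyword reports whether async is reserved on the running interpreter, and compile shows that passing it as a keyword argument no longer parses. fails_to_compile is a hypothetical helper name.

```python
import keyword
import sys


def fails_to_compile(source):
    """Return True if `source` is rejected by the parser as an expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return True
    return False


if __name__ == "__main__":
    print("python:", sys.version_info[:2])
    print("async is a keyword:", keyword.iskeyword("async"))
    print("f(async=True) rejected:", fails_to_compile("f(async=True)"))
```

So any library that still uses async as an argument name cannot even be imported on Python 3.7 or later, no matter how it was installed.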
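Sure enough, Baidu turned up nothing, but Google found it: https://github.com/cloudera/impyla/issues/312
The problem is impyla's support for Python 3.7. The latest release did not work for me, so this version has to be pinned: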
pip install impyla==0.15a1
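To confirm that the pinned version is the one actually installed in the active environment, a check along these lines should work. It is a sketch that assumes Python 3.8 or newer, where importlib.metadata is in the standard library (on 3.7 itself, pip show impyla gives the same information); installed_version is a hypothetical helper name.

```python
from importlib import metadata  # standard library from Python 3.8 on


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("impyla:", installed_version("impyla"))
```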
Problem solved. Now run the query and generate the file:
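import pandas as pd

# impala_conn is assumed to be a connection created earlier with impala.dbapi.connect(...)
info_sql = impala_conn.cursor()
# info_sql.execute(
# '''set mapreduce.job.queuename=root.a-a.a-ai;''')
# Run the HQL statement (sql_sequnce holds the query text)
info_sql.execute(sql_sequnce)
# Fetch the result rows
info_data = info_sql.fetchall()
dt1 = pd.DataFrame(info_data, columns=['a', 'a', 'a', 'a', 'a'])
# Keep only the first 1000 rows and write them out as CSV
dt2 = dt1[:1000]
dt2.to_csv('/Users/a/gongcheng/tyty.csv', encoding='utf-8-sig')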
I replaced all the sensitive values with a's.
Solved.
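As a variation on the export step above, the column names can be read from the cursor instead of being typed by hand. The sketch below assumes the cursor follows PEP 249, where cursor.description holds one tuple per column with the name first. It uses sqlite3 purely as a stand-in so it runs anywhere; I have not verified it against an impyla cursor. write_rows_as_csv is a hypothetical helper name.

```python
import csv
import io
import sqlite3


def write_rows_as_csv(cursor, out, limit=1000):
    """Write a header plus up to `limit` rows from an executed DB-API cursor."""
    writer = csv.writer(out)
    # PEP 249: each description entry is a tuple whose first item is the column name.
    writer.writerow([col[0] for col in cursor.description])
    rows = cursor.fetchmany(limit)
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real connection
    cur = conn.cursor()
    cur.execute("SELECT 1 AS id, 'x' AS label")
    buf = io.StringIO()
    write_rows_as_csv(cur, buf)
    print(buf.getvalue())
```

For a real file, open it with newline='' and encoding='utf-8-sig' to get the same encoding as the to_csv call.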