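I recently worked on an upgrade of our data ETL system, which meant migrating the existing ETL jobs to the new system and verifying that the data stayed accurate.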
Contents
Installing the dependencies
Core code
Pitfalls
Dependency versions
My machine runs Windows, so I could only connect through impyla; on other systems the PyHive package is also an option.
pip install impyla
pip install pure-sasl
pip install thrift_sasl
pip install thrift
pip install sasl
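Note:
Installing sasl directly with pip usually fails. Instead, download the wheel that matches your Python version from https://www.lfd.uci.edu/~gohlke/pythonlibs/#sasl and install that. (At the time of writing the wheels only go up to Python 3.7; newer versions cannot be installed, and whether they will be supported later is unknown.)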
from impala.dbapi import connect
from pandas.testing import assert_frame_equal
import pandas as pd
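# Connect to Hive (db_name, user and password are your own settings)
hive_conn = connect(host='127.0.0.1', port=12446, database=db_name,
                    user=user, password=password, auth_mechanism='PLAIN')
cursor = hive_conn.cursor()
# Query the row count of one partition.
# date is interpolated as-is, so it must carry its own quotes if the partition column is a string.
cursor.execute('select count(1) from %s where %s = %s' % (table_name, pt_col, date))
ret = cursor.fetchall()
for j in ret:
    print(j)
ret = pd.DataFrame(ret)
print(ret)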
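Since the goal is to verify the migrated data, and assert_frame_equal is already imported, query results from the old and new systems can be compared as DataFrames. The following is only a minimal sketch, assuming both result sets fit in memory and share a key column; frames_match and the sample data are hypothetical and not part of the original job.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(old_df, new_df, key_cols):
    """Return (True, '') if both frames hold the same rows, else (False, message)."""
    # Row order coming back from Hive is not guaranteed, so sort on the keys first.
    left = old_df.sort_values(key_cols).reset_index(drop=True)
    right = new_df.sort_values(key_cols).reset_index(drop=True)
    try:
        # check_dtype=False tolerates int/float differences between the two systems.
        assert_frame_equal(left, right, check_dtype=False)
    except AssertionError as exc:
        return False, str(exc)
    return True, ""


# Made-up sample data standing in for the old and new ETL outputs.
old = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
new = pd.DataFrame({"id": [3, 1, 2], "amount": [30.0, 10.0, 20.0]})
print(frames_match(old, new, ["id"]))
```

In practice the two frames would come from pd.DataFrame(cursor.fetchall()) on each system, with the same column order in both queries.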
1. TypeError: can't concat str to bytes
The traceback points to line 94 of \lib\site-packages\thrift_sasl\__init__.py:
header = struct.pack(">BI", status, len(body))
self._trans.write(header + body)
Change it to:
if isinstance(body, str):
    # Encode first so that len(body) below counts bytes, not characters
    body = body.encode()
header = struct.pack(">BI", status, len(body))
self._trans.write(header + body)
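The str/bytes mismatch behind this error can be reproduced outside thrift_sasl. The sketch below is a standalone illustration, not the library's code: build_sasl_frame is a hypothetical helper that mirrors the two patched lines and encodes a str body before measuring its length.

```python
import struct


def build_sasl_frame(status, body):
    """Prefix body with a 1-byte status and a 4-byte big-endian length."""
    if isinstance(body, str):
        # struct.pack returns bytes, so a str body must be encoded before concatenation.
        body = body.encode()
    header = struct.pack(">BI", status, len(body))
    return header + body


# Both a str and a bytes body now produce the same frame.
print(build_sasl_frame(1, "PLAIN"))
print(build_sasl_frame(1, b"PLAIN"))

# Without the encode step the concatenation fails with a TypeError.
try:
    struct.pack(">BI", 1, 5) + "PLAIN"
except TypeError as exc:
    print("TypeError:", exc)
```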
2. thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'
This error comes from the version of the sasl package. Uninstall the sasl package that was installed earlier, then reinstall thrift-sasl at version 0.2.0.
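3. ThriftParserError: ThriftPy does not support generating module with path in protocol 'c'
The traceback points to \Lib\site-packages\thriftpy\parser\parser.py. On Windows, the drive letter of an absolute path such as C:\... is parsed as the URL scheme, so the check below rejects it: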
if url_scheme == '':
    with open(path) as fh:
        data = fh.read()
elif url_scheme in ('http', 'https'):
    data = urlopen(path).read()
else:
    raise ThriftParserError('ThriftPy does not support generating module '
                            'with path in protocol \'{}\''.format(
                                url_scheme))
Change it to:
if url_scheme == '':
    with open(path) as fh:
        data = fh.read()
# Windows drive letters; extend the tuple if you use other drives
elif url_scheme in ('c', 'd', 'e', 'f'):
    with open(path) as fh:
        data = fh.read()
elif url_scheme in ('http', 'https'):
    data = urlopen(path).read()
else:
    raise ThriftParserError('ThriftPy does not support generating module '
                            'with path in protocol \'{}\''.format(
                                url_scheme))
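To see why a Windows path ends up with the "protocol" c, it helps to look at what urlparse returns for it. The sketch below is an illustration under my own assumptions rather than thriftpy's code: classify_path is a hypothetical helper that treats any single-letter scheme as a drive letter instead of listing the drives one by one.

```python
from urllib.parse import urlparse


def classify_path(path):
    """Classify a path as 'local', 'remote' or 'unsupported' by its URL scheme."""
    scheme = urlparse(path).scheme
    # An empty scheme is a plain path; a single letter is a Windows drive such as C:.
    if scheme == "" or len(scheme) == 1:
        return "local"
    if scheme in ("http", "https"):
        return "remote"
    return "unsupported"


# The drive letter is lower-cased and reported as the scheme.
print(urlparse(r"C:\idl\hive.thrift").scheme)

for sample in (r"C:\idl\hive.thrift", "idl/hive.thrift",
               "https://example.com/hive.thrift", "ftp://example.com/hive.thrift"):
    print(sample, "->", classify_path(sample))
```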
package | version |
thrift | 0.13.0 |
thrift-sasl | 0.2.1 |
thriftpy | 0.3.9 |
thriftpy2 | 0.4.0 |
bit-array | 0.1.0 |
bitarray | 2.2.3 |
pure-sasl | 0.6.2 |
impyla | 0.15a1 |
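To compare your own environment against the versions listed above, the installed versions can be read from package metadata. This is a small sketch, assuming the distribution names match the ones in the list; installed_versions is a hypothetical helper, and the pkg_resources fallback is only there for Python 3.7, where importlib.metadata is not available.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: fall back to setuptools' pkg_resources
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


packages = ["thrift", "thrift-sasl", "thriftpy", "thriftpy2",
            "bit-array", "bitarray", "pure-sasl", "impyla"]
for name, ver in installed_versions(packages).items():
    print(name, "|", ver if ver is not None else "not installed")
```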