1. Modules to install
pip install sasl
pip install thrift
pip install thrift-sasl
pip install pyhive
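thrift and sasl should be the latest versions.
If pip fails while installing sasl, the cause is that the gcc/C++ build packages are missing. Installing them fixes the problem: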
yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64
pip install pyhs2
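After installation, one quick way to confirm that the packages are visible to the current interpreter is to look them up without importing them. The following is only a minimal sketch: the helper name `missing_modules` is made up for this example, and the import names (`sasl`, `thrift`, `thrift_sasl`, `pyhive`) are assumed to correspond to the pip packages listed above.

```python
import importlib.util

# Import names assumed to correspond to the pip packages listed above.
REQUIRED = ["sasl", "thrift", "thrift_sasl", "pyhive"]


def missing_modules(names):
    """Return the names that the current interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If sasl shows up as missing, the yum command above is the first thing to try before rerunning pip.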
2. Code
# -*- encoding=utf-8 -*-
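import time
time1 = time.time()  # start timing before the heavier imports, as in the original
import pandas as pd
from pyhive import hive
# Connect to HiveServer2 and open a cursor; the port is passed as an integer
cursor = hive.connect(host='XXXXXXXXXXXX', port=10000, username='dongli').cursor()
sql = """
put your SQL script here
"""
cursor.execute(sql)
# Load every result row into a DataFrame
data = pd.DataFrame(cursor.fetchall())
print(data.head())
time2 = time.time()
print('Total time: ' + str(time2 - time1) + 's')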
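A DataFrame built from `fetchall()` alone has numeric column labels. The column names can usually be recovered from `cursor.description`, which is part of the Python DB-API (PEP 249). The sketch below illustrates the idea; the helper name `fetch_with_columns` is hypothetical, and `sqlite3` stands in for Hive purely so that the example runs without a cluster.

```python
import sqlite3


def fetch_with_columns(cursor):
    """Return (column_names, rows) for the query last executed on a DB-API cursor."""
    # Per PEP 249, each entry of cursor.description starts with the column name.
    names = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return names, rows


# sqlite3 is only a stand-in here; a pyhive cursor would be used the same way.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("SELECT 1 AS id, 'a' AS label")
names, rows = fetch_with_columns(cur)
print(names)
print(rows)
conn.close()
```

With a pyhive cursor, the result could then be passed on as `pd.DataFrame(rows, columns=names)`.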
Method 2: connecting with Impyla
# -*- encoding=utf-8 -*-
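from impala.dbapi import connect
import pandas as pd
# Open the connection using PLAIN (SASL) authentication
conn = connect(host='XXXXXXXXX', port='XXXX', database='XXX', auth_mechanism='PLAIN')
cur = conn.cursor()
cur.execute("show databases")
# Load every result row into a DataFrame
data = pd.DataFrame(cur.fetchall())
# Release the cursor and the connection
cur.close()
conn.close()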
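For large result sets, `fetchall()` loads everything into memory at once, and the explicit `close()` calls are skipped if an error occurs before them. A possible alternative, sketched here with standard DB-API calls only, is to read rows in chunks with `fetchmany()` and let `contextlib.closing` handle the cleanup. The helper name `iter_rows` and the chunk size are assumptions for illustration, and `sqlite3` again replaces the Hive connection so the example is runnable on its own.

```python
import sqlite3
from contextlib import closing


def iter_rows(cursor, chunk_size=1000):
    """Yield rows from a DB-API cursor, fetching chunk_size rows at a time."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        for row in chunk:
            yield row


# closing() calls .close() on exit, even if the query raises an exception.
with closing(sqlite3.connect(":memory:")) as conn, closing(conn.cursor()) as cur:
    cur.execute("CREATE TABLE t (n INTEGER)")
    cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
    cur.execute("SELECT n FROM t ORDER BY n")
    collected = list(iter_rows(cur, chunk_size=2))

print(collected)
```

The same pattern should carry over to a pyhive or Impyla cursor, since both follow the DB-API, but that has not been verified here against a real cluster.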
Reference link: https://blog.csdn.net/dendi_hust/article/details/97294198