Connecting to Hive from Python

1. Modules to install

pip install sasl
pip install thrift
pip install thrift-sasl
pip install pyhive

thrift and sasl should be the latest versions.

If pip fails while installing sasl, the cause is missing gcc/C++ build packages. Install them and the problem goes away:

yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64  
pip install pyhs2  
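
Before moving on to the connection code, it can help to confirm that the packages above are actually importable in the current interpreter. The following is a minimal sketch, not part of the original post: the helper name missing_modules is hypothetical, and it assumes the import names sasl, thrift, thrift_sasl and pyhive that correspond to the pip packages listed above.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed to match the pip packages listed above.
REQUIRED = ["sasl", "thrift", "thrift_sasl", "pyhive"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If sasl shows up as missing after a failed pip install, the yum command above is the first thing to try.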

2. Code

Method 1: connecting with PyHive

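# -*- coding: utf-8 -*-
import time

# Start the timer before the heavier imports so they count toward the total.
time1 = time.time()

import pandas as pd
from pyhive import hive

# Replace the host placeholder with your HiveServer2 host; the port is an integer.
cursor = hive.connect(host='XXXXXXXXXXXX', port=10000, username='dongli').cursor()

sql = """
put your SQL script here
"""
cursor.execute(sql)

# Load every returned row into a DataFrame.
data = pd.DataFrame(cursor.fetchall())

print(data.head())

time2 = time.time()
print('Total time elapsed: ' + str(time2 - time1) + 's')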
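
The DataFrame built from fetchall() alone has numeric column labels. Any DB-API 2.0 cursor, including the PyHive one, exposes the result column names through cursor.description, so the rows can be labelled. The sketch below is an illustration only: fetch_records and short_name are hypothetical helper names, sqlite3 stands in for Hive so the example runs without a cluster, and the table-prefix stripping assumes Hive returns names such as "table.column", which depends on the server configuration.

```python
import sqlite3


def short_name(column_name):
    """Drop a leading 'table.' prefix, if any (assumed Hive naming)."""
    return column_name.split(".")[-1]


def fetch_records(cursor, sql):
    """Run `sql` on a DB-API cursor and return a list of dicts keyed by column name."""
    cursor.execute(sql)
    # description is a sequence of 7-item tuples; item 0 is the column name.
    names = [short_name(col[0]) for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# sqlite3 is only a stand-in here; with PyHive, pass the Hive cursor instead.
demo_conn = sqlite3.connect(":memory:")
records = fetch_records(demo_conn.cursor(), "SELECT 1 AS id, 'a' AS name")
demo_conn.close()
print(records)
```

Passing the resulting list to pd.DataFrame(records) gives a DataFrame whose columns carry the query's column names.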

Method 2: connecting with Impyla

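# -*- coding: utf-8 -*-
from impala.dbapi import connect
import pandas as pd

# Replace the placeholders; port is the HiveServer2 port number.
conn = connect(host='XXXXXXXXX', port='XXXX', database='XXX', auth_mechanism='PLAIN')

cur = conn.cursor()
cur.execute("show databases")

# Load every returned row into a DataFrame.
data = pd.DataFrame(cur.fetchall())

# Release the cursor and the connection when done.
cur.close()
conn.close()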
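
In the script above, the close() calls are skipped if the query raises an exception. One possible way to guarantee cleanup is contextlib.closing from the standard library, which calls close() when the block exits. This is a sketch under assumptions: run_query is a hypothetical helper, and sqlite3 again stands in for the Hive connection so the example runs anywhere.

```python
import sqlite3
from contextlib import closing


def run_query(connect_fn, sql):
    """Open a connection via `connect_fn`, run `sql`, and always close both handles."""
    with closing(connect_fn()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            return cur.fetchall()


# Stand-in connection factory; with Impyla this would be a function that
# calls connect(...) with your own host, port and database.
rows = run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1")
print(rows)
```

The connection factory is passed as a function so the connection is created inside the with block and closed by it.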

Reference: https://blog.csdn.net/dendi_hust/article/details/97294198
