Accessing Hive with a Python client

This works on both Linux and Windows.

1. Python with HiveServer

#!/usr/bin/python2.7
# Start the server first: hive --service hiveserver >/dev/null 2>/dev/null &
# The Thrift bindings ship with Hive, e.g. /opt/cloudera/parcels/CDH/lib/hive/lib/py
import sys
sys.path.append('C:/hadoop_jar/py')
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift.transport import TSocket
from thrift import Thrift
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

if __name__ == '__main__':
    try:
        socket = TSocket.TSocket('10.70.50.111', 10000)
        transport = TTransport.TBufferedTransport(socket)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        sql = 'select * from test'
        transport.open()
        try:
            client.execute(sql)
            with open('C:/Users/DWJ/Desktop/python2hive.txt', 'w') as out_file:
                # Call fetchOne() once per row and stop at the first empty result
                row = client.fetchOne()
                while row:
                    out_file.write(row + '\n')
                    row = client.fetchOne()
        finally:
            # Close the connection even if the query fails
            transport.close()
    except HiveServerException as ex:
        print('%s' % ex.message)
    except Thrift.TException as tx:
        print('%s' % tx.message)

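The packages in C:/hadoop_jar/py come from the py directory that ships with the Hive installation, for example /opt/cloudera/parcels/CDH/lib/hive/lib/py. Copy that directory to the client machine and add it to Python's module search path.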
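The path setup can also be done defensively. The following is a minimal sketch, not part of the original script: the helper name add_hive_py_path is hypothetical, and it assumes the copied directory contains a hive_service subdirectory (the package imported in the script).

```python
import os
import sys


def add_hive_py_path(candidates):
    """Append the first candidate directory that contains hive_service to sys.path.

    Returns the directory that was chosen, or None if no candidate qualifies.
    """
    for path in candidates:
        # The script imports hive_service, so look for that package directory
        if os.path.isdir(os.path.join(path, 'hive_service')):
            if path not in sys.path:
                sys.path.append(path)
            return path
    return None
```

A call such as add_hive_py_path(['C:/hadoop_jar/py', '/opt/cloudera/parcels/CDH/lib/hive/lib/py']) would let the same script run on both the Windows and the Linux layout mentioned here.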

2. Python with HiveServer2

#!/usr/bin/python2.7
# Start the server first: hive --service hiveserver2 >/dev/null 2>/dev/null &
# Install pyhs2; it first needs cyrus-sasl-devel, gcc, libxml2-devel and libxslt-devel
# HiveServer2 differs from HiveServer in how it handles authentication
import pyhs2
with pyhs2.connect(host='xx.xx.xx.xxx',
                   port=10000,
                   authMechanism="NOSASL",
                   user='test',
                   password='testdvlp',
                   database='default') as conn:
    with conn.cursor() as cur:
        #Show databases
        print cur.getDatabases()
        #Execute query
        cur.execute("select * from test")
        #Return column info from query
        print cur.getSchema()
        #Fetch table results
        for i in cur.fetch():
            print i

The value of authMechanism depends on the configuration in hive-site.xml:

<name>hive.server2.authentication</name>
<value>NOSASL</value>

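The default is NONE; other possible values are 'NOSASL', 'PLAIN', 'KERBEROS', and 'LDAP'.
Also, installing pyhs2 on Windows fails because its sasl dependency cannot be downloaded. Downloading the matching Windows whl package from http://www.lfd.uci.edu/~gohlke/pythonlibs/ and installing it resolves this.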
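To find out which setting the server actually uses, hive.server2.authentication can be read straight from hive-site.xml. This is a rough sketch using only the standard library: the helper name read_hive_property and the SAMPLE text are illustrative assumptions, and a real file would be read from wherever the Hive installation keeps its configuration.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_text, name):
    """Return the <value> of the named <property> in hive-site.xml text, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter('property'):
        if (prop.findtext('name') or '').strip() == name:
            value = prop.findtext('value')
            return value.strip() if value is not None else None
    return None


# Illustrative hive-site.xml content (an assumption, not a real configuration)
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.authentication</name>
    <value>NOSASL</value>
  </property>
</configuration>"""

auth = read_hive_property(SAMPLE, 'hive.server2.authentication')
# A missing property means the server falls back to its default, NONE
print(auth or 'NONE')
```

The returned string can then be compared with the authMechanism argument passed to pyhs2.connect.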

3. Both approaches have one thing in common: the Hive server must be running.

hive --service hiveserver

or

hive --service hiveserver2

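If an error like the following appears:
[Image 1: screenshot of the error]
check how the port is being used with:

netstat -apn | grep 10000

If the command prints a matching entry, port 10000 is already open. If the port is occupied by another process, start the server on a different port: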

hive --service hiveserver -p 10008
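The same port check can be done from the Python client itself, which also covers Windows, where the netstat flags differ. A minimal sketch with the standard socket module follows; the function name is_port_open is hypothetical and the timeout is an arbitrary choice.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0
    except socket.error:
        # Name resolution failures and timeouts are treated as "not reachable"
        return False
    finally:
        sock.close()
```

Calling is_port_open('10.70.50.111', 10000) before transport.open() tells a server that is not listening apart from a query that fails later. It only shows that something accepts connections on the port, not that the listener is a Hive server.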

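Also, sometimes the connection succeeds but client.execute(sql) then hangs indefinitely, producing neither an error nor a result. I have not yet found the cause.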
