Errors when Python connects to HiveServer2 through the pyhs2 package to operate a Hive database
Running the Python script that connects to Hive fails with:
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
or
Required field 'sessionHandle' is unset! Struct:TExecuteStatementReq(sessionHandle:null, statement:USE default, confOverlay:{})
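The HiveServer2 log shows: org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
Preface:
Recently I started working on big-data projects. I was curious about the related software, such as Hadoop, HBase, Hive, Thrift, and ZooKeeper, so in spare moments at work I set up a pseudo-distributed environment for them on my local Mac. A few days ago I finished a small Python program that operates HBase through Thrift, and then wanted to operate a Hive database from Python in the same way. The program eventually worked, but the path was bumpy, mainly because of the problem described below. Searching Baidu turned up few answers, and the ones I found were confusing. It cost me almost a full day, and I finally solved it with help from Luo. This post only documents how the problem was solved. Thanks again to Luo and Feng for their help. The lesson: always make full use of the logs!
Enough chatter; on to the main text. This post first states the problem, then traces the cause of the error, then gives the cause analysis and the fix, and finally adds a note on the difference between HiveServer and HiveServer2 in Hive.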
localhost:bin a6$ pwd
/Users/a6/Applications/apache-hive-2.3.0-bin/bin
localhost:bin a6$ hive --service hiveserver2 &
localhost:bin a6$ sudo pip install pyhs2
Password:
The directory '/Users/a6/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/a6/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied: pyhs2 in /Library/Python/2.7/site-packages
Requirement already satisfied: sasl in /Library/Python/2.7/site-packages (from pyhs2)
Requirement already satisfied: thrift in /Library/Python/2.7/site-packages/thrift-0.10.0-py2.7-macosx-10.12-intel.egg (from pyhs2)
Requirement already satisfied: six in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from sasl->pyhs2)
# Python 2 script (pyhs2 targets Python 2)
import pyhs2

with pyhs2.connect(host='localhost',
                   port=10000,
                   authMechanism="NOSASL",
                   user='a6',
                   password=''
                   # password='anonymous'
                   ) as conn:
    with conn.cursor() as cur:
        # Show databases
        print "connect hive database success"
        print cur.getDatabases()
        print "read data success"
/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/a6/Downloads/PycharmProjects/test_use_hbase_by_thrift/test11.py
Traceback (most recent call last):
File "/Users/a6/Downloads/PycharmProjects/test_use_hbase_by_thrift/test11.py", line 13, in <module>
print "sucess"
File "/Library/Python/2.7/site-packages/pyhs2/connections.py", line 58, in __exit__
self.close()
File "/Library/Python/2.7/site-packages/pyhs2/connections.py", line 78, in close
self.client.CloseSession(req)
File "/Library/Python/2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 184, in CloseSession
return self.recv_CloseSession()
File "/Library/Python/2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 195, in recv_CloseSession
(fname, mtype, rseqid) = self._iprot.readMessageBegin()
File "build/bdist.macosx-10.12-intel/egg/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
File "build/bdist.macosx-10.12-intel/egg/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
File "build/bdist.macosx-10.12-intel/egg/thrift/transport/TTransport.py", line 60, in readAll
File "build/bdist.macosx-10.12-intel/egg/thrift/transport/TTransport.py", line 161, in read
File "build/bdist.macosx-10.12-intel/egg/thrift/transport/TSocket.py", line 132, in read
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
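"TSocket read 0 bytes" only says that the server closed the connection without answering, so the client-side traceback cannot explain the failure on its own. Before turning to the server log, one cheap thing to rule out is that nothing is listening on the port at all. The following is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical, and host and port are the values passed to `pyhs2.connect()` in this post.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat all of these as "not open".
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # 10000 is the port used in the pyhs2.connect() call in this post.
    if port_is_open("localhost", 10000):
        print("port 10000 accepts connections; check the HiveServer2 log next")
    else:
        print("nothing is listening on port 10000; is HiveServer2 running?")
```

If the port is open and the error still appears, as it did here, the cause is on the server side and the HiveServer2 log is the place to look.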
localhost:conf a6$ pwd
/Users/a6/Applications/apache-hive-2.3.0-bin/conf
localhost:conf a6$ vi hive-site.xml
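<property>
  <name>hive.server2.webui.host</name>
  <value>0.0.0.0</value>
  <description>The host address the HiveServer2 WebUI will listen on</description>
</property>
<property>
  <name>hive.server2.webui.port</name>
  <value>10002</value>
  <description>The port the HiveServer2 WebUI will listen on. This can be set to 0 or a negative integer to disable the web UI</description>
</property>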
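The two settings above tell you where the HiveServer2 web UI listens. As a rough sketch, they can also be read programmatically instead of searching the file in an editor. The function name `read_hadoop_properties` and the inline `SAMPLE` text are illustrative assumptions; a real hive-site.xml would be loaded from disk and contains many more properties.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Illustrative stand-in for the relevant part of hive-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.webui.host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>
</configuration>"""

props = read_hadoop_properties(SAMPLE)
print("web UI at %s:%s" % (props["hive.server2.webui.host"],
                           props["hive.server2.webui.port"]))
```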
2017-10-12T14:20:45,755 INFO [HiveServer2-Handler-Pool: Thread-42] session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: Thread-42
2017-10-12T14:20:45,760 WARN [HiveServer2-Handler-Pool: Thread-42] thrift.ThriftCLIService: Error opening session:
org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:419) ~[hive-service-2.3.0.jar:2.3.0]
at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:362) ~[hive-service-2.3.0.jar:2.3.0]
at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:193) ~[hive-service-2.3.0.jar:2.3.0]
at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:440) ~[hive-service-2.3.0.jar:2.3.0]
at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:322) ~[hive-service-2.3.0.jar:2.3.0]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1377) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1362) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-2.3.0.jar:2.3.0]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.0.jar:2.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:89) ~[hive-service-2.3.0.jar:2.3.0]
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.0.jar:2.3.0]
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.0.jar:2.3.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692) ~[hadoop-common-2.6.5.jar:?]
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.0.jar:2.3.0]
at com.sun.proxy.$Proxy37.open(Unknown Source) ~[?:?]
at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:410) ~[hive-service-2.3.0.jar:2.3.0]
... 13 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
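The useful part of this long stack trace is the single phrase "User: a6 is not allowed to impersonate anonymous", which is repeated in every "Caused by" line. A small sketch for pulling that phrase out of a large log is shown below. The regular expression is written against the message format seen in this log, and the function name `find_impersonation_failures` is hypothetical.

```python
import re

# Matches the Hadoop authorization message as it appears in the log above.
IMPERSONATION_RE = re.compile(r"User: (\S+) is not allowed to impersonate (\S+)")


def find_impersonation_failures(log_text):
    """Return distinct (proxy_user, impersonated_user) pairs in first-seen order."""
    seen = []
    for match in IMPERSONATION_RE.finditer(log_text):
        pair = (match.group(1), match.group(2))
        if pair not in seen:
            seen.append(pair)
    return seen


# One line copied from the HiveServer2 log in this post.
SAMPLE_LOG = (
    "Caused by: java.lang.RuntimeException: "
    "org.apache.hadoop.ipc.RemoteException("
    "org.apache.hadoop.security.authorize.AuthorizationException): "
    "User: a6 is not allowed to impersonate anonymous\n"
)

for proxy_user, target_user in find_impersonation_failures(SAMPLE_LOG):
    print("%s may not act on behalf of %s" % (proxy_user, target_user))
```

The first captured name is the user that needs proxy-user permission in the Hadoop configuration.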
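III. Cause analysis and solution
When Python connects to Hive and reports the following:
Required field 'sessionHandle' is unset! Struct:TExecuteStatementReq(sessionHandle:null, statement:USE default, confOverlay:{})
it means the parameters used to connect to Hive are wrong.
In my case the problem was the Hive account, which did not have sufficient permissions.
Check the Hive username and the other connection settings.
Another possible cause is that the project's hive-jdbc version does not match the server's version; switching to the version the server uses fixes it. PS: early Hive releases had many bugs, so using the latest release is recommended.
The cause of my error was that the user running the Hive query was not the same as the user under which Hadoop and Hive were configured.
2. Solution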
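The error messages seen in this post can be collected into a small lookup table, which makes it easier to decide what to check first. This is only a sketch: the names `HINTS` and `suggest_checks` are hypothetical, and the hints restate what this post found or, for the "Missing version" entry, a commonly reported cause rather than something confirmed here.

```python
# (signature, hint) pairs; signatures are substrings of the errors in this post.
HINTS = [
    ("TSocket read 0 bytes",
     "The server closed the connection; read the HiveServer2 log for the reason."),
    ("Required field 'sessionHandle' is unset",
     "No session was opened; check the username and other connection settings."),
    ("Missing version in readMessageBegin",
     "Often reported when client and server transport settings differ "
     "(assumption, not verified in this post)."),
    ("is not allowed to impersonate",
     "The user running HiveServer2 lacks Hadoop proxy-user permission."),
]


def suggest_checks(error_text):
    """Return the hints whose signature occurs in the given error text."""
    return [hint for signature, hint in HINTS if signature in error_text]


for hint in suggest_checks("User: a6 is not allowed to impersonate anonymous"):
    print(hint)
```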
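<property>
  <name>hadoop.proxyuser.a6.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.a6.groups</name>
  <value>*</value>
</property>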
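The property names follow the pattern hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups, where <user> is the account named first in the "is not allowed to impersonate" message (a6 here). These settings normally go into Hadoop's core-site.xml. The sketch below generates the two entries for an arbitrary user; the function names are hypothetical, and the `*` wildcard is kept from this post, where it suits a local pseudo-distributed setup but is more permissive than a shared cluster would usually want.

```python
from xml.sax.saxutils import escape


def proxyuser_properties(user, hosts="*", groups="*"):
    """Return the two hadoop.proxyuser.<user>.* settings as (name, value) pairs."""
    return [
        ("hadoop.proxyuser.%s.hosts" % user, hosts),
        ("hadoop.proxyuser.%s.groups" % user, groups),
    ]


def to_property_xml(pairs):
    """Render (name, value) pairs as Hadoop <property> elements."""
    blocks = []
    for name, value in pairs:
        blocks.append(
            "<property>\n"
            "  <name>%s</name>\n"
            "  <value>%s</value>\n"
            "</property>" % (escape(name), escape(value))
        )
    return "\n".join(blocks)


# "a6" is the user reported in the HiveServer2 log in this post.
print(to_property_xml(proxyuser_properties("a6")))
```

Hadoop has to be restarted after the change so that the new proxy-user settings are picked up.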
localhost:hadoop a6$ pwd
/Users/a6/Applications/hadoop-2.6.5/etc/hadoop
localhost:hadoop a6$ sh ../../sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
17/10/12 15:07:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /Users/a6/Applications/hadoop-2.6.5/logs/hadoop-a6-namenode-localhost.out
localhost: starting datanode, logging to /Users/a6/Applications/hadoop-2.6.5/logs/hadoop-a6-datanode-localhost.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /Users/a6/Applications/hadoop-2.6.5/logs/hadoop-a6-secondarynamenode-localhost.out
17/10/12 15:08:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /Users/a6/Applications/hadoop-2.6.5/logs/yarn-a6-resourcemanager-localhost.out
localhost: starting nodemanager, logging to /Users/a6/Applications/hadoop-2.6.5/logs/yarn-a6-nodemanager-localhost.out