Connecting Python to HiveServer2 with pyhs2: fixing thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

When Python connects to HiveServer2 through the pyhs2 package to work with a Hive database, the script fails with the following error:

thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

or

Required field 'sessionHandle' is unset! Struct:TExecuteStatementReq(sessionHandle:null, statement:USE default, confOverlay:{})

The HiveServer2 log shows the following errors:
2017-10-12T14:24:03,540  WARN [HiveServer2-Handler-Pool: Thread-39] service.CompositeService: Failed to open session
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
………………
ERROR [HiveServer2-Handler-Pool: Thread-39] server.TThreadPoolServer: Thrift error occurred during processing of message.

org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?

Preface

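I recently started working on big-data projects and got curious about the software involved (Hadoop, HBase, Hive, Thrift, ZooKeeper and so on), so in my spare time at work I set up a pseudo-distributed environment for them on my local Mac. A few days ago I finished a small Python program that talks to HBase over Thrift without trouble, and next I wanted to do the same for Hive. I got it working in the end, but not without detours, and the main obstacle was the problem described here. Searching Baidu turned up few answers, and the ones I found were baffling. It cost me almost a full day until Luo helped me solve it. This post only records how the problem was solved. Thanks again to Luo and Feng for their help. The lesson: make full use of the logs!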

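Enough chit-chat. This post first states the problem, then tracks down the cause of the error, then gives the root-cause analysis and the fix, and finally adds a note on the difference between HiveServer and HiveServer2 in Hive.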

I. The problem
1. After starting MySQL, Hadoop and Hive, start HiveServer2 last.
To connect to Hive from Python in HiveServer2 mode, start HiveServer2 and leave it running in the background:
localhost:bin a6$ pwd
/Users/a6/Applications/apache-hive-2.3.0-bin/bin
localhost:bin a6$ hive --service hiveserver2 &
The default port is 10000.
You can also specify the port at startup:
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001 &
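
Before running any client code it can help to confirm that something is actually listening on the HiveServer2 port. The following is a minimal sketch using only the standard library; the helper name is_port_open is hypothetical, and an open port only shows that a process accepted the TCP connection, not that HiveServer2 will accept a session.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host, or timeout.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # 10000 is the default HiveServer2 port mentioned above.
    print("HiveServer2 port reachable: {}".format(is_port_open("localhost", 10000)))
```

If HiveServer2 was started with hive.server2.thrift.port=10001, pass 10001 instead.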

2. Install the pyhs2 Python package. The output below shows it is already installed on my machine.
localhost:bin a6$ sudo pip install pyhs2
Password:
The directory '/Users/a6/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/a6/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied: pyhs2 in /Library/Python/2.7/site-packages
Requirement already satisfied: sasl in /Library/Python/2.7/site-packages (from pyhs2)
Requirement already satisfied: thrift in /Library/Python/2.7/site-packages/thrift-0.10.0-py2.7-macosx-10.12-intel.egg (from pyhs2)
Requirement already satisfied: six in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from sasl->pyhs2)
3. The Python code that uses pyhs2 to query Hive is as follows:

import pyhs2

# Connect to HiveServer2 over Thrift on the default port.
with pyhs2.connect(host='localhost',
                   port=10000,
                   authMechanism="NOSASL",
                   user='a6',
                   password=''
                   # password='anonymous'
                   ) as conn:
    with conn.cursor() as cur:
        # Show databases
        print("connect hive database success")
        print(cur.getDatabases())
        print("read data success")

II. Tracking down the cause

1. The Python console reports the following error:

/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/a6/Downloads/PycharmProjects/test_use_hbase_by_thrift/test11.py
Traceback (most recent call last):
  File "/Users/a6/Downloads/PycharmProjects/test_use_hbase_by_thrift/test11.py", line 13, in <module>
    print "sucess"
  File "/Library/Python/2.7/site-packages/pyhs2/connections.py", line 58, in __exit__
    self.close()
  File "/Library/Python/2.7/site-packages/pyhs2/connections.py", line 78, in close
    self.client.CloseSession(req)
  File "/Library/Python/2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 184, in CloseSession
    return self.recv_CloseSession()
  File "/Library/Python/2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 195, in recv_CloseSession
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "build/bdist.macosx-10.12-intel/egg/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
  File "build/bdist.macosx-10.12-intel/egg/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
  File "build/bdist.macosx-10.12-intel/egg/thrift/transport/TTransport.py", line 60, in readAll
  File "build/bdist.macosx-10.12-intel/egg/thrift/transport/TTransport.py", line 161, in read
  File "build/bdist.macosx-10.12-intel/egg/thrift/transport/TSocket.py", line 132, in read
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

2. Checking the Hive logs through the web UI

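Since version 2.0, Hive ships a simple web UI for HiveServer2 that shows the current sessions, historical logs, configuration parameters and metrics at a glance.
1). View and configure the HiveServer2 web UI settings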
localhost:conf a6$ pwd
/Users/a6/Applications/apache-hive-2.3.0-bin/conf
localhost:conf a6$ vi hive-site.xml
Configuring the web UI is simple; it takes two parameters:

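  <property>
    <name>hive.server2.webui.host</name>
    <value>0.0.0.0</value>
    <description>The host address the HiveServer2 WebUI will listen on</description>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
    <description>The port the HiveServer2 WebUI will listen on. This can be set to 0 or a negative integer to disable the web UI</description>
  </property>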
  

After changing the configuration file you must restart HiveServer2. Then open
http://localhost:10002/    or     http://127.0.0.1:10002/

in a browser to reach the HiveServer2 web UI, where you can conveniently browse the execution logs.
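
To double-check which address the web UI should be reachable at, the two settings can be read back from hive-site.xml. This is a minimal sketch, assuming the file uses the usual Hadoop-style layout of property elements with name and value children; the function names and the inline SAMPLE text are made up for illustration, and the fallback values simply mirror the two settings shown above.

```python
import xml.etree.ElementTree as ET


def read_site_config(xml_text):
    """Parse a Hadoop/Hive *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def webui_url(props):
    """Build the web UI URL; 0.0.0.0 is a bind address, so browse via localhost."""
    host = props.get("hive.server2.webui.host", "0.0.0.0")
    port = props.get("hive.server2.webui.port", "10002")
    if host == "0.0.0.0":
        host = "localhost"
    return "http://{}:{}/".format(host, port)


# Illustrative stand-in for the relevant part of hive-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.webui.host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(webui_url(read_site_config(SAMPLE)))
```

For a real file, read its contents with open(path).read() and pass the text to read_site_config.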

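2). Use the HiveServer2 web UI to inspect the execution log
Open
http://localhost:10002
in a browser, choose "Local logs" -> "hive.log", and scroll to the bottom to see the error. The direct link is:
http://localhost:10002/logs/hive.log
The error is as follows: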
2017-10-12T14:20:45,755  INFO [HiveServer2-Handler-Pool: Thread-42] session.SessionState: Resetting thread name to  HiveServer2-Handler-Pool: Thread-42
2017-10-12T14:20:45,760  WARN [HiveServer2-Handler-Pool: Thread-42] thrift.ThriftCLIService: Error opening session:
org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
        at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:419) ~[hive-service-2.3.0.jar:2.3.0]
        at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:362) ~[hive-service-2.3.0.jar:2.3.0]
        at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:193) ~[hive-service-2.3.0.jar:2.3.0]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:440) ~[hive-service-2.3.0.jar:2.3.0]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:322) ~[hive-service-2.3.0.jar:2.3.0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1377) ~[hive-exec-2.3.0.jar:2.3.0]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1362) ~[hive-exec-2.3.0.jar:2.3.0]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.0.jar:2.3.0]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.3.0.jar:2.3.0]
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-2.3.0.jar:2.3.0]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.0.jar:2.3.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:89) ~[hive-service-2.3.0.jar:2.3.0]
        at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.0.jar:2.3.0]
        at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.0.jar:2.3.0]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692) ~[hadoop-common-2.6.5.jar:?]
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.0.jar:2.3.0]
        at com.sun.proxy.$Proxy37.open(Unknown Source) ~[?:?]
        at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:410) ~[hive-service-2.3.0.jar:2.3.0]
        ... 13 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
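
The stack trace is long, but the useful part is the single "is not allowed to impersonate" message. As a rough sketch (the regular expression and the function name are my own, written against the log lines quoted above rather than any documented log format), the affected user pair can be pulled out of hive.log like this:

```python
import re

# Matches the message seen in the log above: "User: <proxy> is not allowed to impersonate <target>".
IMPERSONATION_RE = re.compile(
    r"User: (?P<proxy>\S+) is not allowed to impersonate (?P<target>\S+)"
)


def find_impersonation_errors(log_lines):
    """Return the distinct (proxy_user, target_user) pairs, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = IMPERSONATION_RE.search(line)
        if match:
            pair = (match.group("proxy"), match.group("target"))
            if pair not in seen:
                seen.append(pair)
    return seen


# Two lines copied from the log excerpt above.
SAMPLE_LOG = [
    "2017-10-12T14:20:45,760  WARN [HiveServer2-Handler-Pool: Thread-42] "
    "thrift.ThriftCLIService: Error opening session:",
    "Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException("
    "org.apache.hadoop.security.authorize.AuthorizationException): "
    "User: a6 is not allowed to impersonate anonymous",
]

if __name__ == "__main__":
    for proxy, target in find_impersonation_errors(SAMPLE_LOG):
        print("{} is not allowed to impersonate {}".format(proxy, target))
```

To scan a real log, pass an open file object, for example find_impersonation_errors(open("hive.log")).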

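III. Root cause and solution
1. Root cause:

When Python connects to Hive and reports

Required field 'sessionHandle' is unset! Struct:TExecuteStatementReq(sessionHandle:null, statement:USE default, confOverlay:{})

it means something is wrong with the connection parameters you passed for Hive: the session was never opened, so the statement that follows has no session handle.

In my case the problem was with the Hive account, which lacked the necessary permission.

Check the Hive username and the other connection settings.

Another possible cause is that the project's hive-jdbc version does not match the server's; switching to the same version as the server fixes it. PS: early Hive releases had many bugs, so a recent release is recommended.

My error was caused by the user running the Hive query differing from the user under which Hadoop and Hive were configured. The server log says so directly: "User: a6 is not allowed to impersonate anonymous". HiveServer2, running as a6, tried to act on behalf of the client user anonymous, and Hadoop refused because a6 was not configured as a proxy user.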
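
The symptoms and causes discussed in this section can be condensed into a small lookup. This is only an illustrative sketch: HINTS and diagnose are hypothetical names, and the hint texts paraphrase this post rather than any official Hive or Thrift documentation.

```python
# Hypothetical helper: marker substrings come from the error messages quoted in
# this post; the hints summarize what this post found, nothing more.
HINTS = [
    ("is not allowed to impersonate",
     "Hadoop rejected the proxy user; check the hadoop.proxyuser.<user>.hosts "
     "and hadoop.proxyuser.<user>.groups settings in core-site.xml."),
    ("Required field 'sessionHandle' is unset",
     "No session was opened; check the user name and other connection "
     "parameters, then read the HiveServer2 log."),
    ("TSocket read 0 bytes",
     "The server closed the connection; look in the HiveServer2 log "
     "(hive.log) for the underlying error."),
]


def diagnose(message):
    """Return every hint whose marker substring occurs in the error message."""
    return [hint for marker, hint in HINTS if marker in message]


if __name__ == "__main__":
    error = "thrift.transport.TTransport.TTransportException: TSocket read 0 bytes"
    for hint in diagnose(error):
        print(hint)
```

The client-side messages are generic, which is why every hint ends up pointing at the server log, where the specific reason is recorded.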

2. Solution
1). Edit the Hadoop configuration file etc/hadoop/core-site.xml and add the following properties:

    
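    <property>
        <name>hadoop.proxyuser.a6.hosts</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.proxyuser.a6.groups</name>
        <value>*</value>
    </property>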
    

2). The final configuration is shown in the figure below:
[Figure 1: screenshot of the final configuration]
  
3). After editing core-site.xml, restart the Hadoop services (mainly HDFS):
localhost:hadoop a6$ pwd
/Users/a6/Applications/hadoop-2.6.5/etc/hadoop
localhost:hadoop a6$ sh ../../sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
17/10/12 15:07:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /Users/a6/Applications/hadoop-2.6.5/logs/hadoop-a6-namenode-localhost.out
localhost: starting datanode, logging to /Users/a6/Applications/hadoop-2.6.5/logs/hadoop-a6-datanode-localhost.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /Users/a6/Applications/hadoop-2.6.5/logs/hadoop-a6-secondarynamenode-localhost.out
17/10/12 15:08:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /Users/a6/Applications/hadoop-2.6.5/logs/yarn-a6-resourcemanager-localhost.out
localhost: starting nodemanager, logging to /Users/a6/Applications/hadoop-2.6.5/logs/yarn-a6-nodemanager-localhost.out
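
Before restarting, the core-site.xml change from step 1) can be sanity-checked with a short script. This is a minimal sketch, assuming the standard Hadoop property/name/value layout; the function name and the inline SAMPLE_CORE_SITE text are illustrative only. It inspects the file contents, not what the running NameNode has actually loaded.

```python
import xml.etree.ElementTree as ET


def missing_proxyuser_settings(core_site_xml, user):
    """Return the hadoop.proxyuser.<user>.* property names absent from the XML text."""
    root = ET.fromstring(core_site_xml)
    present = set()
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            present.add(name.strip())
    wanted = ["hadoop.proxyuser.{}.{}".format(user, kind)
              for kind in ("hosts", "groups")]
    return [name for name in wanted if name not in present]


# Illustrative stand-in for core-site.xml after the edit in step 1).
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.proxyuser.a6.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.a6.groups</name>
    <value>*</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # An empty list means both proxy-user properties are present for a6.
    print(missing_proxyuser_settings(SAMPLE_CORE_SITE, "a6"))
```

The user name passed in should be the account that runs HiveServer2, which is a6 in this post.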

IV. Differences between HiveServer and HiveServer2

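In my earlier study and use of Hive I always worked through the CLI or hive -e. That approach only lets you run queries, updates and similar operations in HiveQL, and it is clumsy and limited. Fortunately Hive provides a thin-client alternative: through HiveServer or HiveServer2, clients can operate on data in Hive without starting the CLI. Both let remote clients written in languages such as Java or Python submit requests to Hive and fetch the results. Both are built on Thrift, although only HiveServer is sometimes called the Thrift server; HiveServer2 is not. Why is HiveServer2 needed when HiveServer already exists? Because HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface HiveServer uses and cannot be fixed by changing HiveServer's code. So in Hive 0.11.0 the HiveServer code was rewritten as HiveServer2, which solved the problem. HiveServer2 supports concurrent clients and authentication, and gives better support to open-API clients such as JDBC and ODBC.
Since HiveServer2 is the more capable of the two, I will focus on it, with only a brief look at how HiveServer is used. Running hive --service help lists the available services. You can start a specific service with hive --service serviceName, for example cli, hiveserver or hiveserver2.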


 
  
