A quick analysis of a beeline connection problem with CarbonData tables (notes from working through the issue)
This differs from the usual beeline problems: here beeline is connecting to CarbonData, not to Hive.
Q1:
[hdfs@ps-device-id-ydsc-229045 hive]$ $SPARK_HOME/bin/beeline -u jdbc:hive2://11.111.111.45:10000
Connecting to jdbc:hive2://11.111.111.45:10000
2023-02-01 09:26:16 [main] INFO Utils:325 - Supplied authorities: 11.111.111.45:10000
2023-02-01 09:26:16 [main] INFO Utils:444 - Resolved authority: 11.111.111.45:10000
2023-02-01 09:26:16 [main] ERROR HiveConnection:697 - Error opening session
org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[apache-carbondata-2.2.0-bin-spark3.1.1-hadoop2.7.2.jar:2.2.0]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[apache-carbondata-2.2.0-bin-spark3.1.1-hadoop2.7.2.jar:2.2.0]
at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:176) ~[hive-service-rpc-3.1.2.jar:3.1.2]
at org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:163) ~[hive-service-rpc-3.1.2.jar:3.1.2]
at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:680) [hive-jdbc-2.3.7.jar:2.3.7]
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:200) [hive-jdbc-2.3.7.jar:2.3.7]
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107) [hive-jdbc-2.3.7.jar:2.3.7]
at java.sql.DriverManager.getConnection(DriverManager.java:664) [?:1.8.0_162]
at java.sql.DriverManager.getConnection(DriverManager.java:208) [?:1.8.0_162]
at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:209) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.Commands.connect(Commands.java:1641) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.Commands.connect(Commands.java:1536) [hive-beeline-2.3.7.jar:2.3.7]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_162]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_162]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_162]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_162]
at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:56) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1273) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1312) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.BeeLine.connectUsingArgs(BeeLine.java:867) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:776) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1010) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:519) [hive-beeline-2.3.7.jar:2.3.7]
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501) [hive-beeline-2.3.7.jar:2.3.7]
2023-02-01 09:26:16 [main] WARN HiveConnection:205 - Failed to connect to 11.111.111.45:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://11.111.111.45:10000: Could not establish connection to jdbc:hive2://11.111.111.45:10000: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default}) (state=08S01,code=0)
Error message: org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})
This error generally indicates a Thrift protocol mismatch: the JDBC driver beeline is using is newer than the server it is talking to.
My initial analysis steps:
1. Check whether the HiveServer2 version and the Hive JDBC driver version match.
I compared the versions and found that the running Hive was 1.2.2, while the JDBC driver in the Hive directory I was launching from was 2.3.7.
That mismatch is what produced the error. The Linux host actually had two Hive installations, 1.2.2 and 2.3.7, so I changed the HIVE_HOME entry in /etc/profile to point to the 2.3.x installation and then restarted the Hive metastore and HiveServer2.
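For reference, a minimal sketch of those two steps, the jar check and the /etc/profile change; the install path below is an example, not the actual directory on this host:
# Compare the hive-jdbc / hive-service jars each side actually ships
ls $HIVE_HOME/lib | grep -E 'hive-(jdbc|service)'
ls $SPARK_HOME/jars | grep -E 'hive-(jdbc|service)'
hive --version
# Switch HIVE_HOME in /etc/profile to the 2.3.7 installation, then reload it
export HIVE_HOME=/opt/apache-hive-2.3.7-bin
export PATH=$HIVE_HOME/bin:$PATH
source /etc/profile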
First start the metastore:
nohup hive --service metastore &
Then start HiveServer2:
nohup hive --service hiveserver2 &
After the restart, ports 9083 and 10000 were both up and beeline connected without errors, but none of the data was visible:
only the default database showed up.
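A quick way to confirm both services are actually listening and to see what this endpoint exposes (ss and beeline's -e option are standard; netstat -tlnp works the same on older systems):
# Check that the metastore (9083) and HiveServer2 (10000) ports are up
ss -tlnp | grep -E ':(9083|10000)\b'
# List the databases visible through this endpoint; here only default showed up
$SPARK_HOME/bin/beeline -u jdbc:hive2://11.111.111.45:10000 -e "show databases;"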
2. Revisit the problem: we are using CarbonData tables, and CarbonData manages its own metadata. The HiveServer2 I started only serves Hive's metadata (backed by MySQL); what I should have started is the CarbonData Thrift Server, which queries CarbonData's metadata. So I stopped the HiveServer2 process.
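A hedged sketch of stopping it (the grep pattern is approximate; confirm the PID before killing anything on a production node):
# Find the HiveServer2 process and stop it
ps -ef | grep -i '[h]iveserver2'
kill <hiveserver2-pid>   # placeholder: substitute the PID printed above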
Start command, running in YARN mode, with the queue name and startup resources specified:
nohup $SPARK_HOME/bin/spark-submit \
  --master yarn \
  --queue query \
  --conf spark.driver.maxResultSize=10g \
  --conf spark.sql.shuffle.partitions=300 \
  --driver-memory 50g \
  --executor-cores 4 \
  --executor-memory 40G \
  --num-executors 10 \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  $SPARK_HOME/carbonlib/apache-carbondata-2.2.0-bin-spark3.1.1-hadoop2.7.2.jar \
  > query_thrift_carbon_server_dm50g_ec4_em40g_ne10.out &
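Once the thrift server is up, the same beeline command can be pointed at it (port 10000 is assumed to be free here; see the note about port 10001 further down):
# Connect to the CarbonData Thrift Server; the CarbonData databases should now be visible
$SPARK_HOME/bin/beeline -u jdbc:hive2://11.111.111.45:10000 -e "show databases;"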
After that, the beeline connection worked normally. A small issue in the end; the real problem was that my analysis started off in the wrong direction.
I am on the master node, so the Hive metastore has to be started here; the other nodes then communicate with it over Thrift to access the metadata.
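As a rough sanity check, the client-side hive-site.xml should point at that metastore over Thrift; hive.metastore.uris is the standard property name, and the host/port value below is just this cluster's master, stated as an assumption:
# Verify the metastore URI configured on a worker/client node
grep -A1 'hive.metastore.uris' $HIVE_HOME/conf/hive-site.xml
# expected something like: <value>thrift://11.111.111.45:9083</value>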
Note: if HiveServer2 is started first and no port is specified, it occupies port 10000, and the CarbonData Thrift Server then comes up on port 10001, so connections to port 10000 still fail.
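A quick way to see which process actually owns each port:
# If 10000 was already taken by HiveServer2, the thrift server ends up on 10001
ss -tlnp | grep -E ':(10000|10001)\b'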
The CarbonData Thrift Server simply reuses Hive's JDBC driver,
so only the Hive metastore and the CarbonData Thrift Server need to be started; there is no need to start HiveServer2.
(Impala, by contrast, relies on Hive's metastore for metadata management; CarbonData here manages its own.)