Hadoop: Connecting to Hive Through the Java API

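In my earlier Hive study and practice I always used the CLI or hive -e. That approach only lets you run queries, updates, and similar operations written in HiveQL, and it is clumsy and limited. Fortunately, Hive provides a thin-client implementation: through HiveServer or HiveServer2, a client can operate on the data in Hive without starting the CLI. Both allow remote clients written in various languages, such as Java and Python, to submit requests to Hive and fetch the results. HiveServer and HiveServer2 are both based on Thrift, but HiveServer is sometimes called the Thrift server, whereas HiveServer2 is not. If HiveServer already exists, why is HiveServer2 needed? Because HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer uses and cannot be fixed by modifying the HiveServer code. So in Hive 0.11.0 the HiveServer code was rewritten as HiveServer2, which solved the problem. HiveServer2 supports concurrency and authentication for multiple clients and offers better support for open API clients such as JDBC and ODBC.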

This article therefore uses HiveServer2 as the example and walks through writing a Java API that operates Hive remotely.

First, here are the key Hive configuration settings used in this article:

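<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/usr/hive/warehouse</value>   <!-- HDFS directory where Hive stores its databases and tables -->
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>   <!-- port for remote HiveServer2 connections, 10000 by default -->
  <description>Port number of HiveServer2 Thrift interface.
  Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>**.**.**.**</value>   <!-- IP address of the cluster node where Hive runs -->
  <description>Bind host on which to run the HiveServer2 Thrift interface.
  Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>
<property>
  <name>hive.server2.long.polling.timeout</name>
  <value>5000</value>   <!-- the default is 5000L; changed to 5000 here, otherwise the program reports an error -->
  <description>Time in milliseconds that HiveServer2 will wait, before responding to asynchronous calls that use long polling</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>   <!-- Hive metastore database; I use a local MySQL instance -->
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>   <!-- driver used to connect to the metastore database -->
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>   <!-- user name for the metastore database -->
  <value>root</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>   <!-- password for the metastore database -->
  <value>root</value>
  <description>password to use against metastore database</description>
</property>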
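
The Thrift port and bind host configured above are the same values a JDBC client later puts into its connection URL, which has the form jdbc:hive2://host:port/database. As a minimal sketch of that mapping, the helper below assembles such a URL. HiveUrlBuilder is a hypothetical class written for this illustration, not part of Hive, and 192.0.2.10 is a placeholder address.

```java
public class HiveUrlBuilder {
    // Hypothetical helper, not part of Hive: joins host, port, and database
    // into a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.
    public static String build(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // 192.0.2.10 is a placeholder; 10000 matches hive.server2.thrift.port.
        System.out.println(build("192.0.2.10", 10000, "default"));
        // prints: jdbc:hive2://192.0.2.10:10000/default
    }
}
```

Keeping the URL assembly in one place means that a changed port or bind host only needs to be updated once on the client side.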


 
  
 
  

Once the settings above are confirmed to be correct, start the HiveServer2 service as follows:

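Start the metastore service first by entering the following on the command line: hive --service metastore & (the & runs the process in the background. The command occupies the terminal once it is launched, and without the &, pressing Ctrl+C to get back to the prompt would shut the service down.)

The terminal then appears to hang. The log files are written to the /tmp/root/ directory; opening hive.log there shows the following: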

[root@centos7-2 apache-hive-2.1.1-bin]# cd /tmp/root/
[root@centos7-2 root]# ls
hive.log  hive.log.2017-07-13  hive.log.2017-07-14  stderr
[root@centos7-2 root]# vi hive.log
STARTUP_MSG:   build = git://jcamachorodriguez-rMBP.local/Users/jcamachorodriguez/src/workspaces/hive/HIVE-release2/hive -r 1af77bbf8356e86cabbed92cfa8cc2e1470a1d5c; compiled by 'jcamachorodriguez' on Tue Nov 29 19:46:12 GMT 2016
************************************************************/
2017-07-17T09:54:45,504  INFO [main] metastore.HiveMetaStore: Starting hive metastore on port 9083
2017-07-17T09:54:45,597  INFO [main] metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2017-07-17T09:54:45,686  INFO [main] metastore.ObjectStore: ObjectStore, initialize called
2017-07-17T09:54:47,203  INFO [main] metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2017-07-17T09:54:51,691  INFO [main] metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
2017-07-17T09:54:51,703  INFO [main] metastore.ObjectStore: Initialized ObjectStore
2017-07-17T09:54:51,877  WARN [main] metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.1.0
2017-07-17T09:54:51,877  WARN [main] metastore.ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.1.0, comment = Set by MetaStore [email protected]
2017-07-17T09:54:51,946  INFO [main] metastore.HiveMetaStore: Added admin role in metastore
2017-07-17T09:54:51,952  INFO [main] metastore.HiveMetaStore: Added public role in metastore
2017-07-17T09:54:51,991  INFO [main] metastore.HiveMetaStore: No user is added in admin role, since config is empty
2017-07-17T09:54:52,286  INFO [main] metastore.HiveMetaStore: Starting DB backed MetaStore Server with SetUGI enabled
2017-07-17T09:54:52,289  INFO [main] metastore.HiveMetaStore: Started the new metaserver on port [9083]...
2017-07-17T09:54:52,289  INFO [main] metastore.HiveMetaStore: Options.minWorkerThreads = 200
2017-07-17T09:54:52,289  INFO [main] metastore.HiveMetaStore: Options.maxWorkerThreads = 1000
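This shows that the metastore has started.

Next, start the hiveserver2 service:

Enter on the command line: hive --service hiveserver2 &

As before, the terminal appears to hang. The log file shows the following: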

2016-04-26 04:53:24,212 INFO  [main]: server.HiveServer2 (HiveStringUtils.java:startupShutdownMessage(605)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting HiveServer2
STARTUP_MSG:   host = master/(the IP you configured earlier)
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.13.0
STARTUP_MSG:   classpath = /opt/modules/hadoop-2.2.0/etc/hadoop:/opt/modules/hadoop-2.2.0/share/hadoop/common/lib
//(... the rest of the classpath is omitted here because the log is too long ...)
STARTUP_MSG:   build = file:///Users/hbutani/svn/branch-0.13 -r Unknown; compiled by 'hbutani' on Tue Apr 15 13:55:42 PDT 2014
************************************************************/
2016-04-26 04:53:24,553 WARN  [main]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
2016-04-26 04:53:25,258 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(494)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2016-04-26 04:53:25,325 INFO  [main]: metastore.ObjectStore (ObjectStore.java:initialize(245)) - ObjectStore, initialize called
2016-04-26 04:53:28,312 WARN  [main]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
2016-04-26 04:53:28,313 INFO  [main]: metastore.ObjectStore (ObjectStore.java:getPMF(314)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2016-04-26 04:53:31,537 INFO  [main]: metastore.ObjectStore (ObjectStore.java:setConf(228)) - Initialized ObjectStore
2016-04-26 04:53:32,064 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(552)) - Added admin role in metastore
2016-04-26 04:53:32,079 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(561)) - Added public role in metastore
2016-04-26 04:53:32,205 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(589)) - No user is added in admin role, since config is empty
2016-04-26 04:53:33,887 INFO  [main]: session.SessionState (SessionState.java:start(358)) - No Tez session required at this point. hive.execution.engine=mr.
2016-04-26 04:53:34,168 WARN  [main]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
2016-04-26 04:53:34,241 INFO  [main]: service.CompositeService (SessionManager.java:init(70)) - HiveServer2: Async execution thread pool size: 100
2016-04-26 04:53:34,241 INFO  [main]: service.CompositeService (SessionManager.java:init(72)) - HiveServer2: Async execution wait queue size: 100
2016-04-26 04:53:34,242 INFO  [main]: service.CompositeService (SessionManager.java:init(74)) - HiveServer2: Async execution thread keepalive time: 10
2016-04-26 04:53:34,244 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:OperationManager is inited.
2016-04-26 04:53:34,247 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:SessionManager is inited.
2016-04-26 04:53:34,247 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:CLIService is inited.
2016-04-26 04:53:34,247 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:ThriftBinaryCLIService is inited.
2016-04-26 04:53:34,247 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:HiveServer2 is inited.
2016-04-26 04:53:34,248 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:OperationManager is started.
2016-04-26 04:53:34,248 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:SessionManager is started.
2016-04-26 04:53:34,248 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:CLIService is started.
2016-04-26 04:53:34,698 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(589)) - No user is added in admin role, since config is empty
2016-04-26 04:53:34,699 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(624)) - 0: get_databases: default
2016-04-26 04:53:34,701 INFO  [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(306)) - ugi=hh  ip=unknown-ip-addr  cmd=get_databases: default
2016-04-26 04:53:34,725 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(494)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2016-04-26 04:53:34,728 INFO  [main]: metastore.ObjectStore (ObjectStore.java:initialize(245)) - ObjectStore, initialize called
2016-04-26 04:53:34,745 INFO  [main]: metastore.ObjectStore (ObjectStore.java:setConf(228)) - Initialized ObjectStore
2016-04-26 04:53:34,795 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:ThriftBinaryCLIService is started.
2016-04-26 04:53:34,796 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:HiveServer2 is started.
2016-04-26 04:53:34,947 WARN  [Thread-5]: conf.HiveConf (HiveConf.java:initialize(1390)) - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
2016-04-26 04:53:35,584 INFO  [Thread-5]: thrift.ThriftCLIService (ThriftBinaryCLIService.java:run(88)) - ThriftBinaryCLIService listening on /(your IP):10000

You can also check whether hiveserver2 is running with the following command:

[hh@master Desktop]$ netstat -nl | grep 10000
tcp        0      0 (your IP):10000         0.0.0.0:*               LISTEN

 
  

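This shows that the hiveserver2 service is running.
(Note: always check the log, because the command line itself reports no errors. If startup fails, the corresponding exception appears in the log. The path of the hive.log file is configured in $HIVE_HOME/conf/hive-log4j.properties.)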
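
The netstat check can also be done from the client side before any JDBC code runs. The following is only a sketch under assumptions: PortCheck is a hypothetical helper written for this article, and it tests nothing more than whether a TCP connection to the given host and port succeeds. It does not verify that the listener is actually HiveServer2.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    // A successful connect only shows that something is listening on the port.
    public static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "localhost" is an assumption; use the host from hive.server2.thrift.bind.host.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println("Port 10000 reachable: " + isListening(host, 10000, 2000));
    }
}
```

If this reports false while the server log shows ThriftBinaryCLIService listening, the bind host or a firewall between client and server is a likely place to look.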

Now let's write the Java API.

First, here are the JAR files this program depends on:

hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
$HIVE_HOME/lib/hive-exec-0.11.0.jar
$HIVE_HOME/lib/hive-jdbc-0.11.0.jar
$HIVE_HOME/lib/hive-metastore-0.11.0.jar
$HIVE_HOME/lib/hive-service-0.11.0.jar
$HIVE_HOME/lib/libfb303-0.9.0.jar
$HIVE_HOME/lib/commons-logging-1.0.4.jar
$HIVE_HOME/lib/slf4j-api-1.6.1.jar


Here is the Java code:

JDBCToHiveUtils.java

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JDBCToHiveUtils {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    // Fill in the Hive IP address set in the configuration file earlier
    private static String Url = "jdbc:hive2://**.**.**.**:10000/default";
    private static Connection conn;

    // Load the HiveServer2 JDBC driver and open a connection (returns null if connecting fails)
    public static Connection getConnnection() {
        try {
            Class.forName(driverName);
            // The user name must belong to a user with permission to operate on HDFS,
            // otherwise the program fails with a "permission deny" exception
            conn = DriverManager.getConnection(Url, "hh", "");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return conn;
    }

    // Prepare a statement for the given SQL (returns null if preparation fails)
    public static PreparedStatement prepare(Connection conn, String sql) {
        PreparedStatement ps = null;
        try {
            ps = conn.prepareStatement(sql);
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return ps;
    }
}
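
JDBCToHiveUtils keeps one static connection open for the life of the program and never closes it. For short-lived checks, a try-with-resources block is one way to make sure the connection is released. The sketch below is an illustration only: HiveConnectionCheck and canConnect are hypothetical names, and the code assumes the Hive JDBC driver is already on the classpath and registered with DriverManager (for example through the Class.forName call shown above).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class HiveConnectionCheck {
    // Opens a connection, then lets try-with-resources close it again.
    // Returns false when no driver accepts the URL or the server rejects the login.
    public static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn != null;
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }
}
```

A call such as HiveConnectionCheck.canConnect("jdbc:hive2://**.**.**.**:10000/default", "hh", "") with the real host filled in would then report whether the settings used in JDBCToHiveUtils are usable, without leaving a connection behind.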


QueryHiveUtils.java

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class QueryHiveUtils {
    private static Connection conn = JDBCToHiveUtils.getConnnection();

    // Print every row of the given table, one row per line.
    // The table name is concatenated into the SQL, so pass only trusted names.
    public static void getAll(String tablename) {
        String sql = "select * from " + tablename;
        System.out.println(sql);
        // try-with-resources closes the statement and the result set when done
        try (PreparedStatement ps = JDBCToHiveUtils.prepare(conn, sql);
             ResultSet rs = ps.executeQuery()) {
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                // JDBC column indexes start at 1
                for (int i = 1; i <= columns; i++) {
                    System.out.print(rs.getString(i));
                    System.out.print("\t\t");
                }
                System.out.println();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
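
Because getAll builds its SQL by string concatenation, and JDBC parameter markers generally bind values rather than identifiers such as table names, a table name that comes from outside the program is worth checking before it reaches the query. This is a minimal sketch under stated assumptions: HiveIdentifiers is a hypothetical helper, and its rule (letters, digits, and underscores, with an optional database prefix) is deliberately conservative and stricter than what Hive itself accepts.

```java
import java.util.regex.Pattern;

public class HiveIdentifiers {
    // Conservative pattern chosen for this sketch: name or database.name,
    // each part made of letters, digits, and underscores, not starting with a digit.
    private static final Pattern SAFE_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");

    // Returns the name unchanged if it matches the pattern, otherwise throws.
    public static String requireSafeTableName(String name) {
        if (name == null || !SAFE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Unsafe table name: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        String sql = "select * from " + requireSafeTableName("test1");
        System.out.println(sql); // prints: select * from test1
    }
}
```

With such a check in place, getAll could call requireSafeTableName(tablename) before building the query string.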


QueryHiveTest.java

public class QueryHiveTest {

    public static void main(String[] args) {
        String tablename = "test1";
        QueryHiveUtils.getAll(tablename);
    }

}


The output is as follows:
select * from test1
1        张三      男       20.0               
2        李四      女       35.0               
3        王五      男       null               
4        赵六      null         70.0
