Hive in Practice

1. Introduction

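As a data warehouse tool, Hive offers two ways to run ETL statements: the Hive command line (CLI) and the beeline client.

In CLI mode, you run hive to enter its command mode and then execute HQL statements to get results. This is a thick-client model: the client machine needs a JRE and a local Hive installation.

The beeline client is a thin-client model: it connects over JDBC to the Hive Thrift service, which in turn accesses the Hive data warehouse.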

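HiveThrift (HiveServer) is a Hive component designed to give lightweight, cross-language access to the Hive data warehouse. It exists in two versions, HiveServer and HiveServer2, which are not compatible with each other, so take care to tell them apart. The difference shows up in the parameter used to start the server (hive --service hiveserver vs. hive --service hiveserver2) and in the JDBC URL prefix (jdbc:hive:// vs. jdbc:hive2://).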
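To make the version difference concrete, here is a minimal Python sketch of how a client picks the URL prefix and driver class. The prefixes and driver class names (org.apache.hadoop.hive.jdbc.HiveDriver for HiveServer, org.apache.hive.jdbc.HiveDriver for HiveServer2) are those of Hive's JDBC drivers; the helper functions themselves are hypothetical and only for illustration.

```python
# Hypothetical helpers: map the HiveServer version to its JDBC URL prefix
# and driver class. The prefixes and class names are Hive's own.
DRIVERS = {
    # HiveServer (version 1), started with: hive --service hiveserver
    "hiveserver": ("jdbc:hive", "org.apache.hadoop.hive.jdbc.HiveDriver"),
    # HiveServer2, started with: hive --service hiveserver2
    "hiveserver2": ("jdbc:hive2", "org.apache.hive.jdbc.HiveDriver"),
}


def build_jdbc_url(server: str, host: str, port: int, database: str = "default") -> str:
    """Return the JDBC URL a client must use for the given server version."""
    if server not in DRIVERS:
        raise ValueError(f"unknown server version: {server!r}")
    prefix, _driver = DRIVERS[server]
    return f"{prefix}://{host}:{port}/{database}"


def driver_class(server: str) -> str:
    """Return the JDBC driver class that matches the server version."""
    return DRIVERS[server][1]


if __name__ == "__main__":
    print(build_jdbc_url("hiveserver2", "slave1", 10000))
    # jdbc:hive2://slave1:10000/default
```

Connecting a HiveServer2 URL with the version 1 driver (or the reverse) is exactly the incompatibility described above, so keeping both values in one table makes the mismatch hard to introduce by accident.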

2. Server Thrift settings used by beeline

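The relevant settings are mainly the hive.server2.thrift.* properties in hive/conf/hive-site.xml. Keep them consistent with what clients use: the host and port configured here must match the JDBC URL given to beeline.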

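<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>slave1</value>
  <description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>
<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10001</value>
  <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
</property>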
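Because beeline has to use the same host and port that the server binds to, it can help to derive the endpoint directly from hive-site.xml. The sketch below is a hypothetical helper, not part of Hive: it parses the property entries with Python's xml.etree.ElementTree and picks hive.server2.thrift.port or hive.server2.thrift.http.port depending on hive.server2.transport.mode. It assumes Hive's defaults (binary mode, ports 10000 and 10001) for missing keys, and falling back to localhost when no bind host is set is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hive's defaults for these keys, used when hive-site.xml omits them.
DEFAULTS = {
    "hive.server2.transport.mode": "binary",
    "hive.server2.thrift.port": "10000",
    "hive.server2.thrift.http.port": "10001",
}

# Example file mirroring the section 2 settings.
SAMPLE_HIVE_SITE = """<configuration>
  <property><name>hive.server2.thrift.bind.host</name><value>slave1</value></property>
  <property><name>hive.server2.thrift.port</name><value>10000</value></property>
  <property><name>hive.server2.thrift.http.port</name><value>10001</value></property>
</configuration>"""


def read_properties(xml_text: str) -> dict:
    """Parse a Hadoop-style *-site.xml document into a {name: value} dict."""
    props = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def hs2_endpoint(xml_text: str) -> tuple:
    """Return the (host, port) a client should connect to for this hive-site.xml."""
    props = dict(DEFAULTS)
    props.update(read_properties(xml_text))
    # Assumption of this sketch: an unset bind host means "connect locally".
    host = props.get("hive.server2.thrift.bind.host") or "localhost"
    mode = props["hive.server2.transport.mode"].lower()
    key = "hive.server2.thrift.http.port" if mode == "http" else "hive.server2.thrift.port"
    return host, int(props[key])


if __name__ == "__main__":
    print(hs2_endpoint(SAMPLE_HIVE_SITE))  # ('slave1', 10000)
```

With the settings above and the default binary mode, the result is slave1:10000, which is the address used in the beeline !connect command.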

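When beeline connects, the data it reads lives on HDFS, where paths are protected by access permissions. The user name you connect with should therefore be a Hadoop user; in this example it is 'hadoop'. With any other user name you may get a permission-denied error. Alternatively, relax the user and permission checks by setting the hadoop.proxyuser.XX.hosts and hadoop.proxyuser.XX.groups properties (XX being the user name) to "*", as in the example below.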

Property hive.server2.thrift.client.user is set to hadoop.

Property hive.server2.thrift.client.password is set to hadoop.

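Change the following settings to relax the user and permission restrictions.

Configuration file: hadoop/etc/hadoop/core-site.xml

Set hadoop.proxyuser.hadoop.hosts to *

Set hadoop.proxyuser.hadoop.groups to *

Together, these two properties allow the hadoop user to act as a proxy for (impersonate) other users from any host and for users in any group.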
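Before restarting services, a quick check that core-site.xml really contains both proxy-user entries can save a round of permission errors. The hypothetical Python helper below reads the file's text and reports whether the given user may impersonate others from any host and for any group. It only inspects the hadoop.proxyuser.USER.hosts and .groups properties named above; it is a sketch, not a Hadoop tool.

```python
import xml.etree.ElementTree as ET

# Example file mirroring the core-site.xml change described above.
SAMPLE_CORE_SITE = """<configuration>
  <property><name>hadoop.proxyuser.hadoop.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hadoop.groups</name><value>*</value></property>
</configuration>"""


def proxyuser_settings(core_site_xml: str, user: str) -> dict:
    """Return {"hosts": ..., "groups": ...} for `user`; None when a key is absent."""
    wanted = {
        f"hadoop.proxyuser.{user}.hosts": "hosts",
        f"hadoop.proxyuser.{user}.groups": "groups",
    }
    result = {"hosts": None, "groups": None}
    for prop in ET.fromstring(core_site_xml).findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in wanted:
            result[wanted[name]] = (prop.findtext("value") or "").strip()
    return result


def allows_any_host_and_group(core_site_xml: str, user: str) -> bool:
    """True if both proxy-user properties for `user` are set to the wildcard '*'."""
    settings = proxyuser_settings(core_site_xml, user)
    return settings["hosts"] == "*" and settings["groups"] == "*"


if __name__ == "__main__":
    print(allows_any_host_and_group(SAMPLE_CORE_SITE, "hadoop"))  # True
```

Checking both keys together matters: setting only hosts or only groups to * leaves the other restriction in place.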

3. Starting beeline and accessing Hive

Start HiveServer2 on slave1:

nohup hive --service hiveserver2 &

Running ps -ef | grep Hive should show the HiveServer2 process.
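ps only shows that the process exists; HiveServer2 may need some time after starting before it accepts connections. A minimal Python sketch that checks whether the Thrift port is actually listening, using the host and port from the section 2 settings (the function name is hypothetical):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, port_is_open("slave1", 10000) should return True once HiveServer2 is ready for beeline connections in binary mode.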

On the master machine, run beeline and connect to Hive:

hadoop@master:~/bigdata/hive$ beeline
Beeline version 1.2.1.spark2 by Apache Hive
beeline>
beeline> !connect jdbc:hive2://slave1:10000    // host:port from the section 2 settings; the scheme is hive2 because HiveServer2 was started
Connecting to jdbc:hive2://slave1:10000
Enter username for jdbc:hive2://slave1:10000:hadoop
Enter password for jdbc:hive2://slave1:10000:******    // user/password from the section 2 settings
17/09/08 14:39:27 INFO jdbc.Utils: Supplied authorities: slave1:10000
17/09/08 14:39:27 INFO jdbc.Utils: Resolved authority: slave1:10000
17/09/08 14:39:27 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://slave1:10000
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://slave1:10000>
0: jdbc:hive2://slave1:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| shizhan        |
+----------------+--+
2 rows selected (0.379 seconds)

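Once a query has run, any job it submitted to YARN can be seen in the Hadoop web UI at http://master:8088/cluster/apps/FINISHED. Metadata-only statements such as show databases are answered without submitting a job, so they do not appear there.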
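The same list is available from the ResourceManager's REST API at /ws/v1/cluster/apps, filtered with states=FINISHED. A minimal Python sketch, assuming the ResourceManager web address is master:8088 as above; the helper names are hypothetical and only three fields of each application (id, name, finalStatus) are read.

```python
import json
from urllib.request import urlopen


def finished_apps_url(rm_host: str, rm_port: int = 8088) -> str:
    """URL of the ResourceManager Cluster Applications API, finished apps only."""
    return f"http://{rm_host}:{rm_port}/ws/v1/cluster/apps?states=FINISHED"


def summarize_apps(payload: dict) -> list:
    """Extract (id, name, finalStatus) tuples from a /ws/v1/cluster/apps response."""
    # "apps" may be null when nothing matches, so guard every level.
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a.get("id"), a.get("name"), a.get("finalStatus")) for a in apps]


def fetch_finished_apps(rm_host: str, rm_port: int = 8088, timeout: float = 10.0) -> list:
    """Query the ResourceManager and summarize its finished applications."""
    with urlopen(finished_apps_url(rm_host, rm_port), timeout=timeout) as resp:
        return summarize_apps(json.load(resp))
```

Calling fetch_finished_apps("master") from a script gives the same information as the web page, which is handy when checking ETL runs without a browser.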

0: jdbc:hive2://slave1:10000> !q    // exit beeline
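For ETL scripts, beeline can also run a single statement non-interactively with its -u (JDBC URL), -n (user name), -p (password) and -e (query) options. A minimal Python sketch wrapping that call with subprocess; the wrapper names are hypothetical and it assumes beeline is on the PATH of the machine running the script.

```python
import subprocess


def beeline_command(url: str, user: str, password: str, query: str) -> list:
    """Build a beeline argv that runs one statement and then exits."""
    return ["beeline", "-u", url, "-n", user, "-p", password, "-e", query]


def run_beeline(url: str, user: str, password: str, query: str) -> str:
    """Run the statement through beeline and return its standard output."""
    result = subprocess.run(
        beeline_command(url, user, password, query),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, run_beeline("jdbc:hive2://slave1:10000", "hadoop", "hadoop", "show databases;") reproduces the interactive session above in one call. Note that a password passed with -p is visible in the process list, so this form suits a test cluster better than a shared one.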
