In real-world use, Hive also involves quite a few shell-related commands; this section lists some of the most common ones.
The most common is entering the Hive CLI environment: once the environment variables for the Hive installation directory have been configured on Linux, simply run hive to enter it, as follows:
[hadoop@dw-test-cluster-007 ]$ hive
which: no hbase in (/usr/local/tools/anaconda3/bin:/usr/local/tools/anaconda3/condabin:/usr/local/tools/azkaban/azkaban-exec-server/bin:/usr/local/tools/azkaban/azkaban-web-server/bin:/usr/local/tools/anaconda3/bin:/usr/local/tools/java/current/bin:/usr/local/tools/scala/current/bin:/usr/local/tools/hadoop/current/bin:/usr/local/tools/spark/current/bin:/usr/local/tools/hive/current/bin:/usr/local/tools/zookeeper/current/bin:/usr/local/tools/flume/current/bin:/usr/local/tools/flink/current/bin:/usr/local/tools/node/current/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/liuxiaowei/.local/bin:/home/liuxiaowei/bin)
Logging initialized using configuration in file:/usr/local/tools/hive/apache-hive-2.3.5-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
hive [-hiveconf x=y]* [<-i filename>]* [<-f filename>|<-e query-string>] [-S]
-i <filename>: initialize the session with HQL from a file
-e <query-string>: execute the given HQL from the command line
-f <filename>: execute an HQL script file
-v: echo the executed HQL statements to the console
-p <port>: connect to Hive Server on the given port number
-hiveconf x=y: set Hive/Hadoop configuration variables
-S: silent mode, run without printing logs (see the sketch below)
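For example, -S pairs well with -e when you want clean output to redirect or pipe, and -i runs a file of HQL before the interactive session starts. A minimal sketch; init.hql and result.txt are hypothetical names:
# Silent mode: suppress log output so only the query result reaches stdout,
# making it safe to redirect or pipe.
hive -S -e "select * from dw.ods_sale_order_producttion_amount;" > result.txt
# Initialize the session from a file (e.g. session-level set statements),
# then drop into the interactive CLI.
hive -i init.hql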
To execute a HiveQL statement directly, use -e, as follows:
[hadoop@dw-test-cluster-001 ~]$ hive -e "select * from dw.ods_sale_order_producttion_amount;"
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
202005 20200501 199000.00 2020 30 20200731 00
202005 20200502 185000.00 2020 30 20200731 00
202005 20200503 199000.00 2020 30 20200731 00
202005 20200504 138500.00 2020 30 20200731 00
202005 20200505 196540.00 2020 30 20200731 00
202005 20200506 138500.00 2020 30 20200731 00
202005 20200507 159840.00 2020 30 20200731 00
202005 20200508 189462.00 2020 30 20200731 00
202005 20200509 200000.00 2020 30 20200731 00
202005 20200510 198540.00 2020 30 20200731 00
202006 20200601 189000.00 2020 30 20200731 00
202006 20200602 185000.00 2020 30 20200731 00
202006 20200603 189000.00 2020 30 20200731 00
202006 20200604 158500.00 2020 30 20200731 00
202006 20200605 200140.00 2020 30 20200731 00
202006 20200606 158500.00 2020 30 20200731 00
202006 20200607 198420.00 2020 30 20200731 00
202006 20200608 158500.00 2020 30 20200731 00
202006 20200609 200100.00 2020 30 20200731 00
202006 20200610 135480.00 2020 30 20200731 00
Time taken: 4.23 seconds, Fetched: 20 row(s)
[hadoop@dw-test-cluster-001 ~]$
To execute HiveQL stored in a file, write the query from the previous hive -e example, select * from dw.ods_sale_order_producttion_amount;, into a file named ods_sale_order_producttion_amount.sql and run it with -f.
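A minimal sketch of creating that file from the shell, using a heredoc (any editor works just as well):
# Write the query into the script file that -f will execute.
cat > ods_sale_order_producttion_amount.sql <<'EOF'
select * from dw.ods_sale_order_producttion_amount;
EOF
With the file in place, the -f run produces the same result as -e: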
[hadoop@dw-test-cluster-001 ~]$ hive -f ods_sale_order_producttion_amount.sql
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
202005 20200501 199000.00 2020 30 20200731 00
202005 20200502 185000.00 2020 30 20200731 00
202005 20200503 199000.00 2020 30 20200731 00
202005 20200504 138500.00 2020 30 20200731 00
202005 20200505 196540.00 2020 30 20200731 00
202005 20200506 138500.00 2020 30 20200731 00
202005 20200507 159840.00 2020 30 20200731 00
202005 20200508 189462.00 2020 30 20200731 00
202005 20200509 200000.00 2020 30 20200731 00
202005 20200510 198540.00 2020 30 20200731 00
202006 20200601 189000.00 2020 30 20200731 00
202006 20200602 185000.00 2020 30 20200731 00
202006 20200603 189000.00 2020 30 20200731 00
202006 20200604 158500.00 2020 30 20200731 00
202006 20200605 200140.00 2020 30 20200731 00
202006 20200606 158500.00 2020 30 20200731 00
202006 20200607 198420.00 2020 30 20200731 00
202006 20200608 158500.00 2020 30 20200731 00
202006 20200609 200100.00 2020 30 20200731 00
202006 20200610 135480.00 2020 30 20200731 00
Time taken: 4.23 seconds, Fetched: 20 row(s)
[hadoop@dw-test-cluster-001 ~]$
Provided that Hive has been configured for JDBC connections and the JDBC service (HiveServer2) has been started, you can connect over JDBC from the shell in two ways.
Method 1: start beeline and connect interactively:
beeline
!connect jdbc:hive2://dw-test-cluster-007:10000
Then enter the username and password when prompted.
Method 2: pass the connection details on the command line:
beeline -u "jdbc:hive2://dw-test-cluster-007:10000" -n hadoop hadoop
-u: the JDBC connection URL of HiveServer2
-n: the username; the trailing argument here is the password (it can also be passed explicitly with -p)
A JDBC connection from the shell looks like this:
[hadoop@node1 apache-hive-2.3.5-bin]$ beeline -u "jdbc:hive2://node1:10000" -n hadoop hadoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/tools/apache-hive-2.3.5-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/tools/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://node1:10000
Connected to: Apache Hive (version 2.3.5)
Driver: Hive JDBC (version 2.3.5)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.5 by Apache Hive
0: jdbc:hive2://node1:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
| dw |
+----------------+
2 rows selected (5.817 seconds)
0: jdbc:hive2://node1:10000>
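Beeline can also run statements non-interactively, which is convenient in scripts: like the Hive CLI, it accepts -e and -f. A minimal sketch reusing the connection above:
# Execute a single statement over JDBC and exit.
beeline -u "jdbc:hive2://node1:10000" -n hadoop -p hadoop -e "show databases;"
# Execute an HQL script file over JDBC.
beeline -u "jdbc:hive2://node1:10000" -n hadoop -p hadoop -f ods_sale_order_producttion_amount.sql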
Once Hive has started correctly, jps shows two RunJar processes, corresponding to Hive's own service (the metastore) and Hive's JDBC service (HiveServer2). To restart Hive, kill these two RunJar processes and rerun the startup script below. Note that the configuration seen by JDBC connections to Hive is cold-loaded: if you change an HDFS or Hive setting, a JDBC connection to Hive will not necessarily pick up the new configuration and may even report errors. For example, if you change the HDFS block size while querying a Hive table over JDBC, the query may throw an exception; in that case, restart Hive's JDBC service. jps output looks like this:
[hadoop@node1 apache-hive-2.3.5-bin]$ jps
8195 DFSZKFailoverController
15686 Jps
7607 NameNode
15303 RunJar
6408 QuorumPeerMain
15304 RunJar
You can create a shell script named start_hive.sh in your Hive directory, e.g. vim /data/tools/apache-hive-2.3.5-bin/start_hive.sh, put the following content in it, then save and exit:
#!/usr/bin/env bash
# Start the Hive metastore service in the background
nohup hive --service metastore >> /data/logs/hive/meta.log 2>&1 &
# Start Hive's JDBC service (HiveServer2) in the background
nohup hive --service hiveserver2 >> /data/logs/hive/hiveserver2.log 2>&1 &
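To go with the restart note above, a hypothetical companion script stop_hive.sh might look like the following; it assumes the metastore and HiveServer2 are the only RunJar processes on this node:
#!/usr/bin/env bash
# Hypothetical stop script: kill every RunJar process reported by jps,
# assuming the Hive metastore and HiveServer2 are the only RunJar
# processes running on this node.
for pid in $(jps | awk '/RunJar/ {print $1}'); do
  kill "$pid"
done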
These are some of the shell commands most commonly used when working with Hive.
In practice, the most reliable and stable environment is still the Hive CLI, and one of its most frequently used features is Hive parameter configuration, which not only makes it easier to read query results but also serves as a tuning tool. Hive parameters can be configured in roughly three ways.
1. Configuration files. Because Hive starts as a Hadoop client, it also reads Hadoop's configuration, and Hive's settings override Hadoop's; settings made in configuration files apply to all Hive processes started on the machine. The configuration files are:
hive-site.xml (user-defined Hive configuration, which overrides the defaults in hive-default.xml)
Hadoop's configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, etc., which override the defaults in core-default.xml, hdfs-default.xml, mapred-default.xml, etc.)
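For instance, a minimal sketch of an override in hive-site.xml: hive.cli.print.header defaults to false in hive-default.xml, and setting it here applies to every Hive process started on this machine:
<!-- Override the hive-default.xml default for all local Hive processes -->
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>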
2. Command-line parameters. When starting Hive (in client or server mode), you can set parameters by adding -hiveconf param=value on the command line, for example enabling header output when starting the Hive CLI:
[hadoop@shucang-10 ~]$ hive -hiveconf hive.cli.print.header=true
3. Parameter declarations. After entering the Hive CLI, set properties with set hive.cli.print.header=true;, as follows:
[hadoop@dw-test-cluster-007 ~]$ hive
Logging initialized using configuration in file:/usr/local/tools/hive/apache-hive-2.3.5-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> use dw;
OK
Time taken: 5.351 seconds
hive> set hive.cli.print.header=true;
hive>
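With the header enabled, result sets are printed with their column names; a tiny illustration using a constant select:
hive> select 1 as one;
OK
one
1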
These settings have a priority order, from highest to lowest:
1. SET declarations (effective for the current Hive session);
2. -hiveconf parameters (effective for that Hive instance);
3. hive-site.xml (which overrides hive-default.xml);
4. Hadoop's configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, etc., which override core-default.xml, hdfs-default.xml, mapred-default.xml, etc.).
For the specific parameter values and their meanings, see the official wiki: Hive Configuration Properties.
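A quick sketch of the priority in action: start the CLI with -hiveconf hive.cli.print.header=false, then override it with SET; querying the property with a bare set shows that the session-level value wins:
hive -hiveconf hive.cli.print.header=false
hive> set hive.cli.print.header;
hive.cli.print.header=false
hive> set hive.cli.print.header=true;
hive> set hive.cli.print.header;
hive.cli.print.header=true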