Hive from Beginner to Giving Up: Hive Shell Operations and Parameter Configuration (Part 17)

Background

  In practice, working with Hive involves a number of shell-related commands; this article lists some of the common ones.

Starting the Hive CLI

  The most common command is the one that enters the Hive CLI. Once Hive's installation directory is on the PATH in Linux, simply run hive:

[hadoop@dw-test-cluster-007 ]$ hive
which: no hbase in (/usr/local/tools/anaconda3/bin:/usr/local/tools/anaconda3/condabin:/usr/local/tools/azkaban/azkaban-exec-server/bin:/usr/local/tools/azkaban/azkaban-web-server/bin:/usr/local/tools/anaconda3/bin:/usr/local/tools/java/current/bin:/usr/local/tools/scala/current/bin:/usr/local/tools/hadoop/current/bin:/usr/local/tools/spark/current/bin:/usr/local/tools/hive/current/bin:/usr/local/tools/zookeeper/current/bin:/usr/local/tools/flume/current/bin:/usr/local/tools/flink/current/bin:/usr/local/tools/node/current/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/liuxiaowei/.local/bin:/home/liuxiaowei/bin)

Logging initialized using configuration in file:/usr/local/tools/hive/apache-hive-2.3.5-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>

Hive CLI Options

The hive command with options

hive [-hiveconf x=y]* [<-i filename>]* [<-f filename>|<-e query-string>] [-S]

-i: run an initialization file of HQL statements first
-e: execute the given HQL string from the command line
-f: execute an HQL script file
-v: echo the executed HQL statements to the console (verbose)
-p: connect to Hive Server on the given port number
-hiveconf x=y: set Hive/Hadoop configuration variables for this invocation
-S: silent mode, run without printing log output
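The options above can be combined. As a minimal sketch, here is a small wrapper that runs a query silently with a per-invocation configuration override (the function name run_hql is our own, not part of Hive):

```shell
# Hypothetical helper: run one HiveQL string non-interactively.
# -S suppresses Hive's log output; -hiveconf applies only to this invocation.
run_hql() {
  hive -S -hiveconf hive.cli.print.header=true -e "$1"
}

# Example (requires a working Hive installation):
# run_hql "select * from dw.ods_sale_order_producttion_amount limit 5;"
```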

Examples of the hive command with options

  Execute a HiveQL statement directly, as follows:

[hadoop@dw-test-cluster-001 ~]$ hive -e "select * from dw.ods_sale_order_producttion_amount;"
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
202005  20200501        199000.00       2020    30      20200731        00
202005  20200502        185000.00       2020    30      20200731        00
202005  20200503        199000.00       2020    30      20200731        00
202005  20200504        138500.00       2020    30      20200731        00
202005  20200505        196540.00       2020    30      20200731        00
202005  20200506        138500.00       2020    30      20200731        00
202005  20200507        159840.00       2020    30      20200731        00
202005  20200508        189462.00       2020    30      20200731        00
202005  20200509        200000.00       2020    30      20200731        00
202005  20200510        198540.00       2020    30      20200731        00
202006  20200601        189000.00       2020    30      20200731        00
202006  20200602        185000.00       2020    30      20200731        00
202006  20200603        189000.00       2020    30      20200731        00
202006  20200604        158500.00       2020    30      20200731        00
202006  20200605        200140.00       2020    30      20200731        00
202006  20200606        158500.00       2020    30      20200731        00
202006  20200607        198420.00       2020    30      20200731        00
202006  20200608        158500.00       2020    30      20200731        00
202006  20200609        200100.00       2020    30      20200731        00
202006  20200610        135480.00       2020    30      20200731        00
Time taken: 4.23 seconds, Fetched: 20 row(s)
[hadoop@dw-test-cluster-001 ~]$ 

  To execute HiveQL statements stored in a file, write the query from hive -e "select * from dw.ods_sale_order_producttion_amount;" into a file named ods_sale_order_producttion_amount.sql, then run:

[hadoop@dw-test-cluster-001 ~]$ hive -f ods_sale_order_producttion_amount.sql
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
202005  20200501        199000.00       2020    30      20200731        00
202005  20200502        185000.00       2020    30      20200731        00
202005  20200503        199000.00       2020    30      20200731        00
202005  20200504        138500.00       2020    30      20200731        00
202005  20200505        196540.00       2020    30      20200731        00
202005  20200506        138500.00       2020    30      20200731        00
202005  20200507        159840.00       2020    30      20200731        00
202005  20200508        189462.00       2020    30      20200731        00
202005  20200509        200000.00       2020    30      20200731        00
202005  20200510        198540.00       2020    30      20200731        00
202006  20200601        189000.00       2020    30      20200731        00
202006  20200602        185000.00       2020    30      20200731        00
202006  20200603        189000.00       2020    30      20200731        00
202006  20200604        158500.00       2020    30      20200731        00
202006  20200605        200140.00       2020    30      20200731        00
202006  20200606        158500.00       2020    30      20200731        00
202006  20200607        198420.00       2020    30      20200731        00
202006  20200608        158500.00       2020    30      20200731        00
202006  20200609        200100.00       2020    30      20200731        00
202006  20200610        135480.00       2020    30      20200731        00
Time taken: 4.23 seconds, Fetched: 20 row(s)
[hadoop@dw-test-cluster-001 ~]$ 
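The -i option complements -e and -f: it runs an initialization file before the interactive session starts. A sketch (the file name /tmp/hive_init.hql is our own choice):

```shell
# Write a hypothetical initialization file: every statement in it runs
# before the interactive prompt appears.
cat > /tmp/hive_init.hql <<'EOF'
use dw;
set hive.cli.print.header=true;
EOF

# Then start the CLI with those settings already applied
# (requires a working Hive installation):
# hive -i /tmp/hive_init.hql
```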

Starting a JDBC Connection to Hive on Linux

  First make sure Hive is configured for JDBC connections and that the JDBC service (HiveServer2) has been started; then connect from the shell as follows:

Method 1:
beeline
!connect jdbc:hive2://dw-test-cluster-007:10000
then enter the username and password.
Method 2:
beeline -u "jdbc:hive2://dw-test-cluster-007:10000" -n hadoop hadoop
-u: the JDBC connection URL
-n: the username; the trailing argument (hadoop) is the password, which can also be passed with -p

  A JDBC connection from the shell looks like this:

[hadoop@node1 apache-hive-2.3.5-bin]$  beeline -u "jdbc:hive2://node1:10000" -n hadoop hadoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/tools/apache-hive-2.3.5-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/tools/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://node1:10000
Connected to: Apache Hive (version 2.3.5)
Driver: Hive JDBC (version 2.3.5)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.5 by Apache Hive
0: jdbc:hive2://node1:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| dw             |
+----------------+
2 rows selected (5.817 seconds)
0: jdbc:hive2://node1:10000>
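Beeline can also run statements non-interactively with -e or -f, mirroring the hive command. A sketch using the URL and credentials from the session above (the function name beeline_hql is our own):

```shell
# Hypothetical helper: run one HiveQL string over JDBC via Beeline.
beeline_hql() {
  beeline -u "jdbc:hive2://node1:10000" -n hadoop -p hadoop -e "$1"
}

# Example (requires a running HiveServer2):
# beeline_hql "show databases;"
```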

Restarting Hive

  After a successful start, jps shows two RunJar processes: one for Hive's metastore service and one for its JDBC service (HiveServer2). To restart Hive, kill these two RunJar processes and then run the start script again. Note that a JDBC connection loads Hive's configuration cold: if you change an HDFS or Hive setting, a session connected over JDBC may not pick up the new configuration and may even raise errors. For example, if you change the HDFS block size while a JDBC client is querying a Hive table, the query may fail with an exception; in that case, restart Hive's JDBC service. The jps output looks like this:

[hadoop@node1 apache-hive-2.3.5-bin]$ jps
8195 DFSZKFailoverController
15686 Jps
7607 NameNode
15303 RunJar
6408 QuorumPeerMain
15304 RunJar

  You can create a shell script named start_hive.sh in your Hive directory, e.g. vim /data/tools/apache-hive-2.3.5-bin/start_hive.sh, put the following content in it, then save and exit:

#!/usr/bin/env bash

# Start the Hive metastore service in the background
nohup hive --service metastore >> /data/logs/hive/meta.log 2>&1 &

# Start Hive's JDBC service (HiveServer2) in the background
nohup hive --service hiveserver2 >> /data/logs/hive/hiveserver2.log 2>&1 &
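The kill step described above can be sketched as a companion stop script (hypothetical; the hive_pids helper is our own name, not part of Hive):

```shell
#!/usr/bin/env bash
# Find the PIDs of Hive's RunJar processes in the jps output.
hive_pids() {
  jps 2>/dev/null | awk '$2 == "RunJar" {print $1}'
}

# Kill each RunJar process; afterwards, run start_hive.sh again.
for pid in $(hive_pids); do
  echo "stopping Hive process $pid"
  kill "$pid"
done
```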

  These are some of the shell commands commonly used with Hive.

Hive Parameter Configuration

  In practice, the most reliable and stable environment is still the Hive CLI, and one of its most useful features is configuring Hive parameters. This not only makes query results easier to read but also helps with tuning. Hive parameters can be set in roughly three ways:

  • via configuration files
  • via command-line options (effective for that Hive instance)
  • via parameter declarations (effective for that Hive session)

Via configuration files

  Hive also reads Hadoop's configuration, because Hive starts as a Hadoop client, and Hive's settings override Hadoop's. Settings in configuration files apply to every Hive process started on the machine. The configuration files are typically:

  1. hive-site.xml
  2. hive-default.xml
  3. Hadoop's user-defined configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, etc.)
  4. Hadoop's default configuration files (core-default.xml, hdfs-default.xml, mapred-default.xml, etc.)
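For example, a property placed in hive-site.xml (illustrative value shown) applies to every Hive process started on that machine:

```xml
<!-- Example hive-site.xml entry (illustrative): print column headers
     in the CLI for all Hive processes on this machine. -->
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>
```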

Via command-line options (effective for that Hive instance)

  When starting Hive (as a client or in server mode), you can add -hiveconf param=value on the command line to set parameters, e.g. enabling header output when starting the Hive CLI:

[hadoop@shucang-10 ~]$  hive -hiveconf hive.cli.print.header=true

Parameter declarations (effective for that Hive session)

  After entering the Hive CLI, use set hive.cli.print.header=true; to set a property, as follows:

[hadoop@dw-test-cluster-007 ~]$ hive
Logging initialized using configuration in file:/usr/local/tools/hive/apache-hive-2.3.5-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> use dw;
OK
Time taken: 5.351 seconds
hive> set hive.cli.print.header=true;
hive>

  These methods follow a precedence order, from highest to lowest:

  1. SET declarations (effective for the Hive session);
  2. command-line -hiveconf options (effective for the Hive instance);
  3. Hive's user-defined configuration file, hive-site.xml;
  4. Hive's default configuration file, hive-default.xml;
  5. Hadoop's user-defined configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, etc.);
  6. Hadoop's default configuration files (core-default.xml, hdfs-default.xml, mapred-default.xml, etc.).
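To check which value actually won, run set with just the parameter name inside the CLI; Hive prints the effective value (false here is Hive's default when nothing has changed it):

```
hive> set hive.cli.print.header;
hive.cli.print.header=false
```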

  For the specific parameters and their meanings, see the official wiki page Hive Configuration Properties.
