Fun with Big Data #17: Installing Hive (Hive 3.1.2)

I. Introduction


Hive is a data warehouse tool built on top of Hadoop for data extraction, transformation, and loading; it is a mechanism for storing, querying, and analyzing large-scale data kept in Hadoop. Hive maps structured data files onto database tables and provides SQL query capability, translating SQL statements into MapReduce jobs for execution. Its advantage is the low learning curve: SQL-like statements give you quick MapReduce-based statistics without writing dedicated MapReduce programs, which makes Hive a good fit for the statistical analysis of a data warehouse. (Adapted from Baidu Baike.)
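To make that concrete, here is a minimal sketch (the table name, column layout, and file path are made up for illustration): a delimited text file is mapped onto a table, and a SQL query over it is compiled into a MapReduce job behind the scenes. Naturally, none of this works until the installation below is finished.

        hive -e "CREATE TABLE demo_logs (ts STRING, msg STRING)
                 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"
        hive -e "LOAD DATA LOCAL INPATH '/tmp/logs.txt' INTO TABLE demo_logs;"
        hive -e "SELECT COUNT(*) FROM demo_logs;"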

II. Download


Download site: http://hive.apache.org/

The download yields: apache-hive-3.1.2-bin.tar.gz

[Screenshot: Hive home page]

As the downloads page shows, 3.1.2 is compatible with Hadoop 3.x.x.

[Screenshot: downloads page]

Pick the Tsinghua mirror, select 3.1.2, and choose the bin tarball.

III. Installation


        tar zxvf apache-hive-3.1.2-bin.tar.gz -C /mylab/soft

IV. Configuration

    This part of the setup is the most troublesome, dear reader, so please be patient, be patient, and be patient some more.


    1. Set environment variables

            Edit ~/.bashrc

                    vi ~/.bashrc

                        #hive 3.1.2

                        export HIVE_HOME=/mylab/soft/apache-hive-3.1.2-bin

                        export PATH=$PATH:$HIVE_HOME/bin                        

                        export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib

                    source ~/.bashrc
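            A quick sanity check that the variables took effect (hive --version only works once the tarball from section III is unpacked):

                    echo $HIVE_HOME
                    which hive
                    hive --version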

    2. Configure MySQL for Hive

            Create a new user

                    CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';

            Create the database

                    create database hive;

            List all users

                    SELECT user, host FROM mysql.user;

            Grant privileges

                    GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' with grant option;

            Reload the privilege tables

                    flush privileges;

            Show the account's privileges

                    SHOW GRANTS FOR 'hive'@'%';

            Test the login

                    mysql -u hive -p        (enter the password hive when prompted)
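            If you would rather script the whole MySQL setup than type it interactively, a sketch like this should be equivalent (assuming you can reach the server as root via sudo):

sudo mysql <<'EOF'
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
CREATE DATABASE hive;
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
EOF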

    3. Modify the Hive configuration files

        a) Create working directories

Note: these are directories that the configuration file will reference. Some of them I honestly cannot explain; wherever the configuration wants a directory, I simply give it one, place everything under $HIVE_HOME/working, and grant read/write permissions on $HIVE_HOME/working and all of its subdirectories. (A loop version is sketched right after the commands below.)

mkdir -p $HIVE_HOME/working/exec.scratchdir

mkdir -p $HIVE_HOME/working/repl.rootdir

mkdir -p $HIVE_HOME/working/repl.cmrootdir

mkdir -p $HIVE_HOME/working/replica.functions.root

mkdir -p $HIVE_HOME/working/local.scratchdir

mkdir -p $HIVE_HOME/working/downloaded.resources

mkdir -p $HIVE_HOME/working/metastore.warehouse


mkdir -p $HIVE_HOME/working/operation.log

mkdir -p $HIVE_HOME/working/querylog

mkdir -p $HIVE_HOME/working/logs

chmod 777 -R $HIVE_HOME/working
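The same thing as a loop, if you would rather not repeat yourself (the list is exactly the directories above):

for d in exec.scratchdir repl.rootdir repl.cmrootdir replica.functions.root \
         local.scratchdir downloaded.resources metastore.warehouse \
         operation.log querylog logs; do
    mkdir -p "$HIVE_HOME/working/$d"
done
chmod -R 777 "$HIVE_HOME/working"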

   b) Copy mysql-connector-java

        To keep things simple, copy mysql-connector-java-8.0.21.jar straight into $HIVE_HOME/lib:

        cp /usr/share/java/mysql-connector-java-8.0.21.jar $HIVE_HOME/lib

        or

        ln -s /usr/share/java/mysql-connector-java-8.0.21.jar $HIVE_HOME/lib/mysql-connector-java-8.0.21.jar

        If you are not sure which directory the jar lives in, locate it with find:

         sudo find / -name mysql-connector-java-8.0.21.jar

   c) Modify hive-env.sh

        This file has to be copied from its template:

        cp $HIVE_HOME/conf/hive-env.sh.template  $HIVE_HOME/conf/hive-env.sh

        vi $HIVE_HOME/conf/hive-env.sh

            HADOOP_HOME=/mylab/soft/hadoop-3.2.1

            export HIVE_CONF_DIR=/mylab/soft/apache-hive-3.1.2-bin/conf

            export HIVE_AUX_JARS_PATH=/mylab/soft/apache-hive-3.1.2-bin/lib

   d) Modify hive-site.xml

            This one also has to be copied from a template (whose name, hive-default.xml.template, is a bit of a trap):

            cp $HIVE_HOME/conf/hive-default.xml.template   $HIVE_HOME/conf/hive-site.xml

            This configuration file is fairly long, so see hive-site.xml in the appendix.

   e) Remove a special character from hive-site.xml

    The offender is the description of hive.txn.xlock.iow in the stock template, where an invisible character is fused into the text right after the word "for":

        <property>
            <name>hive.txn.xlock.iow</name>
            <value>true</value>
            <description>
              Ensures commands with OVERWRITE (such as INSERT OVERWRITE) acquire Exclusive locks for&#8;transactional tables.  This ensures that inserts (w/o overwrite) running concurrently
              are not hidden by the INSERT OVERWRITE.
            </description>
        </property>

    Delete that "for" together with the invisible &#8; character reference after it; the stray reference is what makes the XML unparseable.
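    If you would rather not hunt for the character by hand, something like this should locate and strip it (assuming, as in the stock 3.1.2 template, that it appears as the literal character reference &#8;):

        grep -n '&#8;' $HIVE_HOME/conf/hive-site.xml
        sed -i 's/&#8;//g' $HIVE_HOME/conf/hive-site.xml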

        f) Fix a bug in the hive launcher script

            Copy the file $HIVE_HOME/bin/hive into the shared folder, open it with Notepad on Windows, and replace every "file://" with "file:///" (three forward slashes; how does an error like this even happen?).

            Once done, put the file back under $HIVE_HOME/bin. (A sed alternative that avoids the Windows detour is sketched below.)
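            The Windows round trip can be skipped with a sed one-liner that performs the same blind replacement in place (keep a backup first, since it rewrites every occurrence exactly like the Notepad edit):

                cp $HIVE_HOME/bin/hive $HIVE_HOME/bin/hive.bak
                sed -i 's|file://|file:///|g' $HIVE_HOME/bin/hive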

        g) Resolve the guava version conflict

                ls /mylab/soft/hadoop-3.2.1/share/hadoop/common/lib/guava*

                        guava-27.0-jre.jar

                ls /mylab/soft/apache-hive-3.1.2-bin/lib/guava*

                        guava-19.0.jar

                Delete the lower version under Hive and copy the higher version from Hadoop into $HIVE_HOME/lib:

                    rm /mylab/soft/apache-hive-3.1.2-bin/lib/guava-19.0.jar

                    cp /mylab/soft/hadoop-3.2.1/share/hadoop/common/lib/guava-27.0-jre.jar /mylab/soft/apache-hive-3.1.2-bin/lib

          Alternatively, fetch the latest guava from https://mvnrepository.com/artifact/com.google.guava/guava and use that as the replacement.
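          Either way, verify afterwards that exactly one guava jar remains under Hive's lib directory:

                ls $HIVE_HOME/lib/guava*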

        h) Create hive-log4j2.properties

 cp $HIVE_HOME/conf/hive-log4j2.properties.template   $HIVE_HOME/conf/hive-log4j2.properties

         i) Create hive-exec-log4j2.properties

cp $HIVE_HOME/conf/hive-exec-log4j2.properties.template $HIVE_HOME/conf/hive-exec-log4j2.properties


V. Verification


        Finally, time to take it for a spin.

        Initialize the metastore database

                schematool -dbType mysql -initSchema
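        To confirm the initialization took, schematool can report back what it finds in MySQL:

                schematool -dbType mysql -info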

        Start the metastore service

                hive --service metastore > $HIVE_HOME/working/logs/metastore.log 2>&1 &

        Start the hiveserver2 service

                hive --service hiveserver2 > $HIVE_HOME/working/logs/hiveserver2.log 2>&1 &

        Or run them in interactive (foreground) mode

                hive --service metastore

                hive --service hiveserver2

                jps

        Check the Hive processes

            Plain version

                    ps -ef | grep hive 

            Condensed version (the command is convoluted but the result is lean; output: Linux user name and process ID)

                    ps -ef | grep hive | grep -v grep | awk '{print $1 " " $2}'

        Start a client

                Method 1

                        hive

                Method 2

                                beeline -u jdbc:hive2://master:10000 -n root

                                beeline

                                        !connect jdbc:hive2://master:10000 root        (inside beeline, !connect takes the user as a plain argument, not -n)

                                Exit beeline

                                        !exit

                Check hiveserver2 (port 10000 by default)

                                netstat -tulnp | grep 10000
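                For a quick end-to-end smoke test without entering an interactive shell, beeline's -e flag runs a single statement and exits (host and user as above):

                                beeline -u jdbc:hive2://master:10000 -n root -e "show databases;"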

                For more on using beeline, see:

                https://www.cnblogs.com/lenmom/p/11218807.html

                https://www.cnblogs.com/xinfang520/p/7684598.html

        Stop the Hive services

            There is no stop command; you can only kill them.

            Plain version

                        ps -ef | grep hive

                        kill -9 <pid from the ps output above>

            Condensed version (a convoluted command, but one stroke finishes the job)

                        ps -ef | grep hive | grep -v grep | awk '{print "kill -9 " $2}' | sh        
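            A slightly less surgical alternative is pkill, which matches on the full command line; the metastore and HiveServer2 JVMs carry their main class names there (use with care, since it kills anything matching the pattern):

                        pkill -f HiveMetaStore
                        pkill -f HiveServer2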

VI. Appendix


    1. hive-site.xml

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://master:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>

    <property>
        <name>hive.exec.scratchdir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/exec.scratchdir</value>
        <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
    </property>

    <property>
        <name>hive.repl.rootdir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.rootdir</value>
        <description>HDFS root dir for all replication dumps.</description>
    </property>

    <property>
        <name>hive.repl.cmrootdir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.cmrootdir</value>
        <description>Root dir for ChangeManager, used for deleted files.</description>
    </property>

    <property>
        <name>hive.repl.replica.functions.root.dir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/replica.functions.root</value>
        <description>Root directory on the replica warehouse where the repl sub-system will store jars from the primary warehouse</description>
    </property>

    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/local.scratchdir</value>
        <description>Local scratch space for Hive jobs</description>
    </property>

    <property>
        <name>hive.downloaded.resources.dir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/downloaded.resources/${hive.session.id}_resources</value>
        <description>Temporary local directory for added resources in the remote file system.</description>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/metastore.warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>

    <property>
        <name>hive.aux.jars.path</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/lib</value>
        <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
    </property>

    <property>
        <name>hive.querylog.location</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/querylog</value>
        <description>Location of Hive run time structured log file</description>
    </property>

    <property>
        <name>hive.server2.logging.operation.log.location</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/operation.log</value>
        <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
        <description>
          JDBC connect string for a JDBC metastore.
          To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
          For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username to use against metastore database</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password to use against metastore database</description>
    </property>

    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>master</value>
        <description>Bind host on which to run the HiveServer2 Thrift service.</description>
    </property>

    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
        <description>TCP port number to listen on, default 10000</description>
    </property>

    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>

    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>


