I. Hive Environment Setup
1) Upload apache-hive-3.1.2-bin.tar.gz to the /opt/software directory on the Linux host
2) Extract apache-hive-3.1.2-bin.tar.gz into the /opt/module/ directory
[yili@hadoop102 software]$ tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/
3) Rename the apache-hive-3.1.2-bin directory to hive-3.1.2 (so it matches HIVE_HOME below)
[yili@hadoop102 software]$ mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive-3.1.2
4) Edit /etc/profile.d/my_env.sh and add the environment variables
[yili@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh
Add the following:
#HIVE_HOME
export HIVE_HOME=/opt/module/hive-3.1.2
export PATH=$PATH:$HIVE_HOME/bin
Restart the terminal session (e.g. reopen the Xshell window) or source /etc/profile.d/my_env.sh so the variables take effect
[yili@hadoop102 software]$ source /etc/profile.d/my_env.sh
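After sourcing, it is worth confirming the variables are actually visible in the shell. A minimal sketch of that check, run against a scratch copy of the file (the /tmp path is illustrative):

```shell
# Write a scratch copy of the environment file (illustrative path).
cat > /tmp/my_env_check.sh <<'EOF'
#HIVE_HOME
export HIVE_HOME=/opt/module/hive-3.1.2
export PATH=$PATH:$HIVE_HOME/bin
EOF

# Source it, then confirm the variables are visible in this shell.
. /tmp/my_env_check.sh
echo "$HIVE_HOME"
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) echo "hive bin is on PATH" ;;
esac
```

If `echo $HIVE_HOME` prints nothing in a new session, the file was not sourced or the shell was not restarted.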
5) Resolve the logging jar conflict: go to the /opt/module/hive-3.1.2/lib directory
[yili@hadoop102 lib]$ mv log4j-slf4j-impl-2.10.0.jar log4j-slf4j-impl-2.10.0.jar.bak
Copy the MySQL JDBC driver into Hive's lib directory
[yili@hadoop102 lib]$ cp /opt/software/mysql-connector-java-5.1.27-bin.jar /opt/module/hive-3.1.2/lib/
(2) Configure MySQL as the metadata store
Create a new hive-site.xml file in the $HIVE_HOME/conf directory
[yili@hadoop102 conf]$ vim hive-site.xml
Add the following content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>
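One detail worth calling out: hive-site.xml is XML, so the `&` separators inside the JDBC URL must be escaped as `&amp;`, or the file will fail to parse. A crude self-contained check of that escaping, sketched against a scratch file (the /tmp path is just for illustration):

```shell
# Write only the ConnectionURL property to a scratch file.
cat > /tmp/hive-site-check.xml <<'EOF'
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
EOF

# A raw '&' would break the XML parse; this crude grep flags any '&'
# that is not immediately followed by 'a' (i.e. not the start of '&amp;').
if grep -q '&amp;' /tmp/hive-site-check.xml && ! grep -qE '&[^a]' /tmp/hive-site-check.xml; then
    echo "URL separators properly escaped"
fi
```

The grep is deliberately rough; a real validation would run the file through an XML parser, but this catches the common copy-paste mistake of leaving bare `&` in the URL.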
1) Log in to MySQL
[yili@hadoop102 conf]$ mysql -uroot -p123456
2) Create the Hive metastore database
mysql> create database metastore;
mysql> quit;
3) Initialize the Hive metastore schema
[yili@hadoop102 conf]$ schematool -initSchema -dbType mysql -verbose
(3) Start the Hive client
1) Launch the Hive CLI
[yili@hadoop102 hive]$ bin/hive
2) List the databases
hive (default)> show databases;
OK
database_name
default
II. Configuring Tez as the Execution Engine
1) Create the Tez directory
mkdir /opt/module/tez
2) Extract the minimal Tez package into it
tar -zxvf /opt/software/tez-0.10.1-SNAPSHOT-minimal.tar.gz -C /opt/module/tez
3) Upload the Tez dependencies to HDFS (upload the full package, i.e. the one without "minimal")
hadoop fs -mkdir /tez    (create the /tez path on the cluster first, then upload; mind the paths)
hadoop fs -put /opt/software/tez-0.10.1-SNAPSHOT.tar.gz /tez
4) Create tez-site.xml under $HADOOP_HOME/etc/hadoop/ (note: do not put it under hive/conf/, it will not take effect there), and remember to sync tez-site.xml to the other machines in the cluster.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Check that your paths and file names match your own setup -->
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/tez/tez-0.10.1-SNAPSHOT.tar.gz</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.am.resource.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>tez.am.resource.cpu.vcores</name>
        <value>1</value>
    </property>
    <property>
        <name>tez.container.max.java.heap.fraction</name>
        <value>0.4</value>
    </property>
    <property>
        <name>tez.task.resource.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>tez.task.resource.cpu.vcores</name>
        <value>1</value>
    </property>
</configuration>
5) Modify the Hadoop environment: create a shell profile with the following content
cd /opt/module/hadoop-3.1.3/etc/hadoop/shellprofile.d
vi example.sh
hadoop_add_profile tez
function _tez_hadoop_classpath
{
hadoop_add_classpath "$HADOOP_HOME/etc/hadoop" after
hadoop_add_classpath "/opt/module/tez/*" after
hadoop_add_classpath "/opt/module/tez/lib/*" after
}
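For reference, here is how that profile hook behaves, sketched outside a real cluster by stubbing hadoop_add_classpath (the real function lives in Hadoop's hadoop-functions.sh; the stub below only imitates the append behavior and ignores the "after" ordering argument):

```shell
# Stub of hadoop_add_classpath: just append $1 to CLASSPATH.
# (The real implementation also handles before/after ordering and dedup.)
hadoop_add_classpath() {
  CLASSPATH="${CLASSPATH:+$CLASSPATH:}$1"
}

CLASSPATH=""
HADOOP_HOME=/opt/module/hadoop-3.1.3

# The profile function from example.sh, in POSIX function syntax.
_tez_hadoop_classpath() {
  hadoop_add_classpath "$HADOOP_HOME/etc/hadoop" after
  hadoop_add_classpath "/opt/module/tez/*" after
  hadoop_add_classpath "/opt/module/tez/lib/*" after
}

# hadoop_add_profile registers the hook; the hadoop launcher scripts
# then invoke it, which is simulated by calling it directly here.
_tez_hadoop_classpath
echo "$CLASSPATH"
```

This shows why the globs are quoted in the profile: they are passed through literally and expanded by the JVM's classpath handling, not by the shell.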
6) Switch Hive's execution engine: vim $HIVE_HOME/conf/hive-site.xml and add the following
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
<property>
    <name>hive.tez.container.size</name>
    <value>1024</value>
</property>
7) Add the Tez paths in hive-env.sh
export TEZ_HOME=/opt/module/tez    # your Tez extraction directory
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar$TEZ_JARS
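To see what those loops actually build, here is the same logic run against a scratch directory that imitates the Tez layout (the directory and jar names below are made up for the demo):

```shell
# Scratch directory imitating the Tez install layout (names are illustrative).
rm -rf /tmp/tez-demo
TEZ_HOME=/tmp/tez-demo
mkdir -p "$TEZ_HOME/lib"
touch "$TEZ_HOME/tez-api-0.10.1.jar" "$TEZ_HOME/lib/slf4j-api-1.7.30.jar"

# The same loops as in hive-env.sh: collect top-level jars, then lib/ jars.
TEZ_JARS=""
for jar in `ls $TEZ_HOME | grep jar`; do
  TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
  TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

# Every entry is prefixed with ':', which is why hive-env.sh can append
# $TEZ_JARS directly after the lzo jar with no extra separator.
echo "$TEZ_JARS"
```

Note that `grep jar` in the first loop also filters out the `lib` directory itself, since its name does not contain "jar".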
8) Resolve the logging jar conflict
rm /opt/module/tez/lib/slf4j-log4j12-1.7.10.jar
1) Start Hive
[yili@hadoop102 hive]$ hive
2) Create a table
hive (default)> create table student(
id int,
name string);
3) Insert a row into the table
hive> insert into table student values(3,'guodong');
4) If no error is reported, the setup succeeded
Query ID = yili_20220721144352_ebaa6f16-71e2-4bd5-bfcf-b611af819ebb
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1658371098258_0011)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 17.20 s
----------------------------------------------------------------------------------------------
Loading data to table default.student
OK
Time taken: 31.1 seconds
hive> select * from student;
OK
1 zhangsan
2 linxin
3 guodong
In practice, however, things rarely go this smoothly: running the hive command and inserting data each hit an error. The two errors are described below.
Error 1:
Description: roughly, Hive fails while loading during startup because a required dependency is missing.
Fix: download commons-collections4-4.1.jar and place it in the $HIVE_HOME/lib directory.
Error 2:
Description: the error above is reported when inserting data into a Hive table.
Fix: run a LOAD DATA LOCAL INPATH ... INTO TABLE statement once to load data from the local machine into the table; after that, the error no longer appears on subsequent inserts.