This article documents the deployment of a 3-node distributed Hadoop cluster on CentOS 7.4, with 1 NameNode and 2 DataNodes, and then the installation of Hive on top of that Hadoop environment.
Hadoop is an open-source distributed computing platform under the Apache Software Foundation for storing and processing massive amounts of data. Built around the Hadoop Distributed File System (HDFS) and MapReduce, it gives users a distributed infrastructure that hides the low-level details of the system.
Hive is a data warehouse solution built on Hadoop. It maps structured data files to database tables and provides an SQL-like query language (HQL). Its design goal is to bring traditional SQL-style data manipulation to Hadoop, so that developers who know SQL can move to the Hadoop platform easily.
Create a hadoop user on CentOS 7. The official recommendation is to run HDFS, MapReduce, and YARN under separate users; in this article everything is installed under a single hadoop user. First add the hadoop user and, for convenience of deployment, grant it administrator privileges:
[root@localhost ~]# groupadd hadoop
[root@localhost ~]# useradd -m hadoop -G hadoop -s /bin/bash
[root@localhost ~]# passwd hadoop
[root@localhost ~]# visudo
After running visudo, find the line root ALL=(ALL) ALL (it should be somewhere between lines 90 and 100; in vi command mode, type :set nu to show line numbers, and typing e.g. :92 followed by Enter jumps to line 92). Add a line for the hadoop user directly below it (fields separated by tabs), then save and exit. The result should look like the example below.
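A sketch of the resulting section of the sudoers file:
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL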
Under the /usr/local/ directory, create three directories, java, hadoop, and hive, and assign them to the hadoop user:
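The directories can be created first as root, for example:
[root@localhost local]# mkdir -p /usr/local/java /usr/local/hadoop /usr/local/hive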
[root@localhost local]# chown -R hadoop:hadoop /usr/local/java
[root@localhost local]# chown -R hadoop:hadoop /usr/local/hadoop
[root@localhost local]# chown -R hadoop:hadoop /usr/local/hive
Some development editions of CentOS ship with a JDK. We generally use our own JDK, so the bundled one should be removed.
For the difference between OpenJDK and the Oracle JDK, see: http://www.cnblogs.com/sxdcgaq8080/p/7487369.html
First check whether Java is already installed:
[root@localhost ~]# java -version
openjdk version "1.8.0_101"
OpenJDK Runtime Environment (build 1.8.0_101-b13)
OpenJDK 64-Bit Server VM (build 25.101-b13, mixed mode)
If nothing is installed, skip this first step; if Java is installed (as in the output above), locate the installed packages:
[root@localhost ~]# rpm -qa | grep java
java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
tzdata-java-2016f-1.el7.noarch
java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
javapackages-tools-3.4.1-11.el7.noarch
java-1.7.0-openjdk-headless-1.7.0.111-2.6.7.2.el7_2.x86_64
java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64
python-javapackages-3.4.1-11.el7.noarch
Remove all of them (the noarch packages do not need to be removed):
[root@localhost ~]# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
[root@localhost ~]# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
[root@localhost ~]# rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.111-2.6.7.2.el7_2.x86_64
[root@localhost ~]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64
Check that they have been removed:
[root@localhost ~]# java -version
-bash: /usr/bin/java: No such file or directory
If any are still present, remove them with yum -y remove.
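For example, a minimal sketch using the short package names (the full names are the ones reported by rpm -qa above):
[root@localhost ~]# yum -y remove java-1.8.0-openjdk java-1.7.0-openjdk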
1. Download the JDK package
If the system does not have the wget command, install it first with yum install wget.
Then change into the download directory: cd /usr/local/java
and download the JDK:
[root@localhost java]# wget http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz
After the download completes, extract it: tar -zxvf jdk-8u181-linux-x64.tar.gz
2. Configure the Java environment variables
Run vim /etc/profile and add the following to the file:
# set java environment
JAVA_HOME=/usr/local/java/jdk1.8.0_181
JRE_HOME=$JAVA_HOME/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH
Save and exit, then run source /etc/profile to make the variables take effect, and verify that the Java environment is configured correctly:
[root@localhost java]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
[root@localhost java]#
At this point, the preparation steps above need to be repeated on every Hadoop node.
1. Run the following commands to stop the firewall and disable it at boot:
[root@localhost ~]# systemctl stop firewalld.service
[root@localhost ~]# systemctl disable firewalld.service
2. Change the host names
On the Master host run: hostnamectl set-hostname master
On the Slave01 host run: hostnamectl set-hostname slave01
On the Slave02 host run: hostnamectl set-hostname slave02
Note: avoid any special characters in the host names; they can cause problems later in the Hadoop and Hive configuration. The change can be verified as shown below.
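To confirm the change took effect (a new login shell is needed for the prompt to update), check the current host name with:
hostnamectl status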
3. Configure the network
Taking the master host as an example, configure a static IP address and the hosts file.
First use the ifconfig command to check the local network interface name and IP address.
Then open the interface configuration file:
[root@master ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
Modify or add the following entries according to the node's network details:
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.188.8
NETMASK=255.255.255.0
GATEWAY=192.168.188.2
DNS1=114.114.114.114
DEFROUTE=yes
IPV6INIT=no
IPV4_FAILURE_FATAL=yes
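After saving the file, restart the network service on CentOS 7 so the static configuration takes effect:
[root@master ~]# systemctl restart network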
Configure the network information above on each node in turn.
4. Edit the hosts file
vim /etc/hosts
On every node, add the static IP addresses and host names of all nodes:
192.168.188.8 master
192.168.188.9 slave01
192.168.188.10 slave02
After that, use ping to check that the three machines can reach each other; from master, for example:
ping -c 3 slave01
1. On every machine, switch to the hadoop user with su - hadoop and run the following command as that user to generate a key pair in the hadoop user's home directory (taking the master node as the example):
[hadoop@master ~]$ ssh-keygen -t rsa -P ''
Press Enter through all the prompts.
2. On every machine, first append its own public key to authorized_keys so that ssh localhost works without a password (run this inside ~/.ssh):
cat id_rsa.pub >> authorized_keys
3. Then copy each machine's public key to every other machine (the passwords of the other machines are required during this step):
master:
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave01:/home/hadoop/.ssh/id_rsa_master.pub
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave02:/home/hadoop/.ssh/id_rsa_master.pub
slave01:
scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa_slave01.pub
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave02:/home/hadoop/.ssh/id_rsa_slave01.pub
slave02:
scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa_slave02.pub
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave01:/home/hadoop/.ssh/id_rsa_slave02.pub
4. On each host, go into /home/hadoop/.ssh/ and use cat to append the public keys received from the other machines (everything except the locally generated id_rsa.pub) to authorized_keys. Then set the permissions of authorized_keys with chmod, and delete the copied public key files with rm:
master:
cat id_rsa_slave01.pub >> authorized_keys
cat id_rsa_slave02.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa*.pub
slave01:
cat id_rsa_master.pub >> authorized_keys
cat id_rsa_slave02.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa*.pub
slave02:
cat id_rsa_master.pub >> authorized_keys
cat id_rsa_slave01.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa*.pub
After these steps, any machine can log in to any other machine over ssh without a password.
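A quick sanity check from master (host names as configured above); both commands should print the remote host name without prompting for a password:
[hadoop@master ~]$ ssh slave01 hostname
[hadoop@master ~]$ ssh slave02 hostname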
This article uses Hadoop 2.9.1. Download Hadoop as follows:
1. Go into the /usr/local/hadoop/ directory, download hadoop-2.9.1.tar.gz, and extract it:
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/hadoop-2.9.1.tar.gz
tar -zxvf hadoop-2.9.1.tar.gz
2. Create the Hadoop data directories
On master, first create the following directories under /usr/local/hadoop/hadoop-2.9.1/:
mkdir -p /usr/local/hadoop/hadoop-2.9.1/hadoopdir/name
mkdir -p /usr/local/hadoop/hadoop-2.9.1/hadoopdir/data
mkdir -p /usr/local/hadoop/hadoop-2.9.1/hadoopdir/temp
mkdir -p /usr/local/hadoop/hadoop-2.9.1/hadoopdir/logs
mkdir -p /usr/local/hadoop/hadoop-2.9.1/hadoopdir/pids
3. Configure the Hadoop environment scripts (under /usr/local/hadoop/hadoop-2.9.1/etc/hadoop/)
hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export HADOOP_LOG_DIR=/usr/local/hadoop/hadoop-2.9.1/hadoopdir/logs
export HADOOP_PID_DIR=/usr/local/hadoop/hadoop-2.9.1/hadoopdir/pids
mapred-env.sh
export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export HADOOP_MAPRED_LOG_DIR=/usr/local/hadoop/hadoop-2.9.1/hadoopdir/logs
export HADOOP_MAPRED_PID_DIR=/usr/local/hadoop/hadoop-2.9.1/hadoopdir/pids
yarn-env.sh
export JAVA_HOME=/usr/local/java/jdk1.8.0_181
YARN_LOG_DIR=/usr/local/hadoop/hadoop-2.9.1/hadoopdir/logs
slaves file
#localhost
slave01
slave02
Note: if localhost is left uncommented in the slaves file, the local machine will also act as a DataNode.
4. Configure the Hadoop XML files
core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///usr/local/hadoop/hadoop-2.9.1/hadoopdir/temp</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/local/hadoop/hadoop-2.9.1/hadoopdir/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/local/hadoop/hadoop-2.9.1/hadoopdir/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>64m</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>master:50030</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://master:9001</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
5. On the master node, copy the entire /usr/local/hadoop/hadoop-2.9.1 directory to the other nodes:
scp -r /usr/local/hadoop/hadoop-2.9.1 hadoop@slave01:/usr/local/hadoop/
scp -r /usr/local/hadoop/hadoop-2.9.1 hadoop@slave02:/usr/local/hadoop/
6. Go into the /usr/local/hadoop/hadoop-2.9.1/bin directory and format the file system:
./hdfs namenode -format
Formatting the file system produces a long stream of terminal output; a status 0 message (such as "Exiting with status 0") in the last few lines indicates that formatting succeeded. If it fails, check the logs carefully to find the cause. An additional quick check is shown below.
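Right after running the format command, the shell exit code can also be inspected; 0 indicates the command exited cleanly:
echo $?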
7. Go into the /usr/local/hadoop/hadoop-2.9.1/sbin directory:
./start-dfs.sh
./start-yarn.sh
The commands above start HDFS and YARN, and the Hadoop cluster is now running. To shut it down, run the following in the sbin directory:
./stop-yarn.sh
./stop-dfs.sh
8. Checking that HDFS is up
Open master:50070 in a browser to see the cluster summary and DataNode information.
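Besides the web UI, running jps (shipped with the JDK) on each node is a quick way to confirm the daemons are up; roughly, master should show NameNode, SecondaryNameNode and ResourceManager, while the slaves should show DataNode and NodeManager:
[hadoop@master ~]$ jps
[hadoop@slave01 ~]$ jps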
Hive ships with the embedded Derby database and uses it by default to store its metadata. Because Derby only supports a single session, MySQL is usually used as the external metastore instead, which allows multiple users to access it concurrently. MySQL 5.7 is used here.
1. Download the MySQL Yum repository package
wget http://repo.mysql.com/mysql57-community-release-el7-10.noarch.rpm
2. Install the software repository
rpm -Uvh mysql57-community-release-el7-10.noarch.rpm
3. Install the MySQL server
yum install -y mysql-community-server
4. Start MySQL and enable it at boot
service mysqld start
systemctl start mysqld.service
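To have MySQL start automatically at boot, as the step title says, also enable the service:
systemctl enable mysqld.service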
5. Check that MySQL is running
service mysqld status
systemctl status mysqld.service
6. Change the MySQL password
To improve security, MySQL 5.7 generates a random password for the root user and writes it to the error log. When installed from the RPM packages, the error log is /var/log/mysqld.log by default.
The temporary password is only written after MySQL has been started at least once.
(1) The initial password can be found with:
grep 'temporary password' /var/log/mysqld.log
(2) Log in with the default password and change it
mysql -uroot -p
After logging in with the default password, the password must be changed immediately; otherwise any other statement fails with the following error:
mysql> select @@log_error;
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.
mysql>
Change the password:
ALTER USER 'root'@'localhost' IDENTIFIED BY 'root123456';
Allow logins from other machines:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root123456' WITH GRANT OPTION;
FLUSH PRIVILEGES;
Hive official site: http://hive.apache.org/index.html
Hive download page: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/
Note: before installing Hive, make sure the Hadoop cluster is up and running. Hive only needs to be installed on the NameNode of the cluster; it does not need to be installed on the DataNodes.
This article installs apache-hive-2.3.3-bin.tar.gz, downloaded from:
https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
# Change to the directory where the hive archive will be downloaded
cd /usr/local/hive/
# Download the hive archive for the version you actually need
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
--2018-08-12 13:53:31-- https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
Resolving mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)... 101.6.8.193, 2402:f000:1:408:8100::1
Connecting to mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)|101.6.8.193|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 232229830 (221M) [application/octet-stream]
Saving to: ‘apache-hive-2.3.3-bin.tar.gz’
100%[==============================================================================>] 232,229,830 1.54MB/s in 1m 52s
2018-08-12 13:55:24 (1.97 MB/s) - ‘apache-hive-2.3.3-bin.tar.gz’ saved [232229830/232229830]
# Extract the downloaded hive archive
tar zxvf apache-hive-2.3.3-bin.tar.gz
# The /usr/local/hive/ directory was already assigned to the hadoop user earlier
1. Run sudo vim /etc/profile to configure the environment variables, adding the following Hive entries to /etc/profile:
export HIVE_HOME=/usr/local/hive/apache-hive-2.3.3
export HIVE_CONF_DIR=$HIVE_HOME/conf
PATH=$HIVE_HOME/bin:$PATH
Then run source /etc/profile to make the variables take effect.
2. Create hive-site.xml
# Before configuring Hive, switch to the account that operates Hive; here it is hadoop
su - hadoop
cd $HIVE_CONF_DIR
# Create hive-site.xml from the hive-default.xml.template template
cp hive-default.xml.template hive-site.xml
3. Create the directories Hive needs in HDFS
hive-site.xml contains the following settings:
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission.</description>
</property>
so the corresponding directories must be created in HDFS first:
[hadoop@master conf]$ hdfs dfs -mkdir -p /user/hive/warehouse
[hadoop@master conf]$ hdfs dfs -chmod -R 777 /user/hive/warehouse
[hadoop@master conf]$ hdfs dfs -mkdir -p /tmp/hive
[hadoop@master conf]$ hdfs dfs -chmod -R 777 /tmp/hive
[hadoop@master conf]$ hdfs dfs -ls /
Found 2 items
drwx------ - hadoop supergroup 0 2018-08-12 11:53 /tmp
drwxr-xr-x - hadoop supergroup 0 2018-08-12 14:31 /user
[hadoop@master conf]$ hdfs dfs -ls /tmp/
Found 1 items
drwxrwxrwx - hadoop supergroup 0 2018-08-12 11:53 /tmp/hive
[hadoop@master conf]$ hdfs dfs -ls /user/hive
Found 1 items
drwxrwxrwx - hadoop supergroup 0 2018-08-12 14:31 /user/hive/warehouse
4. Configure hive-site.xml
(1) Configure Hive's local scratch directory
Replace ${system:java.io.tmpdir} in hive-site.xml with a local temporary directory for Hive; /usr/local/hive-2.3.3/tmp is used here. If the directory does not exist, create it first and grant read/write permissions:
[hadoop@master conf]$ cd $HIVE_HOME
[hadoop@master hive-2.3.3]$ mkdir tmp/
[hadoop@master hive-2.3.3]$ chmod -R 777 tmp/
[hadoop@master hive-2.3.3]$ cd $HIVE_CONF_DIR
[hadoop@master conf]$ vim hive-site.xml
In vim command mode, run the following command to perform the replacement:
:%s#${system:java.io.tmpdir}#/usr/local/hive-2.3.3/tmp#g
For example:
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>${system:java.io.tmpdir}/${system:user.name}</value>
    <description>Local scratch space for Hive jobs</description>
</property>
becomes:
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive-2.3.3/tmp/${system:user.name}</value>
    <description>Local scratch space for Hive jobs</description>
</property>
(2) Configure the Hive user name
Replace ${system:user.name} in hive-site.xml with the user name of the account operating Hive, which is hadoop here. In vim command mode, run the following command to perform the replacement:
:%s#${system:user.name}#hadoop#g
For example:
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive-2.3.3/tmp/${system:user.name}</value>
    <description>Local scratch space for Hive jobs</description>
</property>
becomes:
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive-2.3.3/tmp/hadoop</value>
    <description>Local scratch space for Hive jobs</description>
</property>
(3) Configure the Hive metastore database
The four properties involved are:

| Property | Description |
| --- | --- |
| javax.jdo.option.ConnectionDriverName | JDBC driver class name for the metastore database |
| javax.jdo.option.ConnectionURL | JDBC connection URL of the metastore database |
| javax.jdo.option.ConnectionUserName | User name used to connect to the database |
| javax.jdo.option.ConnectionPassword | Password used to connect to the database |
Hive's default configuration uses the Derby database to store the metadata; the default settings are:
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>APP</value>
    <description>Username to use against metastore database</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mine</value>
    <description>password to use against metastore database</description>
</property>
To switch from Derby to MySQL, only these four settings need to be changed; for example:
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root123456</value>
    <description>password to use against metastore database</description>
</property>
When configuring javax.jdo.option.ConnectionURL, useSSL=false suppresses the MySQL SSL connection warnings; without it, Hive's metastore initialization against MySQL may even fail. Note that the & in the URL has to be written as &amp; inside the XML file.
In addition, the MySQL driver jar needs to be copied into Hive's lib directory. MySQL officially recommends using MySQL Connector/J 8.0 with MySQL Server 8.0, 5.7, 5.6, and 5.5, so mysql-connector-java-8.0.12.jar is used here; the matching driver class name in the configuration above is com.mysql.cj.jdbc.Driver.
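If the connector jar is not already on the machine, it can be fetched first, for example from Maven Central (URL and local path assumed here):
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.12/mysql-connector-java-8.0.12.jar -P /home/hadoop/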
cp /home/hadoop/mysql-connector-java-8.0.12.jar $HIVE_HOME/lib/
5. Configure hive-env.sh
[hadoop@master conf]$ cd $HIVE_CONF_DIR
[hadoop@master conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@master conf]$ vim hive-env.sh
# Edit hive-env.sh and add the following 3 lines
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.9.1
export HIVE_CONF_DIR=/usr/local/hive/hive-2.3.3/conf
export HIVE_AUX_JARS_PATH=/usr/local/hive/hive-2.3.3/lib
1. Initialize the Hive metastore database
[hadoop@master conf]$ cd $HIVE_HOME/bin
# Initialize the metastore schema in MySQL
[hadoop@master bin]$ schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
After initialization completes, a set of metastore tables for storing Hive's metadata is created in the MySQL database.
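A quick way to confirm this is to list the tables directly in MySQL (the database name hive comes from the ConnectionURL configured above):
mysql -uroot -p -e 'USE hive; SHOW TABLES;'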
2. Start Hive
[hadoop@master bin]$ cd $HIVE_HOME/bin
# Start Hive with the hive command
[hadoop@master bin]$ ./hive
which: no hbase in (/usr/local/hive-2.3.3/bin:/usr/local/jdk1.8.0_144/bin:/usr/local/jdk1.8.0_144/bin:/usr/lib64/qt-3.3/bin:/usr/local/hive-2.3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/zookeeper-3.4.12/bin:/home/hadoop/.local/bin:/home/hadoop/bin:/usr/local/hadoop-2.9.1/bin:/usr/local/hadoop-2.9.1/sbin:/home/hadoop/.local/bin:/home/hadoop/bin:/home/hadoop/.local/bin:/home/hadoop/bin:/usr/local/hadoop-2.9.1/bin:/usr/local/hadoop-2.9.1/sbin:/usr/local/zookeeper-3.4.12/bin:/home/hadoop/.local/bin:/home/hadoop/bin:/usr/local/hadoop-2.9.1/bin:/usr/local/hadoop-2.9.1/sbin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/hive-2.3.3/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
Time taken: 5.682 seconds, Fetched: 0 row(s)
hive> desc function sum;
OK
sum(x) - Returns the sum of a set of numbers
Time taken: 0.008 seconds, Fetched: 1 row(s)
hive>
THE END