Hive is a data warehouse built on Hadoop. It maps structured data files to tables and provides SQL-like queries; under the hood, Hive translates SQL statements into MapReduce jobs.
Download Hive 2.3.4 to /home/dc2-user on the master node and extract it:
wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
tar zxvf apache-hive-2.3.4-bin.tar.gz
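Before extracting, it can be worth checking the archive against a published digest. The following is a minimal sketch, assuming a SHA-256 digest is available for the archive and that sha256sum is installed; the function name verify_sha256 is hypothetical and not part of Hive.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 when the SHA-256 digest of FILE equals EXPECTED_HEX, 1 otherwise.
verify_sha256() {
    file="$1"
    expected="$2"
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage (the digest below is a placeholder, not a real value):
# verify_sha256 apache-hive-2.3.4-bin.tar.gz "<published sha256>" \
#     && tar zxvf apache-hive-2.3.4-bin.tar.gz
```

Chaining the extraction with && means the archive is only unpacked when the digest matches.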
Edit /etc/profile and add the following lines, then reload it:
sudo vi /etc/profile
export HIVE_HOME=/home/dc2-user/apache-hive-2.3.4-bin
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile
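A quick way to confirm that the profile change took effect is to check the two variables directly. This is a small sketch; check_hive_env is a hypothetical helper name, and it only inspects the current shell's environment.

```shell
# check_hive_env: succeed only if HIVE_HOME is set and $HIVE_HOME/bin is on PATH.
check_hive_env() {
    [ -n "${HIVE_HOME:-}" ] || { echo "HIVE_HOME is not set" >&2; return 1; }
    case ":$PATH:" in
        *":$HIVE_HOME/bin:"*) echo "ok: $HIVE_HOME/bin is on PATH" ;;
        *) echo "$HIVE_HOME/bin is missing from PATH" >&2; return 1 ;;
    esac
}

# Usage, after `source /etc/profile`:
# check_hive_env
```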
cd apache-hive-2.3.4-bin/conf/
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
Edit hive-env.sh:
export JAVA_HOME=/home/dc2-user/java/jdk1.8.0_191 # Java installation path
export HADOOP_HOME=/home/dc2-user/hadoop-2.7.7 # Hadoop installation path
export HIVE_HOME=/home/dc2-user/apache-hive-2.3.4-bin # Hive installation path
export HIVE_CONF_DIR=$HIVE_HOME/conf # Hive configuration directory
In hive-site.xml, change the value of each of the following properties:
vi hive-site.xml
hive.exec.scratchdir: HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.
hive.exec.local.scratchdir: Local scratch space for Hive jobs
hive.downloaded.resources.dir: Temporary local directory for added resources in the remote file system.
hive.querylog.location: Location of Hive run time structured log file
hive.server2.logging.operation.log.location: Top level directory where operation logs are stored if logging functionality is enabled
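The values to use for these properties are not given here. One common approach, sketched below, assumes hive-site.xml still contains the template's ${system:java.io.tmpdir} and ${system:user.name} placeholders in the local-directory values and replaces them with a concrete local path. The function name fix_hive_site_tmpdirs and the example directory are hypothetical, and the -i flag assumes GNU sed.

```shell
# fix_hive_site_tmpdirs SITE_FILE LOCAL_TMP_DIR
# Keeps a backup as SITE_FILE.bak, then rewrites the placeholders in place.
fix_hive_site_tmpdirs() {
    site="$1"
    tmpdir="$2"
    cp "$site" "$site.bak"
    sed -i \
        -e "s#\${system:java.io.tmpdir}#${tmpdir}#g" \
        -e 's#\${system:user.name}#${user.name}#g' \
        "$site"
}

# Usage (the directory is an assumption; pick any local path the Hive user can write to):
# mkdir -p /home/dc2-user/apache-hive-2.3.4-bin/tmp
# fix_hive_site_tmpdirs hive-site.xml /home/dc2-user/apache-hive-2.3.4-bin/tmp
```

The sketch assumes the chosen directory contains no # or & characters, since both are special in the sed expressions above.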
The Hive Metastore provides the metadata for Hive tables and partitions. This example stores that metadata in MariaDB.
Place mysql-connector-java-5.1.40-bin.jar in $HIVE_HOME/lib and configure the MySQL connection settings in hive-site.xml.
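The connection settings themselves are not listed. As a rough sketch, the four properties involved would look like the fragment printed below. The URL, driver, and user match the schematool output later in this walkthrough, and the password matches the hive account created in MariaDB; the function name print_metastore_props is hypothetical. Because hive-site.xml was copied from the full template, these properties should already exist there, so the values are meant to be edited in place rather than appended as duplicates.

```shell
# print_metastore_props: print the four JDBC connection properties for reference.
print_metastore_props() {
    cat <<'EOF'
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>
EOF
}

# Usage: print the fragment and compare it with the entries in hive-site.xml.
# print_metastore_props
```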
start-dfs.sh # can be skipped if HDFS was already started while installing and configuring Hadoop
hdfs dfs -mkdir /tmp
hdfs dfs -mkdir -p /usr/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /usr/hive/warehouse
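Running mkdir without -p reports an error when the directory already exists, which gets in the way if these steps are repeated. A rerunnable variant might look like the sketch below; ensure_hdfs_dir is a hypothetical helper name, and it relies only on the hdfs dfs -test, -mkdir, and -chmod subcommands.

```shell
# ensure_hdfs_dir DIR
# Creates DIR on HDFS if it does not exist yet, then makes it group-writable.
ensure_hdfs_dir() {
    dir="$1"
    hdfs dfs -test -d "$dir" || hdfs dfs -mkdir -p "$dir"
    hdfs dfs -chmod g+w "$dir"
}

# Usage, with the same paths as the commands above:
# ensure_hdfs_dir /tmp
# ensure_hdfs_dir /usr/hive/warehouse
```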
Install and start MariaDB, the database used in this example:
sudo yum install -y mariadb-server
sudo systemctl start mariadb
Log in to MySQL (root has no password initially), then create the hive user and set its password:
mysql -uroot
MariaDB [(none)]> create user 'hive'@'localhost' identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> grant all privileges on *.* to hive@localhost identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
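HDFS must be running before you start Hive. You can start it with start-dfs.sh; skip this step if HDFS was already started when Hadoop was installed.
Starting with Hive 2.1, you must run the schematool command to initialize the metastore schema before starting Hive: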
schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/dc2-user/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/dc2-user/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
Start Hive by running the hive command:
hive
which: no hbase in (/home/dc2-user/java/jdk1.8.0_191/bin:/home/dc2-user/hadoop-2.7.7/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/bin:/home/dc2-user/apache-hive-2.3.4-bin/bin:/home/dc2-user/.local/bin:/home/dc2-user/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/dc2-user/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/dc2-user/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/home/dc2-user/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
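Create a table in Hive. Fields are separated by a tab character. STORED AS TEXTFILE sets the storage format of the loaded data; TEXTFILE is the default, and with it a plain-text file can be copied from the local disk to HDFS as-is and read by Hive directly:
hive> create table test_hive(id int, name string)
> row format delimited fields terminated by '\t'
> stored as textfile;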
Time taken: 10.857 seconds
hive> show tables;
OK
test_hive
Time taken: 0.396 seconds, Fetched: 1 row(s)
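The table was created successfully. Enter quit; to exit Hive, then create a plain-text data file with a tab between the two fields on each line: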
vi test_tb.txt
101 aa
102 bb
103 cc
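Since the table expects tab-delimited fields, and tabs are easy to lose when typing or pasting into an editor, the same file can also be generated with printf. This is a small sketch; write_test_rows and check_tab_separated are hypothetical helper names.

```shell
# write_test_rows FILE: write the three sample rows with a real tab between fields.
write_test_rows() {
    printf '101\taa\n102\tbb\n103\tcc\n' > "$1"
}

# check_tab_separated FILE: succeed only if every line has exactly two tab-separated fields.
check_tab_separated() {
    awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad }' "$1"
}

# Usage:
# write_test_rows /home/dc2-user/test_tb.txt
# check_tab_separated /home/dc2-user/test_tb.txt && echo "file looks fine"
```

A row that uses spaces instead of a tab will not split into two columns when loaded, so running the check before loading catches that early.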
Start Hive again and load the data:
hive> load data local inpath '/home/dc2-user/test_tb.txt' into table test_hive;
Loading data to table default.test_hive
Time taken: 6.679 seconds
hive> select * from test_hive;
101 aa
102 bb
103 cc
Time taken: 2.814 seconds, Fetched: 3 row(s)