1. Network Planning
Server | IP Address | Software | Role |
---|---|---|---|
master | 192.168.71.130 | MySQL | master |
slave1 | 192.168.71.129 | | slave |
slave2 | 192.168.71.132 | | slave |
Prerequisites:
Hive depends on Hadoop, so a working Hadoop cluster environment must already be set up before running Hive.
2. Installing MySQL
1) Install the MySQL server:
root@master:~# sudo apt-get install mysql-server
2) Install the client:
root@master:~# sudo apt install mysql-client
3) Install the client development libraries:
root@master:~# sudo apt install libmysqlclient-dev
This provides the libraries and header files needed to build applications against MySQL.
4) Check that the installation succeeded:
root@master:~# netstat -npl|grep 3306
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 92330/mysqld
If a MySQL socket shows up in the LISTEN state, the installation succeeded.
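That check can be scripted; a minimal sketch that matches a listener on port 3306, shown here against the sample netstat line above since live output varies:

```shell
# sample line from the netstat output above; in practice: netstat -npl | grep 3306
line='tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 92330/mysqld'
if printf '%s\n' "$line" | grep -q ':3306 .*LISTEN'; then
  echo "mysqld is listening on 3306"
fi
```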
5) Log in to test
The root user has an empty password by default; just press Enter at the password prompt.
root@master:~# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28-0ubuntu0.18.04.4 (Ubuntu)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)
mysql>
- Change the root password
# switch to the mysql system database
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
# change the root password to 1234
mysql> update user set authentication_string=PASSWORD("1234") where User='root';
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 1
# switch the authentication plugin to mysql_native_password
mysql> update user set plugin="mysql_native_password";
Query OK, 1 row affected (0.00 sec)
Rows matched: 4 Changed: 1 Warnings: 0
# reload the privilege tables
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
# exit mysql
mysql> quit;
Bye
# restart the mysql service
root@master:~# systemctl restart mysql
# log in again with the new password
root@master:~# mysql -u root -p
Enter password:
- Enable remote connections to MySQL
# grant the root user remote access
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '1234' WITH GRANT OPTION;
Query OK, 0 rows affected, 1 warning (0.00 sec)
# reload the privilege tables
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
# exit mysql
mysql> quit
Bye
# edit the configuration file
root@master:~# vi /etc/mysql/mysql.conf.d/mysqld.cnf
# change the bind address
#bind-address = 127.0.0.1
bind-address = 0.0.0.0
# restart the mysql service
root@master:~# systemctl restart mysql
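Once the bind address is changed and MySQL restarted, reachability from another host can be sanity-checked without a mysql client; a sketch using bash's built-in /dev/tcp (the host and port come from the planning table; either outcome prints one line):

```shell
HOST=192.168.71.130   # master's address from the planning table above
PORT=3306
if timeout 2 bash -c "echo > /dev/tcp/$HOST/$PORT" 2>/dev/null; then
  msg="port $PORT reachable on $HOST"
else
  msg="port $PORT not reachable on $HOST"
fi
echo "$msg"
```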
3. Downloading and Installing Hive
- Download Hive
Download page: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/
# download
root@master:~# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.6/apache-hive-2.3.6-bin.tar.gz
# extract to /usr/local
root@master:~# tar -zxvf apache-hive-2.3.6-bin.tar.gz -C /usr/local
- Configure environment variables
root@master:~# vi /etc/profile
# append the following
export HIVE_HOME=/usr/local/apache-hive-2.3.6-bin
export PATH=$PATH:$HIVE_HOME/bin
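Changes to /etc/profile only take effect in new shells; the same two lines can be run directly to apply them to the current one, then verified (a minimal sketch; the hive command itself only works after the tarball has been unpacked as above):

```shell
# same two lines as added to /etc/profile (repeated so the check is self-contained)
export HIVE_HOME=/usr/local/apache-hive-2.3.6-bin
export PATH=$PATH:$HIVE_HOME/bin
echo "HIVE_HOME=$HIVE_HOME"
case ":$PATH:" in *":$HIVE_HOME/bin:"*) echo "PATH ok" ;; esac
```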
- Metastore
The metastore is Hive's central store for metadata. It consists of two parts: the metastore service and the backing database. There are three ways to configure it: embedded, local, and remote.
The Hive website describes these three installation modes, each suited to a different scenario:
1. Embedded mode (metadata is kept in the embedded Derby database; only a single session can connect, and attempting multiple concurrent sessions fails)
2. Local mode (a locally installed MySQL replaces Derby as the metadata store)
3. Remote mode (a remotely installed MySQL replaces Derby as the metadata store)
- Logging configuration
root@master:/usr/local/apache-hive-2.3.6-bin# cp conf/hive-log4j2.properties.template conf/hive-log4j2.properties
root@master:/usr/local/apache-hive-2.3.6-bin# vi conf/hive-log4j2.properties
# at line 24, set:
property.hive.log.dir = /hive/tmpdir/root
4. Embedded Mode (Single-User)
In this installation mode the metadata lives in the embedded Derby database, so only one session can connect at a time; the table data itself is stored on HDFS.
- Hive configuration: hive-site.xml
Change the following properties
(hive.exec.local.scratchdir, hive.downloaded.resources.dir, hive.querylog.location, hive.server2.logging.operation.log.location)
so their values point to a directory Hive can read and write, e.g. /hive/tmpdir:
root@master:/usr/local/apache-hive-2.3.6-bin# cp conf/hive-default.xml.template conf/hive-site.xml
root@master:/usr/local/apache-hive-2.3.6-bin# vi conf/hive-site.xml
# content
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/hive/tmpdir/root</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/hive/tmpdir/resources</value>
  <description>Temporary local directory for resources added from remote file systems</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/hive/tmpdir/logs</value>
  <description>Location of Hive query log files</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/hive/tmpdir/operation_logs</value>
  <description>Directory for operation logs when operation logging is enabled</description>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>Schema version check; set this to false</description>
</property>
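The directories referenced above must exist and be writable before Hive starts; a sketch that creates them, using an overridable base path so it can be tried anywhere (in the configuration above the base is /hive/tmpdir):

```shell
BASE=${HIVE_TMPDIR:-/tmp/hive-tmpdir}   # set HIVE_TMPDIR=/hive/tmpdir for the real layout
mkdir -p "$BASE/root" "$BASE/resources" "$BASE/logs" "$BASE/operation_logs"
ls "$BASE"
```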
- Initialize the database
# initialize the metastore database
root@master:/usr/local/apache-hive-2.3.6-bin/bin# schematool -initSchema -dbType derby
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.derby.sql
Initialization script completed
schemaTool completed
- Start Hive
root@master:/usr/local/apache-hive-2.3.6-bin/bin# hive
Logging initialized using configuration in file:/usr/local/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
# show databases
hive> show databases;
OK
default
Time taken: 7.185 seconds, Fetched: 1 row(s)
hive>
- Test
# create a managed table t1 with a single int column, id
hive> CREATE TABLE t1(id int);
hive> quit;
# create a data file:
root@hadoopmaster:~# vi t1.txt
# enter the numbers 1 through 9, one per line
# view the contents:
root@hadoopmaster:~# cat t1.txt
1
2
3
4
5
6
7
8
9
# load the data
hive> LOAD DATA LOCAL INPATH '/root/t1.txt' INTO TABLE t1;
# query the data:
hive> select id from t1;
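The hand-typed t1.txt above can be generated in one line (written to /tmp here for illustration; the LOAD DATA example reads /root/t1.txt):

```shell
seq 1 9 > /tmp/t1.txt   # the numbers 1 through 9, one per line
wc -l < /tmp/t1.txt     # 9
```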
5. Local Mode (Multi-User)
This installation differs from embedded mode in that the embedded Derby database is no longer used to store metadata; another database, such as MySQL, stores it instead.
This is a multi-user mode: multiple client users can connect to the same database, which is the typical setup when Hive is shared within a company.
One prerequisite: every user must have access to MySQL, i.e. each client user needs to know the MySQL username and password.
- Configure MySQL
Find and configure the following properties:
# edit the hive-site.xml file
root@master:/usr/local/apache-hive-2.3.6-bin# vi conf/hive-site.xml
# content
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>JDBC driver class</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>Database username</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>1234</value>
  <description>Database password</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.71.130:3306/hive?characterEncoding=UTF8&amp;useSSL=false&amp;createDatabaseIfNotExist=true</value>
  <description>MySQL connection string</description>
</property>
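One caveat with the connection URL: & is special in XML, so inside hive-site.xml each & in the URL must be written as &amp;amp;. A quick way to produce the escaped form:

```shell
url='jdbc:mysql://192.168.71.130:3306/hive?characterEncoding=UTF8&useSSL=false&createDatabaseIfNotExist=true'
printf '%s\n' "$url" | sed 's/&/\&amp;/g'   # paste this escaped form into hive-site.xml
```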
- Add the JDBC driver jar
Copy mysql-connector-java-5.1.48.jar into the /usr/local/apache-hive-2.3.6-bin/lib/ directory.
- Initialize the MySQL database
# initialize the metastore
root@master:/usr/local/apache-hive-2.3.6-bin# ./bin/schematool -initSchema -dbType mysql
Metastore connection URL: jdbc:mysql://192.168.71.130:3306/hive?characterEncoding=UTF8&useSSL=false&createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
# show schema information
root@master:/usr/local/apache-hive-2.3.6-bin# ./bin/schematool -dbType mysql -info
Metastore connection URL: jdbc:mysql://192.168.71.130:3306/hive?characterEncoding=UTF8&useSSL=false&createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Hive distribution version: 2.3.0
Metastore schema version: 2.3.0
schemaTool completed
- Start Hive
# start Hive
root@master:/usr/local/apache-hive-2.3.6-bin# ./bin/hive
Logging initialized using configuration in file:/usr/local/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
# show databases
hive> show databases;
OK
default
Time taken: 4.897 seconds, Fetched: 1 row(s)
hive>
- Data operations
# create a table
hive> create table users(id int,name string);
OK
Time taken: 0.839 seconds
# describe the table
hive> desc users;
OK
id int
name string
Time taken: 5.192 seconds, Fetched: 2 row(s)
# insert a row
hive> insert into users values (1,'houjianjun');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20191129105035_9497607a-510f-4e75-96fa-190ebe7ac010
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574995391746_0003, Tracking URL = http://master:8088/proxy/application_1574995391746_0003/
Kill Command = /usr/local/hadoop-2.9.2/bin/hadoop job -kill job_1574995391746_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-11-29 10:50:53,900 Stage-1 map = 0%, reduce = 0%
2019-11-29 10:51:01,137 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.81 sec
MapReduce Total cumulative CPU time: 1 seconds 810 msec
Ended Job = job_1574995391746_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoopha/user/hive/warehouse/users/.hive-staging_hive_2019-11-29_10-50-35_528_1481323553371246919-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.81 sec HDFS Read: 4091 HDFS Write: 82 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 810 msec
OK
Time taken: 27.553 seconds
# query the rows
hive> select * from users;
OK
1 houjianjun
Time taken: 0.244 seconds, Fetched: 1 row(s)
6. Remote Mode
This mode uses beeline together with hiveserver2, both shipped in the Hive installation directory.
The idea is to run the metastore as a standalone service; clients then connect through beeline, without needing to know the database password beforehand.
(1) Combined deployment
- Database configuration is the same as in multi-user mode
- Remote client connection configuration
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://192.168.71.130:9083</value>
</property>
- Re-initialize the database
# drop the existing hive database
mysql> drop database hive;
Query OK, 57 rows affected (2.19 sec)
mysql>
# initialize the database
root@master:~# schematool -initSchema -dbType mysql
Metastore connection URL: jdbc:mysql://192.168.71.130:3306/hive?characterEncoding=UTF8&useSSL=false&createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
# start the metastore service
root@master:~# hive --service metastore &
[1] 13064
# start the hiveserver2 service
root@master:~# hive --service hiveserver2 &
[1] 13143
- Client login and operations
# on a client, simply run the hive command
root@master:~# hive
Logging initialized using configuration in file:/usr/local/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
# log in with beeline
root@master:~# beeline
Beeline version 2.3.6 by Apache Hive
# connect to hive; the default port is 10000
beeline> !connect jdbc:hive2://192.168.71.130:10000
Connecting to jdbc:hive2://192.168.71.130:10000
# enter the username and password (same as the database credentials)
Enter username for jdbc:hive2://192.168.71.130:10000: root
Enter password for jdbc:hive2://192.168.71.130:10000: ****
Connected to: Apache Hive (version 2.3.6)
Driver: Hive JDBC (version 2.3.6)
Transaction isolation: TRANSACTION_REPEATABLE_READ
# show databases
0: jdbc:hive2://192.168.71.130:10000> show databases;
OK
+----------------+
| database_name |
+----------------+
| default |
+----------------+
1 row selected (1.074 seconds)
Create a table, insert a record, and query it:
0: jdbc:hive2://192.168.71.130:10000> select * from users;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'users'
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'users' (state=42S02,code=10001)
0: jdbc:hive2://192.168.71.130:10000> create table users(id int,name string);
OK
No rows affected (0.679 seconds)
0: jdbc:hive2://192.168.71.130:10000> insert into users values (1,'houjianjun');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20191129114331_3826f801-4dd1-456b-a0ba-4b9aa9f5245d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Job running in-process (local Hadoop)
2019-11-29 11:43:36,781 Stage-1 map = 0%, reduce = 0%
2019-11-29 11:43:37,805 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local734466231_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoopha/user/hive/warehouse/users/.hive-staging_hive_2019-11-29_11-43-31_911_2889855612823797721-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 13 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
No rows affected (6.81 seconds)
0: jdbc:hive2://192.168.71.130:10000> select * from users;
OK
+-----------+-------------+
| users.id | users.name |
+-----------+-------------+
| 1 | houjianjun |
+-----------+-------------+
1 row selected (0.395 seconds)
0: jdbc:hive2://192.168.71.130:10000>
7. Common Issues
- FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Cause:
The Hive Metastore Server service process was not started.
# start the service in the foreground
root@master:/usr/local/apache-hive-2.3.6-bin/bin# hive --service metastore -v
2019-11-28 15:17:55: Starting Hive Metastore Server
Starting hive metastore on port 9083
# start it in the background
root@master:/usr/local/apache-hive-2.3.6-bin/bin# hive --service metastore &
[1] 112078
# move the old database aside
root@master:/usr/local/apache-hive-2.3.6-bin/bin# mv metastore_db metastore_db.tmp
# initialize the database
root@master:/usr/local/apache-hive-2.3.6-bin/bin# schematool -initSchema -dbType derby
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.derby.sql
Initialization script completed
schemaTool completed
- SLF4J: Class path contains multiple SLF4J bindings.
Fix:
Remove (or rename) one of the conflicting jars:
root@master:/usr/local/apache-hive-2.3.6-bin/lib# mv log4j-slf4j-impl-2.6.2.jar log4j-slf4j-impl-2.6.2.jars
- Relative path in absolute URI: %7Bsystem:user.name%7D
Fix:
Replace the ${system:...} relative paths in the configuration file with absolute paths.
- FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
- FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
# find the pid of the running metastore process
$ ps -aux | grep 'metastore'
# kill the process
$ kill -9 <pid>
- Could not open client transport with JDBC Uri: jdbc:hive2://
Cause:
The hiveserver2 service is not running; start it:
root@master:~# hive --service hiveserver2 &