Hadoop Development -- Hive Installation

1. Network Planning

Server   IP Address       Software   Role
master   192.168.71.130   MySQL      master node
slave1   192.168.71.129              slave node
slave2   192.168.71.132              slave node

Prerequisite
Hive depends on the Hadoop system, so a working Hadoop cluster must already be set up and running before Hive is installed.
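A quick way to confirm the cluster is healthy (a minimal sketch, assuming the Hadoop binaries are already on the PATH of the master node):

$ jps                      # NameNode / ResourceManager should be running on master
$ hdfs dfsadmin -report    # all DataNodes should be reported as live
$ hdfs dfs -ls /           # HDFS is reachable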

2. Installing MySQL

1) Install the MySQL server

root@master:~# sudo apt-get install mysql-server
[Screenshot: package configuration dialog] Choose NO.

[Screenshot: package configuration dialog] Choose OK.

2) Install the MySQL client:

root@master:~# sudo apt install mysql-client

3) Install the client development libraries:

root@master:~# sudo apt install libmysqlclient-dev

This package provides the client libraries and header files needed to build against MySQL.
4) Verify the installation:

root@master:~# netstat -npl|grep 3306
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      92330/mysqld     

If a MySQL socket is shown in the LISTEN state, the installation succeeded.
5) Log in to test
The root user has an empty password by default, so just press Enter at the password prompt.

root@master:~# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28-0ubuntu0.18.04.4 (Ubuntu)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.00 sec)

mysql> 

6) Change the root password
# Switch to the mysql database
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
# Change the root password to 1234
mysql> update user set authentication_string=PASSWORD("1234") where User='root'; 
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 1
# Change the authentication plugin to mysql_native_password
mysql> update user set plugin="mysql_native_password";
Query OK, 1 row affected (0.00 sec)
Rows matched: 4  Changed: 1  Warnings: 0

# Reload the privilege tables
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
# Exit mysql
mysql> quit;
Bye

# Restart the MySQL service
root@master:~# systemctl restart mysql
# Log in again with the new password
root@master:~# mysql -u root -p
Enter password: 

7) Enable remote connections to MySQL
# Grant root access from any host
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '1234' WITH GRANT OPTION;
Query OK, 0 rows affected, 1 warning (0.00 sec)
# Reload the privilege tables
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
# Exit mysql
mysql> quit
Bye

# Edit the configuration file
root@master:~# vi /etc/mysql/mysql.conf.d/mysqld.cnf

# Change the bind address so MySQL listens on all interfaces
#bind-address           = 127.0.0.1
bind-address            = 0.0.0.0

# Restart the MySQL service
root@master:~# systemctl restart mysql
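After the restart, port 3306 should be bound to all interfaces rather than 127.0.0.1, and a remote login should work. A quick check (a sketch; it assumes the mysql client package is also installed on slave1):

# On master: 3306 now listens on 0.0.0.0
$ netstat -npl | grep 3306
# On slave1: connect to the master's MySQL remotely
$ mysql -h 192.168.71.130 -u root -p1234 -e "SELECT version();"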

3. Downloading and Installing Hive

  1. Download Hive
    Download URL: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/
# Download
root@master:~# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.6/apache-hive-2.3.6-bin.tar.gz
# Extract to /usr/local
root@master:~# tar -zxvf apache-hive-2.3.6-bin.tar.gz -C /usr/local

  2. Configure environment variables
root@master:~#  vi /etc/profile
# Append the following
export HIVE_HOME=/usr/local/apache-hive-2.3.6-bin
export PATH=$PATH:$HIVE_HOME/bin
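The new variables only take effect in a new shell or after re-sourcing the profile; a quick check that the hive binary is resolvable:

$ source /etc/profile
$ echo $HIVE_HOME          # /usr/local/apache-hive-2.3.6-bin
$ hive --version           # should report Hive 2.3.6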

  3. Metastore
      The metastore is the central place where Hive's metadata is stored. It has two parts: the metastore service and the backing data store. There are three ways to configure it: embedded, local, and remote.
    The Hive documentation describes three installation modes, each suited to a different scenario:
    1. Embedded mode (metadata is kept in the embedded Derby database; only one session can connect, and attempting more than one fails)
    2. Local mode (a locally installed MySQL replaces Derby as the metadata store)
    3. Remote mode (a remotely installed MySQL replaces Derby as the metadata store)
  4. Logging configuration
root@master:/usr/local/apache-hive-2.3.6-bin# cp conf/hive-log4j2.properties.template conf/hive-log4j2.properties
root@master:/usr/local/apache-hive-2.3.6-bin# vi conf/hive-log4j2.properties
# Content (around line 24): set the log directory
property.hive.log.dir = /hive/tmpdir/root

4. Embedded Mode (Single-User Mode)

  In this mode the metadata lives in the embedded Derby database, only one session can connect at a time, and the table data itself is stored on HDFS.

  1. Hive configuration: hive-site.xml
    Change the following properties
    (hive.exec.local.scratchdir, hive.downloaded.resources.dir, hive.querylog.location, hive.server2.logging.operation.log.location)
    so that their values point to a directory Hive can read and write, e.g. /hive/tmpdir (a directory-creation sketch follows the property list):
root@master:/usr/local/apache-hive-2.3.6-bin# cp conf/hive-default.xml.template conf/hive-site.xml
root@master:/usr/local/apache-hive-2.3.6-bin# vi conf/hive-site.xml
# Content
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/hive/tmpdir/root</value>
    <description>Local scratch space for Hive jobs</description>
  </property>

  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/hive/tmpdir/resources</value>
    <description>Temporary local directory for resources added from remote file systems</description>
  </property>

  <property>
    <name>hive.querylog.location</name>
    <value>/hive/tmpdir/logs</value>
    <description>Location of Hive query log files</description>
  </property>

  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/hive/tmpdir/operation_logs</value>
    <description>Directory where operation logs are stored, if operation logging is enabled</description>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>Disable metastore schema version verification (set to false)</description>
  </property>
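All four values above point under /hive/tmpdir. Hive can usually create these directories itself, but pre-creating them avoids permission surprises later (a sketch; the permissive chmod is only a convenience for a test cluster):

$ mkdir -p /hive/tmpdir/root /hive/tmpdir/resources /hive/tmpdir/logs /hive/tmpdir/operation_logs
$ chmod -R 777 /hive/tmpdir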
  2. Initialize the metastore database
# Initialize the schema (Derby)
root@master:/usr/local/apache-hive-2.3.6-bin/bin# schematool -initSchema -dbType derby
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.derby.sql
Initialization script completed
schemaTool completed
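Note that with Derby the metastore_db directory (and usually a derby.log file) is created in whatever directory schematool or hive is launched from, here the bin/ directory. This is also why only one session can use it at a time, and it becomes relevant again in problem 1 of the last section:

$ ls -d /usr/local/apache-hive-2.3.6-bin/bin/metastore_db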

  3. Start Hive
root@master:/usr/local/apache-hive-2.3.6-bin/bin# hive

Logging initialized using configuration in file:/usr/local/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

# Show databases
hive> show databases;
OK
default
Time taken: 7.185 seconds, Fetched: 1 row(s)
hive> 

  4. Test
#  Create a managed (internal) table t1 with a single int column, id
hive> CREATE TABLE t1(id int);
hive> quit;

# Create a data file:
root@hadoopmaster:~# vi t1.txt
# Enter one value per line, then check the contents:
root@hadoopmaster:~# cat t1.txt
1
2
3
4
5
6
7
8
9
# Load the data into the table
hive> LOAD DATA LOCAL INPATH '/root/t1.txt' INTO TABLE t1;

# Query the data:
hive> select id from t1;
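Since t1 is a managed table, the loaded file ends up under the HDFS warehouse directory (by default /user/hive/warehouse unless hive.metastore.warehouse.dir was changed), which can be confirmed directly from HDFS:

$ hdfs dfs -ls /user/hive/warehouse/t1
$ hdfs dfs -cat /user/hive/warehouse/t1/t1.txt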

5. Local Mode (Multi-User Mode)

  This installation differs from the embedded one in that it no longer uses the embedded Derby database as the metadata store; another database, such as MySQL, stores the metadata instead.
  It is a multi-user mode in which multiple clients connect to a single metastore database, the typical setup when Hive is shared within a company.
The prerequisite is that every user must have access to MySQL, i.e. every client needs the MySQL username and password.

  1. Configure MySQL as the metastore
    Locate and set the following properties:
# Edit hive-site.xml
root@master:/usr/local/apache-hive-2.3.6-bin# vi conf/hive-site.xml 
# Content
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username for the metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>1234</value>
    <description>Password for the metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.71.130:3306/hive?characterEncoding=UTF8&amp;useSSL=false&amp;createDatabaseIfNotExist=true</value>
    <description>MySQL connection string (in the XML file, each literal ampersand in the URL must be written as the entity shown here)</description>
  </property>

  2. Add the JDBC driver jar
    Copy mysql-connector-java-5.1.48.jar into the /usr/local/apache-hive-2.3.6-bin/lib/ directory (a download sketch follows).
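If the driver jar is not already at hand, it can be fetched from Maven Central and dropped into Hive's lib directory (a sketch; verify the URL and pick a connector version that matches your MySQL server):

$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.48/mysql-connector-java-5.1.48.jar
$ cp mysql-connector-java-5.1.48.jar /usr/local/apache-hive-2.3.6-bin/lib/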

  3. Initialize the MySQL metastore

# Initialize the metastore schema
root@master:/usr/local/apache-hive-2.3.6-bin# ./bin/schematool -initSchema -dbType mysql
Metastore connection URL:        jdbc:mysql://192.168.71.130:3306/hive?characterEncoding=UTF8&useSSL=false&createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
# Check the metastore schema information
root@master:/usr/local/apache-hive-2.3.6-bin# ./bin/schematool -dbType mysql -info
Metastore connection URL:        jdbc:mysql://192.168.71.130:3306/hive?characterEncoding=UTF8&useSSL=false&createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Hive distribution version:       2.3.0
Metastore schema version:        2.3.0
schemaTool completed
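The initialization creates the metastore schema inside the hive database on MySQL; it can be confirmed from the MySQL side (metastore tables such as DBS, TBLS and VERSION should be listed):

$ mysql -u root -p1234 -e "USE hive; SHOW TABLES;" | head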

  4. Start Hive
# Start Hive
root@master:/usr/local/apache-hive-2.3.6-bin# ./bin/hive

Logging initialized using configuration in file:/usr/local/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 

# Show databases
hive> show databases;
OK
default
Time taken: 4.897 seconds, Fetched: 1 row(s)
hive> 

  5. Data operations
# Create a table
hive> create table users(id int,name string);
OK
Time taken: 0.839 seconds
# Describe the table
hive> desc users;
OK
id                      int                                         
name                    string                                      
Time taken: 5.192 seconds, Fetched: 2 row(s)

# Insert a row
hive> insert into users values (1,'houjianjun');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20191129105035_9497607a-510f-4e75-96fa-190ebe7ac010
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574995391746_0003, Tracking URL = http://master:8088/proxy/application_1574995391746_0003/
Kill Command = /usr/local/hadoop-2.9.2/bin/hadoop job  -kill job_1574995391746_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-11-29 10:50:53,900 Stage-1 map = 0%,  reduce = 0%
2019-11-29 10:51:01,137 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.81 sec
MapReduce Total cumulative CPU time: 1 seconds 810 msec
Ended Job = job_1574995391746_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoopha/user/hive/warehouse/users/.hive-staging_hive_2019-11-29_10-50-35_528_1481323553371246919-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.81 sec   HDFS Read: 4091 HDFS Write: 82 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 810 msec
OK
Time taken: 27.553 seconds

# Query the rows
hive> select * from users;
OK
1       houjianjun
Time taken: 0.244 seconds, Fetched: 1 row(s)

6. Remote Mode

  This mode uses beeline together with hiveserver2, both shipped in the Hive installation directory.
  The idea is that the metastore runs as a standalone service; clients connect through beeline and do not need to know the metastore database password.
(1) All-in-one deployment

  1. The database configuration is the same as in multi-user (local) mode.
  2. Client configuration for the remote connection (hive-site.xml):
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.71.130:9083</value>
  </property>
  3. Re-initialize the metastore database
# Drop the old metastore database
mysql> drop database hive;
Query OK, 57 rows affected (2.19 sec)

mysql> 

# Re-initialize the schema
root@master:~# schematool -initSchema -dbType mysql
Metastore connection URL:        jdbc:mysql://192.168.71.130:3306/hive?characterEncoding=UTF8&useSSL=false&createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed

# Start the metastore service
root@master:~# hive --service metastore &
[1] 13064

# Start the hiveserver2 service
root@master:~# hive --service hiveserver2 &
[1] 13143
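Both services take a few seconds to come up; by default the metastore listens on port 9083 and HiveServer2 on port 10000, which can be checked the same way as MySQL earlier:

$ netstat -npl | grep -E ':9083|:10000'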

  4. Client login and operations
# On a client, simply run the hive command
root@master:~# hive

Logging initialized using configuration in file:/usr/local/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 

# Log in with beeline
root@master:~# beeline
Beeline version 2.3.6 by Apache Hive
# Connect to Hive; the default HiveServer2 port is 10000
beeline> !connect jdbc:hive2://192.168.71.130:10000
Connecting to jdbc:hive2://192.168.71.130:10000
# Enter the username and password (the same as the MySQL credentials)
Enter username for jdbc:hive2://192.168.71.130:10000: root
Enter password for jdbc:hive2://192.168.71.130:10000: ****
Connected to: Apache Hive (version 2.3.6)
Driver: Hive JDBC (version 2.3.6)
Transaction isolation: TRANSACTION_REPEATABLE_READ
# Show databases
0: jdbc:hive2://192.168.71.130:10000> show databases;
OK
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.074 seconds)
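The same connection can also be made non-interactively, which is convenient for scripting (a sketch using beeline's -u/-n/-p/-e options):

$ beeline -u jdbc:hive2://192.168.71.130:10000 -n root -p 1234 -e "show databases;"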

Create a table, then insert and query records:

0: jdbc:hive2://192.168.71.130:10000> select * from users;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'users'
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'users' (state=42S02,code=10001)
0: jdbc:hive2://192.168.71.130:10000> create table users(id int,name string);
OK
No rows affected (0.679 seconds)
0: jdbc:hive2://192.168.71.130:10000> insert into users values (1,'houjianjun');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20191129114331_3826f801-4dd1-456b-a0ba-4b9aa9f5245d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Job running in-process (local Hadoop)
2019-11-29 11:43:36,781 Stage-1 map = 0%,  reduce = 0%
2019-11-29 11:43:37,805 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local734466231_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoopha/user/hive/warehouse/users/.hive-staging_hive_2019-11-29_11-43-31_911_2889855612823797721-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 13 HDFS Write: 95 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
No rows affected (6.81 seconds)
0: jdbc:hive2://192.168.71.130:10000> select * from users;
OK
+-----------+-------------+
| users.id  | users.name  |
+-----------+-------------+
| 1         | houjianjun  |
+-----------+-------------+
1 row selected (0.395 seconds)
0: jdbc:hive2://192.168.71.130:10000> 

7. Common Problems

  1. FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    Cause:
    Hive's Metastore Server process was not started properly.
# Start the metastore service in the foreground
root@master:/usr/local/apache-hive-2.3.6-bin/bin# hive --service metastore -v
2019-11-28 15:17:55: Starting Hive Metastore Server
Starting hive metastore on port 9083
# Or start it in the background
root@master:/usr/local/apache-hive-2.3.6-bin/bin# hive --service metastore &
[1] 112078
# Move the old Derby metastore out of the way
root@master:/usr/local/apache-hive-2.3.6-bin/bin# mv metastore_db metastore_db.tmp
# Re-initialize the schema
root@master:/usr/local/apache-hive-2.3.6-bin/bin# schematool -initSchema -dbType derby
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.derby.sql
Initialization script completed
schemaTool completed

  2. SLF4J: Class path contains multiple SLF4J bindings.
    Fix:
    Remove (or rename) one of the conflicting jars.
root@master:/usr/local/apache-hive-2.3.6-bin/lib# mv log4j-slf4j-impl-2.6.2.jar log4j-slf4j-impl-2.6.2.jars

  3. Relative path in absolute URI: %7Bsystem:user.name%7D
    Fix: replace the ${system:java.io.tmpdir}/${system:user.name} style placeholders in hive-site.xml with absolute paths, as was done in the configuration above.

  4. FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

  5. No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).

  6. FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

# Find the PID of the running metastore process
$ ps -aux | grep 'metastore'
# Kill it (replace <PID> with the actual process id), then restart the metastore
$ kill -9 <PID>
  7. Could not open client transport with JDBC Uri: jdbc:hive2://
    Cause:
    The hiveserver2 service was not started.
root@master:~# hive --service hiveserver2 &
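Give HiveServer2 a little time to initialize, then confirm it is listening on port 10000 before retrying the beeline connection:

$ netstat -npl | grep 10000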
