This post builds on the Hadoop installation described in the earlier cluster-setup post: CentOS7环境下Hadoop3 NameNode ResourceManager HA 集群搭建.
Hive will now be installed on the NameNode1 node, and two new servers are added to host the MySQL databases that store Hive's metadata; everything else stays the same.
IP | HostName | Role | Software
---|---|---|---
192.168.100.131 | lzjnn1 | NameNode1 | hadoop, hive
192.168.100.161 | lzjmysql1 | MySQL master | mysql-server
192.168.100.162 | lzjmysql2 | MySQL slave | mysql-server
I won't cover installing MySQL itself here, just the key configuration. I used mysql-community-server-5.7.24, installed from rpm packages.
1. On the MySQL master, edit /etc/my.cnf and add the following under [mysqld]:
server-id=1
log-bin=master-bin
log-bin-index=master-bin.index
character-set-server=utf8
2. On the MySQL slave, edit /etc/my.cnf and add the following under [mysqld]:
server-id=2
relay-log-index=slave-relay-bin.index
relay-log=slave-relay-bin
character-set-server=utf8
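If mysqld was already running when my.cnf was edited, it needs a restart for the server-id and log settings to take effect; a minimal sketch:
#Restart mysqld to pick up the new replication settings
service mysqld restart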
3. Start the master MySQL service and connect to the local database as root
#Start the mysql service
service mysqld start
#Connect to the local mysql service
#The initial root password is printed to the mysql log file /var/log/mysqld.log the first time the service starts
mysql -uroot -p
#In the mysql shell, create a repl user for master-slave replication
mysql>create user repl;
mysql>grant replication slave on *.* to 'repl'@'192.168.100.%' identified by 'XXXXXXXX';
#Check the master's binlog file name and position, and note down the File and Position values
mysql> show master status;
+-------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-bin.000003 | 580013 | | | |
+-------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
4. Start the slave MySQL service and connect to the local database as root
#Start the mysql service
service mysqld start
#Connect to the local mysql service
#The initial root password is printed to the mysql log file /var/log/mysqld.log the first time the service starts
mysql -uroot -p
#Stop the slave
mysql> stop slave;
#Configure the slave's replication settings
# master_host/master_port: IP and port of the master
# master_user/master_password: the replication user and password created on the master earlier
# master_log_file/master_log_pos: the values returned by show master status on the master
mysql> change master to master_host='192.168.100.161',
master_port=3306,
master_user='repl',
master_password='XXXXXXXX',
master_log_file='master-bin.000003',
master_log_pos=580013;
#Start the slave
mysql> start slave;
#Check the slave status; Slave_IO_State: Waiting for master to send event means replication is healthy
mysql> show slave status\G
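To verify replication end to end, one can create a throwaway database on the master and check that it appears on the slave; a minimal sketch (first and last commands on lzjmysql1, the middle one on lzjmysql2):
#On the master: create a test database
mysql -uroot -p -e "create database repl_test;"
#On the slave: the database should appear almost immediately
mysql -uroot -p -e "show databases like 'repl_test';"
#Back on the master: clean up (the drop replicates too)
mysql -uroot -p -e "drop database repl_test;"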
On the master server, create the hive database and the hive user
mysql> create database hive;
mysql> grant all privileges on hive.* to 'hive'@'%' identified by 'XXXXXXXX';
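Before configuring Hive, it's worth confirming from the Hive node that this account can reach the master remotely; a quick check, assuming a mysql client is available on lzjnn1:
#From lzjnn1: connect to the MySQL master as the hive user
mysql -h 192.168.100.161 -P 3306 -uhive -p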
JDK must be installed and configured on every server.
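A quick check that the JDK is in place (assuming JAVA_HOME is exported from /etc/profile.d, as in the base Hadoop setup):
#Both should print meaningful output on every node
java -version
echo $JAVA_HOME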
Download Hive from the official Apache Hive download page and choose apache-hive-2.3.4-bin.tar.gz.
#Create a hive directory under /usr/local
mkdir /usr/local/hive
#Copy the downloaded Hive tarball into this directory
cp /download/path/apache-hive-2.3.4-bin.tar.gz /usr/local/hive
#cd into the directory and extract the archive
cd /usr/local/hive
tar zxvf apache-hive-2.3.4-bin.tar.gz
1. Edit the hadoop.sh file under /etc/profile.d and add the following:
# for hive
export HIVE_HOME=/usr/local/hive/apache-hive-2.3.4-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/*
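Reload the profile so the new variables take effect in the current shell, then sanity-check them:
#Apply the environment changes and verify
source /etc/profile.d/hadoop.sh
echo $HIVE_HOME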
2. Edit the $HIVE_HOME/conf/hive-site.xml file and set the following properties:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/app/hive/warehouse</value>
</property>
<property>
<name>system:java.io.tmpdir</name>
<value>/app/hive/iotmp</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.100.161:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>XXXXXXXX</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://lzjnn1:9083</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
</configuration>
3. Change the log directory (skip this if the default is fine):
Edit the $HIVE_HOME/conf/hive-log4j2.properties file and set the log directory:
property.hive.log.dir = /app/logs/hive
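Before the schema initialization in the next step, note that the MySQL JDBC driver is not bundled with Hive, so the Connector/J jar has to be placed in Hive's lib directory first; a sketch, assuming a Connector/J 8.x jar (which provides the com.mysql.cj.jdbc.Driver class configured above; the exact file name depends on the version you downloaded):
#Copy the MySQL Connector/J jar onto Hive's classpath (jar name is an example)
cp /download/path/mysql-connector-java-8.0.13.jar $HIVE_HOME/lib/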
4. Initialize Hive's metastore schema
$HIVE_HOME/bin/schematool -dbType mysql -initSchema
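If initialization succeeded, schematool can report the schema version it finds in MySQL:
#Prints the metastore connection info and schema version
$HIVE_HOME/bin/schematool -dbType mysql -info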
5. Change the character set of some columns in the hive metastore tables in MySQL, otherwise Chinese comments set in Hive come out garbled (very important!!)
Log in to the mysql command line as the hive user and run the following:
#Change the comment-related columns of the hive metastore tables to utf8:
mysql> alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
mysql> alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
mysql> alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
mysql> alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
mysql> alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
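To confirm a change took effect, the column's collation can be inspected, for example:
#The Collation column should now show a utf8 collation such as utf8_general_ci
mysql> show full columns from COLUMNS_V2 like 'COMMENT';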
1. First, start the metastore service
nohup $HIVE_HOME/bin/hive --service metastore >/dev/null 2>&1 &
2. Then start the hiveserver2 service
nohup $HIVE_HOME/bin/hive --service hiveserver2 >/dev/null 2>&1 &
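Both services run in the background, so it's worth confirming they are up before testing; by default the metastore listens on 9083 and hiveserver2 on 10000 (netstat assumes net-tools is installed on CentOS 7):
#Two RunJar processes should be listed
jps
#Check the default listening ports
netstat -tlnp | grep -E '9083|10000'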
3. Test the connection with the beeline tool
$HIVE_HOME/bin/beeline
#Connect to the local hive
#You will be prompted for a username and password; I start hive as the hadoop user, so the username is hadoop and the password is left blank
beeline> !connect jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: hadoop
Enter password for jdbc:hive2://localhost:10000/default:
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
#Create a test database test1
0: jdbc:hive2://localhost:10000/default> create database test1;
No rows affected (1.11 seconds)
#Switch to database test1
0: jdbc:hive2://localhost:10000/default> use test1;
No rows affected (0.114 seconds)
#Create a test table
0: jdbc:hive2://localhost:10000/default> create table testa(
. . . . . . . . . . . . . . . . . . . .> ORDER_ID int comment '订单ID',
. . . . . . . . . . . . . . . . . . . .> DEALER_ID int comment '门店ID',
. . . . . . . . . . . . . . . . . . . .> CUST_ID int comment '客户ID'
. . . . . . . . . . . . . . . . . . . .> );
No rows affected (0.482 seconds)
#List tables
0: jdbc:hive2://localhost:10000/default> show tables;
+-----------+
| tab_name |
+-----------+
| testa |
+-----------+
1 row selected (0.109 seconds)
#Show the table's CREATE statement
0: jdbc:hive2://localhost:10000/default> show create table testa;
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE TABLE `testa`( |
| `order_id` int COMMENT '订单ID', |
| `dealer_id` int COMMENT '门店ID', |
| `cust_id` int COMMENT '客户ID') |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://lzjcluster/app/hive/warehouse/test1.db/testa' |
| TBLPROPERTIES ( |
| 'transient_lastDdlTime'='1545731518') |
+----------------------------------------------------+
14 rows selected (0.17 seconds)
#Test inserting data
0: jdbc:hive2://localhost:10000/default> insert into testa values(1,2,3);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
No rows affected (17.425 seconds)
#Query the data
0: jdbc:hive2://localhost:10000/default> select * from testa;
+-----------------+------------------+----------------+
| testa.order_id | testa.dealer_id | testa.cust_id |
+-----------------+------------------+----------------+
| 1 | 2 | 3 |
+-----------------+------------------+----------------+
1 row selected (0.318 seconds)
#Drop the test database
0: jdbc:hive2://localhost:10000/default> drop database test1 cascade;
No rows affected (1.539 seconds)
4. Troubleshooting notes
When first connecting to hive, this error was reported:
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hadoop is not allowed to impersonate hadoop (state=08S01,code=0)
This happens because Hadoop's proxy-user settings do not yet allow the hadoop user (which runs hiveserver2) to impersonate other users. Edit the $HADOOP_HOME/etc/hadoop/core-site.xml file and add the following:
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>hadoop</value>
</property>
After restarting the hadoop services, the error is gone.
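As a side note, a full restart may not be required: on a running cluster the proxy-user settings can usually be reloaded in place (run as the hadoop user; behavior may vary by version, so treat this as a hedged alternative):
#Reload the proxy-user settings on the NameNode
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
#And on the ResourceManager
yarn rmadmin -refreshSuperUserGroupsConfiguration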