Hive 2.3.x Installation and Deployment with MySQL as the Metastore Database

1. Cluster Plan

A working Hadoop deployment is a prerequisite for this post; refer to the earlier blog post on building the Hadoop cluster: Hadoop 3 NameNode/ResourceManager HA Cluster Setup on CentOS 7.
Hive is installed and deployed on the NameNode1 node, and two new servers are added to run MySQL as the backing store for Hive's metadata; everything else stays unchanged.

IP              HostName   Role           Software
192.168.100.131 lzjnn1     NameNode1      hadoop, hive
192.168.100.161 lzjmysql1  MySQL master   mysql-server
192.168.100.162 lzjmysql2  MySQL slave    mysql-server
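
The hostnames above are assumed to resolve on every node. A minimal /etc/hosts sketch (the Hadoop entries from the earlier post are assumed to already be present):

#append to /etc/hosts on each node
192.168.100.131 lzjnn1
192.168.100.161 lzjmysql1
192.168.100.162 lzjmysql2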

2. MySQL Installation and Configuration

Installing the MySQL database service itself is not covered here; only the key configuration steps are described. The version used here is mysql-community-server-5.7.24, installed from RPM packages.

2.1. Master-Slave Replication Setup

1. On the master MySQL server, edit the /etc/my.cnf file and add the following under [mysqld]:

server-id=1
log-bin=master-bin
log-bin-index=master-bin.index

character-set-server=utf8

2. On the slave MySQL server, edit the /etc/my.cnf file and add the following under [mysqld]:

server-id=2
relay-log-index=slave-relay-bin.index
relay-log=slave-relay-bin 

character-set-server=utf8
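
If mysqld was already running on either server when my.cnf was edited, restart it so the new settings take effect (skip this if the service has not been started yet):

service mysqld restart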

3. Start the master MySQL database service and connect to the local database as root

#start the mysql service
service mysqld start

#connect to the local mysql server
#the initial root password is printed to the mysql log file /var/log/mysqld.log
#the first time the service starts
mysql -uroot -p

#inside the mysql shell, create the repl user for master-slave replication
mysql>create user repl;
mysql>grant replication slave on *.* to 'repl'@'192.168.100.%' identified by 'XXXXXXXX';

#check the master's binlog file name and position; note down the File and Position values
mysql> show master status;
+-------------------+----------+--------------+------------------+-------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-bin.000003 |   580013 |              |                  |                   |
+-------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

4. Start the slave MySQL database service and connect to the local database as root

#start the mysql service
service mysqld start

#connect to the local mysql server
#the initial root password is printed to the mysql log file /var/log/mysqld.log
#the first time the service starts
mysql -uroot -p

#stop the slave
mysql> stop slave;

#configure replication on the slave
# master_host/master_port: the master's IP and port
# master_user/master_password: the replication user and password created on the master earlier
# master_log_file/master_log_pos: the values from show master status on the master
mysql> change master to master_host='192.168.100.161',
master_port=3306,
master_user='repl',
master_password='XXXXXXXX',
master_log_file='master-bin.000003',
master_log_pos=580013;

#start the slave
mysql> start slave;

#check the slave status; Slave_IO_State: Waiting for master to send event means replication is healthy
mysql> show slave status\G
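
To sanity-check replication end to end, a throwaway database can be created on the master and confirmed on the slave (a quick smoke test; the name test_repl is just an example):

#on the master
mysql> create database test_repl;

#on the slave, the database should appear almost immediately
mysql> show databases like 'test_repl';

#clean up on the master; the drop replicates as well
mysql> drop database test_repl;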

2.2. Create the Database Used by Hive

Create the hive database and the hive user on the master server:

mysql> create database hive;
mysql> grant all privileges on hive.* to 'hive'@'%' identified by 'XXXXXXXX';
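
Before moving on, it is worth confirming that the hive account can actually reach the master from the Hive node (run on lzjnn1; this assumes the mysql command-line client is installed there):

#should log straight into the hive database
mysql -h 192.168.100.161 -P 3306 -uhive -p hive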

3. Hive Installation and Configuration

The JDK needs to be installed and configured on every server.

3.1. Download

Go to the official Hive site to download it.
Download page link
Choose apache-hive-2.3.4-bin.tar.gz to download.

3.2. Installation

#create a hive directory under /usr/local
mkdir /usr/local/hive

#copy the downloaded Hive tarball into this directory
cp /download/path/apache-hive-2.3.4-bin.tar.gz /usr/local/hive

#cd into the directory and extract the archive
cd /usr/local/hive
tar zxvf apache-hive-2.3.4-bin.tar.gz

3.3. Configuration

1. Edit the hadoop.sh file under /etc/profile.d and add the following:

# for hive
export HIVE_HOME=/usr/local/hive/apache-hive-2.3.4-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/*
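
For the new variables to take effect in the current shell, reload the profile script (new login sessions pick it up automatically):

source /etc/profile.d/hadoop.sh

#quick check
echo $HIVE_HOME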

2. Edit the $HIVE_HOME/conf/hive-site.xml file and change the corresponding properties as follows:

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/app/hive/warehouse</value>
  </property>
  <property>
    <name>system:java.io.tmpdir</name>
    <value>/app/hive/iotmp</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.100.161:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>XXXXXXXX</value>
  </property>

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://lzjnn1:9083</value>
  </property>

  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
  </property>
</configuration>
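
Since hive.metastore.warehouse.dir points into HDFS and system:java.io.tmpdir points to a local path, both should exist and be writable by the user that runs Hive before the first start (a sketch, assuming Hive runs as the hadoop user):

#run as the hadoop user on lzjnn1
hdfs dfs -mkdir -p /app/hive/warehouse
hdfs dfs -chmod g+w /app/hive/warehouse
mkdir -p /app/hive/iotmp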

3. Change the log file directory (skip this step if the default is fine):
Edit the $HIVE_HOME/conf/hive-log4j2.properties file and change the log directory setting as follows:

property.hive.log.dir = /app/logs/hive

4. Initialize Hive's metastore database

$HIVE_HOME/bin/schematool -dbType mysql -initSchema
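
Note that Hive does not ship with a MySQL JDBC driver. If schematool fails with a ClassNotFoundException for com.mysql.cj.jdbc.Driver, copy the Connector/J jar into Hive's lib directory first (the jar version below is just an example; use the one you downloaded):

cp /download/path/mysql-connector-java-8.0.13.jar $HIVE_HOME/lib/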

5. Change the character set of certain columns of the hive metastore tables in MySQL, so that Chinese comments defined in Hive are not garbled (very important!!).
Log in to the MySQL command line as the hive user and run the following statements against the hive database:

#switch the comment-related columns of the hive metastore tables to utf8:
mysql> alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
mysql> alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
mysql> alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
mysql> alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
mysql> alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
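
To verify the change took effect, the column collation can be inspected (run against the hive database; the Collation column should now show a utf8 collation):

mysql> show full columns from COLUMNS_V2 where Field = 'COMMENT';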

3.4. Startup

1. Start the metastore service first

nohup $HIVE_HOME/bin/hive --service metastore >/dev/null 2>&1 &

2. Then start the hiveserver2 service

nohup $HIVE_HOME/bin/hive --service hiveserver2 >/dev/null 2>&1 &
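
Both services run in the background, so before testing it is worth checking that the metastore (9083) and hiveserver2 (10000) ports are actually listening (a quick check; netstat works just as well):

ss -ntlp | grep -E '9083|10000'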

3. Test the connection using the beeline tool

$HIVE_HOME/bin/beeline

#connect to the local hive
#you will be prompted for a username and password; hive here is started by the hadoop user,
#so the username is hadoop and the password is left empty
beeline> !connect jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: hadoop
Enter password for jdbc:hive2://localhost:10000/default: 
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ

#create a test database test1
0: jdbc:hive2://localhost:10000/default> create database test1;
No rows affected (1.11 seconds)

#switch to the test1 database
0: jdbc:hive2://localhost:10000/default> use test1;
No rows affected (0.114 seconds)

#create a test table
0: jdbc:hive2://localhost:10000/default> create table testa(
. . . . . . . . . . . . . . . . . . . .>    ORDER_ID int comment '订单ID',
. . . . . . . . . . . . . . . . . . . .>    DEALER_ID int comment '门店ID',
. . . . . . . . . . . . . . . . . . . .>    CUST_ID int comment '客户ID'
. . . . . . . . . . . . . . . . . . . .> );
No rows affected (0.482 seconds)

#list the tables
0: jdbc:hive2://localhost:10000/default> show tables;
+-----------+
| tab_name  |
+-----------+
| testa     |
+-----------+
1 row selected (0.109 seconds)

#show the table's CREATE statement
0: jdbc:hive2://localhost:10000/default> show create table testa;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE TABLE `testa`(                              |
|   `order_id` int COMMENT '订单ID',                   |
|   `dealer_id` int COMMENT '门店ID',                  |
|   `cust_id` int COMMENT '客户ID')                    |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.mapred.TextInputFormat'       |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION                                           |
|   'hdfs://lzjcluster/app/hive/warehouse/test1.db/testa' |
| TBLPROPERTIES (                                    |
|   'transient_lastDdlTime'='1545731518')            |
+----------------------------------------------------+
14 rows selected (0.17 seconds)

#test writing data
0: jdbc:hive2://localhost:10000/default> insert into testa values(1,2,3);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
No rows affected (17.425 seconds)

#query the data
0: jdbc:hive2://localhost:10000/default> select * from testa;
+-----------------+------------------+----------------+
| testa.order_id  | testa.dealer_id  | testa.cust_id  |
+-----------------+------------------+----------------+
| 1               | 2                | 3              |
+-----------------+------------------+----------------+
1 row selected (0.318 seconds)

#drop the test database
0: jdbc:hive2://localhost:10000/default> drop database test1 cascade;
No rows affected (1.539 seconds)

4. Troubleshooting
When first connecting to hive, the following error was reported:

Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hadoop is not allowed to impersonate hadoop (state=08S01,code=0)

This is caused by the hadoop user lacking proxy permissions. Edit the $HADOOP_HOME/etc/hadoop/core-site.xml file and add the following:

<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>hadoop</value>
</property>

Restart the hadoop services afterwards and the error is resolved.
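
As an alternative to a full restart, the proxy-user settings can usually be reloaded in place (run against the active NameNode and ResourceManager; fall back to a restart if the refresh does not help):

hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration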
