Deploying Hive on Hadoop


1. Extract the archive

tar -zxvf Downloads/apache-hive-3.1.1-bin.tar.gz -C applications/

2. Create a symlink

ln -s apache-hive-3.1.1-bin hive

3. Copy the MySQL JDBC driver jar (mysql-connector-java-5.1.46.jar) into Hive's lib/ directory

cp ~/Downloads/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar ~/applications/apache-hive-3.1.1-bin/lib/

4. Configure environment variables

Append to /etc/profile:

export HIVE_HOME=/opt/applications/hive

export PATH=$HIVE_HOME/bin:$PATH

5. Create the Hive MySQL user and database

First, create the hive account:

mysql> create user 'hive' identified by '123456';

Grant all MySQL privileges to the hive account, then reload the grant tables:

grant all on *.* to 'hive'@'%' identified by '123456';

flush privileges;

Log in to MySQL as the hive user:
mysql -h localhost -u hive -p

Create the hive database:

mysql> create database hive;
Query OK, 1 row affected (0.05 sec)

mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| confluence |
| hive |
| mysql |
| performance_schema |
| sys |
+--------------------+
6 rows in set (0.00 sec)
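Note that the `GRANT ... IDENTIFIED BY` form shown above only works on MySQL 5.x. If your server is MySQL 8.0 or newer (an assumption; this guide was written against 5.x), the password can no longer be set inside `GRANT`, and the equivalent is a two-step sketch:

```sql
-- MySQL 8.0+: IDENTIFIED BY is no longer accepted inside GRANT
CREATE USER 'hive'@'%' IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%';
FLUSH PRIVILEGES;
```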

6. Configure hive-site.xml

Create hive-env.sh from the bundled template:

cp conf/hive-env.sh.template conf/hive-env.sh

Add to hive-env.sh:

HADOOP_HOME=/opt/applications/hadoop

export HIVE_CONF_DIR=/opt/applications/hive/conf

export HIVE_AUX_JARS_PATH=/opt/applications/hive/lib

hive-site.xml does not exist by default, so copy it from the template:

[wls81@master applications]$ cd hive/

[wls81@master hive]$ cd conf/

[wls81@master conf]$ ls -lrt
total 332
-rw-r--r-- 1 wls81 wls81   2662 Apr  4  2018 parquet-logging.properties
-rw-r--r-- 1 wls81 wls81   2060 Apr  4  2018 ivysettings.xml
-rw-r--r-- 1 wls81 wls81   2365 Apr  4  2018 hive-env.sh.template
-rw-r--r-- 1 wls81 wls81   1596 Apr  4  2018 beeline-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81   2274 Apr  4  2018 hive-exec-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81   3086 Oct 24 07:49 hive-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81   7163 Oct 24 07:49 llap-daemon-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81   3558 Oct 24 07:49 llap-cli-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81 299970 Oct 24 08:19 hive-default.xml.template

[wls81@master conf]$ cp hive-default.xml.template hive-site.xml

Default values (from hive-default.xml.template):

<property>
  <name>hive.metastore.db.type</name>
  <value>DERBY</value>
  <description>
    Expects one of [derby, oracle, mysql, mssql, postgres].
    Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>APP</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mine</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value/>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

Change them as follows (note that hive-site.xml is XML, so the literal & in the JDBC URL must be escaped as &amp;):

<property>
  <name>hive.metastore.db.type</name>
  <value>mysql</value>
  <description>
    Expects one of [derby, oracle, mysql, mssql, postgres].
    Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>

Add the remote-metastore configuration (see https://blog.csdn.net/dufufd/article/details/78614958 for the three MySQL deployment modes):

<property>
  <name>hive.metastore.local</name>
  <value>false</value> <!-- true means local (embedded) mode -->
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://master:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
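One pitfall worth calling out: because hive-site.xml is XML, any literal `&` in the JDBC URL must be written as `&amp;`, or the XML parser will reject the file. A minimal shell sketch of the escaping, using the URL from this guide:

```shell
# Escape '&' so the URL is valid inside an XML <value> element
raw='jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
escaped=$(printf '%s' "$raw" | sed 's/&/\&amp;/g')
echo "$escaped"
# prints jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false
```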

7. Change the Hive data directories

Edit the configuration file (vi hive-site.xml) and change the relevant data directories.

Default values:

<property>
  <name>hive.querylog.location</name>
  <value>${system:java.io.tmpdir}/${system:user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>${system:java.io.tmpdir}/${system:user.name}</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>${system:java.io.tmpdir}/${system:user.name}/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>

Changed values:

<property>
  <name>hive.querylog.location</name>
  <value>/wls/log/hive/logs</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/Data/hive/scratchdir</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/Data/hive/resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/wls/log/hive/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>

sudo mkdir -p /wls/log/hive/logs

sudo mkdir -p /wls/log/hive/operation_logs
sudo mkdir -p /Data/hive/scratchdir
sudo mkdir -p /Data/hive/resources

sudo chown -R wls81:wls81 /wls/log/hive/logs
sudo chown -R wls81:wls81 /Data/hive/scratchdir
sudo chown -R wls81:wls81 /Data/hive/resources

sudo chown -R wls81:wls81 /wls/log/hive/operation_logs

The warehouse location itself is left at its default:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

Create that directory on HDFS:

hdfs dfs -mkdir -p /user/hive/warehouse

8. Initialize the Hive metastore schema

./bin/schematool -dbType mysql -initSchema

The first attempt failed:

Metastore connection URL: jdbc:mysql//localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false

Metastore Connection Driver : com.mysql.jdbc.Driver

Metastore connection User: hive

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.

Underlying cause: java.sql.SQLException : No suitable driver found for jdbc:mysql//localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false

SQL Error code: 0

Use --verbose for detailed stacktrace.

The cause: the ':' after jdbc:mysql was missing from the connection URL (jdbc:mysql// instead of jdbc:mysql://). The correct property is:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>

After a successful initialization, the hive database contains 74 metastore tables.
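To double-check, a sketch of a query you can run from the mysql client (assuming the hive database created earlier) that counts the metastore tables schematool created:

```sql
-- Counts the metastore tables; the run above produced 74 for Hive 3.1.1
SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'hive';
```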

9. Start Hive

The first start hit this error:

Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x8

The cause is an invalid character (code 0x8) embedded in the description of hive.txn.xlock.iow in conf/hive-site.xml (between "for" and "transactional", copied over from the template); delete it and the file parses:

<property>
  <name>hive.txn.xlock.iow</name>
  <value>true</value>
  <description>
    Ensures commands with OVERWRITE (such as INSERT OVERWRITE) acquire Exclusive locks for transactional tables.
    This ensures that inserts (w/o overwrite) running concurrently are not hidden by the INSERT OVERWRITE.
  </description>
</property>

Start again:

[wls81@master bin]$ hive -version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/wls81/applications/apache-hive-3.1.1-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/wls81/applications/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = f69eeb96-efac-44f1-a227-bf306d3267ff

Logging initialized using configuration in jar:file:/home/wls81/applications/apache-hive-3.1.1-bin/lib/hive-common-3.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive> show databases;

OK

default

Time taken: 0.714 seconds, Fetched: 1 row(s)

hive>quit;

Accessing Hive with beeline through HiveServer2

Background

As a data-warehouse tool, Hive offers two ways to run ETL: the Hive command line and the beeline client.

The command-line mode enters interactive mode via the hive command and runs HQL statements to get results. It is a thick-client model: the client machine needs a JRE and the Hive program installed.

The beeline client is the thin-client model: it connects over JDBC through the Hive Thrift service to reach the Hive warehouse.

HiveThrift (HiveServer) is a Hive component designed for lightweight, cross-language access to the Hive warehouse. There are two versions, HiveServer and HiveServer2, and they are incompatible; take care to distinguish them. The difference shows up in the parameters used to start the server and in the jdbc:hiveX connection string.
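For illustration, the two incompatible connection-string forms look like this (the host and port are the ones configured later in this guide):

```text
# HiveServer (legacy)
jdbc:hive://master:10000/default

# HiveServer2
jdbc:hive2://master:10000/default
```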

Thrift server configuration for beeline

These are the hive.server2.thrift settings in hive/conf/hive-site.xml; keep them consistent with how you connect:


<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>master</value>
  <description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>
<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10001</value>
  <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
</property>

When connecting through beeline, the files being accessed live on HDFS and are protected by HDFS path permissions, so the client username should be the Hadoop user, here 'wls81'. Other usernames may be rejected with a permission-denied error; alternatively, relax this by setting the hadoop.proxyuser.XX options to "*" in the Hadoop configuration.

<property>
  <name>hive.server2.thrift.client.user</name>
  <value>wls81</value>
  <description>Username to use against thrift client</description>
</property>
<property>
  <name>hive.server2.thrift.client.password</name>
  <value>123456</value>
  <description>Password to use against thrift client</description>
</property>

In hadoop/etc/hadoop/core-site.xml:

<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>

What these mean:

Setting hadoop.proxyuser.hadoop.hosts to * lets the proxy user hadoop access the HDFS cluster from any node; hadoop.proxyuser.hadoop.groups declares the groups the proxy user may impersonate.

If the proxy user belongs to wls81, change these to hadoop.proxyuser.wls81.hosts and hadoop.proxyuser.wls81.groups.
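Assuming the wls81 user from this guide, the adjusted core-site.xml entries would read:

```xml
<property>
  <name>hadoop.proxyuser.wls81.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.wls81.groups</name>
  <value>*</value>
</property>
```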

 

Start beeline and access Hive

On master, start the metastore and HiveServer2:

nohup hive --service metastore &      # start the metastore service
nohup hive --service hiveserver2 &

ps -ef | grep Hive shows that HiveServer2 has started.

[wls81@master bin]$ beeline
Beeline version 3.1.1 by Apache Hive
beeline> !connect jdbc:hive2://master:10000
Connecting to jdbc:hive2://master:10000
Enter username for jdbc:hive2://master:10000: hive
Enter password for jdbc:hive2://master:10000: ******
Connected to: Apache Hive (version 3.1.1)
Driver: Hive JDBC (version 3.1.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://master:10000> show databases;

Or connect in one step:

beeline -u jdbc:hive2://master:10000 -n hive

The exit command is !quit.

Some beeline commands:

!help      // show help
!close     // close the current JDBC connection
!tables    // list tables
!sh clear  // run a shell command
!quit      // exit the beeline terminal

Aggregate functions and more advanced queries in beeline:

select count(*) from t1;                  // count rows
select max(id) from t1;                   // maximum
select min(id) from t1;                   // minimum
select sum(id) from t1;                   // sum
select avg(id) from t1;                   // average
select * from t1 order by id limit 5,5;   // paging
select * from (select id,name from t1) a; // subquery (nested query)
select name,
       case when id < 3 then 'small'
            when id = 3 then 'true'
            else 'big'
       end
from t1;                                  // case when is like Java's if/else or switch/case
select count(*), sum(id) from t1 group by city having sum(id) > 10;


The difference between like and rlike

Both like and rlike are used for fuzzy matching. Suppose we want to find, in the employees table, the name and street address of every employee whose address contains the word Chicago or Ontario.

With like:

select name,address from employees where address like '%Chicago%' OR address like '%Ontario%';

With rlike:

select name,address from employees where address rlike '.*(Chicago|Ontario).*';

As you can see, rlike is a stronger version of like: it supports Java regular expressions, which is more convenient and keeps the query shorter.
