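Hive 2.x has been out for a while, and the current stable release in the 2.x line is 2.1.0.
GitHub: https://github.com/apache/hive/tree/master
Download (Tsinghua mirror of the Apache releases): https://mirrors.tuna.tsinghua.edu.cn/apache/hive/
In my spare time, let's look at what has changed in Hive 2.1.0 relative to 1.2.
An earlier post on deploying Hive 1.2 with MySQL: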
http://blog.csdn.net/gamer_gyt/article/details/52032579
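Deploying Hive 2.1.0 is essentially the same as deploying 1.2, but as a refresher we will walk through the process again and see which pitfalls are waiting for us.
Here Hive lives in /opt/bigdata/hive: extract the tarball into /opt/bigdata/, then rename the extracted directory to hive (the mv below is run from /opt/bigdata/).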
tar -zxvf /home/thinkgamer/下载/apache-hive-2.1.0-bin.tar.gz -C /opt/bigdata/
mv apache-hive-2.1.0-bin/ hive
In MySQL, create the hive21 user, grant it privileges, and reload the grant tables:
CREATE USER 'hive21' IDENTIFIED BY 'hive21';
grant all privileges on *.* to 'hive21' with grant option;
flush privileges;
cp /path/to/mysql-connector-java-5.1.38-bin.jar hive/lib
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive21?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive21</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive21</value>
</property>
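One pitfall with the ConnectionURL value: hive-site.xml is an XML file, so every & in the JDBC URL has to be written as &amp;, otherwise the file is not well-formed. The sketch below is a quick shell check for raw ampersands. The function name find_raw_ampersands is hypothetical and not part of Hive; it only looks for & characters that are not part of an entity reference.

```shell
# Hypothetical helper (not part of Hive): print the lines of an XML file
# that still contain a raw '&' once well-formed entity references
# (&amp; &lt; &gt; &quot; &apos; and numeric references) are removed.
find_raw_ampersands() {
    # $1: path to the XML file to check, e.g. conf/hive-site.xml
    # Exit status is grep's: 0 if a raw '&' was found, 1 if none.
    sed -E 's/&(amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);//g' "$1" | grep -n '&'
}

# Example call (path is an assumption based on the layout used in this post):
#   find_raw_ampersands /opt/bigdata/hive/conf/hive-site.xml
```

If the function prints anything, those lines need their & replaced with &amp; before Hive can parse the file.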
bin/hive
hive> show databases;
OK
default
Time taken: 1.123 seconds, Fetched: 1 row(s)
hive> create table table_name (
> id int,
> dtDontQuery string,
> name string
> );
OK
Time taken: 0.983 seconds
hive> show tables;
OK
table_name
Time taken: 0.094 seconds, Fetched: 1 row(s)
At this point, MySQL contains a database named hive21:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| hive21             |
| mysql              |
| performance_schema |
+--------------------+
The error is as follows:
Caused by: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema.
If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
Fix:
bin/schematool -initSchema -dbType mysql --verbose
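While looking into this problem I found posts online saying that the Derby database has to be initialized here. I believe that is incorrect, because we have already configured MySQL as the metastore database.
The error is as follows: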
Logging initialized using configuration in file:/opt/bigdata/hive/conf/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:631)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 12 more
Fix:
Edit hive-site.xml and replace the ${system:java.io.tmpdir} and ${system:user.name} placeholders with a concrete path, here /opt/bigdata/hive/tmp
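The replacement can be done by hand in an editor or scripted. The following is a minimal sketch with sed, not the only way to apply the fix. It assumes that the tmpdir placeholder should become a concrete directory and that the user-name placeholder should become a concrete user name; the function name replace_tmpdir_placeholders and its arguments are hypothetical.

```shell
# Hypothetical helper: substitute the two unresolved placeholders in a
# hive-site.xml file. A backup copy is kept as <file>.bak.
replace_tmpdir_placeholders() {
    # $1: path to hive-site.xml
    # $2: concrete tmp directory (assumed not to contain '|' or '&')
    # $3: concrete user name
    sed -i.bak \
        -e 's|${system:java.io.tmpdir}|'"$2"'|g' \
        -e 's|${system:user.name}|'"$3"'|g' \
        "$1"
}

# Example call (paths follow the layout used in this post):
#   mkdir -p /opt/bigdata/hive/tmp
#   replace_tmpdir_placeholders /opt/bigdata/hive/conf/hive-site.xml /opt/bigdata/hive/tmp "$(whoami)"
```

With these arguments, a value such as ${system:java.io.tmpdir}/${system:user.name} becomes /opt/bigdata/hive/tmp/ followed by the user name, which is an absolute path that Hadoop's Path class can parse.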
hive1.2
[master@master1 hive]$ bin/hive --service help
Usage ./hive --service serviceName
Service List: beeline cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version
Parameters parsed:
--auxpath : Auxillary jars
--config : Hive configuration directory
--service : Starts specific service/component. cli is default
Parameters used:
HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
HIVE_OPT : Hive options
For help on a particular service:
./hive --service serviceName --help
Debug help: ./hive --debug --help
hive2.1
root@thinkgamer-pc:/opt/bigdata/hive# bin/hive --service help
Usage ./hive --service serviceName
Service List: beeline cleardanglingscratchdir cli hbaseimport hbaseschematool help hiveburninclient hiveserver2 hplsql hwi jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool version
Parameters parsed:
--auxpath : Auxillary jars
--config : Hive configuration directory
--service : Starts specific service/component. cli is default
Parameters used:
HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
HIVE_OPT : Hive options
For help on a particular service:
./hive --service serviceName --help
Debug help: ./hive --debug --help
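The two service lists above can be compared mechanically rather than by eye. The sketch below copies both lists from the help output and uses sort and comm to show which services appear only in one version; the variable names are illustrative.

```shell
# Service lists copied verbatim from the two help outputs above.
hive12="beeline cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version"
hive21="beeline cleardanglingscratchdir cli hbaseimport hbaseschematool help hiveburninclient hiveserver2 hplsql hwi jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool version"

export LC_ALL=C   # make sort and comm agree on ordering

old=$(mktemp)
new=$(mktemp)
# One service per line, sorted, because comm expects sorted input.
printf '%s\n' $hive12 | sort > "$old"
printf '%s\n' $hive21 | sort > "$new"

# comm -13 keeps lines only in the second file; comm -23 only in the first.
added=$(comm -13 "$old" "$new" | paste -s -d ' ' -)
removed=$(comm -23 "$old" "$new" | paste -s -d ' ' -)
rm -f "$old" "$new"

echo "only in 2.1: $added"
echo "only in 1.2: $removed"
```

For the lists shown here, the services only in 2.1 are cleardanglingscratchdir, hbaseimport, hbaseschematool, hplsql, llap, llapdump and llapstatus, and the only service missing from 2.1 is hiveserver.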
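We can see that Hive 2.1 adds HBase-related services as well as hplsql and others; these are new in Hive 2.1. A few commonly used services are described here.
beeline: usage should be the same as beeline in Hive 1.2. Performance has presumably improved, although it is not measured here. For beeline usage, see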
http://blog.csdn.net/gamer_gyt/article/details/52062460
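cleardanglingscratchdir: cleans up dangling scratch directories
Usage: bin/hive --service cleardanglingscratchdir
hbaseimport/hbaseschematool: interact with HBase
hiveserver2: provides a JDBC interface through which external programs can work with Hive
hplsql: a tool that implements procedural SQL for Apache Hive, SparkSQL, other SQL-on-Hadoop implementations, NoSQL stores, and relational databases
Official description: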
HPL/SQL (previously known as PL/HQL) is an open source tool (Apache License 2.0) that implements procedural SQL language for Apache Hive, SparkSQL as well as any other SQL-on-Hadoop implementations, NoSQL and RDBMS.
HPL/SQL language is compatible to a large extent with Oracle PL/SQL, ANSI/ISO SQL/PSM (IBM DB2, MySQL, Teradata i.e), PostgreSQL PL/pgSQL (Netezza), Transact-SQL (Microsoft SQL Server and Sybase) that allows you leveraging existing SQL/DWH skills and familiar approach to implement data warehouse solutions on Hadoop. It also facilitates migration of existing business logic to Hadoop.
HPL/SQL is an efficient way to implement ETL processes in Hadoop.
Below is an image found online: