[II. Big Data Environment] 003. Hive Installation (Part 2)

Note: MySQL was installed as root, so the Hadoop working user XXX_user must be added to the root group; otherwise Hive initialization fails:

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.

Caused by: java.sql.SQLException: Access denied for user 'hive'@'host22VLK1737' (using password: YES)

Note: a cluster normally runs a single Hive instance; installing it on the master node is sufficient. Hive's data is stored on HDFS.


This article covers installing Hive, integrating Hive with Spark, and basic usage examples of Hive.

I. Run Modes

Like Hadoop, Hive has three run modes:

 

1. Embedded mode

Metadata is stored in the local embedded Derby database. This is the simplest way to use Hive, but its drawback is obvious: an embedded Derby database can only open one data file at a time, which means it does not support multiple concurrent sessions.

2. Local mode

In this mode, metadata is stored in a separate local database (typically MySQL), which supports multiple sessions and multiple user connections.

3. Remote mode

This mode suits deployments with many Hive clients. The MySQL database is separated out and the metadata is kept in a remote standalone MySQL service, avoiding the redundancy of installing MySQL on every client. Install the Hive service on the master node and copy the Hive package to the other nodes; those nodes can then access the Hive service remotely through the Hive client.
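For reference, in remote mode a client node's hive-site.xml can be as small as a single property pointing at the metastore service on the master (the host name below is a placeholder; 9083 is the metastore's default port):

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://master_host:9083</value>
</property>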

 

II. Download and Install

Main reference post: http://blog.csdn.net/u014695188/article/details/54646575

Other reference links:

http://blog.csdn.net/yinlei_81/article/details/69569403

http://blog.csdn.net/skywalker_only/article/details/34847043

http://blog.csdn.net/u010454030/article/details/53189943

http://blog.csdn.net/u014695188/article/details/54646575

http://blog.csdn.net/u011086367/article/details/52863345

http://www.jianshu.com/p/6108e0aed204

 

1. Download URL:

http://hive.apache.org/downloads.html

which redirects to http://apache.fayea.com/hive/

apache-hive-2.1.1-bin.tar.gz  2016-12-08 21:45  143M

apache-hive-2.1.1-src.tar.gz  2016-12-08 21:45  19M  (Java source package)

 

2. Environment variables

/etc/profile

********************

# Hive environment

export HIVE_HOME=/app/data_platform/soft/hive/apache-hive-2.1.1-bin

export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH

********************
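After editing /etc/profile, reload it and sanity-check that the variables took effect:

source /etc/profile

echo $HIVE_HOME

which hive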

 

3. Hive configuration

Installation path: /app/data_platform/soft/hive/apache-hive-2.1.1-bin

Configuration file: [hive-site.xml]

Configure it as follows:

cd /app/data_platform/soft/hive/apache-hive-2.1.1-bin/conf

cp hive-default.xml.template hive-site.xml

[The following parameters can be left at their defaults]

hive.metastore.warehouse.dir   # Hive's data storage directory; defaults to /user/hive/warehouse on HDFS.

hive.exec.scratchdir   # Hive's temporary-file directory; defaults to /tmp/hive on HDFS.
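If you do want to override them, the corresponding entries in hive-site.xml would look like this (the values shown are simply the defaults):

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
</property>

<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
</property>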

[hive-env.sh]

cd /app/data_platform/soft/hive/apache-hive-2.1.1-bin/conf

cp hive-env.sh.template hive-env.sh

********************

# Set HADOOP_HOME to point to a specific hadoop install directory

export HADOOP_HOME=/app/data_platform/soft/hadoop

 

# Hive Configuration Directory can be controlled by:

export HIVE_CONF_DIR=/app/data_platform/soft/hive/apache-hive-2.1.1-bin/conf

 

# Folder containing extra libraries required for hive compilation/execution can be controlled by:

export HIVE_AUX_JARS_PATH=/app/data_platform/soft/hive/apache-hive-2.1.1-bin/lib 

********************

4. Create the HDFS paths

hadoop fs -ls /

hadoop fs -mkdir -p  /user/hive/warehouse

hadoop fs -mkdir -p /tmp/hive

hadoop fs -chmod 777 /user/hive/warehouse

hadoop fs -chmod 777 /tmp/hive 

5. Replace the io.tmpdir paths

Replace the value (the path) of every entry in hive-site.xml that contains ${system:java.io.tmpdir}. (In vim, / starts a search: type /hello and press Enter to find "hello".) You can create a new directory for this purpose,

for example: /app/data_platform/data/hive

To replace every occurrence of ${system:java.io.tmpdir} globally in vim, press Esc, then Shift+:, paste one of the substitution commands below, and press Enter (4 lines are changed in total; the two commands differ only in the target directory):

%s#${system:java.io.tmpdir}#/opt/sdg_date/json_data/hive/iotmp#g

%s#${system:java.io.tmpdir}#/app/data_platform/data/hive#g
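Equivalently, the replacement can be done non-interactively with sed (a sketch; adjust the target directory to your environment):

sed -i 's#${system:java.io.tmpdir}#/app/data_platform/data/hive#g' hive-site.xml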


6. Run (embedded mode)

[your_user@XXXX apache-hive-2.1.1-bin]$ ./bin/schematool -initSchema -dbType derby

which: no hbase in(/app/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:.:/opt/data_platform/soft/jdk1.8.0_111/bin:/opt/data_platform/soft/jdk1.8.0_111/jre/bin:/app/data_platform/soft/hadoop/sbin:/app/data_platform/soft/hadoop/bin:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/app/data_platform/soft/scala/bin:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/app/data_platform/soft/hadoop/bin:/app/data_platform/soft/hadoop/sbin:/home/es_user/bin:/app/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/app/data_platform/soft/hadoop/bin:/app/data_platform/soft/hadoop/sbin:/app/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/app/data_platform/soft/hadoop/bin:/app/data_platform/soft/hadoop/sbin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/app/data_platform/soft/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Metastore connection URL:    jdbc:derby:;databaseName=metastore_db;create=true

Metastore Connection Driver :        org.apache.derby.jdbc.EmbeddedDriver

Metastore connection User:   APP

Starting metastore schema initialization to 2.1.0

Initialization script hive-schema-2.1.0.derby.sql

Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)

org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!

Underlying cause: java.io.IOException : Schema script failed, errorcode 2

Use --verbose for detailed stacktrace.

*** schemaTool failed ***

 

[your_user@XXXX apache-hive-2.1.1-bin]$ rm -rf metastore_db

 

[your_user@XXXX apache-hive-2.1.1-bin]$ bin/schematool -initSchema -dbType derby

which: no hbase in(/app/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:.:/opt/data_platform/soft/jdk1.8.0_111/bin:/opt/data_platform/soft/jdk1.8.0_111/jre/bin:/app/data_platform/soft/hadoop/sbin:/data05/opt/data_platform/soft/hadoop/bin:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/app/data_platform/soft/scala/bin:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/app/data_platform/soft/hadoop/bin:/app/data_platform/soft/hadoop/sbin:/home/es_user/bin:/app/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/app/data_platform/soft/hadoop/bin:/app/data_platform/soft/hadoop/sbin:/app/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/app/data_platform/soft/hadoop/bin:/app/data_platform/soft/hadoop/sbin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/app/data_platform/soft/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Metastore connection URL:    jdbc:derby:;databaseName=metastore_db;create=true

Metastore Connection Driver :        org.apache.derby.jdbc.EmbeddedDriver

Metastore connection User:   APP

Starting metastore schema initialization to 2.1.0

Initialization script hive-schema-2.1.0.derby.sql

Initialization script completed

schemaTool completed

 

[your_user@XXXX apache-hive-2.1.1-bin]$ bin/hive

Logging initialized using configuration in jar:file:/app/data_platform/soft/hive/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive>

hive> create database aaa;

OK

Time taken: 0.188 seconds


hive> create table t1 (f1 string, f2 int);

OK

Time taken: 0.808 seconds

 

hive> show tables;

OK

t1

Time taken: 0.199 seconds, Fetched: 1 row(s)

hive> insert into t1 values('aaa',333);

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Query ID = es_user_20170830153032_252e01ae-3531-476c-969a-fd8d5669fde3

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1504074498349_0001, Tracking URL = http://your_ip:8090/proxy/application_1504074498349_0001/

Kill Command = /app/data_platform/soft/hadoop/bin/hadoop job -kill job_1504074498349_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-08-30 15:31:16,337 Stage-1 map = 0%,  reduce = 0%

2017-08-30 15:31:33,731 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.17 sec

MapReduce Total cumulative CPU time: 3 seconds 170 msec

Ended Job = job_1504074498349_0001

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to directory hdfs://XXXX:9000/user/hive/warehouse/aaa.db/t1/.hive-staging_hive_2017-08-30_15-30-32_792_1024599595931100846-1/-ext-10000

Loading data to table aaa.t1

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1   Cumulative CPU: 3.17 sec   HDFS Read: 4046 HDFS Write: 70 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 170 msec

OK

Time taken: 63.441 seconds


hive> select * from t1;

OK

aaa  333

Time taken: 0.129 seconds, Fetched: 1 row(s)

Errors and issues:

[ERROR] Failed with exception java.io.IOException: java.lang.IllegalArgumentException: java.net.URISyntaxException:

 Relative path in absolute URI: ${system:user.name}

Fix: in hive-site.xml,

change    /opt/sdg_date/json_data/hive/iotmp/${system:user.name}

to:       /opt/sdg_date/json_data/hive/iotmp/${user.name}

 

Note: Hive ships with the embedded Derby database, but it has the drawback that only one user can be connected at a time.

 

 

7. Configure MySQL as the metastore (local mode / remote mode)

MySQL installation: http://blog.csdn.net/u014695188/article/details/51532410

For installing MySQL and granting privileges, see this earlier article: http://qindongliang.iteye.com/blog/2337865

or "[II. Big Data Environment] 003. Hive Installation (Part 1) - MySQL Installation".

vi hive-site.xml

Configure the following parameters.

Add this property at the very top:

<property>
    <name>system:user.name</name>
    <value>hive</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://your_ip:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=utf-8</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>

<property>
    <name>hive.metastore.schema.verification.record.version</name>
    <value>false</value>
    <description>
      When true the current MS version is recorded in the VERSION table. If this is disabled and verification is
      enabled the MS will be unusable.
    </description>
</property>

***********************

As in the embedded-mode setup above, replace every remaining occurrence of the ${system:java.io.tmpdir} variable with an absolute path.

Finally, remember to add the MySQL JDBC driver jar to the hive/lib directory:

/app/data_platform/soft/hive/apache-hive-2.1.1-bin/lib/mysql-connector-java-5.1.42-bin.jar
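A quick sanity check that the driver jar is in place and that the hive account can actually reach MySQL (the host below is a placeholder):

ls $HIVE_HOME/lib | grep mysql-connector

mysql -h your_ip -uhive -phive -e "select 1;"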

 

A few notes:

In the JDBC URL above, characterEncoding=utf-8 sets the database connection encoding to UTF-8; inside hive-site.xml the & character must be escaped as &amp;:

jdbc:mysql://192.168.10.40:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=utf-8

Also, by default Hive looks for HBase's libs; without HBase installed it may fail to start. Download HBase and configure the corresponding HBASE_HOME; all the environment variables are given at the end of this article.

 

Note: when switching the metastore to a different database, clear out the old metadata first:

rm -rf metastore_db

bin/schematool -initSchema -dbType mysql

If the hive database has already been created in MySQL, re-initialization may still fail; log in to MySQL as root and drop the hive database.

[your_user@XXXX apache-hive-2.1.1-bin]$ bin/schematool -initSchema -dbType mysql

which: no hbase in (/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:.:/opt/data_platform/soft/jdk1.8.0_111/bin:/opt/data_platform/soft/jdk1.8.0_111/jre/bin:/data05/opt/data_platform/soft/hadoop/sbin:/data05/opt/data_platform/soft/hadoop/bin:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/data05/opt/data_platform/soft/scala/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hadoop/bin:/data05/opt/data_platform/soft/hadoop/sbin:/home/es_user/bin:/data05/opt/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hadoop/bin:/data05/opt/data_platform/soft/hadoop/sbin:/data05/opt/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hadoop/bin:/data05/opt/data_platform/soft/hadoop/sbin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/data05/opt/data_platform/soft/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Metastore connection URL:    jdbc:mysql://your_ip:3306/hive?createDatabaseIfNotExist=true&characterEncoding=utf-8

Metastore Connection Driver :        com.mysql.jdbc.Driver

Metastore connection User:   hive

Starting metastore schema initialization to 2.1.0

Initialization script hive-schema-2.1.0.mysql.sql

Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)

org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!

Underlying cause: java.io.IOException : Schema script failed, errorcode 2

Use --verbose for detailed stacktrace.

*** schemaTool failed ***

[If the host configured in the current hive-site.xml is not allowed to access the MySQL server, log in to MySQL and grant access; see "XXXX Hive Installation (Part 1)" for details:

GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY 'hive' WITH GRANT OPTION;

FLUSH PRIVILEGES;]

 

Fix: in MySQL, drop the metastore database and let it be recreated (the metastore database defined above is named hive).

Log in with the MySQL client straight from the Linux command line: mysql -uhive -phive  (-u user, -p password; no space between the flag and its value!)

mysql> show databases;

+--------------------+

| Database           |

+--------------------+

| information_schema |

| hive               |

+--------------------+

2 rows in set (0.00 sec)

 

mysql> drop database hive;

Query OK, 57 rows affected (1.85 sec)

 

mysql> show databases;

+--------------------+

| Database           |

+--------------------+

| information_schema |

+--------------------+

1 row in set (0.00 sec)

 

Re-run the initialization:

[your_user@XXXX apache-hive-2.1.1-bin]$ bin/schematool -initSchema -dbType mysql

which: no hbase in(/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:.:/opt/data_platform/soft/jdk1.8.0_111/bin:/opt/data_platform/soft/jdk1.8.0_111/jre/bin:/data05/opt/data_platform/soft/hadoop/sbin:/data05/opt/data_platform/soft/hadoop/bin:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/data05/opt/data_platform/soft/scala/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hadoop/bin:/data05/opt/data_platform/soft/hadoop/sbin:/home/es_user/bin:/data05/opt/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hadoop/bin:/data05/opt/data_platform/soft/hadoop/sbin:/data05/opt/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hadoop/bin:/data05/opt/data_platform/soft/hadoop/sbin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/data05/opt/data_platform/soft/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Metastore connection URL:    jdbc:mysql://your_ip:3306/hive?createDatabaseIfNotExist=true&characterEncoding=utf-8

Metastore Connection Driver :        com.mysql.jdbc.Driver

Metastore connection User:   hive

Starting metastore schema initialization to 2.1.0

Initialization script hive-schema-2.1.0.mysql.sql

Initialization script completed

schemaTool completed

 

8. Run (local mode / remote mode)

[your_user@XXXX apache-hive-2.1.1-bin]$ bin/hive

which: no hbase in(/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/conf:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:.:/opt/data_platform/soft/jdk1.8.0_111/bin:/opt/data_platform/soft/jdk1.8.0_111/jre/bin:/data05/opt/data_platform/soft/hadoop/sbin:/data05/opt/data_platform/soft/hadoop/bin:/opt/data_platform/soft/oracle/instantclient_11_2:/opt/data_platform/soft/jdk1.8.0_111/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/data05/opt/data_platform/soft/scala/bin:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hadoop/bin:/data05/opt/data_platform/soft/hadoop/sbin:/home/es_user/bin:/data05/opt/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hadoop/bin:/data05/opt/data_platform/soft/hadoop/sbin:/data05/opt/data_platform/soft/scala/bin:/opt/data_platform/soft/jdk1.8.0_111/bin:/data05/opt/data_platform/soft/hadoop/bin:/data05/opt/data_platform/soft/hadoop/sbin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/data05/opt/data_platform/soft/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/data05/opt/data_platform/soft/hive/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive>

Note: "which: no hbase in ..." just means HBase is not installed; it is only a notice and does not affect usage.

hive> create database test_db;

OK

Time taken: 0.214 seconds

 

hive> show databases;

OK

default

test_db

Time taken: 0.027 seconds, Fetched: 2 row(s)

 

hive> use test_db;

OK

Time taken: 0.028 seconds

 

hive> create table test1(name string,age int);

OK

Time taken: 0.492 seconds

 

hive> insert into test1 values('tom',20);

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Query ID = es_user_20170830160012_868d00f4-c775-487e-a934-bced26f22c54

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1504074498349_0002, Tracking URL = http://ES100:8090/proxy/application_1504074498349_0002/

Kill Command = /data05/opt/data_platform/soft/hadoop/bin/hadoop job -kill job_1504074498349_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-08-30 16:01:08,915 Stage-1 map = 0%,  reduce = 0%

2017-08-30 16:01:15,149 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.6 sec

MapReduce Total cumulative CPU time: 2 seconds 600 msec

Ended Job = job_1504074498349_0002

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to directory hdfs://ES100:9000/user/hive/warehouse/test_db.db/test1/.hive-staging_hive_2017-08-30_16-00-12_103_8680995249472473972-1/-ext-10000

Loading data to table test_db.test1

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1   Cumulative CPU: 2.6 sec   HDFS Read: 4103 HDFS Write: 76 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 600 msec

OK

Time taken: 64.554 seconds

 

hive> select * from test1;

OK

tom  20

Time taken: 0.115 seconds, Fetched: 1 row(s)

 

 

Start the Hive metastore service [remote mode]:

bin/hive --service metastore >> /tmp/hive_metastore.log 2>&1 &

or

hive --service metastore &
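To confirm the metastore service came up, check for the process and for a listener on its default port 9083 (netstat options vary slightly across distributions):

ps -ef | grep -i metastore

netstat -nlt | grep 9083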

 

Errors and issues:

***************************

Startup errors

 

[1. Check MySQL status]

Start MySQL:

service mysqld start  (the service is named mysqld in version 5.0)

service mysql start   (named mysql in version 5.5.7)

service is a root command, so prefix it with sudo.

 

[your_user@XXXX bin]$ sudo service mysqld start

Starting mysqld:                                          [OK]

[your_user@XXXX bin]$ sudo service mysqld status

mysqld (pid 28955) is running...

 

mysql -uhive -phive

mysql> show databases;

+--------------------+

| Database           |

+--------------------+

| information_schema |

| hive               |

+--------------------+

 

Note: Hive's own metastore database is not visible from within the Hive client.

 

2 rows in set (0.00 sec)

mysql> use hive

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

 

Database changed

 

mysql> show tables;

+---------------------------+

| Tables_in_hive            |

+---------------------------+

| AUX_TABLE                 |

| BUCKETING_COLS            |

| CDS                       |

| COLUMNS_V2                |

| COMPACTION_QUEUE          |

| COMPLETED_COMPACTIONS     |

| COMPLETED_TXN_COMPONENTS  |

| DATABASE_PARAMS           |

| DBS                       |

| DB_PRIVS                  |

| DELEGATION_TOKENS         |

| FUNCS                     |

| FUNC_RU                   |

| GLOBAL_PRIVS              |

| HIVE_LOCKS                |

| IDXS                      |

| INDEX_PARAMS              |

| KEY_CONSTRAINTS           |

| MASTER_KEYS               |

| NEXT_COMPACTION_QUEUE_ID  |

| NEXT_LOCK_ID              |

| NEXT_TXN_ID               |

| NOTIFICATION_LOG          |

| NOTIFICATION_SEQUENCE     |

| NUCLEUS_TABLES            |

| PARTITIONS                |

| PARTITION_EVENTS          |

| PARTITION_KEYS            |

| PARTITION_KEY_VALS        |

| PARTITION_PARAMS          |

| PART_COL_PRIVS            |

| PART_COL_STATS            |

| PART_PRIVS                |

| ROLES                     |

| ROLE_MAP                  |

| SDS                       |

| SD_PARAMS                 |

| SEQUENCE_TABLE            |

| SERDES                    |

| SERDE_PARAMS              |

| SKEWED_COL_NAMES          |

| SKEWED_COL_VALUE_LOC_MAP  |

| SKEWED_STRING_LIST        |

| SKEWED_STRING_LIST_VALUES |

| SKEWED_VALUES             |

| SORT_COLS                 |

| TABLE_PARAMS              |

| TAB_COL_STATS             |

| TBLS                      |

| TBL_COL_PRIVS             |

| TBL_PRIVS                 |

| TXNS                      |

| TXN_COMPONENTS            |

| TYPES                     |

| TYPE_FIELDS               |

| VERSION                   |

| WRITE_SET                 |

+---------------------------+

57 rows in set (0.00 sec)

Note: these are the tables of Hive's metastore database; we do not need to pay much attention to them.

 

[2. Connection failure]

Note: when using the metastore-server mode, the metastore service must be started first: hive --service metastore &

Otherwise you will get connection errors.

[3. Timeout disconnects]

If a database connection (java.sql.Connection) sits idle for wait_timeout seconds, MySQL closes it.

The Java application's connection pool, however, still legitimately holds a reference to that connection; the error above appears when the connection is next used for a database operation.

 

vi /etc/my.cnf

Append wait_timeout=1814400

Restart MySQL.
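The same change can be made from the shell, as in the sketch below; note that the setting has to end up under the [mysqld] section of my.cnf, so verify the file afterwards:

echo "wait_timeout=1814400" | sudo tee -a /etc/my.cnf

sudo service mysqld restart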

 

mysql> show global variables like "wait_timeout";

+---------------+---------+

| Variable_name | Value   |

+---------------+---------+

| wait_timeout  | 1814400 |

+---------------+---------+

1 row in set (0.00 sec)

[4]

hive> show tables;

OK

Failed with exception java.io.IOException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:user.name%7D

Time taken: 0.02 seconds

 

Fix:
Add the following to hive-site.xml:

<property>
    <name>system:java.io.tmpdir</name>
    <value>/home/grid/apache-hive-2.1.0-bin/tmpdir</value>
</property>

<property>
    <name>system:user.name</name>
    <value>hive</value>
</property>
 

9. Using Hive from other nodes

Copy the Hive directory above to the other nodes as-is,

configure /etc/profile and run source /etc/profile,

configure the io.tmpdir paths,

and you can then connect directly to the Hive on the master node.

10. Integrating Spark with Hive

 

One branch of Spark SQL is Spark on Hive:

it reuses Hive's HQL parsing, logical-plan translation, and plan optimization,

and can roughly be thought of as replacing only the physical execution plan, from MR jobs to Spark jobs.

 

The pre-built Spark assembly does not include Hive support. To use Hive from Spark you must rebuild it

with the -Phive option, as follows:

 

cd $SPARK_HOME

./make-distribution.sh --tgz -Phadoop-2.6 -Pyarn -DskipTests -Dhadoop.version=2.6.0 -Phive

 

After the build completes, three additional jars appear under SPARK_HOME's lib directory:

         datanucleus-api-jdo-3.2.6.jar

         datanucleus-core-3.2.10.jar

         datanucleus-rdbms-3.2.9.jar

Hive needs all of these. If you prefer not to rebuild, copy the jars into the lib directory directly.

 

Start Hive's metastore service:

bin/hive --service metastore >> /tmp/hive_metastore.log 2>&1 &

 

In the SPARK_HOME/conf directory, create a hive-site.xml; in that file only the hive.metastore.uris property is needed

(you can also simply copy the whole hive-site.xml from the HIVE_HOME directory into Spark's conf directory):

<configuration>

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://your_ip:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

</configuration>

Then copy the MySQL driver jar into Spark's lib directory.

 

Start spark-sql (bin/spark-sql --help lists the CLI options):

bin/spark-sql --master spark://your_ip:7077 --executor-memory 1g
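Once the prompt appears, a quick way to confirm spark-sql is really talking to the shared metastore is to query something created earlier from Hive (this assumes the test_db.test1 table from section 8 still exists):

spark-sql> show databases;

spark-sql> select * from test_db.test1;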

 

 

[Error:]

Mon Feb 13 12:01:42 EST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:591)

    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:531)

    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)

    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:498)

    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)

    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

    at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:226)

    at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366)

    at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310)

    at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290)

    at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266)

    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:558)

    ... 9 more

Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1654)

    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:80)

    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)

    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:101)

    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3367)

    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3406)

    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3386)

    at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3640)

    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236)

    at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221)

    ... 14 more

Caused by: java.lang.reflect.InvocationTargetException

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1652)

    ... 23 more

Caused by: MetaException(message:Version information not found in metastore. )

    at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7753)

    at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7731)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:498)

    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)

    at com.sun.proxy.$Proxy21.verifySchema(Unknown Source)

    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:565)

    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:626)

    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:416)

    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)

    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)

    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6490)

    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:238)

    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)

    ... 28 more

        

Fix: add the following property to hive-site.xml:

<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
</property>

 

 

 

[Normal startup]:

[XXX@XXXXX spark]$ bin/spark-sql --master spark://XXXXX:7077 --executor-memory 1g

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/opt/sdg_date/json_data/alluxio-spark-getting-started/alluxio-1.5.0-hadoop-2.6/client/spark/alluxio-1.5.0-spark-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/data05/opt/data_platform/soft/spark/lib/spark-assembly-1.6.3-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Mon Sep 04 16:16:54 CST 2017 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)

----------------------------------------------------------------

Mon Sep 04 16:16:54 CST 2017:

Booting Derby version The Apache Software Foundation - Apache Derby - 10.12.1.1 - (1704137): instance a816c00e-015e-4bf6-b9c3-00012b5dd0f8

on database directory /tmp/spark-f8fc4bec-3624-43cc-84d2-58f2cd74392a/metastore with class loader sun.misc.Launcher$AppClassLoader@42a57993

Loaded from file:/data05/opt/data_platform/soft/spark/lib/spark-assembly-1.6.3-hadoop2.6.0.jar

java.vendor=Oracle Corporation

java.runtime.version=1.8.0_111-b14

user.dir=/data05/opt/data_platform/soft/spark

os.name=Linux

os.arch=amd64

os.version=2.6.32-642.13.1.el6.x86_64

derby.system.home=null

Database Class Loader started - derby.database.classpath=''

SET hive.support.sql11.reserved.keywords=false

SET spark.sql.hive.version=1.2.1

SET spark.sql.hive.version=1.2.1

spark-sql>

        

        

Web UI:

At http://XXXXX:8080/ you can see the spark-sql application running.

 

 

Additional notes:

 

1) Copy $HIVE_HOME/conf/hive-site.xml to $SPARK_HOME/conf.

It is better to write a separate configuration, since the stock hive-site.xml has far too many entries. The properties needed:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://XXXX:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

<property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>5</value>
    <description>Minimum number of Thrift worker threads</description>
</property>

<property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>500</value>
    <description>Maximum number of Thrift worker threads</description>
</property>

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>

<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>XXXX</value>
    <description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>

2) Add the following to spark-env.sh:

export HADOOP_CONF_DIR=/usr/local/soft/hadoop/etc/hadoop

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/local/soft/hive/lib/mysql-connector-java-5.1.10.jar

 

3) Remember to sync all of the configuration above to the other Spark nodes.

 

 

Ways to start:

1: bin/spark-sql --master spark://hadoop1:7077 --jars /usr/local/soft/hive/lib/mysql-connector-java-5.1.10.jar

2: bin/spark-shell --master yarn-client --jars /usr/local/soft/hive/lib/mysql-connector-java-5.1.10.jar

 

If everything above is configured in the environment variables, you can simply run:

bin/spark-sql or bin/spark-shell

 

 

Before testing whether the integration works:

first make sure Hive itself works;

make sure Hadoop (HDFS, MapReduce, YARN) works;

make sure MySQL works.

 

 

III. Basic Usage

For more conceptual background, see the Hive chapters of "004. Spark in Action Series - Part L.docx".

1. Database operations

hive> show databases;

hive> create database my_hive;

hive> drop database aaaa;

The operations below all run inside the my_hive database:

use my_hive;

 

2. Table operations

2.1 Internal tables

Creating an internal table

Since Hive uses SQL-like syntax, the HQL for creating an internal table merely adds row and field delimiters compared with plain SQL:

CREATE TABLE SOGOUQ2(DT STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\n';

The same HQL, formatted:

CREATE TABLE SOGOUQ2 (

         DT STRING,

         WEBSESSION STRING,

         WORD STRING,

         S_SEQ INT,

         C_SEQ INT,

         WEBSITE STRING

) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\n';

 

The raw data uses \t as its column separator; here the table is declared with the '\001' delimiter (which is in fact Hive's default field delimiter) to match the converted data file used below.

Note: a custom delimiter must be a single character; a string such as @#@ is only recognized by its first character, @.

 

Common delimiters:

FIELDS TERMINATED BY '\001'

COLLECTION ITEMS TERMINATED BY '\002'

MAP KEYS TERMINATED BY '\003'

LINES TERMINATED BY '\n'
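To see where the collection and map delimiters come in, here is a sketch with complex column types (the table and its columns are made up for illustration):

CREATE TABLE demo_complex (
         NAME STRING,
         TAGS ARRAY<STRING>,
         PROPS MAP<STRING,STRING>
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
  LINES TERMINATED BY '\n';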

 

Dropping an internal table

hive> drop table SOGOUQ2;

Loading external data

Data files can be loaded into a table from HDFS or from the local filesystem.

To load an HDFS file, use LOAD DATA INPATH;

to load a local file, use the LOAD DATA LOCAL INPATH command.

The default storage path for Hive tables is set by the hive.metastore.warehouse.dir property in ${HIVE_HOME}/conf/hive-site.xml.

When a table is created, a folder named after the table is created under the directory that hive.metastore.warehouse.dir points to; in this walkthrough that is /user/hive/warehouse.

A data file on the local filesystem is copied into the table's directory, while a data file already on HDFS is moved into the table's directory and no longer exists at its original path.
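A small sketch of the move semantics for HDFS files (the staging path is made up for illustration):

hadoop fs -put SogouQ.sample_A /tmp/sogou_in/

hive> LOAD DATA INPATH '/tmp/sogou_in/SogouQ.sample_A' INTO TABLE SOGOUQ2;

hadoop fs -ls /tmp/sogou_in/     # the file is gone: it was moved into the table's warehouse directory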

The Sogou query-log data can be downloaded from http://www.sogou.com/labs/resource/q.php; the full version is about 2 GB. The fields in the file are:

 

Access time\t

User ID\t

[Query terms]\t

Rank of the URL among the returned results\t

Sequence number of the user's click\t

URL the user clicked

 

The download account and password generated for you have been activated.

Username: XXXXXX@163.com

Password: O&2ArWdONEcmrse&

Please do not share the username and password with others, or Sogou will terminate your download access!

 

[es_user@ES100 hive]$ wc -l SogouQ.sample

10000 SogouQ.sample

The \t field separators in the file are problematic and need to be repaired and converted to the new delimiter: sed "s/v-v/^A/g" SogouQ.sample > SogouQ.sample_A

On Linux, type the literal ^A character by pressing Ctrl+V and then Ctrl+A.
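To double-check the conversion, cat -A renders the control character visibly as ^A (an optional sanity check):

head -1 SogouQ.sample_A | cat -A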

 

hive> LOAD DATA LOCAL INPATH '/data05/opt/data_platform/soft/hive/SogouQ.sample_A' INTO TABLE SOGOUQ2;

hive> select * from SOGOUQ2 limit 10;

 

http://es100:50070/explorer.html#/user/hive/warehouse/my_hive.db/sogouq2

 

Checking the MR job behind an HQL query

hive> select count(*) from Sogouq2;

 

Check the job status:

[es_user@ES100 hive]$ mr-jobhistory-daemon.sh start historyserver

starting historyserver, logging to /data05/opt/data_platform/soft/hadoop/logs/mapred-es_user-historyserver-xxx.out

[xxxxxx@xxxxx hive]$ jps

12512 AlluxioWorker

31457 DataNode

12643 AlluxioProxy

12324 AlluxioMaster

8844 JobHistoryServer

6384 Worker

31794 ResourceManager

31635 SecondaryNameNode

31924 NodeManager

31317 NameNode

8887 Jps

6265 Master

5662 RunJar

 

[yarn-site.xml]

<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>xxxxx:8090</value>
</property>

http://es100:8090/cluster


Count the rows

hive> select count(*) from Sogouq2;

Rows containing baidu

hive> select count(*) from SOGOUQ2 where WEBSITE like '%baidu%';


Rows where the result rank is 1, the click order is 2, and the URL contains baidu:

hive> select count(*) from SOGOUQ2 where S_SEQ=1 and C_SEQ=2 and WEBSITE like '%baidu%';


Query-count leaderboard

Group by session ID, order by query count, and show the 10 sessions with the most queries:

hive> select WEBSESSION, count(WEBSESSION) as cw from SOGOUQ2 group by WEBSESSION order by cw desc limit 10;


2.2 External tables

Creating a table linked to existing data

hadoop fs -mkdir -p /my_hive/Sougo_data/

hadoop fs -put SogouQ.sample_A /my_hive/Sougo_data/

hadoop fs -lsr /my_hive/Sougo_data/


Create the external table and specify its storage directory

Compared with an internal table, an external table adds the EXTERNAL keyword and a LOCATION clause naming the folder the table maps to (here /my_hive/Sougo_data/):

CREATE EXTERNAL TABLE SOGOUQ1(DT STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/my_hive/Sougo_data/';


[Note] Dropping an internal table deletes both its metadata and its data files; dropping an external table deletes only the metadata and leaves the data files untouched.
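A quick way to see this for yourself, assuming the SOGOUQ1 table above (recreate it afterwards if you still need it):

hive> drop table SOGOUQ1;

hadoop fs -ls /my_hive/Sougo_data/    # the data files are still there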

 

Loading data files into the external table's directory

There are two ways to link a Hive external table to its data files:

one is to point the external table's location directly at the directory the data files already live in, which suits data that already exists on HDFS;

the other is to specify a data directory when creating the table and load the data into that directory afterwards.

The setup above is the first kind: our source data sits in /my_hive/Sougo_data/ and the external table also points at /my_hive/Sougo_data/, so the table and the data are linked right away and can be queried directly.

 

The second case is demonstrated below:

CREATE EXTERNAL TABLE SOGOUQ3(DT STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/my_hive/Sougo_data/SOGOUQ3';

 

As soon as the external table is created, its directory appears:

hadoop fs -lsr /my_hive/Sougo_data/

drwxr-xr-x  - es_user supergroup          0 2017-08-30 18:24 /my_hive/Sougo_data/SOGOUQ3

 

Load data into that directory:

(1) Copy a file up from the local filesystem to the table's HDFS directory:

hadoop fs -copyFromLocal /data05/opt/data_platform/soft/hive/SogouQ.sample_A /my_hive/Sougo_data/SOGOUQ3

(2) Or copy from another HDFS path into the table directory:

hadoop fs -cp /my_hive/Sougo_data/SogouQ.sample_A /my_hive/Sougo_data/SOGOUQ3
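Either way, no LOAD statement is needed; once a file lands in the directory, the external table sees it immediately:

hive> select count(*) from SOGOUQ3;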

 

Comprehensive example: querying transaction data

 

2.3 Partitioned tables

To be completed later.

 

2.4 Buckets

To be completed later.

 

