Installing Hive in standalone mode -- jared


These deployment notes were written in early 2014 and are now posted on 51cto.

For setting up the base Hadoop environment, see the following link:

http://ganlanqing.blog.51cto.com/6967482/1387210


JDK version: jdk-7u51-linux-x64.rpm
Hadoop version: hadoop-0.20.2.tar.gz
Hive version: hive-0.12.0-bin.tar.gz
MySQL driver package version: mysql-connector-java-5.1.7-bin.jar

1. Install the MySQL environment
[root@master ~]# yum install mysql mysql-server -y
[root@master ~]# /etc/init.d/mysqld start
[root@master ~]# mysqladmin -uroot  password "123456"
[root@master ~]# mysql -uroot -p123456
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create user 'hive' identified by '123456';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'master' IDENTIFIED BY '123456' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye
[root@master ~]#
#########################

Note: the step in which the hive user creates the hive database is missing here!


#########################
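
One way to fill in the missing step, as a minimal sketch: the snippet below writes the SQL that creates the hive database to a temporary file and prints it for review. The database name hive is taken from the JDBC URL in step 4; the temporary file and the commented-out mysql invocation (connecting through host master so that the 'hive'@'master' grant applies) are assumptions, not part of the original notes.

```shell
# Write the SQL for the missing step to a temporary file (the path is arbitrary).
sql_file="$(mktemp)"
cat > "$sql_file" <<'EOF'
CREATE DATABASE IF NOT EXISTS hive;
SHOW DATABASES;
EOF

# Print the statements for review.
cat "$sql_file"

# Assumed way to apply them as the hive user created in step 1 (not run here):
#   mysql -uhive -p123456 -hmaster < "$sql_file"
```

Because the JDBC URL in step 4 sets createDatabaseIfNotExist=true, the driver should also create the database on first connection, so running this explicitly mainly serves as an early check that the hive account can log in.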

2. Download the Hive installation package
[jared@master ~]$ wget http://mirror.bjtu.edu.cn/apache/hive/hive-0.12.0/hive-0.12.0-bin.tar.gz
[jared@master ~]$ gzip -d hive-0.12.0-bin.tar.gz
[jared@master ~]$ tar -xf hive-0.12.0-bin.tar
[jared@master ~]$ mv hive-0.12.0-bin hive


3. Set environment variables
[root@master ~]# vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_51
export HIVE_HOME=/home/jared/hive
export HIVE_CONF_DIR=/home/jared/hive/conf
export HIVE_LIB=$HIVE_HOME/lib
export HADOOP_INSTALL=/home/jared/hadoop
export HBASE_INSTALL=/home/jared/hbase
export PATH=$PATH:$HADOOP_INSTALL/bin:$HBASE_INSTALL/bin:$HIVE_HOME/bin
[root@master ~]# source /etc/profile
[root@master ~]# exit
logout
[jared@master conf]$ pwd
/home/jared/hive/conf
[jared@master conf]$ source /etc/profile
[jared@master conf]$ echo $HIVE_HOME
/home/jared/hive
[jared@master conf]$ cp hive-env.sh.template hive-env.sh
[jared@master conf]$ vim hive-env.sh
export HADOOP_HEAPSIZE=1024
HADOOP_HOME=/home/jared/hadoop
export HIVE_CONF_DIR=/home/jared/hive/conf
export HIVE_AUX_JARS_PATH=/home/jared/hive/lib
[jared@master conf]$ source hive-env.sh


4. Configure hive-site.xml

[jared@master conf]$ vim hive-site.xml


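<configuration>
<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>

<property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.mysql.jdbc.Driver</value>
</property>

<property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>hive</value>
</property>

<property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>123456</value>
</property>
</configuration>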



5. Copy the MySQL driver jar into the lib directory under the Hive installation path
[jared@master ~]$ wget http://cdn.mysql.com/archives/mysql-connector-java-5.1/mysql-connector-java-5.1.7.tar.gz
[jared@master ~]$ tar -zxvf mysql-connector-java-5.1.7.tar.gz
[jared@master ~]$ cd mysql-connector-java-5.1.7
[jared@master mysql-connector-java-5.1.7]$ cp mysql-connector-java-5.1.7-bin.jar /home/jared/hive/lib/


6. CLI access: shell
[jared@master ~]$ hive

Logging initialized using configuration in jar:file:/home/jared/hive/lib/hive-common-0.12.0.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 11.506 seconds, Fetched: 1 row(s)
hive> create table test (key string);
OK
Time taken: 2.805 seconds
hive> show tables;
OK
test
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive>

7. Test loading local data

Local file information
File name: access.log
Size: 11M
[jared@master input]$ du -h access.log
11M     access.log
[jared@master input]$ cat access.log |wc -l
60000
Data structure (excerpt):
[jared@master input]$ cat access.log
1393960136.926 0 212.92.231.166 TCP_DENIED/403 1256 GET http://221.181.39.85/phpTest/zologize/axa.php - NONE/- text/html "-" "-" -
1393960137.600 0 212.92.231.166 TCP_DENIED/403 1264 GET http://221.181.39.85/phpMyAdmin/scripts/setup.php - NONE/- text/html "-" "-" -
1393960138.274 0 212.92.231.166 TCP_DENIED/403 1250 GET http://221.181.39.85/pma/scripts/setup.php - NONE/- text/html "-" "-" -
1393960138.946 0 212.92.231.166 TCP_DENIED/403 1258 GET http://221.181.39.85/myadmin/scripts/setup.php - NONE/- text/html "-" "-" -
1393960143.624 1 127.0.0.1 TCP_HIT/200 22874 GET http://www.chinacache.com/images/logo.gif - NONE/- image/gif "-" "-" -
1393960143.628 1 127.0.0.1 TCP_HIT/200 22874 GET http://www.chinacache.com/images/logo.gif - NONE/- image/gif "-" "-" -
1393960144.636 2 127.0.0.1 TCP_HIT/200 22874 GET http://www.chinacache.com/images/logo.gif - NONE/- image/gif "-" "-" -
1393960145.643 2 127.0.0.1 TCP_HIT/200 22874 GET http://www.chinacache.com/images/logo.gif - NONE/- image/gif "-" "-" -
1393982948.194 1 112.5.4.63 TCP_HIT/200 467 GET http://cu005.www.duba.net/duba/2011/kcomponent/kcom_commonfast/53a08fed.dat - NONE/- text/plain "-" "-" -
1393982948.246 0 218.203.54.25 TCP_HIT/200 462 GET http://cu005.www.duba.net/duba/2011/kcomponent/kcom_kvm2/indexkcom_kvm2.dat - NONE/- text/plain "-" "-" -
1393982948.258 0 218.203.54.25 TCP_HIT/200 467 GET http://cu005.www.duba.net/duba/2011/kcomponent/kcom_commonfast/53a08fed.dat - NONE/- text/plain "-" "-" -

Create the table structure
hive> CREATE TABLE CU005_LOG (TIMES_TAMP STRING,RES_TIME INT,FC_IP STRING,FC_HANDLING STRING,FILE_SIZE INT,REQ_METHOD STRING,URL STRING,USER STRING,BACK_SRC STRING,MIME STRING,REFERER STRING,UA STRING,COOKIE STRING )ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE;
hive> show tables;
OK
cu005_log
Time taken: 0.08 seconds, Fetched: 1 row(s)
hive> desc cu005_log;
OK
times_tamp              string                  None                
res_time                int                     None                
fc_ip                   string                  None                
fc_handling             string                  None                
file_size               int                     None                
req_method              string                  None                
url                     string                  None                
user                    string                  None                
back_src                string                  None                
mime                    string                  None                
referer                 string                  None                
ua                      string                  None                
cookie                  string                  None                
Time taken: 0.208 seconds, Fetched: 13 row(s)
hive>
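
The table definition relies on every log line splitting into exactly 13 space-separated fields, one per column. As a rough local check (a sketch, not part of the original procedure), the first sample line shown earlier can be split with awk on single spaces, which mirrors FIELDS TERMINATED BY ' '. The variable names here are made up for illustration.

```shell
# First sample line from access.log, copied from the listing above.
line='1393960136.926 0 212.92.231.166 TCP_DENIED/403 1256 GET http://221.181.39.85/phpTest/zologize/axa.php - NONE/- text/html "-" "-" -'

# Column names in the same order as the CREATE TABLE statement.
cols='times_tamp res_time fc_ip fc_handling file_size req_method url user back_src mime referer ua cookie'

# -F'[ ]' splits on every single space (awk's default would collapse runs of spaces).
# Each field is paired with its column name; surplus fields would be labelled extraN.
mapping="$(printf '%s\n' "$line" | awk -F'[ ]' -v cols="$cols" '
  { n = split(cols, name, " ")
    for (i = 1; i <= NF; i++) printf "%s=%s\n", (i <= n ? name[i] : "extra" i), $i }')"
field_count="$(printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }')"

printf '%s\n' "$mapping"
echo "fields: $field_count"
```

If a line contained a field with an embedded space (a real User-Agent string, for example), the count would exceed 13 and the later columns would shift, which is worth checking before loading other logs into the same table.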

Load the local data
hive> LOAD DATA LOCAL INPATH '/home/jared/input/access.log' OVERWRITE INTO TABLE CU005_LOG;
Copying data from file:/home/jared/input/access.log
Copying file: file:/home/jared/input/access.log
Loading data to table default.cu005_log
Table default.cu005_log stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 10872324, raw_data_size: 0]
OK
Time taken: 1.811 seconds

Storage location in the Hadoop cluster
hdfs://master:9000/user/hive/warehouse/cu005_log/access.log
[jared@master ~]$ hadoop dfs -ls /user/hive/warehouse/
Found 1 items
drwxr-xr-x   - jared supergroup          0 2014-03-06 18:31 /user/hive/warehouse/cu005_log

Query
hive> select count(*) from cu005_log;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201402230829_0003, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201402230829_0003
Kill Command = /home/jared/hadoop/bin/../bin/hadoop job  -kill job_201402230829_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-03-06 17:18:07,994 Stage-1 map = 0%,  reduce = 0%
2014-03-06 17:18:32,121 Stage-1 map = 100%,  reduce = 0%
2014-03-06 17:18:44,200 Stage-1 map = 100%,  reduce = 33%
2014-03-06 17:18:47,220 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201402230829_0003
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   HDFS Read: 10872324 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
60000
Time taken: 77.157 seconds, Fetched: 1 row(s)
hive>
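
The count of 60000 matches the wc -l result for the local file. The same kind of local cross-check can be applied to grouped queries. The sketch below tallies the fourth field (column fc_handling) over four of the sample lines shown earlier; the HiveQL in the comment is an assumed equivalent and was not run in the original notes.

```shell
# A few of the sample lines from access.log, written to a temporary file.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
1393960136.926 0 212.92.231.166 TCP_DENIED/403 1256 GET http://221.181.39.85/phpTest/zologize/axa.php - NONE/- text/html "-" "-" -
1393960137.600 0 212.92.231.166 TCP_DENIED/403 1264 GET http://221.181.39.85/phpMyAdmin/scripts/setup.php - NONE/- text/html "-" "-" -
1393982948.194 1 112.5.4.63 TCP_HIT/200 467 GET http://cu005.www.duba.net/duba/2011/kcomponent/kcom_commonfast/53a08fed.dat - NONE/- text/plain "-" "-" -
1393982948.246 0 218.203.54.25 TCP_HIT/200 462 GET http://cu005.www.duba.net/duba/2011/kcomponent/kcom_kvm2/indexkcom_kvm2.dat - NONE/- text/plain "-" "-" -
EOF

# Local counterpart of (assumed, not run against the original cluster):
#   SELECT fc_handling, COUNT(*) FROM cu005_log GROUP BY fc_handling;
counts="$(awk -F'[ ]' '{ n[$4]++ } END { for (k in n) print k, n[k] }' "$sample" | sort)"
total="$(wc -l < "$sample" | tr -d ' ')"

printf '%s\n' "$counts"
echo "total: $total"
```

On the full access.log the awk pass takes well under the 77 seconds of the MapReduce job, so it is a cheap way to sanity-check results while the data still fits on one machine.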


Web interface access
For details, see https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface

A few properties need to be added to the $HIVE_HOME/conf/hive-site.xml configuration file, as follows:

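<property>
  <name>hive.hwi.listen.host</name>
  <value>192.168.255.25</value>
  <description>This is the host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.12.0.war</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>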


Start hive hwi

Start it in the background
[jared@master ~]$ nohup hive --service hwi > /dev/null 2> /dev/null &

Open http://192.168.255.25:9999/hwi/ in a browser
For usage, see http://www.cnblogs.com/gpcuster/archive/2010/02/25/1673480.html

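HWI vs. CLI
Anyone who has used the CLI will notice a serious problem after reading the introduction above: HWI gives no feedback while a query runs, so we cannot tell when a given query has finished.

To summarize the pros and cons of HWI compared with the CLI:

Pros: HWI is used through a browser, which is convenient and intuitive.
Cons: no progress feedback during execution.
Personally, I still prefer the CLI.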