Hive standalone mode installation -- jared
These deployment notes were taken in early 2014 and are now posted on 51cto.
For setting up the underlying Hadoop environment, see the following link:
http://ganlanqing.blog.51cto.com/6967482/1387210
JDK version: jdk-7u51-linux-x64.rpm
Hadoop version: hadoop-0.20.2.tar.gz
Hive version: hive-0.12.0-bin.tar.gz
MySQL JDBC driver version: mysql-connector-java-5.1.7-bin.jar
1. Install the MySQL environment
[root@master ~]# yum install mysql mysql-server -y
[root@master ~]# /etc/init.d/mysqld start
[root@master ~]# mysqladmin -uroot password "123456"
[root@master ~]# mysql -uroot -p123456
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create user 'hive' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'master' IDENTIFIED BY '123456' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
[root@master ~]#
#########################
Missing step: creating the hive database for the hive user!
#########################
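A minimal sketch of that missing step, run as root (the metastore database name hive is an assumption and must match the JDBC URL used later in hive-site.xml):
mysql> create database hive;
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'master' IDENTIFIED BY '123456';
mysql> flush privileges;
Alternatively, adding createDatabaseIfNotExist=true to the JDBC connection URL lets Hive create the database on first use.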
2. Download the Hive installation package
[jared@master ~]$ wget http://mirror.bjtu.edu.cn/apache/hive/hive-0.12.0/hive-0.12.0-bin.tar.gz
[jared@master ~]$ gzip -d hive-0.12.0-bin.tar.gz
[jared@master ~]$ tar -xf hive-0.12.0-bin.tar
[jared@master ~]$ mv hive-0.12.0-bin hive
3. Set environment variables
[root@master ~]# vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_51
export HIVE_HOME=/home/jared/hive
export HIVE_CONF_DIR=/home/jared/hive/conf
export HIVE_LIB=$HIVE_HOME/lib
export HADOOP_INSTALL=/home/jared/hadoop
export HBASE_INSTALL=/home/jared/hbase
export PATH=$PATH:$HADOOP_INSTALL/bin:$HBASE_INSTALL/bin:$HIVE_HOME/bin
[root@master ~]# source /etc/profile
[root@master ~]# exit
logout
[jared@master conf]$ pwd
/home/jared/hive/conf
[jared@master conf]$ source /etc/profile
[jared@master conf]$ echo $HIVE_HOME
/home/jared/hive
[jared@master conf]$ cp hive-env.sh.template hive-env.sh
[jared@master conf]$ vim hive-env.sh
export HADOOP_HEAPSIZE=1024
HADOOP_HOME=/home/jared/hadoop
export HIVE_CONF_DIR=/home/jared/hive/conf
export HIVE_AUX_JARS_PATH=/home/jared/hive/lib
[jared@master conf]$ source hive-env.sh
4. Configure hive-site.xml
[jared@master conf]$ vim hive-site.xml
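The original notes do not show what was entered here. Below is a minimal sketch of the metastore-related properties, assuming the MySQL database is named hive and reusing the hive user and password created above (the property names are standard Hive settings; host, database, and credentials are this environment's values):
<?xml version="1.0"?>
<configuration>
  <!-- JDBC connection to the MySQL metastore database (assumed to be named hive) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <!-- MySQL JDBC driver class, provided by the connector jar copied in step 5 -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- MySQL account created in step 1 -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>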
5. Copy the MySQL JDBC driver into the lib directory under the Hive installation path
[jared@master ~]$ wget http://cdn.mysql.com/archives/mysql-connector-java-5.1/mysql-connector-java-5.1.7.tar.gz
[jared@master ~]$ tar -zxvf mysql-connector-java-5.1.7.tar.gz
[jared@master ~]$ cd mysql-connector-java-5.1.7
[jared@master mysql-connector-java-5.1.7]$ cp mysql-connector-java-5.1.7-bin.jar /home/jared/hive/lib/
6. CLI access: the hive shell
[jared@master ~]$ hive
Logging initialized using configuration in jar:file:/home/jared/hive/lib/hive-common-0.12.0.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 11.506 seconds, Fetched: 1 row(s)
hive> create table test (key string);
OK
Time taken: 2.805 seconds
hive> show tables;
OK
test
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive>
7. Test loading local data
Local file information:
File name: access.log
Size: 11 MB
[jared@master input]$ du -h access.log
11M access.log
[jared@master input]$ cat access.log |wc -l
60000
Data format:
[jared@master input]$ cat access.log
1393960136.926 0 212.92.231.166 TCP_DENIED/403 1256 GET http://221.181.39.85/phpTest/zologize/axa.php - NONE/- text/html "-" "-" -
1393960137.600 0 212.92.231.166 TCP_DENIED/403 1264 GET http://221.181.39.85/phpMyAdmin/scripts/setup.php - NONE/- text/html "-" "-" -
1393960138.274 0 212.92.231.166 TCP_DENIED/403 1250 GET http://221.181.39.85/pma/scripts/setup.php - NONE/- text/html "-" "-" -
1393960138.946 0 212.92.231.166 TCP_DENIED/403 1258 GET http://221.181.39.85/myadmin/scripts/setup.php - NONE/- text/html "-" "-" -
1393960143.624 1 127.0.0.1 TCP_HIT/200 22874 GET http://www.chinacache.com/images/logo.gif - NONE/- image/gif "-" "-" -
1393960143.628 1 127.0.0.1 TCP_HIT/200 22874 GET http://www.chinacache.com/images/logo.gif - NONE/- image/gif "-" "-" -
1393960144.636 2 127.0.0.1 TCP_HIT/200 22874 GET http://www.chinacache.com/images/logo.gif - NONE/- image/gif "-" "-" -
1393960145.643 2 127.0.0.1 TCP_HIT/200 22874 GET http://www.chinacache.com/images/logo.gif - NONE/- image/gif "-" "-" -
1393982948.194 1 112.5.4.63 TCP_HIT/200 467 GET http://cu005.www.duba.net/duba/2011/kcomponent/kcom_commonfast/53a08fed.dat - NONE/- text/plain "-" "-" -
1393982948.246 0 218.203.54.25 TCP_HIT/200 462 GET http://cu005.www.duba.net/duba/2011/kcomponent/kcom_kvm2/indexkcom_kvm2.dat - NONE/- text/plain "-" "-" -
1393982948.258 0 218.203.54.25 TCP_HIT/200 467 GET http://cu005.www.duba.net/duba/2011/kcomponent/kcom_commonfast/53a08fed.dat - NONE/- text/plain "-" "-" -
Create the table structure
hive> CREATE TABLE CU005_LOG (TIMES_TAMP STRING,RES_TIME INT,FC_IP STRING,FC_HANDLING STRING,FILE_SIZE INT,REQ_METHOD STRING,URL STRING,USER STRING,BACK_SRC STRING,MIME STRING,REFERER STRING,UA STRING,COOKIE STRING )ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE;
hive> show tables;
OK
cu005_log
Time taken: 0.08 seconds, Fetched: 1 row(s)
hive> desc cu005_log;
OK
times_tamp string None
res_time int None
fc_ip string None
fc_handling string None
file_size int None
req_method string None
url string None
user string None
back_src string None
mime string None
referer string None
ua string None
cookie string None
Time taken: 0.208 seconds, Fetched: 13 row(s)
hive>
Load the local data
hive> LOAD DATA LOCAL INPATH '/home/jared/input/access.log' OVERWRITE INTO TABLE CU005_LOG;
Copying data from file:/home/jared/input/access.log
Copying file: file:/home/jared/input/access.log
Loading data to table default.cu005_log
Table default.cu005_log stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 10872324, raw_data_size: 0]
OK
Time taken: 1.811 seconds
Storage location in the Hadoop cluster
hdfs://master:9000/user/hive/warehouse/cu005_log/access.log
[jared@master ~]$ hadoop dfs -ls /user/hive/warehouse/
Found 1 items
drwxr-xr-x - jared supergroup 0 2014-03-06 18:31 /user/hive/warehouse/cu005_log
Query
hive> select count(*) from cu005_log;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201402230829_0003, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201402230829_0003
Kill Command = /home/jared/hadoop/bin/../bin/hadoop job -kill job_201402230829_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-03-06 17:18:07,994 Stage-1 map = 0%, reduce = 0%
2014-03-06 17:18:32,121 Stage-1 map = 100%, reduce = 0%
2014-03-06 17:18:44,200 Stage-1 map = 100%, reduce = 33%
2014-03-06 17:18:47,220 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201402230829_0003
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 10872324 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
60000
Time taken: 77.157 seconds, Fetched: 1 row(s)
hive>
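As a further illustration (not part of the original test run), an aggregation over the cache-status column defined above could look like this:
hive> SELECT fc_handling, count(*) FROM cu005_log GROUP BY fc_handling;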
Web UI access
For details, see https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface
Some configuration items need to be added to the $HIVE_HOME/conf/hive-site.xml configuration file; the properties to add are as follows:
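The original notes omit the values; a minimal sketch of the HWI-related properties (hive.hwi.listen.host, hive.hwi.listen.port, and hive.hwi.war.file are standard Hive settings; the war file name assumes the stock Hive 0.12.0 release):
<!-- address and port the HWI web server listens on -->
<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
</property>
<!-- path of the HWI war file, relative to $HIVE_HOME -->
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.12.0.war</value>
</property>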
Start Hive HWI
Start it in the background:
[jared@master ~]$ nohup hive --service hwi > /dev/null 2> /dev/null &
Access it in a browser: http://192.168.255.25:9999/hwi/
Usage reference: http://www.cnblogs.com/gpcuster/archive/2010/02/25/1673480.html
HWI vs. CLI
Anyone who has used the CLI will notice a serious problem with the above: there is no progress feedback during execution, so you cannot tell when a given query has finished.
To summarize the pros and cons of HWI compared with the CLI:
Pros: HWI can be used from a browser, which is convenient and intuitive.
Cons: no progress feedback during execution.
Personally, I still prefer the CLI.