1. Check the HDFS storage location of the table
Suppose we have a data file named user.txt with contents like the following:
userid username orgid logintimes
U0001 Zhangsan G0001 10
U0002 Lisi G0001 12
U0003 Wangwu G0002 13
U0004 Liuneng G0002 18
U0005 Zhaosi G0004 29
Fields are separated by tab characters (\t).
Now use Hive to create a table that matches this file structure:
hive> create table t_user
> (userid string,username string,orgid string,logintimes int)
> row format delimited
> fields terminated by '\t';
OK
Time taken: 0.689 seconds
Check where table t_user is stored in HDFS.
By default the table is created under /user/hive/warehouse/t_user, and that directory is exactly where the user.txt data should be placed.
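The original screenshot is not reproduced here; the location can be confirmed from the shell, or by asking Hive itself (describe extended prints a Location field). A minimal check, assuming the default warehouse path:
[root@hadoop-server01 data]# hadoop fs -ls /user/hive/warehouse
hive> describe extended t_user;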
2. Prepare the data
[root@hadoop-server01 data]# pwd
/root/data
[root@hadoop-server01 data]# cat t_user.data.1
U0001 Zhangsan G0001 10
U0002 Lisi G0001 12
U0003 Wangwu G0002 13
U0004 Liuneng G0002 18
U0005 Zhaosi G0004 29
3. Upload the data to the table directory /user/hive/warehouse/t_user in HDFS
[root@hadoop-server01 data]# hadoop fs -put t_user.data.1 /user/hive/warehouse/t_user
[root@hadoop-server01 data]# hadoop fs -ls /user/hive/warehouse/t_user
Found 1 items
-rw-r--r-- 1 root supergroup 115 2018-07-08 19:45 /user/hive/warehouse/t_user/t_user.data.1
The listing confirms that the uploaded file is now in the table directory.
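To double-check that the contents arrived intact, the file can also be read straight from HDFS:
[root@hadoop-server01 data]# hadoop fs -cat /user/hive/warehouse/t_user/t_user.data.1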
4. Run a query with Hive
hive> select * from t_user;
OK
U0001 Zhangsan G0001 10
U0002 Lisi G0001 12
U0003 Wangwu G0002 13
U0004 Liuneng G0002 18
U0005 Zhaosi G0004 29
Time taken: 0.041 seconds, Fetched: 5 row(s)
The query result matches the contents of t_user.data.1 exactly. Note that a plain SELECT * is served by reading the table files directly, which is why it completes in 0.041 seconds without launching a MapReduce job.
5. How Hive statements are translated into MapReduce jobs
Run the statement below. As the log shows, Hive translates the count(*) query into a MapReduce job and submits it to the Hadoop cluster; the job returns 5, matching the five rows currently in the table.
hive> select count(*) from t_user;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=&lt;number&gt;
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=&lt;number&gt;
In order to set a constant number of reducers:
  set mapred.reduce.tasks=&lt;number&gt;
Starting Job = job_1531100516398_0001, Tracking URL = http://hadoop-server01:8088/proxy/application_1531100516398_0001/
Kill Command = /usr/local/apps/hadoop-2.4.1/bin/hadoop job -kill job_1531100516398_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-08 20:00:56,964 Stage-1 map = 0%, reduce = 0%
2018-07-08 20:01:02,138 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:03,162 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:04,187 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:05,215 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:06,241 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:07,272 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:08,303 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:09,329 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.42 sec
2018-07-08 20:01:10,353 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.42 sec
MapReduce Total cumulative CPU time: 1 seconds 420 msec
Ended Job = job_1531100516398_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.42 sec HDFS Read: 345 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 420 msec
OK
5
Time taken: 20.262 seconds, Fetched: 1 row(s)
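The log above lists the settings Hive suggests for controlling reducer parallelism; this Hive release still uses the old mapred.* property name. Pinning the reducer count for the session would look like this (illustrative value):
hive> set mapred.reduce.tasks=2;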
A more complex query: count users grouped by orgid. This statement, too, is translated by Hive into a MapReduce job and submitted to Hadoop, and the result is correct.
hive> select orgid,count(*) from t_user GROUP BY orgid;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=&lt;number&gt;
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=&lt;number&gt;
In order to set a constant number of reducers:
  set mapred.reduce.tasks=&lt;number&gt;
Starting Job = job_1531100516398_0002, Tracking URL = http://hadoop-server01:8088/proxy/application_1531100516398_0002/
Kill Command = /usr/local/apps/hadoop-2.4.1/bin/hadoop job -kill job_1531100516398_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-08 20:05:29,787 Stage-1 map = 0%, reduce = 0%
2018-07-08 20:05:34,915 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec
2018-07-08 20:05:35,937 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec
2018-07-08 20:05:36,964 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec
2018-07-08 20:05:37,988 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec
2018-07-08 20:05:39,012 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.59 sec
2018-07-08 20:05:40,034 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.59 sec
MapReduce Total cumulative CPU time: 1 seconds 590 msec
Ended Job = job_1531100516398_0002
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.59 sec HDFS Read: 345 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 590 msec
OK
G0001 2
G0002 2
G0004 1
Time taken: 13.8 seconds, Fetched: 3 row(s)
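To see how Hive plans a statement without actually running it, EXPLAIN prints the stage graph, for example:
hive> explain select orgid, count(*) from t_user group by orgid;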
Note: to suppress these job logs, start the Hive CLI in silent mode: # hive -S
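Combined with -e, silent mode runs a single statement and prints only the result, e.g.:
[root@hadoop-server01 data]# hive -S -e 'select count(*) from t_user;'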
6. Loading data into HDFS with Hive
Earlier we uploaded the file to /user/hive/warehouse/t_user by hand with a hadoop command; this time we load it with Hive's LOAD DATA command.
Prepare a second data file, t_user.data.2, with the following contents:
[root@hadoop-server01 data]# cat t_user.data.2
U0004 Liuneng G0002 18
U0005 Zhaosi1 G0004 29
U0006 Zhaosi2 G0004 29
U0007 Zhaosi3 G0008 29
U0008 Zhaosi4 G0007 29
U0009 Zhaosi5 G0004 29
U00010 Zhaosi6 G0004 29
U00011 Zhaosi7 G0002 29
U00012 Zhaosi8 G0004 29
U00012 zhangsan G0004 29
U00012 Liuwu G0009 29
U00012 Zhangq G0004 29
U00012 Lilin G0004 29
U00012 Zhaoqi G0004 29
Now load it into /user/hive/warehouse/t_user with LOAD DATA:
hive> load data local inpath '/root/data/t_user.data.2' into table t_user;
Copying data from file:/root/data/t_user.data.2
Copying file: file:/root/data/t_user.data.2
Loading data to table default.t_user
Table default.t_user stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 481, raw_data_size: 0]
OK
Time taken: 0.19 seconds
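For reference, the LOCAL keyword copies the file from the local filesystem; without LOCAL, Hive moves a file that already resides in HDFS, and OVERWRITE replaces the table's existing data instead of appending. A hypothetical variant (the path is made up for illustration):
hive> load data inpath '/data/t_user.data.3' overwrite into table t_user;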
Check that the file reached the t_user directory; the listing below confirms it did:
[root@hadoop-server01 data]# hadoop fs -ls /user/hive/warehouse/t_user
Found 2 items
-rw-r--r-- 1 root supergroup 123 2018-07-08 19:57 /user/hive/warehouse/t_user/t_user.data.1
-rw-r--r-- 1 root supergroup 358 2018-07-08 20:25 /user/hive/warehouse/t_user/t_user.data.2
Query the table again through Hive:
hive> select * from t_user;
OK
U0001 Zhangsan G0001 10
U0002 Lisi G0001 12
U0003 Wangwu G0002 13
U0004 Liuneng G0002 18
U0005 Zhaosi G0004 29
U0004 Liuneng G0002 18
U0005 Zhaosi1 G0004 29
U0006 Zhaosi2 G0004 29
U0007 Zhaosi3 G0008 29
U0008 Zhaosi4 G0007 29
U0009 Zhaosi5 G0004 29
U00010 Zhaosi6 G0004 29
U00011 Zhaosi7 G0002 29
U00012 Zhaosi8 G0004 29
U00012 zhangsan G0004 29
U00012 Liuwu G0009 29
U00012 Zhangq G0004 29
U00012 Lilin G0004 29
U00012 Zhaoqi G0004 29
All rows from both files are returned; the load worked correctly.
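Note that Hive simply reads every file in the table directory and does not deduplicate: t_user.data.2 repeats the U0004/Liuneng row and reuses userid U00012 several times. A quick check for duplicate keys (illustrative query):
hive> select userid, count(*) from t_user group by userid having count(*) > 1;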