userid username orgid logintimes
U0001 Zhangsan G0001 10
U0002 Lisi G0001 12
U0003 Wangwu G0002 13
U0004 Liuneng G0002 18
U0005 Zhaosi G0004 29
hive> create table t_user
> (userid string,username string,orgid string,logintimes int)
> row format delimited
> fields terminated by '\t';
Time taken: 0.689 seconds
[root@hadoop-server01 data]# pwd
[root@hadoop-server01 data]# cat t_user.data.1
U0001 Zhangsan G0001 10
U0002 Lisi G0001 12
U0003 Wangwu G0002 13
U0004 Liuneng G0002 18
U0005 Zhaosi G0004 29
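Because the table was declared with fields terminated by '\t', each row in the data file needs a real tab between columns, and tabs are easy to lose when copying text from a terminal. A minimal sketch of generating such a file with printf follows; the temporary file is an assumption for illustration, not the t_user.data.1 file used in this walkthrough.

```shell
# Hypothetical scratch file, used only for this illustration.
f=$(mktemp)
# printf reuses its format for every group of four arguments,
# so each group becomes one tab-separated row.
printf '%s\t%s\t%s\t%s\n' \
  U0001 Zhangsan G0001 10 \
  U0002 Lisi G0001 12 \
  U0003 Wangwu G0002 13 \
  U0004 Liuneng G0002 18 \
  U0005 Zhaosi G0004 29 > "$f"
# Sanity check: list the distinct per-line field counts when splitting on tabs.
fields=$(awk -F'\t' '{ print NF }' "$f" | sort -u)
echo "$fields"
```

If the check prints anything other than a single 4, some row does not split into the four columns that t_user expects.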
[root@hadoop-server01 data]# hadoop fs -put t_user.data.1 /user/hive/warehouse/t_user
[root@hadoop-server01 data]# hadoop fs -ls /user/hive/warehouse/t_user
Found 1 items
-rw-r--r-- 1 root supergroup 115 2018-07-08 19:45 /user/hive/warehouse/t_user/t_user.data.1
hive> select * from t_user;
U0001 Zhangsan G0001 10
U0002 Lisi G0001 12
U0003 Wangwu G0002 13
U0004 Liuneng G0002 18
U0005 Zhaosi G0004 29
Time taken: 0.041 seconds, Fetched: 5 row(s)
The query result matches the data in t_user.data.1 exactly.
hive> select count(*) from t_user;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_1531100516398_0001, Tracking URL = http://hadoop-server01:8088/proxy/application_1531100516398_0001/
Kill Command = /usr/local/apps/hadoop-2.4.1/bin/hadoop job -kill job_1531100516398_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-08 20:00:56,964 Stage-1 map = 0%, reduce = 0%
2018-07-08 20:01:02,138 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:03,162 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:04,187 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:05,215 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:06,241 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:07,272 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:08,303 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.71 sec
2018-07-08 20:01:09,329 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.42 sec
2018-07-08 20:01:10,353 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.42 sec
MapReduce Total cumulative CPU time: 1 seconds 420 msec
Ended Job = job_1531100516398_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.42 sec HDFS Read: 345 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 420 msec
Time taken: 20.262 seconds, Fetched: 1 row(s)
hive> select orgid,count(*) from t_user GROUP BY orgid;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_1531100516398_0002, Tracking URL = http://hadoop-server01:8088/proxy/application_1531100516398_0002/
Kill Command = /usr/local/apps/hadoop-2.4.1/bin/hadoop job -kill job_1531100516398_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-08 20:05:29,787 Stage-1 map = 0%, reduce = 0%
2018-07-08 20:05:34,915 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec
2018-07-08 20:05:35,937 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec
2018-07-08 20:05:36,964 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec
2018-07-08 20:05:37,988 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec
2018-07-08 20:05:39,012 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.59 sec
2018-07-08 20:05:40,034 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.59 sec
MapReduce Total cumulative CPU time: 1 seconds 590 msec
Ended Job = job_1531100516398_0002
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.59 sec HDFS Read: 345 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 590 msec
G0001 2
G0002 2
G0004 1
Time taken: 13.8 seconds, Fetched: 3 row(s)
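The GROUP BY result can be cross-checked locally without launching a MapReduce job. The sketch below recreates the five rows in a hypothetical temporary file and counts rows per orgid (the third tab-separated column) with awk. It only illustrates what the query computes, not how Hive executes it.

```shell
# Hypothetical scratch file holding the same five rows as t_user.data.1.
f=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  U0001 Zhangsan G0001 10 \
  U0002 Lisi G0001 12 \
  U0003 Wangwu G0002 13 \
  U0004 Liuneng G0002 18 \
  U0005 Zhaosi G0004 29 > "$f"
# Count rows per orgid (column 3); sort afterwards because
# awk's for-in iteration order is unspecified.
result=$(awk -F'\t' '{ n[$3]++ } END { for (k in n) printf "%s\t%d\n", k, n[k] }' "$f" | sort)
printf '%s\n' "$result"
```

The three lines printed should agree with the G0001, G0002 and G0004 counts returned by Hive above.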
Note: to hide the job log output shown above, start the Hive command line in silent mode: # hive -S
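Silent mode can also be combined with the -e option to run a single query from the shell and exit. The sketch below assumes the hive executable is on PATH and that the t_user table exists; the guard and the variable name are illustrative additions, not part of the original session.

```shell
# Query taken from the walkthrough; the variable name is hypothetical.
query='select orgid,count(*) from t_user group by orgid;'
if command -v hive >/dev/null 2>&1; then
  # -S suppresses the progress log, -e runs the quoted statement and exits.
  hive -S -e "$query"
else
  # Fallback so the sketch is harmless on a machine without Hive.
  printf 'hive not found on PATH; would run: hive -S -e "%s"\n' "$query"
fi
```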
Earlier we used the hadoop command to upload the file to the /user/hive/warehouse/t_user directory by hand. Now we use Hive's load command to load the data instead.
[root@hadoop-server01 data]# cat t_user.data.2
U0004 Liuneng G0002 18
U0005 Zhaosi1 G0004 29
U0006 Zhaosi2 G0004 29
U0007 Zhaosi3 G0008 29
U0008 Zhaosi4 G0007 29
U0009 Zhaosi5 G0004 29
U00010 Zhaosi6 G0004 29
U00011 Zhaosi7 G0002 29
U00012 Zhaosi8 G0004 29
U00012 zhangsan G0004 29
U00012 Liuwu G0009 29
U00012 Zhangq G0004 29
U00012 Lilin G0004 29
U00012 Zhaoqi G0004 29
Now use the Hive load command to load the data into the /user/hive/warehouse/t_user directory.
hive> load data local inpath '/root/data/t_user.data.2' into table t_user;
Copying data from file:/root/data/t_user.data.2
Copying file: file:/root/data/t_user.data.2
Loading data to table default.t_user
Table default.t_user stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 481, raw_data_size: 0]
Time taken: 0.19 seconds
[root@hadoop-server01 data]# hadoop fs -ls /user/hive/warehouse/t_user
Found 2 items
-rw-r--r-- 1 root supergroup 123 2018-07-08 19:57 /user/hive/warehouse/t_user/t_user.data.1
-rw-r--r-- 1 root supergroup 358 2018-07-08 20:25 /user/hive/warehouse/t_user/t_user.data.2
hive> select * from t_user;
U0001 Zhangsan G0001 10
U0002 Lisi G0001 12
U0003 Wangwu G0002 13
U0004 Liuneng G0002 18
U0005 Zhaosi G0004 29
U0004 Liuneng G0002 18
U0005 Zhaosi1 G0004 29
U0006 Zhaosi2 G0004 29
U0007 Zhaosi3 G0008 29
U0008 Zhaosi4 G0007 29
U0009 Zhaosi5 G0004 29
U00010 Zhaosi6 G0004 29
U00011 Zhaosi7 G0002 29
U00012 Zhaosi8 G0004 29
U00012 zhangsan G0004 29
U00012 Liuwu G0009 29
U00012 Zhangq G0004 29
U00012 Lilin G0004 29
U00012 Zhaoqi G0004 29
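The output above lists the rows of t_user.data.1 followed by the rows of t_user.data.2: a query on the table reads every file in the table directory. A rough local analogue, assuming a temporary directory as a stand-in for /user/hive/warehouse/t_user and two shortened sample files (both hypothetical), is sketched below. It also checks that the two file sizes from the hadoop fs -ls listing add up to the total_size reported by the load command.

```shell
# Hypothetical stand-in for the table directory.
d=$(mktemp -d)
printf '%s\t%s\t%s\t%s\n' U0001 Zhangsan G0001 10 U0002 Lisi G0001 12 > "$d/part.1"
printf '%s\t%s\t%s\t%s\n' U0004 Liuneng G0002 18 > "$d/part.2"
# Analogue of "select *": every file in the directory contributes its rows.
rows=$(cat "$d"/* | wc -l | tr -d ' ')
echo "$rows"
# File sizes from the listing above: 123 + 358 bytes,
# which matches total_size: 481 in the load statistics.
total=$((123 + 358))
echo "$total"
```

Note that the row U0004 Liuneng appears in both files and therefore twice in the query result; loading with into table appends a file and does not deduplicate rows.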