1. Check where the created table is stored in HDFS

Suppose there is a file containing data like the following, and assume the file is named user.txt:

userid username orgid logintimes

U0001 Zhangsan G0001 10

U0002 Lisi G0001 12

U0003 Wangwu G0002 13

U0004 Liuneng G0002 18

U0005 Zhaosi G0004 29

The fields are separated by the tab character (\t).

Now, for data with this structure, use Hive to create a table that can hold it:

hive> create table t_user

   > (userid string,username string,orgid string,logintimes int)

   > row format delimited

   > fields terminated by '\t';

OK

Time taken: 0.689 seconds
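As an optional sanity check (not part of the original steps), the table definition just created can be confirmed from the Hive CLI:

hive> desc t_user;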

Check the directory in HDFS where table t_user is stored.

By default, the table we created is stored under the /user/hive/warehouse/t_user directory, and this directory is where the user.txt data will be placed.
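To verify the location yourself (a hedged suggestion; the warehouse path shown is the default and may differ on your cluster), you can ask Hive or HDFS directly:

hive> describe formatted t_user;   -- the Location: field shows the table directory

[root@hadoop-server01 data]# hadoop fs -ls /user/hive/warehouse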

2. Prepare the data

[root@hadoop-server01 data]# pwd

/root/data

[root@hadoop-server01 data]# cat t_user.data.1

U0001   Zhangsan        G0001   10

U0002   Lisi            G0001   12

U0003   Wangwu          G0002   13

U0004   Liuneng         G0002   18

U0005   Zhaosi          G0004   29

3. Upload the prepared data to the /user/hive/warehouse/t_user directory in HDFS

[root@hadoop-server01 data]# hadoop fs -put t_user.data.1 /user/hive/warehouse/t_user

[root@hadoop-server01 data]# hadoop fs -ls /user/hive/warehouse/t_user

Found 1 items

-rw-r--r--   1 root supergroup        115 2018-07-08 19:45 /user/hive/warehouse/t_user/t_user.data.1

The listing shows that the table directory now contains the file we uploaded.
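To double-check the contents of the uploaded file in HDFS (an optional step, not in the original walkthrough), it can be printed directly:

[root@hadoop-server01 data]# hadoop fs -cat /user/hive/warehouse/t_user/t_user.data.1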

4. Run queries with Hive

hive> select * from t_user;

OK

U0001   Zhangsan        G0001   10

U0002   Lisi            G0001   12

U0003   Wangwu          G0002   13

U0004   Liuneng         G0002   18

U0005   Zhaosi          G0004   29

Time taken: 0.041 seconds, Fetched: 5 row(s)

The query result matches the contents of t_user.data.1 exactly.

5. See how a Hive statement is translated into a MapReduce job

Run the following statement. As the output below shows, Hive translates the count(*) query into a MapReduce job and submits it to the Hadoop cluster; the job returns 5, which is the expected result.

hive> select count(*) from t_user;

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

 set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

 set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

 set mapred.reduce.tasks=<number>

Starting Job = job_1531100516398_0001, Tracking URL = http://hadoop-server01:8088/proxy/application_1531100516398_0001/

Kill Command = /usr/local/apps/hadoop-2.4.1/bin/hadoop job  -kill job_1531100516398_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2018-07-08 20:00:56,964 Stage-1 map = 0%,  reduce = 0%

2018-07-08 20:01:02,138 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.71 sec

2018-07-08 20:01:03,162 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.71 sec

2018-07-08 20:01:04,187 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.71 sec

2018-07-08 20:01:05,215 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.71 sec

2018-07-08 20:01:06,241 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.71 sec

2018-07-08 20:01:07,272 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.71 sec

2018-07-08 20:01:08,303 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.71 sec

2018-07-08 20:01:09,329 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.42 sec

2018-07-08 20:01:10,353 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.42 sec

MapReduce Total cumulative CPU time: 1 seconds 420 msec

Ended Job = job_1531100516398_0001

MapReduce Jobs Launched:

Job 0: Map: 1  Reduce: 1   Cumulative CPU: 1.42 sec   HDFS Read: 345 HDFS Write: 2 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 420 msec

OK

5

Time taken: 20.262 seconds, Fetched: 1 row(s)
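To see the plan Hive generates before it actually runs the job (a hedged illustration; the exact plan output varies by Hive version), HiveQL's EXPLAIN statement can be used:

hive> explain select count(*) from t_user;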

A more advanced query: count users grouped by orgid. This statement is likewise translated by Hive into a MapReduce job and submitted to Hadoop, and the result is as expected.

hive> select orgid,count(*) from t_user GROUP BY orgid;

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

 set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

 set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

 set mapred.reduce.tasks=<number>

Starting Job = job_1531100516398_0002, Tracking URL = http://hadoop-server01:8088/proxy/application_1531100516398_0002/

Kill Command = /usr/local/apps/hadoop-2.4.1/bin/hadoop job  -kill job_1531100516398_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2018-07-08 20:05:29,787 Stage-1 map = 0%,  reduce = 0%

2018-07-08 20:05:34,915 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.08 sec

2018-07-08 20:05:35,937 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.08 sec

2018-07-08 20:05:36,964 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.08 sec

2018-07-08 20:05:37,988 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.08 sec

2018-07-08 20:05:39,012 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.59 sec

2018-07-08 20:05:40,034 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.59 sec

MapReduce Total cumulative CPU time: 1 seconds 590 msec

Ended Job = job_1531100516398_0002

MapReduce Jobs Launched:

Job 0: Map: 1  Reduce: 1   Cumulative CPU: 1.59 sec   HDFS Read: 345 HDFS Write: 24 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 590 msec

OK

G0001   2

G0002   2

G0004   1

Time taken: 13.8 seconds, Fetched: 3 row(s)
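The grouped query can be refined further; for example, keeping only orgid groups with more than one user (an illustrative variation, not part of the original steps):

hive> select orgid,count(*) from t_user group by orgid having count(*) > 1;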

Note: if you do not want the job logs printed, start the Hive CLI in silent mode: # hive -S
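Silent mode can also be combined with -e to run a single statement non-interactively and print only the result (a usage sketch of the standard Hive CLI options):

[root@hadoop-server01 data]# hive -S -e "select count(*) from t_user;"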

6. Load data into HDFS with Hive

Earlier we uploaded the file to the /user/hive/warehouse/t_user directory manually with a hadoop command; this time we use Hive's LOAD DATA command instead.

Prepare another data file, t_user.data.2, with the following content:

[root@hadoop-server01 data]# cat t_user.data.2

U0004   Liuneng         G0002   18

U0005   Zhaosi1         G0004   29

U0006   Zhaosi2         G0004   29

U0007   Zhaosi3         G0008   29

U0008   Zhaosi4         G0007   29

U0009   Zhaosi5         G0004   29

U00010  Zhaosi6         G0004   29

U00011  Zhaosi7         G0002   29

U00012  Zhaosi8         G0004   29

U00012  zhangsan        G0004   29

U00012  Liuwu           G0009   29

U00012  Zhangq          G0004   29

U00012  Lilin           G0004   29

U00012  Zhaoqi          G0004   29

Now use the Hive LOAD DATA command to load this data into the /user/hive/warehouse/t_user directory:

hive> load data local inpath '/root/data/t_user.data.2' into table t_user;

Copying data from file:/root/data/t_user.data.2

Copying file: file:/root/data/t_user.data.2

Loading data to table default.t_user

Table default.t_user stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 481, raw_data_size: 0]

OK

Time taken: 0.19 seconds

Check whether the file has landed in the t_user directory; the listing shows it was uploaded correctly:

[root@hadoop-server01 data]# hadoop fs -ls  /user/hive/warehouse/t_user

Found 2 items

-rw-r--r--   1 root supergroup        123 2018-07-08 19:57 /user/hive/warehouse/t_user/t_user.data.1

-rw-r--r--   1 root supergroup        358 2018-07-08 20:25 /user/hive/warehouse/t_user/t_user.data.2

Verify the data through Hive:

hive> select * from t_user;

OK

U0001   Zhangsan        G0001   10

U0002   Lisi            G0001   12

U0003   Wangwu          G0002   13

U0004   Liuneng         G0002   18

U0005   Zhaosi          G0004   29

U0004   Liuneng         G0002   18

U0005   Zhaosi1         G0004   29

U0006   Zhaosi2         G0004   29

U0007   Zhaosi3         G0008   29

U0008   Zhaosi4         G0007   29

U0009   Zhaosi5         G0004   29

U00010  Zhaosi6         G0004   29

U00011  Zhaosi7         G0002   29

U00012  Zhaosi8         G0004   29

U00012  zhangsan        G0004   29

U00012  Liuwu           G0009   29

U00012  Zhangq          G0004   29

U00012  Lilin           G0004   29

U00012  Zhaoqi          G0004   29

The data checks out: the query returns the rows from both files.
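LOAD DATA also supports a couple of variations worth knowing about (hedged sketches; the /tmp path below is a placeholder, not a file created in this walkthrough). Without the LOCAL keyword the source path is read from HDFS and the file is moved into the table directory, and adding OVERWRITE replaces the existing files instead of appending:

hive> load data inpath '/tmp/t_user.data.3' into table t_user;            -- hypothetical HDFS path
hive> load data local inpath '/root/data/t_user.data.2' overwrite into table t_user;   -- replaces existing files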