Using Hive for a MapReduce computation (word count)

Create the table

hive> create table wordcount(line String);
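The table has a single string column, line, so each raw line of the input file becomes one row. If you would rather leave the data where it already sits on HDFS instead of letting the load step move it into Hive's warehouse directory, an external table is one option; a minimal sketch, assuming the files are already under /wcinput (the table name wordcount_ext is hypothetical):

-- Sketch: an external table reads /wcinput in place; dropping it leaves the files untouched
create external table wordcount_ext(line string)
location '/wcinput';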

Load the data

hive> load data inpath '/wcinput' overwrite into table wordcount;
Loading data to table default.wordcount
OK
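Because the path is an HDFS location and the local keyword is not used, load data inpath moves the files out of /wcinput and into the table's warehouse directory rather than copying them. When the source file lives on the local filesystem, the local variant copies it instead; a minimal sketch, where the file name words.txt is hypothetical:

-- Sketch: load from the local filesystem (copies the file instead of moving an HDFS path)
load data local inpath 'words.txt' overwrite into table wordcount;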
hive> select * from wordcount;
OK
hello world
hello world
hello java
hello c
hello c++
hello java
Time taken: 0.16 seconds, Fetched: 6 row(s)
hive> select split(line, ' ') from wordcount;
OK
["hello","world"]
["hello","world"]
["hello","java"]
["hello","c"]
["hello","c++"]
["hello","java"]
Time taken: 0.571 seconds, Fetched: 6 row(s)
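split(line, ' ') returns an array<string> for each row, which is why the values above are printed in bracketed, JSON-like form. As a quick sanity check, the built-in size() function reports how many words each line contains; a small sketch:

-- Sketch: words per line, i.e. the length of the array produced by split()
select size(split(line, ' ')) as words_per_line from wordcount;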
hive> select explode(split(line, ' ')) from wordcount;
OK
hello
world
hello
world
hello
java
hello
c
hello
c++
hello
java
Time taken: 0.183 seconds, Fetched: 12 row(s)
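explode() is a table-generating function (UDTF): it expands each array into one row per element, turning the 6 input lines into 12 word rows. A UDTF cannot be combined with ordinary columns in the same select list, which is why the counting query below wraps it in a subquery. The same count can also be written with lateral view; a sketch of that alternative (the aliases t and word are arbitrary):

-- Sketch: equivalent word count using lateral view instead of a subquery
select word, count(1) as cnt
from wordcount lateral view explode(split(line, ' ')) t as word
group by word;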

hive> select word, count(1) as count from (select explode(split(line, ' ')) as word from wordcount) w group by word;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170929034857_f67738b5-8744-4694-8621-f855bcc57cf5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1506604275847_0002, Tracking URL = http://master:8088/proxy/application_1506604275847_0002/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1506604275847_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-09-29 03:49:17,089 Stage-1 map = 0%,  reduce = 0%
2017-09-29 03:49:29,157 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.04 sec
2017-09-29 03:49:37,721 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.48 sec
MapReduce Total cumulative CPU time: 3 seconds 480 msec
Ended Job = job_1506604275847_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.48 sec   HDFS Read: 8812 HDFS Write: 180 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 480 msec
OK
c   1
c++ 1
hello   6
java    2
world   2
Time taken: 42.133 seconds, Fetched: 5 row(s)
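The aggregation runs as a single MapReduce job with one mapper and one reducer, and the counts come back unsorted. To keep the result around for later queries, or to see the most frequent words first, it can be written into a new table with an order by; a minimal sketch, where the table name wordcount_result is hypothetical:

-- Sketch: persist the word counts, most frequent first
create table wordcount_result as
select word, count(1) as cnt
from (select explode(split(line, ' ')) as word from wordcount) w
group by word
order by cnt desc;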
