Configuring arguments for running WordCount in Eclipse

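For WordCount to run on Hadoop, the program must be given an input path and an output path. The input path is the text file whose word frequencies we want to count; here the file is named 20417.txt. The output path is where the word-frequency results are stored. The arguments are configured as shown in the figure below: WordCount.java -> right-click -> Run As -> Run Configurations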

[Figure 1: Eclipse run configuration with the WordCount program arguments]
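
Whatever is typed into the program arguments field of the run configuration reaches the program as the `args` array of `main`. The following is a minimal sketch of how a driver might check those two arguments before submitting a job. It assumes the input path comes first and the output path second; `WordCountArgs` is a hypothetical helper, not part of the original WordCount source, and it deliberately uses only the Java standard library.

```java
// Hypothetical helper: validates the two program arguments that the
// Eclipse run configuration passes to main(String[] args).
final class WordCountArgs {
    final String inputPath;   // HDFS path of the text to count, e.g. the location of 20417.txt
    final String outputPath;  // HDFS directory that will receive the results

    private WordCountArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Assumption: exactly two arguments, input path first, output path second.
    static WordCountArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: WordCount <input path> <output path>");
        }
        return new WordCountArgs(args[0].trim(), args[1].trim());
    }

    public static void main(String[] args) {
        WordCountArgs parsed = parse(args);
        System.out.println("input  = " + parsed.inputPath);
        System.out.println("output = " + parsed.outputPath);
    }
}
```

Failing early with a usage message makes a missing or misplaced argument obvious in the Eclipse console instead of surfacing later as a job failure.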

The paths above are paths in HDFS, not on the local file system. The HDFS paths can be checked in the figure below:


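After entering the input and output paths in Figure 1, click Apply, but do not click Run at this point: Run here launches the program on the local machine, whereas we want it to run on the Hadoop cluster. Instead, do the following: WordCount.java -> right-click -> Run As -> Run on Hadoop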

While the job runs, the console prints progress information such as the following:

11/10/09 14:07:50 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
11/10/09 14:07:50 INFO input.FileInputFormat: Total input paths to process : 1
11/10/09 14:07:50 INFO mapred.JobClient: Running job: job_201110091333_0001
11/10/09 14:07:51 INFO mapred.JobClient:  map 0% reduce 0%
11/10/09 14:07:59 INFO mapred.JobClient:  map 100% reduce 0%
11/10/09 14:08:12 INFO mapred.JobClient:  map 100% reduce 100%
11/10/09 14:08:14 INFO mapred.JobClient: Job complete: job_201110091333_0001
11/10/09 14:08:14 INFO mapred.JobClient: Counters: 17
11/10/09 14:08:14 INFO mapred.JobClient:   Job Counters 
11/10/09 14:08:14 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/09 14:08:14 INFO mapred.JobClient:     Launched map tasks=1
11/10/09 14:08:14 INFO mapred.JobClient:     Data-local map tasks=1
11/10/09 14:08:14 INFO mapred.JobClient:   FileSystemCounters
11/10/09 14:08:14 INFO mapred.JobClient:     FILE_BYTES_READ=143076
11/10/09 14:08:14 INFO mapred.JobClient:     HDFS_BYTES_READ=674762
11/10/09 14:08:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=286184
11/10/09 14:08:14 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=205265
11/10/09 14:08:14 INFO mapred.JobClient:   Map-Reduce Framework
11/10/09 14:08:14 INFO mapred.JobClient:     Reduce input groups=0
11/10/09 14:08:14 INFO mapred.JobClient:     Combine output records=10015
11/10/09 14:08:14 INFO mapred.JobClient:     Map input records=12761
11/10/09 14:08:14 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/10/09 14:08:14 INFO mapred.JobClient:     Reduce output records=0
11/10/09 14:08:14 INFO mapred.JobClient:     Spilled Records=20030
11/10/09 14:08:14 INFO mapred.JobClient:     Map output bytes=1082004
11/10/09 14:08:14 INFO mapred.JobClient:     Combine input records=112607
11/10/09 14:08:14 INFO mapred.JobClient:     Map output records=112607
11/10/09 14:08:14 INFO mapred.JobClient:     Reduce input records=10015
11/10/09 14:08:14 INFO input.FileInputFormat: Total input paths to process : 1
11/10/09 14:08:14 INFO mapred.JobClient: Running job: job_201110091333_0002
11/10/09 14:08:15 INFO mapred.JobClient:  map 0% reduce 0%
11/10/09 14:08:24 INFO mapred.JobClient:  map 100% reduce 0%
11/10/09 14:08:36 INFO mapred.JobClient:  map 100% reduce 100%
11/10/09 14:08:38 INFO mapred.JobClient: Job complete: job_201110091333_0002
11/10/09 14:08:38 INFO mapred.JobClient: Counters: 17
11/10/09 14:08:38 INFO mapred.JobClient:   Job Counters 
11/10/09 14:08:38 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/09 14:08:38 INFO mapred.JobClient:     Launched map tasks=1
11/10/09 14:08:38 INFO mapred.JobClient:     Data-local map tasks=1
11/10/09 14:08:38 INFO mapred.JobClient:   FileSystemCounters
11/10/09 14:08:38 INFO mapred.JobClient:     FILE_BYTES_READ=143076
11/10/09 14:08:38 INFO mapred.JobClient:     HDFS_BYTES_READ=205265
11/10/09 14:08:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=286184
11/10/09 14:08:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=104533
11/10/09 14:08:38 INFO mapred.JobClient:   Map-Reduce Framework
11/10/09 14:08:38 INFO mapred.JobClient:     Reduce input groups=0
11/10/09 14:08:38 INFO mapred.JobClient:     Combine output records=0
11/10/09 14:08:38 INFO mapred.JobClient:     Map input records=10015
11/10/09 14:08:38 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/10/09 14:08:38 INFO mapred.JobClient:     Reduce output records=0
11/10/09 14:08:38 INFO mapred.JobClient:     Spilled Records=20030
11/10/09 14:08:38 INFO mapred.JobClient:     Map output bytes=123040
11/10/09 14:08:38 INFO mapred.JobClient:     Combine input records=0
11/10/09 14:08:38 INFO mapred.JobClient:     Map output records=10015
11/10/09 14:08:38 INFO mapred.JobClient:     Reduce input records=10015
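
The counter lines in the log above follow a `name=value` pattern, so they are easy to pull out for a quick sanity check. For the first job, for example, the combiner took in 112607 records and emitted 10015, a reduction of roughly 11 to 1. The sketch below extracts such counters from pasted log lines; `JobCounterParser` is a hypothetical name, and the parsing rule (text after `mapred.JobClient:` up to the last `=`) is an assumption based only on the lines shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects "name=value" counters from JobClient log lines.
final class JobCounterParser {
    private static final String MARKER = "mapred.JobClient:";

    // Pass the lines of ONE job; if a counter name repeats, the last value wins.
    static Map<String, Long> parse(String[] logLines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : logLines) {
            int marker = line.indexOf(MARKER);
            int eq = line.lastIndexOf('=');
            if (marker < 0 || eq < marker) {
                continue; // not a counter line (progress, job id, warnings, ...)
            }
            String name = line.substring(marker + MARKER.length(), eq).trim();
            try {
                counters.put(name, Long.parseLong(line.substring(eq + 1).trim()));
            } catch (NumberFormatException e) {
                // value is not numeric; ignore the line
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        String[] lines = {
            "11/10/09 14:08:14 INFO mapred.JobClient:     Combine output records=10015",
            "11/10/09 14:08:14 INFO mapred.JobClient:     Combine input records=112607"
        };
        Map<String, Long> c = parse(lines);
        double ratio = (double) c.get("Combine input records")
                     / c.get("Combine output records");
        System.out.println("combiner reduction ratio: " + ratio);
    }
}
```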

Once the run finishes, the word-frequency results appear in HDFS, as shown in the figure below:

[Figure 2: WordCount output in HDFS]

The word-frequency results are stored in the file part-r-00000.
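
To inspect the results programmatically, the file can first be copied out of HDFS (for example with `hadoop fs -get`) and then read as plain text. The sketch below assumes the job wrote its output with Hadoop's default text output format, which puts one `word<TAB>count` pair on each line; the original post does not show the file contents, so that layout is an assumption here. `WordCountOutputReader` is a hypothetical name, and the local file path is supplied by the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads a local copy of part-r-00000.
final class WordCountOutputReader {

    // Assumption: each line is "<word>\t<count>" (default text output layout).
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab <= 0) {
                continue; // blank line or no word before the tab
            }
            String word = line.substring(0, tab);
            try {
                counts.put(word, Long.parseLong(line.substring(tab + 1).trim()));
            } catch (NumberFormatException e) {
                // count is not numeric; skip the malformed line
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local path of the copied part-r-00000 file
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, Long> e : parse(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Splitting on the last tab rather than the first keeps a word intact even if it happens to contain a tab character itself.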
