1. To write MapReduce programs in Eclipse, you first need to install the Hadoop plugin for Eclipse. To do this, copy contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar from the Hadoop installation directory into the plugins directory of the Eclipse installation.
2. Once the plugin is installed, you can create a new Map/Reduce Project. The Java source file needs to import several packages provided by Hadoop:

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
In the Java file, you extend the two core classes, Mapper and Reducer, and override the map and reduce methods. The MapReduce driver class also needs a run method and a main method; run sets the job name and handles the command-line arguments.
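The logic the map and reduce methods implement for word count can be sketched in plain Java. This is a simulation of what the framework computes, not the Hadoop API: WordCountSketch and its method names are made up for illustration, and the real job would extend org.apache.hadoop.mapreduce.Mapper and Reducer instead of using these standalone methods.

```java
import java.util.*;

// Plain-Java sketch of the word-count map/shuffle/reduce flow.
// Hypothetical class for illustration only; not the Hadoop API.
public class WordCountSketch {

    // "map" phase: emit a (word, 1) pair for each token in one input line
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    // shuffle + "reduce" phase: group pairs by word and sum the counts
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like the job output
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // sample input lines (hypothetical; chosen to match the words
        // seen in the output listings later in this post)
        for (String line : new String[] {"hello I am", "your father Oh shit"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs));
        // prints {I=1, Oh=1, am=1, father=1, hello=1, shit=1, your=1}
    }
}
```

In the real job, the framework performs the grouping step between map and reduce itself; the reduce method only receives one key with its list of values at a time.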
3. Running the MapReduce program
There are two ways to run a MapReduce program on Hadoop. The first is to enter the required arguments directly in Eclipse and run the job there; the results are printed to the console, which is convenient for debugging. The arguments are set as shown in the figure below:
During the run you may hit an error like "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory… Name node is in safe mode". In that case, force the cluster to leave safe mode:
bin/hadoop dfsadmin -safemode leave
Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -ls
Found 4 items
-rw-r--r--   1 ml\root          supergroup  33 2013-12-28 12:10 /user/ml/root/input
drwxr-xr-x   - ml\root          supergroup   0 2013-12-28 12:13 /user/ml/root/output
drwxr-xr-x   - ml\administrator supergroup   0 2013-12-28 17:08 /user/ml/root/output_arg
drwxr-xr-x   - ml\root          supergroup   0 2013-12-28 15:45 /user/ml/root/output_ecl

Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_arg/part*
I       1
Oh      1
am      1
father  1
hello   1
shit    1
your    1
The second way is to package the program as a jar, copy the jar into the Hadoop installation directory, and then run it from the command line:
Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop jar myWordCount.jar MRTest input output_ecl
13/12/28 15:44:58 INFO input.FileInputFormat: Total input paths to process : 1
13/12/28 15:45:01 INFO mapred.JobClient: Running job: job_201312281151_0008
13/12/28 15:45:02 INFO mapred.JobClient:  map 0% reduce 0%
13/12/28 15:45:16 INFO mapred.JobClient:  map 100% reduce 0%
13/12/28 15:45:28 INFO mapred.JobClient:  map 100% reduce 100%
13/12/28 15:45:30 INFO mapred.JobClient: Job complete: job_201312281151_0008
13/12/28 15:45:30 INFO mapred.JobClient: Counters: 17
13/12/28 15:45:30 INFO mapred.JobClient:   Job Counters
13/12/28 15:45:30 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Launched map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Data-local map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:   FileSystemCounters
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_READ=160
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_READ=33
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=271
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=45
13/12/28 15:45:30 INFO mapred.JobClient:   Map-Reduce Framework
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input groups=7
13/12/28 15:45:30 INFO mapred.JobClient:     Combine output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map input records=2
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Spilled Records=14
13/12/28 15:45:30 INFO mapred.JobClient:     Map output bytes=59
13/12/28 15:45:30 INFO mapred.JobClient:     Combine input records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input records=7
Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -ls output_ecl/*
drwxr-xr-x   - ml\root supergroup   0 2013-12-28 15:45 /user/ml/root/output_ecl/_logs/history
-rw-r--r--   1 ml\root supergroup  45 2013-12-28 15:45 /user/ml/root/output_ecl/part-r-00000

Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_ecl/part*
I       1
Oh      1
am      1
father  1
hello   1
shit    1
your    1