Key Points for Writing MapReduce Programs in Eclipse

1. To write MapReduce programs in Eclipse, you first need to install the Hadoop plugin for Eclipse. To do so, copy contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar from the Hadoop installation directory into the plugins directory of your Eclipse installation.

2. Once the plugin is installed, you can create a new Map/Reduce Project. The Java source file needs to import several packages provided by Hadoop, specifically:

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

In the Java file, you extend the two base classes Mapper and Reducer and override their map and reduce methods. The main MapReduce class also implements a run method and a main method; run sets the job name and handles the command-line arguments.
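
The post does not reproduce the full source, so here is a minimal sketch of what such a class might look like against the 0.20.2 mapreduce API, using the common Tool/ToolRunner pattern for run and main. It assumes the driver class is named MRTest (the class name passed to the jar command later in this post) and that the job is a simple word count, which matches the output shown below; the nested class names TokenMapper and SumReducer are illustrative only, not taken from the original code.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

public class MRTest extends Configured implements Tool {

    // Mapper: emit (word, 1) for every whitespace-separated token in a line
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also usable as a combiner): sum the counts for each word
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // run(): set the job name, wire up the mapper/reducer, and read the
    // input/output paths from the command-line arguments
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MRTest <input> <output>");
            return -1;
        }
        Job job = new Job(getConf(), "my word count");
        job.setJarByClass(MRTest.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MRTest(), args));
    }
}

Exported as myWordCount.jar, a class like this would then be runnable with the bin/hadoop jar command shown at the end of this post.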

3. Running the MapReduce program

There are two ways to run a MapReduce program on Hadoop. The first is to run it directly from Eclipse, supplying the required arguments in the run configuration; the result is printed to the console, which is convenient for debugging. The argument settings are shown in the figure below:

[Figure: Eclipse run-configuration argument settings]
You may hit the error "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory ... Name node is in safe mode" along the way. If so, force the cluster out of safe mode:

bin/hadoop dfsadmin -safemode leave

Run it again with the arguments, and the result is:



Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -ls
Found 4 items
-rw-r--r--   1 ml\root          supergroup         33 2013-12-28 12:10 /user/ml/root/input
drwxr-xr-x   - ml\root          supergroup          0 2013-12-28 12:13 /user/ml/root/output
drwxr-xr-x   - ml\administrator supergroup          0 2013-12-28 17:08 /user/ml/root/output_arg
drwxr-xr-x   - ml\root          supergroup          0 2013-12-28 15:45 /user/ml/root/output_ecl

Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_arg/part*
I       1
Oh      1
am      1
father  1
hello   1
shit    1
your    1

The result matches what was expected.


The other way is to export the program as a jar, copy the jar into the Hadoop installation directory, and then execute it from the command line:

Administrator@ML ~/hadoop-0.20.2
$  bin/hadoop  jar myWordCount.jar  MRTest  input  output_ecl
13/12/28 15:44:58 INFO input.FileInputFormat: Total input paths to process : 1
13/12/28 15:45:01 INFO mapred.JobClient: Running job: job_201312281151_0008
13/12/28 15:45:02 INFO mapred.JobClient:  map 0% reduce 0%
13/12/28 15:45:16 INFO mapred.JobClient:  map 100% reduce 0%
13/12/28 15:45:28 INFO mapred.JobClient:  map 100% reduce 100%
13/12/28 15:45:30 INFO mapred.JobClient: Job complete: job_201312281151_0008
13/12/28 15:45:30 INFO mapred.JobClient: Counters: 17
13/12/28 15:45:30 INFO mapred.JobClient:   Job Counters
13/12/28 15:45:30 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Launched map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:     Data-local map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient:   FileSystemCounters
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_READ=160
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_READ=33
13/12/28 15:45:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=271
13/12/28 15:45:30 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=45
13/12/28 15:45:30 INFO mapred.JobClient:   Map-Reduce Framework
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input groups=7
13/12/28 15:45:30 INFO mapred.JobClient:     Combine output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map input records=2
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Spilled Records=14
13/12/28 15:45:30 INFO mapred.JobClient:     Map output bytes=59
13/12/28 15:45:30 INFO mapred.JobClient:     Combine input records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Map output records=7
13/12/28 15:45:30 INFO mapred.JobClient:     Reduce input records=7
Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -ls output_ecl/*
drwxr-xr-x   - ml\root supergroup          0 2013-12-28 15:45 /user/ml/root/output_ecl/_logs/history
-rw-r--r--   1 ml\root supergroup         45 2013-12-28 15:45 /user/ml/root/output_ecl/part-r-00000

Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_ecl/part*
I       1
Oh      1
am      1
father  1
hello   1
shit    1
your    1

The program ran perfectly!
