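If everything went well, a blue elephant logo should now appear in the top-right corner of Eclipse; click it. A Map/Reduce Locations tab then appears in the panel at the bottom. Open that tab, right-click inside it, and choose New Hadoop Location.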
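Location name is the name of this connection and can be anything you like. Map/Reduce Master is the host and port of the node that runs MapReduce jobs (the JobTracker). DFS Master is the host and port of the Distributed File System (HDFS) master, i.e. the NameNode. User name is the user account used to connect to Hadoop.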
| Parameter | Value | Notes |
|---|---|---|
| Location name | hadoop | |
| Map/Reduce Master | Host: | IP address of the NameNode |
| Map/Reduce Master | Port: 8021 | MapReduce port; check your own mapred-site.xml |
| DFS Master | Port: 8020 | DFS port; check your own core-site.xml |
| User name | hadoop | |
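The DFS Master host and port are simply the host and port parts of the `fs.default.name` URI in core-site.xml. As a minimal sketch, assuming a hypothetical NameNode address `hdfs://192.168.1.100:8020` (substitute your own), the two fields can be read off that URI with the JDK's `java.net.URI`:

```java
import java.net.URI;

// Sketch: derive the Eclipse "DFS Master" host/port fields from an
// fs.default.name value. The address used in main() is a hypothetical example.
public class DfsMasterFields {

    /** Returns {host, port} taken from an hdfs:// URI such as fs.default.name. */
    static String[] hostAndPort(String fsDefaultName) {
        URI uri = URI.create(fsDefaultName);
        if (!"hdfs".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected an hdfs:// URI: " + fsDefaultName);
        }
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("URI must include host and port: " + fsDefaultName);
        }
        return new String[] {uri.getHost(), Integer.toString(uri.getPort())};
    }

    public static void main(String[] args) {
        // Hypothetical value; copy the real one from your core-site.xml.
        String[] fields = hostAndPort("hdfs://192.168.1.100:8020");
        System.out.println("DFS Master host: " + fields[0]); // 192.168.1.100
        System.out.println("DFS Master port: " + fields[1]); // 8020
    }
}
```

Rejecting non-hdfs URIs and missing ports up front catches the most common copy-paste mistake (a value without a port), which would otherwise leave the Port field empty.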
Next, switch to the Advanced parameters tab and change the following parameters:
| Parameter | Value | Notes |
|---|---|---|
| fs.default.name | hdfs:// | see core-site.xml |
| hadoop.tmp.dir | /home/hadoop/hadoopdata/tmp | see core-site.xml |
| mapred.job.tracker | hdfs:// | see mapred-site.xml |
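Since every value here should be copied from the cluster's `*-site.xml` files, it can help to list what those files actually contain. A minimal sketch using only the JDK's DOM parser (`javax.xml.parsers`) to collect the `<name>`/`<value>` pairs of a Hadoop-style configuration file; the class name and the sample XML (including its address) are hypothetical placeholders, not values from this setup:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Sketch: list the <property> name/value pairs of a Hadoop *-site.xml file,
// so they can be copied into the plugin's Advanced parameters tab.
public class SiteXmlReader {

    /** Returns name -> value (in file order); assumes each property has both tags. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical core-site.xml contents; use your cluster's real file.
        String coreSite =
                "<configuration>"
              + "<property><name>fs.default.name</name>"
              + "<value>hdfs://192.168.1.100:8020</value></property>"
              + "<property><name>hadoop.tmp.dir</name>"
              + "<value>/home/hadoop/hadoopdata/tmp</value></property>"
              + "</configuration>";
        readProperties(coreSite).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

To inspect a real file, read it into a string first, for example with `new String(Files.readAllBytes(path), StandardCharsets.UTF_8)`, and pass that to `readProperties`.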
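To avoid permission errors when Eclipse writes to HDFS, relax the HDFS permissions from the Hadoop installation directory. This makes every path world-writable, so do it only on a test cluster:

    ./bin/hadoop fs -chmod -R 777 /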
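`777` is an octal permission mode: one digit each for owner, group and others, where 4 = read, 2 = write and 1 = execute. A small sketch (the class name is illustrative) that expands such a mode into the symbolic `rwx` form makes it plain that `777` grants read, write and execute to everyone:

```java
// Sketch: expand an octal permission mode (as passed to "hadoop fs -chmod")
// into the symbolic rwx form that "hadoop fs -ls" displays.
public class OctalMode {

    static String toSymbolic(String octal) {
        if (!octal.matches("[0-7]{3}")) {
            throw new IllegalArgumentException("expected three octal digits: " + octal);
        }
        StringBuilder sb = new StringBuilder();
        for (char c : octal.toCharArray()) {
            int bits = c - '0';                       // one digit: owner, then group, then others
            sb.append((bits & 4) != 0 ? 'r' : '-');  // 4 = read
            sb.append((bits & 2) != 0 ? 'w' : '-');  // 2 = write
            sb.append((bits & 1) != 0 ? 'x' : '-');  // 1 = execute
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic("777")); // rwxrwxrwx
        System.out.println(toSymbolic("755")); // rwxr-xr-x
    }
}
```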
Next, open Eclipse -> Preferences -> Hadoop Map/Reduce, enter the directory where you extracted Hadoop as the Hadoop installation directory, and click Apply for the setting to take effect.
Now we can try compiling a Hadoop program or two. Create a new Hadoop project via File -> Map/Reduce -> Map/Reduce Project, or directly through the Project Wizard, and name it Hadoop Test.
Our first program is WordCount; its source code can be found in ..\hadoop-1.2.1\src\examples\org\apache\hadoop\examples.

    package org.apache.hadoop.examples;

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class WordCount {

      // Mapper: splits each input line into whitespace-separated tokens
      // and emits (token, 1) for every token.
      public static class TokenizerMapper
           extends Mapper<Object, Text, Text, IntWritable>{

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reducer (also used as the combiner): sums the counts for each word.
      public static class IntSumReducer
           extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (-D, -fs, ...) and keep <in> <out>.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
          System.err.println("Usage: wordcount <in> <out>");
          System.exit(2);
        }
        // Job(Configuration, String) is the Hadoop 1.x way to create a job.
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }
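For convenience, the full code is reproduced above. Once it is in place, click Run to compile and run it. A small dialog pops up first; choose Run on Hadoop and confirm. Because no arguments have been supplied yet, the program stops and prints its usage message:

    Usage: wordcount <in> <out>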
The WordCount example needs an input file and a directory in which to store its output, so we have to pass it arguments. To do so, open the Run menu and choose Run Configurations.
On the Arguments tab, enter the input and output paths in the Program arguments field, separated by a space.
The text to be counted is stored in /Data/words and contains:

    Mary had a little lamb
    its fleece very white as snow
    and everywhere that Mary went
    the lamb was sure to go
The Program arguments are therefore the HDFS input path followed by the output directory:

    hdfs:// hdfs://
After the job runs with these arguments, the output directory contains the following word counts:

    Mary 2
    a 1
    and 1
    as 1
    everywhere 1
    fleece 1
    go 1
    had 1
    its 1
    lamb 2
    little 1
    snow 1
    sure 1
    that 1
    the 1
    to 1
    very 1
    was 1
    went 1
    white 1
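To see where this output comes from, here is a plain-Java sketch (no Hadoop required; the class name is illustrative) that imitates the job's three phases on the same four lines: it tokenizes each line with `StringTokenizer` and emits (word, 1) as `TokenizerMapper` does, groups the pairs by key in sorted order as the shuffle does (a `TreeMap` stands in for Hadoop's key sort), and sums each group as `IntSumReducer` does. It is a single-process illustration, not how Hadoop actually distributes the work:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Sketch: a single-process imitation of the WordCount job's
// map -> shuffle/sort -> reduce phases, for illustration only.
public class LocalWordCount {

    static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit (token, 1) for each whitespace-separated token.
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                emitted.add(new SimpleEntry<>(itr.nextToken(), 1));
            }
        }
        // Shuffle/sort phase: group values by key, with keys in sorted order.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : emitted) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // Reduce phase: sum the values collected for each key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) {
                sum += v;
            }
            result.put(group.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Mary had a little lamb",
                "its fleece very white as snow",
                "and everywhere that Mary went",
                "the lamb was sure to go");
        // Prints one "word<TAB>count" line per word, in sorted key order.
        wordCount(lines).forEach((w, n) -> System.out.println(w + "\t" + n));
    }
}
```

The sort order also explains why `Mary` appears first in the job output: keys are compared character by character, and uppercase ASCII letters come before lowercase ones.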