The environment: Linux Mint 15, hadoop-1.2.1.
First, configure Hadoop in pseudo-distributed mode; for details, see: configuring Hadoop in pseudo-distributed mode.
Suppose we have a file num.txt with the following content:
123
1
23
231
333
001
234
543
1111

Each line holds one number; the goal is to find the maximum.
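As a quick sanity check (a sketch of mine, not part of the MapReduce job itself), plain Java gives the answer we should expect from the cluster run:

public class LocalMax {
    public static void main(String[] args) {
        String[] lines = {"123", "1", "23", "231", "333", "001", "234", "543", "1111"};
        long max = Long.MIN_VALUE;
        for (String s : lines) {
            max = Math.max(max, Long.parseLong(s));
        }
        System.out.println(max); // prints 1111
    }
}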
Create a Java project in Eclipse, add ~/hadoop-1.2.1/hadoop-core-1.2.1.jar to its build path, then create and edit the following three files in the project. First, MaxNumMapper.java:
import java.io.*;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class MaxNumMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, LongWritable> {
    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<LongWritable, LongWritable> output,
            Reporter reporter) throws IOException {
        String line = value.toString().trim();
        if (line.length() != 0) {
            long num = Long.parseLong(line);
            // Emit every number under the same key (1), so the reducer
            // sees all values in a single group.
            output.collect(new LongWritable(1), new LongWritable(num));
        }
    }
}

(Note the empty-line check now comes before Long.parseLong; parsing first, as originally written, would throw NumberFormatException on a blank line.)
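Because every record carries the same key, all values are funneled into one reduce group. For a quick local check without starting a cluster, one can call map() directly with a throwaway OutputCollector; a minimal sketch (the class name MaxNumMapperTest is mine, not from the original project):

import java.io.IOException;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class MaxNumMapperTest {
    public static void main(String[] args) throws IOException {
        MaxNumMapper mapper = new MaxNumMapper();
        // Throwaway collector that just prints whatever the mapper emits.
        OutputCollector<LongWritable, LongWritable> collector =
                new OutputCollector<LongWritable, LongWritable>() {
                    public void collect(LongWritable k, LongWritable v) {
                        System.out.println(k + "\t" + v);
                    }
                };
        mapper.map(new LongWritable(0), new Text("123"), collector, Reporter.NULL);
        mapper.map(new LongWritable(4), new Text(""), collector, Reporter.NULL); // blank line: no output
    }
}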
MaxNumReducer.java:

import java.io.*;
import java.util.Iterator;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class MaxNumReducer extends MapReduceBase
        implements Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
    @Override
    public void reduce(LongWritable key, Iterator<LongWritable> values,
            OutputCollector<LongWritable, LongWritable> output,
            Reporter reporter) throws IOException {
        // Scan every value in this key's group and keep the largest.
        long maxNum = Long.MIN_VALUE;
        while (values.hasNext()) {
            maxNum = Math.max(maxNum, values.next().get());
        }
        output.collect(key, new LongWritable(maxNum));
    }
}
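A side note before moving on to the driver: taking a maximum is associative and commutative, so MaxNumReducer could also serve as a combiner, pre-aggregating map output before the shuffle. This is an optional tweak of mine, not part of the original job; it would be one extra line in MaxNum.main below:

conf.setCombinerClass(MaxNumReducer.class); // optional: compute local maxima before the shuffle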
MaxNum.java:

/*
 * Takes two arguments: an input text file and an output directory name.
 */
import java.io.*;

import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class MaxNum {
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: MaxNum <input text file> <output directory>");
            System.exit(-1);
        }
        JobConf conf = new JobConf(MaxNum.class);
        conf.setJobName("get max number");
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(MaxNumMapper.class);
        conf.setReducerClass(MaxNumReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(LongWritable.class);
        JobClient.runJob(conf);
    }
}

(The argument check now exits on error; the original printed the message but ran on regardless.) With the code in place, right-click the project, open Export, and export it as a jar file, maxnum.jar.
Next, upload num.txt to HDFS and look around:

$ hadoop fs -put num.txt .
$ hadoop fs -ls /
drwxr-xr-x   - letian supergroup          0 2013-10-22 22:07 /test
drwxr-xr-x   - letian supergroup          0 2013-10-22 21:58 /tmp
drwxr-xr-x   - letian supergroup          0 2013-10-23 14:58 /user
$ hadoop fs -ls /user
drwxr-xr-x   - letian supergroup          0 2013-10-23 14:58 /user/letian
$ hadoop fs -ls /user/letian
Found 1 items
-rw-r--r--   3 letian supergroup         34 2013-10-23 14:58 /user/letian/num.txt

As you can see, I am logged in to Linux as user letian; since "hadoop fs -put num.txt ." uses a relative HDFS path, num.txt ended up in /user/letian/num.txt.
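The relative path works because Hadoop resolves relative HDFS paths against the working directory, which defaults to the user's home directory /user/<username>. A minimal sketch to confirm this (the class name PrintWorkingDir is mine, for illustration):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class PrintWorkingDir {
    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        // Prints something like hdfs://localhost:9000/user/letian
        System.out.println(fs.getWorkingDirectory());
    }
}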
Run the job:

$ hadoop jar maxnum.jar MaxNum num.txt result.txt

Check the result:
$ hadoop fs -ls /user/letian
Found 2 items
-rw-r--r--   3 letian supergroup         34 2013-10-23 14:58 /user/letian/num.txt
drwxr-xr-x   - letian supergroup          0 2013-10-23 14:59 /user/letian/result.txt

result.txt is actually a directory; naming it with a .txt suffix was a slip on my part.
$ hadoop fs -copyToLocal /user/letian/result.txt result.txt
$ ls result.txt/
_logs/  part-00000  _SUCCESS
$ less result.txt/part-00000
$ cat result.txt/part-00000
1	1111

nice~
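Instead of copying the output to the local file system, it can also be read directly through the HDFS API; a minimal sketch (the class name ReadResult is mine), assuming the default Hadoop configuration is on the classpath:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadResult {
    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        // Relative path, resolved against /user/<username> as above.
        Path part = new Path("result.txt/part-00000");
        BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(part)));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // prints: 1	1111
            }
        } finally {
            in.close();
        }
    }
}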