Hadoop in Practice (1): Finding the Maximum of a Set of Numbers

The environment is Linux Mint 15 with hadoop-1.2.1.

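First, configure Hadoop in pseudo-distributed mode. For details, see: Configuring Hadoop in pseudo-distributed mode
Suppose we have a file num.txt with the following contents: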

123
1
23
231
333
001
234
543
1111
Each line holds one number, and the goal is to find the maximum.
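Before involving Hadoop, it helps to know what answer to expect. The following is a minimal local sketch in plain Java, not part of the original project; the class name LocalMax and the method maxOf are made up for illustration. It also shows that a line such as 001 is parsed as the number 1.

```java
import java.util.Arrays;
import java.util.List;

class LocalMax {
    // Parse each non-blank line as a long and return the largest value.
    static long maxOf(List<String> lines) {
        long max = Long.MIN_VALUE;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Long.parseLong accepts leading zeros, so "001" becomes 1.
            max = Math.max(max, Long.parseLong(trimmed));
        }
        return max;
    }

    public static void main(String[] args) {
        // The same nine lines as num.txt.
        List<String> lines = Arrays.asList(
                "123", "1", "23", "231", "333", "001", "234", "543", "1111");
        System.out.println(maxOf(lines)); // prints 1111
    }
}
```

The MapReduce job should therefore report 1111 for this input.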

In Eclipse, create a project named maxnum-hadoop and add the following external jar file to its build path:
~/hadoop-1.2.1/hadoop-core-1.2.1.jar
Create and edit the following three files in the project:
MaxNumMapper.java:
import java.io.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class MaxNumMapper extends MapReduceBase 
implements Mapper<LongWritable, Text, LongWritable, LongWritable>{  
    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<LongWritable, LongWritable> output, Reporter reporter)
            throws IOException {
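        String line = value.toString().trim();
        // Skip blank lines before parsing; Long.parseLong("") would throw NumberFormatException.
        if (line.length() != 0) {
            long num = Long.parseLong(line);
            // Every record is emitted under the same key, 1.
            output.collect(new LongWritable(1L), new LongWritable(num));
        }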
    }
}
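Because the mapper emits every number under the same key, the framework groups all values into a single list for one reduce call. The sketch below imitates that map-and-group step in plain Java without any Hadoop classes; SingleKeyShuffle and mapAndGroup are hypothetical names, and the real shuffle is of course done by Hadoop itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SingleKeyShuffle {
    static final long CONSTANT_KEY = 1L; // same constant key the mapper uses

    // Emit (1, number) for each non-blank line, then group the values by key.
    static Map<Long, List<Long>> mapAndGroup(List<String> lines) {
        Map<Long, List<Long>> groups = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines produce no output, as in the mapper
            }
            groups.computeIfAbsent(CONSTANT_KEY, k -> new ArrayList<>())
                  .add(Long.parseLong(trimmed));
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> groups =
                mapAndGroup(Arrays.asList("123", "1", "", "1111"));
        System.out.println(groups); // prints {1=[123, 1, 1111]}
    }
}
```

A single key keeps the job simple, at the cost of sending every value to one reducer.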
MaxNumReducer.java:
import java.io.*;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class MaxNumReducer extends MapReduceBase 
implements Reducer<LongWritable, LongWritable, LongWritable, LongWritable>{
    @Override
    public void reduce(LongWritable key, Iterator<LongWritable> values,
            OutputCollector<LongWritable, LongWritable> output, Reporter reporter)
            throws IOException {
        long maxNum = Long.MIN_VALUE;
        while (values.hasNext()) {
            maxNum = Math.max(maxNum, values.next().get());
        }
        output.collect(key, new LongWritable(maxNum)); 
    }
}
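The reducer only folds its values with Math.max, so the maximum of several partial maxima equals the maximum of all values. The plain-Java sketch below illustrates this with two made-up splits of the input; StagedMax is a hypothetical name. This property is what would allow the same class to be registered as a combiner through JobConf.setCombinerClass, although the job in this post does not do so.

```java
import java.util.Arrays;
import java.util.List;

class StagedMax {
    // Same fold as the reducer: start at Long.MIN_VALUE and keep the larger value.
    static long max(List<Long> values) {
        long m = Long.MIN_VALUE;
        for (long v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    public static void main(String[] args) {
        // An arbitrary split of the numbers in num.txt (001 is parsed as 1).
        List<Long> split1 = Arrays.asList(123L, 1L, 23L, 231L);
        List<Long> split2 = Arrays.asList(333L, 1L, 234L, 543L, 1111L);

        long partial1 = max(split1); // 231
        long partial2 = max(split2); // 1111

        // Reducing the partial results gives the same answer as reducing everything.
        System.out.println(max(Arrays.asList(partial1, partial2))); // prints 1111
    }
}
```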
MaxNum.java:
/*
 * Takes two arguments: the first is the input text file, the second is the output directory name.
 */
import java.io.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class MaxNum {
    public static void main(String[] args) throws IOException {
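        if (args.length != 2) {
            System.err.println("Two arguments are required: the input text file and the output directory name");
            System.exit(1); // stop here; otherwise args[0] / args[1] below would throw
        }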
        JobConf conf = new JobConf(MaxNum.class);
        conf.setJobName("get max number");
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(MaxNumMapper.class);
        conf.setReducerClass(MaxNumReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(LongWritable.class);
        JobClient.runJob(conf);
    }
}
Once the code is written, right-click the project, open Export, and export the project as the jar file maxnum.jar.

Now test the code:
Run start-dfs.sh and start-mapred.sh to start HDFS and MapReduce. Then put num.txt into HDFS:
$ hadoop fs -put num.txt .
$ hadoop fs -ls /
drwxr-xr-x   - letian supergroup          0 2013-10-22 22:07 /test
drwxr-xr-x   - letian supergroup          0 2013-10-22 21:58 /tmp
drwxr-xr-x   - letian supergroup          0 2013-10-23 14:58 /user
$ hadoop fs -ls /user
drwxr-xr-x   - letian supergroup          0 2013-10-23 14:58 /user/letian
$ hadoop fs -ls /user/letian
Found 1 items
-rw-r--r--   3 letian supergroup         34 2013-10-23 14:58 /user/letian/num.txt
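As you can see, I am logged in to Linux as user letian. The command "hadoop fs -put num.txt ." uses a relative HDFS path, which is resolved against the user's HDFS home directory, so num.txt ends up at /user/letian/num.txt.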

Run our MR program:
$ hadoop  jar maxnum.jar MaxNum  num.txt result.txt
Check the result:
$ hadoop fs -ls /user/letian
Found 2 items
-rw-r--r--   3 letian supergroup         34 2013-10-23 14:58 /user/letian/num.txt
drwxr-xr-x   - letian supergroup          0 2013-10-23 14:59 /user/letian/result.txt
Note that result.txt is actually a directory; the name was a poor choice on my part.

Next, copy result.txt to the local filesystem and look at the result:
$ hadoop fs -copyToLocal /user/letian/result.txt result.txt
$ ls result.txt/ 
_logs/      part-00000  _SUCCESS    
$ less result.txt/part-00000 
$ cat result.txt/part-00000 
1	1111
nice~
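Each line of part-00000 holds the key and the value separated by a tab, which is how Hadoop's default text output format writes records. If another program needs the maximum, a line can be parsed as in the following plain-Java sketch; PartFileLine and valueOf are hypothetical names, not part of the project above.

```java
class PartFileLine {
    // Split one "key<TAB>value" line and return the value as a long.
    static long valueOf(String line) {
        String[] fields = line.split("\t", 2);
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected key<TAB>value, got: " + line);
        }
        return Long.parseLong(fields[1].trim());
    }

    public static void main(String[] args) {
        // The single output line produced by the job in this post.
        System.out.println(valueOf("1\t1111")); // prints 1111
    }
}
```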


