Hadoop Series: Sorting and Custom Sorting in MapReduce

Default sorting

By default, Hadoop sorts map output by key during the shuffle, using the key type's compareTo method.


Expected effect:
Before sorting:
1991 06
1991 08
1991 07
1989 01
1979 02
1990 03
2000 04
After sorting:
1979 1979 02
1989 1989 01
1990 1990 03
1991 1991 06
1991 1991 08
1991 1991 07
2000 2000 04
That is, the sort is performed only on the key taken from the first column.

Map stage
public static class ComparedDefaultMap extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Each input line looks like "1991 06"; emit the first column as the
            // output key so the framework sorts on it during the shuffle.
            String line = value.toString();
            String[] str = line.split(" ");
            if (str.length > 1) {
                long first = Long.parseLong(str[0]);
                context.write(new LongWritable(first), value);
            }
        }
    }

Reduce stage

public static class ComparedDefaultReduce extends Reducer<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // Keys arrive already sorted; just write every value back out with its key.
            for (Text value : values) {
                context.write(key, value);
            }
        }
    }
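
For completeness, a minimal driver for this default-sort job might look like the sketch below. The class name ComparedDefaultJob and the args[0]/args[1] input/output paths are assumptions added for illustration (they are not from the original article), and the sketch assumes ComparedDefaultMap and ComparedDefaultReduce are visible from the driver class.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch only: class name and command-line paths are assumptions.
public class ComparedDefaultJob {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        Job job = new Job(configuration, "default-compared");
        job.setJarByClass(ComparedDefaultJob.class);
        job.setMapperClass(ComparedDefaultMap.class);
        job.setReducerClass(ComparedDefaultReduce.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}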

The result of the default sort:
1979 1979 02
1989 1989 01
1990 1990 03
1991 1991 06
1991 1991 08
1991 1991 07
2000 2000 04
But the desired result is:

1979 1979 02
1989 1989 01
1990 1990 03
1991 1991 06
1991 1991 07
1991 1991 08
2000 2000 04


So a custom sort is needed: a secondary sort that also orders by the second column.

Custom Writable

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

/**
 * Custom composite key: sorts by {@code first} and, when {@code first} is equal,
 * by {@code second}.
 */
public class ComparedKey implements WritableComparable<ComparedKey> {
    long first;
    long second;

    public ComparedKey() {
        // No-arg constructor required by Hadoop for deserialization.
    }

    public ComparedKey(long first, long second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public int compareTo(ComparedKey o) {
        // Compare on the primary field first, then fall back to the secondary field.
        // Long.compare avoids the overflow risk of subtracting and casting to int.
        int cmp = Long.compare(first, o.first);
        if (cmp != 0) {
            return cmp;
        }
        return Long.compare(second, o.second);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(first);
        out.writeLong(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readLong();
        second = in.readLong();
    }
}
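
As a quick sanity check of the ordering, sorting a few keys in plain Java reproduces the desired year-then-value order. This snippet is a sketch added for illustration; it assumes it runs in the same package as ComparedKey (the first and second fields are package-private), and the ComparedKeyCheck class name is an assumption.

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ComparedKeyCheck {
    public static void main(String[] args) {
        // WritableComparable extends Comparable, so Collections.sort uses compareTo.
        List<ComparedKey> keys = Arrays.asList(
                new ComparedKey(1991, 8),
                new ComparedKey(1991, 6),
                new ComparedKey(1979, 2));
        Collections.sort(keys);
        for (ComparedKey k : keys) {
            System.out.println(k.first + " " + k.second);   // 1979 2, then 1991 6, then 1991 8
        }
    }
}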

Map stage

public static class ComparedMapper extends Mapper<LongWritable, Text, ComparedKey, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Build a composite key from both columns so the shuffle sorts on
            // (first, second) rather than on the first column alone.
            String line = value.toString();
            String[] str = line.split(" ");
            if (str.length > 1) {   // input lines have two columns, e.g. "1991 06"
                long first = Long.parseLong(str[0]);
                long second = Long.parseLong(str[1]);
                context.write(new ComparedKey(first, second), new Text(line));
            }
        }
    }

Reduce stage

public static class CompareReducer extends Reducer<ComparedKey, Text, LongWritable, Text> {
        @Override
        protected void reduce(ComparedKey key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // Values arrive in (first, second) order; write the first field back out
            // as the key and the original line as the value.
            for (Text text : values) {
                context.write(new LongWritable(key.first), text);
            }
        }
    }

Running the Job

Job job = new Job(configuration, "compared");
job.setJarByClass(ComparedToTest.class);
job.setMapperClass(ComparedMapper.class);
job.setReducerClass(CompareReducer.class);
job.setMapOutputKeyClass(ComparedKey.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
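
The configuration above only wires up the classes; the job still needs input/output paths and a call that actually submits it. A minimal sketch of the remaining lines, using org.apache.hadoop.mapreduce.lib.input.FileInputFormat and org.apache.hadoop.mapreduce.lib.output.FileOutputFormat (the args[0]/args[1] paths are assumptions for illustration):

// Sketch: command-line input/output paths are an assumption.
job.setMapOutputValueClass(Text.class);                  // map emits Text values; made explicit here
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);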

Final result:

1979 1979 02
1989 1989 01
1990 1990 03
1991 1991 06
1991 1991 07
1991 1991 08
2000 2000 04
