Hadoop Custom Keys

Introduction to custom keys

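In Hadoop, a custom key is built out of Writable types. If you start from plain Java data types, they still have to be converted into Writable types in the end.
A custom key must implement the WritableComparable interface, because keys are serialized when they move between tasks and are sorted during the shuffle. For the details, see the article
"Hadoop's Writable serialization interface".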
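To make "built out of Writable types" concrete, here is a minimal JDK-only sketch (no Hadoop dependency) of how a key with one Text field and one IntWritable field is laid out in bytes. KeyLayoutSketch and its helpers are hypothetical names; the layout assumes Text writes a VInt length followed by the UTF-8 bytes and IntWritable writes a 4-byte big-endian int, and the VInt helpers are simplified to non-negative values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only stand-in for a (Text, IntWritable) key; no Hadoop classes involved.
class KeyLayoutSketch {
    String value; // plays the role of the Text field
    int flag;     // plays the role of the IntWritable field

    KeyLayoutSketch(String value, int flag) {
        this.value = value;
        this.flag = flag;
    }

    // Assumed layout: Text = VInt byte length + UTF-8 bytes, then IntWritable = 4-byte big-endian int.
    void write(DataOutput out) throws IOException {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, utf8.length);
        out.write(utf8);
        out.writeInt(flag);
    }

    // Fields are read back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        byte[] utf8 = new byte[readVInt(in)];
        in.readFully(utf8);
        value = new String(utf8, StandardCharsets.UTF_8);
        flag = in.readInt();
    }

    static byte[] serialize(KeyLayoutSketch key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static KeyLayoutSketch deserialize(byte[] data) {
        try {
            KeyLayoutSketch key = new KeyLayoutSketch("", 0); // start empty, then fill from bytes
            key.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simplified Hadoop-style VInt (non-negative only): 0..127 fit in one byte; larger values
    // use a marker byte (-112 - byteCount) followed by the value's big-endian bytes.
    static void writeVInt(DataOutput out, int i) throws IOException {
        if (i < 0) throw new IllegalArgumentException("sketch supports non-negative values only");
        if (i <= 127) {
            out.writeByte(i);
            return;
        }
        int byteCount = 0;
        for (int tmp = i; tmp != 0; tmp >>>= 8) byteCount++;
        out.writeByte(-112 - byteCount);
        for (int idx = byteCount; idx > 0; idx--) out.writeByte(i >>> ((idx - 1) * 8));
    }

    static int readVInt(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) return first; // single-byte value
        int byteCount = -112 - first;
        int value = 0;
        for (int idx = 0; idx < byteCount; idx++) value = (value << 8) | (in.readByte() & 0xFF);
        return value;
    }
}
```

For example, the key ("ab", 7) serializes to the seven bytes 2, 'a', 'b', 0, 0, 0, 7. readFields has to consume the fields in the same order write produced them, which is why MyKeyWritable writes and reads value before flag.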

Custom key example

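import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.io.WritableUtils;

public class MyKeyWritable implements WritableComparable<MyKeyWritable> {
    // Byte-level comparators that ship with Text and IntWritable
    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
    private static final IntWritable.Comparator INT_COMPARATOR = new IntWritable.Comparator();

    private Text value;
    private IntWritable flag;

    // Hadoop creates keys reflectively, so a no-arg constructor is required
    public MyKeyWritable() {
        this.set(new Text(), new IntWritable());
    }

    public MyKeyWritable(Text value, IntWritable flag) {
        this.set(value, flag);
    }

    public void set(Text value, IntWritable flag) {
        this.value = value;
        this.flag = flag;
    }

    public Text getValue() {
        return value;
    }

    public IntWritable getFlag() {
        return flag;
    }

    // Serialize the fields in a fixed order: Text first, then IntWritable
    @Override
    public void write(DataOutput out) throws IOException {
        this.value.write(out);
        this.flag.write(out);
    }

    // Deserialize in exactly the same order as write()
    @Override
    public void readFields(DataInput in) throws IOException {
        this.value.readFields(in);
        this.flag.readFields(in);
    }

    // Must agree with equals(): HashPartitioner picks the reducer from hashCode(),
    // and the identity hash from super.hashCode() would scatter equal keys
    @Override
    public int hashCode() {
        return value.hashCode() * 163 + flag.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof MyKeyWritable))
            return false;
        MyKeyWritable sw = (MyKeyWritable) obj;
        return this.value.equals(sw.value) && this.flag.equals(sw.flag);
    }

    @Override
    public String toString() {
        return this.value.toString() + "|" + this.flag.get();
    }

    public static class Comparator extends WritableComparator {
        public Comparator() {
            super(MyKeyWritable.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            try {
                // A serialized Text is a VInt length prefix followed by UTF-8 bytes,
                // so the prefix must be read with readVInt (not readInt)
                int thisValueLen = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int thatValueLen = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);

                /*
                   Returns a negative integer, zero, or a positive integer as the first
                   argument is less than, equal to, or greater than the second
                */
                int res1 = TEXT_COMPARATOR.compare(b1, s1, thisValueLen, b2, s2, thatValueLen);
                if (res1 != 0)
                    return res1;

                // The 4-byte IntWritable starts right after the Text bytes
                return INT_COMPARATOR.compare(b1, s1 + thisValueLen, l1 - thisValueLen,
                        b2, s2 + thatValueLen, l2 - thatValueLen);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }
        }
    }

    @Override
    public int compareTo(MyKeyWritable o) {
        int res = this.value.compareTo(o.value);
        if (res != 0)
            return res;
        return this.flag.compareTo(o.flag);
    }

    // Register the raw comparator when the class is initialized
    static {
        WritableComparator.define(MyKeyWritable.class, new Comparator());
    }
}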

Analysis

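The custom key implements the WritableComparable interface: it provides write(DataOutput out) and readFields(DataInput in) from Writable, compareTo(T o) from Comparable, and overrides equals(Object obj) from Object. Because the default HashPartitioner assigns keys to reducers by hashCode(), hashCode() must also be overridden consistently with equals(); the identity hash inherited from Object would send equal keys to different reducers. With that, the custom key is complete.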
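Why hashCode() matters can be shown without Hadoop. This sketch reuses the arithmetic of Hadoop's default HashPartitioner, (hashCode() & Integer.MAX_VALUE) % numReduceTasks; PartitionSketch, IdentityHashKey and FieldHashKey are hypothetical names. With FieldHashKey every equal key lands in the same partition; with IdentityHashKey equal keys usually spread across partitions, so one logical key would be reduced in several places.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical JDK-only illustration of partitioning by hashCode(); not Hadoop code.
class PartitionSketch {
    // Same arithmetic as Hadoop's default HashPartitioner.getPartition.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // equals() compares fields, but hashCode() is Object's identity hash,
    // which is what "return super.hashCode();" amounts to.
    static final class IdentityHashKey {
        final String value;
        final int flag;

        IdentityHashKey(String value, int flag) {
            this.value = value;
            this.flag = flag;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof IdentityHashKey)) return false;
            IdentityHashKey k = (IdentityHashKey) o;
            return value.equals(k.value) && flag == k.flag;
        }
    }

    // Same equals(), plus a hashCode() derived from the same fields.
    static final class FieldHashKey {
        final String value;
        final int flag;

        FieldHashKey(String value, int flag) {
            this.value = value;
            this.flag = flag;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FieldHashKey)) return false;
            FieldHashKey k = (FieldHashKey) o;
            return value.equals(k.value) && flag == k.flag;
        }

        @Override
        public int hashCode() {
            return value.hashCode() * 163 + flag;
        }
    }

    // Counts how many distinct partitions `copies` freshly created (equal) keys fall into.
    static int distinctPartitions(Supplier<Object> makeKey, int copies, int reducers) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < copies; i++) seen.add(partition(makeKey.get(), reducers));
        return seen.size();
    }
}
```

The multiplier 163 is just a conventional odd constant; any hashCode() computed from the same fields that equals() compares keeps equal keys together.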

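Why implement a WritableComparator as a nested class?
compareTo(MyKeyWritable o) is implemented, but it can only compare objects. During data transfer the keys have already been serialized into byte streams, so to use compareTo the framework must first deserialize the bytes back into objects and only then compare them. Deserialization costs CPU time and creates objects. To make comparison faster, extend WritableComparator (or implement the RawComparator interface) and implement compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2): it compares the serialized bytes directly, so no deserialization is needed.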
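The byte-level comparison can be sketched with the JDK alone. RawCompareSketch is a hypothetical name; it assumes the same layout as MyKeyWritable (VInt length, UTF-8 bytes, 4-byte big-endian int) and, for brevity, only handles text shorter than 128 bytes so the length prefix is a single byte. It compares the text bytes as unsigned values and then the ints as signed values, which is the order compareTo gives on deserialized Text and IntWritable fields, but without creating any objects.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch of comparing serialized keys without deserializing them.
class RawCompareSketch {
    // Serializes (value, flag) in the assumed Text + IntWritable layout.
    // Sketch limitation: text must be under 128 UTF-8 bytes (one-byte VInt prefix).
    static byte[] serialize(String value, int flag) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) throw new IllegalArgumentException("sketch: short text only");
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(utf8.length);
            out.write(utf8);
            out.writeInt(flag);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compares two serialized keys starting at s1 and s2; no String or key objects are built.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int n1 = b1[s1]; // text length from the one-byte prefix
        int n2 = b2[s2];
        // 1) Text part: unsigned lexicographic comparison of the UTF-8 bytes
        int cmp = compareBytes(b1, s1 + 1, n1, b2, s2 + 1, n2);
        if (cmp != 0) return cmp;
        // 2) Int part: the 4-byte big-endian int right after the text bytes
        return Integer.compare(readInt(b1, s1 + 1 + n1), readInt(b2, s2 + 1 + n2));
    }

    static int compareBytes(byte[] a, int aStart, int aLen, byte[] b, int bStart, int bLen) {
        for (int i = 0; i < Math.min(aLen, bLen); i++) {
            int x = a[aStart + i] & 0xFF;
            int y = b[bStart + i] & 0xFF;
            if (x != y) return x - y;
        }
        return aLen - bLen; // a proper prefix sorts first
    }

    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xFF) << 24) | ((b[s + 1] & 0xFF) << 16)
                | ((b[s + 2] & 0xFF) << 8) | (b[s + 3] & 0xFF);
    }
}
```

The start offsets matter because in a real sort buffer many records share one byte array; compareRaw only ever reads from the given offset onward.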

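TEXT_COMPARATOR and INT_COMPARATOR are the WritableComparator implementations nested in Text (Text.Comparator) and IntWritable (IntWritable.Comparator). They can be used directly; the custom comparator simply combines them for its own fields. (Browse the Hadoop source code for details.)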
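Reusing TEXT_COMPARATOR only works once you know where the Text bytes end, and that depends on reading the length prefix as a VInt. The hypothetical JDK-only sketch below mirrors the decodeVIntSize logic of Hadoop's WritableUtils (with the value decoding simplified to non-negative numbers) and contrasts it with reading the prefix as a plain 4-byte int.

```java
// Hypothetical JDK-only sketch: finding the end of a serialized Text field.
class LengthPrefixSketch {
    // Total encoded size of a Hadoop-style VInt, judged from its first byte.
    static int decodeVIntSize(byte first) {
        if (first >= -112) return 1;
        if (first < -120) return -119 - first;
        return -111 - first;
    }

    // Reads a non-negative VInt starting at b[s] (sketch: negative values not handled).
    static int readVInt(byte[] b, int s) {
        int size = decodeVIntSize(b[s]);
        if (size == 1) return b[s];
        int value = 0;
        for (int i = 1; i < size; i++) value = (value << 8) | (b[s + i] & 0xFF);
        return value;
    }

    // Reads 4 bytes as a big-endian int, i.e. what readInt would do with the prefix.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xFF) << 24) | ((b[s + 1] & 0xFF) << 16)
                | ((b[s + 2] & 0xFF) << 8) | (b[s + 3] & 0xFF);
    }

    // Bytes occupied by the serialized Text field: prefix size + payload length.
    static int textFieldLength(byte[] b, int s) {
        return decodeVIntSize(b[s]) + readVInt(b, s);
    }
}
```

For the key ("ab", 7), stored as 2, 'a', 'b', 0, 0, 0, 7, the VInt reading gives a Text field of 3 bytes (1 prefix byte + 2 payload bytes), so the int starts at offset 3. Reading the first four bytes as an int instead yields 0x02616200, which would point the comparator far past the end of the record.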

The following code registers this comparator:
static {
    WritableComparator.define(MyKeyWritable.class, new Comparator());
}
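Registration from a static block works because lookup happens by class. As a rough description (check the WritableComparator source for your Hadoop version), define stores the comparator in a static class-to-comparator map, get makes sure the key class has been initialized before looking it up, and an unregistered class falls back to a generic comparator that deserializes both keys and calls compareTo. The sketch below is a hypothetical, simplified JDK-only model of that idea; RegistrySketch, DemoKey and isDefined are invented names, and it returns null where Hadoop would fall back.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a class -> comparator registry (not Hadoop's code).
class RegistrySketch {
    private static final Map<Class<?>, Comparator<?>> REGISTRY = new ConcurrentHashMap<>();

    static void define(Class<?> c, Comparator<?> comparator) {
        REGISTRY.put(c, comparator);
    }

    static boolean isDefined(Class<?> c) {
        return REGISTRY.containsKey(c);
    }

    // Forces class initialization first, so the key class's static block has run,
    // then looks the comparator up. Returns null if nothing was registered.
    static Comparator<?> get(Class<?> c) {
        try {
            Class.forName(c.getName(), true, c.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return REGISTRY.get(c);
    }

    // Stand-in for MyKeyWritable: registers its comparator when the class is initialized.
    static final class DemoKey {
        final int id;

        DemoKey(int id) {
            this.id = id;
        }

        static {
            define(DemoKey.class, Comparator.comparingInt((DemoKey k) -> k.id));
        }
    }
}
```

Note that merely writing DemoKey.class does not run DemoKey's static block; only initialization does, which is why the lookup forces it first.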
