RawComparator


RawComparator is used to compare Writable objects, for example when it is registered on a Job:

Job.setSortComparatorClass(Class<? extends RawComparator> cls);
Job.setGroupingComparatorClass(Class<? extends RawComparator> cls);

 

 

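A Writable that can be used as a key has the following characteristics:

 It must implement the WritableComparable interface;

 It usually comes with a comparator class that extends WritableComparator.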

 

The WritableComparator class, in turn, implements the RawComparator interface.

 

public interface WritableComparable<T> extends Writable, Comparable<T>;

public interface RawComparator<T> extends Comparator<T>;

public class WritableComparator implements RawComparator;

 

 

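One of its methods deserves a closer look:

public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

This method compares two Writable objects in their serialized byte form, without deserializing them. The first object occupies l1 bytes of b1 starting at offset s1; the second occupies l2 bytes of b2 starting at offset s2.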
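To make the (array, start, length) convention concrete, here is a minimal stdlib-only sketch of a range comparison with the same parameter layout. The names `RangeCompareSketch` and `compareRanges` are hypothetical and this is not Hadoop's code; it only illustrates an unsigned, lexicographic comparison of two byte ranges.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of Hadoop.
final class RangeCompareSketch {

    // Compares b1[s1 .. s1+l1) with b2[s2 .. s2+l2) lexicographically,
    // treating every byte as an unsigned value (0..255).
    static int compareRanges(byte[] b1, int s1, int l1,
                             byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] x = "xxapple".getBytes(StandardCharsets.UTF_8);
        byte[] y = "banana".getBytes(StandardCharsets.UTF_8);
        // Skip the two leading 'x' bytes of the first buffer.
        System.out.println(compareRanges(x, 2, 5, y, 0, 6) < 0); // true
    }
}
```

The `& 0xff` mask matters: Java bytes are signed, so without it any byte at or above 0x80 would sort before the ASCII range.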

 

A quick experiment:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

...
private static final Logger log = LoggerFactory.getLogger(...class);

public static void main (String[] args) {
	Text text = new Text(
		"01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789");

	/*
	// Manual equivalent of the serialization: a VInt length prefix, then the UTF-8 bytes.
	CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT);
	CharBuffer charBuffer = CharBuffer.wrap(text.toString().toCharArray());
	ByteBuffer byteBuffer = encoder.encode(charBuffer);
	int len = byteBuffer.limit();

	byte[] byteArray = byteBuffer.array();
	DataOutputBuffer out = new DataOutputBuffer();
	WritableUtils.writeVInt(out, len);
	out.write(byteArray, 0, len);
	out.close();
	// getData() returns the whole backing buffer; only getLength() bytes are valid.
	byte[] b1 = Arrays.copyOf(out.getData(), out.getLength());
	*/
	// Serialize the Text: a VInt length prefix followed by the UTF-8 bytes.
	byte[] b1 = WritableUtils.toByteArray(text);

	int s1 = 0;          // start offset of the record within b1
	int l1 = b1.length;  // total record length, length prefix included
	int n1 = WritableUtils.decodeVIntSize(b1[s1]);  // size of the length prefix

	log.info("[{}, {}]", l1, n1);

	// Skip the prefix to recover the string payload.
	byte[] b2 = Arrays.copyOfRange(b1, s1 + n1, s1 + l1);
	log.info(new String(b2, StandardCharsets.UTF_8));
}

 

Output:

[303, 3]
012345678901234567890123456789012345678901...

 

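When a Text is serialized, it first writes the actual byte length of the string, as a VInt, at the very start of the byte array. The commented-out part of the example above does the same thing by hand, mirroring the write method of Text:

class Text: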
public void write(DataOutput out) throws IOException {
	WritableUtils.writeVInt(out, length);
	out.write(bytes, 0, length);
}
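The length prefix explains the numbers in the experiment: 300 payload bytes plus a 3-byte prefix give 303. The stdlib-only sketch below re-implements the prefix layout as I understand it, restricted to non-negative lengths. `VIntPrefixSketch`, `writeLength`, `prefixSize` and `serialize` are hypothetical names for illustration; real code should call `WritableUtils.writeVInt` and `WritableUtils.decodeVIntSize` instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical re-implementation for illustration; not Hadoop's code.
final class VIntPrefixSketch {

    // Lengths up to 127 take a single byte. Larger lengths take a marker
    // byte (-113 for one value byte, -114 for two, ...) followed by the
    // value bytes in big-endian order.
    static void writeLength(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles lengths >= 0 only");
        }
        if (value <= 127) {
            out.write(value);
            return;
        }
        int marker = -112;
        for (int tmp = value; tmp != 0; tmp >>>= 8) {
            marker--;
        }
        out.write(marker);
        int count = -(marker + 112); // number of value bytes that follow
        for (int i = count - 1; i >= 0; i--) {
            out.write((value >>> (8 * i)) & 0xff);
        }
    }

    // Total prefix size (marker plus value bytes), read from the first byte alone.
    static int prefixSize(byte first) {
        if (first >= -112) {
            return 1;
        } else if (first < -120) {
            return -119 - first;
        }
        return -111 - first;
    }

    // Length prefix followed by the UTF-8 bytes of the string.
    static byte[] serialize(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLength(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 30; i++) {
            sb.append("0123456789");
        }
        byte[] big = serialize(sb.toString());
        System.out.println(big.length + " " + prefixSize(big[0])); // 303 3
        byte[] small = serialize("abc");
        System.out.println(small.length + " " + prefixSize(small[0])); // 4 1
    }
}
```

Because the prefix size can be read from the first byte, a raw comparator can skip straight to the payload without decoding the length itself.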
 

 

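RawComparator<Text> comparator = new RawComparator<Text>() {

	@Override
	public int compare(Text t1, Text t2) {
		return t1.toString().compareTo(t2.toString());
	}

	@Override
	public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
		// Size of the VInt length prefix at the start of each record.
		int n1 = WritableUtils.decodeVIntSize(b1[s1]);
		int n2 = WritableUtils.decodeVIntSize(b2[s2]);

		// This is how Text's own comparison is implemented:
		// return WritableComparator.compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);

		// It can also be done like this: decode the payloads and compare the
		// strings (easier to read, but it allocates objects on every call).
		byte[] _b1 = Arrays.copyOfRange(b1, s1 + n1, s1 + l1);
		byte[] _b2 = Arrays.copyOfRange(b2, s2 + n2, s2 + l2);
		String t1 = new String(_b1, StandardCharsets.UTF_8);
		String t2 = new String(_b2, StandardCharsets.UTF_8);
		return compare(new Text(t1), new Text(t2));
	}

};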
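One caveat on treating the two approaches as interchangeable: comparing UTF-8 bytes and comparing Java strings give the same order for ASCII data like the experiment above, but not for every input. String.compareTo works on UTF-16 code units, so a character outside the Basic Multilingual Plane can sort differently. The stdlib-only sketch below shows one such pair; `OrderMismatchSketch` and `compareUtf8Bytes` are hypothetical names, and the byte comparison is a stand-in for a raw comparison of the payloads.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of Hadoop.
final class OrderMismatchSketch {

    // Unsigned lexicographic comparison of the UTF-8 encodings.
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";          // U+FF5E,  UTF-8 bytes EF BD 9E
        String astral = "\uD83D\uDE00"; // U+1F600, UTF-8 bytes F0 9F 98 80

        // Byte order puts U+FF5E first; UTF-16 code unit order puts it last.
        System.out.println(Integer.signum(compareUtf8Bytes(bmp, astral))); // -1
        System.out.println(Integer.signum(bmp.compareTo(astral)));         // 1

        // For plain ASCII the two orders agree.
        System.out.println(Integer.signum(compareUtf8Bytes("abc", "abd"))); // -1
        System.out.println(Integer.signum("abc".compareTo("abd")));         // -1
    }
}
```

If keys may contain such characters, the sort comparator and any code that relies on its order should use the same definition of ordering.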
