RawComparator is used to compare Writable objects.
For example, a Job takes one for sorting and one for grouping:
Job.setSortComparatorClass(Class<? extends RawComparator>);
Job.setGroupingComparatorClass(Class<? extends RawComparator>);
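A Writable that can be used as a key has the following characteristics:
it must implement the WritableComparable interface;
it usually contains a comparator class that extends WritableComparator.
The WritableComparator class, in turn, implements the RawComparator interface.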
public interface WritableComparable<T> extends Writable, Comparable<T>;
public interface RawComparator<T> extends Comparator<T>;
public class WritableComparator implements RawComparator;
One of its methods deserves a closer look:
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
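This method compares two Writable objects directly on their serialized bytes, without deserializing them. b1 and b2 are the buffers, s1 and s2 are the start offsets of each record, and l1 and l2 are the record lengths in bytes.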
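To make "comparing on bytes" concrete, here is a minimal plain-Java sketch of a lexicographic comparison over two (buffer, start, length) ranges, with every byte treated as unsigned. The class name RawCompareSketch is made up for illustration, and the method only approximates what a raw comparator does; it is not Hadoop's source.

```java
import java.nio.charset.StandardCharsets;

public class RawCompareSketch {

    /** Compares b1[s1, s1+l1) with b2[s2, s2+l2) byte by byte, unsigned. */
    public static int compareBytes(byte[] b1, int s1, int l1,
                                   byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff; // mask so the byte is read as 0..255
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] x = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] y = "apricot".getBytes(StandardCharsets.UTF_8);
        // Negative result: the ranges first differ at index 2, 'p' < 'r'.
        System.out.println(compareBytes(x, 0, x.length, y, 0, y.length));
    }
}
```

Masking with 0xff matters because Java bytes are signed; without it, any byte of 0x80 or above would sort before ASCII characters.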
Let's run an experiment:
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

...

private static final Logger log = LoggerFactory.getLogger(...class);

public static void main(String[] args) {
    Text text = new Text(
            "01234567890123456789012345678901234567890123456789"
          + "01234567890123456789012345678901234567890123456789"
          + "01234567890123456789012345678901234567890123456789"
          + "01234567890123456789012345678901234567890123456789"
          + "01234567890123456789012345678901234567890123456789"
          + "01234567890123456789012345678901234567890123456789");

    /*
    CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
    CharBuffer charBuffer = CharBuffer.wrap(text.toString().toCharArray());
    ByteBuffer byteBuffer = encoder.encode(charBuffer);
    int l1 = byteBuffer.limit();
    byte[] byteArray = byteBuffer.array();
    DataOutputBuffer out = new DataOutputBuffer();
    WritableUtils.writeVInt(out, l1);
    out.write(byteArray, 0, l1);
    out.close();
    byte[] b1 = out.getData();
    */

    // Serialized form: a VInt length prefix followed by the UTF-8 bytes.
    byte[] b1 = WritableUtils.toByteArray(text);
    int s1 = 0;
    // l1 is the total serialized length, as in RawComparator.compare().
    int l1 = b1.length;
    // n1 is the number of bytes taken up by the VInt length prefix.
    int n1 = WritableUtils.decodeVIntSize(b1[s1]);
    log.info("[{}, {}]", l1, n1);

    // Skip the n1 prefix bytes to recover the string payload.
    byte[] b2 = Arrays.copyOfRange(b1, s1 + n1, s1 + l1);
    log.info(new String(b2, StandardCharsets.UTF_8));
}
Output:
[303, 3]
012345678901234567890123456789012345678901...
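When a Text is serialized, it records the actual byte length of the string at the very beginning of the byte array. The commented-out part of the example above does the same thing by hand; compare it with Text.write: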
// In class Text:
public void write(DataOutput out) throws IOException {
    WritableUtils.writeVInt(out, length);
    out.write(bytes, 0, length);
}
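The length prefix is variable-width, which is why a 300-byte string carries a 3-byte prefix. The following plain-Java sketch reproduces the encoding as I understand it from WritableUtils (writeVInt and decodeVIntSize). The class VIntSketch and its method names are invented for this illustration, and it only handles non-negative values, so treat it as an approximation rather than Hadoop's actual code.

```java
import java.io.ByteArrayOutputStream;

public class VIntSketch {

    /** Encodes a non-negative length as a variable-width prefix. */
    public static byte[] encodeLength(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles lengths >= 0 only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value <= 127) {
            out.write(value); // small values fit in a single byte
            return out.toByteArray();
        }
        int len = -112;
        for (int tmp = value; tmp != 0; tmp >>= 8) {
            len--; // one step per payload byte needed
        }
        out.write(len); // marker byte: -113 means 1 byte follows, -114 means 2, ...
        for (int idx = -(len + 112); idx != 0; idx--) {
            out.write((value >> ((idx - 1) * 8)) & 0xff); // big-endian payload
        }
        return out.toByteArray();
    }

    /** Returns the total prefix size (marker + payload) given its first byte. */
    public static int decodeSize(byte first) {
        if (first >= -112) {
            return 1;
        } else if (first < -120) {
            return -119 - first; // range used for negative values; unused here
        }
        return -111 - first;
    }

    public static void main(String[] args) {
        int payloadLength = 300; // byte length of the string in the experiment
        byte[] prefix = encodeLength(payloadLength);
        // Prints "303 3": total serialized length and prefix size.
        System.out.println((prefix.length + payloadLength) + " " + decodeSize(prefix[0]));
    }
}
```

Because the first byte alone reveals the prefix size, a raw comparator can skip the prefix with a single lookup and go straight to the string bytes.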
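// A hand-written RawComparator for Text keys
// (needs java.util.Arrays and java.nio.charset.StandardCharsets).
RawComparator<Text> comparator = new RawComparator<Text>() {
    @Override
    public int compare(Text t1, Text t2) {
        return t1.toString().compareTo(t2.toString());
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Size of the VInt length prefix at the start of each record.
        int n1 = WritableUtils.decodeVIntSize(b1[s1]);
        int n2 = WritableUtils.decodeVIntSize(b2[s2]);

        // This is how Text itself implements the comparison:
        // WritableComparator.compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);

        // But it can also be done like this (slower, since it copies
        // and decodes both records on every comparison):
        byte[] _b1 = Arrays.copyOfRange(b1, s1 + n1, s1 + l1);
        byte[] _b2 = Arrays.copyOfRange(b2, s2 + n2, s2 + l2);
        String t1 = new String(_b1, StandardCharsets.UTF_8);
        String t2 = new String(_b2, StandardCharsets.UTF_8);
        return compare(new Text(t1), new Text(t2));
    }
};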