
When HBase is used for point-query workloads, a secondary index is often needed, and HBase over Solr is a fairly flexible, high-performance way to provide one.




Exception in thread "main" java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 2
    at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:629)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:799)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:775)
    at transform.BytesTransform.main(BytesTransform.java:11)


For example, suppose you want to store the int 29 in an HBase cell value. There are two ways you might write it:

byte[] intBytes = Bytes.toBytes(29);

byte[] stringBytes = Bytes.toBytes("29");

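In Java, an int is 4 bytes, so Bytes.toBytes(29) produces a 4-byte array. The string "29", on the other hand, is encoded as UTF-8, and its two ASCII characters take only 2 bytes. Reading those 2 bytes back with Bytes.toInt is what triggers the exception above.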
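The size difference can be checked without HBase on the classpath. The sketch below uses java.nio.ByteBuffer as a stand-in for Bytes.toBytes(int), assuming the same 4-byte big-endian layout, and String.getBytes(UTF_8) as a stand-in for Bytes.toBytes(String). The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative stand-ins for Bytes.toBytes(int) and Bytes.toBytes(String); not HBase code.
final class ByteLengthDemo {

    // Encode an int as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Encode a string as UTF-8; each ASCII digit takes one byte.
    static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(intToBytes(29).length);      // 4
        System.out.println(stringToBytes("29").length); // 2
    }
}
```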



When hbase-indexer converts a byte[] from HBase into a concrete type, it relies on the HBase utility class org.apache.hadoop.hbase.util.Bytes throughout. Part of the source is shown here:

package com.ngdata.hbaseindexer.parse;

import java.math.BigDecimal;
import java.util.Collection;

import com.google.common.collect.ImmutableList;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Contains factory methods for {@link ByteArrayValueMapper}s.
 */
public class ByteArrayValueMappers {

    private static Log log = LogFactory.getLog(ByteArrayValueMappers.class);

    private static final ByteArrayValueMapper INT_MAPPER = new AbstractByteValueMapper(int.class) {

        @Override
        protected Object mapInternal(byte[] input) {
            return Bytes.toInt(input);
        }
    };

    private static final ByteArrayValueMapper LONG_MAPPER = new AbstractByteValueMapper(long.class) {

        @Override
        protected Object mapInternal(byte[] input) {
            return Bytes.toLong(input);
        }
    };

    private static final ByteArrayValueMapper STRING_MAPPER = new AbstractByteValueMapper(String.class) {

        @Override
        protected Object mapInternal(byte[] input) {
            return Bytes.toString(input);
        }
    };

    // ... (the remaining mappers are omitted from this excerpt)

    public static ByteArrayValueMapper getMapper(String mapperType) {
        if ("int".equals(mapperType)) {
            return INT_MAPPER;
        } else if ("long".equals(mapperType)) {
            return LONG_MAPPER;
        } else if ("string".equals(mapperType)) {
            return STRING_MAPPER;
        } else if ("boolean".equals(mapperType)) {
            return BOOLEAN_MAPPER;
        } else if ("float".equals(mapperType)) {
            return FLOAT_MAPPER;
        } else if ("double".equals(mapperType)) {
            return DOUBLE_MAPPER;
        } else if ("short".equals(mapperType)) {
            return SHORT_MAPPER;
        } else if ("bigdecimal".equals(mapperType)) {
            return BIG_DECIMAL_MAPPER;
        } else {
            // Any other type name is treated as the class name of a custom mapper.
            return instantiateCustomMapper(mapperType);
        }
    }

    private static ByteArrayValueMapper instantiateCustomMapper(String className) {
        Object obj;
        try {
            obj = Class.forName(className).newInstance();
        } catch (Throwable e) {
            throw new IllegalArgumentException("Can't instantiate custom mapper class '" + className + "'", e);
        }

        if (obj instanceof ByteArrayValueMapper) {
            return (ByteArrayValueMapper)obj;
        } else {
            throw new IllegalArgumentException(obj.getClass() + " does not implement "
                    + ByteArrayValueMapper.class.getName());
        }
    }
}

Let's look at what HBase's Bytes.toInt(byte[] bytes) actually does. It delegates to the method below, with offset 0 and length 4.

  public static int toInt(byte[] bytes, int offset, final int length) {
    if (length != SIZEOF_INT || offset + length > bytes.length) {
      throw explainWrongLengthOrOffset(bytes, offset, length, SIZEOF_INT);
    }
    // (the condition guarding the Unsafe fast path is missing from this excerpt)
    if (...) {
      return toIntUnsafe(bytes, offset);
    } else {
      int n = 0;
      for(int i = offset; i < (offset + length); i++) {
        n <<= 8;
        n ^= bytes[i] & 0xFF;
      }
      return n;
    }
  }

Now look at the error that explainWrongLengthOrOffset builds.

private static IllegalArgumentException
    explainWrongLengthOrOffset(final byte[] bytes,
                               final int offset,
                               final int length,
                               final int expectedLength) {
    String reason;
    if (length != expectedLength) {
      reason = "Wrong length: " + length + ", expected " + expectedLength;
    } else {
      reason = "offset (" + offset + ") + length (" + length + ") exceed the"
        + " capacity of the array: " + bytes.length;
    }
    return new IllegalArgumentException(reason);
}
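To see the failure in isolation, here is a minimal sketch that mirrors the bounds check and the big-endian loop in plain Java. It is not the HBase implementation: ToIntSketch and its method are hypothetical names, and the Unsafe fast path is left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone mirror of the check and loop in Bytes.toInt; not HBase code.
final class ToIntSketch {

    static final int SIZEOF_INT = Integer.BYTES;

    static int toInt(byte[] bytes, int offset, int length) {
        if (length != SIZEOF_INT) {
            throw new IllegalArgumentException(
                    "Wrong length: " + length + ", expected " + SIZEOF_INT);
        }
        if (offset + length > bytes.length) {
            throw new IllegalArgumentException("offset (" + offset + ") + length (" + length
                    + ") exceed the capacity of the array: " + bytes.length);
        }
        int n = 0;
        for (int i = offset; i < offset + length; i++) {
            n <<= 8;               // make room for the next byte
            n ^= bytes[i] & 0xFF;  // append it as an unsigned value
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(toInt(new byte[] {0, 0, 0, 29}, 0, 4)); // 29
        try {
            // The 2-byte string encoding of "29" cannot be read as a 4-byte int.
            toInt("29".getBytes(StandardCharsets.UTF_8), 0, 4);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the 2-byte input, the message has the same shape as the one in the stack trace at the top of this post.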



Because getMapper treats any unrecognized type name as the class name of a custom mapper, values that were written as strings can be handled by your own ByteArrayValueMapper. The mapper below reads the bytes as a string and parses it as an int:

package customization.hyren.type;

import com.google.common.collect.ImmutableList;
import com.ngdata.hbaseindexer.parse.ByteArrayValueMapper;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.Collection;

public class Hint implements ByteArrayValueMapper {

    private static Log log = LogFactory.getLog(Hint.class);

    /**
     * Map a byte array to a collection of values. The returned collection can be empty.
     * <p>
     * If a value cannot be mapped as requested, it should log the error and return an empty collection.
     *
     * @param input byte array to be mapped
     * @return mapped values
     */
    @Override
    public Collection map(byte[] input) {
        try {
            return ImmutableList.of(mapInternal(Bytes.toString(input)));
        } catch (IllegalArgumentException e) {
            log.warn(
                    String.format("Error mapping byte value %s to %s", Bytes.toStringBinary(input),
                            int.class.getName()), e);
            return ImmutableList.of();
        }
    }

    private int mapInternal(String toString) {
        return Integer.parseInt(toString);
    }
}
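The core of this mapper can be exercised without HBase or hbase-indexer on the classpath. A minimal sketch, assuming the cell holds the decimal string form of the number: StringIntMapperSketch is a hypothetical name, and java.util collections stand in for Guava's ImmutableList.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.Collections;

// Hypothetical stand-alone version of the string-to-int mapping; not hbase-indexer code.
final class StringIntMapperSketch {

    // Decode the bytes as UTF-8 text and parse them as an int.
    // Returns an empty collection when the text is not a valid integer.
    static Collection<Integer> map(byte[] input) {
        try {
            String text = new String(input, StandardCharsets.UTF_8);
            return Collections.singletonList(Integer.parseInt(text));
        } catch (NumberFormatException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        System.out.println(map("29".getBytes(StandardCharsets.UTF_8)));  // [29]
        System.out.println(map("abc".getBytes(StandardCharsets.UTF_8))); // []
    }
}
```

Returning an empty collection on bad input follows the contract stated in the mapper's Javadoc: a value that cannot be mapped is skipped rather than aborting the indexing run.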



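In practice, this kind of HBase synchronization writes the HBase rowkey into Solr's uniqueKey field, which is id by default. If your Solr uniqueKey is not the default id, you can set it with unique-key-field in the hbase-indexer configuration file.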



However, there is an interface called UniqueTableKeyFormatter.


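No implementation of this interface is provided. Even so, you can clearly implement it yourself, because the method that calls it to determine the documentId passes in the table name.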




package customization.hyren.formatter;

import com.google.common.base.Joiner;
import com.google.common.base.Preconditions;
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import com.ngdata.hbaseindexer.conf.IndexerConf;
import com.ngdata.hbaseindexer.uniquekey.BaseUniqueKeyFormatter;
import com.ngdata.hbaseindexer.uniquekey.UniqueTableKeyFormatter;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.List;

/**
 * customize value in solr uniquekey like ${table name} + ${ rowkey }
 */
public class UniqueTableKeyFormatterImpl implements UniqueTableKeyFormatter {

    private static final HyphenEscapingUniqueKeyFormatter hyphenEscapingFormatter = new HyphenEscapingUniqueKeyFormatter();
    private static final Splitter SPLITTER = Splitter.onPattern("(?
    // [The listing is truncated at this point in the source: the rest of the SPLITTER
    //  pattern, the JOINER declaration used below, and whatever preceded the following
    //  Javadoc are missing. See the repository linked at the end for the full class.]

    /**
     * Called as part of column-based mapping, {@link IndexerConf.MappingType#COLUMN}.
     *
     * @param family    family bytes to be formatted
     * @param tableName
     */
    @Override
    public String formatFamily(byte[] family, byte[] tableName) {
        Preconditions.checkNotNull(family, "family");
        return encodeAsString(family);
    }

    /**
     * Format a {@code KeyValue} into a human-readable form. Only the row, column family, and qualifier
     * of the {@code KeyValue} will be encoded.
     * <p>
     * Called in case of column-based mapping, {@link IndexerConf.MappingType#COLUMN}.
     *
     * @param keyValue  value to be formatted
     * @param tableName
     */
    @Override
    public String formatKeyValue(KeyValue keyValue, byte[] tableName) {
        return hyphenEscapingFormatter.formatKeyValue(keyValue);
    }

    /**
     * Format a row key into a human-readable form.
     *
     * @param row row key to be formatted
     */
    @Override
    public String formatRow(byte[] row) {
        Preconditions.checkNotNull(row, "row");
        return encodeAsString(row);
    }

    /**
     * Format a column family value into a human-readable form.
     * <p>
     * Called as part of column-based mapping, {@link IndexerConf.MappingType#COLUMN}.
     *
     * @param family family bytes to be formatted
     */
    @Override
    public String formatFamily(byte[] family) {
        Preconditions.checkNotNull(family, "family");
        return encodeAsString(family);
    }

    /**
     * Format a {@code KeyValue} into a human-readable form. Only the row, column family, and qualifier
     * of the {@code KeyValue} will be encoded.
     * <p>
     * Called in case of column-based mapping, {@link IndexerConf.MappingType#COLUMN}.
     *
     * @param keyValue value to be formatted
     */
    @Override
    public String formatKeyValue(KeyValue keyValue) {
        return JOINER.join(encodeAsString(keyValue.getRow()), encodeAsString(keyValue.getFamily()),
                encodeAsString(keyValue.getQualifier()));
    }

    /**
     * Perform the reverse formatting of a row key.
     *
     * @param keyString the formatted row key
     * @return the unformatted row key
     */
    @Override
    public byte[] unformatRow(String keyString) {
        return decodeFromString(keyString);
    }

    /**
     * Perform the reverse formatting of a column family value.
     *
     * @param familyString the formatted column family string
     * @return the unformatted column family value
     */
    @Override
    public byte[] unformatFamily(String familyString) {
        return decodeFromString(familyString);
    }

    /**
     * Perform the reverse formatting of a {@code KeyValue}.
     * <p>
     * The returned KeyValue will only have the row key, column family, and column qualifier filled in.
     *
     * @param keyValueString the formatted {@code KeyValue}
     * @return the unformatted {@code KeyValue}
     */
    @Override
    public KeyValue unformatKeyValue(String keyValueString) {
        List<String> parts = Lists.newArrayList(SPLITTER.split(keyValueString));
        if (parts.size() != 3) {
            throw new IllegalArgumentException("Value cannot be split into row, column family, qualifier: "
                    + keyValueString);
        }
        byte[] rowKey = decodeFromString(parts.get(0));
        byte[] columnFamily = decodeFromString(parts.get(1));
        byte[] columnQualifier = decodeFromString(parts.get(2));
        return new KeyValue(rowKey, columnFamily, columnQualifier);
    }

    private static class HyphenEscapingUniqueKeyFormatter extends BaseUniqueKeyFormatter {

        @Override
        protected String encodeAsString(byte[] bytes) {
            String encoded = Bytes.toString(bytes);
            if (encoded.indexOf('-') > -1) {
                encoded = encoded.replace("-", "\\-");
            }
            return encoded;
        }

        @Override
        protected byte[] decodeFromString(String value) {
            if (value.contains("\\-")) {
                value = value.replace("\\-", "-");
            }
            return Bytes.toBytes(value);
        }
    }

    private String encodeAsString(byte[] bytes) {
        return Bytes.toString(bytes);
    }

    private byte[] decodeFromString(String value) {
        return Bytes.toBytes(value);
    }
}

This implementation is based on the default StringUniqueKeyFormatter, with only a few small changes.
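The idea of a table-qualified unique key with hyphen escaping can be sketched in plain Java. This is a minimal illustration, assuming the key is the table name and the row key joined by a hyphen, with literal hyphens escaped by a backslash. TableKeySketch, format, and parse are hypothetical names, and the split pattern is an assumption rather than a quote from hbase-indexer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "<table>-<row>" key with escaped hyphens; not hbase-indexer code.
final class TableKeySketch {

    // Escape literal hyphens so they cannot be confused with the separator.
    static String escape(String part) {
        return part.replace("-", "\\-");
    }

    // Reverse of escape.
    static String unescape(String part) {
        return part.replace("\\-", "-");
    }

    // Build "<table>-<row>" from the escaped parts.
    static String format(String tableName, String rowKey) {
        return escape(tableName) + "-" + escape(rowKey);
    }

    // Split on hyphens that are not preceded by a backslash, then unescape each part.
    static List<String> parse(String key) {
        List<String> parts = new ArrayList<>();
        for (String part : key.split("(?<!\\\\)-")) {
            parts.add(unescape(part));
        }
        return parts;
    }

    public static void main(String[] args) {
        String key = format("user-table", "row-1");
        System.out.println(key);        // user\-table-row\-1
        System.out.println(parse(key)); // [user-table, row-1]
    }
}
```

Prefixing the table name keeps documents from different HBase tables apart in one Solr collection even when their row keys collide. Note that this sketch does not handle parts that end in a backslash.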

Package the classes you implemented into a jar, then add it both to hbase-indexer-mr-*-job.jar and to the ${hbase-solr}/lib/ directory. After that you can run batch indexing.



Note: the F- and FI- prefixes used above are Solr dynamic field type settings, and they differ by type. They have to be defined in Solr's schema.xml, which is not covered here.


yarn \
--config /etc/hive/conf \
jar /home/hyshf/mick/lib/hbase-indexer-mr-job.jar \
--conf /etc/hbase/conf/hbase-site.xml \
--zk-host hdp001:2181,hdp002:2181,hdp003:2181/solr \
--hbase-indexer-file /home/hyshf/mick/mick-solr-conf.xml \
--collection HbaseOverSolr \
--go-live \
--reducers 0

The complete examples above are available at https://github.com/mickyuan/hbaseindexer.git

