How RoaringBitmap Works: An Analysis

Purpose

The static factory method bitmapOf builds a RoaringBitmap from a set of int values:

public static RoaringBitmap bitmapOf(final int... dat) {
    final RoaringBitmap ans = new RoaringBitmap();
    ans.add(dat);
    return ans;
}

How it works

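  1. Initializing the bitmap
    final RoaringBitmap ans = new RoaringBitmap();
    During initialization, the no-arg constructor creates a new RoaringArray object and assigns it to the field highLowContainer. highLowContainer holds two important arrays, both with an initial capacity of 4: keys, a short[] that stores the high 16 bits, and values, a Container[] whose containers store the low 16 bits.
    A 32-bit int is split evenly into two 16-bit halves: the high bits and the low bits.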
  public RoaringBitmap() {
    highLowContainer = new RoaringArray();
  }
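To make the 16/16 split concrete, here is a minimal JDK-only sketch. The class IntSplitDemo and its methods are hypothetical; they are modeled on what Util.highbits, Util.lowbits and Util.toIntUnsigned are used for in the code in this article, not copied from the library.

```java
// Hypothetical sketch: split a 32-bit int into a 16-bit key (high bits)
// and a 16-bit value (low bits), and put it back together.
final class IntSplitDemo {
    // Upper 16 bits: the key stored in highLowContainer's short array.
    static short highbits(int x) {
        return (short) (x >>> 16);
    }

    // Lower 16 bits: the value stored inside a Container.
    static short lowbits(int x) {
        return (short) x;
    }

    // Widen a short to 0..65535 without sign extension.
    static int toIntUnsigned(short s) {
        return s & 0xFFFF;
    }

    // Reassemble the original int from its two halves.
    static int combine(short high, short low) {
        return (toIntUnsigned(high) << 16) | toIntUnsigned(low);
    }
}
```

For example, 131077 (2 × 65536 + 5) splits into the key 2 and the low value 5, while -1 splits into two halves that both read as 65535 once widened without sign extension. This is why keys and low values must be compared as unsigned numbers.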

The RoaringArray class

public final class RoaringArray implements Cloneable, Externalizable {
  static final int INITIAL_CAPACITY = 4;
  short[] keys = null;
  Container[] values = null;
  protected RoaringArray() {
    this.keys = new short[INITIAL_CAPACITY];
    this.values = new Container[INITIAL_CAPACITY];
  }
}
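  2. Adding data to the bitmap
    There are currently three Container types: ArrayContainer, BitmapContainer and RunContainer. The field cardinality records how many values a container holds.
    The first two types switch automatically as the number of values changes: an ArrayContainer holds at most 4096 (2^12) values, and when a new value is added to a full one it becomes a BitmapContainer, which can hold up to 65536 (2^16) values.
    A RunContainer can be obtained by calling runOptimize, which converts an ArrayContainer or BitmapContainer into a RunContainer to reduce memory usage. Note that calling runOptimize does not necessarily change the container type: the method compares memory footprints and converts the container only if the RunContainer would be smaller; otherwise the container is left unchanged.
    The first element:
    A new ArrayContainer is created; at its core it is a short array with an initial capacity of 4.
    The high 16 bits of the element are computed and stored at index 0 of highLowContainer's short array. The low 16 bits are then computed and inserted into the ArrayContainer at their sorted position (values are always kept in order). Finally, the ArrayContainer is stored at index 0 of highLowContainer's Container array.
    Subsequent elements:
    The high 16 bits of the element are computed, and highLowContainer's short array is searched for that key. The lookup is optimized: if the last key equals the value, its index is returned immediately; otherwise a hybrid binary search is used. If the key exists, the low 16 bits are added to the container at the same index of the Container array; that container may be any of the three types, depending on how many values it holds. If the key does not exist, the search returns the encoded insertion point -(insertionPoint) - 1, and the new key and a new ArrayContainer holding the low 16 bits are inserted at that position into the short array and the Container array respectively. Keys therefore stay sorted, and the insertion is an append only when the new key is larger than every existing key. In add(int...), when the next value has the same high bits as the previous one, the lookup is skipped and the current container is reused.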
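The size check behind runOptimize can be modeled roughly as follows. This is a simplified sketch, not the library's implementation: RunOptimizeDemo is a hypothetical name, and the byte costs (2 bytes per value for an array, 8192 bytes for a bitmap, 2 bytes plus 4 bytes per run for a run container) are assumptions based on the serialized container sizes.

```java
// Hypothetical model of the runOptimize() decision: convert to a run
// container only when it is strictly smaller than the current form.
// Byte costs are assumptions: array = 2 * cardinality, bitmap = 8192,
// run = 2 + 4 * numberOfRuns.
final class RunOptimizeDemo {
    static final int BITMAP_BYTES = 65536 / 8; // 1024 longs * 8 bytes

    // Count maximal runs of consecutive values in a sorted array of
    // unsigned 16-bit values stored as shorts.
    static int numberOfRuns(short[] sorted, int cardinality) {
        int runs = 0;
        for (int i = 0; i < cardinality; i++) {
            int v = sorted[i] & 0xFFFF;
            // A new run starts unless v directly follows the previous value.
            if (i == 0 || v != (sorted[i - 1] & 0xFFFF) + 1) {
                runs++;
            }
        }
        return runs;
    }

    static int arrayBytes(int cardinality) {
        return 2 * cardinality;
    }

    static int runBytes(int runs) {
        return 2 + 4 * runs;
    }

    // Returns the kind of container that would be kept.
    static String choose(short[] sorted, int cardinality) {
        boolean isArray = cardinality <= 4096;
        int current = isArray ? arrayBytes(cardinality) : BITMAP_BYTES;
        int asRuns = runBytes(numberOfRuns(sorted, cardinality));
        if (asRuns < current) {
            return "run";
        }
        return isArray ? "array" : "bitmap";
    }
}
```

Under this model, 100 consecutive values collapse into a single run (6 bytes instead of 200), while 100 even numbers form 100 runs (402 bytes) and stay an array, which is why runOptimize may leave the container type unchanged.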
public class RoaringBitmap implements Cloneable, Serializable, Iterable<Integer>, Externalizable,
    ImmutableBitmapDataProvider, BitmapDataProvider {
  /**
   * Set all the specified values to true. This can be expected to be slightly
   * faster than calling "add" repeatedly. The provided integers values don't
   * have to be in sorted order, but it may be preferable to sort them from a performance point of
   * view.
   *
   * @param dat set values
   */
  public void add(final int... dat) {
    Container currentcont = null;
    short currenthb = 0;
    int currentcontainerindex = 0;
    int j = 0;
    if(j < dat.length) {
      int val = dat[j];
      currenthb = Util.highbits(val);
      currentcontainerindex = highLowContainer.getIndex(currenthb);
      if (currentcontainerindex >= 0) {
        currentcont = highLowContainer.getContainerAtIndex(currentcontainerindex);
        Container newcont = currentcont.add(Util.lowbits(val));
        if(newcont != currentcont) {
          highLowContainer.setContainerAtIndex(currentcontainerindex, newcont);
          currentcont = newcont;
        }
      } else {
        currentcontainerindex = - currentcontainerindex - 1;
        final ArrayContainer newac = new ArrayContainer();
        currentcont = newac.add(Util.lowbits(val));
        highLowContainer.insertNewKeyValueAt(currentcontainerindex, currenthb, currentcont);
      }
      j++;
    }
    for( ; j < dat.length; ++j) {
      int val = dat[j];
      short newhb = Util.highbits(val);
      if(currenthb == newhb) {// easy case
        // this could be quite frequent
        Container newcont = currentcont.add(Util.lowbits(val));
        if(newcont != currentcont) {
          highLowContainer.setContainerAtIndex(currentcontainerindex, newcont);
          currentcont = newcont;
        }
      } else {
        currenthb = newhb;
        currentcontainerindex = highLowContainer.getIndex(currenthb);
        if (currentcontainerindex >= 0) {
          currentcont = highLowContainer.getContainerAtIndex(currentcontainerindex);
          Container newcont = currentcont.add(Util.lowbits(val));
          if(newcont != currentcont) {
            highLowContainer.setContainerAtIndex(currentcontainerindex, newcont);
            currentcont = newcont;
          }
        } else {
          currentcontainerindex = - currentcontainerindex - 1;
          final ArrayContainer newac = new ArrayContainer();
          currentcont = newac.add(Util.lowbits(val));
          highLowContainer.insertNewKeyValueAt(currentcontainerindex, currenthb, currentcont);
        }
      }
    }
  }
}
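The lookup-then-insert logic described above can be sketched as a self-contained toy. MiniRoaring is hypothetical: it keeps the keys sorted in a short[] with the same last-key fast path and -(insertionPoint) - 1 encoding as the code above, but every container is a plain sorted short[] (no BitmapContainer promotion), it uses an ordinary binary search instead of a hybrid one, and the doubling growth policy is an assumption.

```java
// Hypothetical toy model of RoaringBitmap's add path (array containers only).
final class MiniRoaring {
    short[] keys = new short[4];       // sorted high 16 bits, initial capacity 4
    short[][] values = new short[4][]; // one sorted short[] of low bits per key
    int[] cards = new int[4];          // cardinality of each container
    int size = 0;                      // number of keys in use

    // Index of key hb, or -(insertionPoint) - 1 if it is absent.
    int getIndex(short hb) {
        // Fast path: empty (returns -1, i.e. insert at 0) or last key matches.
        if (size == 0 || keys[size - 1] == hb) {
            return size - 1;
        }
        return unsignedBinarySearch(keys, size, hb);
    }

    // Binary search over the first len entries, comparing as unsigned.
    static int unsignedBinarySearch(short[] a, int len, short key) {
        int target = key & 0xFFFF;
        int lo = 0;
        int hi = len - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int v = a[mid] & 0xFFFF;
            if (v < target) {
                lo = mid + 1;
            } else if (v > target) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }

    // Insert a new key and its container at position i, keeping keys sorted.
    void insertNewKeyValueAt(int i, short hb, short[] container, int card) {
        if (size == keys.length) { // grow by doubling (assumed policy)
            int cap = 2 * keys.length;
            keys = java.util.Arrays.copyOf(keys, cap);
            values = java.util.Arrays.copyOf(values, cap);
            cards = java.util.Arrays.copyOf(cards, cap);
        }
        System.arraycopy(keys, i, keys, i + 1, size - i);
        System.arraycopy(values, i, values, i + 1, size - i);
        System.arraycopy(cards, i, cards, i + 1, size - i);
        keys[i] = hb;
        values[i] = container;
        cards[i] = card;
        size++;
    }

    // Insert a low value into the container at idx, keeping it sorted.
    void addLow(int idx, short lb) {
        short[] c = values[idx];
        int card = cards[idx];
        int loc = unsignedBinarySearch(c, card, lb);
        if (loc >= 0) {
            return; // already present
        }
        int pos = -loc - 1;
        if (card == c.length) {
            c = java.util.Arrays.copyOf(c, 2 * c.length);
            values[idx] = c;
        }
        System.arraycopy(c, pos, c, pos + 1, card - pos);
        c[pos] = lb;
        cards[idx] = card + 1;
    }

    void add(int x) {
        short hb = (short) (x >>> 16);
        short lb = (short) x;
        int i = getIndex(hb);
        if (i >= 0) {
            addLow(i, lb);
        } else {
            short[] c = new short[4]; // new "ArrayContainer" with capacity 4
            c[0] = lb;
            insertNewKeyValueAt(-i - 1, hb, c, 1);
        }
    }

    boolean contains(int x) {
        int i = getIndex((short) (x >>> 16));
        return i >= 0 && unsignedBinarySearch(values[i], cards[i], (short) x) >= 0;
    }
}
```

Adding 5, 131077, 65536 and -1 in that order yields the keys 0, 1, 2 and 65535 (stored as (short) -1): the key 1 is inserted between 0 and 2 rather than appended.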
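  3. Promoting an ArrayContainer to a BitmapContainer
    When a new value is added to an ArrayContainer whose cardinality has already reached the default maximum of 4096 (2^12), the container is promoted to a BitmapContainer.
    A BitmapContainer is essentially a long array whose length is fixed at 1024 (65536 / 64) when it is created, giving one bit for every possible 16-bit value.
    Promotion does two things. First, the ArrayContainer's cardinality is copied into the BitmapContainer's cardinality field. Second, each value of the ArrayContainer is copied into the BitmapContainer: value / 64 selects the long word (so values whose quotient is 0, 1, 2, … fall into different groups), and value % 64 selects the bit within that word, which is set with a bitwise OR. Each group of values is thereby merged into a single long in the BitmapContainer's long array. The value that triggered the promotion is then added to the new BitmapContainer.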
public final class ArrayContainer extends Container implements Cloneable {
    private static final int DEFAULT_INIT_SIZE = 4;
    static final int DEFAULT_MAX_SIZE = 4096;

    protected int cardinality = 0; // number of values stored
    short[] content;               // sorted low 16-bit values
    // (other members omitted)

    @Override
    public Container add(final short x) {
        int loc = Util.unsignedBinarySearch(content, 0, cardinality, x);
        if (loc < 0) {
            // Transform the ArrayContainer to a BitmapContainer
            // when cardinality = DEFAULT_MAX_SIZE
            if (cardinality >= DEFAULT_MAX_SIZE) {
                BitmapContainer a = this.toBitmapContainer();
                a.add(x);
                return a;
            }
            if (cardinality >= this.content.length) {
                increaseCapacity();
            }
            // insertion: shift the elements > x by one position to
            // the right and put x in its appropriate place
            System.arraycopy(content, -loc - 1, content, -loc, cardinality + loc + 1);
            content[-loc - 1] = x;
            ++cardinality;
        }
        return this;
    }

    @Override
    public BitmapContainer toBitmapContainer() {
        BitmapContainer bc = new BitmapContainer();
        bc.loadData(this);
        return bc;
    }
}
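The promotion rule and the word/bit arithmetic of loadData can be checked with plain arrays. PromotionDemo is a hypothetical helper, not part of RoaringBitmap; it assumes the same 4096 threshold and a long[1024] bitmap as the code above.

```java
// Hypothetical sketch of ArrayContainer -> BitmapContainer promotion.
final class PromotionDemo {
    static final int DEFAULT_MAX_SIZE = 4096;

    // Promotion happens only when a NEW value meets a full array.
    static boolean shouldPromote(int cardinality, boolean alreadyPresent) {
        return !alreadyPresent && cardinality >= DEFAULT_MAX_SIZE;
    }

    // Copy sorted unsigned 16-bit values into a 1024-word bitmap:
    // word index = v / 64, bit index = v % 64.
    static long[] toBitmap(short[] content, int cardinality) {
        long[] bitmap = new long[65536 / 64];
        for (int k = 0; k < cardinality; k++) {
            int v = content[k] & 0xFFFF;
            bitmap[v >>> 6] |= 1L << (v & 63);
        }
        return bitmap;
    }

    // Recount set bits, which must equal the copied cardinality.
    static int cardinality(long[] bitmap) {
        int c = 0;
        for (long w : bitmap) {
            c += Long.bitCount(w);
        }
        return c;
    }

    static boolean contains(long[] bitmap, int v) {
        return (bitmap[v >>> 6] & (1L << (v & 63))) != 0;
    }
}
```

Because Java uses only the low 6 bits of a long shift count, the library's 1L << x and the explicit 1L << (v & 63) here set the same bit; for example, the values 64, 80, 96 and 112 all land in word 1 at bits 0, 16, 32 and 48.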

The BitmapContainer

public final class BitmapContainer extends Container implements Cloneable {
    protected static final int MAX_CAPACITY = 1 << 16;
    final long[] bitmap;
    int cardinality;
    // (other members omitted)

    public BitmapContainer() {
        this.cardinality = 0;
        this.bitmap = new long[MAX_CAPACITY / 64]; // 1024 words = 65536 bits
    }

    // Copy the values of an ArrayContainer into this BitmapContainer
    protected void loadData(final ArrayContainer arrayContainer) {
        this.cardinality = arrayContainer.cardinality;
        for (int k = 0; k < arrayContainer.cardinality; ++k) {
            final short x = arrayContainer.content[k];
            // word index = unsigned value / 64; Java masks a long shift count
            // to its low 6 bits, so 1L << x sets bit (value % 64)
            bitmap[Util.toIntUnsigned(x) / 64] |= (1L << x);
        }
    }
}
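  4. Adding data to a BitmapContainer
    The word that holds the new value is located in the BitmapContainer's long array (index x / 64), the value's bit is ORed into the old word, and the result is written back to the same position. The new word may still equal the old one, which happens when the value had already been added. When cardinality is updated, it is incremented by 1 if the new word differs from the old one and left unchanged otherwise. With USE_BRANCHLESS, the same result is computed without a branch: previous ^ newval is either 0 or exactly the new bit, so shifting it right by x (which Java reduces to x % 64) yields 1 or 0.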
  @Override
  public Container add(final short i) {
    final int x = Util.toIntUnsigned(i);
    final long previous = bitmap[x / 64];
    long newval = previous | (1L << x);
    bitmap[x / 64] = newval;
    if (USE_BRANCHLESS) {
      cardinality += (previous ^ newval) >>> x;
    } else if (previous != newval) {
      ++cardinality;
    }
    return this;
  }
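The cardinality bookkeeping can be reproduced in isolation. BitmapAddDemo is a hypothetical class; passing a boolean to choose between the two update strategies is an illustration only (in the code above, the choice comes from USE_BRANCHLESS).

```java
// Hypothetical sketch of BitmapContainer.add's cardinality update.
final class BitmapAddDemo {
    final long[] bitmap = new long[1024]; // 65536 bits
    int cardinality = 0;

    // x must be in 0..65535 (an unsigned 16-bit value).
    void add(int x, boolean branchless) {
        long previous = bitmap[x >>> 6];
        long newval = previous | (1L << x); // shift count is taken mod 64
        bitmap[x >>> 6] = newval;
        if (branchless) {
            // previous ^ newval has at most bit (x % 64) set; shifting it
            // right by x (again mod 64) gives 1 for a new value, else 0.
            cardinality += (int) ((previous ^ newval) >>> x);
        } else if (previous != newval) {
            cardinality++;
        }
    }
}
```

Adding 5, 5, 70 and 65535 gives a cardinality of 3 with either strategy; the duplicate 5 leaves the word unchanged, so nothing is counted.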

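Serialization and deserialization

The two helpers first call runOptimize() so that containers are converted to RunContainers wherever that is smaller, then write the bitmap with serialize and Base64-encode the bytes into a string; deserialization decodes the string and reads the bitmap back with deserialize.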

import java.io.*;
import java.util.Base64;
import org.roaringbitmap.RoaringBitmap;

// Holder class for the two helpers (the class name is illustrative)
public final class RoaringBitmapCodec {

    // Note: runOptimize() converts containers in place, so the caller's bitmap is modified
    public static String serializeToStr(RoaringBitmap bitmap) throws IOException {
        bitmap.runOptimize();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        bitmap.serialize(new DataOutputStream(outputStream));
        return Base64.getEncoder().encodeToString(outputStream.toByteArray());
    }

    public static RoaringBitmap deserializeToBitmap(String src) throws IOException {
        RoaringBitmap bitmap = new RoaringBitmap();
        byte[] bytes = Base64.getDecoder().decode(src);
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        bitmap.deserialize(new DataInputStream(in));
        return bitmap;
    }
}
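The Base64 plumbing in these helpers does not depend on RoaringBitmap itself, so it can be tested with the JDK alone. The sketch round-trips an int[] through DataOutputStream, Base64 and DataInputStream using a hypothetical toy format (a count followed by the values); it is not RoaringBitmap's serialization format, and Base64RoundTripDemo is a hypothetical name.

```java
// Hypothetical toy format (NOT RoaringBitmap's format): an int count
// followed by the int values, wrapped in Base64 like serializeToStr.
final class Base64RoundTripDemo {
    static String encode(int[] values) throws java.io.IOException {
        java.io.ByteArrayOutputStream bytes = new java.io.ByteArrayOutputStream();
        try (java.io.DataOutputStream out = new java.io.DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        } // closing flushes and closes the wrapped stream
        return java.util.Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static int[] decode(String src) throws java.io.IOException {
        byte[] raw = java.util.Base64.getDecoder().decode(src);
        try (java.io.DataInputStream in =
                 new java.io.DataInputStream(new java.io.ByteArrayInputStream(raw))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }
}
```

An empty array encodes to the four zero bytes of its count, which Base64 renders as "AAAAAA==".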
