Storage Format

Overview (Druid 0.9.0)

Data in Druid is stored in a custom columnar format known as a segment. Segments are composed of different types of columns. Column.java and the classes that extend it are a good place to start looking into the storage format.

Basic classes

ValueType

An enum with four possible values:

  1. Float
  2. Long
  3. String
  4. Complex

IndexedInts

It has three main methods:

int size();
int get(int index);
void fill(int index, int[] toFill);

The main implementations are:

  1. EmptyIndexedInts
  2. IntBufferIndexedInts
  3. ListBasedIndexedInts
  4. VSizeIndexedInts

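size() returns how many elements remain readable (or writable) in the underlying buffer;
get(index) reads the element at position index in that buffer;
fill() is meant to bulk-copy data into the supplied int[] array, but none of the implementations currently support it.
ListBasedIndexedInts is the exception in that it is backed by a List.
As the class names suggest, some of the implementations use Java NIO buffers to work with native memory.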
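
To make the interface concrete, here is a minimal sketch of a buffer-backed implementation. SimpleIndexedInts and BufferBackedInts are hypothetical names, not Druid classes; the interface is re-declared locally so the example compiles on its own, and fill() simply throws, in line with the note that no implementation supports it.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Hypothetical local copy of the three-method interface described above.
interface SimpleIndexedInts {
    int size();
    int get(int index);
    void fill(int index, int[] toFill);
}

// Hypothetical implementation backed by a direct (off-heap) NIO buffer.
class BufferBackedInts implements SimpleIndexedInts {
    private final IntBuffer buffer;

    BufferBackedInts(int[] values) {
        // allocateDirect places the bytes outside the Java heap.
        IntBuffer ints = ByteBuffer.allocateDirect(values.length * Integer.BYTES).asIntBuffer();
        ints.put(values);
        ints.flip(); // position = 0, limit = number of values written
        this.buffer = ints;
    }

    @Override
    public int size() {
        // Number of elements between position and limit.
        return buffer.remaining();
    }

    @Override
    public int get(int index) {
        // Absolute read; does not move the buffer position.
        return buffer.get(index);
    }

    @Override
    public void fill(int index, int[] toFill) {
        throw new UnsupportedOperationException("fill not supported");
    }

    public static void main(String[] args) {
        SimpleIndexedInts ints = new BufferBackedInts(new int[] {7, 11, 13});
        System.out.println(ints.size() + " values, second = " + ints.get(1));
    }
}
```

Because size() is defined as the remaining element count, the sketch never advances the buffer position after construction; all reads are absolute.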

ColumnCapabilities

Fields:

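private ValueType type = null;
private boolean dictionaryEncoded = false;  // whether the column is dictionary encoded
private boolean runLengthEncoded = false;  // whether the column is run-length encoded; run-length encoding is not actually implemented, so this can be ignored
private boolean hasInvertedIndexes = false;  // whether the column has an inverted index
private boolean hasSpatialIndexes = false;  // whether the column has a spatial index
private boolean hasMultipleValues = false;  // whether the column has multi-value rows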

DictionaryEncodedColumn

Main methods:

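public int length();  // total number of rows in the dictionary-encoded column
public boolean hasMultipleValues();  // whether any row holds multiple values
public int getSingleValueRow(int rowNum);  // dictionary id for a single-value row
public IndexedInts getMultiValueRow(int rowNum);  // dictionary ids for a multi-value row
public String lookupName(int id);  // value for a dictionary id; note that null and the empty string both come back as null
public int lookupId(String name);  // dictionary id for a value
public int getCardinality();  // cardinality, i.e. the size of the dictionary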

The only implementation is SimpleDictionaryEncodedColumn, which has three fields:

private final IndexedInts column;
private final IndexedMultivalue multiValueColumn;
private final CachingIndexed cachedLookups;

The interesting one is cachedLookups, which stores the dictionary.
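
The idea behind these methods can be sketched with a toy single-value column. TinyDictionaryColumn is a hypothetical class, not Druid's implementation; it assumes non-null values, keeps the dictionary as a sorted array, and stores one dictionary id per row.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical single-value dictionary-encoded column (assumes non-null values).
class TinyDictionaryColumn {
    private final String[] dictionary; // sorted distinct values
    private final int[] rows;          // one dictionary id per row

    TinyDictionaryColumn(String[] values) {
        // TreeSet removes duplicates and sorts the values.
        dictionary = new TreeSet<>(Arrays.asList(values)).toArray(new String[0]);
        rows = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            rows[i] = lookupId(values[i]);
        }
    }

    int length() { return rows.length; }

    int getSingleValueRow(int rowNum) { return rows[rowNum]; }

    String lookupName(int id) { return dictionary[id]; }

    // Binary search is valid because the dictionary is sorted.
    int lookupId(String name) { return Arrays.binarySearch(dictionary, name); }

    int getCardinality() { return dictionary.length; }

    public static void main(String[] args) {
        TinyDictionaryColumn col = new TinyDictionaryColumn(new String[] {"b", "a", "b", "c"});
        for (int row = 0; row < col.length(); row++) {
            int id = col.getSingleValueRow(row);
            System.out.println("row " + row + " -> id " + id + " -> " + col.lookupName(id));
        }
    }
}
```

Each row costs one int regardless of how long the string is, and the cardinality is simply the dictionary length.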

CachingIndexed

The concrete class that holds the dictionary. It implements the Indexed interface; the other main implementations are:

  1. GenericIndexed
  2. ArrayIndexed
  3. BufferIndexed
  4. ListIndexed
  5. VSizeIndexed

CachingIndexed wraps a given GenericIndexed and uses an LRU map, SizedLRUMap, to store cachedValues.
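
The wrapping pattern can be sketched with a LinkedHashMap in access order. CachedLookup is a hypothetical class, and bounding the cache by entry count is a simplification made here; whatever sizing policy SizedLRUMap applies is not reproduced.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical LRU-cached wrapper around an index-based lookup.
class CachedLookup {
    private final List<String> delegate;
    private final Map<Integer, String> cache;
    int misses = 0; // exposed so the caching behaviour can be observed

    CachedLookup(List<String> delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    String get(int index) {
        String cached = cache.get(index);
        if (cached != null) {
            return cached;
        }
        misses++;
        String value = delegate.get(index);
        if (value != null) {
            cache.put(index, value); // may evict the least recently used entry
        }
        return value;
    }
}
```

With a capacity of two, reading indexes 0, 0, 1, 2 and then 0 again goes to the delegate four times: the second read of 0 is a hit, and the last one misses because 0 was evicted when 2 was cached.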

GenericIndexed

A generic, flat storage mechanism. Use static methods fromArray() or fromIterable() to construct. If input is sorted, supports binary search index lookups. If input is not sorted, only supports array-like index lookups.
V1 Storage Format:

  • byte 1: version (0x1)
  • byte 2 == 0x1 => allowReverseLookup
  • bytes 3-6 => numBytesUsed
  • bytes 7-10 => numElements
  • bytes 10-((numElements * 4) + 10): integers representing 'end' offsets of byte serialized values
  • bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value
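
A rough sketch of writing and reading this layout with a ByteBuffer follows. FlatStringStore is a hypothetical class, not Druid's code. It assumes UTF-8 strings, the buffer's default big-endian byte order, that numBytesUsed counts every byte after the numBytesUsed field, and that each 'end' offset is relative to the start of the values region.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical writer/reader for the V1 layout listed above.
class FlatStringStore {
    static ByteBuffer write(String[] sortedValues) {
        int n = sortedValues.length;
        byte[][] encoded = new byte[n][];
        int valueBytes = 0;
        for (int i = 0; i < n; i++) {
            encoded[i] = sortedValues[i].getBytes(StandardCharsets.UTF_8);
            valueBytes += Integer.BYTES + encoded[i].length; // length prefix + payload
        }
        // Assumption: counts numElements, the offsets, and the values.
        int numBytesUsed = Integer.BYTES + n * Integer.BYTES + valueBytes;
        ByteBuffer out = ByteBuffer.allocate(2 + Integer.BYTES + numBytesUsed);
        out.put((byte) 0x1);      // version
        out.put((byte) 0x1);      // allowReverseLookup (input is sorted)
        out.putInt(numBytesUsed);
        out.putInt(n);            // numElements
        int end = 0;
        for (byte[] value : encoded) { // 'end' offset of each serialized value
            end += Integer.BYTES + value.length;
            out.putInt(end);
        }
        for (byte[] value : encoded) { // 4-byte length, then the value bytes
            out.putInt(value.length);
            out.put(value);
        }
        out.flip();
        return out;
    }

    static String get(ByteBuffer buf, int index) {
        int headerSize = 2 + 2 * Integer.BYTES; // version, flag, numBytesUsed, numElements
        int n = buf.getInt(2 + Integer.BYTES);
        if (index < 0 || index >= n) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int valuesStart = headerSize + n * Integer.BYTES;
        // A value starts where the previous one ends.
        int start = index == 0 ? 0 : buf.getInt(headerSize + (index - 1) * Integer.BYTES);
        int length = buf.getInt(valuesStart + start);
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buf.get(valuesStart + start + Integer.BYTES + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Storing end offsets up front is what makes array-like access cheap: get() needs one offset read and one length read before it reaches the value bytes, without scanning earlier values.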

Fields:

private final ByteBuffer theBuffer;  // the backing ByteBuffer
private final ObjectStrategy strategy;
private final boolean allowReverseLookup;
private final int size;  // the int read from theBuffer at its current position
private final int valuesOffset;
private final BufferIndexed bufferIndexed;  // an inner class, BufferIndexed

Column

An interface; see the implementing class for details.

SimpleColumn

Fields:


private final ColumnCapabilities capabilities;
private final Supplier<DictionaryEncodedColumn> dictionaryEncodedColumn;
private final Supplier<RunLengthColumn> runLengthColumn;
private final Supplier<GenericColumn> genericColumn;
private final Supplier<ComplexColumn> complexColumn;
private final Supplier<BitmapIndex> bitmapIndex;
private final Supplier<SpatialIndex> spatialIndex;
