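Filter: all filters take effect on the server side, so data that is filtered out is never sent to the client.
• Filters are evaluated on the HBase server side.
• Filters can be applied to row keys (RowFilter), column qualifiers (QualifierFilter), or cell values (ValueFilter).
• Filters support paging (PageFilter), which limits the number of rows a scanner returns.
• FilterList combines multiple Filters.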
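Note:
String-based comparators such as RegexStringComparator and SubstringComparator are slower and more resource-intensive than byte-based comparators, because every comparison must first convert the given value to a String. Substring extraction and regular-expression matching add further cost.
The general purpose of a filter is to screen out unwanted data: a filter describes what should not be returned. Filters based on CompareFilter work the other way around and return the values that match.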
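The cost difference described in the note can be illustrated with a plain-Java sketch. This is not HBase code: the class and method names below are hypothetical, and the only point is that the byte-based check works on the raw bytes, while the string-based checks decode the value into a String on every call before searching it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical model, not the HBase comparator classes.
class ComparatorCostSketch {
    // Byte-based check: compares the raw bytes directly, with no decoding step.
    static boolean bytesEqual(byte[] cellValue, byte[] expected) {
        return Arrays.equals(cellValue, expected);
    }

    // Substring-style check: allocates and decodes a String on every call.
    static boolean containsSubstring(byte[] cellValue, String needle) {
        String decoded = new String(cellValue, StandardCharsets.UTF_8);
        return decoded.contains(needle);
    }

    // Regex-style check: decodes to a String, then runs the regex engine.
    static boolean matchesRegex(byte[] cellValue, Pattern pattern) {
        String decoded = new String(cellValue, StandardCharsets.UTF_8);
        return pattern.matcher(decoded).find();
    }

    public static void main(String[] args) {
        byte[] value = "hello world".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytesEqual(value, "hello world".getBytes(StandardCharsets.UTF_8)));
        System.out.println(containsSubstring(value, "lo wo"));
        System.out.println(matchesRegex(value, Pattern.compile("w.rld$")));
    }
}
```

When a byte-level comparison is enough to express the condition, it avoids the per-comparison decoding shown in the last two methods.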
Filter is the interface for row and column filters applied directly within the region server. A filter can expect the following call sequence:
- reset(): resets the filter state before filtering a new row.
- filterAllRemaining(): true means the row scan is over; false means keep going.
- filterRowKey(byte[],int,int): true means drop this row; false means include it. (Commonly used.)
- filterKeyValue(Cell): decides whether to include or exclude this KeyValue and returns a Filter.ReturnCode. (Commonly used.)
- transform(KeyValue): if the KeyValue is included, lets the filter transform it. (Rarely used.)
- filterRowCells(List): allows direct modification of the final list of KeyValues before it is submitted. Older releases name this method filterRow(List kvs).
- filterRow(): last chance to drop the entire row based on the sequence of filter calls, e.g. to filter out a row that does not contain a specified column.
Filter instances are created one per region/scan. This abstract class replaces the old RowFilterInterface. When implementing your own filters, consider inheriting from FilterBase to reduce boilerplate.
Filter execution flow
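The per-row call order can be modeled in plain Java. The sketch below is a simplified stand-in rather than the HBase API: SimpleFilter, RequireColumn, and scan are hypothetical names, cells are reduced to column/value strings, and the transform and filterRowCells steps are left out. RequireColumn implements the example from the Javadoc, dropping a row that does not contain a specified column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the filter call sequence, not the HBase classes.
class FilterCallSequenceSketch {
    interface SimpleFilter {
        void reset();                                      // before each new row
        boolean filterAllRemaining();                      // true: stop the scan
        boolean filterRowKey(String rowKey);               // true: drop this row
        boolean includeCell(String column, String value);  // stand-in for filterKeyValue
        boolean filterRow();                               // true: drop the whole row
    }

    /** Drops any row that lacks the required column. */
    static class RequireColumn implements SimpleFilter {
        private final String required;
        private boolean seen;

        RequireColumn(String required) { this.required = required; }

        public void reset() { seen = false; }
        public boolean filterAllRemaining() { return false; }
        public boolean filterRowKey(String rowKey) { return false; }
        public boolean includeCell(String column, String value) {
            if (column.equals(required)) {
                seen = true;
            }
            return true; // keep every cell; the decision is made per row
        }
        public boolean filterRow() { return !seen; }
    }

    /** Walks the rows in order and returns the keys of the rows that survive. */
    static List<String> scan(Map<String, Map<String, String>> table, SimpleFilter filter) {
        List<String> returned = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            filter.reset();
            if (filter.filterAllRemaining()) break;
            if (filter.filterRowKey(row.getKey())) continue;
            int kept = 0;
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (filter.includeCell(cell.getKey(), cell.getValue())) kept++;
            }
            if (filter.filterRow()) continue;
            if (kept > 0) returned.add(row.getKey());
        }
        return returned;
    }

    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("row1", Map.of("q", "v"));
        table.put("row2", Map.of("other", "x"));
        table.put("row3", Map.of("q", "w", "other", "y"));
        return table;
    }

    public static void main(String[] args) {
        // row2 has no column "q", so filterRow() drops it.
        System.out.println(scan(sampleTable(), new RequireColumn("q"))); // prints [row1, row3]
    }
}
```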
• SingleColumnValueFilter
– Filters rows by the value of a specific column family and qualifier.
– Comparison operators include EQUAL, LESS, LESS_OR_EQUAL, ...
• Usage
    byte[] family = Bytes.toBytes("f");
    byte[] qualifier = Bytes.toBytes("q");
    byte[] value = Bytes.toBytes("v");
    SingleColumnValueFilter scvf = new SingleColumnValueFilter(family, qualifier, CompareOp.EQUAL, value);
• setFilterIfMissing(boolean)
– true: the row is filtered out if the column is not found.
– false: the row passes if the column is not found. This is the default.
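The effect of setFilterIfMissing can be shown with a small plain-Java model. This is a sketch under simplifying assumptions, not the HBase implementation: a row is a hypothetical map from column name to value bytes, only the EQUAL operator is modeled, and rowPasses is an invented helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

// Hypothetical model of the filterIfMissing decision, not the HBase class.
class FilterIfMissingSketch {
    /** Returns true if the row would be returned to the client. */
    static boolean rowPasses(Map<String, byte[]> row, String column,
                             byte[] expected, boolean filterIfMissing) {
        byte[] actual = row.get(column);
        if (actual == null) {
            // Column absent: the flag alone decides the outcome.
            return !filterIfMissing;
        }
        // Column present: EQUAL comparison on the raw bytes.
        return Arrays.equals(actual, expected);
    }

    public static void main(String[] args) {
        byte[] v = "v".getBytes(StandardCharsets.UTF_8);
        Map<String, byte[]> withColumn = Map.of("f:q", v);
        Map<String, byte[]> withoutColumn = Map.of("f:other", v);

        System.out.println(rowPasses(withColumn, "f:q", v, true));      // true
        System.out.println(rowPasses(withoutColumn, "f:q", v, false));  // true (default behaviour)
        System.out.println(rowPasses(withoutColumn, "f:q", v, true));   // false
    }
}
```

With the default (false), rows that lack the column appear in the result even though nothing was compared, which is often surprising; setting the flag to true restricts the result to rows where the column exists and matches.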
• PrefixFilter
– Keeps only the rows whose row key starts with a given prefix.
• Usage
    PrefixFilter pf = new PrefixFilter(Bytes.toBytes("prefix"));
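As a rough plain-Java illustration of prefix filtering over sorted row keys, the sketch below uses String keys and a hypothetical scanWithPrefix helper; HBase itself compares raw bytes. Because row keys are stored in sorted order, all keys sharing a prefix are contiguous, so the loop can stop at the first non-matching key after a match. That early stop mirrors the idea behind filterAllRemaining(); whether a particular PrefixFilter version behaves this way is an assumption here, not something these notes state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of prefix filtering, not the HBase class.
class PrefixScanSketch {
    /** Returns the keys that start with the prefix; expects keys in sorted order. */
    static List<String> scanWithPrefix(List<String> sortedRowKeys, String prefix) {
        List<String> matches = new ArrayList<>();
        boolean matchedOnce = false;
        for (String key : sortedRowKeys) {
            if (key.startsWith(prefix)) {
                matches.add(key);
                matchedOnce = true;
            } else if (matchedOnce) {
                // Keys are sorted, so no later key can carry the prefix.
                break;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "prefix-001", "prefix-002", "zebra");
        System.out.println(scanWithPrefix(keys, "prefix")); // prints [prefix-001, prefix-002]
    }
}
```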
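• MultiRowRangeFilter
– Filters by several separate row-key ranges.
• Usage
    // RowKeyRange is kept as written in the source; in Apache HBase the range type is
    // MultiRowRangeFilter.RowRange(startRow, startRowInclusive, stopRow, stopRowInclusive).
    List<RowKeyRange> rowKeyRanges = new ArrayList<>();
    rowKeyRanges.add(new RowKeyRange(startkey1, stopkey1));
    rowKeyRanges.add(new RowKeyRange(startkey2, stopkey2));
    rowKeyRanges.add(new RowKeyRange(startkey3, stopkey3));
    MultiRowRangeFilter mrrf = new MultiRowRangeFilter(rowKeyRanges);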
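The range check behind a multi-range filter can be sketched in plain Java. The Range class and inAnyRange helper are hypothetical, keys are Strings rather than byte arrays, and each range is assumed to include its start key and exclude its stop key; the notes do not say which bounds are inclusive.

```java
import java.util.List;

// Hypothetical model of multi-range row-key matching, not the HBase class.
class MultiRangeSketch {
    static final class Range {
        final String start; // assumed inclusive
        final String stop;  // assumed exclusive

        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }

        boolean contains(String rowKey) {
            return rowKey.compareTo(start) >= 0 && rowKey.compareTo(stop) < 0;
        }
    }

    /** Returns true if the row key falls inside at least one range. */
    static boolean inAnyRange(List<Range> ranges, String rowKey) {
        for (Range range : ranges) {
            if (range.contains(rowKey)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(new Range("a", "c"), new Range("m", "p"));
        System.out.println(inAnyRange(ranges, "b"));  // true
        System.out.println(inAnyRange(ranges, "c"));  // false: stop key is excluded
        System.out.println(inAnyRange(ranges, "n"));  // true
        System.out.println(inAnyRange(ranges, "x"));  // false
    }
}
```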
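• FilterList
– Combines multiple Filters into a single filter condition.
• Usage
– Rows must satisfy both f1 and f2 (MUST_PASS_ALL is the default operator):
    Filter f1 = new …
    Filter f2 = new …
    FilterList fs = new FilterList(f1, f2);
– Rows must satisfy at least one of f1 or f2:
    Filter f1 = new …
    Filter f2 = new …
    FilterList fs = new FilterList(FilterList.Operator.MUST_PASS_ONE, f1, f2);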
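The two combination modes amount to a logical AND and a logical OR over the member filters. A minimal plain-Java sketch follows, using java.util.function.Predicate as a stand-in for a filter; here a predicate returning true means the row passes. The Operator enum and combine helper are hypothetical and only borrow the operator names from FilterList.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of FilterList semantics, not the HBase class.
class FilterListSketch {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    /** Combines row predicates: ALL behaves like AND, ONE behaves like OR. */
    static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        return row -> op == Operator.MUST_PASS_ALL
                ? filters.stream().allMatch(f -> f.test(row))
                : filters.stream().anyMatch(f -> f.test(row));
    }

    public static void main(String[] args) {
        Predicate<String> f1 = key -> key.startsWith("user");
        Predicate<String> f2 = key -> key.endsWith("7");

        Predicate<String> both = combine(Operator.MUST_PASS_ALL, List.of(f1, f2));
        Predicate<String> either = combine(Operator.MUST_PASS_ONE, List.of(f1, f2));

        System.out.println(both.test("user-7"));    // true
        System.out.println(both.test("user-8"));    // false
        System.out.println(either.test("user-8"));  // true
        System.out.println(either.test("admin-8")); // false
    }
}
```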