HTable table = HBaseDAO.getHTable("147patents");
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
// Alternative: RegexStringComparator comp = new RegexStringComparator("2013-06-1.");
// Match rows whose patentinfo:CREATE_TIME value contains "2013-06-1"
SubstringComparator comp = new SubstringComparator("2013-06-1");
SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes("patentinfo"),
    Bytes.toBytes("CREATE_TIME"),
    CompareFilter.CompareOp.EQUAL,
    comp
);
// Rows that have no CREATE_TIME column are also returned unless this is set:
// filter.setFilterIfMissing(true);
filterList.addFilter(filter);
Scan scan = new Scan();
scan.setFilter(filterList);
ResultScanner rs = table.getScanner(scan);
for (Result r : rs) {
    System.out.println("Scan: " + r);
}
rs.close(); // release the scanner before closing the table
table.close();
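SubstringComparator and the commented-out RegexStringComparator decide a match differently. The sketch below reproduces, in plain Java without HBase, what I understand their EQUAL test to be: SubstringComparator lower-cases both sides and checks containment, while RegexStringComparator (Java regex engine) succeeds if the pattern is found anywhere in the value. This is my reading of the HBase sources, not HBase code, and the sample CREATE_TIME values are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Plain-Java sketch (no HBase dependency) of the rule each comparator
 * applies under CompareOp.EQUAL. An illustration of my reading of the
 * HBase sources, not HBase's own code.
 */
public class ValueComparatorSketch {

    // SubstringComparator: case-insensitive "contains" test on the value as a string.
    public static boolean substringMatches(String value, String substr) {
        return value.toLowerCase(Locale.ROOT).contains(substr.toLowerCase(Locale.ROOT));
    }

    // RegexStringComparator (Java regex engine): the pattern only has to be
    // found somewhere in the value, not match the whole value.
    public static boolean regexMatches(String value, String regex) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        // Hypothetical CREATE_TIME values; the real table's format is not shown.
        String[] createTimes = {"2013-06-15 08:30:00", "2013-06-09 12:00:00", "2013-06-1"};
        for (String t : createTimes) {
            System.out.println(t + "  substring=" + substringMatches(t, "2013-06-1")
                    + "  regex=" + regexMatches(t, "2013-06-1."));
        }
    }
}
```

With these inputs, "2013-06-1" (nothing after the 1) passes the substring test but not the regex "2013-06-1.", because the regex needs one more character after the 1.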
BinaryComparator is a binary comparator that is rarely used; if you need it, see the official API docs: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html
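BinaryComparator compares the stored bytes against the given bytes in full. As a rough sketch of what that means, assuming the unsigned, byte-by-byte lexicographic ordering behind HBase's Bytes.compareTo, here is the comparison in plain Java; only a result of 0 satisfies CompareOp.EQUAL.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the ordering BinaryComparator relies on: an unsigned,
 * byte-by-byte lexicographic comparison (the idea behind HBase's
 * Bytes.compareTo). Plain Java, for illustration only.
 */
public class BinaryCompareSketch {

    // Negative if left sorts first, 0 if identical, positive otherwise.
    public static int compareUnsigned(byte[] left, byte[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int a = left[i] & 0xff;  // treat each byte as 0..255, not -128..127
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return left.length - right.length;
    }

    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(compareUnsigned(utf8("pat"), utf8("pat")));                      // 0
        System.out.println(compareUnsigned(utf8("pat"), utf8("patentinfo")) < 0);           // true
        System.out.println(compareUnsigned(new byte[] {(byte) 0x80}, new byte[] {0x7f}) > 0); // true
    }
}
```

Treating bytes as unsigned matters for non-ASCII data: 0x80 sorts after 0x7f, whereas Java's signed byte type would put it first.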
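Because HBase stores its data internally as key-value pairs, key-value metadata filters evaluate the key of a row (ColumnFamily:Qualifier), i.e. whether a given family or qualifier exists. They are the counterpart of the value-based filters described in the previous section.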
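To make the key/value distinction concrete, here is a toy model in plain Java (my own simplification, not HBase's classes): each cell is addressed by row, family and qualifier and carries a value; a key filter tests only "family:qualifier", a value filter tests only the value. Apart from the patentinfo family and the BELONG_SITE and CREATE_TIME qualifiers used elsewhere in this article, the sample data is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/**
 * Toy model (not HBase classes) of the key-value layout: every cell is
 * addressed by row, column family and qualifier, and carries a value.
 */
public class CellModelSketch {

    public static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final String value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "=" + value;
        }
    }

    // Hypothetical sample data; only the family/qualifier names come from the text.
    public static List<Cell> sampleCells() {
        return Arrays.asList(
                new Cell("r1", "patentinfo", "BELONG_SITE", "siteA"),
                new Cell("r1", "patentinfo", "CREATE_TIME", "2013-06-15"),
                new Cell("r2", "patentinfo", "CREATE_TIME", "2013-05-02"));
    }

    // Key (metadata) filter: looks at family:qualifier, never at the value.
    public static List<Cell> filterByKey(List<Cell> cells, Predicate<String> keyTest) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (keyTest.test(c.family + ":" + c.qualifier)) {
                kept.add(c);
            }
        }
        return kept;
    }

    // Value filter: looks only at the stored value.
    public static List<Cell> filterByValue(List<Cell> cells, Predicate<String> valueTest) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (valueTest.test(c.value)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> cells = sampleCells();
        System.out.println(filterByKey(cells, k -> k.endsWith(":CREATE_TIME")));
        System.out.println(filterByValue(cells, v -> v.startsWith("2013-06")));
    }
}
```

Filtering by key on CREATE_TIME keeps both rows' CREATE_TIME cells regardless of their dates, while the value test keeps only the June 2013 cell.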
3.1. FamilyFilter: filtering data by column family
Constructor:
FamilyFilter(CompareFilter.CompareOp familyCompareOp, ByteArrayComparable familyComparator)
Code I tested myself:
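HTable table = HBaseDAO.getHTable("147patents");
/**
 * The second FamilyFilter constructor argument is a ByteArrayComparable.
 * For ByteArrayComparable, see the "Introduction - parameter basics" section.
 * Only the two most likely to be used, BinaryComparator and BinaryPrefixComparator, are shown below:
 */
FamilyFilter ff = new FamilyFilter(
    CompareFilter.CompareOp.EQUAL,
    new BinaryComparator(Bytes.toBytes("pat")) // the table has no column family named exactly "pat", so the result is empty
);
FamilyFilter ff1 = new FamilyFilter(
    CompareFilter.CompareOp.EQUAL,
    new BinaryPrefixComparator(Bytes.toBytes("pat")) // family patentinfo starts with "pat", so every row's patentinfo data is returned
);
Scan scan = new Scan();
scan.setFilter(ff1);
ResultScanner rs = table.getScanner(scan);
for (Result r : rs) {
    System.out.println("Scan: " + r);
}
rs.close();
table.close();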
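Why does "pat" return nothing with BinaryComparator but match patentinfo with BinaryPrefixComparator? A minimal plain-Java sketch of the two EQUAL tests, assuming BinaryComparator needs the whole name to match and BinaryPrefixComparator compares only the first prefix-length bytes. The family "stats" is hypothetical, added for contrast.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Plain-Java sketch (not HBase code) of the two comparators used with
 * FamilyFilter, under CompareOp.EQUAL.
 */
public class FamilyMatchSketch {

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // BinaryComparator + EQUAL: the whole family name must equal the target bytes.
    public static boolean binaryEquals(byte[] family, byte[] target) {
        return Arrays.equals(family, target);
    }

    // BinaryPrefixComparator + EQUAL: only the first prefix.length bytes are compared.
    public static boolean binaryPrefixEquals(byte[] family, byte[] prefix) {
        if (family.length < prefix.length) {
            return false; // a name shorter than the prefix cannot start with it
        }
        return Arrays.equals(Arrays.copyOf(family, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] pat = bytes("pat");
        // "patentinfo" is the family from the text; "stats" is hypothetical.
        for (String family : new String[] {"patentinfo", "stats"}) {
            byte[] f = bytes(family);
            System.out.println(family + ": binary=" + binaryEquals(f, pat)
                    + " prefix=" + binaryPrefixEquals(f, pat));
        }
    }
}
```

This prints "patentinfo: binary=false prefix=true" and "stats: binary=false prefix=false", which matches the empty result for ff and the patentinfo result for ff1.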
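Notes:
- If the column family you want is known in advance, scan.addFamily(family) is more efficient than this filter.
- Because HBase's support for multiple column families is currently limited, this filter is of little use for now.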
3.2. QualifierFilter: filtering data by qualifier (column)
Constructor:
QualifierFilter(CompareFilter.CompareOp op, ByteArrayComparable qualifierComparator)
Code I tested myself:
HTable table = HBaseDAO.getHTable("147patents");
/**
 * The second QualifierFilter constructor argument is a ByteArrayComparable.
 * The following ByteArrayComparable subclasses can be used:
 * *******************************************
 * BinaryComparator        matches the complete byte array
 * BinaryPrefixComparator  matches the leading part of the byte array
 * BitComparator           bitwise comparison
 * NullComparator
 * RegexStringComparator   regular-expression match
 * SubstringComparator     substring match
 * *******************************************
 * Only the two most likely to be used, BinaryComparator and BinaryPrefixComparator, are shown below:
 */
QualifierFilter ff = new QualifierFilter(
    CompareFilter.CompareOp.EQUAL,
    new BinaryComparator(Bytes.toBytes("belong")) // the table has no column named exactly "belong", so the result is empty
);
QualifierFilter ff1 = new QualifierFilter(
    CompareFilter.CompareOp.EQUAL,
    new BinaryPrefixComparator(Bytes.toBytes("BELONG")) // column BELONG_SITE starts with "BELONG", so that column is returned for every row
);
Scan scan = new Scan();
scan.setFilter(ff1);
ResultScanner rs = table.getScanner(scan);
for (Result r : rs) {
    System.out.println("Scan: " + r);
}
rs.close();
table.close();
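The prefix test that QualifierFilter with BinaryPrefixComparator applies to each column of a row can be sketched in plain Java as below. It compares bytes exactly, so "belong" does not match BELONG_SITE. This mimics the result only, not HBase's implementation; the qualifier TITLE is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java sketch of QualifierFilter(EQUAL, BinaryPrefixComparator(prefix))
 * applied to one row: keep the qualifiers whose leading bytes equal the
 * prefix, byte for byte (so case-sensitively).
 */
public class QualifierPrefixSketch {

    public static boolean startsWithBytes(byte[] qualifier, byte[] prefix) {
        if (qualifier.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (qualifier[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static List<String> keepByPrefix(List<String> qualifiers, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        List<String> kept = new ArrayList<>();
        for (String q : qualifiers) {
            if (startsWithBytes(q.getBytes(StandardCharsets.UTF_8), p)) {
                kept.add(q);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // BELONG_SITE and CREATE_TIME appear in the text; TITLE is hypothetical.
        List<String> row = Arrays.asList("BELONG_SITE", "CREATE_TIME", "TITLE");
        System.out.println(keepByPrefix(row, "BELONG")); // [BELONG_SITE]
        System.out.println(keepByPrefix(row, "belong")); // []
    }
}
```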
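Notes:
- Qualifiers are compared byte by byte, so matching is case-sensitive: this table's qualifiers are uppercase (e.g. BELONG_SITE), so the comparator bytes must be uppercase too.
- This filter is likely to be used more often than FamilyFilter.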
3.3. ColumnPrefixFilter: filtering data by column-name (qualifier) prefix (QualifierFilter can do the same)
Constructor:
ColumnPrefixFilter(byte[] prefix)
Note:
A column name can appear in several column families; this filter returns the matching columns from all of them.
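Because only the qualifier is tested, the same prefix picks up matching columns in every family. A minimal plain-Java sketch of that behaviour, writing columns as "family:qualifier"; the family "extra" is hypothetical and ASCII names are assumed so that a character prefix equals a byte prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java sketch of ColumnPrefixFilter's behaviour across column
 * families: the family is ignored and only the qualifier is tested.
 */
public class ColumnPrefixSketch {

    public static List<String> columnPrefix(List<String> columns, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String column : columns) {
            // Family names cannot contain ':', so the first ':' separates family and qualifier.
            String qualifier = column.substring(column.indexOf(':') + 1);
            if (qualifier.startsWith(prefix)) { // ASCII names assumed
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "patentinfo" comes from the text; the family "extra" is hypothetical.
        List<String> columns = Arrays.asList(
                "patentinfo:BELONG_SITE", "patentinfo:CREATE_TIME", "extra:BELONG_SITE");
        System.out.println(columnPrefix(columns, "BELONG"));
    }
}
```

The output is [patentinfo:BELONG_SITE, extra:BELONG_SITE]; adding scan.addFamily(family), as in the official example, is how you restrict the result to one family.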
Official example code, finding all columns whose names start with "abc":
HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
byte[] prefix = Bytes.toBytes("abc");
Scan scan = new Scan(row, row); // (optional) limit to one row
scan.addFamily(family); // (optional) limit to one family
Filter f = new ColumnPrefixFilter(prefix);
scan.setFilter(f);
scan.setBatch(10); // set this if there could be many columns returned
ResultScanner rs = t.getScanner(scan);
for (Result r = rs.next(); r != null; r = rs.next()) {
for (KeyValue kv : r.raw()) {
// each kv represents a column
}
}
rs.close();
Code I tested myself:
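HTable table = HBaseDAO.getHTable("147patents");
// For every row, return the data of the columns whose names start with BELONG
ColumnPrefixFilter ff1 = new ColumnPrefixFilter(Bytes.toBytes("BELONG"));
Scan scan = new Scan();
scan.setFilter(ff1);
ResultScanner rs = table.getScanner(scan);
for (Result r : rs) {
    System.out.println("Scan: " + r);
}
rs.close();
table.close();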
3.4. MultipleColumnPrefixFilter: filtering data by several column-name (qualifier) prefixes
Description:
MultipleColumnPrefixFilter behaves much like ColumnPrefixFilter, but lets you specify several prefixes.
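The "several prefixes" behaviour is a logical OR: a column is kept if its name starts with at least one of the prefixes, and it is kept only once. A plain-Java sketch of the resulting column set, assuming ASCII names; TITLE is a hypothetical qualifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Plain-Java sketch of which qualifiers MultipleColumnPrefixFilter keeps:
 * those starting with any of the given prefixes (logical OR).
 */
public class MultiPrefixSketch {

    public static List<String> keepIfAnyPrefix(List<String> qualifiers, List<String> prefixes) {
        List<String> kept = new ArrayList<>();
        for (String q : qualifiers) {
            for (String p : prefixes) {
                if (q.startsWith(p)) { // ASCII names assumed
                    kept.add(q);
                    break; // one matching prefix is enough; avoids duplicates
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // BELONG_SITE and CREATE_TIME appear in the text; TITLE is hypothetical.
        List<String> row = Arrays.asList("BELONG_SITE", "CREATE_TIME", "TITLE");
        System.out.println(keepIfAnyPrefix(row, Arrays.asList("BELONG", "CREATE")));
    }
}
```

This prints [BELONG_SITE, CREATE_TIME]. As I understand it, the real filter keeps its prefixes sorted so it can seek ahead in the row instead of testing every prefix against every column; the sketch only reproduces which columns end up in the result.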
Official example code, finding all columns whose names start with "abc" or "xyz":
HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
byte[][] prefixes = new byte[][] {Bytes.toBytes("abc"), Bytes.toBytes("xyz")};
Scan scan = new Scan(row, row); // (optional) limit to one row
scan.addFamily(family); // (optional) limit to one family
Filter f = new MultipleColumnPrefixFilter(prefixes);
scan.setFilter(f);
scan.setBatch(10); // set this if there could be many columns returned
ResultScanner rs = t.getScanner(scan);
for (Result r = rs.next(); r != null; r = rs.next()) {
for (KeyValue kv : r.raw()) {
// each kv represents a column
}
}
rs.close();
Code I tested myself:
HTable table = HBaseDAO.getHTable("147patents");
byte[][] prefixes = new byte[][] {Bytes.toBytes("BELONG"), Bytes.toBytes("CREATE")};
// For every row, return the data of the columns whose names start with BELONG or CREATE
MultipleColumnPrefixFilter ff = new MultipleColumnPrefixFilter(prefixes);
Scan scan = new Scan();
scan.setFilter(ff);
ResultScanner rs = table.getScanner(scan);
for (Result r : rs) {
    System.out.println("Scan: " + r);
}
rs.close();
table.close();
3.5. ColumnRangeFilter: filtering data by a range of columns (not rows)
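Description:
- Retrieves a range of columns: for example, a row may have a million columns while you only want to look at those named bbbb through bbdd;
- Introduced in HBase 0.92;
- A column name can appear in several column families; this filter returns the matching columns from all of them.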
Constructor:
ColumnRangeFilter(byte[] minColumn, boolean minColumnInclusive, byte[] maxColumn, boolean maxColumnInclusive)
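Parameters:
- minColumn - lower end of the column range; if null, there is no lower bound;
- minColumnInclusive - whether the range includes minColumn;
- maxColumn - upper end of the column range; if null, there is no upper bound;
- maxColumnInclusive - whether the range includes maxColumn.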
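The four parameters combine into a simple bounds check on each qualifier. A plain-Java sketch of that check, assuming ASCII qualifiers so that String.compareTo orders them the same way as HBase's unsigned byte comparison; a null bound means "unbounded on that side".

```java
/**
 * Plain-Java sketch of the ColumnRangeFilter bounds check on one qualifier.
 * A null bound means "no limit on that side".
 */
public class ColumnRangeSketch {

    // ASCII qualifiers assumed: String.compareTo then orders them the same
    // way as HBase's unsigned byte comparison.
    public static boolean inRange(String qualifier,
                                  String minColumn, boolean minInclusive,
                                  String maxColumn, boolean maxInclusive) {
        if (minColumn != null) {
            int c = qualifier.compareTo(minColumn);
            if (c < 0 || (c == 0 && !minInclusive)) {
                return false; // below the lower bound, or on an excluded bound
            }
        }
        if (maxColumn != null) {
            int c = qualifier.compareTo(maxColumn);
            if (c > 0 || (c == 0 && !maxInclusive)) {
                return false; // above the upper bound, or on an excluded bound
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String q : new String[] {"bbba", "bbbb", "bbcz", "bbdd", "bbde"}) {
            System.out.println(q + " -> " + inRange(q, "bbbb", true, "bbdd", true));
        }
    }
}
```

With both bounds inclusive, bbbb, bbcz and bbdd are in range while bbba and bbde are not; passing false for minColumnInclusive would drop bbbb as well.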
Official example code, finding the data in columns "bbbb" through "bbdd":
HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
byte[] startColumn = Bytes.toBytes("bbbb");
byte[] endColumn = Bytes.toBytes("bbdd");
Scan scan = new Scan(row, row); // (optional) limit to one row
scan.addFamily(family); // (optional) limit to one family
Filter f = new ColumnRangeFilter(startColumn, true, endColumn, true);
scan.setFilter(f);
scan.setBatch(10); // set this if there could be many columns returned
ResultScanner rs = t.getScanner(scan);
for (Result r = rs.next(); r != null; r = rs.next()) {
for (KeyValue kv : r.raw()) {
// each kv represents a column
}
}
rs.close();