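Both Get and Scan support filters. The basic interfaces of these classes cannot filter on row key, column name, or column value, but filters can. Every filter implements the Filter interface. Filters take effect on the server side, so data that is filtered out is never sent to the client. You could implement the same filtering in client code, but that would hurt performance because all the data would have to be transferred first. The examples below use the following scores table.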
hbase(main):022:0> scan 'scores'
ROW COLUMN+CELL
1 column=courses:java, timestamp=1707747922242, value=90
1 column=courses:python, timestamp=1707747820188, value=90
1 column=student:name, timestamp=1707747936081, value=xiaoming
2 column=courses:java, timestamp=1707747874299, value=80
2 column=courses:python, timestamp=1707747869889, value=80
2 column=student:name, timestamp=1707747864664, value=xiaokai
3 column=courses:java, timestamp=1707747888458, value=95
3 column=courses:python, timestamp=1707747882915, value=95
3 column=student:name, timestamp=1707747879216, value=xiaohong
3 row(s) in 0.0120 seconds
Filter out the rows with row keys 1 and 2, keeping only row key 3:
hbase(main):023:0> scan 'scores',FILTER=>"RowFilter(=,'binary:3')"
ROW COLUMN+CELL
3 column=courses:java, timestamp=1707747888458, value=95
3 column=courses:python, timestamp=1707747882915, value=95
3 column=student:name, timestamp=1707747879216, value=xiaohong
1 row(s) in 0.0370 seconds
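To make the server-side point concrete, here is a minimal pure-Python sketch, not HBase client code. It models the scores table as an in-memory list and counts how many cells would cross the network in each approach. The function names and the `CELLS` structure are assumptions made for illustration only.

```python
# Stand-in for the 'scores' table as (row, family, qualifier, value) tuples,
# in HBase sort order. In the sample data both course scores of a row are equal.
SCORES = {"1": ("90", "xiaoming"), "2": ("80", "xiaokai"), "3": ("95", "xiaohong")}
CELLS = [
    cell
    for row, (score, name) in SCORES.items()
    for cell in (
        (row, "courses", "java", score),
        (row, "courses", "python", score),
        (row, "student", "name", name),
    )
]


def is_row_3(cell):
    """Same condition as RowFilter(=,'binary:3')."""
    return cell[0] == "3"


def scan_with_server_filter(cells, predicate):
    """Filter on the 'server': only matching cells are transferred."""
    transferred = [c for c in cells if predicate(c)]
    return transferred, len(transferred)


def scan_then_client_filter(cells, predicate):
    """Transfer every cell, then filter in the client."""
    transferred = list(cells)
    return [c for c in transferred if predicate(c)], len(transferred)


server_result, server_sent = scan_with_server_filter(CELLS, is_row_3)
client_result, client_sent = scan_then_client_filter(CELLS, is_row_3)
print(server_result == client_result)  # True
print(server_sent, client_sent)        # 3 9
```

Both approaches return the same three cells, but the client-side variant moves all nine cells first. On a real table the gap grows with the table size.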
Scan only the cells in the student column family of the scores table:
hbase(main):024:0> scan 'scores',FILTER=>"FamilyFilter(=,'binary:student')"
ROW COLUMN+CELL
1 column=student:name, timestamp=1707747936081, value=xiaoming
2 column=student:name, timestamp=1707747864664, value=xiaokai
3 column=student:name, timestamp=1707747879216, value=xiaohong
3 row(s) in 0.0260 seconds
Scan the scores table for cells whose column qualifier is python:
hbase(main):026:0> scan 'scores',FILTER=>"QualifierFilter(=,'binary:python')"
ROW COLUMN+CELL
1 column=courses:python, timestamp=1707747820188, value=90
2 column=courses:python, timestamp=1707747869889, value=80
3 column=courses:python, timestamp=1707747882915, value=95
3 row(s) in 0.0140 seconds
Scan the scores table for cells whose value contains hong:
hbase(main):003:0> scan 'scores',FILTER=>"ValueFilter(=,'substring:hong')"
ROW COLUMN+CELL
3 column=student:name, timestamp=1707747879216, value=xiaohong
1 row(s) in 0.0160 seconds
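RowFilter, FamilyFilter, QualifierFilter, and ValueFilter all share one shape: a comparison operator plus a comparator written as `type:value`. The sketch below is a simplified pure-Python model of that shape, not the HBase implementation. The names `matches`, `scan`, and `FIELD` are hypothetical, and only the `binary` and `substring` comparator types used above are modeled.

```python
import operator

# Stand-in for the 'scores' table as (row, family, qualifier, value) tuples.
# In the sample data both course scores of a row are equal.
SCORES = {"1": ("90", "xiaoming"), "2": ("80", "xiaokai"), "3": ("95", "xiaohong")}
CELLS = [
    cell
    for row, (score, name) in SCORES.items()
    for cell in (
        (row, "courses", "java", score),
        (row, "courses", "python", score),
        (row, "student", "name", name),
    )
]

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Which part of a cell each comparison filter inspects.
FIELD = {"RowFilter": 0, "FamilyFilter": 1, "QualifierFilter": 2, "ValueFilter": 3}


def matches(field_value, op, comparator):
    """Apply a 'type:value' comparator to one field of a cell."""
    kind, _, operand = comparator.partition(":")
    if kind == "binary":
        # Byte-wise comparison of the whole field.
        return OPS[op](field_value.encode(), operand.encode())
    if kind == "substring":
        # Substring matching only makes sense with = and !=.
        if op not in ("=", "!="):
            raise ValueError("substring supports only = and !=")
        found = operand.lower() in field_value.lower()
        return found if op == "=" else not found
    raise ValueError(f"comparator type not modeled here: {kind}")


def scan(filter_name, op, comparator):
    """Return the cells that pass the named comparison filter."""
    idx = FIELD[filter_name]
    return [c for c in CELLS if matches(c[idx], op, comparator)]


print(scan("ValueFilter", "=", "substring:hong"))
# [('3', 'student', 'name', 'xiaohong')]
```

The only difference between the four filters in this model is the tuple index they inspect, which is why the same operator and comparator syntax works for all of them.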
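Scan the courses column family of the scores table for values greater than or equal to 90. The binary comparator compares bytes lexicographically rather than numerically, so this works here only because every score has two digits:
hbase(main):003:0> scan 'scores', {COLUMNS=>['courses'], FILTER=>"ValueFilter(>=,'binary:90')"}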
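A `binary:` comparator orders values by their bytes, not by their numeric meaning. The short Python sketch below shows what that implies for score-like strings. The values `100` and `9` are hypothetical and do not appear in the scores table. The helper name `binary_ge` is also an assumption for illustration.

```python
def binary_ge(stored: str, operand: str) -> bool:
    """Byte-wise >= comparison, the ordering a binary comparator uses."""
    return stored.encode() >= operand.encode()


# '90', '80', '95', 'xiaoming' come from the table; '100' and '9' are hypothetical.
values = ["90", "80", "95", "100", "9", "xiaoming"]
print([v for v in values if binary_ge(v, "90")])
# ['90', '95', 'xiaoming']

# Zero-padding to a fixed width makes byte order agree with numeric order.
padded = [v.zfill(3) for v in ["90", "80", "95", "100", "9"]]
print([v for v in padded if binary_ge(v, "090")])
# ['090', '095', '100']
```

Note that `100` is rejected and `xiaoming` is accepted in the first run. Restricting the scan to the numeric columns and storing fixed-width values are two common ways to avoid such surprises.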
Scan the scores table for rows whose row key starts with 2:
hbase(main):007:0> scan 'scores',FILTER=>"PrefixFilter('2')"
ROW COLUMN+CELL
2 column=courses:java, timestamp=1707747874299, value=80
2 column=courses:python, timestamp=1707747869889, value=80
2 column=student:name, timestamp=1707747864664, value=xiaokai
1 row(s) in 0.0570 seconds
Scan the scores table for cells whose column qualifier starts with ja:
hbase(main):016:0> scan 'scores',FILTER=>"ColumnPrefixFilter('ja')"
ROW COLUMN+CELL
1 column=courses:java, timestamp=1707747922242, value=90
2 column=courses:java, timestamp=1707747874299, value=80
3 column=courses:java, timestamp=1707747888458, value=95
3 row(s) in 0.0230 seconds
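The two prefix filters differ only in which part of the cell they test: PrefixFilter looks at the row key, ColumnPrefixFilter at the column qualifier. A minimal pure-Python model follows. The function names are hypothetical, and the row key `21` is an invented example that is not in the scores table.

```python
# Stand-in for the 'scores' table as (row, family, qualifier, value) tuples.
# In the sample data both course scores of a row are equal.
SCORES = {"1": ("90", "xiaoming"), "2": ("80", "xiaokai"), "3": ("95", "xiaohong")}
CELLS = [
    cell
    for row, (score, name) in SCORES.items()
    for cell in (
        (row, "courses", "java", score),
        (row, "courses", "python", score),
        (row, "student", "name", name),
    )
]


def prefix_filter(cells, prefix):
    """Keep cells whose row key starts with `prefix` (like PrefixFilter)."""
    return [c for c in cells if c[0].startswith(prefix)]


def column_prefix_filter(cells, prefix):
    """Keep cells whose qualifier starts with `prefix` (like ColumnPrefixFilter)."""
    return [c for c in cells if c[2].startswith(prefix)]


print([(r, q) for r, _, q, _ in column_prefix_filter(CELLS, "ja")])
# [('1', 'java'), ('2', 'java'), ('3', 'java')]

# Hypothetical extra row: a prefix match is not an equality match.
extended = CELLS + [("21", "student", "name", "hypothetical")]
print(sorted({c[0] for c in prefix_filter(extended, "2")}))
# ['2', '21']
```

With single-character row keys a prefix match looks the same as an exact match, which is why the second call adds a longer key to show the difference.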
Scan all rows of the scores table but return only the keys; every value comes back empty:
hbase(main):020:0> scan 'scores',FILTER=>"KeyOnlyFilter()"
ROW COLUMN+CELL
1 column=courses:java, timestamp=1707747922242, value=
1 column=courses:python, timestamp=1707747820188, value=
1 column=student:name, timestamp=1707747936081, value=
2 column=courses:java, timestamp=1707747874299, value=
2 column=courses:python, timestamp=1707747869889, value=
2 column=student:name, timestamp=1707747864664, value=
3 column=courses:java, timestamp=1707747888458, value=
3 column=courses:python, timestamp=1707747882915, value=
3 column=student:name, timestamp=1707747879216, value=
3 row(s) in 0.0220 seconds
Scan the scores table and return only the first cell of each row:
hbase(main):024:0> scan 'scores',FILTER=>"FirstKeyOnlyFilter()"
ROW COLUMN+CELL
1 column=courses:java, timestamp=1707747922242, value=90
2 column=courses:java, timestamp=1707747874299, value=80
3 column=courses:java, timestamp=1707747888458, value=95
3 row(s) in 0.0370 seconds
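KeyOnlyFilter and FirstKeyOnlyFilter both shrink the result rather than select rows by a condition. The sketch below models their effect in plain Python. It is an illustration with hypothetical function names, not the HBase implementation.

```python
# Stand-in for the 'scores' table as (row, family, qualifier, value) tuples.
# In the sample data both course scores of a row are equal.
SCORES = {"1": ("90", "xiaoming"), "2": ("80", "xiaokai"), "3": ("95", "xiaohong")}
CELLS = [
    cell
    for row, (score, name) in SCORES.items()
    for cell in (
        (row, "courses", "java", score),
        (row, "courses", "python", score),
        (row, "student", "name", name),
    )
]


def key_only(cells):
    """Keep every cell's key but drop its value (like KeyOnlyFilter)."""
    return [(row, family, qualifier, "") for row, family, qualifier, _ in cells]


def first_key_only(cells):
    """Keep only the first cell of each row (like FirstKeyOnlyFilter)."""
    seen_rows = set()
    result = []
    for cell in cells:
        if cell[0] not in seen_rows:
            seen_rows.add(cell[0])
            result.append(cell)
    return result


print(len(key_only(CELLS)), len(first_key_only(CELLS)))  # 9 3
print(first_key_only(CELLS)[0])  # ('1', 'courses', 'java', '90')
```

Because one cell per row is enough to know the row exists, the first-cell behavior is a natural fit for counting rows without reading every column.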
Scan the scores table for rows where student:name is xiaohong:
hbase(main):005:0> scan 'scores', {COLUMNS=>['student'], FILTER=>"SingleColumnValueFilter('student','name',=,'binary:xiaohong')"}
ROW COLUMN+CELL
3 column=student:name, timestamp=1707747879216, value=xiaohong
1 row(s) in 0.0100 seconds
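Scan the scores table for rows where student:lastname is xiaohong. No row has a student:lastname column, and by default SingleColumnValueFilter lets rows that lack the tested column pass, so every row is returned: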
hbase(main):002:0> scan 'scores', {COLUMNS=>['student'], FILTER=>"SingleColumnValueFilter('student','lastname',=,'binary:xiaohong')"}
ROW COLUMN+CELL
1 column=student:name, timestamp=1707747936081, value=xiaoming
2 column=student:name, timestamp=1707747864664, value=xiaokai
3 column=student:name, timestamp=1707747879216, value=xiaohong
3 row(s) in 0.1570 seconds
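The two scans above behave differently because SingleColumnValueFilter decides per row, and a row that lacks the tested column passes by default. The pure-Python sketch below models that rule. The `filter_if_missing` parameter is named after the Java client's `setFilterIfMissing` setter; the function itself and the data structure are hypothetical.

```python
from itertools import groupby

# Only the 'student' family, matching COLUMNS=>['student'] in the scans above.
STUDENT_CELLS = [
    ("1", "student", "name", "xiaoming"),
    ("2", "student", "name", "xiaokai"),
    ("3", "student", "name", "xiaohong"),
]


def single_column_value_filter(cells, family, qualifier, expected,
                               filter_if_missing=False):
    """Keep whole rows whose (family, qualifier) value equals `expected`."""
    result = []
    # Cells are sorted by row key, so groupby yields one group per row.
    for _, group in groupby(cells, key=lambda c: c[0]):
        row_cells = list(group)
        tested = [v for _, f, q, v in row_cells if (f, q) == (family, qualifier)]
        if tested:
            keep = tested[0] == expected
        else:
            # Column absent: the row passes unless filter_if_missing is set.
            keep = not filter_if_missing
        if keep:
            result.extend(row_cells)
    return result


def rows(cells):
    return sorted({c[0] for c in cells})


print(rows(single_column_value_filter(STUDENT_CELLS, "student", "name", "xiaohong")))
# ['3']
print(rows(single_column_value_filter(STUDENT_CELLS, "student", "lastname", "xiaohong")))
# ['1', '2', '3']
print(rows(single_column_value_filter(STUDENT_CELLS, "student", "lastname", "xiaohong",
                                      filter_if_missing=True)))
# []
```

The third call shows the stricter behavior: once missing columns count as a failure, a test against a nonexistent column returns nothing.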
Scan the scores table for rows where student:name is xiaohong, but leave the student:name column itself out of the result:
hbase(main):010:0> scan 'scores',FILTER=>"SingleColumnValueExcludeFilter('student','name',=,'binary:xiaohong')"
ROW COLUMN+CELL
3 column=courses:java, timestamp=1707747888458, value=95
3 column=courses:python, timestamp=1707747882915, value=95
1 row(s) in 0.0570 seconds
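SingleColumnValueExcludeFilter selects rows the same way as SingleColumnValueFilter and then drops the tested column from the output. A small pure-Python model of that two-step behavior is sketched here. The function name and the `CELLS` structure are assumptions for illustration, and only equality is modeled.

```python
# Stand-in for the 'scores' table as (row, family, qualifier, value) tuples.
# In the sample data both course scores of a row are equal.
SCORES = {"1": ("90", "xiaoming"), "2": ("80", "xiaokai"), "3": ("95", "xiaohong")}
CELLS = [
    cell
    for row, (score, name) in SCORES.items()
    for cell in (
        (row, "courses", "java", score),
        (row, "courses", "python", score),
        (row, "student", "name", name),
    )
]


def single_column_value_exclude(cells, family, qualifier, expected):
    """Select rows by one column's value, then omit that column."""
    by_row = {}
    for cell in cells:
        by_row.setdefault(cell[0], []).append(cell)

    result = []
    for row_cells in by_row.values():
        tested = [c for c in row_cells if (c[1], c[2]) == (family, qualifier)]
        if tested and tested[0][3] != expected:
            continue  # tested column present but value differs: drop the row
        # Step 2: return the row without the tested column.
        result.extend(c for c in row_cells if c not in tested)
    return result


print(single_column_value_exclude(CELLS, "student", "name", "xiaohong"))
# [('3', 'courses', 'java', '95'), ('3', 'courses', 'python', '95')]
```

This is useful when the tested value is already known to the caller, so sending it back would only add transfer cost.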
Scan the table and stop after the row with row key 1, which is included in the result:
hbase(main):017:0> scan 'scores', {FILTER=>"InclusiveStopFilter('1')"}
ROW COLUMN+CELL
1 column=courses:java, timestamp=1707747922242, value=90
1 column=courses:python, timestamp=1707747820188, value=90
1 column=student:name, timestamp=1707747936081, value=xiaoming
1 row(s) in 0.0320 seconds
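The word "inclusive" matters because the stop row of an ordinary scan is exclusive by default. The pure-Python sketch below contrasts the two behaviors on an in-memory copy of the table. The `scan_until` function and its `inclusive` flag are hypothetical names.

```python
from itertools import takewhile

# Stand-in for the 'scores' table as (row, family, qualifier, value) tuples,
# in HBase sort order. In the sample data both course scores of a row are equal.
SCORES = {"1": ("90", "xiaoming"), "2": ("80", "xiaokai"), "3": ("95", "xiaohong")}
CELLS = [
    cell
    for row, (score, name) in SCORES.items()
    for cell in (
        (row, "courses", "java", score),
        (row, "courses", "python", score),
        (row, "student", "name", name),
    )
]


def scan_until(cells, stop_row, inclusive):
    """Read cells in key order until the stop row is reached."""
    stop = stop_row.encode()
    if inclusive:
        # Like InclusiveStopFilter: the stop row itself is returned.
        return list(takewhile(lambda c: c[0].encode() <= stop, cells))
    # Like a plain scan with an exclusive stop row.
    return list(takewhile(lambda c: c[0].encode() < stop, cells))


print(sorted({c[0] for c in scan_until(CELLS, "1", inclusive=True)}))   # ['1']
print(sorted({c[0] for c in scan_until(CELLS, "1", inclusive=False)}))  # []
print(sorted({c[0] for c in scan_until(CELLS, "2", inclusive=True)}))   # ['1', '2']
```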
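Scan the scores table and stop once more than 1 column has been read. ColumnCountGetFilter is intended for Get; in a Scan, as here, the scan ends as soon as the column limit is reached: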
hbase(main):018:0> scan 'scores', {FILTER=>"ColumnCountGetFilter(1)"}
ROW COLUMN+CELL
1 column=courses:java, timestamp=1707747922242, value=90
1 row(s) in 0.0580 seconds
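Since ColumnCountGetFilter is meant for single-row reads, its effect is easier to picture on a Get. The sketch below is a rough pure-Python model of a Get that returns at most the first N columns of one row. The function name `get_with_column_count` is hypothetical and nothing here calls HBase.

```python
# Stand-in for the 'scores' table as (row, family, qualifier, value) tuples,
# in HBase sort order. In the sample data both course scores of a row are equal.
SCORES = {"1": ("90", "xiaoming"), "2": ("80", "xiaokai"), "3": ("95", "xiaohong")}
CELLS = [
    cell
    for row, (score, name) in SCORES.items()
    for cell in (
        (row, "courses", "java", score),
        (row, "courses", "python", score),
        (row, "student", "name", name),
    )
]


def get_with_column_count(cells, row, limit):
    """Model of a Get on `row` that returns only its first `limit` columns."""
    row_cells = [c for c in cells if c[0] == row]
    return row_cells[:limit]


print(get_with_column_count(CELLS, "2", 2))
# [('2', 'courses', 'java', '80'), ('2', 'courses', 'python', '80')]
print(get_with_column_count(CELLS, "1", 1))
# [('1', 'courses', 'java', '90')]
```

The second call returns the same single cell as the scan above, which read only the first column of the first row before stopping.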