HBase Filters

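Both Get and Scan support filters. On their own, these classes can only select data by row key or row range, column family, column qualifier, timestamp, and version count; they cannot apply conditions such as comparisons or pattern matches to row keys, column names, or cell values. Filters add that capability. The common base type for all filters is Filter. Filters are evaluated on the server side, so data that is filtered out is never sent to the client. The same filtering can be done in client code, but it hurts performance because every cell must first be transferred over the network.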

1. Table data

hbase(main):022:0> scan 'scores'
ROW                    COLUMN+CELL                                                   
 1                     column=courses:java, timestamp=1707747922242, value=90        
 1                     column=courses:python, timestamp=1707747820188, value=90      
 1                     column=student:name, timestamp=1707747936081, value=xiaoming  
 2                     column=courses:java, timestamp=1707747874299, value=80        
 2                     column=courses:python, timestamp=1707747869889, value=80      
 2                     column=student:name, timestamp=1707747864664, value=xiaokai   
 3                     column=courses:java, timestamp=1707747888458, value=95        
 3                     column=courses:python, timestamp=1707747882915, value=95      
 3                     column=student:name, timestamp=1707747879216, value=xiaohong  
3 row(s) in 0.0120 seconds
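
The difference between server-side and client-side filtering can be sketched with a small pure-Python model of the table above. This is not the HBase API: `CELLS`, `server_scan`, and `is_row_3` are hypothetical names, and the "transferred" count simply stands in for the number of cells sent over the network.

```python
# Toy model of the 'scores' table: (row key, column, value), in stored order.
CELLS = [
    ("1", "courses:java", "90"), ("1", "courses:python", "90"), ("1", "student:name", "xiaoming"),
    ("2", "courses:java", "80"), ("2", "courses:python", "80"), ("2", "student:name", "xiaokai"),
    ("3", "courses:java", "95"), ("3", "courses:python", "95"), ("3", "student:name", "xiaohong"),
]

def server_scan(predicate=None):
    """Stand-in for a region server: applies the filter before 'sending' cells."""
    sent = [cell for cell in CELLS if predicate is None or predicate(cell)]
    return sent, len(sent)  # (cells delivered, number of cells transferred)

def is_row_3(cell):
    return cell[0] == "3"

# Server-side filtering: only matching cells are transferred.
server_side, transferred_server = server_scan(is_row_3)

# Client-side filtering: every cell is transferred, then filtered locally.
all_cells, transferred_client = server_scan()
client_side = [cell for cell in all_cells if is_row_3(cell)]

print(transferred_server, transferred_client)  # 3 9
```

Both approaches produce the same result, but in this model the client-side variant moves three times as many cells.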

2. Row filter (RowFilter)

Filter out the rows with row keys 1 and 2, leaving only row 3.

hbase(main):023:0> scan 'scores',FILTER=>"RowFilter(=,'binary:3')"
ROW                    COLUMN+CELL                                                   
 3                     column=courses:java, timestamp=1707747888458, value=95        
 3                     column=courses:python, timestamp=1707747882915, value=95      
 3                     column=student:name, timestamp=1707747879216, value=xiaohong  
1 row(s) in 0.0370 seconds

3. Column family filter (FamilyFilter)

Scan only the cells in the scores table whose column family is student.

hbase(main):024:0> scan 'scores',FILTER=>"FamilyFilter(=,'binary:student')"
ROW                    COLUMN+CELL                                                   
 1                     column=student:name, timestamp=1707747936081, value=xiaoming  
 2                     column=student:name, timestamp=1707747864664, value=xiaokai   
 3                     column=student:name, timestamp=1707747879216, value=xiaohong  
3 row(s) in 0.0260 seconds

4. Column qualifier filter (QualifierFilter)

Scan the cells in the scores table whose column qualifier is python.

hbase(main):026:0> scan 'scores',FILTER=>"QualifierFilter(=,'binary:python')"
ROW                    COLUMN+CELL                                                   
 1                     column=courses:python, timestamp=1707747820188, value=90      
 2                     column=courses:python, timestamp=1707747869889, value=80      
 3                     column=courses:python, timestamp=1707747882915, value=95      
3 row(s) in 0.0140 seconds

5. Value filter (ValueFilter)

Scan the cells in the scores table whose value contains the substring hong.

hbase(main):003:0> scan 'scores',FILTER=>"ValueFilter(=,'substring:hong')"
ROW                                  COLUMN+CELL                                                                                            
 3                                   column=student:name, timestamp=1707747879216, value=xiaohong                                           
1 row(s) in 0.0160 seconds
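
The prefix before the colon ('binary:', 'substring:') selects the comparator type. As a rough sketch, the four comparator types of the filter language can be modelled for the = operator as below. The function name `matches` is hypothetical, Python's `re` only approximates Java regular expressions, and the model works on strings rather than raw bytes.

```python
import re

def matches(comparator: str, value: str) -> bool:
    """Model of `value = comparator` for the four comparator types."""
    kind, _, arg = comparator.partition(":")
    if kind == "binary":        # exact match on the whole value
        return value == arg
    if kind == "binaryprefix":  # value starts with the given prefix
        return value.startswith(arg)
    if kind == "substring":     # substring match, ignoring case
        return arg.lower() in value.lower()
    if kind == "regexstring":   # pattern found anywhere in the value
        return re.search(arg, value) is not None
    raise ValueError(f"unknown comparator type: {kind}")

names = ["xiaoming", "xiaokai", "xiaohong"]
print([n for n in names if matches("substring:hong", n)])  # ['xiaohong']
```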

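Scan the cells in the courses column family whose value is greater than or equal to 90. The binary comparator compares bytes lexicographically rather than numerically, and ValueFilter tests every cell in the row, so the scan is restricted to courses with COLUMNS; otherwise the name values, which sort after 90, would match as well.

hbase(main):003:0> scan 'scores', {COLUMNS=>['courses'], FILTER=>"ValueFilter(>=,'binary:90')"}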
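
Range operators deserve care with a binary comparator, because it orders raw bytes lexicographically rather than numerically. A minimal Python sketch of that ordering follows (Python compares `bytes` the same way); `binary_ge` is a hypothetical helper, and the value 100 is made up for illustration and does not appear in the scores table.

```python
def binary_ge(value: bytes, threshold: bytes) -> bool:
    """Model of a '>=' test with a binary comparator: byte-wise lexicographic order."""
    return value >= threshold

values = [b"90", b"80", b"95", b"100", b"xiaoming"]
matching = [v for v in values if binary_ge(v, b"90")]
print(matching)  # [b'90', b'95', b'xiaoming']
```

In this model 100 is rejected because the byte '1' sorts before '9', while xiaoming is accepted because 'x' sorts after '9'. Numeric values stored as text therefore only compare as expected when they are zero-padded to the same width.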

6. Prefix filter (PrefixFilter)

Scan the rows in the scores table whose row key starts with 2.

hbase(main):007:0> scan 'scores',FILTER=>"PrefixFilter('2')"
ROW                                  COLUMN+CELL                                                                                            
 2                                   column=courses:java, timestamp=1707747874299, value=80                                                 
 2                                   column=courses:python, timestamp=1707747869889, value=80                                               
 2                                   column=student:name, timestamp=1707747864664, value=xiaokai                                            
1 row(s) in 0.0570 seconds

7. Column prefix filter (ColumnPrefixFilter)

Scan the cells in the scores table whose column qualifier starts with ja.

hbase(main):016:0> scan 'scores',FILTER=>"ColumnPrefixFilter('ja')"
ROW                                  COLUMN+CELL                                                                                            
 1                                   column=courses:java, timestamp=1707747922242, value=90                                                 
 2                                   column=courses:java, timestamp=1707747874299, value=80                                                 
 3                                   column=courses:java, timestamp=1707747888458, value=95                                                 
3 row(s) in 0.0230 seconds

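8. Key-only filter (KeyOnlyFilter)

Scan all rows in the scores table, returning only the key part of each cell (row, column, timestamp); the value comes back empty.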

hbase(main):020:0> scan 'scores',FILTER=>"KeyOnlyFilter()"
ROW                                  COLUMN+CELL                                                                                            
 1                                   column=courses:java, timestamp=1707747922242, value=                                                   
 1                                   column=courses:python, timestamp=1707747820188, value=                                                 
 1                                   column=student:name, timestamp=1707747936081, value=                                                   
 2                                   column=courses:java, timestamp=1707747874299, value=                                                   
 2                                   column=courses:python, timestamp=1707747869889, value=                                                 
 2                                   column=student:name, timestamp=1707747864664, value=                                                   
 3                                   column=courses:java, timestamp=1707747888458, value=                                                   
 3                                   column=courses:python, timestamp=1707747882915, value=                                                 
 3                                   column=student:name, timestamp=1707747879216, value=                                                   
3 row(s) in 0.0220 seconds

9. First-key-only filter (FirstKeyOnlyFilter)

Scan the scores table, returning only the first cell of each row.

hbase(main):024:0> scan 'scores',FILTER=>"FirstKeyOnlyFilter()"
ROW                                  COLUMN+CELL                                                                                            
 1                                   column=courses:java, timestamp=1707747922242, value=90                                                 
 2                                   column=courses:java, timestamp=1707747874299, value=80                                                 
 3                                   column=courses:java, timestamp=1707747888458, value=95                                                 
3 row(s) in 0.0370 seconds
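
Because exactly one cell comes back per row, this filter is a cheap way to count rows. The idea can be sketched in plain Python as below; this is a model rather than HBase code, and `CELLS` and `first_cells` are hypothetical names.

```python
from itertools import groupby

# Toy model of the 'scores' table: (row key, column, value), in stored order.
CELLS = [
    ("1", "courses:java", "90"), ("1", "courses:python", "90"), ("1", "student:name", "xiaoming"),
    ("2", "courses:java", "80"), ("2", "courses:python", "80"), ("2", "student:name", "xiaokai"),
    ("3", "courses:java", "95"), ("3", "courses:python", "95"), ("3", "student:name", "xiaohong"),
]

# Keep only the first cell of each run of cells sharing a row key.
first_cells = [next(group) for _, group in groupby(CELLS, key=lambda cell: cell[0])]

# One cell per row, so the number of cells equals the number of rows.
row_count = len(first_cells)
print(row_count)  # 3
```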

10. Single column value filter (SingleColumnValueFilter)

Scan the rows in the scores table whose student:name is xiaohong.

hbase(main):005:0>  scan 'scores', {COLUMNS=>['student'], FILTER=>"SingleColumnValueFilter('student','name',=,'binary:xiaohong')"}
ROW                                  COLUMN+CELL                                                                                            
 3                                   column=student:name, timestamp=1707747879216, value=xiaohong                                           
1 row(s) in 0.0100 seconds

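Scan for rows whose student:lastname is xiaohong. No row has a student:lastname column, and by default SingleColumnValueFilter lets rows that lack the tested column pass, so every row is returned.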

hbase(main):002:0> scan 'scores', {COLUMNS=>['student'], FILTER=>"SingleColumnValueFilter('student','lastname',=,'binary:xiaohong')"}
ROW                                  COLUMN+CELL                                                                                            
 1                                   column=student:name, timestamp=1707747936081, value=xiaoming                                           
 2                                   column=student:name, timestamp=1707747864664, value=xiaokai                                            
 3                                   column=student:name, timestamp=1707747879216, value=xiaohong                                           
3 row(s) in 0.1570 seconds
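
The two results above differ only in whether the tested column exists. A small Python model of that behaviour is sketched below; `single_column_value_filter` and `filter_if_missing` are hypothetical names that mirror the filter's filterIfMissing option, which is assumed here to default to false.

```python
ROWS = {
    "1": {"student:name": "xiaoming"},
    "2": {"student:name": "xiaokai"},
    "3": {"student:name": "xiaohong"},
}

def single_column_value_filter(rows, column, expected, filter_if_missing=False):
    """Keep whole rows whose `column` equals `expected`.

    Rows that lack `column` are kept unless filter_if_missing is True.
    """
    kept = {}
    for key, cells in rows.items():
        if column not in cells:
            if not filter_if_missing:
                kept[key] = cells
        elif cells[column] == expected:
            kept[key] = cells
    return kept

print(sorted(single_column_value_filter(ROWS, "student:name", "xiaohong")))      # ['3']
print(sorted(single_column_value_filter(ROWS, "student:lastname", "xiaohong")))  # ['1', '2', '3']
print(sorted(single_column_value_filter(ROWS, "student:lastname", "xiaohong",
                                        filter_if_missing=True)))                # []
```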

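11. Single column value exclude filter (SingleColumnValueExcludeFilter)

Scan the rows in the scores table whose student:name is xiaohong, but leave the student:name column itself out of the result.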

hbase(main):010:0> scan 'scores',FILTER=>"SingleColumnValueExcludeFilter('student','name',=,'binary:xiaohong')"
ROW                                  COLUMN+CELL                                                                                            
 3                                   column=courses:java, timestamp=1707747888458, value=95                                                 
 3                                   column=courses:python, timestamp=1707747882915, value=95                                               
1 row(s) in 0.0570 seconds
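
As a rough Python model of this behaviour: the row is selected by the tested column, and that column is then dropped from the output. The name `single_column_value_exclude` is hypothetical, and rows lacking the tested column are assumed to pass, as with SingleColumnValueFilter's default.

```python
ROWS = {
    "1": {"courses:java": "90", "courses:python": "90", "student:name": "xiaoming"},
    "2": {"courses:java": "80", "courses:python": "80", "student:name": "xiaokai"},
    "3": {"courses:java": "95", "courses:python": "95", "student:name": "xiaohong"},
}

def single_column_value_exclude(rows, column, expected):
    """Keep rows whose `column` equals `expected`, omitting `column` from the result."""
    result = {}
    for key, cells in rows.items():
        # Assumption: a row without the tested column passes the filter.
        if column not in cells or cells[column] == expected:
            result[key] = {c: v for c, v in cells.items() if c != column}
    return result

print(single_column_value_exclude(ROWS, "student:name", "xiaohong"))
# {'3': {'courses:java': '95', 'courses:python': '95'}}
```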

12. Inclusive stop filter (InclusiveStopFilter)

Scan the table up to and including the row whose row key is 1, then stop.

hbase(main):017:0> scan 'scores', {FILTER=>"InclusiveStopFilter('1')"}
ROW                                  COLUMN+CELL                                                                                            
 1                                   column=courses:java, timestamp=1707747922242, value=90                                                 
 1                                   column=courses:python, timestamp=1707747820188, value=90                                               
 1                                   column=student:name, timestamp=1707747936081, value=xiaoming                                           
1 row(s) in 0.0320 seconds
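
The word "inclusive" matters because a scan's ordinary stop row is exclusive by default. The contrast can be sketched in Python as below; `scan_rows` is a hypothetical helper, not part of HBase, and the stop key 2 is chosen only to make the difference visible.

```python
ROW_KEYS = [b"1", b"2", b"3"]  # row keys in sorted (byte) order

def scan_rows(row_keys, stop_row, inclusive):
    """Return row keys up to stop_row, including it only when inclusive is True."""
    result = []
    for key in row_keys:
        if key > stop_row or (key == stop_row and not inclusive):
            break
        result.append(key)
    return result

print(scan_rows(ROW_KEYS, b"2", inclusive=True))   # [b'1', b'2']
print(scan_rows(ROW_KEYS, b"2", inclusive=False))  # [b'1']
```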

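13. Column count filter (ColumnCountGetFilter)

Return at most the first 1 column. This filter is designed for Get operations; in a Scan it ends the whole scan as soon as the column limit is reached, which is why only a single cell comes back.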

hbase(main):018:0> scan 'scores', {FILTER=>"ColumnCountGetFilter(1)"}
ROW                                  COLUMN+CELL                                                                                            
 1                                   column=courses:java, timestamp=1707747922242, value=90                                                 
1 row(s) in 0.0580 seconds
