Common HBase filter usage

Reposted from: http://www.hadooptpoint.com/filters-in-hbase-shell/
Filters in HBase Shell

Filter Language was introduced in Apache HBase 0.92. It allows you to perform server-side filtering when accessing HBase over Thrift or in the HBase shell. You can find out more about shell integration by running help 'scan' in the shell.

To list the filters available in HBase, run the show_filters command:

hbase(main):010:0> show_filters
Documentation on filters mentioned below can be found at: https://our.intern.facebook.com/intern/wiki/index.php/HBase/Filter_Language
ColumnPrefixFilter
TimestampsFilter
PageFilter
MultipleColumnPrefixFilter
FamilyFilter
ColumnPaginationFilter
SingleColumnValueFilter
RowFilter
QualifierFilter
ColumnRangeFilter
ValueFilter
PrefixFilter
SingleColumnValueExcludeFilter
ColumnCountGetFilter
InclusiveStopFilter
DependentColumnFilter
FirstKeyOnlyFilter
KeyOnlyFilter

KeyOnlyFilter

This filter does not take any arguments. It returns only the key component of each key-value.
Syntax

KeyOnlyFilter ()

Example of KeyOnlyFilter

hbase(main):012:0> scan 'airline',{ FILTER => "KeyOnlyFilter()"}
ROW     COLUMN+CELL
 row1   column=flightbetween:destination, timestamp=1411981750093, value=
 row1   column=flightbetween:source, timestamp=1411981724972, value=
 row1   column=flightinfo:airlines, timestamp=1411982131699, value=
 row1   column=flightinfo:flightno, timestamp=1411982109827, value=
 row1   column=time:arrivaltime, timestamp=1411981821497, value=
 row1   column=time:date, timestamp=1411981843455, value=
 row1   column=time:depaturetime, timestamp=1411981808445, value=
 row2   column=flightbetween:destination, timestamp=1411982226629, value=
 row2   column=flightbetween:source, timestamp=1411982209701, value=
 row2   column=flightinfo:airlines, timestamp=1411982193228, value=
 row2   column=flightinfo:flightno, timestamp=1411982183561, value=
 row2   column=time:arrivaltime, timestamp=1411982277561, value=
 row2   column=time:date, timestamp=1411982261000, value=
 row2   column=time:depaturetime, timestamp=1411982244265, value=
 2 row(s) in 0.0770 seconds
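The behaviour can be sketched as a hypothetical in-memory model in plain Python; this is not the HBase client API, and Cell, CELLS and key_only are names made up for illustration. It keeps every cell's key (row, family, qualifier, timestamp) and blanks the value, which matches the empty value= fields in the scan above.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """One HBase key-value: the key parts plus the value."""
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str


# Hypothetical sample cells taken from the 'airline' scan output above.
CELLS = [
    Cell("row1", "flightbetween", "destination", 1411981750093, "banglre"),
    Cell("row1", "flightbetween", "source", 1411981724972, "hyd"),
    Cell("row2", "flightbetween", "destination", 1411982226629, "banglre"),
]


def key_only(cells):
    """Model of KeyOnlyFilter: keep each cell's key, replace its value with ''."""
    return [c._replace(value="") for c in cells]


result = key_only(CELLS)
for c in result:
    print(f"{c.row}  column={c.family}:{c.qualifier}, timestamp={c.timestamp}, value={c.value}")
```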

FirstKeyOnlyFilter

This filter doesn't take any arguments. It returns only the first key-value from each row.
Syntax

FirstKeyOnlyFilter ()

Example of FirstKeyOnlyFilter

hbase(main):013:0> scan 'airline',{ FILTER => "FirstKeyOnlyFilter()"}
ROW     COLUMN+CELL
 row1   column=flightbetween:destination, timestamp=1411981750093, value=banglre
 row2   column=flightbetween:destination, timestamp=1411982226629, value=banglre
 2 row(s) in 0.0380 seconds
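A hypothetical plain-Python model of FirstKeyOnlyFilter (not HBase code; first_key_only is a made-up name), assuming the cells arrive sorted by row key and then by column, as HBase returns them:

```python
from itertools import groupby

# Hypothetical (row, column, value) cells, already in HBase sort order:
# by row key, then by family:qualifier.
CELLS = [
    ("row1", "flightbetween:destination", "banglre"),
    ("row1", "flightbetween:source", "hyd"),
    ("row2", "flightbetween:destination", "banglre"),
    ("row2", "flightinfo:airlines", "americanairlines"),
]


def first_key_only(cells):
    """Model of FirstKeyOnlyFilter: emit only the first cell of each row."""
    return [next(group) for _, group in groupby(cells, key=lambda c: c[0])]


result = first_key_only(CELLS)
for row, column, value in result:
    print(row, column, value)
```

Because only one cell per row is sent back, this filter is a common choice for counting rows cheaply.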

PrefixFilter

This filter takes one argument, a row key prefix. It returns only those key-values present in rows whose row key starts with the specified prefix.

Syntax

PrefixFilter ('<row_prefix>')

Example of PrefixFilter

hbase(main):041:0>  scan 'airline', {FILTER => "(PrefixFilter ('row2'))"}
ROW     COLUMN+CELL
 row2   column=flightbetween:destination, timestamp=1411982226629, value=banglre
 row2   column=flightbetween:source, timestamp=1411982209701, value=hyd
 row2   column=flightinfo:airlines, timestamp=1411982193228, value=americanairlines
 row2   column=flightinfo:flightno, timestamp=1411982183561, value=12346
 row2   column=time:arrivaltime, timestamp=1411982277561, value=10am
 row2   column=time:date, timestamp=1411982261000, value=21/05/2014
 row2   column=time:depaturetime, timestamp=1411982244265, value=8am
 1 row(s) in 0.0710 seconds
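A hypothetical plain-Python model of PrefixFilter (prefix_filter and the extra row20 are made up for illustration and are not part of the airline table). It also shows a consequence of prefix matching: 'row2' matches 'row20' as well.

```python
# Hypothetical (row, column, value) cells from the 'airline' table.
CELLS = [
    ("row1", "flightbetween:source", "hyd"),
    ("row2", "flightbetween:source", "hyd"),
    ("row2", "flightinfo:airlines", "americanairlines"),
    ("row20", "flightbetween:source", "hyd"),  # hypothetical extra row
]


def prefix_filter(cells, row_prefix):
    """Model of PrefixFilter: keep cells whose row key starts with row_prefix."""
    return [c for c in cells if c[0].startswith(row_prefix)]


result = prefix_filter(CELLS, "row2")
print(sorted({row for row, _, _ in result}))
```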

ColumnPrefixFilter

This filter takes one argument, a column prefix. It returns only those key-values present in a column whose qualifier starts with the specified prefix. The prefix is matched against the qualifier only (for example 'destination', not 'flightbetween:destination').

Syntax

ColumnPrefixFilter ('<column_prefix>')

Example of ColumnPrefixFilter (combined with PrefixFilter using AND)

hbase(main):042:0>  scan 'airline', {FILTER => "(PrefixFilter ('row2')) AND ColumnPrefixFilter('destination')"}
ROW     COLUMN+CELL
 row2   column=flightbetween:destination, timestamp=1411982226629, value=banglre
 1 row(s) in 0.0260 seconds
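The AND in the example can be modelled in plain Python as applying both filters in turn, since a cell must pass each of them. This is a hypothetical sketch (row_prefix and column_prefix are made-up names, not HBase APIs); note that the column prefix is compared with the qualifier alone:

```python
# Hypothetical (row, family, qualifier, value) cells from the 'airline' table.
CELLS = [
    ("row1", "flightbetween", "destination", "banglre"),
    ("row1", "flightbetween", "source", "hyd"),
    ("row2", "flightbetween", "destination", "banglre"),
    ("row2", "time", "date", "21/05/2014"),
]


def row_prefix(cells, prefix):
    """Model of PrefixFilter: match on the row key."""
    return [c for c in cells if c[0].startswith(prefix)]


def column_prefix(cells, prefix):
    """Model of ColumnPrefixFilter: match on the qualifier only, not family:qualifier."""
    return [c for c in cells if c[2].startswith(prefix)]


# "PrefixFilter('row2') AND ColumnPrefixFilter('destination')":
# a cell must pass both filters, so applying one after the other is equivalent.
result = column_prefix(row_prefix(CELLS, "row2"), "destination")
print(result)
```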


MultipleColumnPrefixFilter

This filter takes a list of column prefixes. It returns key-values present in a column whose qualifier starts with any of the specified prefixes. As with ColumnPrefixFilter, each prefix is matched against the qualifier only.

Syntax

MultipleColumnPrefixFilter ('<column_prefix>', '<column_prefix>', …, '<column_prefix>')

Example of MultipleColumnPrefixFilter

hbase(main):011:0> scan 'airline',{FILTER => "MultipleColumnPrefixFilter('source','destination','date')"}
ROW     COLUMN+CELL
 row1   column=flightbetween:destination, timestamp=1411981750093, value=banglre
 row1   column=flightbetween:source, timestamp=1411981724972, value=hyd
 row1   column=time:date, timestamp=1411981843455, value=20/05/2014
 row2   column=flightbetween:destination, timestamp=1411982226629, value=banglre
 row2   column=flightbetween:source, timestamp=1411982209701, value=hyd
 row2   column=time:date, timestamp=1411982261000, value=21/05/2014
 2 row(s) in 0.1600 seconds
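A hypothetical plain-Python model of MultipleColumnPrefixFilter (the function name is made up); it relies on str.startswith accepting a tuple of prefixes:

```python
# Hypothetical (row, family, qualifier, value) cells for one row of 'airline'.
CELLS = [
    ("row1", "flightbetween", "destination", "banglre"),
    ("row1", "flightbetween", "source", "hyd"),
    ("row1", "flightinfo", "airlines", "americanairlines"),
    ("row1", "time", "arrivaltime", "9am"),
    ("row1", "time", "date", "20/05/2014"),
]


def multiple_column_prefix(cells, prefixes):
    """Model of MultipleColumnPrefixFilter: keep a cell if its qualifier
    starts with any of the prefixes (str.startswith accepts a tuple)."""
    wanted = tuple(prefixes)
    return [c for c in cells if c[2].startswith(wanted)]


result = multiple_column_prefix(CELLS, ["source", "destination", "date"])
print([f"{fam}:{qual}" for _, fam, qual, _ in result])
```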

ColumnCountGetFilter

This filter takes one argument, a limit. It returns the first limit columns of each row, as the example below shows with a limit of 2.

Syntax

ColumnCountGetFilter (<limit>)

Example of ColumnCountGetFilter

hbase(main):018:0> scan 'airline',{FILTER => "ColumnCountGetFilter(2)"}
ROW     COLUMN+CELL
 row1   column=flightbetween:destination, timestamp=1411981750093, value=banglre
 row1   column=flightbetween:source, timestamp=1411981724972, value=hyd
 row2   column=flightbetween:destination, timestamp=1411982226629, value=banglre
 row2   column=flightbetween:source, timestamp=1411982209701, value=hyd
 2 row(s) in 0.0390 seconds
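A hypothetical plain-Python model of the per-row limit seen in the output above, assuming one version per column so that one cell counts as one column (column_count_get is a made-up name, not an HBase API):

```python
from itertools import groupby, islice

# Hypothetical (row, column, value) cells in HBase sort order.
CELLS = [
    ("row1", "flightbetween:destination", "banglre"),
    ("row1", "flightbetween:source", "hyd"),
    ("row1", "flightinfo:airlines", "americanairlines"),
    ("row2", "flightbetween:destination", "banglre"),
    ("row2", "flightbetween:source", "hyd"),
    ("row2", "flightinfo:airlines", "americanairlines"),
]


def column_count_get(cells, limit):
    """Model of ColumnCountGetFilter: keep the first `limit` cells of each row
    (assumes one version per column, so one cell == one column)."""
    out = []
    for _, row_cells in groupby(cells, key=lambda c: c[0]):
        out.extend(islice(row_cells, limit))
    return out


result = column_count_get(CELLS, 2)
for row, column, _ in result:
    print(row, column)
```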

PageFilter

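This filter takes one argument, a page size. It returns at most page-size rows from the table. Because the filter is applied separately on each region server, a scan that spans several regions can return more rows than the page size, so the client may need to trim the result.

Syntax

PageFilter (<page_size>)

Example of PageFilter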

hbase(main):020:0> scan 'airline',{FILTER => "PageFilter(1)"}
ROW     COLUMN+CELL
 row1   column=flightbetween:destination, timestamp=1411981750093, value=banglre
 row1   column=flightbetween:source, timestamp=1411981724972, value=hyd
 row1   column=flightinfo:airlines, timestamp=1411982131699, value=americanairlines
 row1   column=flightinfo:flightno, timestamp=1411982109827, value=12346
 row1   column=time:arrivaltime, timestamp=1411981821497, value=9am
 row1   column=time:date, timestamp=1411981843455, value=20/05/2014
 row1   column=time:depaturetime, timestamp=1411981808445, value=7am
 1 row(s) in 0.0460 seconds
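The following plain-Python sketch models why PageFilter is a per-region limit rather than a global one. The two-region split and all names are hypothetical; the point is that each region applies its own copy of the filter, so the client must trim the combined result:

```python
def page_filter(rows, page_size):
    """Model of one PageFilter instance: pass at most page_size rows."""
    return rows[:page_size]


# Hypothetical table split into two regions by row key.
REGIONS = [["row1", "row2"], ["row3", "row4"]]
PAGE_SIZE = 1

# Each region evaluates its own copy of the filter, so each region
# can contribute up to PAGE_SIZE rows to the scan.
combined = [row for region in REGIONS for row in page_filter(region, PAGE_SIZE)]

# Trimming on the client is what actually enforces the page size.
page = combined[:PAGE_SIZE]
print(combined, page)
```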

InclusiveStopFilter

This filter takes one argument, a row key at which to stop scanning. It returns all key-values in rows up to and including the specified row (unlike the STOPROW option of scan, which is exclusive).

Syntax

InclusiveStopFilter ('<stop_row_key>')

Example of InclusiveStopFilter

hbase(main):002:0> scan 'airline',{FILTER => "InclusiveStopFilter('row1')"}
ROW     COLUMN+CELL
 row1   column=flightbetween:destination, timestamp=1411981750093, value=banglre
 row1   column=flightbetween:source, timestamp=1411981724972, value=hyd
 row1   column=flightinfo:airlines, timestamp=1411982131699, value=americanairlines
 row1   column=flightinfo:flightno, timestamp=1411982109827, value=12346
 row1   column=time:arrivaltime, timestamp=1411981821497, value=9am
 row1   column=time:date, timestamp=1411981843455, value=20/05/2014
 row1   column=time:depaturetime, timestamp=1411981808445, value=7am
 1 row(s) in 0.0510 seconds
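A hypothetical plain-Python model contrasting InclusiveStopFilter with an exclusive stop row such as the shell's STOPROW option (function names are made up). It assumes ASCII row keys, for which Python string order matches HBase's byte order:

```python
from itertools import takewhile

# Row keys as a scan sees them: sorted (str order equals byte order for ASCII).
ROWS = ["row1", "row2", "row3"]


def inclusive_stop(rows, stop_row):
    """Model of InclusiveStopFilter: stop after passing stop_row, keeping it."""
    return list(takewhile(lambda r: r <= stop_row, rows))


def exclusive_stop(rows, stop_row):
    """Model of the scan's STOPROW option, which excludes stop_row."""
    return list(takewhile(lambda r: r < stop_row, rows))


print(inclusive_stop(ROWS, "row2"), exclusive_stop(ROWS, "row2"))
```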

QualifierFilter

This filter takes a compare operator and a comparator. It compares each qualifier name with the comparator using the compare operator and, if the comparison returns true, returns all the key-values in that column.

Syntax

QualifierFilter (<compareOp>, '<qualifier_comparator>')

Example of QualifierFilter

 hbase(main):017:0> scan 'airline',{ FILTER => "QualifierFilter(=,'binary:airlines')"}
ROW     COLUMN+CELL
 row1   column=flightinfo:airlines, timestamp=1411982131699, value=americanairlines
 row2   column=flightinfo:airlines, timestamp=1411982193228, value=americanairlines
 2 row(s) in 0.0540 seconds
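A hypothetical plain-Python model of QualifierFilter, limited to the = and != operators and the binary comparator used above (parse_binary and qualifier_filter are made-up names, not HBase APIs):

```python
# Hypothetical (row, family, qualifier, value) cells from the 'airline' table.
CELLS = [
    ("row1", "flightinfo", "airlines", "americanairlines"),
    ("row1", "flightinfo", "flightno", "12346"),
    ("row2", "flightinfo", "airlines", "americanairlines"),
]


def parse_binary(comparator):
    """Split a filter-language comparator such as 'binary:airlines'.
    Only the binary type is handled in this sketch."""
    kind, _, operand = comparator.partition(":")
    if kind != "binary":
        raise ValueError(f"unsupported comparator type: {kind}")
    return operand


def qualifier_filter(cells, op, comparator):
    """Model of QualifierFilter restricted to the = and != operators."""
    operand = parse_binary(comparator)
    if op == "=":
        return [c for c in cells if c[2] == operand]
    if op == "!=":
        return [c for c in cells if c[2] != operand]
    raise ValueError(f"operator not modelled here: {op}")


result = qualifier_filter(CELLS, "=", "binary:airlines")
print([(row, f"{fam}:{qual}") for row, fam, qual, _ in result])
```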

ValueFilter

This filter takes a compare operator and a comparator. It compares each value with the comparator using the compare operator and if the comparison returns true, it returns that key-value.

Syntax

ValueFilter (<compareOp>, '<value_comparator>')

All of the filters above are basic HBase shell filters. Examples of ValueFilter:

hbase(main):018:0> scan 'airline', { COLUMNS => 'flightbetween:source', LIMIT => 4, FILTER => "ValueFilter( =, 'binaryprefix:hyd' )" }
ROW     COLUMN+CELL
 row1   column=flightbetween:source, timestamp=1411981724972, value=hyd
 row2   column=flightbetween:source, timestamp=1411982209701, value=hyd
 2 row(s) in 0.0660 seconds

 hbase(main):044:0> scan 'airline' ,{ FILTER => " MultipleColumnPrefixFilter('source') AND (ValueFilter(=,'binary:hyd'))" } 
ROW     COLUMN+CELL
 row1   column=flightbetween:source, timestamp=1411981724972, value=hyd
 row2   column=flightbetween:source, timestamp=1411982209701, value=hyd
 2 row(s) in 0.1520 seconds
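The two examples differ only in the comparator: binary requires the whole value to equal the operand, while binaryprefix only compares the value's leading bytes. With the airline data both return the same rows, so this hypothetical plain-Python model adds a made-up row3 with value hyderabad to show the difference (function names are made up; only = and != are modelled):

```python
# Hypothetical (row, column, value) cells; row3 is invented for illustration.
CELLS = [
    ("row1", "flightbetween:source", "hyd"),
    ("row2", "flightbetween:source", "hyd"),
    ("row3", "flightbetween:source", "hyderabad"),
]


def matches(comparator, value):
    """Evaluate a 'binary:...' or 'binaryprefix:...' comparator against a value."""
    kind, _, operand = comparator.partition(":")
    if kind == "binary":
        return value == operand
    if kind == "binaryprefix":
        return value.startswith(operand)
    raise ValueError(f"comparator type not modelled here: {kind}")


def value_filter(cells, op, comparator):
    """Model of ValueFilter restricted to the = and != operators."""
    if op == "=":
        return [c for c in cells if matches(comparator, c[2])]
    if op == "!=":
        return [c for c in cells if not matches(comparator, c[2])]
    raise ValueError(f"operator not modelled here: {op}")


exact = value_filter(CELLS, "=", "binary:hyd")
prefix = value_filter(CELLS, "=", "binaryprefix:hyd")
print([c[0] for c in exact], [c[0] for c in prefix])
```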

SingleColumnValueFilter

This filter takes a column family, a qualifier, a compare operator and a comparator. If the specified column is not found, all the columns of that row will be emitted. If the column is found and the comparison with the comparator returns true, all the columns of the row will be emitted. If the condition fails, the row will not be emitted.

This filter also takes two additional optional boolean arguments: filterIfColumnMissing and setLatestVersionOnly.

If the filterIfColumnMissing flag is set to true, the columns of the row will not be emitted if the specified column to check is not found in the row. The default value is false.

If the setLatestVersionOnly flag is set to false, it will test previous versions (timestamps) too. The default value is true.

These flags are optional, but you must set either both or neither.

Syntax

SingleColumnValueFilter ('<family>', '<qualifier>', <compare operator>, '<comparator>', <filterIfColumnMissing_boolean>, <latest_version_boolean>)
SingleColumnValueFilter ('<family>', '<qualifier>', <compare operator>, '<comparator>')

Example of SingleColumnValueFilter

hbase(main):020:0> scan 'airline' ,{ FILTER => "SingleColumnValueFilter('flightbetween','source',=, 'binary:hyd')" } 
ROW     COLUMN+CELL
 row1   column=flightbetween:destination, timestamp=1411981750093, value=banglre
 row1   column=flightbetween:source, timestamp=1411981724972, value=hyd
 row1   column=flightinfo:airlines, timestamp=1411982131699, value=americanairlines
 row1   column=flightinfo:flightno, timestamp=1411982109827, value=12346
 row1   column=time:arrivaltime, timestamp=1411981821497, value=9am
 row1   column=time:date, timestamp=1411981843455, value=20/05/2014
 row1   column=time:depaturetime, timestamp=1411981808445, value=7am
 row2   column=flightbetween:destination, timestamp=1411982226629, value=banglre
 row2   column=flightbetween:source, timestamp=1411982209701, value=hyd
 row2   column=flightinfo:airlines, timestamp=1411982193228, value=americanairlines
 row2   column=flightinfo:flightno, timestamp=1411982183561, value=12346
 row2   column=time:arrivaltime, timestamp=1411982277561, value=10am
 row2   column=time:date, timestamp=1411982261000, value=21/05/2014
 row2   column=time:depaturetime, timestamp=1411982244265, value=8am
 2 row(s) in 0.0950 seconds
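A hypothetical plain-Python model of the row-level rules described above, including both optional flags (all names and the sample rows are made up). It assumes that older versions actually reach the filter, which in a real scan presumably requires asking for more than one version, for example with the shell's VERSIONS option:

```python
# Hypothetical rows: row key -> list of (family, qualifier, timestamp, value).
ROWS = {
    "row1": [("flightbetween", "source", 100, "hyd"),
             ("flightinfo", "flightno", 100, "12346")],
    # row2 holds two versions of flightbetween:source; the newest is "chennai".
    "row2": [("flightbetween", "source", 200, "chennai"),
             ("flightbetween", "source", 100, "hyd")],
    # row3 has no flightbetween:source column at all.
    "row3": [("flightinfo", "flightno", 100, "12347")],
}


def single_column_value(rows, family, qualifier, matches,
                        filter_if_missing=False, latest_version_only=True):
    """Model of SingleColumnValueFilter: keep or drop whole rows based on one column."""
    kept = {}
    for row_key, cells in rows.items():
        # Versions of the tested column, newest first.
        versions = sorted((c for c in cells if c[0] == family and c[1] == qualifier),
                          key=lambda c: c[2], reverse=True)
        if not versions:
            keep = not filter_if_missing
        elif latest_version_only:
            keep = matches(versions[0][3])
        else:
            keep = any(matches(v[3]) for v in versions)
        if keep:
            kept[row_key] = cells  # every column of the row is emitted
    return kept


def is_hyd(value):
    return value == "hyd"


defaults = single_column_value(ROWS, "flightbetween", "source", is_hyd)
strict = single_column_value(ROWS, "flightbetween", "source", is_hyd,
                             filter_if_missing=True, latest_version_only=False)
print(sorted(defaults), sorted(strict))
```

With the defaults, row3 is emitted because the tested column is missing, and row2 is dropped because only its newest version is checked; flipping both flags reverses both outcomes.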
