Reposted from: http://www.hadooptpoint.com/filters-in-hbase-shell/
Filters In Hbase Shell
The Filter Language was introduced in Apache HBase 0.92. It allows you to perform server-side filtering when accessing HBase over Thrift or in the HBase shell. You can find out more about shell integration by using the scan help command in the shell.
The show_filters command lists the filters available in the HBase shell:
hbase(main):010:0> show_filters
Documentation on filters mentioned below can be found at: https://our.intern.facebook.com/intern/wiki/index.php/HBase/Filter_Language
ColumnPrefixFilter
TimestampsFilter
PageFilter
MultipleColumnPrefixFilter
FamilyFilter
ColumnPaginationFilter
SingleColumnValueFilter
RowFilter
QualifierFilter
ColumnRangeFilter
ValueFilter
PrefixFilter
SingleColumnValueExcludeFilter
ColumnCountGetFilter
InclusiveStopFilter
DependentColumnFilter
FirstKeyOnlyFilter
KeyOnlyFilter
KeyOnlyFilter
This filter does not take any arguments. It returns only the key component of each key-value.
Syntax
KeyOnlyFilter ()
Example of keyonlyfilter
hbase(main):012:0> scan 'airline',{ FILTER => "KeyOnlyFilter()"}
ROW COLUMN+CELL
row1 column=flightbetween:destination, timestamp=1411981750093, value=
row1 column=flightbetween:source, timestamp=1411981724972, value=
row1 column=flightinfo:airlines, timestamp=1411982131699, value=
row1 column=flightinfo:flightno, timestamp=1411982109827, value=
row1 column=time:arrivaltime, timestamp=1411981821497, value=
row1 column=time:date, timestamp=1411981843455, value=
row1 column=time:depaturetime, timestamp=1411981808445, value=
row2 column=flightbetween:destination, timestamp=1411982226629, value=
row2 column=flightbetween:source, timestamp=1411982209701, value=
row2 column=flightinfo:airlines, timestamp=1411982193228, value=
row2 column=flightinfo:flightno, timestamp=1411982183561, value=
row2 column=time:arrivaltime, timestamp=1411982277561, value=
row2 column=time:date, timestamp=1411982261000, value=
row2 column=time:depaturetime, timestamp=1411982244265, value=
2 row(s) in 0.0770 seconds
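The effect of KeyOnlyFilter can be pictured with a small in-memory model. This is only a sketch in plain Ruby (the language the HBase shell is built on), not HBase's implementation; the CELLS constant and the key_only method are hypothetical names introduced for illustration, and timestamps are left out for brevity.

```ruby
# Hypothetical in-memory stand-in for part of the 'airline' table.
# Each cell is [row, family, qualifier, value].
CELLS = [
  ['row1', 'flightbetween', 'destination', 'banglre'],
  ['row1', 'flightbetween', 'source', 'hyd'],
  ['row2', 'flightbetween', 'destination', 'banglre'],
  ['row2', 'flightbetween', 'source', 'hyd']
].freeze

# Model of KeyOnlyFilter: keep each cell's key and blank out its value.
def key_only(cells)
  cells.map { |row, family, qualifier, _value| [row, family, qualifier, ''] }
end

key_only(CELLS).each do |row, family, qualifier, value|
  puts "#{row} column=#{family}:#{qualifier}, value=#{value}"
end
```

As in the shell output above, every cell is still listed but each value is empty, which is useful when only the row keys and column names are needed.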
FirstKeyOnlyFilter
This filter does not take any arguments. It returns only the first key-value from each row.
Syntax
FirstKeyOnlyFilter ()
Example of firstkeyonlyfilter
hbase(main):013:0> scan 'airline',{ FILTER => "FirstKeyOnlyFilter()"}
ROW COLUMN+CELL
row1 column=flightbetween:destination, timestamp=1411981750093, value=banglre
row2 column=flightbetween:destination, timestamp=1411982226629, value=banglre
2 row(s) in 0.0380 seconds
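Because FirstKeyOnlyFilter returns exactly one key-value per row, it is often used for cheap row counting. The following plain-Ruby sketch models that behaviour on an in-memory list; CELLS, first_key_only and count_rows are hypothetical names, and the code is an illustration of the semantics rather than HBase code.

```ruby
# Hypothetical in-memory stand-in for part of the 'airline' table.
# Each cell is [row, family, qualifier, value], already sorted by key.
CELLS = [
  ['row1', 'flightbetween', 'destination', 'banglre'],
  ['row1', 'flightbetween', 'source', 'hyd'],
  ['row1', 'flightinfo', 'airlines', 'americanairlines'],
  ['row2', 'flightbetween', 'destination', 'banglre'],
  ['row2', 'flightbetween', 'source', 'hyd']
].freeze

# Model of FirstKeyOnlyFilter: keep only the first cell of each row.
def first_key_only(cells)
  cells.group_by { |cell| cell[0] }.map { |_row, row_cells| row_cells.first }
end

# Counting rows only needs one cell per row.
def count_rows(cells)
  first_key_only(cells).length
end

first_key_only(CELLS).each { |cell| puts cell.join(' ') }
puts "rows: #{count_rows(CELLS)}"
```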
PrefixFilter
This filter takes one argument, a prefix of a row key. It returns only those key-values present in a row that starts with the specified row prefix.
Syntax
PrefixFilter ('<row_prefix>')
Example of prefixfilter
hbase(main):041:0> scan 'airline', {FILTER => "(PrefixFilter ('row2'))"}
ROW COLUMN+CELL
row2 column=flightbetween:destination, timestamp=1411982226629, value=banglre
row2 column=flightbetween:source, timestamp=1411982209701, value=hyd
row2 column=flightinfo:airlines, timestamp=1411982193228, value=americanairlines
row2 column=flightinfo:flightno, timestamp=1411982183561, value=12346
row2 column=time:arrivaltime, timestamp=1411982277561, value=10am
row2 column=time:date, timestamp=1411982261000, value=21/05/2014
row2 column=time:depaturetime, timestamp=1411982244265, value=8am
1 row(s) in 0.0710 seconds
ColumnPrefixFilter
This filter takes one argument, a column prefix. It returns only those key-values present in a column that starts with the specified column prefix. The column prefix must be of the form "qualifier", without the column family.
Syntax
ColumnPrefixFilter ('<column_prefix>')
Example of columnprefixfilter
hbase(main):042:0> scan 'airline', {FILTER => "(PrefixFilter ('row2')) AND ColumnPrefixFilter('destination')"}
ROW COLUMN+CELL
row2 column=flightbetween:destination, timestamp=1411982226629, value=banglre
1 row(s) in 0.0260 seconds
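The example above joins two filters with AND, so a key-value is returned only if both filters accept it. A minimal plain-Ruby model of that combination might look like the following; all method names and the CELLS data are hypothetical, and real HBase evaluates filters on the region servers rather than on a client-side list.

```ruby
# Hypothetical in-memory stand-in for part of the 'airline' table.
# Each cell is [row, family, qualifier, value].
CELLS = [
  ['row1', 'flightbetween', 'destination', 'banglre'],
  ['row1', 'flightbetween', 'source', 'hyd'],
  ['row2', 'flightbetween', 'destination', 'banglre'],
  ['row2', 'flightbetween', 'source', 'hyd']
].freeze

# Model of PrefixFilter: accept cells whose row key starts with the prefix.
def prefix_filter(row_prefix)
  ->(cell) { cell[0].start_with?(row_prefix) }
end

# Model of ColumnPrefixFilter: accept cells whose qualifier starts with the prefix.
def column_prefix_filter(qualifier_prefix)
  ->(cell) { cell[2].start_with?(qualifier_prefix) }
end

# Model of AND: a cell passes only if every filter accepts it.
def filter_and(*filters)
  ->(cell) { filters.all? { |filter| filter.call(cell) } }
end

# Model of a scan: return the cells accepted by the filter.
def scan(cells, filter)
  cells.select { |cell| filter.call(cell) }
end

combined = filter_and(prefix_filter('row2'), column_prefix_filter('destination'))
scan(CELLS, combined).each { |cell| puts cell.join(' ') }
```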
MultipleColumnPrefixFilter
This filter takes a list of column prefixes. It returns key-values that are present in a column that starts with any of the specified column prefixes. Each of the column prefixes must be of the form "qualifier", without the column family.
Syntax
MultipleColumnPrefixFilter ('<column_prefix>', '<column_prefix>', ..., '<column_prefix>')
Example of multiplecolumnprefixfilter
hbase(main):011:0> scan 'airline',{FILTER => "MultipleColumnPrefixFilter('source','destination','date')"}
ROW COLUMN+CELL
row1 column=flightbetween:destination, timestamp=1411981750093, value=banglre
row1 column=flightbetween:source, timestamp=1411981724972, value=hyd
row1 column=time:date, timestamp=1411981843455, value=20/05/2014
row2 column=flightbetween:destination, timestamp=1411982226629, value=banglre
row2 column=flightbetween:source, timestamp=1411982209701, value=hyd
row2 column=time:date, timestamp=1411982261000, value=21/05/2014
2 row(s) in 0.1600 seconds
ColumnCountGetFilter
This filter takes one argument, a limit. It returns at most the first limit columns of each row.
Syntax
ColumnCountGetFilter (<limit>)
Example of columncountgetfilter
hbase(main):018:0> scan 'airline',{FILTER => "ColumnCountGetFilter(2)"}
ROW COLUMN+CELL
row1 column=flightbetween:destination, timestamp=1411981750093, value=banglre
row1 column=flightbetween:source, timestamp=1411981724972, value=hyd
row2 column=flightbetween:destination, timestamp=1411982226629, value=banglre
row2 column=flightbetween:source, timestamp=1411982209701, value=hyd
2 row(s) in 0.0390 seconds
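The output above keeps the first two columns of every row. The plain-Ruby sketch below models exactly that observed behaviour on an in-memory list; CELLS and column_count_get are hypothetical names, and the code makes no claim about how HBase applies the limit internally.

```ruby
# Hypothetical in-memory stand-in for part of the 'airline' table.
# Each cell is [row, family, qualifier, value], already sorted by key.
CELLS = [
  ['row1', 'flightbetween', 'destination', 'banglre'],
  ['row1', 'flightbetween', 'source', 'hyd'],
  ['row1', 'flightinfo', 'airlines', 'americanairlines'],
  ['row2', 'flightbetween', 'destination', 'banglre'],
  ['row2', 'flightbetween', 'source', 'hyd']
].freeze

# Model of the behaviour seen in the example: keep the first `limit`
# cells of each row and drop the rest.
def column_count_get(cells, limit)
  cells.group_by { |cell| cell[0] }.flat_map { |_row, row_cells| row_cells.first(limit) }
end

column_count_get(CELLS, 2).each { |cell| puts cell.join(' ') }
```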
PageFilter
This filter takes one argument, a page size. It returns up to page-size rows from the table.
Syntax
PageFilter (<page_size>)
Example of pagefilter
hbase(main):020:0> scan 'airline',{FILTER => "PageFilter(1)"}
ROW COLUMN+CELL
row1 column=flightbetween:destination, timestamp=1411981750093, value=banglre
row1 column=flightbetween:source, timestamp=1411981724972, value=hyd
row1 column=flightinfo:airlines, timestamp=1411982131699, value=americanairlines
row1 column=flightinfo:flightno, timestamp=1411982109827, value=12346
row1 column=time:arrivaltime, timestamp=1411981821497, value=9am
row1 column=time:date, timestamp=1411981843455, value=20/05/2014
row1 column=time:depaturetime, timestamp=1411981808445, value=7am
1 row(s) in 0.0460 seconds
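PageFilter is typically paired with a start row to walk a table page by page: each new scan starts just after the last row of the previous page. The plain-Ruby sketch below models this on an in-memory list for a single region; scan_page, next_start_row and CELLS are hypothetical names. Appending a zero byte to the last row key is one common way to get the smallest key that sorts after it. On a real cluster the filter is applied separately on each region server, so a client may receive more than one page worth of rows and should enforce the limit itself.

```ruby
# Hypothetical in-memory stand-in for a table with three rows.
# Each cell is [row, family, qualifier, value], already sorted by key.
CELLS = [
  ['row1', 'flightbetween', 'destination', 'banglre'],
  ['row1', 'flightbetween', 'source', 'hyd'],
  ['row2', 'flightbetween', 'destination', 'banglre'],
  ['row2', 'flightbetween', 'source', 'hyd'],
  ['row3', 'flightbetween', 'source', 'del']
].freeze

# Model of a scan with a start row and PageFilter(page_size):
# return the cells of the first page_size rows at or after start_row.
def scan_page(cells, start_row, page_size)
  candidates = cells.select { |cell| cell[0] >= start_row }
  page_rows = candidates.map { |cell| cell[0] }.uniq.first(page_size)
  candidates.select { |cell| page_rows.include?(cell[0]) }
end

# Smallest row key that sorts strictly after last_row.
def next_start_row(last_row)
  last_row + "\x00"
end

# Walk the whole table two rows at a time.
start_row = ''
loop do
  page = scan_page(CELLS, start_row, 2)
  break if page.empty?
  puts page.map { |cell| cell[0] }.uniq.join(', ')
  start_row = next_start_row(page.last[0])
end
```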
InclusiveStopFilter
This filter takes one argument, a row key on which to stop scanning. It returns all key-values present in rows up to and including the specified row.
Syntax
InclusiveStopFilter ('<stop_row_key>')
Example of Inclusivestopfilter
hbase(main):002:0> scan 'airline',{FILTER => "InclusiveStopFilter('row1')"}
ROW COLUMN+CELL
row1 column=flightbetween:destination, timestamp=1411981750093, value=banglre
row1 column=flightbetween:source, timestamp=1411981724972, value=hyd
row1 column=flightinfo:airlines, timestamp=1411982131699, value=americanairlines
row1 column=flightinfo:flightno, timestamp=1411982109827, value=12346
row1 column=time:arrivaltime, timestamp=1411981821497, value=9am
row1 column=time:date, timestamp=1411981843455, value=20/05/2014
row1 column=time:depaturetime, timestamp=1411981808445, value=7am
1 row(s) in 0.0510 seconds
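The point of InclusiveStopFilter is the word "inclusive": a plain scan stop row is exclusive, so the stop row itself is not returned, whereas this filter returns it. The following plain-Ruby sketch contrasts the two on an in-memory list; the method names and CELLS are hypothetical and only model the boundary behaviour.

```ruby
# Hypothetical in-memory stand-in for part of the 'airline' table.
# Each cell is [row, family, qualifier, value], already sorted by key.
CELLS = [
  ['row1', 'flightbetween', 'destination', 'banglre'],
  ['row1', 'flightbetween', 'source', 'hyd'],
  ['row2', 'flightbetween', 'destination', 'banglre'],
  ['row2', 'flightbetween', 'source', 'hyd']
].freeze

# Model of an ordinary stop row: scanning ends before stop_row.
def exclusive_stop_scan(cells, stop_row)
  cells.take_while { |cell| cell[0] < stop_row }
end

# Model of InclusiveStopFilter: scanning ends after stop_row.
def inclusive_stop_scan(cells, stop_row)
  cells.take_while { |cell| cell[0] <= stop_row }
end

puts "exclusive: #{exclusive_stop_scan(CELLS, 'row1').length} cells"
puts "inclusive: #{inclusive_stop_scan(CELLS, 'row1').length} cells"
```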
QualifierFilter (and FamilyFilter)
This filter takes a compare operator and a comparator. It compares each qualifier name with the comparator using the compare operator and, if the comparison returns true, it returns all the key-values in that column. FamilyFilter takes the same arguments and applies the comparison to the column family name instead.
Syntax
QualifierFilter (<compareOp>, '<qualifier_comparator>')
Example of QualifierFilter
hbase(main):017:0> scan 'airline',{ FILTER => "QualifierFilter(=,'binary:airlines')"}
ROW COLUMN+CELL
row1 column=flightinfo:airlines, timestamp=1411982131699, value=americanairlines
row2 column=flightinfo:airlines, timestamp=1411982193228, value=americanairlines
2 row(s) in 0.0540 seconds
ValueFilter
This filter takes a compare operator and a comparator. It compares each value with the comparator using the compare operator and, if the comparison returns true, it returns that key-value.
Syntax
ValueFilter (<compareOp>, '<value_comparator>')
Examples of ValueFilter, on its own and combined with MultipleColumnPrefixFilter, one of the basic filters above
hbase(main):018:0> scan 'airline', { COLUMNS => 'flightbetween:source', LIMIT => 4, FILTER => "ValueFilter( =, 'binaryprefix:hyd' )" }
ROW COLUMN+CELL
row1 column=flightbetween:source, timestamp=1411981724972, value=hyd
row2 column=flightbetween:source, timestamp=1411982209701, value=hyd
2 row(s) in 0.0660 seconds
hbase(main):044:0> scan 'airline' ,{ FILTER => " MultipleColumnPrefixFilter('source') AND (ValueFilter(=,'binary:hyd'))" }
ROW COLUMN+CELL
row1 column=flightbetween:source, timestamp=1411981724972, value=hyd
row2 column=flightbetween:source, timestamp=1411982209701, value=hyd
2 row(s) in 0.1520 seconds
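The two examples above use different comparators: binary compares the whole value, while binaryprefix compares only the leading bytes. The plain-Ruby sketch below models both with the = operator only; comparator, value_filter and CELLS are hypothetical names, and the other compare operators and comparator types are deliberately left out.

```ruby
# Hypothetical in-memory stand-in for part of the 'airline' table.
# Each cell is [row, family, qualifier, value].
CELLS = [
  ['row1', 'flightbetween', 'destination', 'banglre'],
  ['row1', 'flightbetween', 'source', 'hyd'],
  ['row2', 'flightbetween', 'destination', 'banglre'],
  ['row2', 'flightbetween', 'source', 'hyd']
].freeze

# Turn a comparator string such as 'binary:hyd' into an equality test.
def comparator(spec)
  type, expected = spec.split(':', 2)
  case type
  when 'binary' then ->(value) { value == expected }
  when 'binaryprefix' then ->(value) { value.start_with?(expected) }
  else raise ArgumentError, "comparator not modelled: #{type}"
  end
end

# Model of ValueFilter(=, spec): keep the cells whose value matches.
def value_filter(cells, spec)
  matches = comparator(spec)
  cells.select { |cell| matches.call(cell[3]) }
end

puts value_filter(CELLS, 'binary:hy').length        # whole value must equal 'hy'
puts value_filter(CELLS, 'binaryprefix:hy').length  # value must start with 'hy'
```

With this data the binary comparator on 'hy' matches nothing, while the binaryprefix comparator matches both 'hyd' values.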
SingleColumnValueFilter
This filter takes a column family, a qualifier, a compare operator and a comparator. If the specified column is not found, all the columns of that row will be emitted. If the column is found and the comparison with the comparator returns true, all the columns of the row will be emitted. If the condition fails, the row will not be emitted.
This filter also takes two additional optional boolean arguments: filterIfColumnMissing and setLatestVersionOnly.
If the filterIfColumnMissing flag is set to true, the columns of the row will not be emitted if the specified column to check is not found in the row. The default value is false.
If the setLatestVersionOnly flag is set to false, it will test previous versions (timestamps) too. The default value is true.
These flags are optional, but you must set either both or neither.
Syntax
SingleColumnValueFilter ('<family>', '<qualifier>', <compare operator>, '<comparator>', <filterIfColumnMissing_boolean>, <latest_version_boolean>)
SingleColumnValueFilter ('<family>', '<qualifier>', <compare operator>, '<comparator>')
Example of SingleColumnValueFilter
hbase(main):020:0> scan 'airline' ,{ FILTER => "SingleColumnValueFilter('flightbetween','source',=, 'binary:hyd')" }
ROW COLUMN+CELL
row1 column=flightbetween:destination, timestamp=1411981750093, value=banglre
row1 column=flightbetween:source, timestamp=1411981724972, value=hyd
row1 column=flightinfo:airlines, timestamp=1411982131699, value=americanairlines
row1 column=flightinfo:flightno, timestamp=1411982109827, value=12346
row1 column=time:arrivaltime, timestamp=1411981821497, value=9am
row1 column=time:date, timestamp=1411981843455, value=20/05/2014
row1 column=time:depaturetime, timestamp=1411981808445, value=7am
row2 column=flightbetween:destination, timestamp=1411982226629, value=banglre
row2 column=flightbetween:source, timestamp=1411982209701, value=hyd
row2 column=flightinfo:airlines, timestamp=1411982193228, value=americanairlines
row2 column=flightinfo:flightno, timestamp=1411982183561, value=12346
row2 column=time:arrivaltime, timestamp=1411982277561, value=10am
row2 column=time:date, timestamp=1411982261000, value=21/05/2014
row2 column=time:depaturetime, timestamp=1411982244265, value=8am
2 row(s) in 0.0950 seconds
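Unlike ValueFilter, SingleColumnValueFilter decides per row: one column is tested and the whole row is kept or dropped. The plain-Ruby sketch below models that decision, including the filterIfColumnMissing flag, for the = operator with a binary comparator. The method name, the CELLS data and the extra row3 (which has no flightbetween:source column) are hypothetical, and the setLatestVersionOnly flag is not modelled because the data holds a single version per cell.

```ruby
# Hypothetical in-memory table. Each cell is [row, family, qualifier, value].
# row3 is an invented row that lacks the flightbetween:source column.
CELLS = [
  ['row1', 'flightbetween', 'destination', 'banglre'],
  ['row1', 'flightbetween', 'source', 'hyd'],
  ['row2', 'flightbetween', 'destination', 'banglre'],
  ['row2', 'flightbetween', 'source', 'hyd'],
  ['row3', 'flightinfo', 'airlines', 'americanairlines']
].freeze

# Model of SingleColumnValueFilter(family, qualifier, =, 'binary:expected').
# Rows whose tested column equals `expected` are emitted in full; rows
# without the column are emitted unless filter_if_missing is true.
def single_column_value(cells, family, qualifier, expected, filter_if_missing = false)
  kept = cells.group_by { |cell| cell[0] }.select do |_row, row_cells|
    tested = row_cells.find { |cell| cell[1] == family && cell[2] == qualifier }
    if tested.nil?
      !filter_if_missing
    else
      tested[3] == expected
    end
  end
  kept.values.flatten(1)
end

default_rows = single_column_value(CELLS, 'flightbetween', 'source', 'hyd')
strict_rows = single_column_value(CELLS, 'flightbetween', 'source', 'hyd', true)
puts default_rows.map { |cell| cell[0] }.uniq.join(', ')
puts strict_rows.map { |cell| cell[0] }.uniq.join(', ')
```

With the default flag the row that lacks the column is still returned, which is often surprising; passing true for filterIfColumnMissing restricts the result to rows that actually contain a matching value.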