The HBase shell scan command explained

Contents

  • namespace
    • Custom namespaces
    • Predefined namespaces
  • scan
    • Preparing the data
      • Creating the table
      • Loading the data
    • Query examples
      • COLUMNS
      • TIMERANGE
      • STARTROW STOPROW
      • REVERSED
      • ALL_METRICS or METRICS
      • ROWPREFIXFILTER(PrefixFilter)
      • QualifierFilter
      • ColumnPrefixFilter
      • ValueFilter
      • TimestampsFilter
      • RAW
      • FirstKeyOnlyFilter
      • Common packages

namespace

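A namespace is a logical grouping of tables, analogous to a database in a relational database system. This abstraction lays the groundwork for multi-tenancy features:

  • Quota management (HBASE-8410) - restricts the amount of resources (i.e. regions, tables) a namespace can consume.

  • Namespace security administration (HBASE-9206) - provides another level of security administration for tenants.

  • Region server groups (HBASE-6721) - a namespace/table can be pinned onto a subset of RegionServers, which guarantees a coarse level of isolation.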

Custom namespaces

Create a namespace

hbase(main):039:0> create_namespace 'my_ns'

Drop a namespace

hbase(main):039:0> drop_namespace 'my_ns'

Modify a namespace

hbase(main):049:0> alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}

Describe a namespace

hbase(main):049:0> describe_namespace 'my_ns'

List all namespaces

hbase(main):049:0> list_namespace

Create a table in a namespace

hbase(main):049:0> create 'my_ns:table1','info'

Drop a table in a namespace (the table must be disabled first)

hbase(main):052:0> disable 'my_ns:table1'
0 row(s) in 2.2520 seconds

hbase(main):053:0> drop 'my_ns:table1'
0 row(s) in 1.2360 seconds

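List the tables in a namespace

hbase(main):053:0> list_namespace_tables 'my_ns'

Predefined namespaces

There are two predefined special namespaces:

  • hbase - the system namespace, which holds HBase internal tables

  • default - tables created without an explicit namespace automatically fall into this namespace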

hbase(main):055:0> list_namespace
NAMESPACE                                                                         
default   
hbase  
my_ns  
3 row(s) in 0.0130 seconds
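
The rule that an unqualified table name falls into default can be sketched in plain Ruby, the language the HBase shell is built on. The helper split_table_name is hypothetical and written only for illustration; it is not part of the HBase shell or client API.

```ruby
# Hypothetical helper, not an HBase API: split a shell table name of the
# form 'namespace:table' into its namespace and table parts.
def split_table_name(name)
  namespace, separator, table = name.partition(':')
  # No ':' present means the table lives in the predefined 'default' namespace.
  separator.empty? ? ['default', name] : [namespace, table]
end

p split_table_name('my_ns:table1')
p split_table_name('my_test')
```

This is why 'my_test' in the scan examples can be written without a namespace prefix, while 'my_ns:table1' needs one.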

scan

Preparing the data

Creating the table

hbase(main):058:0> create 'my_test','f1','f2'
0 row(s) in 1.2390 seconds

=> Hbase::Table - my_test

Loading the data

hbase(main):060:0> put 'my_test','u1_td1','f1:a1','abc1'
0 row(s) in 0.0620 seconds

hbase(main):061:0> put 'my_test','u1_td2','f1:a1','abc199'
0 row(s) in 0.0050 seconds

hbase(main):062:0> put 'my_test','u1_td3','f1:b1','abc123'
0 row(s) in 0.0080 seconds

hbase(main):063:0> put 'my_test','u2_td4','f1:a1','abc2'
0 row(s) in 0.0040 seconds

hbase(main):064:0> put 'my_test','u2_td5','f1:a2','abc299'
0 row(s) in 0.0050 seconds

hbase(main):065:0> put 'my_test','u2_td6','f1:s2','abc222'
0 row(s) in 0.0050 seconds

Query examples

Saving HBase query results to a file:
echo "scan 'tablename', {LIMIT=>1}" | hbase shell > hbaseout1.txt

COLUMNS

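Specifies which column families or columns to query. The example queries columns a1 and b1 of column family f1; writing only f1 queries every column in family f1.

The sample outputs in this article were captured at different stages, so some of them show u1_td1 with its later value ab1, as well as the u2_err row that is added in the regular expression example near the end.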

hbase(main):019:0> scan 'my_test',{COLUMNS => ['f1:a1','f1:b1']}
ROW                     COLUMN+CELL                                                        
 u1_td1                 column=f1:a1, timestamp=1548136934397, value=ab1                   
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
 u2_err                 column=f1:b1, timestamp=1548139413694, value=abc888                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
5 row(s) in 0.0120 seconds

TIMERANGE

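Returns cells whose timestamp falls within the given range. The interval is closed on the left and open on the right: [min, max).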

hbase(main):021:0> scan 'my_test',{TIMERANGE=>[1548128540271,1548128614427]}
ROW                     COLUMN+CELL                                                        
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
3 row(s) in 0.0070 seconds
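
The half-open interval explains why u2_td6, whose timestamp equals the upper bound, is missing from the result. A minimal plain-Ruby model of that rule (not HBase code; the method name rows_in_time_range is made up for this sketch), using the timestamps of five of the sample rows:

```ruby
# Row key => timestamp of its single cell, copied from the scan outputs
# in this article.
CELL_TIMESTAMPS = {
  'u1_td2' => 1548128525390,
  'u1_td3' => 1548128540271,
  'u2_td4' => 1548128558845,
  'u2_td5' => 1548128587695,
  'u2_td6' => 1548128614427
}.freeze

# Model of TIMERANGE => [min, max]: min is inclusive, max is exclusive.
def rows_in_time_range(cells, min_ts, max_ts)
  cells.select { |_row, ts| ts >= min_ts && ts < max_ts }.keys
end

p rows_in_time_range(CELL_TIMESTAMPS, 1548128540271, 1548128614427)
```

To include u2_td6 in a real scan, the upper bound would have to be at least one millisecond larger than its timestamp.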

STARTROW STOPROW

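Looks up data by rowkey range. STARTROW is inclusive and STOPROW is exclusive, which is why u2_td5 does not appear in the result.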

hbase(main):095:0> scan 'my_test',{STARTROW=>'u1_td2',STOPROW=>'u2_td5'}
ROW                     COLUMN+CELL                                                        
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
3 row(s) in 0.0050 seconds
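
Rowkeys are compared byte by byte, so the range behaves like a string range with an excluded end. The sketch below models this in plain Ruby (rows_in_key_range is an illustrative name, not an HBase call). Its key list also contains u2_err, a row that is only added later in this article, to show where it sorts.

```ruby
ROW_KEYS = %w[u1_td1 u1_td2 u1_td3 u2_err u2_td4 u2_td5 u2_td6].freeze

# Model of STARTROW/STOPROW: rows are ordered byte-wise by key,
# the start row is included and the stop row is excluded.
def rows_in_key_range(row_keys, start_row, stop_row)
  row_keys.sort.select { |key| key >= start_row && key < stop_row }
end

p rows_in_key_range(ROW_KEYS, 'u1_td2', 'u2_td5')
```

Because 'e' sorts before 't', u2_err lands between u1_td3 and u2_td4 once it exists in the table.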

REVERSED

Returns the results in reverse (descending) rowkey order.

hbase(main):022:0> scan 'my_test',{TIMERANGE=>[1548128540271,1548128614427],REVERSED => true}
ROW                     COLUMN+CELL                                                        
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
3 row(s) in 0.0120 seconds

ALL_METRICS or METRICS

Shows metrics about the scan execution. Setting ALL_METRICS to true returns all metrics; METRICS returns only the metrics listed.

hbase(main):023:0> scan 'my_test',{ALL_METRICS => true}
ROW                     COLUMN+CELL                                                        
 u1_td1                 column=f1:a1, timestamp=1548136934397, value=ab1                   
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
 u2_err                 column=f1:b1, timestamp=1548139413694, value=abc888                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
 u2_td6                 column=f1:s2, timestamp=1548128614427, value=abc222                
7 row(s) in 0.0390 seconds

METRIC                  VALUE                                                              
 BYTES_IN_REMOTE_RESULT 275                                                                
 S                                                                                         
 BYTES_IN_RESULTS       275                                                                
 MILLIS_BETWEEN_NEXTS   11                                                                 
 NOT_SERVING_REGION_EXC 0                                                                  
 EPTION                                                                                    
 REGIONS_SCANNED        1                                                                  
 REMOTE_RPC_CALLS       3                                                                  
 REMOTE_RPC_RETRIES     0                                                                  
 ROWS_FILTERED          0                                                                  
 ROWS_SCANNED           7                                                                  
 RPC_CALLS              3                                                                  
 RPC_RETRIES            0                                                                  

hbase(main):024:0> scan 'my_test',{METRICS => ['ROWS_SCANNED','RPC_CALLS']}
ROW                     COLUMN+CELL                                                        
 u1_td1                 column=f1:a1, timestamp=1548136934397, value=ab1                   
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
 u2_err                 column=f1:b1, timestamp=1548139413694, value=abc888                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
 u2_td6                 column=f1:s2, timestamp=1548128614427, value=abc222                
7 row(s) in 0.0100 seconds

METRIC                  VALUE                                                              
 ROWS_SCANNED           7                                                                  
 RPC_CALLS              3         

ROWPREFIXFILTER(PrefixFilter)

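Returns the rows whose rowkey starts with the given prefix. The ROWPREFIXFILTER option and the PrefixFilter filter give the same result here.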

hbase(main):040:0> scan 'my_test',{ROWPREFIXFILTER => 'u2'}
ROW                     COLUMN+CELL                                                        
 u2_err                 column=f1:b1, timestamp=1548139413694, value=abc888                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
 u2_td6                 column=f1:s2, timestamp=1548128614427, value=abc222                
4 row(s) in 0.0060 seconds

hbase(main):029:0> scan 'my_test',{FILTER => "PrefixFilter('u2')"}
ROW                     COLUMN+CELL                                                        
 u2_err                 column=f1:b1, timestamp=1548139413694, value=abc888                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
 u2_td6                 column=f1:s2, timestamp=1548128614427, value=abc222                
4 row(s) in 0.0070 seconds
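
A prefix scan can also be thought of as a rowkey range: start at the prefix and stop at the smallest key that sorts after every key with that prefix. The plain-Ruby sketch below illustrates the idea; stop_row_for_prefix is a name chosen for this example and the code is a model of the technique, not the HBase implementation.

```ruby
# Sketch: smallest row key that sorts after every key starting with `prefix`.
# Trailing 0xFF bytes cannot be incremented, so they are dropped first.
def stop_row_for_prefix(prefix)
  bytes = prefix.bytes
  bytes.pop while bytes.last == 0xFF
  return nil if bytes.empty? # no upper bound: scan to the end of the table

  bytes[-1] += 1
  bytes.pack('C*').force_encoding(prefix.encoding)
end

PREFIX_ROW_KEYS = %w[u1_td1 u1_td2 u1_td3 u2_err u2_td4 u2_td5 u2_td6].freeze

stop_row = stop_row_for_prefix('u2')
p stop_row
p PREFIX_ROW_KEYS.select { |key| key >= 'u2' && key < stop_row }
```

For the prefix 'u2' the computed stop row is 'u3', so the range ['u2', 'u3') selects the same four rows as the two scans above.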

QualifierFilter

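Filters by column qualifier; either an exact qualifier or a range of qualifiers can be given. The binary comparator compares against the whole qualifier, while the substring comparator matches qualifiers that contain the given string.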

hbase(main):081:0>  scan 'my_test',{FILTER => "(QualifierFilter (<,'binary:b1')) AND (QualifierFilter (=,'substring:1'))"}
ROW                     COLUMN+CELL                                                        
 u1_td1                 column=f1:a1, timestamp=1548136934397, value=ab1                   
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
3 row(s) in 0.0060 seconds
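
The two comparators in the filter string above can be modelled in plain Ruby over the qualifiers that exist in the sample table. This is only a sketch of the matching rules; the constant and method names are invented for the example.

```ruby
# Column qualifiers present in family f1 of the sample table.
QUALIFIERS = %w[a1 a2 b1 s2].freeze

# (<, 'binary:b1'): byte-wise comparison against the whole qualifier.
LESS_THAN_B1 = ->(qualifier) { (qualifier <=> 'b1').negative? }
# (=, 'substring:1'): true when the qualifier contains the string.
CONTAINS_1 = ->(qualifier) { qualifier.include?('1') }

# AND in the filter string keeps only qualifiers accepted by every filter.
def qualifiers_passing(qualifiers, *predicates)
  qualifiers.select { |q| predicates.all? { |predicate| predicate.call(q) } }
end

p qualifiers_passing(QUALIFIERS, LESS_THAN_B1, CONTAINS_1)
```

Only a1 passes both conditions, which matches the scan output: every returned cell is in column f1:a1.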

ColumnPrefixFilter

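Returns the columns whose qualifier starts with the given prefix. The example combines it with two ValueFilter conditions using AND and OR.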

hbase(main):073:0>  scan 'my_test',{FILTER=>"ColumnPrefixFilter('a') AND (ValueFilter(=,'substring:9') OR ValueFilter(=,'substring:2'))"}
ROW                     COLUMN+CELL                                                        
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
3 row(s) in 0.0060 seconds

ValueFilter

Filters by cell value; either an exact value or a range of values can be given.

hbase(main):066:0> scan 'my_test',{FILTER=>"ValueFilter(=,'binary:abc1')"}
ROW                   COLUMN+CELL                                                 
 u1_td1               column=f1:a1, timestamp=1548128506440, value=abc1           
1 row(s) in 0.0420 seconds

TimestampsFilter

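Returns only the cells whose timestamp exactly equals one of the listed timestamps. The arguments are a list of individual timestamps, not a range; use TIMERANGE to query a range.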

hbase(main):071:0> scan 'my_test',{FILTER => "TimestampsFilter(1548128525390,1548128614427)"}
ROW                     COLUMN+CELL                                                        
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u2_td6                 column=f1:s2, timestamp=1548128614427, value=abc222                
2 row(s) in 0.0070 seconds
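
The output above contains only two rows even though several other cells have timestamps between the two arguments. A plain-Ruby model of the exact-match rule (rows_with_timestamps is an illustrative name, not an HBase call):

```ruby
# Row key => timestamp of its single cell, copied from the scan outputs
# in this article.
SAMPLE_TIMESTAMPS = {
  'u1_td2' => 1548128525390,
  'u1_td3' => 1548128540271,
  'u2_td4' => 1548128558845,
  'u2_td5' => 1548128587695,
  'u2_td6' => 1548128614427
}.freeze

# Model of TimestampsFilter(t1, t2, ...): keep a cell only when its
# timestamp is exactly one of the listed values.
def rows_with_timestamps(cells, *timestamps)
  cells.select { |_row, ts| timestamps.include?(ts) }.keys
end

p rows_with_timestamps(SAMPLE_TIMESTAMPS, 1548128525390, 1548128614427)
```

Rows u1_td3, u2_td4 and u2_td5 lie between the two timestamps but are not returned, because the filter does not treat its arguments as interval bounds.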

RAW

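Instructs the scanner to return all cells, including delete markers and deleted cells that have not yet been collected. This option cannot be combined with requesting specific columns. It is disabled by default. In the example, column f1:a1 of row u1_td1 has been deleted, so the normal scan no longer shows the row, while the raw scan shows the delete marker and both earlier versions.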

hbase(main):023:0> scan 'my_test'
ROW                     COLUMN+CELL                                                        
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
 u2_err                 column=f1:b1, timestamp=1548139413694, value=abc888                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
 u2_td6                 column=f1:s2, timestamp=1548128614427, value=abc222                
6 row(s) in 0.0090 seconds

hbase(main):024:0> scan 'my_test',{RAW => true,VERSIONS => 2}
ROW                     COLUMN+CELL                                                        
 u1_td1                 column=f1:a1, timestamp=1548226315249, type=DeleteColumn           
 u1_td1                 column=f1:a1, timestamp=1548136934397, value=ab1                   
 u1_td1                 column=f1:a1, timestamp=1548128506440, value=abc1                  
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
 u2_err                 column=f1:b1, timestamp=1548139413694, value=abc888                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
 u2_td6                 column=f1:s2, timestamp=1548128614427, value=abc222                
7 row(s) in 0.0120 seconds
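
Why u1_td1 disappears from the normal scan can be modelled in plain Ruby. The sketch below is a simplified model of one column under a DeleteColumn marker, not HBase code, and the name visible_cells is made up for the example.

```ruby
# Cells of row u1_td1, column f1:a1, newest first, as listed by the raw scan.
RAW_CELLS = [
  { ts: 1548226315249, type: :delete_column },
  { ts: 1548136934397, type: :put, value: 'ab1' },
  { ts: 1548128506440, type: :put, value: 'abc1' }
].freeze

# Simplified model of a normal (non-raw) scan of one column: a DeleteColumn
# marker hides every put whose timestamp is less than or equal to its own.
def visible_cells(cells)
  marker_ts = cells.select { |c| c[:type] == :delete_column }.map { |c| c[:ts] }.max
  cells.select { |c| c[:type] == :put && (marker_ts.nil? || c[:ts] > marker_ts) }
end

p visible_cells(RAW_CELLS)           # what a normal scan would return
p RAW_CELLS.map { |c| c[:type] }     # a raw scan returns all three cells
```

Both puts are older than the marker, so nothing is visible until a newer value is written.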

FirstKeyOnlyFilter

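A row can hold several versions, and the same column of a row can hold several values. FirstKeyOnlyFilter returns only the first version of the first column of each row.
KeyOnlyFilter: returns only the key of each cell, without its value.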

hbase(main):081:0> scan 'my_test',FILTER => "FirstKeyOnlyFilter() AND ValueFilter(=,'binary:abc199') AND KeyOnlyFilter()"
ROW                   COLUMN+CELL                                                 
 u1_td2               column=f1:a1, timestamp=1548128525390, value=               
1 row(s) in 0.0140 seconds
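
Every row in my_test has a single column, so the effect of the two filters is easier to see on a row with more than one. The plain-Ruby sketch below uses a hypothetical two-column row that is not part of the sample data, and models each filter on its own; it does not attempt to reproduce how HBase evaluates the combined filter string above.

```ruby
# Hypothetical row with two columns (not part of the my_test data);
# cells are listed in the sorted order a scan would return them.
DEMO_ROW = [
  { column: 'f1:a1', value: 'abc1' },
  { column: 'f1:b1', value: 'abc123' }
].freeze

# FirstKeyOnlyFilter: keep only the first cell of the row.
def first_key_only(cells)
  cells.first(1)
end

# KeyOnlyFilter: keep the key (here just the column) but blank out the value.
def key_only(cells)
  cells.map { |cell| cell.merge(value: '') }
end

p first_key_only(DEMO_ROW)
p key_only(first_key_only(DEMO_ROW))
```

The second line of output has an empty value, like the value= column in the scan result above.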

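Common packages

Limiting the number of columns returned
In ColumnPaginationFilter.new(5, 1), the first argument is the maximum number of columns to return per row and the second is the offset of the column to start from (counting from 0).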

hbase(main):012:0> scan 'test001', {LIMIT => 1}
ROW                     COLUMN+CELL                                                        
 36.56.0.0_10000120     column=f1:bts, timestamp=1545897321394, value=10704                
 36.56.0.0_10000120     column=f1:dip, timestamp=1545897321394, value=36.56.0.211          
 36.56.0.0_10000120     column=f1:dport, timestamp=1545897321394, value=81085              
 36.56.0.0_10000120     column=f1:pk, timestamp=1545897321394, value=2                     
 36.56.0.0_10000120     column=f1:sip, timestamp=1545897321394, value=36.56.0.0            
 36.56.0.0_10000120     column=f1:sport, timestamp=1545897321394, value=12790              
 36.56.0.0_10000120     column=f1:ts, timestamp=1545897321394, value=1545896770661         
1 row(s) in 0.0180 seconds

hbase(main):014:0> import org.apache.hadoop.hbase.filter.ColumnPaginationFilter
=> Java::OrgApacheHadoopHbaseFilter::ColumnPaginationFilter

hbase(main):015:0> scan 'test001', {FILTER =>ColumnPaginationFilter.new(5, 1),LIMIT => 1}
ROW                     COLUMN+CELL                                                        
 36.56.0.0_10000120     column=f1:dip, timestamp=1545897321394, value=36.56.0.211          
 36.56.0.0_10000120     column=f1:dport, timestamp=1545897321394, value=81085              
 36.56.0.0_10000120     column=f1:pk, timestamp=1545897321394, value=2                     
 36.56.0.0_10000120     column=f1:sip, timestamp=1545897321394, value=36.56.0.0            
 36.56.0.0_10000120     column=f1:sport, timestamp=1545897321394, value=12790              
1 row(s) in 0.0110 seconds
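
The limit and offset arguments behave like pagination over the sorted columns of each row. A plain-Ruby model over the seven qualifiers of the sample row (paginate_columns is an illustrative name, not an HBase call):

```ruby
# Qualifiers of the sample row in family f1, in the sorted order shown
# by the unfiltered scan.
COLUMNS_OF_ROW = %w[bts dip dport pk sip sport ts].freeze

# Model of ColumnPaginationFilter.new(limit, offset): skip `offset` columns,
# then return at most `limit` columns of the row.
def paginate_columns(columns, limit, offset)
  columns.drop(offset).first(limit)
end

p paginate_columns(COLUMNS_OF_ROW, 5, 1)
```

With limit 5 and offset 1, bts is skipped and ts falls outside the page, which matches the five columns in the scan output.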

Finding rows whose rowkey contains td3

hbase(main):097:0> import org.apache.hadoop.hbase.filter.CompareFilter
=> Java::OrgApacheHadoopHbaseFilter::CompareFilter

hbase(main):098:0> import org.apache.hadoop.hbase.filter.SubstringComparator
=> Java::OrgApacheHadoopHbaseFilter::SubstringComparator

hbase(main):099:0>  import org.apache.hadoop.hbase.filter.RowFilter
=> Java::OrgApacheHadoopHbaseFilter::RowFilter

hbase(main):101:0> scan 'my_test',{FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),SubstringComparator.new('td3'))}
ROW                     COLUMN+CELL                                                        
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
1 row(s) in 0.0100 seconds

Regular expressions
Add one more test row:

hbase(main):001:0> put 'my_test','u2_err','f1:b1','abc888'
0 row(s) in 0.2470 seconds

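Query the rows whose rowkey has the form u<digits>_td<digits>. The newly added test row u2_err starts with u but does not match the regular expression, so it is not returned.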

hbase(main):003:0> import org.apache.hadoop.hbase.filter.RegexStringComparator
=> Java::OrgApacheHadoopHbaseFilter::RegexStringComparator

hbase(main):004:0> import org.apache.hadoop.hbase.filter.CompareFilter
=> Java::OrgApacheHadoopHbaseFilter::CompareFilter

hbase(main):006:0> import org.apache.hadoop.hbase.filter.SubstringComparator
=> Java::OrgApacheHadoopHbaseFilter::SubstringComparator

hbase(main):008:0> import org.apache.hadoop.hbase.filter.RowFilter
=> Java::OrgApacheHadoopHbaseFilter::RowFilter

hbase(main):010:0> scan 'my_test', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),RegexStringComparator.new('^u\d+\_td\d+$'))}
ROW                     COLUMN+CELL                                                        
 u1_td1                 column=f1:a1, timestamp=1548136934397, value=ab1                   
 u1_td2                 column=f1:a1, timestamp=1548128525390, value=abc199                
 u1_td3                 column=f1:b1, timestamp=1548128540271, value=abc123                
 u2_td4                 column=f1:a1, timestamp=1548128558845, value=abc2                  
 u2_td5                 column=f1:a2, timestamp=1548128587695, value=abc299                
 u2_td6                 column=f1:s2, timestamp=1548128614427, value=abc222                
6 row(s) in 0.0110 seconds
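
The pattern can be tried against the rowkeys in plain Ruby before running it on the cluster. This sketch uses Ruby's own regular expression engine rather than the Java one behind RegexStringComparator, which behaves the same for a simple pattern like this; the backslash before the underscore in the original pattern is redundant and is dropped here.

```ruby
REGEX_ROW_KEYS = %w[u1_td1 u1_td2 u1_td3 u2_err u2_td4 u2_td5 u2_td6].freeze

# Same pattern as in the scan above: 'u', digits, '_td', digits.
# The ^ and $ anchors require the whole rowkey to have this shape.
ROW_PATTERN = /^u\d+_td\d+$/

p REGEX_ROW_KEYS.grep(ROW_PATTERN)    # keys the scan would return
p REGEX_ROW_KEYS.grep_v(ROW_PATTERN)  # keys filtered out
```

Only u2_err is filtered out, because 'err' does not fit the _td<digits> part of the pattern.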
