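I. Introduction
HBase Shell exposes most HBase commands. With it you can create, drop, and alter tables, write data into tables, and list information about them. This section covers some commonly used commands and concrete operations, and shows how to build a "student grades" table from the command line.
The shell commands are numerous and hard to summarize briefly. The best approach I have found is to use the help command: once you understand HBase's column-oriented storage, a few tries are enough to get the hang of it.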
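II. HBase Internal Storage Structure
The figure below is taken from: https://blog.csdn.net/u010416101/article/details/89186320
In HBase, data is stored in the form <row key><column family 1: column 1-1, column 1-2><column family 2: column 2-1, column 2-2>. First, rows are sorted by row key in lexicographic order, which matters a great deal for lookups. Second, the value of a column within the same row and column family can change over time, and its versions are ordered by timestamp. (Some versions are removed when data files are compacted.)
The relevant concepts are as follows: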
Row Key
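Row Key: the primary key used to retrieve records. There are three main ways to access rows in an HBase table: by a single row key, by a row key range, or by a full table scan. A row key can be any string (maximum length 64 KB; in practice usually 10 to 100 bytes). Rows are sorted by row key in lexicographic order, which matters a great deal for lookups. (Note on lexicographic order: keys are compared character by character, so the keys 1, 10, 12, 6, 7, 9 sort in exactly that order, and 11 sorts before 9.)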
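To make the ordering rule concrete, here is a minimal Ruby sketch (the HBase shell is JRuby, so plain Ruby strings compare the same way). The keys and the zero-padding width are illustrative assumptions, not something HBase requires.

```ruby
# Row keys compare byte by byte, like Ruby strings, not like numbers.
keys = %w[9 1 12 6 11 10 7]

lexicographic = keys.sort            # the order HBase would store these keys in
numeric       = keys.sort_by(&:to_i) # the order a human might expect

# Zero-padding to a fixed width (an illustrative convention) makes
# lexicographic order agree with numeric order.
padded = keys.map { |k| k.rjust(3, '0') }.sort

puts lexicographic.inspect
puts numeric.inspect
puts padded.inspect
```

If numeric ordering of row keys matters for range scans, padding keys to a fixed width when writing them is one common way to get it.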
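Column Family
Column Family: every column in HBase belongs to a column family. Column families are part of the schema (that is, the table design), while columns are not: columns can be added dynamically when data is inserted. A column family must be defined before it is used. Every column name is prefixed with its column family, for example course:name and course:age.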
Cell
Cell: the unit of data, uniquely identified by {row key, column family:qualifier, version}.
Time Stamp
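Each cell stores multiple versions of the same data. Versions are indexed by timestamp (millisecond precision), which is a 64-bit integer. Within a cell, versions are sorted by timestamp in descending order, so the newest version comes first.
Version cleanup: HBase can either keep the last n versions of the data, or keep the versions from a recent period (for example, the last seven days).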
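The two cleanup policies can be sketched in plain Ruby. This is only a model of the idea: the hash layout and the helper names newest_first, keep_last_n, and keep_recent are hypothetical and not part of HBase.

```ruby
# Sort a cell's versions so that the newest timestamp comes first.
def newest_first(versions)
  versions.sort_by { |v| -v[:ts] }
end

# Policy 1: keep only the newest max_versions versions.
def keep_last_n(versions, max_versions)
  newest_first(versions).first(max_versions)
end

# Policy 2: keep only versions younger than ttl_ms, measured from now_ms.
def keep_recent(versions, now_ms, ttl_ms)
  newest_first(versions).select { |v| now_ms - v[:ts] < ttl_ms }
end

# One cell with three versions (timestamps in milliseconds, made up).
cell = [
  { ts: 1_000, value: 'zhang' },
  { ts: 3_000, value: 'zhang3' },
  { ts: 2_000, value: 'zhang2' }
]

last_two = keep_last_n(cell, 2).map { |v| v[:value] }
recent   = keep_recent(cell, 3_500, 1_000).map { |v| v[:value] }

puts last_two.inspect
puts recent.inspect
```

In the shell these policies correspond to the VERSIONS and TTL column family attributes that appear in the create examples.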
III. HBase Shell Command List
1. Viewing the command list (if you forget, use help)
The help command lists all commands.
Usage 1: help
Usage 2: help "COMMAND"
Usage 3: help "COMMAND_GROUP"
Examples:
help
help "get"
help "ddl"
Common shell command groups and their commands:
Group name: general
Commands: processlist, status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, clear_slowlog_responses, close_region, compact, compact_rs, compaction_state, compaction_switch, decommission_regionservers, flush, get_largelog_responses, get_slowlog_responses, hbck_chore_run, is_in_maintenance_mode, list_deadservers, list_decommissioned_regionservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, recommission_regionserver, regioninfo, rit, snapshot_cleanup_enabled, snapshot_cleanup_switch, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_exclude_namespaces, append_peer_exclude_tableCFs, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_exclude_namespaces, remove_peer_exclude_tableCFs, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: disable_exceed_throttle_quota, disable_rpc_throttle, enable_exceed_throttle_quota, enable_rpc_throttle, list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota
Group name: security
Commands: grant, list_security_capabilities, revoke, user_permission
Group name: procedures
Commands: list_locks, list_procedures
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
Group name: rsgroup
Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup, rename_rsgroup
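2. Usage
All names must be enclosed in single or double quotes, and arguments are separated by commas.
Press Enter to run a command.
When creating or altering a table, use Ruby hash syntax: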
{'key1' => 'value1', 'key2' => 'value2', ...}
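Keys that are shell keywords such as NAME, VERSIONS, and COMPRESSION do not need quotes.
To express binary keys or values, use hexadecimal or octal escapes inside double-quoted strings: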
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
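Because the shell is Ruby, these escapes follow Ruby string rules. The plain Ruby sketch below shows which bytes the double-quoted forms produce and why single quotes do not work for binary data. The variable names are illustrative.

```ruby
hex   = "key\x03\x3f\xcd"   # hexadecimal escapes, as in the first get example
octal = "\003\023\011"      # octal escapes, as in the second get example
raw   = 'key\x03'           # single quotes: the backslash sequence stays literal

hex_bytes   = hex.bytes
octal_bytes = octal.bytes

puts hex_bytes.inspect
puts octal_bytes.inspect
puts raw.bytesize
```

The single-quoted string is seven bytes long because \x03 is stored as four ordinary characters rather than as one byte.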
IV. Experiment Data Model
This experiment uses the following table structure, column families, and columns:
create 'student','info','score'
put 'student','1','info:name','zhang'
put 'student','1','info:age','18'
put 'student','1','info:sex','male'
put 'student','1','score:math','89'
put 'student','1','score:eng','91'
put 'student','1','score:phy','88'
put 'student','1','score:chem','99'
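As a mental model, the table built by the commands above can be pictured as a nested map: row key, then column family, then column qualifier, then value. The Ruby sketch below is only that model (it ignores versions and timestamps), and put_cell is a hypothetical helper, not a shell command.

```ruby
# row key -> column family -> qualifier -> value (latest version only)
student = Hash.new do |rows, row|
  rows[row] = Hash.new { |families, family| families[family] = {} }
end

# Mimics: put 'student', row, 'family:qualifier', value
def put_cell(table, row, column, value)
  family, qualifier = column.split(':', 2)
  table[row][family][qualifier] = value
end

put_cell(student, '1', 'info:name', 'zhang')
put_cell(student, '1', 'info:age', '18')
put_cell(student, '1', 'info:sex', 'male')
put_cell(student, '1', 'score:math', '89')
put_cell(student, '1', 'score:eng', '91')
put_cell(student, '1', 'score:phy', '88')
put_cell(student, '1', 'score:chem', '99')

puts student['1']['info'].inspect
puts student['1']['score'].keys.inspect
```

Note that only the families info and score are declared in the create command; the qualifiers such as name or math come into existence when they are first written.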
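V. Common HBase Shell Commands
1. general
processlist: show the task list
status: show the cluster status
version: show the HBase version
whoami: show the current user
table_help: show help for table-reference commands
Try it: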
hbase> t = create 't', 'cf'
hbase> t = get_table 't'
hbase> t.put 'r', 'cf:q', 'v'
hbase> t.scan
hbase> t.help 'scan'
hbase> t.enable
hbase> t.flush
hbase> t.disable
hbase> t.drop
2. ddl
create: create a table
help 'create' (the examples marked '***' are worth trying)
Create a table with namespace=ns1 and table qualifier=t1
*** hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
Create a table with namespace=default and table qualifier=t1
*** hbase> create 't2', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
# The above in shorthand would be the following:
*** hbase> create 't3', 'f1', 'f2', 'f3'
hbase> create 't4', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
Table configuration options can be put at the end.
Examples:
*** hbase> create 'ns1:t7', 'f1', SPLITS => ['10', '20', '30', '40']
hbase> create 't10', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'hadoop'
*** hbase> create 't11', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
hbase> # Optionally pre-split the table into NUMREGIONS, using
hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
hbase> create 't12', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
hbase> create 't13', 'f1', {SPLIT_ENABLED => false, MERGE_ENABLED => false}
hbase> create 't14', {NAME => 'f1', DFS_REPLICATION => 1}
You can also keep around a reference to the created table:
*** hbase> t1 = create 't1', 'f1'
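The SPLITS option in the help text above pre-splits a table at the given row keys. A small Ruby sketch of how four split points yield five regions, and how a row key maps to one of them, follows. The helper region_index is hypothetical and only models the lexicographic comparison; it is not how the client actually locates regions.

```ruby
# Split points from: create 'ns1:t7', 'f1', SPLITS => ['10', '20', '30', '40']
splits = %w[10 20 30 40]

# Region 0 covers keys below '10', region 1 covers ['10', '20'), and so on;
# the last region covers everything from '40' upward.
# Returns the 0-based index of the region that would hold row_key.
def region_index(splits, row_key)
  splits.count { |point| point <= row_key }
end

region_count = splits.size + 1

puts region_count
%w[05 10 15 2 25 9].each do |key|
  puts "#{key} -> region #{region_index(splits, key)}"
end
```

Because the comparison is lexicographic, the key 2 lands in the region that starts at '10' and the key 9 lands in the last region, which is worth keeping in mind when choosing split points.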
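(1): create, shorthand form
create 'student','info','score'
(2): create, equivalent hash form (use one or the other)
create 'student',{'NAME'=>'info'},{'NAME'=>'score'}
alter: alter a table
help 'alter'
describe: get the description of a table
describe 'student'
disable: disable a table
disable 'student'
drop: drop a table (the table must be disabled first)
drop 'student'
enable: enable a table
enable 'student'
exists: check whether a table exists
exists 'student'
get_table: get a reference to a table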
t = get_table 'student'
t.put '1', 'info:name', 'zhang'
t.put '1','info:age','18'
t.put '1','info:sex','male'
t.put '1','score:math','89'
t.put '1','score:eng','91'
t.put '1','score:phy','88'
t.put '1','score:chem','99'
t.scan
list: list all user tables
list
list_regions: list the regions of a table
list_regions 'student'
show_filters: list the available filters
show_filters
3. namespace
create_namespace: create a namespace
help 'create_namespace'
create_namespace 'ns1'
create_namespace 'ns1', {'PROPERTY_NAME'=>'PROPERTY_VALUE'}
alter_namespace: alter the properties of a namespace
help 'alter_namespace'
alter_namespace 'ns1', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
alter_namespace 'ns1', {METHOD => 'unset', NAME=>'PROPERTY_NAME'}
describe_namespace: get the description of a namespace
describe_namespace 'ns1'
drop_namespace: drop a namespace
drop_namespace 'ns2'
list_namespace: list all namespaces
list_namespace
list_namespace_tables: list the tables in a namespace
list_namespace_tables 'ns1'
4. dml
append: append a value to a cell
help 'append'
append 'student','2','info:name','li'
append 'student','2','info:age','19'
t=get_table 'student'
t.append '2','info:sex','female'
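The difference between append and put is easy to miss: append concatenates onto the cell's current value, while put writes a new value. A toy Ruby model of that difference follows; the helpers put_value and append_value are hypothetical, and the model ignores versions.

```ruby
# Toy model of one row: column name -> current value.
def put_value(cells, column, value)
  cells[column] = value.dup
end

def append_value(cells, column, value)
  cells[column] = (cells[column] || '') + value
end

row = {}
append_value(row, 'info:name', 'li')   # no previous value: behaves like a put
after_first = row['info:name']
append_value(row, 'info:name', '_si')  # concatenated onto the existing value
after_second = row['info:name']
put_value(row, 'info:name', 'wang')    # replaces what a reader sees
after_put = row['info:name']

puts after_first
puts after_second
puts after_put
```

Running the append commands above twice on the same row would therefore leave info:name as 'lili', not 'li'.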
count: count the rows in a table
help 'count'
count 'student'
count 'student', FILTER => "RowFilter(=, 'binary:1')"
count 'student', FILTER => "(RowFilter(=, 'binary:1')) AND (FamilyFilter(=,'substring:info'))"
delete: delete a cell value
hbase> delete 'student', '1', 'info:name'
hbase> delete 'student', '1', 'info:name',ts1
deleteall
hbase> deleteall 'student', '1'
hbase> deleteall 'student', '1', 'info:name'
hbase> deleteall 'student', '1', 'info:name', ts1
get
get 'student', '1'
get 'student', '1', {TIMERANGE => [1303668804000, 2303668904000]}
get 'student', '1', {COLUMN => 'info:name'}
get 'student', '1', {COLUMN => ['info:name', 'info:age', 'info:sex']}
get 'student', '1', {COLUMN => 'info:name', TIMESTAMP => 1303668804000}
get 'student', '1', {COLUMN => 'info:name', TIMERANGE => [1303668804000, 2303668904000], VERSIONS => 4}
get 'student', '1', {FILTER => "ValueFilter(=, 'binary:18')"}
get 'student', '1', 'info:name'
get 'student', '1', 'info:name', 'info:age', 'info:sex'
get 'student', '1', ['info:name', 'info:age', 'info:sex']
get 'student', '1', {COLUMN => ['info:name', 'info:age', 'info:sex'], ATTRIBUTES => {'mykey'=>'myvalue'}}
get 'student', '1', {COLUMN => ['info:name', 'info:age', 'info:sex'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
get 'student', '1', {CONSISTENCY => 'TIMELINE'}
get 'student', '1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
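The TIMERANGE option used in several of the get examples is half-open: the lower bound is inclusive and the upper bound is exclusive. The Ruby sketch below models that selection on made-up versions; in_time_range is a hypothetical helper, not a shell function.

```ruby
# Versions of one cell; timestamps and values are made up for illustration.
versions = [
  { ts: 1303668804000, value: 'a' },
  { ts: 1303668805000, value: 'b' },
  { ts: 2303668904000, value: 'c' }
]

# TIMERANGE => [min_ts, max_ts] selects min_ts <= ts < max_ts.
def in_time_range(versions, min_ts, max_ts)
  versions.select { |v| v[:ts] >= min_ts && v[:ts] < max_ts }
end

selected = in_time_range(versions, 1303668804000, 2303668904000).map { |v| v[:value] }

puts selected.inspect
```

The version stamped exactly at the upper bound is excluded, so to include it the upper bound has to be at least one millisecond later.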
get_counter
Return a counter cell value at specified table/row/column coordinates.
A counter cell should be managed with atomic increment functions on HBase
and the data should be binary encoded (as long value). Example:
hbase> get_counter 'student', '1', 'info:c1'
The same commands also can be run on a table reference.
hbase> t.get_counter '1', 'info:c1'
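The phrase "binary encoded (as long value)" in the help text means a counter cell holds an 8-byte big-endian signed integer, not the digits as text. That is why a cell written with put 'student','1','info:age','18' is not a usable counter. A plain Ruby sketch of the encoding, with illustrative variable names:

```ruby
# Encode 10 the way a counter cell stores it: 8 bytes, big-endian, signed.
encoded = [10].pack('q>')
decoded = encoded.unpack1('q>')

# The same number written as text, which is what a plain put of '10' stores.
text = '10'

# An increment decodes, adds, and re-encodes (HBase does this atomically
# on the server; this line only shows the arithmetic).
incremented = [decoded + 1].pack('q>')

puts encoded.bytes.inspect
puts text.bytes.inspect
puts incremented.unpack1('q>')
```

Creating counters with incr rather than put keeps the cell in the 8-byte form that get_counter expects.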
get_splits
get_splits 't1'
incr
Increments a cell 'value' at specified table/row/column coordinates.
To increment a cell value in table 'ns1:t1' or 't1' at row 'r1' under column
'c1' by 1 (can be omitted) or 10 do:
hbase> incr 'student', '1', 'info:c1'
hbase> incr 'student', '1', 'info:c1', 1
hbase> incr 'student', '1', 'info:c1', 10
hbase> incr 'student', '1', 'info:c1', 10, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> incr 'student', '1', 'info:c1', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> incr 'student', '1', 'info:c1', 10, {VISIBILITY=>'PRIVATE|SECRET'}
The same commands also can be run on a table reference.
hbase> t.incr '1', 'info:c1'
hbase> t.incr '1', 'info:c1', 1
hbase> t.incr '1', 'info:c1', 10, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> t.incr '1', 'info:c1', 10, {VISIBILITY=>'PRIVATE|SECRET'}
put: write a value to a cell
help 'put'
put 'student','3','info:name','zhao'
put 'student','3','info:age','18'
t=get_table 'student'
t.put '3','info:sex','male'
t.scan
scan: scan a table
help 'scan'
Scan a table; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW,
ROWPREFIXFILTER, TIMESTAMP,
MAXLENGTH, COLUMNS, CACHE, RAW, VERSIONS, ALL_METRICS, METRICS,
REGION_REPLICA_ID, ISOLATION_LEVEL, READ_TYPE,
ALLOW_PARTIAL_RESULTS,BATCH or MAX_RESULT_SIZE
Some examples:
scan 'student'
scan 'student', {COLUMNS => 'info:name'}
scan 'student', {COLUMNS => [ 'info:name', 'info:age'], LIMIT => 10, STARTROW => '2'}
scan 'student', {COLUMNS => [ 'info:name', 'info:age'], TIMERANGE => [1303668804000, 2303668904000]}
scan 'student', {REVERSED => true}
scan 'student', {ALL_METRICS => true}
scan 'student', {METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}
scan 'student', {ROWPREFIXFILTER => '2', FILTER => "(QualifierFilter (>=, 'binary:name')) "}
scan 'student', {FILTER =>org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
scan 'student', {CONSISTENCY => 'TIMELINE'}
scan 'student', {ISOLATION_LEVEL => 'READ_UNCOMMITTED'}
scan 'student', {MAX_RESULT_SIZE => 1}
scan 'student', { COLUMNS => [ 'info:name', 'info:age'], ATTRIBUTES => {'mykey' => 'myvalue'}}
scan 'student', { COLUMNS => [ 'info:name', 'info:age'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
t=get_table 'student'
t.scan
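Several of the scan options above select a slice of the sorted row keys: STARTROW is inclusive, STOPROW is exclusive, ROWPREFIXFILTER keeps keys with a given prefix, and LIMIT caps the number of rows. The Ruby sketch below models only that key selection; scan_keys and the sample keys are hypothetical.

```ruby
# Model of row selection during a scan over lexicographically sorted keys.
def scan_keys(sorted_keys, startrow: nil, stoprow: nil, prefix: nil, limit: nil)
  result = sorted_keys.select do |key|
    (startrow.nil? || key >= startrow) &&   # STARTROW is inclusive
      (stoprow.nil? || key < stoprow) &&    # STOPROW is exclusive
      (prefix.nil? || key.start_with?(prefix))
  end
  limit ? result.first(limit) : result
end

keys = %w[1 10 2 20 21 3].sort

from_two     = scan_keys(keys, startrow: '2')
two_to_three = scan_keys(keys, startrow: '2', stoprow: '3')
prefix_two   = scan_keys(keys, prefix: '2')
first_two    = scan_keys(keys, startrow: '2', limit: 2)

puts from_two.inspect
puts two_to_three.inspect
puts prefix_two.inspect
puts first_two.inspect
```

For this data a prefix scan on '2' and a range scan from '2' to '3' return the same rows, which shows how a prefix filter can be thought of as a range on the sorted keys.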
truncate
truncate 't1'