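HBase Shell commands (based on the official 1.2 documentation). Reference blog: https://blog.csdn.net/gk_kk/article/details/74216361
HBase Shell is the command-line interface that Apache HBase officially provides; you operate HBase by running commands in it. If the HBase environment variables are configured, run the hbase shell command in a Linux shell terminal to enter the HBase Shell prompt.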
[root@hadoop ~]# hbase shell
2020-01-19 10:07:20,147 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2020-01-19 10:07:22,511 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.2.0-cdh5.7.0/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 1.2.0-cdh5.7.0, rUnknown, Wed Mar 23 11:46:29 PDT 2016
hbase(main):001:0>
Common HBase Shell commands
(1) The create command creates a new table in HBase
hbase(main):009:0> create 'students', 'info'
0 row(s) in 1.4030 seconds
=> Hbase::Table - students
Note: students is the table name and info is the column family name
(2) The list command shows which tables exist in HBase
hbase(main):011:0> list
TABLE
students
1 row(s) in 0.0240 seconds
=> ["students"]
(3) The describe command shows a table's properties
hbase(main):012:0> describe 'students'
Table students is ENABLED
students COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0680 seconds
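Note: NAME is the column family name; the attributes after it all apply to that column family
(4) The alter command modifies a table
a) Add a new column family to a table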
hbase(main):013:0> alter 'students', 'scores'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.3280 seconds
Or run the equivalent command: alter "students" , { NAME => 'scores'}
b) Delete a column family from a table
hbase(main):032:0> alter "students" , { NAME => 'scores', METHOD => 'delete' }
c) Modify the attributes of a column family
Set the number of versions the column family keeps to 3:
hbase(main):032:0> alter 'students', { NAME => 'scores', VERSIONS => 3 }
Set the compression mode to GZ:
hbase(main):032:0> alter 'students', { NAME => 'scores', COMPRESSION => 'GZ'}
Note: after running alter, run describe 'students' to check the changes made to the column family attributes of the students table
(5) The put command inserts data (a cell)
hbase(main):016:0> put 'students', 'row1', 'info:name', 'Jack'
0 row(s) in 0.2210 seconds
Note: this inserts one cell into the students table. The cell's row key (RowKey) is row1, its column family is info, its column qualifier is name, and its value is Jack
hbase(main):006:0> put 'students', 'row1', 'scores:Math', '99'
0 row(s) in 0.1050 seconds
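Note: this inserts one cell into the students table. The cell's row key (RowKey) is row1, its column family is scores, its column qualifier is Math, and its value is 99
(6) The get command reads cell values
a) Get all cell values of the row with key 001 (the get examples below use 001 as the row key)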
hbase(main):008:0> get 'students', '001'
COLUMN CELL
info:name timestamp=1579401526017, value=Jack
scores:Math timestamp=1579402944774, value=99
2 row(s) in 0.0240 seconds
b) Get the value of the specific cell in the students table with row key 001, column family scores, and column qualifier Math
hbase(main):009:0> get 'students', '001', 'scores:Math'
COLUMN CELL
scores:Math timestamp=1579402944774, value=99
1 row(s) in 0.0340 seconds
c) Get multiple versions of a column value
First insert three Math scores:
put 'students', '001', 'scores:Math', '95'
put 'students', '001', 'scores:Math', '96'
put 'students', '001', 'scores:Math', '100'
Then query multiple versions of the column value
hbase(main):057:0> get 'students', '001', { COLUMN =>'scores:Math', VERSIONS => 3 }
COLUMN CELL
scores:Math timestamp=1586515375411, value=100
scores:Math timestamp=1586515371064, value=96
scores:Math timestamp=1586515187877, value=95
Note: this returns up to 3 versions (VERSIONS => 3) of the cell in the students table with row key 001, column family scores, and column qualifier Math
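The version behaviour above can be sketched in a few lines of Python. This is a toy model, not HBase code: the class VersionedCells and its methods are hypothetical names, and it trims old versions eagerly on every put, whereas HBase removes surplus versions later. It assumes the scores column family keeps 3 versions, as configured with alter earlier.

```python
class VersionedCells:
    """Toy model of a column family that keeps at most max_versions per cell."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self._cells = {}

    def put(self, row, column, value, timestamp):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        # Newest timestamp first, then drop anything beyond the limit.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row, column, versions=1):
        # Like get with VERSIONS => n: return the n newest versions.
        return self._cells.get((row, column), [])[:versions]


scores = VersionedCells(max_versions=3)
scores.put("001", "scores:Math", "99", 1579402944774)
scores.put("001", "scores:Math", "95", 1586515187877)
scores.put("001", "scores:Math", "96", 1586515371064)
scores.put("001", "scores:Math", "100", 1586515375411)

latest_three = scores.get("001", "scores:Math", versions=3)
latest_one = scores.get("001", "scores:Math")
```

In this model the oldest value (99) falls out once a fourth version is written, which matches the three values returned by the get above.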
(7) The scan command views table data
After a successful insert, scan shows the following:
hbase(main):017:0> scan 'students'
ROW COLUMN+CELL
001 column=info:name, timestamp=1579401526017, value=Jack
1 row(s) in 0.0570 seconds
Restrict the row key range: scan 'students', { STARTROW => '002', STOPROW =>'005'}
Restrict to a column family or to a single column:
scan 'students', {COLUMNS => ['baseinfo']}
scan 'students_new', {COLUMNS => ['baseinfo:name']}
Return only the first 3 rows: scan 'students', {LIMIT => 3}
Restrict the timestamp range: scan 'students', { TIMERANGE => [1587474265773, 1587474587137]}
(8) The delete command deletes a specific cell
hbase(main):011:0> delete 'students', '001', 'scores:Math'
0 row(s) in 0.0680 seconds
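Note: this deletes the cell in the students table with row key 001, column family scores, and column qualifier Math
Delete the cell at a specific timestamp: delete 'students_new', '001', 'score:Chinese', 1587473763487
(8.1) The deleteall command deletes an entire row
Delete all columns of a row: deleteall "students", '001'
(9) The disable command disables a table; the drop command deletes it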
hbase(main):011:0> disable "students"
0 row(s) in 2.4800 seconds
hbase(main):012:0> is_enabled "students"
false
0 row(s) in 0.0640 seconds
hbase(main):014:0> drop "students"
0 row(s) in 1.3710 seconds
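Note: a table must be disabled before it is dropped; dropping an enabled table fails with an error
(10) Snapshot management
A snapshot is used to restore a table to the data and state it had at a given moment
Create a snapshot of a table: snapshot 'students', 'snapshot_students'
List all snapshots: list_snapshots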
SNAPSHOT TABLE + CREATION TIME
snapshot_students students (Sun Apr 26 19:06:56 +0800 2020)
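Clone a snapshot (create a new table from the snapshot): clone_snapshot 'snapshot_students', 'students_clone'
Restore the original table from a snapshot (the table's current data is reset):
The table must be disabled before restoring: disable 'students'
Run the restore: restore_snapshot 'snapshot_students'
Enable the table again after restoring: enable 'students'
(11) Advanced read operations (get and scan): using filters
What a filter is: in HBase, both get and scan can use filters to narrow what is returned, much like the WHERE clause of a SQL query.
a) Filter by cell value: exact match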
hbase(main):058:0> get 'students', '001', { FILTER => "ValueFilter(=, 'binary:Jack')"}
COLUMN CELL
info:name timestamp=1586515121222, value=Jack
hbase(main):110:0> scan 'students', { FILTER => "ValueFilter(=, 'binary:Jack')"}
ROW COLUMN+CELL
001 column=info:name, timestamp=1587897072846, value=Jack
1 row(s) in 0.0370 seconds
b) Filter by cell value: substring match
hbase(main):062:0> get 'students', '001', { FILTER => "ValueFilter(=, 'substring:a')"}
COLUMN CELL
info:name timestamp=1586515121222, value=Jack
info:sex timestamp=1586515147295, value=male
hbase(main):113:0> scan 'students', { FILTER => "ValueFilter(=, 'substring:a')"}
ROW COLUMN+CELL
001 column=info:name, timestamp=1587897072846, value=Jack
002 column=info:sex, timestamp=1587897444455, value=male
004 column=info:sex, timestamp=1587897821363, value=female
005 column=info:home, timestamp=1587898039997, value=Wuhan
005 column=info:sex, timestamp=1587897927553, value=female
4 row(s) in 0.0310 seconds
c) Filter by column qualifier: exact match
hbase(main):066:0> get 'students', '001', { FILTER => "QualifierFilter(=, 'binary:name')"}
COLUMN CELL
info:name timestamp=1586515121222, value=Jack
1 row(s) in 0.0680 seconds
d) Filter by column qualifier: substring match
hbase(main):067:0> get 'students', '001', { FILTER => "QualifierFilter(=, 'substring:e')"}
COLUMN CELL
info:age timestamp=1586515143189, value=19
info:name timestamp=1586515121222, value=Jack
info:sex timestamp=1586515147295, value=male
scores:Chinese timestamp=1586515197750, value=92
scores:English timestamp=1586515192876, value=88
e) Filter by column qualifier: prefix match
hbase(main):003:0> scan 'students', FILTER=>"ColumnPrefixFilter('Eng')"
ROW COLUMN+CELL
001 column=scores:English, timestamp=1586597303484, value=88.5
1 row(s) in 0.0300 seconds
hbase(main):057:0> get 'students', '002', FILTER=>"ColumnPrefixFilter('Ja')"
COLUMN CELL
scores:Java timestamp=1587898403699, value=97
f) Filter by row key: exact match and substring match
hbase(main):023:0> scan 'students', FILTER => "RowFilter(=,'binary:row1')"
ROW COLUMN+CELL
row1 column=info:age, timestamp=1586597303518, value=19
row1 column=info:height, timestamp=1586597303522, value=180
row1 column=info:name, timestamp=1586597303513, value=Jack
row1 column=scores:Chinese, timestamp=1586597303473, value=100.0
row1 column=scores:English, timestamp=1586597303484, value=88.5
row1 column=scores:Math, timestamp=1586597303444, value=99.5
hbase(main):086:0> scan 'students', FILTER => "RowFilter(=,'substring:01')"
ROW COLUMN+CELL
001 column=info:name, timestamp=1588074604900, value=Jackey
001 column=scores:Chinese, timestamp=1587898124286, value=99
001 column=scores:English, timestamp=1587898154507, value=88
001 column=scores:Hadoop, timestamp=1587898212793, value=98
001 column=scores:name, timestamp=1588076079039, value=abc
1 row(s) in 0.0610 seconds
g) Filter by row key: prefix match
hbase(main):024:0> scan 'students', FILTER => "PrefixFilter('00')"
h) Filter by column family (is filtering on the column family ever necessary?): substring match on the family name
hbase(main):005:0> scan 'students', FILTER => "FamilyFilter(=,'substring:cor')"
ROW COLUMN+CELL
row1 column=scores:Chinese, timestamp=1586597303473, value=100.0
row1 column=scores:English, timestamp=1586597303484, value=88.5
row1 column=scores:Math, timestamp=1586597303444, value=99.5
row2 column=scores:HBase, timestamp=1586597303510, value=77.7
row2 column=scores:Java, timestamp=1586597303492, value=95.2
h) Filter on the value of a single column, identified by column family and column qualifier
hbase(main):008:0* scan 'students', { COLUMN => 'info:age', FILTER => "SingleColumnValueFilter('info','age', =, 'binary:19')" }
ROW COLUMN+CELL
row1 column=info:age, timestamp=1586597303518, value=19
hbase(main):017:0> scan 'students', { COLUMN => 'info:name', FILTER => "SingleColumnValueFilter('info','name', =, 'substring:m')" }
ROW COLUMN+CELL
002 column=info:name, timestamp=1587897100452, value=Tom
003 column=info:name, timestamp=1587897124080, value=Mike
hbase(main):018:0> scan 'students', { FILTER => "SingleColumnValueFilter('info','name', =, 'substring:m')" }
ROW COLUMN+CELL
002 column=info:age, timestamp=1587897291766, value=20
002 column=info:name, timestamp=1587897100452, value=Tom
002 column=info:sex, timestamp=1587897444455, value=male
002 column=scores:HBase, timestamp=1587898409880, value=97
002 column=scores:Java, timestamp=1587898403699, value=97
002 column=scores:Python, timestamp=1587898478281, value=82
003 column=info:height, timestamp=1587897487463, value=180
003 column=info:name, timestamp=1587897124080, value=Mike
003 column=scores:Hadoop, timestamp=1587898555397, value=89
003 column=scores:Maths, timestamp=1587898519275, value=72
003 column=scores:Spark, timestamp=1587898578474, value=78
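Note: if the column parameter COLUMN => 'info:name' is left out of the argument list, the scan returns every column of each matching row, as the last example above shows.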
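The two comparator prefixes used in these filters behave differently, which a short Python sketch can make concrete. It assumes that binary: compares the exact bytes and that substring: is a case-insensitive containment test, which is why 'substring:m' matches both Tom and Mike. The function names are hypothetical, and the sample names are the info:name values that appear in the transcripts.

```python
def binary_equal(value, operand):
    """Model of 'binary:<operand>' with '=': exact, case-sensitive byte equality."""
    return value.encode("utf-8") == operand.encode("utf-8")


def substring_match(value, operand):
    """Model of 'substring:<operand>' with '=': case-insensitive containment."""
    return operand.lower() in value.lower()


# info:name values taken from the scan output in this section.
names = {"001": "Jackey", "002": "Tom", "003": "Mike"}

rows_with_m = [row for row, name in sorted(names.items())
               if substring_match(name, "m")]
rows_named_jack = [row for row, name in sorted(names.items())
                   if binary_equal(name, "Jack")]
```

Here rows_with_m contains 002 and 003, while rows_named_jack is empty because Jackey is not byte-for-byte equal to Jack.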
i) Filter by specific timestamps
hbase(main):062:0> scan 'students', { FILTER => "TimestampsFilter ( 1588074604900, 1587898942609, 1587898478281)"}
ROW COLUMN+CELL
001 column=info:name, timestamp=1588074604900, value=Jackey
002 column=scores:Python, timestamp=1587898478281, value=82
005 column=scores:Crawler, timestamp=1587898942609, value=93
Note: this returns the cells whose timestamp exactly equals one of the timestamps in the given list
j) Combine several filters
hbase(main):001:0> scan 'students', {ROWPREFIXFILTER => '00', FILTER => "(QualifierFilter (=, 'binary:name')) AND (TimestampsFilter ( 1588074604900))"}
ROW COLUMN+CELL
001 column=info:name, timestamp=1588074604900, value=Jackey
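Note: this returns the cells whose row key starts with 00, whose column qualifier equals name, and whose timestamp equals 1588074604900
k) List all the filters available in HBase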
hbase(main):024:0> show_filters
DependentColumnFilter
KeyOnlyFilter
ColumnCountGetFilter
SingleColumnValueFilter
PrefixFilter
SingleColumnValueExcludeFilter
FirstKeyOnlyFilter
ColumnRangeFilter
TimestampsFilter
FamilyFilter
QualifierFilter
ColumnPrefixFilter
RowFilter
MultipleColumnPrefixFilter
InclusiveStopFilter
PageFilter
ValueFilter
ColumnPaginationFilter
(12) Namespace operations
A namespace is a logical grouping of tables, comparable to a database in a relational database system such as MySQL
a) Create a namespace
hbase(main):039:0>create_namespace 'my_namespace'
b) Drop a namespace
hbase(main):039:0>drop_namespace 'my_namespace'
c) Alter a namespace
hbase(main):049:0>alter_namespace 'my_namespace', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
d) Describe a namespace
hbase(main):049:0>describe_namespace 'my_namespace'
e) List all namespaces
hbase(main):049:0>list_namespace
f) Create a table in a namespace
hbase(main):049:0>create 'my_namespace:my_table', 'info'
g) List the tables in a namespace
hbase(main):053:0>list_namespace_tables 'my_namespace'
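Note: HBase ships with two predefined special namespaces: hbase, the system namespace that holds HBase's internal tables, and default, which any table created without an explicit namespace falls into automatically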
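The naming rule in the note above can be sketched as a small Python helper. The function split_table_name is a hypothetical name used only for illustration; it models how a name of the form namespace:table is read, with default used when no namespace is given.

```python
def split_table_name(full_name):
    """Split 'namespace:table' into (namespace, table); no prefix means 'default'."""
    namespace, separator, qualifier = full_name.partition(":")
    if not separator:
        # No ':' in the name, so the table lives in the default namespace.
        return "default", full_name
    return namespace, qualifier


qualified = split_table_name("my_namespace:my_table")
unqualified = split_table_name("students")
```

Under this reading, the students table used throughout this document belongs to the default namespace.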
h) Drop a table in a namespace
hbase(main):052:0> disable 'my_namespace:my_table'
hbase(main):053:0> drop 'my_namespace:my_table'
(13) Change the Bloom filter
alter "students" , NAME=>'info', BLOOMFILTER=>'ROWCOL'
alter "students" , NAME=>'info', BLOOMFILTER=>'ROW'
alter "students" , NAME=>'info', BLOOMFILTER=>'NONE'
(14) Change the TTL (time to live)
Set the TTL of the info column family to 1024 seconds
hbase(main):009:0> alter "students", NAME=>'info', TTL=>'1024'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9710 seconds
hbase(main):010:0> desc "students"
Table students is ENABLED
students
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '1024 SECONDS (17 MINUTES 4 SECONDS)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'scores', BLOOMFILTER => 'ROW', VERSIONS => '3', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
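After 1024 seconds (17 minutes 4 seconds), run scan 'students' again: the data in the info column family is gone. Cells older than the TTL are removed when the HFiles go through a minor compaction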
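The TTL arithmetic can be checked with a short Python sketch. It assumes that a cell counts as expired once its age, measured from its millisecond timestamp, is greater than the TTL in seconds. The function is_expired is a hypothetical helper and the sample timestamp is one of the info:name timestamps shown earlier.

```python
def is_expired(cell_timestamp_ms, now_ms, ttl_seconds):
    """Return True when the cell's age exceeds the column family TTL."""
    return now_ms - cell_timestamp_ms > ttl_seconds * 1000


ttl_seconds = 1024
# 1024 seconds is 17 minutes and 4 seconds, as describe reports.
minutes, seconds = divmod(ttl_seconds, 60)

written_ms = 1586515121222
still_visible = not is_expired(written_ms, written_ms + 1_000_000, ttl_seconds)
gone = is_expired(written_ms, written_ms + 1_024_001, ttl_seconds)
```

A cell written at written_ms is still visible 1000 seconds later and counts as expired just after the 1024 second mark.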
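(15) Splitting and merging regions
a) Splitting regions
Splits are either manual or automatic.
Automatic splits are carried out by HBase itself, according to split policies set through configuration parameters. HBase operations engineers should know what these parameters mean and how to configure them.
There are three ways to split manually: pre-splitting, custom split points, and forced splitting.
1) Pre-splitting: run the pre-split command in a Linux terminal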
hbase org.apache.hadoop.hbase.util.RegionSplitter pre_split_table HexStringSplit -c 10 -f myfamily
This creates the table with 10 regions set up in advance
2) Custom split points: run the following command in the hbase shell
create 'newtable', 'family1', {SPLITS => ['100', '200', '300', '400']}
This sets the region boundaries by rowkey: [, 100), [100, 200), [200, 300), [300, 400), [400, )
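To make the mapping from split points to regions concrete, here is a minimal Python sketch. It assumes that N split points produce N+1 regions, that each region covers a half-open rowkey range [start, end), and that an empty string stands for an open boundary. The function names region_ranges and region_for are hypothetical.

```python
from bisect import bisect_right


def region_ranges(split_points):
    """Turn sorted split points into (start, end) pairs; '' means unbounded."""
    boundaries = [""] + list(split_points) + [""]
    return list(zip(boundaries[:-1], boundaries[1:]))


def region_for(row_key, split_points):
    """Return the (start, end) range of the region that holds row_key."""
    # A key equal to a split point belongs to the region that starts there.
    index = bisect_right(list(split_points), row_key)
    return region_ranges(split_points)[index]


splits = ["100", "200", "300", "400"]
ranges = region_ranges(splits)
region_of_250 = region_for("250", splits)
region_of_100 = region_for("100", splits)
```

With the four split points above this yields five regions, and rowkey 250 lands in the region [200, 300).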
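3) Forced splitting: the hbase shell has a built-in split command for splitting a region by hand; use it when a region has grown too large.
For example, split 'newtable_1', '250' splits the region of that table that contains rowkey=250 into two regions at that position
Note: the HBase web UI shows how a table is divided into regions
HBase operations engineers should be fluent with the manual region split commands.
Regions can also be moved by hand: the hbase shell has a built-in move command, so when the load becomes unbalanced a region can be moved to another Region Server.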
例如 move '1589285167037.55c9fcd8141809d419121388947c8e92.', 'hadoop'
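This moves the region with that encoded name to the Region Server on the host named hadoop. The shell help for move describes the first argument as the encoded region name and the second as a server name of the form host,port,startcode
b) Merging regions
HBase Shell provides the merge_region command for merging regions by hand. For example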
merge_region 'a5a88ab671f86e8432a10350ded4d82a', '5c4ec20440ff7ca4515ccaa26faac5