Fun with Big Data, Part 22: HBase Shell

I. Introduction


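HBase Shell exposes most HBase commands. With it you can easily create, drop and alter tables, add data to a table, and list information about a table. This section introduces some commonly used commands and concrete operations, and shows how to build a "student score table" from the command line.

The shell has a large and varied set of commands that is hard to describe briefly. The best approach I have found is to use the help command: once you understand HBase's column-oriented storage, trying each command a few times is enough to get the idea.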

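II. HBase Internal Storage Structure

The figure below is taken from: https://blog.csdn.net/u010416101/article/details/89186320

Inside HBase, data is stored in the form <row key><column family 1: column 1-1, column 1-2><column family 2: column 2-1, column 2-2>. Two points matter. First, row keys are sorted in lexicographic (dictionary) order, which is very important for lookups. Second, the value of a given column within a row and column family can change over time, and its versions are ordered by timestamp. (Some older versions are, of course, deleted when data is compacted.)

The individual concepts are described below: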

Row Key

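Row Key: the primary key used to retrieve records. There are three main ways to access rows in an HBase table: by a single row key, by a range of row keys (optionally narrowed with a filter, for example a regular-expression match on the row key), or by a full table scan. A row key can be an arbitrary string (maximum length 64 KB; in practice usually 10-100 bytes). Row keys are sorted in lexicographic order, which is very important for lookups. (Note on lexicographic order: the keys 1, 6, 7, 9, 10, 11, 12 are stored in the order 1, 10, 11, 12, 6, 7, 9, so 11 comes before 9.)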
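To make the ordering concrete, here is a minimal Ruby sketch (the HBase shell itself is a JRuby REPL). It only models the comparison with Ruby's default string ordering and does not talk to HBase. The zero-padding step is a common workaround that I am adding as an assumption; it is not part of the original description.

```ruby
# Row keys are compared byte by byte; Ruby's default String ordering
# models that well enough for plain ASCII keys.
keys = %w[1 6 7 9 10 11 12]

lexicographic = keys.sort
puts lexicographic.inspect   # ["1", "10", "11", "12", "6", "7", "9"]

# Assumed workaround (not from the article): left-pad numeric keys with
# zeros so that byte order matches numeric order.
padded = keys.map { |k| k.rjust(3, "0") }.sort
puts padded.inspect          # ["001", "006", "007", "009", "010", "011", "012"]
```

With unpadded keys, a scan starting at '2' would skip rows 10, 11 and 12, which is why the key design matters for range lookups.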

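Column Family

Column Family: every column in HBase belongs to a column family. Column families are part of the schema (the table design), while columns are not (columns can be added dynamically when data is inserted). Column families must be defined before they are used. Column names are prefixed with their column family, for example course:name and course:age.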
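The difference between a predeclared column family and a dynamically added column can be sketched in plain Ruby. The helper `sketch_put` and the hash layout are hypothetical names made up for this illustration; they are not HBase APIs and only mimic the rule described above.

```ruby
# Hypothetical in-memory model: a table knows its column families up front,
# but accepts any qualifier inside a known family.
def sketch_put(table, row, column, value)
  family, _qualifier = column.split(":", 2)
  unless table[:families].include?(family)
    raise ArgumentError, "unknown column family: #{family}"
  end
  (table[:rows][row] ||= {})[column] = value
end

student = { families: ["info", "score"], rows: {} }

sketch_put(student, "1", "info:name", "zhang")
sketch_put(student, "1", "info:nickname", "z")   # new column, no schema change

rejected = false
begin
  sketch_put(student, "1", "course:math", "89")  # family was never declared
rescue ArgumentError => e
  rejected = true
  puts e.message
end
```

In the real shell, the second case corresponds to a put against a family that was not listed in create, which fails until the family is added with alter.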

Cell

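Cell: the unit of data, uniquely identified by {row key, column (family:qualifier), version}. Data in a cell is untyped; everything is stored as raw bytes.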
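Because cells are untyped, the same logical value can be stored as different byte sequences, and the reader has to know which encoding the writer used. The Ruby sketch below contrasts the text "18" (what a shell put with a quoted string stores) with an 8-byte big-endian long (the layout used for counter cells). It runs without HBase and only illustrates the byte layouts.

```ruby
# The string "18" is stored as its two character bytes.
text_bytes = "18".bytes
puts text_bytes.inspect   # [49, 56]

# The number 18 as a 64-bit big-endian signed integer takes eight bytes.
long_bytes = [18].pack("q>").bytes
puts long_bytes.inspect   # [0, 0, 0, 0, 0, 0, 0, 18]

# Decoding only works when the reader applies the matching interpretation.
decoded_text = text_bytes.pack("C*")
decoded_long = long_bytes.pack("C*").unpack("q>").first
```

This is why a value written with put '...','18' cannot be read back as a counter: the two encodings have different lengths and contents.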

Time Stamp

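Each cell stores multiple versions of the same data. Versions are indexed by timestamp (with millisecond precision), and the timestamp is a 64-bit integer. Versions are sorted by timestamp in descending order, so the newest version comes first.

Version reclamation: either keep the last n versions of the data, or keep the versions from a recent period (for example, the last seven days).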
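The two reclamation policies can be sketched as follows. This is a simplified in-memory model under my own assumptions: a cell is a Ruby Hash from timestamp to value, and the helper names `put_version` and `expire_older_than` are hypothetical. Real HBase applies these rules during reads and compactions, not on every write.

```ruby
# Keep only the newest max_versions entries, ordered newest first.
def put_version(cell, timestamp_ms, value, max_versions)
  cell[timestamp_ms] = value
  newest_first = cell.sort_by { |ts, _| -ts }.first(max_versions)
  cell.replace(newest_first.to_h)
end

# Drop every version older than ttl_ms relative to now_ms.
def expire_older_than(cell, now_ms, ttl_ms)
  cell.delete_if { |ts, _| ts < now_ms - ttl_ms }
end

cell = {}
put_version(cell, 100, "a", 2)
put_version(cell, 300, "c", 2)
put_version(cell, 200, "b", 2)
puts cell.inspect            # {300=>"c", 200=>"b"} (inspect format varies by Ruby version)

expire_older_than(cell, 1000, 750)   # cutoff is timestamp 250
puts cell.inspect            # only the version at 300 remains
```

In the shell, the first policy corresponds to the VERSIONS attribute of a column family and the second to TTL, both of which appear in the create examples later on.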

III. HBase Shell Command List


    1. View the command list (when in doubt, use help)

        The help command lists all available commands

        Usage 1: help

        Usage 2: help "COMMAND"

        Usage 3: help "COMMAND_GROUP"

        Examples:

                help

                help "get"

                help "ddl"

Commonly used shell command groups and their commands:

  Group name: general

  Commands: processlist, status, table_help, version, whoami

  Group name: ddl

  Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters

  Group name: namespace

  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml

  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools

  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, clear_slowlog_responses, close_region, compact, compact_rs, compaction_state, compaction_switch, decommission_regionservers, flush, get_largelog_responses, get_slowlog_responses, hbck_chore_run, is_in_maintenance_mode, list_deadservers, list_decommissioned_regionservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, recommission_regionserver, regioninfo, rit, snapshot_cleanup_enabled, snapshot_cleanup_switch, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump

  Group name: replication

  Commands: add_peer, append_peer_exclude_namespaces, append_peer_exclude_tableCFs, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_exclude_namespaces, remove_peer_exclude_tableCFs, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config

  Group name: snapshots

  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot

  Group name: configuration

  Commands: update_all_config, update_config

  Group name: quotas

  Commands: disable_exceed_throttle_quota, disable_rpc_throttle, enable_exceed_throttle_quota, enable_rpc_throttle, list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota

  Group name: security

  Commands: grant, list_security_capabilities, revoke, user_permission

  Group name: procedures

  Commands: list_locks, list_procedures

  Group name: visibility labels

  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

  Group name: rsgroup

  Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup, rename_rsgroup

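    2. Usage

            All names must be enclosed in single or double quotes, and arguments are separated by commas

            Press Enter to run a command

            When creating or altering a table, use Ruby hash notation

            {'key1' => 'value1', 'key2' => 'value2', ...}

            Keys that are shell keywords, such as NAME, VERSIONS or COMPRESSION, do not need quotes

                    To enter binary keys or values, use double-quoted hexadecimal or octal escapes in the following format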

                              hbase> get 't1', "key\x03\x3f\xcd"

                             hbase> get 't1', "key\003\023\011"

                            hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
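The escapes above are ordinary Ruby string escapes, so they can be checked in plain Ruby without an HBase cluster. The sketch below shows which bytes the hexadecimal and octal forms produce, and why double quotes are required: single-quoted strings keep the backslash sequences as literal characters. The variable names are only for illustration.

```ruby
# Double-quoted: \xNN is one byte given in hexadecimal.
hex_key = "key\x03\x3f\xcd"
puts hex_key.bytes.inspect     # [107, 101, 121, 3, 63, 205]

# Double-quoted: \NNN is one byte given in octal.
octal_key = "key\003\023\011"
puts octal_key.bytes.inspect   # [107, 101, 121, 3, 19, 9]

# Single-quoted: no escape processing, so this is 7 literal characters.
literal_key = 'key\x03'
puts literal_key.bytesize      # 7
```

Note that the hexadecimal and octal examples in the help text encode different byte values; they are two separate keys, not two spellings of the same key.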

IV. Experimental Data Model


This experiment uses the following table structure, column families and columns

Table structure

            create 'student','info','score'

            put 'student','1','info:name','zhang'

            put 'student','1','info:age','18'

            put 'student','1','info:sex','male'

            put 'student','1','score:math','89'

            put 'student','1','score:eng','91'

            put 'student','1','score:phy','88'

            put 'student','1','score:chem','99'

V. Common HBase Shell Commands


    1.general

          processlist    process (task) list

processlist

          status    cluster status

status

          version    HBase version

version

          whoami    current user

whoami

          table_help    help on table references

table_help

Try it out:

   hbase> t = create 't', 'cf'

   hbase> t = get_table 't'

  hbase> t.put 'r', 'cf:q', 'v'

  hbase> t.scan

   hbase> t.help 'scan'

   hbase> t.enable

   hbase> t.flush

   hbase> t.disable

   hbase> t.drop

     2. ddl

         create    create a table

                    help 'create'    (the examples marked with '***' are worth trying)
                        Create a table with namespace=ns1 and table qualifier=t1

***                       hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}

                        Create a table with namespace=default and table qualifier=t1

***                       hbase> create 't2', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}

                      # The above in shorthand would be the following:

***                       hbase> create 't3', 'f1', 'f2', 'f3'

                              hbase> create 't4', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}

                        Table configuration options can be put at the end.

                        Examples:

***                       hbase> create 'ns1:t7', 'f1', SPLITS => ['10', '20', '30', '40']

                          hbase> create 't10', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'hadoop'

***                       hbase> create 't11', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }

                          hbase> # Optionally pre-split the table into NUMREGIONS, using

                          hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)

                          hbase> create 't12', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}

                          hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}

                          hbase> create 't13', 'f1', {SPLIT_ENABLED => false, MERGE_ENABLED => false}

                          hbase> create 't14', {NAME => 'f1', DFS_REPLICATION => 1}

                        You can also keep around a reference to the created table:

***                       hbase> t1 = create 't1', 'f1'


                (1): create '<table>', '<colFamily1>' [, '<colFamily2>', ..., '<colFamilyN>']

                        create 'student','info','score'


                (2): create '<table>', {NAME => 'colFamilyName'} [, {NAME => 'colFamilyNameN'}]

                        create 'student',{'NAME'=>'info'},{'NAME'=>'score'}


          alter    alter a table

                help 'alter'

         describe    get the description of a table

                        describe 'student'


           disable    disable a table

                        disable 'student'


            drop    drop a table (the table must be disabled first)

                        drop 'student'


             enable    enable a table

                         enable 'student'


              exists    check whether a table exists

                        exists 'student'


               get_table    get a reference to a table

                            t = get_table 'student'

                            t.put '1', 'info:name', 'zhang'

                            t.put '1','info:age','18'

                            t.put '1','info:sex','male'

                            t.put '1','score:math','89'

                            t.put '1','score:eng','91'

                            t.put '1','score:phy','88'

                            t.put '1','score:chem','99'

                            t.scan


            list    list all user tables

                        list


            list_regions    list the regions of a table

                        list_regions 'student'


            show_filters    show the available filters

                        show_filters


      3. namespace

            create_namespace     create a namespace

                        help 'create_namespace'

                        create_namespace 'ns1'

                        create_namespace 'ns1', {'PROPERTY_NAME'=>'PROPERTY_VALUE'}


           alter_namespace     change the properties of a namespace

                        help 'alter_namespace'

                       alter_namespace 'ns1', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}

                       alter_namespace 'ns1', {METHOD => 'unset', NAME=>'PROPERTY_NAME'}


            describe_namespace       get the description of a namespace

                        describe_namespace 'ns1'


            drop_namespace    drop a namespace (the namespace must be empty)

                        drop_namespace 'ns2'


            list_namespace     list all namespaces

                        list_namespace


             list_namespace_tables    list the tables in a namespace

                        list_namespace_tables    'ns1'


      4. dml 

            append    append to a cell value

                        help 'append'

                        append 'student','2','info:name','li'

                        append 'student','2','info:age','19'

                        t=get_table 'student'

                        t.append '2','info:sex','female'


            count    count the rows of a table

                        help 'count'

                        count 'student'

                        count 'student', FILTER => "RowFilter(=, 'binary:1')"

                        count 'student', FILTER => "(RowFilter(=, 'binary:1')) AND (FamilyFilter(=,'substring:info'))"


            delete    delete a cell value

                   hbase> delete 'student', '1', 'info:name'

                   hbase> delete 'student', '1', 'info:name',ts1

            deleteall

                    hbase> deleteall 'student', '1'

                    hbase> deleteall 'student', '1', 'info:name'

                    hbase> deleteall 'student', '1', 'info:name', ts1

            get

                    get 'student', '1'

                    get 'student', '1', {TIMERANGE => [1303668804000, 2303668904000]}

                    get 'student', '1', {COLUMN => 'info:name'}

                    get 'student', '1', {COLUMN => ['info:name', 'info:age', 'info:sex']}

                    get 'student', '1', {COLUMN => 'info:name', TIMESTAMP => 1303668804000}

                    get 'student', '1', {COLUMN => 'info:name', TIMERANGE => [1303668804000, 2303668904000], VERSIONS => 4}

                    get 'student', '1', {FILTER => "ValueFilter(=, 'binary:18')"}

                    get 'student', '1', 'info:name'

                    get 'student', '1', 'info:name', 'info:age', 'info:sex'

                    get 'student', '1', ['info:name', 'info:age', 'info:sex']

                    get 'student', '1', {COLUMN => ['info:name', 'info:age', 'info:sex'], ATTRIBUTES => {'mykey'=>'myvalue'}}

                    get 'student', '1', {COLUMN => ['info:name', 'info:age', 'info:sex'], AUTHORIZATIONS => ['PRIVATE','SECRET']}

                    get 'student', '1', {CONSISTENCY => 'TIMELINE'}

                    get 'student', '1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}

            get_counter

                    Return a counter cell value at specified table/row/column coordinates.

                    A counter cell should be managed with atomic increment functions on HBase

                    and the data should be binary encoded (as long value). Example:

                          hbase> get_counter 'student', '1', 'info:c1'

                    The same commands also can be run on a table reference. 

                          hbase> t.get_counter  '1', 'info:c1'

            get_splits

                    get_splits 't1'


            incr

                        Increments a cell 'value' at specified table/row/column coordinates.

                        To increment a cell value in table 'ns1:t1' or 't1' at row 'r1' under column

                        'c1' by 1 (can be omitted) or 10 do:

                              hbase> incr 'student', '1', 'info:c1'

                              hbase> incr 'student', '1', 'info:c1', 1

                              hbase> incr 'student', '1', 'info:c1', 10

                              hbase> incr 'student', '1', 'info:c1', 10, {ATTRIBUTES=>{'mykey'=>'myvalue'}}

                              hbase> incr 'student', '1', 'info:c1', {ATTRIBUTES=>{'mykey'=>'myvalue'}}

                              hbase> incr 'student', '1', 'info:c1', 10, {VISIBILITY=>'PRIVATE|SECRET'}

                        The same commands also can be run on a table reference. 

                              hbase> t.incr  '1', 'info:c1'

                              hbase> t.incr  '1', 'info:c1', 1

                              hbase> t.incr  '1', 'info:c1', 10, {ATTRIBUTES=>{'mykey'=>'myvalue'}}

                              hbase> t.incr  '1', 'info:c1', 10, {VISIBILITY=>'PRIVATE|SECRET'}

            put    put a cell value

                        help 'put'

                        put 'student','3','info:name','zhao'

                        put 'student','3','info:age','18'

                        t=get_table 'student'

                        t.put '3','info:sex','male'

                        t.scan


            scan    scan a table

                        help 'scan'

                        Scan a table; pass table name and optionally a dictionary of scanner

                        specifications.  Scanner specifications may include one or more of:

                        TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, 

                        ROWPREFIXFILTER, TIMESTAMP,

                        MAXLENGTH, COLUMNS, CACHE, RAW, VERSIONS, ALL_METRICS, METRICS,

                        REGION_REPLICA_ID, ISOLATION_LEVEL, READ_TYPE, 

                        ALLOW_PARTIAL_RESULTS,BATCH or MAX_RESULT_SIZE

                        Some examples:

                            scan 'student'

                           scan 'student', {COLUMNS => 'info:name'}


                            scan 'student', {COLUMNS => [ 'info:name',  'info:age'], LIMIT => 10, STARTROW => '2'}


                          scan 'student', {COLUMNS => [ 'info:name',  'info:age'], TIMERANGE => [1303668804000, 2303668904000]}

                          scan 'student', {REVERSED => true}

                          scan 'student', {ALL_METRICS => true}

                         scan 'student', {METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}

                        scan 'student', {ROWPREFIXFILTER => '2', FILTER => "(QualifierFilter (>=, 'binary:name')) "}

                          scan 'student', {FILTER =>org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}

                          scan 'student', {CONSISTENCY => 'TIMELINE'}

                          scan 'student', {ISOLATION_LEVEL => 'READ_UNCOMMITTED'}

                          scan 'student', {MAX_RESULT_SIZE => 1}

                          scan 'student', { COLUMNS => [ 'info:name',  'info:age'], ATTRIBUTES => {'mykey' => 'myvalue'}}

                          scan 'student', { COLUMNS => [ 'info:name',  'info:age'], AUTHORIZATIONS => ['PRIVATE','SECRET']}

                        t=get_table 'student'

                        t.scan
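How STARTROW, STOPROW, ROWPREFIXFILTER and LIMIT select rows can be sketched in plain Ruby. This is only an in-memory model under my own assumptions: `sketch_scan` is a hypothetical helper and not an HBase API, rows are a Hash keyed by row key, and the start row is treated as inclusive and the stop row as exclusive, which is the default scan behaviour.

```ruby
# Hypothetical model of a scan over lexicographically sorted row keys.
def sketch_scan(rows, startrow: nil, stoprow: nil, prefix: nil, limit: nil)
  keys = rows.keys.sort
  keys = keys.select { |k| k >= startrow } if startrow        # inclusive
  keys = keys.select { |k| k < stoprow } if stoprow           # exclusive
  keys = keys.select { |k| k.start_with?(prefix) } if prefix
  keys = keys.first(limit) if limit
  keys.map { |k| [k, rows[k]] }
end

rows = {
  "3"  => { "info:name" => "zhao" },
  "1"  => { "info:name" => "zhang" },
  "10" => { "info:name" => "sun" },
  "2"  => { "info:name" => "li" }
}

from_two   = sketch_scan(rows, startrow: "2").map(&:first)
one_to_two = sketch_scan(rows, startrow: "1", stoprow: "2").map(&:first)
prefix_one = sketch_scan(rows, prefix: "1", limit: 1).map(&:first)

puts from_two.inspect     # ["2", "3"]
puts one_to_two.inspect   # ["1", "10"]
puts prefix_one.inspect   # ["1"]
```

Row "10" falls between "1" and "2" because keys are compared as strings, which is the same lexicographic ordering described for row keys in section II.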

            truncate

                 truncate  't1'


