HBase Distributed Deployment

Environment:

[hadoop@big-master2 ~]$ cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

 

## bigdata cluster ##

192.168.41.20 big-master1 #bigdata1 namenode1,zookeeper,resourcemanager

192.168.41.21 big-master2 #bigdata2 namenode2,zookeeper,slave,resourcemanager

192.168.41.22 big-slave01 #bigdata3 datanode1,zookeeper,slave

192.168.41.25 big-slave02 #bigdata4 datanode2,zookeeper,slave

192.168.41.27 big-slave03 #bigdata5 datanode3,zookeeper,slave

  • HMaster is the implementation of the Master Server. It monitors all RegionServer instances in the cluster and is the interface for all metadata changes; in a cluster it typically runs on a NameNode.
    • Interfaces exposed by HMasterInterface: Table (createTable, modifyTable, removeTable, enable, disable), ColumnFamily (addColumn, modifyColumn, removeColumn), Region (move, assign, unassign)
    • Background threads run by the Master: the LoadBalancer thread, which moves regions to balance cluster load, and the CatalogJanitor thread, which periodically checks the hbase:meta table.
  • HRegionServer is the implementation of the RegionServer, serving and managing Regions; in a cluster, RegionServers run on the DataNodes.
    • Interfaces exposed by HRegionInterface: Data (get, put, delete, next, etc.), Region (splitRegion, compactRegion, etc.)
    • RegionServer background threads: CompactSplitThread, MajorCompactionChecker, MemStoreFlusher, LogRoller
  • Regions represent slices of a table. A Region holds one Store per column family; each Store has one MemStore and multiple StoreFiles (HFiles), and StoreFiles are backed by Blocks.
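
Once the cluster built below is running, these roles can be observed directly from the HBase shell (output is illustrative for the 2-master, 3-RegionServer layout used in this post):

hbase(main):001:0> status
1 active master, 1 backup masters, 3 servers, 0 dead, 1.0000 average load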

Storage Design

In HBase, a table is split into smaller chunks that are stored spread across different servers. These chunks are called Regions, and the servers that host Regions are called RegionServers. The Master process handles distributing Regions among the RegionServers. In the HBase code, the HRegionServer and HRegion classes represent a RegionServer and a Region. Besides hosting a number of HRegions, an HRegionServer also handles two kinds of files used for data storage:

  • HLog, the write-ahead log file, also called the WAL (write-ahead log)
  • HFile, the actual data storage file
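
Both are ordinary files in HDFS, so the layout is easy to inspect on a running cluster (a quick illustration; paths assume hbase.rootdir points at /hbase, as configured later in this post):

hdfs dfs -ls /hbase/WALs                 # one directory of WALs per RegionServer
hdfs dfs -ls /hbase/data/default         # one directory per table in the default namespace
hdfs dfs -ls /hbase/data/default/User    # region directories, each holding per-family StoreFiles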

HLog

  • MasterProcWAL: the HMaster records administrative operations, such as server conflict resolution, table creation, and other DDL, to its own WAL files, stored under the MasterProcWALs directory. Unlike the RegionServer WALs, the HMaster WALs also support failover: if the Master server dies, another Master takes over and keeps operating on the same files.
  • The WAL records all changes to HBase data. If a RegionServer crashes before its MemStore is flushed, the WAL guarantees the changes can still be applied. If the write to the WAL fails, the whole operation that modifies the data fails.
    • Normally each RegionServer has a single WAL instance. (Before 2.0, the WAL implementation was called HLog.)
    • WALs live under the /hbase/WALs/ directory.
    • MultiWAL: with a single WAL per RegionServer, the server must write to the WAL serially, since HDFS files can only be appended sequentially, which can become a performance bottleneck. MultiWAL lets a RegionServer write several WALs in parallel over multiple pipelines in the underlying HDFS, raising total throughput, though it does not raise throughput for a single Region.
  • WAL configuration:

<!-- enable MultiWAL (in hbase-site.xml) -->
<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>
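
After restarting the RegionServers, a rough way to confirm MultiWAL took effect is to count the WAL files per server directory (assuming the default /hbase root used in this deployment):

hdfs dfs -count /hbase/WALs/*   # multiple WAL files per RegionServer directory indicate MultiWAL is active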

 

###############################################

----------- Start ---------

###############################################

## Custom HBase distributed deployment layout ##

big-master1 hmaster1

big-master2 hmaster2

big-slave01 HRegionServer1 ## data

big-slave02 HRegionServer2 ## data

big-slave03 HRegionServer3 ## data

###########################################

Download address (official mirror): https://mirror.bit.edu.cn/apache/hbase/

 

#################################################

As IT folks, most of us are a little compulsive about software, for example always wanting to run the latest version~

---- Out of laziness I did not want to upload and lay out some images separately, so please bear with the formatting.

HBase and JDK compatibility

| HBase Version | JDK 7 | JDK 8 | JDK 9 | JDK 10 |
|---|---|---|---|---|
| 2.0 | Not Supported | yes | Not Supported | Not Supported |
| 1.3 | yes | yes | Not Supported | Not Supported |
| 1.2 | yes | yes | Not Supported | Not Supported |

As this table shows, JDK 7 or JDK 8 is recommended, with the caveat that HBase 2.0 does not support JDK 7. In practice that is usually fine, since most enterprise production environments still run the 1.x series.

HBase and Hadoop compatibility

Hadoop version support matrix

  • "S" = supported
  • "X" = not supported
  • "NT" = Not tested

 

| | HBase-1.2.x | HBase-1.3.x | HBase-1.5.x | HBase-2.0.x | HBase-2.1.x |
|---|---|---|---|---|---|
| Hadoop-2.4.x | S | S | X | X | X |
| Hadoop-2.5.x | S | S | X | X | X |
| Hadoop-2.6.0 | X | X | X | X | X |
| Hadoop-2.6.1+ | S | S | X | S | X |
| Hadoop-2.7.0 | X | X | X | X | X |
| Hadoop-2.7.1+ | S | S | S | S | S |
| Hadoop-2.8.[0-1] | X | X | X | X | X |
| Hadoop-2.8.2 | NT | NT | NT | NT | NT |
| Hadoop-2.8.3+ | NT | NT | NT | S | S |
| Hadoop-2.9.0 | X | X | X | X | X |
| Hadoop-2.9.1+ | NT | NT | NT | NT | NT |
| Hadoop-3.0.x | X | X | X | X | X |
| Hadoop-3.1.0 | X | X | X | X | X |

As this matrix shows, the Hadoop series compatible with every HBase version listed is 2.7.1+, so for learning HBase, 2.8.x, 2.9.x, and 3.x are not the best choices.

Hadoop and JDK compatibility

Version 2.7 and later of Apache Hadoop requires Java 7. It is built and tested on both OpenJDK and Oracle (HotSpot)'s JDK/JRE.

Earlier versions (2.6 and earlier) support Java 6.

Here are the known JDKs in use or which have been tested:

| Version | Status | Reported By |
|---|---|---|
| oracle 1.7.0_15 | Good | Cloudera |
| oracle 1.7.0_21 | Good (4) | Hortonworks |
| oracle 1.7.0_45 | Good | Pivotal |
| openjdk 1.7.0_09-icedtea | Good (5) | Hortonworks |
| oracle 1.6.0_16 | Avoid (1) | Cloudera |
| oracle 1.6.0_18 | Avoid | Many |
| oracle 1.6.0_19 | Avoid | Many |
| oracle 1.6.0_20 | Good (2) | LinkedIn, Cloudera |
| oracle 1.6.0_21 | Good (2) | Yahoo!, Cloudera |
| oracle 1.6.0_24 | Good | Cloudera |
| oracle 1.6.0_26 | Good (2) | Hortonworks, Cloudera |
| oracle 1.6.0_28 | Good | LinkedIn |
| oracle 1.6.0_31 | Good (3, 4) | Cloudera, Hortonworks |

As this table shows, for the JDK that Hadoop depends on, version 7 has been tested, while JDK 8 does not yet appear on the official page. So JDK 7 remains the safer choice, and a mid-series JDK 7 release at that, rather than the very latest.

Summary

Putting it all together, the suggested installation is:

JDK: Java SE Runtime Environment 7u45 (other 7-series releases can also be tried and should be fine; download: http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html)

Hadoop: 2.7.1+ (download: https://archive.apache.org/dist/hadoop/common/)

HBase: 1.x series (download: http://archive.apache.org/dist/hbase/)
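
For example, fetching a matching pair of tarballs from the Apache archive might look like this (the exact versions are illustrative; pick whichever releases from the links above suit you):

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
wget http://archive.apache.org/dist/hbase/1.4.13/hbase-1.4.13-bin.tar.gz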

 

#####################

Start of deployment: -- this assumes the supporting cluster pieces (HDFS, ZooKeeper, time synchronization, passwordless SSH, etc.) are already deployed and working.

(1)

Download and extract:

gunzip hbase-2.2.5-bin.tar.gz

tar -xvf hbase-2.2.5-bin.tar -C /usr/local/

cd /usr/local/

mv hbase-2.2.5 hbase

[root@big-master1 ~]# cd /usr/local/hbase/

[root@big-master1 hbase]# ls

bin CHANGES.md conf hbase-webapps LEGAL lib LICENSE.txt logs NOTICE.txt README.txt RELEASENOTES.md
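
Equivalently, the tarball can be unpacked in a single step (same result as the gunzip + tar pair above):

tar -xzvf hbase-2.2.5-bin.tar.gz -C /usr/local/
mv /usr/local/hbase-2.2.5 /usr/local/hbase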

 

(2)

Configure the parameters:

1. Set the HBase global environment variables in /etc/profile:

### JDK ###

JAVA_HOME=/usr/local/jdk1.8.0_251

CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar

PATH=$JAVA_HOME/bin:$PATH

export JAVA_HOME CLASSPATH PATH

 

### zookeeper ##

export ZK_HOME=/usr/local/zookeeper

export PATH=$ZK_HOME/bin:$PATH

 

### hadoop ##

export HADOOP_HOME=/usr/local/hadoop

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

 

## tools ##

export PATH=/home/hadoop/tools:$PATH

 

## sqoop ##

export SQOOP_HOME=/usr/local/sqoop

export PATH=$SQOOP_HOME/bin:$PATH

 

## flume ##

export FLUME_HOME=/usr/local/flume

export PATH=$FLUME_HOME/bin:$PATH

 

## hbase ##

export HBASE_HOME=/usr/local/hbase

export PATH=$HBASE_HOME/bin:$PATH
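
After editing, reload the profile and confirm the variables resolve (a quick check):

source /etc/profile
echo $HBASE_HOME
which hbase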

 

2. Edit $HBASE_HOME/conf/hbase-env.sh, adding or changing the following:

## JDK path

export JAVA_HOME=/usr/local/jdk1.8.0_251

## disable the ZooKeeper instance bundled with HBase

export HBASE_MANAGES_ZK=false
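
A quick sanity check that both settings are in effect:

grep -E '^export (JAVA_HOME|HBASE_MANAGES_ZK)' /usr/local/hbase/conf/hbase-env.sh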

 

3. Edit $HBASE_HOME/conf/regionservers so it lists the RegionServer (data) nodes:

[hadoop@big-master1 conf]$ pwd

/usr/local/hbase/conf

[hadoop@big-master1 conf]$ cat regionservers

big-slave01

big-slave02

big-slave03

 

4. Edit $HBASE_HOME/conf/hbase-site.xml, adding the following properties inside <configuration>:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>big-master1:2181,big-master2:2181,big-slave01:2181,big-slave02:2181,big-slave03:2181</value>
  <description>Comma-separated list of servers in the ZooKeeper quorum.</description>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/data/hbase/zk</value>
  <description>Property from ZooKeeper config zoo.cfg. The directory where the snapshot is stored.</description>
</property>

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://cluster1/hbase</value>
  <description>The directory shared by RegionServers.</description>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
  <description>Possible values are false: standalone and pseudo-distributed setups with managed ZooKeeper; true: fully-distributed with an unmanaged ZooKeeper quorum (see hbase-env.sh).</description>
</property>
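
Note that hbase.rootdir refers to the HDFS nameservice cluster1, so HBase must be able to resolve it. A common way to do that (assuming the HA Hadoop configs live under /usr/local/hadoop/etc/hadoop) is to link them into HBase's conf directory:

ln -s /usr/local/hadoop/etc/hadoop/core-site.xml /usr/local/hbase/conf/core-site.xml
ln -s /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/hbase/conf/hdfs-site.xml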

 

 

5. Copy the /usr/local/hbase directory and the /etc/profile file to the other nodes (big-master2, big-slave01 - big-slave03) and grant ownership:

[root@big-master1 local]# rsync -avzP /usr/local/hbase big-master2:/usr/local/

[root@big-master1 local]# rsync -avzP /usr/local/hbase big-slave01:/usr/local/

[root@big-master1 local]# rsync -avzP /usr/local/hbase big-slave02:/usr/local/

[root@big-master1 local]# rsync -avzP /usr/local/hbase big-slave03:/usr/local/
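
The rsync commands above cover the HBase directory; a sketch of the rest of this step (syncing /etc/profile and granting ownership, assuming the cluster runs as the hadoop user):

for h in big-master2 big-slave01 big-slave02 big-slave03; do
    rsync -avzP /etc/profile ${h}:/etc/profile
    ssh ${h} "chown -R hadoop:hadoop /usr/local/hbase"
done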

 

6. Create the corresponding directory:

mkdir -pv /data/hbase/zk
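
This directory is needed on every node; a loop such as the following (run as root from big-master1, hostnames per the cluster above) saves typing:

for h in big-master1 big-master2 big-slave01 big-slave02 big-slave03; do
    ssh ${h} "mkdir -pv /data/hbase/zk && chown -R hadoop:hadoop /data/hbase"
done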

 

(3)

Start and verify HBase:

Start the HBase cluster services from big-master1.

[hadoop@big-master1 ~]$ start-hbase.sh

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

running master, logging to /usr/local/hbase/logs/hbase-hadoop-master-big-master1.out

big-slave02: running regionserver, logging to /usr/local/hbase/bin/../logs/hbase-hadoop-regionserver-big-slave02.out

big-slave03: running regionserver, logging to /usr/local/hbase/bin/../logs/hbase-hadoop-regionserver-big-slave03.out

big-slave01: running regionserver, logging to /usr/local/hbase/bin/../logs/hbase-hadoop-regionserver-big-slave01.out

[hadoop@big-master1 ~]$ jps

30037 JournalNode

10181 HMaster

10438 Jps

4023 ResourceManager

29642 DFSZKFailoverController

29804 NameNode

28141 QuorumPeerMain

 

Start a second HMaster process on big-master2 to form a high-availability pair.

[hadoop@big-master2 ~]$ hbase-daemon.sh start master

running master, logging to /usr/local/hbase/logs/hbase-hadoop-master-big-master2.out

[hadoop@big-master2 ~]$ jps

20032 NameNode

20116 JournalNode

20324 DFSZKFailoverController

31540 HMaster

31704 Jps

18830 QuorumPeerMain

2462 ResourceManager

 

On the other slave nodes, verify directly with jps:

[hadoop@big-slave01 ~]$ jps

10161 NodeManager

28513 Jps

28338 HRegionServer

7702 QuorumPeerMain

8583 DataNode

8686 JournalNode

 

[hadoop@big-slave02 ~]$ jps

26097 Jps

5187 DataNode

6697 NodeManager

4362 QuorumPeerMain

5290 JournalNode

25869 HRegionServer

 

[hadoop@big-slave03 ~]$ jps

26193 Jps

4562 QuorumPeerMain

5442 DataNode

26004 HRegionServer

6903 NodeManager

5545 JournalNode

 

The HBase web UI listens on port 16010 by default; browse to either master directly:

http://192.168.41.20:16010/master-status

http://192.168.41.21:16010/master-status
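
From the shell, a quick reachability check (an HTTP 200 indicates the UI is serving):

curl -s -o /dev/null -w '%{http_code}\n' http://192.168.41.20:16010/master-status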

(4)

Basic operations:

[root@big-master1 ~]# su - hadoop

Last login: Thu Jun 4 23:34:55 CST 2020 on pts/0

[hadoop@big-master1 ~]$ hbase shell

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

HBase Shell

Use "help" to get list of supported commands.

Use "exit" to quit this interactive shell.

For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell

Version 2.2.5, rf76a601273e834267b55c0cda12474590283fd4c, 2020年 05月 21日 星期四 18:34:40 CST

Took 0.0082 seconds

hbase(main):001:0> whoami

hadoop (auth:SIMPLE)

groups: hadoop

Took 0.0332 seconds

hbase(main):002:0> create 'User','info'

Created table User

Took 10.1702 seconds

=> Hbase::Table - User

hbase(main):003:0> list

TABLE

User

1 row(s)

Took 0.0573 seconds

=> ["User"]

 

--- The following content is excerpted and adapted from elsewhere ------

  • Table schema

1. Create a table

Syntax: create '<table>', {NAME => '<column family>', VERSIONS => <versions>}

Create a User table with one column family, info:

hbase(main):002:0> create 'User','info'
0 row(s) in 1.5890 seconds

=> Hbase::Table - User

2. List all tables

hbase(main):003:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
TEST.USER
User
6 row(s) in 0.0340 seconds

=> ["SYSTEM.CATALOG", "SYSTEM.FUNCTION", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "TEST.USER", "User"]

3. Describe a table

hbase(main):004:0> describe 'User'
Table User is ENABLED
User
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1410 seconds

desc is an alias for describe and prints the same output:

hbase(main):025:0> desc 'User'
1 row(s) in 0.0380 seconds

4. Alter a table

Delete a specified column family:

hbase(main):002:0> alter 'User', 'delete' => 'info'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.5340 seconds

  • Table data

1. Insert data

Syntax: put '<table>', '<rowkey>', '<family:column>', '<value>'

hbase(main):005:0> put 'User', 'row1', 'info:name', 'xiaoming'
0 row(s) in 0.1200 seconds
hbase(main):006:0> put 'User', 'row2', 'info:age', '18'
0 row(s) in 0.0170 seconds
hbase(main):007:0> put 'User', 'row3', 'info:sex', 'man'
0 row(s) in 0.0030 seconds

2. Fetch a record by rowkey

Syntax: get '<table>', '<rowkey>', ['<family:column>', ...]

hbase(main):008:0> get 'User', 'row2'
COLUMN                CELL
 info:age             timestamp=1502368069926, value=18
1 row(s) in 0.0280 seconds

hbase(main):028:0> get 'User', 'row3', 'info:sex'
COLUMN                CELL
 info:sex             timestamp=1502368093636, value=man

hbase(main):036:0> get 'User', 'row1', {COLUMN => 'info:name'}
COLUMN                CELL
 info:name            timestamp=1502368030841, value=xiaoming
1 row(s) in 0.0120 seconds

3. Scan records

Syntax: scan '<table>', {COLUMNS => ['<family:column>', ...], LIMIT => num}

Scan all rows:

hbase(main):009:0> scan 'User'
ROW                   COLUMN+CELL
 row1                 column=info:name, timestamp=1502368030841, value=xiaoming
 row2                 column=info:age, timestamp=1502368069926, value=18
 row3                 column=info:sex, timestamp=1502368093636, value=man
3 row(s) in 0.0380 seconds

Scan the first 2 rows:

hbase(main):037:0> scan 'User', {LIMIT => 2}
ROW                   COLUMN+CELL
 row1                 column=info:name, timestamp=1502368030841, value=xiaoming
 row2                 column=info:age, timestamp=1502368069926, value=18
2 row(s) in 0.0170 seconds

Range scans:

hbase(main):011:0> scan 'User', {STARTROW => 'row2'}
ROW                   COLUMN+CELL
 row2                 column=info:age, timestamp=1502368069926, value=18
 row3                 column=info:sex, timestamp=1502368093636, value=man
2 row(s) in 0.0170 seconds

hbase(main):012:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row2'}
ROW                   COLUMN+CELL
 row2                 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0110 seconds

hbase(main):013:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}
ROW                   COLUMN+CELL
 row2                 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0120 seconds

You can also add advanced options such as TIMERANGE and FILTER; see the sketch below.

STARTROW and ENDROW must be uppercase, or the shell raises an error; note that the result set excludes the row equal to ENDROW.
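
A brief sketch of those advanced options against the same User table (the filter value and timestamp bounds are illustrative):

hbase(main):014:0> scan 'User', {FILTER => "ValueFilter(=, 'binary:18')"}
hbase(main):015:0> scan 'User', {TIMERANGE => [1502368000000, 1502368100000]}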

4. Count table rows

Syntax: count '<table>', {INTERVAL => intervalNum, CACHE => cacheNum}

INTERVAL sets how many rows pass between progress lines (each showing the current rowkey), default 1000; CACHE is the scanner cache size per fetch, default 10, and raising it can speed up the count.

hbase(main):020:0> count 'User'
3 row(s) in 0.0360 seconds

=> 3
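
For a larger table, the two parameters just described can be set explicitly (values here are illustrative):

hbase(main):021:0> count 'User', {INTERVAL => 2, CACHE => 500}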

5. Delete

Delete a column:

hbase(main):008:0> delete 'User', 'row1', 'info:age'
0 row(s) in 0.0290 seconds

Delete an entire row:

hbase(main):014:0> deleteall 'User', 'row2'
0 row(s) in 0.0090 seconds

Delete all data in the table:

hbase(main):016:0> truncate 'User'
Truncating 'User' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 3.6610 seconds

  • Table management

1. Disable a table

hbase(main):014:0> disable 'User'
0 row(s) in 2.2660 seconds

hbase(main):015:0> describe 'User'
Table User is DISABLED
User
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0340 seconds

hbase(main):016:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}
ROW                   COLUMN+CELL
ERROR: User is disabled.

2. Enable a table

hbase(main):017:0> enable 'User'
0 row(s) in 1.3470 seconds

hbase(main):018:0> describe 'User'
Table User is ENABLED
User
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0310 seconds

hbase(main):019:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}
ROW                   COLUMN+CELL
 row2                 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0280 seconds

3. Test whether a table exists

hbase(main):022:0> exists 'User'
Table User does exist
0 row(s) in 0.0150 seconds

hbase(main):023:0> exists 'user'
Table user does not exist
0 row(s) in 0.0110 seconds

hbase(main):024:0> exists user
NameError: undefined local variable or method `user' for #

4. Drop a table

A table must be disabled before it can be dropped:

hbase(main):030:0> drop 'TEST.USER'
ERROR: Table TEST.USER is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled:
  hbase> drop 't1'
  hbase> drop 'ns1:t1'

hbase(main):031:0> disable 'TEST.USER'
0 row(s) in 2.2640 seconds

hbase(main):033:0> drop 'TEST.USER'
0 row(s) in 1.2490 seconds

hbase(main):034:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
User
5 row(s) in 0.0080 seconds

=> ["SYSTEM.CATALOG", "SYSTEM.FUNCTION", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "User"]

 
