Installation environment:
OS: Oracle Linux 5.6
JDK: jdk1.6.0_18
Hadoop: hadoop-0.20.2
HBase: hbase-0.90.5
Installation prerequisites:
1. JDK already installed: version 1.6 or later
2. Hadoop already installed: fully distributed mode, set up as described here:
http://blog.csdn.net/lichangzai/article/details/8206834
3. Choosing an HBase version
The HBase version must match the Hadoop version; otherwise the installation will fail or HBase will not work properly. To find out which versions work together, check the official documentation or look for successful installation reports online.
4. Downloading HBase
http://mirror.bjtu.edu.cn/apache/hbase/hbase-0.90.5/
Installation overview:
- Configure hosts so that every hostname involved resolves to an IP address
- Edit hbase-env.sh
- Edit hbase-site.xml
- Edit the regionservers file
- Copy HBase to the other nodes
- Start HBase
- Verify the startup
Installation steps:
1. Configure hosts
This step was already done when Hadoop was configured:
[root@gc ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
192.168.2.100 gc.localdomain gc
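A quick sanity check is to confirm that each of these names resolves on every node, for example:
[grid@gc ~]$ for h in gc rac1 rac2; do getent hosts "$h" || echo "$h does not resolve"; done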
2. Copy and unpack the installation package
[grid@gc ~]$ pwd
/home/grid
[grid@gc ~]$ tar -xzvf hbase-0.90.5.tar.gz
3. Replace the Hadoop core jar
This prevents compatibility problems between the HBase and Hadoop versions, which would otherwise cause HMaster to fail on startup.
$ pwd
/home/grid/hbase-0.90.5/lib
$ mv hadoop-core-0.20-append-r1056497.jar hadoop-core-0.20-append-r1056497.jar.bak
$ cp /home/grid/hadoop-0.20.2/hadoop-0.20.2-core.jar /home/grid/hbase-0.90.5/lib/
$ chmod 775 hadoop-0.20.2-core.jar
4. Edit hbase-env.sh
[grid@gc conf]$ pwd
/home/grid/hbase-0.90.5/conf
[grid@gc conf]$ vi hbase-env.sh
# Add the following settings
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/java/jdk1.6.0_18
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=/home/grid/hadoop-0.20.2/conf
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR=${HBASE_HOME}/logs
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=true
5. Edit hbase-site.xml
[grid@gc conf]$ vi hbase-site.xml
# Add the following content
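(The exact property values depend on the Hadoop setup. The sketch below assumes the NameNode from the Hadoop install above listens on hdfs://gc:9000 and that HBase manages a ZooKeeper quorum on gc, rac1 and rac2; adjust hbase.rootdir so it matches fs.default.name in core-site.xml.)
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- must match fs.default.name in the Hadoop core-site.xml; port 9000 is an assumption -->
    <value>hdfs://gc:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>gc,rac1,rac2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/grid/hbase-0.90.5/zookeeper</value>
  </property>
</configuration>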
6. Edit the regionservers file
[grid@gc conf]$ cat regionservers
# Replace localhost with the following
rac1
rac2
7. Sync the modified HBase directory to the other nodes
--Copy it to both rac1 and rac2
[grid@gc ~]$ scp -r hbase-0.90.5 rac1:/home/grid/
[grid@gc ~]$ scp -r hbase-0.90.5 rac2:/home/grid/
8. Start/stop the HBase cluster
--Before starting HBase, make sure Hadoop is already running
[grid@gc ~]$ hadoop-0.20.2/bin/hadoop dfsadmin -report
Configured Capacity: 45702094848 (42.56 GB)
Present Capacity: 3562618880 (3.32 GB)
DFS Remaining: 3562348544 (3.32 GB)
DFS Used: 270336 (264 KB)
DFS Used%: 0.01%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.2.101:50010
Decommission Status : Normal
Configured Capacity: 22851047424 (21.28 GB)
DFS Used: 135168 (132 KB)
Non DFS Used: 20131606528 (18.75 GB)
DFS Remaining: 2719305728(2.53 GB)
DFS Used%: 0%
DFS Remaining%: 11.9%
Last contact: Tue Dec 25 09:40:14 CST 2012
Name: 192.168.2.102:50010
Decommission Status : Normal
Configured Capacity: 22851047424 (21.28 GB)
DFS Used: 135168 (132 KB)
Non DFS Used: 22007869440 (20.5 GB)
DFS Remaining: 843042816(803.99 MB)
DFS Used%: 0%
DFS Remaining%: 3.69%
Last contact: Tue Dec 25 09:40:13 CST 2012
--Start the HBase cluster
----On the master node gc
[grid@gc ~]$ hbase-0.90.5/bin/start-hbase.sh
rac2: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac2.localdomain.out
gc: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-gc.localdomain.out
rac1: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac1.localdomain.out
starting master, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-master-gc.localdomain.out
rac1: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac1.localdomain.out
rac2: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac2.localdomain.out
--Two additional HBase processes now appear
[grid@gc ~]$ jps
2718 HQuorumPeer
6875 JobTracker
6799 SecondaryNameNode
8129 org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar
2864 Jps
6651 NameNode
2772 HMaster
--On the slave nodes rac1 and rac2
[grid@rac1 ~]$ jps
23663 HRegionServer
3736 DataNode
23585 HQuorumPeer
23737 Jps
3840 TaskTracker
[grid@rac2 ~]$ jps
10579 TaskTracker
29735 HQuorumPeer
29897 Jps
10480 DataNode
29812 HRegionServer
--Verify through a browser:
http://192.168.2.100:60010/master.jsp
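Each region server also serves its own status page (on port 60030 if the defaults are unchanged), e.g. http://192.168.2.101:60030/regionserver.jsp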
--Stop the HBase cluster
[grid@gc hbase-0.90.5]$ bin/stop-hbase.sh
stopping hbase...................
gc: stopping zookeeper.
rac2: stopping zookeeper.
rac1: stopping zookeeper.
Command-line operations:
1. Common HBase shell commands
--Enter the HBase shell
[grid@gc ~]$ hbase-0.90.5/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
hbase(main):001:0>
--Check the cluster status
hbase(main):002:0> status
2 servers, 0 dead, 1.0000 average load
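status also accepts a verbosity argument; for per-server detail something like the following should work:
hbase> status 'detailed'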
--Check the HBase version
hbase(main):004:0> version
0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
--Help command
hbase(main):003:0> help
HBase Shell, version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, version
Group name: ddl
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
Group name: dml
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
Group name: tools
Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump
Group name: replication
Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
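As the help text says, non-default column-family attributes are passed as Ruby-hash dictionaries. For example, a hypothetical table t2 whose family f1 keeps five versions could be created like this:
hbase> create 't2', {NAME => 'f1', VERSIONS => 5}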
2. HBase data operations
--Create a table
Logical model of the resume table:
Row key    | Timestamp | Column family binfo    | Column family edu      | Column family work
lichangzai | T2        | binfo:age='1980-1-1'   |                        |
           | T3        | binfo:sex='man'        |                        |
           | T5        |                        | edu:mschool='rq no.1'  |
           | T6        |                        | edu:university='qhddx' |
           | T7        |                        |                        | work:company1='12580'
changfei   | T10       | binfo:age='1986-2-1'   |                        |
           | T11       |                        | edu:university='bjdx'  |
           | T12       |                        |                        | work:company1='LG'
……         | Tn        |                        |                        |
--Create the table
hbase(main):005:0> create 'resume','binfo','edu','work'
0 row(s) in 16.5710 seconds
--List tables
hbase(main):006:0> list
TABLE
resume
1 row(s) in 1.6080 seconds
--Describe the table structure
hbase(main):007:0> describe 'resume'
DESCRIPTION ENABLED
{NAME => 'resume', FAMILIES => [{NAME => 'binfo', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', C true
OMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'fals
e', BLOCKCACHE => 'true'}, {NAME => 'edu', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESS
ION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLO
CKCACHE => 'true'}, {NAME => 'work', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION =>
'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACH
E => 'true'}]}
1 row(s) in 1.8590 seconds
--Add a column family
hbase(main):014:0> disable 'resume'
0 row(s) in 4.2630 seconds
hbase(main):015:0> alter 'resume',name='f1'
0 row(s) in 4.6990 seconds
--Delete a column family
hbase(main):017:0> alter 'resume',{NAME=>'f1',METHOD=>'delete'}
0 row(s) in 1.1390 seconds
--Or equivalently
hbase(main):021:0> alter 'resume','delete' => 'f1'
0 row(s) in 1.9310 seconds
hbase(main):022:0> enable 'resume'
0 row(s) in 5.9060 seconds
Notes:
(1) DDL commands are case-sensitive: commands such as alter, create, drop and enable must be lowercase, while attribute names inside {} must be uppercase.
(2) A table must be disabled before it is altered or dropped, and enabled again afterwards; otherwise an error is raised.
--Check the disabled/enabled state
hbase(main):024:0> is_disabled 'resume'
false
0 row(s) in 0.4930 seconds
hbase(main):021:0> is_enabled 'resume'
true
0 row(s) in 0.2450 seconds
--Drop a table
hbase(main):015:0> create 't1','f1'
0 row(s) in 15.3730 seconds
hbase(main):016:0> disable 't1'
0 row(s) in 6.4840 seconds
hbase(main):017:0> drop 't1'
0 row(s) in 7.3730 seconds
--Check whether a table exists
hbase(main):018:0> exists 'resume'
Table resume does exist
0 row(s) in 2.3900 seconds
hbase(main):019:0> exists 't1'
Table t1 does not exist
0 row(s) in 1.3270 seconds
--Insert data
put 'resume','lichangzai','binfo:age','1980-1-1'
put 'resume','lichangzai','binfo:sex','man'
put 'resume','lichangzai','edu:mschool','rq no.1'
put 'resume','lichangzai','edu:university','qhddx'
put 'resume','lichangzai','work:company1','12580'
put 'resume','lichangzai','work:company2','china mobile'
put 'resume','lichangzai','binfo:site','blog.csdn.net/lichangzai'
put 'resume','lichangzai','binfo:mobile','13712345678'
put 'resume','changfei','binfo:age','1986-2-1'
put 'resume','changfei','edu:university','bjdx'
put 'resume','changfei','work:company1','LG'
put 'resume','changfei','binfo:mobile','13598765401'
put 'resume','changfei','binfo:site','hi.baidu/lichangzai'
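put can also take an explicit timestamp as a fifth argument, which is how the Tn values in the logical model above would be pinned down; for illustration only (the column and timestamp here are made up):
hbase> put 'resume','changfei','binfo:birthplace','beijing',1356485874056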
--Get all data for a row key
hbase(main):014:0> get 'resume','lichangzai'
COLUMN CELL
binfo:age timestamp=1356485720612, value=1980-1-1
binfo:mobile timestamp=1356485865523, value=13712345678
binfo:sex timestamp=1356485733603, value=man
binfo:site timestamp=1356485859806, value=blog.csdn.net/lichangzai
edu:mschool timestamp=1356485750361, value=rq no.1
edu:university timestamp=1356485764211, value=qhddx
work:company1 timestamp=1356485837743, value=12580
work:company2 timestamp=1356485849365, value=china mobile
8 row(s) in 2.1090 seconds
Note: data must be queried by row key.
--Get all data for one row key and one column family
hbase(main):015:0> get 'resume','lichangzai','binfo'
COLUMN CELL
binfo:age timestamp=1356485720612, value=1980-1-1
binfo:mobile timestamp=1356485865523, value=13712345678
binfo:sex timestamp=1356485733603, value=man
binfo:site timestamp=1356485859806, value=blog.csdn.net/lichangzai
4 row(s) in 1.6010 seconds
--Get one column of one column family for a row key
hbase(main):017:0> get 'resume','lichangzai','binfo:sex'
COLUMN CELL
binfo:sex timestamp=1356485733603, value=man
1 row(s) in 0.8980 seconds
--Update a record
hbase(main):018:0> put 'resume','lichangzai','binfo:mobile','13899999999'
0 row(s) in 1.7640 seconds
hbase(main):019:0> get 'resume','lichangzai','binfo:mobile'
COLUMN CELL
binfo:mobile timestamp=1356486691591, value=13899999999
1 row(s) in 1.5710 seconds
Note: an update is really just an insert with a newer timestamp; get returns only the latest version by default.
--Get data by timestamp
------Query the value at the latest timestamp
hbase(main):020:0> get 'resume','lichangzai',{COLUMN=>'binfo:mobile',TIMESTAMP=>1356486691591}
COLUMN CELL
binfo:mobile timestamp=1356486691591, value=13899999999
1 row(s) in 0.4060 seconds
------Query the value at an earlier (already superseded) timestamp
hbase(main):021:0> get 'resume','lichangzai',{COLUMN=>'binfo:mobile',TIMESTAMP=>1356485865523}
COLUMN CELL
binfo:mobile timestamp=1356485865523, value=13712345678
1 row(s) in 0.7780 seconds
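Since each column family keeps three versions by default (VERSIONS => '3' in the describe output above), several versions can also be fetched in one call, for example:
hbase> get 'resume','lichangzai',{COLUMN=>'binfo:mobile',VERSIONS=>3}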
--Full table scan
hbase(main):022:0> scan 'resume'
ROW COLUMN+CELL
changfei column=binfo:age, timestamp=1356485874056, value=1986-2-1
changfei column=binfo:mobile, timestamp=1356485897477, value=13598765401
changfei column=binfo:site, timestamp=1356485906106, value=hi.baidu/lichangzai
changfei column=edu:university, timestamp=1356485880977, value=bjdx
changfei column=work:company1, timestamp=1356485888939, value=LG
lichangzai column=binfo:age, timestamp=1356485720612, value=1980-1-1
lichangzai column=binfo:mobile, timestamp=1356486691591, value=13899999999
lichangzai column=binfo:sex, timestamp=1356485733603, value=man
lichangzai column=binfo:site, timestamp=1356485859806, value=blog.csdn.net/lichangzai
lichangzai column=edu:mschool, timestamp=1356485750361, value=rq no.1
lichangzai column=edu:university, timestamp=1356485764211, value=qhddx
lichangzai column=work:company1, timestamp=1356485837743, value=12580
lichangzai column=work:company2, timestamp=1356485849365, value=china mobile
2 row(s) in 3.6300 seconds
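scan also takes an options dictionary to restrict the column families and the number of rows returned, for example:
hbase> scan 'resume',{COLUMNS=>'binfo',LIMIT=>1}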
--Delete one column of a given row key
hbase(main):023:0> put 'resume','changfei','binfo:sex','man'
0 row(s) in 1.2630 seconds
hbase(main):024:0> delete 'resume','changfei','binfo:sex'
0 row(s) in 0.5890 seconds
hbase(main):026:0> get 'resume','changfei','binfo:sex'
COLUMN CELL
0 row(s) in 0.5560 seconds
--Delete an entire row
hbase(main):028:0> create 't1','f1','f2'
0 row(s) in 8.3950 seconds
hbase(main):029:0> put 't1','a','f1:col1','xxxxx'
0 row(s) in 2.6790 seconds
hbase(main):030:0> put 't1','a','f1:col2','xyxyx'
0 row(s) in 0.5130 seconds
hbase(main):031:0> put 't1','b','f2:cl1','ppppp'
0 row(s) in 1.2620 seconds
hbase(main):032:0> deleteall 't1','a'
0 row(s) in 1.2030 seconds
hbase(main):033:0> get 't1','a'
COLUMN CELL
0 row(s) in 0.8980 seconds
--Count the rows in a table
hbase(main):035:0> count 'resume'
2 row(s) in 2.8150 seconds
hbase(main):036:0> count 't1'
1 row(s) in 0.9500 seconds
--Truncate a table
hbase(main):037:0> truncate 't1'
Truncating 't1' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table...
0 row(s) in 21.0060 seconds
Note on how truncate works: because files in HDFS cannot be modified in place, the only way to empty a table is to drop it and then recreate it.
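In other words, truncate is effectively the same as running the manual sequence:
hbase> disable 't1'
hbase> drop 't1'
hbase> create 't1','f1','f2'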
3. Problems encountered
Problem:
Right after the HBase installation was configured, the processes on every node were normal, but after a short while the HMaster process on the master node stopped on its own.
When HBase was started again from the master node, the following problem appeared:
--The HMaster process is missing on the master node
[grid@gc bin]$ ./start-hbase.sh
rac1: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac1.localdomain.out
rac2: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac2.localdomain.out
gc: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-gc.localdomain.out
starting master, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-master-gc.localdomain.out
rac2: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac2.localdomain.out
rac1: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac1.localdomain.out
[grid@gc bin]$ jps
3871 NameNode
4075 JobTracker
8853 Jps
4011 SecondaryNameNode
8673 HQuorumPeer
--The processes on the two slave nodes rac1 and rac2 are normal
[grid@rac1 bin]$ jps
10353 HQuorumPeer
10576 Jps
6457 DataNode
6579 TaskTracker
10448 HRegionServer
[grid@rac2 ~]$ jps
10311 HQuorumPeer
10534 Jps
6426 DataNode
6546 TaskTracker
10391 HRegionServer
Below is part of the relevant logs.
--Log from the master node gc
[grid@gc logs]$ tail -100f hbase-grid-master-gc.localdomain.log
2012-12-25 15:23:45,842 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server gc/192.168.2.100:2181
2012-12-25 15:23:45,853 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to gc/192.168.2.100:2181, initiating session
2012-12-25 15:23:45,861 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2012-12-25 15:23:46,930 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac1/192.168.2.101:2181
2012-12-25 15:23:47,167 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2012-12-25 15:23:48,251 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac2/192.168.2.102:2181
2012-12-25 15:23:48,362 INFO org.apache.zookeeper.ZooKeeper: Session: 0x0 closed
2012-12-25 15:23:48,362 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-12-25 15:23:48,367 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1065)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:102)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1079)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:931)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(...)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1060)
[grid@gc logs]$ tail -100f hbase-grid-zookeeper-gc.localdomain.log
2012-12-25 15:23:57,380 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 2 at election address rac2/192.168.2.102:3888
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:366)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:335)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
at java.lang.Thread.run(Thread.java:619)
.......
2012-12-25 15:23:57,670 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.home=/home/grid
2012-12-25 15:23:57,671 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.dir=/home/grid/hbase-0.90.5
2012-12-25 15:23:57,679 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 180000 datadir /home/grid/hbase-0.90.5/zookeeper/version-2 snapdir /home/grid/hbase-0.90.5/zookeeper/version-2
2012-12-25 15:23:58,118 WARN org.apache.zookeeper.server.quorum.Learner: Unexpected exception, tries=0, connecting to rac1/192.168.2.101:2888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:525)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:212)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:65)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
2012-12-25 15:24:00,886 INFO org.apache.zookeeper.server.quorum.Learner: Getting a snapshot from leader
2012-12-25 15:24:00,897 INFO org.apache.zookeeper.server.quorum.Learner: Setting leader epoch 9
2012-12-25 15:24:01,051 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 900000000
2012-12-25 15:24:03,218 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /192.168.2.101:12397
2012-12-25 15:24:03,377 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /192.168.2.101:12397
2012-12-25 15:24:03,396 WARN org.apache.zookeeper.server.quorum.Learner: Got zxid 0x900000001 expected 0x1
2012-12-25 15:24:03,400 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.900000001
2012-12-25 15:24:03,470 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x3bd0f2560e0000 with negotiated timeout 180000 for client /192.168.2.101:12397
2012-12-25 15:24:07,057 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /192.168.2.102:52300
2012-12-25 15:24:07,690 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /192.168.2.102:52300
2012-12-25 15:24:07,712 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x3bd0f2560e0001 with negotiated timeout 180000 for client /192.168.2.102:52300
2012-12-25 15:24:10,016 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2 (n.leader), 34359738398 (n.zxid), 1 (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
2012-12-25 15:24:30,422 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2 (n.leader), 34359738398 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
2012-12-25 15:24:30,423 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2 (n.leader), 34359738398 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
--Log from the slave node rac2
[grid@rac2 logs]$ tail -100f hbase-grid-regionserver-rac2.localdomain.log
2012-12-25 15:23:46,939 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac1/192.168.2.101:2181
2012-12-25 15:23:47,154 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to rac1/192.168.2.101:2181, initiating session
2012-12-25 15:23:47,453 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2012-12-25 15:23:47,977 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server gc/192.168.2.100:2181
2012-12-25 15:23:48,354 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to gc/192.168.2.100:2181, initiating session
2012-12-25 15:23:49,583 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server gc/192.168.2.100:2181, sessionid = 0x3bd0f2560e0001, negotiated timeout = 180000
2012-12-25 15:23:52,052 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
Solution:
Disable IPv6: remove (or comment out) the "::1 localhost" line in /etc/hosts and restart.
[grid@rac1 ~]$ cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
# ::1 localhost6.localdomain6 localhosti6
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
192.168.2.100 gc.localdomain gc
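If you would rather script the change on every node, a one-liner such as the following (run as root) comments the line out:
sed -i 's/^::1/# ::1/' /etc/hosts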
HBase troubleshooting:
http://wiki.apache.org/hadoop/Hbase/Troubleshooting
The following user's article was also used as a reference:
http://chfpdxx.blog.163.com/blog/static/29542296201241411325789/