An analysis of the HBase ZooKeeper startup process:
After finishing some configuration:
[root@hadoop002 conf]# vi hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop002:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hbase/zookeeper</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
When starting HBase, I found that ZooKeeper startup first runs `ssh 0.0.0.0`, which prompts me for a password before it can proceed.
This is clearly unreasonable: in a more complex environment it could fail outright, and it shows the configuration is not under control; even if it happens to run, it is a risk.
[root@hadoop002 bin]# start-hbase.sh
localhost: starting zookeeper, logging to /opt/software/hbase-1.2.0-cdh5.7.0/bin/../logs/hbase-root-zookeeper-hadoop002.out
starting master, logging to /opt/software/hbase-1.2.0-cdh5.7.0/logs/hbase-root-master-hadoop002.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
hadoop002: starting regionserver, logging to /opt/software/hbase-1.2.0-cdh5.7.0/bin/../logs/hbase-root-regionserver-hadoop002.out
hadoop002: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
hadoop002: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
So, did I miss some configuration? Let's investigate.
1. Startup used start-hbase.sh.
[root@hadoop002 bin]# cat start-hbase.sh
It contains this block:
if [ "$distMode" == 'false' ]
then
  "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master $@
else
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" $commandToRun zookeeper
  "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_REGIONSERVERS}" $commandToRun regionserver
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_BACKUP_MASTERS}" $commandToRun master-backup
fi
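Which branch runs depends on `$distMode`, which reflects the `hbase.cluster.distributed` property set to `true` in the hbase-site.xml above. A minimal sketch of that decision (here `distMode` is hard-coded; in the real script it is derived from the configuration):

```shell
# Simplified sketch of the branch in start-hbase.sh shown above.
# distMode is hard-coded here; the real script derives it from the
# hbase.cluster.distributed property configured in hbase-site.xml.
distMode=true

if [ "$distMode" = "false" ]; then
  plan="master only"
else
  plan="zookeeper, master, regionservers, backup masters"
fi
echo "daemons to start: $plan"
```

With `hbase.cluster.distributed=true`, the ZooKeeper daemons are started first, which is why the investigation continues there.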
The ZooKeeper part is `"$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" $commandToRun zookeeper`, so check whether the bin directory has a corresponding zookeeper script. It does: zookeepers.sh.
[root@hadoop002 bin]# cat zookeepers.sh
It contains this block:
if [ "$HBASE_MANAGES_ZK" = "true" ]; then
  hosts=`"$bin"/hbase org.apache.hadoop.hbase.zookeeper.ZKServerTool | grep '^ZK host:' | sed 's,^ZK host:,,'`
  cmd=$"${@// /\\ }"
  for zookeeper in $hosts; do
    ssh $HBASE_SSH_OPTS $zookeeper $cmd 2>&1 | sed "s/^/$zookeeper: /" &
    if [ "$HBASE_SLAVE_SLEEP" != "" ]; then
      sleep $HBASE_SLAVE_SLEEP
    fi
  done
fi
So the script runs `ssh $HBASE_SSH_OPTS $zookeeper`. Here `$HBASE_SSH_OPTS` appears to be empty, `$zookeeper` takes each value in `$hosts`, and `$hosts` is the result of `"$bin"/hbase org.apache.hadoop.hbase.zookeeper.ZKServerTool | grep '^ZK host:' | sed 's,^ZK host:,,'`.
Run it manually:
/opt/software/hbase-1.2.0-cdh5.7.0/bin/hbase org.apache.hadoop.hbase.zookeeper.ZKServerTool
ZK host: localhost
So the host list is obtained by running the ZKServerTool class and trimming its output string.
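That grep-and-sed trimming can be reproduced by hand; a small sketch, where the sample line stands in for the output ZKServerTool printed above:

```shell
# Reproduce the host extraction done in zookeepers.sh.
# 'sample' stands in for the output of ZKServerTool seen above.
sample='ZK host: localhost'
hosts=$(printf '%s\n' "$sample" | grep '^ZK host:' | sed 's,^ZK host:,,')

# The unquoted expansion in the for loop strips the leading space left
# by sed, so this is exactly the host that zookeepers.sh would ssh to.
for zookeeper in $hosts; do
  echo "would ssh to: $zookeeper"
done
```

This confirms the script ends up doing `ssh localhost` when no quorum is configured, hence the password prompt.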
Now look at the ZKServerTool class in the source:
package org.apache.hadoop.hbase.zookeeper;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HBaseInterfaceAudience;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.ServerName;
import org.apache.yetus.audience.InterfaceAudience;

import java.util.LinkedList;
import java.util.List;

public class ZKServerTool {
  public static ServerName[] readZKNodes(Configuration conf) {
    List<ServerName> hosts = new LinkedList<>();
    String quorum = conf.get(HConstants.ZOOKEEPER_QUORUM, HConstants.LOCALHOST);

    String[] values = quorum.split(",");
    for (String value : values) {
      String[] parts = value.split(":");
      String host = parts[0];
      int port = HConstants.DEFAULT_ZOOKEPER_CLIENT_PORT;
      if (parts.length > 1) {
        port = Integer.parseInt(parts[1]);
      }
      hosts.add(ServerName.valueOf(host, port, -1));
    }
    return hosts.toArray(new ServerName[hosts.size()]);
  }

  /**
   * Run the tool.
   * @param args Command line arguments.
   */
  public static void main(String args[]) {
    for (ServerName server : readZKNodes(HBaseConfiguration.create())) {
      // bin/zookeeper.sh relies on the "ZK host" string for grepping which is case sensitive.
      System.out.println("ZK host: " + server.getHostname());
    }
  }
}
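The parsing in readZKNodes can be mirrored in shell: split the quorum string on commas, then split each entry on ':' for an optional client port (2181 by default). A sketch with a hypothetical quorum value:

```shell
# Shell mirror of readZKNodes: parse a quorum string into host/port pairs.
# The quorum value below is hypothetical; 2181 is the default client port.
quorum='hadoop002:2181,hadoop003'

for value in $(printf '%s' "$quorum" | tr ',' ' '); do
  host=${value%%:*}                     # text before the first ':'
  port=${value#*:}                      # text after the first ':'
  [ "$port" = "$value" ] && port=2181   # no ':' present -> default port
  echo "$host $port"
done
```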
The class's output is produced by `System.out.println("ZK host: " + server.getHostname());`, where each `server` comes from the result of `readZKNodes(HBaseConfiguration.create())`.
Inside the `readZKNodes` method, the lookup is `conf.get(HConstants.ZOOKEEPER_QUORUM, HConstants.LOCALHOST)`: `HConstants.ZOOKEEPER_QUORUM` is the configuration key and `HConstants.LOCALHOST` is the fallback.
So next, check `HConstants.ZOOKEEPER_QUORUM` and `HConstants.LOCALHOST` in the source.
In hbase/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java we find:
/** Name of ZooKeeper quorum configuration parameter. */
public static final String ZOOKEEPER_QUORUM = "hbase.zookeeper.quorum";
/** Host name of the local machine */
public static final String LOCALHOST = "localhost";
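`conf.get(ZOOKEEPER_QUORUM, LOCALHOST)` returns the second argument whenever the property is unset; in shell terms (where `quorum` is a stand-in for the configuration lookup):

```shell
# Shell analogue of conf.get("hbase.zookeeper.quorum", "localhost"):
# when the property is unset, the default "localhost" wins, which is
# exactly why ZKServerTool printed "ZK host: localhost" earlier.
unset quorum
quorum=${quorum:-localhost}
echo "resolved quorum: $quorum"
```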
This points to the parameter hbase.zookeeper.quorum.
Look it up in the HBase default configuration reference (http://hbase.apache.org/book.html#hbase_default_configurations); the related section of the reference guide reads:
3. Configure ZooKeeper
In reality, you should carefully consider your ZooKeeper configuration.
You can find out more about configuring ZooKeeper in zookeeper section.
This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.
On node-a, edit conf/hbase-site.xml and add the following properties.
So the earlier configuration was missing the hbase.zookeeper.quorum parameter.
After adding it, hbase-site.xml also contains (the value is this host, hadoop002):
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop002</value>
  </property>
Now restart HBase:
[root@hadoop002 bin]# stop-hbase.sh
[root@hadoop002 bin]# start-hbase.sh
hadoop002: starting zookeeper, logging to /opt/software/hbase-1.2.0-cdh5.7.0/bin/../logs/hbase-root-zookeeper-hadoop002.out
starting master, logging to /opt/software/hbase-1.2.0-cdh5.7.0/logs/hbase-root-master-hadoop002.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
hadoop002: starting regionserver, logging to /opt/software/hbase-1.2.0-cdh5.7.0/bin/../logs/hbase-root-regionserver-hadoop002.out
hadoop002: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
hadoop002: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
This time ZooKeeper starts on the hadoop002 I specified, instead of localhost.
Summary:
Looking back at this problem, the real point was not fixing one particular setting, but dissecting how the program runs and tracing its working mechanism through the source code, which makes later troubleshooting much easier. The approach matters more than the fix!