A collection of problems I ran into while configuring and deploying Hadoop.
1. safe mode issue
Q:
2013-12-10 17:20:46,399 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310, call delete(/app/hadoop/tmp/mapred/system, true) from 127.0.0.1:59760: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /app/hadoop/tmp/mapred/system. Name node is in safe mode. The ratio of reported blocks 1.0000 has reached the threshold 0.9990. Safe mode will be turned off automatically in 17 seconds. org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /app/hadoop/tmp/mapred/system. Name node is in safe mode. The ratio of reported blocks 1.0000 has reached the threshold 0.9990. Safe mode will be turned off automatically in 17 seconds. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1994) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1974) at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:792) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) 2013-12-10 17:20:53,868 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON.
S:
bin/hadoop dfsadmin -safemode leave
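This takes a running Hadoop out of safe mode, but the problem comes back the next time Hadoop is restarted.
My guess is that it is caused by a damaged /app/hadoop/tmp/mapred/system directory. Note, however, that the log above also says safe mode will be turned off automatically in 17 seconds, so simply waiting may be enough.
For a permanent fix, if circumstances allow, delete /app/hadoop/tmp/, recreate it, re-format the name node, and restart Hadoop. This destroys all data stored in HDFS.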
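Instead of forcing the name node out of safe mode, a script can wait for it to leave on its own. The following is a minimal Python sketch, not something from my original setup. It assumes that `bin/hadoop dfsadmin -safemode get` prints a line such as "Safe mode is ON" or "Safe mode is OFF"; the function names and the polling parameters are hypothetical.

```python
import subprocess
import time


def is_safemode_on(dfsadmin_output: str) -> bool:
    """Return True if the text printed by `dfsadmin -safemode get` reports ON.

    Assumption: the output contains "Safe mode is ON" or "Safe mode is OFF".
    """
    return "safe mode is on" in dfsadmin_output.lower()


def wait_until_safemode_off(get_status, poll_seconds=5.0, max_polls=24):
    """Poll get_status() until safe mode is reported OFF.

    get_status is any callable returning the dfsadmin output text, so the
    loop can be exercised without a running cluster.
    Returns True if safe mode ended, False if we gave up.
    """
    for _ in range(max_polls):
        if not is_safemode_on(get_status()):
            return True
        time.sleep(poll_seconds)
    return False


def hadoop_safemode_get(hadoop_bin="bin/hadoop"):
    """Run `bin/hadoop dfsadmin -safemode get` and return its stdout."""
    result = subprocess.run(
        [hadoop_bin, "dfsadmin", "-safemode", "get"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a real cluster it would be called as `wait_until_safemode_off(hadoop_safemode_get)` from the Hadoop installation directory.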
2. namespaceID issue
Q:
2013-12-09 19:37:19,796 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 346059444; datanode namespaceID = 313579633 at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539) at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682) 2013-12-09 19:37:19,819 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
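S: This is probably caused by some operation that left the namespaceIDs of the name node and the data node inconsistent (typically re-formatting the name node without clearing the data node's data directory).
One possible fix is quoted below (I have not tried it myself; I am just recording it for now):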
I've had this happen a few times. If restarting the data node doesn't help, then do the following:
1) Restart Hadoop
2) Go to /app/hadoop/tmp/dfs/name/current
3) Open VERSION (i.e. by vim VERSION)
4) Record namespaceID
5) Go to /app/hadoop/tmp/dfs/data/current
6) Open VERSION (i.e. by vim VERSION)
7) Replace the namespaceID with the namespaceID you recorded in step 4.
This should fix the problem.
from:http://stackoverflow.com/questions/18300940/why-does-data-node-shut-down-when-i-run-hadoop
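The manual edit quoted above can be sketched in Python. This is only an illustration under the assumption that VERSION is a plain key=value text file containing a line of the form `namespaceID=<number>`; the helper names are hypothetical, and the sketch works on text so that nothing is written until you decide to save the result yourself.

```python
import re

# Matches the namespaceID line of a VERSION file (assumed key=value format).
NAMESPACE_RE = re.compile(r"^namespaceID=(\d+)[ \t]*$", re.MULTILINE)


def read_namespace_id(version_text: str) -> str:
    """Return the namespaceID recorded in the text of a VERSION file."""
    match = NAMESPACE_RE.search(version_text)
    if match is None:
        raise ValueError("no namespaceID line found")
    return match.group(1)


def with_namespace_id(version_text: str, new_id: str) -> str:
    """Return a copy of the VERSION text with namespaceID replaced by new_id."""
    read_namespace_id(version_text)  # raises if the line is missing
    return NAMESPACE_RE.sub(lambda _m: "namespaceID=" + new_id,
                            version_text, count=1)
```

To apply it, read /app/hadoop/tmp/dfs/name/current/VERSION and /app/hadoop/tmp/dfs/data/current/VERSION, pass the name node's ID to `with_namespace_id`, and write the result back to the data node's file while the data node is stopped.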
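The method I actually used was brute force: delete the directory that holds the name node and data node data (/app/hadoop/),
then recreate the directory, re-format (bin/hadoop namenode -format), and restart Hadoop.
This is not suitable for production, but it is good enough while learning.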
http://www.cnblogs.com/Dreama/articles/2097200.html
3. ipv6 issue
Q:
http://wiki.apache.org/hadoop/HadoopIPv6
Reportedly Hadoop does not support IPv6, so I disabled IPv6.
hduser@hdnamenode:/usr/local/hadoop$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp6 0 0 127.0.0.1:54310 :::* LISTEN 1001 21535 7799/java
tcp6 0 0 127.0.0.1:54311 :::* LISTEN 1001 22111 8387/java
tcp6 0 0 :::50090 :::* LISTEN 1001 22211 8291/java
tcp6 0 0 :::50060 :::* LISTEN 1001 22864 8633/java
tcp6 0 0 :::50030 :::* LISTEN 1001 22295 8387/java
tcp6 0 0 :::37518 :::* LISTEN 1001 21549 8291/java
tcp6 0 0 :::41359 :::* LISTEN 1001 20832 7799/java
tcp6 0 0 127.0.0.1:51827 :::* LISTEN 1001 22582 8633/java
tcp6 0 0 :::50070 :::* LISTEN 1001 22121 7799/java
tcp6 0 0 :::51161 :::* LISTEN 1001 21726 8387/java
tcp6 0 0 :::38905 :::* LISTEN 1001 21172 8045/java
tcp6 0 0 :::50010 :::* LISTEN 1001 22207 8045/java
tcp6 0 0 :::50075 :::* LISTEN 1001 22593 8045/java
tcp6 0 0 :::50020 :::* LISTEN 1001 22598 8045/java
With IPv6, netstat shows the protocol of a listening port as "tcp6"; with IPv4 it shows "tcp".
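The tcp6 check can be automated with a few lines of Python. This is a rough sketch that assumes the column layout of `netstat -plten` seen in the output above (protocol first, local address fourth, PID/program ninth); the function name is hypothetical.

```python
def ipv6_listeners(netstat_output: str):
    """Return (local_address, program) for every tcp6 line in netstat output.

    Assumed columns of `netstat -plten`:
    Proto Recv-Q Send-Q Local Foreign State User Inode PID/Program
    """
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 9 and fields[0] == "tcp6":
            found.append((fields[3], fields[8]))
    return found
```

An empty result for the Java processes suggests that they are listening on IPv4 only.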
On Ubuntu 12.04, IPv6 can be disabled as follows.
For Ubuntu 9.10 and later:
1) Run gksu gedit /etc/default/grub and change
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
to
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 quiet splash"
2) Run sudo update-grub
After the change, reboot the server.
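If the same change has to be made on several nodes, the edit to /etc/default/grub can be scripted. The sketch below only transforms the file's text and leaves reading, writing (as root) and running sudo update-grub to you; the function name is hypothetical.

```python
import re


def add_ipv6_disable(grub_text: str) -> str:
    """Add ipv6.disable=1 to GRUB_CMDLINE_LINUX_DEFAULT if it is not there yet."""
    def patch(match):
        options = match.group(2).split()
        if "ipv6.disable=1" not in options:
            options.insert(0, "ipv6.disable=1")
        return match.group(1) + '"' + " ".join(options) + '"'

    return re.sub(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"', patch,
                  grub_text, flags=re.MULTILINE)
```

Running it twice on the same text does not add the option a second time.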
Then check again:
hduser@hdnamenode:/usr/local/hadoop$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:54310 :::* LISTEN 1001 21535 7799/java
tcp 0 0 127.0.0.1:54311 :::* LISTEN 1001 22111 8387/java
tcp 0 0 :::50090 :::* LISTEN 1001 22211 8291/java
tcp 0 0 :::50060 :::* LISTEN 1001 22864 8633/java
tcp 0 0 :::50030 :::* LISTEN 1001 22295 8387/java
tcp 0 0 :::37518 :::* LISTEN 1001 21549 8291/java
tcp 0 0 :::41359 :::* LISTEN 1001 20832 7799/java
tcp 0 0 127.0.0.1:51827 :::* LISTEN 1001 22582 8633/java
tcp 0 0 :::50070 :::* LISTEN 1001 22121 7799/java
tcp 0 0 :::51161 :::* LISTEN 1001 21726 8387/java
tcp 0 0 :::38905 :::* LISTEN 1001 21172 8045/java
tcp 0 0 :::50010 :::* LISTEN 1001 22207 8045/java
tcp 0 0 :::50075 :::* LISTEN 1001 22593 8045/java
tcp 0 0 :::50020 :::* LISTEN 1001 22598 8045/java
4. issue:“jobtracker.info could only be replicated to 0 nodes, instead of 1”
2013-12-10 17:22:51,999 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54310: starting 2013-12-10 17:22:52,003 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310: starting 2013-12-10 17:22:52,004 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310: starting 2013-12-10 17:22:52,005 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310: starting 2013-12-10 17:22:52,005 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310: starting 2013-12-10 17:22:52,006 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310: starting 2013-12-10 17:22:52,006 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310: starting 2013-12-10 17:22:52,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310: starting 2013-12-10 17:22:52,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310: starting 2013-12-10 17:22:52,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310: starting 2013-12-10 17:22:52,016 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: starting 2013-12-10 17:22:56,038 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.io.IOException: File /app/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 2013-12-10 17:22:56,040 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310, call addBlock(/app/hadoop/tmp/mapred/system/jobtracker.info, DFSClient_-1488093128, null) from 127.0.0.1:59783: error: java.io.IOException: File /app/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 java.io.IOException: File /app/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
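S: This problem can have many causes, but in the end it means the name node has no usable data node. In my case the data node could not connect to the name node.
We can check the data node's log to confirm: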
2013-12-10 17:31:57,434 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 0 time(s). 2013-12-10 17:31:58,435 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 1 time(s). 2013-12-10 17:31:59,437 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 2 time(s). 2013-12-10 17:32:00,439 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 3 time(s). 2013-12-10 17:32:01,440 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 4 time(s). 2013-12-10 17:32:02,442 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 5 time(s). 2013-12-10 17:32:03,444 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 6 time(s). 2013-12-10 17:32:04,446 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 7 time(s). 2013-12-10 17:32:05,448 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 8 time(s). 2013-12-10 17:32:06,450 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdnamenode/10.0.2.7:54310. Already tried 9 time(s). 2013-12-10 17:32:06,451 INFO org.apache.hadoop.ipc.RPC: Server at hdnamenode/10.0.2.7:54310 not available yet, Zzzzz...
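It just keeps retrying.
Here is how I tracked the problem down on one occasion.
Frankly, it was rather embarrassing.
I had first set up single-node Hadoop and then built the multi-node cluster on top of it. core-site.xml contained: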
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
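This works fine on a single node.
When I moved to multiple nodes I was careless: I changed core-site.xml on the data nodes only, not on the name node.
That carelessness is what kept the data nodes from connecting to the name node.
At the time I assumed it would not matter, since on the name node it is only the machine talking to itself.
It does matter. With localhost in the name node's core-site.xml, port 54310 is bound to 127.0.0.1, so it can be reached only from the local machine and not from the data nodes.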
hduser@hdnamenode:/usr/local/hadoop$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 1001 12417 2347/java
tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 12405 2930/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 12705 2839/java
tcp 0 0 127.0.0.1:47373 0.0.0.0:* LISTEN 1001 12426 3178/java
tcp 0 0 0.0.0.0:53485 0.0.0.0:* LISTEN 1001 11865 2930/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1001 12519 2930/java
tcp 0 0 0.0.0.0:42673 0.0.0.0:* LISTEN 1001 11761 2347/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 12525 2347/java
tcp 0 0 0.0.0.0:48024 0.0.0.0:* LISTEN 1001 11762 2591/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 12701 2591/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 12709 2591/java
tcp 0 0 0.0.0.0:54972 0.0.0.0:* LISTEN 1001 11763 2839/java
Note ports 54310 and 54311: both are bound to 127.0.0.1.
We can verify this with telnet:
hduser@hdnamenode:/usr/local/hadoop$ telnet 10.0.2.7 54310
Trying 10.0.2.7...
telnet: Unable to connect to remote host: Connection refused
hduser@hdnamenode:/usr/local/hadoop$ telnet 127.0.0.1 54310
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
exit
Connection closed by foreign host.
hduser@hdnamenode:/usr/local/hadoop$ telnet 10.0.2.7 22
Trying 10.0.2.7...
Connected to 10.0.2.7.
Escape character is '^]'.
SSH-2.0-OpenSSH_5.9p1 Debian-5ubuntu1.1
e
Protocol mismatch.
Connection closed by foreign host.
After changing the name node's core-site.xml to the actual IP, it looks like this:
hduser@hdnamenode:/usr/local/hadoop/conf$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:34054 0.0.0.0:* LISTEN 1001 17197 5142/java
tcp 0 0 10.0.2.7:54310 0.0.0.0:* LISTEN 1001 16262 4291/java
tcp 0 0 10.0.2.7:54311 0.0.0.0:* LISTEN 1001 16856 4892/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 17007 4791/java
tcp 0 0 0.0.0.0:39084 0.0.0.0:* LISTEN 1001 15558 4291/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1001 17091 4892/java
tcp 0 0 0.0.0.0:44687 0.0.0.0:* LISTEN 1001 16415 4791/java
tcp 0 0 0.0.0.0:32785 0.0.0.0:* LISTEN 1001 16523 4892/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 16916 4291/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 16997 4541/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 17208 4541/java
tcp 0 0 0.0.0.0:40002 0.0.0.0:* LISTEN 1001 15899 4541/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001 17213 4541/java
hduser@hdnamenode:/usr/local/hadoop/conf$ telnet 10.0.2.7 54310
Trying 10.0.2.7...
Connected to 10.0.2.7.
Escape character is '^]'.
exit
Connection closed by foreign host.
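A small check could have caught this mistake before starting the cluster. The Python sketch below reads the text of a core-site.xml and reports whether fs.default.name points at a loopback host. It is only an illustration: the function names are hypothetical, and it assumes the usual layout of property elements with name and value children.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def default_fs_uri(core_site_xml: str) -> str:
    """Return the value of fs.default.name from the text of core-site.xml."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.default.name":
            return (prop.findtext("value") or "").strip()
    raise KeyError("fs.default.name is not set")


def binds_to_loopback(core_site_xml: str) -> bool:
    """True if fs.default.name names localhost or 127.0.0.1 as its host."""
    host = urlparse(default_fs_uri(core_site_xml)).hostname
    return host in LOOPBACK_HOSTS
```

On a multi-node cluster, `binds_to_loopback` returning True for the name node's file is a warning sign.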
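Also, the name node sometimes starts more slowly than the data nodes, which causes the same problem. In that case, just wait a little.
To summarize, here is how to approach a problem like this:
1) Check that the name node is healthy and that its ports are being listened on. Pay attention to which IP they are bound to.
2) Check whether the data node can telnet to the corresponding port on the name node.
3) Check that the configuration under conf/ is correct.
4) Check whether there is enough free disk space (not verified), and whether read/write permissions are correct.
5) Ask Google for help.
The link below lists some of these cases: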
http://blog.csdn.net/foamflower/article/details/5980406
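The telnet test from step 2, and the "wait a little" advice for a slow name node, can be combined in a short Python sketch. The function names, the retry count and the delay are hypothetical; the host and port would be the name node's address and 54310 in my setup.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 1.0) -> bool:
    """Retry port_is_open a few times, for a name node that starts slowly."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Run from a data node, `wait_for_port("10.0.2.7", 54310)` corresponds to the telnet checks shown earlier.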
5. “Incomplete HDFS URI” issue
Q: A new problem.
name node:
2013-12-10 19:01:28,881 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ReplicationMonitor thread received InterruptedException.java.lang.InterruptedException: sleep interrupted 2013-12-10 19:01:28,882 INFO org.apache.hadoop.hdfs.server.namenode.DecommissionManager: Interrupted Monitor java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65) at java.lang.Thread.run(Thread.java:662) 2013-12-10 19:01:28,883 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 2013-12-10 19:01:28,998 INFO org.apache.hadoop.ipc.Server: Stopping server on 54310 2013-12-10 19:01:28,998 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310: exiting 2013-12-10 19:01:28,998 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310: exiting 2013-12-10 19:01:28,998 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310: exiting 2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310: exiting 2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310: exiting 2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310: exiting 2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: exiting 2013-12-10 19:01:28,999 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 54310 2013-12-10 19:01:29,000 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down 2013-12-10 19:01:29,000 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: Incomplete HDFS URI, no host: hdfs://hadoop_namenode:54310 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:85) at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1310) at org.apache.hadoop.fs.FileSystem.access$100(FileSystem.java:65) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1328) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:109) at org.apache.hadoop.fs.Trash.<init>(Trash.java:62) at org.apache.hadoop.hdfs.server.namenode.NameNode.startTrashEmptier(NameNode.java:292) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:288) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:434) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162) 2013-12-10 19:01:29,000 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder 2013-12-10 19:01:29,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310: exiting 2013-12-10 19:01:29,007 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310: exiting 2013-12-10 19:01:29,028 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call delete(/app/hadoop/tmp/mapred/system, true) from 10.0.2.11:35266: output error 2013-12-10 19:01:29,028 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310 caught: java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:343) at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1695) at org.apache.hadoop.ipc.Server.access$2000(Server.java:93) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:739) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:803) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1411) 2013-12-10 19:01:29,028 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310: exiting 2013-12-10 19:01:29,030 INFO 
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at hadoop_namenode/10.0.2.11 ************************************************************/
data node:
2013-12-10 19:01:55,577 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.0.2.11:54310. Already tried 9 time(s). 2013-12-10 19:01:55,579 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException: Call to /10.0.2.11:54310 failed on connection exception: java.net.ConnectException: Connection refused at org.apache.hadoop.ipc.Client.wrapException(Client.java:1057) at org.apache.hadoop.ipc.Client.call(Client.java:1033) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224) at $Proxy5.register(Unknown Source) at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:635) at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1378) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1438) at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:414) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:527) at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:187) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1164) at org.apache.hadoop.ipc.Client.call(Client.java:1010) ... 7 more 2013-12-10 19:01:55,580 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down DataNode at hadoop_datanode2/10.0.2.13 ************************************************************/
S: Rename the host so that the hostname no longer contains an underscore. ~_~
Then run
sudo /etc/init.d/networking restart
You may also need to reboot and delete the logs.
Reference:
http://blog.csdn.net/xiaochawan/article/details/8733094
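As far as I understand, the "no host" message appears because Java's URI parsing does not accept an underscore in a host name, so hdfs://hadoop_namenode:54310 ends up with no host at all. Hostnames can be checked before they are used in any configuration. The Python sketch below applies the usual rule that each label may contain only letters, digits and hyphens and may not start or end with a hyphen; the function name is hypothetical.

```python
import re

# One hostname label: letters, digits, hyphens; no leading or trailing hyphen.
LABEL_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")


def is_valid_hostname(name: str) -> bool:
    """Return True if name looks like a valid hostname (no underscores)."""
    if not name or len(name) > 253:
        return False
    return all(len(label) <= 63 and LABEL_RE.match(label) is not None
               for label in name.split("."))
```

With this check, hadoop_namenode is rejected, while hadoopnamenode and hadoop-namenode are accepted.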
6. Problem with setting HBase's hbase.rootdir to an IP address
Q:
2014-01-04 16:29:10,026 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.lang.IllegalArgumentException: Wrong FS: hdfs://10.0.2.11:54310/hbase, expected: hdfs://hadoopnamenode:54310 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:354) at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106) at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:521) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:692) at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:238) at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:106) at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:91) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:346) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282) 2014-01-04 16:29:10,027 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
S:
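In ~/conf/hbase-site.xml, hbase.rootdir must not use an IP address; use the hostname instead. As the log shows, the expected file system is hdfs://hadoopnamenode:54310, so the host and port in hbase.rootdir have to match that.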
http://www.cnblogs.com/ventlam/archive/2011/01/22/HBaseCluster.html
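The mismatch in the log can be checked ahead of time by comparing the scheme and authority of hbase.rootdir with those of Hadoop's default file system. The Python sketch below does a plain string comparison, which is my assumption about what matters here, not a description of HBase's internal check; the function name is hypothetical.

```python
from urllib.parse import urlparse


def same_filesystem(rootdir: str, default_fs: str) -> bool:
    """True if both URIs have the same scheme and host:port, compared as text."""
    a, b = urlparse(rootdir), urlparse(default_fs)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)
```

The pair from the log, hdfs://10.0.2.11:54310/hbase against hdfs://hadoopnamenode:54310, fails this check even though both point at the same machine.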
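7. Time synchronization between HBase nodes
Q: If the clocks of the nodes are not synchronized, HBase cannot start properly. The log then shows messages like the following.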
2014-01-04 19:41:46,436 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin 2014-01-04 19:41:47,847 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apa
S:
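Synchronize the clocks of all nodes.
http://lookqlp.iteye.com/blog/1341118
The best approach is to install ntp, with the name node as the ntp server and the other nodes as ntp clients.
Also, when starting the nodes, wait until the name node has fully started before starting the others.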
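Before starting HBase, the clocks can be compared with a few lines of Python. In the sketch below the timestamps are supplied by the caller (for example, collected by running date on each node); the function name is hypothetical, and the default tolerance of 30000 ms is an assumption based on what I understand the default of hbase.master.maxclockskew to be.

```python
def nodes_out_of_sync(node_times_ms, reference_ms, max_skew_ms=30000):
    """Return the sorted names of nodes whose clock differs from the
    reference clock by more than max_skew_ms milliseconds."""
    return sorted(
        node for node, node_ms in node_times_ms.items()
        if abs(node_ms - reference_ms) > max_skew_ms
    )
```

The reference would normally be the name node's clock, since it also acts as the ntp server.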
8. Error when ZooKeeper's myid file has not been created
Q:
hduser@hadoopnamenode:~/zookeeper-3.3.3$ bin/zkServer.sh start JMX enabled by default Using config: /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg Starting zookeeper ... bin/zkServer.sh: 80: bin/zkServer.sh: cannot create /export/crawlspace/mahadev/zookeeper/server1/data /var/zookeeper//zookeeper_server.pid: Directory nonexistent STARTED hduser@hadoopnamenode:~/zookeeper-3.3.3$ 2014-01-05 11:16:25,256 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg 2014-01-05 11:16:25,280 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums 2014-01-05 11:16:25,282 - FATAL [main:QuorumPeerMain@83] - Invalid config, exiting abnormally org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:110) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:99) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) Caused by: java.lang.IllegalArgumentException: /var/zookeeper/myid file is missing at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:320) at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:106) ... 2 more Invalid config, exiting abnormally
S:
The fix is to create a myid file in the directory specified by dataDir and write the server id (just the numeric part) into it.
dataDir and the server id are configured in ~/conf/zoo.cfg.
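Creating myid can be scripted. The Python sketch below parses the text of zoo.cfg for dataDir and the server.N entries, then writes the chosen id into dataDir/myid. The function names are hypothetical, and it assumes zoo.cfg uses simple key=value lines.

```python
import os
import re


def parse_zoo_cfg(cfg_text: str):
    """Return (dataDir, {server_id: host_spec}) from the text of zoo.cfg."""
    data_dir = None
    servers = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "dataDir":
            data_dir = value
        else:
            match = re.fullmatch(r"server\.(\d+)", key)
            if match:
                servers[int(match.group(1))] = value
    return data_dir, servers


def write_myid(data_dir: str, server_id: int) -> str:
    """Write server_id into data_dir/myid, creating the directory if needed."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, "myid")
    with open(path, "w") as handle:
        handle.write(f"{server_id}\n")
    return path
```

Each server needs its own id, so on the machine listed as server.2 the call would be `write_myid(data_dir, 2)`.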
BTW:
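When configuring a ZooKeeper cluster with several ZooKeeper servers, starting only one server produces a large number of errors. They are expected while the other servers are still down, because the server cannot reach its peers for leader election.
Like this: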
hduser@hadoopnamenode:~/zookeeper-3.3.3$ bin/zkServer.sh start JMX enabled by default Using config: /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg Starting zookeeper ... bin/zkServer.sh: 80: bin/zkServer.sh: cannot create /export/crawlspace/mahadev/zookeeper/server1/data /var/zookeeper//zookeeper_server.pid: Directory nonexistent STARTED hduser@hadoopnamenode:~/zookeeper-3.3.3$ 2014-01-05 11:25:22,123 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /home/hduser/zookeeper-3.3.3/bin/../conf/zoo.cfg 2014-01-05 11:25:22,134 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums 2014-01-05 11:25:22,143 - INFO [main:QuorumPeerMain@119] - Starting quorum peer 2014-01-05 11:25:22,165 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181 2014-01-05 11:25:22,186 - INFO [main:QuorumPeer@819] - tickTime set to 2000 2014-01-05 11:25:22,190 - INFO [main:QuorumPeer@830] - minSessionTimeout set to -1 2014-01-05 11:25:22,191 - INFO [main:QuorumPeer@841] - maxSessionTimeout set to -1 2014-01-05 11:25:22,192 - INFO [main:QuorumPeer@856] - initLimit set to 5 2014-01-05 11:25:22,216 - INFO [main:FileSnap@82] - Reading snapshot /var/zookeeper/version-2/snapshot.0 2014-01-05 11:25:22,243 - INFO [Thread-1:QuorumCnxManager$Listener@473] - My election bind port: 3888 2014-01-05 11:25:22,253 - INFO [QuorumPeer:/0.0.0.0:2181:QuorumPeer@621] - LOOKING 2014-01-05 11:25:22,255 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. 
My id = 1, Proposed zxid = 0 2014-01-05 11:25:22,257 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state) 2014-01-05 11:25:22,286 - WARN [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333) at java.lang.Thread.run(Thread.java:662) 2014-01-05 11:25:22,291 - WARN [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333) at java.lang.Thread.run(Thread.java:662) 2014-01-05 11:25:22,464 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open 
channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:22,473 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:22,476 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 400 2014-01-05 11:25:22,883 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:22,887 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:22,891 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 800 2014-01-05 11:25:23,694 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 
11:25:23,704 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:23,705 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 1600 2014-01-05 11:25:25,306 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:25,311 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at 
sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:25,320 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 3200 2014-01-05 11:25:28,521 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:28,526 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:28,535 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 6400 2014-01-05 11:25:34,937 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:34,942 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:34,944 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 12800 2014-01-05 11:25:47,745 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: 
Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:47,750 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:25:47,752 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 25600 2014-01-05 11:26:13,359 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:26:13,363 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:26:13,365 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 51200 2014-01-05 11:27:04,568 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:27:04,575 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at 
election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:27:04,578 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 60000 2014-01-05 11:28:04,579 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address hadoopdatanode1/10.0.2.12:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:28:04,584 - WARN [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address hadoopdatanode2/10.0.2.13:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622) 2014-01-05 11:28:04,587 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 60000
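There is no need to worry at this point. The warnings only mean that the other servers in the ensemble (server 2 on hadoopdatanode1 and server 3 on hadoopdatanode2 in the log above) have not been started yet, so their election port 3888 refuses connections. Start the remaining servers and the warnings stop.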
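To see which servers still need to be started, the peer ids and election addresses can be pulled out of the log. The following is only a minimal sketch: the function name unreachable_peers and the log file name used in the usage note are assumptions, not something ZooKeeper provides, and it relies on grep supporting the -o option.

```shell
#!/bin/sh
# Hypothetical helper: list the ensemble peers that a ZooKeeper log
# reports as unreachable. Usage: unreachable_peers <logfile>
unreachable_peers() {
    # Keep only the "Cannot open channel ..." fragments, then print the
    # server id (field 5) and the election address (field 9) once each.
    grep -o 'Cannot open channel to [0-9][0-9]* at election address [^ ]*' "$1" |
        awk '{ print $5, $9 }' |
        sort -u
}
```

Run against a file containing the log above (for example unreachable_peers zookeeper.out, where the file name is an assumption), this should list "2 hadoopdatanode1/10.0.2.12:3888" and "3 hadoopdatanode2/10.0.2.13:3888". On each of those hosts, bin/zkServer.sh start brings the server up, and bin/zkServer.sh status then reports whether it runs as leader or follower.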