CDH4 HA failover time

blocksize:35M
filesize 96M
zk-session-timeout:10s

logs:


Active NN:Wed Sep  5 13:20:25 CST 2012


zk:
[zk: localhost:2181(CONNECTED) 19] get /hadoop-ha/mycluster/ActiveStandbyElectorLock

myclusternn1bd10 \ufffdF(\ufffd>
cZxid = 0xd90
ctime = Wed Sep 05 13:20:58 CST 2012
mZxid = 0xd90
mtime = Wed Sep 05 13:20:58 CST 2012
pZxid = 0xd90
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x13971759a9a045a
dataLength = 28
numChildren = 0
[zk: localhost:2181(CONNECTED) 20] get /hadoop-ha/mycluster/ActiveBreadCrumb

myclusternn1bd10 \ufffdF(\ufffd>
cZxid = 0x41
ctime = Thu Aug 30 09:50:56 CST 2012
mZxid = 0xd93
mtime = Wed Sep 05 13:21:13 CST 2012
pZxid = 0x41
cversion = 0
dataVersion = 89
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 28
numChildren = 0
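The znode data prints as garbled text (myclusternn1bd10 \ufffdF(\ufffd>) because it is binary. The zkfc log further down prints what looks like the same 28 bytes as hex in its "Old node exists" line. A minimal sketch of decoding that hex as a protobuf message follows. The field labels in FIELD_NAMES are my own assumption about what each field number means; they are not taken from the logs.

```python
# Hex payload copied from the zkfc "Old node exists" log line.
PAYLOAD_HEX = "0a096d79636c757374657212036e6e311a046264313020a84628d33e"

# Assumed meaning of each protobuf field number (hypothetical labels).
FIELD_NAMES = {
    1: "nameservice",
    2: "namenode_id",
    3: "hostname",
    4: "rpc_port",
    5: "zkfc_port",
}


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_node_info(payload_hex):
    """Walk the protobuf wire format; handles only varint and length-delimited fields."""
    buf = bytes.fromhex(payload_hex)
    pos, fields = 0, {}
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited, treated here as a UTF-8 string
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[FIELD_NAMES.get(number, f"field_{number}")] = value
    return fields


payload_len = len(bytes.fromhex(PAYLOAD_HEX))
info = decode_node_info(PAYLOAD_HEX)
print(payload_len, info)
```

The payload is 28 bytes long, which matches dataLength = 28 above. The two integer fields decode to 9000 and 8019; 9000 is the NameNode port that appears in the zkfc log.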



client:
copy 0 ...
Wed Sep  5 13:18:45 CST 2012
copy 1 ...
Wed Sep  5 13:18:55 CST 2012
copy 2 ...
Wed Sep  5 13:19:16 CST 2012
copy 3 ...
Wed Sep  5 13:19:50 CST 2012
copy 4 ...
Wed Sep  5 13:20:09 CST 2012
12/09/05 13:20:49 WARN retry.RetryInvocationHandler: Exception while invoking addBlock of class ClientNamenodeProtocolTranslatorPB. Trying to fail over immediately.
12/09/05 13:20:49 WARN retry.RetryInvocationHandler: Exception while invoking addBlock of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 643ms.
12/09/05 13:21:09 WARN retry.RetryInvocationHandler: Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB. Trying to fail over immediately.
12/09/05 13:21:09 WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
12/09/05 13:21:12 WARN retry.RetryInvocationHandler: Exception while invoking addBlock of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 1851ms.
12/09/05 13:21:15 WARN retry.RetryInvocationHandler: Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 549ms.
12/09/05 13:21:15 WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
copy 5 ...
Wed Sep  5 13:21:18 CST 2012
copy 6 ...
Wed Sep  5 13:21:25 CST 2012
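To read the failover time off this run, the timestamps above can be turned into offsets from the "active nn" timestamp. This is only a rough sketch. It assumes that timestamp marks the moment the active NameNode was taken down, and that the clocks on the NameNode, ZooKeeper and client hosts are close enough to compare. The event labels are mine.

```python
from datetime import datetime

FMT = "%H:%M:%S"  # every event is on the same day, so time of day is enough

# Timestamps copied from the first run; the labels are illustrative.
START = "13:20:25"  # "active nn" timestamp (assumed to be the shutdown time)
EVENTS = [
    ("first client WARN", "13:20:49"),
    ("ActiveStandbyElectorLock created", "13:20:58"),
    ("ActiveBreadCrumb updated", "13:21:13"),
    ("client starts copy 5", "13:21:18"),
]


def offsets_from(start, events):
    """Return seconds elapsed between start and each event."""
    t0 = datetime.strptime(start, FMT)
    return {
        name: int((datetime.strptime(ts, FMT) - t0).total_seconds())
        for name, ts in events
    }


offsets = offsets_from(START, EVENTS)
for name, seconds in offsets.items():
    print(f"+{seconds:3d}s  {name}")
```

Under those assumptions the standby took the lock about 33 s after the shutdown, and the client was copying again after about 53 s.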




blocksize:35M
filesize 96M
zk-session-timeout:10s


Active NN:Wed Sep  5 13:51:28 CST 2012


zk:
[zk: localhost:2181(CONNECTED) 30] get /hadoop-ha/mycluster/ActiveStandbyElectorLock

myclusternn2bd09 \ufffdF(\ufffd>
cZxid = 0xd9b
ctime = Wed Sep 05 13:51:38 CST 2012
mZxid = 0xd9b
mtime = Wed Sep 05 13:51:38 CST 2012
pZxid = 0xd9b
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x13971759a9a045c
dataLength = 28
numChildren = 0
[zk: localhost:2181(CONNECTED) 31] get /hadoop-ha/mycluster/ActiveBreadCrumb       

myclusternn2bd09 \ufffdF(\ufffd>
cZxid = 0x41
ctime = Thu Aug 30 09:50:56 CST 2012
mZxid = 0xd9c
mtime = Wed Sep 05 13:52:07 CST 2012
pZxid = 0x41
cversion = 0
dataVersion = 91
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 28
numChildren = 0
[zk: localhost:2181(CONNECTED) 32]


client:
copy 0 ...
Wed Sep  5 13:50:30 CST 2012
copy 1 ...
Wed Sep  5 13:50:42 CST 2012
copy 2 ...
Wed Sep  5 13:51:01 CST 2012
copy 3 ...
Wed Sep  5 13:51:27 CST 2012
12/09/05 13:51:49 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 703ms.
12/09/05 13:52:02 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 1761ms.
12/09/05 13:52:04 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 3 fail over attempts. Trying to fail over after sleeping for 2651ms.
12/09/05 13:52:09 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 4 fail over attempts. Trying to fail over after sleeping for 9203ms.
copy 4 ...
Wed Sep  5 13:52:22 CST 2012
copy 5 ...
Wed Sep  5 13:52:52 CST 2012
copy 6 ...
Wed Sep  5 13:53:07 CST 2012
copy 7 ...
Wed Sep  5 13:53:19 CST 2012


blocksize:35M
filesize 96M
zk-session-timeout:10M

Active NN shutdown:Wed Sep  5 15:46:43 CST 2012

zk:

[zk: localhost:2181(CONNECTED) 1] get /hadoop-ha/mycluster/ActiveStandbyElectorLock

myclusternn2bd09 \ufffdF(\ufffd>
cZxid = 0xdbe
ctime = Wed Sep 05 15:47:20 CST 2012
mZxid = 0xdbe
mtime = Wed Sep 05 15:47:20 CST 2012
pZxid = 0xdbe
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x13971759a9a0463
dataLength = 28
numChildren = 0
[zk: localhost:2181(CONNECTED) 2] get /hadoop-ha/mycluster/ActiveBreadCrumb       

myclusternn2bd09 \ufffdF(\ufffd>
cZxid = 0x41
ctime = Thu Aug 30 09:50:56 CST 2012
mZxid = 0xdbf
mtime = Wed Sep 05 15:47:26 CST 2012
pZxid = 0x41
cversion = 0
dataVersion = 99
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 28
numChildren = 0
[zk: localhost:2181(CONNECTED) 3]


client:
copy 0 ...
Wed Sep  5 15:42:51 CST 2012
copy 1 ...
Wed Sep  5 15:43:00 CST 2012
copy 2 ...
Wed Sep  5 15:43:46 CST 2012
copy 3 ...
Wed Sep  5 15:43:52 CST 2012
copy 4 ...
Wed Sep  5 15:44:11 CST 2012
copy 5 ...
Wed Sep  5 15:44:47 CST 2012
copy 6 ...
Wed Sep  5 15:45:28 CST 2012
copy 7 ...
Wed Sep  5 15:45:51 CST 2012
copy 8 ...
Wed Sep  5 15:46:08 CST 2012
copy 9 ...
Wed Sep  5 15:46:35 CST 2012
12/09/05 15:47:09 WARN retry.RetryInvocationHandler: Exception while invoking addBlock of class ClientNamenodeProtocolTranslatorPB. Trying to fail over immediately.
12/09/05 15:47:09 WARN retry.RetryInvocationHandler: Exception while invoking addBlock of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 971ms.
12/09/05 15:47:29 WARN retry.RetryInvocationHandler: Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB. Trying to fail over immediately.
12/09/05 15:47:29 WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
12/09/05 15:47:41 WARN retry.RetryInvocationHandler: Exception while invoking addBlock of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 2610ms.
12/09/05 15:47:53 WARN retry.RetryInvocationHandler: Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 1440ms.
copy 10 ...
Wed Sep  5 15:47:53 CST 2012
copy 11 ...
Wed Sep  5 15:48:00 CST 2012


zkfc:
2012-09-05 15:41:57,592 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at bd09/10.1.1.83:9000 to standby state


2012-09-05 15:47:19,975 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2012-09-05 15:47:20,001 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a096d79636c757374657212036e6e311a046264313020a84628d33e
2012-09-05 15:47:20,003 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at bd10/10.1.1.144:9000
2012-09-05 15:47:23,272 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: bd10/10.1.1.144:9000. Already tried 0 time(s).
2012-09-05 15:47:26,274 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at bd10/10.1.1.144:9000 standby (unable to connect)
java.net.NoRouteToHostException: No Route to Host from  bd09/10.1.1.83 to bd10:9000 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:756)
    at org.apache.hadoop.ipc.Client.call(Client.java:1165)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
    at $Proxy8.transitionToStandby(Unknown Source)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
    at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:154)
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:510)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:501)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:838)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:859)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:760)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:407)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
Caused by: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:524)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:472)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:566)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:215)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1271)
    at org.apache.hadoop.ipc.Client.call(Client.java:1141)
    ... 13 more
2012-09-05 15:47:26,275 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2012-09-05 15:47:26,275 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.ShellCommandFencer(/opt/hadoop/etc/hadoop/fencing.sh)
2012-09-05 15:47:26,319 INFO org.apache.hadoop.ha.ShellCommandFencer: Launched fencing command '/opt/hadoop/etc/hadoop/fencing.sh' with pid 2777
2012-09-05 15:47:26,321 INFO org.apache.hadoop.ha.ShellCommandFencer: [PID 2777] /opt/had...encing.sh: OK
2012-09-05 15:47:26,321 INFO org.apache.hadoop.ha.NodeFencer: ====== Fencing successful by method org.apache.hadoop.ha.ShellCommandFencer(/opt/hadoop/etc/hadoop/fencing.sh) ======
2012-09-05 15:47:26,321 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/mycluster/ActiveBreadCrumb to indicate that the local node is the most recent active...
2012-09-05 15:47:26,325 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at bd09/10.1.1.83:9000 active...
2012-09-05 15:47:40,960 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at bd09/10.1.1.83:9000 to active state
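The zkfc log shows where the time goes in this run. A small sketch that splits it into phases follows, using the timestamps from the log lines above plus the manually noted shutdown time (15:46:43). That shutdown time has only one-second resolution and may come from a different host clock, so the first phase is approximate. The phase names are my own labels.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"  # log4j-style timestamp with milliseconds

# (label for the phase that ENDS at this marker, timestamp)
MARKERS = [
    ("start", "2012-09-05 15:46:43,000"),  # noted Active NN shutdown (approximate)
    ("detect failure, win election", "2012-09-05 15:47:19,975"),
    ("graceful fence attempt (fails)", "2012-09-05 15:47:26,274"),
    ("shell fencing", "2012-09-05 15:47:26,321"),
    ("write breadcrumb znode", "2012-09-05 15:47:26,325"),
    ("transition NameNode to active", "2012-09-05 15:47:40,960"),
]


def phase_durations(markers):
    """Return {phase label: seconds} for each pair of consecutive markers."""
    times = [datetime.strptime(ts, FMT) for _, ts in markers]
    return {
        markers[i][0]: round((times[i] - times[i - 1]).total_seconds(), 3)
        for i in range(1, len(markers))
    }


phases = phase_durations(MARKERS)
total = round(sum(phases.values()), 3)
for label, seconds in phases.items():
    print(f"{seconds:8.3f}s  {label}")
print(f"{total:8.3f}s  total")
```

On these numbers, most of the roughly 58 s is spent before the election starts (about 37 s) and in the transition to active (about 14.6 s). The failed graceful fence costs about 6.3 s, and the shell fencing script itself is negligible.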


Another client during the HA switch:

./runRW.sh
Wed Sep  5 16:19:56 CST 2012
12/09/05 16:20:09 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 958ms.
12/09/05 16:20:10 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 2 fail over attempts. Trying to fail over after sleeping for 2238ms.
12/09/05 16:20:15 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 3 fail over attempts. Trying to fail over after sleeping for 3806ms.
12/09/05 16:20:19 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 4 fail over attempts. Trying to fail over after sleeping for 6348ms.
12/09/05 16:20:29 WARN retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 5 fail over attempts. Trying to fail over after sleeping for 20625ms.
Found 6 items
-rw-r--r--   3 peter supergroup  100291546 2012-09-05 16:15 4
drwxr-xr-x   - peter supergroup          0 2012-08-24 17:03 abc
-rw-r--r--   3 peter supergroup  100291546 2012-09-05 15:10 hadoop-2.0.0-cdh4.0.0.tar.gz
drwxr-xr-x   - peter supergroup          0 2012-09-04 16:41 smallfiles
drwxr-xr-x   - peter supergroup          0 2012-08-28 16:57 smallfiles1
Deleted 4
Wed Sep  5 16:20:57 CST 2012
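The "sleeping for N ms" values in the client WARN lines look like exponential backoff with random jitter. The sketch below checks the observed sleeps against a simple model: the nominal delay doubles with each failover attempt up to a cap, then is scaled by a random factor between 0.5 and 1.5. The 500 ms base and 15000 ms cap are assumed values that happen to fit these logs; they are not read from this cluster's configuration.

```python
BASE_MS = 500    # assumed base sleep
CAP_MS = 15000   # assumed cap on the nominal sleep

# Sleeps copied from the WARN lines in all runs, keyed by failover attempt count.
OBSERVED_MS = {
    1: [643, 549, 703, 971, 1440, 958],
    2: [1851, 1761, 2610, 2238],
    3: [2651, 3806],
    4: [9203, 6348],
    5: [20625],
}


def sleep_window(failovers, base_ms=BASE_MS, cap_ms=CAP_MS):
    """Return (low, high) bounds in ms for the jittered sleep after N failovers."""
    nominal = min(base_ms * 2 ** failovers, cap_ms)
    return 0.5 * nominal, 1.5 * nominal


def check_observed(observed):
    """Return True if every observed sleep falls inside its modelled window."""
    ok = True
    for failovers, sleeps in sorted(observed.items()):
        low, high = sleep_window(failovers)
        inside = all(low <= s <= high for s in sleeps)
        print(f"attempt {failovers}: window {low:.0f}-{high:.0f} ms, observed {sleeps}, fits={inside}")
        ok = ok and inside
    return ok


all_fit = check_observed(OBSERVED_MS)
```

Every sleep in the logs fits this model, including the 20625 ms one after 5 attempts. At that point the nominal delay has reached the assumed cap, so a client that starts retrying early in a failover can still wait 20 s or more on a single retry.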
