Description
The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception:
2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:282)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
Then many subsequent exceptions like:
2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?
Answers
Deleting the partially written ZooKeeper snapshot file solved it.
1. I was able to work around the issue by deleting the partially written snapshot file.
2. I believe the exception is being thrown while reading the snapshot, and the "partial transaction" message is not an indication of what is causing the crash. Trying a different snapshot sounds right, but according to the log messages you posted, it looks like the problem is that we are not catching EOFException.
3. So these exceptions are thrown while ZooKeeper is running? I'm not sure why it's exiting so many times. Do you restart the ZK server if it dies?
4. We run ZooKeeper with runit, so yes, it is restarted when it dies. It ends up in a loop of the exceptions above.
5. I thought you said it does not recover when the disk was full, but it looks like the disk is still full, no?
6. Here is the full sequence of events (sorry for the confusion):
2013-01-16 15:01:35,194 - ERROR [main:Util@239] - Last transaction was partial.
2013-01-16 15:01:35,196 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ ls -lat
total 18096
drwxr-xr-x 2 zookeeper zookeeper 4096 Jan 16 06:41 .
-rw-r--r-- 1 zookeeper zookeeper        0 Jan 16 06:41 log.19a3e
-rw-r--r-- 1 zookeeper zookeeper   585377 Jan 16 06:41 snapshot.19a3d
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.19a2a
-rw-r--r-- 1 zookeeper zookeeper   585911 Jan 16 03:11 snapshot.19a29
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 16 03:11 log.11549
-rw-r--r-- 1 zookeeper zookeeper   585190 Jan 15 17:28 snapshot.11547
-rw-r--r-- 1 zookeeper zookeeper 67108880 Jan 15 17:28 log.1
-rw-r--r-- 1 zookeeper zookeeper      296 Jan 14 16:44 snapshot.0
drwxr-xr-x 3 zookeeper zookeeper 4096 Jan 14 16:44 ..
ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm log.19a3e
ubuntu@ip-10-78-19-254:/opt/zookeeper-3.4.3/data/version-2$ sudo rm snapshot.19a3d
1. Attaching zookeeper.log
2. FYI, this issue is a duplicate of ZOOKEEPER-1612 (curiously, a permutation of the last two digits, heh). I'd suggest closing 1612 as the dup instead, if possible.
3. I'll mark 1612 as a dup. Thanks for pointing that out, Edward.
4. Looks like the header was incomplete. Unfortunately we do not handle a corrupt header, though we do handle corrupt txns later. I'm surprised that this happened twice in a row for two users. I'll upload a patch and a test case.
5. Should FileTxnIterator.goToNextLog() return false if the header is corrupted/incomplete, or should it skip the log file and go to the next log file if it exists?
6. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch
against trunk revision 1596284.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//console
This message is automatically generated.
1. Here's a different option: intuitively, once ZooKeeper fails to write to disk, continuing to operate normally violates its promise to users (that if a majority acked a write, the data is always there, even across reboots). Once we realize the promise can't be kept, it may be better to crash the server at that point and give up liveness (no availability) rather than continue and risk coming up with a partial log later, violating safety (inconsistent state, lost transactions, etc.).
2. I'm fine with Alex's suggestion. We should document how to manually recover when the server doesn't start because the log file doesn't contain a complete header.
3. I actually like Alexander Shraer's suggestion. However, if this is going to be the recommended way to recover from a corrupt log file, there should be a script that does it for users: zk-recover.sh or some such. In this world of deployment automation, it's not nice to say "go delete the most recent log segment from ZK's data dir". It's much better for the application to handle it through a script or command.
4. Apart from these corrective measures, there should be some preventive measures as well.
Can we have a disk space availability checker that periodically checks whether enough disk space is available and, if not, shuts ZooKeeper down gracefully?
5. You mean, like a ZK thread dedicated to this? What would the behavior be: only shut down if it's the leader?
6. Yes, a dedicated thread for this, like org.apache.zookeeper.server.DatadirCleanupManager.
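A minimal sketch of such a checker, assuming a hypothetical DiskSpaceMonitor helper wired into server startup; the class name, its parameters, and the shutdown hook are illustrative and not existing ZooKeeper API:

import java.io.File;
import java.util.Timer;
import java.util.TimerTask;

public class DiskSpaceMonitor {
    private final File dataDir;
    private final long minFreeBytes;
    private final Runnable shutdownHook;
    private final Timer timer = new Timer("DiskSpaceMonitor", true /* daemon thread */);

    public DiskSpaceMonitor(File dataDir, long minFreeBytes, Runnable shutdownHook) {
        this.dataDir = dataDir;
        this.minFreeBytes = minFreeBytes;
        this.shutdownHook = shutdownHook;
    }

    /** Check free space every intervalMs; trigger a graceful shutdown if it drops below the threshold. */
    public void start(long intervalMs) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                if (dataDir.getUsableSpace() < minFreeBytes) {
                    // Stop before a txn log or snapshot write can fail halfway through.
                    shutdownHook.run();
                    timer.cancel();
                }
            }
        }, intervalMs, intervalMs);
    }
}

A real implementation would presumably expose the threshold and interval as configuration properties and could restrict the shutdown to the leader or followers, depending on the behavior asked about above.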
7. Reviving this old thread. Alexander Shraer has a valid concern about trading off consistency for availability. However, for the specific issue being addressed here, we can have both.
The patch skips transaction logs with an incomplete header (the first 16 bytes). Skipping such files should not cause any data loss, as the header is an internal bookkeeping write from ZooKeeper and does not contain any user data. This avoids the current behavior of ZooKeeper crashing on encountering an incomplete header, which compromises availability.
This has been a recurring problem for us in production because our app's operating environment occasionally causes a ZooKeeper server's disk to become full. After that, the server invariably runs into this problem, perhaps because something else deterministically triggers a log rotation when the previous txn log throws an IOException due to the disk being full?
That said, we can tighten the exception being caught in Michi Mutsuzaki's patch to EOFException instead of IOException to make sure that the log we are skipping indeed only has a partially written header and nothing else (in FileTxnLog.goToNextLog).
Additionally, I have written a test to verify that EOFException is thrown if and only if the header is truncated. ZooKeeper already ignores any other partially written transactions in the txn log. If that's useful, I can upload the test; thanks.
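A minimal, self-contained illustration of why a truncated header surfaces as EOFException; this is not the test referred to above, and the file name, magic bytes, and sizes are made up for the example:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class TruncatedHeaderDemo {
    public static void main(String[] args) throws Exception {
        File log = File.createTempFile("log.", ".txn");
        try (FileOutputStream out = new FileOutputStream(log)) {
            // Write only 6 of the 16 header bytes, as if the disk filled up mid-write.
            out.write(new byte[] {'Z', 'K', 'L', 'G', 0, 0});
        }
        try (DataInputStream in = new DataInputStream(new FileInputStream(log))) {
            in.readInt();   // first int (magic): 4 bytes available, succeeds
            in.readInt();   // second int (version): only 2 bytes left -> EOFException
            System.out.println("unexpected: header read fully");
        } catch (EOFException expected) {
            System.out.println("EOFException on truncated header, as expected");
        }
    }
}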
8. Agreed. Forcing users to manually clean up the partial/empty header in this scenario seems undesirable, and if we only catch EOFException instead of IOException, we shouldn't run into any problems with correctness. Additionally, since this issue should only occur "legitimately" in the most recent txn log file, we can be even more conservative and only continue in that case.
9. Thanks, Meyer Kizner. Your suggestion of doing this only for the most recent txn log file is sound. Are you also suggesting that we delete the truncated txn log file? Because if we skip it and don't delete it, newer txn log files will get created in the future, so the truncated txn log file will no longer be the latest txn log when we do a purge afterwards. Deletion seems consistent with this approach as well as with PurgeTxnLog's behavior.
10. Yes, we would have to delete such a log file upon encountering it. I don't believe this would cause any problems, and it seems desirable to have the extra check this enables.
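A hedged sketch of the handling agreed on here, not the actual FileTxnLog.goToNextLog() implementation; the method and helper names below are illustrative:

import java.io.EOFException;
import java.io.File;
import java.io.IOException;

public class TxnLogRecoverySketch {
    /**
     * Try to open the log at the given index of the zxid-sorted list.
     * Returns false when there are no more usable logs. Only an incomplete
     * header in the most recent file is tolerated; anything else propagates.
     */
    static boolean goToNextLog(File[] sortedLogs, int index) throws IOException {
        if (index >= sortedLogs.length) {
            return false;
        }
        File logFile = sortedLogs[index];
        try {
            readHeader(logFile);   // hypothetical stand-in for deserializing the 16-byte header
            return true;
        } catch (EOFException e) {
            boolean newest = (index == sortedLogs.length - 1);
            if (newest) {
                // The header never made it to disk (e.g. disk full); it carries no
                // user data, so deleting the file loses nothing.
                if (!logFile.delete()) {
                    throw new IOException("Unable to delete truncated log " + logFile);
                }
                return false;      // nothing newer to read
            }
            throw e;               // a truncated header anywhere else is real corruption
        }
    }

    private static void readHeader(File logFile) throws IOException {
        // placeholder for FileHeader.deserialize(...) in the real code
    }
}

Restricting the EOFException handling to the newest file keeps a truncated header anywhere else fatal, which preserves the safety argument made earlier in the thread.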
11. The proposed fix makes sense to me.
Is it feasible to make a stronger guarantee for the ZooKeeper serialization semantics, namely that under no circumstances (disk full, power failure, hardware failure) would ZooKeeper generate invalid persistent files (for both snapshots and txn logs)? This might be possible by serializing to a temporary file first and then doing an atomic rename at the end. With the sanity of the on-disk formats guaranteed, the deserialization logic would be simplified, as there would not be many corner cases to consider beyond the existing basic checksum check.
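A minimal sketch of the write-to-a-temporary-file-then-atomic-rename idea, assuming a same-directory temp file so the rename stays atomic on POSIX filesystems; the class, method, and arguments are illustrative, not ZooKeeper code:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicSnapshotWriter {
    static void writeAtomically(Path target, byte[] snapshotBytes) throws IOException {
        // Temp file in the same directory as the target, so the rename is a metadata-only operation.
        Path tmp = Files.createTempFile(target.getParent(), target.getFileName().toString(), ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            out.write(snapshotBytes);   // stand-in for the real serialization
        }
        // A production version would also fsync the temp file (and the directory) before renaming.
        // Readers only ever see the old file or the fully written new one.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}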
I can think of two potential drawbacks of this approach:
My feeling is that the best approach might be a mix of efforts on both the serialization and deserialization sides:
For details, see: https://issues.apache.org/jira/browse/ZOOKEEPER-1621