1. Check the service status
[root@VM-centos install]# systemctl status zookeeper.service
● zookeeper.service - Coordination service for distributed applications
Loaded: loaded (/usr/lib/systemd/system/zookeeper.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Sat 2023-10-07 09:16:22 CST; 29s ago
Process: 12771 ExecStart=/usr/bin/java -cp ${CLASSPATH} $JAVA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=${JMXLOCALONLY} -Dzookeeper.log.dir=${ZOO_LOG_DIR} -Dzookeeper.root.logger=${ZOO_LOG4J_PROP} $ZOOMAIN $ZOOCFG (code=exited, status=1/FAILURE)
Main PID: 12771 (code=exited, status=1/FAILURE)
Oct 07 09:16:22 VM-5-123-centos systemd[1]: Unit zookeeper.service entered failed state.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: zookeeper.service failed.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: zookeeper.service holdoff time over, scheduling restart.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: Stopped Coordination service for distributed applications.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: start request repeated too quickly for zookeeper.service
Oct 07 09:16:22 VM-5-123-centos systemd[1]: Failed to start Coordination service for distributed applications.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: Unit zookeeper.service entered failed state.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: zookeeper.service failed.
2. Check the startup log
[root@VM-centos install]# journalctl -u zookeeper.service -f
-- Logs begin at Wed 2023-10-04 02:00:32 CST. --
Oct 07 09:16:21 VM-5-123-centos systemd[1]: Started Coordination service for distributed applications.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: zookeeper.service: main process exited, code=exited, status=1/FAILURE
Oct 07 09:16:22 VM-5-123-centos systemd[1]: Unit zookeeper.service entered failed state.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: zookeeper.service failed.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: zookeeper.service holdoff time over, scheduling restart.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: Stopped Coordination service for distributed applications.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: start request repeated too quickly for zookeeper.service
Oct 07 09:16:22 VM-5-123-centos systemd[1]: Failed to start Coordination service for distributed applications.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: Unit zookeeper.service entered failed state.
Oct 07 09:16:22 VM-5-123-centos systemd[1]: zookeeper.service failed.
3. Check the service log
cat /var/log/zookeeper/zookeeper.log
2023-10-07 09:16:21,615 [myid:] - ERROR [main:QuorumPeerConfig@347] - Invalid configuration, only one server specified (ignoring)
2023-10-07 09:16:21,619 [myid:] - WARN [main:QuorumPeerMain@116] - Either no config or no quorum defined in config, running in standalone mode
2023-10-07 09:16:21,631 [myid:] - ERROR [main:QuorumPeerConfig@347] - Invalid configuration, only one server specified (ignoring)
2023-10-07 09:16:21,757 [myid:] - WARN [NIOServerCxn.Factory:/10.0.5.123:2181:NIOServerCnxn@383] - Exception causing close of session 0x0: ZooKeeperServer not running
2023-10-07 09:16:22,046 [myid:] - ERROR [main:Util@214] - Last transaction was partial.
2023-10-07 09:16:22,063 [myid:] - ERROR [main:Util@214] - Last transaction was partial.
2023-10-07 09:16:22,064 [myid:] - ERROR [main:ZooKeeperServerMain@66] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:66)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:588)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:607)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:573)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:653)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:219)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:176)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:217)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:284)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:407)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:118)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:122)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:89)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:55)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:119)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
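The log shows that ZooKeeper hit a java.io.EOFException during startup. This exception means a read reached the end of a file unexpectedly. The stack trace passes through FileTxnLog and FileHeader.deserialize, so the likely cause is a corrupted or incomplete ZooKeeper transaction log (txnLog) file.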
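Since the trace fails while deserializing a file header, one way to narrow things down is to check whether each transaction log has a plausible header at all. The following is a minimal sketch, not ZooKeeper tooling: the function name `check_txn_log` is hypothetical, and it assumes the header is 16 bytes long and begins with the ASCII magic `ZKLG`, which matches my understanding of the 3.4.x on-disk format but should be verified for your version.

```shell
#!/bin/sh
# Hypothetical helper: report whether a ZooKeeper transaction log has a
# plausible file header.
# Assumption: the header is 16 bytes and starts with the magic "ZKLG".
check_txn_log() {
    file=$1
    # wc -c prints the size in bytes; strip padding some platforms add.
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -lt 16 ]; then
        echo "truncated"      # too short to hold a full header
    elif [ "$(head -c 4 "$file")" = "ZKLG" ]; then
        echo "ok"             # header magic looks right
    else
        echo "bad-magic"      # long enough, but not a recognizable log
    fi
}
```

A file reported as `truncated` is the kind that would make the header read run off the end of the file.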
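The following suggestions can help resolve the problem:
Check ZooKeeper's data directory (set by the dataDir option) and verify that the transaction log files and snapshot files are intact. If any file is corrupted or incomplete, try restoring it from a backup.
If there is no backup, delete the corrupted transaction log file and restart ZooKeeper. This can lose data, so try the other recovery options first.
If ZooKeeper is deployed as a cluster (quorum), you can copy the transaction log files and snapshot files from a healthy node to the failing node. Stop the failing ZooKeeper node before doing so.
Check ZooKeeper's configuration file (zoo.cfg) and make sure that all options, such as dataDir and dataLogDir, are set correctly.
If the problem persists, consider upgrading ZooKeeper to the latest version to pick up bug fixes and performance improvements.
Note that handling corrupted transaction logs and snapshots can lose data, so take a backup before changing anything. In production, back up ZooKeeper's data directory regularly so that you can recover quickly when something goes wrong.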
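The backup advice above can be sketched as two small shell functions: one reads dataDir from the configuration file, the other archives that directory before anything is deleted. Both function names are hypothetical, and the sketch assumes a plain key=value zoo.cfg.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Print the value of dataDir from a zoo.cfg-style file (key=value lines).
# If the key appears more than once, the last occurrence is used.
zk_data_dir() {
    sed -n 's/^[[:space:]]*dataDir[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Archive a data directory into a timestamped tarball under dest_dir
# and print the path of the archive that was written.
backup_zk_data() {
    data_dir=$1
    dest_dir=$2
    archive="$dest_dir/zookeeper-data-$(date +%Y%m%d-%H%M%S).tar.gz"
    mkdir -p "$dest_dir" &&
    tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" &&
    echo "$archive"
}
```

A call might look like `backup_zk_data "$(zk_data_dir /etc/zookeeper/zoo.cfg)" /root/zk-backup`. Both paths in that call are assumptions, so adjust them to your installation.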
Delete the corrupted transaction log (the zero-byte log file)
ll /var/lib/zookeeper/version-2/
total 65500
-rw-r--r-- 1 zookeeper zookeeper 67108880 Sep 24 01:49 log.1b9e9b
-rw-r--r-- 1 zookeeper zookeeper 67108880 Sep 24 21:00 log.1c86b9
-rw-r--r-- 1 zookeeper zookeeper 67108880 Sep 25 17:37 log.1dcf95
-rw-r--r-- 1 zookeeper zookeeper 67108880 Sep 26 06:50 log.1f30fb
-rw-r--r-- 1 zookeeper zookeeper 67108880 Sep 27 16:48 log.2013c8
-rw-r--r-- 1 zookeeper zookeeper 67108880 Sep 28 05:25 log.21397c
-rw-r--r-- 1 zookeeper zookeeper 67108880 Sep 28 07:51 log.218201
`-rw-r--r-- 1 zookeeper zookeeper 0 Sep 28 07:51 log.2194d7`
-rw-r--r-- 1 zookeeper zookeeper 754513 Sep 24 01:49 snapshot.1c86b7
-rw-r--r-- 1 zookeeper zookeeper 754513 Sep 24 21:00 snapshot.1dcf93
-rw-r--r-- 1 zookeeper zookeeper 754513 Sep 25 17:37 snapshot.1f30f9
-rw-r--r-- 1 zookeeper zookeeper 755068 Sep 26 06:50 snapshot.2013c6
-rw-r--r-- 1 zookeeper zookeeper 754340 Sep 27 16:48 snapshot.21397b
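Rather than scanning the listing by eye, the zero-byte log can be located with `find`. This is a minimal sketch; the function name `find_empty_txn_logs` is hypothetical, and it only looks at files named log.* directly inside the given directory.

```shell
#!/bin/sh
# Hypothetical helper: list zero-byte transaction logs (log.*) that sit
# directly inside the given directory, one path per line.
find_empty_txn_logs() {
    find "$1" -maxdepth 1 -type f -name 'log.*' -size 0
}
```

Run against the directory listed above, `find_empty_txn_logs /var/lib/zookeeper/version-2` should print only the path of log.2194d7.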
# Delete the zero-byte log
[root@VM-centos install]# rm -rf /var/lib/zookeeper/version-2/log.2194d7
# Restart the service
[root@VM-centos install]# systemctl restart zookeeper.service
# Check the service status
[root@VM-centos install]# systemctl status zookeeper.service
● zookeeper.service - Coordination service for distributed applications
Loaded: loaded (/usr/lib/systemd/system/zookeeper.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2023-10-07 09:26:25 CST; 9s ago
Main PID: 22341 (java)
Tasks: 26
Memory: 168.9M
CGroup: /system.slice/zookeeper.service
└─22341 /usr/bin/java -cp /etc/zookeeper:/opt/zookeeper/zookeeper-3.4.14.jar:/opt/zookeeper/lib/audience-annotations-0.5.0.jar:/opt/zookeeper/lib/jline-0.9.94.jar:/o...
Oct 07 09:26:25 VM-5-123-centos systemd[1]: Started Coordination service for distributed applications.
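A more cautious variant of the delete step is to move the empty log aside instead of removing it, so it can be put back if the diagnosis turns out to be wrong. The following is a sketch under that assumption; the function name `quarantine_empty_txn_logs` and the quarantine directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: move zero-byte log.* files out of a data directory
# into a quarantine directory instead of deleting them.
quarantine_empty_txn_logs() {
    src_dir=$1
    quarantine_dir=$2
    mkdir -p "$quarantine_dir" &&
    # -maxdepth 1 keeps find out of subdirectories, including the
    # quarantine directory itself if it lives under src_dir.
    find "$src_dir" -maxdepth 1 -type f -name 'log.*' -size 0 \
        -exec mv {} "$quarantine_dir"/ \;
}
```

For example, `quarantine_empty_txn_logs /var/lib/zookeeper/version-2 /root/zk-quarantine` would be run before `systemctl restart zookeeper.service`. The quarantine path in that call is an assumption.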