首页 > 代码库 > zookeeper集群崩溃处理
zookeeper集群崩溃处理
今天在私有化项目中遇到如下问题:
1.客户反馈用户登录返回303
2.登录服务器查看是大量的log将服务器磁盘空间占用殆尽,导致所有服务进程仍旧存在但是监听端口失败,服务不可用
3.清理日志文件
4.日志文件清理完成后,重启服务,重启zookeeper服务时出现以下报错
2017-07-12 10:52:39,171 [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /data/apps/config/zookeeper/zoo.cfg
2017-07-12 10:52:39,176 [myid:] - INFO [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2017-07-12 10:52:39,180 [myid:2] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 5
2017-07-12 10:52:39,180 [myid:2] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24
2017-07-12 10:52:39,183 [myid:2] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2017-07-12 10:52:39,194 [myid:2] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2017-07-12 10:52:39,196 [myid:2] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2017-07-12 10:52:39,206 [myid:2] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2017-07-12 10:52:39,218 [myid:2] - INFO [main:QuorumPeer@959] - tickTime set to 2000
2017-07-12 10:52:39,218 [myid:2] - INFO [main:QuorumPeer@979] - minSessionTimeout set to -1
2017-07-12 10:52:39,218 [myid:2] - INFO [main:QuorumPeer@990] - maxSessionTimeout set to -1
2017-07-12 10:52:39,218 [myid:2] - INFO [main:QuorumPeer@1005] - initLimit set to 10
2017-07-12 10:52:39,230 [myid:2] - INFO [main:FileSnap@83] - Reading snapshot /data/apps/data/zookeeper/version-2/snapshot.60000888d
2017-07-12 10:52:39,341 [myid:2] - ERROR [main:Util@239] - Last transaction was partial.
2017-07-12 10:52:39,342 [myid:2] - ERROR [main:QuorumPeer@497] - Unable to load database on disk
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:547)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:522)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:354)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:450)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:440)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2017-07-12 10:52:39,345 [myid:2] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
经查阅资料得知,造成zookeeper崩溃的原因是
zookeeper呈现给使用某些状态的所有客户端进程一致性的状态视图。当一个客户端从zookeeper获得响应时,客户端可以非常肯定这个响应信息与其他响应信息或其他客户端所接收的响应均保持一致。有时,zookeeper客户端库与zookeeper服务的连接会丢失,而且服务提供一致性保证信息,当客户端发现自己处于这种状态时就会返回这种状态。
解决方法:
1.查看zookeeper的配置文件,找到数据的存放目录
cat /etc/zookeeper/conf/zoo.cfg
2.删除或重命名数据配置文件
cd /var/lib/zookeeper
mv ./version-2 ./version-2.bak
3.重新启动zookeeper,查看进程以及端口号是否被监听。
zookeeper集群崩溃处理