[Zookeeper] 服务端之单机版服务器启动

1 服务器端整体概览图

概览图
  • ServerCnxnFactory:负责与client之间的网络交互,支持NIO(默认)以及Netty
  • SessionTrackerImpl:会话管理器
  • DatadirCleanupManager:定期清理存在磁盘上的log文件和snapshot文件
  • PreRequestProcessor,SyncRequestProcessor,FinalRequestProcessor:请求处理流程,责任链模式
  • LearnerHandler:Leader与Learner之间的交互
  • FileTxnSnapLog:存储在磁盘上的日志文件
  • DataTree:体现在内存中的存储结构
  • Sessions:Session的相关信息存储

2 单机版服务器启动流程

  1. 执行QuorumPeerMain的main方法,其中先创建一个QuorumPeerMain对象
  2. 调用initializeAndRun方法
    protected void initializeAndRun(String[] args)
        throws ConfigException, IOException
    {
        QuorumPeerConfig config = new QuorumPeerConfig();
        if (args.length == 1) {
            config.parse(args[0]);
        }

        // Start and schedule the the purge task
        DatadirCleanupManager purgeMgr = new DatadirCleanupManager(config
                .getDataDir(), config.getDataLogDir(), config
                .getSnapRetainCount(), config.getPurgeInterval());
        purgeMgr.start();

        if (args.length == 1 && config.servers.size() > 0) {
            runFromConfig(config);
        } else {
            LOG.warn("Either no config or no quorum defined in config, running "
                    + " in standalone mode");
            // there is only server in the quorum -- run as standalone
            ZooKeeperServerMain.main(args);
        }
    }

2.1

  • args实际上就是zoo.cfg中的配置,如下

2.2

  • 创建DatadirCleanupManager实例,参数有snapDir,dataLogDir,snapRetainCount(要保存snapshot文件的个数),purgeInterval(定期清理的频率,单位为小时),snapRetainCountpurgeInterval在zoo.cfg中均可以配置
  • 调用DatadirCleanupManagerstart方法,里面主要依赖PurgeTask,这也是一个线程,其run方法

PurgeTxnLogpurge方法:

    public static void purge(File dataDir, File snapDir, int num) throws IOException {
        // snapshot文件保存的数量小于3,抛异常
        if (num < 3) {
            throw new IllegalArgumentException(COUNT_ERR_MSG);
        }
        // 根据dataDir和snapDir创建FileTxnSnapLog
        FileTxnSnapLog txnLog = new FileTxnSnapLog(dataDir, snapDir);
        // 根据给定数量获取最近的文件
        List snaps = txnLog.findNRecentSnapshots(num);
        // 获取数目
        int numSnaps = snaps.size();
        if (numSnaps > 0) {
            // 第二个参数是最近的snapshot文件
            purgeOlderSnapshots(txnLog, snaps.get(numSnaps - 1));
        }
    }

找最近n个snapshot文件:

    public List findNRecentSnapshots(int n) throws IOException {
        List files = Util.sortDataDir(snapDir.listFiles(), SNAPSHOT_FILE_PREFIX, false);
        int count = 0;
        List list = new ArrayList();
        for (File f: files) {
            if (count == n)
                break;
            if (Util.getZxidFromName(f.getName(), SNAPSHOT_FILE_PREFIX) != -1) {
                count++;
                list.add(f);
            }
        }
        return list;
    }

purgeOlderSnapshots:

    static void purgeOlderSnapshots(FileTxnSnapLog txnLog, File snapShot) {
        // 从snapshot文件名中获取zxid
        final long leastZxidToBeRetain = Util.getZxidFromName(
                snapShot.getName(), PREFIX_SNAPSHOT);

        /**
         * 我们删除名称中带有zxid且小于leastZxidToBeRetain的所有文件。
         * 该规则适用于快照文件和日志文件,
         * zxid小于X的日志文件可能包含zxid大于X的事务。
         * 更准确的说,命名为log.(X-a)的log文件可能包含比snapshot.X文件更新的事务,
         * 如果在该间隔中没有其他以zxid开头即(X-a,X]的日志文件
 
         */
        final Set retainedTxnLogs = new HashSet();
        //获取快照日志,其中可能包含比给定zxid更新的事务,这些日志需要保留下来
        retainedTxnLogs.addAll(Arrays.asList(txnLog.getSnapshotLogs(leastZxidToBeRetain)));

        /**
         * Finds all candidates for deletion, which are files with a zxid in their name that is less
         * than leastZxidToBeRetain.  There's an exception to this rule, as noted above.
         */
        class MyFileFilter implements FileFilter{
            private final String prefix;
            MyFileFilter(String prefix){
                this.prefix=prefix;
            }
            public boolean accept(File f){
                if(!f.getName().startsWith(prefix + "."))
                    return false;
                if (retainedTxnLogs.contains(f)) {
                    return false;
                }
                long fZxid = Util.getZxidFromName(f.getName(), prefix);
                if (fZxid >= leastZxidToBeRetain) {
                    return false;
                }
                return true;
            }
        }
        // add all non-excluded log files
        List files = new ArrayList();
        File[] fileArray = txnLog.getDataDir().listFiles(new MyFileFilter(PREFIX_LOG));
        if (fileArray != null) {
            files.addAll(Arrays.asList(fileArray));
        }

        // add all non-excluded snapshot files to the deletion list
        fileArray = txnLog.getSnapDir().listFiles(new MyFileFilter(PREFIX_SNAPSHOT));
        if (fileArray != null) {
            files.addAll(Arrays.asList(fileArray));
        }

        // remove the old files
        for(File f: files)
        {
            final String msg = "Removing file: "+
                DateFormat.getDateTimeInstance().format(f.lastModified())+
                "\t"+f.getPath();
            LOG.info(msg);
            System.out.println(msg);
            if(!f.delete()){
                System.err.println("Failed to remove "+f.getPath());
            }
        }

    }

先获取到那些需要保留的文件,之后再去删除这些不在保留文件之内的文件。

2.3
判断集群是单机启动还是集群启动,集群走runFromConfig(config),单机走ZooKeeperServerMain.main(args)(其实单机版最终走的是ZooKeeperServerMainrunFromConfig,)

runFromConfig方法:

    public void runFromConfig(ServerConfig config) throws IOException {
        LOG.info("Starting server");
        FileTxnSnapLog txnLog = null;
        try {
            final ZooKeeperServer zkServer = new ZooKeeperServer();
            // Registers shutdown handler which will be used to know the
            // server error or shutdown state changes.
            final CountDownLatch shutdownLatch = new CountDownLatch(1);
            zkServer.registerServerShutdownHandler(
                    new ZooKeeperServerShutdownHandler(shutdownLatch));
            // 构建FileTxnSnapLog对象
            txnLog = new FileTxnSnapLog(new File(config.dataLogDir), new File(
                    config.dataDir));
            zkServer.setTxnLogFactory(txnLog);
            zkServer.setTickTime(config.tickTime);
            zkServer.setMinSessionTimeout(config.minSessionTimeout);
            zkServer.setMaxSessionTimeout(config.maxSessionTimeout);
            // 构建与client之间的网络通信服务组件
            // 这里可以通过zookeeper.serverCnxnFactory配置NIO还是Netty
            cnxnFactory = ServerCnxnFactory.createFactory();
            cnxnFactory.configure(config.getClientPortAddress(),
                    config.getMaxClientCnxns());
            cnxnFactory.startup(zkServer);
            // Watch status of ZooKeeper server. It will do a graceful shutdown
            // if the server is not running or hits an internal error.
            shutdownLatch.await();
            shutdown();

            cnxnFactory.join();
            if (zkServer.canShutdown()) {
                zkServer.shutdown(true);
            }
        } catch (InterruptedException e) {
            // warn, but generally this is ok
            LOG.warn("Server interrupted", e);
        } finally {
            if (txnLog != null) {
                txnLog.close();
            }
        }
    }

初始化网络服务组件后,cnxnFactory.startup(zkServer);
这里以默认网络服务组件为例NIOServerCnxnFactory

    @Override
    public void start() {
        // ensure thread is started once and only once
        if (thread.getState() == Thread.State.NEW) {
            thread.start();
        }
    }

    @Override
    public void startup(ZooKeeperServer zks) throws IOException,
            InterruptedException {
        //调用上面的start方法,实际调用thread的start
        //也就调用了该类的run方法,启动网络服务
        start();
        //这是ZooKeeperServer
        setZooKeeperServer(zks);
        zks.startdata();
        zks.startup();
    }

2.4
zks.startdata()方法:

    public void startdata() 
    throws IOException, InterruptedException {
        //check to see if zkDb is not null
        if (zkDb == null) {
            zkDb = new ZKDatabase(this.txnLogFactory);
        }  
        if (!zkDb.isInitialized()) {
            //loadData进行初始化
            loadData();
        }
    }

ZKDatabase在内存中维护了zookeeper的sessions, datatree和commit logs集合。 当zookeeper server启动的时候会将txnlogs和snapshots从磁盘读取到内存中

ZKDatabase的loadData()方法:

        //如果zkDb已经初始化,设置zxid(内存当中DataTree最新的zxid)
        if(zkDb.isInitialized()){
            setZxid(zkDb.getDataTreeLastProcessedZxid());
        }
        else {
            // 没有初始化,就loadDataBase
            setZxid(zkDb.loadDataBase());
        }

      public long loadDataBase() throws IOException {
          long zxid = snapLog.restore(dataTree, sessionsWithTimeouts, commitProposalPlaybackListener);
          initialized = true;
          return zxid;
      }

loadDataBase()内部调用的是FileTxnSnapLogrestore方法

2.5
zks.startup()方法:

    public synchronized void startup() {
        if (sessionTracker == null) {
            // 创建会话管理器
            createSessionTracker();
        }
        // 启动会话管理器
        startSessionTracker();
        // 设置请求处理器
        setupRequestProcessors();
        // 注册jmx
        registerJMX();

        setState(State.RUNNING);
        notifyAll();
    }

你可能感兴趣的:([Zookeeper] 服务端之单机版服务器启动)