Apache ZooKeeper is a highly reliable and available service that provides coordination between distributed processes.
Upgrading ZooKeeper to the Latest CDH3 Release
Cloudera recommends that you use a rolling upgrade process to upgrade ZooKeeper: that is, upgrade one server in the ZooKeeper ensemble at a time. This means bringing down each server in turn, upgrading the software, then restarting the server. The server will automatically rejoin the quorum, update its internal state with the current ZooKeeper leader, and begin serving client sessions.
This method lets you upgrade ZooKeeper without interrupting the service. It also lets you monitor the ensemble as the upgrade progresses and roll back if you run into problems.
The instructions that follow assume that you are upgrading ZooKeeper as part of an upgrade to the latest CDH3 release, and have already performed the steps under Upgrading CDH3.
Performing a ZooKeeper Rolling Upgrade
Follow these steps to perform a rolling upgrade.
Step 1: Stop the ZooKeeper Server on the First Node
To stop the ZooKeeper server:
$ sudo /sbin/service hadoop-zookeeper-server stop
or
$ sudo /sbin/service hadoop-zookeeper stop
depending on the platform and release.
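Because the service name differs by platform and release, a small helper can pick whichever init script is present on the node. The following is a minimal sketch, not part of the packages: the function name zk_service_name is hypothetical, and /etc/init.d as the default script location is an assumption.

```shell
# Print the name of the ZooKeeper init script installed on this node.
# The two candidate names come from Step 1; /etc/init.d as the default
# location is an assumption, and the function name is hypothetical.
zk_service_name() {
  init_dir="${1:-/etc/init.d}"
  for name in hadoop-zookeeper-server hadoop-zookeeper; do
    if [ -x "$init_dir/$name" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "no ZooKeeper init script found in $init_dir" >&2
  return 1
}
```

With this helper defined, the stop command could be written as: sudo /sbin/service "$(zk_service_name)" stop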
Step 2: Install the ZooKeeper Base Package on the First Node
See Installing the ZooKeeper Base Package.
Step 3: Install the ZooKeeper Server Package on the First Node
See Installing the ZooKeeper Server Package.
Step 4: Re-enable the Server
Because of a packaging problem in earlier releases, you need to re-enable the server manually after upgrading ZooKeeper from CDH3 Update 1 or earlier to the latest CDH3 release:
$ sudo /sbin/chkconfig --add hadoop-zookeeper-server
Step 5: Restart the Server
See Installing the ZooKeeper Server Package for instructions on starting the server.
The upgrade is now complete on this server and you can proceed to the next.
Step 6: Upgrade the Remaining Nodes
Repeat Steps 1-5 above on each of the remaining nodes.
The ZooKeeper upgrade is now complete.
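The per-node sequence in Steps 1-5 can be sketched as a dry-run script that only prints the commands it would run, so the plan can be reviewed before anything is executed. This is an illustration under stated assumptions: the function name rolling_upgrade_plan, the use of ssh, the host names, and the PKG_INSTALL and ZK_SERVICE variables are all hypothetical, and the default package command assumes a Red Hat-compatible system.

```shell
# Dry-run sketch: prints the commands for Steps 1-5 on each node instead
# of running them. PKG_INSTALL and ZK_SERVICE are hypothetical variables;
# use the apt-get or zypper form on Debian or SUSE systems.
PKG_INSTALL="${PKG_INSTALL:-yum install}"
ZK_SERVICE="${ZK_SERVICE:-hadoop-zookeeper-server}"

rolling_upgrade_plan() {
  for host in "$@"; do
    echo "# --- $host ---"
    # Step 1: stop the server
    echo "ssh $host sudo /sbin/service $ZK_SERVICE stop"
    # Steps 2 and 3: install the base and server packages
    echo "ssh $host sudo $PKG_INSTALL hadoop-zookeeper"
    echo "ssh $host sudo $PKG_INSTALL hadoop-zookeeper-server"
    # Step 4: only needed when upgrading from CDH3 Update 1 or earlier
    echo "ssh $host sudo /sbin/chkconfig --add hadoop-zookeeper-server"
    # Step 5: restart the server
    echo "ssh $host sudo /sbin/service hadoop-zookeeper-server start"
  done
}
```

For example, rolling_upgrade_plan zk1 zk2 zk3 prints six lines per node, one node at a time, which matches the requirement that only one server is down at any moment. Before moving on to the next node, confirm that the upgraded server has rejoined the quorum.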
Installing the ZooKeeper Packages
There are two ZooKeeper server packages:
- The hadoop-zookeeper base package provides the basic libraries and scripts that are necessary to run ZooKeeper servers and clients. The documentation is also included in this package.
- The hadoop-zookeeper-server package contains the init.d scripts necessary to run ZooKeeper as a daemon process. Because hadoop-zookeeper-server depends on hadoop-zookeeper, installing the server package automatically installs the base package.
Installing the ZooKeeper Base Package
To install ZooKeeper on Ubuntu and other Debian systems:
$ sudo apt-get install hadoop-zookeeper
To install ZooKeeper on Red Hat-compatible systems:
$ sudo yum install hadoop-zookeeper
To install ZooKeeper on SUSE systems:
$ sudo zypper install hadoop-zookeeper
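The three install commands differ only in the package manager, so a script that targets several platforms can build the command from the first supported tool it finds. The sketch below is one possible way to do this; the function names zk_detect_pkg_manager and zk_install_cmd are hypothetical, and the command is printed rather than executed.

```shell
# Print the first supported package manager found on PATH.
# Function names here are hypothetical helpers, not part of the packages.
zk_detect_pkg_manager() {
  for pm in apt-get yum zypper; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

# Print the install command for a package manager and package name.
# $1 = apt-get, yum, or zypper; $2 = package (default: hadoop-zookeeper)
zk_install_cmd() {
  case "$1" in
    apt-get|yum|zypper)
      echo "sudo $1 install ${2:-hadoop-zookeeper}"
      ;;
    *)
      echo "unsupported package manager: $1" >&2
      return 1
      ;;
  esac
}
```

The same helper covers the server package by passing hadoop-zookeeper-server as the second argument.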
Installing the ZooKeeper Server Package and Starting ZooKeeper on a Single Server
The instructions provided here deploy a single ZooKeeper server in "standalone" mode. This is appropriate for evaluation, testing and development purposes, but may not provide sufficient reliability for a production application. See Installing ZooKeeper in a Production Environment for more information.
To install a ZooKeeper server on Ubuntu and other Debian systems:
$ sudo apt-get install hadoop-zookeeper-server
To install a ZooKeeper server on Red Hat-compatible systems:
$ sudo yum install hadoop-zookeeper-server
To install a ZooKeeper server on SUSE systems:
$ sudo zypper install hadoop-zookeeper-server
To start ZooKeeper
Use the following command to start ZooKeeper:
$ sudo /sbin/service hadoop-zookeeper-server start
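After starting the server, you may want to confirm that it is answering requests. One common check is ZooKeeper's "ruok" four-letter command, to which a healthy server replies "imok". The sketch below assumes that nc (netcat) is installed and that the server listens on the default client port, 2181; the function names are hypothetical.

```shell
# True when the reply to the "ruok" four-letter command is "imok".
zk_reply_ok() {
  [ "$1" = "imok" ]
}

# Ask a server whether it is running.
# Assumptions: nc (netcat) is installed, and the server listens on the
# default client port 2181 unless another port is given as $2.
zk_ruok() {
  host="${1:-localhost}"
  port="${2:-2181}"
  reply="$(echo ruok | nc "$host" "$port" 2>/dev/null)"
  zk_reply_ok "$reply"
}
```

For example, zk_ruok localhost succeeds only when the local server answers "imok", so it can be used as the condition of an if statement in a startup or upgrade script.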
Installing ZooKeeper in a Production Environment
For use in a production environment, you should deploy ZooKeeper as an ensemble with an odd number of nodes. As long as a majority of the servers in the ensemble are available, the ZooKeeper service will be available. The minimum recommended ensemble size is three ZooKeeper servers, and it is recommended that each server run on a separate machine.
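The majority rule explains why an odd ensemble size is preferred: adding a fourth server to a three-server ensemble raises the majority from two to three, but the ensemble still tolerates only one failure. The arithmetic can be sketched as follows; the function names are hypothetical.

```shell
# Smallest number of servers that forms a majority of an n-server ensemble.
zk_quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority remains available.
zk_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

For ensembles of 3, 4, and 5 servers, zk_tolerated_failures prints 1, 1, and 2 respectively.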
Deploying ZooKeeper on multiple servers requires some additional configuration. The configuration file (zoo.cfg) on each server must include a list of all servers in the ensemble, and each server must also have a myid file in its data directory (by default /var/zookeeper) that identifies it as one of the servers in the ensemble.
For instructions describing how to set up a multi-server deployment, see Installing a Multi-Server Setup.
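As an illustration of the two pieces of configuration described above, the following sketch generates the server list entries for zoo.cfg and writes a myid file. It assumes the usual server.N=host:port:port entry format with the conventional peer and leader-election ports 2888 and 3888; the host names and function names are hypothetical, and your ports may differ.

```shell
# Print one server.N line per host for zoo.cfg; N is the host's 1-based
# position in the argument list. Ports 2888 and 3888 are the conventional
# peer and leader-election ports and are an assumption here.
zk_server_lines() {
  id=1
  for host in "$@"; do
    echo "server.$id=$host:2888:3888"
    id=$((id + 1))
  done
}

# Write a server's id into the myid file in its data directory.
# $1 = id (must match the N in that host's server.N line)
# $2 = data directory (default: /var/zookeeper)
zk_write_myid() {
  echo "$1" > "${2:-/var/zookeeper}/myid"
}
```

For example, zk_server_lines zk1 zk2 zk3 prints three lines that would be appended to zoo.cfg on every server, while zk_write_myid 2 would be run only on the second host. The id in each myid file must match the number in that host's server.N entry.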
Setting Up a Supervisory Process for the ZooKeeper Server
The ZooKeeper server is designed to be both highly reliable and highly available. This means that:
- If a ZooKeeper server encounters an error it cannot recover from, it will "fail fast" (the process will exit immediately).
- When the server shuts down, the ensemble remains active and continues serving requests.
- Once restarted, the server rejoins the ensemble without any further manual intervention.
Cloudera recommends that you fully automate this process by configuring a supervisory service to manage each server, and restart the ZooKeeper server process automatically if it fails. See the ZooKeeper Administrator's Guide for more information.
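To show the idea behind a supervisory service, the sketch below runs a command in the foreground and starts it again each time it exits. It is an illustration only, not a replacement for a real supervisor: the function name supervise is hypothetical, the run limit exists so the loop terminates, and the command that runs the ZooKeeper server in the foreground is left for you to supply.

```shell
# Run a command in the foreground and restart it whenever it exits.
# $1 = maximum number of runs (keeps this sketch from looping forever)
# $2 = seconds to wait before each restart
# remaining arguments = the command to supervise
supervise() {
  max_runs="$1"
  delay="$2"
  shift 2
  runs=0
  while [ "$runs" -lt "$max_runs" ]; do
    if "$@"; then
      status=0
    else
      status=$?
    fi
    runs=$((runs + 1))
    echo "run $runs exited with status $status"
    sleep "$delay"
  done
}
```

Because a failed server exits immediately and rejoins the ensemble on restart, a loop of this shape is enough to recover from a "fail fast" exit. A production supervisor would typically also log restarts and back off when the process fails repeatedly.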
Maintaining a ZooKeeper Server
The ZooKeeper server continually saves znode snapshot files and, optionally, transactional logs in a Data Directory to enable you to recover data. It's a good idea to back up the ZooKeeper Data Directory periodically. Although ZooKeeper is highly reliable because a persistent copy is replicated on each server, recovering from backups may be necessary if a catastrophic failure or user error occurs.
The ZooKeeper server does not remove the snapshots and log files, so they will accumulate over time. You will need to clean up this directory occasionally, based on your backup schedules and processes. To automate the cleanup, a zkCleanup.sh script is provided in the bin directory of the hadoop-zookeeper base package. Modify this script as necessary for your situation. In general, you want to run this as a cron task based on your backup schedule.
The data directory is specified by the dataDir parameter in the ZooKeeper configuration file, and the data log directory is specified by the dataLogDir parameter.
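A periodic backup can read the data directory location from the configuration file instead of hard-coding it. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the configuration file is assumed to use plain key=value lines, and the paths passed to the functions are yours to choose.

```shell
# Print the value of a key=value parameter from a configuration file.
# $1 = parameter name (for example dataDir), $2 = path to the file
zk_conf_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Archive the data directory named by dataDir in the configuration file.
# $1 = path to zoo.cfg, $2 = archive file to create
zk_backup_data_dir() {
  data_dir="$(zk_conf_value dataDir "$1")"
  if [ ! -d "$data_dir" ]; then
    echo "data directory not found: $data_dir" >&2
    return 1
  fi
  # -C switches into the data directory so archive paths are relative
  tar -czf "$2" -C "$data_dir" .
}
```

If dataLogDir is set, the transaction logs live in a separate directory, which zk_conf_value dataLogDir can locate in the same way so that it can be archived as well.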
For more information, see Ongoing Data Directory Cleanup.
Viewing the ZooKeeper Documentation
For additional ZooKeeper documentation, see http://archive.cloudera.com/cdh/3/zookeeper/.