For a small cluster (on the order of 10 nodes), it is usually acceptable to run the namenode
and the jobtracker on a single master machine (as long as at least one copy of the
namenode’s metadata is stored on a remote filesystem). As the cluster and the number
of files stored in HDFS grow, the namenode needs more memory, so the namenode
and jobtracker should be moved onto separate machines.
The machine running the namenode should typically run on 64-bit hardware, to avoid the
3 GB limit on Java heap size on 32-bit architectures.
You can increase the namenode's memory without changing the memory allocated to
other Hadoop daemons by setting HADOOP_NAMENODE_OPTS in hadoop-env.sh to include a
JVM option for setting the memory size. HADOOP_NAMENODE_OPTS allows you to pass extra
options to the namenode’s JVM. So, for example, if using a Sun JVM, -Xmx2000m would
specify that 2,000 MB of memory should be allocated to the namenode.
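For example, a line like the following in hadoop-env.sh would achieve this (a minimal sketch; adjust the heap size to your cluster's needs):
# hadoop-env.sh: allocate 2,000 MB to the namenode's JVM only
export HADOOP_NAMENODE_OPTS="-Xmx2000m"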
If you change the namenode’s memory allocation, don’t forget to do the same for the
secondary namenode (using the HADOOP_SECONDARYNAMENODE_OPTS variable), since its
memory requirements are comparable to the primary namenode’s. You will probably
also want to run the secondary namenode on a different machine, in this case.
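For example, to give the secondary namenode a matching heap (a sketch assuming the same 2,000 MB figure as above), you might add:
# hadoop-env.sh: allocate 2,000 MB to the secondary namenode's JVM
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx2000m"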
System logfiles produced by Hadoop are stored in $HADOOP_INSTALL/logs by default.
This can be changed using the HADOOP_LOG_DIR setting in hadoop-env.sh. It’s a good idea
to change this so that logfiles are kept out of the directory that Hadoop is installed in,
since this keeps logfiles in one place even when the installation directory changes due to
an upgrade. A common choice is /var/log/hadoop, set by including the following line in
hadoop-env.sh:
export HADOOP_LOG_DIR=/var/log/hadoop
The checkpointing process carried out by the secondary namenode proceeds as follows:
1. The secondary asks the primary to roll its edits file, so new edits go to a new file.
2. The secondary retrieves fsimage and edits from the primary (using HTTP GET).
3. The secondary loads fsimage into memory, applies each operation from edits, then
creates a new consolidated fsimage file.
4. The secondary sends the new fsimage back to the primary (using HTTP POST).
5. The primary replaces the old fsimage with the new one from the secondary, and
the old edits file with the new one it started in step 1. It also updates the fstime file
to record the time that the checkpoint was taken.
At the end of the process, the primary has an up-to-date fsimage file and a shorter
edits file (it is not necessarily empty, as it may have received some edits while the
checkpoint was being taken). It is possible for an administrator to run this process
manually while the namenode is in safe mode, using the hadoop dfsadmin
-saveNamespace command.
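For example, a manual checkpoint could be triggered with the following sequence (the safe mode commands are included so the sequence is self-contained; on a live cluster you would choose the timing carefully, since safe mode makes the filesystem read-only):
# Put the namenode into safe mode, save a checkpoint, then leave safe mode
hadoop dfsadmin -safemode enter
hadoop dfsadmin -saveNamespace
hadoop dfsadmin -safemode leave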