Classic MapReduce (MapReduce1) Memory

Memory Usage

By default, Hadoop allocates 1000 MB (1 GB) of memory to each daemon it runs (NameNode, JobTracker, DataNode, SecondaryNameNode, TaskTracker). This is controlled by the HADOOP_HEAPSIZE setting in hadoop-env.sh. In addition, the tasktracker launches separate child JVMs to run map and reduce tasks, so these need to be factored into the total memory footprint of a worker machine.

 

The maximum number of map tasks that can run on a tasktracker at one time is controlled by the mapred.tasktracker.map.tasks.maximum property, which defaults to 2. There is a corresponding property for reduce tasks, mapred.tasktracker.reduce.tasks.maximum, which also defaults to 2. The tasktracker is then said to have 2 map slots and 2 reduce slots.
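For reference, the slot counts live in mapred-site.xml on each tasktracker. A minimal sketch of the default configuration (these go inside the <configuration> element; the property names are the standard MapReduce 1 settings, and the values shown are simply the defaults made explicit):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>  <!-- 2 map slots per tasktracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>  <!-- 2 reduce slots per tasktracker -->
</property>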

 

MapReduce 1 has long been criticised for its static, slot-based memory model. Rather than using a true resource management system, a MapReduce cluster is divided into a fixed number of map and reduce slots based on a static configuration, so slots are wasted whenever the cluster workload does not fit that configuration. Furthermore, the slot-based model makes it hard to schedule non-MapReduce applications appropriately. This problem has been addressed in YARN, as well as in some third-party systems such as Facebook's Corona.

 

Returning to MapReduce 1, the memory given to each JVM running a task can be changed by setting the mapred.child.java.opts property. The default setting is -Xmx200m, which gives each task 200 MB of memory. Make sure you don't mark this setting as final, so that users can supply extra JVM options here, for example to enable verbose GC logging when debugging garbage collection. The default configuration therefore uses 2800 MB of memory on a worker machine. The table below illustrates the typical memory usage of Hadoop Classic MapReduce.

JVM                              Default memory used (MB)    Memory used for 8 processors, 400 MB/child (MB)
DataNode                         1000                        1000
TaskTracker                      1000                        1000
TaskTracker child map task       2 x 200                     7 x 400
TaskTracker child reduce task    2 x 200                     7 x 400
Total                            2800                        7600

 

Why are there 7 map tasks and 7 reduce tasks when we have 8 processors? The reason is that MapReduce jobs are normally I/O-bound, so it makes sense to have more tasks than processors to get better CPU utilization. The amount of oversubscription depends on the CPU utilization of the jobs you run, but a good rule of thumb is to have a factor of between one and two more tasks (counting both map and reduce tasks) than processors.

 

In the table above, we have 8 processors, so we set both mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to 7 (not 8, because the datanode and the tasktracker each take one slot). We also increase the memory available to each child task to 400 MB, so the total memory usage would be 7600 MB.
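As a sketch, the 8-processor example above could be expressed in mapred-site.xml like this (the property names are standard MapReduce 1 settings; the GC logging flags are just one illustration of the extra JVM options mentioned earlier, not part of the defaults):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>7</value>
</property>
<property>
  <!-- 400 MB per child JVM; leave this non-final so individual jobs can
       append options such as -verbose:gc -XX:+PrintGCDetails for GC debugging -->
  <name>mapred.child.java.opts</name>
  <value>-Xmx400m</value>
</property>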

 

Whether this Java memory allocation will fit into 8 GB of physical memory depends on the other processes that are running on the machine. If you are running Streaming or Pipes programs, this allocation is probably inappropriate, since it doesn't leave enough memory for the user's Streaming or Pipes processes to run. This is something we have to avoid, because it will lead to processes being swapped out, which causes severe performance degradation.

 

Task memory limits

On a shared cluster, it shouldn't be possible for one user's errant MapReduce program (one with a memory leak, for example) to bring down nodes in the cluster. Hadoop provides several ways to address this problem.

  • The naive way is to lock down mapred.child.java.opts by marking it as final, to prevent users from specifying too much memory for their tasks. This is inappropriate most of the time: there are legitimate reasons to allow some jobs to use more memory, and users may want to pass extra JVM options to profile their MapReduce programs, so it is not always an acceptable solution. Furthermore, even locking down mapred.child.java.opts doesn't solve the problem, because tasks can spawn new processes that are not constrained in their memory usage. Streaming and Pipes jobs do exactly that, for example.
  • Via the Linux ulimit command, which can be done at the operating-system level (in limits.conf, typically found in /etc/security) or by setting mapred.child.ulimit in the Hadoop configuration. The value is specified in kilobytes and should be comfortably larger than the JVM heap set by mapred.child.java.opts; otherwise the child JVM might not start. A configuration sketch follows this list.
  • Utilize Hadoop's task memory monitoring feature. The idea is that an administrator sets a range of allowed virtual memory limits for tasks on the cluster, and users specify the maximum memory requirements for their jobs in the job configuration. If a user doesn't specify memory requirements for the job, the defaults are used (mapred.job.map.memory.mb and mapred.job.reduce.memory.mb, both of which default to -1).
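As a sketch of the ulimit approach, assuming the 400 MB child heap configured earlier (the 1,500,000 KB value is an illustrative number chosen to be comfortably larger than the heap plus JVM overhead, not a recommended default):

<property>
  <name>mapred.child.ulimit</name>
  <!-- per-task virtual memory limit in kilobytes (~1.5 GB here);
       must be comfortably above the -Xmx value in mapred.child.java.opts -->
  <value>1500000</value>
</property>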

The task memory monitoring approach has a couple of advantages over the ulimit one. 

  • It enforces the memory usage of the whole task process tree, including spawned processes.
  • It enables memory-aware scheduling, where tasks are scheduled on tasktrackers that have enough free memory to run them.

To enable task memory monitoring, you need to set all six of the properties below (see the example configuration after the list). The default values are all -1, which means the feature is disabled.

  1. mapred.cluster.map.memory.mb    The amount of virtual memory that defines a map slot. Map tasks that require more than this amount of memory will use more than one map slot.
  2. mapred.cluster.reduce.memory.mb    The amount of virtual memory that defines a reduce slot. Reduce tasks that require more than this amount of memory will use more than one reduce slot.
  3. mapred.job.map.memory.mb    The amount of virtual memory that a map task requires to run. If a map task exceeds this limit, it may be terminated and marked as failed.
  4. mapred.job.reduce.memory.mb    The amount of virtual memory that a reduce task requires to run. If a reduce task exceeds this limit, it may be terminated and marked as failed.
  5. mapred.cluster.max.map.memory.mb    The maximum value that a user can set mapred.job.map.memory.mb to.
  6. mapred.cluster.max.reduce.memory.mb    The maximum value that a user can set mapred.job.reduce.memory.mb to.
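As a sketch, the six properties might be set as follows (the property names are the standard settings listed above; the specific megabyte values are illustrative assumptions, not recommendations). The mapred.cluster.* properties are cluster-wide and set by the administrator in mapred-site.xml; the mapred.job.* properties are per-job defaults that users can override in their job configuration, up to the cluster maximums:

<!-- Cluster-wide settings (administrator) -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>512</value>   <!-- virtual memory that defines one map slot -->
</property>
<property>
  <name>mapred.cluster.reduce.memory.mb</name>
  <value>1024</value>  <!-- virtual memory that defines one reduce slot -->
</property>
<property>
  <name>mapred.cluster.max.map.memory.mb</name>
  <value>2048</value>  <!-- upper bound on mapred.job.map.memory.mb -->
</property>
<property>
  <name>mapred.cluster.max.reduce.memory.mb</name>
  <value>4096</value>  <!-- upper bound on mapred.job.reduce.memory.mb -->
</property>

<!-- Per-job requirements (user) -->
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>1024</value>  <!-- a map task asking for 1024 MB occupies two 512 MB map slots -->
</property>
<property>
  <name>mapred.job.reduce.memory.mb</name>
  <value>1024</value>
</property>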
