Tuning the Cluster for MapReduce v2 (YARN)

This topic applies to YARN clusters only, and describes how to tune and optimize YARN for your cluster. It introduces the following terms:

  • ResourceManager: A master daemon that authorizes submitted jobs to run, assigns an ApplicationMaster to them, and enforces resource limits.
  • NodeManager: A worker daemon that launches ApplicationMaster and task containers.
  • ApplicationMaster: A supervisory task that requests the resources needed for executor tasks. A separate ApplicationMaster instance runs on a NodeManager for each application. The ApplicationMaster requests containers, which are sized according to the resources a task requires to run.
  • vcore: Virtual CPU core; a logical unit of processing power. In a basic case, it is equivalent to a physical CPU core or hyperthreaded virtual CPU core.
  • Container: A resource bucket and process space for a task. A container’s resources consist of vcores and memory.

Identifying Hardware Resources and Service Demand

Begin YARN tuning by comparing hardware resources on the worker node to the combined demand of the worker services you intend to run. First, determine how many vcores, how much memory, and how many spindles are available for Hadoop operations on each worker node. Then, estimate service demand, or the resources needed to run a YARN NodeManager and HDFS DataNode process. There may be other Hadoop services that do not subscribe to YARN, including:
  • Impalad
  • HBase RegionServer
  • Solr

Worker nodes also run the Linux operating system, system support services, and possibly third-party monitoring or asset management services.

Estimating and Configuring Resource Requirements

After identifying hardware and software services, you can estimate the CPU cores and memory each service requires. The difference between the hardware complement and this sum is the amount of resources you can assign to YARN without creating contention. Cloudera recommends starting with these estimates:
  • 10-20% of RAM for Linux and its daemon services
  • At least 16 GB RAM for an Impalad process
  • No more than 12-16 GB RAM for an HBase RegionServer process

In addition, you must allow resources for task buffers, such as the HDFS Sort I/O buffer. For vcore demand, consider the number of concurrent processes or tasks each service runs as an initial guide. For the operating system, start with a count of two.
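A quick shell sketch can turn these guidelines into a first-pass budget. The per-service figures below are illustrative assumptions drawn from the recommendations above, not measured demand, and task overhead and agent processes are omitted for brevity:

```shell
#!/bin/sh
# Rough non-YARN reservation for a 256 GB worker node. All figures are
# illustrative assumptions from the guidance above, not measured demand.
TOTAL_MB=262144                       # 256 GB of physical RAM, in MB
OS_MB=$((TOTAL_MB * 20 / 100))        # 10-20% for Linux; take the high end
IMPALA_MB=16384                       # at least 16 GB for an Impalad process
DATANODE_MB=1024                      # HDFS DataNode process
RESERVED_MB=$((OS_MB + IMPALA_MB + DATANODE_MB))
YARN_MB=$((TOTAL_MB - RESERVED_MB))   # what remains for YARN to allocate
echo "Non-YARN reservation: ${RESERVED_MB} MB; left for YARN: ${YARN_MB} MB"
```

Refine the estimate as you add or remove co-located services; the point is only to make each reservation explicit before configuring YARN.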

The following table shows example demand estimates for a worker node with 24 vcores and 256 GB of memory. Services that are not expected to run are allocated zero resources.
Table 1. Resource Demand Estimates: 24 vcores, 256 GB RAM

Service                | vcores | Memory (MB)
-----------------------|--------|------------
Operating system       | 2      |
YARN NodeManager       | 1      |
HDFS DataNode          | 1      | 1,024
Impala Daemon          | 1      | 16,384
HBase RegionServer     | 0      | 0
Solr Server            | 0      | 0
Cloudera Manager agent | 1      | 1,024
Task overhead          | 0      | 52,429
YARN containers        | 18     | 137,830
Total                  | 24     | 262,144

You can now configure YARN to use the remaining resources for its supervisory processes and task containers. Start with the NodeManager, which has the following settings:

Table 2. NodeManager Properties

Property                             | Description                                                             | Default
-------------------------------------|-------------------------------------------------------------------------|--------
yarn.nodemanager.resource.cpu-vcores | Number of virtual CPU cores that can be allocated for containers.       | 8
yarn.nodemanager.resource.memory-mb  | Amount of physical memory, in MB, that can be allocated for containers. | 8 GB

Hadoop is a disk I/O-centric platform by design. The number of independent physical drives ("spindles") dedicated to DataNode use limits how much concurrent processing a node can sustain. As a result, the number of vcores allocated to the NodeManager should be the lesser of either:
  • (total vcores) – (number of vcores reserved for non-YARN use), or
  • 2 x (number of physical disks used for DataNode storage)

The amount of RAM allotted to the NodeManager for spawning containers should be the node's physical RAM minus all non-YARN memory demand:

  yarn.nodemanager.resource.memory-mb = total memory on the node – (sum of all memory allocations to other processes, such as the DataNode, NodeManager, and RegionServer)

For the example node, assuming the DataNode uses 10 physical drives, the calculation is:
Table 3. NodeManager Calculations

Property                             | Value
-------------------------------------|--------------------------
yarn.nodemanager.resource.cpu-vcores | min(24 – 6, 2 x 10) = 18
yarn.nodemanager.resource.memory-mb  | 137,830 MB
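The two sizing rules can be checked with a small shell sketch. The inputs are the example node's figures; the 124,314 MB non-YARN memory figure is simply Table 1's total minus its YARN containers row:

```shell
#!/bin/sh
# NodeManager sizing sketch for the example node (24 vcores, 6 reserved
# for non-YARN use, 10 DataNode drives, 256 GB RAM). The non-YARN memory
# total is taken from the worked example, not computed here.
TOTAL_VCORES=24; RESERVED_VCORES=6; DATANODE_DRIVES=10
TOTAL_MB=262144; NON_YARN_MB=124314

A=$((TOTAL_VCORES - RESERVED_VCORES))   # rule 1: total minus reserved
B=$((2 * DATANODE_DRIVES))              # rule 2: 2 x spindle count
if [ "$A" -lt "$B" ]; then NM_VCORES=$A; else NM_VCORES=$B; fi

NM_MEMORY_MB=$((TOTAL_MB - NON_YARN_MB))
echo "yarn.nodemanager.resource.cpu-vcores = $NM_VCORES"
echo "yarn.nodemanager.resource.memory-mb  = $NM_MEMORY_MB"
```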

Sizing the ResourceManager

The ResourceManager enforces limits on YARN container resources and can reject NodeManager container requests when required. The ResourceManager has six properties to specify the minimum, maximum, and incremental allotments of vcores and memory available for a request.
Table 4. ResourceManager Properties

Property                                   | Description                                                                                                | Default
-------------------------------------------|------------------------------------------------------------------------------------------------------------|--------
yarn.scheduler.minimum-allocation-vcores   | The smallest number of virtual CPU cores that can be requested for a container.                            | 1
yarn.scheduler.maximum-allocation-vcores   | The largest number of virtual CPU cores that can be requested for a container.                             | 32
yarn.scheduler.increment-allocation-vcores | If you are using the Fair Scheduler, virtual core requests are rounded up to the nearest multiple of this number. | 1
yarn.scheduler.minimum-allocation-mb       | The smallest amount of physical memory, in MB, that can be requested for a container.                      | 1 GB
yarn.scheduler.maximum-allocation-mb       | The largest amount of physical memory, in MB, that can be requested for a container.                       | 64 GB
yarn.scheduler.increment-allocation-mb     | If you are using the Fair Scheduler, memory requests are rounded up to the nearest multiple of this number. | 512 MB

If a NodeManager has 50 GB or more RAM available for containers, consider increasing the minimum allocation to 2 GB. The default memory increment is 512 MB; with a 1 GB minimum, a container that requests 1.2 GB is rounded up to 1.5 GB. You can set the maximum memory allocation equal to yarn.nodemanager.resource.memory-mb.
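The rounding behavior can be sketched as follows. This is a minimal model of the round-up (raise to the minimum, then round up to the next multiple of the increment), not the scheduler's actual code:

```shell
#!/bin/sh
# Minimal model of Fair Scheduler memory rounding. A request is first
# raised to the minimum allocation, then rounded up to the next multiple
# of the increment. 1229 MB (~1.2 GB) -> 1536 MB (1.5 GB).
MIN_MB=1024; INC_MB=512
REQUEST_MB=1229
if [ "$REQUEST_MB" -lt "$MIN_MB" ]; then REQUEST_MB=$MIN_MB; fi
GRANTED_MB=$(( (REQUEST_MB + INC_MB - 1) / INC_MB * INC_MB ))
echo "Granted: ${GRANTED_MB} MB"
```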

The default minimum and increment value for vcores is 1. Because application tasks are not commonly multithreaded, you generally do not need to change this value. The maximum value is usually equal to yarn.nodemanager.resource.cpu-vcores. Reduce this value to limit the number of containers running concurrently on one node.

The example leaves more than 50 GB RAM available for containers, which accommodates the following settings:

Table 5. ResourceManager Calculations

Property                                 | Value
-----------------------------------------|------------
yarn.scheduler.minimum-allocation-mb     | 2,048 MB
yarn.scheduler.maximum-allocation-mb     | 137,830 MB
yarn.scheduler.maximum-allocation-vcores | 18

Configuring YARN Settings

You can change the YARN settings that control MapReduce applications. A client can override these values if required, up to the constraints enforced by the ResourceManager or NodeManager. There are nine task settings, three each for mappers, reducers, and the ApplicationMaster itself:
Table 6. Gateway/Client Properties

Property                                  | Description                                                                    | Default
------------------------------------------|--------------------------------------------------------------------------------|--------
mapreduce.map.memory.mb                   | The amount of physical memory, in MB, allocated for each map task of a job.    | 1 GB
mapreduce.map.java.opts.max.heap          | The maximum Java heap size, in bytes, of the map processes.                    | 800 MB
mapreduce.map.cpu.vcores                  | The number of virtual CPU cores allocated for each map task of a job.          | 1
mapreduce.reduce.memory.mb                | The amount of physical memory, in MB, allocated for each reduce task of a job. | 1 GB
mapreduce.reduce.java.opts.max.heap       | The maximum Java heap size, in bytes, of the reduce processes.                 | 800 MB
mapreduce.reduce.cpu.vcores               | The number of virtual CPU cores for each reduce task of a job.                 | 1
yarn.app.mapreduce.am.resource.mb         | The physical memory requirement, in MB, for the ApplicationMaster.             | 1 GB
ApplicationMaster Java maximum heap size  | The maximum heap size, in bytes, of the Java MapReduce ApplicationMaster. Exposed in Cloudera Manager as part of the YARN service configuration; the value is folded into the property yarn.app.mapreduce.am.command-opts. | 800 MB
yarn.app.mapreduce.am.resource.cpu-vcores | The virtual CPU cores requirement for the ApplicationMaster.                   | 1

The mapreduce.[map | reduce].java.opts.max.heap settings specify the default heap sizes for mapper and reducer tasks, respectively. The mapreduce.[map | reduce].memory.mb settings specify the memory allotted to their containers, and this value should allow overhead beyond the task heap size: Cloudera recommends sizing the container at 1.2 times the mapreduce.[map | reduce].java.opts.max.heap setting (equivalently, a heap of about 0.8 times the container memory). The optimal ratio depends on the actual tasks. Cloudera also recommends setting mapreduce.map.memory.mb to 1–2 GB and setting mapreduce.reduce.memory.mb to twice the mapper value. The ApplicationMaster heap size is 1 GB by default, and can be increased if your jobs contain many concurrent tasks.

Using these guidelines, size the example worker node as follows:

Table 7. Gateway/Client Calculations

Property                            | Value
------------------------------------|------------------------
mapreduce.map.memory.mb             | 2,048 MB
mapreduce.reduce.memory.mb          | 4,096 MB
mapreduce.map.java.opts.max.heap    | 0.8 x 2,048 = 1,638 MB
mapreduce.reduce.java.opts.max.heap | 0.8 x 4,096 = 3,277 MB

Defining Containers

With YARN worker resources configured, you can determine how many containers best support a MapReduce application, based on job type and system resources. For example, a CPU-bound workload such as a Monte Carlo simulation requires very little data but complex, iterative processing; its ratio of concurrent containers per spindle is likely greater than that of an ETL workload, which tends to be I/O-bound. For applications that use a lot of memory in the map or reduce phase, the number of containers that can be scheduled is limited by the RAM available per container and the RAM each task requires. Other applications may be limited by the vcores not in use by other YARN applications, or by the rules of dynamic resource pools (if used).

To calculate the number of containers for mappers and reducers based on actual system constraints, start with the following formulas:

Table 8. Container Formulas

Property              | Value
----------------------|------
mapreduce.job.maps    | MIN(yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.map.cpu.vcores, number of physical drives x workload factor) x number of worker nodes
mapreduce.job.reduces | MIN(yarn.nodemanager.resource.memory-mb / mapreduce.reduce.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.reduce.cpu.vcores, number of physical drives x workload factor) x number of worker nodes

The workload factor can be set to 2.0 for most workloads. Consider a higher setting for CPU-bound workloads.
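The mapper formula can be sketched in shell with the example node's values. The 10-node cluster size is a hypothetical assumption for illustration; the other inputs come from the worked example above:

```shell
#!/bin/sh
# mapreduce.job.maps sketch: per-node container count is the minimum of
# the memory, vcore, and disk limits, multiplied by the node count.
# NODES=10 is a hypothetical cluster size; the rest are the example node's.
NM_MEMORY_MB=137830; NM_VCORES=18
MAP_MEMORY_MB=2048;  MAP_VCORES=1
DRIVES=10; WORKLOAD_FACTOR=2; NODES=10

BY_MEM=$((NM_MEMORY_MB / MAP_MEMORY_MB))   # memory limit: 67 containers
BY_CPU=$((NM_VCORES / MAP_VCORES))         # vcore limit: 18 containers
BY_DISK=$((DRIVES * WORKLOAD_FACTOR))      # disk limit: 20 containers
PER_NODE=$BY_MEM
[ "$BY_CPU" -lt "$PER_NODE" ] && PER_NODE=$BY_CPU
[ "$BY_DISK" -lt "$PER_NODE" ] && PER_NODE=$BY_DISK
MAPS=$((PER_NODE * NODES))
echo "mapreduce.job.maps = $MAPS"
```

Here the vcore limit binds, so the node's spindles and memory are not fully subscribed; the reducer calculation follows the same pattern with the reduce-side properties.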

Many other factors can influence the performance of a MapReduce application, including:
  • Configured rack awareness
  • Skewed or imbalanced data
  • Network throughput
  • Co-tenancy demand (other services or applications using the cluster)
  • Dynamic resource pooling

You may also have to maximize or minimize cluster utilization for your workload or to meet Service Level Agreements (SLAs). To find the best resource configuration for an application, try various container and gateway/client settings and record the results.

For example, the following TeraGen/TeraSort script supports throughput testing with a 10-GB data load and a loop of varying YARN container and gateway/client settings. You can observe which configuration yields the best results.

#!/bin/sh
HADOOP_PATH=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce
for i in 2 4 8 16 32 64        # Number of mapper containers to test
do
    for j in 2 4 8 16 32 64    # Number of reducer containers to test
    do
        for k in 1024 2048     # Container memory (MB) for mappers/reducers to test
        do
            MAP_MB=`echo "($k*0.8)/1" | bc`    # JVM heap size for mappers
            RED_MB=`echo "($k*0.8)/1" | bc`    # JVM heap size for reducers
            hadoop jar $HADOOP_PATH/hadoop-examples.jar teragen \
                -Dmapreduce.job.maps=$i -Dmapreduce.map.memory.mb=$k \
                -Dmapreduce.map.java.opts.max.heap=$MAP_MB 100000000 \
                /results/tg-10GB-${i}-${j}-${k} 1>tera_${i}_${j}_${k}.out 2>tera_${i}_${j}_${k}.err
            hadoop jar $HADOOP_PATH/hadoop-examples.jar terasort \
                -Dmapreduce.job.maps=$i -Dmapreduce.job.reduces=$j -Dmapreduce.map.memory.mb=$k \
                -Dmapreduce.map.java.opts.max.heap=$MAP_MB -Dmapreduce.reduce.memory.mb=$k \
                -Dmapreduce.reduce.java.opts.max.heap=$RED_MB /results/ts-10GB-${i}-${j}-${k} \
                1>>tera_${i}_${j}_${k}.out 2>>tera_${i}_${j}_${k}.err
            hadoop fs -rmr -skipTrash /results/tg-10GB-${i}-${j}-${k}
            hadoop fs -rmr -skipTrash /results/ts-10GB-${i}-${j}-${k}
        done
    done
done
Source: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_yarn_tuning.html
