Bigdata - Recommended Hardware Configuration for Cloudera CDH5 Production Environments
  • Master Node Hardware Recommendations
  • Typical configurations for worker nodes
    • Worker Nodes—CPU
    • Worker Nodes—RAM
    • Worker Nodes—Disk

First, let's look at the officially recommended hardware configuration:

Master Node Hardware Recommendations

▪ Carrier-class hardware
▪ Dual power supplies
▪ Dual Ethernet cards
  ─ Bonded to provide failover
▪ RAIDed hard drives
▪ Reasonable amount of RAM
  ─ 128GB recommended

Typical configurations for worker nodes

▪ Midline: deep storage, 1Gb Ethernet
  ─ 16 x 3TB SATA II hard drives, in a non-RAID, JBOD configuration
  ─ 1 or 2 of the 16 drives for the OS, with RAID-1 mirroring
  ─ 2 x 8-core 3.0GHz CPUs, 15MB cache
  ─ 256GB RAM
  ─ 2 x 1 Gigabit Ethernet
▪ High-end: high memory, spindle dense, 10Gb Ethernet
  ─ 24 x 1TB Nearline/MDL SAS hard drives, in a non-RAID, JBOD configuration
  ─ 2 x 8-core 3.0GHz CPUs, 15MB cache
  ─ 512GB RAM (or more)
  ─ 1 x 10 Gigabit Ethernet
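
As a quick way to compare the two reference configurations, the short Python sketch below simply totals their raw storage and spindle counts using the numbers listed above; it makes no claim about real-world throughput.

```python
# Raw storage and spindle count for the two reference worker configurations above.
configs = {
    "Midline (deep storage)":   {"disks": 16, "disk_tb": 3},  # 16 x 3TB SATA II
    "High-end (spindle dense)": {"disks": 24, "disk_tb": 1},  # 24 x 1TB Nearline/MDL SAS
}

for name, spec in configs.items():
    raw_tb = spec["disks"] * spec["disk_tb"]
    print(f"{name}: {spec['disks']} spindles, {raw_tb} TB raw capacity")
```

The midline box ends up with more raw capacity per node, while the high-end box spreads I/O across more spindles.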

Worker Nodes—CPU

▪ Hex- and octo-core CPUs are commonly available
▪ Hyper-threading and quick-path interconnect (QPI) should be enabled
▪ Hadoop nodes are typically disk- and network-I/O bound
  ─ Therefore, top-of-the-range CPUs are usually not necessary
▪ Some types of Hadoop jobs do make heavy use of CPU resources
  ─ Clustering and classification
  ─ Complex text mining
  ─ Natural language processing
  ─ Feature extraction
  ─ Image manipulation
▪ You might need more processing power on your worker nodes if your specific
  workload requires it

Worker Nodes—RAM

▪ Worker node configuration specifies the amount of memory and number of
  cores that map tasks, reduce tasks, and ApplicationMasters can use on that node
▪ Each map and reduce task typically takes 2GB to 4GB of RAM
▪ Each ApplicationMaster typically takes 1GB of RAM
▪ Worker nodes should not be using virtual memory
▪ Ensure you have enough RAM to run all tasks, plus overhead for the DataNode
  and NodeManager daemons, plus the operating system
▪ Rule of thumb: total number of tasks = number of physical processor cores
  minus one (see the sizing sketch after this list)
  ─ This is a starting point, and should not be taken as a definitive setting for all
    clusters
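
To make the rule of thumb concrete, here is a minimal back-of-the-envelope sizing sketch in Python. The input values (16 physical cores, 256GB RAM, 16GB reserved for the OS and the DataNode/NodeManager daemons, 4GB per task) are illustrative assumptions, not figures from the original recommendations; the container RAM figure roughly corresponds to what you would put into a per-node YARN memory setting such as yarn.nodemanager.resource.memory-mb.

```python
# Back-of-the-envelope YARN memory sizing for one worker node.
# All input values below are illustrative assumptions.
PHYSICAL_CORES = 16              # e.g. 2 x 8-core CPUs
TOTAL_RAM_GB = 256               # total RAM on the worker node
OS_AND_DAEMON_OVERHEAD_GB = 16   # OS + DataNode + NodeManager overhead (assumed)
RAM_PER_TASK_GB = 4              # each map/reduce task typically takes 2-4 GB
RAM_PER_AM_GB = 1                # each ApplicationMaster typically takes 1 GB

# Rule of thumb from the text: total number of tasks = physical cores minus one
max_tasks = PHYSICAL_CORES - 1

# RAM left for YARN containers after OS and daemon overhead
container_ram_gb = TOTAL_RAM_GB - OS_AND_DAEMON_OVERHEAD_GB

# RAM actually needed if every task slot is busy, plus a couple of ApplicationMasters
needed_ram_gb = max_tasks * RAM_PER_TASK_GB + 2 * RAM_PER_AM_GB

print(f"Concurrent tasks (cores - 1):  {max_tasks}")
print(f"RAM available for containers:  {container_ram_gb} GB")
print(f"RAM needed at full task load:  {needed_ram_gb} GB")
print("OK" if needed_ram_gb <= container_ram_gb else "Not enough RAM for this many tasks")
```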

▪ New, memory-intensive processing frameworks are being deployed on many
  Hadoop clusters
  ─ Impala
  ─ Spark
▪ HDFS caching can also take advantage of extra RAM on worker nodes
▪ It is good practice to equip your worker nodes with as much RAM as you can
  ─ Memory configurations up to 512GB per worker node are not unusual for
    workloads with high memory requirements

Worker Nodes—Disk

▪ Hadoop’s architecture impacts disk space requirements
  ─ By default, HDFS data is replicated three times
  ─ Temporary data storage typically requires 20-30 percent of a cluster’s raw
    disk capacity (see the capacity sketch after this list)
▪ In general, more spindles (disks) is better
  ─ In practice, we see anywhere from four to 24 disks (or even more) per node
  ─ 8 x 1.5TB drives is likely to be better than 6 x 2TB drives, because different
    tasks are more likely to be accessing different disks
▪ Use 3.5 inch disks
  ─ Faster, cheaper, higher capacity than 2.5 inch disks
▪ 7,200 RPM SATA/SATA II drives are fine
  ─ No need to buy 15,000 RPM drives
▪ A good practical maximum is 36TB per worker node
  ─ More than that will result in massive network traffic if a node dies and block
    re-replication must take place
▪ Recommendation: dedicate 1 disk for OS and logs, use the other disks for
  Hadoop data
▪ Mechanical hard drives currently provide a significantly better cost/performance
  ratio than solid-state drives (SSDs)
▪ For hybrid clusters (both SSDs and HDDs), using SSDs for non-compressed
  intermediate shuffle data leads to significant performance gains
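
As a worked example of how replication and temporary-data overhead eat into raw capacity, the Python sketch below estimates the usable HDFS capacity of one worker node. The disk count and size are illustrative assumptions; the 3x replication factor and the 20-30 percent temporary-data reserve come from the bullets above.

```python
# Rough usable-capacity estimate for one worker node's HDFS storage.
# Disk count and size are illustrative assumptions.
DATA_DISKS = 12          # disks dedicated to Hadoop data (1 disk kept for OS/logs)
DISK_SIZE_TB = 3         # per-disk capacity
REPLICATION_FACTOR = 3   # HDFS default replication
TEMP_FRACTION = 0.25     # 20-30% of raw capacity reserved for temporary data

raw_tb = DATA_DISKS * DISK_SIZE_TB
temp_tb = raw_tb * TEMP_FRACTION
available_for_hdfs_tb = raw_tb - temp_tb
# With 3x replication, the logical (pre-replication) capacity is a third of that
logical_tb = available_for_hdfs_tb / REPLICATION_FACTOR

print(f"Raw capacity per node:        {raw_tb:.1f} TB")
print(f"Reserved for temporary data:  {temp_tb:.1f} TB")
print(f"Available to HDFS (raw):      {available_for_hdfs_tb:.1f} TB")
print(f"Logical HDFS capacity:        {logical_tb:.1f} TB")
```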
