初次翻译,如有不通顺的地方,请多多包涵。
我们被问过很多关于怎样选择Apache Hadoop工作节点硬件的问题。我在yahoo这段期间,我们买了很多机器(6*2TB SATA 硬盘、24G内存、8核CPU和双网卡配置)。这已经证明是最好的配置。今天我还看到的系统配置是(12*2TB SATA 硬盘,48G的内存、8核CPU和双网卡配置。我们在今年不久将增加到3TB的硬盘。
What configuration makes sense for any given organization is driven by such ratios as the storage-to-compute ratio of your workload and other factors that cannot be answered in a generic way. Further, the hardware industry moves quickly. In this post I’ll try to outline the principles that have generally guided Hadoop hardware configuration selections over the last six years. All of these thoughts are aimed at designing medium to large Apache Hadoop clusters. Scott Carey made a good case for smaller machines for small clusters the other day on the Apache mailing list.
The key for Hadoop clusters is to buy quality commodity equipment. Most Hadoop purchasers are cost conscious and as your clusters grow, their cost can be significant. When thinking about cost, one needs to think about the whole system, including network, power and the extra components included in many high-end systems. Remember that Hadoop is built to handle component failure well and to scale out on low cost gear. RAID cards, redundant power supplies and other per-component reliability features are not needed. Buy error-correcting RAM and SATA drives with good MTBF numbers. Good RAM allows you to trust the quality of your computations. Hard drives are the largest source of failures, so buy decent ones.
On CPU: It helps to understand your workload, but for most systems I recommend sticking with medium clock speeds and no more than 2 sockets. Both your upfront costs and power costs rise quickly on the high-end. For many workloads, the extra performance per node is not cost-effective.
On Power: Power is a major concern when designing Hadoop clusters. It is worth understanding how much power the systems you are buying use and not buying the biggest and fastest nodes on the market. In years past we saw huge savings in pricing and significant power savings by avoiding the fastest CPUs, not buying redundant power supplies, etc. Nowadays, vendors are building machines for cloud data centers that are designed to reduce cost and power and that exclude a lot of the niceties that bulk up traditional servers. Supermicro, Dell and HP all have such product lines for cloud providers, so if you are buying in large volume, it is worth looking for stripped-down cloud servers.
On RAM: What you need to consider is the amount of RAM needed to keep the processors busy and where the knee in the cost curve resides. Right now 48GB seems like a pretty good number. You can get this much RAM at commodity prices on low-end server motherboards. This is enough to provide the Hadoop framework with lots of RAM (~4 GB) and still have plenty to run many processes. Don’t worry too much about RAM, you’ll find a use for it, often running more processes in parallel. If you don’t, the system will still use it to good effect, caching disk data and improving performance.
On Disk:注意寻求购买大容量的SATA驱动器,通常7200转。hadoop是很吃存储和效率探索,但是它不一定需要快速的,昂贵的硬盘驱动器。记住12-驱动系统你通常要获得一个节点24TB或36TB。直到最近,把大容量的存储放在一个节点上面是不切合实际的,因为在庞大的集群中,磁盘故障会经常发生,复制24+TB能把网络陷入到泥潭中,这样会中断正常工作并且导致job丢失SLAs。在最近的一个hadoop版本0.20.204中设计了更优美处理磁盘错误,允许机器在剩余的磁盘里继续提供服务。这些变化,我们期望在更多12+驱动系统。总之,为存储增加磁盘,而不是寻求,如果你的工作量不需要这么大存储,合理的节约方法就是每个盒拆除6个磁盘或者4个磁盘。
On Disk: Look to buy high-capacity SATA drives, usually 7200RPM. Hadoop is storage hungry and seek efficient but it does not require fast, expensive hard drives. Keep in mind that with 12-drive systems you are generally getting 24 or 36 TB/node. Until recently, putting this much storage in a node was not practical because, in large clusters, disk failures are a regular occurrence and replicating 24+TB could swamp the network for long enough to really disrupt work and cause jobs to miss SLAs. The most recent release of Hadoop 0.20.204 is engineered to handle the failure of drives more elegantly, allowing machines to continue serving from their remaining drives. With these changes, we expect to see a lot of 12+ drive systems. In general, add disks for storage and not seeks. If your workload does not require huge amounts of storage, dropping disk count to 6 or 4 per box is a reasonable way to economize.
On Network:这是最难判断的因素。hadoop工作量变化很大,关键是在合理的速度和合理的价格之间,购买足够的网络容量允许你的集群中所有的节点进行通信。对于比较小的集群,我建议带宽至少1GB,你的全部节点都连接到一个很好的交换机上就很容易取得。在略大的集群里这仍然是个好的目标,虽然是基于工作负载你可能会下降。在非常大的数据中心yahoo在建的时候,他们看到2*10GB 每20个节点在一个机架上连接一对中心交换机。机架上的节点连接了2个1GB的连接。根据经验来看,
On Network: This is the hardest variable to nail down. Hadoop workloads vary a lot. The key is to buy enough network capacity to allow all nodes in your cluster to communicate with each other at reasonable speeds and for reasonable cost. For smaller clusters, I’d recommend at least 1GB all-to-all bandwidth, which is easily achieved by just connecting all of your nodes to a good switch. With larger clusters this is still a good target although based on workload you can probably go lower. In the very large data centers the Yahoo! built, they are seeing 2*10GB per 20 node rack going up to a pair of central switches, with rack nodes connected with two 1GB links. As a rule of thumb, watch the ratio of network-to-computer cost and aim for network cost being somewhere around 20% of your total cost. Network costs should include your complete network, core switches, rack switches, any network cards needed, etc. We’ve been seeing InfiniBand and 10GB Ethernet networks to the node now. If you can build this cost effectively, that’s great. However, keep in mind that Hadoop grew up with commodity Ethernet, so understand your workload requirements before spending too much on the network.
I hope you find this outline of how we think about Apache Hadoop hardware helpful.