Apache Hadoop Wins Terabyte Sort Benchmark

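A 1-terabyte sort finished in 209 seconds, breaking the previous record of 297 seconds.

The input was 10 billion 100-byte records.

Yahoo runs Hadoop clusters totaling more than 13,000 nodes.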

 

One of Yahoo's Hadoop clusters sorted 1 terabyte of data in 209 seconds, beating the previous record of 297 seconds in the annual general purpose (Daytona) terabyte sort benchmark. The sort benchmark, which was created in 1998 by Jim Gray, specifies the input data (10 billion 100-byte records), which must be completely sorted and written to disk. This is the first time that either a Java or an open source program has won. Yahoo is both the largest user of Hadoop, with 13,000+ nodes running hundreds of thousands of jobs a month, and the largest contributor, although non-Yahoo usage and contributions are increasing rapidly.
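To put the headline numbers in perspective, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted above (10 billion 100-byte records, 209 seconds, 297 seconds); the function names are illustrative, and "GB" is taken to mean 10^9 bytes, which is an assumption.

```python
# Figures quoted in the announcement.
RECORDS = 10_000_000_000   # 10 billion records
RECORD_BYTES = 100         # bytes per record
NEW_SECONDS = 209          # winning run
OLD_SECONDS = 297          # previous record


def dataset_bytes(records=RECORDS, record_bytes=RECORD_BYTES):
    """Total input size in bytes."""
    return records * record_bytes


def throughput_gb_per_s(total_bytes, seconds):
    """Aggregate sort rate in GB/s (assumes 1 GB = 10**9 bytes)."""
    return total_bytes / seconds / 1e9


def speedup(old_seconds=OLD_SECONDS, new_seconds=NEW_SECONDS):
    """How many times faster the new run is than the old record."""
    return old_seconds / new_seconds


if __name__ == "__main__":
    total = dataset_bytes()
    print(f"input size: {total / 1e12:.1f} TB")
    print(f"aggregate rate: {throughput_gb_per_s(total, NEW_SECONDS):.2f} GB/s")
    print(f"speedup over previous record: {speedup():.2f}x")
```

By this estimate the cluster sorted roughly 4.8 GB of data per second end to end, about 1.4 times faster than the previous record.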

The cluster statistics were:

  • 910 nodes
  • 2 quad-core Xeons @ 2.0 GHz per node
  • 4 SATA disks per node
  • 8 GB RAM per node
  • 1 gigabit ethernet on each node
  • 40 nodes per rack
  • 8 gigabit ethernet uplinks from each rack to the core
  • Red Hat Enterprise Linux Server Release 5.1 (kernel 2.6.18)
  • Sun Java JDK 1.6.0_05-b13
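The per-node figures in the list can be rolled up into cluster-wide totals. The sketch below is illustrative arithmetic only. It assumes that racks are filled to 40 nodes (so the rack count is a lower bound) and that "8 gigabit ethernet uplinks" means 8 Gb/s of uplink capacity per rack; neither assumption is confirmed by the announcement.

```python
import math

# Per-node and per-rack figures from the list above.
NODES = 910
CORES_PER_NODE = 2 * 4      # two quad-core Xeons
DISKS_PER_NODE = 4
RAM_GB_PER_NODE = 8
NODES_PER_RACK = 40
NODE_LINK_GBPS = 1          # 1 gigabit ethernet per node
RACK_UPLINK_GBPS = 8        # assumption: 8 Gb/s total uplink per rack


def cluster_totals():
    """Aggregate resources across the whole cluster."""
    return {
        "cores": NODES * CORES_PER_NODE,
        "disks": NODES * DISKS_PER_NODE,
        "ram_gb": NODES * RAM_GB_PER_NODE,
        # Assumes racks are filled to capacity, so this is a minimum.
        "racks": math.ceil(NODES / NODES_PER_RACK),
    }


def rack_oversubscription():
    """Ratio of in-rack node bandwidth to rack uplink bandwidth."""
    return NODES_PER_RACK * NODE_LINK_GBPS / RACK_UPLINK_GBPS


if __name__ == "__main__":
    print(cluster_totals())
    print(f"rack oversubscription: {rack_oversubscription():.0f}:1")
```

Under these assumptions the cluster has 7,280 cores, 3,640 disks, and 7,280 GB of RAM across at least 23 racks, with a 5:1 oversubscription on traffic leaving a full rack.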

The benchmark was run with Hadoop trunk (pre-0.18) with a couple of optimization patches to remove intermediate writes to disk. The sort used 1800 maps and 1800 reduces and allocated enough memory to buffers to hold the intermediate data in memory. All of the code for the benchmark has been checked in as a Hadoop example.
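A rough estimate shows why holding the intermediate data in memory is plausible on this hardware. The sketch below assumes the data is spread evenly across tasks and nodes, that the intermediate (map output) data is about the same size as the input, and that 8 GB of RAM means 8 × 10^9 bytes. These are simplifying assumptions for illustration, not details taken from the benchmark code.

```python
# Figures from the announcement.
TOTAL_BYTES = 10_000_000_000 * 100   # 10 billion 100-byte records
NODES = 910
MAPS = 1800
REDUCES = 1800
RAM_BYTES_PER_NODE = 8 * 10**9       # assumption: 8 GB = 8e9 bytes


def bytes_per_task(total_bytes, tasks):
    """Average data handled by one task, assuming an even split."""
    return total_bytes / tasks


def tasks_per_node(tasks, nodes=NODES):
    """Average number of tasks of one kind placed on each node."""
    return tasks / nodes


def intermediate_fraction_of_ram():
    """Share of a node's RAM needed to hold its slice of the data.

    Assumes intermediate data is the same size as the input and is
    spread evenly over all nodes.
    """
    return (TOTAL_BYTES / NODES) / RAM_BYTES_PER_NODE


if __name__ == "__main__":
    print(f"data per map: {bytes_per_task(TOTAL_BYTES, MAPS) / 1e6:.0f} MB")
    print(f"data per reduce: {bytes_per_task(TOTAL_BYTES, REDUCES) / 1e6:.0f} MB")
    print(f"maps per node: {tasks_per_node(MAPS):.2f}")
    print(f"RAM share per node: {intermediate_fraction_of_ram():.1%}")
```

With these assumptions each map and each reduce handles about 556 MB, each node runs about two tasks of each kind, and a node's share of the data (about 1.1 GB) takes up roughly 14% of its 8 GB of RAM.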
