Fast CRC32 in Hadoop v2 (YARN)

Test source code:

The following code is adapted from PureJavaCrc32C.java in Hadoop 2.0.3; its Javadoc reads:

A pure-java implementation of the CRC32 checksum that uses
the CRC32-C polynomial, the same polynomial used by iSCSI
and implemented on many Intel chipsets supporting SSE4.2.

#include "Crc32Table.h"  /* defines T[] and the T8_x_start offsets */
#include <stdint.h>      /* uint32_t (the other header names were lost in extraction) */

#define TEST_SIZE (64<<10)
#define BLOCK_SIZE 512
/* Pure Crc32 */
uint32_t crc32(char b[], int off, int len) {
    static uint32_t crc = 0xffffffff;  /* running crc, carried across calls */
    uint32_t localCrc = crc;

    while(len > 7) {
        const int c0 =(b[off+0] ^ localCrc) & 0xff;
        const int c1 =(b[off+1] ^ (localCrc >>= 8)) & 0xff;
        const int c2 =(b[off+2] ^ (localCrc >>= 8)) & 0xff;
        const int c3 =(b[off+3] ^ (localCrc >>= 8)) & 0xff;
        localCrc = (T[T8_7_start + c0] ^ T[T8_6_start + c1]) ^ (T[T8_5_start + c2] ^ T[T8_4_start + c3]);

        const int c4 = b[off+4] & 0xff;
        const int c5 = b[off+5] & 0xff;
        const int c6 = b[off+6] & 0xff;
        const int c7 = b[off+7] & 0xff;

        localCrc ^= (T[T8_3_start + c4] ^ T[T8_2_start + c5]) ^ (T[T8_1_start + c6] ^ T[T8_0_start + c7]);

        off += 8;
        len -= 8;
    }

    /* loop unroll - duff's device style */
    switch(len) {
    case 7: localCrc = (localCrc >> 8) ^ T[T8_0_start + ((localCrc ^ b[off++]) & 0xff)];
    case 6: localCrc = (localCrc >> 8) ^ T[T8_0_start + ((localCrc ^ b[off++]) & 0xff)];
    case 5: localCrc = (localCrc >> 8) ^ T[T8_0_start + ((localCrc ^ b[off++]) & 0xff)];
    case 4: localCrc = (localCrc >> 8) ^ T[T8_0_start + ((localCrc ^ b[off++]) & 0xff)];
    case 3: localCrc = (localCrc >> 8) ^ T[T8_0_start + ((localCrc ^ b[off++]) & 0xff)]; 
    case 2: localCrc = (localCrc >> 8) ^ T[T8_0_start + ((localCrc ^ b[off++]) & 0xff)];
    case 1: localCrc = (localCrc >> 8) ^ T[T8_0_start + ((localCrc ^ b[off++]) & 0xff)];
    default:
        /* nothing */;
    }

    // Publish crc out to object
    return crc = localCrc;
}

The following code uses the iSCSI (SSE4.2) crc32 instruction directly:

On Win64:

uint32_t crc32(char *src, uint32_t len) {
    static uint32_t re = 0xffffffff;
    uint64_t tmp;

    for (int i = 0 ; i < len ; i += 8) {
      tmp = *(uint64_t*)(src + i);
      re = _mm_crc32_u64(re, tmp);
    }
    return re;
}

Disassembly:   re = _mm_crc32_u64(re, tmp);

000000013FE51813  mov         eax,dword ptr [re (13FE5B000h)]  
000000013FE51819  crc32       rax,qword ptr [rsp]  
000000013FE51820  mov         dword ptr [re (13FE5B000h)],eax

On Ubuntu:

uint32_t getcrc32(char *data, int len) {
  static uint32_t crc = 0xffffffff;  /* running crc; unsigned avoids sign-extension surprises */
  uint64_t tmp;
  for (int i = 0 ; i < len; i += 8) {
    tmp = *(uint64_t*)(data + i);
    crc = __builtin_ia32_crc32di(crc, tmp);
  }
  return crc;
}

Test results:

[Figure 1: benchmark results]

*512B means computing a crc32 once per 512 B block, 64 KB / 512 B = 128 times in total; 64KB means computing one crc32 over the whole 64 KB at once.

*Jni + Fast Crc32 means calling the C fast crc32 through JNI; likewise for Jni + Pure Crc32.

Test environment:

Input size: 64 KB        JDK: 1.7.0

CPU: Intel i5-2450M      Memory: 6 GB

Testing in Hadoop:

The data above show that Jni + Fast Crc32 is far faster than Pure Java Crc32. However, in a 4-node Hadoop cluster test (benchmark: terasort) on 16 GB of data, Jni + Fast Crc32 turned out to be much slower than Pure Java Crc32, so a second test was run:
[Figure 2: benchmark results for the second test]
(*) "10B and 90B" means computing one crc32 over a 10 B input, then one over a 90 B input; the two inputs together count as one test run, with 640 runs in total. This is much closer to how Hadoop actually feeds data into its crc32 computation.

In this scenario Jni + Fast Crc32 does indeed fall behind Pure Java Crc32: with inputs this small, the fixed cost of each JNI call outweighs the speed of the crc32 instruction.

References:

Intel papers on fast CRC computation:
http://www.intel.cn/content/www/cn/zh/intelligent-systems/intel-technology/fast-crc-computation-paper.html
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf
My CRC32 appendix:
http://blog.csdn.net/edwardvsnc/article/details/8939077
