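Performance optimization: __builtin_expect explained

Reposted from: http://hi.baidu.com/lammy/blog/item/bc5e3d4e869073c3d1c86a89.html

The GTK+ 2.0 source code uses the macros G_LIKELY and G_UNLIKELY in many places. Take the following snippet, for example: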

if (G_LIKELY (acat == 1))       /* allocate through magazine layer */
      {
        ThreadMemory *tmem = thread_memory_from_self();
        guint ix = SLAB_INDEX (allocator, chunk_size);
        if (G_UNLIKELY (thread_memory_magazine1_is_empty (tmem, ix)))
          {
            thread_memory_swap_magazines (tmem, ix);
            if (G_UNLIKELY (thread_memory_magazine1_is_empty (tmem, ix)))
              thread_memory_magazine1_reload (tmem, ix);
          }
        mem = thread_memory_magazine1_alloc (tmem, ix);
      }

In the source, the macros G_LIKELY and G_UNLIKELY are defined like this:

#define G_LIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 1))
  #define G_UNLIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 0))

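The macro _G_BOOLEAN_EXPR converts expr to 0 or 1, that is, false or true. To understand G_LIKELY and G_UNLIKELY you clearly have to understand __builtin_expect. __builtin_expect is a compiler built-in function (not a macro) that GCC introduced around version 2.96. It tells the compiler which value a condition is expected to have, so that the compiler can arrange the generated code to avoid wasting time on jumps. Take the code above: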

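if (G_LIKELY (acat == 1))     // the condition is true most of the time, so execution usually goes straight into the if body

if (G_UNLIKELY (thread_memory_magazine1_is_empty (tmem, ix))) // the condition is false most of the time, so execution usually skips the if body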
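
Why normalize the expression to 0 or 1 first? The following is a minimal sketch, assuming GCC or Clang; the names MY_LIKELY, raw_expect and normalized_expect are hypothetical and do not come from GLib. __builtin_expect returns its first argument unchanged and expects it to equal the second argument, so a raw non-zero value such as 4 would not match an expected value of 1, whereas !!(x) always yields 0 or 1.

```c
/* Hypothetical helpers for illustration only; requires GCC or Clang. */
#define MY_LIKELY(x) __builtin_expect(!!(x), 1)

/* __builtin_expect passes its first argument through unchanged,
   so the raw value comes back as-is. */
long raw_expect(long x)
{
    return __builtin_expect(x, 1);
}

/* The double negation maps any non-zero value to 1 before the hint
   is applied, so the result is always 0 or 1. */
long normalized_expect(long x)
{
    return MY_LIKELY(x);
}
```

With these definitions, raw_expect(4) evaluates to 4 while normalized_expect(4) evaluates to 1, which is the value the hint is compared against.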

If this still seems unclear, the following example should make it obvious:

//test_builtin_expect.c 
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
int test_likely(int x)
{
 if(LIKELY(x))
 {
    x = 5;
 }
 else
 {
    x = 6;
 }
  
 return x;
}
int test_unlikely(int x)
{
 if(UNLIKELY(x))
 {
    x = 5;
 }
 else
 {
    x = 6;
 }
  
 return x;
}
[lammy@localhost test_builtin_expect]$ gcc -fprofile-arcs -O2 -c test_builtin_expect.c 
[lammy@localhost test_builtin_expect]$ objdump -d test_builtin_expect.o
test_builtin_expect.o:       file format elf32-i386
Disassembly of section .text:
00000000 <test_likely>:
     0: 55                      push     %ebp
     1: 89 e5                   mov      %esp,%ebp
     3: 8b 45 08                mov      0x8(%ebp),%eax
     6: 83 05 38 00 00 00 01  addl     $0x1,0x38
     d: 83 15 3c 00 00 00 00  adcl     $0x0,0x3c
  14: 85 c0                   test     %eax,%eax
  16: 74 15                   je       2d // this is the key instruction
  18: 83 05 40 00 00 00 01  addl     $0x1,0x40
  1f: b8 05 00 00 00          mov      $0x5,%eax
  24: 83 15 44 00 00 00 00  adcl     $0x0,0x44
  2b: 5d                      pop      %ebp
  2c: c3                      ret      
  2d: 83 05 48 00 00 00 01  addl     $0x1,0x48
  34: b8 06 00 00 00          mov      $0x6,%eax
  39: 83 15 4c 00 00 00 00  adcl     $0x0,0x4c
  40: 5d                      pop      %ebp
  41: c3                      ret      
  42: 8d b4 26 00 00 00 00  lea      0x0(%esi,%eiz,1),%esi
  49: 8d bc 27 00 00 00 00  lea      0x0(%edi,%eiz,1),%edi
00000050 <test_unlikely>:
  50: 55                      push     %ebp
  51: 89 e5                   mov      %esp,%ebp
  53: 8b 55 08                mov      0x8(%ebp),%edx
  56: 83 05 20 00 00 00 01  addl     $0x1,0x20
  5d: 83 15 24 00 00 00 00  adcl     $0x0,0x24
  64: 85 d2                   test     %edx,%edx
  66: 75 15                   jne      7d // this is the key instruction
  68: 83 05 30 00 00 00 01  addl     $0x1,0x30
  6f: b8 06 00 00 00          mov      $0x6,%eax
  74: 83 15 34 00 00 00 00  adcl     $0x0,0x34
  7b: 5d                      pop      %ebp
  7c: c3                      ret      
  7d: 83 05 28 00 00 00 01  addl     $0x1,0x28
  84: b8 05 00 00 00          mov      $0x5,%eax
  89: 83 15 2c 00 00 00 00  adcl     $0x0,0x2c
  90: 5d                      pop      %ebp
  91: c3                      ret      
  92: 8d b4 26 00 00 00 00  lea      0x0(%esi,%eiz,1),%esi
  99: 8d bc 27 00 00 00 00  lea      0x0(%edi,%eiz,1),%edi
000000a0 <_GLOBAL__I_65535_0_test_likely>:
  a0: 55                      push     %ebp
  a1: 89 e5                   mov      %esp,%ebp
  a3: 83 ec 08                sub      $0x8,%esp
  a6: c7 04 24 00 00 00 00  movl     $0x0,(%esp)
  ad: e8 fc ff ff ff          call     ae <_GLOBAL__I_65535_0_test_likely+0xe>
  b2: c9                      leave  
  b3: c3                      ret      
[lammy@localhost test_builtin_expect]$

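The two functions compile to different jump instructions (je in test_likely, jne in test_unlikely). A closer look shows that in both cases the expected branch is laid out as the fall-through path, so in the common case no jump is taken. __builtin_expect is therefore only an optimization hint to the compiler; it does not change how the condition itself is evaluated.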
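
To make the last point concrete, here is a small sketch, assuming GCC or Clang. The functions pick_plain, pick_likely, pick_unlikely and hints_agree are hypothetical names introduced only for this illustration. It checks that adding either hint leaves the returned value the same as the unhinted version.

```c
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Same logic three times: no hint, "likely" hint, "unlikely" hint. */
int pick_plain(int x)    { if (x) return 5; return 6; }
int pick_likely(int x)   { if (LIKELY(x)) return 5; return 6; }
int pick_unlikely(int x) { if (UNLIKELY(x)) return 5; return 6; }

/* Returns 1 if all three variants agree for every x in [-limit, limit],
   0 otherwise. The hints may change code layout, not results. */
int hints_agree(int limit)
{
    for (int x = -limit; x <= limit; x++) {
        if (pick_plain(x) != pick_likely(x) ||
            pick_plain(x) != pick_unlikely(x))
            return 0;
    }
    return 1;
}
```

A wrong hint can cost performance, but it can never produce a wrong answer.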

This technique is also used heavily in the Linux kernel. There is an English-language article on the subject that is worth reading: http://kernelnewbies.org/FAQ/LikelyUnlikely

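You may have noticed that I generated the assembly with gcc -fprofile-arcs -O2 -c test_builtin_expect.c rather than gcc -O2 -c test_builtin_expect.c. The addl/adcl pairs in the listing are the arc counters that -fprofile-arcs inserts. For details, see http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html.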




FAQ/LikelyUnlikely

likely() and unlikely()

What are they?

In Linux kernel code, one often finds calls to likely() and unlikely() in conditions, like:

bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx);
if (unlikely(!bvl)) {
  mempool_free(bio, bio_pool);
  bio = NULL;
  goto out;
}

In fact, these macros are hints that allow the compiler to optimize the branch correctly, by telling it which outcome is the likeliest one. Their definitions, found in include/linux/compiler.h, are the following:

#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)

The GCC documentation explains the role of __builtin_expect():

 -- Built-in Function: long __builtin_expect (long EXP, long C)
     You may use `__builtin_expect' to provide the compiler with branch
     prediction information.  In general, you should prefer to use
     actual profile feedback for this (`-fprofile-arcs'), as
     programmers are notoriously bad at predicting how their programs
     actually perform.  However, there are applications in which this
     data is hard to collect.

     The return value is the value of EXP, which should be an integral
     expression.  The value of C must be a compile-time constant.  The
     semantics of the built-in are that it is expected that EXP == C.
     For example:

          if (__builtin_expect (x, 0))
            foo ();

     would indicate that we do not expect to call `foo', since we
     expect `x' to be zero.  Since you are limited to integral
     expressions for EXP, you should use constructions such as

          if (__builtin_expect (ptr != NULL, 1))
            error ();

     when testing pointer or floating-point values.
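
As a sketch of the pointer case described in the documentation above (assuming GCC or Clang; read_or_default is a hypothetical helper, not part of any library), the comparison ptr != NULL yields an int that is 0 or 1, so it is a valid integral expression for the first argument:

```c
#include <stddef.h>

/* Hypothetical helper: returns *ptr, or fallback when ptr is NULL.
   The hint says that a non-NULL pointer is the expected case. */
int read_or_default(const int *ptr, int fallback)
{
    /* ptr != NULL is an int (0 or 1), not a pointer, so it can be
       passed to __builtin_expect directly. */
    if (__builtin_expect(ptr != NULL, 1))
        return *ptr;
    return fallback;
}
```

Passing the pointer itself would rely on an implicit pointer-to-integer conversion, which is why the comparison form is the recommended one.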

How does it optimize things?

It optimizes things by ordering the generated assembly code so as to make better use of the processor pipeline. To do so, the compiler arranges the code so that the likeliest branch is executed without taking any jump (a taken jump can have the bad effect of flushing the processor pipeline).

To see how it works, let's compile the following simple C user space program with gcc -O2:

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* atoi */

#define likely(x)    __builtin_expect(!!(x), 1)
#define unlikely(x)  __builtin_expect(!!(x), 0)

int main(int argc, char *argv[])
{
   int a;

   /* Get the value from somewhere GCC can't optimize */
   a = atoi (argv[1]);

   if (unlikely (a == 2))
      a++;
   else
      a--;

   printf ("%d\n", a);

   return 0;
}

Now, disassemble the resulting binary using objdump -S (comments added by me):

080483b0 <main>:
             // Prologue
 80483b0:       55                      push   %ebp
 80483b1:       89 e5                   mov    %esp,%ebp
 80483b3:       50                      push   %eax
 80483b4:       50                      push   %eax
 80483b5:       83 e4 f0                and    $0xfffffff0,%esp
             // Call atoi()
 80483b8:       8b 45 08                mov    0x8(%ebp),%eax
 80483bb:       83 ec 1c                sub    $0x1c,%esp
 80483be:       8b 48 04                mov    0x4(%eax),%ecx
 80483c1:       51                      push   %ecx
 80483c2:       e8 1d ff ff ff          call   80482e4
 80483c7:       83 c4 10                add    $0x10,%esp
             // Test the value
 80483ca:       83 f8 02                cmp    $0x2,%eax
             // --------------------------------------------------------
             // If 'a' equal to 2 (which is unlikely), then jump,
             // otherwise continue directly, without jump, so that it
             // doesn't flush the pipeline.
             // --------------------------------------------------------
 80483cd:       74 12                   je     80483e1
 80483cf:       48                      dec    %eax
             // Call printf
 80483d0:       52                      push   %edx
 80483d1:       52                      push   %edx
 80483d2:       50                      push   %eax
 80483d3:       68 c8 84 04 08          push   $0x80484c8
 80483d8:       e8 f7 fe ff ff          call   80482d4
             // Return 0 and go out.
 80483dd:       31 c0                   xor    %eax,%eax
 80483df:       c9                      leave
 80483e0:       c3                      ret

Now, in the previous program, replace the unlikely() by a likely(), recompile it, and disassemble it again (again, comments added by me):

080483b0 <main>:
             // Prologue
 80483b0:       55                      push   %ebp
 80483b1:       89 e5                   mov    %esp,%ebp
 80483b3:       50                      push   %eax
 80483b4:       50                      push   %eax
 80483b5:       83 e4 f0                and    $0xfffffff0,%esp
             // Call atoi()
 80483b8:       8b 45 08                mov    0x8(%ebp),%eax
 80483bb:       83 ec 1c                sub    $0x1c,%esp
 80483be:       8b 48 04                mov    0x4(%eax),%ecx
 80483c1:       51                      push   %ecx
 80483c2:       e8 1d ff ff ff          call   80482e4
 80483c7:       83 c4 10                add    $0x10,%esp
             // --------------------------------------------------
             // If 'a' equals 2 (which is likely), we will continue
             // without branching, so without flushing the pipeline. The
             // jump only occurs when a != 2, which is unlikely.
             // ---------------------------------------------------
 80483ca:       83 f8 02                cmp    $0x2,%eax
 80483cd:       75 13                   jne    80483e2
             // Here the a++ incrementation has been optimized by gcc
 80483cf:       b0 03                   mov    $0x3,%al
             // Call printf()
 80483d1:       52                      push   %edx
 80483d2:       52                      push   %edx
 80483d3:       50                      push   %eax
 80483d4:       68 c8 84 04 08          push   $0x80484c8
 80483d9:       e8 f6 fe ff ff          call   80482d4
             // Return 0 and go out.
 80483de:       31 c0                   xor    %eax,%eax
 80483e0:       c9                      leave
 80483e1:       c3                      ret

How should I use it?

You should use it only in cases when the likeliest branch is very very very likely, or when the unlikeliest branch is very very very unlikely.
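
A typical case that fits this advice is a rare error check inside a hot loop. The sketch below is only an illustration under stated assumptions: sum_samples is a hypothetical function, and the fallback definitions for compilers other than GCC and Clang are my own choice, not something taken from the kernel or GLib headers.

```c
#include <stddef.h>

/* Use the built-in where available; otherwise fall back to a plain
   boolean test so the code still compiles (assumed fallback). */
#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (!!(x))
#  define unlikely(x) (!!(x))
#endif

/* Hypothetical example: sums n samples that are expected to be
   non-negative. A negative sample is treated as a very rare error.
   Returns 0 and stores the sum in *out on success, -1 on error. */
int sum_samples(const int *samples, size_t n, long *out)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        if (unlikely(samples[i] < 0))
            return -1;          /* cold path: almost never taken */
        total += samples[i];    /* hot path: stays on the fall-through */
    }

    *out = total;
    return 0;
}
```

If negative samples turned out to be common in practice, the hint would be wrong and should simply be removed.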

