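A while ago, while reading the source of Thunar (the Xfce file manager), I ran into the G_LIKELY and G_UNLIKELY macros. I had a rough idea of what they mean, and assumed they work the same way as likely and unlikely in the Linux kernel, but I still want to summarize them here.
The definitions of G_LIKELY and G_UNLIKELY can be found in the GLib source (glib/gmacros.h):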
#define _G_BOOLEAN_EXPR(expr)                   \
 G_GNUC_EXTENSION ({                            \
   int _g_boolean_var_;                         \
   if (expr)                                    \
      _g_boolean_var_ = 1;                      \
   else                                         \
      _g_boolean_var_ = 0;                      \
   _g_boolean_var_;                             \
})
#define G_LIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 1))
#define G_UNLIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 0))
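As a rough illustration of how such macros are typically defined and used outside GLib, here is a minimal sketch. The names MY_LIKELY, MY_UNLIKELY and safe_length are hypothetical and do not come from GLib; the `!!(expr)` normalization is the form the Linux kernel uses, whereas GLib uses the statement expression shown above. The fallback branch assumes a compiler without __builtin_expect, where the hint is simply dropped.

```c
#include <stddef.h>

/* Hypothetical names for illustration; not part of GLib. */
#if defined(__GNUC__)
/* !!(expr) normalizes any non-zero value to 1 before the hint is applied. */
#define MY_LIKELY(expr)   (__builtin_expect(!!(expr), 1))
#define MY_UNLIKELY(expr) (__builtin_expect(!!(expr), 0))
#else
/* No built-in available: keep only the truth value, drop the hint. */
#define MY_LIKELY(expr)   (!!(expr))
#define MY_UNLIKELY(expr) (!!(expr))
#endif

/* Returns the length of s, or 0 for a NULL pointer (assumed to be rare). */
size_t safe_length(const char *s)
{
    size_t n = 0;

    if (MY_UNLIKELY(s == NULL))
        return 0;

    /* Most iterations see a non-terminator character. */
    while (MY_LIKELY(s[n] != '\0'))
        n++;

    return n;
}
```

The macros only change how the compiler may lay out the code; safe_length returns the same results with or without them.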
The implementation relies on GCC's built-in function __builtin_expect. To understand what this built-in does, here is its description from the GCC manual:
long __builtin_expect (long exp, long c) [Built-in Function]
You may use __builtin_expect to provide the compiler with branch prediction
information. In general, you should prefer to use actual profile feedback for this
(‘-fprofile-arcs’), as programmers are notoriously bad at predicting how their
programs actually perform. However, there are applications in which this data is
hard to collect.
The return value is the value of exp, which should be an integral expression. The
semantics of the built-in are that it is expected that exp == c. For example:
if (__builtin_expect (x, 0))
foo ();
would indicate that we do not expect to call foo, since we expect x to be zero. Since
you are limited to integral expressions for exp, you should use constructions such as
if (__builtin_expect (ptr != NULL, 1))
foo (*ptr);
when testing pointer or floating-point values.
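From this description, __builtin_expect is a way to give the compiler branch prediction information: we tell it in advance which outcome of a condition is the common one and which is the rare one, so that it can optimize for the common case. The compiler tries to make the likely path as short as possible (typically the straight-line, fall-through path), which reduces the machine cycles spent on that branch.
The description also says that the return value of __builtin_expect is simply the value of exp, so if (__builtin_expect (x, 0)) and if (__builtin_expect (x, 1)) take exactly the same branch as if (x).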
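The point that the hint never changes program behavior can be checked with a small sketch, assuming GCC or Clang. The function names below are hypothetical and exist only for this illustration.

```c
/* Hypothetical helpers; each returns 1 if the if-branch is taken, else 0. */

int taken_plain(long x)
{
    if (x)
        return 1;
    return 0;
}

int taken_with_likely_hint(long x)
{
    if (__builtin_expect(x, 1))   /* hint: we expect x == 1 */
        return 1;
    return 0;
}

int taken_with_unlikely_hint(long x)
{
    if (__builtin_expect(x, 0))   /* hint: we expect x == 0 */
        return 1;
    return 0;
}

/* The built-in evaluates to exp itself, whatever the hint says. */
long hinted_value(long x)
{
    return __builtin_expect(x, 0);
}
```

For any x, the three taken_* functions should agree, and hinted_value(x) should return x unchanged, even when the hint is wrong.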
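The only purpose of if (__builtin_expect (x, 0)) is to tell the compiler that we expect x to be 0. In other words, we do not expect the statement right after the if to run; we expect the statement after the else to run, so the compiler moves the else statement to the position directly following the conditional test. Likewise, if (__builtin_expect (x, 1)) tells the compiler that we expect x to be 1, that is, we expect the statement right after the if to run. Because the hint is an exact comparison exp == c, GLib first normalizes the condition to 0 or 1 with _G_BOOLEAN_EXPR.
To sum up: __builtin_expect lets the compiler place the statements we expect to run immediately after the branch test.
Here is an example: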
#include <stdio.h>

int main (int argc, char **argv)
{
  int a = argc;

  /* Hint: we expect a == 0, so the else branch is the expected one. */
  if (__builtin_expect (a, 0))
    {
      a = 0x5a;
    }
  else
    {
      a = 0xaa;
    }

  printf ("a = 0x%x\n", a);

  return 0;
}
Compile it with:
gcc -fprofile-arcs -O2 test_expect.c -o test_expect
Here is the disassembly of main from this build:
173 08048870 <main>:
174 8048870: 55 push %ebp
175 8048871: 89 e5 mov %esp,%ebp
176 8048873: 83 e4 f0 and $0xfffffff0,%esp
177 8048876: 83 ec 10 sub $0x10,%esp
178 8048879: 8b 45 08 mov 0x8(%ebp),%eax
179 804887c: 83 05 e8 c0 04 08 01 addl $0x1,0x804c0e8
180 8048883: 83 15 ec c0 04 08 00 adcl $0x0,0x804c0ec
181 804888a: 85 c0 test %eax,%eax
182 804888c: 75 3d jne 80488cb
183 804888e: 83 05 f0 c0 04 08 01 addl $0x1,0x804c0f0
184 8048895: b8 aa 00 00 00 mov $0xaa,%eax
185 804889a: 83 15 f4 c0 04 08 00 adcl $0x0,0x804c0f4
186 80488a1: 89 44 24 08 mov %eax,0x8(%esp)
187 80488a5: c7 44 24 04 f0 a0 04 movl $0x804a0f0,0x4(%esp)
188 80488ac: 08
189 80488ad: c7 04 24 01 00 00 00 movl $0x1,(%esp)
190 80488b4: e8 67 ff ff ff call 8048820 <__printf_chk@plt>
191 80488b9: 83 05 f8 c0 04 08 01 addl $0x1,0x804c0f8
192 80488c0: 83 15 fc c0 04 08 00 adcl $0x0,0x804c0fc
193 80488c7: 31 c0 xor %eax,%eax
194 80488c9: c9 leave
195 80488ca: c3 ret
196 80488cb: b8 5a 00 00 00 mov $0x5a,%eax
197 80488d0: eb cf jmp 80488a1
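Note lines 184 and 196, where the effect of the branch optimization is visible: the else branch (a = 0xaa), which we marked as the expected one, falls straight through after the test at line 181, while the unexpected branch (a = 0x5a) has been moved out of line to the end of the function at line 196 and is reached through the jne at line 182. To get this effect, compile with -fprofile-arcs -O2. The addl/adcl pairs in the listing are the arc counters inserted by the -fprofile-arcs instrumentation; the block placement itself is an optimization, so it should mainly depend on -O2.
For comparison, here is the disassembly when the program is compiled without -fprofile-arcs -O2. The two branches stay in source order: the if body at line 133 directly follows the test, and the je at line 132 jumps to the else body at line 136.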
123 080483e4 <main>:
124 80483e4: 55 push %ebp
125 80483e5: 89 e5 mov %esp,%ebp
126 80483e7: 83 e4 f0 and $0xfffffff0,%esp
127 80483ea: 83 ec 20 sub $0x20,%esp
128 80483ed: 8b 45 08 mov 0x8(%ebp),%eax
129 80483f0: 89 44 24 1c mov %eax,0x1c(%esp)
130 80483f4: 8b 44 24 1c mov 0x1c(%esp),%eax
131 80483f8: 85 c0 test %eax,%eax
132 80483fa: 74 0a je 8048406
133 80483fc: c7 44 24 1c 5a 00 00 movl $0x5a,0x1c(%esp)
134 8048403: 00
135 8048404: eb 08 jmp 804840e
136 8048406: c7 44 24 1c aa 00 00 movl $0xaa,0x1c(%esp)
137 804840d: 00
138 804840e: b8 00 85 04 08 mov $0x8048500,%eax
139 8048413: 8b 54 24 1c mov 0x1c(%esp),%edx
140 8048417: 89 54 24 04 mov %edx,0x4(%esp)
141 804841b: 89 04 24 mov %eax,(%esp)
142 804841e: e8 dd fe ff ff call 8048300
143 8048423: b8 00 00 00 00 mov $0x0,%eax
144 8048428: c9 leave
145 8048429: c3 ret
146 804842a: 90 nop
147 804842b: 90 nop
148 804842c: 90 nop
149 804842d: 90 nop
150 804842e: 90 nop
151 804842f: 90 nop
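With constants in both branches, the compiler has little code to move, so the difference between the two listings is small. A variant in which each branch calls a function may make the block placement easier to spot in a disassembly. The sketch below assumes GCC or Clang; the names rare_path, common_path and dispatch are hypothetical, and the noinline attribute is there only so that each call stays visible.

```c
/* Hypothetical helpers; noinline keeps each one visible as a call. */
__attribute__((noinline)) int rare_path(int v)
{
    return v * 2 + 0x5a;
}

__attribute__((noinline)) int common_path(int v)
{
    return v + 0xaa;
}

/* We tell the compiler that a is expected to be 0,
 * so the call to common_path is the expected path. */
int dispatch(int a)
{
    if (__builtin_expect(a, 0))
        return rare_path(a);

    return common_path(a);
}
```

Compiling this with -O2 and inspecting dispatch with objdump -d would typically be expected to show the common_path call on the fall-through path, though the exact layout depends on the compiler version and the target.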