How To Write Shared Libraries(44)

2.5 Improving Generated Code(1)

On most platforms the code generated for DSOs differs from the code generated for applications. The code in DSOs needs to be relocatable, while application code can usually assume a fixed load address. This inevitably means that the code in DSOs is slightly slower and probably larger than application code. Sometimes this additional overhead can be measured; small, frequently called functions fall into this category. This section shows some problem cases of code in DSOs and ways to avoid them.

In the preceding text we have seen that on IA-32 a function accessing a global variable has to determine the address of the GOT in order to use the @GOTOFF operation. Assuming this C code

static int foo;
int getfoo (void)
{ return foo; }

the compiler might end up creating code like this:

getfoo:
   call 1f
1: popl %ecx
   addl $_GLOBAL_OFFSET_TABLE_+[.-1b],%ecx
   movl foo@GOTOFF(%ecx),%eax
   ret

The actual variable access is overshadowed by the overhead needed to perform it. Loading the GOT address into the %ecx register takes three instructions. What if this function is called very often? Even worse: what if the function getfoo is defined static or hidden and no pointer to it is ever available? In this case the caller might already have computed the GOT address; at least on IA-32 the GOT address is the same for all functions in the DSO or executable. The computation of the GOT address in getfoo would then be unnecessary. The key word in this scenario description is “might”. The IA-32 ABI does not require that the caller loads the PIC register. Only if a function call uses the PLT do we know that %ebx contains the GOT address, and in this case the call could come from any other loaded DSO or the executable. I.e., we really always have to load the GOT address.
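The scenario can be reproduced with a small C file. The following is only a sketch: the names bar, setfoo and sum_with_foo are hypothetical and not part of the original example, and the visibility and noinline attributes are GCC/Clang extensions. The setter exists solely so that the compiler cannot fold the static variables into constants.

```c
/* Sketch for inspecting PIC code on IA-32. One possible way to look at
   the generated assembly (assuming a GCC with 32-bit support):
       gcc -m32 -fPIC -O2 -S sketch.c                                  */

static int foo;
static int bar;   /* hypothetical second variable */

/* Hypothetical setter; keeps foo and bar from being constant-folded. */
void setfoo(int v)
{
    foo = v;
    bar = v + 1;
}

/* Hidden: not exported from the DSO, so calls from inside the DSO do
   not go through the PLT. noinline keeps the call visible in the
   assembly output. */
__attribute__((visibility("hidden"), noinline))
int getfoo(void)
{
    return foo;
}

/* Hypothetical caller in the same DSO. It accesses bar itself, so on
   IA-32 it needs the GOT address too; getfoo cannot rely on that and
   is expected to compute the address again. */
int sum_with_foo(void)
{
    return bar + getfoo();
}
```

Comparing the assembly of sum_with_foo and getfoo for a 32-bit PIC build should show the GOT address being set up in both functions, which is the redundancy discussed above. The exact instruction sequence depends on the compiler version.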

On platforms with better-designed instruction sets the generated code is not bad at all. For example, the x86-64 version could look like this:

 getfoo:
  movl foo(%rip),%eax
  ret

The x86-64 architecture provides a PC-relative data addressing mode which is extremely helpful in situations like this.

Another possible optimization is to require the caller to load the PIC register. On IA-64 the gp register is used for this purpose. Each function pointer consists of a pair: the function address and the gp value. The gp value has to be loaded before making the call. The result is that for our running example the generated code might look like this:

getfoo:
   addl r14=@gprel(foo),gp;;
   ld4 r8=[r14]
   br.ret.sptk.many b0

If the caller knows that the called function uses the same gp value, it can avoid loading gp. IA-32 is really a special case, but still a very common one, so it is appropriate to look for a solution.
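The caller-loads-gp convention can be modeled in portable C to make the roles clearer. This is a simplified illustration, not how an IA-64 toolchain is implemented: struct func_desc, struct module_data, the global gp variable (standing in for the gp register) and the two call helpers are all hypothetical names, and saving and restoring the caller's own gp is left out.

```c
#include <stddef.h>

/* Hypothetical data area of one module; gp points at its start. */
struct module_data {
    int pad;
    int foo;
};

static struct module_data dso_data = { 0, 42 };

/* Stand-in for the gp register. */
static const char *gp;

/* Callee: reads foo relative to gp, loosely modeling
   addl r14=@gprel(foo),gp ;; ld4 r8=[r14]. It never sets gp itself. */
static int getfoo(void)
{
    const int *addr = (const int *)(gp + offsetof(struct module_data, foo));
    return *addr;
}

/* Model of a function pointer: function address plus gp value. */
struct func_desc {
    int (*entry)(void);
    const char *gp;
};

/* Call through a function pointer: the caller loads gp first. */
static int call_via_descriptor(const struct func_desc *fd)
{
    gp = fd->gp;
    return fd->entry();
}

/* Call within the same module: gp is assumed to be valid already,
   so the load is skipped. */
static int call_same_module(void)
{
    return getfoo();
}
```

A possible use is to build a descriptor such as { getfoo, (const char *) &dso_data }, call it once through call_via_descriptor, and then use call_same_module for further calls. The second path corresponds to the case where the caller knows that the callee shares its gp value.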
