1.7 Measuring ld.so Performance
To carry out optimizations it helps to quantify their effect. Fortunately this is very easy with glibc's dynamic linker. Through the LD_DEBUG environment variable it can be instructed to dump information related to startup performance. Figure 8 shows an example invocation, in this case of the echo program.
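As a rough sketch of such an invocation driven from a script, the Python helper below runs a command with LD_DEBUG set to `statistics`, the glibc setting that requests the startup statistics. The helper name `run_with_ld_statistics` is hypothetical, and the sketch assumes the dynamic linker writes its report to standard error, which is the default when LD_DEBUG_OUTPUT is not set.

```python
import os
import subprocess


def run_with_ld_statistics(argv):
    """Run argv with LD_DEBUG=statistics and return (stdout, stderr).

    Assumption: the dynamic linker's report arrives on stderr, which is
    the glibc default as long as LD_DEBUG_OUTPUT is not set.
    """
    env = dict(os.environ, LD_DEBUG="statistics")
    result = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=False
    )
    return result.stdout, result.stderr


# Example use (the echo arguments are arbitrary):
#   out, report = run_with_ld_statistics(["/bin/echo", "some text"])
#   print(report, end="")   # dynamic linker statistics
#   print(out, end="")      # the program's own output
```

On a system whose dynamic linker is not glibc's, the report is expected to be empty while the program output is unchanged.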
The output of the dynamic linker is divided into two parts. The part before the program's output is printed right before the dynamic linker turns over control to the application, after it has performed all the work described in this section. The second part, a summary, is printed after the application has terminated (normally). The actual format might vary between architectures. Timing information is included only on architectures which provide easy access to a CPU cycle counter register (modern IA-32, IA-64, x86-64, and Alpha at the moment). On other architectures these lines are simply missing.
The timing information provides absolute values for the total time spent in the dynamic linker during startup, the time needed to perform relocations, and the time spent in the kernel to load and map binaries. In this example relocation processing dominates the startup costs with more than 50%, so there is a lot of potential for optimization here. The unit used to measure the time is CPU cycles. This means that the values cannot even be compared across different implementations of the same architecture; for example, the measurements for a Pentium III and a Pentium 4 machine will be quite different. But the measurements are perfectly suitable for measuring improvements on one machine, which is what we are interested in here.
Since relocations play such a vital part in startup performance, some information on the number of relocations is printed. In the example a total of 133 relocations are performed, from the dynamic linker, the C library, and the executable itself. Of these, 5 relocations could be served from the relocation cache. This is an optimization implemented in the dynamic linker to handle the case of multiple relocations against the same symbol more efficiently. After the program itself has terminated the same information is printed again. The total number of relocations here is higher since the execution of the application code caused a number, 55 to be exact, of run-time relocations to be performed.
The number of relocations which are processed is stable across successive runs of the program. The time measurements are not. Even in single-user mode with no other programs running there would be differences, since the cache and main memory have to be accessed. It is therefore necessary to average the run-time over multiple runs.
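The averaging step could be sketched as follows in Python. The function names are hypothetical, and the regular expression assumes report lines of the shape `<pid>: <label>: <number> cycles`; since the format can differ between architectures and glibc versions, treat the pattern as an assumption to adjust.

```python
import re
import statistics

# Assumed line shape: "<pid>:  <label>: <number> cycles ..."
_CYCLES = re.compile(
    r"^[ \t]*\d+:[ \t]+(.*?):[ \t]+(\d+)[ \t]+cycles", re.MULTILINE
)


def parse_cycles(report):
    """Return {label: cycles}, keeping the first occurrence of each label."""
    found = {}
    for label, cycles in _CYCLES.findall(report):
        found.setdefault(label.strip(), int(cycles))
    return found


def average_cycles(reports):
    """Average each label's cycle count over the reports of several runs."""
    samples = {}
    for report in reports:
        for label, cycles in parse_cycles(report).items():
            samples.setdefault(label, []).append(cycles)
    return {label: statistics.mean(values) for label, values in samples.items()}
```

Feeding it the reports of, say, a few dozen runs on the same machine gives per-label averages that can be compared before and after an optimization. The values remain in CPU cycles and so are only comparable on that machine.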
It is obviously also possible to count the relocations without running the program. Running readelf -d on the binary shows the dynamic section, in which the DT_RELSZ, DT_RELENT, DT_RELCOUNT, and DT_PLTRELSZ entries are of interest. They allow computing the number of normal and relative relocations as well as the number of PLT entries. If one does not want to do this by hand, the relinfo script in appendix A can be used.
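The arithmetic behind those counts can be sketched in a few lines of Python. The function name is hypothetical and the values are passed in as plain integers read off the readelf -d output. Two assumptions are built in: PLT relocations use the same entry size as DT_RELENT, and on architectures that use RELA-style relocations the corresponding DT_RELASZ, DT_RELAENT, and DT_RELACOUNT values are substituted.

```python
def relocation_counts(relsz, relent, relcount, pltrelsz):
    """Derive relocation counts from dynamic section values.

    relsz    -- DT_RELSZ: total size in bytes of the relocation table
    relent   -- DT_RELENT: size in bytes of one relocation entry
    relcount -- DT_RELCOUNT: number of relative relocations
    pltrelsz -- DT_PLTRELSZ: total size in bytes of the PLT relocations

    Assumption: PLT relocation entries have the same size as DT_RELENT.
    """
    if relent <= 0:
        raise ValueError("relocation entry size must be positive")
    total = relsz // relent
    return {
        "normal": total - relcount,   # non-relative relocations
        "relative": relcount,         # relative relocations
        "plt": pltrelsz // relent,    # PLT entries
    }
```

For instance, with made-up values of an 800-byte table, 8-byte entries, 60 relative relocations, and 160 bytes of PLT relocations, the function reports 40 normal relocations, 60 relative relocations, and 20 PLT entries.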