Eliminating Pentium I support
Filed under: assembly, speed, x264
With this change, we have finally eliminated support for all x86 CPUs that don’t support MMXEXT. More precisely, x264 compiled on an MMXEXT-supporting
system will not run on a non-MMXEXT system, so people who still use Pentium Is for their encoding will have to compile their own versions.
What exactly did we do? It’s quite nice to be able to use MMX, CMOV, and other such extended x86 instructions in ordinary C code. Normal assembly code
in x264 is called through function pointers so that the best version of each function is selected at runtime. But where a performance gain can only be
achieved through inlining, function pointers are useless, because a call through a pointer chosen at runtime cannot be inlined. This is especially the
case with small, simple functions that are called often, such as the x264_median in the above diff, used for motion vector prediction. A more extreme
example was the CABAC asm, which got a >20% performance boost for CABAC encoding merely by changing a branch to a cmov (but which required the whole
function to be rewritten in assembly).
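To make the contrast concrete, here is a minimal sketch in C of the two styles. All names (sad_fn, dsp_table, dsp_init, median3) are made up for illustration and are not x264’s actual API; the median is written in plain C rather than with CPU-specific instructions.

```c
#include <stdint.h>

/* Hypothetical names throughout; this is not x264's actual code. */

/* Style 1: runtime dispatch. Each kernel is reached through a pointer
 * that is filled in once at startup according to the detected CPU. */
typedef int (*sad_fn)(const uint8_t *a, const uint8_t *b, int n);

/* Portable C fallback: sum of absolute differences over n bytes. */
static int sad_c(const uint8_t *a, const uint8_t *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return sum;
}

typedef struct { sad_fn sad; } dsp_table;

static void dsp_init(dsp_table *t, int cpu_has_mmxext)
{
    t->sad = sad_c;        /* always-valid fallback */
    (void)cpu_has_mmxext;  /* an assembly version would be assigned here
                              when the flag is set (omitted in this sketch) */
}

/* Style 2: a tiny helper meant to be inlined at every call site.
 * Median of three, built from min/max so that a compiler is free to
 * emit conditional moves instead of branches. */
static inline int median3(int a, int b, int c)
{
    int lo = a < b ? a : b;   /* min(a, b) */
    int hi = a < b ? b : a;   /* max(a, b) */
    int m  = hi < c ? hi : c; /* min(max(a, b), c) */
    return lo > m ? lo : m;   /* max(min(a, b), that) */
}
```

The indirect call in the first style costs little for a kernel that runs a whole loop, but for something the size of median3 the call overhead would be comparable to the work itself, which is why inlining matters there.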
Now, having officially eliminated support for all x86 CPUs prior to MMXEXT, we can feel free to put MMX/CMOV code basically wherever we want in the
code, allowing all kinds of small speedups in cases where groups of simple operations can easily be SIMD’d. Another note: the median
assembly referenced at the start of this post is the first use of GCC inline assembly in x264’s history.
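For readers who have not seen GCC inline assembly, the following is a rough sketch of what a branchless median of three using CMOV can look like. It is not the code from the diff: the function name median3_cmov and the exact instruction sequence are assumptions made for illustration, and a plain C fallback is included so the file still builds on non-x86 targets or other compilers.

```c
/* Hypothetical sketch; not the x264_median from the referenced diff. */
static inline int median3_cmov(int a, int b, int c)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int lo = a, hi = b, t;
    __asm__ (
        "mov   %[lo], %[t]  \n\t" /* t = lo (mov leaves flags alone)    */
        "cmp   %[hi], %[lo] \n\t" /* flags from lo - hi                 */
        "cmovg %[hi], %[lo] \n\t" /* if lo > hi: lo = hi                */
        "cmovg %[t],  %[hi] \n\t" /* same flags: hi = old lo            */
        "cmp   %[c],  %[hi] \n\t" /* flags from hi - c                  */
        "cmovg %[c],  %[hi] \n\t" /* hi = min(hi, c)                    */
        "cmp   %[hi], %[lo] \n\t" /* flags from lo - hi                 */
        "cmovl %[hi], %[lo] \n\t" /* lo = max(lo, hi), i.e. the median  */
        : [lo] "+r" (lo), [hi] "+r" (hi), [t] "=&r" (t)
        : [c] "r" (c)
        : "cc"
    );
    return lo;
#else
    /* Portable fallback with the same min/max structure. */
    int lo = a < b ? a : b;
    int hi = a < b ? b : a;
    int m  = hi < c ? hi : c;
    return lo > m ? lo : m;
#endif
}
```

Because the function is static inline and contains no branches, the compiler can drop the whole sequence into each call site. A binary built this way executes CMOV unconditionally, so it will fault on a CPU that lacks the instruction, which is exactly the trade-off this post describes.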
Edit: Since some people appear to be unclear on the point of this post: the program will still compile on a non-MMX machine; it’s just
that code compiled on an MMX machine will not work on a non-MMX machine, whereas before it did.