x264 development: a six-month retrospective
These past six months have consisted mostly of bugfixes, vast speed improvements, and the beginning of what will hopefully be a series of psychovisual optimizations.
How can I best describe the speed boost? Numbers would do the best job, I think. All values compare my internal development build against the version that was current six months ago. Adaptive quantization is disabled to make the results comparable, and CRF is used for all encodes.
Max speed settings (no B-frames, subme 1, analyse none, me dia): 29.5% speed boost
Near-max speed settings (3 B-frames, subme 1, analyse none, me dia): 24.5% speed boost
Medium speed settings (3 B-frames, subme 5): 18.5% speed boost
Slow speed settings (3 B-frames, subme 6, b-rdo, me umh, ref 4): 35% speed boost
Very slow speed settings (16 b-frames, subme 7, b-rdo, me esa, ref 16, trellis 2, no fast-pskip, partitions all, mixed-refs): 52% speed boost
Lossless: 15% speed boost
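The post does not define "speed boost" precisely. Assuming it means the relative increase in encoding throughput (frames per second), the arithmetic can be sketched as below; the helper names are hypothetical. Under that reading a 52% boost means 1.52 times the throughput, so the same encode takes 1/1.52, or about 66%, of its former time.

```c
#include <assert.h>

/* Speed boost in percent, assuming it is measured as the relative increase
   in throughput (frames per second) of the new build over the old one. */
double speed_boost_percent(double old_fps, double new_fps)
{
    assert(old_fps > 0.0);
    return (new_fps / old_fps - 1.0) * 100.0;
}

/* Fraction of the old encoding time that the same encode now takes,
   given a speed boost in percent (52.0 -> about 0.658). */
double remaining_time_fraction(double boost_percent)
{
    assert(boost_percent > -100.0);
    return 1.0 / (1.0 + boost_percent / 100.0);
}
```

With the same assumption, the 29.5% gain at max-speed settings shortens encoding time by about 23%, and the 15% lossless gain by about 13%.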
Notable new features:
1. Psy-based adaptive quantization, which improves quality in flat areas of the frame by taking bits from more complex areas.
2. --me tesa, transformed exhaustive search. Loren Merritt converted my ridiculously slow initial algorithm into a highly optimized thresholded implementation, resulting in an alternative that is even slower than --me esa.
3. A massive preprocessor-based abstraction layer for assembly, allowing complete abstraction between 32-bit and 64-bit assembly and even automatic handling of everything from stack offsets to macros that permute their arguments and SSE/MMX abstraction. Written from scratch by Loren Merritt, it drastically simplifies all assembly development.
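The psy-based adaptive quantization in item 1 can be illustrated with a variance-based scheme: measure each macroblock's luma variance, then lower the QP of flat (low-variance) blocks and raise it for busy ones, centered so the offsets average to zero across the frame and bits are moved rather than added. This is a minimal sketch under those assumptions, not x264's actual code: the strength parameter, the integer floor(log2) mapping, and all function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define MB_SIZE 16

/* Variance of one 16x16 luma macroblock (pix points at its top-left sample). */
static uint32_t mb_variance(const uint8_t *pix, int stride)
{
    uint32_t sum = 0, sum_sq = 0;
    for (int y = 0; y < MB_SIZE; y++)
        for (int x = 0; x < MB_SIZE; x++) {
            uint32_t v = pix[y * stride + x];
            sum += v;
            sum_sq += v * v;
        }
    /* 256 samples: variance = (sum_sq - sum^2 / 256) / 256; fits in 32 bits */
    return (sum_sq - sum * sum / 256) / 256;
}

/* floor(log2(v + 1)): an integer stand-in for a real log2. */
static int ilog2_plus1(uint32_t v)
{
    int r = 0;
    for (uint32_t x = v + 1; x > 1; x >>= 1)
        r++;
    return r;
}

/* Per-macroblock QP offsets for a luma plane of mb_w x mb_h macroblocks.
   Flat blocks get negative offsets (lower QP, more bits) and complex blocks
   positive ones; subtracting the frame mean makes the offsets average to zero. */
void aq_qp_offsets(const uint8_t *luma, int stride, int mb_w, int mb_h,
                   double strength, double *qp_offset)
{
    assert(mb_w > 0 && mb_h > 0 && stride >= mb_w * MB_SIZE);
    int count = mb_w * mb_h;
    double mean = 0.0;
    for (int i = 0; i < count; i++) {
        const uint8_t *pix = luma + (i / mb_w) * MB_SIZE * stride
                                  + (i % mb_w) * MB_SIZE;
        qp_offset[i] = ilog2_plus1(mb_variance(pix, stride));
        mean += qp_offset[i];
    }
    mean /= count;
    for (int i = 0; i < count; i++)
        qp_offset[i] = strength * (qp_offset[i] - mean);
}
```

For a 32x16 plane whose left macroblock is flat gray and whose right macroblock is a 0/255 checkerboard, strength 1.0 yields offsets of -6.5 and +6.5: the flat block gains exactly the bits the busy block gives up.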
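The "transformed" exhaustive search in item 2 scores each candidate motion vector by SATD (the sum of absolute Hadamard-transformed differences) instead of plain SAD, which tracks actual coding cost more closely but costs more per candidate. The sketch below shows only that brute-force core, assuming block dimensions that are multiples of 4 and a reference frame padded by `range` pixels on every side. It omits the thresholding optimizations mentioned above, ignores motion-vector bit cost, and its names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* SATD of a 4x4 block: 2-D Hadamard transform of the differences,
   then half the sum of absolute coefficients. */
static int satd_4x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int d[4][4], t[4][4], sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * sa + x] - b[y * sb + x];
    for (int y = 0; y < 4; y++) {           /* horizontal 4-point Hadamard */
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23; t[y][1] = s01 - s23;
        t[y][2] = d01 + d23; t[y][3] = d01 - d23;
    }
    for (int x = 0; x < 4; x++) {           /* vertical Hadamard + abs sum */
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23) + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum / 2;
}

typedef struct { int mx, my, cost; } mv_result;

/* Test every integer MV in [-range, range]^2, scoring each by SATD.
   cur is a bw x bh block (multiples of 4); ref points at the co-located
   block and must be padded by `range` pixels on every side. */
mv_result tesa_search(const uint8_t *cur, int cur_stride,
                      const uint8_t *ref, int ref_stride,
                      int bw, int bh, int range)
{
    assert(bw % 4 == 0 && bh % 4 == 0 && range >= 0);
    mv_result best = { 0, 0, INT_MAX };
    for (int my = -range; my <= range; my++)
        for (int mx = -range; mx <= range; mx++) {
            const uint8_t *cand = ref + my * ref_stride + mx;
            int cost = 0;
            for (int y = 0; y < bh; y += 4)
                for (int x = 0; x < bw; x += 4)
                    cost += satd_4x4(cur + y * cur_stride + x, cur_stride,
                                     cand + y * ref_stride + x, ref_stride);
            if (cost < best.cost)
                best = (mv_result){ mx, my, cost };
        }
    return best;
}
```

Every one of the (2 * range + 1)^2 candidates pays for a full set of Hadamard transforms, which is why an untuned version of this search is so slow.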
Notable speed increases:
1. AltiVec implementations of various functions; much faster PowerPC encoding.
2. Cacheline optimizations for SAD-based motion search and for luma motion compensation (MC).
3. Much faster exhaustive motion search.
4. Lots more SSE2 assembly. And SSSE3 too. And even more SSE2. Oh wait, more…
5. Skipping stuff.
6. Much, much faster CABAC encoding.
7. Tons of small optimizations all over x264. Yes, there’s lots more of these. And more of these. And even more… wait, there’s more here…
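The cacheline optimization in the list above concerns unaligned loads that straddle a cacheline boundary, which some x86 processors of the time handled much more slowly than loads within a single line; an optimized SAD routine can detect such positions and take a different code path for them. A minimal sketch of the detection, assuming 64-byte cachelines and a hypothetical helper name (how x264 dispatches between its SAD variants is not shown):

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE 64 /* assumed cacheline size in bytes */

/* Nonzero if reading `width` bytes starting at p touches two cachelines. */
int crosses_cacheline(const void *p, unsigned width)
{
    assert(width > 0 && width <= CACHELINE);
    uintptr_t offset = (uintptr_t)p & (CACHELINE - 1);
    return offset + width > CACHELINE;
}
```

With 64-byte lines, a 16-byte load at offset 48 within a line still fits, while one at offset 49 spans two lines.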
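The post does not say how exhaustive motion search was sped up. One well-known technique that preserves the exhaustive result is successive elimination: |sum(cur) - sum(cand)| is a lower bound on SAD(cur, cand), so any candidate whose bound already reaches the best SAD found so far can be skipped without computing its SAD. The sketch below is a generic version of that idea, not x264's code. It assumes 8x8 blocks, a reference padded by `range` pixels, and hypothetical names, and it gets each candidate's sum in O(1) from an integral image of the search window.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BS 8 /* assumed block size: 8x8 */

static int block_sad(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int s = 0;
    for (int y = 0; y < BS; y++)
        for (int x = 0; x < BS; x++)
            s += abs(a[y * sa + x] - b[y * sb + x]);
    return s;
}

/* Exhaustive SAD search over [-range, range]^2 with successive elimination.
   ref points at the co-located block and must be padded by `range` pixels.
   Returns the best SAD (-1 on allocation failure); *sad_calls, if non-NULL,
   receives the number of full SADs actually computed. */
int sea_search(const uint8_t *cur, int cur_stride,
               const uint8_t *ref, int ref_stride, int range,
               int *best_mx, int *best_my, int *sad_calls)
{
    assert(range >= 0);
    int w = 2 * range + BS;                 /* search window side length */
    int iw = w + 1;                         /* integral image row length */
    int *ii = calloc((size_t)iw * iw, sizeof *ii);
    if (!ii)
        return -1;
    const uint8_t *win = ref - range * ref_stride - range;
    for (int y = 0; y < w; y++)             /* ii[y][x] = sum of win[0..y)[0..x) */
        for (int x = 0; x < w; x++)
            ii[(y + 1) * iw + x + 1] = win[y * ref_stride + x] + ii[y * iw + x + 1]
                                     + ii[(y + 1) * iw + x] - ii[y * iw + x];
    int cur_sum = 0;
    for (int y = 0; y < BS; y++)
        for (int x = 0; x < BS; x++)
            cur_sum += cur[y * cur_stride + x];

    int best = INT_MAX, calls = 0;
    *best_mx = *best_my = 0;
    for (int my = -range; my <= range; my++)
        for (int mx = -range; mx <= range; mx++) {
            int y0 = my + range, x0 = mx + range; /* position in the window */
            int cand_sum = ii[(y0 + BS) * iw + x0 + BS] - ii[y0 * iw + x0 + BS]
                         - ii[(y0 + BS) * iw + x0] + ii[y0 * iw + x0];
            /* |sum(cur) - sum(cand)| <= SAD, so this candidate cannot win */
            if (abs(cur_sum - cand_sum) >= best)
                continue;
            int sad = block_sad(cur, cur_stride, ref + my * ref_stride + mx, ref_stride);
            calls++;
            if (sad < best) {
                best = sad;
                *best_mx = mx;
                *best_my = my;
            }
        }
    free(ii);
    if (sad_calls)
        *sad_calls = calls;
    return best;
}
```

The result matches a plain raster-order full search, ties included, because a skipped candidate can never be strictly better than the current best.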