Introducing Photon
Filed under: codec, ffmpeg, photon
A few days ago I was somewhat bored: no particularly good ideas for quick x264 improvements, no good games to play, and most of my finals out of the way. So I decided to
make my own codec, called Photon.
The basic goal is to make a very fast MPEG-2-like format with better compression and speed than MPEG-2, but without the complexities of things like interlacing and
B-frames. My eventual plans for Photon are a bit fancier than what I have so far, of course; currently the encoder and decoder are intra-only, for example. And
honestly, I’ll be shocked if I manage to reach my goals; my main purpose in this is to learn how to write a codec and bitstream format from the ground up and
experiment with all sorts of features in the process.
The main philosophy behind Photon is to keep everything as simple as possible; the fewer special cases needed, the better. As such, every macroblock is divided up into
six 8×8 blocks: four for luma and two for chroma (the codec uses the YV12 colorspace). Every single 8×8 block, luma or chroma, is treated exactly the same, with the exact
same code; the bitstream does not even distinguish them. This makes it possible to have a single loop for decoding all of these blocks. For the transform, the H.264
transform/quant/zigzag process is used.
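To make the "single loop" idea concrete, here is a minimal sketch of cutting one macroblock into its six 8×8 blocks. This is not the Photon code itself: the function name `macroblock_blocks`, the plane representation (lists of rows), and the block order (four luma blocks in raster order, then the two chroma blocks) are all assumptions made for illustration.

```python
def macroblock_blocks(y_plane, u_plane, v_plane, mb_x, mb_y):
    """Return the six 8x8 blocks of macroblock (mb_x, mb_y) as lists of rows.

    Assumed order (hypothetical, not taken from the Photon bitstream):
    four luma blocks in raster order, then one block from each chroma plane.
    With 4:2:0 subsampling, a 16x16 luma macroblock covers 8x8 chroma samples.
    """
    def cut(plane, x0, y0):
        # Slice an 8x8 window whose top-left sample is (x0, y0).
        return [row[x0:x0 + 8] for row in plane[y0:y0 + 8]]

    blocks = []
    for by in range(2):
        for bx in range(2):
            blocks.append(cut(y_plane, mb_x * 16 + bx * 8, mb_y * 16 + by * 8))
    blocks.append(cut(u_plane, mb_x * 8, mb_y * 8))
    blocks.append(cut(v_plane, mb_x * 8, mb_y * 8))
    return blocks
```

Once the blocks are in one list, the rest of the pipeline can be a plain `for block in blocks:` loop with no luma/chroma special cases, which is the point of the design.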
Preceding the blocks is a 6-bit element, the Coded Block Pattern (CBP), with one bit for each block. A block's CBP bit is set to 0 if there’s
nothing in the block and set to 1 if there is.
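As a rough illustration of the CBP, the sketch below packs one bit per block and recovers the list of coded blocks from it. The helper names and the bit order (block `i` maps to bit `i`, least significant first) are assumptions; the post does not say which order Photon uses.

```python
def pack_cbp(blocks):
    """Build a coded block pattern: bit i is 1 if block i has any nonzero coefficient.

    Each block is a flat list of quantized coefficients. The bit order
    (block i -> bit i) is an assumption for this sketch.
    """
    cbp = 0
    for i, block in enumerate(blocks):
        if any(c != 0 for c in block):
            cbp |= 1 << i
    return cbp


def coded_block_indices(cbp, count=6):
    """Return the indices of the blocks whose CBP bit is set."""
    return [i for i in range(count) if (cbp >> i) & 1]
```

The same two helpers would work for the 4-bit CBP of the 4×4 sub-blocks by passing `count=4`.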
For each block with a CBP of 1:
1. A delta-quantizer element. Yes, that’s right: each 8×8 block has its own quantizer! This increases the effectiveness of adaptive quantization. The delta-quantizer
is coded relative to the same-numbered block in the previous macroblock, in unary with one bit for the sign (total bit cost: 1 bit to not code a delta
quant, N+1 bits to code a delta quant, where N is the delta).
2. A transform element. Set to 0 if the block uses the 4×4 DCT, set to 1 if the block uses the 8×8 DCT.
3. A 4-bit CBP for the four 4×4 blocks, set in the same manner as the macroblock CBP. Omitted in the case of the 8×8 DCT.
4. Residual for the 8×8 block, or for each 4×4 block whose CBP bit is set (in raster scan order), as follows:
a. The first coefficient in the block, coded as a signed exponential Golomb code.
b. Is this the last coefficient? If so, a 1 is coded and the residual coding ends. Otherwise, a 0 is coded.
c. The run length of 0s until the next coefficient, coded as an unsigned exponential Golomb code.
This loop repeats, coding each subsequent coefficient as in step a, until the entire block is coded.
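The residual loop in step 4 can be sketched as follows. This is an illustration under stated assumptions rather than the actual Photon code: bits are kept in a Python string for readability, the signed mapping is the H.264-style one (positive v maps to 2v-1, non-positive v maps to -2v), the coefficient at position 0 is always coded even when it is zero, and all names (`ue_bits`, `se_bits`, `encode_residual`, `decode_residual`, `BitReader`) are hypothetical.

```python
def ue_bits(v):
    """Unsigned exponential Golomb code of v >= 0, as a string of '0'/'1'."""
    code = v + 1
    return "0" * (code.bit_length() - 1) + format(code, "b")


def se_bits(v):
    """Signed exponential Golomb code (assumed H.264-style mapping)."""
    return ue_bits(2 * v - 1 if v > 0 else -2 * v)


def encode_residual(coeffs):
    """Encode zigzag-ordered coefficients as level / last-flag / zero-run triples."""
    out = se_bits(coeffs[0])              # a. first coefficient
    pos = 0
    for nxt in range(1, len(coeffs)):
        if coeffs[nxt] == 0:
            continue
        out += "0"                        # b. not the last coefficient
        out += ue_bits(nxt - pos - 1)     # c. zeros before the next coefficient
        out += se_bits(coeffs[nxt])       # a. next coefficient
        pos = nxt
    return out + "1"                      # b. last coefficient, coding ends


class BitReader:
    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def bit(self):
        b = int(self.bits[self.pos])
        self.pos += 1
        return b

    def ue(self):
        zeros = 0
        while self.bit() == 0:            # count the leading zeros
            zeros += 1
        value = 1
        for _ in range(zeros):            # read the remaining suffix bits
            value = (value << 1) | self.bit()
        return value - 1

    def se(self):
        k = self.ue()
        return (k + 1) // 2 if k % 2 else -(k // 2)


def decode_residual(bits, size=64):
    """Inverse of encode_residual for a block of `size` coefficients."""
    reader = BitReader(bits)
    coeffs = [0] * size
    pos = 0
    coeffs[0] = reader.se()
    while reader.bit() == 0:              # 0 means another coefficient follows
        pos += reader.ue() + 1
        coeffs[pos] = reader.se()
    return coeffs
```

For example, `encode_residual([0, 0, 1])` yields `"100100101"`: a zero first coefficient (`1`), a not-last flag (`0`), a run of one zero (`010`), the level 1 (`010`), and the last flag (`1`). Small levels and short runs get short codes, which is why universal codes like these work at all, even if tuned variable-length tables would do better.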
The macroblock itself has a 2-to-4-bit header specifying the luma and chroma prediction modes to use, and the frame has a header specifying the frametype and frame
quantizer.
… and that’s it. That’s the whole thing, so far. Yes, the entropy coding is absurdly suboptimal and would benefit dramatically from custom variable-length codes
instead of universal codes. Yes, I’m not using an ounce of assembly whatsoever and the encoding and decoding are far slower than they should be (decoding is still
realtime for most sane resolutions though). But it works.
For those who want to see my hackneyed, half-copy-pasted-from-x264's-common-library code, the diff so far is here.