MPEG-4 B-frames in AVI/VFW hackery description

From: http://forum.doom9.org/showthread.php?p=529561


as i had to explain to a bunch of people how b-frames can be placed in .avi and used during encoding/decoding in vfw, i thought it would be better to write a short doc once and for all that i can point people to, so here it is:

after reading it you should know:
- what packed bitstream is
- what delay frames are
- what xvid's decoder lag message is
- what has to be done to keep the outdated .avi and vfw alive


first of all the problem

the "video for windows" (VFW) codec interface (as used in virtualdub(mod)) and its container (AVI) can NOT handle b-frames, and they will never even know that this frame type exists!
therefore it can be said that these are outdated technologies, as they can't handle modern features like b-frames by themselves

now there are two possibilities if you want to use b-frames:
1) simply don't use outdated technologies that can't handle b-frames (use better ones like DirectShow and .MP4 instead)
2) use outdated technologies and develop and use hacks

two different hacks

ad 2) atm there are two different hacks that make b-frames work with avi and vfw:
a) on the encoder side (called packed bitstream as used by default in xvid and divx5)
b) on the decoder side (as used by xvid, if packed bitstream is disabled)

basics

now if we want to understand the hacks for using b-frames in avi and vfw, we have to know the following:

normally the frames are stored in the container this way:
I P B B
they get displayed this way:
I B B P

VFW and AVI use a "one frame in, one frame out" scheme (that's simply how the technology works), which means that for every frame you put in, you have to get one out (on both the encoding and decoding side).
this is not compatible with b-frames, as b-frames are constructed using two frames at once, the previous and the following i/p frame (but vfw/avi don't allow something like "two frames in, one frame out", in contrast to modern, "not outdated" technologies like directshow or .mp4)

now normally (using up-to-date technology) the decoder would do the following during decoding:
1) display the I-frame
2) now he wants to display the first B-frame, for which he needs the previous and next I/P-frames. so, having the already decoded I still at hand, he grabs the P and is then able to decode the B too (it's a "3 frames in, one frame out" situation)
3) the same goes for the second B
4) then he displays the P-frame
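
to make the reordering concrete, here is a minimal python sketch (not taken from any real decoder) of how storage order turns into display order. the function name and the frame labels like "I0" or "B1" are made up for illustration, and it assumes that every run of B-frames is stored right after the I/P frame that follows it in display order, as in the I P B B example:

```python
def display_order(coded):
    """Reorder frame labels from storage (coded) order to display order.

    Assumption: each run of B-frames is stored directly after the I/P
    frame that has to be shown after it.
    """
    shown = []
    held = None  # last I/P frame, held back until its B-frames are shown
    for frame in coded:
        if frame.startswith("B"):
            shown.append(frame)       # both references are decoded already
        else:
            if held is not None:
                shown.append(held)    # the earlier reference can be shown now
            held = frame
    if held is not None:
        shown.append(held)            # flush the last reference at the end
    return shown


# labels carry the display position: I0 B1 B2 P3 is stored as I0 P3 B1 B2
print(display_order(["I0", "P3", "B1", "B2"]))
```

note that the decoder has to take in more frames than it shows before the first B can come out, which is exactly what vfw does not allow.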

hackery when decoding

now lets go on to the hacks:
ad a) as AVI and VFW can only do "one frame in, one frame out", the following workaround is done on the encoder side (called packed bitstream):
the first B gets packed together with the P-frame into one frame:
I P B B becomes
I PB B N
(N is a not-coded placeholder frame, so that the number of frames stays the same)
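
the packing step could be sketched like this in python. this is only an illustration of the idea, not how xvid or divx5 actually implement it: the function name is hypothetical, frames are plain type letters, each avi chunk is modelled as a list, and only a P directly followed by a B gets packed (as in the example above):

```python
def pack_bitstream(coded):
    """Turn coded order (e.g. I P B B) into packed chunks (I PB B N)."""
    chunks = []
    i = 0
    while i < len(coded):
        frame = coded[i]
        nxt = coded[i + 1] if i + 1 < len(coded) else None
        if frame[0] == "P" and nxt is not None and nxt[0] == "B":
            chunks.append([frame, nxt])   # P and first B share one chunk
            i += 2
            # the remaining B-frames of this run get their own chunks
            while i < len(coded) and coded[i][0] == "B":
                chunks.append([coded[i]])
                i += 1
            chunks.append(["N"])          # placeholder keeps the chunk count
        else:
            chunks.append([frame])
            i += 1
    return chunks


packed = pack_bitstream(["I", "P", "B", "B"])
print(packed)        # one list per avi chunk
print(len(packed))   # same number of chunks as input frames
```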

now the decoder does the following:
1) he decodes the I
2) for decoding the first B-frame he needs (as described above) the I and the P
as the first B is packed together with the P, he "officially" gets only 1 frame (as PB appears as 1 frame), but in fact these are two frames (P + B), so he is now able to decode the first B
3) as he has the I and P already, he is able to decode the second B too
4) then he displays the P
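
as a rough python sketch of the decoder side (again just an assumption-laden model, not real xvid/divx code): each chunk is a list of frame labels, a two-element chunk is a packed P+B, and ["N"] is the placeholder. exactly one frame comes out per chunk that goes in:

```python
def decode_packed(chunks):
    """One chunk in, one displayed frame out, for a packed bitstream."""
    out = []
    pending = None  # decoded P-frame waiting for its display slot
    for chunk in chunks:
        if chunk == ["N"]:
            # placeholder: show the P that was held back
            # (a stray N without a held P is just passed through here)
            out.append(pending if pending is not None else "N")
            pending = None
        elif len(chunk) == 2:
            ref, b = chunk      # packed chunk: P and first B arrive together
            pending = ref       # keep the P for later
            out.append(b)       # the B can be shown right away
        else:
            out.append(chunk[0])
    return out


print(decode_packed([["I"], ["P", "B1"], ["B2"], ["N"]]))
```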

the hack here is that, as avi/vfw is forced into the "one frame in, one frame out" scheme, you circumvent this by feeding it two frames at once, packed together into one frame
some devs (like michael niedermayer from ffmpeg) are of the opinion that packing two frames into one breaks the compliance of the stream with the mpeg-4 standard! also a normal mpeg-4 decoder, written following the mpeg-4 specs and not knowing about packed bitstream, can't decode packed bitstreams


ad b) the second possible hack is that the stream gets stored correctly in the avi (as I P B B) but the hack happens on the decoder side:
1) first the decoder gets the I-frame, but does not display it (instead he displays for example the famous "decoder lag" message of xvid!)
2) in the second step the decoder gets the P as input, but now only displays the I frame (the first frame) - one frame in, one frame out is still followed but with a time lag of 1 frame
3) in the third step the decoder now gets the first B-frame and he already has the needed I and P-frame handy and therefore is able to decode the B
4) as he has the I and P already, he now is also able to decode the second B
5) he displays the P

the hack here is that, as avi/vfw is forced into the "one frame in, one frame out" scheme, you create a lag: the scheme is still followed, but the decoder does not output the frame he just got, but the one before
in contrast to packed bitstream, this way the stream gets written as the mpeg-4 standard defines it
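
the lag can be modelled with a small python sketch. it is a simplification under my own assumptions (hypothetical function name, frames as labels, "LAG" standing in for whatever the decoder shows while he has nothing to display yet), not the real xvid decoder logic:

```python
def decode_with_lag(coded):
    """One frame in, one frame out, with I/P output lagging by one frame.

    Returns the displayed frames and the reference frame that is still
    held back when the input ends.
    """
    out = []
    held = None  # last I/P frame received but not yet displayed
    for frame in coded:
        if frame[0] == "B":
            out.append(frame)   # both references are already there
        else:
            # show the previous reference instead of the one just received
            out.append(held if held is not None else "LAG")
            held = frame
    return out, held


shown, still_held = decode_with_lag(["I0", "P3", "B1", "B2"])
print(shown)       # what was displayed for the four inputs
print(still_held)  # shown only when the next frame comes in
```

the last P only comes out once another frame is fed in, which is the 1 frame time lag from above.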

hackery when encoding

these are the two hacks which make it possible to use b-frames with the outdated VFW and AVI on the decoding side
on the encoding side a hack is needed too (as the "one frame in, one frame out" scheme has to be followed there as well of course):
for encoding a B-frame the encoder needs both of its reference frames, the previous I/P and the following P, but the following P only arrives after the B

now the encoder does the following:
1) you feed the encoder with the first frame -> coded as I-frame
2) input second frame -> should get coded as B-frame => not possible as you need a P-Frame too for creating a B
3) third input -> B => also not possible as the P was not there till now
4) fourth input -> coded as P-Frame
5) (now as the I and P are available) => first B gets written
6) (now as the I and P are available) => second B gets written

now as we know AVI and VFW are forced to follow the "one frame in, one frame out" scheme, so what do you think gets written in step 2) and 3) ?
as something has to be written (one in -> one out), so-called delay frames get written (1-byte 0x7f frames)
these delay frames are written simply because VFW has to write something! BUT they are not there in the input AND they break the compliance with the mpeg-4 standard!
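
here is a small python simulation of where the delay frames come from. it is a sketch under stated assumptions, not a real vfw codec: the function name is made up, the frame types are given up front in display order, and a delay frame is just the string "delay" (standing for the 1-byte 0x7f frame):

```python
def encode_one_in_one_out(display_types):
    """Feed frames in display order; emit exactly one chunk per input.

    Returns the written chunks and the coded frames still queued when
    the input ends.
    """
    out = []
    waiting_b = []  # B-frames that still need their following P
    ready = []      # coded frames ready to be written, in coded order
    for n, frame_type in enumerate(display_types):
        label = f"{frame_type}{n}"
        if frame_type == "B":
            waiting_b.append(label)   # cannot be coded yet
        else:
            ready.append(label)       # I/P can be coded at once
            ready.extend(waiting_b)   # and now the waiting Bs can be too
            waiting_b.clear()
        # something must be written for every input frame
        out.append(ready.pop(0) if ready else "delay")
    return out, ready


written, queued = encode_one_in_one_out(["I", "B", "B", "P"])
print(written)  # two delay frames where the Bs could not be written yet
print(queued)   # the Bs follow on the next inputs
```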

so what can we do?
what we always do if AVI and VFW cause trouble -> we hack a workaround:
=> virtualdub(mod) drops these "delay frames" during encoding, to make sure that there are no delay frames in the final output stream
BUT chances are high that other tools which can encode using vfw codecs will not remove these delay frames, so be aware
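
the dropping itself is simple in principle. a hedged python sketch, assuming the encoder output is available as a list of byte strings (one per chunk) and that a delay frame is exactly the single byte 0x7f as described above. this is not virtualdub's actual code, and the chunk contents in the example are dummy bytes:

```python
DELAY_FRAME = b"\x7f"  # 1-byte delay frame as described above


def drop_delay_frames(chunks):
    """Return the chunks with all 1-byte 0x7f delay frames removed."""
    return [chunk for chunk in chunks if chunk != DELAY_FRAME]


# dummy payloads, only the 0x7f chunks matter here
encoder_output = [b"I-data", b"\x7f", b"\x7f", b"P-data", b"B-data", b"B-data"]
print(drop_delay_frames(encoder_output))
```

a real frame that merely starts with 0x7f is kept, as only chunks that are exactly one byte long get compared equal.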
