ffmpeg开发指南(Using libavformat and libavcodec)
The libavformat and libavcodec libraries that come with ffmpeg are a great way of accessing a large variety of video file formats. Unfortunately, there is no real documentation on using these libraries in your own programs (at least I couldn't find any), and the example programs aren't really very helpful either.
This situation meant that, when I used libavformat/libavcodec on a recent project, it took quite a lot of experimentation to find out how to use them. Here's what I learned - hopefully I'll be able to save others from having to go through the same trial-and-error process. There's also a small demo program that you can download. The code I'll present works with libavformat/libavcodec as included in version 0.4.8 of ffmpeg (the most recent version as I'm writing this). If you find that later versions break the code, please let me know.
In this document, I'll only cover how to read video streams from a file; audio streams work pretty much the same way, but I haven't actually used them, so I can't present any example code.
In case you're wondering why there are two libraries, libavformat and libavcodec: Many video file formats (AVI being a prime example) don't actually specify which codec(s) should be used to encode audio and video data; they merely define how an audio and a video stream (or, potentially, several audio/video streams) should be combined into a single file. This is why sometimes, when you open an AVI file, you get only sound, but no picture - because the right video codec isn't installed on your system. Thus, libavformat deals with parsing video files and separating the streams contained in them, and libavcodec deals with decoding raw audio and video streams.
Opening a Video File
First things first - let's look at how to open a video file and get at the streams contained in it. The first thing we need to do is to initialize libavformat/libavcodec:
av_register_all();
This registers all available file formats and codecs with the library so they will be used automatically when a file with the corresponding format/codec is opened. Note that you only need to call av_register_all() once, so it's probably best to do this somewhere in your startup code. If you like, it's possible to register only certain individual file formats and codecs, but there's usually no reason why you would have to do that.
Next off, opening the file:
AVFormatContext *pFormatCtx;
const char *filename="myvideo.mpg";
// Open video file
if(av_open_input_file(&pFormatCtx, filename, NULL, 0, NULL)!=0)
handle_error(); // Couldn't open file
The last three parameters specify the file format, buffer size and format parameters; by simply specifying NULL or 0 we ask libavformat to auto-detect the format and use a default buffer size. Replace handle_error() with appropriate error handling code for your application.
Next, we need to retrieve information about the streams contained in the file:
// Retrieve stream information
if(av_find_stream_info(pFormatCtx)<0)
handle_error(); // Couldn't find stream information
This fills the streams field of the AVFormatContext with valid information. As a debugging aid, we'll dump this information onto standard error, but of course you don't have to do this in a production application:
dump_format(pFormatCtx, 0, filename, false);
As mentioned in the introduction, we'll handle only video streams, not audio streams. To make things nice and easy, we simply use the first video stream we find:
int i, videoStream;
AVCodecContext *pCodecCtx;
// Find the first video stream
videoStream=-1;
for(i=0; i<pFormatCtx->nb_streams; i++)
if(pFormatCtx->streams->codec.codec_type==CODEC_TYPE_VIDEO)
{
videoStream=i;
break;
}
if(videoStream==-1)
handle_error(); // Didn't find a video stream
// Get a pointer to the codec context for the video stream
pCodecCtx=&pFormatCtx->streams[videoStream]->codec;
OK, so now we've got a pointer to the so-called codec context for our video stream, but we still have to find the actual codec and open it:
AVCodec *pCodec;
// Find the decoder for the video stream
pCodec=avcodec_find_decoder(pCodecCtx->codec_id);
if(pCodec==NULL)
handle_error(); // Codec not found
// Inform the codec that we can handle truncated bitstreams -- i.e.,
// bitstreams where frame boundaries can fall in the middle of packets
if(pCodec->capabilities & CODEC_CAP_TRUNCATED)
pCodecCtx->flags|=CODEC_FLAG_TRUNCATED;
// Open codec
if(avcodec_open(pCodecCtx, pCodec)<0)
handle_error(); // Could not open codec
(So what's up with those "truncated bitstreams"? Well, as we'll see in a moment, the data in a video stream is split up into packets. Since the amount of data per video frame can vary, the boundary between two video frames need not coincide with a packet boundary. Here, we're telling the codec that we can handle this situation.)
One important piece of information that is stored in the AVCodecContext structure is the frame rate of the video. To allow for non-integer frame rates (like NTSC's 29.97 fps), the rate is stored as a fraction, with the numerator in pCodecCtx->frame_rate and the denominator in pCodecCtx->frame_rate_base. While testing the library with different video files, I noticed that some codecs (notably ASF) seem to fill these fields incorrectly (frame_rate_base contains 1 instead of 1000). The following hack fixes this:
// Hack to correct wrong frame rates that seem to be generated by some
// codecs
if(pCodecCtx->frame_rate>1000 && pCodecCtx->frame_rate_base==1)
pCodecCtx->frame_rate_base=1000;
Note that it shouldn't be a problem to leave this fix in place even if the bug is corrected some day - it's unlikely that a video would have a frame rate of more than 1000 fps.
One more thing left to do: Allocate a video frame to store the decoded images in:
AVFrame *pFrame;
pFrame=avcodec_alloc_frame();
That's it! Now let's start decoding some video.
Decoding Video Frames
As I've already mentioned, a video file can contain several audio and video streams, and each of those streams is split up into packets of a particular size. Our job is to read these packets one by one using libavformat, filter out all those that aren't part of the video stream we're interested in, and hand them on to libavcodec for decoding. In doing this, we'll have to take care of the fact that the boundary between two frames can occur in the middle of a packet.
Sound complicated? Lucikly, we can encapsulate this whole process in a routine that simply returns the next video frame:
bool GetNextFrame(AVFormatContext *pFormatCtx, AVCodecContext *pCodecCtx,
int videoStream, AVFrame *pFrame)
{
static AVPacket packet;
static int bytesRemaining=0;
static uint8_t *rawData;
static bool fFirstTime=true;
int bytesDecoded;
int frameFinished;
// First time we're called, set packet.data to NULL to indicate it
// doesn't have to be freed
if(fFirstTime)
{
fFirstTime=false;
packet.data=NULL;
}
// Decode packets until we have decoded a complete frame
while(true)
{
// Work on the current packet until we have decoded all of it
while(bytesRemaining > 0)
{
// Decode the next chunk of data
bytesDecoded=avcodec_decode_video(pCodecCtx, pFrame,
&frameFinished, rawData, bytesRemaining);
// Was there an error?
if(bytesDecoded < 0)
{
fprintf(stderr, "Error while decoding frame/n");
return false;
}
bytesRemaining-=bytesDecoded;
rawData+=bytesDecoded;
// Did we finish the current frame? Then we can return
if(frameFinished)
return true;
}
// Read the next packet, skipping all packets that aren't for this
// stream
do
{
// Free old packet
if(packet.data!=NULL)
av_free_packet(&packet);
// Read new packet
if(av_read_packet(pFormatCtx, &packet)<0)
goto loop_exit;
} while(packet.stream_index!=videoStream);
bytesRemaining=packet.size;
rawData=packet.data;
}
loop_exit:
// Decode the rest of the last frame
bytesDecoded=avcodec_decode_video(pCodecCtx, pFrame, &frameFinished,
rawData, bytesRemaining);
// Free last packet
if(packet.data!=NULL)
av_free_packet(&packet);
return frameFinished!=0;
}
Now, all we have to do is sit in a loop, calling GetNextFrame() until it returns false. Just one more thing to take care of: Most codecs return images in YUV 420 format (one luminance and two chrominance channels, with the chrominance channels samples at half the spatial resolution of the luminance channel). Depending on what you want to do with the video data, you may want to convert this to RGB. (Note, though, that this is not necessary if all you want to do is display the video data; take a look at the X11 Xvideo extension, which does YUV-to-RGB and scaling in hardware.) Fortunately, libavcodec provides a conversion routine called img_convert, which does conversion between YUV and RGB as well as a variety of other image formats. The loop that decodes the video thus becomes:
while(GetNextFrame(pFormatCtx, pCodecCtx, videoStream, pFrame))
{
img_convert((AVPicture *)pFrameRGB, PIX_FMT_RGB24, (AVPicture*)pFrame,
pCodecCtx->pix_fmt, pCodecCtx->width, pCodecCtx->height);
// Process the video frame (save to disk etc.)
DoSomethingWithTheImage(pFrameRGB);
}
The RGB image pFrameRGB (of type AVFrame *) is allocated like this:
AVFrame *pFrameRGB;
int numBytes;
uint8_t *buffer;
// Allocate an AVFrame structure
pFrameRGB=avcodec_alloc_frame();
if(pFrameRGB==NULL)
handle_error();
// Determine required buffer size and allocate buffer
numBytes=avpicture_get_size(PIX_FMT_RGB24, pCodecCtx->width,
pCodecCtx->height);
buffer=new uint8_t[numBytes];
// Assign appropriate parts of buffer to image planes in pFrameRGB
avpicture_fill((AVPicture *)pFrameRGB, buffer, PIX_FMT_RGB24,
pCodecCtx->width, pCodecCtx->height);
Cleaning up
OK, we've read and processed our video, now all that's left for us to do is clean up after ourselves:
// Free the RGB image
delete [] buffer;
av_free(pFrameRGB);
// Free the YUV frame
av_free(pFrame);
// Close the codec
avcodec_close(pCodecCtx);
// Close the video file
av_close_input_file(pFormatCtx);
Done!
Update (April 26, 2005): A reader informs me that to compile the example programs on Kanotix (a Debian derivative) and possibly Debian itself, the include directives for avcodec.h and avformat.h have to be prefixed with "ffmpeg", like this:
#include <ffmpeg/avcodec.h>
#include <ffmpeg/avformat.h>
Also, the library libdts has to be included when compiling the programs, like this:
g++ -o avcodec_sample.0.4.9 avcodec_sample.0.4.9.cpp /
-lavformat -lavcodec -ldts -lz
A few months ago, I wrote an article on using the libavformat and libavcodec libraries that come with ffmpeg. Since then, I have received a number of comments, and a new prerelease version of ffmpeg (0.4.9-pre1) has recently become available, adding support for seeking in video files, new file formats, and a simplified interface for reading video frames. These changes have been in the CVS for a while, but now is the first time we get to see them in a release. (Thanks by the way to Silviu Minut for sharing the results of long hours of studying the CVS versions of ffmpeg - his page with ffmpeg information and a demo program is here.)
In this article, I'll describe only the differences between the previous release (0.4.8) and the new one, so if you're new to libavformat / libavcodec, I suggest you read the original article first.
First, a word about compiling the new release. On my compiler (gcc 3.3.1 on SuSE), I get an internal compiler error while compiling the source file ffv1.c. I suspect this particular version of gcc is a little flaky - I've had the same thing happen to me when compiling OpenCV - but at any rate, a quick fix is to compile this one file without optimizations. The easiest way to do this is to do a make, then when the build hits the compiler error, change to the libavcodec subdirectory (since this is where ffv1.c lives), copy the gcc command to compile ffv1.c from your terminal window, paste it back in, edit out the "-O3" compiler switch and then run gcc using that command. After that, you can change back to the main ffmpeg directory and restart make, and it should complete the build.
What's New?
So what's new? From a programmer's point of view, the biggest change is probably the simplified interface for reading individual video frames from a video file. In ffmpeg 0.4.8 and earlier, data is read from the video file in packets using the routine av_read_packet(). Usually, the information for one video frame is spread out over several packets, and the situation is made even more complicated by the fact that the boundary between two video frames can come in the middle of two packets. Thankfully, ffmpeg 0.4.9 introduces a new routine called av_read_frame(), which returns all of the data for a video frame in a single packet. The old way of reading video data using av_read_packet() is still supported but deprecated - I say: good riddance.
So let's take a look at how to access video data using the new API. In my original article (with the old 0.4.8 API), the main decode loop looked like this:
while(GetNextFrame(pFormatCtx, pCodecCtx, videoStream, pFrame))
{
img_convert((AVPicture *)pFrameRGB, PIX_FMT_RGB24, (AVPicture*)pFrame,
pCodecCtx->pix_fmt, pCodecCtx->width, pCodecCtx->height);
// Process the video frame (save to disk etc.)
DoSomethingWithTheImage(pFrameRGB);
}
GetNextFrame() is a helper routine that handles the process of assembling all of the packets that make up one video frame. The new API simplifies things to the point that we can do the actual reading and decoding of data directly in our main loop:
while(av_read_frame(pFormatCtx, &packet)>=0)
{
// Is this a packet from the video stream?
if(packet.stream_index==videoStream)
{
// Decode video frame
avcodec_decode_video(pCodecCtx, pFrame, &frameFinished,
packet.data, packet.size);
// Did we get a video frame?
if(frameFinished)
{
// Convert the image from its native format to RGB
img_convert((AVPicture *)pFrameRGB, PIX_FMT_RGB24,
(AVPicture*)pFrame, pCodecCtx->pix_fmt, pCodecCtx->width,
pCodecCtx->height);
// Process the video frame (save to disk etc.)
DoSomethingWithTheImage(pFrameRGB);
}
}
// Free the packet that was allocated by av_read_frame
av_free_packet(&packet);
}
At first sight, it looks as if things have actually gotten more complex - but that is just because this piece code does things that used to be hidden in the GetNextFrame() routine (checking if the packet belongs to the video stream, decoding the frame and freeing the packet). Overall, because we can eliminate GetNextFrame() completely, things have gotten a lot easier.
I've updated the demo program to use the new API. Simply comparing the number of lines (222 lines for the old version vs. 169 lines for the new one) shows that the new API has simplified things considerably.
Another important addition in the 0.4.9 release is the ability to seek to a certain timestamp in a video file. This is accomplished using the av_seek_frame() function, which takes three parameters: A pointer to the AVFormatContext, a stream index and the timestamp to seek to. The function will then seek to the first key frame before the given timestamp. All of this is from the documentation - I haven't gotten round to actually testing av_seek_frame() yet, so I can't present any sample code either. If you've used av_seek_frame() successfully, I'd be glad to hear about it.
Frame Grabbing (Video4Linux and IEEE1394)
Toru Tamaki sent me some sample code that demonstrates how to grab frames from a Video4Linux or IEEE1394 video source using libavformat / libavcodec. For Video4Linux, the call to av_open_input_file() should be modified as follows:
AVFormatParameters formatParams;
AVInputFormat *iformat;
formatParams.device = "/dev/video0";
formatParams.channel = 0;
formatParams.standard = "ntsc";
formatParams.width = 640;
formatParams.height = 480;
formatParams.frame_rate = 29;
formatParams.frame_rate_base = 1;
filename = "";
iformat = av_find_input_format("video4linux");
av_open_input_file(&ffmpegFormatContext,
filename, iformat, 0, &formatParams);
For IEEE1394, call av_open_input_file() like this:
AVFormatParameters formatParams;
AVInputFormat *iformat;
formatParams.device = "/dev/dv1394";
filename = "";
iformat = av_find_input_format("dv1394");
av_open_input_file(&ffmpegFormatContext,
filename, iformat, 0, &formatParams);
To be continued...
If I come across additional interesting information about libavformat / libavcodec, I plan to publish it here. So, if you have any comments, please contact me at the address given at the top of this article.
Standard disclaimer: I assume no liability for the correct functioning of the code and techniques presented in this article.