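FFmpeg: an open-source, cross-platform solution for video and audio streams. It provides a complete toolkit for recording, converting, and streaming audio and video, and includes libavcodec, an advanced audio/video codec library. FFmpeg exposes many APIs for us to use, but some problems, such as audio/video synchronization, are left for us to solve ourselves.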
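Some of the programs produced by building FFmpeg:
ffplay: an actual media player, comparable to VLC or MPlayer, that displays video in its own window.
ffmpeg: a command-line tool built on the APIs FFmpeg provides plus additional logic; it handles transcoding and similar tasks.
ffserver: a streaming server that can unicast or multicast streams.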
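The general process for handling audio/video:
1. Open the video stream from the video file (demuxing).
2. Read packets from the video stream and decode them into frames (decoding).
3. If the frame is not yet complete, go back to step 2.
4. Process the frame.
5. Go back to step 2.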
I. Opening the file and getting the video stream (the following uses ffmpeg-0.8)
<1> Call av_register_all() to register all file formats and codecs. It only needs to be called once, so the best place for it is main().
<2> Open the video file with av_open_input_file. This function reads the file's header information and stores it in an AVFormatContext. It is declared in avformat.h; its definition is:
int av_open_input_file(AVFormatContext **ic_ptr, const char *filename,
                       AVInputFormat *fmt,
                       int buf_size,
                       AVFormatParameters *ap)
{
    int err;
    AVDictionary *opts = convert_format_parameters(ap);

    if (!ap || !ap->prealloced_context)
        *ic_ptr = NULL;

    err = avformat_open_input(ic_ptr, filename, fmt, &opts);

    av_dict_free(&opts);
    return err;
}
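The definition lives in utils.c. If the last three parameters (fmt, buf_size, ap) are NULL or 0, libavformat detects these settings automatically. As the code shows, the function is a thin wrapper: it converts the legacy AVFormatParameters into an AVDictionary and does the actual work by calling avformat_open_input: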
int avformat_open_input(AVFormatContext **ps, const char *filename,
                        AVInputFormat *fmt, AVDictionary **options)
{
    return avformat_open_input_header(ps, filename, fmt, options, NULL);
}
avformat_open_input_header reads the file's header information and stores it in the AVFormatContext.
<3> Using the header information, obtain information about the audio and video streams by calling av_find_stream_info (declared in avformat.h):
int av_find_stream_info(AVFormatContext *ic)
{
    int i, count, ret, read_size, j;
    AVStream *st;
    AVPacket pkt1, *pkt;
    int64_t old_offset = avio_tell(ic->pb);

    for(i=0;i<ic->nb_streams;i++) {
        AVCodec *codec;
        st = ic->streams[i];
        /* st->codec is an AVCodecContext, which holds the information
           about the codec used by the stream */
        if (st->codec->codec_id == CODEC_ID_AAC) {
            st->codec->sample_rate = 0;
            st->codec->frame_size = 0;
            st->codec->channels = 0;
        }
        if (st->codec->codec_type == AVMEDIA_TYPE_VIDEO ||
            st->codec->codec_type == AVMEDIA_TYPE_SUBTITLE) {
/*            if(!st->time_base.num)
                st->time_base= */
            if(!st->codec->time_base.num)
                /* time_base is an AVRational (a num/den fraction) giving the unit
                   of time; for fixed-rate video it is the inverse of the frame rate.
                   Many codecs use non-integer frame rates, e.g. NTSC uses
                   29.97 fps (30000/1001) */
                st->codec->time_base= st->time_base;
        }
        //only for the split stuff
        if (!st->parser && !(ic->flags & AVFMT_FLAG_NOPARSE)) {
            st->parser = av_parser_init(st->codec->codec_id);
            if(st->need_parsing == AVSTREAM_PARSE_HEADERS && st->parser){
                st->parser->flags |= PARSER_FLAG_COMPLETE_FRAMES;
            }
        }
        assert(!st->codec->codec);
        /* find the matching decoder */
        codec = avcodec_find_decoder(st->codec->codec_id);

        /* Force decoding of at least one frame of codec data
         * this makes sure the codec initializes the channel configuration
         * and does not trust the values from the container.
         */
        if (codec && codec->capabilities & CODEC_CAP_CHANNEL_CONF)
            st->codec->channels = 0;

        /* Ensure that subtitle_header is properly set. */
        if (st->codec->codec_type == AVMEDIA_TYPE_SUBTITLE
            && codec && !st->codec->codec)
            // open the decoder
            avcodec_open(st->codec, codec);

        //try to just open decoders, in case this is enough to get parameters
        if(!has_codec_parameters(st->codec)){
            if (codec && !st->codec->codec)
                avcodec_open(st->codec, codec);
        }
    }

    for (i=0; i<ic->nb_streams; i++) {
        ic->streams[i]->info->last_dts = AV_NOPTS_VALUE;
    }

    count = 0;
    read_size = 0;
    for(;;) {
        if(url_interrupt_cb()){
            ret= AVERROR_EXIT;
            av_log(ic, AV_LOG_DEBUG, "interrupted\n");
            break;
        }

        /* check if one codec still needs to be handled */
        for(i=0;i<ic->nb_streams;i++) {
            int fps_analyze_framecount = 20;

            st = ic->streams[i];
            if (!has_codec_parameters(st->codec))
                break;
            /* if the timebase is coarse (like the usual millisecond precision
               of mkv), we need to analyze more frames to reliably arrive at
               the correct fps */
            if (av_q2d(st->time_base) > 0.0005)
                fps_analyze_framecount *= 2;
            if (ic->fps_probe_size >= 0)
                fps_analyze_framecount = ic->fps_probe_size;
            /* variable fps and no guess at the real fps */
            if(   tb_unreliable(st->codec) && !(st->r_frame_rate.num && st->avg_frame_rate.num)
               && st->info->duration_count < fps_analyze_framecount
               && st->codec->codec_type == AVMEDIA_TYPE_VIDEO)
                break;
            if(st->parser && st->parser->parser->split && !st->codec->extradata)
                break;
            if(st->first_dts == AV_NOPTS_VALUE)
                break;
        }
        if (i == ic->nb_streams) {
            /* NOTE: if the format has no header, then we need to read
               some packets to get most of the streams, so we cannot
               stop here */
            if (!(ic->ctx_flags & AVFMTCTX_NOHEADER)) {
                /* if we found the info for all the codecs, we can stop */
                ret = count;
                av_log(ic, AV_LOG_DEBUG, "All info found\n");
                break;
            }
        }
        /* we did not get all the codec info, but we read too much data */
        if (read_size >= ic->probesize) {
            ret = count;
            av_log(ic, AV_LOG_DEBUG, "Probe buffer size limit %d reached\n", ic->probesize);
            break;
        }

        /* NOTE: a new stream can be added there if no header in file
           (AVFMTCTX_NOHEADER) */
        ret = av_read_frame_internal(ic, &pkt1);
        if (ret < 0 && ret != AVERROR(EAGAIN)) {
            /* EOF or error */
            ret = -1; /* we could not have all the codec parameters before EOF */
            for(i=0;i<ic->nb_streams;i++) {
                st = ic->streams[i];
                if (!has_codec_parameters(st->codec)){
                    char buf[256];
                    avcodec_string(buf, sizeof(buf), st->codec, 0);
                    av_log(ic, AV_LOG_WARNING, "Could not find codec parameters (%s)\n", buf);
                } else {
                    ret = 0;
                }
            }
            break;
        }

        if (ret == AVERROR(EAGAIN))
            continue;

        pkt= add_to_pktbuf(&ic->packet_buffer, &pkt1, &ic->packet_buffer_end);
        if ((ret = av_dup_packet(pkt)) < 0)
            goto find_stream_info_err;

        read_size += pkt->size;

        st = ic->streams[pkt->stream_index];
        if (st->codec_info_nb_frames>1) {
            int64_t t;
            if (st->time_base.den > 0 && (t=av_rescale_q(st->info->codec_info_duration, st->time_base, AV_TIME_BASE_Q)) >= ic->max_analyze_duration) {
                av_log(ic, AV_LOG_WARNING, "max_analyze_duration %d reached at %"PRId64"\n", ic->max_analyze_duration, t);
                break;
            }
            st->info->codec_info_duration += pkt->duration;
        }
        {
            int64_t last = st->info->last_dts;
            int64_t duration= pkt->dts - last;

            if(pkt->dts != AV_NOPTS_VALUE && last != AV_NOPTS_VALUE && duration>0){
                double dur= duration * av_q2d(st->time_base);

//                if(st->codec->codec_type == AVMEDIA_TYPE_VIDEO)
//                    av_log(NULL, AV_LOG_ERROR, "%f\n", dur);
                if (st->info->duration_count < 2)
                    memset(st->info->duration_error, 0, sizeof(st->info->duration_error));
                for (i=1; i<FF_ARRAY_ELEMS(st->info->duration_error); i++) {
                    int framerate= get_std_framerate(i);
                    int ticks= lrintf(dur*framerate/(1001*12));
                    double error= dur - ticks*1001*12/(double)framerate;
                    st->info->duration_error[i] += error*error;
                }
                st->info->duration_count++;
                // ignore the first 4 values, they might have some random jitter
                if (st->info->duration_count > 3)
                    st->info->duration_gcd = av_gcd(st->info->duration_gcd, duration);
            }
            if (last == AV_NOPTS_VALUE || st->info->duration_count <= 1)
                st->info->last_dts = pkt->dts;
        }
        if(st->parser && st->parser->parser->split && !st->codec->extradata){
            int i= st->parser->parser->split(st->codec, pkt->data, pkt->size);
            if(i){
                st->codec->extradata_size= i;
                st->codec->extradata= av_malloc(st->codec->extradata_size + FF_INPUT_BUFFER_PADDING_SIZE);
                memcpy(st->codec->extradata, pkt->data, st->codec->extradata_size);
                memset(st->codec->extradata + i, 0, FF_INPUT_BUFFER_PADDING_SIZE);
            }
        }

        /* if still no information, we try to open the codec and to
           decompress the frame. We try to avoid that in most cases as
           it takes longer and uses more memory. For MPEG-4, we need to
           decompress for QuickTime. */
        if (!has_codec_parameters(st->codec) || !has_decode_delay_been_guessed(st))
            try_decode_frame(st, pkt);

        st->codec_info_nb_frames++;
        count++;
    }

    // close codecs which were opened in try_decode_frame()
    for(i=0;i<ic->nb_streams;i++) {
        st = ic->streams[i];
        if(st->codec->codec)
            avcodec_close(st->codec);
    }
    for(i=0;i<ic->nb_streams;i++) {
        st = ic->streams[i];
        if (st->codec_info_nb_frames>2 && !st->avg_frame_rate.num && st->info->codec_info_duration)
            av_reduce(&st->avg_frame_rate.num, &st->avg_frame_rate.den,
                      (st->codec_info_nb_frames-2)*(int64_t)st->time_base.den,
                      st->info->codec_info_duration*(int64_t)st->time_base.num, 60000);
        if (st->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
            if(st->codec->codec_id == CODEC_ID_RAWVIDEO && !st->codec->codec_tag && !st->codec->bits_per_coded_sample){
                uint32_t tag= avcodec_pix_fmt_to_codec_tag(st->codec->pix_fmt);
                if(ff_find_pix_fmt(ff_raw_pix_fmt_tags, tag) == st->codec->pix_fmt)
                    st->codec->codec_tag= tag;
            }

            // the check for tb_unreliable() is not completely correct, since this is not about handling
            // a unreliable/inexact time base, but a time base that is finer than necessary, as e.g.
            // ipmovie.c produces.
            if (tb_unreliable(st->codec) && st->info->duration_count > 15 && st->info->duration_gcd > FFMAX(1, st->time_base.den/(500LL*st->time_base.num)) && !st->r_frame_rate.num)
                av_reduce(&st->r_frame_rate.num, &st->r_frame_rate.den, st->time_base.den, st->time_base.num * st->info->duration_gcd, INT_MAX);
            if (st->info->duration_count && !st->r_frame_rate.num
               && tb_unreliable(st->codec) /*&&
               //FIXME we should not special-case MPEG-2, but this needs testing with non-MPEG-2 ...
               st->time_base.num*duration_sum[i]/st->info->duration_count*101LL > st->time_base.den*/){
                int num = 0;
                double best_error= 2*av_q2d(st->time_base);
                best_error = best_error*best_error*st->info->duration_count*1000*12*30;

                for (j=1; j<FF_ARRAY_ELEMS(st->info->duration_error); j++) {
                    double error = st->info->duration_error[j] * get_std_framerate(j);
//                    if(st->codec->codec_type == AVMEDIA_TYPE_VIDEO)
//                        av_log(NULL, AV_LOG_ERROR, "%f %f\n", get_std_framerate(j) / 12.0/1001, error);
                    if(error < best_error){
                        best_error= error;
                        num = get_std_framerate(j);
                    }
                }
                // do not increase frame rate by more than 1 % in order to match a standard rate.
                if (num && (!st->r_frame_rate.num || (double)num/(12*1001) < 1.01 * av_q2d(st->r_frame_rate)))
                    av_reduce(&st->r_frame_rate.num, &st->r_frame_rate.den, num, 12*1001, INT_MAX);
            }

            if (!st->r_frame_rate.num){
                if(    st->codec->time_base.den * (int64_t)st->time_base.num
                    <= st->codec->time_base.num * st->codec->ticks_per_frame * (int64_t)st->time_base.den){
                    st->r_frame_rate.num = st->codec->time_base.den;
                    st->r_frame_rate.den = st->codec->time_base.num * st->codec->ticks_per_frame;
                }else{
                    st->r_frame_rate.num = st->time_base.den;
                    st->r_frame_rate.den = st->time_base.num;
                }
            }
        }else if(st->codec->codec_type == AVMEDIA_TYPE_AUDIO) {
            if(!st->codec->bits_per_coded_sample)
                st->codec->bits_per_coded_sample= av_get_bits_per_sample(st->codec->codec_id);
            // set stream disposition based on audio service type
            switch (st->codec->audio_service_type) {
            case AV_AUDIO_SERVICE_TYPE_EFFECTS:
                st->disposition = AV_DISPOSITION_CLEAN_EFFECTS;    break;
            case AV_AUDIO_SERVICE_TYPE_VISUALLY_IMPAIRED:
                st->disposition = AV_DISPOSITION_VISUAL_IMPAIRED;  break;
            case AV_AUDIO_SERVICE_TYPE_HEARING_IMPAIRED:
                st->disposition = AV_DISPOSITION_HEARING_IMPAIRED; break;
            case AV_AUDIO_SERVICE_TYPE_COMMENTARY:
                st->disposition = AV_DISPOSITION_COMMENT;          break;
            case AV_AUDIO_SERVICE_TYPE_KARAOKE:
                st->disposition = AV_DISPOSITION_KARAOKE;          break;
            }
        }
    }

    av_estimate_timings(ic, old_offset);

    compute_chapters_end(ic);

#if 0
    /* correct DTS for B-frame streams with no timestamps */
    for(i=0;i<ic->nb_streams;i++) {
        st = ic->streams[i];
        if (st->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
            if(b-frames){
                ppktl = &ic->packet_buffer;
                while(ppkt1){
                    if(ppkt1->stream_index != i)
                        continue;
                    if(ppkt1->pkt->dts < 0)
                        break;
                    if(ppkt1->pkt->pts != AV_NOPTS_VALUE)
                        break;
                    ppkt1->pkt->dts -= delta;
                    ppkt1= ppkt1->next;
                }
                if(ppkt1)
                    continue;
                st->cur_dts -= delta;
            }
        }
    }
#endif

 find_stream_info_err:
    for (i=0; i < ic->nb_streams; i++)
        av_freep(&ic->streams[i]->info);
    return ret;
}
II. Reading packets and storing their data in frames
<1> Allocate memory for the destination frame with avcodec_alloc_frame() (declared in avcodec.h):
AVFrame *avcodec_alloc_frame(void);

/* ffmpeg/libavcodec/utils.c */
AVFrame *avcodec_alloc_frame(void){
    AVFrame *pic= av_malloc(sizeof(AVFrame));

    if(pic==NULL) return NULL;

    avcodec_get_frame_defaults(pic);

    return pic;
}
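<2> Read the video stream packet by packet and decode the packets into frames. The main function here is av_read_frame(). Note that av_read_packet is obsolete; the FFmpeg 0.8 headers say so explicitly. The prototype of av_read_frame is: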
/**
 * Return the next frame of a stream.
 * This function returns what is stored in the file, and does not validate
 * that what is there are valid frames for the decoder. It will split what is
 * stored in the file into frames and return one for each call. It will not
 * omit invalid data between valid frames so as to give the decoder the maximum
 * information possible for decoding.
 *
 * The returned packet is valid
 * until the next av_read_frame() or until av_close_input_file() and
 * must be freed with av_free_packet. For video, the packet contains
 * exactly one frame. For audio, it contains an integer number of
 * frames if each frame has a known fixed size (e.g. PCM or ADPCM
 * data). If the audio frames have a variable size (e.g. MPEG audio),
 * then it contains one frame.
 *
 * pkt->pts, pkt->dts and pkt->duration are always set to correct
 * values in AVStream.time_base units (and guessed if the format cannot
 * provide them). pkt->pts can be AV_NOPTS_VALUE if the video format
 * has B-frames, so it is better to rely on pkt->dts if you do not
 * decompress the payload.
 *
 * @return 0 if OK, < 0 on error or end of file
 */
int av_read_frame(AVFormatContext *s, AVPacket *pkt);
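av_read_frame is normally called in a while loop. Each call reads one packet and stores it in an AVPacket, and the packet is then turned into a frame with avcodec_decode_video2(). The older avcodec_decode_video() is deprecated, as ffmpeg/doc/APIChanges explains: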
2009-04-07 - r18351 - lavc 52.23.0 - avcodec_decode_video/audio/subtitle The old decoding functions are deprecated, all new code should use the new functions avcodec_decode_video2(), avcodec_decode_audio3() and avcodec_decode_subtitle2(). These new functions take an AVPacket *pkt argument instead of a const uint8_t *buf / int buf_size pair.
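Decoding a single packet does not necessarily give us a complete frame. avcodec_decode_video2 therefore reports through its got_picture argument (an int pointer) whether a frame was produced: it is set to nonzero once a complete picture has been decoded. When that happens, we have the frame we need and can process it however we like; otherwise we read the next packet and decode again.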