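I have recently been building a video analytics product. The basic architecture uses ffmpeg to pull the stream, CUDA to decode it, and then an algorithm to analyze the frames and generate images. Once the product was finished, however, the generated images turned out to contain corrupted frames. At first I did not pay much attention: the RTSP stream is carried over UDP underneath, so losing a frame or two and getting corrupted pictures seemed perfectly normal (although I overlooked the fact that this was on a LAN). Besides, stream pulling and decoding were already separated with one level of buffering between them, so there was little room left for optimization, and with a tight schedule I could not spare the time to fix it.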
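But a bug I found soon afterwards forced me to take the problem seriously: multiple instances analyzing the same video stream ended up detecting different target objects, and the differences were large, as shown below: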
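The cause is obvious: each instance dropped different data, so each one analyzed different content and therefore produced different detection results.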
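How to fix it? The first idea was to enlarge the buffer. Because local history files needed flow control, each instance had only a 2-second buffer. So I enlarged it again and again, from 2 seconds up to 50 seconds, yet frames were still dropped and the following problem appeared: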
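It was frustrating. So I started searching Baidu for “ffmpeg取流 丢帧” (ffmpeg stream pulling, dropped frames), and found that most of the suggested solutions switch the receive protocol to TCP and then enlarge the socket receive buffer (I had forgotten that whenever I wrote my own code to send or receive video or images, I would first call setsockopt to enlarge the socket buffer). The code is as follows: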
AVDictionary* options = NULL;
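av_dict_set(&options, "rtsp_transport", "tcp", 0); // force TCP; UDP at 1080p drops packets and corrupts frames
av_dict_set(&options, "max_delay", "5000000", 0); // maximum demuxing delay in microseconds (5 s); key and value must not contain leading spaces
av_dict_set(&options, "buffer_size", "8388608", 0); // requested socket receive buffer size in bytes (8 MB)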
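Considering that TCP latency can become very large on a poor network (in that situation the only option is to receive over UDP and accept packet loss), I stayed with UDP, that is, I did not set the rtsp_transport field. After this change the problem was solved, but ffmpeg now emitted the following log: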
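What does “attempted to set receive buffer to size 8388608 but it only ended up set as 425984” mean, and could it hide a problem? I decided to read the ffmpeg source to find the root cause, but did not know where to start. I first looked at av_dict_set (dict.c) and found no trace of buffer_size there, nor in options_table.h. I then searched Baidu for the code of avformat_open_input, which led nowhere, and I was close to giving up. Finally, searching for “ffmpeg buffer_size 最大值” (ffmpeg buffer_size maximum) turned up a clue: the post "Set RTSP/UDP buffer size in FFmpeg/LibAV" suggested that the relevant code might be in udp.c under the libavformat directory.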
So I opened udp.c, and the relevant code was indeed in that file:
static const AVOption options[] = {
{ "buffer_size", "System data size (in bytes)", OFFSET(buffer_size), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, .flags = D|E },
{ "bitrate", "Bits to send per second", OFFSET(bitrate), AV_OPT_TYPE_INT64, { .i64 = 0 }, 0, INT64_MAX, .flags = E },
{ "burst_bits", "Max length of bursts in bits (when using bitrate)", OFFSET(burst_bits), AV_OPT_TYPE_INT64, { .i64 = 0 }, 0, INT64_MAX, .flags = E },
{ "localport", "Local port", OFFSET(local_port), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, D|E },
{ "local_port", "Local port", OFFSET(local_port), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, .flags = D|E },
{ "localaddr", "Local address", OFFSET(localaddr), AV_OPT_TYPE_STRING, { .str = NULL }, .flags = D|E },
{ "udplite_coverage", "choose UDPLite head size which should be validated by checksum", OFFSET(udplite_coverage), AV_OPT_TYPE_INT, {.i64 = 0}, 0, INT_MAX, D|E },
{ "pkt_size", "Maximum UDP packet size", OFFSET(pkt_size), AV_OPT_TYPE_INT, { .i64 = 1472 }, -1, INT_MAX, .flags = D|E },
{ "reuse", "explicitly allow reusing UDP sockets", OFFSET(reuse_socket), AV_OPT_TYPE_BOOL, { .i64 = -1 }, -1, 1, D|E },
{ "reuse_socket", "explicitly allow reusing UDP sockets", OFFSET(reuse_socket), AV_OPT_TYPE_BOOL, { .i64 = -1 }, -1, 1, .flags = D|E },
{ "broadcast", "explicitly allow or disallow broadcast destination", OFFSET(is_broadcast), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, E },
{ "ttl", "Time to live (multicast only)", OFFSET(ttl), AV_OPT_TYPE_INT, { .i64 = 16 }, 0, INT_MAX, E },
{ "connect", "set if connect() should be called on socket", OFFSET(is_connected), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, .flags = D|E },
{ "fifo_size", "set the UDP receiving circular buffer size, expressed as a number of packets with size of 188 bytes", OFFSET(circular_buffer_size), AV_OPT_TYPE_INT, {.i64 = 7*4096}, 0, INT_MAX, D },
{ "overrun_nonfatal", "survive in case of UDP receiving circular buffer overrun", OFFSET(overrun_nonfatal), AV_OPT_TYPE_BOOL, {.i64 = 0}, 0, 1, D },
{ "timeout", "set raise error timeout (only in read mode)", OFFSET(timeout), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, D },
{ "sources", "Source list", OFFSET(sources), AV_OPT_TYPE_STRING, { .str = NULL }, .flags = D|E },
{ "block", "Block list", OFFSET(block), AV_OPT_TYPE_STRING, { .str = NULL }, .flags = D|E },
{ NULL }
};
The code that prints the log message is in the same file:
if (is_output) {
/* limit the tx buf size to limit latency */
tmp = s->buffer_size;
if (setsockopt(udp_fd, SOL_SOCKET, SO_SNDBUF, &tmp, sizeof(tmp)) < 0) {
log_net_error(h, AV_LOG_ERROR, "setsockopt(SO_SNDBUF)");
goto fail;
}
} else {
/* set udp recv buffer size to the requested value (default 64K) */
tmp = s->buffer_size;
if (setsockopt(udp_fd, SOL_SOCKET, SO_RCVBUF, &tmp, sizeof(tmp)) < 0) {
log_net_error(h, AV_LOG_WARNING, "setsockopt(SO_RECVBUF)");
}
len = sizeof(tmp);
if (getsockopt(udp_fd, SOL_SOCKET, SO_RCVBUF, &tmp, &len) < 0) {
log_net_error(h, AV_LOG_WARNING, "getsockopt(SO_RCVBUF)");
} else {
av_log(h, AV_LOG_DEBUG, "end receive buffer size reported is %d\n", tmp);
if(tmp < s->buffer_size)
av_log(h, AV_LOG_WARNING, "attempted to set receive buffer to size %d but it only ended up set as %d", s->buffer_size, tmp);
}
/* make the socket non-blocking */
ff_socket_nonblock(udp_fd, 1);
}
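Clearly, ffmpeg first sets the socket receive buffer with setsockopt and then reads it back with getsockopt to confirm whether the setting took effect. The value read back was smaller than the requested value, hence the warning. For the details of why the request is not honored, see the article 《socket tcp缓冲区大小的默认值、最大值》. Put simply, the requested value exceeds the upper limit the system allows (net.core.rmem_max). On Ubuntu 16.04 that maximum is 208 KB (212992 bytes), which the kernel doubles to 416 KB, i.e. 425984 bytes.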
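The set-then-read-back check can be reproduced outside ffmpeg with a few lines of plain socket code. The following is a minimal sketch, assuming Linux behavior (the request is clamped to rmem_max and the kernel doubles it for bookkeeping overhead). The function names `probe_udp_rcvbuf` and `expected_reported_rcvbuf` are my own for illustration and are not part of ffmpeg, and the kernel's lower bound on the buffer size is ignored.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Assumption (Linux): the kernel clamps the request to net.core.rmem_max
 * and then doubles it, so getsockopt reports 2 * min(requested, rmem_max). */
int expected_reported_rcvbuf(int requested, int rmem_max)
{
    int clamped = requested < rmem_max ? requested : rmem_max;
    return 2 * clamped;
}

/* Request a UDP receive buffer of `requested` bytes and return the size
 * the kernel reports afterwards, or -1 if any socket call fails. */
int probe_udp_rcvbuf(int requested)
{
    int granted = -1;
    socklen_t len = sizeof(granted);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        granted = -1;
    close(fd);
    return granted;
}
```

With the numbers from the warning, `expected_reported_rcvbuf(8388608, 212992)` gives 425984, which matches the value ffmpeg logged. Calling `probe_udp_rcvbuf(8388608)` on the target machine shows what that system actually grants, which depends on its own rmem_max setting.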
I still felt that a 50-second buffer per instance was far too large, so I reduced it to 5 seconds, but then a problem appeared again:
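A Baidu search showed that the “jitter buffer” is a de-jitter buffer, so evidently some part of my program was not keeping up. I enlarged the buffer to 6, 7, 8 ... 20 seconds with the same result, and in frustration set it back to 50. This approach, however, only mitigates “jitter buffer full” to some extent. To eliminate it completely, reorder_queue_size (the size of the RTP receive reordering queue) has to be set through av_dict_set. Its default is 500, and it can be tuned to the situation to resolve “jitter buffer full”.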
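To see why a larger reordering queue helps, here is a toy model of a bounded reorder queue. It is only a sketch of the general technique, not ffmpeg's implementation: `ReorderQueue`, `rq_push` and `QUEUE_CAP` are hypothetical names, and the capacity is set to 4 instead of 500 so the effect is easy to trace. Packets that arrive out of order wait in the queue until the gap before them is filled. When the queue reaches capacity the gap is given up and the queue is flushed, which corresponds to the "jitter buffer full" situation.

```c
#include <stdint.h>
#include <string.h>

#define QUEUE_CAP 4 /* hypothetical tiny capacity for illustration */

typedef struct {
    uint16_t seq[QUEUE_CAP]; /* waiting sequence numbers, sorted ascending */
    int len;                 /* number of waiting packets */
    uint16_t next_seq;       /* next sequence number to deliver in order */
    int forced_flushes;      /* times the queue filled up and a gap was skipped */
} ReorderQueue;

/* Signed distance between 16-bit sequence numbers, tolerant of wraparound. */
static int16_t seq_diff(uint16_t a, uint16_t b)
{
    return (int16_t)(uint16_t)(a - b);
}

/* Deliver the oldest waiting packet and advance the expected number. */
static void deliver_head(ReorderQueue *q)
{
    q->next_seq = (uint16_t)(q->seq[0] + 1);
    memmove(&q->seq[0], &q->seq[1], (size_t)(q->len - 1) * sizeof(q->seq[0]));
    q->len--;
}

/* Deliver every waiting packet that is now in order. */
static void drain_in_order(ReorderQueue *q)
{
    while (q->len > 0 && q->seq[0] == q->next_seq)
        deliver_head(q);
}

void rq_push(ReorderQueue *q, uint16_t seq)
{
    if (seq_diff(seq, q->next_seq) < 0)
        return; /* too late or already delivered: drop */
    if (seq == q->next_seq) {
        q->next_seq++; /* in order: deliver immediately */
    } else {
        int i = q->len;
        while (i > 0 && seq_diff(q->seq[i - 1], seq) > 0)
            i--; /* find the sorted insert position */
        if (i > 0 && q->seq[i - 1] == seq)
            return; /* duplicate of a waiting packet */
        memmove(&q->seq[i + 1], &q->seq[i], (size_t)(q->len - i) * sizeof(q->seq[0]));
        q->seq[i] = seq;
        q->len++;
    }
    drain_in_order(q);
    if (q->len >= QUEUE_CAP) { /* queue full: stop waiting for the gap */
        q->forced_flushes++;
        deliver_head(q);
        drain_in_order(q);
    }
}
```

Starting from a zero-initialized queue, pushing 0, 2, 1 delivers all three packets in order with no forced flush. Pushing 0 and then 2, 3, 4, 5 (packet 1 lost) fills the four slots and forces one flush. With a larger capacity the queue would keep waiting longer before giving up, which is the trade-off reorder_queue_size controls.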