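I tried building ffmpeg with cuvid hardware decoding, but the decoded frames come out in the YUV420 pixel format. OpenCV needs BGR, and the color-space conversion step uses too much CPU.
So the color-space conversion also has to run on the GPU. NVIDIA has open-sourced a video processing framework for Python, VideoProcessingFramework (VPF). It gives developers a simple but powerful Python tool for hardware-accelerated video encoding, decoding, and processing tasks.
Because the Python bindings sit on top of C++ code, developers can reach high GPU utilization in a few dozen lines of code. Decoded video frames are exposed as NumPy arrays or CUDA device pointers, which simplifies interaction and extension.
At present, VPF adds no restrictions on top of the NVIDIA Video Codec SDK, so developers can make full use of the capabilities of NVIDIA professional-grade GPUs.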
For an overview, see "VPF: an open-source video processing framework for Python that accelerates video tasks and improves GPU utilization".
In addition, VPF supports exporting GPU memory objects such as decoded video frames to PyTorch tensors without Host to Device copies.
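To give a feel for why the YUV to BGR step is expensive on the CPU, here is a minimal sketch of the per-pixel arithmetic. It assumes 8-bit limited-range ("tv") BT.709 input, which is what ffmpeg reports for my stream further down (yuv420p(tv, bt709)); the function name is my own and this is not how OpenCV or VPF implement the conversion.

```python
def yuv_to_bgr_bt709_limited(y: int, u: int, v: int) -> tuple:
    """Convert one 8-bit limited-range BT.709 YUV sample to 8-bit BGR."""
    kr, kb = 0.2126, 0.0722            # BT.709 luma coefficients
    kg = 1.0 - kr - kb

    # Undo the limited-range offsets and scales (Y: 16..235, U/V: 16..240).
    yn = (y - 16) / 219.0
    cb = (u - 128) / 224.0
    cr = (v - 128) / 224.0

    r = yn + 2.0 * (1.0 - kr) * cr
    b = yn + 2.0 * (1.0 - kb) * cb
    # Luma is defined as kr*R + kg*G + kb*B, so solve that for G.
    g = (yn - kr * r - kb * b) / kg

    def to_u8(x: float) -> int:
        # Scale to 0..255 and clamp out-of-range values.
        return max(0, min(255, round(x * 255.0)))

    return to_u8(b), to_u8(g), to_u8(r)
```

A 1920x1080 stream at 60 fps needs this for roughly 124 million pixels per second, which is the kind of uniform, parallel work a GPU handles well.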
For installation, see "Installing NVIDIA VideoProcessingFramework (VPF) on Ubuntu".
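② Download the NVIDIA Video Codec SDK and unzip it. Downloading from the official site requires registration.
Install the NVIDIA Video Codec SDK version that matches your NVIDIA driver version.
Mine is Linux driver 470.86, so I downloaded Video Codec SDK 11.1.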
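If you want to script the driver check, something like the sketch below could work. The helper names are mine, and the minimum driver value is an assumption: take the real number from the release notes of the SDK version you download.

```python
import subprocess

# ASSUMPTION: minimum Linux driver for Video Codec SDK 11.1.
# Verify this against the SDK release notes before relying on it.
MIN_DRIVER_FOR_SDK_11_1 = "470.57.02"


def parse_version(text: str) -> tuple:
    """Turn '470.57.02' into (470, 57, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed: str, minimum: str) -> bool:
    """Return True if the installed driver version is >= the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True).stdout
    return out.splitlines()[0].strip()
```

On a machine with the driver installed, driver_at_least(installed_driver_version(), MIN_DRIVER_FOR_SDK_11_1) tells you whether that SDK version is an option.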
unzip Video_Codec_SDK.zip
cd Video_Codec_SDK
$ sudo cp Interface/* /usr/local/cuda/include
$ sudo cp Lib/linux/stubs/x86_64/* /usr/local/cuda/lib64/stubs
③ Build and install ffmpeg. I built the cuvid version of ffmpeg; if you are unsure how, see my earlier posts. In my testing, ffmpeg 3.x is required.
# Clone repo and start building process
cd ~/installs
git clone https://github.com/NVIDIA/VideoProcessingFramework.git
# Export path to CUDA compiler (you may need this sometimes if you install drivers from Nvidia site):
export CUDACXX=/usr/local/cuda-11.3/bin/nvcc
# Now the build itself
cd VideoProcessingFramework
mkdir -p install
mkdir -p build
cd build
# If you want to generate Pytorch extension, set up corresponding CMake value GENERATE_PYTORCH_EXTENSION
cmake .. -DFFMPEG_DIR:PATH="/usr/local/ffmpeg3.4.9" \
-DVIDEO_CODEC_SDK_INCLUDE_DIR:PATH="/usr/local/cuda/include" \
-DPYTHON_LIBRARY=/home/hw/anaconda3/envs/cd_test/lib/libpython3.8.so \
-DPYTHON_EXECUTABLE=/home/hw/anaconda3/envs/cd_test/bin/python3
# Build and install
make -j6 && sudo make install
# Verify that the build works
cd ../install/bin
conda activate cd_test
$ python3 SampleDecodeRTSP.py 0 rtsp://xxxx
This sample decodes multiple videos in parallel on given GPU.
It doesn't do anything beside decoding, output isn't saved.
Usage: SampleDecodeRTSP.py $gpu_id $url1 ... $urlN .
[h264 @ 0x55678af45560] co located POCs unavailable
Input #0, rtsp, from 'rtsp://':
title : Stream
Duration: N/A, start: -0.856438, bitrate: N/A
Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 60 fps, 60.08 tbr, 90k tbn, 120.16 tbc
Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp
Output #0, h264, to 'pipe:1':
title : Stream
encoder : Lavf57.83.100
Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 60 fps, 60.08 tbr, 60.08 tbn, 60.08 tbc
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
3e123055-63a0-45f4-b8ac-82cf60f321ea 508kB time=00:00:03.52 bitrate=1180.1kbits/s speed=1.11x
3e123055-63a0-45f4-b8ac-82cf60f321ea1985kB time=00:00:05.57 bitrate=2916.0kbits/s speed=1.07x
3e123055-63a0-45f4-b8ac-82cf60f321ea2749kB time=00:00:06.59 bitrate=3416.6kbits/s speed=1.06x
3e123055-63a0-45f4-b8ac-82cf60f321ea3448kB time=00:00:07.58 bitrate=3721.1kbits/s speed=1.05x
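To use VPF in another project, copy the compiled PyNvCodec.cpython-38-x86_64-linux-gnu.so file into the project's main directory, or add the path in the project code with sys.path.append('/root/user/installs/VideoProcessingFramework/install/bin'). You can also copy the generated .so file into the site-packages directory of the Python you use (for example, cp PyNvCodec.cpython-38-x86_64-linux-gnu.so /root/conda/envs/env_name/lib/python3.8/site-packages/).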
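For the sys.path route, a small guard avoids a confusing ImportError when the directory is wrong. This is only a sketch: add_vpf_to_path is a name I made up, and it assumes the extension file name starts with PyNvCodec and ends with .so, as in the build above.

```python
import glob
import os
import sys


def add_vpf_to_path(install_bin: str) -> bool:
    """Put install_bin on sys.path if it holds a PyNvCodec extension module.

    Returns False (and leaves sys.path untouched) when no PyNvCodec*.so
    file is found in the directory.
    """
    pattern = os.path.join(install_bin, "PyNvCodec*.so")
    if not glob.glob(pattern):
        return False
    if install_bin not in sys.path:
        sys.path.insert(0, install_bin)
    return True
```

Call it with your own install/bin directory before import PyNvCodec, and stop early if it returns False.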
# Starting from Python 3.8 DLL search policy has changed.
# We need to add path to CUDA DLLs explicitly.
import multiprocessing
import sys
import os
import threading
from typing import Dict
import cv2

if os.name == 'nt':
    # Add CUDA_PATH env variable
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path:
        os.add_dll_directory(cuda_path)
    else:
        print("CUDA_PATH environment variable is not set.", file=sys.stderr)
        print("Can't set CUDA DLLs search path.", file=sys.stderr)
        sys.exit(1)

    # Add PATH as well for minor CUDA releases
    sys_path = os.environ.get("PATH")
    if sys_path:
        paths = sys_path.split(';')
        for path in paths:
            if os.path.isdir(path):
                os.add_dll_directory(path)
    else:
        print("PATH environment variable is not set.", file=sys.stderr)
        sys.exit(1)

import PyNvCodec as nvc
import numpy as np

from io import BytesIO
from multiprocessing import Process
import subprocess
import uuid
import json
import pycuda.driver as cuda


def get_stream_params(url: str) -> Dict:
    # Query the stream with ffprobe and parse its JSON output
    cmd = [
        'ffprobe',
        '-v', 'quiet',
        '-print_format', 'json',
        '-show_format', '-show_streams', url]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    stdout = proc.communicate()[0]

    bio = BytesIO(stdout)
    json_out = json.load(bio)

    params = {}
    if 'streams' not in json_out:
        return {}

    for stream in json_out['streams']:
        if stream['codec_type'] == 'video':
            params['width'] = stream['width']
            params['height'] = stream['height']
            params['framerate'] = float(eval(stream['avg_frame_rate']))

            codec_name = stream['codec_name']
            is_h264 = codec_name == 'h264'
            is_hevc = codec_name == 'hevc'
            if not is_h264 and not is_hevc:
                raise ValueError("Unsupported codec: " + codec_name +
                                 '. Only H.264 and HEVC are supported in this sample.')
            params['codec'] = nvc.CudaVideoCodec.H264 if is_h264 else nvc.CudaVideoCodec.HEVC

            pix_fmt = stream['pix_fmt']
            is_yuv420 = pix_fmt == 'yuv420p'
            is_yuv444 = pix_fmt == 'yuv444p'

            # YUVJ420P and YUVJ444P are deprecated but still wide spread, so handle
            # them as well. They also indicate JPEG color range.
            is_yuvj420 = pix_fmt == 'yuvj420p'
            is_yuvj444 = pix_fmt == 'yuvj444p'

            if is_yuvj420:
                is_yuv420 = True
                params['color_range'] = nvc.ColorRange.JPEG
            if is_yuvj444:
                is_yuv444 = True
                params['color_range'] = nvc.ColorRange.JPEG

            if not is_yuv420 and not is_yuv444:
                raise ValueError("Unsupported pixel format: " +
                                 pix_fmt +
                                 '. Only YUV420 and YUV444 are supported in this sample.')
            params['format'] = nvc.PixelFormat.NV12 if is_yuv420 else nvc.PixelFormat.YUV444

            # Color range default option. We may have set when parsing
            # pixel format, so check first.
            if 'color_range' not in params:
                params['color_range'] = nvc.ColorRange.MPEG
            # Check actual value.
            if 'color_range' in stream:
                color_range = stream['color_range']
                if color_range == 'pc' or color_range == 'jpeg':
                    params['color_range'] = nvc.ColorRange.JPEG

            # Color space default option:
            params['color_space'] = nvc.ColorSpace.BT_601
            # Check actual value.
            if 'color_space' in stream:
                color_space = stream['color_space']
                if color_space == 'bt709':
                    params['color_space'] = nvc.ColorSpace.BT_709

            return params
    return {}


def rtsp_client(url: str, name: str, gpu_id: int) -> None:
    # Get stream parameters
    params = get_stream_params(url)

    if not len(params):
        raise ValueError("Can not get " + url + ' streams params')

    w = params['width']
    h = params['height']
    f = params['format']
    c = params['codec']
    g = gpu_id

    # Prepare ffmpeg arguments
    if nvc.CudaVideoCodec.H264 == c:
        codec_name = 'h264'
    elif nvc.CudaVideoCodec.HEVC == c:
        codec_name = 'hevc'
    bsf_name = codec_name + '_mp4toannexb,dump_extra=all'

    cmd = [
        'ffmpeg', '-hide_banner',
        '-loglevel', 'quiet',
        '-i', url,
        '-c:v', 'copy',
        '-bsf:v', bsf_name,
        '-f', codec_name,
        'pipe:1'
    ]
    # Run ffmpeg in a subprocess and redirect its output to a pipe
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

    # Initialize CUDA, then create the context and stream used by the
    # converter and downloader. The stream needs the context to be current.
    cuda.init()
    cuda_ctx = cuda.Device(gpu_id).retain_primary_context()
    cuda_ctx.push()
    cuda_str = cuda.Stream()
    cuda_ctx.pop()

    # Create HW decoder class
    nvdec = nvc.PyNvDecoder(w, h, f, c, g)
    # NV12 -> BGR converter (runs on the GPU) and GPU -> CPU downloader
    nvCvt = nvc.PySurfaceConverter(w, h, nvc.PixelFormat.NV12, nvc.PixelFormat.BGR, cuda_ctx.handle, cuda_str.handle)
    nvDwn = nvc.PySurfaceDownloader(w, h, nvCvt.Format(), cuda_ctx.handle, cuda_str.handle)
    # Host buffer for one packed BGR frame (3 bytes per pixel)
    frameSize = int(w*h*3)
    rawFrame = np.ndarray(shape=(frameSize), dtype=np.uint8)
    cc_ctx = None

    # Amount of bytes we read from pipe first time.
    read_size = 4096
    # Total bytes read and total frames decoded to get average data rate
    rt = 0
    fd = 0

    # Main decoding loop, will not flush intentionally because don't know the
    # amount of frames available via RTSP.
    while True:
        # Pipe read underflow protection
        if not read_size:
            read_size = int(rt / fd)
            # Counter overflow protection
            rt = read_size
            fd = 1

        # Read data.
        # Amount doesn't really matter, will be updated later on during decode.
        bits = proc.stdout.read(read_size)
        if not len(bits):
            print("Can't read data from pipe")
            break
        else:
            rt += len(bits)

        # Decode
        enc_packet = np.frombuffer(buffer=bits, dtype=np.uint8)
        pkt_data = nvc.PacketData()
        try:
            surface_nv12 = nvdec.DecodeSurfaceFromPacket(enc_packet, pkt_data)

            if not surface_nv12.Empty():
                fd += 1
                # Shifts towards underflow to avoid increasing vRAM consumption.
                if pkt_data.bsl < read_size:
                    read_size = pkt_data.bsl
                # Print process ID every second or so.
                fps = int(params['framerate'])
                #if not fd % fps:
                #    print(name)

                # Build the color conversion context once, on the first frame
                if cc_ctx is None:
                    cspace = params['color_space']
                    crange = nvc.ColorRange.MPEG
                    cc_ctx = nvc.ColorspaceConversionContext(cspace, crange)
                # Convert NV12 -> BGR on the GPU
                surface_bgr = nvCvt.Execute(surface_nv12, cc_ctx)
                if surface_bgr.Empty():
                    break
                # Copy the BGR surface from GPU memory into rawFrame
                if not nvDwn.DownloadSingleSurface(surface_bgr, rawFrame):
                    break
                # View the flat buffer as an OpenCV-style H x W x 3 image
                img_bgr = rawFrame.reshape((h, w, 3))

        # Handle HW exceptions in simplest possible way by decoder respawn
        except nvc.HwResetException:
            nvdec = nvc.PyNvDecoder(w, h, f, c, g)
            continue


if __name__ == "__main__":
    print("This sample decodes multiple videos in parallel on given GPU.")
    print("It doesn't do anything beside decoding, output isn't saved.")
    print("Usage: SampleDecodeRTSP.py $gpu_id $url1 ... $urlN .")

    if len(sys.argv) < 3:
        print("Provide gpu ID and input URL(s).")
        sys.exit(1)

    gpuID = int(sys.argv[1])
    urls = []

    for i in range(2, len(sys.argv)):
        urls.append(sys.argv[i])

    # One decoding process per URL
    pool = []
    for url in urls:
        client = Process(target=rtsp_client, args=(
            url, str(uuid.uuid4()), gpuID))
        client.start()
        pool.append(client)

    for client in pool:
        client.join()
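
One thing I would change in get_stream_params: it runs eval() on the avg_frame_rate string from ffprobe. That string is a rational such as 60/1, and ffprobe can report 0/0 when the rate is unknown, which makes eval() raise ZeroDivisionError. A possible replacement using the standard library is sketched below; parse_frame_rate is my own helper name, and returning 0.0 for an unknown rate is my choice, not something the VPF sample does.

```python
from fractions import Fraction


def parse_frame_rate(rate: str) -> float:
    """Parse an ffprobe rational such as '60/1' or '30000/1001' without eval().

    Returns 0.0 when the value is missing, malformed, or '0/0'.
    """
    try:
        return float(Fraction(rate))
    except (ValueError, ZeroDivisionError):
        return 0.0
```

The caller then has to treat 0.0 as "unknown", for example before using the frame rate as a divisor.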
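P.S. In my tests, decoding plus color-space conversion on the GPU brought CPU usage down from 40% to 6%, but downloading each frame from GPU to CPU with nvDwn.DownloadSingleSurface pushed it back up to 24%. So, wherever possible, skip the download to the CPU and feed frames straight into inference. An all-GPU pipeline is the way to go.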
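
To put a number on the download step, here is the arithmetic for how much data crosses from GPU to host memory. It uses the same 3 bytes per pixel as frameSize in the script; the function names are mine, and this only counts bytes copied. It does not by itself explain the 24% figure.

```python
def bgr_frame_bytes(width: int, height: int) -> int:
    """Size of one packed 8-bit BGR frame: 3 bytes per pixel."""
    return width * height * 3


def download_rate_mb_per_s(width: int, height: int, fps: float) -> float:
    """Data copied from GPU to host per second, in MB (10**6 bytes)."""
    return bgr_frame_bytes(width, height) * fps / 1e6
```

For my 1920x1080 stream at 60 fps, each frame is 6,220,800 bytes, which comes to about 373 MB per second for a single stream.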