0. Preface
In an earlier post on the YUV420 data format, I covered the difference between 420P and 420SP. The key distinction: the former has three planes (U and V each occupy their own plane, at 1/2 the width and 1/2 the height of the Y plane), while the latter has two planes (U and V are interleaved in a single plane whose width equals the Y plane's and whose height is 1/2 of it).
I once had a feature that needed to do additional rendering with the data from a video-stream callback. Since the video stream and my feature ran on different threads, using the callback data directly was unsafe: by the time my code used it, the stream might already have released the buffer, causing a crash. So the callback data had to be copied first.
And because video can be decoded either in software or in hardware, the interface provided by the video team also had to distinguish between the two paths when copying and rendering.
1. Software decoding vs. hardware decoding
The fundamental difference between software and hardware decoding is where the work runs: software decoding is handled by the CPU, hardware decoding by the GPU.
Video decoding involves heavy computation, and the GPU excels at exactly that kind of work, so handing it to the GPU is clearly the better choice: it greatly reduces CPU load and, with it, the likelihood of heat and stuttering. Mainstream players therefore default to hardware decoding, with software decoding available as a fallback.
In iOS development, hardware decoding hands you a `CVPixelBufferRef`, while software decoding hands you raw binary data as a two-dimensional array of planes.
2. Copying and rendering hard-decoded data
1. Copying hard-decoded data
To copy a hard-decoded frame, first create a new `CVPixelBufferRef` with the same width, height, and pixel format as the original:
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
int bufferWidth = (int)CVPixelBufferGetWidth(pixelBuffer);
int bufferHeight = (int)CVPixelBufferGetHeight(pixelBuffer);
OSType pixelFormat = CVPixelBufferGetPixelFormatType(pixelBuffer);
CVPixelBufferRef pixelBufferCopy = NULL;
NSDictionary *pixelAttributes = @{
    (id)kCVPixelBufferIOSurfacePropertiesKey : @{},
    (id)kCVPixelBufferOpenGLESCompatibilityKey : @(YES),
    (id)kCVPixelBufferMetalCompatibilityKey : @(YES),
};
CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault, bufferWidth, bufferHeight, pixelFormat, (__bridge CFDictionaryRef)pixelAttributes, &pixelBufferCopy);
if (status != kCVReturnSuccess) {
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
    return NULL;
}
CVPixelBufferLockBaseAddress(pixelBufferCopy, 0);
The data can then be rendered into the target `CVPixelBufferRef` on the GPU. The copy path is `CVPixelBufferRef` -> `CIImage` -> `CVPixelBufferRef`, which first requires initializing a `CIContext`:
EAGLContext *context = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];
_ciContext = [CIContext contextWithEAGLContext:context];
Then get the `CIImage` that wraps the source `CVPixelBufferRef`, and use the `CIContext` to render it into the new `CVPixelBufferRef`:
CIImage *ciImage = [CIImage imageWithCVPixelBuffer:pixelBuffer];
[_ciContext render:ciImage toCVPixelBuffer:pixelBufferCopy];
Of the two calls above, obtaining the `CIImage` happens on the CPU, while the `CIContext` `render` call runs on the GPU.
Finally, verify that the copied `CVPixelBufferRef` matches the original's layout:
// Note: these "width" values are actually bytes-per-row (stride), not pixel width.
int yDstBufferWidth = (int)CVPixelBufferGetBytesPerRowOfPlane(pixelBufferCopy, 0);
int yDstBufferHeight = (int)CVPixelBufferGetHeightOfPlane(pixelBufferCopy, 0);
int uvDstBufferWidth = (int)CVPixelBufferGetBytesPerRowOfPlane(pixelBufferCopy, 1);
int uvDstBufferHeight = (int)CVPixelBufferGetHeightOfPlane(pixelBufferCopy, 1);
int ySrcBufferWidth = (int)CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
int ySrcBufferHeight = (int)CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
int uvSrcBufferWidth = (int)CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1);
int uvSrcBufferHeight = (int)CVPixelBufferGetHeightOfPlane(pixelBuffer, 1);
if (ySrcBufferWidth * ySrcBufferHeight != yDstBufferWidth * yDstBufferHeight) {
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
    CVPixelBufferUnlockBaseAddress(pixelBufferCopy, 0);
    CVPixelBufferRelease(pixelBufferCopy);
    return NULL;
}
if (uvSrcBufferWidth * uvSrcBufferHeight != uvDstBufferWidth * uvDstBufferHeight) {
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
    CVPixelBufferUnlockBaseAddress(pixelBufferCopy, 0);
    CVPixelBufferRelease(pixelBufferCopy);
    return NULL;
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
CVPixelBufferUnlockBaseAddress(pixelBufferCopy, 0);
return pixelBufferCopy;
2. Rendering hard-decoded data
As mentioned in the earlier article, the `CVPixelBufferRef` here is in YUV420SP format, split into a Y plane and a UV plane: the Y plane is `planeIndex = 0` with `pixelFormat = MTLPixelFormatR8Unorm`, and the UV plane is `planeIndex = 1` with `pixelFormat = MTLPixelFormatRG8Unorm`.
Once the Y and UV texture data is extracted, rendering applies a conversion matrix to produce RGB output.
- (id<MTLTexture>)textureWithPixelBuffer:(CVPixelBufferRef)pixelBuffer pixelFormat:(MTLPixelFormat)pixelFormat planeIndex:(NSInteger)planeIndex {
    id<MTLTexture> texture = nil;
    // planeIndex 0 is the Y plane; planeIndex 1 is the UV plane
    size_t width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex);
    size_t height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex);
    CVMetalTextureRef textureRef = NULL;
    CVReturn status = CVMetalTextureCacheCreateTextureFromImage(NULL, _textureCache, pixelBuffer, NULL, pixelFormat, width, height, planeIndex, &textureRef);
    if (status == kCVReturnSuccess) {
        texture = CVMetalTextureGetTexture(textureRef);
        CFRelease(textureRef);
    } else {
        texture = nil;
    }
    return texture;
}
The corresponding shader:
float3 rgbFromYuv(float2 textureCoor,
                  texture2d<float> textureY,
                  texture2d<float> textureUV,
                  constant CCAlphaVideoMetalConvertMatrix *convertMatrix) {
    constexpr sampler textureSampler (mag_filter::linear,
                                      min_filter::linear);
    float3 yuv = float3(textureY.sample(textureSampler, textureCoor).r,
                        textureUV.sample(textureSampler, textureCoor).rg);
    return convertMatrix->matrix * (yuv + convertMatrix->offset);
}

fragment float4 movieFragment(SingleInputVertexIO input [[ stage_in ]],
                              texture2d<float> textureY [[ texture(0) ]],
                              texture2d<float> textureUV [[ texture(1) ]],
                              constant CCAlphaVideoMetalConvertMatrix *convertMatrix [[ buffer(0) ]]) {
    float3 rgb = rgbFromYuv(input.textureCoordinate, textureY, textureUV, convertMatrix);
    return float4(rgb, 1.0);
}
The corresponding conversion matrix:
// BT.709
static const CCAlphaVideoMetalConvertMatrix CCAlphaVideoMetalYUVColorConversion709 = {
    .matrix = {
        .columns[0] = {  1.000,  1.000,  1.000 },
        .columns[1] = {  0.000, -0.187,  1.856 },
        .columns[2] = {  1.575, -0.468,  0.000 },
    },
    .offset = { 0.0, -0.5, -0.5 },
};
3. Copying and rendering soft-decoded data
1. Copying soft-decoded data
Soft decoding copies from a two-dimensional array of binary data (`unsigned char **pixels`), along with the per-plane pitches (`int *pitches`), the width (`int width`), and the height (`int height`).
In `pixels`, indices 0 through 2 hold the Y, U, and V data respectively. From the YUV420P layout, the U and V planes are half the height of the Y plane, all three planes share the same width, and each plane has its own pitch.
With that, we can first allocate memory for the copied data:
int heights[3] = {height, height / 2, height / 2};
if (_currentPixels == NULL) {
    // _currentPixels is an unsigned char **: allocate the (zeroed) array of
    // three plane pointers; the planes themselves are allocated per plane below.
    _currentPixels = calloc(3, sizeof(unsigned char *));
}
Then copy the Y, U, and V data by index:
for (int i = 0; i < 3; i++) {
    unsigned char *src = pixels[i];
    if (src == NULL || _currentPixels == NULL) {
        continue;
    }
    size_t planeSize = (size_t)pitches[i] * heights[i];
    if (malloc_size(_currentPixels[i]) < planeSize) {
        free(_currentPixels[i]);  // free(NULL) is a no-op
        _currentPixels[i] = malloc(planeSize + 1);
    }
    memcpy(_currentPixels[i], src, planeSize);
}
This gives us a copy of the soft-decoded binary data. Besides the bytes themselves, we also need to keep the frame's pitches, width, and height, and wrap everything into a model object for rendering.
Remember to release it after rendering:
- (void)releaseCurrentFFmpegPixels {
    pthread_mutex_lock(&s_ffmpeg_buffer_lock);
    if (_currentPixels != NULL) {
        for (int i = 0; i < 3; i++) {
            if (_currentPixels[i] != NULL) {
                free(_currentPixels[i]);
                _currentPixels[i] = NULL;
            }
        }
        free(_currentPixels);  // also free the pointer array itself
        _currentPixels = NULL;
    }
    pthread_mutex_unlock(&s_ffmpeg_buffer_lock);
}
2. Rendering soft-decoded data
Because generating texture storage is a relatively expensive operation, the textures are created ahead of rendering. To avoid exceptions at render time, first validate that the wrapped model's data is well-formed:
- (BOOL)isValidCallBackModel:(CCMetalFFmpegCallbackModel *)model {
    int planes = model.planes;
    NSInteger maxTextureSize = [CCMetalTexture maxTextureSize];
    if (planes != 3 || model.pixels == NULL || model.pitches == NULL || model.width <= 0 || model.height <= 0 || model.height > maxTextureSize || model.width > maxTextureSize) {
        return NO;
    }
    for (int i = 0; i < 3; i++) {
        if (model.pitches[i] <= 0 || model.pitches[i] > maxTextureSize || model.pixels[i] == NULL || malloc_size(model.pixels[i]) <= 0) {
            return NO;
        }
    }
    return YES;
}
The `maxTextureSize` here has to be adapted per device model, to avoid asserts being thrown at render time.
// https://stackoverflow.com/questions/58366416/how-to-get-programmatically-the-maximum-texture-size-width-and-height
+ (NSInteger)maxTextureSize {
    NSInteger maxTextureSize = 4096;
    id<MTLDevice> device = MTLCreateSystemDefaultDevice();
    if ([device supportsFeatureSet:MTLFeatureSet_iOS_GPUFamily3_v1]) {
        maxTextureSize = 16384;
    } else if ([device supportsFeatureSet:MTLFeatureSet_iOS_GPUFamily2_v2] || [device supportsFeatureSet:MTLFeatureSet_iOS_GPUFamily1_v2]) {
        maxTextureSize = 8192;
    }
    return maxTextureSize;
}
Then initialize the textures, one per plane, using each plane's pitch as the texture width and the matching plane height:
- (void)setupWithCallbackModel:(CCMetalFFmpegCallbackModel *)callbackModel {
    if (![self isValidCallBackModel:callbackModel]) {
        return;
    }
    const NSUInteger widths[3] = { callbackModel.pitches[0], callbackModel.pitches[1], callbackModel.pitches[2] };
    const NSUInteger heights[3] = { callbackModel.height, callbackModel.height / 2, callbackModel.height / 2 };
    for (int i = 0; i < 3; i++) {
        CCMetalTexture *inputTexture = [[CCMetalTexture alloc] init];
        MTLTextureDescriptor *textureDescriptor = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatR8Unorm width:widths[i] height:heights[i] mipmapped:NO];
        textureDescriptor.usage = MTLTextureUsageShaderRead | MTLTextureUsageShaderWrite | MTLTextureUsageRenderTarget;
        id<MTLTexture> texture = [_renderContext.device newTextureWithDescriptor:textureDescriptor];
        inputTexture.texture = texture;
        self.textureDict[@(i)] = inputTexture;
    }
}
At render time, we can use `- (void)replaceRegion:(MTLRegion)region mipmapLevel:(NSUInteger)level withBytes:(const void *)pixelBytes bytesPerRow:(NSUInteger)bytesPerRow` to replace (i.e. copy) each of the Y, U, and V planes into its corresponding texture.
NSMutableArray *inputTextureArray = [NSMutableArray array];
for (int i = 0; i < 3; i++) {
    if (pixels[i] == NULL) {
        continue;
    }
    CCMetalTexture *inputTexture = self.textureDict[@(i)];
    id<MTLTexture> currentTexture = inputTexture.texture;
    if (widths[i] == currentTexture.width && widths[i] > 0) {
        MTLRegion region = MTLRegionMake2D(0, 0, currentTexture.width, currentTexture.height);
        [inputTexture.texture replaceRegion:region mipmapLevel:0 withBytes:pixels[i] bytesPerRow:widths[i]];
    }
    [inputTextureArray addObject:inputTexture];
}
The corresponding shader is below. Unlike the hard-decode path (Y sampled alone, UV sampled together), here Y, U, and V are each sampled separately:
fragment float4 movieByPixelsFragment(SingleInputVertexIO input [[ stage_in ]],
                                      texture2d<float> textureY [[ texture(0) ]],
                                      texture2d<float> textureU [[ texture(1) ]],
                                      texture2d<float> textureV [[ texture(2) ]],
                                      constant CCAlphaVideoMetalConvertMatrix *convertMatrix [[ buffer(0) ]]) {
    float2 textureCoor = input.textureCoordinate;
    constexpr sampler textureSampler (mag_filter::linear,
                                      min_filter::linear);
    float y = textureY.sample(textureSampler, textureCoor).r;
    float u = textureU.sample(textureSampler, textureCoor).r;
    float v = textureV.sample(textureSampler, textureCoor).r;
    float3 yuv = float3(y, u, v);
    float3 rgb = convertMatrix->matrix * (yuv + convertMatrix->offset);
    rgb.r = min(max(rgb.r, 0.0), 1.0);
    rgb.g = min(max(rgb.g, 0.0), 1.0);
    rgb.b = min(max(rgb.b, 0.0), 1.0);
    return float4(rgb, 1.0);
}
4. Summary
Hard decoding delivers a `CVPixelBufferRef`; copying goes through the `CIContext` API and runs on the GPU; when rendering, Y is sampled on its own while UV is sampled together.
Soft decoding delivers `unsigned char **`; copying means allocating space for and `memcpy`-ing each plane, which must be `free`d once rendering is done; when rendering, Y, U, and V are independent and each sampled separately.
Anything involving raw memory is full of traps. I stepped into plenty of them; I hope this article saves you a few.