Metal and Graphics Rendering, Part 5: Implementing a Chained Architecture

0. Preface

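The rendering commands covered so far all perform a single render pass. When we need to reuse the result of an earlier pass, a single pass is clearly not enough, and that is where a chained structure comes in. In a chain, the output of one pass can be fed back in as the input of the next, and only the final result is drawn to the screen. As an example, we reuse the case from "Metal and Graphics Rendering, Part 2: Rendering Transparent Images", where the goal is to render an image with transparency.

Our previous implementation did everything in a single render pass:

That approach piles every rendering operation into one pass, so all of the OC-layer and Metal-layer code ends up in one place and is hard to maintain.

This time, instead of cramming the code into one pass, we implement the same effect with a chain:

The chained code is more direct and concise. More importantly, whenever we later want to reuse the texture of the Picture, or the texture of some Filter, we only need to attach one more link to that node.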

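No discussion of chained structures is complete without the well-known library GPUImage3, which lets the result of one render be used multiple times. However, it is written in Swift, and on top of that it has serious CPU and memory problems when processing video: memory ran out before a single video had finished playing. A search through the issues shows that someone reported this back in 2019, but the author's only reply was "We still have a lot of work to do on the inputs and outputs to get this to be ready for regular use."

Frustrating. An open-source library in this state probably cannot meet even the basic requirements of playing effects, and our project is still written in OC. There was no way around it: borrow the ideas of those who came before, hand-roll a chained framework, and fix the open-source library's pitfalls along the way.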

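I. Basic Architecture

The workflow of the chain is shown in the figure below:

The building blocks that implement this workflow are:

the base library MetalKit (the author's own HobenMetalKit class, not Apple's MetalKit framework), the rendering layer Renderer, the texture producer Provider, and the texture consumer Consumer. Their relationships are shown in the figure below.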

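II. Rendering Principles and Basic Components

Before introducing the components, it is worth briefly reviewing the flow of a single render pass, that is, how one input source (a UIImage) is processed step by step until it reaches the screen: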

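--- Initialization stage ---

  1. Configure the Device, Queue, and MTKView (initialization stage, done once)
  2. Configure the PipelineState (bind the functions in the .metal file, done once)
  3. Create resources and load the MTLTexture (done once)
  4. Set up the vertex MTLBuffer (preferably done once)

--- Render stage: the drawInMTKView callback, once per frame ---

  1. Get a CommandBuffer from the Queue
  2. Create a CommandEncoder from the CommandBuffer and a RenderPassDescriptor
  3. Encode into the buffer [Threadgroups can be used to group the encoder's work if needed]

--- End: once encoding is finished, commit the command buffer to the GPU ---

  1. Commit to the Queue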


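As we can see, in a single render pass some parts are initialized only once, while others have to be created and read frequently.

In this chained structure, for one chained render (from UIImage to MTKView), the objects we only need to create once are: Device, CommandQueue, CommandBuffer, Library, and Pipeline. The CommandBuffer is the exception in that it is created once per chained render and reset after it is committed.

What is created repeatedly is the CommandEncoder. After several encodes, when the chain reaches the MTKView, the CommandBuffer that holds every encode of this render is committed so the GPU can do the rendering.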
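The "many encoders, one commit" idea above can be sketched roughly as follows. This is only an illustration under assumptions: `EncodeClearPass` and `RunTwoPassChain` are hypothetical names that do not appear in the article's framework, and each pass merely clears its target instead of drawing a quad.

```objective-c
#import <Metal/Metal.h>

// Hypothetical helper: encodes one render pass that only clears `target`.
static void EncodeClearPass(id<MTLCommandBuffer> commandBuffer, id<MTLTexture> target) {
    MTLRenderPassDescriptor *pass = [MTLRenderPassDescriptor renderPassDescriptor];
    pass.colorAttachments[0].texture = target;
    pass.colorAttachments[0].loadAction = MTLLoadActionClear;
    pass.colorAttachments[0].storeAction = MTLStoreActionStore;
    pass.colorAttachments[0].clearColor = MTLClearColorMake(0, 0, 0, 0);
    id<MTLRenderCommandEncoder> encoder = [commandBuffer renderCommandEncoderWithDescriptor:pass];
    // A real pass would set a pipeline state, buffers, and textures, then draw.
    [encoder endEncoding];
}

// Hypothetical driver: two passes share one command buffer and one commit.
static BOOL RunTwoPassChain(void) {
    id<MTLDevice> device = MTLCreateSystemDefaultDevice();
    if (!device) {
        return NO; // no Metal device available (for example, in some CI environments)
    }
    id<MTLCommandQueue> queue = [device newCommandQueue];
    MTLTextureDescriptor *desc =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                           width:64
                                                          height:64
                                                       mipmapped:NO];
    desc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;
    id<MTLTexture> first = [device newTextureWithDescriptor:desc];
    id<MTLTexture> second = [device newTextureWithDescriptor:desc];

    id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
    EncodeClearPass(commandBuffer, first);   // link 1 of the chain
    EncodeClearPass(commandBuffer, second);  // link 2 of the chain
    [commandBuffer commit];                  // a single commit for the whole chain
    [commandBuffer waitUntilCompleted];
    return commandBuffer.status == MTLCommandBufferStatusCompleted;
}
```

The point of the sketch is only the shape of the work: each link adds an encoder to the shared buffer, and the GPU sees nothing until the last link commits.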

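1. Base library MetalKit

MetalKit manages and stores the objects that only need to be created once. Almost all of them are lazily loaded, which avoids creating objects over and over during rendering and wasting CPU and memory.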

- (id<MTLDevice>)device {
    if (!_device) {
        _device = MTLCreateSystemDefaultDevice();
    }
    return _device;
}

- (id<MTLCommandQueue>)commandQueue {
    if (!_commandQueue) {
        _commandQueue = [self.device newCommandQueue];
    }
    return _commandQueue;
}

- (id<MTLCommandBuffer>)commandBuffer {
    if (!_commandBuffer) {
        _commandBuffer = self.commandQueue.commandBuffer;
    }
    return _commandBuffer;
}

- (id<MTLLibrary>)library {
    if (!_library) {
        NSString *libPath = [METAL_BUNDLE pathForResource:@"alpha_video_renderer" ofType:@"metallib"];
        if (!libPath) {
            NSAssert(NO, @"[HobenMetalKit] libPath is nil!");
            [CCAlphaVideoUtils handleMetalSetupError:CCAlphaVideoMetalErrorTypeLibLoadError reason:@"libPath is nil"];
            HobenLog(@"[HobenMetalKit] libPath is nil!");
            return nil;
        }
        NSError *error;
        id<MTLLibrary> defaultLibrary = [MTL_DEVICE newLibraryWithFile:libPath error:&error];
        if (error || !defaultLibrary) {
            [CCAlphaVideoUtils handleMetalSetupError:CCAlphaVideoMetalErrorTypeLibLoadError reason:@"defaultLibrary load failed"];
            HobenLog(@"[HobenMetalKit] newLibraryWithFile error: %@", error);
            return nil;
        }
        _library = defaultLibrary;
    }
    return _library;
}

- (NSMutableDictionary<NSString *, id<MTLRenderPipelineState>> *)pipelineDict {
    if (!_pipelineDict) {
        _pipelineDict = [NSMutableDictionary dictionary];
    }
    return _pipelineDict;
}

Pipeline management also lives in MetalKit, with a cache, again to avoid creating pipeline states repeatedly during rendering:

+ (id<MTLRenderPipelineState>)pipelineStateWithVertexName:(NSString *)vertexName fragmentName:(NSString *)fragmentName {
    NSMutableDictionary<NSString *, id<MTLRenderPipelineState>> *pipelineDict = [HobenMetalKit sharedInstance].pipelineDict;
    NSString *vName = vertexName ?: @"oneInputVertex";
    NSString *fName = fragmentName ?: @"passthroughFragment";
    NSString *key = [NSString stringWithFormat:@"%@_%@", vName, fName];
    // Reuse the cached pipeline state if this vertex/fragment pair was built before.
    id<MTLRenderPipelineState> cachedPipeline = pipelineDict[key];
    if (cachedPipeline) {
        [HobenMetalKit sharedInstance].didLoadMetalLibSuccess = YES;
        return cachedPipeline;
    }
    MTLRenderPipelineDescriptor *pipelineDesc = [MTLRenderPipelineDescriptor new];
    id<MTLLibrary> library = [self sharedLibrary];
    id<MTLFunction> vertexFunction = [library newFunctionWithName:vName];
    id<MTLFunction> fragmentFunction = [library newFunctionWithName:fName];
    if (!vertexFunction || !fragmentFunction) {
        NSAssert(NO, @"function is nil");
        return nil;
    }
    pipelineDesc.vertexFunction = vertexFunction;
    pipelineDesc.fragmentFunction = fragmentFunction;
    pipelineDesc.colorAttachments[0].pixelFormat = MTLPixelFormatBGRA8Unorm;

    // Pass the error pointer so that a failure is actually reported.
    NSError *pipelineError = nil;
    id<MTLRenderPipelineState> pipelineState = [[self sharedDevice] newRenderPipelineStateWithDescriptor:pipelineDesc error:&pipelineError];
    if (pipelineError || !pipelineState) {
        [CCAlphaVideoUtils handleMetalSetupError:CCAlphaVideoMetalErrorTypeLibLoadError reason:@"pipelinestate error"];
        HobenLog(@"[CCAlphaVideoMetalFunctionLoader] pipelinestate error: %@", pipelineError);
        return nil;
    }
    [HobenMetalKit sharedInstance].didLoadMetalLibSuccess = YES;
    // Only successfully built pipeline states are cached.
    pipelineDict[key] = pipelineState;
    return pipelineState;
}

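2. Rendering layer Renderer

The rendering layer takes the Pipeline, vertex coordinates, buffers, and input textures passed to it, encodes them, and produces the output texture.

/**
 A single render pass.
 @param pipelineState the render pipeline state
 @param inputTextures the input textures; each object holds the texture data and its texture coordinates
 @param imageVertices the vertex coordinates; pass nil to use the default full-screen quad
 @param vertexBuffers extra buffers for the vertex shader
 @param fragmentBuffers buffers for the fragment shader
 @param loadAction whether to load or clear the previous contents, MTLLoadActionClear by default
 @param outputTexture the output texture, which can be reused
 */
+ (void)renderQuad:(id<MTLRenderPipelineState>)pipelineState
     inputTextures:(NSArray<HobenMetalTexture *> *)inputTextures
     imageVertices:(nullable NSArray<NSNumber *> *)imageVertices
     vertexBuffers:(nullable NSArray<id<MTLBuffer>> *)vertexBuffers
   fragmentBuffers:(nullable NSArray<id<MTLBuffer>> *)fragmentBuffers
        loadAction:(MTLLoadAction)loadAction
     outputTexture:(id<MTLTexture>)outputTexture {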
    
    NSAssert(!imageVertices || imageVertices.count == 8, @"imageVertices.count must be 8");
    
    AUTO_RELEASE_BEGIN
        
    if (!pipelineState) {
        NSAssert(NO, @"pipelineState is nil");
        return;
    }
    NSArray *defaultImageVertices = @[
        @-1.0, @1.0,
        @1.0, @1.0,
        @-1.0, @-1.0,
        @1.0, @-1.0,
    ];
    NSArray *vertice = imageVertices ?: defaultImageVertices;
    float verticeCoordinates[8] = {
        [vertice[0] floatValue], [vertice[1] floatValue],
        [vertice[2] floatValue], [vertice[3] floatValue],
        [vertice[4] floatValue], [vertice[5] floatValue],
        [vertice[6] floatValue], [vertice[7] floatValue],
    };
    id<MTLBuffer> vertexBuffer = [[HobenMetalKit sharedDevice] newBufferWithBytes:verticeCoordinates length:sizeof(verticeCoordinates) options:MTLResourceStorageModeShared];
    
    MTLRenderPassDescriptor *renderPass = [MTLRenderPassDescriptor renderPassDescriptor];
    renderPass.colorAttachments[0].texture = outputTexture;
    renderPass.colorAttachments[0].clearColor = MTLClearColorMake(0, 0, 0, 0);
    renderPass.colorAttachments[0].storeAction = MTLStoreActionStore;
    renderPass.colorAttachments[0].loadAction = loadAction;
    
    id<MTLRenderCommandEncoder> renderEncoder = [MTL_COMMAND_BUFFER renderCommandEncoderWithDescriptor:renderPass];
    [renderEncoder setRenderPipelineState:pipelineState];
    [renderEncoder setVertexBuffer:vertexBuffer offset:0 atIndex:0];
    
    for (NSInteger i = 0; i < vertexBuffers.count; i++) {
        id<MTLBuffer> extraVertexBuffer = vertexBuffers[i];
        [renderEncoder setVertexBuffer:extraVertexBuffer offset:0 atIndex:1 + i];
    }
    
    for (NSInteger i = 0; i < inputTextures.count; i++) {
        HobenMetalTexture *texture = inputTextures[i];
        if (![texture isKindOfClass:[HobenMetalTexture class]]) {
            NSAssert(NO, @"texture class must be HobenMetalTexture");
            [renderEncoder setVertexBuffer:nil offset:0 atIndex:1 + i + vertexBuffers.count];
            [renderEncoder setFragmentTexture:nil atIndex:i];
            continue;
        }
        NSArray *textureCoor = texture.textureCoordinates;
        NSAssert(textureCoor.count == 8, @"textureCoor.count must be 8");
        float textureCoordinates[8] = {
            [textureCoor[0] floatValue], [textureCoor[1] floatValue],
            [textureCoor[2] floatValue], [textureCoor[3] floatValue],
            [textureCoor[4] floatValue], [textureCoor[5] floatValue],
            [textureCoor[6] floatValue], [textureCoor[7] floatValue],
        };
        id<MTLBuffer> textureBuffer = [[HobenMetalKit sharedDevice] newBufferWithBytes:textureCoordinates length:sizeof(textureCoordinates) options:MTLResourceStorageModeShared];
        [renderEncoder setVertexBuffer:textureBuffer offset:0 atIndex:1 + i + vertexBuffers.count];
        [renderEncoder setFragmentTexture:texture.texture atIndex:i];
    }
    
    for (NSInteger i = 0; i < fragmentBuffers.count; i++) {
        id<MTLBuffer> fragmentBuffer = fragmentBuffers[i];
        [renderEncoder setFragmentBuffer:fragmentBuffer offset:0 atIndex:i];
    }
    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4];
    [renderEncoder endEncoding];
        
    AUTO_RELEASE_END
}

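3. Texture producer Provider

A producer's main job is to hand the texture obtained from the rendering layer to the corresponding consumers so that they can carry out the next step. Here we define the protocol a Provider must conform to:

@protocol HobenMetalProviderProtocol <NSObject>

- (void)transmitTexture:(id<MTLTexture>)texture
                 target:(id<HobenMetalConsumerProtocol>)target
                  index:(NSInteger)index;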

@end

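Next we define MetalOutput, a texture producer that conforms to the Provider protocol. It mainly manages the Consumers it owns (added through addTarget) and notifies them at the right moment so that they call the corresponding method.

@interface HobenMetalOutput : NSObject <HobenMetalProviderProtocol> {
    id<MTLTexture> outputTexture;
}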

#pragma mark - Public Method

- (void)addTarget:(id<HobenMetalConsumerProtocol>)target {
    NSInteger index = 0;
    if ([target respondsToSelector:@selector(nextAvailableTextureIndex)]) {
        index = [target nextAvailableTextureIndex];
    }
    [self addTarget:target atIndex:index];
}

- (void)addTarget:(id<HobenMetalConsumerProtocol>)target atIndex:(NSInteger)index {
    if (!target) {
        return;
    }
    if ([self.targets containsObject:target]) {
        return;
    }
    if ([target respondsToSelector:@selector(textureIndexUnavailable:)]) {
        [target textureIndexUnavailable:index];
    }
    [self.targets addObject:target];
    [self.targetTextureIndices addObject:@(index)];
}

- (void)transmitTextureToAllTargets:(id<MTLTexture>)texture {
    for (id<HobenMetalConsumerProtocol> target in self.targets) {
        NSInteger indexOfObject = [self.targets indexOfObject:target];
        NSInteger textureIndex = [[self.targetTextureIndices objectAtIndex:indexOfObject] integerValue];
        [self transmitTexture:texture target:target index:textureIndex];
    }
}

#pragma mark - HobenMetalProviderProtocol

- (void)transmitTexture:(id<MTLTexture>)texture target:(id<HobenMetalConsumerProtocol>)target index:(NSInteger)index {
    [target newTextureAvailable:texture index:index];
}

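In this architecture the producers are HobenMetalPicture (which gets its texture from a UIImage), HobenMetalMovieReader (which gets its texture from a CVPixelBufferRef), and HobenMetalFilter (which gets its texture from the link above it). Once they have a texture, they process it and pass the output to the next link down the chain.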
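To make the producer side easier to follow without any Metal setup, here is a minimal, Foundation-only sketch of the same fan-out pattern. Everything in it is an assumption for illustration: `SketchConsumer`, `SketchOutput`, and `SketchSink` are hypothetical names, and an `NSString` stands in for the `MTLTexture`.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical consumer protocol; a string stands in for the MTLTexture.
@protocol SketchConsumer <NSObject>
- (void)newTextureAvailable:(NSString *)texture index:(NSInteger)index;
@end

// Provider: remembers its targets and the input slot each one expects.
@interface SketchOutput : NSObject
@property (nonatomic, strong) NSMutableArray<id<SketchConsumer>> *targets;
@property (nonatomic, strong) NSMutableArray<NSNumber *> *targetIndices;
- (void)addTarget:(id<SketchConsumer>)target atIndex:(NSInteger)index;
- (void)transmitTextureToAllTargets:(NSString *)texture;
@end

@implementation SketchOutput
- (instancetype)init {
    if (self = [super init]) {
        _targets = [NSMutableArray array];
        _targetIndices = [NSMutableArray array];
    }
    return self;
}
- (void)addTarget:(id<SketchConsumer>)target atIndex:(NSInteger)index {
    // Ignore nil targets and targets that were already attached.
    if (!target || [self.targets containsObject:target]) {
        return;
    }
    [self.targets addObject:target];
    [self.targetIndices addObject:@(index)];
}
- (void)transmitTextureToAllTargets:(NSString *)texture {
    // The two arrays are parallel, so position i gives both target and slot.
    [self.targets enumerateObjectsUsingBlock:^(id<SketchConsumer> target, NSUInteger i, BOOL *stop) {
        [target newTextureAvailable:texture index:self.targetIndices[i].integerValue];
    }];
}
@end

// Consumer that simply records what it received and in which slot.
@interface SketchSink : NSObject <SketchConsumer>
@property (nonatomic, strong) NSMutableArray<NSString *> *received;
@end

@implementation SketchSink
- (instancetype)init {
    if (self = [super init]) {
        _received = [NSMutableArray array];
    }
    return self;
}
- (void)newTextureAvailable:(NSString *)texture index:(NSInteger)index {
    [self.received addObject:[NSString stringWithFormat:@"%@@%ld", texture, (long)index]];
}
@end
```

Enumerating with the position, rather than looking each target up with indexOfObject:, is a small design choice of this sketch; the behavior matches the article's loop.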

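4. Texture consumer Consumer

A consumer's main job is to take the texture supplied by a Provider and process it further. Here we also define the protocol a Consumer must conform to:

@protocol HobenMetalConsumerProtocol <NSObject>

- (void)newTextureAvailable:(id<MTLTexture>)texture index:(NSInteger)index;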

@optional

- (NSInteger)nextAvailableTextureIndex;

- (void)textureIndexUnavailable:(NSInteger)index;

@end

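In this architecture the consumers are HobenMetalRenderView (which commits the rendering commands for the texture it receives) and HobenMetalFilter (which encodes its own pass using the texture it receives). Their responsibility is to encode at their own level, based on the texture supplied by the Provider one level up.

III. The Producers and Consumers

1. Resource processors

Resource processors are the tools that turn existing resource objects (UIImage, CVPixelBufferRef) into textures. They are Providers: after the conversion, the texture can be handed to a Consumer further down the chain.

HobenMetalPicture uses the texture loading method provided by MTKTextureLoader and converts the CGImage into a texture as early as init.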

- (instancetype)initWithImage:(UIImage *)newImageSource {
    if (self = [self initWithCGImage:newImageSource.CGImage]) {
        
    }
    return self;
}

- (instancetype)initWithCGImage:(CGImageRef)newImageSource {
    if (self = [super init]) {
        [self renderCGImage:newImageSource];
    }
    return self;
}

- (void)renderCGImage:(CGImageRef)cgImage {
    MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:MTL_DEVICE];
    NSDictionary *options = @{
        MTKTextureLoaderOptionSRGB : @(NO),
    };
    self.texture = [loader newTextureWithCGImage:cgImage options:options error:nil];
}

When the developer wants to start passing the created texture down the chain, calling the following method is enough:

- (void)processImage {
    [self transmitTextureToAllTargets:self.texture];
}

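HobenMetalMovieReader has to define its own YUV conversion matrix and pass it in as a fragment shader buffer. The principle was covered in "Metal and Graphics Rendering, Part 3: Video with an Alpha Channel"; here the earlier logic is simply extracted into something more concise and readable: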

- (BOOL)renderPixelBuffer:(CVPixelBufferRef)pixelBuffer {
    AUTO_RELEASE_BEGIN
    
    id<MTLTexture> textureY = [self textureWithPixelBuffer:pixelBuffer pixelFormat:MTLPixelFormatR8Unorm planeIndex:0];
    id<MTLTexture> textureUV = [self textureWithPixelBuffer:pixelBuffer pixelFormat:MTLPixelFormatRG8Unorm planeIndex:1];
    [self setupMatrixWithPixelBuffer:pixelBuffer];
    
    if (!textureY || !textureUV || !self.convertMatrix) {
        return NO;
    }
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    NSMutableArray *inputTextureArray = [NSMutableArray array];
    for (id<MTLTexture> texture in @[textureY, textureUV]) {
        HobenMetalTexture *inputTexture = [[HobenMetalTexture alloc] initWithTexture:texture];
        [inputTextureArray addObject:inputTexture];
    }
    
    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    
    if (!outputTexture) {
        outputTexture = [HobenMetalTexture defaultTextureByWidth:textureY.width height:textureY.height];
    }
        
    [HobenMetalKit renderQuad:MTL_PIPELINE(@"oneInputVertex", @"movieFragment") inputTextures:inputTextureArray imageVertices:nil vertexBuffers:nil fragmentBuffers:@[_convertMatrix] outputTexture:outputTexture];
    
    [self transmitTextureToAllTargets:outputTexture];
    
    AUTO_RELEASE_END
    
    return YES;
}

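2. The intermediate layer: Filter

The chain diagram contains one very important intermediate layer, the Filter. It is both a producer and a consumer: it consumes the texture supplied by the previous link, adds the pipeline, buffers, and coordinates it wants to render with, performs its own pass, and hands the resulting texture to the next link.

A Filter supports multiple input textures. It can supply several vertex buffers and texture buffers and pass them, together with its own Pipeline, to the rendering layer, but it always produces exactly one output.

Because a Filter is both a producer and a consumer, it is naturally a class that inherits from HobenMetalOutput and also conforms to HobenMetalConsumerProtocol:

@interface HobenMetalFilter : HobenMetalOutput <HobenMetalConsumerProtocol>
{
    NSMutableArray<HobenMetalTexture *> *inputTextures;
}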

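Because a Filter supports multiple inputs, it has to wait until every input source is ready before performing its pass. During rendering, when the Provider above passes in a texture and all of the textures have arrived, processing can begin:

- (void)newTextureAvailable:(id<MTLTexture>)texture index:(NSInteger)index {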
    if (!texture) {
        return;
    }
    NSInteger numberOfInputs = MAX(_numberOfInputs, 1);
    
    HobenMetalTexture *inputTexture = [[HobenMetalTexture alloc] initWithTexture:texture];
    inputTexture.textureIndex = index;
    [inputTextures addObject:inputTexture];
    
    if (inputTextures.count < numberOfInputs) {
        return;
    }
    
    if (!outputTexture) {
        outputTexture = [HobenMetalTexture defaultTextureByWidth:texture.width height:texture.height];
    }
    
    [inputTextures sortUsingComparator:^NSComparisonResult(HobenMetalTexture *obj1, HobenMetalTexture *obj2) {
        if (obj1.textureIndex <= obj2.textureIndex) {
            return NSOrderedAscending;
        } else {
            return NSOrderedDescending;
        }
    }];
    [self renderToTextureWithVertices:nil textureCoordinates:nil];
    [inputTextures removeAllObjects];
}

- (void)renderToTextureWithVertices:(NSArray *)vertices textureCoordinates:(NSArray *)textureCoordinates {
    for (HobenMetalTexture *inputTexture in inputTextures) {
        inputTexture.textureCoordinates = textureCoordinates;
    }
    [HobenMetalKit renderQuad:MTL_PIPELINE(_vertexName, _fragmentName) inputTextures:inputTextures imageVertices:vertices outputTexture:outputTexture];
    
    [self transmitTextureToAllTargets:outputTexture];
}
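
The "wait until every input has arrived, then order by index" step can also be looked at in isolation. The following is only a sketch under assumptions: `SketchInputGate` is a hypothetical class that is not part of the framework, strings stand in for textures, and unlike the Filter above it keys pending inputs by index, so a second input for the same slot overwrites the first instead of counting as a new input.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical gate: buffers inputs until every slot has been filled.
@interface SketchInputGate : NSObject
- (instancetype)initWithNumberOfInputs:(NSUInteger)count;
// Returns the inputs ordered by index once all have arrived, otherwise nil.
- (NSArray<NSString *> *)addInput:(NSString *)name atIndex:(NSInteger)index;
@end

@implementation SketchInputGate {
    NSUInteger _numberOfInputs;
    NSMutableDictionary<NSNumber *, NSString *> *_pending;
}
- (instancetype)initWithNumberOfInputs:(NSUInteger)count {
    if (self = [super init]) {
        _numberOfInputs = MAX(count, (NSUInteger)1); // at least one input
        _pending = [NSMutableDictionary dictionary];
    }
    return self;
}
- (NSArray<NSString *> *)addInput:(NSString *)name atIndex:(NSInteger)index {
    _pending[@(index)] = name;
    if (_pending.count < _numberOfInputs) {
        return nil; // still waiting for other inputs
    }
    // All inputs are present: emit them in ascending slot order, then reset.
    NSArray<NSNumber *> *order = [_pending.allKeys sortedArrayUsingSelector:@selector(compare:)];
    NSMutableArray<NSString *> *ready = [NSMutableArray array];
    for (NSNumber *key in order) {
        [ready addObject:_pending[key]];
    }
    [_pending removeAllObjects];
    return ready;
}
@end
```

With two inputs, the first call returns nil and the second returns both inputs in slot order, regardless of which one arrived first.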

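It is worth noting that creating a texture from an MTLTextureDescriptor is a CPU-expensive operation, so we create outputTexture only once. (This may be why GPUImage3's CPU usage is so high when rendering video; it cost me a lot of time.)

renderToTextureWithVertices:textureCoordinates: is split out on purpose. Developers can customize the vertex or texture coordinates as needed, or implement their own rendering logic entirely. The crop operation used in this article, CropFilter, is implemented this way: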

- (void)calculateCropTextureCoordinates {
    CGFloat minX = _cropRegion.origin.x;
    CGFloat minY = _cropRegion.origin.y;
    CGFloat maxX = CGRectGetMaxX(_cropRegion);
    CGFloat maxY = CGRectGetMaxY(_cropRegion);
    
    _cropTextureCoordinates = @[
        @(minX), @(minY),
        @(maxX), @(minY),
        @(minX), @(maxY),
        @(maxX), @(maxY),
    ];
}

#pragma mark - Override

- (void)renderToTextureWithVertices:(NSArray *)vertices textureCoordinates:(NSArray *)textureCoordinates {
    [super renderToTextureWithVertices:vertices textureCoordinates:_cropTextureCoordinates];
}

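3. Output view

The output view inherits from MTKView. Its job is to display the texture supplied by the previous link; it is a Consumer and the final node, the one that commits the encoded commands to the GPU. This time we no longer let the system call drawInMTKView: back on every frame. Instead, we decide when to draw ourselves. The code is as follows:

@interface HobenMetalRenderView : MTKView <HobenMetalConsumerProtocol>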

static const NSUInteger MaxFramesInFlight = 3;

- (void)setup {
    // Set enableSetNeedsDisplay to NO and paused to YES so that the developer decides when to draw
    self.enableSetNeedsDisplay = NO;
    self.paused = YES;
    self.autoResizeDrawable = YES;
    self.device = MTL_DEVICE;
    self.opaque = NO;
    _inFlightSemaphore = dispatch_semaphore_create(MaxFramesInFlight);
}

- (void)newTextureAvailable:(id<MTLTexture>)texture index:(NSInteger)index {
    self.drawableSize = CGSizeMake(texture.width, texture.height);
    self.currentTexture = texture;
    [self draw];
}

- (void)drawRect:(CGRect)rect {
    if (!self.currentTexture) {
        return;
    }
    if (!self.currentDrawable) {
        NSAssert(NO, @"drawable is nil");
        return;
    }
    dispatch_semaphore_wait(_inFlightSemaphore, DISPATCH_TIME_FOREVER);
    
    id<MTLCommandBuffer> commandBuffer = MTL_COMMAND_BUFFER;
    HobenMetalTexture *texture = [[HobenMetalTexture alloc] initWithTexture:self.currentTexture];
    [HobenMetalKit renderQuad:MTL_PASSTHROUGH_PIPELINE inputTextures:@[texture] outputTexture:self.currentDrawable.texture];
    __block dispatch_semaphore_t block_semaphore = _inFlightSemaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
     {
         dispatch_semaphore_signal(block_semaphore);
     }];
    [commandBuffer presentDrawable:self.currentDrawable];
    [commandBuffer commit];
    self.currentTexture = nil;
    [HobenMetalKit resetCommandBuffer];
}

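The MTKView's currentDrawable is the canvas for the current screen. Once commit is called, every command encoded into the command buffer during this chained render is submitted to the GPU, and this pass through the chain is complete.

Note that the CommandBuffer has to be reset after it is committed. On the next render, a new command buffer is created from the command queue and used until the MTKView commits again.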
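
The article does not show resetCommandBuffer itself, so the following is only a guess at its shape: a lazily created command buffer that is dropped after commit. `SketchMetalContext` and `commitAndReset` are hypothetical names used for illustration.

```objective-c
#import <Metal/Metal.h>

// Hypothetical context that owns the queue and the current command buffer.
@interface SketchMetalContext : NSObject
@property (nonatomic, strong, readonly) id<MTLCommandQueue> commandQueue;
@property (nonatomic, strong, readonly) id<MTLCommandBuffer> commandBuffer;
- (instancetype)initWithDevice:(id<MTLDevice>)device;
- (void)commitAndReset;
@end

@implementation SketchMetalContext {
    id<MTLCommandBuffer> _commandBuffer;
}
- (instancetype)initWithDevice:(id<MTLDevice>)device {
    if (self = [super init]) {
        _commandQueue = [device newCommandQueue];
    }
    return self;
}
- (id<MTLCommandBuffer>)commandBuffer {
    // Lazily create one buffer per chained render.
    if (!_commandBuffer) {
        _commandBuffer = [self.commandQueue commandBuffer];
    }
    return _commandBuffer;
}
- (void)commitAndReset {
    [_commandBuffer commit];
    // A committed buffer cannot be encoded into again, so drop the reference;
    // the next access to `commandBuffer` creates a fresh one.
    _commandBuffer = nil;
}
@end
```

Every link of the chain reads the same `commandBuffer` until the final node calls `commitAndReset`, after which the next frame starts with a new buffer.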

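IV. Subclassing and Usage in the Business Layer

1. Writing a custom Filter

After this refactoring the business-layer logic is clearly much simpler. To write a custom Filter we only need to specify the vertex shader and the fragment shader, and, if necessary, custom vertex coordinates and texture coordinates. For example, the crop operation CropFilter reduces to the following code: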

- (instancetype)initWithCropRegin:(CGRect)newCropRegion {
    if (self = [super init]) {
        self.cropRegion = newCropRegion;
    }
    return self;
}

- (void)calculateCropTextureCoordinates {
    CGFloat minX = _cropRegion.origin.x;
    CGFloat minY = _cropRegion.origin.y;
    CGFloat maxX = CGRectGetMaxX(_cropRegion);
    CGFloat maxY = CGRectGetMaxY(_cropRegion);
    
    _cropTextureCoordinates = @[
        @(minX), @(minY),
        @(maxX), @(minY),
        @(minX), @(maxY),
        @(maxX), @(maxY),
    ];
}

#pragma mark - Override

- (void)renderToTextureWithVertices:(NSArray *)vertices textureCoordinates:(NSArray *)textureCoordinates {
    [super renderToTextureWithVertices:vertices textureCoordinates:_cropTextureCoordinates];
}

- (void)setCropRegion:(CGRect)newValue {
    NSParameterAssert(newValue.origin.x >= 0 && newValue.origin.x <= 1 &&
                      newValue.origin.y >= 0 && newValue.origin.y <= 1 &&
                      newValue.size.width >= 0 && newValue.size.width <= 1 &&
                      newValue.size.height >= 0 && newValue.size.height <= 1);

    _cropRegion = newValue;
    [self calculateCropTextureCoordinates];
}

The mix (blend) operation has no need for custom vertex coordinates, so its OC layer is even simpler:

- (instancetype)init {
    if (self = [super initWithVertexName:@"twoInputVertex" fragmentName:@"mixFragment" numberOfInputs:2]) {
        
    }
    return self;
}

The corresponding .metal file is just the same mix operation as before:

vertex TwoInputVertexIO twoInputVertex(const device packed_float2 *position [[buffer(0)]],
                                       const device packed_float2 *texturecoord [[buffer(1)]],
                                       const device packed_float2 *texturecoord2 [[buffer(2)]],
                                       uint vid [[vertex_id]])
{
    TwoInputVertexIO outputVertices;
    
    outputVertices.position = float4(position[vid], 0, 1.0);
    outputVertices.textureCoordinate = texturecoord[vid];
    outputVertices.textureCoordinate2 = texturecoord2[vid];

    return outputVertices;
}

fragment float4 mixFragment(TwoInputVertexIO fragmentInput [[stage_in]],
                            texture2d<float> inputTexture [[texture(0)]],
                            texture2d<float> inputTexture2 [[texture(1)]])
{
    constexpr sampler quadSampler;
    float4 color1 = inputTexture.sample(quadSampler, fragmentInput.textureCoordinate);
    float4 color2 = inputTexture2.sample(quadSampler, fragmentInput.textureCoordinate2);

    return float4(color1.rgb, color2.r);
}

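2. Calling from the business layer

The business layer has to specify how the chain is wired, and that too takes only a short, very readable piece of code: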

- (void)viewDidLoad {
    [super viewDidLoad];

    if (!_renderView) {
        _renderView = [[HobenMetalRenderView alloc] initWithFrame:CGRectMake(0, 0, self.view.frame.size.width, self.view.frame.size.height)];
    }
    if (!_cropLeftFilter) {
        _cropLeftFilter = [[HobenMetalCropFilter alloc] initWithCropRegin:CGRectMake(0, 0, .5f, 1.f)];
    }
    
    if (!_cropRightFilter) {
        _cropRightFilter = [[HobenMetalCropFilter alloc] initWithCropRegin:CGRectMake(.5f, 0, .5f, 1.f)];
    }

    if (!_mixFilter) {
        _mixFilter = [[HobenMetalMixFilter alloc] init];
    }

    if (!_picture) {
        _picture = [[HobenMetalPicture alloc] initWithImage:[UIImage imageNamed:@"crop_image"]];
    }
    
    [self.view addSubview:_renderView];
    
    [_picture addTarget:_cropLeftFilter];
    [_picture addTarget:_cropRightFilter];
    
    [_cropLeftFilter addTarget:_mixFilter];
    [_cropRightFilter addTarget:_mixFilter];
    
    [_mixFilter addTarget:_renderView];
    
    [_picture processImage];
}

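And with that, the chained structure is complete!

V. Thoughts on Memory and CPU Optimization (and the Pitfalls Along the Way)

My best guess at the causes of GPUImage3's high CPU and memory usage when processing video comes down to the following points:

  1. AutoReleasePool

Apple's official documentation on Metal rendering recommends wrapping the rendering work in an autorelease pool, so our rendering operations need to do the same.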
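
The code in this article uses AUTO_RELEASE_BEGIN and AUTO_RELEASE_END without showing their definitions. One plausible definition, which is an assumption rather than the author's actual code, is a thin wrapper around @autoreleasepool. `ProcessFrames` is a hypothetical stand-in for per-frame work.

```objective-c
#import <Foundation/Foundation.h>

// Assumed definitions; the article uses these macros without showing them.
#define AUTO_RELEASE_BEGIN @autoreleasepool {
#define AUTO_RELEASE_END }

// Hypothetical per-frame work: temporaries created inside the pool are
// released when the pool drains at the end of each iteration.
static NSUInteger ProcessFrames(NSUInteger frameCount) {
    NSUInteger processed = 0;
    for (NSUInteger frame = 0; frame < frameCount; frame++) {
        AUTO_RELEASE_BEGIN
        NSString *label = [NSString stringWithFormat:@"frame-%lu", (unsigned long)frame];
        if (label.length > 0) {
            processed += 1;
        }
        AUTO_RELEASE_END
    }
    return processed;
}
```

Draining the pool once per frame keeps temporary objects from accumulating for the duration of a whole video.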

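  2. Committing the CommandBuffer too often

In GPUImage3's design, every Provider, Consumer, and Filter commits after each of its encoding operations. In fact, a single render needs only one commit after several encodes, and commit is precisely the bridge between the CPU and the GPU.

According to Apple, a Drawable is a very limited resource (three by default) that is scheduled by the system, and the official sample code "Synchronizing CPU and GPU Work" recommends using a semaphore to control commits. GPUImage3's frequent commits therefore probably hurt CPU performance considerably.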

// The maximum number of frames in flight.
static const NSUInteger MaxFramesInFlight = 3;

...

/// Handles view rendering for a new frame.
- (void)drawInMTKView:(nonnull MTKView *)view
{
    // Wait to ensure only `MaxFramesInFlight` number of frames are getting processed
    // by any stage in the Metal pipeline (CPU, GPU, Metal, Drivers, etc.).
    dispatch_semaphore_wait(_inFlightSemaphore, DISPATCH_TIME_FOREVER);

...

    // Add a completion handler that signals `_inFlightSemaphore` when Metal and the GPU have fully
    // finished processing the commands that were encoded for this frame.
    // This completion indicates that the dynamic buffers that were written-to in this frame, are no
    // longer needed by Metal and the GPU; therefore, the CPU can overwrite the buffer contents
    // without corrupting any rendering operations.
    __block dispatch_semaphore_t block_semaphore = _inFlightSemaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
     {
         dispatch_semaphore_signal(block_semaphore);
     }];

    // Finalize CPU work and submit the command buffer to the GPU.
    [commandBuffer commit];
}
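  3. Repeatedly creating outputTexture from an MTLTextureDescriptor

Doing this on every frame of a video is extremely CPU-intensive. A video has a great many frames, and creating a new texture for each one is simply not acceptable. Because of this, CPU usage while rendering video climbed to around 50%, whereas after the optimization it stayed at around 10%, which shows how costly it is. The texture does not need to be created repeatedly at all; lazy loading is enough.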
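
The lazy-loading idea can be sketched as a small cache. This is an illustration under assumptions: `SketchTextureCache` is a hypothetical class, and the size check is an addition of this sketch (the Filter shown earlier only checks whether outputTexture is nil).

```objective-c
#import <Metal/Metal.h>

// Hypothetical cache that creates the output texture once and reuses it.
@interface SketchTextureCache : NSObject
- (instancetype)initWithDevice:(id<MTLDevice>)device;
- (id<MTLTexture>)outputTextureWithWidth:(NSUInteger)width height:(NSUInteger)height;
@end

@implementation SketchTextureCache {
    id<MTLDevice> _device;
    id<MTLTexture> _outputTexture;
}
- (instancetype)initWithDevice:(id<MTLDevice>)device {
    if (self = [super init]) {
        _device = device;
    }
    return self;
}
- (id<MTLTexture>)outputTextureWithWidth:(NSUInteger)width height:(NSUInteger)height {
    // Reuse the cached texture unless the frame size changed.
    if (_outputTexture && _outputTexture.width == width && _outputTexture.height == height) {
        return _outputTexture;
    }
    MTLTextureDescriptor *desc =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                           width:width
                                                          height:height
                                                       mipmapped:NO];
    // The texture is rendered into by one pass and sampled by the next.
    desc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;
    _outputTexture = [_device newTextureWithDescriptor:desc];
    return _outputTexture;
}
@end
```

For a video whose frame size never changes, the descriptor and texture are created on the first frame only, and every later frame gets the same object back.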

The figure below shows the peak CPU and memory usage while rendering video after these optimizations:

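VI. Summary

This chained architecture greatly improves the maintainability and readability of the rendering logic. It allows Filter files and .metal files to be organized by rendering function, and it simplifies business-layer development.

Even when a custom rendering operation is needed, it is enough to subclass HobenMetalFilter and decide on the vertex shader, fragment shader, vertex coordinates, texture coordinates, vertex buffers, and texture buffers, which is very convenient.

The chain follows the producer-consumer pattern: inputs act as producers, outputs act as consumers, and the intermediate Filter acts as both. As a result, a single CommandBuffer collects the work of multiple CommandEncoders, and in the end the MTKView commits that command buffer to the GPU to complete the render.

This chained architecture not only reimplements the logic of the open-source library GPUImage3 in OC, it also solves the high memory and high CPU problems. The process was painful at times, but I learned a great deal from it. Onward!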
