0. Preface
The render commands covered so far have all been single passes. But once we need to reuse the result of an earlier render, a single pass is no longer enough, and that is where the chain structure comes in. In a chain, the output produced by one render can be fed back in as the input of the next, and the final result is rendered to the screen. As an example we will again take the scenario from Metal与图形渲染二:透明图片的渲染 and reproduce the transparent-image effect.
Our earlier implementation actually did everything in a single render pass:
That approach piles every rendering operation into one pass, so all of the Objective-C and Metal code ends up dumped in one place and becomes hard to maintain.
This time, instead of cramming the code into one render pass, we use a chain structure to achieve the same effect:
The chained code is more intuitive and concise, and more importantly, whenever we later want to reuse the texture of the Picture, or of some Filter, we only need to attach another link to that Filter to reuse it.
Speaking of chain structures, the well-known GPUImage3 has to be mentioned. It supports consuming one render multiple times, but it is written in Swift, and on top of that it has fatal high-CPU and high-memory problems when handling video: memory blows up before a single video finishes playing. Searching the issues shows people reporting this back in 2019, yet the author's reply was merely "We still have a lot of work to do on the inputs and outputs to get this to be ready for regular use."
Ouch. An open-source library in that state can hardly cover even the basic requirements for playing effects, and our project is still written in Objective-C anyway. So there was nothing for it but to borrow the ideas, hand-roll a chain framework of our own, and fill in the holes the open-source library left behind.
I. Basic Architecture
The workflow of the chain structure is shown in the figure below:
The basic components that implement this workflow are:
the base kit MetalKit, the rendering layer Renderer, the texture producer Provider, and the texture consumer Consumer. Their relationships are shown in the figure below.
II. Rendering Principle and Basic Components
Before introducing the components, it is worth briefly reviewing the flow of a single render pass, that is, how an input source (a UIImage) is processed stage by stage until it is rendered on screen:
--- Initialization phase ---
- Configure the Device, Queue, and MTKView (initialization phase, created only once)
- Configure the PipelineState (maps to the functions in the .metal file, created only once)
- Create the resources and load the texture as an MTLTexture (created only once)
- Set up the vertex MTLBuffer (ideally created only once)
--- Render phase, in the drawInMTKView callback, once per frame ---
- Get a CommandBuffer from the Queue
- Create a RenderCommandEncoder from the CommandBuffer and a RenderPassDescriptor
- Encode the buffers (using threadgroups to split up the encoded data if needed)
--- Finish: submit the render commands; once encoding is done, commit the command buffer to the GPU ---
- Commit it to the Queue
As we can see, in a single render pass some parts are initialized only once, while others are created and read again and again.
In our chain structure, for one chained render (from UIImage to MTKView), the objects that only need to be created once are the Device, CommandQueue, CommandBuffer, Library, and Pipeline.
What does get created repeatedly is the CommandEncoder: after several encode steps, once the chain reaches the MTKView, the CommandBuffer holding everything encoded for this render is committed so the GPU can draw.
1. The base kit: MetalKit
MetalKit manages and stores everything that only needs to be created once. Almost all of it is lazily loaded, which avoids repeatedly creating objects during rendering and wasting CPU and memory.
- (id<MTLDevice>)device {
if (!_device) {
_device = MTLCreateSystemDefaultDevice();
}
return _device;
}
- (id<MTLCommandQueue>)commandQueue {
if (!_commandQueue) {
_commandQueue = [self.device newCommandQueue];
}
return _commandQueue;
}
- (id<MTLCommandBuffer>)commandBuffer {
if (!_commandBuffer) {
_commandBuffer = self.commandQueue.commandBuffer;
}
return _commandBuffer;
}
- (id<MTLLibrary>)library {
if (!_library) {
NSString *libPath = [METAL_BUNDLE pathForResource:@"alpha_video_renderer" ofType:@"metallib"];
if (!libPath) {
NSAssert(NO, @"[HobenMetalKit] libPath is nil!");
[CCAlphaVideoUtils handleMetalSetupError:CCAlphaVideoMetalErrorTypeLibLoadError reason:@"libPath is nil"];
HobenLog(@"[HobenMetalKit] libPath is nil!");
return nil;
}
NSError *error;
id<MTLLibrary> defaultLibrary = [MTL_DEVICE newLibraryWithFile:libPath error:&error];
if (error || !defaultLibrary) {
[CCAlphaVideoUtils handleMetalSetupError:CCAlphaVideoMetalErrorTypeLibLoadError reason:@"defaultLibrary load failed"];
HobenLog(@"[HobenMetalKit] newLibraryWithFile error: %@", error);
return nil;
}
_library = defaultLibrary;
}
return _library;
}
- (NSMutableDictionary<NSString *, id<MTLRenderPipelineState>> *)pipelineDict {
if (!_pipelineDict) {
_pipelineDict = [NSMutableDictionary dictionary];
}
return _pipelineDict;
}
Pipeline management also lives in MetalKit, with a cache on top, again so that pipelines are not created over and over during rendering:
+ (id<MTLRenderPipelineState>)pipelineStateWithVertexName:(NSString *)vertexName fragmentName:(NSString *)fragmentName {
NSMutableDictionary<NSString *, id<MTLRenderPipelineState>> *pipelineDict = [HobenMetalKit sharedInstance].pipelineDict;
NSString *vName = vertexName ?: @"oneInputVertex";
NSString *fName = fragmentName ?: @"passthroughFragment";
NSString *key = [NSString stringWithFormat:@"%@_%@", vName, fName];
id<MTLRenderPipelineState> cachedPipeline = pipelineDict[key];
if (cachedPipeline) {
[HobenMetalKit sharedInstance].didLoadMetalLibSuccess = YES;
return cachedPipeline;
}
MTLRenderPipelineDescriptor *pipelineDesc = [MTLRenderPipelineDescriptor new];
id<MTLLibrary> library = [self sharedLibrary];
id<MTLFunction> vertexFunction = [library newFunctionWithName:vName];
id<MTLFunction> fragmentFunction = [library newFunctionWithName:fName];
if (!vertexFunction || !fragmentFunction) {
NSAssert(NO, @"fuction is nil");
return nil;
}
pipelineDesc.vertexFunction = vertexFunction;
pipelineDesc.fragmentFunction = fragmentFunction;
pipelineDesc.colorAttachments[0].pixelFormat = MTLPixelFormatBGRA8Unorm;
NSError *pipelineError;
id<MTLRenderPipelineState> pipelineState = [[self sharedDevice] newRenderPipelineStateWithDescriptor:pipelineDesc error:&pipelineError];
if (pipelineError) {
[CCAlphaVideoUtils handleMetalSetupError:CCAlphaVideoMetalErrorTypeLibLoadError reason:@"pipelinestate error"];
HobenLog(@"[CCAlphaVideoMetalFunctionLoader] pipelinestate error: %@", pipelineError);
}
if (pipelineState) {
[HobenMetalKit sharedInstance].didLoadMetalLibSuccess = YES;
}
pipelineDict[key] = pipelineState;
return pipelineState;
}
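The MTL_DEVICE, MTL_COMMAND_BUFFER, MTL_PIPELINE and METAL_BUNDLE macros that appear throughout the snippets are not listed in this article; judging by how they are used, they are assumed to be thin shortcuts onto the HobenMetalKit singleton, roughly like this sketch:
// Assumed convenience macros (a sketch, not the actual definitions): they simply
// shorten access to the lazily loaded objects on the HobenMetalKit singleton.
#define METAL_BUNDLE             [NSBundle bundleForClass:[HobenMetalKit class]]
#define MTL_DEVICE               [HobenMetalKit sharedInstance].device
#define MTL_COMMAND_BUFFER       [HobenMetalKit sharedInstance].commandBuffer
#define MTL_PIPELINE(v, f)       [HobenMetalKit pipelineStateWithVertexName:(v) fragmentName:(f)]
#define MTL_PASSTHROUGH_PIPELINE MTL_PIPELINE(@"oneInputVertex", @"passthroughFragment")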
2. The rendering layer: Renderer
The rendering layer takes the incoming pipeline, vertex coordinates, assorted buffers, and input textures, performs the encoding, and produces the output texture:
/**
 Perform a single render pass

 @param pipelineState   the render pipeline
 @param inputTextures   the input textures; each element wraps the texture data and its texture coordinates
 @param imageVertices   the vertex coordinates; pass nil to use the default quad
 @param vertexBuffers   extra vertex-shader buffers
 @param fragmentBuffers fragment-shader buffers
 @param loadAction      whether to load or clear the previously rendered contents, defaults to MTLLoadActionClear
 @param outputTexture   the output texture, which can be reused
 */
+ (void)renderQuad:(id<MTLRenderPipelineState>)pipelineState
     inputTextures:(NSArray<HobenMetalTexture *> *)inputTextures
     imageVertices:(nullable NSArray<NSNumber *> *)imageVertices
     vertexBuffers:(nullable NSArray<id<MTLBuffer>> *)vertexBuffers
   fragmentBuffers:(nullable NSArray<id<MTLBuffer>> *)fragmentBuffers
        loadAction:(MTLLoadAction)loadAction
     outputTexture:(id<MTLTexture>)outputTexture {
NSAssert(!imageVertices || imageVertices.count == 8, @"imageVertices.count must be 8");
AUTO_RELEASE_BEGIN
if (!pipelineState) {
NSAssert(NO, @"pipelineState is nil");
return;
}
NSArray *defaultImageVertices = @[
@-1.0, @1.0,
@1.0, @1.0,
@-1.0, @-1.0,
@1.0, @-1.0,
];
NSArray *vertice = imageVertices ?: defaultImageVertices;
float verticeCoordinates[8] = {
[vertice[0] floatValue], [vertice[1] floatValue],
[vertice[2] floatValue], [vertice[3] floatValue],
[vertice[4] floatValue], [vertice[5] floatValue],
[vertice[6] floatValue], [vertice[7] floatValue],
};
id<MTLBuffer> vertexBuffer = [[HobenMetalKit sharedDevice] newBufferWithBytes:verticeCoordinates length:sizeof(verticeCoordinates) options:MTLResourceStorageModeShared];
MTLRenderPassDescriptor *renderPass = [MTLRenderPassDescriptor renderPassDescriptor];
renderPass.colorAttachments[0].texture = outputTexture;
renderPass.colorAttachments[0].clearColor = MTLClearColorMake(0, 0, 0, 0);
renderPass.colorAttachments[0].storeAction = MTLStoreActionStore;
renderPass.colorAttachments[0].loadAction = loadAction;
id<MTLRenderCommandEncoder> renderEncoder = [MTL_COMMAND_BUFFER renderCommandEncoderWithDescriptor:renderPass];
[renderEncoder setRenderPipelineState:pipelineState];
[renderEncoder setVertexBuffer:vertexBuffer offset:0 atIndex:0];
for (NSInteger i = 0; i < vertexBuffers.count; i++) {
id<MTLBuffer> extraVertexBuffer = vertexBuffers[i];
[renderEncoder setVertexBuffer:extraVertexBuffer offset:0 atIndex:1 + i];
}
for (NSInteger i = 0; i < inputTextures.count; i++) {
HobenMetalTexture *texture = inputTextures[i];
if (![texture isKindOfClass:[HobenMetalTexture class]]) {
NSAssert(NO, @"texture class must be HobenMetalTexture");
[renderEncoder setVertexBuffer:nil offset:0 atIndex:1 + i + vertexBuffers.count];
[renderEncoder setFragmentTexture:nil atIndex:i];
continue;
}
NSArray *textureCoor = texture.textureCoordinates;
NSAssert(textureCoor.count == 8, @"textureCoor.count must be 8");
float textureCoordinates[8] = {
[textureCoor[0] floatValue], [textureCoor[1] floatValue],
[textureCoor[2] floatValue], [textureCoor[3] floatValue],
[textureCoor[4] floatValue], [textureCoor[5] floatValue],
[textureCoor[6] floatValue], [textureCoor[7] floatValue],
};
id<MTLBuffer> textureBuffer = [[HobenMetalKit sharedDevice] newBufferWithBytes:textureCoordinates length:sizeof(textureCoordinates) options:MTLResourceStorageModeShared];
[renderEncoder setVertexBuffer:textureBuffer offset:0 atIndex:1 + i + vertexBuffers.count];
[renderEncoder setFragmentTexture:texture.texture atIndex:i];
}
for (NSInteger i = 0; i < fragmentBuffers.count; i++) {
id<MTLBuffer> fragmentBuffer = fragmentBuffers[i];
[renderEncoder setFragmentBuffer:fragmentBuffer offset:0 atIndex:i];
}
[renderEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4];
[renderEncoder endEncoding];
AUTO_RELEASE_END
}
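The HobenMetalTexture wrapper passed into renderQuad is not listed in this article either; judging from how it is used, its interface is assumed to look roughly like this:
// Assumed interface of the HobenMetalTexture wrapper (a sketch based on its usage):
// it bundles an MTLTexture with its texture coordinates and the input slot it binds to.
@interface HobenMetalTexture : NSObject

@property (nonatomic, strong) id<MTLTexture> texture;
// 8 floats; assumed to default to the full-quad coordinates when not set explicitly
@property (nonatomic, copy) NSArray<NSNumber *> *textureCoordinates;
@property (nonatomic, assign) NSInteger textureIndex;

- (instancetype)initWithTexture:(id<MTLTexture>)texture;
// Creates a reusable BGRA render-target texture (sketched further below)
+ (id<MTLTexture>)defaultTextureByWidth:(NSUInteger)width height:(NSUInteger)height;

@end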
3. The texture producer: Provider
A producer's main job is to hand the texture obtained from the rendering layer to the appropriate consumers so that the next step can run. Here we define the protocol a Provider has to conform to:
@protocol HobenMetalProviderProtocol
- (void)transmitTexture:(id<MTLTexture>)texture
                 target:(id<HobenMetalConsumerProtocol>)target
                  index:(NSInteger)index;
@end
Next we define a texture producer, HobenMetalOutput, that conforms to the Provider protocol. Its main job is to manage the Consumers it owns (added via addTarget) and, at the right moments, notify them so that they call the corresponding methods.
@interface HobenMetalOutput : NSObject <HobenMetalProviderProtocol> {
id<MTLTexture> outputTexture;
}
#pragma mark - Public Method
- (void)addTarget:(id<HobenMetalConsumerProtocol>)target {
NSInteger index = 0;
if ([target respondsToSelector:@selector(nextAvailableTextureIndex)]) {
index = [target nextAvailableTextureIndex];
}
[self addTarget:target atIndex:index];
}
- (void)addTarget:(id<HobenMetalConsumerProtocol>)target atIndex:(NSInteger)index {
if (!target) {
return;
}
if ([self.targets containsObject:target]) {
return;
}
if ([target respondsToSelector:@selector(textureIndexUnavailable:)]) {
[target textureIndexUnavailable:index];
}
[self.targets addObject:target];
[self.targetTextureIndices addObject:@(index)];
}
- (void)transmitTextureToAllTargets:(id<MTLTexture>)texture {
for (id<HobenMetalConsumerProtocol> target in self.targets) {
NSInteger indexOfObject = [self.targets indexOfObject:target];
NSInteger textureIndex = [[self.targetTextureIndices objectAtIndex:indexOfObject] integerValue];
[self transmitTexture:texture target:target index:textureIndex];
}
}
#pragma mark - HobenMetalProviderProtocol
- (void)transmitTexture:(id<MTLTexture>)texture target:(id<HobenMetalConsumerProtocol>)target index:(NSInteger)index {
[target newTextureAvailable:texture index:index];
}
In this architecture the producers are HobenMetalPicture (gets its texture from a UIImage), HobenMetalMovieReader (gets its texture from a CVPixelBufferRef), and HobenMetalFilter (gets its texture from the link above it in the chain); once they have a texture they process it and pass the result to the next link down the chain.
4. The texture consumer: Consumer
A consumer's main job is to take the texture a Provider supplies and process it further. Here too we define the protocol a Consumer has to conform to:
@protocol HobenMetalConsumerProtocol
- (void)newTextureAvailable:(id<MTLTexture>)texture index:(NSInteger)index;
@optional
- (NSInteger)nextAvailableTextureIndex;
- (void)textureIndexUnavailable:(NSInteger)index;
@end
In this architecture the consumers are HobenMetalRenderView (submits the render commands using the texture it receives) and HobenMetalFilter (runs this link's encoding on the texture it receives); their responsibility is to take the texture supplied by the upstream Provider and encode at this link.
III. The Producers and Consumers
1. Resource processors
Resource processors turn existing resource objects (UIImage, CVPixelBufferRef) into textures. They are Providers: once the conversion is done, the texture can be handed to the downstream Consumers in the chain.
HobenMetalPicture uses the texture-loading API provided by MTKTextureLoader and converts the CGImage into a texture right at init time.
- (instancetype)initWithImage:(UIImage *)newImageSource {
if (self = [self initWithCGImage:newImageSource.CGImage]) {
}
return self;
}
- (instancetype)initWithCGImage:(CGImageRef)newImageSource {
if (self = [super init]) {
[self renderCGImage:newImageSource];
}
return self;
}
- (void)renderCGImage:(CGImageRef)cgImage {
MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:MTL_DEVICE];
NSDictionary *options = @{
MTKTextureLoaderOptionSRGB : @(NO),
};
self.texture = [loader newTextureWithCGImage:cgImage options:options error:nil];
}
When the developer wants to start pushing the created texture down the chain, they just call:
- (void)processImage {
[self transmitTextureToAllTargets:self.texture];
}
HobenMetalMovieReader, on the other hand, has to define its own YUV conversion matrix and add it to the fragment-shader buffers. The principle was covered in Metal与图形渲染三:透明通道视频; here the old logic is simply extracted into something more concise and readable:
- (BOOL)renderPixelBuffer:(CVPixelBufferRef)pixelBuffer {
AUTO_RELEASE_BEGIN
id<MTLTexture> textureY = [self textureWithPixelBuffer:pixelBuffer pixelFormat:MTLPixelFormatR8Unorm planeIndex:0];
id<MTLTexture> textureUV = [self textureWithPixelBuffer:pixelBuffer pixelFormat:MTLPixelFormatRG8Unorm planeIndex:1];
[self setupMatrixWithPixelBuffer:pixelBuffer];
if (!textureY || !textureUV || !self.convertMatrix) {
return NO;
}
CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
NSMutableArray *inputTextureArray = [NSMutableArray array];
for (id<MTLTexture> texture in @[textureY, textureUV]) {
HobenMetalTexture *inputTexture = [[HobenMetalTexture alloc] initWithTexture:texture];
[inputTextureArray addObject:inputTexture];
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
if (!outputTexture) {
outputTexture = [HobenMetalTexture defaultTextureByWidth:textureY.width height:textureY.height];
}
[HobenMetalKit renderQuad:MTL_PIPELINE(@"oneInputVertex", @"movieFragment") inputTextures:inputTextureArray imageVertices:nil vertexBuffers:nil fragmentBuffers:@[_convertMatrix] outputTexture:outputTexture];
[self transmitTextureToAllTargets:outputTexture];
AUTO_RELEASE_END
return YES;
}
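The setupMatrixWithPixelBuffer: helper used above is not listed here. A minimal sketch of what it is assumed to do, building a standard BT.709 full-range YUV-to-RGB matrix once and wrapping it in an MTLBuffer so it can be passed as a fragment buffer, might look like the following; the HobenConvertMatrix struct name is made up for the sketch and its layout must match what the movieFragment shader expects:
#import <simd/simd.h>

// Assumed layout of the conversion matrix handed to the fragment shader.
typedef struct {
    matrix_float3x3 matrix;
    vector_float3   offset;
} HobenConvertMatrix;

- (void)setupMatrixWithPixelBuffer:(CVPixelBufferRef)pixelBuffer {
    if (_convertMatrix) {
        return; // build the buffer once and reuse it for every frame
    }
    // The real implementation presumably inspects pixelBuffer's colour attachments
    // to choose between BT.601 and BT.709; the sketch hard-codes BT.709 full range.
    HobenConvertMatrix convert = {
        .matrix = (matrix_float3x3){{
            { 1.0,     1.0,     1.0    },  // coefficients of Y
            { 0.0,    -0.1873,  1.8556 },  // coefficients of U
            { 1.5748, -0.4681,  0.0    },  // coefficients of V
        }},
        .offset = { 0.0, -0.5, -0.5 },     // U and V are stored biased by 0.5
    };
    _convertMatrix = [MTL_DEVICE newBufferWithBytes:&convert
                                             length:sizeof(convert)
                                            options:MTLResourceStorageModeShared];
}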
2. The intermediate layer: Filter
In the chain diagram we can spot a very important intermediate layer, the Filter. It is both a producer and a consumer: it consumes the texture supplied by the previous link, adds the pipeline, buffers, and coordinates it wants to render with, performs this link's render, and hands the resulting texture to the next link.
A Filter supports multiple input textures and can supply several vertex and texture buffers of its own, together with its own Pipeline, to the rendering layer, but in the end it always produces exactly one output.
Since a Filter is both a producer and a consumer, it follows that it is a class that inherits from HobenMetalOutput and also conforms to HobenMetalConsumerProtocol:
@interface HobenMetalFilter : HobenMetalOutput <HobenMetalConsumerProtocol>
{
NSMutableArray<HobenMetalTexture *> *inputTextures;
}
Because a Filter supports multiple inputs, we have to wait until every input source is ready before performing this render. When an upstream Provider delivers a texture and all the textures have arrived, processing can begin:
- (void)newTextureAvailable:(id<MTLTexture>)texture index:(NSInteger)index {
if (!texture) {
return;
}
NSInteger numberOfInputs = MAX(_numberOfInputs, 1);
HobenMetalTexture *inputTexture = [[HobenMetalTexture alloc] initWithTexture:texture];
inputTexture.textureIndex = index;
[inputTextures addObject:inputTexture];
if (inputTextures.count < numberOfInputs) {
return;
}
if (!outputTexture) {
outputTexture = [HobenMetalTexture defaultTextureByWidth:texture.width height:texture.height];
}
[inputTextures sortUsingComparator:^NSComparisonResult(HobenMetalTexture *obj1, HobenMetalTexture *obj2) {
if (obj1.textureIndex <= obj2.textureIndex) {
return NSOrderedAscending;
} else {
return NSOrderedDescending;
}
}];
[self renderToTextureWithVertices:nil textureCoordinates:nil];
[inputTextures removeAllObjects];
}
- (void)renderToTextureWithVertices:(NSArray *)vertices textureCoordinates:(NSArray *)textureCoordinates {
for (HobenMetalTexture *inputTexture in inputTextures) {
inputTexture.textureCoordinates = textureCoordinates;
}
[HobenMetalKit renderQuad:MTL_PIPELINE(_vertexName, _fragmentName) inputTextures:inputTextures imageVertices:vertices outputTexture:outputTexture];
[self transmitTextureToAllTargets:outputTexture];
}
It is worth noting that creating a texture through an MTLTextureDescriptor is a fairly CPU-heavy operation, so we create the outputTexture only once (this is probably why GPUImage3 burns so much CPU when rendering video; it cost me quite a while to track down).
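The defaultTextureByWidth:height: factory on HobenMetalTexture is not shown in this article; a minimal sketch of how such a reusable render-target texture is assumed to be created:
// A sketch of the assumed defaultTextureByWidth:height: factory: it allocates a BGRA
// render-target texture once, and the Filter then reuses it for every subsequent frame.
+ (id<MTLTexture>)defaultTextureByWidth:(NSUInteger)width height:(NSUInteger)height {
    MTLTextureDescriptor *desc =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                            width:width
                                                           height:height
                                                        mipmapped:NO];
    // The texture is rendered into by this link and sampled by the next one.
    desc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;
    return [MTL_DEVICE newTextureWithDescriptor:desc];
}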
renderToTextureWithVertices:textureCoordinates: is factored out here so that developers can customise the vertex or texture coordinates as they need, or implement a rendering pass of their own. For example, the crop operation CropFilter needed in this article is implemented exactly that way:
- (void)calculateCropTextureCoordinates {
CGFloat minX = _cropRegion.origin.x;
CGFloat minY = _cropRegion.origin.y;
CGFloat maxX = CGRectGetMaxX(_cropRegion);
CGFloat maxY = CGRectGetMaxY(_cropRegion);
_cropTextureCoordinates = @[
@(minX), @(minY),
@(maxX), @(minY),
@(minX), @(maxY),
@(maxX), @(maxY),
];
}
#pragma mark - Override
- (void)renderToTextureWithVertices:(NSArray *)vertices textureCoordinates:(NSArray *)textureCoordinates {
[super renderToTextureWithVertices:vertices textureCoordinates:_cropTextureCoordinates];
}
3. The output view
The output view inherits from MTKView. Its job is to display the texture supplied by the previous link, so it is a Consumer, and it is the final node that submits the encoded commands to the GPU. This time we no longer let the system call drawInMTKView: every frame; we decide when to draw ourselves:
@interface HobenMetalRenderView : MTKView <HobenMetalConsumerProtocol>
static const NSUInteger MaxFramesInFlight = 3;
- (void)setup {
// Set enableSetNeedsDisplay to NO and paused to YES so the developer decides when to draw
self.enableSetNeedsDisplay = NO;
self.paused = YES;
self.autoResizeDrawable = YES;
self.device = MTL_DEVICE;
self.opaque = NO;
_inFlightSemaphore = dispatch_semaphore_create(MaxFramesInFlight);
}
- (void)newTextureAvailable:(id<MTLTexture>)texture index:(NSInteger)index {
self.drawableSize = CGSizeMake(texture.width, texture.height);
self.currentTexture = texture;
[self draw];
}
- (void)drawRect:(CGRect)rect {
if (!self.currentTexture) {
return;
}
if (!self.currentDrawable) {
NSAssert(NO, @"drawable is nil");
return;
}
dispatch_semaphore_wait(_inFlightSemaphore, DISPATCH_TIME_FOREVER);
id<MTLCommandBuffer> commandBuffer = MTL_COMMAND_BUFFER;
HobenMetalTexture *texture = [[HobenMetalTexture alloc] initWithTexture:self.currentTexture];
[HobenMetalKit renderQuad:MTL_PASSTHROUGH_PIPELINE inputTextures:@[texture] outputTexture:self.currentDrawable.texture];
__block dispatch_semaphore_t block_semaphore = _inFlightSemaphore;
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
{
dispatch_semaphore_signal(block_semaphore);
}];
[commandBuffer presentDrawable:self.currentDrawable];
[commandBuffer commit];
self.currentTexture = nil;
[HobenMetalKit resetCommandBuffer];
}
The MTKView's currentDrawable is the canvas for the current screen. Once the render commands are committed, the command buffer holding everything encoded along this chain is submitted to the GPU, and the chain is complete.
Note that after the CommandBuffer has been committed it needs to be reset; the next render then creates a fresh command buffer from the command queue, which lives until the MTKView submits the render commands again.
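resetCommandBuffer is not listed above; given the lazily loaded commandBuffer getter in HobenMetalKit, it is assumed to simply drop the cached buffer so the next access lazily creates a new one (assuming commandBuffer is a readwrite property on the singleton):
// A sketch of the assumed resetCommandBuffer: a committed buffer cannot be encoded
// into again, so clear the cached one and let the lazy getter fetch a fresh buffer
// from the command queue on the next render.
+ (void)resetCommandBuffer {
    [HobenMetalKit sharedInstance].commandBuffer = nil;
}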
IV. Subclassing and Invocation at the Business Layer
1. Customising a Filter
After this refactor the business-layer logic is clearly much leaner. To define a custom Filter we only need to specify the vertex and fragment shaders, and, if needed, custom vertex or texture coordinates. For example, the crop operation CropFilter reduces to the following code:
- (instancetype)initWithCropRegin:(CGRect)newCropRegion {
if (self = [super init]) {
self.cropRegion = newCropRegion;
}
return self;
}
- (void)calculateCropTextureCoordinates {
CGFloat minX = _cropRegion.origin.x;
CGFloat minY = _cropRegion.origin.y;
CGFloat maxX = CGRectGetMaxX(_cropRegion);
CGFloat maxY = CGRectGetMaxY(_cropRegion);
_cropTextureCoordinates = @[
@(minX), @(minY),
@(maxX), @(minY),
@(minX), @(maxY),
@(maxX), @(maxY),
];
}
#pragma mark - Override
- (void)renderToTextureWithVertices:(NSArray *)vertices textureCoordinates:(NSArray *)textureCoordinates {
[super renderToTextureWithVertices:vertices textureCoordinates:_cropTextureCoordinates];
}
- (void)setCropRegion:(CGRect)newValue {
NSParameterAssert(newValue.origin.x >= 0 && newValue.origin.x <= 1 &&
newValue.origin.y >= 0 && newValue.origin.y <= 1 &&
newValue.size.width >= 0 && newValue.size.width <= 1 &&
newValue.size.height >= 0 && newValue.size.height <= 1);
_cropRegion = newValue;
[self calculateCropTextureCoordinates];
}
The blend operation has no need for custom vertex coordinates, so on the Objective-C side it is even simpler:
- (instancetype)init {
if (self = [super initWithVertexName:@"twoInputVertex" fragmentName:@"mixFragment" numberOfInputs:2]) {
}
return self;
}
The corresponding .metal file is just the blend operation from before:
vertex TwoInputVertexIO twoInputVertex(const device packed_float2 *position [[buffer(0)]],
const device packed_float2 *texturecoord [[buffer(1)]],
const device packed_float2 *texturecoord2 [[buffer(2)]],
uint vid [[vertex_id]])
{
TwoInputVertexIO outputVertices;
outputVertices.position = float4(position[vid], 0, 1.0);
outputVertices.textureCoordinate = texturecoord[vid];
outputVertices.textureCoordinate2 = texturecoord2[vid];
return outputVertices;
}
fragment float4 mixFragment(TwoInputVertexIO fragmentInput [[stage_in]],
texture2d<float> inputTexture [[texture(0)]],
texture2d<float> inputTexture2 [[texture(1)]])
{
constexpr sampler quadSampler;
float4 color1 = inputTexture.sample(quadSampler, fragmentInput.textureCoordinate);
float4 color2 = inputTexture2.sample(quadSampler, fragmentInput.textureCoordinate2);
return float4(color1.rgb, color2.r);
}
2. Invoking it from the business layer
The business layer just has to lay out how the chain flows, which is a single, very readable series of calls:
- (void)viewDidLoad {
[super viewDidLoad];
if (!_renderView) {
_renderView = [[HobenMetalRenderView alloc] initWithFrame:CGRectMake(0, 0, self.view.frame.size.width, self.view.frame.size.height)];
}
if (!_cropLeftFilter) {
_cropLeftFilter = [[HobenMetalCropFilter alloc] initWithCropRegin:CGRectMake(0, 0, .5f, 1.f)];
}
if (!_cropRightFilter) {
_cropRightFilter = [[HobenMetalCropFilter alloc] initWithCropRegin:CGRectMake(.5f, 0, .5f, 1.f)];
}
if (!_mixFilter) {
_mixFilter = [[HobenMetalMixFilter alloc] init];
}
if (!_picture) {
_picture = [[HobenMetalPicture alloc] initWithImage:[UIImage imageNamed:@"crop_image"]];
}
[self.view addSubview:_renderView];
[_picture addTarget:_cropLeftFilter];
[_picture addTarget:_cropRightFilter];
[_cropLeftFilter addTarget:_mixFilter];
[_cropRightFilter addTarget:_mixFilter];
[_mixFilter addTarget:_renderView];
[_picture processImage];
}
And with that, a chain structure is complete!
V. Some Thoughts on Memory and CPU Optimization (and the Pits Along the Way)
GPUImage3's high CPU and memory usage when processing video most likely comes down to the following points:
- AutoreleasePool
Apple's official documentation on Metal rendering recommends using an autorelease pool, so our render operations need to add one as well, roughly as sketched below.
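The AUTO_RELEASE_BEGIN / AUTO_RELEASE_END macros used in the renderer and the movie reader above are assumed to be just such a wrapper around @autoreleasepool:
// Assumed definitions (a sketch): wrap the body of every render operation in an
// autorelease pool so per-frame temporary objects are drained immediately.
#define AUTO_RELEASE_BEGIN @autoreleasepool {
#define AUTO_RELEASE_END   }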
- Committing the CommandBuffer too often
In GPUImage3's design, every Provider, Consumer, and Filter commits after each of its encode passes. In fact, a single render only needs one commit with many encodes, and the commit is exactly the bridge across which the CPU talks to the GPU.
According to Apple's documentation, drawables are a very limited resource (there are only three), scheduled by the system, and the official sample code "Synchronizing CPU and GPU Work" recommends using a semaphore to throttle commits. GPUImage3's flood of commits presumably takes a real toll on CPU performance.
// The maximum number of frames in flight.
static const NSUInteger MaxFramesInFlight = 3;
...
/// Handles view rendering for a new frame.
- (void)drawInMTKView:(nonnull MTKView *)view
{
// Wait to ensure only `MaxFramesInFlight` number of frames are getting processed
// by any stage in the Metal pipeline (CPU, GPU, Metal, Drivers, etc.).
dispatch_semaphore_wait(_inFlightSemaphore, DISPATCH_TIME_FOREVER);
...
// Add a completion handler that signals `_inFlightSemaphore` when Metal and the GPU have fully
// finished processing the commands that were encoded for this frame.
// This completion indicates that the dynamic buffers that were written-to in this frame, are no
// longer needed by Metal and the GPU; therefore, the CPU can overwrite the buffer contents
// without corrupting any rendering operations.
__block dispatch_semaphore_t block_semaphore = _inFlightSemaphore;
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
{
dispatch_semaphore_signal(block_semaphore);
}];
// Finalize CPU work and submit the command buffer to the GPU.
[commandBuffer commit];
}
- Creating the outputTexture from an MTLTextureDescriptor every frame
Doing this on every frame of a video is extremely CPU-hungry. A video has a huge number of frames, and allocating a fresh texture per frame simply does not work: because of this my CPU usage while rendering video climbed to around 50%, and after the fix it stays around 10%, which shows how expensive it is. In practice the texture does not need to be created repeatedly at all; lazily loading it once is enough.
The figure below shows the peak CPU and memory while rendering video after these optimizations:
VI. Summary
This chain-based architecture greatly improves the maintainability and readability of the rendering logic, allows Filter files and .metal files to be organised by rendering feature, and simplifies the business-layer code.
Even when a custom render operation is needed, you only have to subclass HobenMetalFilter and decide on the vertex shader, fragment shader, vertex coordinates, texture coordinates, vertex buffers, and fragment buffers you need, which is very convenient.
The chain follows a producer-consumer structure: the inputs act as producers, the outputs as consumers, and the intermediate Filters as both, so that a single CommandBuffer collects multiple command encodings and the MTKView finally commits that buffer to the GPU to finish the render.
This chained architecture not only reimplements the logic of the open-source GPUImage3 in Objective-C, it also fixes the high memory and CPU problems. The process was a grind, but the payoff was well worth it. Onwards!