Metal Diary: A Step-by-Step Usage Guide

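Metal processing logic

Whether you use Core Image, GPUImage, Metal, or OpenGL, the processing logic is similar:

input (resources + logic) -> black box -> output

Core Image can process on the GPU, in which case it runs on top of Metal, or on the CPU.

GPUImage has an OpenGL ES version and a Metal version (the Metal version is very rudimentary).

Using Metal breaks down roughly into three stages:

Build: shaders
Initialize: device, queues, and render objects
Render: command buffer, resource updates, render encoder, display

Metal is an API for programming the GPU, but in terms of code, most of the work happens on the CPU, where you create the components: shaders, pipelines, and encoders.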

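Build: shaders

This stage compiles the shaders, which here means the vertex and fragment functions.

Metal shaders are written in MSL (Metal Shading Language). The SIMD library lets MSL and native code share data structures.

A simple vertex shader: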

vertex ThreeInputVertexIO threeInputVertex(device packed_float2 *position [[buffer(0)]],
                                       device packed_float2 *texturecoord [[buffer(1)]],
                                       device packed_float2 *texturecoord2 [[buffer(2)]],
                                       uint vid [[vertex_id]])
{
    ThreeInputVertexIO outputVertices;
    
    outputVertices.position = float4(position[vid], 0, 1.0);
    outputVertices.textureCoordinate = texturecoord[vid];
    outputVertices.textureCoordinate2 = texturecoord2[vid];
    
    return outputVertices;
}

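In outputVertices.position = float4(position[vid], 0, 1.0), position[vid] is a two-component float vector, which is extended to a float4 with z = 0 and w = 1.

SIMD is a library provided by Apple that makes it easy for native code and shader code to share data structures. Developers can use SIMD types to define a set of data structures in an Objective-C header file and #include that header in both the native code and the shader code, so that both sides have the same definitions.

ThreeInputVertexIO is declared as follows: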

struct ThreeInputVertexIO
{
    float4 position [[position]];
    float2 textureCoordinate [[user(texturecoord)]];
    float2 textureCoordinate2 [[user(texturecoord2)]];
};

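device packed_float2 *position [[buffer(0)]]

device packed_float2 *texturecoord [[buffer(1)]]

In these declarations, packed_float2 is the type, and position and texturecoord are the parameter names.

device is an address space qualifier. Metal shaders mainly access buffer memory through one of two address spaces, device and constant, and the choice is stated explicitly in the code.

The device address space is the general-purpose one and has few restrictions. The constant address space is a fast, read-only path designed for data that is read many times, and the data accessed through it has a fixed size. In summary:

device supports reads and writes and has no size limit.
constant is read-only and limited in size.

How do you choose between them? First check whether the data size can change, then how often the data is accessed. Only fixed-size, frequently accessed data is a good fit for constant; use device for everything else.

[[buffer(0)]] and [[buffer(1)]] are binding slots. In MSL, each kind of resource has its own attribute (buffer, texture, sampler), and the index in the attribute corresponds to the index used with the render command encoder: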

    // buffers
    renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
    renderEncoder.setVertexBuffer(textureBuffer1, offset: 0, index: 1)
    renderEncoder.setVertexBuffer(textureBuffer2, offset: 0, index: 2)

    // ...
    // samplers
    renderEncoder.setFragmentSamplerState(sampler, index: 0)
    renderEncoder.setFragmentSamplerState(sampler1, index: 1)
    // ...
    // textures
    renderEncoder.setFragmentTexture(texture, index: 0)
    renderEncoder.setFragmentTexture(texture1, index: 1)
    // ...

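The index argument corresponds to [[buffer(n)]]. For the MSL vertex shader above:

[[buffer(0)]] is the vertex position data
[[buffer(1)]] is the first set of texture coordinates
[[buffer(2)]] is the second set of texture coordinates

The index must match the [[buffer(n)]] declared in the shader exactly. Otherwise the Metal validation layer is very likely to report an error (usually an out-of-bounds memory read), or the result will not be drawn as expected.

The vertex shader runs once per vertex, and vid is the index of the vertex being processed.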
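One way to keep the encoder indices and the [[buffer(n)]] attributes from drifting apart is to name each slot once on the Swift side. The following is only a minimal sketch: VertexBufferIndex and bindVertexBuffers are hypothetical names introduced for illustration, and the `#if canImport(Metal)` guard is there only so the snippet still compiles on platforms without Metal.

```swift
#if canImport(Metal)
import Metal
#endif

/// Hypothetical names: one case per [[buffer(n)]] slot of threeInputVertex.
enum VertexBufferIndex: Int {
    case position = 0           // [[buffer(0)]]
    case textureCoordinate = 1  // [[buffer(1)]]
    case textureCoordinate2 = 2 // [[buffer(2)]]
}

#if canImport(Metal)
/// Binds the three vertex-stage buffers using the named slots
/// instead of bare integer literals.
func bindVertexBuffers(on encoder: MTLRenderCommandEncoder,
                       position: MTLBuffer,
                       textureCoordinate: MTLBuffer,
                       textureCoordinate2: MTLBuffer) {
    encoder.setVertexBuffer(position, offset: 0,
                            index: VertexBufferIndex.position.rawValue)
    encoder.setVertexBuffer(textureCoordinate, offset: 0,
                            index: VertexBufferIndex.textureCoordinate.rawValue)
    encoder.setVertexBuffer(textureCoordinate2, offset: 0,
                            index: VertexBufferIndex.textureCoordinate2.rawValue)
}
#endif
```

If a slot is renumbered in the shader, only the enum needs to change on the Swift side.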

A simple fragment shader:

fragment half4 lookupSplitFragment(TwoInputVertexIO fragmentInput [[stage_in]],
                              texture2d<half> inputTexture [[texture(0)]],
                              texture2d<half> inputTexture2 [[texture(1)]],
                              texture2d<half> inputTexture3 [[texture(2)]],
                              constant SplitUniform& uniform [[ buffer(1) ]])
{
    // Function body omitted.
}

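As with the render command encoder calls above:

inputTexture is the first texture
inputTexture2 is the second texture
inputTexture3 is the third texture

SplitUniform is a custom parameter struct. In this shader it carries the split values supplied from outside the shader. It is defined in the .metal file as follows: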

typedef struct
{
    float intensity;
    float progress;

} SplitUniform;

intensity is the strength of the filter.

progress is the split progress of the filter.
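On the CPU side, a value for this struct has to be supplied at fragment buffer index 1, matching [[ buffer(1) ]] in the shader. A minimal sketch, assuming a Swift struct that mirrors the MSL layout (two 32-bit floats); bindSplitUniform is a hypothetical helper name, and the `#if canImport(Metal)` guard only keeps the snippet compilable where Metal is unavailable.

```swift
#if canImport(Metal)
import Metal
#endif

/// Swift mirror of the MSL SplitUniform struct: two 32-bit floats.
struct SplitUniform {
    var intensity: Float
    var progress: Float
}

#if canImport(Metal)
/// Copies the small uniform struct into the fragment stage at index 1,
/// the slot declared as [[ buffer(1) ]] in lookupSplitFragment.
func bindSplitUniform(_ uniform: SplitUniform,
                      on encoder: MTLRenderCommandEncoder) {
    var value = uniform
    encoder.setFragmentBytes(&value,
                             length: MemoryLayout<SplitUniform>.stride,
                             index: 1)
}
#endif
```

For a struct this small, passing the bytes directly avoids creating a separate MTLBuffer; sharing one header through SIMD types, as described earlier, is the alternative when the layout is more complex.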

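Shaders are compiled into a Metal library when Xcode builds the project. At this point, the shaders for this render are complete. Next comes initialization, which connects the shaders through the render pipeline.

Initialization

device
commandQueue
buffer
texture
pipeline

Initializing the device

The device is Metal's entry point to the GPU. It should be created once and, ideally, kept for the lifetime of the app. It is used to create buffers, command queues, and textures. The Metal Best Practices Guide states that developers should hold on to a single device object for the long term, because creating a device is relatively expensive.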

Objective-C:

id<MTLDevice> device = MTLCreateSystemDefaultDevice();

Swift:

guard let device = MTLCreateSystemDefaultDevice() else {
            fatalError("Could not create Metal Device")
}

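Creating the command queue

The Metal Best Practices Guide states that in most cases developers should reuse a single command queue. The queue is created from the device:

/// Create the command queue from the device
guard let commandQueue = self.device.makeCommandQueue() else {
    fatalError("Could not create command queue")
}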

Creating buffers

In Metal, all unstructured data is managed with buffers. As in OpenGL, vertex data, index data, and the like are all managed through buffers, for example vertexBuffer and textureCoordBuffer.

/// Texture coordinate buffer (the arrays are assumed to be [Float])
let coordinateBuffer = device.makeBuffer(bytes: inputTextureCoordinates,
                    length: inputTextureCoordinates.count * MemoryLayout<Float>.size,
                    options: [])!
/// Vertex data buffer
let vertexBuffer = device.makeBuffer(bytes: imageVertices,
                    length: imageVertices.count * MemoryLayout<Float>.size,
                    options: [])!

These buffers are bound through the render command encoder and then submitted to the GPU.
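The buffer length is simply the element count multiplied by the per-element byte size. As a rough sketch, that computation can be pulled into a small generic helper so it does not depend on the element type being Float. byteLength and makeBuffer(from:device:) are hypothetical helper names, and the `#if canImport(Metal)` guard only keeps the snippet compilable where Metal is unavailable.

```swift
#if canImport(Metal)
import Metal
#endif

/// Number of bytes needed to store `values` contiguously.
/// `stride` is the distance in bytes between consecutive elements.
func byteLength<Element>(of values: [Element]) -> Int {
    return values.count * MemoryLayout<Element>.stride
}

#if canImport(Metal)
/// Copies `values` into a new MTLBuffer, or returns nil if the array
/// is empty or the device cannot allocate the buffer.
func makeBuffer<Element>(from values: [Element],
                         device: MTLDevice) -> MTLBuffer? {
    guard !values.isEmpty else { return nil }
    return device.makeBuffer(bytes: values,
                             length: byteLength(of: values),
                             options: [])
}
#endif

// Example data: a full-screen quad as four (x, y) pairs.
let quadVertices: [Float] = [-1, -1,  1, -1,  -1, 1,  1, 1]
let quadByteLength = byteLength(of: quadVertices)
```

Returning an optional instead of force-unwrapping leaves the decision about how to handle allocation failure to the caller.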

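Creating textures

A texture can be thought of as the object being processed. Metal pairs it with a descriptor object, MTLTextureDescriptor, which is dedicated to describing the texture's details (format, width, height, storageMode).

storageMode controls how the texture's memory is shared between the CPU and the GPU. Apple recommends shared mode on iOS and managed mode on macOS.

Shared storage: both the CPU and the GPU can read and write the memory.
Private storage: only the GPU can read and write the memory; data can be copied in or out with blit commands.
Managed storage: macOS only. The GPU reads and writes its own copy of the memory, and Metal maintains a mirrored copy for the CPU.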
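The platform recommendation for storage modes can be written down once and applied to every texture descriptor. This is a sketch under assumptions: recommendedStorageMode and makeTextureDescriptor are hypothetical helper names, the usage flags are a guess at what a typical filter pass needs, and the `#if canImport(Metal)` guard only keeps the snippet compilable where Metal is unavailable.

```swift
#if canImport(Metal)
import Metal

/// Follows the recommendation in the text: managed on macOS, shared elsewhere.
/// `.managed` is not available on iOS, hence the compile-time check.
func recommendedStorageMode() -> MTLStorageMode {
    #if os(macOS)
    return .managed
    #else
    return .shared
    #endif
}

/// Builds a non-mipmapped 2D texture descriptor with the recommended
/// storage mode. The usage flags are an assumption for illustration.
func makeTextureDescriptor(width: Int,
                           height: Int,
                           pixelFormat: MTLPixelFormat = .bgra8Unorm) -> MTLTextureDescriptor {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: pixelFormat,
                                                              width: width,
                                                              height: height,
                                                              mipmapped: false)
    descriptor.storageMode = recommendedStorageMode()
    descriptor.usage = [.shaderRead, .renderTarget]
    return descriptor
}
#endif
```

Textures that the CPU never touches after upload are candidates for private storage instead, filled through a blit as noted in the list.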

// Texture descriptor
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: pixelFormat,
                                       width: width,
                                       height: height,
                                       mipmapped: mipmapped)
// Create a simple texture from the device (for example, a solid-color texture)
guard let newTexture = device.makeTexture(descriptor: textureDescriptor) else {
    fatalError("Could not create texture of size: (\(width), \(height))")
}
// Create a texture from an image (MetalKit)
let textureLoader = MTKTextureLoader(device: self.device)
let imageTexture = try textureLoader.newTexture(cgImage: img, options: [MTKTextureLoader.Option.SRGB : false])

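MTKTextureLoader should also be reused.

Creating the render pipeline

The pipeline is at once the most complex and the simplest piece. It is complex because it has many properties; it is simple because it only describes all of the resources.

In Metal, there is a dedicated descriptor object for describing the pipeline's details, including the vertex shader, the fragment shader, the color format, depth, and so on.

colorAttachments: used to write color data
depthAttachment: used to write depth information
stencilAttachment: lets us discard particular fragments based on conditions

The colorAttachments of MTLRenderPassDescriptor supports several attachments for storing color pixel data (at least 4, and 8 on newer GPUs). For 2D image processing we usually attach only one, colorAttachments[0].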

let descriptor = MTLRenderPipelineDescriptor()
    descriptor.colorAttachments[0].pixelFormat = MTLPixelFormat.bgra8Unorm
    descriptor.vertexFunction = vertexFunction
    descriptor.fragmentFunction = fragmentFunction

Creating the shader functions:

guard let vertexFunction = defaultLibrary.makeFunction(name: vertexFunctionName) else {
    fatalError("Could not compile vertex function \(vertexFunctionName)")
}
    
guard let fragmentFunction = defaultLibrary.makeFunction(name: fragmentFunctionName) else {
    fatalError("Could not compile fragment function \(fragmentFunctionName)")
}

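defaultLibrary is the function library created from the device. As described above, the vertex and fragment shaders were already compiled at build time. The library is loaded like this: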

do {
    let frameworkBundle = Bundle(for: Context.self)
    let metalLibraryPath = frameworkBundle.path(forResource: "default", ofType: "metallib")!

    self.defaultLibrary = try device.makeLibrary(filepath: metalLibraryPath)
} catch {
    fatalError("Could not load library")
}

This gives us defaultLibrary; makeLibrary is a method provided by Metal.
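The encoding code later uses a pipelineState object, which is built from the library, the two shader functions, and the pipeline descriptor. The steps could be combined roughly as follows. This is a sketch, not the original project's code: makePipelineState and PipelineError are hypothetical names, and the `#if canImport(Metal)` guard only keeps the snippet compilable where Metal is unavailable.

```swift
#if canImport(Metal)
import Metal
#endif

/// Hypothetical error type for this sketch.
enum PipelineError: Error {
    case missingFunction(String)
}

#if canImport(Metal)
/// Looks up both shader functions in `library`, fills in a pipeline
/// descriptor, and asks the device to compile it into a pipeline state.
func makePipelineState(device: MTLDevice,
                       library: MTLLibrary,
                       vertexFunctionName: String,
                       fragmentFunctionName: String,
                       pixelFormat: MTLPixelFormat = .bgra8Unorm) throws -> MTLRenderPipelineState {
    guard let vertexFunction = library.makeFunction(name: vertexFunctionName) else {
        throw PipelineError.missingFunction(vertexFunctionName)
    }
    guard let fragmentFunction = library.makeFunction(name: fragmentFunctionName) else {
        throw PipelineError.missingFunction(fragmentFunctionName)
    }

    let descriptor = MTLRenderPipelineDescriptor()
    descriptor.colorAttachments[0].pixelFormat = pixelFormat
    descriptor.vertexFunction = vertexFunction
    descriptor.fragmentFunction = fragmentFunction

    // Compiling a pipeline state is costly, so build it once and reuse it.
    return try device.makeRenderPipelineState(descriptor: descriptor)
}
#endif
```

Throwing instead of calling fatalError lets the caller decide whether a missing shader is recoverable.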

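At this point we have constructed and initialized all the components needed for rendering. Next come command encoding, submission, and rendering.

Render: command buffer, resource updates, render encoder, display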

renderEncoder

Above we set up the render pipeline. Here we create a render command encoder from a MTLRenderPassDescriptor and bind the shaders in the encoder.

The GPU renders an image in roughly three steps: load, render, and store. Developers can specify what each of these steps does.

MTLRenderPassDescriptor *desc = [MTLRenderPassDescriptor new];
desc.colorAttachments[0].texture = myColorTexture;

// Specify the behavior of the three steps
desc.colorAttachments[0].loadAction = MTLLoadActionClear;
desc.colorAttachments[0].clearColor = MTLClearColorMake(0.39f, 0.34f, 0.53f, 1.0f);
desc.colorAttachments[0].storeAction = MTLStoreActionStore;

myColorTexture can be thought of as the container that receives the rendered result.

The resource binding calls were shown earlier:

    // buffers
    renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
    renderEncoder.setVertexBuffer(textureBuffer1, offset: 0, index: 1)
    renderEncoder.setVertexBuffer(textureBuffer2, offset: 0, index: 2)

    // ...
    // samplers
    renderEncoder.setFragmentSamplerState(sampler, index: 0)
    renderEncoder.setFragmentSamplerState(sampler1, index: 1)
    // ...
    // textures
    renderEncoder.setFragmentTexture(texture, index: 0)
    renderEncoder.setFragmentTexture(texture1, index: 1)
    // ...

The encoding code looks roughly like this:

let commandBuffer = commandQueue.makeCommandBuffer()!
let commandEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!

commandEncoder.setRenderPipelineState(pipelineState)
commandEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
commandEncoder.setFragmentTexture(texture, index: 0)
commandEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
commandEncoder.endEncoding()

Submitting for rendering

commandBuffer.present(drawable)
commandBuffer.commit()

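Triple buffering: create resource buffers for three frames to form a buffer pool. The CPU writes each frame's data into the next buffer in turn while the GPU consumes the buffers written earlier.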
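A common way to coordinate such a pool is a counting semaphore with three permits: the CPU takes a permit before writing a frame's data, and the command buffer's completion handler gives it back. The sketch below shows only that bookkeeping in plain Swift. InFlightFrames is a hypothetical helper name, and the sketch assumes command buffers complete in the order they were submitted, so the slot handed out next is always the one that became free first.

```swift
import Dispatch

/// Hypothetical helper: hands out slot indices 0, 1, 2 in rotation and
/// blocks when all three slots are still in use by the GPU.
final class InFlightFrames {
    static let maxFramesInFlight = 3

    private let semaphore = DispatchSemaphore(value: InFlightFrames.maxFramesInFlight)
    private var nextSlot = 0

    /// Waits for a free slot and returns its index.
    /// Assumes it is always called from the same (render) thread.
    func acquireSlot() -> Int {
        semaphore.wait()
        let slot = nextSlot
        nextSlot = (nextSlot + 1) % InFlightFrames.maxFramesInFlight
        return slot
    }

    /// Marks one slot as free again; may be called from another thread.
    func releaseSlot() {
        semaphore.signal()
    }
}

// Intended use in a render loop (Metal calls shown as comments):
//   let slot = frames.acquireSlot()
//   ... write this frame's data into the buffer for `slot` ...
//   commandBuffer.addCompletedHandler { _ in frames.releaseSlot() }
//   commandBuffer.commit()
```

Every acquired slot must eventually be released, otherwise the render thread blocks on the fourth frame.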

Submission can be synchronous (blocking) or asynchronous (non-blocking).

Blocking:

id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

// Encode commands...

[commandBuffer commit];

[commandBuffer waitUntilCompleted];

Non-blocking:

id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

// Encode commands...

[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> commandBuffer) {
    // Notify the CPU here...
}];

[commandBuffer commit];
