Version History

Version	Date
V1.0	2018.11.11 (Sunday)

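Preface

Anyone who works with video and images is probably no stranger to this framework. It renders advanced 3D graphics and performs data-parallel computation on the GPU. Over the next several articles we analyze the framework in detail; if you are interested, see the articles below.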
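1. Metal Framework in Detail (1) - Basic Overview
2. Metal Framework in Detail (2) - Devices and Commands (1)
3. Metal Framework in Detail (3) - Rendering a Simple 2D Triangle (1)
4. Metal Framework in Detail (4) - About GPU Family 4 (1)
5. Metal Framework in Detail (5) - About GPU Family 4: About Imageblocks (2)
6. Metal Framework in Detail (6) - About GPU Family 4: About Tile Shading (3)
7. Metal Framework in Detail (7) - About GPU Family 4: About Raster Order Groups (4)
8. Metal Framework in Detail (8) - About GPU Family 4: About Enhanced MSAA and Imageblock Sample Coverage Control (5)
9. Metal Framework in Detail (9) - About GPU Family 4: About Threadgroup Sharing (6)
10. Metal Framework in Detail (10) - Fundamental Components (1)
11. Metal Framework in Detail (11) - Fundamental Components: Device Selection - Choosing a Device for Graphics Rendering (2)
12. Metal Framework in Detail (12) - Fundamental Components: Device Selection - Choosing a Device for Compute Processing (3)
13. Metal Framework in Detail (13) - Compute Processing (1)
14. Metal Framework in Detail (14) - Compute Processing: Hello, Compute (2)
15. Metal Framework in Detail (15) - Compute Processing: About Threads and Threadgroups (3)
16. Metal Framework in Detail (16) - Compute Processing: Calculating Threadgroup and Grid Sizes (4)
17. Metal Framework in Detail (17) - Tools, Profiling, and Debugging (1)
18. Metal Framework in Detail (18) - Tools, Profiling, and Debugging: Metal GPU Capture (2)
19. Metal Framework in Detail (19) - Tools, Profiling, and Debugging: GPU Activity Monitors (3)
20. Metal Framework in Detail (20) - Tools, Profiling, and Debugging: About Metal Shading Language Filename Extensions, Building a Library with Metal's Command-Line Tools, and Labeling Metal Objects and Commands (4)
21. Metal Framework in Detail (21) - Basic Lessons: Basic Buffers (1)
22. Metal Framework in Detail (22) - Basic Lessons: Basic Texturing (2)
23. Metal Framework in Detail (23) - Basic Lessons: CPU and GPU Synchronization (3)
24. Metal Framework in Detail (24) - Basic Lessons: Argument Buffers - Basic Argument Buffers (4)
25. Metal Framework in Detail (25) - Basic Lessons: Argument Buffers - Argument Buffers with Arrays and Resource Heaps (5)
26. Metal Framework in Detail (26) - Basic Lessons: Argument Buffers - Argument Buffers with GPU Encoding (6)
27. Metal Framework in Detail (27) - Advanced Techniques: Reflections with Layer Selection (1)
28. Metal Framework in Detail (28) - Advanced Techniques: LOD with Function Specialization (1)
29. Metal Framework in Detail (29) - Advanced Techniques: Dynamic Terrain with Argument Buffers (1)
30. Metal Framework in Detail (30) - Deferred Lighting (1)
31. Metal Framework in Detail (31) - Mixing Metal and OpenGL Rendering in a View (1)
32. Metal Framework in Detail (32) - Metal Rendering Pipeline Tutorial (1)
33. Metal Framework in Detail (33) - Metal Rendering Pipeline Tutorial (2)
34. Metal Framework in Detail (34) - Hello Metal! Implementing a Simple Triangle (1)
35. Metal Framework in Detail (35) - Hello Metal! Implementing a Simple Triangle (2)
36. Metal Framework in Detail (36) - Metal Programming Guide: Overview (1)
37. Metal Framework in Detail (37) - Metal Programming Guide: Fundamental Metal Concepts (2)
38. Metal Framework in Detail (38) - Metal Programming Guide: Command Organization and Execution Model (3)
39. Metal Framework in Detail (39) - Metal Programming Guide: Resource Objects: Buffers and Textures (4)
40. Metal Framework in Detail (40) - Metal Programming Guide: Functions and Libraries (5)
41. Metal Framework in Detail (41) - Metal Programming Guide: Graphics Rendering: Render Command Encoder, Part 1 (6)
42. Metal Framework in Detail (42) - Metal Programming Guide: Graphics Rendering: Render Command Encoder, Part 2 (7)
43. Metal Framework in Detail (43) - Metal Programming Guide: Data-Parallel Compute Processing: Compute Command Encoder (8)
44. Metal Framework in Detail (44) - Metal Programming Guide: Buffer and Texture Operations: Blit Command Encoder (9)
45. Metal Framework in Detail (45) - Metal Programming Guide: Metal Tools (10)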

Tessellation

Available in: iOS_GPUFamily3_v2, OSX_GPUFamily1_v2

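Tessellation is used to calculate a more detailed surface from an initial surface built from quad or triangle patches made up of control points. To approximate higher-order surfaces, the GPU uses per-patch tessellation factors to subdivide each patch into triangles.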

Metal Tessellation Pipeline

Figure 12-1 shows the Metal tessellation pipeline, which uses a compute kernel, a tessellator, and a post-tessellation vertex function.

Figure 12-1 The Metal tessellation pipeline

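Tessellation operates on patches, each of which represents an arbitrary geometric arrangement defined by a set of control points. The per-patch tessellation factors, the per-patch user data, and the patch control point data are each stored in separate MTLBuffer objects.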

1. Compute Kernel

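A compute kernel is a kernel function that performs the following operations:

Computes the per-patch tessellation factors.
Optionally, computes the per-patch user data.
Optionally, computes or modifies the patch control point data.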

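Note: The compute kernel does not need to run every frame to compute the per-patch tessellation factors, per-patch user data, or patch control point data. You can compute this data every n frames, offline, or by any other runtime means, as long as the required data is provided to the tessellator and the post-tessellation vertex function when they need it.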

2. Tessellator

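The tessellator is a fixed-function pipeline stage that creates a sampling pattern of the patch surface and generates graphics primitives that connect these samples. The tessellator tiles a canonical domain in a normalized coordinate system ranging from 0.0 to 1.0.

The tessellator is configured as part of the render pipeline, using an MTLRenderPipelineDescriptor object to build an MTLRenderPipelineState object. The input to the tessellator is the per-patch tessellation factors.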

Tessellator Primitive Generation

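The tessellator runs once per patch, consuming the input patch and producing a new set of triangles. These triangles are produced by subdividing the patch according to the provided per-patch tessellation factors. Each triangle vertex produced by the tessellator has an associated (u, v) or (u, v, w) position in a normalized parameter space, with each parameter value ranging from 0.0 to 1.0. (Note that the subdivision is performed in an implementation-dependent manner.)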

3. Post-Tessellation Vertex Function

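The post-tessellation vertex function is a vertex function that calculates the vertex data for each patch surface sample produced by the tessellator. The inputs to the post-tessellation vertex function are:

The normalized vertex coordinates on the patch (output by the tessellator).
The per-patch user data (optionally output by the compute kernel).
The patch control point data (optionally output by the compute kernel).
Any other vertex function inputs, such as textures and buffers.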

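The post-tessellation vertex function generates the final vertex data for the tessellated triangles. After the post-tessellation vertex function has finished executing, the tessellated primitives are rasterized and the remaining stages of the render pipeline execute as normal.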

Per-Patch Tessellation Factors

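The per-patch tessellation factors specify how much the tessellator subdivides each patch. They are described by the MTLQuadTessellationFactorsHalf structure for quad patches or by the MTLTriangleTessellationFactorsHalf structure for triangle patches.

Note: Although the structure members are of type uint16_t, the per-patch tessellation factors provided to the tessellator must be of type half (each member holds the bit pattern of a 16-bit half-precision floating-point value).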

1. Understanding Quad Patches

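For quad patches, the position in the patch is a (u, v) Cartesian coordinate that indicates the horizontal and vertical position of the vertex relative to the quad patch bounds, as shown in Figure 12-2. The (u, v) values each range from 0.0 to 1.0.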

Figure 12-2 Quad patch coordinates in normalized parameter space
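To make the (u, v) parameterization concrete, here is a minimal C sketch of how a position on a quad patch might be turned into a surface point in the simplest case: a flat patch with four corner control points, evaluated by bilinear interpolation. The corner ordering and the names `Vec3`, `lerp3` and `quad_bilinear` are assumptions for illustration only; a real post-tessellation vertex function does this work in the Metal shading language, and a patch with more control points (such as the 16-point patch in Listing 12-2) would use a higher-order basis instead.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;   /* hypothetical vector type */

/* Hypothetical helper: per-component linear blend a + (b - a) * t. */
static Vec3 lerp3(Vec3 a, Vec3 b, float t) {
    Vec3 r = { a.x + (b.x - a.x) * t,
               a.y + (b.y - a.y) * t,
               a.z + (b.z - a.z) * t };
    return r;
}

/* Assumed corner layout: c00 at (u, v) = (0, 0), c10 at (1, 0),
   c01 at (0, 1), c11 at (1, 1). */
static Vec3 quad_bilinear(Vec3 c00, Vec3 c10, Vec3 c01, Vec3 c11,
                          float u, float v) {
    assert(u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f);
    Vec3 bottom = lerp3(c00, c10, u);   /* point on the v = 0 edge */
    Vec3 top    = lerp3(c01, c11, u);   /* point on the v = 1 edge */
    return lerp3(bottom, top, v);       /* blend between the two in v */
}
```

Evaluating at (0.5, 0.5) gives the average of the four corners, and the corners themselves are returned at (0, 0), (1, 0), (0, 1) and (1, 1).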

Interpreting the MTLQuadTessellationFactorsHalf structure

The MTLQuadTessellationFactorsHalf structure is defined as follows:

typedef struct {
    uint16_t edgeTessellationFactor[4];
    uint16_t insideTessellationFactor[2];
} MTLQuadTessellationFactorsHalf;

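Each value in the structure provides a specific tessellation factor:

edgeTessellationFactor[0] provides the tessellation factor for the edge of the patch where u = 0 (edge 0).
edgeTessellationFactor[1] provides the tessellation factor for the edge of the patch where v = 0 (edge 1).
edgeTessellationFactor[2] provides the tessellation factor for the edge of the patch where u = 1 (edge 2).
edgeTessellationFactor[3] provides the tessellation factor for the edge of the patch where v = 1 (edge 3).
insideTessellationFactor[0] provides the horizontal tessellation factor for all internal values of v.
insideTessellationFactor[1] provides the vertical tessellation factor for all internal values of u.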
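Because each uint16_t member must hold a half-precision value, a CPU-side writer has to convert float factors to half bit patterns before filling the structure. The following C sketch shows one way to do that under these assumptions: the structure is redeclared locally only to mirror the definition above (an app would include the Metal headers and use the real type), `float_to_half_bits` and `make_quad_factors` are hypothetical helpers, and values too small for a normal half are flushed to zero, which should be harmless here because meaningful tessellation factors are normally 1.0 or greater.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the structure shown above, for illustration only. */
typedef struct {
    uint16_t edgeTessellationFactor[4];
    uint16_t insideTessellationFactor[2];
} MTLQuadTessellationFactorsHalf;

static_assert(sizeof(MTLQuadTessellationFactorsHalf) == 12, "six 16-bit factors");

/* Hypothetical helper: float -> IEEE 754 half bits, round to nearest even.
   Results below the smallest normal half are flushed to signed zero. */
static uint16_t float_to_half_bits(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);                       /* reinterpret the float bits */
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;
    if (exp == 0xFFu)                               /* Inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));
    int32_t e = (int32_t)exp - 127 + 15;            /* re-bias the exponent */
    if (e >= 31) return (uint16_t)(sign | 0x7C00u); /* overflow -> Inf */
    if (e <= 0)  return sign;                       /* flush tiny values to zero */
    uint32_t h   = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;                  /* the 13 discarded bits */
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;                                        /* may carry into the exponent */
    return (uint16_t)(sign | h);
}

/* Hypothetical helper: edges[i] is the factor for edge i as listed above;
   inside[0] is the horizontal and inside[1] the vertical inside factor. */
static MTLQuadTessellationFactorsHalf make_quad_factors(const float edges[4],
                                                        const float inside[2]) {
    MTLQuadTessellationFactorsHalf q;
    for (int i = 0; i < 4; ++i)
        q.edgeTessellationFactor[i] = float_to_half_bits(edges[i]);
    for (int i = 0; i < 2; ++i)
        q.insideTessellationFactor[i] = float_to_half_bits(inside[i]);
    return q;
}
```

For example, a factor of 16.0 becomes the half bit pattern 0x4C00 and 1.0 becomes 0x3C00.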

2. Understanding Triangle Patches

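For triangle patches, the position in the patch is a (u, v, w) barycentric coordinate that indicates the relative influence of the three vertices of the triangle on the position of the vertex, as shown in Figure 12-3. The (u, v, w) values each range from 0.0 to 1.0, where u + v + w = 1.0.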

Figure 12-3 Triangle patch coordinates in normalized parameter space
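The triangle case follows the same idea with barycentric weights. In this minimal C sketch, u weights control point 0, v weights control point 1 and w weights control point 2; that assignment is an assumption (a convention chosen by your own post-tessellation vertex function, not something this section specifies), and `Vec3` and `tri_barycentric` are hypothetical names.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;   /* hypothetical vector type */

/* Hypothetical helper: P = u*p0 + v*p1 + w*p2, with u + v + w == 1. */
static Vec3 tri_barycentric(Vec3 p0, Vec3 p1, Vec3 p2,
                            float u, float v, float w) {
    float err = u + v + w - 1.0f;
    assert(err < 1e-5f && err > -1e-5f);   /* coordinates must sum to 1 */
    Vec3 r = {
        u * p0.x + v * p1.x + w * p2.x,
        u * p0.y + v * p1.y + w * p2.y,
        u * p0.z + v * p1.z + w * p2.z,
    };
    return r;
}
```

At (1, 0, 0), (0, 1, 0) and (0, 0, 1) the function returns the three control points themselves; interior coordinates give weighted averages.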

Interpreting the MTLTriangleTessellationFactorsHalf structure

The MTLTriangleTessellationFactorsHalf structure is defined as follows:

typedef struct {
    uint16_t edgeTessellationFactor[3];
    uint16_t insideTessellationFactor;
} MTLTriangleTessellationFactorsHalf;

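Each value in the structure provides a specific tessellation factor:

edgeTessellationFactor[0] provides the tessellation factor for the edge of the patch where u = 0 (edge 0).
edgeTessellationFactor[1] provides the tessellation factor for the edge of the patch where v = 0 (edge 1).
edgeTessellationFactor[2] provides the tessellation factor for the edge of the patch where w = 0 (edge 2).
insideTessellationFactor provides the inside tessellation factor.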

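3. Rules for Discarding Patches

If the value of an edge tessellation factor is negative, zero, or corresponds to a floating-point NaN, the tessellator discards the patch. If the value of an inside tessellation factor is negative, the tessellation factor is clamped to the range defined by the tessellationPartitionMode property, and the tessellator does not discard the patch.

If a patch is not discarded and the tessellationFactorScaleEnabled property is set to YES, the tessellator multiplies the edge and inside tessellation factors by the scale factor specified with the setTessellationFactorScale: method.

When a patch is discarded, no new primitives are generated, the post-tessellation vertex function does not execute, and no visible output is produced for that patch.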
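The discard and scaling rules above can be modeled on the CPU, for example to predict which patches a compute kernel could skip. This C sketch works on float factors rather than half bit patterns for readability; `patch_is_discarded` and `scale_factors` are hypothetical helpers, and the clamping of inside factors by partition mode is deliberately left out.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: a patch is discarded if any edge factor is negative,
   zero, or NaN. Inside factors never cause a discard. */
static bool patch_is_discarded(const float *edge, int edge_count) {
    assert(edge_count == 3 || edge_count == 4);   /* triangle or quad patch */
    for (int i = 0; i < edge_count; ++i) {
        if (isnan(edge[i]) || edge[i] <= 0.0f)    /* NaN fails every comparison */
            return true;
    }
    return false;
}

/* Hypothetical helper: models tessellationFactorScaleEnabled = YES by
   multiplying every factor of a patch that survived the discard check. */
static void scale_factors(float *factors, int count, float scale) {
    for (int i = 0; i < count; ++i)
        factors[i] *= scale;
}
```

The explicit isnan check matters: a NaN compares false against zero, so an `edge[i] <= 0.0f` test alone would let it through.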

4. Specifying the Per-Patch Tessellation Factors Buffer

The per-patch tessellation factors are written to an MTLBuffer object and passed as input to the tessellator by calling the setTessellationFactorBuffer:offset:instanceStride: method. You must call this method before issuing a patch draw call on the same MTLRenderCommandEncoder object.

Patch Functions

This section summarizes the main changes to the Metal shading language that support tessellation. For more information, see the Functions, Variables, and Qualifiers chapter of the Metal Shading Language Guide.

1. Creating a Compute Kernel

A compute kernel is a kernel function identified with the existing kernel function qualifier. Listing 12-1 is an example of a compute kernel function signature.

Listing 12-1  Compute kernel function signature

kernel void my_compute_kernel(...) {...}

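Compute kernels are fully supported by the existing features of the Metal shading language. The inputs and outputs of a compute kernel function are the same as those of a regular kernel function.

2. Creating a Post-Tessellation Vertex Function

A post-tessellation vertex function is a vertex function identified with the existing vertex function qualifier. In addition, the new [[patch(patch-type), N]] attribute specifies the patch type (patch-type) and the number of control points in the patch (N). Listing 12-2 is an example of a post-tessellation vertex function signature.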

Listing 12-2  Post-tessellation vertex function signature

[[patch(quad, 16)]]
vertex float4 my_post_tessellation_vertex_function(...) {...}

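Note: In OS X, the number of control points in the patch must always be specified. In iOS and tvOS, specifying this value is optional. If it is specified, it must match the value of the numberOfPatchControlPoints argument of the patch draw call.

Post-Tessellation Vertex Function Inputs

All inputs to a post-tessellation vertex function are passed as one or more of the following arguments:

Resources such as buffers (declared in the device or constant address space), textures, or samplers.
The per-patch data and patch control point data. These are either read directly from buffers or passed to the post-tessellation vertex function as inputs declared with the [[stage_in]] qualifier.
Built-in variables, listed in Table 12-1.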

Table 12-1 Attribute qualifiers for post-tessellation vertex function input arguments

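Post-Tessellation Vertex Function Outputs

The outputs of a post-tessellation vertex function are the same as those of a regular vertex function. If the post-tessellation vertex function writes to a buffer, its return type must be void.

Tessellation Pipeline State

This section summarizes the main changes to the Metal framework API that support tessellation, as they relate to the tessellation pipeline state.

1. Building a Compute Pipeline

The compute kernel is specified as part of a compute pipeline when building an MTLComputePipelineState object, as shown in Listing 12-3. For best performance, execute the compute kernel as early in the frame as possible. (The existing compute pipeline API has not changed to support compute kernels or tessellation.)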

Listing 12-3  Building a compute pipeline with a compute kernel

// Fetch the compute kernel from the library
id<MTLFunction> computeKernel = [_library newFunctionWithName:@"my_compute_kernel"];
 
// Build the compute pipeline
NSError *pipelineError = nil;
_computePipelineState = [_device newComputePipelineStateWithFunction:computeKernel error:&pipelineError];
if (!_computePipelineState) {
    NSLog(@"Failed to create compute pipeline state, error: %@", pipelineError);
}

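2. Building a Render Pipeline

The tessellator is configured as part of the render pipeline, using an MTLRenderPipelineDescriptor object to build an MTLRenderPipelineState object. The post-tessellation vertex function is specified with the vertexFunction property. Listing 12-4 shows how to configure and build a render pipeline with a tessellator and a post-tessellation vertex function. For more information, see the Specifying Tessellation State and MTLTessellationFactorStepFunction sections of the MTLRenderPipelineDescriptor Class Reference.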

Listing 12-4  Building a render pipeline with a tessellator and a post-tessellation vertex function

// Fetch the post-tessellation vertex function from the library
id<MTLFunction> postTessellationVertexFunction = [_library newFunctionWithName:@"my_post_tessellation_vertex_function"];
 
// Fetch the fragment function from the library
id<MTLFunction> fragmentFunction = [_library newFunctionWithName:@"my_fragment_function"];
 
// Configure the render pipeline, using the default tessellation values
MTLRenderPipelineDescriptor *renderPipelineDescriptor = [MTLRenderPipelineDescriptor new];
renderPipelineDescriptor.colorAttachments[0].pixelFormat = _view.colorPixelFormat;
renderPipelineDescriptor.fragmentFunction = fragmentFunction;
renderPipelineDescriptor.vertexFunction = postTessellationVertexFunction;
renderPipelineDescriptor.maxTessellationFactor = 16;
renderPipelineDescriptor.tessellationFactorScaleEnabled = NO;
renderPipelineDescriptor.tessellationFactorFormat = MTLTessellationFactorFormatHalf;
renderPipelineDescriptor.tessellationControlPointIndexType = MTLTessellationControlPointIndexTypeNone;
renderPipelineDescriptor.tessellationFactorStepFunction = MTLTessellationFactorStepFunctionConstant;
renderPipelineDescriptor.tessellationOutputWindingOrder = MTLWindingClockwise;
renderPipelineDescriptor.tessellationPartitionMode = MTLTessellationPartitionModePow2;
 
// Build the render pipeline
NSError *pipelineError = nil;
_renderPipelineState = [_device newRenderPipelineStateWithDescriptor:renderPipelineDescriptor error:&pipelineError];
if (!_renderPipelineState) {
    NSLog(@"Failed to create render pipeline state, error %@", pipelineError);
}
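Listing 12-4 selects MTLTessellationPartitionModePow2 with a maxTessellationFactor of 16, and the discard rules earlier say factors are clamped to a range defined by the partition mode. The C sketch below models what that could mean for the Pow2 and Integer modes, based on my reading of the MTLTessellationPartitionMode reference: clamp a non-discarding factor to [1, maxTessellationFactor], then round up to an integer or to a power of two. Treat it as an approximation to check against Apple's reference; `effective_factor` and `PartitionMode` are hypothetical, and the fractional modes are not modeled.

```c
#include <assert.h>

typedef enum { PARTITION_POW2, PARTITION_INTEGER } PartitionMode; /* hypothetical subset */

/* Hypothetical model of partition-mode rounding for a factor that did not
   cause a discard. max_factor is assumed to be a power of two for Pow2. */
static float effective_factor(float f, float max_factor, PartitionMode mode) {
    assert(max_factor >= 1.0f);
    if (f < 1.0f) f = 1.0f;                 /* clamp to [1, max_factor] */
    if (f > max_factor) f = max_factor;
    if (mode == PARTITION_INTEGER) {
        float t = (float)(int)f;            /* truncation == floor for f >= 1 */
        return t < f ? t + 1.0f : t;        /* round up to the next integer */
    }
    float p = 1.0f;
    while (p < f) p *= 2.0f;                /* next power of two >= f */
    return p > max_factor ? max_factor : p;
}
```

With the settings in Listing 12-4, a requested factor of 5.0 would be treated as 8.0 and anything above 16.0 as 16.0 under this model.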

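Patch Draw Calls

This section summarizes the main changes to the Metal framework API that support tessellation, as they relate to patch draw calls.

1. Drawing Tessellated Patches

To render a number of instances of tessellated patches, call one of the following MTLRenderCommandEncoder methods: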

drawPatches:patchStart:patchCount:patchIndexBuffer:patchIndexBufferOffset:instanceCount:baseInstance:
drawPatches:patchIndexBuffer:patchIndexBufferOffset:indirectBuffer:indirectBufferOffset:
drawIndexedPatches:patchStart:patchCount:patchIndexBuffer:patchIndexBufferOffset:controlPointIndexBuffer:controlPointIndexBufferOffset:instanceCount:baseInstance:
drawIndexedPatches:patchIndexBuffer:patchIndexBufferOffset:controlPointIndexBuffer:controlPointIndexBufferOffset:indirectBuffer:indirectBufferOffset:

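Note: You can call these patch draw calls only if the vertexFunction property is set to a post-tessellation vertex function. Calling a non-patch draw call in that configuration causes the validation layer to report an error.
Patch draw calls do not support primitive restart.

For all patch draw calls, the per-patch data and an array of patch control points are organized for rendering in contiguous array elements, starting from the value specified in the baseInstance parameter. For details about each parameter, see the Drawing Tessellated Patches section of the MTLRenderCommandEncoder Protocol Reference.

To render patch data, the patch draw calls fetch the per-patch data and the patch control point data. Patch data is typically stored together for all patches of one or more meshes in one or more buffers. A compute kernel is then run to generate the scene-dependent per-patch tessellation factors; the compute kernel may decide to generate factors only for the patches that are not discarded, in which case the remaining patches are not contiguous. Therefore, a patch index buffer is used to identify the patch IDs of the patches to be drawn.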
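The compaction described above can be sketched on the CPU in C: walk all patches, skip those whose edge factors would discard them, write the surviving factors contiguously (so the tessellator can index them by drawPatchIndex), and record each survivor's original patch ID as a 32-bit entry of the patch index buffer. In the real pipeline a compute kernel would do this on the GPU; `TriFactors` and `compact_patches` are hypothetical, and float factors are used instead of half for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-patch factors for a triangle patch (float for clarity). */
typedef struct { float edge[3]; float inside; } TriFactors;

/* Writes the survivors' factors to out_factors[0..k) and their original
   patch IDs to out_patch_index[0..k); returns k, the patchCount to draw.
   Both output arrays must hold at least patch_count elements. */
static size_t compact_patches(const TriFactors *all, size_t patch_count,
                              TriFactors *out_factors, uint32_t *out_patch_index) {
    assert(all != NULL && out_factors != NULL && out_patch_index != NULL);
    size_t k = 0;
    for (size_t p = 0; p < patch_count; ++p) {
        int discarded = 0;
        for (int e = 0; e < 3; ++e)
            if (!(all[p].edge[e] > 0.0f))     /* <= 0 or NaN discards */
                discarded = 1;
        if (discarded)
            continue;
        out_factors[k] = all[p];              /* indexed by drawPatchIndex */
        out_patch_index[k] = (uint32_t)p;     /* patchIndex for data fetches */
        ++k;
    }
    return k;
}
```

The two outputs then play the roles this section describes: the factors are read by drawPatchIndex, while patch data is read through the patchIndex stored in the patch index buffer.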

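A buffer index (drawPatchIndex) in the range [patchStart, patchStart + patchCount - 1] is used to reference data. If the patch indices used to fetch the per-patch data and patch control point data are not contiguous, drawPatchIndex can reference patchIndexBuffer, as shown in Figure 12-4.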

Figure 12-4 Using patchIndexBuffer to fetch per-patch data and patch control point data

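Each element of patchIndexBuffer contains a 32-bit patchIndex value that references the per-patch data and patch control point data. The patchIndex fetched from patchIndexBuffer is located at: (drawPatchIndex * 4) + patchIndexBufferOffset.

The control point indices for the patch are given by the inclusive range:

[patchIndex * numberOfPatchControlPoints, ((patchIndex + 1) * numberOfPatchControlPoints) - 1]

patchIndexBuffer also allows the patchIndex used to read the per-patch data and patch control point data to differ from the index used to read the per-patch tessellation factors. For the tessellator, drawPatchIndex is used directly as the index for fetching the per-patch tessellation factors.

If patchIndexBuffer is NULL, drawPatchIndex and patchIndex have the same value, as shown in Figure 12-5.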

Figure 12-5 Fetching per-patch data and patch control point data, if patchIndexBuffer is NULL
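The addressing rules above, including the NULL case, fit in a few lines of C. This sketch uses hypothetical function names; it treats patchIndexBuffer as raw bytes, reads the 32-bit patchIndex at (drawPatchIndex * 4) + patchIndexBufferOffset, falls back to patchIndex = drawPatchIndex when the buffer is NULL, and derives the inclusive control point index range for the resulting patchIndex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: fetch the patchIndex used for per-patch and control point data. */
static uint32_t fetch_patch_index(const uint8_t *patch_index_buffer,
                                  size_t patch_index_buffer_offset,
                                  uint32_t draw_patch_index) {
    if (patch_index_buffer == NULL)
        return draw_patch_index;                     /* identity mapping */
    uint32_t patch_index;
    memcpy(&patch_index,
           patch_index_buffer + (size_t)draw_patch_index * 4 + patch_index_buffer_offset,
           sizeof patch_index);                      /* one 32-bit element */
    return patch_index;
}

/* Hypothetical: inclusive range [first, last] of control point indices. */
static void control_point_range(uint32_t patch_index, uint32_t control_points_per_patch,
                                uint32_t *first, uint32_t *last) {
    assert(control_points_per_patch > 0);
    *first = patch_index * control_points_per_patch;
    *last  = (patch_index + 1) * control_points_per_patch - 1;
}
```

The tessellation factors, by contrast, would still be read with drawPatchIndex itself, which is what lets the two indices diverge.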

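If control points are shared across patches, or if the patch control point data is not contiguous, use the drawIndexedPatches methods. patchIndex references a specified controlPointIndexBuffer, which contains the control point indices of a patch, as shown in Figure 12-6. (tessellationControlPointIndexType describes the size of the control point indices in controlPointIndexBuffer and must be either MTLTessellationControlPointIndexTypeUInt16 or MTLTessellationControlPointIndexTypeUInt32.)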

Figure 12-6 Using controlPointIndexBuffer to fetch patch control point data

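The actual location of the first control point index in controlPointIndexBuffer is computed as:

controlPointIndexBufferOffset + (patchIndex * numberOfPatchControlPoints * (controlPointIndexType == UInt16 ? 2 : 4))

The patch's numberOfPatchControlPoints control point indices must be stored consecutively in controlPointIndexBuffer, starting at the location of the first control point index.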
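Note the parentheses around the conditional: the index size (2 or 4 bytes) multiplies the whole element count. The C sketch below computes that byte location and reads the patch's consecutive indices; `ControlPointIndexType` mirrors the two allowed index types but is a hypothetical local enum, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { CP_INDEX_UINT16, CP_INDEX_UINT32 } ControlPointIndexType; /* hypothetical */

/* Byte location of a patch's first control point index:
   offset + patchIndex * numberOfPatchControlPoints * (2 or 4). */
static size_t first_control_point_index_location(size_t control_point_index_buffer_offset,
                                                 uint32_t patch_index,
                                                 uint32_t control_points_per_patch,
                                                 ControlPointIndexType type) {
    assert(type == CP_INDEX_UINT16 || type == CP_INDEX_UINT32);
    size_t index_size = (type == CP_INDEX_UINT16) ? 2 : 4;
    return control_point_index_buffer_offset
         + (size_t)patch_index * control_points_per_patch * index_size;
}

/* Reads the i-th of the patch's consecutively stored control point indices. */
static uint32_t read_control_point_index(const uint8_t *buffer, size_t first_location,
                                         uint32_t i, ControlPointIndexType type) {
    if (type == CP_INDEX_UINT16) {
        uint16_t v16;
        memcpy(&v16, buffer + first_location + (size_t)i * 2, sizeof v16);
        return v16;
    }
    uint32_t v32;
    memcpy(&v32, buffer + first_location + (size_t)i * 4, sizeof v32);
    return v32;
}
```

For example, with 16-bit indices, patch 3 of a 4-control-point mesh starts 24 bytes past controlPointIndexBufferOffset.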

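Sample Code

For an example of how to set up basic tessellation, see the MetalBasicTessellation sample.

Porting DirectX 11-Style Tessellation Shaders to Metal

This section describes how to port DirectX 11-style tessellation vertex and hull shaders to a Metal compute kernel.

Note: The Metal tessellator performs the equivalent computation of the DirectX 11 tessellator. The Metal post-tessellation vertex function performs the equivalent computation of the DirectX 11 domain shader.

In DirectX 11, the HLSL vertex shader is executed for each control point of a patch. The HLSL hull shader is specified by two functions: one that executes for each control point of the patch and another that executes once per patch. The output of the vertex shader is the input to both of these hull shader functions.

Listing 12-5 shows a simple HLSL vertex and hull shader.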

Listing 12-5  Simple HLSL vertex and hull shader

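// Assumed constant buffer: the original listing uses these globals
// without showing where they are declared.
cbuffer cbPerFrame
{
    float4x4 gWorld;
    float4x4 gWorldInvTranspose;
    float4x4 gTexTransform;
    float3   gEyePosW;
    float    gMinTessDistance;
    float    gMaxTessDistance;
    float    gMinTessFactor;
    float    gMaxTessFactor;
};
 
struct VertexIn
{
    float3 PosL     : POSITION;
    float3 NormalL  : NORMAL;
    float3 TangentL : TANGENT;
    float2 Tex      : TEXCOORD;
};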
 
struct VertexOut
{
    float3 PosW       : POSITION;
    float3 NormalW    : NORMAL;
    float3 TangentW   : TANGENT;
    float2 Tex        : TEXCOORD;
    float  TessFactor : TESS;
};
 
VertexOut VS(VertexIn vin)
{
    VertexOut vout;
 
    // Transform to world space.
    vout.PosW     = mul(float4(vin.PosL, 1.0f), gWorld).xyz;
    vout.NormalW  = mul(vin.NormalL, (float3x3)gWorldInvTranspose);
    vout.TangentW = mul(vin.TangentL, (float3x3)gWorld);
 
    // Output vertex attributes for interpolation across triangle.
    vout.Tex = mul(float4(vin.Tex, 0.0f, 1.0f), gTexTransform).xy;
 
    float d = distance(vout.PosW, gEyePosW);
 
    // Normalized tessellation factor.
    // The tessellation is
    //   0 if d >= gMinTessDistance and
    //   1 if d <= gMaxTessDistance.
    float tess = saturate( (gMinTessDistance - d) /
                   (gMinTessDistance - gMaxTessDistance) );
 
    // Rescale [0,1] --> [gMinTessFactor, gMaxTessFactor].
    vout.TessFactor = gMinTessFactor + tess*(gMaxTessFactor-gMinTessFactor);
 
    return vout;
}
 
struct HullOut
{
    float3 PosW     : POSITION;
    float3 NormalW  : NORMAL;
    float3 TangentW : TANGENT;
    float2 Tex      : TEXCOORD;
};
 
[domain("tri")]
[partitioning("fractional_odd")]
[outputtopology("triangle_cw")]
[outputcontrolpoints(3)]
[patchconstantfunc("PatchHS")]
HullOut HS(InputPatch<VertexOut, 3> p,
           uint i : SV_OutputControlPointID,
           uint patchId : SV_PrimitiveID)
{
    HullOut hout;
 
    // Pass through shader.
    hout.PosW     = p[i].PosW;
    hout.NormalW  = p[i].NormalW;
    hout.TangentW = p[i].TangentW;
    hout.Tex      = p[i].Tex;
 
    return hout;
}
 
struct PatchTess
{
    float EdgeTess[3] : SV_TessFactor;
    float InsideTess  : SV_InsideTessFactor;
};
 
PatchTess PatchHS(InputPatch<VertexOut, 3> patch,
                  uint patchID : SV_PrimitiveID)
{
    PatchTess pt;
 
    // Average tess factors along edges, and pick an edge tess factor for
    // the interior tessellation.  It is important to do the tess factor
    // calculation based on the edge properties so that edges shared by
    // more than one triangle will have the same tessellation factor.
    // Otherwise, gaps can appear.
    pt.EdgeTess[0] = 0.5f*(patch[1].TessFactor + patch[2].TessFactor);
    pt.EdgeTess[1] = 0.5f*(patch[2].TessFactor + patch[0].TessFactor);
    pt.EdgeTess[2] = 0.5f*(patch[0].TessFactor + patch[1].TessFactor);
    pt.InsideTess  = pt.EdgeTess[0];
 
    return pt;
}

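These simple HLSL vertex and hull shaders can be ported to Metal functions, and a compute kernel can be written that calls these Metal functions so that they execute as a single kernel. The ported vertex and control point hull functions are called per thread in the compute kernel, followed by a threadgroup barrier; the per-patch hull function is then executed by a subset of the threads in the threadgroup. Being able to call the translated vertex and hull functions directly from the kernel makes it easy for developers to port their vertex and hull shaders from DirectX 11 to Metal.

The simple HLSL vertex and hull shaders can be ported to the Metal functions shown in Listing 12-6.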

Listing 12-6  Simple HLSL vertex and hull shader ported to Metal functions

#include <metal_stdlib>
using namespace metal;
 
struct VertexIn
{
    float3 PosL  [[ attribute(0) ]];
    float3 NormalL  [[ attribute(1) ]];
    float3 TangentL  [[ attribute(2) ]];
    float2 Tex  [[ attribute(3) ]];
};
 
struct VertexOut
{
    float3 PosW;   // plain data here: not a vertex-function output, so no [[ position ]]
    float3 NormalW;
    float3 TangentW;
    float2 Tex;
    float  TessFactor;
};
 
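struct ConstantData {
    …;   // elided in the original; must declare gWorld, gWorldInvTranspose, gTexTransform,
         // gEyePosW, gMinTessDistance, gMaxTessDistance, gMinTessFactor and gMaxTessFactor
};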
 
// Hypothetical helper: upper-left 3x3 block of a 4x4 matrix,
// the MSL counterpart of HLSL's (float3x3) cast
static float3x3 upper3x3(float4x4 m)
{
    return float3x3(m[0].xyz, m[1].xyz, m[2].xyz);
}
 
// The vertex control point function
VertexOut
VS(VertexIn vin,
   constant ConstantData &c)
{
    VertexOut vout;
 
    // Transform to world space. HLSL mul(v, M) treats v as a row vector;
    // MSL expresses the same product as v * M.
    vout.PosW     = (float4(vin.PosL, 1.0f) * c.gWorld).xyz;
    vout.NormalW  = vin.NormalL * upper3x3(c.gWorldInvTranspose);
    vout.TangentW = vin.TangentL * upper3x3(c.gWorld);
 
    // Output vertex attributes for interpolation across triangle.
    vout.Tex = (float4(vin.Tex, 0.0f, 1.0f) * c.gTexTransform).xy;
 
    float d = distance(vout.PosW, c.gEyePosW);
 
    // Normalized tessellation factor.
    // The tessellation is
    //   0 if d >= gMinTessDistance and
    //   1 if d <= gMaxTessDistance.
    float tess = saturate( (c.gMinTessDistance - d) /
                   (c.gMinTessDistance - c.gMaxTessDistance) );
 
    // Rescale [0,1] --> [gMinTessFactor, gMaxTessFactor].
    vout.TessFactor = c.gMinTessFactor +
                tess * (c.gMaxTessFactor - c.gMinTessFactor);
 
    return vout;
}
 
struct HullOut
{
    float3 PosW;   // plain data here: not a vertex-function output, so no [[ position ]]
    float3 NormalW;
    float3 TangentW;
    float2 Tex;
};
 
// The patch control point function
HullOut
HS(VertexOut p)
{
    HullOut hout;
 
    // Pass through shader.
    hout.PosW     = p.PosW;
    hout.NormalW  = p.NormalW;
    hout.TangentW = p.TangentW;
    hout.Tex      = p.Tex;
 
    return hout;
}
 
struct PatchTess
{
    packed_half3 EdgeTess;
    half  InsideTess;
};
 
// The per-patch function
PatchTess
PatchHS(threadgroup VertexOut *patch)
{
    PatchTess pt;
 
    // Average tess factors along edges, and pick an edge tess factor for
    // the interior tessellation.  It is important to do the tess factor
    // calculation based on the edge properties so that edges shared by
    // more than one triangle will have the same tessellation factor.
    // Otherwise, gaps can appear.
    pt.EdgeTess[0] = 0.5f*(patch[1].TessFactor + patch[2].TessFactor);
    pt.EdgeTess[1] = 0.5f*(patch[2].TessFactor + patch[0].TessFactor);
    pt.EdgeTess[2] = 0.5f*(patch[0].TessFactor + patch[1].TessFactor);
    pt.InsideTess  = pt.EdgeTess[0];
 
    return pt;
}

A compute kernel that calls these vertex and hull functions could look like this:

struct KernelPatchInfo {
    uint numPatches; // total number of patches to process.
                     // we need this because this value may
                     // not be a multiple of threadgroup size.
    ushort numPatchesInThreadGroup; // number of patches processed by a
                                    // thread-group
    ushort numControlPointsPerPatch;
};  // passed as a constant buffer using setBytes by the runtime
 
kernel void
PatchKernel(VertexIn vIn [[ stage_in ]],
            constant ConstantData &c [[ buffer(1) ]],
            constant KernelPatchInfo &patchInfo [[ buffer(2) ]],
            device PatchTess *tessellationFactorBuffer [[ buffer(3) ]],
            device HullOut *hullOutputBuffer [[ buffer(4) ]],
            threadgroup VertexOut *vertexOutTGBuffer [[ threadgroup(0) ]],
            ushort lID [[ thread_position_in_threadgroup ]],
            ushort lSize [[ threads_per_threadgroup ]],
            ushort groupID [[ threadgroup_position_in_grid ]])
{
    ushort n = patchInfo.numControlPointsPerPatch;
    uint patchGroupID = groupID * patchInfo.numPatchesInThreadGroup;
    uint controlPointID = patchGroupID * n + lID;
 
    // execute the vertex and control-point hull function per-thread
    // (assumes vIn holds the data for controlPointID, which is the case
    // when the threadgroup size is an exact multiple of n)
    if ((lID < patchInfo.numPatchesInThreadGroup * n) &&
        (controlPointID < patchInfo.numPatches * n))
    {
        VertexOut vOut = VS(vIn, c);
        HullOut hOut = HS(vOut);
 
        // PatchHS needs the vertex outputs (TessFactor), so keep those
        // in threadgroup memory; the hull outputs go to device memory
        vertexOutTGBuffer[lID] = vOut;
        hullOutputBuffer[controlPointID] = hOut;
    }
 
    threadgroup_barrier(mem_flags::mem_threadgroup);
 
    // execute the per-patch hull function
    uint patchID = patchGroupID + lID;
    if (lID < patchInfo.numPatchesInThreadGroup && patchID < patchInfo.numPatches)
    {
        tessellationFactorBuffer[patchID] = PatchHS(&vertexOutTGBuffer[lID * n]);
    }
}

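Notes:

The threadgroup size should be set to the SIMD size or a multiple of the SIMD size.

The number of patches in a threadgroup is given by: threadgroup size / number of control points in a patch.

The number of control points per patch is described in the HLSL hull shader.

In this porting example, the numbers of input and output control points are the same. With some modifications to the compute kernel, cases where the numbers of input and output control points differ can also be supported.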
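The sizing notes above reduce to a little host-side arithmetic. This C sketch (hypothetical names) derives the numPatchesInThreadGroup value for KernelPatchInfo, the number of threadgroups to dispatch (rounded up so a partial last group is still covered), and how many threads per group receive no control point when the threadgroup size is not a multiple of the control point count.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t patches_per_threadgroup;  /* -> KernelPatchInfo.numPatchesInThreadGroup */
    uint32_t threadgroup_count;        /* threadgroups to dispatch */
    uint32_t idle_threads_per_group;   /* threads left without a control point */
} PatchDispatch;                       /* hypothetical */

static PatchDispatch plan_patch_dispatch(uint32_t num_patches,
                                         uint32_t control_points_per_patch,
                                         uint32_t threadgroup_size) {
    assert(control_points_per_patch > 0 && threadgroup_size >= control_points_per_patch);
    PatchDispatch d;
    d.patches_per_threadgroup = threadgroup_size / control_points_per_patch;
    d.threadgroup_count = (num_patches + d.patches_per_threadgroup - 1)
                        / d.patches_per_threadgroup;            /* round up */
    d.idle_threads_per_group = threadgroup_size
                             - d.patches_per_threadgroup * control_points_per_patch;
    return d;
}
```

For instance, 100 triangle patches (3 control points each) with a 32-thread threadgroup give 10 patches per group, 10 groups, and 2 idle threads per group; 16-point patches with a 64-thread group leave no idle threads.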

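Afterword

This article covered the Tessellation chapter of the Metal Programming Guide. If you found it useful, a like or a follow is appreciated~~~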

Metal Framework in Detail (46) - Metal Programming Guide: Tessellation (11)