Metal Study Notes

Workflow for performing calculations on a GPU

https://developer.apple.com/documentation/metal/basic_tasks_and_concepts/performing_calculations_on_a_gpu

1. Write the calculation as an ordinary C function (this version runs on the CPU)

void add_arrays(const float* inA,
                const float* inB,
                float* result,
                int length)
{
    for (int index = 0; index < length ; index++)
    {
        result[index] = inA[index] + inB[index];
    }
}

2. Convert the C function to Metal Shading Language (MSL)

kernel void add_arrays(device const float* inA,
                       device const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]])
{
    // the for-loop is replaced with a collection of threads, each of which
    // calls this function.
    result[index] = inA[index] + inB[index];
}
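
To make the "loop becomes a grid of threads" idea concrete, here is a minimal CPU-only sketch in C. It is not Metal code: emulate_dispatch_1d is a hypothetical stand-in for a 1D dispatch, and add_arrays_thread is simply the kernel body rewritten as plain C, with index playing the role of thread_position_in_grid.

```c
#include <assert.h>
#include <stddef.h>

/* One "thread" of work: the body of the MSL kernel, written as plain C.
   `index` plays the role of [[thread_position_in_grid]]. */
static void add_arrays_thread(const float *inA, const float *inB,
                              float *result, unsigned int index)
{
    result[index] = inA[index] + inB[index];
}

/* Hypothetical CPU stand-in for a 1D dispatch: calls the per-thread function
   once for every index in the grid. A GPU runs these calls in parallel and in
   no guaranteed order; this loop simply runs them one after another. */
static void emulate_dispatch_1d(const float *inA, const float *inB,
                                float *result, unsigned int gridSize)
{
    assert(inA != NULL && inB != NULL && result != NULL);
    for (unsigned int index = 0; index < gridSize; index++) {
        add_arrays_thread(inA, inB, result, index);
    }
}
```

Calling emulate_dispatch_1d(a, b, r, n) gives the same result as the original C loop. The point is only that the per-element body no longer contains a loop of its own.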

3. Find a GPU device (MTLDevice)

id<MTLDevice> device = MTLCreateSystemDefaultDevice();

4. Initialize the Metal objects

MetalAdder* adder = [[MetalAdder alloc] initWithDevice:device]; // manages the objects needed to communicate with the GPU

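5. Get a reference to the Metal function

The Metal function lives in the app's default Metal library, so use the MTLDevice to get an MTLLibrary, then use the MTLLibrary to get the MTLFunction (the Metal function).

- (instancetype) initWithDevice: (id<MTLDevice>) device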
{
    self = [super init];
    if (self)
    {
        _mDevice = device;
        
        NSError* error = nil;
        
        // Load the shader files with a .metal file extension in the project

        id<MTLLibrary> defaultLibrary = [_mDevice newDefaultLibrary];
        if (defaultLibrary == nil)
        {
            NSLog(@"Failed to find the default library.");
            return nil;
        }

        id<MTLFunction> addFunction = [defaultLibrary newFunctionWithName:@"add_arrays"];
        if (addFunction == nil)
        {
            NSLog(@"Failed to find the adder function.");
            return nil;
        }
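// Apple's page shows no closing braces here because the method is not finished yet: it goes on to create the pipeline state object (step 6) and the command queue (step 7).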

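6. Prepare a Metal pipeline

A Metal function is not executable code yet. A pipeline converts the function into code the GPU can actually run. In Metal, a compute pipeline is represented by a pipeline state object (PSO), and the device finishes compiling the function for the specific GPU when the PSO is created.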

_mAddFunctionPSO = [_mDevice newComputePipelineStateWithFunction: addFunction error:&error];

7. Create a command queue

Sending commands to the GPU requires a command queue.

_mCommandQueue = [_mDevice newCommandQueue];

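8. Create the buffers and data

Metal manages memory through resources (MTLResource), which are allocated by the MTLDevice. The buffers created here are MTLBuffer objects; MTLBuffer is a protocol that inherits from MTLResource.

_mBufferA = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared]; // MTLResourceStorageModeShared makes the buffer accessible to both the CPU and the GPU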
_mBufferB = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];
_mBufferResult = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];

[self generateRandomFloatData:_mBufferA];
[self generateRandomFloatData:_mBufferB];

- (void) generateRandomFloatData: (id<MTLBuffer>) buffer
{
    float* dataPtr = buffer.contents;
    
    for (unsigned long index = 0; index < arrayLength; index++)
    {
        dataPtr[index] = (float)rand()/(float)(RAND_MAX);
    }
}
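
The snippets above use arrayLength and bufferSize without showing where they come from. A reasonable reading, and it is only an assumption here, is that bufferSize is the byte size of arrayLength floats. The C sketch below shows that relationship and the same fill expression on ordinary memory; buffer_size_for and fill_random_floats are hypothetical helper names, not part of the sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: the buffer holds exactly `arrayLength` floats, so its size in
   bytes is the element count times sizeof(float). */
static size_t buffer_size_for(size_t arrayLength)
{
    return arrayLength * sizeof(float);
}

/* Fills `count` floats with pseudo-random values in [0, 1], using the same
   expression as generateRandomFloatData. */
static void fill_random_floats(float *dataPtr, size_t count)
{
    assert(dataPtr != NULL);
    for (size_t index = 0; index < count; index++) {
        dataPtr[index] = (float)rand() / (float)(RAND_MAX);
    }
}
```

With a shared-storage buffer, the pointer passed to a helper like this would be the buffer's contents pointer, which is why the CPU can write the input data directly.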

9. Create a command buffer

id<MTLCommandBuffer> commandBuffer = [_mCommandQueue commandBuffer];

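10. Create a command encoder

To write commands into the command buffer, you need a command encoder for the kind of command you want to encode; this example uses a compute command encoder.
A compute command encoder encodes a compute pass. A compute pass holds a list of commands that execute compute pipelines, and each compute command causes the GPU to create a grid of threads to execute.

id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];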

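To encode a command, you make a series of method calls on the encoder. Some calls set state, such as the pipeline state object (PSO) or the arguments passed to the pipeline. After making those state changes, you encode a command that executes the pipeline. The encoder writes all of the state changes and command parameters into the command buffer.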

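11. Set the pipeline state and argument data

First set the pipeline state object that the command will execute, then set the data that add_arrays operates on. The atIndex value of each buffer corresponds to the position of the matching argument in add_arrays. The offset is a byte offset into the buffer, so a single buffer can also back several arguments by binding it at different offsets.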

[computeEncoder setComputePipelineState:_mAddFunctionPSO];
[computeEncoder setBuffer:_mBufferA offset:0 atIndex:0];
[computeEncoder setBuffer:_mBufferB offset:0 atIndex:1];
[computeEncoder setBuffer:_mBufferResult offset:0 atIndex:2];
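
As an illustration of the "one buffer, several offsets" idea, the following C sketch computes byte offsets for three equally sized float arrays packed back to back in a single allocation. This layout is hypothetical and is not what the sample does (it uses three separate buffers). PackedLayout and packed_layout_for are made-up names, and the sketch does not check any offset alignment rules that Metal may impose.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets of three equally sized float arrays stored back to back in
   one allocation. Illustrative layout only, not the sample's. */
typedef struct {
    size_t offsetA;
    size_t offsetB;
    size_t offsetResult;
    size_t totalBytes;
} PackedLayout;

static PackedLayout packed_layout_for(size_t arrayLength)
{
    assert(arrayLength > 0);
    const size_t bytesPerArray = arrayLength * sizeof(float);

    PackedLayout layout;
    layout.offsetA = 0;                      /* first array starts at the beginning */
    layout.offsetB = bytesPerArray;          /* second array follows the first      */
    layout.offsetResult = 2 * bytesPerArray; /* result follows the second           */
    layout.totalBytes = 3 * bytesPerArray;   /* size of the single allocation       */
    return layout;
}
```

Under this layout, the three setBuffer calls would pass the same buffer with offsetA, offsetB, and offsetResult as the offset, still at indices 0, 1, and 2.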

12. Specify the thread count and organization

Metal can create 1D, 2D, or 3D grids. This example works on a 1D array, so the grid size is arrayLength x 1 x 1.

MTLSize gridSize = MTLSizeMake(arrayLength, 1, 1);

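13. Specify the threadgroup size

Metal subdivides the grid into smaller grids called threadgroups. Each threadgroup is calculated separately, and Metal can dispatch threadgroups to different processing elements on the GPU to speed up processing. You need to decide how large each threadgroup is.

NSUInteger threadGroupSize = _mAddFunctionPSO.maxTotalThreadsPerThreadgroup; // the largest threadgroup this pipeline state allows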
if (threadGroupSize > arrayLength)
{
    threadGroupSize = arrayLength;
}
MTLSize threadgroupSize = MTLSizeMake(threadGroupSize, 1, 1);
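
The arithmetic behind this step can be sketched in plain C: clamp the threadgroup width to the grid width, then see how many threadgroups it takes to cover the grid. The function names are hypothetical, and the counting function is an illustration of how the grid gets subdivided, not an API the sample calls.

```c
#include <assert.h>
#include <stddef.h>

/* Same clamp as the snippet above: a threadgroup never needs to be wider
   than the whole grid. */
static size_t clamp_threadgroup_size(size_t maxThreads, size_t arrayLength)
{
    return (maxThreads > arrayLength) ? arrayLength : maxThreads;
}

/* Number of threadgroups needed to cover a 1D grid (ceiling division).
   When the grid is not a multiple of the threadgroup size, the last
   threadgroup is only partially filled. */
static size_t threadgroup_count(size_t arrayLength, size_t threadgroupSize)
{
    assert(threadgroupSize > 0);
    return (arrayLength + threadgroupSize - 1) / threadgroupSize;
}
```

For example, a grid of 1000 threads with a threadgroup size of 256 needs 4 threadgroups, the last of which covers only 232 threads.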

14. Encode the compute command to execute the threads

[computeEncoder dispatchThreads:gridSize
          threadsPerThreadgroup:threadgroupSize];

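An encoder can encode multiple compute commands into the same compute pass without repeating redundant steps. For example, you can set the pipeline state object once and then set arguments and encode a command for each collection of buffers.

15. End the compute pass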

[computeEncoder endEncoding];

16. Commit the command buffer to execute its commands

[commandBuffer commit];

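Metal executes the commands asynchronously. After the GPU has executed all of them, Metal marks the command buffer as complete.

17. Wait for the calculation to complete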

[commandBuffer waitUntilCompleted];

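This method blocks until the calculation has finished. Alternatively, you can add a completion handler with addCompletedHandler: (addCompletedHandler(_:) in Swift) before committing, or check the command buffer's status property.

18. Read the results from the buffer

The sample reads the results back and recomputes them on the CPU to check that the GPU got them right.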

- (void) verifyResults
{
    float* a = _mBufferA.contents;
    float* b = _mBufferB.contents;
    float* result = _mBufferResult.contents;

    for (unsigned long index = 0; index < arrayLength; index++)
    {
        if (result[index] != (a[index] + b[index]))
        {
            printf("Compute ERROR: index=%lu result=%g vs %g=a+b\n",
                   index, result[index], a[index] + b[index]);
            assert(result[index] == (a[index] + b[index]));
        }
    }
    printf("Compute results as expected\n");
}
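
The sample compares the GPU and CPU results for exact equality and asserts on the first mismatch. For kernels where the two sides might round differently, a tolerance-based check can be more forgiving. The C sketch below is that different approach, not the sample's method: count_mismatches is a hypothetical helper that counts elements differing by more than a caller-chosen tolerance instead of aborting.

```c
#include <assert.h>
#include <stddef.h>

/* Counts elements where result[i] differs from a[i] + b[i] by more than
   `tolerance`. A tolerance of 0.0f reduces to an exact comparison. */
static size_t count_mismatches(const float *a, const float *b,
                               const float *result, size_t count,
                               float tolerance)
{
    assert(a != NULL && b != NULL && result != NULL);
    size_t mismatches = 0;
    for (size_t index = 0; index < count; index++) {
        float diff = result[index] - (a[index] + b[index]);
        if (diff < 0.0f) {
            diff = -diff; /* absolute difference without needing math.h */
        }
        if (diff > tolerance) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Returning a count rather than asserting lets the caller decide how to report failures, which is a design choice of this sketch only.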
