Running PyTorch on Apple Silicon GPUs

Preface

It is finally possible to use the GPU in Apple's M1-series chips to accelerate neural network training.

Installing PyTorch

According to the PyTorch documentation, GPU-accelerated neural network training on Apple M1-series chips is supported starting with PyTorch v1.12. The installation requirements are:

  • macOS 12.3 or later. I am running macOS 12.5 on an Apple M1 Pro.
  • An arm64 build of Python (a quick way to check the first two requirements is sketched after this list).
  • PyTorch v1.12 or later. I installed v1.12.1.
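
The first two requirements can be checked from Python before installing PyTorch. A minimal sketch (the expected values are shown as comments and will differ on other machines):

import platform

# macOS version, e.g. '12.5'; MPS acceleration requires 12.3 or later
print(platform.mac_ver()[0])
# CPU architecture; a native Apple Silicon Python build reports 'arm64'
print(platform.machine())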

PyTorch installation command:

conda install pytorch torchvision torchaudio -c pytorch

After installation, start a Python session in the Terminal and confirm that the Python build is arm64:

Python 3.10.4 (main, Mar 31 2022, 03:37:37) [Clang 12.0.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> print(platform.uname()[4])
arm64

Then confirm that GPU acceleration on the Apple chip is available:

Python 3.10.4 (main, Mar 31 2022, 03:37:37) [Clang 12.0.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.backends.mps.is_available()
True

If torch.backends.mps.is_available() returns True, the MPS (Metal Performance Shaders) backend is available and the GPU can be used for acceleration.
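
In scripts it is convenient to fall back to the CPU when MPS is not available. A minimal sketch of such a device selection (torch.backends.mps.is_built() reports whether the installed PyTorch build was compiled with MPS support):

import torch

# Prefer the Apple GPU when the MPS backend is available, otherwise fall back to the CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    # is_built() distinguishes a build without MPS support from an unsupported macOS/hardware setup.
    print("MPS support compiled into this build:", torch.backends.mps.is_built())
    device = torch.device("cpu")
print("Using device:", device)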

Model training speed comparison

This section implements a small convolutional neural network, trains and evaluates it on the MNIST dataset, and measures the total running time on the M1 Pro's CPU and on the MPS device. The code is shown below:

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import datetime


class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Convolution layer #1
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        # Convolution layer #2
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        # Max-pooling layer
        self.pooling = torch.nn.MaxPool2d(2)
        # Fully connected layer
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))
        x = F.relu(self.pooling(self.conv2(x)))
        x = x.view(batch_size, -1)
        return self.fc(x)


def train(epoch_num, dev):
    running_loss = 0.0
    data_cnt = 0
    for i, (data, target) in enumerate(train_loader, 0):
        inputs, label = data.to(dev), target.to(dev)
        y_pred = model(inputs)
        loss = criterion(y_pred, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        data_cnt += 1
    print('[#%d] loss: %.3f' % (epoch_num + 1, running_loss / data_cnt))


def test(dev):
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in test_loader:
            inputs, label = data.to(dev), target.to(dev)
            y_pred = model(inputs)
            _, predicted = torch.max(y_pred.data, dim=1)
            total += label.size(0)
            correct += (predicted == label).sum().item()
    print('Accuracy on test set: %d %%' % (100 * correct / total))


if __name__ == '__main__':
    start_time = datetime.datetime.now()
    device = torch.device("mps")  # change to "cpu" to run on the CPU
    print("Using Device: {}".format(device))
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
    train_data = datasets.MNIST(root='../data/mnist', train=True, download=True, transform=transform)
    test_data = datasets.MNIST(root='../data/mnist', train=False, download=True, transform=transform)
    train_loader = DataLoader(train_data, batch_size=128, shuffle=True)
    test_loader = DataLoader(test_data, batch_size=128, shuffle=True)

    model = Net().to(device)
    criterion = torch.nn.CrossEntropyLoss(reduction='mean')
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(10):
        train(epoch, device)
        test(device)
    end_time = datetime.datetime.now()
    print('time cost: %f seconds' % (end_time - start_time).total_seconds())
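
To produce both rows of the table below without editing the script, the device can also be chosen from the command line. A minimal sketch (the --device flag and the script name are my own additions, not part of the original code):

import argparse
import torch

# Hypothetical usage: python mnist_train.py --device mps   (or --device cpu)
parser = argparse.ArgumentParser()
parser.add_argument("--device", default="mps", choices=["cpu", "mps"])
args = parser.parse_args()

# Fall back to the CPU if MPS was requested but is not available on this machine.
use_mps = args.device == "mps" and torch.backends.mps.is_available()
device = torch.device("mps" if use_mps else "cpu")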

The measured running times are:

device   time (s)
cpu      142
mps      58

From the table, the GPU on the M1 Pro gives a speedup of about 2.45x over the CPU (142 s / 58 s) on this workload. Of course, the speedup will vary with the model, batch size, and other aspects of the use case; the main point of this post is that the GPU on M1 chips can now genuinely be used for neural network training.
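
To see how the speedup changes with the workload, a small matrix-multiplication micro-benchmark is enough. A minimal sketch (the matrix size and iteration count are arbitrary; calling .sum().item() moves a scalar back to the CPU and thereby forces the queued GPU work to finish before the clock is read):

import time
import torch

def bench(device, size=4096, iters=20):
    x = torch.randn(size, size, device=device)
    # Warm-up run so one-time setup cost is not measured.
    (x @ x).sum().item()
    start = time.time()
    for _ in range(iters):
        y = x @ x
    # Force pending GPU work to complete before timing stops.
    y.sum().item()
    return time.time() - start

print("cpu:", bench(torch.device("cpu")))
if torch.backends.mps.is_available():
    print("mps:", bench(torch.device("mps")))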

