You can finally use the GPU on Apple's M1 chips to accelerate neural-network training.
According to the PyTorch documentation, PyTorch v1.12 adds support for GPU-accelerated neural-network training on Apple M1-series chips. Installation requires macOS 12.3 or later and a native arm64 (Apple silicon) Python environment.
PyTorch install command:
conda install pytorch torchvision torchaudio -c pytorch
After installation, start Python in a Terminal and confirm that the interpreter is running as arm64:
Python 3.10.4 (main, Mar 31 2022, 03:37:37) [Clang 12.0.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> print(platform.uname()[4])
arm64
Next, confirm that GPU acceleration on the Apple chip can be used:
Python 3.10.4 (main, Mar 31 2022, 03:37:37) [Clang 12.0.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.backends.mps.is_available()
True
If torch.backends.mps.is_available() returns True, the MPS backend is available and GPU acceleration can be used.
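In practice it is convenient to fall back to the CPU when the MPS backend is not available. A minimal sketch (the variable name device is only illustrative):

import torch

# Prefer the Apple-GPU (MPS) backend when available, otherwise fall back to the CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.ones(2, 2, device=device)  # tensors are created directly on the chosen device
print(x.device)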
In this section we implement a convolutional neural network, train it on the MNIST dataset, and compare how long the same run takes on the M1 chip's CPU and on the MPS device. The code is as follows:
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import datetime


class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Convolutional layer #1
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        # Convolutional layer #2
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        # Pooling layer
        self.pooling = torch.nn.MaxPool2d(2)
        # Fully connected layer
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))
        x = F.relu(self.pooling(self.conv2(x)))
        x = x.view(batch_size, -1)  # flatten to (batch_size, 320)
        return self.fc(x)


def train(epoch_num, dev):
    running_loss = 0.0
    data_cnt = 0
    for i, (data, target) in enumerate(train_loader, 0):
        inputs, label = data.to(dev), target.to(dev)
        y_pred = model(inputs)
        loss = criterion(y_pred, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        data_cnt += 1
    print('[#%d] loss: %.3f' % (epoch_num + 1, running_loss / data_cnt))


def test(dev):
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in test_loader:
            inputs, label = data.to(dev), target.to(dev)
            y_pred = model(inputs)
            _, predicted = torch.max(y_pred.data, dim=1)
            total += label.size(0)
            correct += (predicted == label).sum().item()
    print('Accuracy on test set: %d %%' % (100 * correct / total))


if __name__ == '__main__':
    start_time = datetime.datetime.now()
    device = torch.device("mps")  # change to "cpu" to run on the CPU
    print("Using Device: {}".format(device))
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.1307,), (0.3081,))])
    # The MNIST data is assumed to be present under ../data/mnist; set download=True otherwise.
    train_data = datasets.MNIST(root='../data/mnist', train=True, download=False, transform=transform)
    test_data = datasets.MNIST(root='../data/mnist', train=False, download=False, transform=transform)
    train_loader = DataLoader(train_data, batch_size=128, shuffle=True)
    test_loader = DataLoader(test_data, batch_size=128, shuffle=True)
    model = Net().to(device)
    criterion = torch.nn.CrossEntropyLoss(reduction='mean')
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(10):
        train(epoch, device)
        test(device)
    end_time = datetime.datetime.now()
    print('time cost: %f seconds' % (end_time - start_time).seconds)
The measured run times are as follows:
device | time (s)
cpu    | 142
mps    | 58
As the table shows, on the M1 Pro the GPU gives a speedup of about 2.45x over the CPU (142 s / 58 s). Of course, this ratio will vary with the workload; the point of this article is simply that the GPU on Apple's M1 chips can now indeed be used for neural-network training.
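To get a feel for how the speedup depends on the workload, here is a minimal micro-benchmark sketch (the helper name bench, the matrix size, and the iteration count are arbitrary choices, not part of the experiment above) that times a batch of large matrix multiplications on cpu and on mps:

import time
import torch

def bench(device_name, size=4096, iters=20):
    # Time a few large matrix multiplications on the given device.
    dev = torch.device(device_name)
    a = torch.randn(size, size, device=dev)
    b = torch.randn(size, size, device=dev)
    start = time.time()
    for _ in range(iters):
        c = a @ b
    # Reading a value back to the host forces any queued GPU work to finish,
    # so asynchronous MPS execution is included in the measurement.
    _ = c.sum().item()
    return time.time() - start

print('cpu: %.3f s' % bench('cpu'))
if torch.backends.mps.is_available():
    print('mps: %.3f s' % bench('mps'))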