How Much Faster Is a GPU Than a CPU in PyTorch?

A small contribution for friends who are on the fence about buying a GPU:

Same script, same amount of data, same neural network configuration, run once on the CPU and once on the GPU, and see how long each takes.

If you want to skip ahead, here is the conclusion:

For the same money, a GPU buys you about 15 times the compute of a CPU.

15 times.

Neural network configuration:

5 hidden layers with 500 nodes each, trained for 500 epochs, for the first experiment;

5 hidden layers with 1000 nodes each, trained for 1000 epochs, for the second experiment (a sketch of the configuration follows below);
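For reference, here is a minimal sketch of what the first configuration looks like in PyTorch. The 9-column input and 1-column output match the dataset described in the appendix; using ReLU throughout is an assumption here (my actual script in the appendix mixes tanh and ReLU):

import torch.nn as nn

# A 5-hidden-layer MLP, 500 nodes per layer (the first experiment).
hidden_size = 500
layers = [nn.Linear(9, hidden_size), nn.ReLU()]        # input -> hidden 1
for _ in range(4):                                     # hidden layers 2-5
    layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
layers.append(nn.Linear(hidden_size, 1))               # hidden 5 -> output
model = nn.Sequential(*layers)
print(model)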

CPU info:

4 Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz

A 4-core Intel i5-6600K at 3.50 GHz, currently around $250 on the market.

Monitoring CPU usage during training confirmed that all 4 cores were indeed busy.
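If you want to check or pin the number of CPU threads PyTorch uses for a run like this, these are standard PyTorch calls:

import torch

# How many threads PyTorch uses for intra-op parallelism on the CPU.
print("CPU threads:", torch.get_num_threads())

# Optionally pin it, e.g. to the 4 physical cores of the i5-6600K.
torch.set_num_threads(4)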

GPU info:

NVIDIA GeForce GTX 1070 8GB

A single GTX 1070, which cost me $550. The mining favorite, the 1080 Ti, should be faster still; it currently runs about $950. I bought the 1070 because it was cheaper: according to statistics from the userbenchmark website, the 1080 Ti is about 56% faster than the 1070 but costs nearly twice as much, so the 1080 Ti didn't seem worth it to me, and I went with the 1070. (Don't be misled by the prices that site shows; those are the lowest prices seen anywhere over the past three months, which you can't actually get. The prices I quote here are market averages.)
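You can confirm what PyTorch sees on the GPU side with a few standard calls:

import torch

if torch.cuda.is_available():
    # Name and total memory of the first CUDA device.
    print(torch.cuda.get_device_name(0))
    props = torch.cuda.get_device_properties(0)
    print("Total memory: %.1f GB" % (props.total_memory / 1024**3))
else:
    print("No CUDA device found; running on the CPU.")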

Results:

For the 500-node, 500-epoch case:

CPU time: 2 minutes 30 seconds;

GPU time: 4 seconds;

The GPU was 37x faster than the CPU.

Since a 4-second run is a poor sample, with too much room for noise, we ran the experiment again at a larger scale.

For the 1000-node, 1000-epoch case:

CPU time: 11 minutes 18 seconds;

GPU time: 21 seconds;

The GPU was 32x faster than the CPU.

So the overall speed difference works out to roughly 32-37x.
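One caveat when timing GPU runs: CUDA kernels launch asynchronously, so you should call torch.cuda.synchronize() before reading the clock, or the GPU numbers come out too small. A minimal sketch of how a run can be timed (train_one_run is a hypothetical stand-in for the training loop in the appendix script):

import time
import torch

def timed(fn, use_cuda):
    # Flush pending GPU work so it doesn't leak across the measurement.
    if use_cuda:
        torch.cuda.synchronize()
    start = time.time()
    fn()
    if use_cuda:
        torch.cuda.synchronize()
    return time.time() - start

# Usage (train_one_run is hypothetical):
# elapsed = timed(lambda: train_one_run(), torch.cuda.is_available())
# print("Elapsed: %.1f s" % elapsed)

The timings above are coarse wall-clock measurements (the script just prints start and end times), which is fine at the scale of minutes versus seconds.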

Comparing prices:

CPU: $250;

GPU: $550;

Price-performance ratio:

32 × 250 / 550 ≈ 14.5

37 × 250 / 550 ≈ 16.8
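Or, spelled out in Python:

# Speed per dollar: GPU speedup scaled by the CPU/GPU price ratio.
cpu_price, gpu_price = 250.0, 550.0
for speedup in (32, 37):
    print("%dx speedup -> %.1fx per dollar" % (speedup, speedup * cpu_price / gpu_price))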

Conclusion:

For a 3.50 GHz CPU versus an 8 GB GPU, the raw speed difference is roughly 32-37x;

In price-performance terms, the same money spent on a GPU rather than a CPU buys you roughly 14.5-16.8x the speed for neural network training.

For comparison with others' findings: "GPUS ARE ONLY UP TO 14 TIMES FASTER THAN CPUS" SAYS INTEL | The Official NVIDIA Blog (blogs.nvidia.com)

The NVIDIA blog cites Intel's own research putting the gap at 14x; our result is not far off.

Appendix:

Script:

import time

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# Print start time
print("Start time = " + time.ctime())

# Read data: one sample per row
inp = np.loadtxt("input", dtype=np.float32)
oup = np.loadtxt("output", dtype=np.float32)
#inp = inp*[4,100,1,4,0.04,1]
oup = oup * 500
inp = inp.astype(np.float32)
oup = oup.astype(np.float32)
oup = oup.reshape(-1, 1)  # make targets (N, 1) to match the model output shape

# Hyper-parameters
input_size = inp.shape[1]
hidden_size = 1000
output_size = 1
num_epochs = 1000
learning_rate = 0.001

# Training set
x_train = inp
y_train = oup

# Fully connected network
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.l1 = nn.ReLU()
        self.l2 = nn.Sigmoid()    # unused alternatives, kept for experimenting
        self.l3 = nn.Tanh()
        self.l4 = nn.ELU()
        self.l5 = nn.Hardshrink()
        self.ln = nn.Linear(hidden_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = self.fc1(x)
        out = self.l3(out)   # tanh after the first linear layer
        out = self.ln(out)
        out = self.l1(out)   # ReLU after the second linear layer
        out = self.fc2(out)
        return out

model = Net(input_size, hidden_size, output_size)

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

###### GPU
if torch.cuda.is_available():
    print("We are using GPU now!!!")
    model = model.cuda()

# Train the model (full batch every epoch)
for epoch in range(num_epochs):
    # Convert numpy arrays to torch tensors
    if torch.cuda.is_available():
        inputs = torch.from_numpy(x_train).cuda()
        targets = torch.from_numpy(y_train).cuda()
    else:
        inputs = torch.from_numpy(x_train)
        targets = torch.from_numpy(y_train)

    # Forward + backward + optimize
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 5 == 0:
        print('Epoch [%d/%d], Loss: %.4f' % (epoch + 1, num_epochs, loss.item()))

# Print end time
print("End time = " + time.ctime())

# Plot the fit
with torch.no_grad():
    if torch.cuda.is_available():
        predicted = model(torch.from_numpy(x_train).cuda()).cpu().numpy()
    else:
        predicted = model(torch.from_numpy(x_train)).numpy()

plt.plot(y_train / 500, 'r-', label='Original data')
plt.plot(predicted / 500, '-', label='Fitted line')
#plt.plot(y_train/500, predicted/500, '.', label='Fitted line')
plt.legend()
plt.show()

# Save the model
torch.save(model.state_dict(), 'model.pkl')

The input (1 MB) and output (140 KB) files are too large to paste here. I have put them on Google Drive; anyone who needs them can contact me or download from https://drive.google.com/open?id=1xJFjwQEgR0ZT89PVEZruUZcq96r1D9om

You can also generate the data yourself: input is a 12225 × 9 matrix and output is a 12225 × 1 matrix (see the sketch below).
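If you only want to reproduce the timing and don't care about the real data, here is a sketch that writes random matrices of the right shapes (the value range is an assumption; for benchmarking, any float data exercises the network the same way):

import numpy as np

# Random stand-ins for the real dataset: 12225 samples, 9 features, 1 target.
rng = np.random.default_rng(0)
np.savetxt("input", rng.random((12225, 9), dtype=np.float32))
np.savetxt("output", rng.random((12225, 1), dtype=np.float32))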
