A small contribution for friends who are on the fence about buying a GPU:
The same script, the same amount of data, and the same neural network configuration, run once on the CPU and once on the GPU, to see how long each takes.
If you just want the bottom line:
For the same money, a GPU buys you roughly 15 times the computing power of a CPU.
15 times.
Neural network configuration (a rough timing sketch of this setup follows the list):
5 hidden layers, 500 nodes per layer, 500 epochs, for the first experiment;
5 hidden layers, 1000 nodes per layer, 1000 epochs, for the second experiment.
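For readers who want to reproduce the comparison, here is a minimal sketch of how such a CPU-vs-GPU timing run can be set up in PyTorch. It is not the author's exact script (that one is in the appendix); the run_on helper, the layer layout, and the random stand-in data are illustrative assumptions, and only the 12225×9 input / 12225×1 output shapes follow the description at the end of the post.

import time
import torch
import torch.nn as nn

def run_on(device, hidden=500, epochs=500):
    # Build a small fully connected net and random stand-in data on the chosen device.
    model = nn.Sequential(nn.Linear(9, hidden), nn.Tanh(),
                          nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1)).to(device)
    x = torch.randn(12225, 9, device=device)
    y = torch.randn(12225, 1, device=device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
    start = time.time()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()   # wait for the GPU to finish before reading the clock
    return time.time() - start

print("CPU:", run_on("cpu"), "seconds")
if torch.cuda.is_available():
    print("GPU:", run_on("cuda"), "seconds")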
CPU information:
4 Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
A 4-core Intel i5-6600K processor at 3.50GHz, currently about $250 on the market.
As the screenshot shows, all 4 CPU cores were indeed in use and busy during the run.
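If you want to check the same thing on your own machine, a quick way (an illustrative snippet, not part of the original script) is to compare the core count the OS reports with the number of threads PyTorch uses for CPU ops:

import os
import torch

print("Logical CPUs reported by the OS:", os.cpu_count())
print("PyTorch intra-op threads:", torch.get_num_threads())
# torch.set_num_threads(n) pins the thread count explicitly, e.g. to the 4 physical cores above.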
GPU information:
NVIDIA GeForce GTX 1070 8GB
A GTX 1070. The card I used is a 1070, at $550; a 1080 Ti, the current mining favorite, should be faster, at roughly $950. I bought the 1070 because it is cheaper. According to the userbenchmark statistics in the figure below, the 1080 Ti is about 56% faster than the 1070 but costs nearly twice as much, so I did not think the 1080 Ti was worth it and went with the 1070. (Do not be misled by the prices in that figure: those are the lowest prices found online over three months, which you cannot actually buy at; the prices I quote are market averages.)
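To confirm which card PyTorch actually sees before trusting a timing run, you can query the device directly (an illustrative check, not part of the original script):

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)                                  # should report a GeForce GTX 1070 here
    print("Memory: %.1f GB" % (props.total_memory / 1024**3))  # about 8 GB for this card
else:
    print("No CUDA-capable GPU detected; the script below will fall back to the CPU.")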
Results:
For the 500-node, 500-epoch case:
CPU time: 2 minutes 30 seconds;
GPU time: 4 seconds;
the GPU was about 37 times faster than the CPU.
Because the GPU run was so short, a single sample like this is not very reliable (too many sources of noise), so we ran a second experiment.
For the 1000-node, 1000-epoch case:
CPU time: 11 minutes 18 seconds;
GPU time: 21 seconds;
the GPU was about 32 times faster than the CPU.
So overall the run times differ by roughly a factor of 32 to 37.
Comparing prices:
CPU: $250;
GPU: $550;
Price-performance, i.e. the speedup scaled by the CPU-to-GPU price ratio:
32 × 250 / 550 = 14.5
37 × 250 / 550 = 16.8
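The same arithmetic as a short script, using only the numbers measured above (a convenience sketch, not something from the original post):

# Measured wall-clock times in seconds, from the two experiments above.
cpu_times = [2 * 60 + 30, 11 * 60 + 18]   # 150 s and 678 s
gpu_times = [4, 21]
cpu_price, gpu_price = 250.0, 550.0       # USD, the market prices quoted above

for cpu_t, gpu_t in zip(cpu_times, gpu_times):
    speedup = cpu_t / gpu_t                        # about 37.5x and 32.3x
    per_dollar = speedup * cpu_price / gpu_price   # about 17.0 and 14.7
    print("speedup %.1fx, per-dollar advantage %.1fx" % (speedup, per_dollar))
# The post rounds the speedups to 37x and 32x first, which gives the 16.8 and 14.5 quoted above.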
Conclusion:
For a 3.50GHz CPU versus an 8GB GPU, the speed difference is roughly 32-37x;
in terms of price-performance, the same money spent on a GPU buys roughly 14.5-16.8 times the neural-network training speed that it would buy in a CPU.
Comparison with others' findings: "GPUS ARE ONLY UP TO 14 TIMES FASTER THAN CPUS" SAYS INTEL | The Official NVIDIA Blog (blogs.nvidia.com)
The NVIDIA blog cites an Intel study claiming a 14x gap, which is not far from our result.
Appendix:
Script:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from torch.autograd import Variable   # legacy wrapper; plain tensors behave the same in modern PyTorch
import time

# print start time
print("Start time = " + time.ctime())

# read data
inp = np.loadtxt("input",  dtype=np.float32)
oup = np.loadtxt("output", dtype=np.float32)
#inp = inp*[4,100,1,4,0.04,1]
oup = oup * 500
inp = inp.astype(np.float32)
oup = oup.astype(np.float32)
oup = oup.reshape(-1, 1)   # column vector so targets match the (N, 1) model output

# Hyper Parameters
input_size = inp.shape[1]
hidden_size = 1000
output_size = 1
num_epochs = 1000
learning_rate = 0.001

# Toy Dataset
x_train = inp
y_train = oup

# Regression Model (fully connected net)
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        #self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.l1 = nn.ReLU()
        self.l2 = nn.Sigmoid()
        self.l3 = nn.Tanh()
        self.l4 = nn.ELU()
        self.l5 = nn.Hardshrink()
        self.ln = nn.Linear(hidden_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = self.fc1(x)
        out = self.l3(out)
        out = self.ln(out)
        out = self.l1(out)
        out = self.fc2(out)
        return out

model = Net(input_size, hidden_size, output_size)

# Loss and Optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

###### GPU
if torch.cuda.is_available():
    print("We are using GPU now!!!")
    model = model.cuda()

# Train the Model
for epoch in range(num_epochs):
    # Convert numpy arrays to torch tensors (on the GPU when available)
    if torch.cuda.is_available():
        inputs = Variable(torch.from_numpy(x_train).cuda())
        targets = Variable(torch.from_numpy(y_train).cuda())
    else:
        inputs = Variable(torch.from_numpy(x_train))
        targets = Variable(torch.from_numpy(y_train))

    # Forward + Backward + Optimize
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

    if (epoch+1) % 5 == 0:
        print('Epoch [%d/%d], Loss: %.4f'
              % (epoch+1, num_epochs, loss.item()))

# print end time
print("End time = " + time.ctime())

# Plot the graph
if torch.cuda.is_available():
    predicted = model(Variable(torch.from_numpy(x_train).cuda())).data.cpu().numpy()
else:
    predicted = model(Variable(torch.from_numpy(x_train))).data.numpy()
plt.plot(y_train/500, 'r-', label='Original data')
plt.plot(predicted/500, '-', label='Fitted line')
#plt.plot(y_train/500, predicted/500,'.', label='Fitted line')
plt.legend()
plt.show()

# Save the Model
torch.save(model.state_dict(), 'model.pkl')
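The script above picks the GPU automatically whenever torch.cuda.is_available() is true. One way to force the CPU run for the comparison (an assumption about the workflow, not something stated in the post) is to hide the GPU before torch touches CUDA:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""   # must be set before the first CUDA call
import torch
print(torch.cuda.is_available())          # now prints False, so the script takes the CPU branch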
The input (1 MB) and output (140 KB) files are too large to paste here. I have uploaded them to Google Drive; anyone who needs them can contact me or download them here: https://drive.google.com/open?id=1xJFjwQEgR0ZT89PVEZruUZcq96r1D9om
You can also generate the data yourself: input is a 12225×9 matrix and output is a 12225×1 matrix (one way to do this is sketched below).
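A minimal way to create stand-in files with those shapes, so the script runs end-to-end without the Google Drive download (random values, my assumption rather than the author's real data):

import numpy as np

rng = np.random.default_rng(0)
inp = rng.random((12225, 9)).astype(np.float32)   # stand-in for the real "input" file
oup = rng.random((12225, 1)).astype(np.float32)   # stand-in for the real "output" file

# Plain-text files, the format np.loadtxt in the script expects.
np.savetxt("input", inp)
np.savetxt("output", oup)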