Personal homepage --> http://www.yansongsong.cn
Related reading:
[Dev Tips] · How to use GPU acceleration in PyTorch (converting data between CPU and GPU)
[Dev Tips] · TensorFlow & Keras GPU
We implement a CNN classifier for the CIFAR-10 dataset and run the exact same code on different mainstream deep learning hardware to compare training speed.
Training speed on mainstream deep learning hardware
(Colab TPU): 382 s/epoch
(i5 8250U CPU): 320 s/epoch
(i7 9700K CPU): 36 s/epoch
(GPU MX150): 36 s/epoch
(Colab GPU): 16 s/epoch
(GPU GTX 1060): 9 s/epoch
(GPU GTX 1080 Ti): 4 s/epoch
The comparison shows that, relative to an ordinary laptop CPU (i5 8250U), even an entry-level GPU (MX150) is roughly 8x faster, while a high-end GPU (GTX 1080 Ti) is about 80x faster. Using multiple GPUs would be faster still, so if you train models regularly, a GPU is strongly recommended.
You are also welcome to run the code below on your own machine and compare the speed. My CPU takes 320 s/epoch.
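Before timing anything, it is worth confirming that TensorFlow actually sees your GPU; otherwise training silently falls back to the much slower CPU path. A quick check, assuming TensorFlow 2.x (on TF 1.x the rough equivalent is `tf.test.is_gpu_available()`):

```python
import tensorflow as tf

# List the accelerators the TensorFlow runtime can see.
# An empty GPU list means training will run on the CPU.
gpus = tf.config.list_physical_devices('GPU')
cpus = tf.config.list_physical_devices('CPU')
print("GPUs:", gpus)
print("CPUs:", cpus)
```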
Code
from tensorflow import keras
from tensorflow.keras.datasets import cifar10
import numpy as np

batch_size = 100
num_classes = 10
epochs = 10

# Load the data
(x_train, train_labels), (x_test, test_labels) = cifar10.load_data()
print(x_train.shape)
# Scale pixel values to [0, 1]
train_images = x_train.reshape([-1, 32, 32, 3]) / 255.0
test_images = x_test.reshape([-1, 32, 32, 3]) / 255.0
model = keras.Sequential([
    # (-1,32,32,3) -> (-1,32,32,32)
    keras.layers.Conv2D(input_shape=(32, 32, 3), filters=32, kernel_size=3, strides=1, padding='same'),
    # (-1,32,32,32) -> (-1,32,32,32)
    keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, padding='same'),
    # (-1,32,32,32) -> (-1,16,16,32)
    keras.layers.MaxPool2D(pool_size=2, strides=2, padding='same'),
    # (-1,16,16,32) -> (-1,16,16,64)
    keras.layers.Conv2D(filters=64, kernel_size=3, strides=1, padding='same'),
    # (-1,16,16,64) -> (-1,16,16,64)
    keras.layers.Conv2D(filters=64, kernel_size=3, strides=1, padding='same'),
    # (-1,16,16,64) -> (-1,8,8,64)
    keras.layers.MaxPool2D(pool_size=2, strides=2, padding='same'),
    # (-1,8,8,64) -> (-1,8,8,128)
    keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, padding='same'),
    # (-1,8,8,128) -> (-1,8,8,128)
    keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, padding='same'),
    # (-1,8,8,128) -> (-1,8*8*128)
    keras.layers.Flatten(),
    keras.layers.Dropout(0.3),
    # (-1,8*8*128) -> (-1,128)
    keras.layers.Dense(128, activation="relu"),
    # (-1,128) -> (-1,10)
    keras.layers.Dense(10, activation="softmax")
])
print(model.summary())
model.compile(optimizer="adam",
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, batch_size=batch_size, epochs=epochs,
          validation_data=(test_images[:1000], test_labels[:1000]))
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(np.argmax(model.predict(test_images[:20]), 1), test_labels[:20])
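To reproduce the s/epoch figures above on your own hardware, a Keras callback can record the wall-clock time of each epoch. A minimal sketch (the class name `EpochTimer` is my own, not part of Keras):

```python
import time
from tensorflow import keras

class EpochTimer(keras.callbacks.Callback):
    """Records wall-clock seconds for each training epoch."""
    def on_train_begin(self, logs=None):
        self.times = []  # one entry per completed epoch
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()
    def on_epoch_end(self, epoch, logs=None):
        self.times.append(time.time() - self._start)

# Usage: pass an instance to fit() and read .times afterwards:
#   timer = EpochTimer()
#   model.fit(..., callbacks=[timer])
#   print(timer.times)
```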
Output (GPU GTX 1080 Ti)
python demo.py
Using TensorFlow backend.
(50000, 32, 32, 3)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 32) 896
_________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 32) 9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 16, 16, 64) 18496
_________________________________________________________________
conv2d_3 (Conv2D) (None, 16, 16, 64) 36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 64) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 8, 8, 128) 73856
_________________________________________________________________
conv2d_5 (Conv2D) (None, 8, 8, 128) 147584
_________________________________________________________________
flatten (Flatten) (None, 8192) 0
_________________________________________________________________
dropout (Dropout) (None, 8192) 0
_________________________________________________________________
dense (Dense) (None, 128) 1048704
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 1,337,002
Trainable params: 1,337,002
Non-trainable params: 0
_________________________________________________________________
None
Train on 50000 samples, validate on 1000 samples
Epoch 1/10
2019-03-15 17:07:34.477745: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-03-15 17:07:34.552699: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-15 17:07:34.553036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.68GiB
2019-03-15 17:07:34.553049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-03-15 17:07:34.737306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-15 17:07:34.737335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-03-15 17:07:34.737340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-03-15 17:07:34.737468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10327 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
50000/50000 [==============================] - 5s 103us/step - loss: 1.3343 - acc: 0.5256 - val_loss: 1.0300 - val_acc: 0.6450
Epoch 2/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.9668 - acc: 0.6660 - val_loss: 0.8930 - val_acc: 0.6820
Epoch 3/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.8349 - acc: 0.7097 - val_loss: 0.8486 - val_acc: 0.7130
Epoch 4/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.7496 - acc: 0.7412 - val_loss: 0.8823 - val_acc: 0.7040
Epoch 5/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.6805 - acc: 0.7643 - val_loss: 0.8710 - val_acc: 0.7060
Epoch 6/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.6256 - acc: 0.7833 - val_loss: 0.9150 - val_acc: 0.7020
Epoch 7/10
50000/50000 [==============================] - 4s 77us/step - loss: 0.5715 - acc: 0.8000 - val_loss: 0.8586 - val_acc: 0.7140
Epoch 8/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.5312 - acc: 0.8143 - val_loss: 0.9455 - val_acc: 0.7030
Epoch 9/10
50000/50000 [==============================] - 4s 77us/step - loss: 0.4878 - acc: 0.8287 - val_loss: 1.0063 - val_acc: 0.7360
Epoch 10/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.4474 - acc: 0.8438 - val_loss: 1.0609 - val_acc: 0.7030
10000/10000 [==============================] - 1s 54us/step
[3 8 8 0 6 6 1 6 3 1 4 9 4 7 9 8 5 5 8 6]
[[3]
[8]
[8]
[0]
[6]
[6]
[1]
[6]
[3]
[1]
[0]
[9]
[5]
[7]
[9]
[8]
[5]
[7]
[8]
[6]]
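The last two arrays compare the predicted labels with the ground truth for the first 20 test images; 17 of the 20 match. The integer labels follow the standard CIFAR-10 class order, so they can be turned into readable names like this:

```python
# Standard CIFAR-10 label order, as defined by the dataset.
CIFAR10_CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']

# The predicted labels printed above.
preds = [3, 8, 8, 0, 6, 6, 1, 6, 3, 1, 4, 9, 4, 7, 9, 8, 5, 5, 8, 6]
print([CIFAR10_CLASSES[i] for i in preds])
```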