Learning objectives
- Using GPUs
- Learning through visualization
Using GPUs
Supported devices
- "/cpu:0":机器的 CPU。
- "/device:GPU:0":机器的 GPU(如果有一个)。
- "/device:GPU:1":机器的第二个 GPU(以此类推)。
如果 TensorFlow 指令中兼有 CPU 和 GPU 实现,当该指令分配到设备时,GPU 设备有优先权。
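To see which of these devices TensorFlow can actually use on your machine, you can list the local devices. A minimal sketch using the TF 1.x device_lib helper (the loop and printed fields are our own illustration):

from tensorflow.python.client import device_lib

# Lists every device visible to TensorFlow on this machine.
for dev in device_lib.list_local_devices():
    # dev.name looks like "/device:GPU:0"; dev.device_type is "CPU" or "GPU".
    print(dev.name, dev.device_type)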
Logging device placement
import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
Manual device placement
Use the with tf.device() context to assign ops to a specific device:
# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
### Output
b: /job:localhost/replica:0/task:0/cpu:0    # b is placed on cpu:0
a: /job:localhost/replica:0/task:0/cpu:0    # a is placed on cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0    # MatMul was not pinned, so an available device is chosen (here GPU:0)
Allowing GPU memory growth
Because GPU memory is a limited resource, TensorFlow provides options that let a process increase its memory usage only as needed:
- allow_growth: allocates GPU memory according to runtime needs; it starts out allocating very little and extends the allocation as the process needs more GPU memory. Note that to avoid memory fragmentation, memory is never released once allocated.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
- per_process_gpu_memory_fraction: determines the fraction of the total memory of each visible GPU that the process may allocate.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # only allocate 40% of each GPU's total memory
session = tf.Session(config=config, ...)
Using a single GPU on a multi-GPU system
When a system has multiple GPUs, the GPU with the lowest ID is selected by default. You can also explicitly specify a preferred device:
# Creates a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
If the specified device does not exist, you will get an InvalidArgumentError.
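As a hypothetical illustration (assuming the machine really has no /device:GPU:2), the failure surfaces at run time as tf.errors.InvalidArgumentError and can be caught:

import tensorflow as tf

with tf.device('/device:GPU:2'):  # assumed not to exist on this machine
  a = tf.constant([1.0, 2.0], name='a')
sess = tf.Session()
try:
  sess.run(a)
except tf.errors.InvalidArgumentError as e:
  # Placement fails when the graph runs an op pinned to an absent device.
  print('Could not place op:', e.message)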
If you would rather have TensorFlow fall back automatically when the specified device does not exist, set allow_soft_placement to True so that an existing, supported device is chosen instead:
# Creates a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set
# to True.
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print(sess.run(c))
Using multiple GPUs
# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))
### Output
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[ 44. 56.]
[ 98. 128.]]
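As a sanity check of the printed result (pure NumPy, independent of TensorFlow): each GPU computes the same 2x3 by 3x2 matmul, giving [[22, 28], [49, 64]], and AddN on the CPU sums the two identical copies:

import numpy as np

a = np.array([[1., 2., 3.], [4., 5., 6.]])
b = np.array([[1., 2.], [3., 4.], [5., 6.]])
# add_n summed two identical matmul results, hence the factor of 2.
print(2 * (a @ b))  # [[ 44.  56.] [ 98. 128.]]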