1. TensorFlow's default execution placement
https://tensorflow.google.cn/guide/using_gpu
1.1 Device selection priority
If a TensorFlow operation has both CPU and GPU implementations, the GPU devices will be given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels. On a system with devices cpu:0 and gpu:0, gpu:0 will be selected to run matmul.
In other words, the GPU is preferred by default. Can we force an operation to run somewhere else? (See 1.3 below.)
1.2 Checking where ops execute
How to check:
import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
Output (this log was captured from a conv-net run with the same flag; the placement lines look the same for any graph):
conv1/weights/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/weights/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/weights/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/biases/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/biases/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/biases/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
Shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/weights/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/weights/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/weights/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/biases/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/biases/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/biases/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
Conclusion:
The CPU-only build of TensorFlow places all parameters and operations on CPU:0 by default.
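A quick way to confirm whether the running build can see a GPU at all, using two standard TF 1.x test helpers (a minimal sketch, not part of the original verification code):

import tensorflow as tf

# On a CPU-only build both checks print False, which is why every
# parameter and op above lands on CPU:0.
print(tf.test.is_built_with_cuda())
print(tf.test.is_gpu_available())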
1.3 Pinning variables to a device and checking where the op executes
Verification:
import tensorflow as tf

# Creates a graph.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
Output:
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22. 28.]
[ 49. 64.]]
Analysis:
You will see that now a and b are assigned to cpu:0. Since a device was not explicitly specified for the MatMul operation, the TensorFlow runtime will choose one based on the operation and available devices (gpu:0 in this example) and automatically copy tensors between devices if required.
When no device is specified for an op, TF places the op on its default device. If the op and its variables then live on different devices, the variables' values are copied to the op's device to complete the computation.
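A minimal sketch (assuming a machine with at least one GPU) of pinning the op explicitly rather than letting the runtime choose; allow_soft_placement is a standard ConfigProto flag that lets TF fall back to an available device when the requested one is missing:

import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0], shape=[1, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0], shape=[3, 1], name='b')
with tf.device('/device:GPU:0'):
    # The op is pinned to GPU:0; its CPU-resident inputs are copied over.
    c = tf.matmul(a, b)

config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))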
1.4 The device specified when defining a variable differs from the device specified when assigning it
The following exception is thrown:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation conv1_2/Assign_3/value: node conv1_2/Assign_3/value (defined at ./CP_Alexnet/alexnet.py:229) was explicitly assigned to /device:CPU:1 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
Explanation:
A variable is created on the device it was pinned to, and assignment ops should look it up on that same device. If no device is specified at assignment time, TF searches the default device for the variable; when the creation device and the default device differ, the error above occurs.
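A minimal sketch of the safe pattern (the device string and variable name are hypothetical): build both the variable and its assign op inside the same tf.device scope, so creation and assignment always agree:

import tensorflow as tf

dev = '/cpu:0'  # hypothetical target device
with tf.device(dev):
    w = tf.get_variable('w', shape=[2, 2], initializer=tf.zeros_initializer())
    # The assign op is created in the same scope, so it resolves w on dev.
    assign_w = w.assign(tf.ones([2, 2]))

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(assign_w)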
1.5 Can CPU and GPU be mixed?
Yes. Concretely:
import tensorflow as tf

# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    total = tf.add_n(c)  # renamed from `sum` to avoid shadowing the builtin
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))
Output:
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[ 44. 56.]
[ 98. 128.]]
1.6 What characterizes variables loaded from a SavedModel?
The variable files produced by saved_model retain each variable's device placement. Because we need to adjust variables' devices dynamically, every layer of the model must be built on the fly; we cannot simply load a saved SavedModel.
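One common workaround, sketched here for a plain checkpoint (the path model.ckpt is hypothetical): tf.train.import_meta_graph accepts clear_devices=True, which strips the recorded placements so the imported nodes can be re-pinned by the surrounding tf.device scope:

import tensorflow as tf

# clear_devices=True drops the device strings stored in the meta graph,
# so the imported graph follows the enclosing tf.device scope instead.
with tf.device('/cpu:0'):
    saver = tf.train.import_meta_graph('model.ckpt.meta', clear_devices=True)

with tf.Session() as sess:
    saver.restore(sess, 'model.ckpt')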
Summary:
1) The device specified when a variable is defined must match the device used when it is assigned; otherwise an error is raised.
2) Keep variables and their ops on the same device whenever possible; otherwise extra memory-copy overhead is incurred.
3) Patience is a virtue.
2. Running a model's layers on specific devices
Core idea: define each op and its variables on the same device, and reference that device when assigning values to the variables.
Method:
import tensorflow as tf

config = tf.ConfigProto(device_count={"CPU": 3, "GPU": 2})  # declare the device list
with tf.Session(config=config) as sess:  # attach the device list to this session
    with tf.device(device_name):  # pin to a specific device
        # 1. create the model
        model = Data_Parall_AlexNet(param_list)
        # 2. define the model's structure
        model.create()
        # 3. initialize all parameters
        sess.run(tf.global_variables_initializer())
        # 4. assign values to all parameters
        sess.run(model.load_initial_weights())
Analysis:
1) The device_count attribute of tf.ConfigProto controls the device list available to the Session. With the configuration above, the devices usable in this session are cpu:0, cpu:1, cpu:2, gpu:0, and gpu:1. device_count takes a dict mapping a device type ("CPU" / "GPU") to the maximum number of devices of that type; a sketch for listing the resulting devices follows this list.
2) The session creates the execution environment for every variable and op; once the sess code block exits, all variables and ops owned by that session are released.
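As referenced in item 1), a small sketch for inspecting which devices a given config exposes (assuming a recent TF 1.x, where device_lib.list_local_devices accepts a session_config argument; device_lib is an internal-but-commonly-used module):

import tensorflow as tf
from tensorflow.python.client import device_lib

config = tf.ConfigProto(device_count={"CPU": 3, "GPU": 2})
# Prints e.g. /device:CPU:0 ... /device:CPU:2 plus any visible GPUs.
for d in device_lib.list_local_devices(session_config=config):
    print(d.name, d.device_type)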