Setting the execution location (device) of layers in TensorFlow

1. TensorFlow's default execution placement

https://tensorflow.google.cn/guide/using_gpu

1.1 Device selection priority

If a TensorFlow operation has both CPU and GPU implementations, the GPU devices will be given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels. On a system with devices cpu:0 and gpu:0, gpu:0 will be selected to run matmul. So the GPU is preferred by default; can we force an operation to run somewhere else? See the sketch below; section 1.3 verifies the behaviour with placement logging.
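
Wrapping graph construction in tf.device is enough to override the default priority. The following is a minimal TF 1.x sketch, not taken from the original text, with constant values chosen only for illustration:

import tensorflow as tf

# Force matmul onto the CPU even though a GPU kernel exists.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2], name='b')
  c = tf.matmul(a, b)  # pinned to cpu:0, so the GPU priority no longer applies
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
  print(sess.run(c))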

1.2 Checking where ops run

How to check:

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Output (this particular log was captured from a larger convolutional model, but the same pattern appears for the snippet above):

conv1/weights/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/weights/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/weights/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/biases/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/biases/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/biases/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
Shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/weights/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/weights/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/weights/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/biases/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/biases/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/biases/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0

Conclusion:

With the CPU-only build of TensorFlow, all variables and operations are placed on CPU:0 by default.
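
To confirm which devices a given build can see at all, one option (not shown in the original text) is the device_lib helper; on a CPU-only build this typically lists only CPU:0 plus an XLA_CPU device:

from tensorflow.python.client import device_lib

# Enumerate the devices visible to the local TensorFlow runtime.
for dev in device_lib.list_local_devices():
  print(dev.name, dev.device_type)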

1.3 Pinning variables to a device and checking where the op runs

Verification:

# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Output:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]

Analysis:

You will see that now a and b are assigned to cpu:0. Since a device was not explicitly specified for the MatMul operation, the TensorFlow runtime will choose one based on the operation and available devices (gpu:0 in this example) and automatically copy tensors between devices if required.

When an op's device is not specified, TensorFlow places it on the default device for that op. If the op and the variables (or tensors) it consumes then sit on different devices, the inputs are copied to the op's device before the computation runs, which costs extra memory traffic.
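
Conversely, if an op is pinned to a device that does not exist on the current machine, the session fails by default. allow_soft_placement is a standard ConfigProto flag that lets the runtime fall back to an available device; the sketch below (mine, not the author's code) combines it with placement logging to see where things finally land:

import tensorflow as tf

with tf.device('/device:GPU:0'):  # may not exist on a CPU-only machine
  a = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2])
  b = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2])
  c = tf.matmul(a, b)
config = tf.ConfigProto(allow_soft_placement=True,   # fall back if GPU:0 is unavailable
                        log_device_placement=True)   # log the final placement of every op
with tf.Session(config=config) as sess:
  print(sess.run(c))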

1.4 The device used to define a variable differs from the device used to assign to it

The following exception is thrown:

InvalidArgumentError (see above for traceback): Cannot assign a device for operation conv1_2/Assign_3/value: node conv1_2/Assign_3/value (defined at ./CP_Alexnet/alexnet.py:229) was explicitly assigned to /device:CPU:1 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.

Explanation:

A variable is created on the device it was pinned to, and the assignment should look it up on that same device. If the assign op does not specify a device, TensorFlow looks for the variable on the default device; when the device used at creation differs from that default, the error above is raised.
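
One way to avoid the mismatch, sketched here with made-up shapes and values rather than the author's AlexNet code: build the assign op inside the same tf.device scope as the variable, so creation and assignment always agree.

import numpy as np
import tensorflow as tf

with tf.device('/cpu:0'):
  w = tf.get_variable('w', shape=[2, 2], dtype=tf.float32)
  new_value = tf.placeholder(tf.float32, shape=[2, 2])
  assign_w = tf.assign(w, new_value)   # created on the same device as w
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(assign_w, feed_dict={new_value: np.ones((2, 2), np.float32)})
  print(sess.run(w))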

1.5 Can CPU and GPU be used together?

Yes. Here is how:

# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))

Output:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]

1.6 What about variables loaded from a SavedModel?

The variable files produced by saved_model keep each variable's device assignment. Because we need to adjust variable devices dynamically, every layer of the model must be built on the fly; we cannot simply load a pre-built SavedModel.
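
If rebuilding every layer is not an option, a possible workaround (my assumption, not the author's workflow; the checkpoint paths below are hypothetical) is to import the MetaGraph with clear_devices=True, which strips the stored device assignments so the placer can choose again:

import tensorflow as tf

with tf.Session() as sess:
  # clear_devices=True drops the device fields recorded in the graph,
  # so tf.device / the placer can decide placement afresh.
  saver = tf.train.import_meta_graph('model.ckpt.meta', clear_devices=True)  # hypothetical path
  saver.restore(sess, 'model.ckpt')                                          # hypothetical path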

Summary:

1) The device used to define a variable and the device used to assign to it must match, otherwise an error is raised.

2) Keep variables and the ops that use them on the same device whenever possible; otherwise extra memory-copy overhead is incurred.

3) Patience is a virtue.

2. Running the model's layers on a specific device

Core idea: define each op and its variables on the same device, and reference that device when assigning values to the variables.

Method:

config = tf.ConfigProto(device_count={"CPU": 3, "GPU": 2})  # declare the device list available to the session
with tf.Session(config=config) as sess:  # the session uses the device list configured above
  with tf.device(device_name):  # pin the block below to one specific device
    # 1. create the model
    model = Data_Parall_AlexNet(param_list)
    # 2. define the model's structure
    model.create()
    # 3. initialize all parameters
    sess.run(tf.global_variables_initializer())
    # 4. assign values to all parameters
    sess.run(model.load_initial_weights())

Analysis:

1) The device_count attribute of tf.ConfigProto controls the device list available to the Session. With the configuration above, the devices available in this session are cpu:0, cpu:1, cpu:2, gpu:0 and gpu:1. device_count is a map from a device-type string ("CPU" or "GPU") to an int32 count of devices of that type. Note that this is a count, not a name; only tf.device refers to devices by name.
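
A small self-contained sketch of the same idea, assuming a CPU-only machine (not the author's setup): with device_count={"CPU": 2} the session exposes two logical CPU devices, and tf.device can pin ops to the second one:

import tensorflow as tf

with tf.device('/device:CPU:1'):   # the second logical CPU device
  a = tf.constant([1.0, 2.0], name='a')
  b = tf.constant([3.0, 4.0], name='b')
  c = tf.add(a, b)
config = tf.ConfigProto(device_count={"CPU": 2},        # expose CPU:0 and CPU:1
                        log_device_placement=True)      # verify where c actually runs
with tf.Session(config=config) as sess:
  print(sess.run(c))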

2) The session provides the execution environment for every variable and op; once the session's code block exits, the session is closed and all state under it (including variable values) is released.
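
A tiny sketch of this lifetime behaviour, using a throwaway variable: after the with-block the session is closed, so further sess.run calls raise an error.

import tensorflow as tf

v = tf.Variable(1.0, name='v')
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  print(sess.run(v))            # works: the value of v lives in this session
# The graph still defines v, but its value lived in the now-closed session.
try:
  sess.run(v)
except RuntimeError as err:
  print('session closed:', err)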

 
