Error message:
tensorflow.python.framework.errors_impl.NotFoundError: Could not find valid device for node.
Node:{{node Minimum}}
All kernels registered for op Minimum :
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_INT64]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_INT64]
  device='GPU'; T in [DT_INT32]
[Op:Minimum]
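Judging from the log, this looks like a CPU/GPU driver problem. It is not. The real cause is that the input data has the wrong type: the log lists every kernel registered for the Minimum op, and none of them matches the dtype of the data that was passed in. TensorFlow expects training data to be a tensor or NumPy array of a type such as float32, int32, float64, int64, or bool.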
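One way to catch this earlier is to check the dtype before the data reaches the loss. The following is a minimal sketch, not part of the original code: `check_loss_input` and `MINIMUM_DTYPES` are hypothetical names, and the set only contains the dtypes from the kernel list above that NumPy can express.

```python
import numpy as np

# Dtypes listed for the Minimum op in the log that NumPy can express.
# bfloat16 is left out because NumPy has no built-in bfloat16 type.
MINIMUM_DTYPES = {np.float16, np.float32, np.float64, np.int32, np.int64}


def check_loss_input(name, value):
    """Hypothetical helper: raise a clear error if `value` has an unsupported dtype."""
    dtype = np.asarray(value).dtype
    if dtype.type not in MINIMUM_DTYPES:
        raise TypeError(
            f"{name} has dtype {dtype}, which is not in the Minimum kernel list"
        )
    return dtype


real_pred = np.zeros((2, 3), dtype=np.float32)
complex_pred = np.zeros((2, 3), dtype=np.complex64)

print(check_loss_input("real_pred", real_pred))      # accepted
try:
    check_loss_input("complex_pred", complex_pred)   # rejected
except TypeError as err:
    print(err)
```

Calling a check like this just before the loss turns the confusing "Could not find valid device" message into an error that names the offending dtype.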
OS: Windows 11
TensorFlow: 1.1.5
CUDA: 10.1
cuDNN: 7.2.1
The following is my custom TensorFlow training step, which computes the loss and runs backpropagation:
@tf.function
def train_step(model_train, stft_data, data_input, targets, optimizer):
    with tf.GradientTape(True) as tape:
        outputs = model_train(data_input)
        loss_value = loss_cross(targets, np.multiply(stft_data, outputs))
        loss_value = tf.reduce_sum(model_train.losses) + loss_value
    grads = tape.gradient(loss_value, model_train.trainable_variables)
    optimizer.apply_gradients(zip(grads, model_train.trainable_variables))
    return loss_value
The lines of the traceback that point into my own code:
  File "D:\PyCharm\TF\huanyuan\CSPAttUNnetMusic\utils\utils_fit.py", line 27, in train_step
    loss_value = loss_cross(targets, np.multiply(stft_data, outputs))
  File "D:\Tool\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\keras\losses.py", line 989, in binary_crossentropy
    K.binary_crossentropy(y_true, y_pred, from_logits=from_logits), axis=-1)
  File "D:\Tool\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\keras\backend.py", line 4472, in binary_crossentropy
    output = clip_ops.clip_by_value(output, epsilon_, 1. - epsilon_)
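When computing the loss, I passed in data of type complex64 (a complex type). As the traceback shows, the loss calls clip_by_value, which relies on the Minimum op, and Minimum has no kernel for complex types, so the loss cannot be computed. The data has to be a tensor or NumPy array of a type such as float32, int32, float64, int64, or bool.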
So the fix is simply to convert the offending data to a supported NumPy or tensor type. Only the line that calls loss_cross needs to change:
@tf.function
def train_step(model_train, stft_data, data_input, targets, optimizer):
    with tf.GradientTape(True) as tape:
        outputs = model_train(data_input)
        # Cast the complex64 product to float32 before it reaches the loss.
        loss_value = loss_cross(targets, np.array(np.multiply(stft_data, outputs), np.float32))
        loss_value = tf.reduce_sum(model_train.losses) + loss_value
    grads = tape.gradient(loss_value, model_train.trainable_variables)
    optimizer.apply_gradients(zip(grads, model_train.trainable_variables))
    return loss_value
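It is worth knowing what this cast does to the values. The sketch below uses made-up toy numbers (not the real STFT data) to show that casting a complex array to float32 with np.array keeps only the real part, and compares that with taking the magnitude.

```python
import warnings

import numpy as np

# Toy stand-ins for stft_data and the model output; the values are invented.
stft = np.array([3 + 4j, 6 - 8j], dtype=np.complex64)
mask = np.array([1.0, 0.5], dtype=np.float32)

product = np.multiply(stft, mask)   # complex64 times float32 stays complex64

with warnings.catch_warnings():
    # NumPy warns that the imaginary part is discarded by this cast.
    warnings.simplefilter("ignore")
    cast = np.array(product, np.float32)   # same cast as in the fix: real part only

magnitude = np.abs(product)   # float32 magnitude, keeps information from both parts

print(product.dtype)
print(cast)
print(magnitude)
```

Whether the real part or the magnitude is the right quantity for the loss depends on the model, so it is worth choosing explicitly instead of relying on the implicit cast.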
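In short, this error means the data does not have a type that TensorFlow accepts for training. Find the line the traceback points to and convert the data on that line to a tensor or NumPy array of a type such as float32, int32, float64, int64, or bool.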
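One caveat about converting through NumPy inside the training step: tf.GradientTape only records TensorFlow operations, so a value that goes through np.multiply and np.array is no longer connected to the model output on the tape. The sketch below shows an alternative that stays in TensorFlow ops. It assumes TensorFlow 2.x with eager execution, a real-valued model output, and that the magnitude of the STFT is what the loss should see. `loss_on_magnitude` is a hypothetical helper and the data is invented.

```python
import tensorflow as tf


def loss_on_magnitude(loss_fn, targets, stft_data, outputs):
    """Hypothetical helper: build a float32 prediction using TensorFlow ops only."""
    magnitude = tf.abs(stft_data)   # complex64 -> float32 magnitude
    prediction = magnitude * tf.cast(outputs, magnitude.dtype)
    return loss_fn(targets, prediction), prediction


# Toy data: two complex STFT bins and a trainable real-valued mask.
stft = tf.constant([[0.6 + 0.8j, 0.0 + 0.5j]], dtype=tf.complex64)
mask = tf.Variable([[0.5, 0.5]], dtype=tf.float32)
targets = tf.constant([[1.0, 0.0]], dtype=tf.float32)

with tf.GradientTape() as tape:
    loss, prediction = loss_on_magnitude(
        tf.keras.losses.binary_crossentropy, targets, stft, mask
    )
    loss = tf.reduce_mean(loss)

grad = tape.gradient(loss, mask)

print(prediction.dtype)   # a real dtype the loss can handle
print(grad is not None)   # the gradient reaches the mask
```

If the real part is wanted instead of the magnitude, tf.math.real(stft_data) can replace tf.abs(stft_data) in the helper.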