Fixing TensorFlow CUDA_ERROR_ILLEGAL_ADDRESS on Windows

While using TensorFlow recently, I ran into the following error:

2017-11-08 12:24:52.838039: E tensorflow/stream_executor/cuda/cuda_driver.cc:1080] failed to synchronize the stop event: CUDA_ERROR_ILLEGAL_ADDRESS
2017-11-08 12:24:52.838090: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x51f18f0: CUDA_ERROR_ILLEGAL_ADDRESS
2017-11-08 12:24:52.838106: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x51f18f0: CUDA_ERROR_ILLEGAL_ADDRESS
2017-11-08 12:24:52.838137: F tensorflow/stream_executor/cuda/cuda_dnn.cc:3218] failed to set stream for cudnn handle: CUDNN_STATUS_MAPPING_ERROR

Following the suggestion in https://github.com/tensorflow/tensorflow/issues/14363, I switched to cuDNN 7.0.5, but that did not fix the problem.

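The cause turned out to be the tf.contrib.image.transform op used in my model. In TensorFlow 1.7 and 1.8 this op works fine in the GPU build on Ubuntu. On Windows the CPU build works, but the GPU build has a bug. The workaround, as described in https://github.com/tensorflow/tensorflow/issues/17485, is to place tf.contrib.image.transform on the CPU in the graph: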

# Pin only the transform op to the CPU; the rest of the graph stays on the GPU.
with tf.device('/cpu:0'):
    data_node_transformed = tf.contrib.image.transform(imgs, rtm, "BILINEAR")

Then retrain and redeploy the model.
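
Because the bug only appears in the Windows GPU build, the CPU pin could also be made conditional so that the same script keeps the op on the GPU under Ubuntu. The following is a minimal sketch, not part of the original fix: the helper name transform_device is hypothetical, and returning '/gpu:0' on other platforms assumes a GPU is actually available there.

```python
import platform


def transform_device(system=None):
    """Pick a device string for tf.contrib.image.transform.

    Hypothetical helper: returns '/cpu:0' on Windows, where the GPU
    kernel is reported to crash, and '/gpu:0' elsewhere (this assumes
    a GPU is present on the non-Windows machine).
    """
    if system is None:
        system = platform.system()  # e.g. 'Windows', 'Linux', 'Darwin'
    return '/cpu:0' if system == 'Windows' else '/gpu:0'


# Intended usage inside the graph-building code (TensorFlow 1.x):
#
#   with tf.device(transform_device()):
#       data_node_transformed = tf.contrib.image.transform(imgs, rtm, "BILINEAR")
```

If the graph is built on one operating system and deployed on another, the device string is fixed at build time, so the choice should follow the deployment target.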


Hopefully a future TensorFlow or NVIDIA release will fix this bug.
