Notes on problems encountered while reinstalling Keras on Ubuntu 18

Configuration

conda virtual environment: python==3.6.10


Installation

Install command

conda install tensorflow-gpu keras

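[Note] Some tutorials say it is better to install with pip, because conda bundles many dependencies and removing one package can take many others with it, leaving the whole environment unusable. However, when I installed with pip I could not train on the GPU, and I have not solved that yet. conda installs cudatoolkit and cudnn automatically and picks matching versions. (Quietly: it is a virtual environment anyway, so if it breaks I can just delete it and create a new one.)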

Installed versions

The following were installed automatically:

  • tensorflow 2.1.0
  • keras 2.3.1
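
Before training, it can help to confirm that the installed build actually sees the GPU. The following is a minimal sketch, assuming TensorFlow 2.1, where `tf.config.list_physical_devices` is available. The helper name `gpu_names` and the `fake_tf` stand-in are illustrative assumptions, used here only so the snippet runs without TensorFlow installed.

```python
from types import SimpleNamespace


def gpu_names(tf_module):
    """Return the names of the GPUs that this TensorFlow build can see."""
    return [d.name for d in tf_module.config.list_physical_devices("GPU")]


# Hypothetical stand-in for the real module, so the sketch runs anywhere.
# With TensorFlow installed, use instead:
#     import tensorflow as tf
#     print(gpu_names(tf))
fake_tf = SimpleNamespace(
    config=SimpleNamespace(
        list_physical_devices=lambda device_type=None: [
            SimpleNamespace(name="/physical_device:GPU:0", device_type="GPU")
        ]
    )
)
print(gpu_names(fake_tf))
```

An empty list from the real module would suggest that the build cannot use the GPU, which matches the symptom described for the pip install above.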

Runtime results

Problem 1

  1. Problem description
Traceback (most recent call last):
  File "train_g_unet.py", line 121, in <module>
    train(args)
  File "train_g_unet.py", line 62, in train
    parallel_model = multi_gpu_model(model, gpus=len(gpu_num))
  File "/media/s2/cyq/anaconda3/envs/keras/lib/python3.6/site-packages/keras/utils/multi_gpu_utils.py", line 150, in multi_gpu_model
    available_devices = _get_available_devices()
  File "/media/s2/cyq/anaconda3/envs/keras/lib/python3.6/site-packages/keras/utils/multi_gpu_utils.py", line 16, in _get_available_devices
    return K.tensorflow_backend._get_available_gpus() + ['/cpu:0']
  File "/media/s2/cyq/anaconda3/envs/keras/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 506, in _get_available_gpus
    _LOCAL_DEVICES = tf.config.experimental_list_devices()
AttributeError: module 'tensorflow_core._api.v2.config' has no attribute 'experimental_list_devices'
  2. Solution
    Open lib/python3.6/site-packages/keras/backend/tensorflow_backend.py and modify line 506:
# Original code:
_LOCAL_DEVICES = tf.config.experimental_list_devices()
# Modified:
devices = tf.config.list_logical_devices()
_LOCAL_DEVICES = [x.name for x in devices]
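
If the same patched file has to work with more than one TensorFlow version, the lookup could be made tolerant of both APIs. This is only a sketch under assumptions: the helper name `get_local_device_names` is hypothetical, and the two `SimpleNamespace` objects are stand-ins for `tf.config` so the example runs without TensorFlow. It relies on the older call returning plain name strings, which is how the original Keras line used the result.

```python
from types import SimpleNamespace


def get_local_device_names(tf_config):
    """Return device name strings from whichever API this build offers."""
    if hasattr(tf_config, "list_logical_devices"):
        # Newer API: returns device objects that carry a .name attribute.
        return [d.name for d in tf_config.list_logical_devices()]
    if hasattr(tf_config, "experimental_list_devices"):
        # Older API: already returns a list of name strings.
        return list(tf_config.experimental_list_devices())
    raise AttributeError("no known device-listing API on tf.config")


# Hypothetical stand-ins for tf.config in a newer and an older build.
new_config = SimpleNamespace(
    list_logical_devices=lambda: [SimpleNamespace(name="/device:GPU:0")]
)
old_config = SimpleNamespace(
    experimental_list_devices=lambda: ["/device:GPU:0"]
)
print(get_local_device_names(new_config))
print(get_local_device_names(old_config))
```

In the real file the call would be `get_local_device_names(tf.config)`, assigned to `_LOCAL_DEVICES`.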

Problem 2

  1. Problem description
Traceback (most recent call last):
  File "train_g_unet.py", line 121, in <module>
    train(args)
  File "train_g_unet.py", line 104, in train
    workers=2)
  File "/media/s2/cyq/anaconda3/envs/keras/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/media/s2/cyq/anaconda3/envs/keras/lib/python3.6/site-packages/keras/engine/training.py", line 1732, in fit_generator
    initial_epoch=initial_epoch)
  File "/media/s2/cyq/anaconda3/envs/keras/lib/python3.6/site-packages/keras/engine/training_generator.py", line 100, in fit_generator
    callbacks.set_model(callback_model)
  File "/media/s2/cyq/anaconda3/envs/keras/lib/python3.6/site-packages/keras/callbacks/callbacks.py", line 68, in set_model
    callback.set_model(model)
  File "/media/s2/cyq/anaconda3/envs/keras/lib/python3.6/site-packages/keras/callbacks/tensorboard_v2.py", line 116, in set_model
    super(TensorBoard, self).set_model(model)
  File "/media/s2/cyq/anaconda3/envs/keras/lib/python3.6/site-packages/tensorflow_core/python/keras/callbacks.py", line 1532, in set_model
    self.log_dir, self.model._get_distribution_strategy())  # pylint: disable=protected-access
AttributeError: 'Model' object has no attribute '_get_distribution_strategy'
  2. Solution
    Reference: https://github.com/tensorflow/tensorflow/pull/34870
    Open lib/python3.6/site-packages/tensorflow_core/python/keras/callbacks.py and modify the code around lines 1532 and 1732:
# Around line 1529: the distributed_file_utils.write_dirpath() call

    # In case this callback is used via native Keras, _get_distribution_strategy does not exist.
    if hasattr(self.model, '_get_distribution_strategy'):
      # TensorBoard callback involves writing a summary file in a
      # possibly distributed settings.
      self._log_write_dir = distributed_file_utils.write_dirpath(
          self.log_dir, self.model._get_distribution_strategy())  # pylint: disable=protected-access
    else:
      self._log_write_dir = self.log_dir

# Line 1732: the distributed_file_utils.remove_temp_dirpath() call

    # In case this callback is used via native Keras, _get_distribution_strategy does not exist.
    if hasattr(self.model, '_get_distribution_strategy'):
      # Safely remove the unneeded temp files.
      distributed_file_utils.remove_temp_dirpath(
          self.log_dir, self.model._get_distribution_strategy())  # pylint: disable=protected-access
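
The guard used in both places can be seen in isolation in the sketch below. Everything in it is a stand-in: the two model classes, `fake_write_dirpath`, and `resolve_log_write_dir` are hypothetical names that only mimic the shape of the patched code and are not part of TensorFlow or Keras.

```python
class NativeKerasModel:
    """Stand-in for a standalone-Keras model: no _get_distribution_strategy."""


class TfKerasModel:
    """Stand-in for a tf.keras model that exposes the private method."""

    def _get_distribution_strategy(self):
        return "strategy"


def fake_write_dirpath(log_dir, strategy):
    # Hypothetical replacement for distributed_file_utils.write_dirpath.
    return log_dir + "/" + strategy


def resolve_log_write_dir(model, log_dir, write_dirpath):
    """Call write_dirpath only when the model can supply a strategy."""
    if hasattr(model, "_get_distribution_strategy"):
        return write_dirpath(log_dir, model._get_distribution_strategy())
    # Native Keras model: fall back to the plain log directory.
    return log_dir


print(resolve_log_write_dir(NativeKerasModel(), "logs", fake_write_dirpath))
print(resolve_log_write_dir(TfKerasModel(), "logs", fake_write_dirpath))
```

The first call takes the fallback branch, which is the path a standalone-Keras model follows once the patch is in place.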
          

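Final approach

After these changes, multi-GPU training kept running out of memory (OOM), even with batch sizes that had worked before. I could not fix it and gave up.
In the end I had to uninstall tensorflow, tensorflow-gpu, and keras: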

conda uninstall tensorflow tensorflow-gpu keras

Then downgrade and install tensorflow 1.14.0:

conda install tensorflow-gpu==1.14.0 

Solved.
