Notes on fixing a deep learning model save failure

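Project scenario:

After training a model with tf.keras, saving the model failed.
The problem does not occur on Windows; on Linux, every one of several attempts produced this error.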


Exception:

```
16107/16107 [==============================] - 24s 1ms/step - loss: 3.9254
Traceback (most recent call last):
  File "test.py", line 115, in <module>
    model.save(filepath=checkpoit_path)
  File "/opt/app/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1052, in save
    signatures, options)
  File "/opt/app/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/save.py", line 135, in save_model
    model, filepath, overwrite, include_optimizer)
  File "/opt/app/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 102, in save_model_to_hdf5
    f = h5py.File(filepath, mode='w')
  File "/opt/app/anaconda3/lib/python3.7/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/opt/app/anaconda3/lib/python3.7/site-packages/h5py/_hl/files.py", line 179, in make_fid
    fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 108, in h5py.h5f.create
OSError: Unable to create file (unable to open file: name = '/root/lxl/my_model.h5', errno = 2, error message = 'No such file or directory', flags = 13, o_flags = 242)
```

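Cause analysis:

1. The user who launched the Python script may not have permission to create files
2. The directory really does not exist
3. Something is wrong with the hdf5 module
4. Something in the environment is interfering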
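
The first two causes can be told apart before touching anything else. Below is a minimal sketch; `diagnose_save_path` is a hypothetical helper name, not part of Keras or h5py. It relies on the standard errno values: `errno = 2` in the traceback is `ENOENT` (no such file or directory), whereas a permission failure would normally be reported as `EACCES` (13).

```python
import errno
import os


def diagnose_save_path(filepath):
    """Classify why `filepath` might not be creatable (hypothetical helper).

    Returns "missing_dir" if the parent directory does not exist,
    "no_permission" if it exists but the current user cannot write into it,
    and "ok" otherwise.
    """
    parent = os.path.dirname(os.path.abspath(filepath))
    if not os.path.isdir(parent):
        return "missing_dir"
    # Creating a file needs write and search (execute) permission on the directory.
    if not os.access(parent, os.W_OK | os.X_OK):
        return "no_permission"
    return "ok"


# errno values for the two cases; the traceback above reports errno = 2.
print("ENOENT =", errno.ENOENT, "EACCES =", errno.EACCES)
print(diagnose_save_path("/root/lxl/my_model.h5"))
```

If this reports `ok` and the save still fails, causes 3 and 4 become the more likely suspects.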

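Solution:

For possible cause 1, try running the script with sudo.
If the directory does not exist, create it (if the error appears even though the directory exists, try recreating the directory). If you don't trust your own eyes, copy the path from the exception and open it with vi to check whether it comes up as a new file.
If the problem persists, reinstall the hdf5 module (h5py in the traceback above).
If none of the above helps, try rebooting the Linux machine (this is what solved it for me).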
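
The "create the directory" step can also be done from the script itself, just before saving. This is only a sketch covering cause 2; it would not have helped in my case, where a reboot was needed. `ensure_parent_dir` is a hypothetical helper name, and the commented usage assumes `model` and `checkpoit_path` are the variables from the `test.py` in the traceback.

```python
import os


def ensure_parent_dir(filepath):
    """Create the parent directory of `filepath` if it is missing (hypothetical helper)."""
    parent = os.path.dirname(os.path.abspath(filepath))
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(parent, exist_ok=True)
    return parent


# Usage, assuming `model` is the trained tf.keras model:
# ensure_parent_dir(checkpoit_path)
# model.save(filepath=checkpoit_path)
```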

If you hit the same problem but solved it a different way, feel free to add it in the comments~
