Deep learning on Hengyuan Cloud (恒源云): train.py terminates on its own during the first epoch

Contents

Error message:

Cause:

Solution:

1. Run the commands below

2. Deleting the dataset and uploading it again can also solve similar problems.


Error message:

Start Train
Epoch 1/81:  35%|█████████████████████████▉                                                | 131/373 [01:15<02:02,  1.97it/s, acc=0.519, lr=0.0005, total_loss=0.692]Traceback (most recent call last):
  File "train.py", line 343, in <module>
    fit_one_epoch(model_train, model, loss, loss_history, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, Cuda, fp16, scaler, save_period, save_dir, local_rank)
  File "/home/Siamese-pytorch-master/utils/utils_fit.py", line 21, in fit_one_epoch
    for iteration, batch in enumerate(gen):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 801, in __next__
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
IsADirectoryError: Caught IsADirectoryError in DataLoader worker process 11.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/Siamese-pytorch-master/utils/dataloader.py", line 75, in __getitem__
    images, labels = self._convert_path_list_to_images_and_labels(batch_images_path)
  File "/home/Siamese-pytorch-master/utils/dataloader.py", line 114, in _convert_path_list_to_images_and_labels
    image = Image.open(path_list[pair * 2 + 1])
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2912, in open
    fp = builtins.open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: 'datasets/images_background/(9)/.ipynb_checkpoints'

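Cause:

Jupyter keeps a checkpoint copy of each notebook alongside the notebook file itself. The copy is also an .ipynb file and lives in a hidden subdirectory named .ipynb_checkpoints inside the folder where the notebook is saved. By default, Jupyter autosaves your work every 120 seconds, and when you choose "Save and Checkpoint", both the notebook and the checkpoint file are updated, so a checkpoint lets you recover unsaved work after an unexpected event.

In this case, a .ipynb_checkpoints directory ended up inside the dataset folder datasets/images_background/(9)/. As the last lines of the traceback show, the data loader passed that directory to Image.open as if it were an image, which raised IsADirectoryError and stopped training.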
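To make the cause concrete, here is a minimal sketch of a directory listing that would not trip over .ipynb_checkpoints. It assumes the loader builds its image list with something like os.listdir, which the traceback does not confirm; list_image_files and IMAGE_EXTENSIONS are hypothetical names, not part of Siamese-pytorch.

```python
import os

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")


def list_image_files(folder):
    """Return sorted paths of regular image files directly inside `folder`."""
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip hidden entries such as .ipynb_checkpoints and anything
        # that is not a regular file (for example, a subdirectory).
        if name.startswith(".") or not os.path.isfile(path):
            continue
        if name.lower().endswith(IMAGE_EXTENSIONS):
            paths.append(path)
    return paths
```

Filtering like this in the loader would let training continue even if Jupyter recreates the hidden folder later, whereas deleting the folder fixes only the current state of the dataset.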

Solution:

1. Run the commands below

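rm -rf .ipynb_checkpoints
# Run from the top-level dataset folder to remove every nested copy.
# .ipynb_checkpoints is a directory, so rm needs -rf to delete it recursively.
find . -type d -name ".ipynb_checkpoints" -prune -exec rm -rf {} +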

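Notes:

(1) First cd into the folder that contains the dataset.

(2) Do not run only the command below. On its own it removes just the .ipynb_checkpoints folder in the current directory and leaves nested ones in place, and a mistyped rm -rf path can delete your dataset.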

rm -rf .ipynb_checkpoints
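If you would rather preview what will be deleted before removing anything, the same cleanup can be sketched in Python. This is an illustrative alternative to the find command, not the method used above; remove_checkpoint_dirs is a hypothetical helper, and the "datasets" root is taken from the path in the traceback.

```python
import os
import shutil


def remove_checkpoint_dirs(root, dry_run=True):
    """Find every .ipynb_checkpoints directory under `root`.

    With dry_run=True nothing is deleted; the matches are only returned.
    """
    found = []
    for current, dirnames, _ in os.walk(root):
        if ".ipynb_checkpoints" in dirnames:
            target = os.path.join(current, ".ipynb_checkpoints")
            found.append(target)
            # Do not descend into the directory we are about to remove.
            dirnames.remove(".ipynb_checkpoints")
            if not dry_run:
                shutil.rmtree(target)
    return found


if __name__ == "__main__":
    # "datasets" comes from the traceback path; adjust it to your layout.
    for path in remove_checkpoint_dirs("datasets", dry_run=True):
        print("would remove:", path)
```

Once the printed list looks right, call the function again with dry_run=False to delete the directories.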

2. Deleting the dataset and uploading it again can also solve similar problems.
