Linux: _pickle.UnpicklingError: unpickling stack underflow when loading a pretrained PyTorch model

On Linux, trying to load PyTorch's pretrained resnet101 model produced the following error:

Traceback (most recent call last):
  File "train_baseline.py", line 272, in <module>
    cnn = resnet101(pretrained=True).to(device)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torchvision/models/resnet.py", line 200, in resnet101
    model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 67, in load_url
    return torch.load(cached_file, map_location=map_location)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/serialization.py", line 368, in load
    return _load(f, map_location, pickle_module)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/serialization.py", line 532, in _load
    magic_number = pickle_module.load(f)
_pickle.UnpicklingError: unpickling stack underflow

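The root cause goes back to the first attempt to download resnet101, which failed with a message that the system's temporary directory did not have enough space: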

OSError: [Errno 18] Invalid cross-device link: '/tmp/tmpjqtk1ks_' -> '/home/user/.torch/models/resnet101-5d3b4d8f.pth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_baseline.py", line 272, in <module>
    cnn = resnet101(pretrained=True).to(device)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torchvision/models/resnet.py", line 200, in resnet101
    model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 66, in load_url
    _download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/home/user/anaconda3/envs/my/lib/python3.6/site-packages/torch/utils/model_zoo.py", line 107, in _download_url_to_file
    shutil.move(f.name, dst)
  File "/home/user/anaconda3/envs/my/lib/python3.6/shutil.py", line 564, in move
    copy_function(src, real_dst)
  File "/home/user/anaconda3/envs/my/lib/python3.6/shutil.py", line 263, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/home/user/anaconda3/envs/my/lib/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/home/user/anaconda3/envs/my/lib/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device

To fix the space problem, create a new temporary directory on a disk with more free space and point the system at it. In my case that is /mnt/tmp:

mkdir -p /mnt/tmp
export TMPDIR=/mnt/tmp
# To make this permanent, add the export line to ~/.bashrc, then reload it:
source ~/.bashrc
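
Before rerunning the training script, it can help to confirm that Python really picks up the new temporary directory and that it has room for the download. The following is only a minimal sketch using the standard library; the helper name report_temp_dir is my own and is not part of PyTorch.

```python
import os
import shutil
import tempfile


def report_temp_dir():
    """Return (path, free_bytes) for the temp directory Python will use."""
    # tempfile caches its choice after first use; reset it so that the
    # current value of the TMPDIR environment variable is read again.
    tempfile.tempdir = None
    tmp = tempfile.gettempdir()
    free = shutil.disk_usage(tmp).free
    return tmp, free


if __name__ == "__main__":
    tmp, free = report_temp_dir()
    print("temp dir: {}, free: {:.0f} MiB".format(tmp, free / 2**20))
```

If the printed path is still /tmp, the export has not reached the shell that launches Python.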

However, after redefining the temporary directory, rerunning the code still failed, this time with the error from the top of this post:

_pickle.UnpicklingError: unpickling stack underflow

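This happens because part of the resnet101 model had already been cached during the earlier attempt, but the copy never finished. The cache directory therefore contains an incomplete model file, and loading it fails. The fix is to delete that leftover resnet101 file from the cache directory.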
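
One way to tell whether a cached file is truncated, rather than guessing, is to compare it against the hash embedded in its name. As far as I know, checkpoint names such as resnet101-5d3b4d8f.pth carry the leading digits of the file's SHA-256 hash; the sketch below relies on that assumption, and matches_hash_prefix is a hypothetical helper, not a PyTorch function.

```python
import hashlib
import os
import re


def matches_hash_prefix(path):
    """Return True if the file's SHA-256 starts with the prefix in its name."""
    # Expect names like "resnet101-5d3b4d8f.pth": "-<hex digits>." near the end.
    m = re.search(r"-([a-f0-9]+)\.", os.path.basename(path))
    if m is None:
        raise ValueError("no hash prefix found in file name")
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large checkpoints do not fill memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)
    return sha256.hexdigest().startswith(m.group(1))
```

A result of False for the cached resnet101 file would indicate that it is incomplete or corrupted and should be removed.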

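The leftover resnet101 file is in one of two places:

/home/user/.cache/torch/checkpoints

or

/home/user/.torch/models

Replace user in the path with your own username.

The exact path varies between systems, so try both. A plain ls may not show hidden directories such as ~/.cache or ~/.torch (ls -a does); you can simply cd into the directory directly.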

Delete the half-downloaded model, make sure the new temporary directory is defined, and run the code again. That resolves the problem.
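
The cleanup can also be done from Python. This is a rough sketch that checks both cache locations mentioned above and removes a named checkpoint if it is present; remove_cached and CANDIDATE_DIRS are names I made up for illustration, and the file name in the usage note is taken from the traceback.

```python
import os

# The two cache locations discussed above; "~" expands to the home directory.
CANDIDATE_DIRS = ["~/.cache/torch/checkpoints", "~/.torch/models"]


def remove_cached(filename, dirs=CANDIDATE_DIRS):
    """Delete filename from every directory in dirs; return the removed paths."""
    removed = []
    for d in dirs:
        path = os.path.join(os.path.expanduser(d), filename)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

For the case in this post, calling remove_cached("resnet101-5d3b4d8f.pth") should clear the partial file, after which the next run downloads the model again.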
