First, the error output:
[07.27.20|23:44:32] Training epoch: 0
Traceback (most recent call last):
File "main.py", line 31, in <module>
p.start()
File "D:\code\st-gcn\processor\processor.py", line 114, in start
self.train()
File "D:\code\st-gcn\processor\recognition.py", line 85, in train
for data, label in loader:
File "D:\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "D:\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
w.start()
File "D:\Python36\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "D:\Python36\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\Python36\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "D:\Python36\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\Python36\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
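The failure happens at the start of each epoch, when the DataLoader begins splitting the data into batches and launches its worker processes. On Windows each worker is started with spawn, so the loader (including the dataset) is pickled and sent to the child process. Here that object is larger than 4 GiB and cannot be serialized, and the child process then fails while reading the incomplete data.
Typical error messages are: OverflowError: cannot serialize a bytes object larger than 4 GiB, or EOFError: Ran out of input
Solution, from the official PyTorch forum: https://discuss.pytorch.org/t/pytorch-windows-eoferror-ran-out-of-input-when-num-workers-0/25918
Set num_worker (the value passed to the DataLoader as num_workers) to 0. This is a long-standing limitation on Windows.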
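As a minimal sketch of that workaround, the helper below picks the worker count from the platform, so the same config can be kept on Linux while Windows falls back to 0. The name safe_num_workers is hypothetical and is not part of st-gcn or PyTorch; the DataLoader call is shown only as a comment.

```python
import sys


def safe_num_workers(requested, platform=sys.platform):
    """Return 0 on Windows, otherwise the requested worker count.

    Hypothetical helper, not part of st-gcn or PyTorch.
    """
    # Windows starts DataLoader workers with "spawn", which pickles the
    # loader for each worker; that pickling step is what fails in the
    # traceback above. With 0 workers, data is loaded in the main process.
    if platform.startswith("win"):
        return 0
    return max(0, int(requested))


# Assumed usage with PyTorch (not executed here):
#   loader = torch.utils.data.DataLoader(
#       dataset, batch_size=64, num_workers=safe_num_workers(4))
```

Loading in the main process is slower, so this trades throughput for a run that starts at all.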
In addition, you may see an error like the following:
RuntimeError: CUDA out of memory. Tried to allocate 236.00 MiB (GPU 0; 8.00 GiB total capacity; 5.76 GiB already allocated; 161.97 MiB free; 5.78 GiB reserved in total by PyTorch)
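In short: RuntimeError: CUDA out of memory.
Solution: lower batch_size in the config file. If you are worried that a smaller batch will hurt accuracy, switch to a GPU with more memory.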