Contents
AttributeError: module 'distutils' has no attribute 'version'
RuntimeError: Distributed package doesn't have NCCL built in | PyTorch pitfalls
OSError: [WinError 1455] The paging file is too small for this operation to complete, raised by "import torch" on Windows
The first error shows up as soon as you start training a model:

AttributeError: module 'distutils' has no attribute 'version'

Fix: this is a setuptools version problem; the installed version is too new. Recommended version: setuptools 57.5.0

pip uninstall setuptools
pip install setuptools==57.5.0  # must be lower than the version you had before
Note: the errors below occur on Windows.

While reproducing the GANet lane-detection network on Windows, the following error occurred:

raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in

Cause: Windows does not support NCCL; the backend has to be switched to gloo.

Fix: in distributed_c10d.py, add one line directly below prefix_store = PrefixStore(group_name, store):

backend = "gloo"

The patched fragment looks like this:
prefix_store = PrefixStore(group_name, store)
backend = "gloo"
if backend == Backend.GLOO:
    pg = ProcessGroupGloo(
        prefix_store,
        rank,
        world_size,
        timeout=timeout)
    _pg_map[pg] = (Backend.GLOO, store)
    _pg_names[pg] = group_name
elif backend == Backend.NCCL:
    if not is_nccl_available():
        raise RuntimeError("Distributed package doesn't have NCCL "
                           "built in")
Other write-ups online fix this the same way, by adding backend = "gloo" somewhere in this file; the snippet above is the variant that worked for me.
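Instead of patching distributed_c10d.py inside the installed package, you can often select the backend at the init_process_group call site. A minimal sketch of a platform-aware chooser — pick_backend is a hypothetical helper, and it assumes Gloo (CPU-side collectives) is acceptable for your run:

```python
import sys


def pick_backend(requested="nccl"):
    """Fall back to gloo on Windows, where PyTorch ships without NCCL."""
    if sys.platform == "win32" and requested == "nccl":
        return "gloo"
    return requested


# Usage, e.g.:
# torch.distributed.init_process_group(backend=pick_backend("nccl"), ...)
print(pick_backend("nccl"))
```

This keeps the site-packages sources untouched, so the workaround survives a PyTorch reinstall.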
Next, the Windows-only error OSError: [WinError 1455] The paging file is too small for this operation to complete, raised by "import torch".

Fix: in mmdet\datasets\builder.py, find num_workers and set it to 0:
rank, world_size = get_dist_info()
if dist:
    # DistributedGroupSampler will definitely shuffle the data to satisfy
    # that images on each GPU are in the same group
    if shuffle:
        sampler = DistributedGroupSampler(dataset, samples_per_gpu,
                                          world_size, rank)
    else:
        sampler = DistributedSampler(
            dataset, world_size, rank, shuffle=False)
    batch_size = samples_per_gpu
    num_workers = workers_per_gpu
else:
    sampler = GroupSampler(dataset, samples_per_gpu) if shuffle else None
    batch_size = num_gpus * samples_per_gpu
    # num_workers = num_gpus * workers_per_gpu
    num_workers = 0

init_fn = partial(
    worker_init_fn, num_workers=num_workers, rank=rank,
    seed=seed) if seed is not None else None
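The same effect can usually be achieved without editing builder.py, because builder.py derives num_workers from the config's workers_per_gpu. A sketch of the relevant config fragment, assuming your config follows the standard mmdet layout:

```python
# mmdet-style config fragment (assumption: standard mmdet `data` dict layout,
# where `workers_per_gpu` feeds the DataLoader's num_workers)
data = dict(
    samples_per_gpu=2,   # batch size per GPU
    workers_per_gpu=0,   # 0 => no DataLoader worker processes on Windows
)
```

With num_workers at 0, data loading runs in the main process, so far fewer processes map the large CUDA DLLs and the page-file pressure drops.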
If that still does not fix it, you may need to enlarge the Windows paging file.
Reference: "Solving OSError: [WinError 1455] The paging file is too small for this operation to complete in PyCharm" - 程序那点事 - 博客园 (cnblogs.com)