RuntimeError: Address already in use

When running two PyTorch DDP programs at the same time, the following error appears:

Traceback (most recent call last):
  File "train_tasks.py", line 471, in 
    main()
  File "train_tasks.py", line 211, in main
    torch.distributed.init_process_group(backend="nccl")
  File ".../anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 406, in init_process_group
    store, rank, world_size = next(rendezvous(url))
  File ".../anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 143, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon)
RuntimeError: Address already in use

Solution:

Both jobs are trying to bind the same rendezvous port on the master address (torch.distributed.launch defaults to 29500), which is why the second job's TCPStore fails to bind. When launching with python -m torch.distributed.launch, give each concurrent run its own unused port via --master_port. Note that TCP ports only go up to 65535, so a value like 88888 is invalid; any free port such as 29501 works.
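
A minimal sketch of running two DDP jobs side by side, assuming a single-node setup with 4 GPUs per job and that port 29501 is free (the GPU count and the port number are placeholders to adapt to your machine):

# First job: keeps the default master port (29500)
python -m torch.distributed.launch --nproc_per_node=4 train_tasks.py

# Second, concurrent job: moved to an unused port so the two rendezvous stores don't collide
python -m torch.distributed.launch --nproc_per_node=4 --master_port 29501 train_tasks.py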
