RuntimeError: Address already in use的解决方法

Traceback (most recent call last):
  File "train.py", line 159, in 
    train(args=args)
  File "train.py", line 50, in train
    rank = args.local_rank
  File "/home/wby/anaconda3/envs/wby/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 400, in init_process_group
    store, rank, world_size = next(rendezvous(url))
  File "/home/wby/anaconda3/envs/wby/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 95, in _tcp_rendezvous_handler
    store = TCPStore(result.hostname, result.port, world_size, start_daemon)
RuntimeError: Address already in use

问题在于,TCP的端口被占用,一种解决方法是,运行程序的同时指定端口,端口号随意给出:

--master_port 29501

另一种方式,查找占用的端口号(在程序里 插入print输出),然后找到该端口号对应的PID值:netstat -nltp,然后通过kill -9 PID来解除对该端口的占用

你可能感兴趣的:(pytorch学习)