PyTorch distributed training

 

1. Single-machine multi-GPU data parallelism

```

import torch

# illustrative module and batch; substitute your own nn.Module and inputs
model = torch.nn.Linear(128, 10).cuda(0)
inputs = torch.randn(32, 128, device='cuda:0')

# functional style: replicate the module onto device_ids, scatter inputs
# along dim 0 (the batch dim), and gather outputs back onto device_ids[0]
outputs = torch.nn.parallel.data_parallel(model, inputs, device_ids=[0, 1], dim=0)

```
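The module-wrapper form of the same API is usually more convenient in a training loop; a minimal sketch (the module and tensor shapes are illustrative):

```
import torch
import torch.nn as nn

model = nn.Linear(128, 10).cuda(0)           # illustrative module on GPU 0
dp_model = nn.DataParallel(model, device_ids=[0, 1])

x = torch.randn(32, 128, device='cuda:0')    # batch is scattered along dim 0
y = dp_model(x)                              # outputs gathered back on GPU 0
```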

2. PyTorch 1.0 distributed (torch.distributed)

2.1 Single machine, multiple GPUs

 
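In PyTorch 1.0 the recommended setup is one process per GPU with `DistributedDataParallel`, started via `python -m torch.distributed.launch --nproc_per_node=2 train.py`. A minimal sketch (the script name, model, and shapes are illustrative):

```
import argparse
import torch
import torch.distributed as dist
import torch.nn as nn

# the launcher passes --local_rank to each process it spawns
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()

# the launcher also sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE for env://
dist.init_process_group(backend='nccl', init_method='env://')
torch.cuda.set_device(args.local_rank)

model = nn.Linear(128, 10).cuda(args.local_rank)  # illustrative model
model = nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank])

x = torch.randn(32, 128, device='cuda:%d' % args.local_rank)
model(x).sum().backward()  # gradients are all-reduced across processes
```

In a real training loop, each process should also see its own shard of the data, e.g. via `torch.utils.data.distributed.DistributedSampler`.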

2.2 Multiple machines, multiple GPUs

 
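Multi-machine training reuses the same per-process setup; all processes rendezvous at the rank-0 node's address. A sketch with an explicit TCP init method (the address, port, and environment variables are placeholders, e.g. set by `torch.distributed.launch` with `--nnodes`, `--node_rank`, `--master_addr`, `--master_port`):

```
import os
import torch
import torch.distributed as dist
import torch.nn as nn

rank = int(os.environ['RANK'])              # global rank, set per process
world_size = int(os.environ['WORLD_SIZE'])  # total processes over all nodes

# '10.0.0.1:23456' stands in for the rank-0 node's reachable address
dist.init_process_group(backend='nccl',
                        init_method='tcp://10.0.0.1:23456',
                        rank=rank, world_size=world_size)

local_rank = rank % torch.cuda.device_count()  # GPU index on this node
torch.cuda.set_device(local_rank)

model = nn.Linear(128, 10).cuda(local_rank)    # illustrative model
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```

On the backend choice: `nccl` is the usual pick for GPU training across machines; `gloo` also works and additionally supports CPU tensors.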

----

Reference: https://pytorch.org/docs/stable/distributed.html

----

Errors:

```
  File "inference.py", line 111, in <module>
    model.load_state_dict(torch.load('./output/state-ckpt-epoch-final', map_location='cpu')) # ['model']
  File "/home/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 367, in load
    return _load(f, map_location, pickle_module)
  File "/home/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 545, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected 4363873583357797660 got 1
```
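"storage has wrong size" typically means the checkpoint file itself is corrupted, e.g. the save was interrupted or several distributed processes wrote to the same path at once. A minimal guard is to save from rank 0 only (a sketch, assuming the process group is initialized and `model` is the trained module):

```
import torch
import torch.distributed as dist

if dist.get_rank() == 0:  # only one process writes the checkpoint
    torch.save(model.state_dict(), './output/state-ckpt-epoch-final')
dist.barrier()            # remaining ranks wait for the write to finish
```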
