When calling pytorch_pretrained_bert with multiple GPUs, I hit "StopIteration: Caught StopIteration in replica 0 on device 0." The full traceback follows.
File "/home/yuangen_yu/CLUE/baselines/models_pytorch/classifier_pytorch/run_classifier.py", line 569, in <module>
main()
File "/home/yuangen_yu/CLUE/baselines/models_pytorch/classifier_pytorch/run_classifier.py", line 504, in main
global_step, tr_loss = train(args, train_dataset, model, tokenizer)
File "/home/yuangen_yu/CLUE/baselines/models_pytorch/classifier_pytorch/run_classifier.py", line 113, in train
outputs = model(**inputs)
File "/home/yuangen_yu/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/yuangen_yu/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/yuangen_yu/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/yuangen_yu/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/yuangen_yu/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/yuangen_yu/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/yuangen_yu/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/yuangen_yu/CLUE/baselines/models_pytorch/classifier_pytorch/transformers/modeling_bert.py", line 897, in forward
head_mask=head_mask)
File "/home/yuangen_yu/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/yuangen_yu/CLUE/baselines/models_pytorch/classifier_pytorch/transformers/modeling_bert.py", line 606, in forward
extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
StopIteration
My PyTorch version is 1.5. On a single GPU I printed next(self.parameters()).dtype and it is always torch.float32, so this looks like a version issue: in PyTorch 1.5, DataParallel replicates the model in a way that no longer registers the copied weights as parameters, so inside a replica self.parameters() yields nothing and next() raises StopIteration. Replacing the expression with the hard-coded dtype (torch.float32) fixes it.
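A minimal pure-Python sketch of the failure mode and the workaround (FakeReplica is a hypothetical stand-in; it only assumes that a DataParallel replica's parameters() iterator is empty on PyTorch 1.5, which is what the traceback shows):

```python
# Assumption: on PyTorch 1.5, DataParallel replicas hold their weights as
# plain tensor attributes rather than registered parameters, so
# replica.parameters() yields nothing inside forward().

class FakeReplica:
    """Stand-in for a BertModel replica with no registered parameters."""
    def parameters(self):
        return iter([])  # empty for a DataParallel replica under 1.5

replica = FakeReplica()

# The failing pattern from modeling_bert.py line 606:
#   next(self.parameters()).dtype
try:
    dtype = next(replica.parameters()).dtype
except StopIteration:
    # Workaround: hard-code the dtype instead of reading it off a
    # parameter; the check above confirms it is torch.float32 here.
    dtype = "torch.float32"  # in the real patch: torch.float32

print(dtype)  # → torch.float32
```

In the actual modeling_bert.py, this means editing line 606 to `extended_attention_mask.to(dtype=torch.float32)`; alternatives are downgrading to PyTorch 1.4 or switching to DistributedDataParallel, which does not replicate modules this way.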