Huggingface Trainer raises RuntimeError: Expected all tensors to be on the same device

Problem description

While reproducing a simple seq2seq model on a new machine, using the Trainer module from the Huggingface Transformers library, I got the following error:

Traceback (most recent call last):
  File "/test/qiezi/jxqi/project/t5_for_imdb/seq2seq/run_seq2seq.py", line 252, in <module>
    main()
  File "/test/qiezi/jxqi/project/t5_for_imdb/seq2seq/run_seq2seq.py", line 197, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/transformers/trainer.py", line 1316, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/transformers/trainer.py", line 1849, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/transformers/trainer.py", line 1881, in compute_loss
    outputs = model(**inputs)
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1571, in forward
    encoder_outputs = self.encoder(
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 904, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/home/xing/miniconda3/envs/jxqi_base/lib/python3.9/site-packages/torch/nn/functional.py", line 2183, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
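
The traceback ends in the embedding lookup, so the embedding weight and the index tensor (`input_ids`) were on different devices. To confirm this, list the devices on both sides just before the forward pass. The helper below is a minimal diagnostic sketch and not part of the original script: `summarize_devices` is a hypothetical name, and it only assumes that the model exposes `parameters()` and that the batch is a dict whose tensor values carry a `.device` attribute, as PyTorch modules and tensors do.

```python
def summarize_devices(model, batch):
    """Collect the devices used by model parameters and by batch tensors.

    Hypothetical helper for debugging; it relies only on duck typing:
    model.parameters() yields objects with a .device attribute, and
    batch is a dict whose tensor values have a .device attribute.
    """
    # Every distinct device that holds at least one model parameter.
    param_devices = {str(p.device) for p in model.parameters()}
    # Device of each batch entry; non-tensor values are skipped.
    batch_devices = {
        name: str(value.device)
        for name, value in batch.items()
        if hasattr(value, "device")
    }
    return param_devices, batch_devices


# Example use inside a debugging session (model and inputs come from your own code):
#   param_devices, batch_devices = summarize_devices(model, inputs)
#   print(param_devices, batch_devices)
```

If the two results name different devices, for example `cuda:0` on one side and `cpu` on the other, that is the mismatch the error message reports.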

Solution

Upgrade the Transformers library to 4.13.0:

pip install transformers==4.13.0
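
After installing, it can help to confirm which versions the active environment actually picks up, since a new machine may have several conda environments. The snippet below is a small sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name, not part of Transformers.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions seen by the interpreter that runs this script.
for name in ("transformers", "torch"):
    print(name, installed_version(name))
```

Run it with the same interpreter that launches `run_seq2seq.py`, so the reported versions match the ones used for training.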

References

  1. https://github.com/nlp-with-transformers/notebooks/issues/31
