Errors encountered when running the VQA-ReGat project

VQA-ReGat: Relation-Aware Graph Attention Network for Visual Question Answering

Project link
Paper link

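  • 1. torch error: StopIteration: Caught StopIteration in replica 0 on device 0.
    Cause: the error appears when the project is run on multiple GPUs, and is likely a torch version incompatibility: on newer PyTorch releases the per-GPU replicas created by DataParallel can have an empty parameters() iterator, so next(self.parameters()) raises StopIteration.
    Fix (first attempt): following another blog post, change weight = next(self.parameters()).data to weight = torch.float32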

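  • 2. A new error follows: AttributeError: 'torch.dtype' object has no attribute 'new'
    Cause: after the change in step 1, weight is a torch.dtype object, not a torch.Tensor, so it has no new() method.
    Fix: reading the source shows that the code only uses next(self.parameters()).data as a template for the data type (and device) of the tensors it creates; in practice this is almost always a CUDA torch.float32 tensor.

So the final change is: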

# Placeholder float32 tensor on the GPU; only its dtype and device matter for weight.new(...)
weight = torch.tensor(0, dtype=torch.float32)
weight = weight.cuda()
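
The change above hard-codes float32 and the default CUDA device. As an alternative, here is a minimal sketch (not the project's actual code) that takes the dtype and device from the input batch instead of from the parameters, which also sidesteps the StopIteration under DataParallel. The class name QuestionEncoder, its constructor arguments, and the use of a GRU are assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class QuestionEncoder(nn.Module):
    """Hypothetical stand-in for an RNN question encoder (illustration only)."""

    def __init__(self, in_dim=8, num_hid=16, nlayers=1):
        super().__init__()
        self.num_hid = num_hid
        self.nlayers = nlayers
        self.rnn = nn.GRU(in_dim, num_hid, nlayers, batch_first=True)

    def init_hidden(self, x):
        # Build the zero hidden state from the input tensor, so it inherits
        # x's dtype and device without touching self.parameters().
        hid_shape = (self.nlayers, x.size(0), self.num_hid)
        return x.new_zeros(hid_shape)

    def forward(self, x):
        # x: (batch, seq_len, in_dim) float tensor of word embeddings
        hidden = self.init_hidden(x)
        output, _ = self.rnn(x, hidden)
        return output[:, -1]  # features at the last time step


encoder = QuestionEncoder()
x = torch.randn(4, 5, 8)
features = encoder(x)
```

Because each DataParallel replica receives its own slice of the batch already placed on its GPU, the hidden state built this way lands on the same device as the data it is used with.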
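  • 3. Error: RuntimeError: unsupported operation: more than one element of the written-to tensor refers to a single memory location. Please clone() the tensor before performing the operation
    Cause: expand() returns a view whose repeated entries share the same memory, so a later in-place write to q_expand is rejected.
    Fix: change q_expand = q.expand(*repeat_vals) to q_expand = q.expand(*repeat_vals).clone()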
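
To illustrate why the clone() in error 3 helps, here is a small sketch. The function name expand_question and the masking step are assumptions: the post does not show which in-place write follows the expand() call, so zeroing out padded regions is used here only as a plausible example.

```python
import torch


def expand_question(q, v):
    """Repeat q once per region of v, then zero it where v is all-zero padding.

    q: (batch, q_dim), v: (batch, num_regions, v_dim). Hypothetical helper.
    """
    repeat_vals = (-1, v.size(1), -1)
    # expand() alone only creates a view over q's memory;
    # clone() gives every repeated row its own storage.
    q_expand = q.unsqueeze(1).expand(*repeat_vals).clone()
    padded = v.abs().sum(-1) == 0  # (batch, num_regions), True for padded regions
    q_expand[padded] = 0           # in-place write, safe because of clone()
    return q_expand


q = torch.ones(2, 3)
v = torch.ones(2, 4, 5)
v[0, 1] = 0  # mark one region of the first sample as padding
out = expand_question(q, v)
```

The clone costs extra memory (one copy of q per region), which is the price of being able to modify individual rows independently.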

  • 4. Error: RuntimeError: CUDA out of memory. Tried to allocate 292.00 MiB (GPU 0; 10.76 GiB total capacity; 4.34 GiB already allocat
    Cause and fix: the project probably has a large number of parameters, so set a smaller batch_size.
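
If a smaller batch_size is needed to fit in memory but the larger effective batch is still wanted, gradient accumulation is one common option. This is not something the original notes tried; the sketch below uses a toy model, and the function name train_with_accumulation and the MSE loss are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


def train_with_accumulation(model, optimizer, batches, accum_steps):
    """Run one pass over `batches`, stepping the optimizer every `accum_steps` batches."""
    loss_fn = nn.MSELoss()
    optimizer.zero_grad()
    updates = 0
    for step, (x, y) in enumerate(batches, start=1):
        # Scale the loss so the accumulated gradient matches the average
        # over accum_steps small batches.
        loss = loss_fn(model(x), y) / accum_steps
        loss.backward()  # gradients add up across calls
        if step % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            updates += 1
    return updates


torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# 8 small batches of size 2, accumulated in groups of 4 (effective batch size 8)
batches = [(torch.randn(2, 4), torch.randn(2, 1)) for _ in range(8)]
updates = train_with_accumulation(model, optimizer, batches, accum_steps=4)
```

Only one small batch of activations is held in memory at a time, while each optimizer step still sees gradients from accum_steps batches.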

VQA-ReGat results

Only the results of 3 epochs were recorded:

--------------mutan.json--------------------------------
epoch 10:train_loss: 2.69, norm: 2.9384, score: 71.8663
	eval score: 76.90 (92.66)
	
-------------butd.json----------------------------------
epoch 28, time: 758.79
	train_loss: 2.53, norm: 2.6914, score: 73.75
	eval score: 75.40 (92.66)
	
epoch 29, time: 765.29
	train_loss: 2.53, norm: 2.6903, score: 73.72
	eval score: 75.40 (92.66)
