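(Personal notes)
I'm learning representation learning for knowledge graphs and decided to start by running RE-Net.
For pretraining the authors use a batch size of 1024. I'm on a 1050ti, which can't even handle a batch size of 10, so I just set it to 2,
which promptly raised "RuntimeError: Trying to create tensor with negative dimension".
Baidu turned up very little (and my English is poor), but persistence paid off and I found one thread I could more or less follow (link).
In the program, the error is raised at this line: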
s_q = torch.cat((s_q, torch.zeros(len(t_list) - len(s_q), self.h_dim).cuda()), dim=0)
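I printed len(s_q) and len(t_list).
Both are normally 2, but right before the error len(s_q) had become 200, which makes len(t_list) - len(s_q) negative (I don't understand yet why it changes).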
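The arithmetic behind the message can be reproduced in isolation. This is only a minimal sketch, not the RE-Net code: `h_dim = 200` and the stand-in values for `t_list` and `s_q` are assumptions chosen to match the numbers printed above, and it runs on the CPU (no `.cuda()`).

```python
import torch

h_dim = 200                 # assumed hidden size, matching the 200 seen above
t_list = [0, 1]             # hypothetical stand-in for a batch of 2 entries
s_q = torch.zeros(h_dim)    # hypothetical stand-in: a 1-D tensor, so len(s_q) == 200

# Number of zero rows the padding line asks for: 2 - 200 = -198
pad_rows = len(t_list) - len(s_q)

try:
    torch.zeros(pad_rows, h_dim)   # a negative size is not allowed
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(pad_rows)
print(error_message)
```

So the padding line itself is fine as long as len(s_q) never exceeds len(t_list); the thing to track down is where s_q loses its batch dimension.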
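So I tried batchsize=3, and it ran fine~
I haven't read the code properly yet and I'm new to Torch, so for now I can only work around the problem. I'll come back and fill in the underlying cause once I understand it better (maybe)...
ps: from what I've seen so far, Torch is much easier to set up than TensorFlow.
I got careless. After changing the batch size yesterday the pretraining script ran, so I assumed everything was fine, but today training raised the same error and yesterday's trick no longer worked.
I reread the thread more carefully and printed the suspicious variables, and the problem may be in torch.squeeze(). I looked up how it works, made it squeeze only the first dimension of s_h, reran, and got a different error this time~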
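Here is a small sketch of why torch.squeeze() is a plausible culprit. The shapes are assumptions modelled on the printout (a hidden state of shape (1, batch, 200)); whether this is exactly what happens inside RE-Net is not verified here.

```python
import torch

h_dim = 200
# Hypothetical hidden states shaped (1, batch, h_dim)
s_h_batch2 = torch.zeros(1, 2, h_dim)
s_h_batch1 = torch.zeros(1, 1, h_dim)

# squeeze() with no argument removes EVERY dimension of size 1
shape_all_b2 = tuple(s_h_batch2.squeeze().shape)    # batch of 2: only dim 0 is dropped
shape_all_b1 = tuple(s_h_batch1.squeeze().shape)    # batch of 1: the batch dim is dropped too
len_all_b1 = len(s_h_batch1.squeeze())              # len() now reports h_dim, not the batch size

# squeeze(0) removes only dim 0, so the batch dimension survives
shape_dim0_b1 = tuple(s_h_batch1.squeeze(0).shape)
len_dim0_b1 = len(s_h_batch1.squeeze(0))

print(shape_all_b2, shape_all_b1, len_all_b1)
print(shape_dim0_b1, len_dim0_b1)
```

If some step ends up with a single sequence, a bare squeeze() would make len(s_q) equal to 200, which matches what I printed earlier. Passing the dimension explicitly avoids that.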
*** 2 1 200 ***
torch.Size([1, 2, 200])
torch.Size([2, 200])
Traceback (most recent call last):
File "D:/pycharm_ws/RE-Net-master/train.py", line 241, in <module>
train(args)
File "D:/pycharm_ws/RE-Net-master/train.py", line 136, in train
loss_s = model(batch_data, (s_hist, s_hist_t), (o_hist, o_hist_t), graph_dict, subject=True)
File "E:\Installed\Anaconda\envs\pytorch-ws\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\pycharm_ws\RE-Net-master\model.py", line 84, in forward
reverse=reverse)
File "E:\Installed\Anaconda\envs\pytorch-ws\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\pycharm_ws\RE-Net-master\Aggregator.py", line 167, in forward
return s_packed_input, s_packed_input_r
UnboundLocalError: local variable 's_packed_input_r' referenced before assignment
Process finished with exit code 1
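This time the error is raised in a different place and for a different reason, so for now I'll assume the previous one really is fixed and look at this one.
ps: tomorrow I will definitely read the code carefully.
2021.1.31
Swapped in a 1080ti and set batchsize to 1024: pretraining takes 3 seconds per epoch, training 40 minutes per epoch~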
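For reference, an UnboundLocalError of this kind usually means a local variable is only assigned on some code paths and then used on a path that skipped the assignment. The function below is a hypothetical stand-in written for illustration (the names `pack_inputs`, `use_relations` and `packed_r` are made up), not the actual forward() in Aggregator.py.

```python
def pack_inputs(items, use_relations):
    """Hypothetical example: packed_r is assigned on one branch only."""
    packed = list(items)
    if use_relations:
        packed_r = [x * 2 for x in items]
    return packed, packed_r   # fails when use_relations is False


def pack_inputs_fixed(items, use_relations):
    """Same logic, but packed_r has a value on every path."""
    packed = list(items)
    packed_r = None           # default for the branch that skips the assignment
    if use_relations:
        packed_r = [x * 2 for x in items]
    return packed, packed_r


try:
    pack_inputs([1, 2], use_relations=False)
    caught = None
except UnboundLocalError as err:
    caught = err

print(type(caught).__name__)
print(pack_inputs_fixed([1, 2], use_relations=False))
```

So the thing to check in Aggregator.py is which branch is supposed to assign s_packed_input_r, and why that branch is no longer taken after my squeeze change.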