Reproducing Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network

The repo claims to be a PyTorch framework, yet just loading the data requires installing TensorFlow and Keras.

And then, once again, the ordinal not in range(128) error.

This time I can confirm it was the following snippet that fixed it:

import pickle as pkl
from functools import partial
import torch

# the pickle files were written by Python 2; forcing latin1 decoding stops
# Python 3's default ASCII codec from raising "ordinal not in range(128)"
pkl.load = partial(pkl.load, encoding="latin1")
pkl.Unpickler = partial(pkl.Unpickler, encoding="latin1")
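With the patch in place, the Python-2 pickle opens cleanly. A quick check (the path is the one this repo loads later):

with open('data/unimodal_mosi_2way.pickle', 'rb') as f:
    data = pkl.load(f)   # now decoded as latin1 instead of ASCII
print(data.keys())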

Also add the path for the result2 output directory.

Next error: Can't pickle local object 'load_mosi_context.<locals>.mosi_context', followed by Ran out of input.

The fix is to move the mosi_context class, originally defined inside the load_mosi_context function, out to module (global) scope.
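The underlying reason: pickle stores classes by their importable qualified name, and a class defined inside a function has none. A minimal before/after:

# before: unpicklable, the class only exists inside the function
def load_mosi_context():
    class mosi_context:      # -> "Can't pickle local object"
        pass

# after: hoisted to module (global) scope, picklable by qualified name
class mosi_context:
    pass

def load_mosi_context():
    ...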

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [100, 100]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Following the second method in https://www.cnblogs.com/js2hou/p/13923089.html: move every optimizer's .step() call to after the last loss's .backward().
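A minimal runnable sketch of why the reordering matters (a toy module of my own, not the repo's code). optimizer.step() updates weights in place, and a later backward() whose graph still holds those weights then hits exactly the version-mismatch error above:

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))
opt = torch.optim.Adam(net.parameters())

out = net(torch.randn(8, 4))
loss1 = out.pow(2).mean()
loss2 = out.abs().mean()

loss1.backward(retain_graph=True)
# calling opt.step() here would bump the weights' in-place version counter
# while loss2's graph still references the old saved tensors -> RuntimeError
loss2.backward()
opt.step()   # the fix: every optimizer's .step() after the final .backward()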

——————————————————————

True cuda
Training initializing... Setup ID is: 240
Temp location for models: models\model_mosi_240.pt
Grid search results are in: result2\results_mosi_240.csv
(1141, 50) train_audio2
(1141, 50) train_video2
(1141, 50) train_text2
(306, 50) valid_audio2
(306, 50) valid_video2
Audio feature dimension is: 50
Visual feature dimension is: 50
Text feature dimension is: 50
There are 2916 different hyper-parameter settings in total.
Epoch: 3 loss: 0.12522432192807853
Validation loss is: 0.02261879241544437
Found new best model, saving to disk...
Epoch: 23 loss: 0.14449033431327316
Validation loss is: 0.05201117820989073
(752, 2) output_test_shape
(752, 2) y_shape
Binary accuracy on test set is 0.7792553191489362
F1-score on test set is 0.7822776619274766
best_acc:  0.7792553191489362
best_f1:  0.7822776619274766
best_setting:  (50, 50, 50, 0.5, 0.5, 0.5, 0.001, 8, 0.01, 0.05)
Epoch: 0 loss: 0.16256842604861985
Validation loss is: 0.04407192990670796
Found new best model, saving to disk...
Epoch: 9 loss: 0.07808198154382373
Validation loss is: 0.01965569826512555
Found new best model, saving to disk...

————————————————————

+-------------+---------+
|  Parameter  |  Value  |
+=============+=========+
| Cuda        | 1       | whether to use CUDA
+-------------+---------+
| Data path   | ./data/ |
+-------------+---------+
| Epochs      | 500     |
+-------------+---------+
| Max len     | 20      |
+-------------+---------+
| Model path  | models  | checkpoint saved to models\model_mosi_240.pt
+-------------+---------+
| Output dim  | 2       |
+-------------+---------+
| Output path | result2 | results written to result2\results_mosi_240.csv
+-------------+---------+
| Patience    | 20      |
+-------------+---------+
| Run id      | 240     |
+-------------+---------+
| Signiture   | mosi    |
+-------------+---------+

Into load_mosi_context in utils, which reads 'data/unimodal_mosi_2way.pickle', containing:

'train_mask'	(62, 63)
'test_mask'		(31, 63)
'train_label'	(62, 63, 2)
'test_label'	(31, 63, 2)
'text_train'	(62, 63, 50)
'audio_train'	(62, 63, 50)
'video_train'	(62, 63, 50)
'text_test'		(31, 63, 50)
'audio_test'	(31, 63, 50)
'video_test'	(31, 63, 50)

Iterating over the 1s in 'train_mask'[0:49, 0:63] fills the initially empty lists train_audio2, train_video2, train_text2, train_label2 with 1141 ndarrays each (50-dim features, 2-dim labels).
Iterating over the 1s in 'train_mask'[49:62, 0:63] fills valid_audio2, valid_video2, valid_text2, valid_label2 with 306 ndarrays each.
Iterating over the 1s in 'test_mask'[0:31, 0:63] fills test_audio2, test_video2, test_text2, test_label2 with 752 ndarrays each (this is really just the train/valid/test split; in code it looks roughly like the sketch below).
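The three loops amount to something like this hypothetical helper (flatten_by_mask is my name, not the repo's):

import numpy as np

def flatten_by_mask(mask, audio, video, text, label):
    # collect the (50,) feature vectors and (2,) labels at every
    # position where the utterance mask is 1
    a2, v2, t2, y2 = [], [], [], []
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] == 1:
                a2.append(audio[i, j])
                v2.append(video[i, j])
                t2.append(text[i, j])
                y2.append(label[i, j])
    return np.array(a2), np.array(v2), np.array(t2), np.array(y2)

# train: dialogues 0..48, valid: dialogues 49..61, test: all 31 test dialogues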
Then the dataloader structures are built and potentially invalid NaN entries are filtered out (I couldn't see at first how the lines below achieve this; the trick is that NaN != NaN evaluates to True, so x[x != x] = 0 overwrites exactly the NaN entries with 0):

train_set.visual[train_set.visual != train_set.visual] = 0
valid_set.visual[valid_set.visual != valid_set.visual] = 0
test_set.visual[test_set.visual != test_set.visual] = 0

train_set.audio[train_set.audio != train_set.audio] = 0
valid_set.audio[valid_set.audio != valid_set.audio] = 0
test_set.audio[test_set.audio != test_set.audio] = 0
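A two-line check of the NaN trick (my own snippet, not from the repo):

import torch
x = torch.tensor([1.0, float('nan'), 3.0])
x[x != x] = 0     # NaN != NaN is True, so only the NaN entry is replaced
print(x)          # tensor([1., 0., 3.])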
New hyper-parameters added to the grid search:

| Parameter     | Values                                      |
|---------------|---------------------------------------------|
| audio_hidden  | [150, 50, 100]                              |
| audio_dropout | [0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9] |
| learning_rate | [0.0001, 0.001, 0.01]                       |
| batch_size    | [8, 16, 32]                                 |
| weight_decay  | [0, 0.001, 0.01, 0.0001]                    |
| alpha         | [0.01, 0.001, 0.05]                         |

2916 hyper-parameter settings in total.
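The count checks out: 3 × 9 × 3 × 3 × 4 × 3 = 2916. A quick verification (my own snippet):

from itertools import product

grid = [
    [150, 50, 100],                               # audio_hidden
    [0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9],  # audio_dropout
    [0.0001, 0.001, 0.01],                        # learning_rate
    [8, 16, 32],                                  # batch_size
    [0, 0.001, 0.01, 0.0001],                     # weight_decay
    [0.01, 0.001, 0.05],                          # alpha
]
print(len(list(product(*grid))))                  # 2916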

Loop until all 2916 hyper-parameter settings have been exhausted:
randomly sample from the grid above three hidden sizes (150), three dropout rates (0.2), one learning rate (0.001), one batch size (8), one weight decay (0.01), and one penalty weight alpha (0.05) (presumably this is also where the best setting gets recorded),
then build the whole set of models for the current setting (this step is oddly slow, even though it should just be model initialization?):
The audio, visual, and text encoders encoder_a, encoder_v, encoder_l:

Encoder_5(
  # input [8, 50]
  (linear_1): Linear(in_features=50, out_features=500, bias=True)
  #[8,50]x[50,500]->[8,500]
  (drop): Dropout(p=0.2, inplace=False)
  (norm2): BatchNorm1d(500, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (leaky_relu)
  (linear_2): Linear(in_features=500, out_features=1500, bias=True)
  #[8,500]x[500,1500]->[8,1500]
  (drop): Dropout(p=0.2, inplace=False)
  (norm3): BatchNorm1d(1500, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (leaky_relu)
  (linear_3): Linear(in_features=1500, out_features=150, bias=True)
  #[8,1500]x[1500,150]->[8,150]
  (drop): Dropout(p=0.2, inplace=False)
  (norm): BatchNorm1d(150, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (leaky_relu)
  (linear_4): Linear(in_features=150, out_features=150, bias=True)
  #[8,150]x[150,150]->[8,150]
  (tanh)
)
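Reading the printout back into a module, the forward pass presumably chains linear → dropout → batch-norm → LeakyReLU three times and ends in tanh. A sketch (the exact ordering inside the repo's forward() is my inference):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder5Sketch(nn.Module):
    # reconstruction of Encoder_5 from the printout above; not the repo's code
    def __init__(self, in_dim=50, hidden=150, dropout=0.2):
        super().__init__()
        self.linear_1 = nn.Linear(in_dim, 500)
        self.linear_2 = nn.Linear(500, 1500)
        self.linear_3 = nn.Linear(1500, hidden)
        self.linear_4 = nn.Linear(hidden, hidden)
        self.drop = nn.Dropout(dropout)
        self.norm2 = nn.BatchNorm1d(500)
        self.norm3 = nn.BatchNorm1d(1500)
        self.norm = nn.BatchNorm1d(hidden)

    def forward(self, x):                                   # x: [batch, 50]
        x = F.leaky_relu(self.norm2(self.drop(self.linear_1(x))))
        x = F.leaky_relu(self.norm3(self.drop(self.linear_2(x))))
        x = F.leaky_relu(self.norm(self.drop(self.linear_3(x))))
        return torch.tanh(self.linear_4(x))                 # [batch, 150]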

Decoders decoder_a, decoder_v, decoder_l:

Decoder2(
  # input [8, 150]
  (model): Sequential(
    (0): Linear(in_features=150, out_features=512, bias=True)
    #[8,150]x[150,512]->[8,512]
    (1): Dropout(p=0.5, inplace=False)
    (2): LeakyReLU(negative_slope=0.2, inplace=True)
    (3): Linear(in_features=512, out_features=64, bias=True)
    #[8,512]x[512,64]->[8,64]
    (4): Dropout(p=0.5, inplace=False)
    (5): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): LeakyReLU(negative_slope=0.2, inplace=True)
    (7): Linear(in_features=64, out_features=50, bias=True)
    #[8,64]x[64,50]->[8,50]
    (8): Tanh()
  )
)

Discriminator:

Discriminator(
  # input [8, 150]
  (model): Sequential(
    (0): Linear(in_features=150, out_features=64, bias=True)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Linear(in_features=64, out_features=16, bias=True)
    (3): Tanh()
    (4): Linear(in_features=16, out_features=1, bias=True)
    (5): Sigmoid()
  )
  # output [8, 1]
)

Classifier classifier:

graph11_new(
  # input [8, 3, 150]
  # first split back into three [8, 150] tensors (a1, v1, l1)
  (attention): Linear(in_features=150, out_features=1, bias=True)
  # [8,150]x[150,1]->[8,1]
  (sigmoid)
  # the three scores (sa, sv, sl) are then concatenated into [8, 3] (total_weights)
  
  # each of the three [8, 1] scores (sa) is also expanded to [8, 150] (unimodal_a)
  
  # and squeezed to [8] (sa)
  
  unimodal = (unimodal_a * a1 + unimodal_v * v1 + unimodal_l * l1)/3
  # i.e. each expanded [8, 150] weight is multiplied element-wise with the
  # corresponding split input [8, 150], and the three are averaged into [8, 150] (unimodal)
  
  (softmax)
  # softmax is applied to each of the three split inputs [8, 150] (a1 -> a)
  sav = (1/(torch.matmul(a.unsqueeze(1), v.unsqueeze(2)).squeeze() +0.5) *(sa+sv))
  # i.e. the three [8, 150] tensors are combined pairwise:
  # one is unsqueezed to [8, 1, 150] and the other to [8, 150, 1]; their matmul gives
  # [8, 1, 1], squeezed to [8], plus 0.5;
  # the reciprocal is then scaled by the sum of the two sigmoid scores (sa + sv), giving three [8] tensors
  # these three [8] tensors (sav) are unsqueezed to [8, 1] and concatenated into [8, 3] (normalize)
  (softmax)
  # which is then concatenated with the earlier [8, 3] (total_weights) into [8, 6] (total_weights)

  (graph_fusion): Sequential(
    # input: the three split inputs (a1) [8, 150], concatenated pairwise into [8, 300]
    (0): Linear(in_features=300, out_features=64, bias=True)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Linear(in_features=64, out_features=150, bias=True)
    (3): Tanh()
    # output: [8, 150]
  )
  # each [8, 1] column split from the [8, 3] (normalize) above is expanded to [8, 150]
  # and multiplied element-wise with the [8, 150] produced here
  (elu)
  # the three results (a_v) are summed, giving the bimodal [8, 150] (bimodal)
  
  # the graph_fusion step above is then repeated
  (softmax)
  # giving (a_v2)
  savvl = (1/(torch.matmul(a_v2.unsqueeze(1), v_l2.unsqueeze(2)).squeeze() +0.5) *(sav+svl))
  # likewise giving three [8] tensors
  savl = (1/(torch.matmul(a_v2.unsqueeze(1), l.unsqueeze(2)).squeeze() +0.5) *(sav+sl))
  # likewise giving three [8] tensors
  # the six tensors (savvl and savl) are concatenated into [8, 6] (normalize2), which is
  # concatenated with the earlier [8, 6] (total_weights) into [8, 12]
  
  (graph_fusion2): Sequential(
    # input: the three graph_fusion outputs (a_v), concatenated pairwise
    (0): Linear(in_features=300, out_features=64, bias=True)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Linear(in_features=64, out_features=150, bias=True)
    (3): Tanh()
    # output: [8, 150]
  )
  # each [8, 1] column split from the [8, 6] (normalize2) above is expanded to [8, 150]
  # and multiplied element-wise with the [8, 150] produced here
  (elu)
  # the six results (avvl and avl) are summed, giving the trimodal [8, 150] (trimodal)
  # concatenate (unimodal, bimodal, trimodal) -> [8, 450]
  (norm2): BatchNorm1d(450, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  # (fusion)
  (linear_1): Linear(in_features=450, out_features=50, bias=True)
  (tanh)
  (linear_2): Linear(in_features=50, out_features=50, bias=True)
  (tanh)
  (linear_3): Linear(in_features=50, out_features=2, bias=True)
  # giving y_2 [8, 2]
)
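To make the first (unimodal) stage concrete, here is a minimal sketch of the sigmoid attention and one pairwise sav score as I read the narration above; the details are my reconstruction, not the repo's code:

import torch
import torch.nn as nn

attention = nn.Linear(150, 1)

def unimodal_stage(a1, v1, l1):           # each [8, 150], split from [8, 3, 150]
    sa = torch.sigmoid(attention(a1))     # [8, 1]
    sv = torch.sigmoid(attention(v1))
    sl = torch.sigmoid(attention(l1))
    total_weights = torch.cat([sa, sv, sl], dim=1)          # [8, 3]
    unimodal = (sa.expand(-1, 150) * a1
                + sv.expand(-1, 150) * v1
                + sl.expand(-1, 150) * l1) / 3              # [8, 150]
    # pairwise score: reciprocal of the softmax dot product, scaled by sa + sv
    a = torch.softmax(a1, dim=1)
    v = torch.softmax(v1, dim=1)
    sav = (1 / (torch.matmul(a.unsqueeze(1), v.unsqueeze(2)).squeeze() + 0.5)
           * (sa.squeeze(1) + sv.squeeze(1)))               # [8]
    return unimodal, total_weights, sav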

Classifier classifier_3:

classifier3(
  # input [8, 150]
  (norm): BatchNorm1d(150, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (drop): Dropout(p=0.5, inplace=False)
  (linear_1): Linear(in_features=150, out_features=150, bias=True)
  (tanh)
  (drop): Dropout(p=0.5, inplace=False)
  (linear_2): Linear(in_features=150, out_features=2, bias=True)
  # output [8, 2]
  (softmax)
)

Four losses are constructed:

criterion = nn.L1Loss(reduction='sum').cuda()

adversarial_loss = torch.nn.BCELoss().cuda()	# binary cross-entropy, the usual binary-classification loss

classifier_loss = torch.nn.SoftMarginLoss().cuda()	# two-class soft-margin (logistic) loss over ±1 targets

pixelwise_loss = torch.nn.L1Loss(reduction='sum').cuda()

Two more hyper-parameters for the Adam optimizers:
b1 = 0.5
b2 = 0.999
Four optimizers are then constructed:

import itertools
import torch

optimizer_G = torch.optim.Adam(
    itertools.chain(encoder_a.parameters(), encoder_v.parameters(), encoder_l.parameters(),
                    decoder_a.parameters(), decoder_l.parameters(), decoder_v.parameters()),
    weight_decay=decay, lr=lr, betas=(b1, b2))
# optimizes the encoders and decoders
optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(b1, b2), weight_decay=decay)
# optimizes the discriminator
optimizer_C = torch.optim.Adam(classifier.parameters(), lr=lr, betas=(b1, b2), weight_decay=decay)
# optimizes the fusion classifier (classifier)
optimizer_E = torch.optim.Adam(
    itertools.chain(encoder_a.parameters(), encoder_v.parameters(), encoder_l.parameters(),
                    classifier_3.parameters()), lr=lr, betas=(b1, b2), weight_decay=decay)
# optimizes the encoders together with classifier_3

The datasets are wrapped in dataloader iterators (lazy, yield-style iteration, presumably to keep memory usage down).
With batch_size 8, each batch holds three (8, 50) inputs for a, v, t plus one (8, 2) label; a sketch follows.
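A minimal stand-in for the repo's dataset wrapper, using the flattened arrays from above (TensorDataset is my substitution, not necessarily what the repo uses):

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

train_ds = TensorDataset(torch.as_tensor(np.asarray(train_audio2), dtype=torch.float32),
                         torch.as_tensor(np.asarray(train_video2), dtype=torch.float32),
                         torch.as_tensor(np.asarray(train_text2), dtype=torch.float32),
                         torch.as_tensor(np.asarray(train_label2), dtype=torch.float32))
train_loader = DataLoader(train_ds, batch_size=8, shuffle=True)

audio, video, text, y = next(iter(train_loader))
# audio, video, text: (8, 50); y: (8, 2)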
Each of the three inputs is encoded and then decoded back to (8, 50), and the reconstruction loss rl1 is computed with pixelwise_loss (essentially an L1 norm).
Tensors valid (all ones) and fake (all zeros) of shape (8, 1) are built,
and then we compute

g_loss = (alpha * (adversarial_loss(discriminator(l_en), valid)
                   + adversarial_loss(discriminator(v_en), valid))
          + (1 - alpha) * rl1)

That is, the text encoding goes through the discriminator and gets a BCE loss against valid, and likewise for the visual encoding; these two terms and the reconstruction loss are weighted by alpha and (1 - alpha) respectively (why does audio not appear here?).
(Originally g_loss.backward() would be followed immediately by optimizer_G.step(), but as described at the top, the step was moved to the end to make the code run; I'm not sure how this affects the computation.)
——————————————————
The three modality encodings each go through the classifier_3 classifier, giving per-modality predictions of shape (8, 2).

c_loss = criterion(a, y) + criterion(l, y) + criterion(v, y)

Each modality's prediction gets an L1 loss against the label, summed into the classification loss.
(Likewise, optimizer_E.step() would normally go here.)
——————————————————
Next, audio finally gets its turn: its encoding goes through the discriminator, and its BCE loss against valid becomes the real loss real_loss;
the sum of the text and visual losses then becomes fake_loss (so audio is treated as the "real" modality and the other two as "fake"?).
The discriminator loss is therefore

d_loss = 0.5 * (real_loss + fake_loss)

(optimizer_D.step())
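Filling in the real/fake construction as I read the description (a_en, named by analogy with l_en and v_en above, and the .detach() calls are my assumptions, following the usual GAN recipe):

real_loss = adversarial_loss(discriminator(a_en), valid)    # audio treated as "real"
fake_loss = (adversarial_loss(discriminator(l_en.detach()), fake)
             + adversarial_loss(discriminator(v_en.detach()), fake))
d_loss = 0.5 * (real_loss + fake_loss)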
——————————————————
Each of the three modality encodings is unsqueezed to [8, 1, 150] and the three are concatenated into [8, 3, 150] (the so-called fusion input).
This goes through classifier (which internally splits the fused tensor apart again) to produce [8, 2], with an L1 loss against y.
(Only at this point are all the .step() calls finally executed, in order.)
