This time I can tell the fix came from adding this snippet:
import pickle as pkl
from functools import partial
import torch

# force latin1 decoding so pickles written under Python 2 load under Python 3
pkl.load = partial(pkl.load, encoding="latin1")
pkl.Unpickler = partial(pkl.Unpickler, encoding="latin1")
Also move the mosi_context class that was defined inside the load_mosi_context function out to global (module) scope.
Reference: https://www.cnblogs.com/js2hou/p/13923089.html
Following the second method in that article, move every optimizer's .step() to after the last loss's .backward().
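A minimal sketch of that ordering with two hypothetical networks and optimizers (names are made up; only the backward-then-step pattern matters):

import torch
import torch.nn as nn

x = torch.randn(8, 4)
net_g = nn.Linear(4, 4)
net_d = nn.Linear(4, 1)
opt_g = torch.optim.Adam(net_g.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(net_d.parameters(), lr=1e-3)

h = net_g(x)
g_loss = net_d(h).mean()                   # depends on net_g and net_d
d_loss = (net_d(h.detach()) ** 2).mean()   # depends on net_d only

opt_g.zero_grad()
opt_d.zero_grad()

# all backward passes first ...
g_loss.backward(retain_graph=True)
d_loss.backward()

# ... then all optimizer steps; stepping earlier can trigger the
# "variable needed for gradient computation has been modified by an
# inplace operation" error that the article's workaround avoids
opt_g.step()
opt_d.step()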
——————————————————————
True cuda
Training initializing... Setup ID is: 240
Temp location for models: models\model_mosi_240.pt
Grid search results are in: result2\results_mosi_240.csv
(1141, 50) train_audio2
(1141, 50) train_video2
(1141, 50) train_text2
(306, 50) valid_audio2
(306, 50) valid_video2
Audio feature dimension is: 50
Visual feature dimension is: 50
Text feature dimension is: 50
There are 2916 different hyper-parameter settings in total.
Epoch: 3 loss: 0.12522432192807853
Validation loss is: 0.02261879241544437
Found new best model, saving to disk...
Epoch: 23 loss: 0.14449033431327316
Validation loss is: 0.05201117820989073
(752, 2) output_test_shape
(752, 2) y_shape
Binary accuracy on test set is 0.7792553191489362
F1-score on test set is 0.7822776619274766
best_acc: 0.7792553191489362
best_f1: 0.7822776619274766
best_setting: (50, 50, 50, 0.5, 0.5, 0.5, 0.001, 8, 0.01, 0.05)
Epoch: 0 loss: 0.16256842604861985
Validation loss is: 0.04407192990670796
Found new best model, saving to disk...
Epoch: 9 loss: 0.07808198154382373
Validation loss is: 0.01965569826512555
Found new best model, saving to disk...
————————————————————
+-------------+---------+
| Parameter | Value |
+=============+=========+
| Cuda        | 1       |  (whether to use CUDA)
+-------------+---------+
| Data path | ./data/ |
+-------------+---------+
| Epochs | 500 |
+-------------+---------+
| Max len | 20 |
+-------------+---------+
| Model path  | models  |  (model checkpoint: models\model_mosi_240.pt)
+-------------+---------+
| Output dim | 2 |
+-------------+---------+
| Output path | result2 |  (results CSV: result2\results_mosi_240.csv)
+-------------+---------+
| Patience | 20 |
+-------------+---------+
| Run id | 240 |
+-------------+---------+
| Signiture | mosi |
+-------------+---------+
This enters load_mosi_context in utils, which reads 'data/unimodal_mosi_2way.pickle', containing:
'train_mask' (62, 63)
'test_mask' (31, 63)
'train_label' (62, 63, 2)
'test_label' (31, 63, 2)
'text_train' (62, 63, 50)
'audio_train' (62, 63, 50)
'video_train' (62, 63, 50)
'text_test' (31, 63, 50)
'audio_test' (31, 63, 50)
'video_test' (31, 63, 50)
Iterate over the 1s in 'train_mask'[0:49, 0:63] and append to the initially empty lists train_audio2, train_video2, train_text2, train_label2, which end up holding 1141 50-dim feature ndarrays (2-dim for the labels).
Iterate over the 1s in 'train_mask'[49:62, 0:63] the same way for valid_audio2, valid_video2, valid_text2, valid_label2, giving 306 entries each.
Iterate over the 1s in 'test_mask'[0:31, 0:63] for test_audio2, test_video2, test_text2, test_label2, giving 752 entries each (this is really just the train/valid/test split).
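A rough sketch of that mask-driven split (a hypothetical helper; the actual load_mosi_context does it inline, and names follow the description above):

import numpy as np

def split_by_mask(data, mask_key, rows, prefix):
    """Collect per-utterance vectors whose mask entry is 1."""
    audio, video, text, label = [], [], [], []
    mask = data[mask_key]
    for i in rows:
        for j in range(mask.shape[1]):
            if mask[i, j] == 1:                               # valid utterance slot
                audio.append(data['audio_' + prefix][i, j])   # (50,)
                video.append(data['video_' + prefix][i, j])   # (50,)
                text.append(data['text_' + prefix][i, j])     # (50,)
                label.append(data[prefix + '_label'][i, j])   # (2,)
    return [np.array(x) for x in (audio, video, text, label)]

# train: rows 0..48 of train_mask  -> 1141 utterances
# valid: rows 49..61 of train_mask ->  306 utterances
# test : rows 0..30 of test_mask   ->  752 utterances
# e.g. train_audio2, train_video2, train_text2, train_label2 = \
#          split_by_mask(data, 'train_mask', range(0, 49), 'train')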
Then the DataLoader structures are built and potentially invalid NaN entries are zeroed out. (At first I couldn't see how the lines below achieve this: the point is that NaN is the only value for which x != x is True, so the boolean index selects exactly the NaN positions.)
train_set.visual[train_set.visual != train_set.visual] = 0
valid_set.visual[valid_set.visual != valid_set.visual] = 0
test_set.visual[test_set.visual != test_set.visual] = 0
train_set.audio[train_set.audio != train_set.audio] = 0
valid_set.audio[valid_set.audio != valid_set.audio] = 0
test_set.audio[test_set.audio != test_set.audio] = 0
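Since NaN is the only float value that is not equal to itself, the x != x mask picks out exactly the NaN entries; a two-line check:

import torch

t = torch.tensor([1.0, float('nan'), 3.0])
print(t != t)            # tensor([False,  True, False]), True only where t is NaN
t[torch.isnan(t)] = 0    # same effect as t[t != t] = 0
print(t)                 # tensor([1., 0., 3.])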
Newly added hyper-parameters:
|audio_hidden |[150, 50, 100] |
|audio_dropout |[0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9]|
|learning_rate |[0.0001, 0.001, 0.01] |
|batch_size |[8, 16, 32] |
|weight_decay |[0, 0.001, 0.01, 0.0001] |
|alpha |[0.01, 0.001, 0.05] |
2916 hyper-parameter combinations in total.
Loop until all 2916 combinations have been covered:
Randomly pick from the lists above: three hidden sizes (150 in the run traced here), three dropout rates (0.2), one learning rate (0.001), one batch size (8), one weight decay (0.01) and one penalty weight alpha (0.05) (presumably the best setting found so far is recorded along the way); a sketch of the sampling follows.
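A sketch of the random draw from that grid (hypothetical variable names; the values in the comments are from this particular run):

import random

# hyper-parameter grid from the table above
audio_hidden  = [150, 50, 100]
audio_dropout = [0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9]
learning_rate = [0.0001, 0.001, 0.01]
batch_size    = [8, 16, 32]
weight_decay  = [0, 0.001, 0.01, 0.0001]
alpha         = [0.01, 0.001, 0.05]

# 3 * 9 * 3 * 3 * 4 * 3 = 2916 combinations in total; judging from that count
# and the repeated values in best_setting = (50, 50, 50, 0.5, 0.5, 0.5, ...),
# the same hidden-size/dropout draw seems to be reused for all three modalities
ahid  = random.choice(audio_hidden)    # e.g. 150
adrop = random.choice(audio_dropout)   # e.g. 0.2
lr    = random.choice(learning_rate)   # e.g. 0.001
batch = random.choice(batch_size)      # e.g. 8
decay = random.choice(weight_decay)    # e.g. 0.01
alph  = random.choice(alpha)           # e.g. 0.05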
A whole set of models is then built with these values (for some reason this part is extremely slow, even though it is only model initialization?):
Audio, visual and text encoders encoder_a, encoder_v, encoder_l:
Encoder_5(
# input [8,50]
(linear_1): Linear(in_features=50, out_features=500, bias=True)
#[8,50]x[50,500]->[8,500]
(drop): Dropout(p=0.2, inplace=False)
(norm2): BatchNorm1d(500, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(leaky_relu)
(linear_2): Linear(in_features=500, out_features=1500, bias=True)
#[8,500]x[500,1500]->[8,1500]
(drop): Dropout(p=0.2, inplace=False)
(norm3): BatchNorm1d(1500, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(leaky_relu)
(linear_3): Linear(in_features=1500, out_features=150, bias=True)
#[8,1500]x[1500,150]->[8,150]
(drop): Dropout(p=0.2, inplace=False)
(norm): BatchNorm1d(150, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(leaky_relu)
(linear_4): Linear(in_features=150, out_features=150, bias=True)
#[8,150]x[150,150]->[8,150]
(tanh)
)
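The printout only lists the sub-modules, but together with the shape comments it suggests a forward pass roughly like this (a reconstruction sketch, not the author's actual Encoder_5 code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder5Sketch(nn.Module):
    def __init__(self, in_dim=50, hidden=150, dropout=0.2):
        super().__init__()
        self.linear_1 = nn.Linear(in_dim, 500)
        self.linear_2 = nn.Linear(500, 1500)
        self.linear_3 = nn.Linear(1500, hidden)
        self.linear_4 = nn.Linear(hidden, hidden)
        self.drop = nn.Dropout(dropout)
        self.norm2 = nn.BatchNorm1d(500)
        self.norm3 = nn.BatchNorm1d(1500)
        self.norm = nn.BatchNorm1d(hidden)

    def forward(self, x):                                           # x: [8, 50]
        x = F.leaky_relu(self.norm2(self.drop(self.linear_1(x))))   # [8, 500]
        x = F.leaky_relu(self.norm3(self.drop(self.linear_2(x))))   # [8, 1500]
        x = F.leaky_relu(self.norm(self.drop(self.linear_3(x))))    # [8, 150]
        return torch.tanh(self.linear_4(x))                         # [8, 150]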
Decoders decoder_a, decoder_v, decoder_l:
Decoder2(
# input [8,150]
(model): Sequential(
(0): Linear(in_features=150, out_features=512, bias=True)
#[8,150]x[150,512]->[8,512]
(1): Dropout(p=0.5, inplace=False)
(2): LeakyReLU(negative_slope=0.2, inplace=True)
(3): Linear(in_features=512, out_features=64, bias=True)
#[8,512]x[512,64]->[8,64]
(4): Dropout(p=0.5, inplace=False)
(5): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): LeakyReLU(negative_slope=0.2, inplace=True)
(7): Linear(in_features=64, out_features=50, bias=True)
#[8,64]x[64,50]->[8,50]
(8): Tanh()
)
)
Discriminator:
Discriminator(
# input [8,150]
(model): Sequential(
(0): Linear(in_features=150, out_features=64, bias=True)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Linear(in_features=64, out_features=16, bias=True)
(3): Tanh()
(4): Linear(in_features=16, out_features=1, bias=True)
(5): Sigmoid()
)
# output [8,1]
)
Classifier classifier:
graph11_new(
# input [8,3,150]
# first split back into three [8,150] tensors (a1 etc.)
(attention): Linear(in_features=150, out_features=1, bias=True)
#[8,150]x[150,1]->[8,1]
(sigmoid)
# the three results (sa etc.), each [8,1], are then concatenated into [8,3] (total_weights)
# each [8,1] result (sa) is also expanded to [8,150] (unimodal_a etc.)
# and the same [8,1] tensors are squeezed to [8] (sa etc.)
unimodal = (unimodal_a * a1 + unimodal_v * v1 + unimodal_l * l1)/3
# i.e. each expanded unimodal_* [8,150] is multiplied element-wise with its split-out [8,150], and the average of the three products gives [8,150] (unimodal)
(softmax)
# softmax is applied to each of the three split-out [8,150] tensors (a1 etc.), giving (a, v, l)
sav = (1/(torch.matmul(a.unsqueeze(1), v.unsqueeze(2)).squeeze() +0.5) *(sa+sv))
# i.e. these softmaxed [8,150] tensors are combined pairwise (a standalone sketch of this computation follows the whole block):
# one is unsqueezed to [8,1,150], the other to [8,150,1]; their matmul gives [8,1,1], which is squeezed to [8] and has 0.5 added
# the reciprocal of that is multiplied by the sum of the two corresponding sigmoid weights (sa + sv, each of shape [8]), so three such [8] values are obtained
# these three [8] values (sav etc.) are unsqueezed to [8,1] and concatenated into [8,3] (normalize)
(softmax)
# it is then concatenated with the initial [8,3] (total_weights) to give [8,6] (total_weights)
(graph_fusion): Sequential(
# input: the three split-out [8,150] (a1 etc.) concatenated pairwise into [8,300]
(0): Linear(in_features=300, out_features=64, bias=True)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Linear(in_features=64, out_features=150, bias=True)
(3): Tanh()
# output: [8,150]
)
# each [8,1] column of the [8,3] (normalize) from the previous step is expanded to [8,150] and multiplied element-wise with the [8,150] produced here
(elu)
# the three pairwise results (a_v etc.) are summed to give the bimodal [8,150] (bimodal)
# the graph_fusion step above is then repeated
(softmax)
# giving (a_v2 etc.)
savvl = (1/(torch.matmul(a_v2.unsqueeze(1), v_l2.unsqueeze(2)).squeeze() +0.5) *(sav+svl))
# similarly, three [8] values of this form are obtained
savl = (1/(torch.matmul(a_v2.unsqueeze(1), l.unsqueeze(2)).squeeze() +0.5) *(sav+sl))
# similarly, three [8] values of this form are obtained
# these six values (the savvl and savl terms) are concatenated into [8,6] (normalize2), which is then concatenated with the earlier [8,6] (total_weights) into [8,12]
(graph_fusion2): Sequential(
# input: the three graph_fusion outputs (a_v etc.) concatenated pairwise
(0): Linear(in_features=300, out_features=64, bias=True)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Linear(in_features=64, out_features=150, bias=True)
(3): Tanh()
# output: [8,150]
)
# each [8,1] column of the [8,6] (normalize2) from the previous step is expanded to [8,150] and multiplied element-wise with the [8,150] produced here
(elu)
# the six results (avvl and avl) are summed to give the trimodal [8,150] (trimodal)
# concatenate (unimodal, bimodal, trimodal) into [8,450]
(norm2): BatchNorm1d(450, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#(fusion)
(linear_1): Linear(in_features=450, out_features=50, bias=True)
(tanh)
(linear_2): Linear(in_features=50, out_features=50, bias=True)
(tanh)
(linear_3): Linear(in_features=50, out_features=2, bias=True)
# yields y_2 [8,2]
)
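A standalone sketch of the pairwise weight computation mentioned above (one shared attention layer and random inputs; the real graph11_new does this inside its forward pass):

import torch
import torch.nn as nn
import torch.nn.functional as F

batch, dim = 8, 150
a1, v1 = torch.randn(batch, dim), torch.randn(batch, dim)   # two split-out encodings
attention = nn.Linear(dim, 1)                                # shared attention layer

sa = torch.sigmoid(attention(a1)).squeeze(1)   # [8], per-sample weight for a
sv = torch.sigmoid(attention(v1)).squeeze(1)   # [8], per-sample weight for v

a = F.softmax(a1, dim=1)                       # [8, 150]
v = F.softmax(v1, dim=1)                       # [8, 150]

# dot product of the two softmaxed vectors ([8,1,150] x [8,150,1] -> [8,1,1] -> [8]),
# shifted by 0.5, inverted, and scaled by the sum of the two sigmoid weights
sav = (1 / (torch.matmul(a.unsqueeze(1), v.unsqueeze(2)).squeeze() + 0.5)) * (sa + sv)
print(sav.shape)   # torch.Size([8])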
Classifier classifier_3:
classifier3(
# input [8,150]
(norm): BatchNorm1d(150, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(drop): Dropout(p=0.5, inplace=False)
(linear_1): Linear(in_features=150, out_features=150, bias=True)
(tanh)
(drop): Dropout(p=0.5, inplace=False)
(linear_2): Linear(in_features=150, out_features=2, bias=True)
# output [8,2]
(softmax)
)
Four losses are constructed:
criterion = nn.L1Loss(reduction='sum').cuda()
adversarial_loss = torch.nn.BCELoss().cuda()  # binary cross-entropy, the usual binary-classification loss
classifier_loss = torch.nn.SoftMarginLoss().cuda()  # two-class logistic loss, mean of log(1 + exp(-y*x)), not a triplet-style margin loss
pixelwise_loss = torch.nn.L1Loss(reduction='sum').cuda()
Plus two hyper-parameters shared by the optimizers:
b1 = 0.5
b2 = 0.999
Four optimizers are constructed:
optimizer_G = torch.optim.Adam(
itertools.chain(encoder_a.parameters(), encoder_v.parameters(), encoder_l.parameters(), \
decoder_a.parameters(), decoder_l.parameters(), decoder_v.parameters()), weight_decay=decay,
lr=lr, betas=(b1, b2))
# optimizes the encoders and decoders
optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(b1, b2), weight_decay=decay)
# optimizes the discriminator
optimizer_C = torch.optim.Adam(classifier.parameters(), lr=lr, betas=(b1, b2), weight_decay=decay)
# optimizes the classifier
optimizer_E = torch.optim.Adam(
itertools.chain(encoder_a.parameters(), encoder_v.parameters(), encoder_l.parameters(),
classifier_3.parameters()), lr=lr, betas=(b1, b2), weight_decay=decay)
# optimizes the encoders together with classifier_3
The datasets are wrapped into iterators with the DataLoader structure (much like a generator with yield, presumably to avoid holding everything in memory at once).
Batch size is 8; each batch contains three (8, 50) inputs for a, v, t and one (8, 2) label.
Each of the three inputs is encoded and then decoded back to (8, 50), and the reconstruction loss rl1 is computed with pixelwise_loss (essentially an L1 norm).
An (8, 1) valid tensor (all ones) and an (8, 1) fake tensor (all zeros) are constructed, and then:
g_loss = alpha * (adversarial_loss(discriminator(l_en), valid)
                  + adversarial_loss(discriminator(v_en), valid)) \
         + (1 - alpha) * rl1
That is, the text encoding is run through the discriminator and scored with BCE against valid, and likewise the visual encoding; these two terms and the reconstruction loss are weighted by alpha and (1 - alpha) respectively (why is audio not included here?).
(Normally optimizer_G.step() would follow the backward pass on g_loss here, but as explained at the top it was moved to the very end so the code would run; I'm not sure what effect this has on the computation. A runnable sketch of this generator update follows.)
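As a sketch of this generator update (toy stand-ins, with a single shared encoder/decoder for brevity where the real code has one per modality):

import torch
import torch.nn as nn

enc = nn.Linear(50, 150)                               # stand-in encoder
dec = nn.Linear(150, 50)                               # stand-in decoder
disc = nn.Sequential(nn.Linear(150, 1), nn.Sigmoid())  # stand-in discriminator

pixelwise_loss = nn.L1Loss(reduction='sum')
adversarial_loss = nn.BCELoss()
alpha = 0.05

audio, video, text = (torch.randn(8, 50) for _ in range(3))
a_en, v_en, l_en = enc(audio), enc(video), enc(text)

valid = torch.ones(8, 1)    # all-ones target
fake = torch.zeros(8, 1)    # all-zeros target, used later for d_loss

rl1 = (pixelwise_loss(dec(a_en), audio)
       + pixelwise_loss(dec(v_en), video)
       + pixelwise_loss(dec(l_en), text))

# only text and visual encodings enter g_loss; audio appears in d_loss later
g_loss = alpha * (adversarial_loss(disc(l_en), valid)
                  + adversarial_loss(disc(v_en), valid)) + (1 - alpha) * rl1
g_loss.backward(retain_graph=True)   # optimizer_G.step() deferred to the end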
——————————————————
The three modalities' encodings are each passed through classifier_3, giving per-modality (8, 2) predictions:
c_loss = criterion(a, y) + criterion(l, y) + criterion(v, y)
i.e. each modality's prediction is compared with the label via an L1 loss, which gives the classification loss.
(Similarly, optimizer_E.step() would normally go here.)
——————————————————
Only now does audio enter the adversarial part: its encoding is run through the discriminator and scored with BCE against valid as the real loss real_loss, while the text and visual encodings are scored against fake and their sum forms fake_loss (so audio is treated as the real distribution and the other two as fake?).
This gives the discriminator loss
d_loss = 0.5 * (real_loss + fake_loss)
(optimizer_D.step())
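A sketch of this discriminator update (random stand-in encodings; whether the real code detaches the encodings for fake_loss is my assumption):

import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(150, 1), nn.Sigmoid())
adversarial_loss = nn.BCELoss()
a_en, v_en, l_en = (torch.randn(8, 150) for _ in range(3))
valid, fake = torch.ones(8, 1), torch.zeros(8, 1)

# audio encodings are scored as "real", text and visual ones as "fake"
real_loss = adversarial_loss(disc(a_en), valid)
fake_loss = (adversarial_loss(disc(l_en.detach()), fake)
             + adversarial_loss(disc(v_en.detach()), fake))
d_loss = 0.5 * (real_loss + fake_loss)
d_loss.backward()   # optimizer_D.step() is deferred to the end of the iteration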
——————————————————
The three modality encodings are each unsqueezed to [8, 1, 150] and concatenated into [8, 3, 150] (the so-called fusion).
This is classified by classifier (although inside that classifier the fused tensor is split apart again),
yielding [8, 2], which gets an L1 loss against y.
(Only at this point are all the deferred .step() calls executed, in order; a sketch follows.)
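A sketch of this final fusion classification plus the deferred updates (a toy classifier stands in for graph11_new; only one of the four optimizers is shown):

import torch
import torch.nn as nn

# stand-in classifier; the real graph11_new splits the [8, 3, 150] input
# back into three [8, 150] slices internally
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 150, 2))
optimizer_C = torch.optim.Adam(classifier.parameters(), lr=1e-3)
criterion = nn.L1Loss(reduction='sum')

a_en, v_en, l_en = (torch.randn(8, 150) for _ in range(3))
y = torch.randint(0, 2, (8, 2)).float()

fusion = torch.cat([a_en.unsqueeze(1), v_en.unsqueeze(1), l_en.unsqueeze(1)], dim=1)
print(fusion.shape)                     # torch.Size([8, 3, 150])

loss = criterion(classifier(fusion), y)
loss.backward()                         # the last backward of the iteration

# only now do the deferred steps run; in the real loop that is
# optimizer_G, optimizer_E, optimizer_D and optimizer_C, in order
optimizer_C.step()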