First, I rebuilt the environment in order to drop distributed training (I'd moved to a junky laptop anyway, so a fresh start was overdue):
cuda11.1
python3.9
torch1.9.1
torchvision0.10.1
torchaudio0.9.1
Actually this was because ChatGPT told me I had to reinstall torch; it turned out all that was needed was replacing the nccl line with the one below (the root cause being that Windows doesn't support nccl):
torch.distributed.init_process_group("gloo")
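A slightly more portable sketch (assuming the rest of the DDP setup stays unchanged) would pick the backend at runtime instead of hard-coding gloo:

import torch.distributed as dist

# fall back to gloo where nccl is unavailable (e.g. on Windows)
backend = "nccl" if dist.is_nccl_available() else "gloo"
dist.init_process_group(backend)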
Then set dataset_root in baseline.yaml to the output directory produced by preprocessing the dataset with pretreatment, usually named <dataset name>-pkl.
trainer_cfg:
'enable_float16': True,
'with_test': True,
'fix_BN': False,
# whether to freeze the BN layers
'log_iter': 100,
'restore_ckpt_strict': True,
# whether to enforce strict checking when restoring a checkpoint; this guarantees the restored model matches the one that was trained, but may add restore time and memory
'optimizer_reset': False,
'scheduler_reset': False,
'restore_hint': 0,
# which saved iteration to restore the model from (0 = start from scratch); the appropriate step count has to be picked by hand
'save_iter': 10000,
'save_name': 'Baseline',
'sync_BN': True,
# whether to use synchronized BN layers
'total_iter': 60000,
'sampler': {
'batch_shuffle': True,
'batch_size': [2, 4],
'frames_num_fixed': 30,
'frames_num_max': 50,
'frames_num_min': 25,
'sample_type': 'fixed_unordered',
# all_ordered: use the whole sequence for testing, feeding frames in their natural order
# fixed_unordered: use a fixed number of frames and randomly shuffle their order
# frames_all_limit=720 caps the number of sampled frames to avoid running out of memory
# metric is euc (Euclidean distance) or cos (cosine similarity)
'type': 'TripletSampler'},
'transform': [
{'type': 'BaseSilCuttingTransform',
'img_w': 64}]}
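batch_size: [2, 4] is a (P, K) pair: P identities per batch and K sequences per identity. As a rough sketch of what a TripletSampler-style PK sampler does (pk_batch and label_to_seqs are hypothetical names, not OpenGait's actual code):

import random

def pk_batch(label_to_seqs, p=2, k=4):
    # pick P identities, then K sequences each -> P*K samples per batch
    ids = random.sample(sorted(label_to_seqs), p)
    return [(pid, seq) for pid in ids
            for seq in random.choices(label_to_seqs[pid], k=k)]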
loss_cfg:
loss_term_weight: 1.0
# weight of this loss term, i.e. how much it contributes to the total loss
margin: 0.2
# margin of the triplet loss: the minimum gap required between the anchor-negative and anchor-positive distances; the loss is 0 once (d_an - d_ap) >= margin, otherwise it is margin - (d_an - d_ap)
type: TripletLoss
log_prefix: triplet
loss_term_weight: 0.1
scale: 16
# scaling factor applied to the logits in the cross-entropy loss
type: CrossEntropyLoss
log_prefix: softmax
log_accuracy: true
# whether to log the training accuracy
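For reference, the hinge that a margin-0.2 triplet loss computes on each (anchor, positive, negative) distance pair looks like this (a sketch, not OpenGait's exact batch-all implementation):

import torch.nn.functional as F

def triplet_hinge(d_ap, d_an, margin=0.2):
    # zero once the negative is at least `margin` farther away than the positive
    return F.relu(d_ap - d_an + margin)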
model_cfg:
'model': 'Baseline',
'backbone_cfg': {
'in_channels': 1,
'layers_cfg': ['BC-64', 'BC-64', 'M', 'BC-128', 'BC-128', 'M', 'BC-256', 'BC-256'], 'type': 'Plain'},
'SeparateFCs': {
'in_channels': 256, 'out_channels': 256, 'parts_num': 31},
'SeparateBNNecks': {
'class_num': 74, 'in_channels': 256, 'parts_num': 31},
'bin_num': [16, 8, 4, 2, 1]}
# bin_num configures the Horizontal Pooling Pyramid at the end of the model: the final feature map is split horizontally into 16, 8, 4, 2 and 1 strips, and each strip is pooled into one part vector, 16+8+4+2+1 = 31 parts in total (which is where parts_num: 31 comes from)
data_cfg:
{'dataset_name': 'CASIA-B',
'dataset_root': '../datasets/CASIA-B-pkl',
'num_workers': 1,
'dataset_partition': '../datasets/CASIA-B/CASIA-B.json',
# IDs 001-074 are the training set, 075-124 the test set
'remove_no_gallery': False,
# whether to drop probe samples that have no matching gallery sample, so the test set only contains meaningful samples
'cache': False,
'test_dataset_name': 'CASIA-B'}
Train Pid List --------
[001, 002, ..., 074]
Test Pid List --------
[075, 076, ..., 124]
'lr': 0.1, 'momentum': 0.9, 'solver': 'SGD', 'weight_decay': 0.0005
'gamma': 0.1, 'milestones': [20000, 40000], 'scheduler': 'MultiStepLR'
# the steps at which the learning rate is updated during training.
# here the initial learning rate is kept until the first milestone (20000 steps), then reduced, and reduced again at the second milestone (40000 steps); only two milestones are used in this example, but more can be added as needed.
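This corresponds to the standard PyTorch setup (a sketch with a dummy parameter):

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = SGD(params, lr=0.1, momentum=0.9, weight_decay=0.0005)
scheduler = MultiStepLR(optimizer, milestones=[20000, 40000], gamma=0.1)
# lr = 0.1 on iters [0, 20000), 0.01 on [20000, 40000), 0.001 afterwards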
Parameters Count: 3.77914M
Because my junky laptop kept running out of VRAM, I lowered the training batch size to 2 and 4 (identities per batch and samples per identity), and then the gradients exploded, great.
main.py
First the cfg configuration is initialized.
msg_mgr = get_msg_mgr()
Creates the object that prints and logs output messages.
__init__: initializes the MessageManager instance, creating an ordered dict info_dict, a list writer_hparams of summary types to record, and a time variable for timing.
init_manager: sets up a logger and a TensorBoard writer writer, wires them together, and also writes the log to a file; it stores iteration, log_iter and save_path as instance attributes.
init_logger: initializes the logger, optionally writing the log to a file.
append: appends new information to info_dict.
flush: flushes the buffers of info_dict and the TensorBoard writer.
write_to_tensorboard: writes summaries to TensorBoard.
log_training_info: prints training info and statistics.
reset_time: resets the timer.
train_step: runs one training step.
log_debug: logs a debug-level message.
log_info: logs an info-level message.
log_warning: logs a warning-level message.
Model = getattr(models, model_cfg['model'])
model = Model(cfgs, training)
# getattr() fetches the matching model class from the models module; the class is then instantiated with the parsed configs, and the resulting model object is returned. That is the whole model initialization.
if training and cfgs['trainer_cfg']['sync_BN']:
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
# whether to synchronize BN across multiple GPUs
if cfgs['trainer_cfg']['fix_BN']:
model.fix_BN()
# freeze BN
model = get_ddp_module(model)
# wrap the model as a distributed (DDP) model
msg_mgr.log_info(params_count(model))
msg_mgr.log_info("Model Initialization Finished!")
ipts = model.inputs_pretreament(inputs)
The model:
Baseline(
# the inputs above are unpacked into 5 pieces
# ipts is a list wrapping a [4, 30, 64, 44] tensor -> sils [4, 1, 30, 64, 44]
# labs is a 1-D tensor = [63, 61, 63, 61] (length = batch size)
# seqL is None
# the input x [4, 1, 30, 64, 44] is reshaped to [120, 1, 64, 44] and fed into Plain's feature
(Backbone): SetBlockWrapper(
(forward_block): Plain(
(feature): Sequential(
(0): BasicConv2d(
(conv): Conv2d(1, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
)
# -> [120, 64, 64, 44]
(1): LeakyReLU(negative_slope=0.01, inplace=True)
(2): BasicConv2d(
(conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
)
(3): LeakyReLU(negative_slope=0.01, inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
# torch.Size([120, 64, 32, 22])
(5): BasicConv2d(
(conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
)
# torch.Size([120, 128, 32, 22])
(6): LeakyReLU(negative_slope=0.01, inplace=True)
(7): BasicConv2d(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
)
(8): LeakyReLU(negative_slope=0.01, inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
# torch.Size([120, 128, 16, 11])
(10): BasicConv2d(
(conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
)
# torch.Size([120, 256, 16, 11])
(11): LeakyReLU(negative_slope=0.01, inplace=True)
(12): BasicConv2d(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
)
(13): LeakyReLU(negative_slope=0.01, inplace=True)
)
)
)
# output out [120, 256, 16, 11] is reshaped back to [4, 256, 30, 16, 11]
(TP): PackSequenceWrapper()
# temporal max pooling gives x [4, 256, 16, 11]
(HPP): HorizontalPoolingPyramid()
# 1
# reshape x to torch.Size([4, 256, 16, 11])
# max pool + mean pool -> [4, 256, 16]
# 2
# reshape x to torch.Size([4, 256, 8, 22])
# max pool + mean pool -> [4, 256, 8]
# 3
# reshape x to torch.Size([4, 256, 4, 44])
# max pool + mean pool -> [4, 256, 4]
# 4
# reshape x to torch.Size([4, 256, 2, 88])
# max pool + mean pool -> [4, 256, 2]
# 5
# reshape x to torch.Size([4, 256, 1, 176])
# max pool + mean pool -> [4, 256, 1]
# concatenated output feat [4, 256, 31]
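# a minimal sketch of the HPP computation traced above (not the library's exact code):
import torch

def hpp(x, bin_nums=(16, 8, 4, 2, 1)):            # x: [4, 256, 16, 11]
    n, c = x.shape[:2]
    feats = []
    for b in bin_nums:
        z = x.view(n, c, b, -1)                   # e.g. [4, 256, 8, 22] for b=8
        feats.append(z.max(-1)[0] + z.mean(-1))   # max pool + mean pool per bin
    return torch.cat(feats, dim=-1)               # [4, 256, 16+8+4+2+1] = [4, 256, 31]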
(FCs): SeparateFCs()
# permuted to [31, 4, 256], multiplied by a learned [31, 256, 256] weight to get [31, 4, 256], then permuted back to [4, 256, 31]
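# sketch of SeparateFCs as one batched matmul (weight is the learned [31, 256, 256]):
def separate_fcs(x, weight):                      # x: [4, 256, 31]
    out = x.permute(2, 0, 1).matmul(weight)       # [31, 4, 256] @ [31, 256, 256]
    return out.permute(1, 2, 0)                   # back to [4, 256, 31]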
(BNNecks): SeparateBNNecks(
# flattened to [4, 7936] (256 x 31)
(bn1d): BatchNorm1d(7936, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# reshaped back to [4, 256, 31], then permuted to feature [31, 4, 256] and multiplied by a learned [31, 256, 74] weight to get logits [31, 4, 74]
# finally permuted back to feature [4, 256, 31] and logits [4, 74, 31]
)
(loss_aggregator): LossAggregator(
(losses): ModuleDict(
(triplet): TripletLoss()
# the 31-dim loss comes out of the loss function
# an info dict is returned as well:
# Odict([('loss', 31-dim),
#        ('hard_loss', 31-dim),
#        ('loss_num', 31-dim),
#        ('mean_dist', 31-dim)])
(softmax): CrossEntropyLoss()
loss = loss.mean() * loss_func.loss_term_weight
# the loss_term_weight: 0.1 for cross-entropy, 1.0 for triplet
loss_sum += loss
)
)
)
The forward pass returns a retval:
{'training_feat':
    {'triplet':
        {'embeddings': [4, 256, 31],
        'labels': tensor([63, 61, 63, 61])},
    'softmax':
        {'logits': [4, 74, 31],
        'labels': tensor([63, 61, 63, 61])}},
'visual_summary':
    {'image/sils': [120, 1, 64, 44]},
'inference_feat':
    {'embeddings': [4, 256, 31]}}
Next up, GaitGL:
DDPPassthrough(
# the inputs are again unpacked into 5 pieces
# ipts is a list wrapping a [4, 30, 64, 64] tensor -> sils [4, 1, 30, 64, 64]
# labs is a 1-D tensor = [30, 30, 62, 62]
# seqL is None
# the input is x [4, 1, 30, 64, 64]
(module): GaitGL(
# input sils [4, 1, 30, 64, 64]
(conv3d): Sequential(
(0): BasicConv3d(
(conv3d): Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
# output [4, 32, 30, 64, 64]
(1): LeakyReLU(negative_slope=0.01, inplace=True)
)
(LTA): Sequential(
(0): BasicConv3d(
(conv3d): Conv3d(32, 32, kernel_size=(3, 1, 1), stride=(3, 1, 1), bias=False)
)
# output x [4, 32, 10, 64, 64]
(1): LeakyReLU(negative_slope=0.01, inplace=True)
)
(GLConvA0): GLConv(
(global_conv3d): BasicConv3d(
(conv3d): Conv3d(32, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
# output gob_feat [4, 64, 10, 64, 64]
if self.halving == 0:  # halving is 3 here, so this branch is not taken
(local_conv3d): BasicConv3d(
(conv3d): Conv3d(32, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
else:
# the x [4, 32, 10, 64, 64] above is split into lcl_feat,
# a tuple of 8 tensors of shape [4, 32, 10, 8, 64]
# 8 = int(x.size(3) // 2**self.halving)
# each chunk goes through local_conv3d to give [4, 64, 10, 8, 64]; the results are concatenated into lcl_feat [4, 64, 10, 64, 64]
feat = F.leaky_relu(gob_feat) + F.leaky_relu(lcl_feat)
# i.e. [4, 64, 10, 64, 64]
)
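# the GLConv forward above, as a sketch (assuming halving=3 as noted):
import torch
import torch.nn.functional as F

def glconv(x, global_conv3d, local_conv3d, halving=3):
    gob_feat = global_conv3d(x)                   # [4, 64, 10, 64, 64]
    split = x.size(3) // 2 ** halving             # 64 // 8 = 8
    lcl_feat = torch.cat([local_conv3d(s) for s in x.split(split, dim=3)], dim=3)
    return F.leaky_relu(gob_feat) + F.leaky_relu(lcl_feat)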
(MaxPool0): MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2), padding=0, dilation=1, ceil_mode=False)
# gives [4, 64, 10, 32, 32]
(GLConvA1): GLConv(
(global_conv3d): BasicConv3d(
(conv3d): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
# gives [4, 128, 10, 32, 32]
# as above: split first, then feed each chunk to the local conv
(local_conv3d): BasicConv3d(
(conv3d): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
# then concatenated as before; the two activations are summed, giving [4, 128, 10, 32, 32]
)
(GLConvB2): GLConv(
(global_conv3d): BasicConv3d(
(conv3d): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
(local_conv3d): BasicConv3d(
(conv3d): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
# a similar operation again, giving [4, 128, 10, 64, 32] (the height doubles here)
)
(TP): PackSequenceWrapper()
# temporal max pooling gives x [4, 128, 64, 32]
(HPP): GeMHPP()
# x is GeM-pooled to [4, 128, 64, 1] and squeezed to x [4, 128, 64]
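# GeM ("generalized mean") pooling as a sketch; p is a learnable scalar in
# OpenGait (default assumed ~6.5), and p=1 would be plain average pooling:
import torch.nn.functional as F

def gem(x, p=6.5, eps=1e-6):                      # x: [4, 128, 64, 32]
    # -> [4, 128, 64, 1], later squeezed to [4, 128, 64]
    return F.avg_pool2d(x.clamp(min=eps).pow(p), (1, x.size(-1))).pow(1. / p)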
(Head0): SeparateFCs()
# x is permuted to [64, 4, 128], multiplied by a learned [64, 128, 128] weight to get [64, 4, 128], then permuted to gait [4, 128, 64]
(Bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# gait is batch-normed into embed = bnft [4, 128, 64]
(Head1): SeparateFCs()
# bnft gets the same treatment as in Head0, multiplied by a learned [64, 128, 74] weight to give logi [4, 74, 64]
(loss_aggregator): LossAggregator(
(losses): ModuleDict(
(triplet): TripletLoss()
(softmax): CrossEntropyLoss()
)
)
)
)
{'training_feat':
    {'triplet':
        {'embeddings': embed [4, 128, 64],
        'labels': tensor([30, 30, 62, 62])},
    'softmax':
        {'logits': logi [4, 74, 64],
        'labels': tensor([30, 30, 62, 62])}},
'visual_summary':
    {'image/sils': transposed sils [120, 1, 64, 64]},
'inference_feat':
    {'embeddings': embed [4, 128, 64]}}
It doesn't run out of the box, because you hit issues like a batch picking up an empty sample and indexing out of range.
Changes (all of them presumably necessary):
data_in_use: [false, false, true, true]→data_in_use: [true, false, false, false]
sample_type: fixed_unordered→sample_type: fixed_ordered
added frames_skip_num: 0, which wasn't there before
made trainer_cfg and evaluator_cfg use the same transform
CASIA-B* won't work as dataset_name, because * can't appear in a Windows path
DDPPassthrough(
(module): Segmentation(
# 5 inputs
# ipts is a list wrapping two tensors:
# [8, 30, 3, 128, 128] -> rgbs [240, 3, 128, 128]
# [8, 30, 128, 128] -> sils [240, 1, 128, 128]
# labs is a 1-D tensor of length 128
# typs is a list of 'nm-06', 'bg-01', 'cl-01', etc., length 128
# vies is a list of '072', '090', '180', '108', etc., length 128
# seqL is None
(Backbone): U_Net(
# input x [240, 3, 128, 128] (rgbs)
# the model can be frozen here, in which case no gradients are computed
(Conv1): ConvBlock(
(conv): Sequential(
(0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)# x1[240, 16, 128, 128]
(Maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
# [240, 16, 64, 64]
(Conv2): ConvBlock(
(conv): Sequential(
(0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)# x2[240, 32, 64, 64]
(Maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
# [240, 32, 32, 32]
(Conv3): ConvBlock(
(conv): Sequential(
(0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)# x3[240, 64, 32, 32]
(Maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
# [240, 64, 16, 16]
(Conv4): ConvBlock(
(conv): Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)# x4[240, 128, 16, 16]
# end of the optionally frozen part; from here on gradients are always computed
(Up4): UpConv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
# [240, 128, 32, 32]
(1): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)# d4[240, 64, 32, 32]
d4 = torch.cat((x3, d4), dim=1)
# d4[240, 128, 32, 32]
(Up_conv4): ConvBlock(
(conv): Sequential(
(0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)# d4[240, 64, 32, 32]
(Up3): UpConv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)# d3[240, 32, 64, 64]
d3 = torch.cat((x2, d3), dim=1)
# d3[240, 64, 64, 64]
(Up_conv3): ConvBlock(
(conv): Sequential(
(0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)# d3[240, 32, 64, 64]
(Up2): UpConv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)# d2[240, 16, 128, 128]
d2 = torch.cat((x1, d2), dim=1)
# d2[240, 32, 128, 128]
(Up_conv2): ConvBlock(
(conv): Sequential(
(0): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)# d2[240, 16, 128, 128]
(Conv_1x1): Conv2d(16, 1, kernel_size=(1, 1), stride=(1, 1))
)# d1[240, 1, 128, 128]
(loss_aggregator): LossAggregator(
(losses): ModuleDict(
(bce): BinaryCrossEntropyLoss()
)
)
)
)
{'training_feat': {
'bce': {
'logits': [240, 1, 128, 128],
'labels': [240, 1, 128, 128]}},
'visual_summary': {
'image/sils': [240, 1, 128, 128],
'image/logits': [240, 1, 128, 128],
'image/pred': [240, 1, 128, 128]},
'inference_feat': {
'pred': [240, 1, 128, 128],
'mask': [240, 1, 128, 128]}}
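The bce term above is per-pixel binary cross-entropy between the predicted logits and the silhouette labels; as a sketch (not necessarily OpenGait's exact reduction):

import torch.nn.functional as F

def bce_term(logits, labels):
    # logits, labels: [240, 1, 128, 128]
    return F.binary_cross_entropy_with_logits(logits, labels.float())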
model_cfg is missing a kernel_size; following phase2_gaitedge.yaml I added kernel_size: 3.
Since the segmentation model hasn't been trained yet, restore_hint in trainer_cfg is set to 0 for now.
DDPPassthrough(
# inputs
# ipts is a list wrapping 3 things:
# ratios [16, 30]
# [16, 30, 3, 128, 128] -> rgbs [480, 3, 128, 128]
# [16, 30, 128, 128] -> sils [480, 1, 128, 128]
# labs is a 1-D tensor of length 16
# seqL is None
(module): GaitEdge(
(Backbone): U_Net(
# input [480, 3, 128, 128]
(Conv1): ConvBlock(
(conv): Sequential(
(0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): SyncBatchNorm(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): SyncBatchNorm(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
(Maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(Conv2): ConvBlock(
(conv): Sequential(
(0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): SyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): SyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
(Maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(Conv3): ConvBlock(
(conv): Sequential(
(0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
(Maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(Conv4): ConvBlock(
(conv): Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
(Up4): UpConv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)
(Up_conv4): ConvBlock(
(conv): Sequential(
(0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
(Up3): UpConv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): SyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)
(Up_conv3): ConvBlock(
(conv): Sequential(
(0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): SyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): SyncBatchNorm(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
(Up2): UpConv(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=nearest)
(1): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): SyncBatchNorm(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
)
)
(Up_conv2): ConvBlock(
(conv): Sequential(
(0): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): SyncBatchNorm(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): SyncBatchNorm(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
(Conv_1x1): Conv2d(16, 1, kernel_size=(1, 1), stride=(1, 1))
)
# output logis [480, 1, 128, 128]
# sigmoid gives logits, and rounding gives the mask (this step exists in the earlier pipelines too)
self.is_edge is false here; self.align is checked next and turns out true:
(gait_align): GaitAlign(
# inputs: logits, mask, ratios -> w_h_ratio [480, 1]
# summing the mask along the width gives h_sum [480, 1, 128]
_ = (h_sum >= 1).float().cumsum(axis=-1) # [480, 1, 128]
h_top = (_ == 0).float().sum(-1) # [480, 1]
h_bot = (_ != torch.max(_, dim=-1, keepdim=True)
[0]).float().sum(-1) + 1. # [480, 1]
# summing the mask along the height gives w_sum [480, 1, 128]
w_cumsum = w_sum.cumsum(axis=-1) # [480, 1, 128]
w_h_sum = w_sum.sum(-1).unsqueeze(-1) # [480, 1, 1]
w_center = (w_cumsum < w_h_sum / 2.).float().sum(-1) # [480, 1]
p1 = self.W - self.H * w_h_ratio
# self.W = 44, self.H = 64
p1 = p1 / 2.
p1 = torch.clamp(p1, min=0) # [n, c]
t_w = w_h_ratio * self.H / w
p2 = p1 / t_w # [n, c]
(Pad): ZeroPad2d(padding=(22, 22, 0, 0), value=0.0)
# feeding logits [480, 1, 128, 128] through gives feature_map [480, 1, 128, 172]
w_left = w_center - width / 2 - p2 # [n, c]
w_right = w_center + width / 2 + p2 # [n, c]
w_left = torch.clamp(w_left, min=0., max=w+2*width_p)
w_right = torch.clamp(w_right, min=0., max=w+2*width_p)
boxes = torch.cat([w_left, h_top, w_right, h_bot], dim=-1)
# index of bbox in batch
box_index = torch.arange(n, device=feature_map.device)
rois = torch.cat([box_index.view(-1, 1), boxes], -1) # [480, 5]
(RoiPool): RoIAlign(output_size=(64, 44), spatial_scale=1, sampling_ratio=-1, aligned=False)
# with feature_map and rois this yields [480, 1, 64, 44] -> cropped_logits [16, 30, 64, 44],
# which (together with labs) is the input to the GaitGL part below
)
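# a toy check of the h_top / h_bot cumsum trick above: the cumsum stays 0 until
# the first foreground row and stops growing after the last one
import torch
h_sum = torch.tensor([[0., 0., 1., 1., 0.]])      # toy 5-row mask profile
c = (h_sum >= 1).float().cumsum(-1)               # [0, 0, 1, 2, 2]
h_top = (c == 0).float().sum(-1)                  # 2. -> first foreground row
h_bot = (c != c.max(-1, keepdim=True)[0]).float().sum(-1) + 1.  # 4. -> one past the last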
(conv3d): Sequential(
# cropped_logits->sils[16, 1, 30, 64, 44]
(0): BasicConv3d(
(conv3d): Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
(1): LeakyReLU(negative_slope=0.01, inplace=True)
)# [16, 32, 30, 64, 44]
(LTA): Sequential(
(0): BasicConv3d(
(conv3d): Conv3d(32, 32, kernel_size=(3, 1, 1), stride=(3, 1, 1), bias=False)
)
(1): LeakyReLU(negative_slope=0.01, inplace=True)
)# x[16, 32, 10, 64, 44]
(GLConvA0): GLConv(
(global_conv3d): BasicConv3d(
(conv3d): Conv3d(32, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
# gob_feat[16, 64, 10, 64, 44]
h = x.size(3)
split_size = int(h // 2**self.halving)
lcl_feat = x.split(split_size, 3)  # 8 chunks of [16, 32, 10, 8, 44]
(local_conv3d): BasicConv3d(
(conv3d): Conv3d(32, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)  # each chunk gives [16, 64, 10, 8, 44]
# concatenated into lcl_feat [16, 64, 10, 64, 44]
feat = F.leaky_relu(gob_feat) + F.leaky_relu(lcl_feat)
)# [16, 64, 10, 64, 44]
(MaxPool0): MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2), padding=0, dilation=1, ceil_mode=False)
# [16, 64, 10, 32, 22]
(GLConvA1): GLConv(
(global_conv3d): BasicConv3d(
(conv3d): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
(local_conv3d): BasicConv3d(
(conv3d): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
)# [16, 128, 10, 32, 22]
(GLConvB2): GLConv(
(global_conv3d): BasicConv3d(
(conv3d): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
(local_conv3d): BasicConv3d(
(conv3d): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
)
)# [16, 128, 10, 64, 22]
(TP): PackSequenceWrapper()
# temporal max pooling gives [16, 128, 64, 22]
(HPP): GeMHPP()
# GeM pooling gives [16, 128, 64]
(Head0): SeparateFCs()
# -> [64, 16, 128]
# multiplied by a learned [64, 128, 128] weight to get [64, 16, 128] -> gait [16, 128, 64]
(Bn): SyncBatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(Head1): SeparateFCs()
# gives logi [16, 74, 64]
(loss_aggregator): LossAggregator(
(losses): ModuleDict(
(triplet): TripletLoss()
(bce): BinaryCrossEntropyLoss()
(softmax): CrossEntropyLoss()
)
)
)
)
{'training_feat': {
    'triplet': {
        'embeddings': [16, 128, 64],
        'labels': tensor([24, 24, 24, 17, 17, 17, 2, 2, 24, 43, 43, 17, 2, 43, 2, 43])},
    'softmax': {
        'logits': [16, 74, 64],
        'labels': tensor([24, 24, 24, 17, 17, 17, 2, 2, 24, 43, 43, 17, 2, 43, 2, 43])},
    'bce': {
        'logits': [480, 1, 128, 128],
        'labels': [480, 1, 128, 128]}},
'visual_summary': {
    'image/sils': [480, 1, 64, 44],
    'image/roi': [480, 1, 64, 44]},
'inference_feat': {
    'embeddings': [16, 128, 64]}}
This is just phase2_e2e.yaml with edge in model_cfg flipped to true.
DDPPassthrough(
# inputs
# ipts is a list wrapping 3 things:
# ratios [16, 30]
# [16, 30, 3, 128, 128] -> rgbs [480, 3, 128, 128]
# [16, 30, 128, 128] -> sils [480, 1, 128, 128]
# labs is a 1-D tensor of length 16
# seqL is None
(module): GaitEdge(
(Backbone): U_Net(
# input [480, 3, 128, 128]; the module dump is identical to the phase2_e2e one above
)
# output logis [480, 1, 128, 128]
# sigmoid gives logits, and rounding gives the mask (as before)
self.is_edge is true here, so the sils get the following preprocessing:
dilated_mask = (morph.dilation(sils, self.kernel.to(sils.device)).detach()) > 0.5
# morphological dilation  [480, 1, 128, 128]
eroded_mask = (morph.erosion(sils, self.kernel.to(sils.device)).detach()) > 0.5
# morphological erosion  [480, 1, 128, 128]
edge_mask = dilated_mask ^ eroded_mask
# [480, 1, 128, 128]
new_logits = edge_mask*logits+eroded_mask*sils
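# a toy example of the composition above: trust the GT silhouette inside the
# eroded region, and use the network's logits only on the edge band
import torch
sils   = torch.tensor([0., 1., 1., 1., 0.])
logits = torch.tensor([.1, .6, .9, .7, .2])
dil    = torch.tensor([1., 1., 1., 1., 1.]).bool()   # stand-in dilation result
ero    = torch.tensor([0., 0., 1., 0., 0.]).bool()   # stand-in erosion result
edge   = dil ^ ero                                   # the boundary band
new_logits = edge * logits + ero * sils              # [.1, .6, 1., .7, .2]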
self.align is true, as before:
(gait_align): GaitAlign(
# inputs: new_logits, sils, ratios -> w_h_ratio [480, 1]
# the rest of the computation is identical to the phase2_e2e dump above,
# producing cropped_logits [16, 30, 64, 44]
)
# (conv3d) through (Head1) and (loss_aggregator) are identical to the
# phase2_e2e dump above: GaitGL runs on cropped_logits -> sils [16, 1, 30, 64, 44],
# producing embed [16, 128, 64] and logi [16, 74, 64]
)
)
{'training_feat': {
    'triplet': {
        'embeddings': [16, 128, 64],
        'labels': tensor([24, 24, 24, 17, 17, 17, 2, 2, 24, 43, 43, 17, 2, 43, 2, 43])},
    'softmax': {
        'logits': [16, 74, 64],
        'labels': tensor([24, 24, 24, 17, 17, 17, 2, 2, 24, 43, 43, 17, 2, 43, 2, 43])},
    'bce': {
        'logits': [480, 1, 128, 128],
        'labels': [480, 1, 128, 128]}},
'visual_summary': {
    'image/sils': [480, 1, 64, 44],
    'image/roi': [480, 1, 64, 44]},
'inference_feat': {
    'embeddings': [16, 128, 64]}}