mmaction2 Experiment Notes 3: The Action Recognition Training Process

The previous post looked at how video data are loaded and processed. This one walks through the relevant parts of mmaction2-master/tools/train.py to trace the video training process.

1. First, the main() function in tools/train.py calls train_model() to train the model.
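
For reference, the end of main() looks roughly like this (abridged; argument names vary slightly between mmaction2 versions):

# tools/train.py (abridged): build the model and dataset from the config,
# then hand everything over to train_model()
model = build_model(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
datasets = [build_dataset(cfg.data.train)]
train_model(
    model,
    datasets,
    cfg,
    distributed=distributed,
    validate=args.validate,
    timestamp=timestamp,
    meta=meta)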

2. Then, inside train_model()

The most important part of this function is creating the runner and calling runner.run() to start the training loop. For single-GPU (non-omnisource) training, the runner is an instance of EpochBasedRunner (itself a subclass of BaseRunner).

Runner = OmniSourceRunner if cfg.omnisource else EpochBasedRunner  # EpochBasedRunner in the common case
runner = Runner(  # hand the model and optimizer over to the runner
    model,
    optimizer=optimizer,
    work_dir=cfg.work_dir,
    logger=logger,
    meta=meta)
runner.run(data_loaders, cfg.workflow, cfg.total_epochs, **runner_kwargs)

The rest of train_model() handles run configuration and training strategies: building the data loaders, wrapping the model, building the optimizer, and registering hooks.
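
Roughly, the setup around the runner looks like this (abridged from mmaction2's apis/train.py; exact arguments differ across versions):

# apis/train.py (abridged): what happens before runner.run()
data_loaders = [build_dataloader(ds, **dataloader_setting) for ds in dataset]
model = MMDataParallel(model, device_ids=cfg.gpu_ids)  # single-GPU wrapper
optimizer = build_optimizer(model, cfg.optimizer)
runner = EpochBasedRunner(
    model, optimizer=optimizer, work_dir=cfg.work_dir, logger=logger, meta=meta)
# hooks implement the lr schedule, optimizer step, checkpointing and logging
runner.register_training_hooks(cfg.lr_config, optimizer_config,
                               cfg.checkpoint_config, cfg.log_config)
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)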

3. runner.run() is implemented in the subclass EpochBasedRunner

The key training-related code of run() is shown below; it resolves an epoch_runner() function (e.g. self.train) and calls it for each epoch:

class EpochBasedRunner(BaseRunner):

    def run(self,
            data_loaders: List[DataLoader],
            workflow: List[Tuple[str, int]],
            max_epochs: Optional[int] = None,
            **kwargs):

        work_dir = self.work_dir if self.work_dir is not None else 'NONE'
        # the outer training loop over epochs
        while self.epoch < self._max_epochs:  # self.epoch is incremented by one per training epoch
            for i, flow in enumerate(workflow):  # each flow is a (mode, epochs) pair, e.g. ('train', 1)
                mode, epochs = flow
                if isinstance(mode, str):  # e.g. 'train' resolves to self.train
                    if not hasattr(self, mode):
                        raise ValueError(
                            f'runner has no method named "{mode}" to run an '
                            'epoch')
                    epoch_runner = getattr(self, mode)
                else:
                    raise TypeError(
                        'mode in workflow must be a str, but got {}'.format(
                            type(mode)))

                for _ in range(epochs):  # run `epochs` epochs in this mode
                    if mode == 'train' and self.epoch >= self._max_epochs:
                        break
                    epoch_runner(data_loaders[i], **kwargs)  # entry point for one training epoch

        time.sleep(1)  # wait for some hooks like loggers to finish
        self.call_hook('after_run')
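
In a typical mmaction2 config the workflow contains a single training phase, so run() simply trains for cfg.total_epochs epochs. A short sketch of how workflow pairs up with the data loaders (standard mmcv semantics):

# one (mode, epochs) pair per data loader, in matching order
workflow = [('train', 1)]                  # train one epoch per cycle
# workflow = [('train', 1), ('val', 1)]    # would alternate one train and one
#                                          # val epoch, requiring a second loader
runner.run(data_loaders, workflow, max_epochs=50)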

4. Calling epoch_runner() executes the train() method of EpochBasedRunner, which loops over the data loader and calls self.run_iter() to process one batch at a time

    def train(self, data_loader, **kwargs):
        self.model.train()
        self.mode = 'train'
        self.data_loader = data_loader
        self._max_iters = self._max_epochs * len(self.data_loader)
        self.call_hook('before_train_epoch')
        time.sleep(2)  # Prevent possible deadlock during epoch transition
        for i, data_batch in enumerate(self.data_loader):
            self.data_batch = data_batch
            self._inner_iter = i
            self.call_hook('before_train_iter')
            self.run_iter(data_batch, train_mode=True, **kwargs)  # run one iteration
            self.call_hook('after_train_iter')
            del self.data_batch
            self._iter += 1

        self.call_hook('after_train_epoch')
        self._epoch += 1
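
The call_hook() calls surrounding the loop are where the learning-rate schedule, logging, and the optimizer step plug in. As an illustration (the hook class below is hypothetical, not part of mmaction2), a custom hook can tap into the same events:

from mmcv.runner import HOOKS, Hook


@HOOKS.register_module()
class LossPrintHook(Hook):  # hypothetical hook, for illustration only
    """Print the parsed loss after every training iteration."""

    def after_train_iter(self, runner):
        # runner.outputs is populated by run_iter() in step 5 below
        print(f"iter {runner.iter}: loss={runner.outputs['log_vars']['loss']:.4f}")

It would be enabled by adding dict(type='LossPrintHook') to custom_hooks in the config.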

5. run_iter() is executed next; its model.train_step() call performs one full training iteration

    def run_iter(self, data_batch: Any, train_mode: bool, **kwargs):
        if self.batch_processor is not None:
            outputs = self.batch_processor(
                self.model, data_batch, train_mode=train_mode, **kwargs)
        elif train_mode:
            outputs = self.model.train_step(data_batch, self.optimizer,  # run the model's own training step
                                            **kwargs)
        else:
            outputs = self.model.val_step(data_batch, self.optimizer, **kwargs)
        if not isinstance(outputs, dict):
            raise TypeError('"batch_processor()" or "model.train_step()" '
                            'and "model.val_step()" must return a dict')
        if 'log_vars' in outputs:
            self.log_buffer.update(outputs['log_vars'], outputs['num_samples'])
        self.outputs = outputs
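
Note that run_iter() only produces outputs['loss']; the backward pass and optimizer step happen in the OptimizerHook fired by call_hook('after_train_iter'). Abridged from mmcv (gradient clipping omitted):

# mmcv/runner/hooks/optimizer.py (abridged)
class OptimizerHook(Hook):

    def after_train_iter(self, runner):
        runner.optimizer.zero_grad()
        runner.outputs['loss'].backward()  # backprop through the whole recognizer
        runner.optimizer.step()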

6. Jumping to train_step(): this function is implemented in the recognizer base class recognizers/base.py and is shared by all recognition models. Its self(...) call runs the model's forward pass and computes the losses from the labels.

    def train_step(self, data_batch, optimizer, **kwargs):
        imgs = data_batch['imgs']
        label = data_batch['label']

        aux_info = {}
        for item in self.aux_info:
            assert item in data_batch
            aux_info[item] = data_batch[item]

        losses = self(imgs, label, return_loss=True, **aux_info)  # run the model's forward() to compute the losses

        loss, log_vars = self._parse_losses(losses)

        outputs = dict(
            loss=loss,
            log_vars=log_vars,
            num_samples=len(next(iter(data_batch.values()))))

        return outputs
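
_parse_losses() then collapses the losses dict into a single scalar for backprop, plus scalar values for logging. Abridged from recognizers/base.py:

# recognizers/base.py (abridged)
@staticmethod
def _parse_losses(losses):
    log_vars = OrderedDict()
    for loss_name, loss_value in losses.items():
        if isinstance(loss_value, torch.Tensor):
            log_vars[loss_name] = loss_value.mean()
        elif isinstance(loss_value, list):
            log_vars[loss_name] = sum(_loss.mean() for _loss in loss_value)
    # sum every entry whose key contains 'loss' into the total training loss
    loss = sum(_value for _key, _value in log_vars.items() if 'loss' in _key)
    log_vars['loss'] = loss
    return loss, log_vars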

7. In forward(), the model dispatches to the training or testing path depending on return_loss

    def forward(self, imgs, label=None, return_loss=True, **kwargs):
        """Define the computation performed at every call."""
        if kwargs.get('gradcam', False):
            del kwargs['gradcam']
            return self.forward_gradcam(imgs, **kwargs)
        if return_loss:
            if label is None:
                raise ValueError('Label should not be None.')
            if self.blending is not None:
                imgs, label = self.blending(imgs, label)
            return self.forward_train(imgs, label, **kwargs)  # call the concrete subclass's forward_train() for training

        return self.forward_test(imgs, **kwargs)

8. forward_train() hands off to the training logic of the concrete recognizer (here a 3D recognizer, judging by the feature shapes in the comments):

    def forward_train(self, imgs, labels, **kwargs):
        """Defines the computation performed at every call when training."""

        assert self.with_cls_head
        imgs = imgs.reshape((-1, ) + imgs.shape[2:])  # [8,1,3,32,224,224] -> [8,3,32,224,224]: merge the bs and num_clips dims of the (bs, N, C, T, H, W) input
        losses = dict()

        x = self.extract_feat(imgs)  # [8,3,32,224,224] -> [8,2048,4,7,7]: extract spatio-temporal features with a 3D CNN backbone
        if self.with_neck:
            x, loss_aux = self.neck(x, labels.squeeze())
            losses.update(loss_aux)

        cls_score = self.cls_head(x)  # [bs, cls_num]
        gt_labels = labels.squeeze()
        loss_cls = self.cls_head.loss(cls_score, gt_labels, **kwargs)
        losses.update(loss_cls)

        return losses
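
The reshape in step 8 is easy to check with dummy tensors (shapes follow the comments above; 400 classes is an arbitrary example):

import torch

imgs = torch.randn(8, 1, 3, 32, 224, 224)     # (bs, num_clips, C, T, H, W)
imgs = imgs.reshape((-1, ) + imgs.shape[2:])  # -> (8, 3, 32, 224, 224)
labels = torch.randint(0, 400, (8, 1))        # one label per video
print(imgs.shape, labels.squeeze().shape)     # torch.Size([8, 3, 32, 224, 224]) torch.Size([8])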

Issue:

During data processing the bs and num_clips dimensions are merged, so the head outputs bs*num_clips scores while only bs labels are available. The loss computation would therefore fail for num_clips > 1, which effectively restricts training to num_clips = 1: a single clip is sampled per video and stands in for the whole video.

For long videos, however, a single clip can hardly represent the entire video, so long-video classification needs adjustments on top of this pipeline, for example as sketched below.
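
One possible adjustment (my own sketch, not part of mmaction2) is to keep num_clips > 1 but average the clip-level scores back to one score per video before the loss:

# hypothetical tweak to forward_train() for num_clips > 1
bs, num_clips = imgs.shape[:2]
imgs = imgs.reshape((-1, ) + imgs.shape[2:])    # (bs*num_clips, C, T, H, W)
x = self.extract_feat(imgs)
cls_score = self.cls_head(x)                    # (bs*num_clips, cls_num)
cls_score = cls_score.view(bs, num_clips, -1).mean(dim=1)  # (bs, cls_num)
loss_cls = self.cls_head.loss(cls_score, labels.squeeze(), **kwargs)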
