HigherHRNet Source Code Analysis

HigherHRNet is a bottom-up pose estimation model.

Because of the challenge of scale variation, bottom-up human pose estimation methods do not perform well on small persons.

HigherHRNet addresses this with multi-resolution supervision during training; put simply, the network outputs heatmaps at multiple resolutions.

The implementation appends a deconvolution branch to HRNet so that it produces an additional, larger heatmap.

For background on HRNet, see:

HRNet 源码分析_那时那月那人的博客-CSDN博客

Let's start with the model code. It is essentially identical to HRNet; only a deconvolution branch is appended at the end:

import torch
import torch.nn as nn


class PoseHigherResolutionNet(nn.Module):

    def __init__(self, cfg, **kwargs):
        # construction of the stem, stages, transitions, deconv and final
        # layers is omitted in this walkthrough
        pass

    def forward(self, x):
        # identical to HRNet, except for the extra higher-resolution output at the end
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.layer1(x)

        x_list = []
        for i in range(self.stage2_cfg['NUM_BRANCHES']):
            if self.transition1[i] is not None:
                x_list.append(self.transition1[i](x))
            else:
                x_list.append(x)
        y_list = self.stage2(x_list)

        x_list = []
        for i in range(self.stage3_cfg['NUM_BRANCHES']):
            if self.transition2[i] is not None:
                x_list.append(self.transition2[i](y_list[-1]))
            else:
                x_list.append(y_list[i])
        y_list = self.stage3(x_list)

        x_list = []
        for i in range(self.stage4_cfg['NUM_BRANCHES']):
            if self.transition3[i] is not None:
                x_list.append(self.transition3[i](y_list[-1]))
            else:
                x_list.append(y_list[i])
        y_list = self.stage4(x_list)

        final_outputs = []
        x = y_list[0]
        y = self.final_layers[0](x)
        final_outputs.append(y)
        # append a deconvolution to enlarge the resolution; more deconvolutions
        # can be stacked if an even higher resolution is needed.
        # The deconv input is concat([features, heatmaps]).
        # The paper shows that a single deconvolution is optimal on COCO.
        for i in range(self.num_deconvs):
            if self.deconv_config.CAT_OUTPUT[i]:
                x = torch.cat((x, y), 1)

            x = self.deconv_layers[i](x)
            y = self.final_layers[i + 1](x)
            final_outputs.append(y)

        return final_outputs
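The forward pass above relies on self.deconv_layers and self.final_layers, whose construction is elided in the __init__ above. Below is a minimal sketch of what one such deconv block can look like; the helper name and channel wiring are illustrative assumptions, not the repo's exact code:

import torch.nn as nn

def _make_deconv_block(input_channels, output_channels, num_joints, cat_output=True):
    # Hypothetical sketch: a 4x4 stride-2 transposed convolution doubles the
    # spatial resolution, followed by BN + ReLU. If CAT_OUTPUT is set, the
    # previous heatmaps are concatenated to the features, adding num_joints
    # input channels.
    in_channels = input_channels + num_joints if cat_output else input_channels
    return nn.Sequential(
        nn.ConvTranspose2d(in_channels, output_channels,
                           kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(output_channels),
        nn.ReLU(inplace=True),
    )

# each final_layers[i] is then just a 1x1 convolution producing num_joints
# heatmap channels (plus num_joints tag channels where the AE loss is enabled)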

Next, let's look at how the heatmap labels are generated.

import numpy as np


class CocoKeypoints(CocoDataset):
    def __init__(self,
                 cfg,
                 dataset_name,
                 remove_images_without_annotations,
                 heatmap_generator,
                 joints_generator,
                 transforms=None):
        super().__init__(cfg.DATASET.ROOT,
                         dataset_name,
                         cfg.DATASET.DATA_FORMAT)

        # other setup omitted
        self.heatmap_generator = heatmap_generator
        self.joints_generator = joints_generator

    def __getitem__(self, idx):
        # call CocoDataset.__getitem__() to get the image and its annotations
        img, anno = super().__getitem__(idx)

        if img is None:
            img_info = self.coco.loadImgs(self.ids[idx])[0]
            img = np.zeros((3, img_info['height'], img_info['width']))

        mask = self.get_mask(anno, idx)

        anno = [
            obj for obj in anno
            if obj['iscrowd'] == 0 or obj['num_keypoints'] > 0
        ]

        # TODO(bowen): to generate scale-aware sigma, modify `get_joints` to associate a sigma to each joint
        joints = self.get_joints(anno)

        mask_list = [mask.copy() for _ in range(self.num_scales)]
        joints_list = [joints.copy() for _ in range(self.num_scales)]
        target_list = list()

        if self.transforms:
            img, mask_list, joints_list = self.transforms(
                img, mask_list, joints_list
            )

        for scale_id in range(self.num_scales):
            # generate the heatmap labels for each output scale
            target_t = self.heatmap_generator[scale_id](joints_list[scale_id])
            joints_t = self.joints_generator[scale_id](joints_list[scale_id])

            target_list.append(target_t.astype(np.float32))
            mask_list[scale_id] = mask_list[scale_id].astype(np.float32)
            joints_list[scale_id] = joints_t.astype(np.int32)

        return img, target_list, mask_list, joints_list
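Each output scale gets its own generator. As a minimal sketch (assuming the w32_512 config, where a 512 input yields heatmaps at 128 and 256), the per-scale generator list might be built like this; the wiring is illustrative, not the repo's exact factory code:

num_joints = 17
output_sizes = [128, 256]   # the two output resolutions of the w32_512 model

# one HeatmapGenerator (defined right below) per output resolution
heatmap_generator = [
    HeatmapGenerator(output_res, num_joints) for output_res in output_sizes
]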
class HeatmapGenerator():
    def __init__(self, output_res, num_joints, sigma=-1):
        # output resolution of the heatmap
        self.output_res = output_res
        # 17 for COCO
        self.num_joints = num_joints
        # w32_512_adam_lr1e-3.yaml uses sigma = 2
        if sigma < 0:
            sigma = self.output_res/64
        self.sigma = sigma
        # size of the Gaussian kernel
        size = 6*sigma + 3
        x = np.arange(0, size, 1, float)
        y = x[:, np.newaxis]
        x0, y0 = 3*sigma + 1, 3*sigma + 1
        # the label is a 2-D Gaussian
        self.g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

    def __call__(self, joints):
        # heatmap shape: (num_joints, output_res, output_res)
        hms = np.zeros((self.num_joints, self.output_res, self.output_res),
                       dtype=np.float32)
        sigma = self.sigma
        for p in joints:
            for idx, pt in enumerate(p):
                if pt[2] > 0:
                    # render a Gaussian label around each visible keypoint
                    x, y = int(pt[0]), int(pt[1])
                    if x < 0 or y < 0 or \
                       x >= self.output_res or y >= self.output_res:
                        continue

                    # bounding box of the kernel, clipped to the heatmap
                    ul = int(np.round(x - 3 * sigma - 1)), int(np.round(y - 3 * sigma - 1))
                    br = int(np.round(x + 3 * sigma + 2)), int(np.round(y + 3 * sigma + 2))

                    c, d = max(0, -ul[0]), min(br[0], self.output_res) - ul[0]
                    a, b = max(0, -ul[1]), min(br[1], self.output_res) - ul[1]

                    cc, dd = max(0, ul[0]), min(br[0], self.output_res)
                    aa, bb = max(0, ul[1]), min(br[1], self.output_res)
                    # paste the Gaussian, keeping the max where kernels overlap
                    hms[idx, aa:bb, cc:dd] = np.maximum(
                        hms[idx, aa:bb, cc:dd], self.g[a:b, c:d])
        return hms
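A quick sanity check of the generator, using the w32_512 values (output_res=128, sigma=2); the keypoint coordinates below are made up for illustration:

import numpy as np

gen = HeatmapGenerator(output_res=128, num_joints=17, sigma=2)

# joints: (num_people, num_joints, 3) with (x, y, visibility) in heatmap coordinates
joints = np.zeros((1, 17, 3))
joints[0, 0] = [64, 40, 2]   # one visible keypoint (e.g. the nose)

hms = gen(joints)
print(hms.shape)             # (17, 128, 128)
print(hms[0].max())          # 1.0 at (40, 64), decaying as a sigma-2 Gaussian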

The paper also proposes a scale-aware training scheme. Let's look at the implementation.

Taken literally, scale-aware means using a different Gaussian for persons of different scales. In the code below, the sigma is read from a fourth value stored with each joint (p[0, 3]). Note that this is a per-person sigma, not the visibility flag: pt[2] is still visibility, while the fourth value is the scale-aware sigma that the TODO in get_joints above says should be attached to each person.
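The repo leaves the sigma generation itself as a TODO. A hypothetical sketch of how get_joints could attach a per-person sigma proportional to the person's size; the scaling rule below is an assumption for illustration, not the repo's code:

import numpy as np

def attach_scale_aware_sigma(joints, bbox_areas, base_sigma=2.0, base_area=96**2):
    # Hypothetical: joints is (num_people, num_joints, 3); return
    # (num_people, num_joints, 4) where the 4th value is a sigma that
    # grows with the square root of the person's bbox area.
    out = np.zeros((joints.shape[0], joints.shape[1], 4))
    out[:, :, :3] = joints
    for n, area in enumerate(bbox_areas):
        out[n, :, 3] = base_sigma * np.sqrt(area / base_area)
    return out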

class ScaleAwareHeatmapGenerator():
    def __init__(self, output_res, num_joints):
        self.output_res = output_res
        self.num_joints = num_joints

    def get_gaussian_kernel(self, sigma):
        size = 6*sigma + 3
        x = np.arange(0, size, 1, float)
        y = x[:, np.newaxis]
        x0, y0 = 3*sigma + 1, 3*sigma + 1
        g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
        return g

    def __call__(self, joints):
        hms = np.zeros((self.num_joints, self.output_res, self.output_res),
                       dtype=np.float32)
        for p in joints:
            # each person carries its own sigma as the 4th value of its joints,
            # so a Gaussian kernel is built per person and larger persons get
            # wider Gaussians (pt[2] below is still the visibility flag)
            sigma = p[0, 3]
            g = self.get_gaussian_kernel(sigma)
            for idx, pt in enumerate(p):
                if pt[2] > 0:
                    x, y = int(pt[0]), int(pt[1])
                    if x < 0 or y < 0 or \
                       x >= self.output_res or y >= self.output_res:
                        continue

                    ul = int(np.round(x - 3 * sigma - 1)), int(np.round(y - 3 * sigma - 1))
                    br = int(np.round(x + 3 * sigma + 2)), int(np.round(y + 3 * sigma + 2))

                    c, d = max(0, -ul[0]), min(br[0], self.output_res) - ul[0]
                    a, b = max(0, -ul[1]), min(br[1], self.output_res) - ul[1]

                    cc, dd = max(0, ul[0]), min(br[0], self.output_res)
                    aa, bb = max(0, ul[1]), min(br[1], self.output_res)
                    hms[idx, aa:bb, cc:dd] = np.maximum(
                        hms[idx, aa:bb, cc:dd], g[a:b, c:d])
        return hms
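Compared with HeatmapGenerator, the call site changes only in that each joint row now carries four values; a toy invocation (values made up for illustration):

import numpy as np

gen = ScaleAwareHeatmapGenerator(output_res=128, num_joints=17)

# joints: (num_people, num_joints, 4) with (x, y, visibility, per-person sigma)
joints = np.zeros((1, 17, 4))
joints[0, 0] = [64, 40, 2, 2.0]   # one visible keypoint
joints[0, :, 3] = 2.0             # every joint of this person shares sigma = 2

hms = gen(joints)                  # same output shape: (17, 128, 128)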

Next, the loss functions.

Bottom-up methods generally detect keypoints first and then group them; OpenPose uses PAFs (Part Affinity Fields) for the grouping step.

HigherHRNet instead uses associative embedding for grouping.

If you are not familiar with associative embedding, see the paper below:

Associative Embedding: End-to-End Learning for Joint Detection and Grouping

https://arxiv.org/abs/1611.05424

The grouping loss from the associative embedding paper, where $h_k(x)$ is the predicted tag at position $x$ of joint $k$, $x_{nk}$ is the ground-truth position of joint $k$ of person $n$, and $\bar{h}_n = \frac{1}{K}\sum_k h_k(x_{nk})$ is the reference embedding of person $n$:

$$L_g(h, T) = \frac{1}{N}\sum_n\sum_k \left(\bar{h}_n - h_k(x_{nk})\right)^2 + \frac{1}{N^2}\sum_n\sum_{n'} \exp\left\{-\frac{1}{2\sigma^2}\left(\bar{h}_n - \bar{h}_{n'}\right)^2\right\}$$

The $L_g$ loss is composed of the two terms above.

The first term is the pull loss within a single person: it pulls every keypoint embedding toward the person's reference embedding $\bar{h}_n$ (the mean embedding).

The second term is the push loss between different persons: because it has the form $\exp(-d^2)$, the larger the distance between two reference embeddings, the smaller the loss.

Now the code that implements these losses:

class MultiLossFactory(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        # init check
        self._init_check(cfg)

        self.num_joints = cfg.MODEL.NUM_JOINTS
        self.num_stages = cfg.LOSS.NUM_STAGES

        # heatmap loss: MSE
        # WITH_HEATMAPS_LOSS: [True, True]
        # -> [HeatmapLoss(), HeatmapLoss()]
        self.heatmaps_loss = \
            nn.ModuleList(
                [
                    HeatmapLoss()
                    if with_heatmaps_loss else None
                    for with_heatmaps_loss in cfg.LOSS.WITH_HEATMAPS_LOSS
                ]
            )
        self.heatmaps_loss_factor = cfg.LOSS.HEATMAPS_LOSS_FACTOR

        # associative embedding loss
        # WITH_AE_LOSS: [True, False]
        # -> [AELoss(), None]
        self.ae_loss = \
            nn.ModuleList(
                [
                    AELoss(cfg.LOSS.AE_LOSS_TYPE) if with_ae_loss else None
                    for with_ae_loss in cfg.LOSS.WITH_AE_LOSS
                ]
            )
        self.push_loss_factor = cfg.LOSS.PUSH_LOSS_FACTOR
        self.pull_loss_factor = cfg.LOSS.PULL_LOSS_FACTOR

    def forward(self, outputs, heatmaps, masks, joints):
        # called as loss_factory(outputs, heatmaps, masks, joints)
        # forward check
        self._forward_check(outputs, heatmaps, masks, joints)

        heatmaps_losses = []
        push_losses = []
        pull_losses = []
        # len(outputs) == 2
        for idx in range(len(outputs)):
            offset_feat = 0
            # [HeatmapLoss(), HeatmapLoss()]
            if self.heatmaps_loss[idx]:
                heatmaps_pred = outputs[idx][:, :self.num_joints]
                offset_feat = self.num_joints
                # compute the heatmap loss
                heatmaps_loss = self.heatmaps_loss[idx](
                    heatmaps_pred, heatmaps[idx], masks[idx]
                )
                # self.heatmaps_loss_factor[idx] is the loss weight (1 in the config)
                heatmaps_loss = heatmaps_loss * self.heatmaps_loss_factor[idx]
                heatmaps_losses.append(heatmaps_loss)
            else:
                heatmaps_losses.append(None)

            # associative embedding
            # [AELoss(), None]
            if self.ae_loss[idx]:
                tags_pred = outputs[idx][:, offset_feat:]
                batch_size = tags_pred.size()[0]
                tags_pred = tags_pred.contiguous().view(batch_size, -1, 1)
                # compute the associative embedding loss
                push_loss, pull_loss = self.ae_loss[idx](
                    tags_pred, joints[idx]
                )
                push_loss = push_loss * self.push_loss_factor[idx]
                pull_loss = pull_loss * self.pull_loss_factor[idx]

                push_losses.append(push_loss)
                pull_losses.append(pull_loss)
            else:
                push_losses.append(None)
                pull_losses.append(None)

        return heatmaps_losses, push_losses, pull_losses
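MultiLossFactory returns per-resolution lists in which disabled entries are None. Here is a minimal sketch of how a training step might reduce them to one scalar; the batch-mean reduction mirrors the repo's training loop in spirit, but the exact details are an assumption:

def reduce_losses(heatmaps_losses, push_losses, pull_losses):
    # sum every enabled per-resolution loss into a single scalar
    loss = 0
    for h_loss, push, pull in zip(heatmaps_losses, push_losses, pull_losses):
        if h_loss is not None:
            loss = loss + h_loss.mean(dim=0)
        if push is not None:
            loss = loss + push.mean(dim=0)
        if pull is not None:
            loss = loss + pull.mean(dim=0)
    return loss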

Next, the AE (associative embedding) loss:

class AELoss(nn.Module):
    def __init__(self, loss_type):
        super().__init__()
        self.loss_type = loss_type

    def singleTagLoss(self, pred_tag, joints):
        """
        associative embedding loss for one image
        """
        # pred_tag: (17 * h * w, 1)
        # joints:   (30, 17, 2)
        tags = []
        pull = 0
        # iterate over persons
        for joints_per_person in joints:
            tmp = []
            # iterate over all keypoints of this person
            for joint in joints_per_person:
                if joint[1] > 0:
                    # joint holds [flattened position of the keypoint in the heatmap, visibility];
                    # the keypoint is visible, so collect its predicted tag
                    tmp.append(pred_tag[joint[0]])
            # skip persons with no visible keypoints
            if len(tmp) == 0:
                continue
            tmp = torch.stack(tmp)
            tags.append(torch.mean(tmp, dim=0))
            # pull loss for a single person: pull every keypoint embedding
            # toward the reference embedding (the mean)
            pull = pull + torch.mean((tmp - tags[-1].expand_as(tmp))**2)

        num_tags = len(tags)
        if num_tags == 0:
            return make_input(torch.zeros(1).float()), \
                make_input(torch.zeros(1).float())
        elif num_tags == 1:
            return make_input(torch.zeros(1).float()), \
                pull/(num_tags)

        tags = torch.stack(tags)

        # push loss between different persons
        size = (num_tags, num_tags)
        A = tags.expand(*size)
        B = A.permute(1, 0)

        diff = A - B

        if self.loss_type == 'exp':
            # exp(-d^2): pushes the reference embeddings of different persons apart
            diff = torch.pow(diff, 2)
            push = torch.exp(-diff)
            # subtracting num_tags removes the diagonal self-pairs (exp(0) = 1 each);
            # it is a constant, so it does not affect the gradients
            push = torch.sum(push) - num_tags
        elif self.loss_type == 'max':
            diff = 1 - torch.abs(diff)
            push = torch.clamp(diff, min=0).sum() - num_tags
        else:
            raise ValueError('Unknown ae loss type')

        return push/((num_tags - 1) * num_tags) * 0.5, \
            pull/(num_tags)

    def forward(self, tags, joints):
        """
        accumulate the tag loss for each image in the batch
        """
        # tags:   (B, 17 * h * w, 1)
        # joints: (B, 30, 17, 2)
        pushes, pulls = [], []
        joints = joints.cpu().data.numpy()
        batch_size = tags.size(0)
        for i in range(batch_size):
            push, pull = self.singleTagLoss(tags[i], joints[i])
            pushes.append(push)
            pulls.append(pull)
        return torch.stack(pushes), torch.stack(pulls)
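A toy run of AELoss with two persons; the tag-map size and the flattened indices below are made up purely to exercise the code:

import torch

ae = AELoss(loss_type='exp')

# tags: batch of 1, flattened 17 x 4 x 4 tag map, one tag value per position
tags = torch.rand(1, 17 * 4 * 4, 1)

# joints: 1 image, 2 persons, 17 joints, [flattened tag index, visibility]
joints = torch.zeros(1, 2, 17, 2, dtype=torch.long)
joints[0, 0, :, 0] = 5      # person 0 reads its tags at index 5
joints[0, 0, :, 1] = 1      # all joints marked visible
joints[0, 1, :, 0] = 200    # person 1 reads its tags at index 200
joints[0, 1, :, 1] = 1

push, pull = ae(tags, joints)
print(push.shape, pull.shape)   # torch.Size([1]) torch.Size([1])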

Next, the inference code. The paper proposes a heatmap aggregation strategy for inference:

Heatmap Aggregation for Inference
We propose a heatmap aggregation strategy during inference. We use bilinear interpolation to upsample all the predicted heatmaps with different resolutions to the resolution of the input image and average the heatmaps from all scales for final prediction. This strategy is quite different from previous methods [3, 30, 33] which only use heatmaps from a single scale or single stage for prediction.

In short, prediction uses heatmaps at multiple resolutions: the smaller heatmaps are upsampled with bilinear interpolation, and all of them are averaged. This differs from most other models, which use only the largest heatmap from the final stage. The reason HigherHRNet averages heatmaps of different resolutions is that each resolution contributes differently for keypoints at different scales: a low-resolution heatmap easily misses the keypoints of small persons, but those keypoints can be recovered from the higher-resolution heatmap.

Below is the inference code. The fusion across resolutions is implemented as a plain average:

def get_multi_stage_outputs(
        cfg, model, image, with_flip=False,
        project2image=False, size_projected=None
):
    heatmaps_avg = 0
    num_heatmaps = 0
    heatmaps = []
    tags = []

    outputs = model(image)
    for i, output in enumerate(outputs):
        if len(outputs) > 1 and i != len(outputs) - 1:
            # upsample the smaller heatmaps with bilinear interpolation
            output = torch.nn.functional.interpolate(
                output,
                size=(outputs[-1].size(2), outputs[-1].size(3)),
                mode='bilinear',
                align_corners=False
            )

        offset_feat = cfg.DATASET.NUM_JOINTS \
            if cfg.LOSS.WITH_HEATMAPS_LOSS[i] else 0

        if cfg.LOSS.WITH_HEATMAPS_LOSS[i] and cfg.TEST.WITH_HEATMAPS[i]:
            # accumulate here; the average is taken below
            heatmaps_avg += output[:, :cfg.DATASET.NUM_JOINTS]
            num_heatmaps += 1

        if cfg.LOSS.WITH_AE_LOSS[i] and cfg.TEST.WITH_AE[i]:
            # the tag channels carry the grouping information
            tags.append(output[:, offset_feat:])

    if num_heatmaps > 0:
        heatmaps.append(heatmaps_avg/num_heatmaps)

    # ... (flip testing and projection to the input size omitted here)
    return outputs, heatmaps, tags
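The effect of the upsample-and-average aggregation is easy to see on toy tensors; the shapes below are made up for illustration:

import torch
import torch.nn.functional as F

# two predicted heatmaps at different resolutions, e.g. 1/4 and 1/2 of the input
hm_low = torch.rand(1, 17, 128, 128)
hm_high = torch.rand(1, 17, 256, 256)

# upsample the low-resolution map to the larger size, then average
hm_low_up = F.interpolate(hm_low, size=(256, 256),
                          mode='bilinear', align_corners=False)
hm_avg = (hm_low_up + hm_high) / 2
print(hm_avg.shape)   # torch.Size([1, 17, 256, 256])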

This concludes the walkthrough of HigherHRNet's main code.
