Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields


CVPR Oral 2017

Zhe Cao

Tomas Simon

Shih-En Wei

Yaser Sheikh

The Robotics Institute, Carnegie Mellon University


  • We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a non-parametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our method placed first in the inaugural COCO 2016 keypoints challenge, and significantly exceeds the previous state-of-the-art result on the MPII Multi-Person benchmark, both in performance and efficiency.

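    • Goal: estimate human pose by correctly connecting the body keypoints already detected in the image.

    • Method: PAFs (Part Affinity Fields). The architecture encodes global context and uses a greedy bottom-up parsing step, reaching realtime speed with performance independent of the number of people in the image. It is designed to jointly learn keypoint locations and the associations between them via two branches of the same sequential prediction process.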



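Challenges in approaches based on finding each individual's body parts
  1. Unknown number of people, at any position and scale

  2. Spatial interference: contact between limbs and joints, and occlusion, make association difficult

  3. Runtime grows with the number of people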



  1. Detection: locate each person in the image

  2. Pose Estimation


  1. Reuses existing techniques: Detection + Pose Estimation

  2. Detection fails when people are close together

  3. Computation = number of people × time per person


  • Detect keypoints first, then connect them. This has the potential to decouple computation from the number of people.


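  • Pishchulin et al.: hours per image. Builds a fully connected graph over keypoints and solves it with integer linear programming.

  • Insafutdinov et al.: several minutes per image. Uses ResNet-based body part detectors, with a limit on the total number of part detections; the direct results are imprecise, so an additional logistic regression is required.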

This Paper

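**Contribution:** Jointly Learning Parts Detection and Parts Association

  • Learn PAFs and keypoint positions simultaneously.

    • PAFs: 2D vector fields that encode the location and orientation of limbs over the image domain

    • CMP: Part Detection Confidence Maps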

  • Greedy parsing algorithm

    1. Bipartite matching (a graph-theory algorithm)

    2. Pose parsing
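The bipartite matching step can be illustrated with a small sketch. It assumes a score matrix between the candidates of two part types is already available, and it picks connections greedily by descending score; this is one simple way to obtain a matching and is not necessarily the solver used by the authors. The function name `greedy_bipartite_match` and the `min_score` threshold are hypothetical.

```python
def greedy_bipartite_match(scores, min_score=0.0):
    """Greedily pick connections between two sets of part candidates.

    scores[i][j] is the association score between candidate i of the first
    part type (e.g. elbows) and candidate j of the second (e.g. wrists).
    Returns a list of (i, j, score) in which no candidate is used twice.
    """
    # Collect every candidate edge above the (hypothetical) threshold.
    edges = [
        (score, i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
        if score > min_score
    ]
    # Visit the strongest edges first.
    edges.sort(reverse=True)

    used_a, used_b, matches = set(), set(), []
    for score, i, j in edges:
        if i in used_a or j in used_b:
            continue  # one endpoint already belongs to a stronger edge
        used_a.add(i)
        used_b.add(j)
        matches.append((i, j, score))
    return matches


# Three candidates of the first part type, two of the second.
scores = [
    [0.9, 0.2],
    [0.3, 0.8],
    [0.1, 0.4],
]
print(greedy_bipartite_match(scores))  # [(0, 0, 0.9), (1, 1, 0.8)]
```

The third candidate of the first part type stays unmatched, as expected when a part has more detections than its partner.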


Simultaneous Detection and Association
  • Each branch is an iterative prediction architecture that refines the predictions over successive stages, t ∈ {1, … , T }, with intermediate supervision at each stage.

* At each stage, one branch predicts parts (confidence maps) and the other predicts limbs (PAFs) *

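  • Within each branch, the previous stage's predictions and the features from VGG are used together to produce the next stage's predictions.

  • After every stage there are two loss functions, one per branch. (A mask is applied here, because not all keypoints in the training data are annotated.)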
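The per-stage losses with a mask can be sketched as follows. This is a minimal illustration assuming a squared-error loss weighted by a binary mask, with the two branch losses summed over all stages; the nested-list layout and the names `masked_l2_loss` and `total_loss` are assumptions made for the example.

```python
def masked_l2_loss(pred, target, mask):
    """Sum of mask * (pred - target)^2 over all channels and pixels.

    pred, target: nested lists indexed [channel][row][col]. For PAFs the
    x and y components are simply treated as separate channels.
    mask: nested list indexed [row][col]; 0 marks pixels without annotation.
    """
    total = 0.0
    for pred_ch, target_ch in zip(pred, target):
        for pred_row, target_row, mask_row in zip(pred_ch, target_ch, mask):
            for p, t, w in zip(pred_row, target_row, mask_row):
                total += w * (p - t) ** 2
    return total


def total_loss(stage_outputs, target_maps, target_pafs, mask):
    """Intermediate supervision: add both branch losses at every stage."""
    return sum(
        masked_l2_loss(maps, target_maps, mask)
        + masked_l2_loss(pafs, target_pafs, mask)
        for maps, pafs in stage_outputs
    )


# Toy data: one channel per branch, a 1x2 image, second pixel unannotated.
target_maps = [[[1.0, 0.0]]]
target_pafs = [[[0.0, 1.0]]]
mask = [[1, 0]]
stage_outputs = [
    ([[[0.5, 0.9]]], [[[0.0, 0.0]]]),  # stage 1: (confidence maps, PAFs)
    ([[[1.0, 0.9]]], [[[0.0, 0.0]]]),  # stage 2
]
loss = total_loss(stage_outputs, target_maps, target_pafs, mask)
print(loss)
```

Errors at the masked pixel contribute nothing, so the network is not penalized for predicting a keypoint where the ground truth is simply missing.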


Confidence Maps for Part Detection
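As a rough illustration of how a ground-truth confidence map for one part type might be built: each annotated person contributes a Gaussian peak centred on their keypoint, and the peaks are merged with a maximum rather than an average so that nearby peaks stay distinct. The function name `confidence_map`, the grid layout, and the default `sigma` are assumptions for this sketch.

```python
import math


def confidence_map(width, height, keypoints, sigma=1.0):
    """Ground-truth confidence map for one body part type.

    keypoints: list of (x, y) positions, one per visible person.
    Returns a nested list indexed [row][col] with values in [0, 1].
    """
    grid = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for kx, ky in keypoints:
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                value = math.exp(-d2 / sigma ** 2)
                # Merge people with max so close peaks are not blurred.
                grid[y][x] = max(grid[y][x], value)
    return grid


# Two people whose keypoints are two pixels apart on the same row.
cmap = confidence_map(5, 3, [(1, 1), (3, 1)])
print(cmap[1])
```

Both keypoint pixels keep a value of 1.0, while the pixel between them holds the larger of the two Gaussian tails rather than their mean.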
Part Affinity Fields for Part Association

Part affinity fields preserve both location and orientation information across the region of support of the limb. An alternative, detecting a midpoint between each pair of parts on a limb, has two limitations:

  1. it encodes only the position, and not the orientation, of each limb

  2. it reduces the region of support of a limb to a single point.
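To make the role of orientation concrete, here is a minimal sketch of a PAF for a single limb of a single person, together with a score that samples the field along a candidate connection. Pixels near the limb store the unit vector along it; the score averages the dot product between the field and the candidate's direction. The names `limb_paf` and `association_score`, the `limb_width` value, and the nearest-pixel sampling are assumptions made for the example.

```python
import math


def limb_paf(width, height, x1, x2, limb_width=1.0):
    """Ground-truth PAF for one limb running from x1 to x2.

    Pixels within limb_width of the segment store the unit vector pointing
    from x1 to x2; all other pixels store (0, 0).
    """
    dx, dy = x2[0] - x1[0], x2[1] - x1[1]
    length = math.hypot(dx, dy)
    vx, vy = dx / length, dy / length
    field = [[(0.0, 0.0)] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            rx, ry = px - x1[0], py - x1[1]
            along = vx * rx + vy * ry         # projection along the limb
            across = abs(-vy * rx + vx * ry)  # distance from the limb axis
            if 0.0 <= along <= length and across <= limb_width:
                field[py][px] = (vx, vy)
    return field


def association_score(field, a, b, samples=10):
    """Approximate the line integral of the PAF along the segment a -> b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm
    total = 0.0
    for k in range(samples):
        u = k / (samples - 1)
        # Nearest-pixel lookup at evenly spaced points on the segment.
        px = round(a[0] + u * dx)
        py = round(a[1] + u * dy)
        fx, fy = field[py][px]
        total += fx * ux + fy * uy
    return total / samples


field = limb_paf(8, 8, (1, 3), (6, 3))
aligned = association_score(field, (1, 3), (6, 3))
perpendicular = association_score(field, (3, 1), (3, 6))
reversed_score = association_score(field, (6, 3), (1, 3))
print(aligned, perpendicular, reversed_score)
```

A candidate that crosses the limb at a right angle scores zero even though it passes through the limb's pixels, and a candidate running in the opposite direction scores negative. A midpoint detection alone could not separate these cases.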

Multi-Person Parsing using PAFs
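  • The association problem is simplified to a maximum matching problem; the result is then combined with the detected keypoints to jointly decide whether each connection is correct.

  • This simplification is possible because of the PAFs proposed in the paper. First, the learned PAFs have a large receptive field. Second, PAFs yield a set of vectors, and those vectors carry orientation information, which is important for simplifying the parsing and matching.

  • Assemble the connections that share the same part detection candidates into full-body poses of multiple people.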
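The final assembly step can be sketched as follows, assuming each limb connection has already been selected by the matching step. Connections that share a part candidate are merged into the same person. The function name `assemble_poses` and the `(part, index)` encoding of candidates are hypothetical, and conflict handling is deliberately left out.

```python
def assemble_poses(connections):
    """Group limb connections that share a part candidate into people.

    connections: list of ((part_a, idx_a), (part_b, idx_b)) pairs, where
    (part, idx) identifies one detected candidate of a given part type.
    Returns a list of dicts mapping part name -> candidate index.
    """
    people = []
    for end_a, end_b in connections:
        # Partial poses that already contain either endpoint.
        hits = [
            person
            for person in people
            if person.get(end_a[0]) == end_a[1]
            or person.get(end_b[0]) == end_b[1]
        ]
        if not hits:
            people.append({end_a[0]: end_a[1], end_b[0]: end_b[1]})
            continue
        # Merge every hit into the first one (no conflict checks here).
        target = hits[0]
        for other in hits[1:]:
            target.update(other)
        people = [p for p in people if all(p is not o for o in hits[1:])]
        target[end_a[0]] = end_a[1]
        target[end_b[0]] = end_b[1]
    return people


connections = [
    (("neck", 0), ("shoulder", 0)),
    (("neck", 1), ("shoulder", 1)),
    (("shoulder", 0), ("elbow", 1)),
    (("shoulder", 1), ("elbow", 0)),
]
poses = assemble_poses(connections)
print(poses)
```

Each connection is visited once, so the cost of this step depends on the number of selected connections rather than on a search over all possible groupings of parts.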
