Dushyant Mehta, Srinath Sridhar, Oleksandr Sotnychenko, Helge Rhodin, Mohammad Shafiei, Hans-Peter Seidel, Weipeng Xu, Dan Casas, and Christian Theobalt. 2017. VNect: real-time 3D human pose estimation with a single RGB camera. ACM Trans. Graph. 36, 4, Article 44 (July 2017), 14 pages. DOI: https://doi.org/10.1145/3072959.3073596
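In terms of accuracy, the method is quantitatively on par with, and in places better than, the best offline[3] 3D monocular RGB pose estimation methods (particularly for end effector positions). Compared with monocular RGB-D (RGB plus depth[1]) methods, its results are comparable and sometimes better.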
The method uses convolutional neural networks (CNNs) [Mehta et al. 2016; Pavlakos et al. 2016] to produce 2D and 3D joint positions simultaneously, and does away with expensive bounding box computations.
Model-based kinematic skeleton fitting refines the joint positions predicted by the CNN in real time, which keeps the estimated motion consistent.
The subject's height is supplied beforehand as a scale reference, which reduces the ambiguity of the reconstruction. For the kinematic skeleton, the CNN predictions are averaged over a few frames at the beginning.
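A minimal sketch of how the skeleton setup described above might look. It assumes that "averaging the CNN predictions" means averaging per-frame bone lengths, and that the known height is applied by rescaling a chain of bones that spans the standing height. The function names, the `parents` layout and the `height_chain` argument are hypothetical and not taken from the paper.

```python
import math

def average_bone_lengths(frames, parents):
    """Average each bone's length over the first few frames.

    frames:  list of poses, each a list of (x, y, z) joint positions.
    parents: parents[j] is the parent joint index of joint j, or -1 for the root.
    """
    totals = [0.0] * len(parents)
    for pose in frames:
        for j, p in enumerate(parents):
            if p >= 0:
                totals[j] += math.dist(pose[j], pose[p])
    return [t / len(frames) for t in totals]

def scale_to_height(bone_lengths, height_chain, subject_height):
    """Rescale all bones so the bones in height_chain sum to subject_height.

    height_chain is an assumed list of joint indices whose bones together
    span the standing height of the skeleton.
    """
    chain_length = sum(bone_lengths[j] for j in height_chain)
    scale = subject_height / chain_length
    return [length * scale for length in bone_lengths]

# Toy 3-joint chain (root -> joint 1 -> joint 2), two initial frames.
frames = [
    [(0, 0, 0), (0, 1, 0), (0, 2, 0)],
    [(0, 0, 0), (0, 3, 0), (0, 4, 0)],
]
parents = [-1, 0, 1]
lengths = average_bone_lengths(frames, parents)
scaled = scale_to_height(lengths, height_chain=[1, 2], subject_height=6.0)
```

Averaging lengths rather than joint positions is a design choice of this sketch: a mean pose of a moving person would shorten the bones.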
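Drawbacks of existing methods: they are harder to run in real-time, partly due to additional preprocessing steps such as bounding box extraction. They need tight crops at a fixed resolution (for example, crops produced by a bounding box algorithm, which is time-consuming).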
Existing methods also cannot accurately predict the extent of articulation of the human joints.
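Solution to the problems above: the novelty of this paper is to extend 2D heatmaps to 3D. Each joint j gets three location-maps Xj / Yj / Zj, one per axis, which capture the root-relative (pelvis-relative) 3D position xj / yj / zj of that joint. This is what turns the 2D prediction into a 3D one.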
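The location-map idea can be sketched as follows: find the maximum of joint j's 2D heatmap, then read Xj / Yj / Zj at that same pixel. This is an illustration only. Plain nested lists stand in for the network's output tensors, and the function names `argmax_2d` and `read_joint_positions` are hypothetical.

```python
def argmax_2d(grid):
    """Return (row, col) of the largest value in a 2D nested list."""
    return max(
        ((r, c) for r, row in enumerate(grid) for c in range(len(row))),
        key=lambda rc: grid[rc[0]][rc[1]],
    )

def read_joint_positions(heatmaps, loc_x, loc_y, loc_z):
    """For each joint, look up its location-maps at the heatmap maximum.

    All four inputs are lists with one 2D map per joint.
    Returns a list of ((u, v), (x, y, z)) per joint, where (u, v) is the
    2D pixel (column, row) and (x, y, z) is the root-relative 3D position.
    """
    joints = []
    for heat, X, Y, Z in zip(heatmaps, loc_x, loc_y, loc_z):
        r, c = argmax_2d(heat)
        joints.append(((c, r), (X[r][c], Y[r][c], Z[r][c])))
    return joints

# Two joints on a tiny 2x3 map.
heatmaps = [[[0, 0, 0], [0, 0, 9]], [[5, 0, 0], [0, 0, 0]]]
loc_x = [[[0, 0, 0], [0, 0, 0.1]], [[-0.3, 0, 0], [0, 0, 0]]]
loc_y = [[[0, 0, 0], [0, 0, 0.2]], [[-0.2, 0, 0], [0, 0, 0]]]
loc_z = [[[0, 0, 0], [0, 0, 0.3]], [[-0.1, 0, 0], [0, 0, 0]]]
joints = read_joint_positions(heatmaps, loc_x, loc_y, loc_z)
```

Because the lookup is driven by the 2D heatmap, the 3D estimate is tied to where the joint is actually detected in the image.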
The network is based on the ResNet50 network architecture of He et al. [2016].
Training uses the following datasets: for 2D pose estimation (heatmaps), MPII [Andriluka et al. 2014] and LSP [Johnson and Everingham 2010, 2011]; for 3D pose, MPI-INF-3DHP [Mehta et al. 2016] and Human3.6m [Ionescu et al. 2014b].
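No tight cropping is needed; a Bounding Box (BB) Tracker is used instead. The BB tracker starts with (slow) multi-scale predictions on the full image for the first few frames, and hones in on the person in the image making use of the BB-agnostic predictions from the fully convolutional network. In other words, the algorithm first finds a box containing the target person; after that the box is simply shifted and rescaled from frame to frame by a set ratio. No separate BB detection algorithm runs in the later steps, which speeds up the whole program enough to reach real time.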
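A rough sketch of such a frame-to-frame box update, assuming the next box is the tight rectangle around the current 2D keypoints, enlarged by a margin and blended with the previous box for stability. The function name and the default values for `margin_x`, `margin_y` and `momentum` are illustrative assumptions, not figures from these notes.

```python
def next_bounding_box(keypoints, prev_box=None,
                      margin_x=0.4, margin_y=0.2, momentum=0.75):
    """Derive the crop box for the next frame from the current 2D keypoints.

    keypoints: list of (x, y) pixel positions predicted in the current frame.
    prev_box:  previous (x_min, y_min, x_max, y_max), or None on the first frame.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # Tight rectangle around the keypoints, enlarged by a relative margin.
    box = (
        min(xs) - margin_x * width,
        min(ys) - margin_y * height,
        max(xs) + margin_x * width,
        max(ys) + margin_y * height,
    )
    if prev_box is not None:
        # Weighted average with the previous box to suppress jitter.
        box = tuple(momentum * p + (1 - momentum) * b
                    for p, b in zip(prev_box, box))
    return box

keypoints = [(10, 20), (30, 60)]
first_box = next_bounding_box(keypoints)
smoothed_box = next_bounding_box(keypoints, prev_box=(0, 0, 40, 80))
```

The per-frame cost is a few min/max operations, which is why this kind of update is much cheaper than running a detector on every frame.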
