Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies
Paper: http://arxiv.org/pdf/1801.01615v1.pdf
Abstract
We present a unified deformation model for the markerless capture of multiple scales of human movement, including facial expressions, body motion, and hand gestures. An initial model is generated by locally stitching together models of the individual parts of the human body, which we refer to as the “Frankenstein” model. This model enables the full expression of part movements, including face and hands by a single seamless model. Using a large-scale capture of people wearing everyday clothes, we optimize the Frankenstein model to create “Adam”. Adam is a calibrated model that shares the same skeleton hierarchy as the initial model but can express hair and clothing geometry, making it directly usable for fitting people as they normally appear in everyday life. Finally, we demonstrate the use of these models for total motion tracking, simultaneously capturing the large-scale body movements and the subtle face and hand motion of a social group of people.
1. Introduction
Social communication is a key function of human motion [7]. We communicate tremendous amounts of information with the subtlest movements. Between a group of interacting individuals, gestures such as a gentle shrug of the shoulders, a quick turn of the head, or an uneasy shifting of weight from foot to foot, all transmit critical information about the attention, emotion, and intention to observers. Notably, these social signals are usually transmitted by the organized motion of the whole body: with facial expressions, hand gestures, and body posture. These rich signals layer upon goal-directed activity in constructing the behavior of humans, and are therefore crucial for the machine perception of human activity.
∗Website: http://www.cs.cmu.edu/~hanbyulj/totalcapture
However, there are no existing systems that can track, without markers, the human body, face, and hands simultaneously. Current markerless motion capture systems focus on a particular scale or a particular part. Each area has its own preferred capture configuration: (1) torso and limb motions are captured in a sufficiently large working volume where people can freely move [17, 21, 44, 19]; (2) facial motion is captured at close range, mostly frontal, and assuming little global head motion [5, 24, 6, 9, 51]; (3) finger motion is also captured at very close distances from hands, where the hand regions are dominant in the sensor measurements [36, 49, 42, 50]. These configurations make it difficult to analyze these gestures in the context of social communication.
In this paper, we present a novel approach to capture the motion of the principal body parts for multiple interacting people (see Fig. 1). The fundamental difficulty of such capture is caused by the scale differences of each part. For example, the torso and limbs are relatively large and necessitate coverage over a sufficiently large working volume, while fingers and faces, due to their smaller feature size, require close distance capture with high resolution and frontal imaging. With off-the-shelf cameras, the resolution for face and hand parts will be limited in a room-scale, multi-person capture setup.
To overcome this sensing challenge, we use two general approaches: (1) we leverage keypoint detection (e.g., faces [18], bodies [54, 14, 35], and hands [41]) in multiple views to obtain 3D keypoints, which is robust to multiple people and object interactions; (2) to compensate for the limited sensor resolution, we present a novel generative body deformation model, which has the ability to express the motion of each of the principal body parts. In particular, we describe a procedure to build an initial body model, named “Frankenstein”, by seamlessly consolidating available part template models [33, 13] into a single skeleton hierarchy. We optimize this initialization using a capture of 70 people, and learn a new deformation model, named “Adam”, capable of additionally capturing variations in hair and clothing, with a simplified parameterization. We present a method to capture the total body motion of multiple people with the 3D deformable model. Finally, we demonstrate the performance of our method on various sequences of social behavior and person-object interactions, where the combination of face, limb, and finger motion emerges naturally.
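To make the first of the two approaches above concrete, the sketch below shows a minimal direct linear transform (DLT) triangulation of one detected keypoint across calibrated views; the function name and the plain least-squares formulation are our own assumptions for illustration, not the authors' implementation (which would additionally weight detections by confidence and handle outliers).

```python
import numpy as np

def triangulate_keypoint(projection_matrices, points_2d):
    """Linearly triangulate one keypoint observed in several calibrated views.

    projection_matrices: list of 3x4 camera matrices P = K [R | t].
    points_2d: list of (x, y) detections of the same keypoint, one per view.
    Returns the 3D point minimizing the algebraic reprojection error.
    """
    rows = []
    for P, (x, y) in zip(projection_matrices, points_2d):
        # Each view gives two linear constraints on the homogeneous point X:
        #   x * (P[2] @ X) - (P[0] @ X) = 0
        #   y * (P[2] @ X) - (P[1] @ X) = 0
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Solution: right singular vector of A with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```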
2. Related Work
Motion capture systems based on tracking retroreflective markers [55] are the most widely used motion capture technology due to their high accuracy. Markerless motion capture methods [23, 17, 21, 44] have been explored over the past two decades to achieve the same goal without markers, but they tend to implicitly admit that their performance is inferior by treating the output of marker-based methods as a ground truth or an upper bound. However, over the last few years, we have witnessed great advances in keypoint detection from images (e.g., faces [18], bodies [54, 14, 35], and hands [41]), which can provide reliable anatomical landmark measurements for markerless motion capture methods [19, 28, 41], while the performance of marker-based methods has remained relatively unchanged, with major disadvantages including: (1) the need for sparse marker placement for reliable tracking, which limits the spatial resolution of motion measurements, and (2) the inability to automatically handle occluded markers, which requires expensive manual clean-up. In particular, capturing high-fidelity hand motion is still challenging for marker-based motion capture systems due to the severe self-occlusions of hands [59], whereas learning-based detectors implicitly handle occlusions by inferring the occluded parts, with uncertainty, using a prior learnt from a large-scale dataset [41]. Our method shows that the markerless motion capture approach potentially begins to outperform the marker-based counterpart by leveraging learning-based image measurements. As evidence, we demonstrate total body motion capture, which has not been demonstrated by existing marker-based methods. In this section, we review the markerless motion capture approaches most relevant to our method.
Markerless motion capture largely focuses on the motion of the torso and limbs. The standard pipeline is based on a multiview camera setup and tracking with a 3D template model [32, 23, 15, 10, 29, 16, 52, 11, 44, 17, 20, 19]. In this approach, motion capture is performed by aligning the 3D template model to the measurements, which distinguish the various approaches and may include color, texture, silhouettes, point clouds, and landmarks. A parallel track of related work therefore focuses on capturing and improving body models for tracking, for which a highly controlled multiview capture system—specialized for single-person capture—is used to build precise models. With the introduction of commodity depth sensors, single-view depth-based body motion capture became a popular direction [3, 40]. A recent collection of approaches aims to reconstruct 3D skeletons directly from monocular images, either by fitting 2D keypoint detections with a prior on human pose [60, 8] or by moving even closer to direct regression methods [61, 34, 48].
Facial scanning and performance capture have advanced greatly over the last decade. There exist multiview-based methods showing excellent performance in high-quality facial scanning [5, 24] and facial motion capture [6, 9, 51]. Recently, lightweight systems based on a single camera have shown compelling performance by leveraging a morphable 3D face model on 2D measurements [22, 18, 31, 47, 13, 12, 56]. Hand motion capture is mostly led by methods based on a single depth sensor [36, 46, 49, 30, 57, 45, 53, 43, 39, 42, 50, 58], with a few exceptions based on multi-view systems [4, 43, 38]. In this work, we take the latter approach and use the method of [41], which introduced a hand keypoint detector for RGB images that is directly applicable in multiview systems to reconstruct 3D hand joints.
As a way to reduce the parameter space and overcome the complexity of the problems, generative 3D template models have been proposed in each field, for example the methods of [2, 33, 37] in body motion capture, the method of [13] for facial motion capture, and very recently, the combined body+hands model of Romero et al. [38]. A generative model with expressive power for total body motion has not been introduced.
Figure 2: Part models and a unified Frankenstein model. (a) The body model [33]; (b) the face model [13]; and (c) a hand rig, where red dots have corresponding 3D keypoints reconstructed from detectors in (a-c). (d) Face and hand models (gray meshes) aligned to the body model (the blue wireframe mesh); and (e) the seamless Frankenstein model.
3. Frankenstein Model
The motivation for building the Frankenstein body model is to leverage existing part models—SMPL [33] for the body, FaceWarehouse [13] for the face, and an artist-defined hand rig—each of which captures shape and motion details at an appropriate scale for the corresponding part. This choice is not driven merely by the free availability of the component models: note that due to the trade-off between image resolution and field of view of today’s 3D scanning systems, scans used to build detailed face models will generally be captured using a different system than that used for the rest of the body. For our model, we merge all transform bones into a single skeletal hierarchy but keep the native parameterization of each component part to express identity and motion variations, as explained below. As the final output, the Frankenstein model produces motion parameters capturing the total body motion of humans, and generates a seamless mesh by blending the vertices of the component meshes.
3.1. Stitching Part Models
The Frankenstein model maps its motion parameters θ^U and shape parameters φ^U to a mesh,

V^U = M^U(θ^U, φ^U),

where V^U is a seamless mesh expressing the motion and shape of the target subject.

The motion and shape parameters of the model are a union of the part models’ parameters:

θ^U = {θ^B, θ^F, θ^LH, θ^RH},
φ^U = {φ^B, φ^F, φ^LH, φ^RH},

where the superscripts represent each part model: B for the body model, F for the face model, LH for the left hand model, and RH for the right hand model. Each of the component part models maps from a subset of the above parameters to a set of vertices, respectively, V^B ∈ R^(N^B × 3), V^F ∈ R^(N^F × 3), V^LH ∈ R^(N^H × 3), and V^RH ∈ R^(N^H × 3), where N^B, N^F, and N^H denote the number of vertices of each mesh part.

In the Frankenstein model, all parts are rigidly linked by a single skeletal hierarchy. This unification is achieved by substituting the hands and face branches of the SMPL body skeleton with the corresponding skeletal hierarchies of the detailed part models. All parameters of the Frankenstein model are jointly optimized for motion tracking and identity fitting. The parameterization of each of the part models is detailed in the following sections.
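As a rough illustration of the stitching described in this subsection, the sketch below holds the union of part parameters and stacks the part vertex sets into one mesh. All names here (FrankensteinParams, assemble_vertices, the part keys) are hypothetical and only meant to mirror the B/F/LH/RH notation, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import Dict

import numpy as np

PARTS = ("B", "F", "LH", "RH")  # body, face, left hand, right hand


@dataclass
class FrankensteinParams:
    """Union of the part models' motion (theta) and shape (phi) parameters."""
    theta: Dict[str, np.ndarray] = field(default_factory=dict)  # e.g. theta["LH"]
    phi: Dict[str, np.ndarray] = field(default_factory=dict)    # e.g. phi["B"]


def assemble_vertices(part_vertices: Dict[str, np.ndarray]) -> np.ndarray:
    """Stack per-part vertex arrays (N_part x 3) into a single mesh V^U.

    A faithful implementation would also blend duplicated vertices along the
    wrist and neck seams to obtain a seamless surface; here we only stack.
    """
    return np.concatenate([part_vertices[p] for p in PARTS], axis=0)
```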
3.2. Body Model
For the body, we use the SMPL model [33] with minor modifications. In this section, we summarize the salient aspects of the model in our notation. The body model, M^B(θ^B, φ^B), is defined as follows:

V^B = M^B(θ^B, φ^B),

where the posed vertices V^B are obtained by linear blend skinning with a transformation matrix in SE(3) for each of the J joints.
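The surviving text above refers to posing the template by linear blend skinning with one rigid transform per joint. Below is a minimal sketch of standard linear blend skinning (v_i = Σ_j w_ij T_j v̂_i); the vectorized NumPy formulation is an assumption for illustration, not the paper's code.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, joint_transforms):
    """Pose template vertices with linear blend skinning (LBS).

    vertices:         (N, 3) rest-pose vertex positions.
    weights:          (N, J) skinning weights; each row sums to 1.
    joint_transforms: (J, 4, 4) rigid world transforms T_j (elements of SE(3)).
    Returns (N, 3) posed vertex positions.
    """
    n = vertices.shape[0]
    homo = np.concatenate([vertices, np.ones((n, 1))], axis=1)   # (N, 4)
    # Transform every vertex by every joint: per_joint[j, n] = T_j @ homo[n]
    per_joint = np.einsum('jab,nb->jna', joint_transforms, homo)
    # Blend the per-joint results with the skinning weights.
    blended = np.einsum('nj,jna->na', weights, per_joint)
    return blended[:, :3]
```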
3.3. Face Model
As a face model, we build a generative PCA model from the FaceWarehouse dataset [13]. Specifically, the face part model, M^F(θ^F, φ^F), is defined as follows:

V^F = M^F(θ^F, φ^F),

where the shape (identity) parameters φ^F and motion (expression) parameters θ^F linearly deform the mean face via the PCA bases. A transformation in SE(3) is then required to compensate for displacements in the root location of the face joint due to body shape changes in the body model. Finally, each face vertex position is given by applying this rigid transformation to the deformed face vertices.
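Putting the surviving pieces of this subsection together, a hedged sketch of such an additive PCA face model (mean face plus identity and expression bases, followed by the rigid transform that places the face at the body's face joint) might look as follows; the variable names and the purely linear formulation are our own assumptions.

```python
import numpy as np

def face_model(mean_face, identity_basis, expression_basis, phi_f, theta_f, R, t):
    """Evaluate a PCA-style face part model.

    mean_face:        (N_F, 3) mean face vertices.
    identity_basis:   (K_id, N_F, 3) identity (shape) basis.
    expression_basis: (K_exp, N_F, 3) expression (motion) basis.
    phi_f:            (K_id,) identity coefficients phi^F.
    theta_f:          (K_exp,) expression coefficients theta^F.
    R, t:             3x3 rotation and 3-vector translation compensating for the
                      root location of the face joint on the body.
    Returns (N_F, 3) face vertices in the body's coordinate frame.
    """
    verts = (mean_face
             + np.tensordot(phi_f, identity_basis, axes=1)
             + np.tensordot(theta_f, expression_basis, axes=1))
    return verts @ R.T + t
```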
3.4. Hand Model
We use an artist-rigged hand mesh. Our hand model has

Article sourced from http://tongtianta.site/paper/1129
Editor: Lornatang
Proofreader: Lornatang