Distribution-Aware Coordinate Representation for Human Pose Estimation: estimating the mean of the keypoint's true distribution

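Today I came across this [paper](https://arxiv.org/abs/1910.06278) on arXiv, and I find it a very interesting piece of work. It uses the maximum value on the heatmap and its location $m$ to estimate the mean $\mu$ of the true Gaussian distribution. In this way the quantization error (the error of up to one minimum quantization unit introduced by downsampling) is mitigated as far as possible.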

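The paper's experiments show that this method is more accurate than the empirical estimate (taking the position offset by 1/4 from the peak toward the second-highest peak; this estimate is in fact already fairly consistent with a Gaussian distribution).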
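To make the empirical baseline concrete, here is a minimal NumPy sketch of quarter-offset decoding. It assumes the common variant that compares the two neighbours of the peak along each axis and shifts 0.25 px toward the larger one; the function name `quarter_offset_decode` and the synthetic heatmap are illustrative assumptions, not code from the paper.

```python
import numpy as np

def quarter_offset_decode(heatmap):
    """Argmax location shifted 0.25 px toward the larger neighbour on each axis."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    coord = np.array([x, y], dtype=float)
    # Assumes the peak is not on the border, so all four neighbours exist.
    coord[0] += 0.25 * np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
    coord[1] += 0.25 * np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
    return coord

# Synthetic heatmap: isotropic Gaussian (sigma = 2) with sub-pixel mean (x, y) = (10.3, 7.6).
ys, xs = np.mgrid[0:16, 0:20]
heatmap = np.exp(-((xs - 10.3) ** 2 + (ys - 7.6) ** 2) / (2 * 2.0 ** 2))

estimate = quarter_offset_decode(heatmap)  # x = 10.25, y = 7.75
```

On this example the estimate lands within 0.05 px and 0.15 px of the true mean on the two axes: the fixed step size cannot adapt to where the mean actually sits between pixels.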

Equation (6), the first derivative:

$$\left.\mathcal{D}^{\prime}(\boldsymbol{x})\right|_{\boldsymbol{x}=\boldsymbol{\mu}}=\left.\frac{\partial \mathcal{P}^{T}}{\partial \boldsymbol{x}}\right|_{\boldsymbol{x}=\boldsymbol{\mu}}=-\left.\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right|_{\boldsymbol{x}=\boldsymbol{\mu}}=0$$

So $\mathcal{D}^{\prime}(\boldsymbol{x})$ is a vector with the same shape as $\boldsymbol{x}$. Now look at equation (7), the second-order Taylor expansion about the location $\boldsymbol{m}$, evaluated at the Gaussian mean $\boldsymbol{\mu}$:

$$\mathcal{P}(\boldsymbol{\mu})=\mathcal{P}(\boldsymbol{m})+\mathcal{D}^{\prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})+\frac{1}{2}(\boldsymbol{\mu}-\boldsymbol{m})^{T} \mathcal{D}^{\prime \prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})$$

In the second term, $\mathcal{D}^{\prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})$, shouldn't $\mathcal{D}^{\prime}(\boldsymbol{m})$ be transposed so that the term is a scalar? That is, $\mathcal{D}^{\prime}(\boldsymbol{m})^T(\boldsymbol{\mu}-\boldsymbol{m})$.

Derivation

The Taylor expansion:

$$\mathcal{P}(\boldsymbol{\mu})=\mathcal{P}(\boldsymbol{m})+\mathcal{D}^{\prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})+\frac{1}{2}(\boldsymbol{\mu}-\boldsymbol{m})^{T} \mathcal{D}^{\prime \prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})$$

Substitute the log-Gaussian expressions for $\mathcal{P}(\mu)$ and $\mathcal{P}(m)$, i.e. plug $\mu$ and $m$ into the expression below, and cancel the constant terms:

$$\begin{aligned} \mathcal{P}(\boldsymbol{x} ; \boldsymbol{\mu}, \Sigma)=\ln (\mathcal{G})=&-\ln (2 \pi)-\frac{1}{2} \ln (|\Sigma|) \\ &-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T} \Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu}) \end{aligned}$$

This gives

$$\begin{aligned}
0 &= -\frac{1}{2}(m-\mu)^{\top} \Sigma^{-1}(m-\mu) + D^{\prime}(m)^{\top}(\mu-m) + \frac{1}{2}(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m) \\
-D^{\prime}(m)^{\top}(\mu-m) &= (\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m) \\
-D^{\prime}(m)^{\top} &= (\mu-m)^{\top} D^{\prime \prime}(m) \\
-D^{\prime}(m)^{\top} D^{\prime \prime}(m)^{-1} &= (\mu-m)^{\top} \\
-D^{\prime \prime}(m)^{-\top} D^{\prime}(m) &= \mu-m \\
\mu &= m - D^{\prime \prime}(m)^{-\top} D^{\prime}(m)
\end{aligned}$$

Since $D^{\prime \prime}(m)=-\Sigma^{-1}$, and the paper assumes the covariance matrix to be diagonal (hence invertible), $\Sigma=\left[\begin{array}{ll}{\sigma^{2}} & {0} \\ {0} & {\sigma^{2}}\end{array}\right]$ (because the x and y directions are independent), we have $D^{\prime \prime}(m)=D^{\prime \prime}(m)^T$, so

$$\begin{aligned}
\mu &= m - D^{\prime \prime}(m)^{-\top} D^{\prime}(m) \\
\mu &= m - D^{\prime \prime}(m)^{-1} D^{\prime}(m)
\end{aligned}$$
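As a sanity check on the closed form $\mu = m - D^{\prime\prime}(m)^{-1} D^{\prime}(m)$, here is a minimal NumPy sketch. It assumes a synthetic, noise-free isotropic Gaussian heatmap and approximates $D^{\prime}(m)$ and $D^{\prime\prime}(m)$ by central finite differences on the log-heatmap. The function names `gaussian_heatmap` and `taylor_refine` are hypothetical and not taken from the paper's code.

```python
import numpy as np

def gaussian_heatmap(shape, mu, sigma):
    """Unnormalized isotropic Gaussian sampled on an integer pixel grid; mu is (x, y)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - mu[0]) ** 2 + (ys - mu[1]) ** 2) / (2 * sigma ** 2))

def taylor_refine(heatmap, eps=1e-10):
    """Refine the argmax m to mu = m - D''(m)^{-1} D'(m), computed on the log-heatmap."""
    logp = np.log(np.maximum(heatmap, eps))  # clamp to avoid log(0)
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Central differences; assumes the peak is not on the border.
    dx = 0.5 * (logp[y, x + 1] - logp[y, x - 1])
    dy = 0.5 * (logp[y + 1, x] - logp[y - 1, x])
    dxx = logp[y, x + 1] - 2 * logp[y, x] + logp[y, x - 1]
    dyy = logp[y + 1, x] - 2 * logp[y, x] + logp[y - 1, x]
    dxy = 0.25 * (logp[y + 1, x + 1] - logp[y + 1, x - 1]
                  - logp[y - 1, x + 1] + logp[y - 1, x - 1])
    grad = np.array([dx, dy])                  # D'(m)
    hess = np.array([[dxx, dxy], [dxy, dyy]])  # D''(m)
    m = np.array([x, y], dtype=float)
    # Solve D''(m) * delta = D'(m) instead of forming the inverse explicitly.
    return m - np.linalg.solve(hess, grad)

true_mu = (10.3, 7.6)
heatmap = gaussian_heatmap((16, 20), mu=true_mu, sigma=2.0)
refined = taylor_refine(heatmap)  # close to (10.3, 7.6)
```

Because the log of an exact Gaussian is quadratic, the finite differences are exact here and the true mean is recovered up to floating-point error. On a real predicted heatmap, which is only approximately Gaussian, the recovery would not be exact.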

One additional detail:

In the derivation above, cancelling $(\mu-m)$ in the third equality assumes $\mu \neq m$. The following derivation is more complete; its third line relies on $D^{\prime \prime}(m)=-\sigma^{-2} I$ being a scalar multiple of the identity, so that $D^{\prime \prime}(m)^{-1}$ can be applied to both sides:

$$\begin{aligned}
0 &= -\frac{1}{2}(m-\mu)^{\top} \Sigma^{-1}(m-\mu) + D^{\prime}(m)^{\top}(\mu-m) + \frac{1}{2}(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m) \\
-D^{\prime}(m)^{\top}(\mu-m) &= (\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m) \\
-D^{\prime}(m)^{\top} D^{\prime \prime}(m)^{-1}(\mu-m) &= (\mu-m)^{\top}(\mu-m) \\
0 &= \left[\mu-m+D^{\prime \prime}(m)^{-\top} D^{\prime}(m)\right]^{\top}(\mu-m) \\
0 &= \left[\mu-m+D^{\prime \prime}(m)^{-1} D^{\prime}(m)\right]^{\top}(\mu-m)
\end{aligned}$$

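This derivation rests on two assumptions:
(1) The values on the downsampled heatmap are assumed to follow a Gaussian distribution centered at the true keypoint location.
(2) The second-order Taylor expansion is used as an approximation.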

So $\mu=m-D^{\prime \prime}(m)^{-1} D^{\prime}(m)$ also covers the case $\mu=m$, because

$D^{\prime}(m)=0 \Leftrightarrow$ $m$ lies at the mean of the Gaussian distribution $\Leftrightarrow \mu=m$

Therefore $\mu=m-D^{\prime \prime}(m)^{-1} D^{\prime}(m)$ is complete.
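A quick consistency check, assuming the heatmap is exactly Gaussian so that $D^{\prime}(m)=-\Sigma^{-1}(m-\mu)$ (the derivative from equation (6), evaluated at $m$ instead of $\mu$) and $D^{\prime \prime}(m)=-\Sigma^{-1}$:

$$m-D^{\prime \prime}(m)^{-1} D^{\prime}(m)=m-(-\Sigma)\left(-\Sigma^{-1}(m-\mu)\right)=m-(m-\mu)=\mu$$

Under that assumption the identity holds for any $m$, including $m=\mu$, where the correction term vanishes.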

If anything here is wrong, please point it out~

The original authors have also posted an explanation of the formula: http://www.ilovepose.cn/t/99
