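I came across this [paper](https://arxiv.org/abs/1910.06278) on arXiv today, and I think it is a very interesting piece of work. It uses the maximum value on the heatmap and its location $\boldsymbol{m}$ to estimate the mean $\boldsymbol{\mu}$ of the true Gaussian distribution. In this way the quantization error (the error caused by the minimum resolvable unit after downsampling) is reduced as far as possible.
The paper's experiments show that this method is more accurate than the empirical estimate, which takes the position offset by 1/4 from the peak toward the second-highest peak (a heuristic that is itself fairly consistent with a Gaussian distribution).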
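To make the empirical baseline concrete, here is a minimal sketch in plain Python. It builds a synthetic Gaussian heatmap and applies a 1/4-pixel shift toward the larger neighbour on each axis. The function names, the grid size, the keypoint location `(10.3, 7.6)`, `sigma=2.0`, and the use of the neighbour-difference sign to pick the shift direction are all my own assumptions for illustration, not code from the paper.

```python
import math


def gaussian_heatmap(width, height, mu, sigma):
    """Sample an unnormalised 2D Gaussian on the integer pixel grid."""
    mx, my = mu
    return [[math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def argmax_2d(hm):
    """Return the (x, y) location of the largest heatmap value."""
    _, x, y = max((v, x, y) for y, row in enumerate(hm) for x, v in enumerate(row))
    return x, y


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def quarter_offset(hm, peak):
    """Shift the integer peak by 0.25 px toward the larger neighbour on each axis."""
    x, y = peak
    dx = hm[y][x + 1] - hm[y][x - 1]
    dy = hm[y + 1][x] - hm[y - 1][x]
    return x + 0.25 * sign(dx), y + 0.25 * sign(dy)


def distance(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


true_mu = (10.3, 7.6)                 # hypothetical sub-pixel keypoint
hm = gaussian_heatmap(20, 16, true_mu, sigma=2.0)
m = argmax_2d(hm)                     # integer peak location
est = quarter_offset(hm, m)           # empirical 1/4-offset estimate

err_peak = distance(m, true_mu)
err_quarter = distance(est, true_mu)
print(m, est, err_peak, err_quarter)
```

On this synthetic example the heuristic lands closer to the true mean than the raw integer peak does, but a residual sub-pixel error remains.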
Equation (6), the first derivative:
$$\left.\mathcal{D}^{\prime}(\boldsymbol{x})\right|_{\boldsymbol{x}=\boldsymbol{\mu}}=\left.\frac{\partial \mathcal{P}^{T}}{\partial \boldsymbol{x}}\right|_{\boldsymbol{x}=\boldsymbol{\mu}}=-\left.\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right|_{\boldsymbol{x}=\boldsymbol{\mu}}=0$$
So $\mathcal{D}^{\prime}(\boldsymbol{x})$ is a vector with the same shape as $\boldsymbol{x}$. However, consider the Taylor expansion at the vector $\boldsymbol{\mu}$ in Equation (7).
Equation (7), the second-order Taylor expansion at the Gaussian mean $\boldsymbol{\mu}$ around the location $\boldsymbol{m}$:
$$\mathcal{P}(\boldsymbol{\mu})=\mathcal{P}(\boldsymbol{m})+\mathcal{D}^{\prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})+\frac{1}{2}(\boldsymbol{\mu}-\boldsymbol{m})^{T} \mathcal{D}^{\prime \prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})$$
In the second term $\mathcal{D}^{\prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})$, shouldn't $\mathcal{D}^{\prime}(\boldsymbol{m})$ be transposed so that the term is a scalar? That is, $\mathcal{D}^{\prime}(\boldsymbol{m})^{T}(\boldsymbol{\mu}-\boldsymbol{m})$.
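A quick way to convince yourself that the transposed form is the sensible one is to evaluate the expansion numerically. The sketch below assumes an isotropic Gaussian with hypothetical values for `mu`, `m` and `sigma_inv`, and uses the analytic derivatives $D^{\prime}(m)=-\Sigma^{-1}(m-\mu)$ and $D^{\prime\prime}(m)=-\Sigma^{-1}$. The helper names are mine.

```python
def dot(a, b):
    """Inner product a^T b of two equally sized vectors; the result is a scalar."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(A, v):
    """Matrix-vector product A v."""
    return [dot(row, v) for row in A]


def log_gauss(x, mu, sigma_inv):
    """Log-Gaussian without the additive constants, which cancel in the derivation."""
    d = [xi - ui for xi, ui in zip(x, mu)]
    return -0.5 * dot(d, matvec(sigma_inv, d))


mu = [10.3, 7.6]                          # hypothetical true mean
m = [10.0, 8.0]                           # hypothetical integer peak
sigma_inv = [[0.25, 0.0], [0.0, 0.25]]    # inverse covariance for sigma = 2

m_minus_mu = [mi - ui for mi, ui in zip(m, mu)]
grad = [-g for g in matvec(sigma_inv, m_minus_mu)]   # D'(m) = -Sigma^{-1} (m - mu)
hess = [[-v for v in row] for row in sigma_inv]      # D''(m) = -Sigma^{-1}
step = [ui - mi for ui, mi in zip(mu, m)]            # mu - m

first_order = dot(grad, step)                        # D'(m)^T (mu - m), a scalar
second_order = 0.5 * dot(step, matvec(hess, step))   # 1/2 (mu - m)^T D''(m) (mu - m)
taylor = log_gauss(m, mu, sigma_inv) + first_order + second_order

print(first_order, second_order, taylor, log_gauss(mu, mu, sigma_inv))
```

Because the log of a Gaussian is exactly quadratic, the second-order expansion with the transposed gradient reproduces $\mathcal{P}(\mu)$ up to floating-point error in this example.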
The Taylor expansion:
$$\mathcal{P}(\boldsymbol{\mu})=\mathcal{P}(\boldsymbol{m})+\mathcal{D}^{\prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})+\frac{1}{2}(\boldsymbol{\mu}-\boldsymbol{m})^{T} \mathcal{D}^{\prime \prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})$$
Substitute the Gaussian expressions for $P(\mu)$ and $P(m)$, i.e. plug $\mu$ and $m$ into the expression below, and cancel the constant terms:
$$\begin{aligned} \mathcal{P}(\boldsymbol{x} ; \boldsymbol{\mu}, \Sigma)=\ln (\mathcal{G})=&-\ln (2 \pi)-\frac{1}{2} \ln (|\Sigma|) \\ &-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T} \Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu}) \end{aligned}$$
This gives the following, where the second line uses $D^{\prime \prime}(m)=-\Sigma^{-1}$ to merge the two quadratic terms:
$$\begin{aligned} 0&=-\frac{1}{2}(m-\mu)^{\top} \Sigma^{-1}(m-\mu) +D^{\prime}(m)^{\top}(\mu-m)+\frac{1}{2}(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m)\\ -D^{\prime}(m)^{\top}(\mu-m)&=(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m) \\ -D^{\prime}(m)^{\top} &=(\mu-m)^{\top} D^{\prime \prime}(m) \\ -D^{\prime}(m)^{\top} D^{\prime \prime}(m)^{-1} &=(\mu-m)^{\top} \\ -D^{\prime\prime}(m)^{-\top} D^{\prime}(m) &=\mu-m \\ \mu &=m-D^{\prime \prime}(m)^{-\top} D^{\prime}(m) \end{aligned}$$
Since $D^{\prime \prime}(m)=-\Sigma^{-1}$, and the paper assumes that the covariance matrix is diagonal (and therefore invertible), $\Sigma=\left[\begin{array}{ll}{\sigma^{2}} & {0} \\ {0} & {\sigma^{2}}\end{array}\right]$ (because the x and y directions are independent), we have $D^{\prime \prime}(m)=D^{\prime \prime}(m)^T$, so
$$\begin{aligned}\mu &=m-D^{\prime \prime}(m)^{-\top} D^{\prime}(m) \\ \mu&=m-D^{\prime \prime}(m)^{-1} D^{\prime}(m)\end{aligned}$$
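As a sanity check on the final formula, the sketch below plugs the analytic derivatives of the log-Gaussian into $\mu=m-D^{\prime\prime}(m)^{-1}D^{\prime}(m)$ and confirms that the true mean comes back. I deliberately use a hypothetical non-diagonal (but symmetric) covariance to illustrate that symmetry alone is enough for $D^{\prime\prime}(m)^{-\top}=D^{\prime\prime}(m)^{-1}$. The helper names and all numeric values are assumptions for illustration.

```python
def inv2(A):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in A]


def refine(m, grad, hess):
    """Apply mu = m - D''(m)^{-1} D'(m)."""
    step = matvec(inv2(hess), grad)
    return [mi - si for mi, si in zip(m, step)]


mu_true = [10.3, 7.6]                  # hypothetical true mean
m = [10.0, 8.0]                        # hypothetical integer peak
sigma = [[4.0, 1.0], [1.0, 3.0]]       # hypothetical symmetric covariance
sigma_inv = inv2(sigma)

m_minus_mu = [mi - ui for mi, ui in zip(m, mu_true)]
grad = [-g for g in matvec(sigma_inv, m_minus_mu)]   # D'(m) = -Sigma^{-1} (m - mu)
hess = [[-v for v in row] for row in sigma_inv]      # D''(m) = -Sigma^{-1}

mu_est = refine(m, grad, hess)
print(mu_est)
```

With exact derivatives the update cancels the offset $m-\mu$ completely, so any remaining difference is floating-point rounding.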
One additional detail:
In the derivation above, cancelling $(\mu-m)$ in the third equality assumes that $\mu$ is not equal to $m$,
so the following is a more complete derivation (its third line relies on $D^{\prime \prime}(m)=-\sigma^{-2} I$ being a scalar multiple of the identity, which holds for the $\Sigma$ assumed in the paper):
$$\begin{aligned} 0&=-\frac{1}{2}(m-\mu)^{\top} \Sigma^{-1}(m-\mu) +D^{\prime}(m)^{\top}(\mu-m)+\frac{1}{2}(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m)\\ -D^{\prime}(m)^{\top}(\mu-m)&=(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m) \\ -D^{\prime}(m)^{\top} D^{\prime \prime}(m)^{-1} (\mu-m)&=(\mu-m)^{\top}(\mu-m) \\ 0 &=\left[\mu-m+D^{\prime \prime}(m)^{-\top} D^{\prime}(m)\right]^{\top}(\mu-m) \\ 0 &=\left[\mu-m+D^{\prime \prime}(m)^{-1} D^{\prime}(m)\right]^{\top}(\mu-m) \end{aligned}$$
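This derivation rests on two assumptions:
(1) The values on the downsampled heatmap are assumed to follow a Gaussian distribution centered on the true keypoint location.
(2) The second-order Taylor expansion is used as an approximation.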
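In practice only the discrete heatmap is available, so the derivatives have to be estimated from it. The sketch below is one possible way to do that under the two assumptions above: it takes the log of the heatmap and uses central finite differences around the integer peak. The finite-difference stencil, the function names, and the synthetic test heatmap are my own choices for illustration and are not taken from the paper's code. It also assumes strictly positive heatmap values; a real predicted heatmap would need extra handling that is not shown here.

```python
import math


def gaussian_heatmap(width, height, mu, sigma):
    """Sample an unnormalised 2D Gaussian on the integer pixel grid."""
    mx, my = mu
    return [[math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def refine_peak(hm, peak):
    """Estimate the sub-pixel mean via mu = m - D''(m)^{-1} D'(m).

    D' and D'' are approximated with central differences of log(hm)
    around the integer peak, which must not lie on the border.
    """
    x, y = peak

    def log_at(xx, yy):
        return math.log(hm[yy][xx])

    # First derivatives D'(m).
    dx = 0.5 * (log_at(x + 1, y) - log_at(x - 1, y))
    dy = 0.5 * (log_at(x, y + 1) - log_at(x, y - 1))
    # Second derivatives D''(m), a symmetric 2x2 Hessian.
    dxx = log_at(x + 1, y) - 2 * log_at(x, y) + log_at(x - 1, y)
    dyy = log_at(x, y + 1) - 2 * log_at(x, y) + log_at(x, y - 1)
    dxy = 0.25 * (log_at(x + 1, y + 1) - log_at(x + 1, y - 1)
                  - log_at(x - 1, y + 1) + log_at(x - 1, y - 1))

    det = dxx * dyy - dxy * dxy
    if det == 0:
        return float(x), float(y)      # degenerate Hessian: keep the integer peak

    # offset = -H^{-1} g, written out for the 2x2 case.
    off_x = -(dyy * dx - dxy * dy) / det
    off_y = -(dxx * dy - dxy * dx) / det
    return x + off_x, y + off_y


true_mu = (10.3, 7.6)                  # hypothetical sub-pixel keypoint
hm = gaussian_heatmap(20, 16, true_mu, sigma=2.0)
m = (10, 8)                            # integer peak of this synthetic heatmap
refined = refine_peak(hm, m)
print(refined)
```

On an ideal Gaussian heatmap the log is exactly quadratic, so the central differences are exact and the refined location matches the true mean up to rounding. On a real network output both assumptions hold only approximately, so the result would be an estimate rather than an exact recovery.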
Then $\mu =m-D^{\prime \prime}(m)^{-1} D^{\prime}(m)$ also covers the case $\mu=m$, because
$$D^{\prime}(m)=0 \Leftrightarrow m \text{ lies at the mean of the Gaussian} \Leftrightarrow \mu=m$$
So $\mu =m-D^{\prime \prime}(m)^{-1} D^{\prime}(m)$ is complete.
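The $\mu=m$ case is easy to see in one dimension. In the sketch below (a 1D analogue with hypothetical values and finite-difference derivatives of my own choosing), a Gaussian centered exactly on a grid point has symmetric neighbours, so the estimated first derivative vanishes and the update leaves the peak where it is.

```python
import math


def refine_1d(values, m):
    """1D version of mu = m - D'(m) / D''(m) using differences of the log-values."""
    log_v = [math.log(v) for v in values]
    d1 = 0.5 * (log_v[m + 1] - log_v[m - 1])          # D'(m)
    d2 = log_v[m + 1] - 2 * log_v[m] + log_v[m - 1]   # D''(m)
    return m - d1 / d2


sigma = 2.0
on_grid = [math.exp(-(x - 5.0) ** 2 / (2 * sigma ** 2)) for x in range(11)]
off_grid = [math.exp(-(x - 5.3) ** 2 / (2 * sigma ** 2)) for x in range(11)]

m_on = on_grid.index(max(on_grid))      # peak when the mean sits on the grid
m_off = off_grid.index(max(off_grid))   # peak when the mean is off the grid

print(refine_1d(on_grid, m_on), refine_1d(off_grid, m_off))
```

The same function handles both situations, which matches the claim that the closed-form update does not need a separate branch for $\mu=m$.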
If I have made any mistakes, please point them out~
The original author has also given an explanation of the formula: http://www.ilovepose.cn/t/99