Paper Notes: Monocular Depth Estimation through Virtual-world Supervision and Real-world SfM Self-Supervision

Problem Formulation

  • MDE with a CNN: $\Psi(\theta; x) \rightarrow d$, mapping an RGB image $x$ to a dense depth map $d$.
  • Training objective: $\theta^* = \arg\min_{\theta} \mathcal{L}(\theta; X^r, X^s, Y^s)$ (a minimal sketch of this setup follows the list).
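
As a rough illustration of this formulation, here is a minimal PyTorch-style sketch; `model`, `compute_loss`, and the two loaders are hypothetical placeholders (not the authors' code), and the individual loss terms are detailed in the Methods section below.

```python
import torch

def train(model, compute_loss, real_loader, virtual_loader, optimizer, epochs=20):
    """theta* = argmin_theta L(theta; X^r, X^s, Y^s): jointly iterate over
    unlabeled real-world frames and depth-labeled virtual-world frames."""
    for _ in range(epochs):
        for x_r, (x_s, y_s) in zip(real_loader, virtual_loader):
            d_r = model(x_r)    # Psi(theta; x) -> d on a real-world image
            d_s = model(x_s)    # Psi(theta; x) -> d on a virtual-world image
            # L(theta; X^r, X^s, Y^s); the real-world term is self-supervised,
            # so in practice it also needs neighbouring frames (omitted here).
            loss = compute_loss(d_r, d_s, y_s)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```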

Contributions

  • A CNN architecture trained with virtual-world supervision and real-world SfM self-supervision.
  • Reducing the domain discrepancy between the supervised (virtual-world) and self-supervised (real-world) data in the space of the extracted features (backbone bottleneck) via a gradient reversal layer (GRL).

Methods

  • Assume two sources of data: (1) real-world traffic sequences $X^r = \{x^r_t\}_{t=1}^{N^r}$, where $N^r$ is the number of real-world frames, and (2) analogous virtual-world sequences $X^s = \{x^s_t\}_{t=1}^{N^s}$ with ground-truth depth $Y^s$, where $N^s$ is the number of virtual-world frames (see the data-loading sketch below).
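
A self-contained sketch of how these two data sources could be represented; the dataset classes and dummy tensors below are illustrative assumptions, not the paper's actual data pipeline.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RealSeqDataset(Dataset):
    """X^r: consecutive real-world frames, no depth labels (SfM self-supervision)."""
    def __init__(self, num_frames=10):
        self.frames = torch.rand(num_frames, 3, 192, 640)  # dummy RGB frames
    def __len__(self):
        return len(self.frames) - 2
    def __getitem__(self, t):
        # target frame plus its temporal neighbours, as needed by the SfM loss
        return self.frames[t + 1], self.frames[t], self.frames[t + 2]

class VirtualSeqDataset(Dataset):
    """X^s with Y^s: virtual-world frames paired with dense ground-truth depth."""
    def __init__(self, num_frames=10):
        self.frames = torch.rand(num_frames, 3, 192, 640)         # dummy RGB frames
        self.depth = torch.rand(num_frames, 1, 192, 640) * 80.0   # dummy depth in metres
    def __len__(self):
        return len(self.frames)
    def __getitem__(self, t):
        return self.frames[t], self.depth[t]

real_loader = DataLoader(RealSeqDataset(), batch_size=4, shuffle=True)
virtual_loader = DataLoader(VirtualSeqDataset(), batch_size=4, shuffle=True)
```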

MonoDEVSNet architecture: $\Psi(\theta; x)$

  • $\Psi(\theta; x)$ has three blocks: an encoding block with weights $\theta^{enc}$, a multi-scale pyramidal block with weights $\theta^{pyr}$, and a decoding block with weights $\theta^{dec}$.
  • The role of the multi-scale pyramidal block is to adapt the bottleneck of the chosen encoder to the decoder.
  • $\mathcal{L}$ relies on three different losses: $\mathcal{L}^{sf}(\theta, \mathcal{V}^{sf}; X^r)$, $\mathcal{L}^{sp}(\theta, X^{s}; Y^s)$ and $\mathcal{L}^{DA}(\theta^{enc}, \mathcal{V}^{DA}; X^r, X^s)$ (a combined training-step sketch is given at the end of this section).
  • $\mathcal{L}^{sf}(\theta, \mathcal{V}^{sf}; X^r)$ is the SfM self-supervised loss; it is essentially the same as in Monodepth2 (photometric reprojection plus smoothness), with $\mathcal{V}^{sf}$ the auxiliary weights it requires (e.g., the pose network).
  • $\mathcal{L}^{sp}(\theta, X^{s}; Y^s)$ is the supervised loss on virtual-world data; it discards pixels with $d^s_t(p) \geq d^{max}$.
  • Domain adaptation loss $\mathcal{L}^{DA}(\theta^{enc}, \mathcal{V}^{DA}; X^r, X^s)$.
  • The aim is to learn domain-invariant depth features, i.e., features from which one cannot tell whether the input image comes from the real world (target domain) or the virtual world (source domain).
  • Following the gradient reversal layer (GRL) approach, the domain invariance of $\theta^{enc}$ is measured by a binary target/source domain-classifier CNN, $D$, of weights $\{\theta^{enc}, \mathcal{V}^{DA}\}$.


  • $D(\theta^{enc}, \mathcal{V}^{DA}; x)$ is trained to output 1 if $x \in X^r$ and 0 if $x \in X^s$. The GRL sits between the encoder and $D$: during the forward pass it acts as an identity function, while during back-propagation it reverses the gradient vector passing through it. Both the GRL and $\mathcal{V}^{DA}$ are required at training time, but not at testing time.
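
Putting the three losses and the GRL together, a hedged sketch of one training step follows. Here `encoder`, `pyramid`, `decoder`, `domain_classifier`, `sfm_loss`, and `grad_reverse` are hypothetical stand-ins for the paper's modules (a sketch of `grad_reverse` itself appears in the GRL discussion below), the supervised term uses a plain L1 as a placeholder, and the equal loss weighting is illustrative rather than the authors' choice.

```python
import torch
import torch.nn.functional as F

def train_step(encoder, pyramid, decoder, domain_classifier, sfm_loss, grad_reverse,
               optimizer, real_batch, virtual_batch, d_max=80.0, lam=1.0):
    x_r, x_prev, x_next = real_batch   # real target frame and its temporal neighbours
    x_s, y_s = virtual_batch           # virtual frame and its ground-truth depth

    # Shared network Psi: encoder bottleneck -> multi-scale pyramid -> decoder.
    f_r = encoder(x_r)
    f_s = encoder(x_s)
    d_r = decoder(pyramid(f_r))
    d_s = decoder(pyramid(f_s))

    # L^sf: Monodepth2-style photometric self-supervision on real frames
    # (pose estimation and view synthesis are hidden inside sfm_loss for brevity).
    loss_sf = sfm_loss(d_r, x_r, x_prev, x_next)

    # L^sp: supervised loss on virtual data, discarding pixels with depth >= d_max.
    valid = y_s < d_max
    loss_sp = F.l1_loss(d_s[valid], y_s[valid])

    # L^DA: domain classifier on GRL-reversed bottleneck features
    # (domain labels: 1 for real / target, 0 for virtual / source).
    feats = grad_reverse(torch.cat([f_r, f_s], dim=0), lam)
    logits = domain_classifier(feats).squeeze(1)
    labels = torch.cat([torch.ones(f_r.size(0)), torch.zeros(f_s.size(0))])
    loss_da = F.binary_cross_entropy_with_logits(logits, labels)

    loss = loss_sf + loss_sp + loss_da   # illustrative equal weighting
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```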

Unsupervised Domain Adaptation by Backpropagation [1]


  • At training time, in order to obtain domain-invariant features, we seek the parameters θ f θ_f θf of the feature mapping that maximize the loss of the domain classifier (by making the two feature distributions as similar as possible), while simultaneously seeking the parameters θ d θ_d θd of the domain classifier that minimize the loss of the domain classifier. In addition, we seek to minimize the loss of the label predictor.
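
Written out, the adversarial objective of [1] is the following saddle point, with feature extractor $G_f$, label predictor $G_y$, domain classifier $G_d$, domain label $d_i \in \{0,1\}$ and trade-off $\lambda$ (in MonoDEVSNet the role of the label predictor is played by the depth losses):

$$E(\theta_f, \theta_y, \theta_d) = \sum_{i:\, d_i = 0} L_y\big(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\big) \;-\; \lambda \sum_{i} L_d\big(G_d(G_f(x_i; \theta_f); \theta_d),\, d_i\big)$$

$$(\hat\theta_f, \hat\theta_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat\theta_d), \qquad \hat\theta_d = \arg\max_{\theta_d} E(\hat\theta_f, \hat\theta_y, \theta_d)$$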


  • This saddle-point optimization can be implemented with standard backpropagation by introducing a special gradient reversal layer (GRL), defined as follows. The GRL has no parameters associated with it (apart from the meta-parameter λ, which is not updated by backpropagation). During forward propagation, the GRL acts as an identity transform. During backpropagation, though, the GRL takes the gradient from the subsequent level, multiplies it by −λ and passes it to the preceding layer.
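
A minimal PyTorch sketch of such a GRL, i.e., a standard way to implement the behaviour described above (not necessarily the authors' exact code):

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda on the way back."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam            # lam is a meta-parameter, not learned
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back to the preceding layer;
        # return None for lam, since it receives no gradient.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradientReversal.apply(x, lam)
```

In [1], $\lambda$ is not kept fixed but ramped up from 0 to 1 over training ($\lambda_p = \frac{2}{1 + e^{-10p}} - 1$, with $p$ the training progress), so that the domain loss does not dominate the early, noisy stage of training.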

Important Reference

[1] Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in Int. Conf. on Machine Learning (ICML), 2015.
