Paper download
GitHub
bib:
@INPROCEEDINGS{DaiLi2023Semi,
title = {Semi-Supervised Deep Regression with Uncertainty Consistency and Variational Model Ensembling via Bayesian Neural Networks},
author = {Weihang Dai and Xiaomeng Li and Kwang-Ting Cheng},
booktitle = {AAAI},
year = {2023},
pages = {1--10}
}
Deep regression is an important problem with numerous applications.
These range from computer vision tasks such as age estimation from photographs, to medical tasks such as ejection fraction estimation from echocardiograms for disease tracking.
This introduces the application scenarios of semi-supervised regression, namely age estimation and medical tasks, and the later experiments follow the same setup.
Semi-supervised approaches for deep regression are notably under-explored compared to classification and segmentation tasks, however.
Very true. Semi-supervised learning is on the rise, and semi-supervised regression reigns.
Unlike classification tasks, which rely on thresholding functions for generating class pseudo-labels, regression tasks use real number target predictions directly as pseudo-labels, making them more sensitive to prediction quality.
In semi-supervised classification, pseudo-labels are obtained by thresholding, whereas in regression the pseudo-label is a real number, so the requirements on pseudo-label quality are much stricter.
In this work, we propose a novel approach to semi-supervised regression, namely Uncertainty-Consistent Variational Model Ensembling (UCVME), which improves training by generating high-quality pseudo-labels and uncertainty estimates for heteroscedastic regression.
This passage contains many key terms:
Given that aleatoric uncertainty is only dependent on input data by definition and should be equal for the same inputs, we present a novel uncertainty consistency loss for co-trained models.
aleatoric uncertainty: the randomness inherent in the data itself, independent of the model.
Our consistency loss significantly improves uncertainty estimates and allows higher quality pseudo-labels to be assigned greater importance under heteroscedastic regression.
Furthermore, we introduce a novel variational model ensembling approach to reduce prediction noise and generate more robust pseudo-labels.
Simply put, pseudo-labels are obtained by averaging the predictions of two models, i.e., ensembling.
We analytically show our method generates higher quality targets for unlabeled data and further improves training.
Experiments show that our method outperforms state-of-the-art alternatives on different tasks and can be competitive with supervised methods that use full labels.
Since this is only one part of the method, I will not describe it in detail here; the point is just to get a rough idea of what it is and what it can do.
Bayesian Neural Networks
As the name suggests, this is still a kind of neural network, and can be understood as a variant of one. The biggest difference is that in an ordinary neural network each parameter is a constant with a fixed value, so the output is also deterministic. In a BNN the parameters are instead random variables, and the output is therefore also a random variable. A natural question arises: if everything is a random variable, how can the forward pass even be computed? There is no need to worry. In practice we assume the variables follow some distribution to simplify the problem, with the normal distribution being the most common choice.
Concretely, on a cats-vs-dogs dataset, for a picture of a cat an ordinary neural network outputs [0.8, 0.2], meaning probability 0.8 for cat and 0.2 for dog. A BNN instead outputs distributions, e.g. $[\mathcal{N}(0.7, 0.1^2),\ \mathcal{N}(0.2, 0.01^2)]$, where the variance expresses the uncertainty of the prediction.
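To make this concrete, here is a minimal PyTorch sketch (my own illustration, not code from the paper) of Monte Carlo dropout: dropout is kept active at inference, several stochastic forward passes are sampled, and the sample mean and variance serve as the prediction and an uncertainty estimate. All names (`MCDropoutRegressor`, `mc_dropout_predict`, `T`) are made up for this example.

```python
import torch
import torch.nn as nn

class MCDropoutRegressor(nn.Module):
    """Tiny regressor that stays stochastic at test time via dropout."""
    def __init__(self, in_dim: int, hidden: int = 64, p: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Dropout(p),                      # kept active during MC sampling
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, T: int = 20):
    """Run T stochastic forward passes; return predictive mean and variance."""
    model.train()                               # keep dropout ON (MC dropout)
    samples = torch.stack([model(x) for _ in range(T)], dim=0)  # (T, B, 1)
    return samples.mean(dim=0), samples.var(dim=0)

# usage: mean ~ prediction, var ~ (epistemic) uncertainty from the sample spread
model = MCDropoutRegressor(in_dim=16)
x = torch.randn(8, 16)
y_mean, y_var = mc_dropout_predict(model, x, T=20)
```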
| Symbol | Meaning |
|---|---|
| $D := \{(x_i, y_i)\}_{i=1}^{N}$ | labeled data |
| $D' = \{x'_{i'}\}_{i'=1}^{N'}$ | unlabeled data |
| $f_m,\ m \in \{a, b\}$ | two BNNs using Monte Carlo dropout |
| $\hat{y}_{i,m}$ | prediction of model $f_m$ for target label $y_i$ |
| $\hat{z}_{i,m}$ | log-uncertainty prediction $\log(\sigma^2)$ of model $f_m$ for target label $y_i$ |
We write $\sigma^2$ for the aleatoric uncertainty, but in practice the network predicts the log-uncertainty $\log(\sigma^2)$, which is commonly done to avoid predicting negative values for the variance.
If I force an interpretation, predicting the log also compresses the value range of the uncertainty, somewhat like normalization; with a smaller range, the prediction tends to be a bit more accurate.
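As a hypothetical sketch of this trick (the class name and layer sizes are my own choices, not from the paper): the network has two heads, one for the target value $\hat{y}$ and one for the log-uncertainty $\hat{z} = \log(\sigma^2)$; recovering the variance via $\sigma^2 = \exp(\hat{z})$ guarantees positivity by construction.

```python
import torch
import torch.nn as nn

class HeteroscedasticHead(nn.Module):
    """Predicts a target value y_hat and a log-variance z_hat = log(sigma^2)."""
    def __init__(self, in_dim: int, hidden: int = 64, p: float = 0.5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p))
        self.y_head = nn.Linear(hidden, 1)   # regression target
        self.z_head = nn.Linear(hidden, 1)   # log-uncertainty (unconstrained)

    def forward(self, x):
        h = self.backbone(x)
        y_hat = self.y_head(h)
        z_hat = self.z_head(h)               # sigma^2 = exp(z_hat) > 0 always
        return y_hat, z_hat
```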
UCVME is based on two novel ideas: enforcing aleatoric uncertainty consistency to improve uncertainty-based loss weighting, and variational model ensembling for generating high-quality pseudo-labels.
Novel ideas:
Both serve the same goal: generating high-quality pseudo-labels.
heteroscedastic regression loss:
$$
\mathcal{L}_{reg} = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{(y_i-\hat{y}_i)^2}{2\sigma_i^2}+\frac{\ln(\sigma_i^2)}{2}\right]\tag{1}
$$
It is worth noting that this loss formulation comes from existing work.[1][2]
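A direct translation of Eq. (1) into PyTorch might look as follows, assuming the network predicts the log-variance $\hat{z} = \log(\sigma^2)$ as described above (a sketch, not the authors' implementation):

```python
import torch

def heteroscedastic_loss(y, y_hat, z_hat):
    """Eq. (1): mean over samples of (y - y_hat)^2 / (2 sigma^2) + ln(sigma^2) / 2,
    with sigma^2 parameterized as exp(z_hat) so the variance stays positive."""
    sigma2 = torch.exp(z_hat)
    return ((y - y_hat) ** 2 / (2.0 * sigma2) + z_hat / 2.0).mean()
```

Large errors on samples the model deems certain (small $\sigma^2$) are penalized heavily, while the $\ln(\sigma^2)/2$ term keeps the model from inflating the uncertainty everywhere.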
Maximum likelihood:
$$
\begin{aligned}
\max_\theta \log p(y \mid x, \theta)
&= \max_\theta \sum_{i=1}^N \log p\left(y_i \mid \hat{y}_i(x_i, \theta),\ \sigma_i^2(x_i, \theta)\right) \\
&= \max_\theta \sum_{i=1}^N \log \mathcal{N}\left(\hat{y}_i, \sigma_i^2\right) \\
&= \max_\theta \sum_{i=1}^N \log \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{\left\|y_i-\hat{y}_i\right\|^2}{2\sigma_i^2}\right) \\
&= \max_\theta \sum_{i=1}^N \left\{-\frac{\left\|y_i-\hat{y}_i\right\|^2}{2\sigma_i^2}-\frac{\log \sigma_i^2}{2}-\frac{\log 2\pi}{2}\right\}
\end{aligned}
$$
labeled inputs: the heteroscedastic regression loss (Eq. 1) is applied with the ground-truth labels, together with an uncertainty consistency loss between the two co-trained models.
unlabeled inputs: the same losses are applied, but using pseudo-labels generated by variational model ensembling.
The effectiveness is shown from the perspective of the bias-variance decomposition, playing a role analogous to the thresholding function used for smoothing in classification.
Total Loss:
$$
\mathcal{L} = \mathcal{L}_{reg}^{lb} + \mathcal{L}_{unc}^{lb} + \omega_{ulb}\left(\mathcal{L}_{reg}^{ulb}+\mathcal{L}_{unc}^{ulb}\right)
$$
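Below is a schematic sketch of how these pieces could fit together in one training step: co-training two two-head MC-dropout models, averaging their sampled predictions to form pseudo-labels (variational model ensembling), and penalizing disagreement in the predicted log-uncertainty (one plausible reading of the uncertainty consistency loss). The helper names, the exact consistency terms, and the number of samples `T` are my assumptions, not the authors' released code; `heteroscedastic_loss` is the function sketched earlier.

```python
import torch

def mc_samples(model, x, T=5):
    """T stochastic forward passes of a two-head MC-dropout model -> stacked (y_hat, z_hat)."""
    model.train()                                     # keep dropout active
    ys, zs = zip(*[model(x) for _ in range(T)])
    return torch.stack(ys), torch.stack(zs)           # each (T, B, 1)

def ucvme_step(model_a, model_b, x_lb, y_lb, x_ulb, w_ulb=1.0, T=5):
    """One schematic UCVME-style loss computation (hypothetical helper)."""
    # --- labeled part: heteroscedastic regression + uncertainty consistency ---
    y_a, z_a = model_a(x_lb)
    y_b, z_b = model_b(x_lb)
    loss_reg_lb = 0.5 * (heteroscedastic_loss(y_lb, y_a, z_a)
                         + heteroscedastic_loss(y_lb, y_b, z_b))
    # aleatoric uncertainty depends only on the input, so both models should agree on it
    loss_unc_lb = ((z_a - z_b) ** 2).mean()

    # --- unlabeled part: variational model ensembling for pseudo-labels ---
    with torch.no_grad():
        ys_a, zs_a = mc_samples(model_a, x_ulb, T)
        ys_b, zs_b = mc_samples(model_b, x_ulb, T)
        y_pseudo = torch.cat([ys_a, ys_b]).mean(0)     # average over samples and both models
        z_pseudo = torch.cat([zs_a, zs_b]).mean(0)
    y_a_u, z_a_u = model_a(x_ulb)
    y_b_u, z_b_u = model_b(x_ulb)
    loss_reg_ulb = 0.5 * (heteroscedastic_loss(y_pseudo, y_a_u, z_a_u)
                          + heteroscedastic_loss(y_pseudo, y_b_u, z_b_u))
    loss_unc_ulb = 0.5 * (((z_a_u - z_pseudo) ** 2).mean()
                          + ((z_b_u - z_pseudo) ** 2).mean())

    return loss_reg_lb + loss_unc_lb + w_ulb * (loss_reg_ulb + loss_unc_ulb)
```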
[1] Kendall, A. and Gal, Y. What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems, 30, 2017.
[2] https://zhuanlan.zhihu.com/p/568912284