F3Net is a novel salient object detection (SOD) framework that achieves remarkable performance in producing high-quality saliency maps.
F3Net consists of three parts: CFM, CFD, and PPA.
CFM (cross feature module):
CFM is designed to fuse features from different levels by element-wise multiplication, which mitigates the discrepancy between features.
1. CFM does not use the traditional addition or concatenation; instead it adopts a selective fusion strategy. In this strategy, redundant information is suppressed to avoid contamination between features, and important features complement each other. CFM is also able to remove background noise and sharpen boundaries.
2. CFM cannot solve the situation in which high-level features suffer from information loss and distortion due to downsampling.
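A minimal sketch of the selective fusion idea, with plain Python lists standing in for feature maps. It assumes both features already have the same resolution and are non-negative, and it leaves out the convolutions of the real module. The function name `cross_feature_fuse` and the step of adding the shared part back to each input are assumptions made for illustration, not a copy of the authors' code.

```python
def cross_feature_fuse(low, high):
    """Fuse two same-sized feature maps given as lists of rows of floats."""
    # Element-wise product: large only where BOTH features respond,
    # so responses present in just one feature are suppressed.
    common = [[l * h for l, h in zip(lr, hr)] for lr, hr in zip(low, high)]
    # Assumption: the shared part is added back to each input so that
    # the two features reinforce each other.
    low_out = [[l + c for l, c in zip(lr, cr)] for lr, cr in zip(low, common)]
    high_out = [[h + c for h, c in zip(hr, cr)] for hr, cr in zip(high, common)]
    return low_out, high_out


low = [[0.9, 0.8], [0.0, 0.7]]   # detailed, but with a noisy response at (0, 1)
high = [[1.0, 0.0], [0.0, 1.0]]  # coarse, but clean
low_out, high_out = cross_feature_fuse(low, high)
```

In this toy input the noisy response at (0, 1) gets no boost because the high-level feature is zero there, while the positions where both features agree are strengthened.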
CFD (cascaded feedback decoder):
CFD contains multiple sub-decoders, each of which contains both a bottom-up and a top-down process. In the bottom-up process, multi-level features are gradually aggregated by CFM. In the top-down process, the aggregated features are fed back into the previous features to refine them.
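The two passes of a sub-decoder can be sketched with toy vectors. This is only an illustration under several assumptions: all features have the same length (no resizing), aggregation starts from the highest-level feature and folds in lower levels one by one, feedback is a plain addition, and `fuse` is a hypothetical stand-in for CFM. No normalization is applied, so values grow from one sub-decoder to the next.

```python
def fuse(a, b):
    """Hypothetical stand-in for CFM: product of a and b added back to a."""
    return [x + x * y for x, y in zip(a, b)]


def sub_decoder(features):
    """features: same-length vectors ordered from low level to high level."""
    # Aggregation pass: start at the highest level, fold in lower levels.
    aggregated = features[-1]
    for feat in reversed(features[:-1]):
        aggregated = fuse(feat, aggregated)
    # Feedback pass: add the aggregated feature to every input feature.
    refined = [[f + a for f, a in zip(feat, aggregated)] for feat in features]
    return refined, aggregated


def cascaded_feedback_decoder(features, num_sub_decoders=2):
    """Chain sub-decoders (at least one); each refines the previous features."""
    for _ in range(num_sub_decoders):
        features, aggregated = sub_decoder(features)
    return aggregated


features = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
refined, aggregated = sub_decoder(features)
final = cascaded_feedback_decoder(features, num_sub_decoders=2)
```

The refined features returned by one sub-decoder become the input of the next, which is what makes the decoder cascaded.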
PPA (pixel position aware loss):
Pixels located at boundaries or in elongated areas are harder to predict and more discriminative. Paying more attention to these hard pixels can further enhance model generalization. PPA loss extends binary cross entropy by assigning different weights to different pixels. The weight of each pixel is determined by its surrounding pixels: hard pixels get larger weights and easy pixels get smaller ones.
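Drawbacks of BCE loss:
1. It computes the loss of each pixel independently and ignores the global structure of the image.
2. In images where the background dominates, the loss of the foreground pixels is diluted.
3. BCE loss treats each pixel equally. In fact, pixels located in cluttered or elongated areas are prone to wrong predictions and deserve more attention.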
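The dilution effect in point 2 can be checked numerically. The sketch below is a plain mean BCE over flat lists; the pixel counts and the prediction value 0.01 are made-up numbers chosen only for illustration.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, g in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(g * math.log(p) + (1 - g) * math.log(1.0 - p))
    return total / len(pred)


# A model that predicts "background" everywhere and misses every object pixel.
all_background = [0.01] * 100

small_object = [1] * 4 + [0] * 96   # 4% foreground
large_object = [1] * 50 + [0] * 50  # 50% foreground

loss_small = bce(all_background, small_object)  # about 0.19
loss_large = bce(all_background, large_object)  # about 2.31
```

The prediction is equally useless in both cases, yet the loss for the small object is more than ten times lower, because the few foreground errors are averaged together with many easy background pixels.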
α ∈ [0, 1]: the larger its value, the more pixel (i, j) differs from its surrounding pixels.
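One way to obtain such a weight, sketched under assumptions: take the absolute difference between a pixel's ground-truth label and the mean label in a square window around it, then use 1 + gamma * alpha as the per-pixel weight in BCE. The window `radius`, the factor `gamma`, the border handling (only in-bounds pixels are averaged), and the normalization by the sum of weights are illustrative choices here, not values taken from these notes.

```python
import math


def position_weights(mask, radius=1):
    """alpha[i][j] = |mean label in the window around (i, j) - mask[i][j]|."""
    h, w = len(mask), len(mask[0])
    alpha = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [mask[m][n]
                      for m in range(max(0, i - radius), min(h, i + radius + 1))
                      for n in range(max(0, j - radius), min(w, j + radius + 1))]
            alpha[i][j] = abs(sum(window) / len(window) - mask[i][j])
    return alpha


def weighted_bce(pred, mask, gamma=5.0, radius=1, eps=1e-7):
    """BCE where each pixel is weighted by 1 + gamma * alpha."""
    alpha = position_weights(mask, radius)
    num = den = 0.0
    for i, row in enumerate(mask):
        for j, g in enumerate(row):
            p = min(max(pred[i][j], eps), 1.0 - eps)  # clamp to avoid log(0)
            pixel_loss = -(g * math.log(p) + (1 - g) * math.log(1.0 - p))
            weight = 1.0 + gamma * alpha[i][j]
            num += weight * pixel_loss
            den += weight
    return num / den


# A single foreground pixel in the centre of a 5x5 background.
mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1
alpha = position_weights(mask)       # alpha[2][2] is 8/9, alpha[0][0] is 0

missed = [[0.01] * 5 for _ in range(5)]  # the model misses the centre pixel
plain = weighted_bce(missed, mask, gamma=0.0)
weighted = weighted_bce(missed, mask, gamma=5.0)
```

With `gamma=0` every weight is 1 and the function reduces to ordinary mean BCE; with a positive `gamma` the missed centre pixel, which differs strongly from its neighbourhood, dominates the loss.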
wIoU loss:
IoU loss still treats every pixel equally and does not account for the differences between pixels, whereas wIoU loss pays more attention to hard pixels.
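A minimal sketch of a weighted IoU loss, assuming the per-pixel weights are supplied by the caller. The soft intersection and union are computed from predicted probabilities; the `smooth` term that keeps the ratio defined for empty masks is an added assumption.

```python
def weighted_iou_loss(pred, mask, weights, smooth=1.0):
    """1 - weighted soft IoU; all arguments are same-sized lists of rows."""
    inter = union = 0.0
    for p_row, g_row, w_row in zip(pred, mask, weights):
        for p, g, w in zip(p_row, g_row, w_row):
            inter += w * p * g            # weighted soft intersection
            union += w * (p + g - p * g)  # weighted soft union
    return 1.0 - (inter + smooth) / (union + smooth)


mask = [[1, 1]]
weights = [[1.0, 6.0]]  # the second pixel is treated as a hard pixel

miss_easy = weighted_iou_loss([[0.0, 1.0]], mask, weights)
miss_hard = weighted_iou_loss([[1.0, 0.0]], mask, weights)
perfect = weighted_iou_loss([[1.0, 1.0]], mask, weights)
```

Both wrong predictions miss exactly one pixel, so a plain IoU loss would score them the same; with the weights above, missing the hard pixel is penalized more.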
Pixel position aware loss: L_ppa = L_wbce + L_wiou (weighted BCE loss plus weighted IoU loss).
Total loss: