Paper Notes (4): RankIQA, Ranking-Based Image Quality Assessment

RankIQA: Learning from Rankings for No-reference Image Quality Assessment
github: https://github.com/xialeiliu/RankIQA

Part 1:

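Definition: No-reference image quality assessment (NR-IQA) aims to predict a quality score for a given image without using its ground-truth (GT) reference image.
Problem: For large datasets, collecting subjective quality scores through human annotation is very expensive, so labeled IQA datasets stay small and training a deep CNN on them is very difficult.
Observation & Motivation: Degrading an image to different degrees yields images of different quality, and their relative ranking is known without any human labeling.
Classic method: directly minimize the error between the CNN output score and the GT score of the image.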
Paper method:

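  1. Degrade the GT images with distortions at different levels to obtain a large number of ranked images.
  2. Train a Siamese network on the ranked images to obtain ranking scores.
  3. Fine-tune the network trained on the ranked images on a labeled dataset.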
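
Step 1 of the paper method can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a grayscale image stored as nested lists of floats in [0, 1], uses only additive Gaussian noise, and the names `add_gaussian_noise`, `make_ranked_images`, `mean_abs_error` and the sigma values are hypothetical.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a grayscale image (rows of floats in [0, 1])."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def make_ranked_images(image, sigmas, seed=0):
    """Distort one reference image at increasing noise levels.

    Returns (level, distorted_image) tuples. A lower level means less noise,
    so the quality ordering is known without any human annotation.
    """
    rng = random.Random(seed)
    return [(level, add_gaussian_noise(image, sigma, rng))
            for level, sigma in enumerate(sorted(sigmas))]


def mean_abs_error(a, b):
    """Average absolute pixel difference between two same-sized images."""
    diffs = [abs(p - q) for row_a, row_b in zip(a, b)
             for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Hypothetical 8x8 flat gray reference image and three assumed noise levels.
reference = [[0.5] * 8 for _ in range(8)]
ranked = make_ranked_images(reference, sigmas=[0.01, 0.05, 0.2])
errors = [mean_abs_error(reference, img) for _, img in ranked]
```

Here `errors` grows with the distortion level, which is the only supervision the ranking stage needs.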

Part 2 method:

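  1. Train the Siamese network with a pairwise ranking hinge loss:
    L(x1, x2; θ) = max(0, f(x2; θ) − f(x1; θ) + ε), where x1 is the higher-quality image of the pair, f(x; θ) is the network output, and ε is the margin.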

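  2. Optimizing the training of the Siamese network
    For example, degrade GT image A with three levels of Gaussian noise to obtain A1, A2 and A3. A standard Siamese network feeds the three image pairs (A1, A2), (A2, A3) and (A1, A3) through the network, so each image is forward-propagated more than once. The paper instead passes each image in the mini-batch through a single network once and computes the loss over the pairs formed within the batch, which greatly speeds up convergence.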
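
The pairwise ranking hinge loss from item 1, applied to the A1, A2, A3 example from item 2, can be sketched as below. This is an illustration under assumptions, not the authors' code: the scores are made-up numbers standing in for network outputs, the margin of 1.0 is an assumed value, and all images in the batch are assumed to come from the same reference image and distortion type. The point is that each image only needs to be scored once; the loss for any pair is then computed from the stored scores.

```python
from itertools import combinations


def ranking_hinge_loss(score_better, score_worse, margin=1.0):
    """Zero once the better image outscores the worse one by at least `margin`."""
    return max(0.0, score_worse - score_better + margin)


def batch_ranking_loss(scores, levels, margin=1.0):
    """Average hinge loss over every pair in the batch with a known ordering.

    scores[i] is the (single) network output for image i;
    levels[i] is its distortion level, where a lower level means higher quality.
    """
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if levels[i] == levels[j]:
            continue  # no known ordering between equally distorted images
        better, worse = (i, j) if levels[i] < levels[j] else (j, i)
        losses.append(ranking_hinge_loss(scores[better], scores[worse], margin))
    return sum(losses) / len(losses) if losses else 0.0


# Made-up scores for A1, A2, A3 (distortion levels 1, 2, 3).
scores = [3.0, 2.5, 0.0]
levels = [1, 2, 3]
loss = batch_ranking_loss(scores, levels)
```

With these numbers only the pair (A1, A2) violates the margin (3.0 versus 2.5), contributing 0.5; the other two pairs contribute 0, so the batch loss is 0.5 / 3.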

Part 3 experiment:

  1. Dataset: The LIVE dataset consists of 808 images generated from 29 original images by distorting them with five types of distortion: Gaussian blur (GB), Gaussian noise (GN), JPEG compression (JPEG), JPEG2000 compression (JP2K) and fast fading (FF). The ground-truth Mean Opinion Score for each image is in the range [0, 100] and is estimated using annotations by 161 human annotators.
    The TID2013 dataset consists of 25 reference images with 3000 distorted images from 24 different distortion types at 5 degradation levels. Mean Opinion Scores are in the range [0, 9]. Distortion types include a range of noise, compression, and transmission artifacts.
  2. Network architectures: Shallow-4conv, AlexNet and VGG-16.
  3. Evaluation protocols: the Linear Correlation Coefficient (LCC) and the Spearman Rank Order Correlation Coefficient (SROCC).
  4. Experimental results
    [Images 3 to 5 of the original post: result figures, not preserved]
  5. Analysis
    Hard negative mining strategy: training starts with 36 pairs per mini-batch, and the number of hard pairs is increased every 5000 iterations, up to a maximum of 72.
    Network performance analysis: The original, high-quality (pre-distortion) images of the LIVE dataset are randomly split into 80% training and 20% testing samples, and the average LCC and SROCC scores are computed on the testing set after training to convergence. This process is repeated ten times and the results are averaged.
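
The evaluation protocol above (LCC, SROCC, and an 80/20 split made on the original images rather than on the distorted ones) can be sketched in plain Python. This is a minimal sketch, not the paper's evaluation code: the function names, the example scores, and the integer reference ids are hypothetical. Splitting on reference ids keeps every distorted version of one image on the same side of the split.

```python
import math
import random


def lcc(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return lcc(average_ranks(x), average_ranks(y))


def split_by_reference(reference_ids, train_fraction=0.8, seed=0):
    """Split reference image ids into disjoint train and test sets."""
    ids = sorted(set(reference_ids))
    random.Random(seed).shuffle(ids)
    cut = round(train_fraction * len(ids))
    return set(ids[:cut]), set(ids[cut:])


# Made-up predictions that are monotonically but not linearly related to MOS.
predicted = [1.0, 2.0, 3.0, 4.0]
mos = [1.0, 4.0, 9.0, 16.0]
linear_corr = lcc(predicted, mos)
rank_corr = srocc(predicted, mos)

# 29 hypothetical reference ids, matching the number of LIVE originals.
train_ids, test_ids = split_by_reference(range(29))
```

In this toy example the ranking is perfect, so SROCC is 1, while LCC is lower because the relation is not linear. Repeating `split_by_reference` with ten different seeds and averaging the two metrics mirrors the ten-fold repetition described above.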

