Project reference: AAAI (Association for the Advancement of Artificial Intelligence)
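In recent years, Swin Transformer, a hierarchical vision Transformer that computes self-attention inside shifted local windows, has achieved strong results across computer vision tasks. Because attention is restricted to fixed-size windows, its cost grows linearly with image size rather than quadratically, so it stays efficient and accurate on large images. Applying Swin Transformer to image super-resolution is therefore a promising research direction.
A Swin Transformer-based image super-resolution system matters for several reasons:
First, it can produce higher-quality high-resolution images. Its attention mechanism captures fine details and textures well, which yields sharper and more realistic results. This is valuable in fields such as medical image processing and satellite image processing.
Second, it can improve computational efficiency. Conventional CNN methods can require substantial compute and memory on large images, which limits their practicality. Swin Transformer's hierarchical, window-based attention keeps both efficiency and accuracy high on large images and so lowers the computational cost.
Finally, it can serve as a reference for research in related fields. Swin Transformer has proven effective as a general-purpose vision backbone, and applying it to super-resolution offers ideas and methods that carry over to other image restoration tasks.
In summary, a Swin Transformer-based image super-resolution system has a solid research background and clear value: it can deliver higher-quality high-resolution images, improve computational efficiency, and inform related research. As deep learning and attention mechanisms continue to develop, such systems are likely to see broader application and further research progress.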
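import os
from collections import OrderedDict

import cv2
import numpy as np
import torch

# `net` (the network class) and `util` (the PSNR/SSIM helpers) come from the
# project's models/ and utils/ packages; their import lines are not shown in the source.


class Swin2SR:
    def __init__(self, args):
        self.args = args
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = self.define_model()
        self.model = self.model.to(self.device)
        # Per-image metric lists, filled in during evaluation
        self.test_results = OrderedDict()
        self.test_results['psnr'] = []
        self.test_results['ssim'] = []
        self.test_results['psnr_y'] = []
        self.test_results['ssim_y'] = []
        self.test_results['psnrb'] = []
        self.test_results['psnrb_y'] = []
        self.psnr, self.ssim, self.psnr_y, self.ssim_y, self.psnrb, self.psnrb_y = 0, 0, 0, 0, 0, 0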
    def define_model(self):
        # 001 classical image sr
        if self.args.task == 'classical_sr':
            model = net(upscale=self.args.scale, in_chans=3, img_size=self.args.training_patch_size, window_size=8,
                        num_classes=3, embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],
                        mlp_ratio=4, upsampler='pixelshuffle', upsampler_params={'scale': self.args.scale})
        # 002 lightweight image sr
        elif self.args.task == 'lightweight_sr':
            model = net(upscale=self.args.scale, in_chans=3, img_size=self.args.training_patch_size, window_size=8,
                        num_classes=3, embed_dim=48, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],
                        mlp_ratio=4, upsampler='pixelshuffle', upsampler_params={'scale': self.args.scale})
        # 003 real image sr
        elif self.args.task == 'real_sr':
            if self.args.large_model:
                model = net(upscale=self.args.scale, in_chans=3, img_size=self.args.training_patch_size, window_size=8,
                            num_classes=3, embed_dim=96, depths=[2, 2, 18, 2], num_heads=[3, 6, 12, 24],
                            mlp_ratio=4, upsampler='pixelshuffle', upsampler_params={'scale': self.args.scale})
            else:
                model = net(upscale=self.args.scale, in_chans=3, img_size=self.args.training_patch_size, window_size=8,
                            num_classes=3, embed_dim=48, depths=[2, 2, 18, 2], num_heads=[3, 6, 12, 24],
                            mlp_ratio=4, upsampler='pixelshuffle', upsampler_params={'scale': self.args.scale})
        # 004 grayscale denoising
        elif self.args.task == 'gray_dn':
            model = net(upscale=1, in_chans=1, img_size=self.args.training_patch_size, window_size=8,
                        num_classes=1, embed_dim=48, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],
                        mlp_ratio=4, upsampler='pixelshuffle', upsampler_params={'scale': 1})
        # 005 color denoising
        elif self.args.task == 'color_dn':
            model = net(upscale=1, in_chans=3, img_size=self.args.training_patch_size, window_size=8,
                        num_classes=3, embed_dim=48, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],
                        mlp_ratio=4, upsampler='pixelshuffle', upsampler_params={'scale': 1})
        # 006 jpeg compression artifact reduction
        elif self.args.task == 'jpeg_car':
            model = net(upscale=1, in_chans=3, img_size=self.args.training_patch_size, window_size=8,
                        num_classes=3, embed_dim=48, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],
                        mlp_ratio=4, upsampler='pixelshuffle', upsampler_params={'scale': 1})
        # 007 color jpeg compression artifact reduction
        elif self.args.task == 'color_jpeg_car':
            model = net(upscale=1, in_chans=3, img_size=self.args.training_patch_size, window_size=8,
                        num_classes=3, embed_dim=48, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],
                        mlp_ratio=4, upsampler='pixelshuffle', upsampler_params={'scale': 1})
        else:
            # Without this else, the raise would fire for every task
            raise NotImplementedError(f'Task [{self.args.task}] is not implemented.')
        return model
    def setup(self):
        folder, save_dir, border, window_size = self.args.folder_lq, './outputs/', 0, self.args.training_patch_size
        return folder, save_dir, border, window_size

    def get_image_pair(self, path):
        imgname = os.path.splitext(os.path.basename(path))[0]
        img_lq = cv2.imread(path, cv2.IMREAD_UNCHANGED)
        img_gt = None
        if self.args.folder_gt is not None:
            img_gt = cv2.imread(os.path.join(self.args.folder_gt, f'{imgname}.png'), cv2.IMREAD_UNCHANGED)
        return imgname, img_lq, img_gt

    def test(self, img_lq, model, args, window_size):
        # Mirror-pad the NCHW input so that height and width become multiples of window_size
        _, _, h_old, w_old = img_lq.size()
        h_pad = (h_old // window_size + 1) * window_size - h_old
        w_pad = (w_old // window_size + 1) * window_size - w_old
        img_lq = torch.cat([img_lq, torch.flip(img_lq, [2])], 2)[:, :, :h_old + h_pad, :]
        img_lq = torch.cat([img_lq, torch.flip(img_lq, [3])], 3)[:, :, :, :w_old + w_pad]
        output = model(img_lq)
        # Crop the padding away again; for compressed_sr the code indexes output[0] first
        if args.task == 'compressed_sr':
            output = output[0][..., :h_old * args.scale, :w_old * args.scale]
        else:
            output = output[..., :h_old * args.scale, :w_old * args.scale]
        return output
    def evaluate(self, output, img_gt, border):
        # NOTE: save_dir, imgname, h_old and args are not defined in this excerpt;
        # the calling script has to supply them.
        output = output.data.squeeze().float().cpu().clamp_(0, 1).numpy()
        if output.ndim == 3:
            output = np.transpose(output[[2, 1, 0], :, :], (1, 2, 0))  # CHW-RGB to HWC-BGR
        output = (output * 255.0).round().astype(np.uint8)  # float32 to uint8
        cv2.imwrite(f'{save_dir}/{imgname}_Swin2SR.png', output)
        if img_gt is not None:
            img_gt = (img_gt * 255.0).round().astype(np.uint8)  # float32 to uint8
            img_gt = img_gt[:h_old * args.scale, :w_old * args.scale, ...]  # crop gt
            img_gt = np.squeeze(img_gt)
            psnr = util.calculate_psnr(output, img_gt, crop_border=border)
            ssim = util.calculate_ssim(output, img_gt, crop_border=border)
            if img_gt.ndim == 3:  # RGB image
                psnr_y = util.calculate_psnr(output, img_gt, crop_border=border, test_y_channel=True)
                ssim_y = util.calculate_ssim(output, img_gt, crop_border=border, test_y_channel=True)
            if args.task in ['jpeg_car', 'color_jpeg_car']:
                psnrb = util.calculate_psnrb(output, img_gt, crop_border=border, test_y_channel=False)
                if args.task in ['color_jpeg_car']:
                    psnrb_y = util.calculate_psnrb(output, img_gt, crop_border=border, test_y_channel=True)
            # The rest of this statement (remaining format string and .format(...) arguments) is cut off in the source.
            print('Testing {:d} {:20s} - PSNR: {:.2f} dB; SSIM: {:.4f}; PSNRB: {:.2f} dB;'
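The padding arithmetic in `test` above is easy to misread, so here is a minimal NumPy sketch of the same mirror-padding step. The helper name `pad_to_window_multiple` and the toy array are assumptions for illustration only; the project itself operates on NCHW torch tensors.

```python
import numpy as np


def pad_to_window_multiple(img, window_size):
    """Mirror-pad an (H, W) array with the same arithmetic as `test` above.

    Hypothetical helper for illustration; not part of the project code.
    """
    h_old, w_old = img.shape[:2]
    # Same formula as the source: it always pads by 1..window_size pixels,
    # even when the size is already a multiple of window_size.
    h_pad = (h_old // window_size + 1) * window_size - h_old
    w_pad = (w_old // window_size + 1) * window_size - w_old
    # Append a flipped copy, then cut to the padded size.
    img = np.concatenate([img, np.flip(img, axis=0)], axis=0)[:h_old + h_pad]
    img = np.concatenate([img, np.flip(img, axis=1)], axis=1)[:, :w_old + w_pad]
    return img, (h_old, w_old)


img = np.arange(5 * 11, dtype=np.float32).reshape(5, 11)
padded, (h_old, w_old) = pad_to_window_multiple(img, window_size=8)
print(padded.shape)  # (8, 16)
```

Because only one flipped copy is appended, this scheme can add at most `h_old` rows (or `w_old` columns); inputs smaller than the required padding would need a different strategy.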
class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        print("Loading pipeline...")
        self.device = "cuda:0"
        args = argparse.Namespace()
        args.scale = 4
        args.large_model = False
        tasks = ["classical_sr", "compressed_sr", "real_sr"]
        paths = [
            # checkpoint paths are elided in the source (one per task, in the same order as `tasks`)
        ]
        sizes = [64, 48, 128]
        self.models = {}
        # Build one model per task and keep all of them resident on the device
        for task, path, size in zip(tasks, paths, sizes):
            args.training_patch_size = size
            args.task, args.model_path = task, path
            self.models[task] = define_model(args)
            self.models[task] = self.models[task].to(self.device)
    def predict(
        self,
        image: Path = Input(description="Input image"),
        task: str = Input(
            description="Choose a task",
            choices=["classical_sr", "real_sr", "compressed_sr"],
        ),
    ) -> Path:
        """Run a single prediction on the model"""
        model = self.models[task]
        window_size = 8
        scale = 4
        img_lq = cv2.imread(str(image), cv2.IMREAD_COLOR).astype(np.float32) / 255.0
        img_lq = np.transpose(
            img_lq if img_lq.shape[2] == 1 else img_lq[:, :, [2, 1, 0]], (2, 0, 1)
        )  # HWC-BGR to CHW-RGB
        # The body of this assignment is cut off in the source; the line below is an
        # assumption inferred from the 4-D `.size()` call that follows.
        img_lq = (
            torch.from_numpy(img_lq).float().unsqueeze(0).to(self.device)
        )  # CHW-RGB to NCHW-RGB
        # inference
        with torch.no_grad():
            # pad input image to be a multiple of window_size
            _, _, h_old, w_old = img_lq.size()
            h_pad = (h_old // window_size + 1) * window_size - h_old
            w_pad = (w_old // window_size + 1) * window_size - w_old
            img_lq = torch.cat([img_lq, torch.flip(img_lq, [2])], 2)[
                :, :, : h_old + h_pad, :
            ]
            img_lq = torch.cat([img_lq, torch.flip(img_lq, [3])], 3)[
                :, :, :, : w_old + w_pad
            ]
            output = model(img_lq)
            if task == "compressed_sr":
                output = output[0][..., : h_old * scale, : w_old * scale]
            else:
                output = output[..., : h_old * scale, : w_old * scale]
        # save image
        output = output.data.squeeze().float().cpu().clamp_(0, 1).numpy()
        if output.ndim == 3:
            output = np.transpose(
                output[[2, 1, 0], :, :], (1, 2, 0)
            )  # CHW-RGB to HWC-BGR
        output = (output * 255.0).round().astype(np.uint8)  # float32 to uint8
        output_path = "/tmp/out.png"
        cv2.imwrite(output_path, output)
        return Path(output_path)
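The predictor above converts between OpenCV's HWC-BGR layout and the CHW-RGB layout the network consumes, then back again for saving. A minimal NumPy sketch of that round trip follows; the two helper names and the random test image are assumptions for illustration, not functions from the project.

```python
import numpy as np


def hwc_bgr_to_chw_rgb(img):
    """HWC-BGR float image -> CHW-RGB, as done before inference."""
    return np.transpose(img[:, :, [2, 1, 0]], (2, 0, 1))


def chw_rgb_to_hwc_bgr_uint8(out):
    """CHW-RGB float image in [0, 1] -> HWC-BGR uint8, as done before saving."""
    out = np.clip(out, 0.0, 1.0)
    out = np.transpose(out[[2, 1, 0], :, :], (1, 2, 0))
    return (out * 255.0).round().astype(np.uint8)


# Hypothetical 4x6 BGR image standing in for the result of cv2.imread.
bgr = np.random.default_rng(0).integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
chw = hwc_bgr_to_chw_rgb(bgr.astype(np.float32) / 255.0)
restored = chw_rgb_to_hwc_bgr_uint8(chw)
print(chw.shape, restored.shape)  # (3, 4, 6) (4, 6, 3)
```

With an identity "model" in between, the round trip reproduces the original uint8 pixels, which is a quick sanity check when porting the pre- and post-processing to another framework.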
class Swin2SR:
    def __init__(self, args):
        self.args = args
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = self.define_model()
        self.model = self.model.to(self.device)

    def define_model(self):
        # Each branch picks the architecture and the checkpoint key that holds its weights
        if self.args.task == 'classical_sr':
            model = net(upscale=self.args.scale, in_chans=3, img_size=self.args.training_patch_size, window_size=8,
                        img_range=1., depths=[6, 6, 6, 6, 6, 6], embed_dim=180, num_heads=[6, 6, 6, 6, 6, 6],
                        mlp_ratio=2, upsampler='pixelshuffle', resi_connection='1conv')
            param_key_g = 'params'
        elif self.args.task in ['lightweight_sr']:
            model = net(upscale=self.args.scale, in_chans=3, img_size=64, window_size=8,
                        img_range=1., depths=[6, 6, 6, 6], embed_dim=60, num_heads=[6, 6, 6, 6],
                        mlp_ratio=2, upsampler='pixelshuffledirect', resi_connection='1conv')
            param_key_g = 'params'
        elif self.args.task == 'compressed_sr':
            model = net(upscale=self.args.scale, in_chans=3, img_size=self.args.training_patch_size, window_size=8,
                        img_range=1., depths=[6, 6, 6, 6, 6, 6], embed_dim=180, num_heads=[6, 6, 6, 6, 6, 6],
                        mlp_ratio=2, upsampler='pixelshuffle_aux', resi_connection='1conv')
            param_key_g = 'params'
        elif self.args.task == 'real_sr':
            if not self.args.large_model:
                model = net(upscale=self.args.scale, in_chans=3, img_size=64, window_size=8,
                            img_range=1., depths=[6, 6, 6, 6, 6, 6], embed_dim=180, num_heads=[6, 6, 6, 6, 6, 6],
                            mlp_ratio=2, upsampler='nearest+conv', resi_connection='1conv')
            else:
                model = net(upscale=self.args.scale, in_chans=3, img_size=64, window_size=8,
                            img_range=1., depths=[6, 6, 6, 6, 6, 6, 6, 6, 6], embed_dim=240,
                            num_heads=[8, 8, 8, 8, 8, 8, 8, 8, 8],
                            mlp_ratio=2, upsampler='nearest+conv', resi_connection='3conv')
            param_key_g = 'params_ema'
        elif self.args.task == 'jpeg_car':
            model = net(upscale=1, in_chans=1, img_size=126, window_size=7,
                        img_range=255., depths=[6, 6, 6, 6, 6, 6], embed_dim=180, num_heads=[6, 6, 6, 6, 6, 6],
                        mlp_ratio=2, upsampler='', resi_connection='1conv')
            param_key_g = 'params'
        elif self.args.task == 'color_jpeg_car':
            model = net(upscale=1, in_chans=3, img_size=126, window_size=7,
                        img_range=255., depths=[6, 6, 6, 6, 6, 6], embed_dim=180, num_heads=[6, 6, 6, 6, 6, 6],
                        mlp_ratio=2, upsampler='', resi_connection='1conv')
            param_key_g = 'params'
        pretrained_model = torch.load(self.args.model_path)
        # Use the nested weights when the checkpoint wraps them under param_key_g, else the checkpoint itself.
        # The source cuts this call off after the first argument; any further arguments are not shown.
        model.load_state_dict(pretrained_model[param_key_g] if param_key_g in pretrained_model.keys() else pretrained_model)
        return model
    def setup(self):
        if self.args.task in ['classical_sr', 'lightweight_sr', 'compressed_sr']:
            save_dir = f'results/swin2sr_{self.args.task}_x{self.args.scale}'
            if self.args.save_img_only:
                folder = self.args.folder_lq
            else:
                folder = self.args.folder_gt
            border = self.args.scale
            window_size = 8
        # The branches for the remaining tasks and the return statement are elided in the source.
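The weight-loading line in `define_model` above picks a nested entry such as `params` or `params_ema` when the checkpoint has one, and otherwise treats the checkpoint as the state dict itself. A minimal sketch of that selection rule, using plain dictionaries as hypothetical stand-ins for what `torch.load` returns:

```python
def select_state_dict(checkpoint, param_key):
    """Return checkpoint[param_key] when present, else the checkpoint itself.

    Hypothetical helper mirroring the conditional expression in define_model.
    """
    return checkpoint[param_key] if param_key in checkpoint else checkpoint


# Made-up checkpoints for illustration only.
wrapped = {"params_ema": {"conv.weight": [1.0]}, "params": {"conv.weight": [0.5]}}
bare = {"conv.weight": [2.0]}

print(select_state_dict(wrapped, "params_ema"))  # {'conv.weight': [1.0]}
print(select_state_dict(bare, "params"))         # {'conv.weight': [2.0]}
```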
import torch
import torch.nn as nn

# WindowAttention, Mlp, window_partition, DropPath and to_2tuple are defined or
# imported elsewhere in network_swin2sr.py; they are not shown in this excerpt.


class SwinTransformerBlock(nn.Module):
    r""" Swin Transformer Block.

    Args:
        dim (int): Number of input channels.
        input_resolution (tuple[int]): Input resolution.
        num_heads (int): Number of attention heads.
        window_size (int): Window size.
        shift_size (int): Shift size for SW-MSA.
        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.
        qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True
        drop (float, optional): Dropout rate. Default: 0.0
        attn_drop (float, optional): Attention dropout rate. Default: 0.0
        drop_path (float, optional): Stochastic depth rate. Default: 0.0
        act_layer (nn.Module, optional): Activation layer. Default: nn.GELU
        norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm
        pretrained_window_size (int): Window size in pre-training.
    """

    def __init__(self, dim, input_resolution, num_heads, window_size=7, shift_size=0,
                 mlp_ratio=4., qkv_bias=True, drop=0., attn_drop=0., drop_path=0.,
                 act_layer=nn.GELU, norm_layer=nn.LayerNorm, pretrained_window_size=0):
        super().__init__()  # required before registering submodules and buffers
        self.dim = dim
        self.input_resolution = input_resolution
        self.num_heads = num_heads
        self.window_size = window_size
        self.shift_size = shift_size
        self.mlp_ratio = mlp_ratio
        if min(self.input_resolution) <= self.window_size:
            # if window size is larger than input resolution, we don't partition windows
            self.shift_size = 0
            self.window_size = min(self.input_resolution)
        assert 0 <= self.shift_size < self.window_size, "shift_size must be in [0, window_size)"
        self.norm1 = norm_layer(dim)
        # The source cuts this call off after proj_drop; any further arguments are not shown.
        self.attn = WindowAttention(
            dim, window_size=to_2tuple(self.window_size), num_heads=num_heads,
            qkv_bias=qkv_bias, attn_drop=attn_drop, proj_drop=drop)
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
        # Only shifted blocks need an attention mask
        if self.shift_size > 0:
            attn_mask = self.calculate_mask(self.input_resolution)
        else:
            attn_mask = None
        self.register_buffer("attn_mask", attn_mask)
    def calculate_mask(self, x_size):
        # calculate attention mask for SW-MSA
        H, W = x_size
        img_mask = torch.zeros((1, H, W, 1))  # 1 H W 1
        h_slices = (slice(0, -self.window_size),
                    slice(-self.window_size, -self.shift_size),
                    slice(-self.shift_size, None))
        w_slices = (slice(0, -self.window_size),
                    slice(-self.window_size, -self.shift_size),
                    slice(-self.shift_size, None))
        # Label each of the 3 x 3 regions with its own integer id
        cnt = 0
        for h in h_slices:
            for w in w_slices:
                img_mask[:, h, w, :] = cnt
                cnt += 1
        mask_windows = window_partition(img_mask, self.window_size)  # nW, window_size, window_size, 1
        mask_windows = mask_windows.view(-1, self.window_size * self.window_size)
        # Token pairs from different regions get -100 (suppressed), same-region pairs get 0
        attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
        attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0))
        return attn_mask
    def forward(self, x):
        """x: input features with shape of (B, N, C)."""
        # Simplified excerpt: the window partition/merge and reverse-shift steps are not
        # shown in the source, so shifted_x is computed but not consumed below.
        B, N, C = x.shape
        shortcut = x
        x = self.norm1(x)
        x = x.view(B, N, C)
        if self.shift_size > 0:
            shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2))
        else:
            shifted_x = x
        x = self.attn(x, mask=self.attn_mask)
        x = shortcut + self.drop_path(x)
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x
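This Python file implements the Swin2SR model, a SwinV2 Transformer for compressed image super-resolution and restoration; see the paper at https://arxiv.org/abs/2209.11345 for details.
SwinTransformerBlock is the basic building block of the model; it combines window attention with a multilayer perceptron (MLP).
Taken together, the file implements the core components of the model, which can be used for image super-resolution and restoration tasks.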
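To make the shifted-window mask concrete, the following NumPy sketch mirrors `calculate_mask` on a tiny 8x8 feature map. The `window_partition` here is a NumPy re-implementation written for this example (an assumption about its behaviour, based on the shape comment in the source), and `shifted_window_mask` is a hypothetical standalone name.

```python
import numpy as np


def window_partition(x, window_size):
    """(B, H, W, C) -> (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.reshape(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)


def shifted_window_mask(H, W, window_size, shift_size):
    """NumPy version of the SW-MSA mask: 0 within a region, -100 across regions."""
    img_mask = np.zeros((1, H, W, 1))
    slices = (slice(0, -window_size),
              slice(-window_size, -shift_size),
              slice(-shift_size, None))
    cnt = 0
    for h in slices:          # label the 3 x 3 regions with ids 0..8
        for w in slices:
            img_mask[:, h, w, :] = cnt
            cnt += 1
    mask_windows = window_partition(img_mask, window_size)
    mask_windows = mask_windows.reshape(-1, window_size * window_size)
    diff = mask_windows[:, None, :] - mask_windows[:, :, None]
    return np.where(diff != 0, -100.0, 0.0)


mask = shifted_window_mask(H=8, W=8, window_size=4, shift_size=2)
print(mask.shape)  # (4, 16, 16)
```

The top-left window lies entirely inside one region, so its mask is all zeros; the bottom-right window straddles four regions, so tokens that came from different sides of the cyclic shift cannot attend to each other.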
import cv2
import numpy as np
import matplotlib.pyplot as plt


class ImageLoader:
    def __init__(self, debug=False, norm=True, resize=None):
        self.debug = debug
        self.norm = norm
        self.resize = resize

    def load_img(self, filename):
        img = cv2.imread(filename)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
        if self.norm:
            img = img / 255.  # scale to [0, 1]
            img = img.astype(np.float32)
        if self.debug:
            print(img.shape, img.dtype, img.min(), img.max())
        if self.resize:
            img = cv2.resize(img, (self.resize[0], self.resize[1]))
        return img

    def plot_all(self, images, axis='off', figsize=(16, 8)):
        fig = plt.figure(figsize=figsize, dpi=80)
        nplots = len(images)
        for i in range(nplots):
            ...  # loop body is elided in the source
import cv2
import torch
import numpy as np


class ImageMetrics:
    def __init__(self, input_order='HWC'):
        self.input_order = input_order

    def calculate_psnr(self, img1, img2, crop_border, test_y_channel=False):
        assert img1.shape == img2.shape, (f'Image shapes are different: {img1.shape}, {img2.shape}.')
        if self.input_order not in ['HWC', 'CHW']:
            raise ValueError(f'Wrong input_order {self.input_order}. Supported input_orders are ' '"HWC" and "CHW"')
        # reorder_image and to_y_channel are methods of this class that are not shown in the excerpt
        img1 = self.reorder_image(img1)
        img2 = self.reorder_image(img2)
        img1 = img1.astype(np.float64)
        img2 = img2.astype(np.float64)
        if crop_border != 0:
            img1 = img1[crop_border:-crop_border, crop_border:-crop_border, ...]
            img2 = img2[crop_border:-crop_border, crop_border:-crop_border, ...]
        if test_y_channel:
            img1 = self.to_y_channel(img1)
            img2 = self.to_y_channel(img2)
        mse = np.mean((img1 - img2) ** 2)
        if mse == 0:
            return float('inf')  # identical images
        return 20. * np.log10(255. / np.sqrt(mse))
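calculate_psnr(img1, img2, crop_border, input_order='HWC', test_y_channel=False): computes the peak signal-to-noise ratio (PSNR) between two images.
_ssim(img1, img2): computes the structural similarity (SSIM) for a single channel pair.
calculate_ssim(img1, img2, crop_border, input_order='HWC', test_y_channel=False): computes the structural similarity (SSIM) between two images.
calculate_psnrb(img1, img2, crop_border, input_order='HWC', test_y_channel=False): computes the PSNR-B metric between two images.
reorder_image(img, input_order='HWC'): rearranges the image into a consistent channel order.
bgr2ycbcr(img, y_only=False): converts a BGR image to YCbCr.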
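As a worked check of the PSNR formula used in `calculate_psnr`, here is a minimal standalone sketch. The function name `psnr` and the two constant test images are assumptions for illustration; channel reordering and Y-channel conversion are left out.

```python
import numpy as np


def psnr(img1, img2, crop_border=0):
    """PSNR in dB for two uint8-range images of equal shape (HWC)."""
    img1 = img1.astype(np.float64)
    img2 = img2.astype(np.float64)
    if crop_border != 0:
        img1 = img1[crop_border:-crop_border, crop_border:-crop_border, ...]
        img2 = img2[crop_border:-crop_border, crop_border:-crop_border, ...]
    mse = np.mean((img1 - img2) ** 2)
    if mse == 0:
        return float('inf')
    return 20.0 * np.log10(255.0 / np.sqrt(mse))


a = np.full((8, 8, 3), 100, dtype=np.uint8)
b = np.full((8, 8, 3), 110, dtype=np.uint8)
# Every pixel differs by 10, so MSE = 100 and PSNR = 20 * log10(255 / 10).
print(round(psnr(a, b), 2))  # 28.13
```

Casting to float64 before subtracting matters: subtracting uint8 arrays directly would wrap around instead of producing negative differences.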
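The overall purpose of this system is image super-resolution reconstruction, performed with a Swin Transformer-based model. The system consists of several program files, each responsible for a different functional module. The main program files include:
models\network_swin2sr.py: implements the core components of the Swin Transformer model. It defines several helper functions and modules, including Mlp, WindowAttention and SwinTransformerBlock.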
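| File path | Function |
| --- | --- |
| E:\视觉项目\shop\基于Swin_Transformer的图像超分辨率系统\code\main_test_swin2sr.py | Test program for super-resolution reconstruction: loads the model, preprocesses images, runs inference, saves results and computes evaluation metrics |
| E:\视觉项目\shop\基于Swin_Transformer的图像超分辨率系统\code\predict.py | Predictor for super-resolution; defines the Predictor class, which loads the models and runs predictions |
| E:\视觉项目\shop\基于Swin_Transformer的图像超分辨率系统\code\ui.py | PyQt5 GUI; takes an image path and parameters from the interface and calls the functions in predict.py to run super-resolution |
| E:\视觉项目\shop\基于Swin_Transformer的图像超分辨率系统\code\models\network_swin2sr.py | Core components of the Swin Transformer model, including helper functions and modules such as Mlp, WindowAttention and SwinTransformerBlock |
| E:\视觉项目\shop\基于Swin_Transformer的图像超分辨率系统\code\utils\plots.py | Plotting utilities; contains load_img and plot_all for loading and displaying images |
| E:\视觉项目\shop\基于Swin_Transformer的图像超分辨率系统\code\utils\util_calculate_psnr_ssim.py | Image quality metrics; contains functions for computing PSNR, SSIM and related metrics |
| E:\视觉项目\shop\基于Swin_Transformer的图像超分辨率系统\code\utils\__init__.py | Empty file that marks the utils folder as a Python package |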
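This part follows the RefSR (reference-based super-resolution) work presented in that blog. Its main idea is to use a Transformer as an attention mechanism so that texture information from the reference image (Ref) is transferred more effectively into the high-resolution image (HR). The approach is interesting: as the original figure shows, texture features extracted from the upsampled LR image, from the Ref image that has been downsampled and then upsampled, and from the original Ref image serve as Q, K and V respectively. The texture Transformer has four parts: 1) a learnable texture extractor implemented as a DNN; 2) a relevance embedding module; 3) a hard-attention module for texture transfer; 4) a soft-attention module for texture synthesis. The whole texture Transformer module can also be stacked across scales, which makes it possible to recover texture at different scales (for example, from 1x to 4x magnification).
Here RBs denotes a stack of residual blocks, and CSFI is the cross-scale feature integration module.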
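2) Relevance embedding module. The relevance between Q and K is computed with a normalized inner product, giving the matrix $r_{i,j} = \left\langle \frac{q_i}{\lVert q_i \rVert}, \frac{k_j}{\lVert k_j \rVert} \right\rangle$, where $q_i$ and $k_j$ are patches of Q and K.
3) Hard attention. For each query position, $h_i = \arg\max_j r_{i,j}$ gives the index of the most relevant position in K; the feature at that position in V is the one transferred.
4) Soft attention. The soft attention map is $s_i = \max_j r_{i,j}$, the relevance value of that best match, which weights the transferred texture.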
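The three steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: Q, K and V are taken as already-flattened patch matrices of shape (num_patches, dim), the function name `texture_attention` is hypothetical, soft attention is assumed to be the maximum relevance value, and the convolutional fusion that follows in the full model is omitted.

```python
import numpy as np


def texture_attention(Q, K, V):
    """Relevance embedding, hard attention and soft attention on patch matrices.

    Q: (Nq, d), K: (Nk, d), V: (Nk, dv). Returns transferred features T (Nq, dv),
    soft attention S (Nq,) and hard-attention indices H (Nq,).
    """
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=1, keepdims=True)
    R = Qn @ Kn.T              # r[i, j]: normalized inner product
    H = R.argmax(axis=1)       # hard attention: index of the best match
    S = R.max(axis=1)          # soft attention: confidence of that match
    T = V[H]                   # transfer the matched texture features
    return T, S, H


Q = np.array([[1.0, 0.0], [0.0, 2.0]])
K = np.array([[0.0, 1.0], [3.0, 0.0], [1.0, 1.0]])
V = np.array([[10.0], [20.0], [30.0]])
T, S, H = texture_attention(Q, K, V)
print(H, S, T.ravel())  # [1 0] [1. 1.] [20. 10.]
```

Normalizing both sides first means the match depends only on patch direction, not magnitude: the query `[0, 2]` matches the key `[0, 1]` with relevance 1.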
L1 loss + GAN loss + perceptual loss
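1) Shallow Feature Extraction is a single 3x3 convolution layer.
2) HQ Image Reconstruction uses sub-pixel convolution (that is, PixelShuffle) for SR tasks. Denoising and JPEG artifact removal use a single convolution layer.
3) The STL follows the structure of a Transformer encoder. The input is partitioned into non-overlapping $M \times M$ windows X; each X is projected to Q, K and V, and the outputs of multi-head attention are concatenated. The MLP is implemented with two FC layers. The authors also shift the windows so that information is exchanged between neighbouring windows; the shift is $M/2$.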
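The sub-pixel convolution used for reconstruction ends with a pure rearrangement of channels into space. A minimal NumPy sketch of that rearrangement follows; `pixel_shuffle` is a hypothetical re-implementation written for this example and is intended to follow the same channel ordering as PyTorch's `nn.PixelShuffle`, without the preceding convolution.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange (C * r * r, H, W) into (C, H * r, W * r).

    Output pixel (c, h * r + i, w * r + j) is taken from input channel
    c * r * r + i * r + j at position (h, w).
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)     # -> (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


x = np.arange(4, dtype=np.float32).reshape(4, 1, 1)  # 4 channels, 1x1 spatial
y = pixel_shuffle(x, r=2)
print(y.shape)  # (1, 2, 2)
print(y[0])     # [[0. 1.]
                #  [2. 3.]]
```

Each group of `r * r` channels becomes one `r x r` block of output pixels, which is why the convolution before it must output `C * r * r` channels for an upscale factor of `r`.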
The author of that blog also uses a High-frequency Filtering Module (HFM) to extract high-frequency information; its structure is shown below for reference only.
Introducing Turing Image Super Resolution: AI powered image enhancements for Microsoft Edge and Bing Maps
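This one is not a paper; it is Microsoft's technical blog post about the image super-resolution (ISR) technology used in Microsoft Edge and Bing Maps. The results are very impressive, but some parts are not described in detail.
1) Human eyes as the north star
Widely used metrics such as PSNR and SSIM do not always match what the human eye perceives, and they also require ground-truth images. We built a side-by-side evaluation tool that matches human judgment and used it as the north star metric to guide model training. (The authors do not say what this tool actually is.)
2) Noise modeling
At first the authors also degraded HR images to build HR-LR pairs for training. This works well in some cases but is not robust to real LR images. Randomly corrupting the input images with blurring, compression and Gaussian noise during training lets the model learn to recover details.
3) Perceptual and GAN loss
A pixel loss alone is not enough; perceptual and GAN losses are added and combined with weights.
4) Transformers for vision
On very noisy images, such as heavily compressed photos and aerial photos taken from distant satellites, the Transformer cleans up noise very well. Noise on a human face, for example, looks very different from the features of a heavily textured forest. The post attributes this to large datasets and the Transformer's strong long-range memory. We first use a sparse Transformer, scaled up to support very long sequence lengths, to "enhance" the image, producing a cleaner, crisper and more appealing image of the same size. Some scenarios do not need upscaling, so the pipeline can stop here.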
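The noise-modeling idea in item 2) can be sketched as a random degradation applied to clean training images. The Microsoft post does not publish its pipeline, so everything here is an assumption: the function names, the 3x3 box blur, the probabilities and noise level, and in particular the coarse quantization, which is only a stand-in for real compression.

```python
import numpy as np


def box_blur(img, k=3):
    """k x k mean filter with edge padding on a 2-D grayscale image."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def random_degrade(img, rng, noise_sigma=0.02, quant_levels=32):
    """Randomly apply blur, a compression stand-in, and Gaussian noise.

    `img` is a 2-D float image in [0, 1]; each corruption is applied with
    probability 0.5 (an arbitrary choice for this sketch).
    """
    out = img.astype(np.float64)
    if rng.random() < 0.5:
        out = box_blur(out)
    if rng.random() < 0.5:  # coarse quantization standing in for compression
        out = np.round(out * (quant_levels - 1)) / (quant_levels - 1)
    if rng.random() < 0.5:
        out = out + rng.normal(0.0, noise_sigma, size=out.shape)
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
clean = np.linspace(0.0, 1.0, 64).reshape(8, 8)
degraded = random_degrade(clean, rng)
print(degraded.shape, float(degraded.min()) >= 0.0, float(degraded.max()) <= 1.0)
```

In training, the degraded image would be the network input and the clean image the target, so the model sees a different corruption of the same content on every pass.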