CVPR 2022: Reproducing RealBasicVSR with MMEditing, Transfer Training on the REDS Dataset



1. Environment Setup

  • For setting up the environment, refer here

2. Downloading the Dataset

  • REDS dataset website
  • For the training set, download train_sharp and train_sharp_bicubic
  • For the validation set, download val_sharp and val_sharp_bicubic
  • If the download is slow, Motrix can speed it up
  • Create a data/REDS folder under the mmediting-master directory and put the downloaded archives inside it
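  • The expected layout before the preparation step, using the folder and archive names above:
mmediting-master/
└── data/
    └── REDS/
        ├── train_sharp.zip
        ├── train_sharp_bicubic.zip
        ├── val_sharp.zip
        └── val_sharp_bicubic.zip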

3. Dataset Preparation

  • The official workflow calls for the preparation steps below, but with the settings in the RealBasicVSR config file none of them (including the annotation file) are actually required; you can skip them and simply unzip the archives into this directory by hand
  • MMEditing official documentation for reference
  • Enter the mmediting-master folder
    - Run the command from the official docs: python tools/data/super-resolution/reds/preprocess_reds_dataset.py ./data/REDS
  • This raises an error:
usage: preprocess_reds_dataset.py [-h] [--root-path ROOT_PATH] [--make-lmdb]
preprocess_reds_dataset.py: error: unrecognized arguments: data/REDS
  • The positional path is not accepted; pass it via --root-path instead: python tools/data/super-resolution/reds/preprocess_reds_dataset.py --root-path=data/REDS
  • The output is as follows:
Unzip data/REDS\train_sharp.zip to data/REDS\train_sharp
Unzip data/REDS\train_sharp_bicubic.zip to data/REDS\train_sharp_bicubic
Unzip data/REDS\val_sharp_bicubic.zip to data/REDS\val_sharp_bicubic
Unzip data/REDS\val_sharp.zip to data/REDS\val_sharp
Move data/REDS\val_sharp to data/REDS\train_sharp...
Remove data/REDS\val_sharp
Move data/REDS\val_sharp_bicubic\X4 to data/REDS\train_sharp_bicubic\X4...
Remove data/REDS\val_sharp_bicubic
Generate annotation files meta_info_REDS_GT.txt...
  • The script automatically unzips the archives, reorganizes the folders, and generates the annotation file
  • With that, the necessary preparation is complete
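  • To confirm the prepared folders are consistent before training, a quick check such as the one below can be run from the mmediting-master directory (a minimal sketch using the paths from this post):
import os

gt_root = 'data/REDS/train_sharp'             # high-resolution frames
lq_root = 'data/REDS/train_sharp_bicubic/X4'  # low-resolution frames

# Every sequence folder (000, 001, ...) should exist on both sides
# and contain the same number of frames.
for seq in sorted(os.listdir(gt_root)):
    gt_dir = os.path.join(gt_root, seq)
    lq_dir = os.path.join(lq_root, seq)
    assert os.path.isdir(lq_dir), f'missing low-resolution folder: {lq_dir}'
    n_gt = len(os.listdir(gt_dir))
    n_lq = len(os.listdir(lq_dir))
    assert n_gt == n_lq, f'{seq}: {n_gt} GT frames vs {n_lq} LQ frames'
print('GT/LQ folder structures match.')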

4. Preparing the Pretrained Model

  • Download link for the pretrained model weights
  • Place it in the mmediting-master/checkpoint folder
  • The standard procedure trains the model in two stages, but the authors only provide the weights from after both stages, not the intermediate result of stage one, so we use those final weights for transfer training
  • Error: ValueError: persistent_workers option needs num_workers > 0
  • Modify the three lines starting at line 265 of the config file to disable persistent data-loading workers:
    train_dataloader=dict(samples_per_gpu=1, drop_last=True, persistent_workers=False),
    val_dataloader=dict(samples_per_gpu=1, persistent_workers=False),
    test_dataloader=dict(samples_per_gpu=1, workers_per_gpu=1, persistent_workers=False),
  • That resolves the error
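  • The same settings can instead be overridden from the training script introduced below, without touching the config file (a sketch; the keys mirror the three lines above):
# Equivalent override applied after loading the config in mine_train.py
cfg.data.train_dataloader = dict(samples_per_gpu=1, drop_last=True, persistent_workers=False)
cfg.data.val_dataloader = dict(samples_per_gpu=1, persistent_workers=False)
cfg.data.test_dataloader = dict(samples_per_gpu=1, workers_per_gpu=1, persistent_workers=False)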
  • Create a mine_train.py file under the mmediting-master folder (the file can be downloaded here); its contents are as follows (based on a reference video):
import os.path as osp

import mmcv
from mmcv import Config
from mmcv.runner import set_random_seed

import mmedit
from mmedit.apis import train_model
from mmedit.datasets import build_dataset
from mmedit.models import build_model

# Load the config file
cfg = Config.fromfile('configs/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds.py')


# Point the training set at the REDS folders
cfg.data.train.dataset.lq_folder = 'data/REDS/train_sharp_bicubic/X4'  # low-resolution frames
cfg.data.train.dataset.gt_folder = 'data/REDS/train_sharp'  # high-resolution (ground-truth) frames
#cfg.data.train.dataset.ann_file = 'data/REDS/meta_info_REDS_GT.txt'  # annotation file, only needed for SRAnnotationDataset

# Specify the validation set directory (the config defaults are used here)


# Specify the pretrained model to load
#cfg.load_from = 'checkpoint/RealBasicVSR_x4.pth'
#cfg.load_from = 'checkpoint/spynet_20210409-c6c1bd09.pth'
cfg.load_from = 'checkpoint/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth'
# Set the working directory
cfg.work_dir = 'realbasicvsr_exps'

# Configure the batch size and data-loading workers
cfg.data.samples_per_gpu = 10  # lower this when compute resources are limited
cfg.data.workers_per_gpu = 0  # disable multiprocessing workers
cfg.data.val_workers_per_gpu = 0

# Total number of training iterations
cfg.total_iters = 100

# Lower the learning rate at iteration 100
cfg.lr_config = {}
cfg.lr_config.policy = 'Step'
cfg.lr_config.by_epoch = False
cfg.lr_config.step = [100]
cfg.lr_config.gamma = 0.5

# Evaluation interval (kept commented out, see the out-of-memory error in section 5)
#cfg.evaluation.interval = 10
# Save a checkpoint every 10 iterations
cfg.checkpoint_config.interval = 10

cfg.log_config.interval = 10

cfg.seed = 0
set_random_seed(0,deterministic=False)

cfg.gpus = 1

# Build the training dataset
datasets = [build_dataset(cfg.data.train)]

# Build the model
model = build_model(cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)

# Create the working directory
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))

# Extra meta information recorded with the run
meta = dict()
if cfg.get('exp_name', None) is None:
    cfg['exp_name'] = osp.splitext(osp.basename(cfg.work_dir))[0]
meta['exp_name'] = cfg.exp_name
meta['mmedit Version'] = mmedit.__version__
meta['seed'] = 0

# Launch training
train_model(model, datasets, cfg, distributed=False, validate=True, meta=meta)
  • Adjust the batch size to your compute resources; reference values: 4 on Colab, 10 on an NVIDIA GTX 1050 Ti, with GPU utilization around 25%-30%
# Configure the batch size
cfg.data.samples_per_gpu = 10  # lower this when compute resources are limited
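  • With the paths and batch size set, the script is launched from the mmediting-master directory:
python mine_train.py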
  • Running the script produces the following output:
2022-08-14 07:44:36,112 - mmedit - INFO - load checkpoint from http path: https://download.openmmlab.com/mmediting/restorers/basicvsr/spynet_20210409-c6c1bd09.pth
2022-08-14 07:44:38,405 - mmedit - INFO - load checkpoint from torchvision path: torchvision://vgg19
e:\mmediting-master\mmedit\apis\train.py:321: UserWarning: "val_samples_per_gpu/val_workers_per_gpu" have been deprecated. Please use "val_dataloader=dict(samples_per_gpu=1)" instead. Details see https://github.com/open-mmlab/mmediting/pull/201
  warnings.warn('"val_samples_per_gpu/val_workers_per_gpu" have '
2022-08-14 07:44:39,595 - mmedit - INFO - load checkpoint from local path: checkpoint/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth
2022-08-14 07:44:39,928 - mmedit - INFO - Start running, host: 53423@DESKTOP-IL6J7HR, work_dir: E:\mmediting-master\realbasicvsr_exps
2022-08-14 07:44:39,928 - mmedit - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook
(NORMAL      ) CheckpointHook
(NORMAL      ) ExponentialMovingAverageHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook
(LOW         ) IterTimerHook
 --------------------
after_train_iter:
(NORMAL      ) CheckpointHook
(NORMAL      ) ExponentialMovingAverageHook
(LOW         ) IterTimerHook
(LOW         ) EvalIterHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
after_train_epoch:
(NORMAL      ) CheckpointHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_val_epoch:
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
before_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_epoch:
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
after_run:
(VERY_LOW    ) TextLoggerHook
(VERY_LOW    ) TensorboardLoggerHook
 --------------------
2022-08-14 07:44:39,937 - mmedit - INFO - workflow: [('train', 1)], max: 100 iters
2022-08-14 07:44:39,938 - mmedit - INFO - Checkpoints will be saved to E:\mmediting-master\realbasicvsr_exps by HardDiskBackend.
E:\win10\anaconda\lib\site-packages\torch\functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2022-08-14 07:45:24,545 - mmedit - INFO - Saving checkpoint at 10 iterations
2022-08-14 07:45:32,406 - mmedit - INFO - Iter [10/100] lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:06:51, time: 4.574, data_time: 0.774, memory: 1319, loss_pix: 0.0630, loss_clean: 0.0370, loss_perceptual: 11.5091, loss_gan: 0.1698, loss_d_real: 0.2841, loss_d_fake: 0.3269
2022-08-14 07:45:48,538 - mmedit - INFO - Saving checkpoint at 20 iterations
2022-08-14 07:45:55,273 - mmedit - INFO - Iter [20/100] lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:04:34, time: 2.296, data_time: 0.509, memory: 1319, loss_pix: 0.0755, loss_clean: 0.0449, loss_perceptual: 14.0248, loss_gan: 0.1512, loss_d_real: 0.3036, loss_d_fake: 0.3482
2022-08-14 07:46:10,076 - mmedit - INFO - Saving checkpoint at 30 iterations
2022-08-14 07:46:16,937 - mmedit - INFO - Iter [30/100] lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:03:30, time: 2.166, data_time: 0.374, memory: 1319, loss_pix: 0.0457, loss_clean: 0.0257, loss_perceptual: 10.3769, loss_gan: 0.1161, loss_d_real: 0.3726, loss_d_fake: 0.3367
2022-08-14 07:46:33,574 - mmedit - INFO - Saving checkpoint at 40 iterations
2022-08-14 07:46:40,455 - mmedit - INFO - Iter [40/100] lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:02:50, time: 2.352, data_time: 0.442, memory: 1319, loss_pix: 0.0591, loss_clean: 0.0339, loss_perceptual: 11.7944, loss_gan: 0.1352, loss_d_real: 0.3279, loss_d_fake: 0.2485
2022-08-14 07:46:56,052 - mmedit - INFO - Saving checkpoint at 50 iterations
2022-08-14 07:47:04,249 - mmedit - INFO - Iter [50/100] lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:02:17, time: 2.379, data_time: 0.344, memory: 1319, loss_pix: 0.0569, loss_clean: 0.0339, loss_perceptual: 11.0197, loss_gan: 0.1292, loss_d_real: 0.3356, loss_d_fake: 0.3477
2022-08-14 07:47:20,848 - mmedit - INFO - Saving checkpoint at 60 iterations
2022-08-14 07:47:27,431 - mmedit - INFO - Iter [60/100] lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:01:47, time: 2.318, data_time: 0.448, memory: 1319, loss_pix: 0.0653, loss_clean: 0.0415, loss_perceptual: 12.6926, loss_gan: 0.1479, loss_d_real: 0.4131, loss_d_fake: 0.2747
2022-08-14 07:47:44,179 - mmedit - INFO - Saving checkpoint at 70 iterations
2022-08-14 07:47:51,081 - mmedit - INFO - Iter [70/100] lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:01:19, time: 2.365, data_time: 0.402, memory: 1319, loss_pix: 0.0560, loss_clean: 0.0324, loss_perceptual: 11.1881, loss_gan: 0.1564, loss_d_real: 0.4072, loss_d_fake: 0.2297
2022-08-14 07:48:07,610 - mmedit - INFO - Saving checkpoint at 80 iterations
2022-08-14 07:48:14,205 - mmedit - INFO - Iter [80/100] lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:00:51, time: 2.312, data_time: 0.402, memory: 1319, loss_pix: 0.0451, loss_clean: 0.0270, loss_perceptual: 10.7104, loss_gan: 0.1207, loss_d_real: 0.2725, loss_d_fake: 0.3037
2022-08-14 07:48:30,553 - mmedit - INFO - Saving checkpoint at 90 iterations
2022-08-14 07:48:37,088 - mmedit - INFO - Iter [90/100] lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:00:25, time: 2.288, data_time: 0.387, memory: 1319, loss_pix: 0.0764, loss_clean: 0.0481, loss_perceptual: 14.2199, loss_gan: 0.1268, loss_d_real: 0.2559, loss_d_fake: 0.3601
2022-08-14 07:48:54,308 - mmedit - INFO - Saving checkpoint at 100 iterations
2022-08-14 07:49:00,806 - mmedit - INFO - Iter [100/100]        lr_generator: 5.000e-05 lr_discriminator: 1.000e-04, eta: 0:00:00, time: 2.372, data_time: 0.471, memory: 1319, loss_pix: 0.0666, loss_clean: 0.0353, loss_perceptual: 12.0360, loss_gan: 0.1298, loss_d_real: 0.6540, loss_d_fake: 0.2259
  • The weights from the transfer training are saved under the mmediting-master\realbasicvsr_exps folder
  • This completes the simulated transfer training on the REDS dataset
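  • To keep fine-tuning from these weights later, point cfg.load_from at a checkpoint in that folder (a sketch; the exact filename depends on the run, the iteration-based runner typically writes files such as iter_100.pth and latest.pth):
# Continue transfer training from the previously saved weights (filename is an example)
cfg.load_from = 'realbasicvsr_exps/iter_100.pth'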

5. Errors Encountered and Solutions

  • Out of memory, with the error below:
File "e:\mmediting-master\mine_train.py", line 79, in 
    train_model(model, datasets, cfg, distributed=False, validate=True,meta = meta)
  File "e:\mmediting-master\mmedit\apis\train.py", line 105, in train_model
    _non_dist_train(
  File "e:\mmediting-master\mmedit\apis\train.py", line 361, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_iters)
  File "E:\win10\anaconda\lib\site-packages\mmcv\runner\iter_based_runner.py", line 144, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "E:\win10\anaconda\lib\site-packages\mmcv\runner\iter_based_runner.py", line 70, in train
    self.call_hook('after_train_iter')
  File "E:\win10\anaconda\lib\site-packages\mmcv\runner\base_runner.py", line 317, in call_hook
    getattr(hook, fn_name)(self)
  File "e:\mmediting-master\mmedit\core\evaluation\eval_hooks.py", line 42, in after_train_iter
    results = single_gpu_test(
  File "e:\mmediting-master\mmedit\apis\test.py", line 41, in single_gpu_test
    for data in data_loader:
  File "E:\win10\anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 681, in __next__
    data = self._next_data()
  File "E:\win10\anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "E:\win10\anaconda\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\win10\anaconda\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in 
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "e:\mmediting-master\mmedit\datasets\base_sr_dataset.py", line 53, in __getitem__
    return self.pipeline(results)
  File "e:\mmediting-master\mmedit\datasets\pipelines\compose.py", line 42, in __call__
    data = t(data)
  File "e:\mmediting-master\mmedit\datasets\pipelines\normalization.py", line 95, in __call__
    results[key] = [
  File "e:\mmediting-master\mmedit\datasets\pipelines\normalization.py", line 96, in 
    v.astype(np.float32) / 255. for v in results[key]
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 10.5 MiB for an array with shape (720, 1280, 3) and data type float32
  • Commenting out cfg.evaluation.interval = 10 (line 51 of mine_train.py) fixes it
  • Alternatively, reduce the batch size to fit your compute resources (reference values: 4 on Colab, 10 on an NVIDIA GTX 1050 Ti, with GPU utilization around 25%-30%); if the error persists even at a batch size of 4, another approach is needed
# Configure the batch size
cfg.data.samples_per_gpu = 10  # lower this when compute resources are limited
  • Reduce num_input_frames=15 inside the train dict around line 277 of the config file. If that line is commented out, it defaults to the number of frames in the sequence subfolder (2414 in my case), which triggers the error; lowering the value fixes it, and setting it to 1 is a good first test (see the override sketch below)
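  • The same field can also be overridden from mine_train.py instead of editing the config file (a sketch; the nesting matches the cfg.data.train.dataset fields set earlier):
# Use shorter input sequences to reduce memory usage; try values down to 1 if errors persist
cfg.data.train.dataset.num_input_frames = 5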

  • The input sequence is too short for the chosen settings, with the error below:
`ValueError: The input sequence is not long enough to support the current choice of [interval] or [num_input_frames]`
  • One cause: the specified high-resolution and low-resolution training folders are not parallel, i.e. the directory structures inside the two folders do not match
  • Another cause: the directory structure itself is wrong. The expected layout is data/REDS/train_sharp/000/00000000.png; the data/REDS/train_sharp/000 part may be changed, but the image names must run 00000000.png, 00000001.png, ... in order, otherwise this error is raised

  • File not found:
FileNotFoundError: [Errno 2] No such file or directory: 'data/REDS/train_sharp/data/REDS/train_sharp_bicubic/X4\\177\\00000044.png'

  • If the file definitely exists, the most likely cause is a trailing '/' in the training set path, e.g. cfg.data.train.dataset.gt_folder = 'data/REDS/train_sharp/'; changing it to cfg.data.train.dataset.gt_folder = 'data/REDS/train_sharp' removes the error
  • A similar error can also appear when there are underscores in the path (for example in the image filenames); avoid adding underscores to the path
  • If the missing file is genuinely absent from the dataset and its name does not follow the naming scheme of the other files, the directory structure is most likely wrong: the expected layout is data/REDS/train_sharp/000/00000000.png, where data/REDS/train_sharp/000 may be changed but the image names must run 00000000.png, 00000001.png, ... in order (a quick check is sketched below)
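  • A quick way to verify the frame naming inside a sequence folder (a minimal sketch; the folder path is an example):
import os

seq_dir = 'data/REDS/train_sharp/000'  # example sequence folder
frames = sorted(os.listdir(seq_dir))
# Names must be zero-padded and consecutive: 00000000.png, 00000001.png, ...
expected = [f'{i:08d}.png' for i in range(len(frames))]
assert frames == expected, 'frame names are not a consecutive 00000000.png sequence'
print(f'{seq_dir}: {len(frames)} correctly named frames.')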
