StyleGAN2 Made Easy (Part 5): A First Look at the StyleGAN2 Encoder Source Code, with Annotations: projector.py and project_images.py

The StyleGAN2 Encoder iteratively optimizes an 18x512 dlatents vector to approximate a target image as closely as possible, reconstructing a high-quality version of it and saving the corresponding dlatents for later processing.

The StyleGAN2 Encoder is available on GitHub:

https://github.com/rolux/stylegan2encoder

The StyleGAN2 Encoder rewrites StyleGAN2's projector.py and adds project_images.py; the results are noticeably better than those of StyleGAN2's bundled run_projector.py.

Under the hood, the StyleGAN2 Encoder still relies on StyleGAN2's core classes: G_main (the generator), G_mapping (the mapping network), G_synthesis_stylegan2 (the synthesis network), D_stylegan2 (the discriminator), and so on. Their source code is in ./training/networks_stylegan2.py.

During optimization, the dlatents variable, initialized to dlatent_avg (the mean of the W-space 18x512 dlatents vectors), is predicted and updated step by step, and noise is injected at every iteration: input = current dlatents + noise. The learning rate is also rescheduled at every step. The input is passed through the StyleGAN2 synthesis network to produce a predicted image, and a VGG16 network computes the perceptual loss between the predicted image and the target image.

At the start of optimization, StyleGAN2 draws 10000 random latent vectors, maps them into W space, and uses their mean as the starting point, which simplifies the search.
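The noise and learning-rate schedules mentioned above appear in Projector.step() further down; as a quick reference, here is a self-contained sketch of just those two formulas (dlatent_std = 1.0 is a placeholder, since the real value is measured from the 10000 mapped samples):

import numpy as np

num_steps = 1000
initial_learning_rate, initial_noise_factor = 0.1, 0.05
lr_rampup_length, lr_rampdown_length, noise_ramp_length = 0.05, 0.25, 0.75
dlatent_std = 1.0  # placeholder; Projector measures this from the mapped samples

for step in (0, 50, 250, 500, 900, 999):
    t = step / num_steps  # fraction of steps completed
    # Noise fades out quadratically over the first 75% of the run
    noise_strength = dlatent_std * initial_noise_factor * max(0.0, 1.0 - t / noise_ramp_length) ** 2
    # The learning rate ramps up over the first 5% and cosine-ramps down over the last 25%
    lr_ramp = min(1.0, (1.0 - t) / lr_rampdown_length)
    lr_ramp = 0.5 - 0.5 * np.cos(lr_ramp * np.pi)
    lr_ramp = lr_ramp * min(1.0, t / lr_rampup_length)
    print('step %4d: noise_strength %.5f, learning_rate %.5f' % (step, noise_strength, initial_learning_rate * lr_ramp))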

Rather than using TensorFlow's built-in optimizers directly, StyleGAN2 builds its optimization step with dnnlib.tflib, a set of library functions the NVIDIA team layered on top of TensorFlow. This keeps the code maintainable and also makes the code structure easier to follow.
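As orientation for the optimizer code below, here is a minimal sketch of the dnnlib.tflib optimizer pattern, with a toy loss and variable standing in for the projector's (tflib.Optimizer wraps tf.train.AdamOptimizer by default):

import tensorflow as tf
import dnnlib.tflib as tflib

tflib.init_tf()
x = tf.Variable(5.0)                                         # toy trainable variable
loss = tf.square(x - 2.0)                                    # toy scalar loss
lrate_in = tf.placeholder(tf.float32, [], name='lrate_in')   # learning rate fed in per step
opt = tflib.Optimizer(learning_rate=lrate_in)                # construct the optimizer
opt.register_gradients(loss, [x])                            # gradients of loss w.r.t. the variables
opt_step = opt.apply_updates()                               # a single op that applies one update
tflib.init_uninitialized_vars()                              # initialize x and the optimizer state
for _ in range(100):
    tflib.run([opt_step], {lrate_in: 0.1})                   # one optimization step
print(tflib.run(x))                                          # approaches 2.0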

To address the artifact problem in StyleGAN, StyleGAN2 modifies StyleGAN's synthesis network, as shown in the figure below:

[Figure 1: (a) StyleGAN synthesis network, (b) its implementation details; (c) StyleGAN2 synthesis network, (d) its implementation details]

Here, (a) is the StyleGAN synthesis network and (b) shows its implementation details; (c) is the StyleGAN2 synthesis network and (d) shows its implementation details. For the full explanation and the reasoning behind the changes, see the StyleGAN2 paper:

Paper: https://arxiv.org/pdf/1912.04958.pdf

Overall, the StyleGAN2 Encoder reconstructs high-quality, controllable face images from real photographs, with better results than the first-generation StyleGAN Encoder, but it takes longer to run on the same GPU (for example, an NVIDIA GeForce RTX 2080 Ti).

We spent some time studying the source code of the StyleGAN2 Encoder's projector.py and project_images.py. The overall code structure is shown below, with projector.py in light green and project_images.py in white:

[Figure 2: code structure of projector.py (light green) and project_images.py (white)]

The source code, with annotations, follows:

projector.py

import numpy as np
import tensorflow as tf
import dnnlib
import dnnlib.tflib as tflib

from training import misc

#----------------------------------------------------------------------------

class Projector:
    def __init__(self,
        # Initialization parameters
        # vgg16_pkl                     = 'https://drive.google.com/uc?id=1N2-m9qszOeVC9Tq77WxsLnuWwOedQiD2',
        vgg16_pkl                       = './models/vgg16_zhang_perceptual.pkl',
        num_steps                       = 1000,
        initial_learning_rate           = 0.1,
        initial_noise_factor            = 0.05,
        verbose                         = False
    ):

        self.vgg16_pkl                  = vgg16_pkl
        self.num_steps                  = num_steps
        self.dlatent_avg_samples        = 10000
        self.initial_learning_rate      = initial_learning_rate
        self.initial_noise_factor       = initial_noise_factor
        self.lr_rampdown_length         = 0.25
        self.lr_rampup_length           = 0.05
        self.noise_ramp_length          = 0.75
        self.regularize_noise_weight    = 1e5
        self.verbose                    = verbose
        self.clone_net                  = True

        self._Gs                    = None
        self._minibatch_size        = None
        self._dlatent_avg           = None
        self._dlatent_std           = None
        self._noise_vars            = None
        self._noise_init_op         = None
        self._noise_normalize_op    = None
        self._dlatents_var          = None
        self._noise_in              = None
        self._dlatents_expr         = None
        self._images_expr           = None
        self._target_images_var     = None
        self._lpips                 = None
        self._dist                  = None
        self._loss                  = None
        self._reg_sizes             = None
        self._lrate_in              = None
        self._opt                   = None
        self._opt_step              = None
        self._cur_step              = None

    # Print status messages when verbose is enabled
    def _info(self, *args):
        if self.verbose:
            print('Projector:', *args)

    # Attach the StyleGAN2 network
    def set_network(self, Gs, minibatch_size=1):
        assert minibatch_size == 1
        self._Gs = Gs
        self._minibatch_size = minibatch_size
        if self._Gs is None:
            return
        if self.clone_net:
            self._Gs = self._Gs.clone()

        # The W space is 18x512
        # Find dlatent stats.
        self._info('Finding W midpoint and stddev using %d samples...' % self.dlatent_avg_samples)
        # With seed 123, draw dlatent_avg_samples=10000 random latent samples into latent_samples
        latent_samples = np.random.RandomState(123).randn(self.dlatent_avg_samples, *self._Gs.input_shapes[0][1:])
        # Map latent_samples into W space and store the result in dlatent_samples
        # dlatent, i.e. disentangled latent: an 18x512 vector whose rows control facial features at different levels, hence "disentangled"
        # components.mapping = tflib.Network('G_mapping'......), see: ./training/networks_stylegan2.py
        dlatent_samples = self._Gs.components.mapping.run(latent_samples, None) # (10000, 18, 512)
        # Compute the mean and standard deviation
        # _dlatent_avg is the starting value for optimizing _dlatents_var
        self._dlatent_avg = np.mean(dlatent_samples, axis=0, keepdims=True) # (1, 18, 512)
        self._dlatent_std = (np.sum((dlatent_samples - self._dlatent_avg) ** 2) / self.dlatent_avg_samples) ** 0.5
        self._info('std = %g' % self._dlatent_std)

        # Find noise inputs.
        self._info('Setting up noise inputs...')
        self._noise_vars = []
        noise_init_ops = []
        noise_normalize_ops = []
        # Collect the noise inputs, with variable names of the form 'G_synthesis/noise%d'; break when none are left
        while True:
            n = 'G_synthesis/noise%d' % len(self._noise_vars)
            if not n in self._Gs.vars:
                break
            v = self._Gs.vars[n]
            self._noise_vars.append(v)
            # Assign normally distributed random noise to v
            noise_init_ops.append(tf.assign(v, tf.random_normal(tf.shape(v), dtype=tf.float32)))
            # Compute the mean and standard deviation
            noise_mean = tf.reduce_mean(v)
            noise_std = tf.reduce_mean((v - noise_mean)**2)**0.5
            # Normalize to zero mean and unit standard deviation
            noise_normalize_ops.append(tf.assign(v, (v - noise_mean) / noise_std))
            self._info(n, v)
        # Group the noise ops
        self._noise_init_op = tf.group(*noise_init_ops)
        self._noise_normalize_op = tf.group(*noise_normalize_ops)

        # Image output graph.
        self._info('Building image output graph...')
        # Define _dlatents_var, zero-initialized here and set to _dlatent_avg in start()
        self._dlatents_var = tf.Variable(tf.zeros([self._minibatch_size] + list(self._dlatent_avg.shape[1:])), name='dlatents_var') # (1,18,512)
        # Define the _noise_in input placeholder
        self._noise_in = tf.placeholder(tf.float32, [], name='noise_in')
        dlatents_noise = tf.random.normal(shape=self._dlatents_var.shape) * self._noise_in  # scale standard-normal noise by the noise strength
        # The synthesis input is the variable plus the noise
        self._dlatents_expr = self._dlatents_var + dlatents_noise
        # components.synthesis renders _dlatents_expr into an image tensor
        self._images_expr = self._Gs.components.synthesis.get_output_for(self._dlatents_expr, randomize_noise=False)

        # Downsample image to 256x256 if it's larger than that. VGG was built for 224x224 images.
        # If the generated image is larger than 256x256, downscale it to 256x256
        proc_images_expr = (self._images_expr + 1) * (255 / 2)
        sh = proc_images_expr.shape.as_list()  # convert the shape to a plain list
        if sh[2] > 256:
            factor = sh[2] // 256   # downscale factor relative to 256x256
            # Average-pool factor x factor blocks; the result has shape [-1, sh[1], sh[2]//factor, sh[2]//factor]
            proc_images_expr = tf.reduce_mean(tf.reshape(proc_images_expr, [-1, sh[1], sh[2] // factor, factor, sh[2] // factor, factor]), axis=[3,5])

        # Loss graph.
        self._info('Building loss graph...')
        # Define the _target_images_var variable
        # Its value is assigned in start() below
        self._target_images_var = tf.Variable(tf.zeros(proc_images_expr.shape), name='target_images_var')
        if self._lpips is None:
            # Load the VGG16 perceptual model
            self._lpips = misc.load_pkl(self.vgg16_pkl) # vgg16_zhang_perceptual.pkl
        # Use the VGG16 (LPIPS) model to compute the distance between proc_images_expr and _target_images_var
        # _target_images_var is assigned in self.start()
        self._dist = self._lpips.get_output_for(proc_images_expr, self._target_images_var)
        # Sum the distances to form _loss
        self._loss = tf.reduce_sum(self._dist)

        # Noise regularization graph.
        self._info('Building noise regularization graph...')
        reg_loss = 0.0
        for v in self._noise_vars:
            sz = v.shape[2]
            while True:
                # Penalize spatial correlation in the noise maps: multiply each map by itself
                # rolled one pixel along axis 3 (width) and along axis 2 (height),
                # take the mean of each product, square it, and add the two terms
                reg_loss += tf.reduce_mean(v * tf.roll(v, shift=1, axis=3))**2 + tf.reduce_mean(v * tf.roll(v, shift=1, axis=2))**2
                if sz <= 8:
                    break # Small enough already
                v = tf.reshape(v, [1, 1, sz//2, 2, sz//2, 2]) # Downscale: halve the spatial resolution
                v = tf.reduce_mean(v, axis=[3, 5]) # average the 2x2 blocks, collapsing the extra dims
                sz = sz // 2
        self._loss += reg_loss * self.regularize_noise_weight

        # Optimizer.
        # Build the optimizer with dnnlib.tflib and define the update op _opt_step
        self._info('Setting up optimizer...')
        self._lrate_in = tf.placeholder(tf.float32, [], name='lrate_in') # _lrate_in is an input placeholder
        self._opt = dnnlib.tflib.Optimizer(learning_rate=self._lrate_in) # construct the optimizer
        self._opt.register_gradients(self._loss, [self._dlatents_var] + self._noise_vars) # register gradients of _loss w.r.t. [_dlatents_var] + _noise_vars
        self._opt_step = self._opt.apply_updates()

    # Run the full optimization to completion
    def run(self, target_images):
        # Run to completion.
        # start() performs the initialization
        self.start(target_images)
        # Iterate
        while self._cur_step < self.num_steps:
            self.step()

        # Collect results.
        # Return the results
        pres = dnnlib.EasyDict()
        pres.dlatents = self.get_dlatents()
        pres.noises = self.get_noises()
        pres.images = self.get_images()
        return pres

    # Initialization
    def start(self, target_images):
        assert self._Gs is not None

        # Prepare target images.
        # Prepare the target image batch, i.e. the optimization target
        self._info('Preparing target images...')
        target_images = np.asarray(target_images, dtype='float32')
        target_images = (target_images + 1) * (255 / 2)
        sh = target_images.shape
        assert sh[0] == self._minibatch_size
        # If the target images are too large, downscale them to the size of _target_images_var (256x256)
        if sh[2] > self._target_images_var.shape[2]:
            factor = sh[2] // self._target_images_var.shape[2]
            target_images = np.reshape(target_images, [-1, sh[1], sh[2] // factor, factor, sh[3] // factor, factor]).mean((3, 5))

        # Initialize optimization state.
        self._info('Initializing optimization state...')
        # Set the _target_images_var and _dlatents_var variables
        # _dlatent_avg is the starting point for optimizing _dlatents_var; _target_images_var holds the target image batch
        tflib.set_vars({self._target_images_var: target_images, self._dlatents_var: np.tile(self._dlatent_avg, [self._minibatch_size, 1, 1])})
        # Initialize the noise variables
        tflib.run(self._noise_init_op)
        # Reset the optimizer state
        self._opt.reset_optimizer_state()
        # Start the iteration counter at 0
        self._cur_step = 0

    # One optimization step
    def step(self):
        assert self._cur_step is not None
        if self._cur_step >= self.num_steps:
            return
        if self._cur_step == 0:
            self._info('Running...')

        # Hyperparameters.
        t = self._cur_step / self.num_steps # fraction of steps completed
        # noise strength = dlatent std * initial noise factor * (remaining fraction of the noise ramp) squared
        noise_strength = self._dlatent_std * self.initial_noise_factor * max(0.0, 1.0 - t / self.noise_ramp_length) ** 2
        # Learning-rate schedule: short ramp-up, then cosine ramp-down
        lr_ramp = min(1.0, (1.0 - t) / self.lr_rampdown_length)
        lr_ramp = 0.5 - 0.5 * np.cos(lr_ramp * np.pi)
        lr_ramp = lr_ramp * min(1.0, t / self.lr_rampup_length)
        learning_rate = self.initial_learning_rate * lr_ramp

        # Train.
        # Run one update step with tflib.run
        feed_dict = {self._noise_in: noise_strength, self._lrate_in: learning_rate}
        _, dist_value, loss_value = tflib.run([self._opt_step, self._dist, self._loss], feed_dict)
        tflib.run(self._noise_normalize_op) # renormalize the noise maps to zero mean, unit std after each step

        # Print status.
        # Print the iteration status
        self._cur_step += 1
        if self._cur_step == self.num_steps or self._cur_step % 10 == 0:
            self._info('%-8d%-12g%-12g' % (self._cur_step, dist_value, loss_value))
        if self._cur_step == self.num_steps:
            self._info('Done.')

    def get_cur_step(self):
        return self._cur_step

    def get_dlatents(self):
        return tflib.run(self._dlatents_expr, {self._noise_in: 0})

    def get_noises(self):
        return tflib.run(self._noise_vars)

    def get_images(self):
        return tflib.run(self._images_expr, {self._noise_in: 0})
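
Before moving on to project_images.py, here is a minimal sketch of driving the Projector class directly (the network pickle name follows the repo defaults; the zero-filled target batch is a stand-in for a real aligned face scaled to [-1, 1], and you may need to pass vgg16_pkl explicitly if the default local path does not exist):

import numpy as np
import pretrained_networks
import projector

_G, _D, Gs = pretrained_networks.load_networks('gdrive:networks/stylegan2-ffhq-config-f.pkl')
proj = projector.Projector(num_steps=200, verbose=True)
proj.set_network(Gs)

# A real target is a float32 NCHW batch in [-1, 1], e.g. one 1024x1024 aligned face
images = np.zeros((1, 3, 1024, 1024), dtype=np.float32)  # placeholder target
results = proj.run(images)
np.save('dlatents.npy', results.dlatents[0])             # the recovered (18, 512) dlatents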

project_images.py

import argparse
import os
import shutil
import numpy as np

import dnnlib
import dnnlib.tflib as tflib
import pretrained_networks
import projector
import dataset_tool
from training import dataset
from training import misc


def project_image(proj, src_file, dst_dir, tmp_dir, video=False):

    data_dir = '%s/dataset' % tmp_dir  # ./stylegan2-tmp/dataset
    if os.path.exists(data_dir):
        shutil.rmtree(data_dir)
    image_dir = '%s/images' % data_dir  # ./stylegan2-tmp/dataset/images
    tfrecord_dir = '%s/tfrecords' % data_dir  # ./stylegan2-tmp/dataset/tfrecords
    os.makedirs(image_dir, exist_ok=True)
    # Copy the source image into ./stylegan2-tmp/dataset/images
    shutil.copy(src_file, image_dir + '/')
    # Generate temporary tfrecord files under ./stylegan2-tmp/dataset/tfrecords
    # The tfrecord files serialize the image shape and data at each level of detail (lod)
    # For example, for a 1024x1024 image the tfr_file names run from 10 down to 2, e.g. tfrecords-r10.tfrecords...tfrecords-r05.tfrecords...
    dataset_tool.create_from_images(tfrecord_dir, image_dir, shuffle=0)
    # The TFRecordDataset class is defined in dataset.py; it loads a dataset from a set of .tfrecords files into dataset_obj
    # load_dataset is a helper function that builds the dataset object (done when the TFRecordDataset instance is created)
    dataset_obj = dataset.load_dataset(
        data_dir=data_dir, tfrecord_dir='tfrecords',
        max_label_size=0, repeat=False, shuffle_mb=0
    )

    # Build the target image batch for the optimization
    print('Projecting image "%s"...' % os.path.basename(src_file))
    # Fetch the next minibatch of 1 as a NumPy array
    images, _labels = dataset_obj.get_minibatch_np(1)
    # Rescale images from [0, 255] to [-1, 1]
    images = misc.adjust_dynamic_range(images, [0, 255], [-1, 1])
    # Projector initialization: start()
    proj.start(images)
    if video:
        video_dir = '%s/video' % tmp_dir
        os.makedirs(video_dir, exist_ok=True)
    while proj.get_cur_step() < proj.num_steps:
        print('\r%d / %d ... ' % (proj.get_cur_step(), proj.num_steps), end='', flush=True)
        # Projector optimization: step()
        proj.step()
        # If the video option is set, save the intermediate images to ./stylegan2-tmp/video
        if video:
            filename = '%s/%08d.png' % (video_dir, proj.get_cur_step())
            misc.save_image_grid(proj.get_images(), filename, drange=[-1,1])
    print('\r%-30s\r' % '', end='', flush=True)

    # Save the reconstructed image and the dlatents file in the destination directory
    os.makedirs(dst_dir, exist_ok=True)
    filename = os.path.join(dst_dir, os.path.basename(src_file)[:-4] + '.png')
    misc.save_image_grid(proj.get_images(), filename, drange=[-1,1])
    filename = os.path.join(dst_dir, os.path.basename(src_file)[:-4] + '.npy')
    np.save(filename, proj.get_dlatents()[0])


def render_video(src_file, dst_dir, tmp_dir, num_frames, mode, size, fps, codec, bitrate):

    import PIL.Image
    import moviepy.editor

    def render_frame(t):
        frame = np.clip(np.ceil(t * fps), 1, num_frames)
        image = PIL.Image.open('%s/video/%08d.png' % (tmp_dir, frame))
        if mode == 1:
            canvas = image
        else:
            canvas = PIL.Image.new('RGB', (2 * src_size, src_size))
            canvas.paste(src_image, (0, 0))
            canvas.paste(image, (src_size, 0))
        if size != src_size:
            canvas = canvas.resize((mode * size, size), PIL.Image.LANCZOS)
        return np.array(canvas)

    src_image = PIL.Image.open(src_file)
    src_size = src_image.size[1]
    duration = num_frames / fps
    filename = os.path.join(dst_dir, os.path.basename(src_file)[:-4] + '.mp4')
    video_clip = moviepy.editor.VideoClip(render_frame, duration=duration)
    video_clip.write_videofile(filename, fps=fps, codec=codec, bitrate=bitrate)


def main():
    # Parse command-line arguments
    parser = argparse.ArgumentParser(description='Project real-world images into StyleGAN2 latent space')
    parser.add_argument('src_dir', help='Directory with aligned images for projection')
    parser.add_argument('dst_dir', help='Output directory')
    parser.add_argument('--tmp-dir', default='./stylegan2-tmp', help='Temporary directory for tfrecords and video frames')
    parser.add_argument('--network-pkl', default='gdrive:networks/stylegan2-ffhq-config-f.pkl', help='StyleGAN2 network pickle filename')
    parser.add_argument('--vgg16-pkl', default='https://drive.google.com/uc?id=1N2-m9qszOeVC9Tq77WxsLnuWwOedQiD2', help='VGG16 network pickle filename')
    parser.add_argument('--num-steps', type=int, default=1000, help='Number of optimization steps')
    parser.add_argument('--initial-learning-rate', type=float, default=0.1, help='Initial learning rate')
    parser.add_argument('--initial-noise-factor', type=float, default=0.05, help='Initial noise factor')
    parser.add_argument('--verbose', type=bool, default=False, help='Verbose output')
    parser.add_argument('--video', type=bool, default=False, help='Render video of the optimization process')
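    # note: argparse's type=bool treats any non-empty string as True ('--video 0' still enables video);
    # omit the flag, or pass an empty string, to keep these two options off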
    parser.add_argument('--video-mode', type=int, default=1, help='Video mode: 1 for optimization only, 2 for source + optimization')
    parser.add_argument('--video-size', type=int, default=1024, help='Video size (height in px)')
    parser.add_argument('--video-fps', type=int, default=25, help='Video framerate')
    parser.add_argument('--video-codec', default='libx264', help='Video codec')
    parser.add_argument('--video-bitrate', default='5M', help='Video bitrate')
    args = parser.parse_args()

    print('Loading networks from "%s"...' % args.network_pkl)
    _G, _D, Gs = pretrained_networks.load_networks(args.network_pkl)
    # Construct the Projector
    proj = projector.Projector(
        vgg16_pkl             = args.vgg16_pkl,
        num_steps             = args.num_steps,
        initial_learning_rate = args.initial_learning_rate,
        initial_noise_factor  = args.initial_noise_factor,
        verbose               = args.verbose
    )
    # Attach the StyleGAN2 network to the Projector
    proj.set_network(Gs)

    src_files = sorted([os.path.join(args.src_dir, f) for f in os.listdir(args.src_dir) if f[0] not in '._'])
    # Iterate over all images in the source directory
    for src_file in src_files:
        # Project the image
        project_image(proj, src_file, args.dst_dir, args.tmp_dir, video=args.video)
        # If the video option is set, call render_video to turn the saved frames into a video
        if args.video:
            render_video(
                src_file, args.dst_dir, args.tmp_dir, args.num_steps, args.video_mode,
                args.video_size, args.video_fps, args.video_codec, args.video_bitrate
            )
        shutil.rmtree(args.tmp_dir)


if __name__ == '__main__':
    main()
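
A typical invocation is: python project_images.py aligned_images/ generated_images/ (the paths are examples). Afterwards, each source image has a reconstructed .png and an .npy dlatents file in the output directory. The saved dlatents can be fed back into the synthesis network later; below is a hedged sketch using the standard output_transform idiom from the StyleGAN2 codebase (the .npy file name is an example):

import numpy as np
import dnnlib.tflib as tflib
import pretrained_networks

_G, _D, Gs = pretrained_networks.load_networks('gdrive:networks/stylegan2-ffhq-config-f.pkl')
dlatents = np.load('generated_images/example.npy')        # (18, 512), saved by project_images.py
fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_rhwc=True)
images = Gs.components.synthesis.run(dlatents[np.newaxis], randomize_noise=False, output_transform=fmt)
# images[0] is a uint8 HWC array, ready for PIL.Image.fromarray(images[0], 'RGB')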

(End)
