DCGAN Newbie Series: model.py

References

[1] A quick read of the DCGAN code

https://www.colabug.com/2958322.html

[2] A DCGAN-based neural network implementation for generating anime avatars

https://blog.csdn.net/sinat_33741547/article/details/77871170

[3] Batch norm: the theory and the code in detail

  1. Blog: https://blog.csdn.net/qq_25737169/article/details/79048516
  2. Video: https://www.bilibili.com/video/av15405597?from=search&seid=8881273700429864348
  3. Blog: https://www.cnblogs.com/eilearn/p/9780696.html

Note: 2 focuses on code, 3 on theory (but is not dry at all), and 1 reads like a combination of the two. The video and materials in 2 come highly recommended: excellent quality, and free. I suggest watching the video before reading 3; 3 ties a lot of the theory together, and calling it a masterpiece would not be an overstatement.

[4] Visualizing network structure with TensorFlow and TensorBoard

https://blog.csdn.net/helei001/article/details/51842531

[5] TensorFlow variable scopes

https://www.cnblogs.com/MY0213/p/9208503.html

[6] Follows on from [5], and also cleared up my old confusion from Jupyter notebooks

https://blog.csdn.net/Jerr__y/article/details/70809528#commentBox

[7] Activation functions: ReLU, Leaky ReLU, PReLU, and RReLU

http://www.cnblogs.com/chamie/p/8665251.html

[8] A brief introduction to the conv2d function

  • http://www.cnblogs.com/qggg/p/6832342.html

[9] DCGAN source code analysis

  • https://blog.csdn.net/nongfu_spring/article/details/54342861/

[10] A simple read-through of DCGAN code, with the corresponding training results; I took a liking to one implementation's output… so, noting it down here first

  • http://www.cnblogs.com/lyrichu/p/9093411.html

[11] A reasonably detailed blog post (its layout is about as rough as mine), titled: DCGAN

  • https://blog.csdn.net/Candy_GL/article/details/81138297

[12] Saving and loading models in TensorFlow

  • https://blog.csdn.net/xiezongsheng1990/article/details/81011115

Note: the author is still getting used to Markdown, so please bear with the formatting.

[13] A video that would have saved me a great many detours had I seen it earlier (note: after working through the code once myself, I found this video while hunting for the paper, and it was pitched exactly right for me). After watching it, I fleshed out some of the code comments.

  • https://www.bilibili.com/video/av20033914/?p=13

[14] A video found on Bilibili

  • In the end, what remained unclear was really just parts of the TensorFlow framework and code, so I am off to study that and will come back to polish this afterwards.
  • https://www.bilibili.com/video/av19360545/?p=1

Code walkthrough

Imports

from __future__ import division
import os
import time
import math
from glob import glob
import tensorflow as tf
import numpy as np
from six.moves import xrange

from ops import *
from utils import *
  • from __future__ import division takes effect when the Python version is 2.x: it makes dividing two integers return a float (in Python 2 the default is integer division; in Python 3 it already returns a float).
  • glob returns, via simple wildcard patterns, the list of file names in a directory that match the pattern [1]; a tiny usage example follows.
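A tiny usage example (the directory and pattern here are made-up, not from the repo):

import os
from glob import glob

# List every .jpg under a hypothetical ./data/faces directory.
files = glob(os.path.join('./data', 'faces', '*.jpg'))
print(len(files), files[:3])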

The conv_out_size_same function

def conv_out_size_same(size, stride):
  return int(math.ceil(float(size) / float(stride)))
  • math.ceil() returns the ceiling of a number, i.e. rounds up to the nearest integer.
  • stride: the convolution stride.
  • The whole function returns an integer: the output side length of a SAME-padded convolution with the given stride applied to an input of side size; see the example below.
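For example, repeatedly halving a 64-pixel side, exactly as generator() does below:

# 64 -> 32 -> 16 -> 8 -> 4, matching s_h2 ... s_h16 in generator()
print(conv_out_size_same(64, 2))   # 32
print(conv_out_size_same(5, 2))    # 3, i.e. ceil(5 / 2)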

class DCGAN

__init__

def __init__(self, sess, input_height=108, input_width=108, crop=True,
         batch_size=64, sample_num = 64, output_height=64,output_width=64,
         y_dim=None, z_dim=100, gf_dim=64, df_dim=64,
         gfc_dim=1024, dfc_dim=1024, c_dim=3, dataset_name='default',
         input_fname_pattern='*.jpg', checkpoint_dir=None, sample_dir=None, data_dir='./data'):
  • We will go into each parameter as it comes up; for now, just note the special one: y_dim.
  • The corresponding docstring follows:
"""
    Args:
      sess: TensorFlow session
      batch_size: The size of batch. Should be specified before training.
      y_dim: (optional) Dimension of dim for y. [None]
      z_dim: (optional) Dimension of dim for Z. [100]
      gf_dim: (optional) Dimension of gen filters in first conv layer. [64]
      df_dim: (optional) Dimension of discrim filters in first conv layer. [64]
      gfc_dim: (optional) Dimension of gen units for fully connected layer. [1024]
      dfc_dim: (optional) Dimension of discrim units for fully connected layer. [1024]
      c_dim: (optional) Dimension of image color. For grayscale input, set to 1. [3]
    """

    self.sess = sess
    self.crop = crop

    self.batch_size = batch_size
    
    # Used at test time: G generates sample_num images so we can inspect progress
    self.sample_num = sample_num

    self.input_height = input_height
    self.input_width = input_width
    self.output_height = output_height
    self.output_width = output_width

    self.y_dim = y_dim
    
    # Dimension of the noise vector z that the generator takes as its input
    self.z_dim = z_dim
	
    # Base filter count of the generator's conv layers; df_dim likewise for the discriminator
    self.gf_dim = gf_dim
    self.df_dim = df_dim
	
    # Dimension of the generator's fully connected layer; dfc_dim likewise for the discriminator
    self.gfc_dim = gfc_dim
    self.dfc_dim = dfc_dim
    
    # batch normalization : deals with poor initialization helps gradient flow
    self.d_bn1 = batch_norm(name='d_bn1')
    self.d_bn2 = batch_norm(name='d_bn2')

    if not self.y_dim:
      self.d_bn3 = batch_norm(name='d_bn3')

    self.g_bn0 = batch_norm(name='g_bn0')
    self.g_bn1 = batch_norm(name='g_bn1')
    self.g_bn2 = batch_norm(name='g_bn2')	
	
    if not self.y_dim:
      self.g_bn3 = batch_norm(name='g_bn3')

    self.dataset_name = dataset_name
    self.input_fname_pattern = input_fname_pattern
    self.checkpoint_dir = checkpoint_dir
    self.data_dir = data_dir
  • This block is just attribute assignment.
  • batch_norm runs deep and takes real effort to understand; see [3].
  • Here is an answer from the author that I found convincing: if every layer's output were normalized to a 0-1 Gaussian (subtract the mean, divide by the variance), then the data distribution would be a fixed Gaussian everywhere, and stacking more layers would be pointless. A deep network is trying to learn the data's distribution and discover its regularities; BN simply keeps the learned distribution from drifting too far. See the paper for the details. beta and gamma are both learned: in the code they are defined as variables with trainable set to True.
  • For a proper study see [3]; a sketch of the batch_norm helper used here follows.
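For reference, a minimal sketch of the batch_norm helper assumed to live in ops.py, modeled on the original DCGAN-tensorflow repository (the exact version in your copy may differ):

import tensorflow as tf

class batch_norm(object):
  """Thin wrapper that gives each BN layer its own scope and lets it be
  called like a function, e.g. self.d_bn1(conv2d(...))."""
  def __init__(self, epsilon=1e-5, momentum=0.9, name="batch_norm"):
    self.epsilon = epsilon
    self.momentum = momentum
    self.name = name

  def __call__(self, x, train=True):
    # beta (shift) is trainable by default; scale=True adds a trainable gamma.
    return tf.contrib.layers.batch_norm(x,
                                        decay=self.momentum,
                                        updates_collections=None,
                                        epsilon=self.epsilon,
                                        scale=True,
                                        is_training=train,
                                        scope=self.name)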

if self.dataset_name == 'mnist':
      self.data_X, self.data_y = self.load_mnist()
      self.c_dim = self.data_X[0].shape[-1]
else:
      data_path = os.path.join(self.data_dir, self.dataset_name, self.input_fname_pattern)
      # glob returns a list of the files matching the pattern
      self.data = glob(data_path)

      if len(self.data) == 0:
        raise Exception("[!] No data found in '" + data_path + "'")

      # shuffle the data
      np.random.shuffle(self.data)

      # read one image
      imreadImg = imread(self.data[0])

      # check if image is a non-grayscale image by checking channel number
      if len(imreadImg.shape) >= 3:
        self.c_dim = imread(self.data[0]).shape[-1]
      else:
        self.c_dim = 1

      if len(self.data) < self.batch_size:
        raise Exception("[!] Entire dataset size is less than the configured batch_size")
  • Besides setting a pile of attributes, the DCGAN constructor has to distinguish whether the dataset is mnist: mnist images are grayscale, so the channel count must be 1 (self.c_dim = 1), whereas color images give self.c_dim = 3 or self.c_dim = 4. [1]
  • load_mnist is a custom method, implemented below.
  • The rest of the code is commented inline; the whole block just reads in the dataset and raises the appropriate errors.

def load_mnist(self):
    
    data_dir = os.path.join(self.data_dir, self.dataset_name)    # join the path to the dataset
    
    '''
    train-images-idx3-ubyte:
        Reference: https://www.jianshu.com/p/84f72791806f
        In short: this is the IDX file format, used to store vectors and
        multidimensional matrices.
    '''
    fd = open(os.path.join(data_dir,'train-images-idx3-ubyte'))  # open the file
    loaded = np.fromfile(file=fd,dtype=np.uint8)                 # read the raw bytes with numpy
    trX = loaded[16:].reshape((60000,28,28,1)).astype(np.float)  # skip the 16-byte header, reshape into images

    fd = open(os.path.join(data_dir,'train-labels-idx1-ubyte'))
    loaded = np.fromfile(file=fd,dtype=np.uint8)
    trY = loaded[8:].reshape((60000)).astype(np.float)

    fd = open(os.path.join(data_dir,'t10k-images-idx3-ubyte'))
    loaded = np.fromfile(file=fd,dtype=np.uint8)
    teX = loaded[16:].reshape((10000,28,28,1)).astype(np.float)

    fd = open(os.path.join(data_dir,'t10k-labels-idx1-ubyte'))
    loaded = np.fromfile(file=fd,dtype=np.uint8)
    teY = loaded[8:].reshape((10000)).astype(np.float)

    # np.asarray converts structured data to an ndarray; example: https://www.jb51.net/article/138281.htm
    # You can think of it as two variables sharing the same memory, just with different types.
    trY = np.asarray(trY)
    teY = np.asarray(teY)

    # Array concatenation: concatenate((a, b)) joins along an axis; ndim is unchanged.
    # a and b may differ in size only along the axis being concatenated.
    X = np.concatenate((trX, teX), axis=0)
    y = np.concatenate((trY, teY), axis=0).astype(np.int)
    
    # Seeding with the same seed before each shuffle applies the same permutation
    # to X and y, so images stay aligned with their labels.
    # On np.random.seed, see: http://www.cnblogs.com/subic/p/8454025.html
    seed = 547
    np.random.seed(seed)
    np.random.shuffle(X)
    np.random.seed(seed)
    np.random.shuffle(y)
    
    # one-hot encode the labels
    y_vec = np.zeros((len(y), self.y_dim), dtype=np.float)
    for i, label in enumerate(y):
      y_vec[i, y[i]] = 1.0
    
    return X/255., y_vec
  • See the inline comments for details; a tiny standalone illustration of the double-seed shuffle trick follows.
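A tiny standalone illustration (mine, not from the repo) of the double-seed trick above:

import numpy as np

X = np.arange(5)
y = X * 10                        # labels paired with X
seed = 547
np.random.seed(seed); np.random.shuffle(X)
np.random.seed(seed); np.random.shuffle(y)
print(np.all(y == X * 10))        # True: the same permutation hit both arrays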

    # grayscale flag
    self.grayscale = (self.c_dim == 1)

    # custom method, defined below
    self.build_model()
  • Nothing special to note here.

build_model(self)

def build_model(self):
    
    # Think of this as declaring a (batch_size, y_dim) slot into which each mini-batch's labels are fed.
    if self.y_dim:
      self.y = tf.placeholder(tf.float32, [self.batch_size, self.y_dim], name='y')
    else:
      self.y = None
	
    # crop: whether or not to crop the images [True or False]
    # If we crop, the placeholder takes the output size; otherwise the input size.
    # https://www.jianshu.com/p/3e46ce8e7ddd
    if self.crop:
      # fix the output dimensions
      image_dims = [self.output_height, self.output_width, self.c_dim]
    else:
      image_dims = [self.input_height, self.input_width, self.c_dim]

    # Same idea as above: the input for real images, [batch_size, height, width, c_dim]
    self.inputs = tf.placeholder(
      tf.float32, [self.batch_size] + image_dims, name='real_images')

    inputs = self.inputs
	
    # Noise input (None means any batch size; it is filled in when actually used)
    self.z = tf.placeholder(
      tf.float32, [None, self.z_dim], name='z')
    
    # Defined in ops.py:
    # histogram_summary = tf.histogram_summary
    # Visualizes z in TensorBoard.
    self.z_sum = histogram_summary("z", self.z)
	
    # The network structure
    self.G                  = self.generator(self.z, self.y)
    # D applied to real data
    self.D, self.D_logits   = self.discriminator(inputs, self.y, reuse=False)

    # sampling network, for inspecting G at test time
    self.sampler            = self.sampler(self.z, self.y)
    # D applied to data produced by G
    self.D_, self.D_logits_ = self.discriminator(self.G, self.y, reuse=True)
    
    # TensorBoard summaries, as above
    self.d_sum = histogram_summary("d", self.D)
    self.d__sum = histogram_summary("d_", self.D_)
    
    # In ops.py:
    # image_summary = tf.image_summary
    # Image visualization in TensorBoard: https://blog.csdn.net/dxmkkk/article/details/54925728
    self.G_sum = image_summary("G", self.G)
  • Clearly the network structure is the heart of the matter; everything else is scaffolding. A rough overview, following [1]:

    self.generator builds the generator; self.discriminator builds the discriminator; self.sampler draws random samples (used to generate example images). Note that self.y is non-None only when the dataset is mnist; in all other cases self.z alone is enough to generate samples.


sigmoid_cross_entropy_with_logits(x, y) (defined inside build_model)

 def sigmoid_cross_entropy_with_logits(x, y):
      try:
        return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, labels=y)
      except:
        return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, targets=y)
  • [1] sigmoid_cross_entropy_with_logits is redefined here to stay compatible across TensorFlow versions (the keyword argument was renamed from targets to labels). The function first applies a sigmoid activation, then computes the cross-entropy loss.
  • The function has a precise definition of its own; see: https://blog.csdn.net/QW_sunny/article/details/72885403
  • I have not studied this function in depth, only far enough to follow its general behavior; the sketch below shows the formula it evaluates.
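For reference, a minimal NumPy sketch (mine, not the repo's) of the numerically stable formula the TensorFlow docs give for this op, max(x, 0) - x*z + log(1 + exp(-|x|)), where x is the logits and z the labels:

import numpy as np

def sigmoid_xent(x, z):
    # Stable form of z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x)).
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

print(sigmoid_xent(np.array([2.0, -1.0]), np.array([1.0, 0.0])))
# [0.12692801 0.31326169]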

Continuing the code above:

    # Discriminator loss on real images.
    # D_logits: the discriminator's scores on real data; tf.ones_like(self.D): a tensor of ones shaped like D.
    # (Recall the discriminator returns tf.nn.sigmoid(h3), h3.)
    # Think of the two arguments as x (what the network produced) and y (what we want);
    # the function turns them into a loss. Unlike a plain sum of squares, it avoids the
    # slow gradients of the sigmoid and also guards against numerical overflow.
    self.d_loss_real = tf.reduce_mean(
      sigmoid_cross_entropy_with_logits(self.D_logits, tf.ones_like(self.D)))
    # Discriminator loss on fake images
    self.d_loss_fake = tf.reduce_mean(
      sigmoid_cross_entropy_with_logits(self.D_logits_, tf.zeros_like(self.D_)))

    # Generator loss
    self.g_loss = tf.reduce_mean(
      sigmoid_cross_entropy_with_logits(self.D_logits_, tf.ones_like(self.D_)))

    # TensorBoard summaries
    self.d_loss_real_sum = scalar_summary("d_loss_real", self.d_loss_real)
    self.d_loss_fake_sum = scalar_summary("d_loss_fake", self.d_loss_fake)

    # Total discriminator loss; lower is better
    self.d_loss = self.d_loss_real + self.d_loss_fake
  • The output of sigmoid_cross_entropy_with_logits is not a single number but one loss per sample in the batch, which is why it is wrapped in tf.reduce_mean(loss).
  • self.g_loss is the generator loss; self.d_loss_real is the discriminator loss on real images; self.d_loss_fake is the loss on fake images (generated by the generator); self.d_loss is the total discriminator loss. Note that the generator loss labels the fake batch as real (all ones), i.e. it uses the non-saturating -log D(G(z)) form rather than log(1 - D(G(z))).

    # TensorBoard summaries
    self.g_loss_sum = scalar_summary("g_loss", self.g_loss)
    self.d_loss_sum = scalar_summary("d_loss", self.d_loss)

    # all trainable variables
    t_vars = tf.trainable_variables()

    # Collect the variables whose names contain d_ (resp. g_): these are the parameter
    # lists handed to the optimizers (d_vars for the discriminator, g_vars for the generator).
    self.d_vars = [var for var in t_vars if 'd_' in var.name]
    self.g_vars = [var for var in t_vars if 'g_' in var.name]

    # saver, for writing checkpoints
    self.saver = tf.train.Saver()
  • [1] tf.trainable_variables() returns all of the model's trainable parameters. Because the generator and discriminator variables were defined with distinct name prefixes, we can split them by name into self.d_vars (discriminator variables) and self.g_vars (generator variables).
  • self.saver = tf.train.Saver() saves the trained model parameters to a checkpoint, as sketched below.
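A minimal sketch of the save/restore round trip (the paths and model name are illustrative; the repo's own load method appears near the end of this post):

# Saving during training (roughly what a save(checkpoint_dir, step) method does):
#   self.saver.save(self.sess,
#                   os.path.join(checkpoint_dir, "DCGAN.model"),
#                   global_step=step)        # writes e.g. DCGAN.model-25002
# Restoring later:
#   self.saver.restore(self.sess,
#                      os.path.join(checkpoint_dir, "DCGAN.model-25002"))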

discriminator

def discriminator(self, image, y=None, reuse=False):
    
    # Variable sharing: as I understand it, even if the function is called repeatedly, the shared variables are not defined or declared a second time [6]
    with tf.variable_scope("discriminator") as scope:
      if reuse:
        scope.reuse_variables()
	
      # if there is no label dimension (y_dim)
      if not self.y_dim:
        # For an intro to conv2d see [8]; but its description of the second argument is wrong, so [9] fills that gap.
        '''
        Parameters (of the conv2d wrapper in ops.py):
          input: a tensor with shape [batch, in_height, in_width, in_channels].
          output_dim (the second argument): the number of feature maps conv2d outputs,
            i.e. the number of conv kernels. For example, conv2d(h1, self.gf_dim*4)
            with gf_dim = 128 outputs 128*4 = 512 feature maps, so there are 512
            kernels in that layer.
          strides: the stride of the convolution along each dimension of the input,
            a length-4 vector [1, d_h, d_w, 1]: how many batch images, rows, columns,
            and channels to step over at a time. Here it is set to [1, 2, 2, 1], so
            each convolution slides with stride 2 and the feature map shrinks to 1/4
            of its area.
          padding: a string, either "SAME" or "VALID", choosing the padding scheme.
          use_cudnn_on_gpu: bool, whether to use cuDNN acceleration; default True.
          name: optional op name, default None.
        '''
        # df_dim is passed in as a class parameter:
        # df_dim: (optional) Dimension of discrim filters in first conv layer [64]
        # lrelu is defined in ops.py
        # the input is an image
        h0 = lrelu(conv2d(image, self.df_dim, name='d_h0_conv'))
        h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim*2, name='d_h1_conv')))
        h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim*4, name='d_h2_conv')))
        h3 = lrelu(self.d_bn3(conv2d(h2, self.df_dim*8, name='d_h3_conv')))
        
        # Fully connected layer; -1 lets reshape infer that dimension. linear is hand-built in ops.py.
        h4 = linear(tf.reshape(h3, [self.batch_size, -1]), 1, 'd_h4_lin')
        return tf.nn.sigmoid(h4), h4
    
      else:
        # y_dim = 10 for mnist
        yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])

        # image is the tensor passed in as an argument; in build_model it is inputs.
        # conv_cond_concat lives in ops.py, built on top of concat. At first I did not
        # understand why all of this concatenation is done; it is the conditional-GAN
        # conditioning explained in the bullets below.
        x = conv_cond_concat(image, yb)

        h0 = lrelu(conv2d(x, self.c_dim + self.y_dim, name='d_h0_conv'))
        h0 = conv_cond_concat(h0, yb)

        h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim + self.y_dim, name='d_h1_conv')))
        h1 = tf.reshape(h1, [self.batch_size, -1])      
        h1 = concat([h1, y], 1)
        
        h2 = lrelu(self.d_bn2(linear(h1, self.dfc_dim, 'd_h2_lin')))
        h2 = concat([h2, y], 1)

        h3 = linear(h2, 1, 'd_h3_lin')
        
        return tf.nn.sigmoid(h3), h3
  • [1] The above is the concrete implementation of the discriminator.
  • The discriminator uses conv (convolution) operations with leaky-relu activations, applying batch normalization at each layer. TensorFlow's batch normalization is implemented with tf.contrib.layers.batch_norm.
  • If the dataset is not mnist, the first layer is conv2d+leaky-relu, the next three layers are conv2d+BN+leaky-relu, and the end is a linear layer with a single output unit fed into a sigmoid.
  • If it is mnist, then yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim]) first gives y two extra dimensions so that it can be joined with the image; this is really the conditional GAN idea at work.
  • x = conv_cond_concat(image, yb) merges the condition with the image, after which h0 = lrelu(conv2d(x, self.c_dim + self.y_dim, name='d_h0_conv')) performs the convolution. The second stage is conv2d+leaky-relu+concat, the third conv2d+BN+leaky-relu+reshape+concat, the fourth linear+BN+leaky-relu+concat, and the last linear+sigmoid. Sketches of the two small ops.py helpers follow.
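For completeness, minimal sketches of the two ops.py helpers used above, modeled on the original DCGAN-tensorflow repository (your copy may differ slightly):

import tensorflow as tf

def lrelu(x, leak=0.2, name="lrelu"):
  # Leaky ReLU: equals x for x > 0 and leak*x for x < 0 (0 < leak < 1).
  return tf.maximum(x, leak * x)

def conv_cond_concat(x, y):
  """Tile the condition y across the spatial dims of x and concatenate it on
  the channel axis, so every pixel carries the label information."""
  x_shapes = x.get_shape()
  y_shapes = y.get_shape()
  return tf.concat(
      [x, y * tf.ones([x_shapes[0], x_shapes[1], x_shapes[2], y_shapes[3]])], 3)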

Supplement: convolutional networks

The author knew the basics of convolutional neural networks but had never built one from scratch, and was not content to treat these functions as black boxes, so I went back and studied how a full conv net is constructed; see 卷积网络学习系列1.md.

Only after that review did I write the convolution comments in discriminator, so they are not aimed at complete beginners. See those notes for the underlying theory.

generator

 def generator(self, z, y=None):
        
    with tf.variable_scope("generator") as scope:
      
      	
      if not self.y_dim:
        s_h, s_w = self.output_height, self.output_width
        
        # return int(math.ceil(float(size) / float(stride)))
        s_h2, s_w2 = conv_out_size_same(s_h, 2), conv_out_size_same(s_w, 2)
        s_h4, s_w4 = conv_out_size_same(s_h2, 2), conv_out_size_same(s_w2, 2)
        s_h8, s_w8 = conv_out_size_same(s_h4, 2), conv_out_size_same(s_w4, 2)
        s_h16, s_w16 = conv_out_size_same(s_h8, 2), conv_out_size_same(s_w8, 2)

        # project `z` and reshape
        # self.z = tf.placeholder(tf.float32, [None, self.z_dim], name='z')
        # gf_dim is the base filter count of the generator; the channel progression is
        # *8, *4, *2, *1 and finally c_dim. Watch for the pattern in the code below.
        self.z_, self.h0_w, self.h0_b = linear(
            z, self.gf_dim*8*s_h16*s_w16, 'g_h0_lin', with_w=True)
		
        # With three dimensions fixed, the last one is determined, so it can be written
        # as -1. Reshape the fully connected output into feature-map form.
        self.h0 = tf.reshape(
            self.z_, [-1, s_h16, s_w16, self.gf_dim * 8])
        
        h0 = tf.nn.relu(self.g_bn0(self.h0))
		
        # produce batch_size feature maps of the specified size
        self.h1, self.h1_w, self.h1_b = deconv2d(
            h0, [self.batch_size, s_h8, s_w8, self.gf_dim*4], name='g_h1', with_w=True)
        h1 = tf.nn.relu(self.g_bn1(self.h1))

        h2, self.h2_w, self.h2_b = deconv2d(
            h1, [self.batch_size, s_h4, s_w4, self.gf_dim*2], name='g_h2', with_w=True)
        h2 = tf.nn.relu(self.g_bn2(h2))

        h3, self.h3_w, self.h3_b = deconv2d(
            h2, [self.batch_size, s_h2, s_w2, self.gf_dim*1], name='g_h3', with_w=True)
        h3 = tf.nn.relu(self.g_bn3(h3))
	
        # c_dim: the number of channels, 3
        h4, self.h4_w, self.h4_b = deconv2d(
            h3, [self.batch_size, s_h, s_w, self.c_dim], name='g_h4', with_w=True)
        return tf.nn.tanh(h4)
    
      else:
        s_h, s_w = self.output_height, self.output_width
        s_h2, s_h4 = int(s_h/2), int(s_h/4)
        s_w2, s_w4 = int(s_w/2), int(s_w/4)

        # yb = tf.expand_dims(tf.expand_dims(y, 1),2)
        yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
        z = concat([z, y], 1)

        h0 = tf.nn.relu(
            self.g_bn0(linear(z, self.gfc_dim, 'g_h0_lin')))
        h0 = concat([h0, y], 1)

        h1 = tf.nn.relu(self.g_bn1(
            linear(h0, self.gf_dim*2*s_h4*s_w4, 'g_h1_lin')))
        h1 = tf.reshape(h1, [self.batch_size, s_h4, s_w4, self.gf_dim * 2])

        h1 = conv_cond_concat(h1, yb)

        h2 = tf.nn.relu(self.g_bn2(deconv2d(h1,
            [self.batch_size, s_h2, s_w2, self.gf_dim * 2], name='g_h2')))
        h2 = conv_cond_concat(h2, yb)

        return tf.nn.sigmoid(
            deconv2d(h2, [self.batch_size, s_h, s_w, self.c_dim], name='g_h3'))
  • [1] The above is the concrete implementation of the generator. Unlike the discriminator, the generator uses deconv (transposed convolution) together with relu activations.
  • The generator's structure is:
  • 1. If not mnist: linear+reshape+BN+relu ----> (deconv+BN+relu)x3 ----> deconv+tanh (a concrete shape trace follows below);
  • 2. If mnist, then besides the input z we must also handle the label y, i.e. z and y get concatenated (conditional GAN). The structure is: reshape+concat ----> linear+BN+relu+concat ----> linear+BN+relu+reshape+concat ----> deconv+BN+relu+concat ----> deconv+sigmoid.
  • Note that the final activation here is not the usual tanh but a sigmoid (whose output lands directly in the 0-1 range).
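As a concrete trace (my own working, assuming output_height = output_width = 64, gf_dim = 64, c_dim = 3), the non-mnist branch transforms shapes like this:

# z:       [batch, 100]
# linear:  [batch, 512*4*4]         s_h16 = s_w16 = 4, gf_dim*8 = 512
# reshape: [batch,  4,  4, 512]
# deconv:  [batch,  8,  8, 256]     each stride-2 deconv doubles height/width
# deconv:  [batch, 16, 16, 128]
# deconv:  [batch, 32, 32,  64]
# deconv:  [batch, 64, 64,   3]     tanh then squashes values into (-1, 1)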

None of the big-shot write-ups explain why the model is built this way (update: after staring at it for ages, I realized they don't say because this is exactly the structure from the original paper; don't agonize over the architecture, we are nowhere near the level of designing one ourselves). So here is my own take: the generator starts from a small input and keeps growing it, step by step, up to image size, and that is precisely where transposed convolution comes in.

linear
def linear(input_, output_size, scope=None, stddev=0.02, bias_start=0.0, with_w=False):
  
  shape = input_.get_shape().as_list()

  with tf.variable_scope(scope or "Linear"):
    try:
      matrix = tf.get_variable("Matrix", [shape[1], output_size], tf.float32,
                 tf.random_normal_initializer(stddev=stddev))
    except ValueError as err:
      msg = "NOTE: Usually, this is due to an issue with the image dimensions.  Did you correctly set '--crop' or '--input_height' or '--output_height'?"
      err.args = err.args + (msg,)
      raise
    # one bias per output unit
    bias = tf.get_variable("bias", [output_size],
                           initializer=tf.constant_initializer(bias_start))
    
    # optionally also return the weights
    if with_w:
      return tf.matmul(input_, matrix) + bias, matrix, bias
    else:
      return tf.matmul(input_, matrix) + bias

Transposed convolution (deconv)

  • Congratulations: yet another concept I had never heard of…
  • I went off to study transposed convolution separately; see my deconvolution notes series.
def deconv2d(input_, output_shape,
       k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
       name="deconv2d", with_w=False):
  with tf.variable_scope(name):
    
    # filter : [height, width, output_channels, in_channels]
    w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
              initializer=tf.random_normal_initializer(stddev=stddev))
    
    try:
      deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape,
                strides=[1, d_h, d_w, 1])

    # Support for versions of TensorFlow before 0.7.0
    except AttributeError:
      # d_h, d_w follow the original paper
      deconv = tf.nn.deconv2d(input_, w, output_shape=output_shape,
                strides=[1, d_h, d_w, 1])
	
    # the bias has one entry per output feature map
    biases = tf.get_variable('biases', [output_shape[-1]], initializer=tf.constant_initializer(0.0))
    # transposed convolution first, then add the bias
    deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())

    # optionally also return the weights
    if with_w:
      return deconv, w, biases
    else:
      return deconv
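A quick shape check (hypothetical numbers, using the deconv2d above): a stride-2 SAME transposed convolution doubles the spatial size, which is why the generator can walk s_h16 -> s_h8 -> s_h4 -> s_h2 -> s_h:

import tensorflow as tf

h0 = tf.zeros([64, 4, 4, 512])                    # [batch, s_h16, s_w16, gf_dim*8]
h1 = deconv2d(h0, [64, 8, 8, 256], name='demo')   # -> [batch, s_h8, s_w8, gf_dim*4]
print(h1.get_shape())                             # (64, 8, 8, 256)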

sampler

def sampler(self, z, y=None):
    
    with tf.variable_scope("generator") as scope:
      scope.reuse_variables()

      if not self.y_dim:
        s_h, s_w = self.output_height, self.output_width
        s_h2, s_w2 = conv_out_size_same(s_h, 2), conv_out_size_same(s_w, 2)
        s_h4, s_w4 = conv_out_size_same(s_h2, 2), conv_out_size_same(s_w2, 2)
        s_h8, s_w8 = conv_out_size_same(s_h4, 2), conv_out_size_same(s_w4, 2)
        s_h16, s_w16 = conv_out_size_same(s_h8, 2), conv_out_size_same(s_w8, 2)

        # project `z` and reshape
        h0 = tf.reshape(
            linear(z, self.gf_dim*8*s_h16*s_w16, 'g_h0_lin'),
            [-1, s_h16, s_w16, self.gf_dim * 8])
        h0 = tf.nn.relu(self.g_bn0(h0, train=False))

        h1 = deconv2d(h0, [self.batch_size, s_h8, s_w8, self.gf_dim*4], name='g_h1')
        h1 = tf.nn.relu(self.g_bn1(h1, train=False))

        h2 = deconv2d(h1, [self.batch_size, s_h4, s_w4, self.gf_dim*2], name='g_h2')
        h2 = tf.nn.relu(self.g_bn2(h2, train=False))

        h3 = deconv2d(h2, [self.batch_size, s_h2, s_w2, self.gf_dim*1], name='g_h3')
        h3 = tf.nn.relu(self.g_bn3(h3, train=False))

        h4 = deconv2d(h3, [self.batch_size, s_h, s_w, self.c_dim], name='g_h4')

        return tf.nn.tanh(h4)
    
      else:
        s_h, s_w = self.output_height, self.output_width
        s_h2, s_h4 = int(s_h/2), int(s_h/4)
        s_w2, s_w4 = int(s_w/2), int(s_w/4)

        # yb = tf.reshape(y, [-1, 1, 1, self.y_dim])
        yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
        z = concat([z, y], 1)

        h0 = tf.nn.relu(self.g_bn0(linear(z, self.gfc_dim, 'g_h0_lin'), train=False))
        h0 = concat([h0, y], 1)

        h1 = tf.nn.relu(self.g_bn1(
            linear(h0, self.gf_dim*2*s_h4*s_w4, 'g_h1_lin'), train=False))
        h1 = tf.reshape(h1, [self.batch_size, s_h4, s_w4, self.gf_dim * 2])
        h1 = conv_cond_concat(h1, yb)

        h2 = tf.nn.relu(self.g_bn2(
            deconv2d(h1, [self.batch_size, s_h2, s_w2, self.gf_dim * 2], name='g_h2'),
            train=False))
        h2 = conv_cond_concat(h2, yb)

        return tf.nn.sigmoid(
            deconv2d(h2, [self.batch_size, s_h, s_w, self.c_dim], name='g_h3'))
  • The sampler function draws samples from the generator as currently trained, to check how training is going. Its logic closely mirrors generator: it too distinguishes mnist from non-mnist and uses the corresponding structure. When the dataset is not mnist, y=None suffices; for mnist, y must be provided as well.
  • If you only want to follow the code, the groundwork above is enough; but at first I did not understand why this sampler exists or what it is for, since it seems to do the same thing as generator. (Looking closely, the differences are that it reuses the generator's variables via scope.reuse_variables() and runs batch norm with train=False, i.e. in inference mode.)
  • Exactly why it is needed will have to wait until I finally work through the original DCGAN paper.

Reading the original DCGAN paper

  • Added after reading the train code: grasping the rough logic of the code is easy, but the details simply would not stick. Because I had not read the paper carefully earlier, it all came due at once here; it really is necessary to read the paper before going further. So: paper time.

The train function

  • Update: finally here. It suddenly feels like an awful lot. I had never worked through code this way before, but doing it taught me a great deal, and it also showed me how much I still cannot put into words; my level is limited. And of course, endless thanks to the bloggers whose posts I found while completely lost: a real source of motivation.
  • This should be the hardest bone to chew; one glance at the amount of code is enough to make me shiver…
def train(self, config):
    
    # Adam optimizers for both D and G
    d_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
              .minimize(self.d_loss, var_list=self.d_vars)
        
    g_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
              .minimize(self.g_loss, var_list=self.g_vars)
    
    # avoid errors across TensorFlow versions
    try:
      tf.global_variables_initializer().run()
    except:
      tf.initialize_all_variables().run()
	
    # merge_summary and SummaryWriter build the summaries shown in TensorBoard.
    self.g_sum = merge_summary([self.z_sum, self.d__sum,
      self.G_sum, self.d_loss_fake_sum, self.g_loss_sum])
    self.d_sum = merge_summary(
        [self.z_sum, self.d_sum, self.d_loss_real_sum, self.d_loss_sum])
    self.writer = SummaryWriter("./logs", self.sess.graph)

    # noise vectors: [count, dimension]
    sample_z = np.random.uniform(-1, 1, size=(self.sample_num , self.z_dim))
    
    if config.dataset == 'mnist':
      sample_inputs = self.data_X[0:self.sample_num]
      sample_labels = self.data_y[0:self.sample_num]
    else:
      # read the images and resize them
      sample_files = self.data[0:self.sample_num]
      sample = [
          get_image(sample_file,
                    input_height=self.input_height,
                    input_width=self.input_width,
                    resize_height=self.output_height,
                    resize_width=self.output_width,
                    crop=self.crop,
                    grayscale=self.grayscale) for sample_file in sample_files]    
      # if the images are grayscale, add one more dim so that channel = 1
      if (self.grayscale):
        sample_inputs = np.array(sample).astype(np.float32)[:, :, :, None]
      else:
        sample_inputs = np.array(sample).astype(np.float32)
  
    counter = 1
    start_time = time.time()
    
    # try to load an existing checkpoint
    could_load, checkpoint_counter = self.load(self.checkpoint_dir)
    
    # loaded successfully
    if could_load:
      counter = checkpoint_counter
      print(" [*] Load SUCCESS")
    else:
      print(" [!] Load failed...")

    # main training loop
    for epoch in xrange(config.epoch):    
      if config.dataset == 'mnist':
        batch_idxs = min(len(self.data_X), config.train_size) // config.batch_size
      else: 
        # file names of every item in the dataset
        self.data = glob(os.path.join(
          config.data_dir, config.dataset, self.input_fname_pattern))
        # shuffle them
        np.random.shuffle(self.data)
         
        # batch_size sets how many images are used per step
        batch_idxs = min(len(self.data), config.train_size) // config.batch_size

      for idx in xrange(0, int(batch_idxs)):
        
        # slice out this iteration's mini-batch
        if config.dataset == 'mnist':
          batch_images = self.data_X[idx*config.batch_size:(idx+1)*config.batch_size]
          batch_labels = self.data_y[idx*config.batch_size:(idx+1)*config.batch_size]
        else:
          batch_files = self.data[idx*config.batch_size:(idx+1)*config.batch_size]
          batch = [
              get_image(batch_file,
                        input_height=self.input_height,
                        input_width=self.input_width,
                        resize_height=self.output_height,
                        resize_width=self.output_width,
                        crop=self.crop,
                        grayscale=self.grayscale) for batch_file in batch_files]
          if self.grayscale:
            batch_images = np.array(batch).astype(np.float32)[:, :, :, None]
          else:
            batch_images = np.array(batch).astype(np.float32)

            
        batch_z = np.random.uniform(-1, 1, [config.batch_size, self.z_dim]) \
              .astype(np.float32)

        # run the update ops
        if config.dataset == 'mnist':
          # Update D network
          _, summary_str = self.sess.run([d_optim, self.d_sum],
            feed_dict={ 
              self.inputs: batch_images,
              self.z: batch_z,
              self.y:batch_labels,
            })
          self.writer.add_summary(summary_str, counter)

          # Update G network
          _, summary_str = self.sess.run([g_optim, self.g_sum],
            feed_dict={
              self.z: batch_z, 
              self.y:batch_labels,
            })
          self.writer.add_summary(summary_str, counter)

          # Run g_optim twice to make sure that d_loss does not go to zero (different from paper)
          _, summary_str = self.sess.run([g_optim, self.g_sum],
            feed_dict={ self.z: batch_z, self.y:batch_labels })
            
          self.writer.add_summary(summary_str, counter)
          
          errD_fake = self.d_loss_fake.eval({
              self.z: batch_z, 
              self.y:batch_labels
          })
          errD_real = self.d_loss_real.eval({
              self.inputs: batch_images,
              self.y:batch_labels
          })
          errG = self.g_loss.eval({
              self.z: batch_z,
              self.y: batch_labels
          })
        else:
          # Update D network
          _, summary_str = self.sess.run([d_optim, self.d_sum],
            feed_dict={ self.inputs: batch_images, self.z: batch_z })
          self.writer.add_summary(summary_str, counter)

          # Update G network
          _, summary_str = self.sess.run([g_optim, self.g_sum],
            feed_dict={ self.z: batch_z })
          self.writer.add_summary(summary_str, counter)

          # Run g_optim twice to make sure that d_loss does not go to zero (different from paper)
          _, summary_str = self.sess.run([g_optim, self.g_sum],
            feed_dict={ self.z: batch_z })
          self.writer.add_summary(summary_str, counter)
          
          errD_fake = self.d_loss_fake.eval({ self.z: batch_z })
          errD_real = self.d_loss_real.eval({ self.inputs: batch_images })
          errG = self.g_loss.eval({self.z: batch_z})

        counter += 1
        print("Epoch: [%2d/%2d] [%4d/%4d] time: %4.4f, d_loss: %.8f, g_loss: %.8f" \
          % (epoch, config.epoch, idx, batch_idxs,
            time.time() - start_time, errD_fake+errD_real, errG))
		
        '''
        np.mod(counter, 100) == 1 means samples are generated once every 100 steps;
        np.mod(counter, 500) == 2 means a checkpoint is saved once every 500 steps.
        '''
        if np.mod(counter, 100) == 1:
          if config.dataset == 'mnist':
            samples, d_loss, g_loss = self.sess.run(
              [self.sampler, self.d_loss, self.g_loss],
              feed_dict={
                  self.z: sample_z,
                  self.inputs: sample_inputs,
                  self.y:sample_labels,
              }
            )
            save_images(samples, image_manifold_size(samples.shape[0]),
                  './{}/train_{:02d}_{:04d}.png'.format(config.sample_dir, epoch, idx))
            print("[Sample] d_loss: %.8f, g_loss: %.8f" % (d_loss, g_loss)) 
          else:
            try:
              samples, d_loss, g_loss = self.sess.run(
                [self.sampler, self.d_loss, self.g_loss],
                feed_dict={
                    self.z: sample_z,
                    self.inputs: sample_inputs,
                },
              )
              save_images(samples, image_manifold_size(samples.shape[0]),
                    './{}/train_{:02d}_{:04d}.png'.format(config.sample_dir, epoch, idx))
              print("[Sample] d_loss: %.8f, g_loss: %.8f" % (d_loss, g_loss)) 
            except:
              print("one pic error!...")

        if np.mod(counter, 500) == 2:
          self.save(config.checkpoint_dir, counter)
  • The code for loading a saved model [12]:
def load(self, checkpoint_dir):
    import re
    print(" [*] Reading checkpoints...")
    checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir)
	
    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
        
      ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
    
      self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
      
      # Extract the last run of digits in the checkpoint name (the global step),
      # so that training resumes with the correct counter.
      counter = int(next(re.finditer(r"(\d+)(?!.*\d)", ckpt_name)).group(0))
      print(" [*] Success to read {}".format(ckpt_name))
      return True, counter
    else:
      print(" [*] Failed to find a checkpoint")
      return False, 0
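A tiny illustration of that regex (the checkpoint name is a made-up example):

import re

ckpt_name = "DCGAN.model-25002"   # hypothetical checkpoint file name
# (\d+)(?!.*\d) matches the last run of digits: no further digit may follow it.
print(int(next(re.finditer(r"(\d+)(?!.*\d)", ckpt_name)).group(0)))  # 25002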
