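Study notes on the official CIFAR-10 example

I'm writing down what I learned so that I can refer back to it later.

Part 1: Training

1. Setting parameters with FLAGS

Initialize FLAGS: tf.app.flags.FLAGS.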

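Define a parameter with tf.app.flags.DEFINE_xxx, where xxx is the parameter type. The first argument is the flag name, for example train_dir, and its value is then read as FLAGS.train_dir. The second argument is the default value and the third is the help text. If the flag is not set, FLAGS.train_dir returns the default, /tmp/cifar10_train. To set it, pass --train_dir test_dir on the command line, i.e. python cifar10_train.py --train_dir test_dir (cifar10_train.py is the file that contains this code), and FLAGS.train_dir then returns test_dir. Passing -h or --help prints the help text. The other flags work the same way.

FLAGS = tf.app.flags.FLAGS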

tf.app.flags.DEFINE_string('train_dir', '/tmp/cifar10_train',
                           """Directory where to write event logs """
                           """and checkpoint.""")
tf.app.flags.DEFINE_integer('max_steps', 100000,
                            """Number of batches to run.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")
tf.app.flags.DEFINE_integer('log_frequency', 10,
                            """How often to log results to the console.""")

The program's entry point:

def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if tf.gfile.Exists(FLAGS.train_dir):
    tf.gfile.DeleteRecursively(FLAGS.train_dir)
  tf.gfile.MakeDirs(FLAGS.train_dir)
  train()


if __name__ == '__main__':
  tf.app.run()

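When tf.app.run() is called without arguments, it parses the command-line flags and then calls main(argv=...). Without tf.app.run(), FLAGS cannot be used normally. The first thing main does is download and extract the dataset.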
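
The define, default, override behaviour described in this section can be tried without TensorFlow. The sketch below uses Python's standard argparse module as a stand-in. It is only an analogy, not how tf.app.flags is implemented, and parse_flags is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Declare two flags with a type, a default and a help text."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--train_dir', type=str, default='/tmp/cifar10_train',
                        help='Directory where to write event logs and checkpoint.')
    parser.add_argument('--max_steps', type=int, default=100000,
                        help='Number of batches to run.')
    return parser


def parse_flags(argv):
    """Parse a list of command-line arguments (hypothetical helper)."""
    return build_parser().parse_args(argv)


# No arguments: every flag keeps its default value.
defaults = parse_flags([])
# Equivalent of: python cifar10_train.py --train_dir test_dir
overridden = parse_flags(['--train_dir', 'test_dir'])
```

Flags that are not passed keep their defaults, so overridden.max_steps is still 100000.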

2. Downloading and extracting the dataset

def maybe_download_and_extract():
  """Download and extract the tarball from Alex's website."""
  dest_directory = FLAGS.data_dir
  if not os.path.exists(dest_directory):
    os.makedirs(dest_directory)
  filename = DATA_URL.split('/')[-1]
  filepath = os.path.join(dest_directory, filename)
  if not os.path.exists(filepath):
    def _progress(count, block_size, total_size):
      sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
          float(count * block_size) / float(total_size) * 100.0))
      sys.stdout.flush()
    filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
    print()
    statinfo = os.stat(filepath)
    print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
  extracted_dir_path = os.path.join(dest_directory, 'cifar-10-batches-bin')
  if not os.path.exists(extracted_dir_path):
    tarfile.open(filepath, 'r:gz').extractall(dest_directory)

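This function lives in cifar10.py, and the download directory can be set through the data_dir flag, using the method from section 1. The logic is simple. It first checks whether the data directory exists and creates it if not. DATA_URL, defined at the top of the file, is the download address of the CIFAR-10 dataset; everything after the last slash of the URL is used as the file name and joined with the data directory to get the path where the download is stored. If that file does not exist yet, urllib.request.urlretrieve downloads it; its last argument is a callback that reports progress, computed as the amount downloaded so far divided by the total size. Finally the function checks whether the extraction directory exists: if it does, the dataset has already been downloaded and extracted and nothing more is needed; otherwise the archive is extracted.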
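
The two small computations in this function, deriving the local path from the URL and turning the callback arguments into a percentage, can be checked in isolation. This is a minimal sketch; EXAMPLE_URL is a placeholder rather than the real DATA_URL, and target_path and progress_percent are hypothetical helper names.

```python
import os

# Placeholder URL for illustration only; the real DATA_URL is defined in cifar10.py.
EXAMPLE_URL = 'https://example.com/data/cifar-10-binary.tar.gz'


def target_path(url, dest_directory):
    """Use the text after the last slash as the file name inside dest_directory."""
    filename = url.split('/')[-1]
    return os.path.join(dest_directory, filename)


def progress_percent(count, block_size, total_size):
    """Same arithmetic as the _progress callback: bytes so far / total * 100."""
    return float(count * block_size) / float(total_size) * 100.0


path = target_path(EXAMPLE_URL, 'data')
halfway = progress_percent(5, 1024, 10240)
```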

3. Initial setup

def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if tf.gfile.Exists(FLAGS.train_dir):
    tf.gfile.DeleteRecursively(FLAGS.train_dir)
  tf.gfile.MakeDirs(FLAGS.train_dir)
  train()

Here the gfile module checks whether the training directory exists and deletes it if so, which removes the records of any previous training run, and then recreates it. Training starts once this is done.

def train():
  """Train CIFAR-10 for a number of steps."""
  with tf.Graph().as_default():
    global_step = tf.train.get_or_create_global_step()

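with tf.Graph().as_default() creates a new graph and makes it the default graph inside the with block. tf.train.get_or_create_global_step() returns the global step tensor, creating it if it does not exist yet. I don't fully understand these two lines yet.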

    # Get images and labels for CIFAR-10.
    # Force input pipeline to CPU:0 to avoid operations sometimes ending up on
    # GPU and resulting in a slow down.
    with tf.device('/cpu:0'):
      images, labels = cifar10.distorted_inputs()

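The comments are fairly clear: this gets the images and labels. with tf.device('/cpu:0') forces the input pipeline onto the CPU, because these operations sometimes end up on the GPU, which slows reading down.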

4. Reading the data

The data-reading code in cifar10.py is:

def distorted_inputs():
  """Construct distorted input for CIFAR training using the Reader ops.
  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  Raises:
    ValueError: If no data_dir
  """
  if not FLAGS.data_dir:
    raise ValueError('Please supply a data_dir')
  data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
  images, labels = cifar10_input.distorted_inputs(data_dir=data_dir,
                                                  batch_size=FLAGS.batch_size)
  if FLAGS.use_fp16:
    images = tf.cast(images, tf.float16)
    labels = tf.cast(labels, tf.float16)
  return images, labels

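This function actually calls distorted_inputs in cifar10_input.py to read the images and labels, then casts them to float16 if FLAGS.use_fp16 is set. tf.cast converts a tensor to another data type.

The actual implementation is in cifar10_input.py: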

def distorted_inputs(data_dir, batch_size):
  """Construct distorted input for CIFAR training using the Reader ops.
  Args:
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.
  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
               for i in xrange(1, 6)]
  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

filenames holds the dataset file names, data_batch_1.bin through data_batch_5.bin. The loop then checks that each file exists and raises a ValueError if one is missing.
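
A quick way to see which file names the list comprehension produces is to run it on its own. The sketch below uses range instead of xrange so that it runs on Python 3, and batch_filenames is a hypothetical helper name.

```python
import os


def batch_filenames(data_dir):
    """Build the training file paths; range(1, 6) yields 1, 2, 3, 4, 5."""
    return [os.path.join(data_dir, 'data_batch_%d.bin' % i)
            for i in range(1, 6)]


names = [os.path.basename(p) for p in batch_filenames('cifar-10-batches-bin')]
```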

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

Next, a filename queue is created, from which data is taken during training.

  with tf.name_scope('data_augmentation'):
    # Read examples from files in the filename queue.
    read_input = read_cifar10(filename_queue)

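tf.name_scope makes later naming easier. If an op below is named 'a' and another 'a' is created elsewhere under a different tf.name_scope, there is no conflict, because the two live in different namespaces. Next, the read_cifar10 function.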

def read_cifar10(filename_queue):
  """Reads and parses examples from CIFAR10 data files.
  Recommendation: if you want N-way read parallelism, call this function
  N times.  This will give you N independent Readers reading different
  files & positions within those files, which will give better mixing of
  examples.
  Args:
    filename_queue: A queue of strings with the filenames to read from.
  Returns:
    An object representing a single example, with the following fields:
      height: number of rows in the result (32)
      width: number of columns in the result (32)
      depth: number of color channels in the result (3)
      key: a scalar string Tensor describing the filename & record number
        for this example.
      label: an int32 Tensor with the label in the range 0..9.
      uint8image: a [height, width, depth] uint8 Tensor with the image data
  """

  class CIFAR10Record(object):
    pass
  result = CIFAR10Record()

  # Dimensions of the images in the CIFAR-10 dataset.
  # See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
  # input format.
  label_bytes = 1  # 2 for CIFAR-100
  result.height = 32
  result.width = 32
  result.depth = 3
  image_bytes = result.height * result.width * result.depth
  # Every record consists of a label followed by the image, with a
  # fixed number of bytes for each.
  record_bytes = label_bytes + image_bytes

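A CIFAR10Record class is defined to hold the information that is read. A brief note on how the CIFAR-10 binary files are laid out: each image takes 32*32*3+1 bytes. The first byte is the label, from 0 to 9 for the 10 classes (the meta file lists which class each number stands for). The next 32*32=1024 bytes are the R channel, the following 1024 bytes the G channel, and the last 1024 bytes the B channel. Image width and height are both 32. With this layout in mind, the parameters defined in the code above are easy to understand.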
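
The record layout can be written down as plain arithmetic. The sketch below only restates the byte counts given in this section; channel_span and record_offset are hypothetical helper names.

```python
# Layout of one CIFAR-10 binary record: 1 label byte, then R, G and B planes.
LABEL_BYTES = 1
HEIGHT, WIDTH, DEPTH = 32, 32, 3
CHANNEL_BYTES = HEIGHT * WIDTH            # 1024 bytes per colour channel
IMAGE_BYTES = CHANNEL_BYTES * DEPTH       # 3072 bytes per image
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073 bytes per record


def channel_span(channel):
    """Return (start, end) byte offsets of channel 0=R, 1=G, 2=B in a record."""
    start = LABEL_BYTES + channel * CHANNEL_BYTES
    return start, start + CHANNEL_BYTES


def record_offset(index):
    """Byte offset of the index-th record in a file without header or footer."""
    return index * RECORD_BYTES
```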

  # Read a record, getting filenames from the filename_queue.  No
  # header or footer in the CIFAR-10 format, so we leave header_bytes
  # and footer_bytes at their default of 0.
  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
  result.key, value = reader.read(filename_queue)

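tf.FixedLengthRecordReader sets the number of bytes read each time, and read then returns that many bytes. Each call to read advances the file position by the reader's record length (this is purely my own understanding), so each read yields one image together with its label, in order. The bytes read are in value.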
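
The "read a fixed number of bytes, then move on" idea can be imitated with an ordinary Python stream. This is a sketch of the concept only, not of how tf.FixedLengthRecordReader works internally; iter_records is a hypothetical helper, and dropping a trailing partial record is a choice made here for simplicity.

```python
import io


def iter_records(stream, record_bytes):
    """Yield consecutive fixed-length records; each read advances the position."""
    while True:
        chunk = stream.read(record_bytes)
        if len(chunk) < record_bytes:
            return  # end of data (a trailing partial record is dropped)
        yield chunk


# Seven bytes with a record length of three give two complete records.
records = list(iter_records(io.BytesIO(b'abcdefg'), 3))
```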

  # Convert from a string to a vector of uint8 that is record_bytes long.
  record_bytes = tf.decode_raw(value, tf.uint8)

The value is then decoded into unsigned 8-bit integers, since pixel values range from 0 to 255.

  # The first bytes represent the label, which we convert from uint8->int32.
  result.label = tf.cast(
      tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32)

  # The remaining bytes after the label represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(
      tf.strided_slice(record_bytes, [label_bytes],
                       [label_bytes + image_bytes]),
      [result.depth, result.height, result.width])
  # Convert from [depth, height, width] to [height, width, depth].
  result.uint8image = tf.transpose(depth_major, [1, 2, 0])

  return result

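tf.strided_slice reads the data at the given positions. The label comes first, so the slice starts at 0 and ends at the number of label bytes, and the result is cast to a 32-bit integer. For the image, the slice starts at the number of label bytes and ends at the total record length. The data is reshaped into a 3*32*32 array, which splits it into three channels of 32*32 each, and the transpose then turns it into a 32*32*3 array, the final RGB image, which is returned.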
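
To see what the reshape and transpose do to the byte order, the same decoding can be done in pure Python on a tiny synthetic record. This is a sketch using nested lists instead of tensors; decode_record is a hypothetical helper, and the 2*2*3 example image is made up for illustration.

```python
def decode_record(record, height, width, depth, label_bytes=1):
    """Split a record into (label, image) with image indexed [row][col][channel]."""
    label = record[0]
    pixels = record[label_bytes:label_bytes + height * width * depth]
    plane = height * width
    # Depth-major storage: channel c, row y, column x sits at c*plane + y*width + x.
    # Putting the channel index last is the [1, 2, 0] transpose.
    image = [[[pixels[c * plane + y * width + x] for c in range(depth)]
              for x in range(width)]
             for y in range(height)]
    return label, image


# Synthetic 2x2 RGB record: label 7, then bytes 0..11 (R plane, G plane, B plane).
label, image = decode_record(bytes([7]) + bytes(range(12)), 2, 2, 3)
```

Each pixel ends up as an [R, G, B] triple, for example image[0][0] is [0, 4, 8].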

    reshaped_image = tf.cast(read_input.uint8image, tf.float32)

    height = IMAGE_SIZE
    width = IMAGE_SIZE

The data is cast to 32-bit floats, and the target height and width are set.

    # Randomly crop a [height, width] section of the image.
    distorted_image = tf.random_crop(reshaped_image, [height, width, 3])

Randomly crops the image. The original is 32*32; a random 24*24 crop is used for training.
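
A random crop just picks a random top-left corner and keeps a fixed-size window. The sketch below shows this on nested lists; it illustrates the idea and is not the tf.random_crop implementation. random_crop here is a hypothetical helper, and the seed is arbitrary.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Return a crop_h x crop_w window at a random position inside image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)    # inclusive bounds
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


# A 32x32 "image" whose pixels record their own (row, col) position.
source = [[(y, x) for x in range(32)] for y in range(32)]
crop = random_crop(source, 24, 24, random.Random(0))
```

Because 32 - 24 = 8, the top-left corner of the crop can be anywhere from (0, 0) to (8, 8).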

    # Randomly flip the image horizontally.
    distorted_image = tf.image.random_flip_left_right(distorted_image)

Randomly flips the image left to right.

    # Because these operations are not commutative, consider randomizing
    # the order their operation.
    # NOTE: since per_image_standardization zeros the mean and makes
    # the stddev unit, this likely has no effect see tensorflow#1458.
    distorted_image = tf.image.random_brightness(distorted_image,
                                                 max_delta=63)
    distorted_image = tf.image.random_contrast(distorted_image,
                                               lower=0.2, upper=1.8)

Randomly adjusts the brightness by adding a random value in [-max_delta, max_delta] to the original pixel values, then randomly adjusts the contrast with a factor in [0.2, 1.8].
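
The two adjustments can be sketched on a flat list of pixel values. This is a simplified illustration under stated assumptions: it treats the list as a single channel, whereas TensorFlow computes the contrast mean per channel, and the function names mirror the TensorFlow ones only for readability.

```python
import random


def random_brightness(pixels, max_delta, rng):
    """Add one random offset from [-max_delta, max_delta] to every pixel."""
    delta = rng.uniform(-max_delta, max_delta)
    return [p + delta for p in pixels], delta


def adjust_contrast(pixels, factor):
    """Scale each pixel's distance from the mean by factor."""
    mean = sum(pixels) / len(pixels)
    return [(p - mean) * factor + mean for p in pixels]


def random_contrast(pixels, lower, upper, rng):
    """Apply adjust_contrast with a factor drawn from [lower, upper]."""
    return adjust_contrast(pixels, rng.uniform(lower, upper))


rng = random.Random(0)
brighter, delta = random_brightness([10.0, 20.0, 30.0], 63, rng)
stretched = adjust_contrast([0.0, 10.0], 2.0)
```

A factor above 1 pushes values away from the mean and a factor below 1 pulls them toward it; the mean itself is unchanged.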

    # Subtract off the mean and divide by the variance of the pixels.
    float_image = tf.image.per_image_standardization(distorted_image)

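Standardizes the image to zero mean and unit standard deviation, which can speed up training. The random preprocessing steps above improve the model's ability to generalize.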
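
Per-image standardization subtracts the mean and divides by the standard deviation, with a lower bound of 1/sqrt(N) on the divisor so that a uniform image does not cause a division by zero. The sketch below applies that formula to a flat list of values; standardize is a hypothetical helper.

```python
import math


def standardize(pixels):
    """Return (x - mean) / max(stddev, 1/sqrt(N)) for every value."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    adjusted_stddev = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_stddev for p in pixels]


result = standardize([0.0, 2.0, 4.0, 6.0])
flat = standardize([5.0, 5.0, 5.0, 5.0])  # uniform image: all zeros, no error
```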

    # Set the shapes of tensors.
    float_image.set_shape([height, width, 3])
    read_input.label.set_shape([1])

Sets the static shapes of the image and label tensors.

    # Ensure that the random shuffling has good mixing properties.
    min_fraction_of_examples_in_queue = 0.4
    min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                             min_fraction_of_examples_in_queue)
    print ('Filling queue with %d CIFAR images before starting to train. '
           'This will take a few minutes.' % min_queue_examples)

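Sets the minimum number of examples in the queue. The queue's job is to shuffle the examples after they are read, before they are fed to the network for training.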

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size,
                                         shuffle=True)

Finally a batch of data is returned, produced by the following function:

def _generate_image_and_label_batch(image, label, min_queue_examples,
                                    batch_size, shuffle):
  """Construct a queued batch of images and labels.
  Args:
    image: 3-D Tensor of [height, width, 3] of type.float32.
    label: 1-D Tensor of type.int32
    min_queue_examples: int32, minimum number of samples to retain
      in the queue that provides of batches of examples.
    batch_size: Number of images per batch.
    shuffle: boolean indicating whether to use a shuffling queue.
  Returns:
    images: Images. 4D tensor of [batch_size, height, width, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  # Create a queue that shuffles the examples, and then
  # read 'batch_size' images + labels from the example queue.
  num_preprocess_threads = 16
  if shuffle:
    images, label_batch = tf.train.shuffle_batch(
        [image, label],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size,
        min_after_dequeue=min_queue_examples)
  else:
    images, label_batch = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        num_threads=num_preprocess_threads,
        capacity=min_queue_examples + 3 * batch_size)

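Using several preprocessing threads reads faster than a single thread. Depending on shuffle, the function calls tf.train.shuffle_batch (shuffled order) or tf.train.batch (sequential order), passing the batch size, the queue capacity and, for shuffle_batch, the minimum number of examples left after a dequeue. That completes data reading; the returned batches can be fed to the network for training.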
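
The queue sizes follow directly from the arithmetic in the code. In the sketch below the training-set size of 50000 and the batch size of 128 are assumptions for illustration, since neither value is stated in this post; queue_sizes is a hypothetical helper.

```python
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000  # assumed: CIFAR-10 training set size
BATCH_SIZE = 128                          # assumed batch size, not given here


def queue_sizes(num_examples, batch_size, min_fraction=0.4):
    """Return (min_after_dequeue, capacity) as computed for shuffle_batch."""
    min_queue_examples = int(num_examples * min_fraction)
    capacity = min_queue_examples + 3 * batch_size
    return min_queue_examples, capacity


min_after_dequeue, capacity = queue_sizes(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN,
                                          BATCH_SIZE)
```

Under these assumptions the queue keeps at least 20000 examples for mixing, with room for three more batches on top.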

5. Building the model

Back in train() in cifar10_train.py, the step after reading the data is:

    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)

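The graph is built through the interface in cifar10.py. The model is roughly: convolution --> max pooling --> local response normalization --> convolution --> local response normalization --> max pooling --> fully connected --> fully connected --> softmax linear.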

def inference(images):
  """Build the CIFAR-10 model.
  Args:
    images: Images returned from distorted_inputs() or inputs().
  Returns:
    Logits.
  """
  # We instantiate all variables using tf.get_variable() instead of
  # tf.Variable() in order to share variables across multiple GPU training runs.
  # If we only ran this model on a single GPU, we could simplify this function
  # by replacing all instances of tf.get_variable() with tf.Variable().
  #
  # conv1
  with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 3, 64],
                                         stddev=5e-2,
                                         wd=None)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv1)

That is the first convolutional layer. _variable_with_weight_decay is defined as follows:

def _variable_with_weight_decay(name, shape, stddev, wd):
  """Helper to create an initialized Variable with weight decay.
  Note that the Variable is initialized with a truncated normal distribution.
  A weight decay is added only if one is specified.
  Args:
    name: name of the variable
    shape: list of ints
    stddev: standard deviation of a truncated Gaussian
    wd: add L2Loss weight decay multiplied by this float. If None, weight
        decay is not added for this Variable.
  Returns:
    Variable Tensor
  """
  dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
  var = _variable_on_cpu(
      name,
      shape,
      tf.truncated_normal_initializer(stddev=stddev, dtype=dtype))
  if wd is not None:
    weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
    tf.add_to_collection('losses', weight_decay)
  return var

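It first picks the data type, then gets the variable from _variable_on_cpu, initialized by tf.truncated_normal_initializer, i.e. a truncated normal distribution with standard deviation stddev. If wd is not None, it computes the weight-decay term, using tf.nn.l2_loss(var)=sum(var**2)/2, and adds it to the 'losses' collection; tf.get_collection retrieves it later. Finally the variable is returned.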
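
The weight-decay term is small enough to compute by hand. The sketch below restates the formula sum(var**2)/2 multiplied by wd on a plain list; l2_loss and weight_decay_term are hypothetical pure-Python helpers, not TensorFlow calls.

```python
def l2_loss(values):
    """Half the sum of squares, the same quantity tf.nn.l2_loss computes."""
    return sum(v * v for v in values) / 2.0


def weight_decay_term(values, wd):
    """Return l2_loss * wd, or None when no weight decay is requested."""
    if wd is None:
        return None
    return l2_loss(values) * wd


half_sum_squares = l2_loss([1.0, 2.0, 3.0])       # (1 + 4 + 9) / 2
decay = weight_decay_term([1.0, 2.0, 3.0], 0.004)
```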

def _variable_on_cpu(name, shape, initializer):
  """Helper to create a Variable stored on CPU memory.
  Args:
    name: name of the variable
    shape: list of ints
    initializer: initializer for Variable
  Returns:
    Variable Tensor
  """
  with tf.device('/cpu:0'):
    dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
    var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype)
  return var

This function creates the variable with tf.get_variable, specifying its name, shape, initializer and data type.

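Back to the first convolutional layer: its kernel is 5*5*3 and it produces 64 outputs, i.e. it extracts 64 features. The kernel is initialized with random values from a truncated normal distribution with standard deviation 5e-2. The image is convolved with this kernel using a stride of 1, and the spatial size is preserved. A 64-dimensional bias is then initialized to 0. The convolution result plus the bias goes through a ReLU activation to give the layer's output. _activation_summary records how a value changes, which can be viewed in TensorBoard after training.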

def _activation_summary(x):
  """Helper to create summaries for activations.
  Creates a summary that provides a histogram of activations.
  Creates a summary that measures the sparsity of activations.
  Args:
    x: Tensor
  Returns:
    nothing
  """
  # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
  # session. This helps the clarity of presentation on tensorboard.
  tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name)
  tf.summary.histogram(tensor_name + '/activations', x)
  tf.summary.scalar(tensor_name + '/sparsity',
                    tf.nn.zero_fraction(x))

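re.sub replaces text. If the op name (x.op.name) looks like TOWER_NAME_xxx.../name, where xxx... is zero or more digits, for example TOWER_NAME_/name, TOWER_NAME_0/name or TOWER_NAME_23849/name, then TOWER_NAME_xxx.../ is replaced with '', leaving only name. tf.summary.histogram records a histogram of x, and tf.summary.scalar records a scalar summary. zero_fraction is defined as follows: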

def zero_fraction(value, name=None):
  """Returns the fraction of zeros in `value`.

  If `value` is empty, the result is `nan`.

  This is useful in summaries to measure and report sparsity.  For example,

  ```python
      z = tf.nn.relu(...)
      summ = tf.summary.scalar('sparsity', tf.nn.zero_fraction(z))
  ```

  Args:
    value: A tensor of numeric type.
    name: A name for the operation (optional).

  Returns:
    The fraction of zeros in `value`, with type `float32`.
  """
  with ops.name_scope(name, "zero_fraction", [value]):
    value = ops.convert_to_tensor(value, name="value")
    zero = constant_op.constant(0, dtype=value.dtype, name="zero")
    return math_ops.reduce_mean(
        math_ops.cast(math_ops.equal(value, zero), dtypes.float32))

As the code shows, it computes the fraction of elements that are 0, not their count.
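
Both pieces of _activation_summary can be tried without TensorFlow. In this sketch TOWER_NAME = 'tower' is an assumed value (the code comment mentions 'tower_[0-9]/'), strip_tower_prefix is a hypothetical helper, and zero_fraction is a pure-Python counterpart of the function shown above.

```python
import re

TOWER_NAME = 'tower'  # assumed value of the constant, for illustration


def strip_tower_prefix(op_name):
    """Remove 'tower_<digits>/' from an op name, as the re.sub call does."""
    return re.sub('%s_[0-9]*/' % TOWER_NAME, '', op_name)


def zero_fraction(values):
    """Fraction of entries equal to zero: the mean of (value == 0)."""
    return sum(1 for v in values if v == 0) / float(len(values))


name = strip_tower_prefix('tower_0/conv1/Relu')
sparsity = zero_fraction([0.0, 1.5, 0.0, 2.0])
```

Names without the prefix pass through unchanged, so the same helper works for single-GPU training.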

  # pool1
  pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                         padding='SAME', name='pool1')
  # norm1
  norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm1')

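The convolutional layer is followed by a max pooling layer with a 3*3 window and a stride of 2. With SAME padding and a stride of 2, the height and width are halved (rounded up). Local response normalization is then applied to improve generalization. I don't really understand how it is implemented.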
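
With padding='SAME', the output height and width depend only on the input size and the stride: out = ceil(in / stride). The sketch below applies that rule to the 24*24 training crops mentioned earlier; same_output_size is a hypothetical helper.

```python
def same_output_size(input_size, stride):
    """Spatial output size under SAME padding: ceil(input_size / stride)."""
    return -(-input_size // stride)  # integer ceiling division


after_conv1 = same_output_size(24, 1)           # stride 1 keeps the size
after_pool1 = same_output_size(after_conv1, 2)  # stride 2 halves it
after_pool2 = same_output_size(after_pool1, 2)
```

So a stride-1 convolution keeps 24*24, while each stride-2 pooling layer halves the height and width.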

  # conv2
  with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 64, 64],
                                         stddev=5e-2,
                                         wd=None)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv2)

  # norm2
  norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm2')
  # pool2
  pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1], padding='SAME', name='pool2')

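As above, a second convolutional layer is added, with a 5*5 kernel and 64 output channels. The bias is added, the result goes through ReLU and is summarized, followed by local response normalization and then max pooling.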

  # local3
  with tf.variable_scope('local3') as scope:
    # Move everything into depth so we can perform a single matrix multiply.
    reshape = tf.reshape(pool2, [images.get_shape().as_list()[0], -1])
    dim = reshape.get_shape()[1].value
    weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
    local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
    _activation_summary(local3)

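Next comes a fully connected layer. The max pooling output is first flattened so that each example becomes a one-dimensional vector, and the length of that vector is read off. The weights are initialized with 384 outputs; here wd is not None, so a weight-decay term is computed. The pooling output is multiplied by the weights, the bias is added, the result goes through ReLU, and the output is summarized.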
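
The value of dim can be worked out by hand. The sketch below assumes 24*24 input crops, two stride-2 SAME poolings and 64 channels, as in the layers above; flattened_dim is a hypothetical helper.

```python
def flattened_dim(image_size, num_pools, channels, stride=2):
    """Length of one flattened example after num_pools SAME poolings."""
    size = image_size
    for _ in range(num_pools):
        size = -(-size // stride)  # ceil(size / stride)
    return size * size * channels


dim = flattened_dim(24, 2, 64)       # 6 * 6 * 64
local3_weight_shape = [dim, 384]
local3_weight_count = dim * 384
```

Under these assumptions local3 holds most of the model's weights, which may be why weight decay is applied to the fully connected layers.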

  # local4
  with tf.variable_scope('local4') as scope:
    weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
    local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
    _activation_summary(local4)

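The second fully connected layer is similar: weights and bias are initialized with 192 outputs, the previous layer's output is multiplied by the weights, the bias is added, and the result goes through ReLU and is summarized.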

  # linear layer(WX + b),
  # We don't apply softmax here because
  # tf.nn.sparse_softmax_cross_entropy_with_logits accepts the unscaled logits
  # and performs the softmax internally for efficiency.
  with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=None)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)

  return softmax_linear

Finally the output is computed; the number of outputs equals the number of classes.

6. Building the loss function

    # Calculate loss.
    loss = cifar10.loss(logits, labels)

In cifar10_train.py, the loss function is built right after the network model.

def loss(logits, labels):
  """Add L2Loss to all the trainable variables.
  Add summary for "Loss" and "Loss/avg".
  Args:
    logits: Logits from inference().
    labels: Labels from distorted_inputs or inputs(). 1-D tensor
            of shape [batch_size]
  Returns:
    Loss tensor of type float.
  """
  # Calculate the average cross entropy loss across the batch.
  labels = tf.cast(labels, tf.int64)
  cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
      labels=labels, logits=logits, name='cross_entropy_per_example')
  cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
  tf.add_to_collection('losses', cross_entropy_mean)

  # The total loss is defined as the cross entropy loss plus all of the weight
  # decay terms (L2 loss).
  return tf.add_n(tf.get_collection('losses'), name='total_loss')

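tf.nn.sparse_softmax_cross_entropy_with_logits computes the sparse cross entropy; labels has shape [batch_size] and logits has shape [batch_size, num_classes]. tf.reduce_mean then takes the mean over the batch, the mean is added to the 'losses' collection, and all the losses in the collection are summed and returned.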
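
For one example, sparse softmax cross entropy is -log(softmax(logits)[label]), which equals logsumexp(logits) - logits[label]. The sketch below computes it in pure Python and averages over a batch; the function names are hypothetical and this is not the TensorFlow kernel.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Cross entropy of one example given unscaled logits and an integer label."""
    largest = max(logits)  # subtract the max for numerical stability
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits, labels):
    """Average the per-example losses, like tf.reduce_mean over the batch."""
    losses = [sparse_softmax_cross_entropy(logits, label)
              for logits, label in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


uniform_loss = sparse_softmax_cross_entropy([0.0, 0.0], 0)  # log(2)
batch_loss = mean_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1])
```

"Sparse" means each label is a single class index rather than a one-hot vector.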

7. Building the training op

The training method is built from the loss function.

    # Build a Graph that trains the model with one batch of examples and
    # updates the model parameters.
    train_op = cifar10.train(loss, global_step)

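The learning rate decays during training: it starts large so that convergence is fast, and is reduced as training proceeds in order to get closer to the optimum.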

def train(total_loss, global_step):
  """Train CIFAR-10 model.
  Create an optimizer and apply to all trainable variables. Add moving
  average for all trainable variables.
  Args:
    total_loss: Total loss from loss().
    global_step: Integer Variable counting the number of training steps
      processed.
  Returns:
    train_op: op for training.
  """
  # Variables that affect learning rate.
  num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
  decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)

  # Decay the learning rate exponentially based on the number of steps.
  lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
                                  global_step,
                                  decay_steps,
                                  LEARNING_RATE_DECAY_FACTOR,
                                  staircase=True)
  tf.summary.scalar('learning_rate', lr)

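This computes the number of batches per epoch and from it the decay steps, then uses tf.train.exponential_decay to decay the learning rate exponentially. The function takes the initial learning rate, global_step, the decay steps and the decay factor. staircase=True makes the decay a step function: the rate is recomputed every decay_steps steps and held constant until the next multiple of decay_steps. The learning rate is also recorded as a summary.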
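
With staircase=True the schedule is lr = initial * factor ** floor(step / decay_steps). The sketch below evaluates it in pure Python. All numeric constants (50000 examples, batch size 128, 350 epochs per decay, initial rate 0.1, factor 0.1) are assumptions for illustration, since this post does not list them, and the helper names are hypothetical.

```python
def compute_decay_steps(num_examples, batch_size, epochs_per_decay):
    """Number of training steps between two learning-rate drops."""
    num_batches_per_epoch = num_examples / batch_size
    return int(num_batches_per_epoch * epochs_per_decay)


def staircase_lr(initial_lr, global_step, decay_steps, decay_factor):
    """Exponential decay held constant between multiples of decay_steps."""
    return initial_lr * decay_factor ** (global_step // decay_steps)


steps = compute_decay_steps(50000, 128, 350.0)    # assumed constants
lr_before = staircase_lr(0.1, steps - 1, steps, 0.1)
lr_after = staircase_lr(0.1, steps, steps, 0.1)
```

The rate stays at its initial value until step decay_steps and then drops by the decay factor in one jump.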

  # Generate moving averages of all losses and associated summaries.
  loss_averages_op = _add_loss_summaries(total_loss)

This generates moving averages of all the losses.

def _add_loss_summaries(total_loss):
  """Add summaries for losses in CIFAR-10 model.
  Generates moving average for all losses and associated summaries for
  visualizing the performance of the network.
  Args:
    total_loss: Total loss from loss().
  Returns:
    loss_averages_op: op for generating moving averages of losses.
  """
  # Compute the moving average of all individual losses and the total loss.
  loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
  losses = tf.get_collection('losses')
  loss_averages_op = loss_averages.apply(losses + [total_loss])

  # Attach a scalar summary to all individual losses and the total loss; do the
  # same for the averaged version of the losses.
  for l in losses + [total_loss]:
    # Name each loss as '(raw)' and name the moving average version of the loss
    # as the original loss name.
    tf.summary.scalar(l.op.name + ' (raw)', l)
    tf.summary.scalar(l.op.name, loss_averages.average(l))

  return loss_averages_op

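tf.train.ExponentialMovingAverage computes moving averages with exponential decay, here applied to the losses. My understanding is that the loss for a given batch depends mainly on that step's value, and smoothing it with the losses from other steps reduces the influence of noise. Both the raw and the averaged losses are recorded as summaries, and the function returns the op that updates the moving averages.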
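
The update rule behind an exponential moving average is shadow = decay * shadow + (1 - decay) * value. The sketch below applies it to a short list of loss values. Starting the average from the first value is an assumption made here for simplicity and does not model how TensorFlow initializes or debiases the average; the helper names are hypothetical.

```python
def ema_update(shadow, value, decay):
    """One exponential-moving-average step."""
    return decay * shadow + (1.0 - decay) * value


def smooth(values, decay):
    """Return the running average after each value (starts from values[0])."""
    shadow = values[0]  # assumption: initialise with the first observation
    smoothed = []
    for value in values:
        shadow = ema_update(shadow, value, decay)
        smoothed.append(shadow)
    return smoothed


# A single noisy spike of 11.0 only moves the 0.9-decay average to about 2.0.
smoothed = smooth([1.0, 1.0, 11.0, 1.0], 0.9)
```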

  # Compute gradients.
  with tf.control_dependencies([loss_averages_op]):
    opt = tf.train.GradientDescentOptimizer(lr)
    grads = opt.compute_gradients(total_loss)

  # Apply gradients.
  apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

Computes the gradients and applies them.
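
Applying a gradient in plain gradient descent means moving each weight against its gradient: w = w - lr * grad. The sketch below runs three such steps on the toy function f(w) = w**2, whose gradient is 2*w. The function and the numbers are made up for illustration, and sgd_step is a hypothetical helper.

```python
def sgd_step(weights, grads, lr):
    """One gradient-descent update: w <- w - lr * grad for every weight."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Minimise f(w) = w**2 starting from w = 1.0 with a learning rate of 0.1.
weights = [1.0]
for _ in range(3):
    grads = [2.0 * weights[0]]  # derivative of w**2
    weights = sgd_step(weights, grads, 0.1)
```

Each step multiplies w by 0.8 here, so the weight moves steadily toward the minimum at 0.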

  # Add histograms for trainable variables.
  for var in tf.trainable_variables():
    tf.summary.histogram(var.op.name, var)

  # Add histograms for gradients.
  for grad, var in grads:
    if grad is not None:
      tf.summary.histogram(var.op.name + '/gradients', grad)

Adds histogram summaries for the trainable variables and the gradients.

  # Track the moving averages of all trainable variables.
  variable_averages = tf.train.ExponentialMovingAverage(
      MOVING_AVERAGE_DECAY, global_step)
  with tf.control_dependencies([apply_gradient_op]):
    variables_averages_op = variable_averages.apply(tf.trainable_variables())

  return variables_averages_op

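This tracks moving averages of the variables. tf.control_dependencies ensures that the gradients have been applied before the moving averages are updated, so the averages reflect the variables after the gradient step. The function returns the op that updates the variable moving averages.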
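
Passing global_step as the second argument makes the decay smaller early in training: the effective decay is min(decay, (1 + num_updates) / (10 + num_updates)). The sketch below evaluates that expression; the value 0.9999 for MOVING_AVERAGE_DECAY is an assumption, as this post does not state it, and effective_decay is a hypothetical helper.

```python
MOVING_AVERAGE_DECAY = 0.9999  # assumed value of the constant


def effective_decay(decay, num_updates):
    """Decay actually used when num_updates (global_step) is supplied."""
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


early = effective_decay(MOVING_AVERAGE_DECAY, 0)
late = effective_decay(MOVING_AVERAGE_DECAY, 10 ** 6)
```

Early averages therefore follow the variables closely, and the configured decay only takes over after many steps.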

8. Starting training

class _LoggerHook(tf.train.SessionRunHook):
      """Logs loss and runtime."""

      def begin(self):
        self._step = -1
        self._start_time = time.time()

      def before_run(self, run_context):
        self._step += 1
        return tf.train.SessionRunArgs(loss)  # Asks for loss value.

      def after_run(self, run_context, run_values):
        if self._step % FLAGS.log_frequency == 0:
          current_time = time.time()
          duration = current_time - self._start_time
          self._start_time = current_time

          loss_value = run_values.results
          examples_per_sec = FLAGS.log_frequency * FLAGS.batch_size / duration
          sec_per_batch = float(duration / FLAGS.log_frequency)

          format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                        'sec/batch)')
          print (format_str % (datetime.now(), self._step, loss_value,
                               examples_per_sec, sec_per_batch))

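begin initializes the step counter and the start time. before_run increments the step counter and asks the session to fetch the value of loss along with the run. after_run prints the information.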
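
The two rates printed by after_run are simple ratios over the last log_frequency steps. The sketch below reproduces the arithmetic; the input numbers are made up, and throughput is a hypothetical helper.

```python
def throughput(log_frequency, batch_size, duration):
    """Return (examples_per_sec, sec_per_batch) for the last logging window."""
    examples_per_sec = log_frequency * batch_size / duration
    sec_per_batch = float(duration / log_frequency)
    return examples_per_sec, sec_per_batch


# 10 steps of 128 examples each that took 2 seconds in total.
examples_per_sec, sec_per_batch = throughput(10, 128, 2.0)
```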

    with tf.train.MonitoredTrainingSession(
        checkpoint_dir=FLAGS.train_dir,
        hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps),
               tf.train.NanTensorHook(loss),
               _LoggerHook()],
        config=tf.ConfigProto(
            log_device_placement=FLAGS.log_device_placement)) as mon_sess:
      while not mon_sess.should_stop():
        mon_sess.run(train_op)

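tf.train.MonitoredTrainingSession is, literally, a session that monitors training. checkpoint_dir is the directory from which checkpoints are restored (I don't quite understand this). tf.train.StopAtStepHook requests a stop once last_step is reached, and tf.train.NanTensorHook watches whether loss becomes NaN. As long as no stop has been requested, training continues.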

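At this point the training process is roughly clear. First the images are read and preprocessed, then put into a queue and shuffled before being fed to the network. Next the model is built, the loss is computed, the learning rate decays exponentially, and the gradients are computed and used to move toward the optimum (my guess is that this happens when the gradients are applied). Finally training starts.

The address of the CIFAR-10 example source code is attached.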
