Learning TensorFlow, Part 3: TensorBoard in Practice

Introduction to TensorBoard

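To its users, a TensorFlow computation is largely a black box. To make TensorFlow programs easier to understand, debug, and optimize, TensorFlow provides TensorBoard, a suite of web applications whose main job is to visualize how a TensorFlow program runs and what its computational graph looks like.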

Visualization with TensorBoard

TensorBoard produces its visualizations by reading TensorFlow event files. An event file contains the summary data generated while TensorFlow runs.

Summary Operations

Summary operations are the tools for recording information about a running computational graph.
The following two classes provide the output interface that writes summary data to event files:

tf.summary.FileWriter
tf.summary.FileWriterCache

The summary operations themselves include:

tf.summary.tensor_summary: Outputs a Summary protocol buffer with a serialized tensor.proto
tf.summary.scalar: Outputs a Summary protocol buffer containing a single scalar value
tf.summary.histogram: Outputs a Summary protocol buffer with a histogram
tf.summary.audio: Outputs a Summary protocol buffer with audio.
tf.summary.image: Outputs a Summary protocol buffer with images.
tf.summary.merge: This op creates a Summary protocol buffer that contains the union of all the values in the input summaries.
tf.summary.merge_all: Merges all summaries collected in the default graph

Generating summaries

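First, create the computational graph you want to collect summary data from, and decide which nodes to observe.
For example, when training a CNN to recognize MNIST digits, you may want to record how the learning rate and the loss change over time. To do so, attach one tf.summary.scalar operation to the node that outputs the learning rate and another to the node that outputs the loss.
Merge the summary ops:
An operation in TensorFlow does nothing until it is run, or until another running operation depends on its output. The summary operations attached in the first step are peripheral to the graph being trained: nothing depends on them, so they have to be run explicitly. Running them one by one would be tedious, so use tf.summary.merge_all to combine all of them into a single operation.
Run the merged summary operation:
Running it produces a serialized Summary protobuf object. Pass that object to tf.summary.FileWriter, which writes it to the event file.
Set the logging frequency:
Training usually takes many iterations. You could run the summary operation on every step, but with a large number of iterations that is unnecessary; a common choice is to run it once every fixed number of training steps.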
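
The logging-frequency step can be sketched without TensorFlow. In the minimal example below, run_training, train_one_step, and record_summary are hypothetical stand-ins, not TensorFlow APIs: record_summary takes the place of running the merged summary op and handing the result to the FileWriter, and train_one_step takes the place of running the training op. The only point is the step % summary_every check.

```python
def run_training(num_steps, summary_every, train_one_step, record_summary):
    """Run num_steps training steps, recording a summary every summary_every steps."""
    recorded_steps = []
    for step in range(num_steps):
        if step % summary_every == 0:
            # Stand-in for: s = sess.run(merged_summary); writer.add_summary(s, step)
            record_summary(step)
            recorded_steps.append(step)
        # Stand-in for: sess.run(train_step, feed_dict=...)
        train_one_step(step)
    return recorded_steps


# Usage with no-op stand-ins: 20 steps, one summary every 5 steps.
logged = run_training(20, 5,
                      train_one_step=lambda step: None,
                      record_summary=lambda step: None)
print(logged)
```

A larger summary_every keeps the event file small at the cost of coarser curves in TensorBoard.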

Launching TensorBoard

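TensorBoard can be started in either of the following two ways (the first form assumes a TensorFlow release that still bundles TensorBoard as the tensorflow.tensorboard module, such as r1.1):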

python -m tensorflow.tensorboard --logdir=path/to/log-directory
tensorboard --logdir=path/to/log-directory
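Here logdir is the directory to which tf.summary.FileWriter writes its event files. If logdir contains subdirectories that hold their own event files, TensorBoard visualizes those as well. Once TensorBoard is running, open localhost:6006 in a browser to see the visualizations.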
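
To make the subdirectory behavior concrete, here is a small sketch that lists which directories under a logdir would show up as separate runs. It is not TensorBoard's own discovery code; find_runs is a hypothetical helper, and it assumes event files can be recognized by "tfevents" in the file name (FileWriter names them events.out.tfevents.<timestamp>.<hostname>).

```python
import os
import tempfile


def find_runs(logdir):
    """Return the sorted relative paths of directories under logdir that hold event files."""
    runs = []
    for dirpath, _dirnames, filenames in os.walk(logdir):
        # Assumption: event files carry "tfevents" in their file name.
        if any("tfevents" in name for name in filenames):
            runs.append(os.path.relpath(dirpath, logdir))
    return sorted(runs)


# Usage: build a throwaway logdir with two runs and one unrelated folder.
demo_logdir = tempfile.mkdtemp()
for run in ("lr_1E-03", "lr_1E-04"):
    os.makedirs(os.path.join(demo_logdir, run))
    # Empty placeholder; a real event file is written by tf.summary.FileWriter.
    open(os.path.join(demo_logdir, run, "events.out.tfevents.0.demo"), "w").close()
os.makedirs(os.path.join(demo_logdir, "data"))

demo_runs = find_runs(demo_logdir)
print(demo_runs)
```

Writing each experiment to its own subdirectory, as the full example later in this article does with its hyperparameter string, is what lets TensorBoard compare runs side by side.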

TensorBoard: Embedding Visualization

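The previous sections covered the TensorBoard workflow and basic usage. This section introduces another useful feature, embedding visualization, which in essence maps high-dimensional data to two or three dimensions with a chosen algorithm so that it can be displayed.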

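TensorBoard has a built-in visualizer called the Embedding Projector, designed for interactive display and analysis of high-dimensional data. The Embedding Projector reads the embeddings from the model checkpoint file and can load any 2-D tensor in the model.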

By default the Embedding Projector uses PCA to map high-dimensional data to three dimensions; it also offers t-SNE as an alternative projection.
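
As a rough illustration of what a PCA projection to three dimensions does, the sketch below centers the data, finds the top three principal directions by power iteration with deflation, and projects every point onto them. This is a textbook-style approximation written in plain Python for readability, not the Embedding Projector's implementation; pca_project and its parameters are hypothetical names.

```python
import random


def pca_project(points, num_components=3, iterations=200, seed=0):
    """Project points (a list of equal-length lists) onto their top principal components."""
    n = len(points)
    dim = len(points[0])
    # Center the data so that the covariance is taken around the mean.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    cov = [[sum(row[a] * row[b] for row in centered) / n for b in range(dim)]
           for a in range(dim)]

    def matvec(matrix, vec):
        return [sum(m * v for m, v in zip(matrix_row, vec)) for matrix_row in matrix]

    def normalize(vec):
        norm = sum(v * v for v in vec) ** 0.5
        return [v / norm for v in vec] if norm > 0.0 else vec

    rng = random.Random(seed)
    components = []
    for _ in range(num_components):
        # Power iteration converges to the dominant eigenvector of cov.
        vec = normalize([rng.uniform(-1.0, 1.0) for _ in range(dim)])
        for _ in range(iterations):
            vec = normalize(matvec(cov, vec))
        eigenvalue = sum(v * w for v, w in zip(vec, matvec(cov, vec)))
        components.append(vec)
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[a][b] - eigenvalue * vec[a] * vec[b] for b in range(dim)]
               for a in range(dim)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


# Usage: 60 five-dimensional points whose spread differs strongly per axis.
data_rng = random.Random(1)
scales = [10.0, 5.0, 2.0, 0.5, 0.1]
points = [[data_rng.gauss(0.0, s) for s in scales] for _ in range(60)]
projected = pca_project(points)
variances = [sum(p[k] ** 2 for p in projected) / len(projected) for k in range(3)]
print(len(projected), len(projected[0]))
```

Each projected point has three coordinates, ordered so that the first carries the most variance. PCA is linear and deterministic, which is one reason it is a reasonable default, whereas t-SNE is nonlinear and iterative.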

Creating embeddings

Visualizing embeddings takes three steps:

Create a 2-D tensor that holds the embedding:
embedding_var = tf.Variable(....)
Periodically save the model variables to a checkpoint file under logdir:

 saver = tf.train.Saver()
 saver.save(session, os.path.join(LOG_DIR, "model.ckpt"), step)

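(Optional) Attach metadata to the embedding:
If you want to associate metadata (such as labels or images) with the embedding, you can either save a projector_config.pbtxt file in the logdir directory that specifies the metadata, or use the Python API.
For example, the following projector_config.pbtxt associates the tensor word_embedding with metadata stored in $LOG_DIR/metadata.tsv: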

embeddings {
  tensor_name: 'word_embedding'
  metadata_path: '$LOG_DIR/metadata.tsv'
}
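
If you prefer to generate this file from a script, a minimal sketch could look like the following. It simply writes the text shown above by hand; write_projector_config is a hypothetical helper and the tensor and file names are placeholders. The Python API used in the full example later in this article is the more robust route.

```python
import os
import tempfile


def write_projector_config(log_dir, tensor_name, metadata_filename):
    """Write a minimal projector_config.pbtxt into log_dir and return its path."""
    metadata_path = os.path.join(log_dir, metadata_filename)
    config_text = (
        "embeddings {\n"
        "  tensor_name: '%s'\n"
        "  metadata_path: '%s'\n"
        "}\n" % (tensor_name, metadata_path)
    )
    config_path = os.path.join(log_dir, "projector_config.pbtxt")
    with open(config_path, "w") as config_file:
        config_file.write(config_text)
    return config_path


# Usage: write the config into a temporary directory and show it.
demo_dir = tempfile.mkdtemp()
config_path = write_projector_config(demo_dir, "word_embedding", "metadata.tsv")
with open(config_path) as config_file:
    config_contents = config_file.read()
print(config_contents)
```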

Metadata

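Embeddings usually come with metadata. The metadata must be stored in a separate file outside the model checkpoint. The file uses TSV format, that is, tab-separated values, and it must start with a header row (the Embedding Projector makes an exception for single-column files, which it reads as plain labels without a header).
An example of the file contents, where \t stands for a tab character: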

Word\tFrequency
Airplane\t345
Car\t241
...

Note that the rows in the metadata file must be in the same order as the rows of the embedding tensor.
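
A metadata file like the one above can be produced with the standard csv module. The sketch below assumes Python 3; write_metadata is a hypothetical helper, and the words and counts are the sample values from the example, listed in the same order as the (assumed) rows of the embedding tensor.

```python
import csv
import os
import tempfile


def write_metadata(path, header, rows):
    """Write a tab-separated metadata file with one header row."""
    with open(path, "w", newline="") as metadata_file:
        writer = csv.writer(metadata_file, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        # Row i must describe row i of the embedding tensor.
        writer.writerows(rows)


# Usage: the vocabulary list defines the row order of the embedding.
vocabulary = ["Airplane", "Car"]
frequency = {"Car": 241, "Airplane": 345}
metadata_path = os.path.join(tempfile.mkdtemp(), "metadata.tsv")
write_metadata(metadata_path, ["Word", "Frequency"],
               [(word, frequency[word]) for word in vocabulary])
with open(metadata_path) as metadata_file:
    metadata_contents = metadata_file.read()
print(metadata_contents)
```

Building the rows by iterating over the vocabulary, rather than over the dictionary, is what keeps the file aligned with the embedding.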

Image metadata

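To attach images to the embeddings, combine the small images that represent the individual data points into one large image, called a sprite image.
Once the sprite image has been generated, tell the Embedding Projector where to load it from: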

embedding.sprite.image_path = PATH_TO_SPRITE_IMAGE
# Specify the width and height of a single thumbnail.
embedding.sprite.single_image_dim.extend([w, h])
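
The sketch below shows how thumbnail positions inside a sprite can be computed. It assumes the layout the Embedding Projector documentation describes, a square grid filled in row-major order with the first data point at the top left; sprite_grid_size and thumbnail_box are hypothetical helper names.

```python
import math


def sprite_grid_size(num_images):
    """Thumbnails per side of the smallest square grid that fits num_images."""
    return int(math.ceil(math.sqrt(num_images)))


def thumbnail_box(index, num_images, thumb_width, thumb_height):
    """Return the (left, top, right, bottom) pixel box of thumbnail `index`."""
    per_side = sprite_grid_size(num_images)
    # Row-major order: fill the first row left to right, then the next row.
    row, col = divmod(index, per_side)
    left = col * thumb_width
    top = row * thumb_height
    return (left, top, left + thumb_width, top + thumb_height)


# Usage: 1024 MNIST thumbnails of 28 x 28 pixels.
per_side = sprite_grid_size(1024)
sprite_size = (per_side * 28, per_side * 28)
box_33 = thumbnail_box(33, 1024, 28, 28)
print(per_side, sprite_size, box_33)
```

These boxes are the positions at which each thumbnail would be pasted when assembling the sprite with an image library.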

Graph visualization

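TensorFlow computation graphs tend to be complex, and visualizing them helps in understanding and debugging a program.
To view the graph, start TensorBoard and click the graph tab. The graph must have been written to the event file, which the full example later in this article does with writer.add_graph.
The main point covered here is name scoping.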

Name scopes

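Deep learning models often have thousands of nodes, and showing that much detail in limited space is unwieldy. TensorFlow's name scope mechanism lets you group variables under a scope; when the graph is displayed, everything in the same name scope is collapsed into a single node, which the user can expand.
Name scopes also resemble Java packages in that they solve the problem of naming variables, since names in different scopes do not clash. That topic is not covered in detail here.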
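
Ops created inside tf.name_scope("conv1") get names prefixed with "conv1/". The sketch below mimics, in plain Python, how such slash-separated names can be grouped by their top-level scope, which is roughly the collapsing the graph view performs. group_by_top_scope is a hypothetical helper and the node names are illustrative.

```python
from collections import OrderedDict


def group_by_top_scope(node_names):
    """Group slash-separated node names by their first name-scope component."""
    groups = OrderedDict()
    for name in node_names:
        scope, separator, rest = name.partition("/")
        # A name without "/" has no enclosing scope and forms its own group.
        groups.setdefault(scope, []).append(rest if separator else name)
    return groups


# Usage: six nodes collapse into three top-level boxes.
node_names = ["x", "conv1/W", "conv1/B", "conv1/Relu", "fc1/W", "fc1/B"]
groups = group_by_top_scope(node_names)
for scope, members in groups.items():
    print(scope, members)
```

The example also shows the naming benefit: conv1/W and fc1/W are distinct names even though both variables are called W.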

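Hands-on: MNIST digit recognition with a CNN

The code below uses the TensorFlow 1.x API (tf.Session, tf.placeholder, tf.contrib).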

import os
import tensorflow as tf
import sys

if sys.version_info[0] >= 3:
    from urllib.request import urlretrieve
else:
    from urllib import urlretrieve

LOGDIR = '/tmp/mnist_tutorial/'
GITHUB_URL = 'https://raw.githubusercontent.com/mamcgrath/TensorBoard-TF-Dev-Summit-Tutorial/master/'

### MNIST EMBEDDINGS ###
mnist = tf.contrib.learn.datasets.mnist.read_data_sets(train_dir=LOGDIR + 'data', one_hot=True)
### Get a sprite and labels file for the embedding projector ###
urlretrieve(GITHUB_URL + 'labels_1024.tsv', LOGDIR + 'labels_1024.tsv')
urlretrieve(GITHUB_URL + 'sprite_1024.png', LOGDIR + 'sprite_1024.png')


# Add convolution layer
def conv_layer(input, size_in, size_out, name="conv"):
  with tf.name_scope(name):
    w = tf.Variable(tf.truncated_normal([5, 5, size_in, size_out], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="B")
    conv = tf.nn.conv2d(input, w, strides=[1, 1, 1, 1], padding="SAME")
    act = tf.nn.relu(conv + b)
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")


# Add fully connected layer
def fc_layer(input, size_in, size_out, name="fc"):
  with tf.name_scope(name):
    w = tf.Variable(tf.truncated_normal([size_in, size_out], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="B")
    act = tf.nn.relu(tf.matmul(input, w) + b)
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    return act


def mnist_model(learning_rate, use_two_conv, use_two_fc, hparam):
  tf.reset_default_graph()
  sess = tf.Session()

  # Setup placeholders, and reshape the data
  x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
  x_image = tf.reshape(x, [-1, 28, 28, 1])
  tf.summary.image('input', x_image, 3)
  y = tf.placeholder(tf.float32, shape=[None, 10], name="labels")

  if use_two_conv:
    conv1 = conv_layer(x_image, 1, 32, "conv1")
    conv_out = conv_layer(conv1, 32, 64, "conv2")
  else:
    conv1 = conv_layer(x_image, 1, 64, "conv")
    conv_out = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

  flattened = tf.reshape(conv_out, [-1, 7 * 7 * 64])


  if use_two_fc:
    fc1 = fc_layer(flattened, 7 * 7 * 64, 1024, "fc1")
    embedding_input = fc1
    embedding_size = 1024
    logits = fc_layer(fc1, 1024, 10, "fc2")
  else:
    embedding_input = flattened
    embedding_size = 7*7*64
    logits = fc_layer(flattened, 7*7*64, 10, "fc")

  with tf.name_scope("xent"):
    xent = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            logits=logits, labels=y), name="xent")
    tf.summary.scalar("xent", xent)

  with tf.name_scope("train"):
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(xent)

  with tf.name_scope("accuracy"):
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("accuracy", accuracy)

  summ = tf.summary.merge_all()


  embedding = tf.Variable(tf.zeros([1024, embedding_size]), name="test_embedding")
  assignment = embedding.assign(embedding_input)
  saver = tf.train.Saver()

  sess.run(tf.global_variables_initializer())
  writer = tf.summary.FileWriter(LOGDIR + hparam)
  writer.add_graph(sess.graph)

  config = tf.contrib.tensorboard.plugins.projector.ProjectorConfig()
  embedding_config = config.embeddings.add()
  embedding_config.tensor_name = embedding.name
  embedding_config.sprite.image_path = LOGDIR + 'sprite_1024.png'
  embedding_config.metadata_path = LOGDIR + 'labels_1024.tsv'
  # Specify the width and height of a single thumbnail.
  embedding_config.sprite.single_image_dim.extend([28, 28])
  tf.contrib.tensorboard.plugins.projector.visualize_embeddings(writer, config)

  for i in range(2001):
    batch = mnist.train.next_batch(100)
    if i % 5 == 0:
      [train_accuracy, s] = sess.run([accuracy, summ], feed_dict={x: batch[0], y: batch[1]})
      writer.add_summary(s, i)
    if i % 500 == 0:
      sess.run(assignment, feed_dict={x: mnist.test.images[:1024], y: mnist.test.labels[:1024]})
      saver.save(sess, os.path.join(LOGDIR, "model.ckpt"), i)
    sess.run(train_step, feed_dict={x: batch[0], y: batch[1]})

  # Flush pending events to disk and release the session's resources.
  writer.close()
  sess.close()


def make_hparam_string(learning_rate, use_two_fc, use_two_conv):
  conv_param = "conv=2" if use_two_conv else "conv=1"
  fc_param = "fc=2" if use_two_fc else "fc=1"
  return "lr_%.0E,%s,%s" % (learning_rate, conv_param, fc_param)


def main():
  # You can try adding some more learning rates
  for learning_rate in [1E-4]:

    # Include "False" as a value to try different model architectures
    for use_two_fc in [True]:
      for use_two_conv in [True]:
        # Construct a hyperparameter string for each one (example: "lr_1E-04,conv=2,fc=2")
        hparam = make_hparam_string(learning_rate, use_two_fc, use_two_conv)
        print('Starting run for %s' % hparam)

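        # Actually run with the new settings. Keyword arguments keep the two
        # flags matched to the parameter order of mnist_model.
        mnist_model(learning_rate, use_two_conv=use_two_conv, use_two_fc=use_two_fc, hparam=hparam)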


if __name__ == '__main__':
  main()

Summary

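This article covered the following:

what TensorBoard is and what it is for
how TensorBoard produces its visualizations
how to visualize embeddings with TensorBoard
how to visualize the graph with TensorBoard
It closed with a complete, runnable example that demonstrates all of the above.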

Reference

https://www.tensorflow.org/get_started/summaries_and_tensorboard
https://github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/tensorboard/README.md
https://github.com/mamcgrath/TensorBoard-TF-Dev-Summit-Tutorial