点击查看原文地址
使用TensorFlow训练很庞大的深度学习神经网络是很复杂也很困难的。为了便于理解,调试和优化TF程序,我们开发了一套用于可视化的工具,叫做TensorBoard。可以用TensorBoard来可视化TensorFlwo的graph,量化图的执行过程,显示其他诸如图片之类的数据并通过它传递数据。
序列化数据
TensorBoard通过读取TensorFlow的事件(events)文件进行操作,该文件包括TensorFlow运行过程中生成的摘要数据(summary data)。下面是TensorBoard中摘要数据的生命周期。
首先,创建一个你想要从中搜集摘要数据的TensorFlow数据流图(TensorFlow graph),并决定哪些点是想要对其进行摘要操作(summary operations)的。
比如,假设要用CNN(卷积神经网络)训练一个MNIST识别手写数字集,可能比较关心学习率随时间的变化以及目标函数是如何变化的,就需要对该点进行tf.summary.scalar
操作,分别输出学习率和损失。然后,给每个scalar_summary
一个有意义的tag
,比如learning rate
或者loss function
。
可能你也想可视化每一层激活函数的分布、梯度分布或权重分布,则可以通过tf.summary.hsitogram
操作来分别显示梯度和有权重的变量。
更详细的summary operations请见summary operations
TensorFlow不会执行任何操作,直到你调用了run或者有一个run依赖于它们输出。我们刚才创建的summary nodes属于graph之外,你当前运行的操作也不会依赖与它。所以,为了产生summaries,我们需要运行所有的summary nodes。手动管理她们是很烦人的,所以我们用tf.summary.merge_all
来把它们连成一个操作,生成所有的summary data。
接下来,你可以只运行合并好的summary操作,它会在一个给定步骤用所有summary data生成一个序列化的summary protobuf对象。最后,将这summary data写入disk, 讲summary protobuf传给tf.summary.FileWriter
。
这个FileWriter的构造函数里有一个logdir,它很重要,是所有写出数据的目录。FileWriter也能传Graph作为可选参数。如果传递了Graph对象,TensorBoard就会可视化Graph,并带有tensor shape信息。更多Tensor shape information请见此
现在你修改了graph,也有了FileWriter,可以开始运行你的网路了。只要你想,可以每一步都运行合并summary操作,记录训练数据,尽管这可能会有多余数据。相反,可以考虑每n步运行一次summary op。
下面的代码示例是简单MINIST教程的修改,我们在其中添加了一些summary ops, 并且每一步都运行了它。如果你运行它并且打开teensorboard --logdir=/tmp/mnist_logs
,就能看到统计图,比如训练过程中权重和精度是如何变化的。下面的代码是一段摘录,源代码在这里。
def variable_summaries(var):
"""Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
with tf.name_scope('summaries'):
mean = tf.reduce_mean(var)
tf.summary.scalar('mean', mean)
with tf.name_scope('stddev'):
stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
tf.summary.scalar('stddev', stddev)
tf.summary.scalar('max', tf.reduce_max(var))
tf.summary.scalar('min', tf.reduce_min(var))
tf.summary.histogram('histogram', var)
def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
"""Reusable code for making a simple neural net layer.
It does a matrix multiply, bias add, and then uses relu to nonlinearize.
It also sets up name scoping so that the resultant graph is easy to read,
and adds a number of summary ops.
"""
# Adding a name scope ensures logical grouping of the layers in the graph.
with tf.name_scope(layer_name):
# This Variable will hold the state of the weights for the layer
with tf.name_scope('weights'):
weights = weight_variable([input_dim, output_dim])
variable_summaries(weights)
with tf.name_scope('biases'):
biases = bias_variable([output_dim])
variable_summaries(biases)
with tf.name_scope('Wx_plus_b'):
preactivate = tf.matmul(input_tensor, weights) + biases
tf.summary.histogram('pre_activations', preactivate)
activations = act(preactivate, name='activation')
tf.summary.histogram('activations', activations)
return activations
hidden1 = nn_layer(x, 784, 500, 'layer1')
with tf.name_scope('dropout'):
keep_prob = tf.placeholder(tf.float32)
tf.summary.scalar('dropout_keep_probability', keep_prob)
dropped = tf.nn.dropout(hidden1, keep_prob)
# Do not apply softmax activation yet, see below.
y = nn_layer(dropped, 500, 10, 'layer2', act=tf.identity)
with tf.name_scope('cross_entropy'):
# The raw formulation of cross-entropy,
#
# tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.softmax(y)),
# reduction_indices=[1]))
#
# can be numerically unstable.
#
# So here we use tf.nn.softmax_cross_entropy_with_logits on the
# raw outputs of the nn_layer above, and then average across
# the batch.
diff = tf.nn.softmax_cross_entropy_with_logits(targets=y_, logits=y)
with tf.name_scope('total'):
cross_entropy = tf.reduce_mean(diff)
tf.summary.scalar('cross_entropy', cross_entropy)
with tf.name_scope('train'):
train_step = tf.train.AdamOptimizer(FLAGS.learning_rate).minimize(
cross_entropy)
with tf.name_scope('accuracy'):
with tf.name_scope('correct_prediction'):
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
with tf.name_scope('accuracy'):
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
tf.summary.scalar('accuracy', accuracy)
# Merge all the summaries and write them out to /tmp/mnist_logs (by default)
merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
sess.graph)
test_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/test')
tf.global_variables_initializer().run()
初始化完FileWriters后,我们必须在训练和测试模型时向FileWriters添加summaries。
# Train the model, and also write summaries.
# Every 10th step, measure test-set accuracy, and write test summaries
# All other steps, run train_step on training data, & add training summaries
def feed_dict(train):
"""Make a TensorFlow feed_dict: maps data onto Tensor placeholders."""
if train or FLAGS.fake_data:
xs, ys = mnist.train.next_batch(100, fake_data=FLAGS.fake_data)
k = FLAGS.dropout
else:
xs, ys = mnist.test.images, mnist.test.labels
k = 1.0
return {x: xs, y_: ys, keep_prob: k}
for i in range(FLAGS.max_steps):
if i % 10 == 0: # Record summaries and test-set accuracy
summary, acc = sess.run([merged, accuracy], feed_dict=feed_dict(False))
test_writer.add_summary(summary, i)
print('Accuracy at step %s: %s' % (i, acc))
else: # Record train set summaries, and train
summary, _ = sess.run([merged, train_step], feed_dict=feed_dict(True))
train_writer.add_summary(summary, i)
到此,用TensorBoard可视化数据已全部完成。
启动TensorBoard
用下面命令启动TensorBoard(或者python -m tensorflow.tensorboard
)
tensorboard --logdir=path/to/log-directory
logir指向FileWriter序列化数据的地方。如果logdir目录包含其他运行的子目录,TensorBoard将把全部数据可视化。TensorBoard运行后,用浏览器到localhost:6006
查看。
打开TensorBoard,在右上角可以看到导航栏。每个tab代表能看到到一个序列化数据集。
更多如何使用graph tab可视化graph,请见这里
更多TensorBoard信息请见这里