Introduction to TensorFlow Eager Execution (from the official guide)

To be continued...

Eager

Official documentation: https://www.tensorflow.org/programmers_guide/eager

Translation note:

Eager: keen or enthusiastic. "Eager execution" means immediate execution; many Chinese articles render it as 动态图 (dynamic graph), and that convention is followed here.

Eager execution

Eager execution is a feature that makes TensorFlow execute operations immediately: concrete values are returned, instead of creating a computational graph that is executed later.

A user guide is available: https://www.tensorflow.org/programmers_guide/eager (source file)

We welcome feedback through GitHub issues.

Sample code is available, including benchmarks for some:

  • Linear Regression
  • MNIST handwritten digit classifier
  • ResNet50 image classification
  • RNN to generate colors
  • RNN language model

Eager Execution

TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow and debug models, and it reduces boilerplate as well. To follow along with this guide, run the code samples below in an interactive python interpreter.

Eager execution is a flexible machine learning platform for research and experimentation, providing:

  • An intuitive interface—Structure your code naturally and use Python data structures. Quickly iterate on small models and small data.
  • Easier debugging—Call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
  • Natural control flow—Use Python control flow instead of graph control flow, simplifying the specification of dynamic models.

Eager execution supports most TensorFlow operations and GPU acceleration. For a collection of examples running in eager execution, see: tensorflow/contrib/eager/python/examples.

Note: Some models may experience increased overhead with eager execution enabled. Performance improvements are ongoing, but please file a bug if you find a problem and share your benchmarks.

Setup and basic usage

Upgrade to the latest version of TensorFlow:

$ pip install --upgrade tensorflow

To start eager execution, add tf.enable_eager_execution() to the beginning of the program or console session. Do not add this operation to other modules that the program calls.

from __future__ import absolute_import, division, print_function

import tensorflow as tf

tf.enable_eager_execution()

(Adjust this to your installed TensorFlow version; the exact commands differ between versions.)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# TensorFlow 1.6.0

from __future__ import absolute_import, division, print_function
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # => "hello, [[4.]]"

Now you can run TensorFlow operations and the results will return immediately:

tf.executing_eagerly()        # => True

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # => "hello, [[4.]]"

Enabling eager execution changes how TensorFlow operations behave—now they immediately evaluate and return their values to Python. tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph. Since there isn't a computational graph to build and run later in a session, it's easy to inspect results using print() or a debugger. Evaluating, printing, and checking tensor values does not break the flow for computing gradients.

Eager execution works nicely with NumPy. NumPy operations accept tf.Tensor arguments. TensorFlow math operations convert Python objects and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy method returns the object's value as a NumPy ndarray.

a = tf.constant([[1, 2],
                 [3, 4]])
print(a)
# => tf.Tensor([[1 2]
#               [3 4]], shape=(2, 2), dtype=int32)

# Broadcasting support
b = tf.add(a, 1)
print(b)
# => tf.Tensor([[2 3]
#               [4 5]], shape=(2, 2), dtype=int32)

# Operator overloading is supported
print(a * b)
# => tf.Tensor([[ 2  6]
#               [12 20]], shape=(2, 2), dtype=int32)

# Use NumPy values
import numpy as np

c = np.multiply(a, b)
print(c)
# => [[ 2  6]
#     [12 20]]

# Obtain numpy value from a tensor:
print(a.numpy())
# => [[1 2]
#     [3 4]]

The tf.contrib.eager module contains symbols available to both eager and graph execution environments and is useful for writing code to work with graphs:

tfe = tf.contrib.eager

Dynamic control flow

A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. So, for example, it is easy to write fizzbuzz:

def fizzbuzz(max_num):
  counter = tf.constant(0)
  max_num = tf.convert_to_tensor(max_num)
  for num in range(max_num.numpy()):
    num = tf.constant(num)
    if int(num % 3) == 0 and int(num % 5) == 0:
      print('FizzBuzz')
    elif int(num % 3) == 0:
      print('Fizz')
    elif int(num % 5) == 0:
      print('Buzz')
    else:
      print(num)
    counter += 1
  return counter

This has conditionals that depend on tensor values and it prints these values at runtime.

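As a quick sanity check (this usage example is not in the original guide), the function can be called with a small bound, assuming eager execution is already enabled as above:

counter = fizzbuzz(16)   # prints FizzBuzz, Fizz, Buzz and plain tensor values for 0..15
print(int(counter))      # => 16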

Build a model

Many machine learning models are represented by composing layers. When using TensorFlow with eager execution you can either write your own layers or use a layer provided in the tf.keras.layers package.

While you can use any Python object to represent a layer, TensorFlow has tf.keras.layers.Layer as a convenient base class. Inherit from it to implement your own layer:

class MySimpleLayer(tf.keras.layers.Layer):
  def __init__(self, output_units):
    super(MySimpleLayer, self).__init__()
    self.output_units = output_units

  def build(self, input_shape):
    # The build method gets called the first time your layer is used.
    # Creating variables on build() allows you to make their shape depend
    # on the input shape and hence removes the need for the user to specify
    # full shapes. It is possible to create variables during __init__() if
    # you already know their full shapes.
    self.kernel = self.add_variable(
      "kernel", [input_shape[-1], self.output_units])

  def call(self, input):
    # Override call() instead of __call__ so we can perform some bookkeeping.
    return tf.matmul(input, self.kernel)
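
A minimal usage sketch for the layer above (the shapes here are hypothetical, and eager execution is assumed to be enabled):

layer = MySimpleLayer(output_units=4)
x = tf.ones([2, 3])   # a batch of 2 examples with 3 features each
y = layer(x)          # build() runs on the first call and creates a [3, 4] kernel
print(y.shape)        # => (2, 4)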

Use tf.keras.layers.Dense layer instead of MySimpleLayer above as it has a superset of its functionality (it can also add a bias).

When composing layers into models you can use tf.keras.Sequential to represent models which are a linear stack of layers. It is easy to use for basic models:

model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, input_shape=(784,)),  # must declare input shape
  tf.keras.layers.Dense(10)
])

Alternatively, organize models in classes by inheriting from tf.keras.Model. This is a container for layers that is a layer itself, allowing tf.keras.Model objects to contain other tf.keras.Model objects.

class MNISTModel(tf.keras.Model):
  def __init__(self):
    super(MNISTModel, self).__init__()
    self.dense1 = tf.keras.layers.Dense(units=10)
    self.dense2 = tf.keras.layers.Dense(units=10)

  def call(self, input):
    """Run the model."""
    result = self.dense1(input)
    result = self.dense2(result)
    result = self.dense2(result)  # reuse variables from dense2 layer
    return result

model = MNISTModel()

It's not required to set an input shape for the tf.keras.Model class since the parameters are set the first time input is passed to the layer.

tf.keras.layers classes create and contain their own model variables that are tied to the lifetime of their layer objects. To share layer variables, share their objects.

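For example, a small sketch of this idea (the function names here are hypothetical): reusing one layer object is what shares its variables.

shared = tf.keras.layers.Dense(units=10)   # one layer object, one set of variables

def model_a(x):
  return shared(x)

def model_b(x):
  return shared(x)   # same object, so both models use the same kernel and bias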

Eager training

Computing gradients

Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. During eager execution, use tf.GradientTape to trace operations for computing gradients later.

tf.GradientTape is an opt-in feature to provide maximal performance when not tracing. Since different operations can occur during each call, all forward-pass operations get recorded to a "tape". To compute the gradient, play the tape backwards and then discard. A particular tf.GradientTape can only compute one gradient; subsequent calls throw a runtime error.

w = tfe.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)
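
To illustrate the rule above that a tape can compute only one gradient, a second gradient() call on the same non-persistent tape raises an error (a small sketch reusing the tape from the previous snippet):

try:
  tape.gradient(loss, w)   # the tape's resources were released by the first call
except RuntimeError as e:
  print("Second call failed:", e)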

Here's an example of tf.GradientTape that records forward-pass operations to train a simple model:

# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 1000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

def prediction(input, weight, bias):
  return input * weight + bias

# A loss function using mean-squared error
def loss(weights, biases):
  error = prediction(training_inputs, weights, biases) - training_outputs
  return tf.reduce_mean(tf.square(error))

# Return the derivative of loss with respect to weight and bias
def grad(weights, biases):
  with tf.GradientTape() as tape:
    loss_value = loss(weights, biases)
  return tape.gradient(loss_value, [weights, biases])

train_steps = 200
learning_rate = 0.01
# Start with arbitrary values for W and B on the same batch of data
W = tfe.Variable(5.)
B = tfe.Variable(10.)

print("Initial loss: {:.3f}".format(loss(W, B)))

for i in range(train_steps):
  dW, dB = grad(W, B)
  W.assign_sub(dW * learning_rate)
  B.assign_sub(dB * learning_rate)
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(W, B)))

print("Final loss: {:.3f}".format(loss(W, B)))
print("W = {}, B = {}".format(W.numpy(), B.numpy()))

Output (exact numbers may vary):

Initial loss: 71.204
Loss at step 000: 68.333
Loss at step 020: 30.222
Loss at step 040: 13.691
Loss at step 060: 6.508
Loss at step 080: 3.382
Loss at step 100: 2.018
Loss at step 120: 1.422
Loss at step 140: 1.161
Loss at step 160: 1.046
Loss at step 180: 0.996
Final loss: 0.974
W = 3.01582956314, B = 2.1191945076

Replay the tf.GradientTape to compute the gradients and apply them in a training loop. This is demonstrated in an excerpt from the mnist_eager.py example:

dataset = tf.data.Dataset.from_tensor_slices((data.train.images,
                                              data.train.labels))
...
for (batch, (images, labels)) in enumerate(dataset):
  ...
  with tf.GradientTape() as tape:
    logits = model(images, training=True)
    loss_value = loss(logits, labels)
  ...
  grads = tape.gradient(loss_value, model.variables)
  optimizer.apply_gradients(zip(grads, model.variables),
                            global_step=tf.train.get_or_create_global_step())

The following example creates a multi-layer model that classifies the standard MNIST handwritten digits. It demonstrates the optimizer and layer APIs to build trainable graphs in an eager execution environment.

Train a model

Even without training, call the model and inspect the output in eager execution:

# Create a tensor representing a blank image
batch = tf.zeros([1, 1, 784])
print(batch.shape)  # => (1, 1, 784)

result = model(batch)
# => tf.Tensor([[[ 0.  0., ..., 0.]]], shape=(1, 1, 10), dtype=float32)

This example uses the dataset.py module from the TensorFlow MNIST example; download this file to your local directory. Run the following to download the MNIST data files to your working directory and prepare a tf.data.Dataset for training:

import dataset  # download dataset.py file
dataset_train = dataset.train('./datasets').shuffle(60000).repeat(4).batch(32)

To train a model, define a loss function to optimize and then calculate gradients. Use an optimizer to update the variables:

def loss(model, x, y):
  prediction = model(x)
  return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=prediction)

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, model.variables)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)

x, y = iter(dataset_train).next()
print("Initial loss: {:.3f}".format(loss(model, x, y)))

# Training loop
for (i, (x, y)) in enumerate(dataset_train):
  # Calculate derivatives of the input function with respect to its parameters.
  grads = grad(model, x, y)
  # Apply the gradient to the model
  optimizer.apply_gradients(zip(grads, model.variables),
                            global_step=tf.train.get_or_create_global_step())
  if i % 200 == 0:
    print("Loss at step {:04d}: {:.3f}".format(i, loss(model, x, y)))

print("Final loss: {:.3f}".format(loss(model, x, y)))

Output (exact numbers may vary):

Initial loss: 2.674
Loss at step 0000: 2.593
Loss at step 0200: 2.143
Loss at step 0400: 2.009
Loss at step 0600: 2.103
Loss at step 0800: 1.621
Loss at step 1000: 1.695
...
Loss at step 6600: 0.602
Loss at step 6800: 0.557
Loss at step 7000: 0.499
Loss at step 7200: 0.744
Loss at step 7400: 0.681
Final loss: 0.670

And for faster training, move the computation to a GPU:

with tf.device("/gpu:0"):
  for (i, (x, y)) in enumerate(dataset_train):
    # minimize() is equivalent to the grad() and apply_gradients() calls.
    optimizer.minimize(lambda: loss(model, x, y),
                       global_step=tf.train.get_or_create_global_step())

Variables and optimizers

tfe.Variable objects store mutable tf.Tensor values accessed during training to make automatic differentiation easier. The parameters of a model can be encapsulated in classes as variables.

Better encapsulate model parameters by using tfe.Variable with tf.GradientTape. For example, the automatic differentiation example above can be rewritten:

class Model(tf.keras.Model):
  def __init__(self):
    super(Model, self).__init__()
    self.W = tfe.Variable(5., name='weight')
    self.B = tfe.Variable(10., name='bias')
  def predict(self, inputs):
    return inputs * self.W + self.B

# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 2000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# The loss function to be optimized
def loss(model, inputs, targets):
  error = model.predict(inputs) - targets
  return tf.reduce_mean(tf.square(error))

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, [model.W, model.B])

# Define:
# 1. A model.
# 2. Derivatives of a loss function with respect to model parameters.
# 3. A strategy for updating the variables based on the derivatives.
model = Model()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

print("Initial loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

# Training loop
for i in range(300):
  grads = grad(model, training_inputs, training_outputs)
  optimizer.apply_gradients(zip(grads, [model.W, model.B]),
                            global_step=tf.train.get_or_create_global_step())
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(model, training_inputs, training_outputs)))

print("Final loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))
print("W = {}, B = {}".format(model.W.numpy(), model.B.numpy()))

Output (exact numbers may vary):

Initial loss: 69.066
Loss at step 000: 66.368
Loss at step 020: 30.107
Loss at step 040: 13.959
Loss at step 060: 6.769
Loss at step 080: 3.567
Loss at step 100: 2.141
Loss at step 120: 1.506
Loss at step 140: 1.223
Loss at step 160: 1.097
Loss at step 180: 1.041
Loss at step 200: 1.016
Loss at step 220: 1.005
Loss at step 240: 1.000
Loss at step 260: 0.998
Loss at step 280: 0.997
Final loss: 0.996
W = 2.99431324005, B = 2.02129220963

Use objects for state during eager execution

With graph execution, program state (such as the variables) is stored in global collections and their lifetime is managed by the tf.Session object. In contrast, during eager execution the lifetime of state objects is determined by the lifetime of their corresponding Python object.

Variables are objects

During eager execution, variables persist until the last reference to the object is removed, and the variable is then deleted.

with tf.device("gpu:0"):
  v = tfe.Variable(tf.random_normal([1000, 1000]))
  v = None  # v no longer takes up GPU memory

Object-based saving

tfe.Checkpoint can save and restore tfe.Variables to and from checkpoints:

x = tfe.Variable(10.)

checkpoint = tfe.Checkpoint(x=x)  # save as "x"

x.assign(2.)   # Assign a new value to the variables and save.
save_path = checkpoint.save('./ckpt/')

x.assign(11.)  # Change the variable after saving.

# Restore values from the checkpoint
checkpoint.restore(save_path)

print(x)  # => 2.0

To save and load models, tfe.Checkpoint stores the internal state of objects, without requiring hidden variables. To record the state of a model, an optimizer, and a global step, pass them to a tfe.Checkpoint:

import os

model = MyModel()
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
checkpoint_dir = '/path/to/model_dir'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
root = tfe.Checkpoint(optimizer=optimizer,
                      model=model,
                      optimizer_step=tf.train.get_or_create_global_step())

root.save(file_prefix=checkpoint_prefix)
# or
root.restore(tf.train.latest_checkpoint(checkpoint_dir))

Object-oriented metrics

tfe.metrics are stored as objects. Update a metric by passing the new data to the callable, and retrieve the result using the tfe.metrics.result method, for example:

m = tfe.metrics.Mean("loss")
m(0)
m(5)
m.result()  # => 2.5
m([8, 9])
m.result()  # => 5.5

Summaries and TensorBoard

TensorBoard is a visualization tool for understanding, debugging and optimizing the model training process. It uses summary events that are written while executing the program.

tf.contrib.summary is compatible with both eager and graph execution environments. Summary operations, such as tf.contrib.summary.scalar, are inserted during model construction. For example, to record summaries once every 100 global steps:

writer = tf.contrib.summary.create_file_writer(logdir)
global_step=tf.train.get_or_create_global_step()  # return global step var

writer.set_as_default()

for _ in range(iterations):
  global_step.assign_add(1)
  # Must include a record_summaries method
  with tf.contrib.summary.record_summaries_every_n_global_steps(100):
    # your model code goes here
    tf.contrib.summary.scalar('loss', loss)
     ...

Advanced automatic differentiation topics

Dynamic models

tf.GradientTape can also be used in dynamic models. This example for a backtracking line search algorithm looks like normal NumPy code, except there are gradients and is differentiable, despite the complex control flow:

def line_search_step(fn, init_x, rate=1.0):
  with tf.GradientTape() as tape:
    # Variables are automatically recorded, but manually watch a tensor
    tape.watch(init_x)
    value = fn(init_x)
  grad = tape.gradient(value, init_x)
  grad_norm = tf.reduce_sum(grad * grad)
  init_value = value
  while value > init_value - rate * grad_norm:
    x = init_x - rate * grad
    value = fn(x)
    rate /= 2.0
  return x, value
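
A hypothetical invocation (not part of the original guide), using a simple quadratic as fn:

init_x = tf.constant(10.0)
x, value = line_search_step(lambda v: tf.square(v), init_x)
print(x.numpy(), value.numpy())  # the step size is halved until the decrease condition holds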

Additional functions to compute gradients

tf.GradientTape is a powerful interface for computing gradients, but there is another Autograd-style API available for automatic differentiation. These functions are useful if writing math code with only tensors and gradient functions, and without tfe.Variables:

  • tfe.gradients_function —Returns a function that computes the derivatives of its input function parameter with respect to its arguments. The input function parameter must return a scalar value. When the returned function is invoked, it returns a list of tf.Tensor objects: one element for each argument of the input function. Since anything of interest must be passed as a function parameter, this becomes unwieldy if there's a dependency on many trainable parameters.
  • tfe.value_and_gradients_function —Similar to tfe.gradients_function, but when the returned function is invoked, it returns the value from the input function in addition to the list of derivatives of the input function with respect to its arguments.

In the following example, tfe.gradients_function takes the square function as an argument and returns a function that computes the partial derivatives of square with respect to its inputs. To calculate the derivative of square at 3, grad(3.0) returns 6.

def square(x):
  return tf.multiply(x, x)

grad = tfe.gradients_function(square)

square(3.)  # => 9.0
grad(3.)    # => [6.0]

# The second-order derivative of square:
gradgrad = tfe.gradients_function(lambda x: grad(x)[0])
gradgrad(3.)  # => [2.0]

# The third-order derivative is None:
gradgradgrad = tfe.gradients_function(lambda x: gradgrad(x)[0])
gradgradgrad(3.)  # => [None]


# With flow control:
def abs(x):
  return x if x > 0. else -x

grad = tfe.gradients_function(abs)

grad(3.)   # => [1.0]
grad(-3.)  # => [-1.0]

Custom gradients

Custom gradients are an easy way to override gradients in eager and graph execution. Within the forward function, define the gradient with respect to the inputs, outputs, or intermediate results. For example, here's an easy way to clip the norm of the gradients in the backward pass:

@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
  y = tf.identity(x)
  def grad_fn(dresult):
    return [tf.clip_by_norm(dresult, norm), None]
  return y, grad_fn
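
A small, hypothetical check of this behavior (the scaling factor is chosen only for illustration): the forward value passes through unchanged, while the gradient flowing back through clip_gradient_by_norm is clipped to the given norm:

g = tfe.gradients_function(lambda x: 3.0 * clip_gradient_by_norm(x, 0.5))
print(g(tf.constant(1.0)))  # the incoming gradient 3.0 is clipped to 0.5 => [0.5]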

Custom gradients are commonly used to provide a numerically stable gradient for a sequence of operations:

def log1pexp(x):
  return tf.log(1 + tf.exp(x))
grad_log1pexp = tfe.gradients_function(log1pexp)

# The gradient computation works fine at x = 0.
grad_log1pexp(0.)  # => [0.5]

# However, x = 100 fails because of numerical instability.
grad_log1pexp(100.)  # => [nan]

Here, the log1pexp function can be analytically simplified with a custom gradient. The implementation below reuses the value for tf.exp(x) that is computed during the forward pass—making it more efficient by eliminating redundant calculations:

@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.log(1 + e), grad

grad_log1pexp = tfe.gradients_function(log1pexp)

# As before, the gradient computation works fine at x = 0.
grad_log1pexp(0.)  # => [0.5]

# And the gradient computation also works at x = 100.
grad_log1pexp(100.)  # => [1.0]

Performance

Computation is automatically offloaded to GPUs during eager execution. If you want control over where a computation runs you can enclose it in a tf.device('/gpu:0') block (or the CPU equivalent):

import time

def measure(x, steps):
  # TensorFlow initializes a GPU the first time it's used, exclude from timing.
  tf.matmul(x, x)
  start = time.time()
  for i in range(steps):
    x = tf.matmul(x, x)
    _ = x.numpy()  # Make sure to execute op and not just enqueue it
  end = time.time()
  return end - start

shape = (1000, 1000)
steps = 200
print("Time to multiply a {} matrix by itself {} times:".format(shape, steps))

# Run on CPU:
with tf.device("/cpu:0"):
  print("CPU: {} secs".format(measure(tf.random_normal(shape), steps)))

# Run on GPU, if available:
if tfe.num_gpus() > 0:
  with tf.device("/gpu:0"):
    print("GPU: {} secs".format(measure(tf.random_normal(shape), steps)))
else:
  print("GPU: not found")

Output (exact numbers depend on hardware):

Time to multiply a (1000, 1000) matrix by itself 200 times:
CPU: 4.614904403686523 secs
GPU: 0.5581181049346924 secs

A tf.Tensor object can be copied to a different device to execute its operations:

x = tf.random_normal([10, 10])

x_gpu0 = x.gpu()
x_cpu = x.cpu()

_ = tf.matmul(x_cpu, x_cpu)    # Runs on CPU
_ = tf.matmul(x_gpu0, x_gpu0)  # Runs on GPU:0

if tfe.num_gpus() > 1:
  x_gpu1 = x.gpu(1)
  _ = tf.matmul(x_gpu1, x_gpu1)  # Runs on GPU:1

Benchmarks

For compute-heavy models, such as ResNet50 training on a GPU, eager execution performance is comparable to graph execution. But this gap grows larger for models with less computation and there is work to be done for optimizing hot code paths for models with lots of small operations.

Work with graphs

While eager execution makes development and debugging more interactive, TensorFlow graph execution has advantages for distributed training, performance optimizations, and production deployment. However, writing graph code can feel different than writing regular Python code and more difficult to debug.

For building and training graph-constructed models, the Python program first builds a graph representing the computation, then invokes Session.run to send the graph for execution on the C++-based runtime. This provides:

  • Automatic differentiation using static autodiff.
  • Simple deployment to a platform independent server.
  • Graph-based optimizations (common subexpression elimination, constant-folding, etc.).
  • Compilation and kernel fusion.
  • Automatic distribution and replication (placing nodes on the distributed system).

Deploying code written for eager execution is more difficult: either generate a graph from the model, or run the Python runtime and code directly on the server.

Write compatible code

The same code written for eager execution will also build a graph during graph execution. Do this by simply running the same code in a new Python session where eager execution is not enabled.

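A minimal sketch of this idea (the function name is hypothetical; the comments show how the same function behaves in each mode):

def compute(x):
  return tf.matmul(x, x)   # plain TensorFlow ops, no session-specific code

# In an eager session (tf.enable_eager_execution() was called):
#   compute(tf.constant([[2.0]]))  # immediately returns tf.Tensor([[4.]], ...)
#
# In a graph session (a fresh interpreter where eager is NOT enabled):
#   x = tf.placeholder(tf.float32, shape=[1, 1])
#   y = compute(x)                 # builds graph nodes instead of running them
#   with tf.Session() as sess:
#     print(sess.run(y, feed_dict={x: [[2.0]]}))  # => [[4.]]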

Most TensorFlow operations work during eager execution, but there are some things to keep in mind:

  • Use tf.data for input processing instead of queues. It's faster and easier.
  • Use object-oriented layer APIs—like tf.keras.layers and tf.keras.Model—since they have explicit storage for variables.
  • Most model code works the same during eager and graph execution, but there are exceptions. (For example, dynamic models using Python control flow to change the computation based on inputs.)
  • Once eager execution is enabled with tf.enable_eager_execution, it cannot be turned off. Start a new Python session to return to graph execution.

It's best to write code for both eager execution and graph execution. This gives you eager's interactive experimentation and debuggability with the distributed performance benefits of graph execution.

Write, debug, and iterate in eager execution, then import the model graph for production deployment. Use tfe.Checkpoint to save and restore model variables, this allows movement between eager and graph execution environments. See the examples in: tensorflow/contrib/eager/python/examples.

Use eager execution in a graph environment

Selectively enable eager execution in a TensorFlow graph environment using tfe.py_func. This is used when tf.enable_eager_execution() has not been called.

def my_py_func(x):
  x = tf.matmul(x, x)  # You can use tf ops
  print(x)  # but it's eager!
  return x

with tf.Session() as sess:
  x = tf.placeholder(dtype=tf.float32)
  # Call eager function in graph!
  pf = tfe.py_func(my_py_func, [x], tf.float32)
  sess.run(pf, feed_dict={x: [[2.0]]})  # [[4.0]]


Complete project download

If you do not have download credits, add QQ 452205574 to get the shared folder.

It includes the code, datasets (images), pre-trained models, library installation files, and so on.
