Previously we introduced Tensors and the operations on them. Next we look at automatic differentiation, the key technique for optimizing model parameters.
TensorFlow provides APIs for automatic differentiation, i.e. for computing the derivative of a function. The approach closest to the mathematics is to first write a Python function that encapsulates the computation on its arguments, and then use tf.contrib.eager.gradients_function to create a new function that computes the derivative of the wrapped function (you can specify which argument to differentiate with respect to). Nesting this call yields higher-order derivatives.
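As a minimal sketch of that API (assuming the same TF 1.x eager setup as the code further below; the helper function mul is purely illustrative, and the params argument used to pick which argument to differentiate against is an assumption about gradients_function rather than something shown in this post):

import tensorflow as tf
tf.enable_eager_execution()
tfe = tf.contrib.eager

def mul(a, b):
    return a * b

# By default, gradients_function returns one gradient per argument
grads = tfe.gradients_function(mul)(3.0, 4.0)
print(grads[0].numpy(), grads[1].numpy())  # d(ab)/da = 4.0, d(ab)/db = 3.0

# Assumption: params=[1] restricts differentiation to the second argument
d_db = tfe.gradients_function(mul, params=[1])(3.0, 4.0)
print(d_db[0].numpy())  # 3.0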
What is a gradient tape? Every differentiable TF primitive operation has a corresponding gradient function. When the user's wrapped function runs, TF records the sequence of operations it performs; this record is called the tape. Given the tape and the gradient of each primitive operation (the TF functions mentioned above), reverse-mode differentiation can then compute the gradient of the user's function.
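To make the tape idea concrete, here is a toy, framework-free sketch in plain Python (an illustration of the idea only, not TensorFlow's actual implementation): each primitive records its output and the local gradients of its inputs during the forward pass, and a backward pass then walks that record in reverse order.

import math

tape = []  # each entry: (output_name, {input_name: local_gradient})

def t_sin(x, in_name, out_name):
    tape.append((out_name, {in_name: math.cos(x)}))  # d sin(x)/dx = cos(x)
    return math.sin(x)

def t_square(x, in_name, out_name):
    tape.append((out_name, {in_name: 2.0 * x}))      # d x^2/dx = 2x
    return x * x

# Forward pass for f(x) = sin(x)^2, recording onto the tape
x = math.pi / 3
s = t_sin(x, "x", "s")
y = t_square(s, "s", "y")

# Reverse pass: walk the tape backwards, accumulating dy/d(value)
grads = {"y": 1.0}
for out_name, local_grads in reversed(tape):
    for in_name, g in local_grads.items():
        grads[in_name] = grads.get(in_name, 0.0) + grads[out_name] * g

print(grads["x"], math.sin(2 * x))  # both equal d/dx sin(x)^2 = sin(2x)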
Sometimes the computation is not convenient to wrap in a single function, for example when you need the gradient with respect to an intermediate value. In that case you can use the tf.GradientTape context (Python's context-manager mechanism): all computation inside the context of a tf.GradientTape is "recorded".
Higher-order gradients
All operations inside the GradientTape context manager are recorded for automatic differentiation. If gradients are computed in that context, then the gradient computation is recorded as well. As a result, the exact same API works for higher-order gradients.
See the code below for details:
import tensorflow as tf
import matplotlib.pyplot as plt
from math import pi
tf.enable_eager_execution()
tfe = tf.contrib.eager
def f(x):
    return tf.square(tf.sin(x))

print(f(pi/2).numpy() == 1.0)

grad_f = tfe.gradients_function(f)  # gradient function of f
print(tf.abs(grad_f((pi/2, pi))).numpy())  # |df/dx| at pi/2 and pi (both ~ 0)
# Higher-order gradients
def grad(f):
    # Returns a function that evaluates the first derivative of f
    return lambda x: tfe.gradients_function(f)(x)[0]

x = tf.lin_space(-2*pi, 2*pi, 100)  # 100 points between -2*pi and 2*pi
plt.plot(x, f(x), label='f')
plt.plot(x, grad(f)(x), label='first derivative')
plt.plot(x, grad(grad(f))(x), label='second derivative')
plt.plot(x, grad(grad(grad(f)))(x), label='third derivative')
plt.legend()
plt.show()
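# A quick sanity check (added as an aside, using the identity
# d/dx sin(x)^2 = sin(2x)): the computed first derivative should match
# the closed form up to floating-point error.
print(tf.reduce_max(tf.abs(grad(f)(x) - tf.sin(2 * x))).numpy())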
'''
Gradient tapes
'''
import tensorflow as tf
tf.enable_eager_execution()
tfe = tf.contrib.eager # shorthand for some symbols
# x^y
def f(x, y):
    output = 1
    # Must use range(int(y)) instead of range(y) in Python 3 when
    # using TensorFlow 1.10 and earlier. Can use range(y) in 1.11+.
    for _ in range(int(y)):  # a plain Python for loop works under eager execution
        output = tf.multiply(output, x)
    return output
# d x^y / d x
def g(x, y):
    # gradients_function returns one gradient per argument; [0] selects
    # the gradient of f with respect to its first parameter, x
    return tfe.gradients_function(f)(x, y)[0]
print(f(3, 2).numpy())    # 9
print(g(3.0, 2).numpy())  # 6.0
print(f(3, 3).numpy())    # 27
print(g(3.0, 3).numpy())  # 27.0
'''
At times it may be inconvenient to encapsulate the computation of
interest into a function. For example, if you want the gradient
of the output with respect to intermediate values computed in
the function. In such cases, the slightly more verbose but
explicit **tf.GradientTape** context is useful. All computation
inside the context of a **tf.GradientTape** is "recorded".
'''
import tensorflow as tf
tf.enable_eager_execution()
tfe = tf.contrib.eager # shorthand for some symbols
x = tf.ones((2, 2))
with tf.GradientTape(persistent=True) as t:  # a persistent tape can be queried more than once
    t.watch(x)
    y = tf.reduce_sum(x)   # sum of all elements of x (a scalar)
    z = tf.multiply(y, y)

# Use the same tape to compute the derivative of z with
# respect to the intermediate value y
dz_dy = t.gradient(z, y)  # dz/dy = 2*y = 8
print(dz_dy.numpy())
# Derivative of z with respect to the original input tensor x
dz_dx = t.gradient(z, x)  # dz/dx = 2*y * dy/dx = 8 for every element of x
print(dz_dx.numpy())
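# Optional extra step (not in the original snippet): because the tape was
# created with persistent=True it can be queried again, e.g. for dy/dx;
# drop the reference afterwards so its resources can be released.
dy_dx = t.gradient(y, x)  # dy/dx = 1 for every element of x
print(dy_dx.numpy())
del t  # release resources held by the persistent tape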
'''
higher-order gradients
Operations inside of the GradientTape context manager
are recorded for automatic differentiation.
If gradients are computed in that context,
then the gradient computation is recorded as well.
As a result, the exact same API works for higher-order
gradients as well. For example:
'''
x = tf.Variable(1.0)  # create a TensorFlow Variable from the Python float 1.0
with tf.GradientTape() as t:
    with tf.GradientTape() as t2:
        y = x * x * x
    # Compute the gradient inside the 't' context manager,
    # which means the gradient computation is differentiable as well.
    dy_dx = t2.gradient(y, x)
d2y_dx2 = t.gradient(dy_dx, x)
print(dy_dx.numpy(), d2y_dx2.numpy())  # 3.0 6.0
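# A minimal closing sketch (not from the original): the same tape API is what
# drives parameter optimization: compute the gradient of a toy loss and step a
# Variable against it by plain gradient descent. Names below are illustrative.
w = tf.Variable(5.0)                  # hypothetical parameter to optimize
learning_rate = 0.1
for _ in range(100):
    with tf.GradientTape() as tape:
        loss = tf.square(w - 3.0)     # toy loss, minimized at w = 3
    dw = tape.gradient(loss, w)       # Variables are watched automatically
    w.assign_sub(learning_rate * dw)  # one gradient-descent step
print(w.numpy())                      # approximately 3.0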