GradientTape is one of the most commonly used features in TensorFlow 2.x: as soon as you need to compute gradients, there is no way around this new API.
persistent: Boolean controlling whether a persistent gradient tape is created. False by default, which means at most one call can be made to the gradient() method on this object.
watch_accessed_variables: Boolean controlling whether the tape will automatically watch any (trainable) variables accessed while the tape is active. Defaults to True meaning gradients can be requested from any result computed in the tape derived from reading a trainable Variable. If False users must explicitly watch any Variables they want to request gradients from.
In short: with persistent=False (the default), gradient() can be called at most once on the tape; with persistent=True it can be called multiple times.
watch_accessed_variables defaults to True, so the tape automatically watches every trainable Variable it sees and can return gradients for any result derived from one.
If it is False, you must explicitly call watch() on each variable whose gradient you want.
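To make the second parameter concrete, here is a minimal sketch (the variable v is ours, not part of the original examples): with watch_accessed_variables=False, even a trainable Variable yields None unless it is explicitly watched.
import tensorflow as tf

v = tf.Variable(2.0)
with tf.GradientTape(watch_accessed_variables=False) as tape:
    y = v * v               # v is NOT watched automatically here
print(tape.gradient(y, v))  # None: the tape recorded nothing for v

with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(v)           # opt in explicitly
    y = v * v
print(tape.gradient(y, v))  # tf.Tensor(4.0, shape=(), dtype=float32)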
import tensorflow as tf

x = tf.Variable(initial_value=3.0)
with tf.GradientTape() as g:
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
dy_dx = g.gradient(y, x)  # Second call on a non-persistent tape
print(dy_dx)
Output: the first call to gradient() returns 6.0, while the second one throws, because persistent defaults to False (GradientTape.gradient can only be called once on non-persistent tapes):
tf.Tensor(6.0, shape=(), dtype=float32)
Traceback (most recent call last):
File "**/GradientTape_test.py", line 70, in <module>
test1()
File "**/test/GradientTape_test.py", line 11, in test1
dy_dx = g.gradient(y, x) # Will compute to 6.0
File "**\lib\site-packages\tensorflow_core\python\eager\backprop.py", line 980, in gradient
raise RuntimeError("GradientTape.gradient can only be called once on "
RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.
import tensorflow as tf

x = tf.Variable(initial_value=3.0)
with tf.GradientTape(persistent=True) as g:
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
dy_dx = g.gradient(y, x)
print(dy_dx)
Output:
tf.Tensor(6.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
import tensorflow as tf

x = tf.Variable(initial_value=3.0)
with tf.GradientTape(persistent=True) as g:
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g1:
    y = x * x
dy_dx = g1.gradient(y, x)  # None: x is a constant and was never watched
print(dy_dx)

with tf.GradientTape(persistent=True) as g1:
    g1.watch(x)
    y = x * x
dy_dx = g1.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
Output: if you define the tensor with tf.constant and want to differentiate with respect to it, you must call watch() on it; otherwise gradient() returns None:
tf.Tensor(6.0, shape=(), dtype=float32)
None
tf.Tensor(6.0, shape=(), dtype=float32)
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    with tf.GradientTape() as gg:
        gg.watch(x)
        y = x * x
    dy_dx = gg.gradient(y, x)  # Will compute to 6.0, recorded by the outer tape g
print(dy_dx)
d2y_dx2 = g.gradient(dy_dx, x)  # Will compute to 2.0
print(d2y_dx2)
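Since trainable Variables are watched automatically, the same second-order pattern also works without any watch() calls; a small variation on the example above (the cubic y = x³ is our choice, just so the two derivatives differ):
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as g:
    with tf.GradientTape() as gg:
        y = x * x * x              # y = x^3
    dy_dx = gg.gradient(y, x)      # 3*x^2 = 27.0, recorded by the outer tape
d2y_dx2 = g.gradient(dy_dx, x)     # 6*x = 18.0
print(dy_dx)
print(d2y_dx2)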
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x
    z = y * y
dz_dx = g.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
print(dz_dx)
dy_dx = g.gradient(y, x)  # 6.0
print(dy_dx)
del g  # Drop the reference to the tape
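A persistent tape keeps the intermediate tensors it recorded alive until the tape object itself is released, which is why the example deletes g as soon as all the required gradients have been taken.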
import tensorflow as tf

x = tf.ones((2, 2))
print(x)
y = tf.reduce_sum(x)
print(y)
z = tf.multiply(y, y)
print(z)
# Record the operations whose gradient we need
with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)
# Compute the gradient of z with respect to x
dz_dx = t.gradient(z, x)
print(dz_dx)
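To see why, note that y = Σ x_ij = 4 and z = y², so ∂z/∂x_ij = 2y = 8.0 for every element: dz_dx is a 2×2 tensor filled with 8.0.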
import tensorflow as tf

x = tf.constant(value=3.0)
y = tf.constant(value=2.0)
with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    tape.watch([x, y])
    z1 = x * x * y + x * y
# First-order derivatives
dz1_dx = tape.gradient(target=z1, sources=x)
dz1_dy = tape.gradient(target=z1, sources=y)
dz1_d = tape.gradient(target=z1, sources=[x, y])
print("dz1_dx:", dz1_dx)
print("dz1_dy:", dz1_dy)
print("dz1_d:", dz1_d)
print("type of dz1_d:", type(dz1_d))
Output:
dz1_dx: tf.Tensor(14.0, shape=(), dtype=float32)
dz1_dy: tf.Tensor(12.0, shape=(), dtype=float32)
dz1_d: [<tf.Tensor: shape=(), dtype=float32, numpy=14.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>]
type of dz1_d: <class 'list'>
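Besides lists, sources can be any nested structure of tensors, and gradient() returns a result with the same structure. A small sketch of the same function using a dict of Variables (the key names are ours):
import tensorflow as tf

x = tf.Variable(3.0)
y = tf.Variable(2.0)
with tf.GradientTape() as tape:
    z1 = x * x * y + x * y
grads = tape.gradient(z1, {"x": x, "y": y})  # result mirrors the dict structure
print(grads["x"])  # 2*x*y + y = 14.0
print(grads["y"])  # x*x + x = 12.0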
apply_gradients()
Purpose: apply the computed gradients to the corresponding variables.
Arguments:
grads_and_vars: a list of (gradient, variable) pairs.
name: a name for the operation.
This is the second part of minimize(). It returns an Operation that applies gradients.
Args:
grads_and_vars: List of (gradient, variable) pairs.
name: Optional name for the returned operation. Default to the name passed to the Optimizer constructor.
Returns:
An Operation that applies the specified gradients. The iterations will be automatically increased by 1.
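A minimal sketch of this two-step pattern (the variable and the loss below are ours, chosen so the arithmetic is easy to check by hand):
import tensorflow as tf

var = tf.Variable(1.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
with tf.GradientTape() as tape:
    loss = var * var                    # d(loss)/d(var) = 2*var = 2.0
grads = tape.gradient(loss, [var])
opt.apply_gradients(zip(grads, [var]))  # var <- 1.0 - 0.1 * 2.0
print(var.numpy())                      # 0.8
print(opt.iterations.numpy())           # 1: the step counter went up by 1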
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

TRAIN_STEPS = 20

# Prepare train data
train_X = np.linspace(-1, 1, 100)
train_Y = 2 * train_X + np.random.randn(*train_X.shape) * 0.33 + 10
print(train_X.shape)

w = tf.Variable(initial_value=1.0)
b = tf.Variable(initial_value=1.0)
optimizer = tf.keras.optimizers.SGD(0.1)
mse = tf.keras.losses.MeanSquaredError()

for i in range(TRAIN_STEPS):
    print("epoch:", i)
    print("w:", w.numpy())
    print("b:", b.numpy())
    # Compute and apply the gradients
    with tf.GradientTape() as tape:
        logit = w * train_X + b
        loss = mse(train_Y, logit)
    gradients = tape.gradient(target=loss, sources=[w, b])  # compute the gradients
    # print("gradients:", gradients)
    # print("zip:\n", list(zip(gradients, [w, b])))
    optimizer.apply_gradients(zip(gradients, [w, b]))  # apply the gradients

# draw
plt.plot(train_X, train_Y, "+")
plt.plot(train_X, w * train_X + b)
plt.show()
Output: as the epochs go by, w and b move steadily towards 2 and 10:
epoch: 0
w: 1.0
b: 1.0
epoch: 1
w: 1.0676092
b: 2.7953496
epoch: 2
w: 1.13062
b: 4.231629
epoch: 3
w: 1.1893452
b: 5.3806524
epoch: 4
w: 1.2440765
b: 6.2998714
epoch: 5
w: 1.2950852
b: 7.035247
epoch: 6
w: 1.3426247
b: 7.623547
epoch: 7
w: 1.3869308
b: 8.094187
epoch: 8
w: 1.4282235
b: 8.470699
epoch: 9
w: 1.4667077
b: 8.771909
epoch: 10
w: 1.5025746
b: 9.0128765
epoch: 11
w: 1.5360019
b: 9.20565
epoch: 12
w: 1.5671558
b: 9.35987
epoch: 13
w: 1.5961908
b: 9.483246
epoch: 14
w: 1.6232511
b: 9.581946
epoch: 15
w: 1.6484709
b: 9.660907
epoch: 16
w: 1.6719754
b: 9.724075
epoch: 17
w: 1.6938813
b: 9.77461
epoch: 18
w: 1.7142972
b: 9.815037
epoch: 19
w: 1.7333245
b: 9.847379
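Incidentally, the gradient computation and the apply_gradients() call can be fused into a single optimizer.minimize() call, which in TF 2.x accepts the loss as a zero-argument callable. A sketch of the same regression in that style (version-dependent, so treat it as an assumption rather than a drop-in replacement):
import numpy as np
import tensorflow as tf

train_X = np.linspace(-1, 1, 100)
train_Y = 2 * train_X + np.random.randn(*train_X.shape) * 0.33 + 10

w = tf.Variable(1.0)
b = tf.Variable(1.0)
optimizer = tf.keras.optimizers.SGD(0.1)
mse = tf.keras.losses.MeanSquaredError()

for i in range(20):
    # minimize() records the loss on a tape, computes the gradients
    # and applies them, all in one call
    optimizer.minimize(lambda: mse(train_Y, w * train_X + b), var_list=[w, b])

print(w.numpy(), b.numpy())  # should again approach 2 and 10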
Because the tape records the operations as they are actually executed, gradients are computed correctly even when the code contains Python control flow (such as if or while).
import tensorflow as tf

def f(x, y):
    output = 1.0
    # Loop driven by y
    for i in range(y):
        # Multiply only on selected iterations
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output

def grad(x, y):
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x, y)
    # Return the gradient of out with respect to x
    return t.gradient(out, x)

# x is a fixed value
x = tf.convert_to_tensor(2.0)
print(grad(x, 6))
print(grad(x, 5))
print(grad(x, 4))
Output: for y = 6 and y = 5 the loop multiplies by x three times (i = 2, 3, 4), so out = x³ and the gradient is 3x² = 12 at x = 2; for y = 4 it multiplies only twice, so out = x² and the gradient is 2x = 4:
tf.Tensor(12.0, shape=(), dtype=float32)
tf.Tensor(12.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)
[0] https://www.tensorflow.org/api_docs/python/tf/GradientTape
[1] https://blog.csdn.net/xierhacker/article/details/53174558
[2] https://blog.csdn.net/qq_36758914/article/details/104456736