【cs231n Lesson4】Backpropagation

Personal study notes
Date: 2023.01.06
Reference: official cs231n notes

Simple expressions and interpreting the gradient

Consider the following expression:
$$f(x,y)=xy \;\to\; \frac{\partial f}{\partial x}=y,\quad \frac{\partial f}{\partial y}=x$$


$$\frac{\mathrm{d}f(x)}{\mathrm{d}x}=\lim_{h \to 0}\frac{f(x+h)-f(x)}{h}$$
from which, for small $h$,
$$f(x+h) \approx f(x)+h\,\frac{\mathrm{d}f(x)}{\mathrm{d}x}$$

For example, if $x=4, y=-3$, the partial derivative with respect to $x$ is $-3$. This means that if $x$ increases by a small amount $h$, then $f$ decreases, and the decrease is $3h$.

The gradient is the vector of partial derivatives, i.e. $\nabla f =\left[\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right]$. So "the gradient on $x$" is simply the partial derivative of $f$ with respect to $x$.

Common cases:

1. Multiplication: e.g. $f = xy$; the partial derivative with respect to one input is the other factor, e.g. $\frac{\partial f}{\partial x} = y$.
2. Addition: e.g. $f = 2x+y$; the partial derivative is the coefficient in front of the variable, e.g. $\frac{\partial f}{\partial x}=2$.
3. Max: e.g. $f=\max(x,y)$; if $x>y$, then $\frac{\partial f}{\partial x}=1$ and $\frac{\partial f}{\partial y}=0$. This says the function is sensitive to changes in $x$ and insensitive to changes in $y$: moving $y$ by a small amount does not change $f$, while moving $x$ by a small amount changes $f$ by the same amount. Remember the derivative is defined as $h \to 0$, so large changes are outside its scope (a quick numerical check of these cases follows below).
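
As a sanity check, the finite-difference definition above can verify these hand-derived results numerically. A minimal sketch (the helper num_grad and the step size h are my own choices, not from the notes):

```python
def num_grad(f, x, y, h=1e-5):
    # centered finite-difference estimate of (df/dx, df/dy) at (x, y)
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

# f(x, y) = x * y at x = 4, y = -3: analytic gradient is (y, x) = (-3, 4)
print(num_grad(lambda a, b: a * b, 4.0, -3.0))      # ~(-3.0, 4.0)

# f(x, y) = max(x, y) at x = 4, y = -3: analytic gradient is (1, 0) since x > y
print(num_grad(lambda a, b: max(a, b), 4.0, -3.0))  # ~(1.0, 0.0)
```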

Understanding expressions with the chain rule

Suppose we have the expression
$$f(x,y,z) = (x+y)z$$
Let $q=x+y$, so $f=qz$, which gives the partial derivatives
$$\frac{\partial f}{\partial q}=z,\quad \frac{\partial f}{\partial z}=q,\quad \frac{\partial q}{\partial x}=1,\quad \frac{\partial q}{\partial y}=1$$
What we actually want are the partial derivatives of $f$ with respect to $x, y, z$; by the chain rule, $\frac{\partial f}{\partial x}=\frac{\partial f}{\partial q}\frac{\partial q}{\partial x}=z\cdot 1=z$.

The backpropagation mechanism is shown below.
[Figure: backpropagation through the circuit for f(x,y,z) = (x+y)z]
Backpropagation is simply: multiply the gradient flowing back from the node above by the local partial derivative, which is fairly intuitive. For example:
The partial derivative of $f$ with respect to $x$ is $-4$; in other words, the gradient on $x$ is $-4$. This means that if the input $x$ grows, say from $-2$ to $-1$, then $q$ becomes $4$ and $f$ becomes $-16$: the output $f$ shrinks, and it shrinks by $4$ times the amount $x$ grew (hence the $-4$).

Modularity: a sigmoid example

$$f(w,x) = \frac{1}{1+e^{-(w_0x_0+w_1x_1+w_2)}}$$
Its backpropagation diagram is shown below (not walked through here).
[Figure: circuit diagram for the sigmoid neuron f(w,x)]
Here the inputs are [x0, x1] and [w0, w1, w2] are the learnable weights; more on this later.

To modularize, define
$$\sigma(x)=\frac{1}{1+e^{-x}} \;\to\; \frac{\mathrm{d}\sigma(x)}{\mathrm{d}x}=(1-\sigma(x))\,\sigma(x)$$
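
For completeness, the stated derivative follows in one line (my own derivation of the result quoted above):
$$\frac{\mathrm{d}\sigma(x)}{\mathrm{d}x}=\frac{e^{-x}}{(1+e^{-x})^{2}}=\frac{1+e^{-x}-1}{1+e^{-x}}\cdot\frac{1}{1+e^{-x}}=\left(1-\sigma(x)\right)\sigma(x)$$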

When the input is $x=1$, the output is $\sigma(x)=0.73$, so the local gradient is $(1-0.73)\times 0.73 \approx 0.2$, which is much simpler than differentiating term by term. In practice, complicated sub-expressions are often grouped into modules like this.
In the program below, we introduce the intermediate variable dot and call the output f. The gradient on the intermediate variable is (1 - f) * f, stored as ddot, and the gradient on x is then [w[0]*ddot, w[1]*ddot].

import math

w = [2,-3,-3] # assume some random weights and data
x = [-1, -2]

# forward pass
dot = w[0]*x[0] + w[1]*x[1] + w[2]
f = 1.0 / (1 + math.exp(-dot)) # sigmoid function

# backward pass through the neuron (backpropagation)
ddot = (1 - f) * f # gradient on dot variable, using the sigmoid gradient derivation
dx = [w[0] * ddot, w[1] * ddot] # backprop into x
dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot] # backprop into w
# we're done! we have the gradients on the inputs to the circuit
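
A quick way to convince yourself that dx above is correct is a finite-difference check. This snippet is my own addition (the helper neuron and the step size h are not from the notes) and reuses w, x, dx, and the math import from the block above:

```python
h = 1e-5

def neuron(w, x):
    # recompute the sigmoid neuron's forward pass
    return 1.0 / (1 + math.exp(-(w[0]*x[0] + w[1]*x[1] + w[2])))

for i in range(2):
    x_bumped = list(x)
    x_bumped[i] += h
    numeric = (neuron(w, x_bumped) - neuron(w, x)) / h
    print(numeric, dx[i])  # the two numbers should closely agree
```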

A function that is useless in practice but makes a good example

$$f(x,y) = \frac{x+\sigma(y)}{\sigma(x)+(x+y)^2}$$

First break it into stages (staged computation); the code is as follows:

import math

x = 3 # example values
y = -4

# forward pass
sigy = 1.0 / (1 + math.exp(-y)) # sigmoid in numerator   #(1)
num = x + sigy # numerator                               #(2)
sigx = 1.0 / (1 + math.exp(-x)) # sigmoid in denominator #(3)
xpy = x + y                                              #(4)
xpysqr = xpy**2                                          #(5)
den = sigx + xpysqr # denominator                        #(6)
invden = 1.0 / den                                       #(7)
f = num * invden # done!                                 #(8)

After the forward pass, backpropagate in reverse order; note which gradient each line of code is meant to produce.

# backprop f = num * invden
dnum = invden # gradient on numerator                             #(8)
dinvden = num                                                     #(8)
# backprop invden = 1.0 / den 
dden = (-1.0 / (den**2)) * dinvden                                #(7)
# backprop den = sigx + xpysqr
dsigx = (1) * dden                                                #(6)
dxpysqr = (1) * dden                                              #(6)
# backprop xpysqr = xpy**2
dxpy = (2 * xpy) * dxpysqr                                        #(5)
# backprop xpy = x + y
dx = (1) * dxpy                                                   #(4)
dy = (1) * dxpy                                                   #(4)
# backprop sigx = 1.0 / (1 + math.exp(-x))
dx += ((1 - sigx) * sigx) * dsigx # Notice += !! See notes below  #(3)
# backprop num = x + sigy
dx += (1) * dnum                                                  #(2)
dsigy = (1) * dnum                                                #(2)
# backprop sigy = 1.0 / (1 + math.exp(-y))
dy += ((1 - sigy) * sigy) * dsigy                                 #(1)
# done! phew
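
Again, a finite-difference check is an easy way to verify dx and dy. This snippet is my own addition (the helper f_xy is not from the notes) and reuses x, y, dx, dy, and the math import from the code above:

```python
def f_xy(a, b):
    # recompute f(x, y) = (x + sigmoid(y)) / (sigmoid(x) + (x + y)^2)
    sig = lambda v: 1.0 / (1 + math.exp(-v))
    return (a + sig(b)) / (sig(a) + (a + b)**2)

h = 1e-6
print((f_xy(x + h, y) - f_xy(x, y)) / h, dx)  # should closely agree
print((f_xy(x, y + h) - f_xy(x, y)) / h, dy)  # should closely agree
```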

My own rough sketch:
[Figure: hand-drawn sketch of the computation graph for f(x,y)]

Vectorized operations

Here we look at how to use dimension analysis to work out the matrix products for the gradients.

import numpy as np

# forward pass
W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
D = W.dot(X)

# now suppose we had the gradient on D from above in the circuit
dD = np.random.randn(*D.shape) # same shape as D
dW = dD.dot(X.T) #.T gives the transpose of the matrix
dX = W.T.dot(dD)

dW and dX should have the same shapes as W and X, respectively.
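
A handy rule of thumb is that the shapes force the expressions: dD has shape (5, 3) and X has shape (10, 3), so the only way to build a (5, 10) matrix for dW is dD.dot(X.T). As a sanity check, here is a small numerical spot-check of one entry of dW. It is my own addition and assumes the scalar surrogate loss np.sum(W.dot(X) * dD), whose gradient with respect to D is exactly dD; it reuses W, X, dD, and dW from the block above:

```python
def loss(W_):
    # scalar surrogate loss whose gradient with respect to D is exactly dD
    return np.sum(W_.dot(X) * dD)

h = 1e-5
i, j = 2, 7                        # spot-check a single entry of dW
W_bumped = W.copy()
W_bumped[i, j] += h
numeric = (loss(W_bumped) - loss(W)) / h
print(numeric, dW[i, j])           # should closely agree
```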
