Technical English (5: Backpropagation)

Computational graphs

[Figure 1]

Convolutional network

[Figure 2]

Neural Turing Machine

[Figure 3]

Why do we need backpropagation?

Problem statement

The core problem studied in this section is as follows: we are given some function f(x), where x is a vector of inputs, and we are interested in computing the gradient of f at x (i.e., ∇f(x)).

Motivation

In the specific case of neural networks, f(x) corresponds to the loss function L, and the inputs x consist of the training data and the network weights. The training data are given and fixed, while the parameters (e.g. W and b) are the variables we control. We therefore need the gradient with respect to all the parameters so that we can use it to perform a parameter update.
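As a concrete illustration (not from the slides), once backpropagation has produced the gradients, a vanilla gradient-descent update uses them directly; W, b, dW, db and the learning rate below are placeholder names and values:

import numpy as np

# Hypothetical parameters and placeholder gradients that backprop would
# have produced for the current batch (dW = dL/dW, db = dL/db).
learning_rate = 1e-3
W = np.random.randn(10, 5)
b = np.zeros(5)
dW = np.random.randn(10, 5)
db = np.random.randn(5)

# Vanilla gradient descent: step each parameter against its gradient.
W -= learning_rate * dW
b -= learning_rate * db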

Backpropagation: a simple example

[Figure 4]

  • Q1: What is an add gate?

  • Q2: What is a max gate?

  • Q3: What is a mul gate?

  • A1: add gate: gradient distributor

  • A2: max gate: gradient router

  • A3: mul gate: gradient switcher (see the sketch after this list)
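The answers above can be made concrete with a minimal sketch for scalar inputs; the function names and example values are illustrative only, assuming an upstream gradient dout = dL/dout:

def add_backward(x, y, dout):
    # add gate: distributes the upstream gradient unchanged to both inputs
    return dout, dout

def max_backward(x, y, dout):
    # max gate: routes the full upstream gradient to the larger input only
    return (dout, 0.0) if x >= y else (0.0, dout)

def mul_backward(x, y, dout):
    # mul gate: each input receives the upstream gradient scaled by the
    # *other* input's value (the inputs are "switched")
    return dout * y, dout * x

print(add_backward(3.0, -4.0, 2.0))  # (2.0, 2.0)
print(max_backward(3.0, -4.0, 2.0))  # (2.0, 0.0)
print(mul_backward(3.0, -4.0, 2.0))  # (-8.0, 6.0)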

Another example:

[Figures 5-9]

What about autograd?

  • Deep learning frameworks can automatically perform backprop! (See the sketch below.)
  • Problems related to the underlying gradients might still surface when
    debugging your models.
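As one possible illustration, PyTorch is used here (the slides do not name a specific framework); autograd records the forward computation and .backward() runs backprop, using the same function as the worked example further below:

import torch

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(-4.0, requires_grad=True)

# the forward pass builds the computational graph automatically
f = (x + torch.sigmoid(y)) / (torch.sigmoid(x) + (x + y) ** 2)

f.backward()              # autograd performs the backward pass
print(x.grad, y.grad)     # df/dx and df/dy at (3, -4)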

Modularized implementation: forward / backward API

[Figures 10-11]
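A minimal sketch of what one node in such an implementation might look like, assuming a scalar multiply gate; the class and method names are illustrative, not the exact code from the slides:

class MultiplyGate:
    def forward(self, x, y):
        # compute the result and cache the inputs needed for the backward pass
        self.x, self.y = x, y
        return x * y

    def backward(self, dout):
        # chain rule: local gradients (y, x) times the upstream gradient
        dx = self.y * dout
        dy = self.x * dout
        return dx, dy

gate = MultiplyGate()
out = gate.forward(3.0, -4.0)   # forward: -12.0
dx, dy = gate.backward(1.0)     # backward: (-4.0, 3.0)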


For example:

f(x, y) = \frac{x + \sigma(y)}{\sigma(x) + (x + y)^2}, \quad \text{where } \sigma(x) = \frac{1}{1 + e^{-x}}

Code pattern for building the forward pass:

import math

x = 3 # example values
y = -4

# forward pass
sigy = 1.0 / (1 + math.exp(-y)) # sigmoid in the numerator     #(1)
num = x + sigy # numerator                                     #(2)
sigx = 1.0 / (1 + math.exp(-x)) # sigmoid in the denominator   #(3)
xpy = x + y                                                    #(4)
xpysqr = xpy**2                                                #(5)
den = sigx + xpysqr # denominator                              #(6)
invden = 1.0 / den                                             #(7)
f = num * invden # done!                                       #(8)
Code pattern for building the backward pass:

# backprop f = num * invden
dnum = invden # gradient of the numerator                             #(8)
dinvden = num                                                         #(8)
# backprop invden = 1.0 / den
dden = (-1.0 / (den**2)) * dinvden                                    #(7)
# backprop den = sigx + xpysqr
dsigx = (1) * dden                                                    #(6)
dxpysqr = (1) * dden                                                  #(6)
# backprop xpysqr = xpy**2
dxpy = (2 * xpy) * dxpysqr                                            #(5)
# backprop xpy = x + y
dx = (1) * dxpy                                                       #(4)
dy = (1) * dxpy                                                       #(4)
# backprop sigx = 1.0 / (1 + math.exp(-x))
dx += ((1 - sigx) * sigx) * dsigx # Notice += !! gradients from branches accumulate  #(3)
# backprop num = x + sigy
dx += (1) * dnum                                                      #(2)
dsigy = (1) * dnum                                                    #(2)
# backprop sigy = 1.0 / (1 + math.exp(-y))
dy += ((1 - sigy) * sigy) * dsigy                                     #(1)
# done!
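A quick sanity check (not part of the original notes) is to compare the analytic dx, dy above against centered finite differences of the same function:

# numerical gradient check with centered differences
def f_of(x, y):
    sigy = 1.0 / (1 + math.exp(-y))
    sigx = 1.0 / (1 + math.exp(-x))
    return (x + sigy) / (sigx + (x + y) ** 2)

h = 1e-5
dx_num = (f_of(x + h, y) - f_of(x - h, y)) / (2 * h)
dy_num = (f_of(x, y + h) - f_of(x, y - h)) / (2 * h)
print(dx_num, dy_num)  # should closely match dx and dy computed above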

  • Forward pass: compute values from the inputs to the output (shown in green).
  • Backward pass: starting from the end, recursively apply the chain rule to compute the gradients (shown in red) all the way back to the inputs of the network.

The gradient \frac{\partial C}{\partial w} can be decomposed into two parts:

1. \frac{\partial z}{\partial w}: this is the forward pass (it is actually the last step of the differentiation and can be obtained directly by going from front to back).

2. \frac{\partial C}{\partial z}: this is the backward pass (it cannot be computed directly; it is computed recursively from back to front).
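A worked instance of this decomposition, assuming a single unit whose pre-activation is z = wa + b (a is the incoming activation; a and b are illustrative names not defined in the original):

% forward pass supplies dz/dw directly; dC/dz arrives recursively from later layers
\[
  z = w a + b, \qquad
  \frac{\partial C}{\partial w}
  = \frac{\partial z}{\partial w} \cdot \frac{\partial C}{\partial z}
  = a \cdot \frac{\partial C}{\partial z}.
\]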

Summary so far…

  • neural nets will be very large: impractical to write down gradient formula
    by hand for all parameters
  • backpropagation = recursive application of the chain rule along a
    computational graph to compute the gradients of all
    inputs/parameters/intermediates
  • implementations maintain a graph structure, where the nodes implement forward() / backward() API
  • forward: compute result of an operation and save any intermediates
    needed for gradient computation in memory
  • backward: apply the chain rule to compute the gradient of the loss
    function with respect to the inputs

