Andrew Ng's Neural Networks and Deep Learning: Deep Neural Networks

  • Deep neural networks
    • Notation
  • Forward propagation
  • Matrix dimensions
    • m examples
  • Why deep representations?
  • Building blocks of a deep neural network
    • Forward pass and backward pass
  • Forward and backward propagation
    • Forward propagation
    • Backward propagation
  • Parameters and hyperparameters
  • Relation to the brain

Deep neural networks

(Figure 1)

Notation

(Figure 2)

L: number of layers; here L = 4
n^[l]: number of units in layer l
n^[1] = 5, n^[2] = 5, n^[3] = 3, n^[4] = 1
a^[l]: activations of layer l
a^[l] = g^[l](z^[l])
w^[l]: weights of layer l
b^[l]: bias of layer l

Forward propagation

(Figure 3)

x
z^[1] = w^[1]x + b^[1]
a^[1] = g^[1](z^[1])
z^[2] = w^[2]a^[1] + b^[2]
a^[2] = g^[2](z^[2])
z^[3] = w^[3]a^[2] + b^[3]
a^[3] = g^[3](z^[3])
z^[4] = w^[4]a^[3] + b^[4]
a^[4] = g^[4](z^[4])
for l = 1 to 4
    z^[l] = w^[l]a^[l-1] + b^[l]
    a^[l] = g^[l](z^[l])
# Vectorized over m examples
Z^[1] = W^[1]A^[0] + b^[1]      # X = A^[0]
A^[1] = g^[1](Z^[1])
Z^[2] = W^[2]A^[1] + b^[2]
A^[2] = g^[2](Z^[2])
Z^[3] = W^[3]A^[2] + b^[3]
A^[3] = g^[3](Z^[3])
Z^[4] = W^[4]A^[3] + b^[4]
A^[4] = g^[4](Z^[4])
for l = 1 to 4
    Z^[l] = W^[l]A^[l-1] + b^[l]
    A^[l] = g^[l](Z^[l])
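As a concrete illustration, here is a minimal NumPy sketch of this vectorized loop. The function name, the parameter-dict keys ('W1', 'b1', ...), and the activations list are illustrative assumptions, not code from the course:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def forward_propagation(X, parameters, activations):
    # Vectorized forward pass; the m examples are stacked as columns of X.
    # parameters:  dict with keys 'W1', 'b1', ..., 'WL', 'bL'
    # activations: list of the L activation functions g^[1] .. g^[L]
    A = X                                 # A^[0] = X
    caches = []
    for l in range(1, len(activations) + 1):
        Z = parameters['W' + str(l)] @ A + parameters['b' + str(l)]
        A = activations[l - 1](Z)
        caches.append(Z)                  # cache Z^[l] for backprop
    return A, caches

# For the 4-layer network above: ReLU hidden layers, sigmoid output.
# yhat, caches = forward_propagation(X, parameters, [relu, relu, relu, sigmoid])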

Matrix dimensions

(Figure 4)

n^[0] = 2
n^[1] = 3
n^[2] = 5
n^[3] = 4
n^[4] = 2
n^[5] = 1
z^[1]        =      w^[1]               x            +        b^[1]
(3,1)               (3,2)             (2,1)                   (3,1)
(n^[1],1)        (n^[1],n^[0])       (n^[0],1)              (n^[1],1)
a^[1] = g^[1](z^[1])
(3,1)         (3,1)
(n^[1],1)   (n^[1],1)
z^[2]        =      w^[2]             a^[1]             +     b^[2]
(5,1)               (5,3)             (3,1)                   (5,1)
(n^[2],1)        (n^[2],n^[1])       (n^[1],1)              (n^[2],1)
a^[2] = g^[2](z^[2])
(5,1)         (5,1) 
(n^[2],1)   (n^[2],1)  
z^[3]        =      w^[3]             a^[2]             +     b^[3]
(4,1)               (4,5)             (5,1)                   (4,1)
(n^[3],1)        (n^[3],n^[2])       (n^[2],1)              (n^[3],1)
a^[3] = g^[3](z^[3])
(4,1)         (4,1)  
(n^[3],1)   (n^[3],1)  
z^[4]        =      w^[4]             a^[3]             +     b^[4]
(2,1)               (2,4)             (4,1)                   (2,1)
(n^[4],1)        (n^[4],n^[3])       (n^[3],1)              (n^[4],1)
a^[4] = g^[4](z^[4])
(2,1)         (2,1)        
(n^[4],1)    (n^[4],1)    
z^[5]        =      w^[5]             a^[4]             +     b^[5]
(1,1)               (1,2)             (2,1)                   (1,1)
(n^[5],1)        (n^[5],n^[4])       (n^[4],1)              (n^[5],1)
a^[5] = g^[5](z^[5])
(1,1)         (1,1)        
(n^[5],1)    (n^[5],1)    
for l = 1 to 5
	    z^[l]     =   w^[l]            a^[l-1]         +       b^[l]
	    (n^[l],1)    (n^[l],n^[l-1])  (n^[l-1],1)            (n^[l],1)
	    a^[l] = g^[l](z^[l])
	    (n^[l],1)    (n^[l],1) 

m examples

Z^[1]        =      W^[1]               X            +        b^[1]
(3,m)               (3,2)             (2,m)                   (3,1)
(n^[1],m)        (n^[1],n^[0])       (n^[0],m)              (n^[1],1)
A^[1] = g^[1](Z^[1])
(3,m)         (3,m)
(n^[1],m)   (n^[1],m)
Z^[2]        =      W^[2]             A^[1]             +     b^[2]
(5,m)               (5,3)             (3,m)                   (5,1)
(n^[2],m)        (n^[2],n^[1])       (n^[1],m)              (n^[2],1)
A^[2] = g^[2](Z^[2])
(5,m)         (5,m) 
(n^[2],m)   (n^[2],m)  
Z^[3]        =      W^[3]             A^[2]             +     b^[3]
(4,m)               (4,5)             (5,m)                   (4,1)
(n^[3],m)        (n^[3],n^[2])       (n^[2],m)              (n^[3],1)
A^[3] = g^[3](Z^[3])
(4,m)         (4,m)  
(n^[3],m)   (n^[3],m)  
Z^[4]        =      W^[4]             A^[3]             +     b^[4]
(2,m)               (2,4)             (4,m)                   (2,1)
(n^[4],m)        (n^[4],n^[3])       (n^[3],m)              (n^[4],1)
A^[4] = g^[4](Z^[4])
(2,m)         (2,m)        
(n^[4],m)    (n^[4],m)    
Z^[5]        =      W^[5]             A^[4]             +     b^[5]
(1,m)               (1,2)             (2,m)                   (1,1)
(n^[5],m)        (n^[5],n^[4])       (n^[4],m)              (n^[5],1)
A^[5] = g^[5](Z^[5])
(1,m)         (1,m)        
(n^[5],m)    (n^[5],m)    
for l = 1 to 5
    Z^[l]    =    W^[l]              A^[l-1]   +     b^[l]
    (n^[l],m)    (n^[l],n^[l-1])   (n^[l-1],m)     (n^[l],1)
    A^[l] = g^[l](Z^[l])
    (n^[l],m)     (n^[l],m)
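Note that b^[l] keeps shape (n^[l], 1) even in the vectorized case; NumPy-style broadcasting replicates its single column across all m columns. Below is a quick sanity check of these shapes using the layer sizes from the example above (variable names are illustrative):

import numpy as np

layer_dims = [2, 3, 5, 4, 2, 1]   # n^[0] .. n^[5] from the example above
m = 10                            # any number of examples

A = np.random.randn(layer_dims[0], m)   # X has shape (n^[0], m)
for l in range(1, len(layer_dims)):
    W = np.random.randn(layer_dims[l], layer_dims[l - 1])  # (n^[l], n^[l-1])
    b = np.zeros((layer_dims[l], 1))    # (n^[l], 1), broadcast over m columns
    Z = W @ A + b
    assert Z.shape == (layer_dims[l], m)
    A = np.tanh(Z)                      # placeholder activation for the shape check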

Why deep representations?

A deep neural network has many hidden layers. The earlier layers learn simple, low-level features; the later layers then combine these simple features to detect progressively more complex patterns (e.g., in face recognition, early layers might detect edges, middle layers facial parts, and later layers whole faces).

Building blocks of a deep neural network

(Figure 5)

Layer l parameters: w^[l], b^[l]
Forward: input a^[l-1], output a^[l], cache z^[l]
        z^[l] = w^[l]a^[l-1] + b^[l]
        a^[l] = g^[l](z^[l])
Backward: input da^[l], output da^[l-1], dw^[l], db^[l]
        uses the z^[l] cached during the forward pass
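One such forward block might look like this in NumPy; the helper name and the cache layout are hypothetical, chosen so a matching backward block can reuse everything it needs:

def layer_forward(A_prev, W, b, g):
    # One forward block for layer l: input a^[l-1], output a^[l].
    Z = W @ A_prev + b
    A = g(Z)
    cache = (A_prev, W, b, Z)   # store z^[l] (and w^[l], b^[l]) for backprop
    return A, cache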

(Figure 6)

Forward pass and backward pass

(Figure 7)

Starting from a^[0], which is just the input x, a chain of forward-propagation steps computes the prediction yhat. The output is then used to run backward propagation, which yields all the derivative terms, and w and b are updated at every layer.
Implementation detail: cache z^[l] (along with w^[l] and b^[l]) during the forward pass.

Forward and backward propagation

Forward propagation

Forward: input a^[l-1], output a^[l], cache z^[l]
        z^[l] = w^[l]a^[l-1] + b^[l]
        a^[l] = g^[l](z^[l])
    Vectorized:
        Z^[l] = W^[l]A^[l-1] + b^[l]
        A^[l] = g^[l](Z^[l])

Backward propagation

Backward: input da^[l], output da^[l-1], dw^[l], db^[l]
        dz^[l] = da^[l]*g'^[l](z^[l])
        dw^[l] = dz^[l]a^[l-1]T
        db^[l] = dz^[l]
        da^[l-1] = w^[l]Tdz^[l]
        dz^[l] = w^[l+1]Tdz^[l+1]*g'^[l](z^[l])
    Vectorized:
        dZ^[l] = dA^[l]*g'^[l](Z^[l])
        dW^[l] = (1/m)dZ^[l]A^[l-1]T
        db^[l] = (1/m)np.sum(dZ^[l], axis=1, keepdims=True)
        dA^[l-1] = W^[l]TdZ^[l]
        dZ^[l] = W^[l+1]TdZ^[l+1]*g'^[l](Z^[l])
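A matching NumPy sketch of the vectorized backward block, consuming the (A_prev, W, b, Z) cache produced by the hypothetical layer_forward above; * is elementwise multiplication, as in the formulas:

import numpy as np

def layer_backward(dA, cache, g_prime):
    # Backward block for layer l: input dA^[l], output dA^[l-1], dW^[l], db^[l].
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                              # dZ^[l] = dA^[l] * g'^[l](Z^[l])
    dW = (1 / m) * dZ @ A_prev.T                      # dW^[l] = (1/m) dZ^[l] A^[l-1]T
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)  # db^[l]
    dA_prev = W.T @ dZ                                # dA^[l-1] = W^[l]T dZ^[l]
    return dA_prev, dW, db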

(Figure 8)

Parameters and hyperparameters

Parameters: w^[1], b^[1], w^[2], b^[2], ...
Hyperparameters:
        learning rate: alpha
        number of gradient-descent iterations: iterations
        number of hidden layers: L
        number of hidden units: n^[1], n^[2], ...
        choice of activation function: sigmoid, relu, tanh
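In practice these hyperparameters are often gathered into one configuration object before training. The dict below is a hypothetical sketch with illustrative values, not settings from the course:

# Hypothetical hyperparameter configuration (all values illustrative).
hyperparams = {
    'learning_rate': 0.0075,          # alpha
    'num_iterations': 3000,           # gradient-descent iterations
    'layer_dims': [2, 5, 5, 3, 1],    # n^[0] .. n^[4]: layer sizes
    'hidden_activation': 'relu',      # sigmoid / relu / tanh
}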

Relation to the brain

(Figure 9)
