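These are study notes I compiled for my own review; if you spot any mistakes, please point them out.
Note links
[Deep Learning] Andrew Ng Course Notes (1): Introduction to Deep Learning and Neural Network Basics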
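x1, x2, x3: the input layer A[0], i.e. the input values of a single sample
The four neurons in the middle: the hidden layer A[1]
The single neuron on the right: the output layer A[2]
A single training pass:
Forward propagation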
$$
\begin{aligned}
z^{[1]}_{1} &= w^{[1]T}_{1}x+b^{[1]}_{1}, & a^{[1]}_{1} &= \sigma(z^{[1]}_{1})\\
z^{[1]}_{2} &= w^{[1]T}_{2}x+b^{[1]}_{2}, & a^{[1]}_{2} &= \sigma(z^{[1]}_{2})\\
z^{[1]}_{3} &= w^{[1]T}_{3}x+b^{[1]}_{3}, & a^{[1]}_{3} &= \sigma(z^{[1]}_{3})\\
z^{[1]}_{4} &= w^{[1]T}_{4}x+b^{[1]}_{4}, & a^{[1]}_{4} &= \sigma(z^{[1]}_{4})
\end{aligned}
$$
$$
\begin{aligned}
Z^{[1]} &= W^{[1]}X+b^{[1]}\\
A^{[1]} &= \sigma(Z^{[1]})\\
Z^{[2]} &= W^{[2]}A^{[1]}+b^{[2]}\\
A^{[2]} &= \sigma(Z^{[2]})
\end{aligned}
$$
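The vectorized equations above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `forward_two_layer`, the layer sizes, and the random initialization are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_two_layer(X, W1, b1, W2, b2):
    """Forward pass of a 2-layer network with sigmoid in both layers."""
    Z1 = W1 @ X + b1   # (n1, m); b1 broadcasts across the m columns
    A1 = sigmoid(Z1)
    Z2 = W2 @ A1 + b2  # (n2, m)
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

# Hypothetical sizes: 3 inputs, 4 hidden units, 1 output, 5 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
W1 = rng.standard_normal((4, 3)) * 0.01
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.01
b2 = np.zeros((1, 1))

Z1, A1, Z2, A2 = forward_two_layer(X, W1, b1, W2, b2)
print(A1.shape, A2.shape)  # (4, 5) (1, 5)
```

Each column of `A2` is the prediction for one sample, so the same code handles a single sample (m = 1) and a whole batch.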
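Backpropagation
Multiple samples
Training set: X = [x(1), x(2), x(3), …, x(m)], where x(i) is the i-th training sample and there are m samples in total
Weight parameters w of the first-layer neurons: W[1], whose rows are the transposed weight vectors of the individual neurons
Bias parameters b of the first-layer neurons: b[1], a column vector with one entry per neuron
Computing Z[1] in the first-layer forward pass: Z[1] = W[1]X + b[1]
Computing A[1] in the first-layer forward pass: A[1] = σ(Z[1])
Weight parameters w of the second-layer neurons: W[2]
Bias parameters b of the second-layer neurons: b[2]
Computing Z[2] in the second-layer forward pass: Z[2] = W[2]A[1] + b[2]
Computing A[2] in the second-layer forward pass: A[2] = σ(Z[2])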
Checking matrix dimensions
$$
\begin{gathered}
\textcolor{red}{\text{First layer}}\\
X.shape=(n^{[0]},m)\\
W^{[1]}.shape=(n^{[1]},n^{[0]})\\
b^{[1]}.shape=(n^{[1]},1)\\
Z^{[1]}.shape=(n^{[1]},m)\\
A^{[1]}.shape=(n^{[1]},m)\\
\textcolor{red}{\text{Second layer}}\\
W^{[2]}.shape=(n^{[2]},n^{[1]})\\
b^{[2]}.shape=(n^{[2]},1)\\
Z^{[2]}.shape=(n^{[2]},m)\\
A^{[2]}.shape=(n^{[2]},m)\\
Y.shape=A^{[2]}.shape=(n^{[2]},m)
\end{gathered}
$$
$$
\begin{gathered}
\text{Dimension of each training sample: } n^{[0]}\\
\text{Number of hidden-layer neurons: } n^{[1]}\\
\text{Number of output-layer neurons: } n^{[2]}=1\\
W^{[1]}:(n^{[1]},n^{[0]})\\
b^{[1]}:(n^{[1]},1)\\
W^{[2]}:(n^{[2]},n^{[1]})\\
b^{[2]}:(n^{[2]},1)\\
\text{Cost function: } J(W,b)=\frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)},y^{(i)})
\end{gathered}
$$
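The notes do not spell out the per-sample loss L here. As a sketch, assuming the binary cross-entropy loss commonly paired with a sigmoid output, the cost could be computed like this (the function name `cross_entropy_cost` and the clipping constant `eps` are illustrative choices, not part of the original notes):

```python
import numpy as np

def cross_entropy_cost(A2, Y, eps=1e-12):
    """Average binary cross-entropy over the m samples (columns).

    Assumption: L(y_hat, y) = -(y*log(y_hat) + (1-y)*log(1-y_hat)).
    """
    m = Y.shape[1]
    A2 = np.clip(A2, eps, 1.0 - eps)  # keep log() away from 0
    losses = -(Y * np.log(A2) + (1.0 - Y) * np.log(1.0 - A2))
    return float(np.sum(losses) / m)

# Three hypothetical samples: labels and predicted probabilities.
Y = np.array([[1.0, 0.0, 1.0]])
A2 = np.array([[0.9, 0.2, 0.6]])
cost = cross_entropy_cost(A2, Y)
print(cost)
```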
$$
\begin{aligned}
dW^{[i]} &= \frac{\partial J}{\partial W^{[i]}}, \quad db^{[i]}=\frac{\partial J}{\partial b^{[i]}}\\
W^{[i]} &= W^{[i]}-\alpha\, dW^{[i]}\\
b^{[i]} &= b^{[i]}-\alpha\, db^{[i]}\\
i &= 1,2
\end{aligned}
$$
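The update rule above amounts to one subtraction per parameter. A minimal sketch, assuming parameters and gradients are stored in dictionaries keyed `W1`, `b1`, ... and `dW1`, `db1`, ... (a naming convention chosen for this example only):

```python
import numpy as np

def gradient_descent_step(params, grads, alpha):
    """Return new parameters after one step: theta = theta - alpha * d_theta."""
    return {name: value - alpha * grads["d" + name]
            for name, value in params.items()}

# Tiny hypothetical example with one weight matrix and one bias.
params = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads = {"dW1": np.array([[0.1, -0.2]]), "db1": np.array([[1.0]])}
updated = gradient_descent_step(params, grads, alpha=0.1)
print(updated["W1"], updated["b1"])
```

Returning a new dictionary rather than updating in place keeps the old parameters available, which can be convenient when comparing costs before and after a step.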
$$
\begin{aligned}
dZ^{[2]} &= A^{[2]}-Y\\
dW^{[2]} &= \frac{1}{m}dZ^{[2]}A^{[1]T}\\
db^{[2]} &= \frac{1}{m}np.sum(dZ^{[2]},axis=1,keepdims=True)\\
dZ^{[1]} &= W^{[2]T}dZ^{[2]}*g^{[1]'}(Z^{[1]})\\
dW^{[1]} &= \frac{1}{m}dZ^{[1]}X^{T}\\
db^{[1]} &= \frac{1}{m}np.sum(dZ^{[1]},axis=1,keepdims=True)
\end{aligned}
$$
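The six backpropagation formulas can be checked numerically. The sketch below assumes a tanh hidden layer (so g[1]'(Z[1]) = 1 - A[1]^2), a sigmoid output, and a cross-entropy cost, which is what makes dZ[2] = A[2] - Y hold; all function names, sizes, and the random data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])    # assumed hidden activation
    A2 = sigmoid(p["W2"] @ A1 + p["b2"])
    return A1, A2

def cost(A2, Y):
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

def backward(X, Y, p):
    m = X.shape[1]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)  # element-wise product with tanh'
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 6))
Y = (rng.random((1, 6)) > 0.5).astype(float)
p = {"W1": rng.standard_normal((4, 3)) * 0.5, "b1": np.zeros((4, 1)),
     "W2": rng.standard_normal((1, 4)) * 0.5, "b2": np.zeros((1, 1))}
grads = backward(X, Y, p)

# Central finite difference on a single entry of W1 as a sanity check.
h = 1e-6
p_plus = {k: v.copy() for k, v in p.items()}
p_minus = {k: v.copy() for k, v in p.items()}
p_plus["W1"][0, 0] += h
p_minus["W1"][0, 0] -= h
numeric = (cost(forward(X, p_plus)[1], Y) - cost(forward(X, p_minus)[1], Y)) / (2 * h)
print(abs(numeric - grads["dW1"][0, 0]))  # should be very small
```

If the formula for dZ[1] mistakenly used dZ[1] on its own right-hand side instead of dZ[2], this check would fail, so it is a cheap way to catch index slips.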
sigmoid: in practice only suitable for the output layer of a binary classifier.
$$
\begin{aligned}
a &= \frac{1}{1+e^{-z}}\\
\frac{da}{dz} &= a(1-a)
\end{aligned}
$$
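A small NumPy sketch of the sigmoid and its derivative, written so the derivative reuses the activation value as in the formula above (the function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """a = 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """da/dz = a * (1 - a)."""
    a = sigmoid(z)
    return a * (1.0 - a)

print(sigmoid(0.0), sigmoid_grad(0.0))  # 0.5 0.25
```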
tanh: outperforms the sigmoid function in almost all cases. (Its outputs are centered around zero, which tends to make training converge faster.)
$$
\begin{aligned}
a &= \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}\\
\frac{da}{dz} &= 1-a^2
\end{aligned}
$$
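One commonly cited reason for preferring tanh in hidden layers is that its outputs are centered around zero, while sigmoid outputs are centered around 0.5. The sketch below illustrates this on a symmetric set of inputs chosen for the example, and also implements the derivative formula above:

```python
import numpy as np

def tanh_grad(z):
    """da/dz = 1 - a^2 with a = tanh(z)."""
    a = np.tanh(z)
    return 1.0 - a ** 2

z = np.linspace(-3.0, 3.0, 7)                   # symmetric inputs around 0
tanh_mean = np.mean(np.tanh(z))                 # close to 0
sigmoid_mean = np.mean(1.0 / (1.0 + np.exp(-z)))  # close to 0.5
print(tanh_mean, sigmoid_mean)
```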
ReLU (Rectified Linear Unit): the most commonly used default activation function
$$
\begin{aligned}
a &= \max(0,z)\\
\frac{da}{dz} &= \begin{cases}
0, & z<0\\
1, & z>0\\
\text{undefined}, & z=0
\end{cases}
\end{aligned}
$$
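In code the derivative has to take some value at z = 0 even though it is mathematically undefined there. The sketch below assumes the convention of using 0 at that point; using 1 would be equally valid.

```python
import numpy as np

def relu(z):
    """a = max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """1 where z > 0, else 0 (the value at z == 0 is a convention)."""
    return (z > 0).astype(float)

z = np.array([-1.0, 0.0, 2.0])
print(relu(z), relu_grad(z))
```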
leaky ReLU: some people consider it better than ReLU
$$
\begin{aligned}
a &= \max(\alpha z,z),\quad \alpha \text{ usually less than } 1\\
\frac{da}{dz} &= \begin{cases}
\alpha, & z<0\\
1, & z>0\\
\text{undefined}, & z=0
\end{cases}
\end{aligned}
$$
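A minimal sketch of leaky ReLU, assuming a slope of alpha = 0.01 as the default (a typical small value chosen for illustration) and the convention that the derivative at z = 0 takes the value alpha:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """a = max(alpha * z, z); correct as written only for 0 < alpha < 1."""
    return np.maximum(alpha * z, z)

def leaky_relu_grad(z, alpha=0.01):
    """1 where z > 0, else alpha (the value at z == 0 is a convention)."""
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(z), leaky_relu_grad(z))
```

Unlike ReLU, the gradient never becomes exactly zero for negative inputs, so units with negative pre-activations still receive a small update.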
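There is only one place where a linear activation function may be used: the output layer (for example, when the network predicts a real-valued quantity).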
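A short derivation, assuming the two-layer notation used earlier with the identity function as activation in both layers, suggests why linear activations are avoided in hidden layers: the whole network collapses to a single linear map.

$$
\begin{aligned}
A^{[1]} &= W^{[1]}X+b^{[1]}\\
A^{[2]} &= W^{[2]}A^{[1]}+b^{[2]}\\
&= W^{[2]}\left(W^{[1]}X+b^{[1]}\right)+b^{[2]}\\
&= \underbrace{W^{[2]}W^{[1]}}_{W'}X+\underbrace{W^{[2]}b^{[1]}+b^{[2]}}_{b'}
\end{aligned}
$$

Here W' and b' are shorthand introduced only for this derivation. However many such layers are stacked, the result is still of the form W'X + b', so the hidden layers add no expressive power.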
Variable | Meaning |
---|---|
l | Layer index |
n[l] | Number of units in layer l |

Matrix | Dimensions |
---|---|
X | (n[0], m) |
W[l] and dW[l] | (n[l], n[l-1]) |
b[l] and db[l] | (n[l], 1) |
Z[l] and dZ[l] | (n[l], m) |
A[l] and dA[l] | (n[l], m) |
Y | (n[L], m), where L is the last layer |
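The dimension rules in the table translate directly into an initialization routine. The sketch below is illustrative: the function name `init_params`, the scaling factor 0.01, and the example layer sizes are assumptions, not taken from the notes.

```python
import numpy as np

def init_params(layer_dims, seed=0):
    """Create W[l] of shape (n[l], n[l-1]) and b[l] of shape (n[l], 1)."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

layer_dims = [5, 4, 3, 1]  # hypothetical n[0], n[1], n[2], n[3]
params = init_params(layer_dims)
for name, value in params.items():
    print(name, value.shape)
```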
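Deep representation is an important concept in neural networks: it refers to extracting progressively higher-level feature representations of the input data through multiple layers of nonlinear transformations.
The main reasons for using deep representations are as follows: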
Forward propagation
$$
\begin{aligned}
A^{[0]} &= X\\
Z^{[l]} &= W^{[l]}A^{[l-1]}+b^{[l]}\\
A^{[l]} &= g^{[l]}(Z^{[l]})
\end{aligned}
$$
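The recurrence above can be written as a loop over layers. This sketch assumes ReLU in the hidden layers and sigmoid in the output layer purely for illustration; the function name `forward_deep` and the cache layout are likewise assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_deep(X, params, activations):
    """Apply Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l]) for l = 1..L.

    `activations` is a list of callables [g1, ..., gL].
    Returns the final activation and a list of (A[l-1], Z[l]) pairs.
    """
    A = X  # A[0] = X
    caches = []
    for l, g in enumerate(activations, start=1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        caches.append((A, Z))  # values that backpropagation needs later
        A = g(Z)
    return A, caches

layer_dims = [3, 4, 2, 1]  # hypothetical n[0..3]
rng = np.random.default_rng(0)
params = {}
for l in range(1, len(layer_dims)):
    params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))

X = rng.standard_normal((3, 7))
AL, caches = forward_deep(X, params, [relu, relu, sigmoid])
print(AL.shape)  # (1, 7)
```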
Backpropagation
$$
\begin{aligned}
dZ^{[l]} &= dA^{[l]}*g^{[l]'}(Z^{[l]})\\
dW^{[l]} &= \frac{1}{m}dZ^{[l]}A^{[l-1]T}\\
db^{[l]} &= \frac{1}{m}np.sum(dZ^{[l]},axis=1,keepdims=True)\\
dA^{[l-1]} &= W^{[l]T}dZ^{[l]}
\end{aligned}
$$
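The four equations describe the backward step for one layer, so they fit naturally into a single function. The sketch below is an illustration under stated assumptions: the name `layer_backward` is made up, the layer uses tanh, and the toy cost J = (1/m) Σ 0.5·a² is chosen only so that dA is easy to write down and the result can be compared against a finite difference.

```python
import numpy as np

def layer_backward(dA, Z, A_prev, W, g_prime):
    """Backward step for one layer; returns dA[l-1], dW[l], db[l]."""
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                       # element-wise product
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

rng = np.random.default_rng(2)
A_prev = rng.standard_normal((3, 5))  # A[l-1]: 3 units, 5 samples
W = rng.standard_normal((2, 3))       # W[l]: 2 units in layer l
b = np.zeros((2, 1))

def toy_cost(W_value):
    """Toy cost used only for the check: mean over samples of 0.5 * a^2."""
    A = np.tanh(W_value @ A_prev + b)
    return float(np.sum(0.5 * A ** 2) / A_prev.shape[1])

Z = W @ A_prev + b
A = np.tanh(Z)
dA = A  # derivative of 0.5 * a^2 with respect to a
dA_prev, dW, db = layer_backward(dA, Z, A_prev, W, lambda z: 1.0 - np.tanh(z) ** 2)

# Central finite difference on one weight as a sanity check.
h = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += h
W_minus[0, 0] -= h
numeric = (toy_cost(W_plus) - toy_cost(W_minus)) / (2 * h)
print(abs(numeric - dW[0, 0]))  # should be very small
```

Calling this function from layer L down to layer 1, feeding each returned `dA_prev` into the next call, gives the full backward pass.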