[Andrew Ng Deep Learning Personal Study Notes] 2. Neural Network Basics (2)

Computation Graph

Example:
   $$J(a,b,c)=3(a+bc)\implies\begin{cases} u=bc \\ v=a+u \\ J=3v \end{cases}$$
The computation graph for this function is:
[Figure 1: computation graph of $J(a,b,c)=3(a+bc)$, with intermediate nodes $u=bc$ and $v=a+u$]
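
To make the forward and backward passes concrete, here is a minimal Python sketch (my own illustration, not from the course; the input values are arbitrary). It evaluates $J$ through the intermediate nodes $u$ and $v$, then propagates derivatives back through the graph with the chain rule.

```python
# Forward pass: evaluate each node of the graph left to right.
a, b, c = 5.0, 3.0, 2.0
u = b * c        # u = bc
v = a + u        # v = a + u
J = 3 * v        # J = 3v

# Backward pass: apply the chain rule right to left.
dJ_dv = 3.0            # dJ/dv
dJ_du = dJ_dv * 1.0    # dv/du = 1
dJ_da = dJ_dv * 1.0    # dv/da = 1
dJ_db = dJ_du * c      # du/db = c
dJ_dc = dJ_du * b      # du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)   # 33.0 3.0 6.0 9.0
```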
  
  

Gradient Descent Algorithm for Logistic Regression

One training sample:
    $z=w^Tx+b$
    $\hat{y}=a=\sigma(z)$
    $L(a,y)=-\big(y\log(a)+(1-y)\log(1-a)\big)$
  Computation graph:
[Figure 2: computation graph for the logistic regression loss]
  Derivatives:
    $\frac{dL(a,y)}{da}=-\frac{y}{a}+\frac{1-y}{1-a}$
    $\frac{dL(a,y)}{dz}=\frac{dL}{da}\cdot\frac{da}{dz}$
       $=\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)a(1-a)$
       $=a-y$
    $\frac{dL(a,y)}{dw_1}=x_1(a-y)$
    $\frac{dL(a,y)}{dw_2}=x_2(a-y)$
    $\frac{dL(a,y)}{db}=a-y$
  
  This essentially treats logistic regression as a single-layer neural network: the back propagation algorithm computes the derivative of the loss with respect to each parameter, so that gradient descent can then be used in the next step to find the parameters that minimize the cost.
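
  A hedged sketch of this single-sample forward and backward pass in Python (assuming two features $x_1, x_2$ with weights $w_1, w_2$ as in the derivatives above; the concrete numbers are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training sample with two features (values are illustrative only).
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.5, -0.3, 0.1

# Forward pass
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Backward pass (back propagation), using the derivatives above
dz  = a - y        # dL/dz
dw1 = x1 * dz      # dL/dw1
dw2 = x2 * dz      # dL/dw2
db  = dz           # dL/db
```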
  
m training samples:
   $J(w,b)=\frac{1}{m}\sum_{i=1}^{m}L(a^{(i)},y^{(i)})$
   $a^{(i)}=\hat{y}^{(i)}=\sigma(z^{(i)})=\sigma(w^Tx^{(i)}+b)$
   $\frac{\partial J(w,b)}{\partial w_1}=\frac{1}{m}\sum_{i=1}^{m}\frac{\partial L(a^{(i)},y^{(i)})}{\partial w_1}$
   $\frac{\partial J(w,b)}{\partial b}=\frac{1}{m}\sum_{i=1}^{m}\frac{\partial L(a^{(i)},y^{(i)})}{\partial b}$
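
   These averages can also be computed in one vectorized pass. A minimal NumPy sketch, assuming $X$ stores one sample per column (shape $(n_x, m)$), $Y$ is a $(1, m)$ row of labels, $w$ is an $(n_x, 1)$ column vector, and the function name is my own:

```python
import numpy as np

def cost_and_gradients(w, b, X, Y):
    """Average cost J(w, b) and its gradients over the m columns of X."""
    m = X.shape[1]
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))             # a^(i) = sigma(w^T x^(i) + b), shape (1, m)
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = X @ (A - Y).T / m                                # dJ/dw, shape (n_x, 1)
    db = np.sum(A - Y) / m                                # dJ/db, scalar
    return J, dw, db
```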
  
Logistic Regression Algorithm
  Repeat{
      $J=0;\; dw_1=0;\; dw_2=0;\; db=0$
     For i in range(m):
        $z^{(i)}=w^Tx^{(i)}+b$
        $a^{(i)}=\sigma(z^{(i)})$
        $J += -\big(y^{(i)}\log a^{(i)}+(1-y^{(i)})\log(1-a^{(i)})\big)$
        $dz^{(i)}=a^{(i)}-y^{(i)}$
        $dw_1 += x_1^{(i)}dz^{(i)}$
        $dw_2 += x_2^{(i)}dz^{(i)}$
        $db += dz^{(i)}$
      $J /= m$
      $dw_1 /= m$
      $dw_2 /= m$
      $db /= m$
     
      $w_1 = w_1 - \alpha\, dw_1$
      $w_2 = w_2 - \alpha\, dw_2$
      $b = b - \alpha\, db$
  }
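
  A runnable Python version of the loop above, assuming two features; the function signature and the toy input format (X as a list of (x1, x2) pairs, Y as a list of 0/1 labels) are my own choices for illustration.

```python
import math

def logistic_regression_gd(X, Y, alpha=0.1, num_iters=1000):
    """Gradient descent for logistic regression with two features (loop version)."""
    w1 = w2 = b = 0.0
    m = len(X)
    for _ in range(num_iters):          # "Repeat { ... }"
        J = dw1 = dw2 = db = 0.0
        for (x1, x2), y in zip(X, Y):   # accumulate over the m samples
            z = w1 * x1 + w2 * x2 + b
            a = 1.0 / (1.0 + math.exp(-z))
            J   += -(y * math.log(a) + (1 - y) * math.log(1 - a))
            dz   = a - y
            dw1 += x1 * dz
            dw2 += x2 * dz
            db  += dz
        J /= m; dw1 /= m; dw2 /= m; db /= m
        # Gradient descent update
        w1 -= alpha * dw1
        w2 -= alpha * dw2
        b  -= alpha * db
    return w1, w2, b
```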
  
  To be continued…
