[Andrew Ng Deep Learning Personal Study Notes] 2. Neural Network Basics (1)

When assembling the training set into matrices, use the following form:

$$X = \begin{pmatrix} \vdots & \vdots & & \vdots \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ \vdots & \vdots & & \vdots \end{pmatrix}, \qquad X \in \mathbb{R}^{n_x \times m}$$

$$Y = \begin{pmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{pmatrix}, \qquad Y \in \mathbb{R}^{1 \times m}$$
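To make the column-stacking concrete, here is a minimal NumPy sketch; the sizes ($n_x = 3$, $m = 4$) and the random data are made up purely for illustration:

```python
import numpy as np

# Hypothetical sizes for illustration: m = 4 examples, n_x = 3 features each.
n_x, m = 3, 4
examples = [np.random.rand(n_x) for _ in range(m)]  # x^(1), ..., x^(m)
labels = [1, 0, 1, 1]                               # y^(1), ..., y^(m)

# Each x^(i) becomes a COLUMN of X, so X has shape (n_x, m).
X = np.stack(examples, axis=1)
# Y is a row vector of shape (1, m).
Y = np.array(labels).reshape(1, m)

print(X.shape)  # (3, 4)  ->  X in R^{n_x x m}
print(Y.shape)  # (1, 4)  ->  Y in R^{1 x m}
```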

Logistic Regression

  Given $X$, $\hat{y} = P(y = 1 \mid X)$, with $0 \leq \hat{y} \leq 1$.
  That is, the prediction $\hat{y}$ is the probability that $y = 1$ given the input $X$.

Parameter Specification

  Input feature vector $X$: $X \in \mathbb{R}^{n_x}$, where $n_x$ is the number of features;
  Training label $Y$: $Y \in \{0, 1\}$;
  Weights $w$: $w \in \mathbb{R}^{n_x}$;
  Bias $b$: $b \in \mathbb{R}$;
  Output $\hat{y}$: $\hat{y} = \sigma(w^T x + b)$ (see the sketch after this list);
  Sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z = w^T x + b$;
  Parameter vector: $\Theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{n_x} \end{pmatrix}$, an alternative notation that stacks $b = \theta_0$ and $w = (\theta_1, \ldots, \theta_{n_x})^T$ into a single vector.
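A minimal Python sketch of the forward pass defined above. The function names `sigmoid` and `predict`, the zero-initialized parameters, and the concrete shapes are illustrative assumptions, not from the course:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b):
    # Forward pass: y_hat = sigma(w^T x + b), applied to every column of X.
    Z = w.T @ X + b      # shape (1, m)
    return sigmoid(Z)    # every entry lies in (0, 1)

# Placeholder parameters: w in R^{n_x} (stored as a column vector), b in R.
n_x, m = 3, 4
w = np.zeros((n_x, 1))
b = 0.0
X = np.random.rand(n_x, m)
print(predict(X, w, b))  # all 0.5 with zero parameters
```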

Loss/Error Function

  $l(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{2}(\hat{y}^{(i)} - y^{(i)})^2$
  Squared error is the usual first choice of loss, but for logistic regression it makes the objective non-convex: gradient descent is then likely to converge to a local optimum rather than the global optimum we want, so this loss function is generally not used here.
  
Instead, a loss function of the following form is used:
  $l(\hat{y}^{(i)}, y^{(i)}) = -\left[y^{(i)}\log(\hat{y}^{(i)}) + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\right]$
    If $y^{(i)} = 1$: $l(\hat{y}^{(i)}, y^{(i)}) = -\log(\hat{y}^{(i)})$, which is small only when $\hat{y}^{(i)}$ is close to 1;
    If $y^{(i)} = 0$: $l(\hat{y}^{(i)}, y^{(i)}) = -\log(1 - \hat{y}^{(i)})$, which is small only when $\hat{y}^{(i)}$ is close to 0.
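A small sketch of this loss in NumPy. The `eps` clipping that keeps `log` away from zero is an added numerical safeguard, not part of the formula:

```python
import numpy as np

def loss(y_hat, y, eps=1e-12):
    # Cross-entropy loss for a SINGLE example.
    # eps clipping avoids log(0); it is a safeguard, not part of the formula.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(loss(0.9, 1))  # y = 1: -log(0.9) ~ 0.105, small loss for a good prediction
print(loss(0.9, 0))  # y = 0: -log(0.1) ~ 2.303, large loss for a bad prediction
```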

Cost Function

  $J(w, b) = \frac{1}{m}\sum_{i=1}^{m} l(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(\hat{y}^{(i)}) + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\right]$
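A vectorized sketch of $J(w, b)$, assuming `Y_hat` and `Y` are $(1, m)$ row vectors as in the matrix convention above, with the same `eps` safeguard as before:

```python
import numpy as np

def cost(Y_hat, Y, eps=1e-12):
    # J(w, b): average cross-entropy loss over all m examples.
    # Y_hat and Y are assumed to have shape (1, m).
    m = Y.shape[1]
    Y_hat = np.clip(Y_hat, eps, 1.0 - eps)
    return -np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)) / m

Y = np.array([[1, 0, 1, 1]])
Y_hat = np.array([[0.9, 0.2, 0.7, 0.6]])
print(cost(Y_hat, Y))  # average of the four per-example losses
```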

Cost Function vs. Loss/Error Function

  The loss/error function measures performance on a single training example; the cost function is the average of the loss function over the entire training set.
