Nonlinear Functions

Inserting a nonlinear function S between A and B, giving the expression $A(S(Bv))$, was a very big step forward. Eventually the smooth logistic function S was replaced by the simpler ramp function ReLU: $\text{ReLU}(x) = \max(0, x)$.
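As a quick illustration, here is a minimal sketch of ReLU acting on each component of a vector (using NumPy; the sample values are made up for this example):

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied componentwise
    return np.maximum(0, x)

v = np.array([-2.0, 0.5, 3.0])
print(relu(v))  # [0.  0.5 3. ]
```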

$$F(v) = L(R(L(R(\dots(Lv)))))$$

The functions that yield deep learning have the form $F(v) = L(R(L(R(\dots(Lv)))))$. This is a composition of affine functions $Lv = Av + b$ with nonlinear functions R, which act on each component of the vector $Lv$. The matrices A and the bias vectors b are the weights in the learning function F. It is the A's and b's that must be learned from the training data, so that the outputs F(v) will be (nearly) correct. Then F can be applied to new samples from the same population. If the weights (A's and b's) are well chosen, the outputs F(v) from the unseen test data should be accurate. More layers in the function F will typically produce more accuracy in F(v).
Properly speaking, $F(x, v)$ depends on the input v and the weights x (all the A's and b's). The outputs $v_1 = \text{ReLU}(A_1 v_0 + b_1)$ from the first step produce the first hidden layer in our neural net. The complete net starts with the input layer $v_0$ and ends with the output layer $w = F(v)$. The affine part $L_k(v_{k-1}) = A_k v_{k-1} + b_k$ of each step uses the computed weights $A_k$ and $b_k$.
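To make the composition concrete, here is a minimal sketch of the forward pass $v_k = \text{ReLU}(A_k v_{k-1} + b_k)$, ending with a final affine layer. The layer widths in `sizes`, the random weights, and the helper names `relu` and `forward` are illustrative assumptions, not values from the text:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def forward(v0, weights):
    """Compute F(v0) = L(R(L(R(...(L v0))))).
    weights is a list of (A_k, b_k) pairs; ReLU is applied
    after every affine step except the last."""
    v = v0
    for k, (A, b) in enumerate(weights):
        v = A @ v + b                    # affine step L_k
        if k < len(weights) - 1:
            v = relu(v)                  # nonlinear step R
    return v

rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]               # assumed layer widths (MNIST-like)
weights = [(rng.standard_normal((m, n)) * 0.01, np.zeros(m))
           for n, m in zip(sizes[:-1], sizes[1:])]

v0 = rng.standard_normal(784)            # one input sample v_0
w = forward(v0, weights)                 # output layer w = F(v_0)
print(w.shape)                           # (10,)
```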

Here is a picture of the neural net, to show the structure of F(v). The input layer contains the training samples $v = v_0$. The output is their classification $w = F(v)$. For perfect learning, w will be a (correct) digit from 0 to 9. The hidden layers add depth to the network. It is that depth which has allowed the composite function F to be so successful in deep learning. In fact the number of weights $A_{ij}$ and $b_j$ in the neural net is often larger than the number of inputs from the training samples v.
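One common way to turn the 10-component output w into a digit is to take the index of its largest component; this argmax convention is an assumption here, not something the text specifies, and the numbers below are made up:

```python
import numpy as np

# Example 10-component output layer w = F(v)
w = np.array([0.1, 2.3, -0.4, 0.0, 1.7, -1.2, 0.5, 0.9, -0.3, 0.2])
predicted_digit = int(np.argmax(w))      # index 0..9 of the largest component
print(predicted_digit)                    # 1
```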

The number of parameters is far larger than the number of training samples (to be verified).
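As a rough sanity check of that claim, counting every entry of the $A_k$'s and $b_k$'s for the same assumed layer widths used above:

```python
# Count weights A_ij and biases b_j for assumed sizes [784, 128, 64, 10]
sizes = [784, 128, 64, 10]
n_params = sum(m * n + m for n, m in zip(sizes[:-1], sizes[1:]))
print(n_params)  # 109386 — already more than, e.g., MNIST's 60000 training images
```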
