Below is the expression for the neural network cost function $J(\Theta)$, which still looks a little complicated. What exactly is this expression computing? Let's first use a simple example to break it down and compute it step by step.
$$
J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2
$$
Consider the following neural network:
where:
$$
\begin{aligned}
L &= \text{the total number of layers in the network} \\
s_l &= \text{the number of units in layer } l \\
K &= \text{the number of output units, i.e. the number of classes}
\end{aligned}
$$
Suppose $s_1=3$, $s_2=2$, $s_3=3$. Then $\Theta^1$ has dimensions $2\times4$ and $\Theta^2$ has dimensions $3\times3$ (in general, $\Theta^l$ has dimensions $s_{l+1}\times(s_l+1)$; the extra column comes from the bias unit).
Then:
$$
X^T=\begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_3\end{bmatrix},\quad
\Theta^1= \begin{bmatrix}\theta^1_{10}&\theta^1_{11}&\theta^1_{12}&\theta^1_{13}\\ \theta^1_{20}&\theta^1_{21}&\theta^1_{22}&\theta^1_{23}\end{bmatrix}_{2\times4},\quad
\Theta^2= \begin{bmatrix}\theta^2_{10}&\theta^2_{11}&\theta^2_{12}\\ \theta^2_{20}&\theta^2_{21}&\theta^2_{22}\\ \theta^2_{30}&\theta^2_{31}&\theta^2_{32}\end{bmatrix}_{3\times3}
$$
First, recall the forward propagation formulas:
$$
\begin{aligned}
z^{(j)} &= \Theta^{(j-1)}a^{(j-1)} &&\dots\dots(1)\\
a^{(j)} &= g(z^{(j)}),\ \text{setting } a^{(j)}_0=1 &&\dots\dots(2)\\
h_\Theta(x) &= a^{(L)} = g(z^{(L)}) &&\dots\dots(3)
\end{aligned}
$$
For a detailed explanation of forward propagation, see the linked post.
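As a minimal MATLAB sketch of these three steps for a single input (assuming the sigmoid as $g$, with `x` a 3-by-1 feature vector and `Theta1`, `Theta2` the parameter matrices of this post's example network):

g = @(z) 1 ./ (1 + exp(-z));  % sigmoid activation
a1 = [1; x];                  % prepend the bias unit -> 4x1
z2 = Theta1 * a1;             % (1): 2x4 * 4x1 -> 2x1
a2 = [1; g(z2)];              % (2): activate, then prepend the bias unit -> 3x1
z3 = Theta2 * a2;             % (1) again for the next layer -> 3x1
h  = g(z3);                   % (3): one output per class -> 3x1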
For now, we ignore the regularized term.
① When $m=1$:
$$
J(\Theta) = - \frac{1}{m} \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right]
$$
1. Let $a^1=X^T$; then
$$
z^2=\Theta^1 a^1 =\begin{bmatrix}\theta^1_{10}&\theta^1_{11}&\theta^1_{12}&\theta^1_{13}\\ \theta^1_{20}&\theta^1_{21}&\theta^1_{22}&\theta^1_{23}\end{bmatrix}_{2\times4}\times\begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_3\end{bmatrix}=\begin{bmatrix}\theta^1_{10}+\theta^1_{11}\cdot x_1+\theta^1_{12}\cdot x_2+\theta^1_{13}\cdot x_3\\ \theta^1_{20}+\theta^1_{21}\cdot x_1+\theta^1_{22}\cdot x_2+\theta^1_{23}\cdot x_3\end{bmatrix}_{2\times1}
=\begin{bmatrix}z^2_1\\ z^2_2\end{bmatrix}
\implies a^2=g(z^2);
$$
2. Add the bias unit to $a^2$ and compute $a^3$, i.e. $h_\theta(x)$:
$$
a^2=\begin{bmatrix}1 \\ a^2_1\\ a^2_2\end{bmatrix} \implies z^3=\Theta^2 a^2=\begin{bmatrix}\theta^2_{10}&\theta^2_{11}&\theta^2_{12}\\ \theta^2_{20}&\theta^2_{21}&\theta^2_{22}\\ \theta^2_{30}&\theta^2_{31}&\theta^2_{32}\end{bmatrix}_{3\times 3}\times\begin{bmatrix}1\\ a^2_1 \\ a^2_2\end{bmatrix}=\begin{bmatrix}z^3_1 \\ z^3_2 \\ z^3_3\end{bmatrix}
$$
$$
\implies h_\theta(x)=a^3=g(z^3)=\begin{bmatrix}g(z^3_1)\\ g(z^3_2)\\ g(z^3_3)\end{bmatrix}=\begin{bmatrix}h(x)_1\\ h(x)_2\\ h(x)_3\end{bmatrix}
$$
So for each example the network ultimately produces 3 outputs. What the cost function does is take the log of each output (and of one minus each output), weight it by the corresponding expected label $y_k$, and accumulate the results. Concretely:
Suppose
$$
input: X^T=\begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_3\end{bmatrix};\quad output: y=\begin{bmatrix} 1 \\ 0 \\ 0\end{bmatrix}=\begin{bmatrix} y_1 \\ y_2 \\ y_3\end{bmatrix}
$$
Then:
$$
\begin{aligned} J(\Theta)\cdot m &=[-y_1\log(h(x)_1)-(1-y_1)\log(1-h(x)_1)]\\ &+[-y_2\log(h(x)_2)-(1-y_2)\log(1-h(x)_2)]\\ &+[-y_3\log(h(x)_3)-(1-y_3)\log(1-h(x)_3)]\\ &=[-1\times\log(h(x)_1)-(1-1)\times\log(1-h(x)_1)]\\ &+[-0\times\log(h(x)_2)-(1-0)\times\log(1-h(x)_2)]\\ &+[-0\times\log(h(x)_3)-(1-0)\times\log(1-h(x)_3)]\\ &=-\log(h(x)_1)-\log(1-h(x)_2)-\log(1-h(x)_3) \end{aligned}
$$
In MATLAB, the vectorized cost function is (here `labelY` and `Htheta` are the $K\times1$ label and output vectors of the single example):
J = (1/m) * sum(-labelY .* log(Htheta) - (1 - labelY) .* log(1 - Htheta));
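As a quick sanity check (the output values below are made up for illustration), this reproduces the hand expansion above:

m = 1;
Htheta = [0.8; 0.2; 0.1];     % assumed network outputs h(x)_1 .. h(x)_3
labelY = [1; 0; 0];           % the label y from the example above
J = (1/m) * sum(-labelY .* log(Htheta) - (1 - labelY) .* log(1 - Htheta));
% J equals -log(0.8) - log(1 - 0.2) - log(1 - 0.1)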
② When $m>1$, the full (unregularized) cost is:
$$
J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right]
$$
Each example now contributes a cost of the form above, so we simply accumulate the costs produced by all the examples.
Let's break this down again, this time with $m=3$ examples:
Suppose
$$
X=\begin{bmatrix} 1 &x^1_1 & x^1_2&x^1_3\\ 1 &x^2_1 & x^2_2&x^2_3\\ 1 &x^3_1 & x^3_2&x^3_3\end{bmatrix},
$$
where the superscript indexes the example.
1. Let $a^1=X^T$; then
$$
z^2=\Theta^1 a^1 =\begin{bmatrix}\theta^1_{10}&\theta^1_{11}&\theta^1_{12}&\theta^1_{13}\\ \theta^1_{20}&\theta^1_{21}&\theta^1_{22}&\theta^1_{23}\end{bmatrix}_{2\times4} \times\begin{bmatrix} 1 &1&1\\ x^1_1&x^2_1&x^3_1 \\ x^1_2&x^2_2&x^3_2\\ x^1_3&x^2_3&x^3_3\end{bmatrix}_{4\times3}
$$
$$
=\begin{bmatrix}\theta^1_{10}+\theta^1_{11}x^1_1+\theta^1_{12}x^1_2+\theta^1_{13}x^1_3 &\theta^1_{10}+\theta^1_{11}x^2_1+\theta^1_{12}x^2_2+\theta^1_{13}x^2_3 &\theta^1_{10}+\theta^1_{11}x^3_1+\theta^1_{12}x^3_2+\theta^1_{13}x^3_3 \\ \theta^1_{20}+\theta^1_{21}x^1_1+\theta^1_{22}x^1_2+\theta^1_{23}x^1_3 &\theta^1_{20}+\theta^1_{21}x^2_1+\theta^1_{22}x^2_2+\theta^1_{23}x^2_3 &\theta^1_{20}+\theta^1_{21}x^3_1+\theta^1_{22}x^3_2+\theta^1_{23}x^3_3 \end{bmatrix}_{2\times3}
$$
$$
=\begin{bmatrix}z^2_{11}&z^2_{12}&z^2_{13}\\ z^2_{21}&z^2_{22}&z^2_{23}\end{bmatrix}_{2\times3}\implies a^2=g(z^2);
$$
2. Add the bias row to $a^2$ and compute $a^3$, i.e. $h_\theta(x)$:
$$
a^2=\begin{bmatrix}1&1&1\\ a^2_{11}&a^2_{12}&a^2_{13}\\ a^2_{21}&a^2_{22}&a^2_{23}\end{bmatrix}_{3\times3} \implies z^3=\Theta^2 a^2=\begin{bmatrix}\theta^2_{10}&\theta^2_{11}&\theta^2_{12}\\ \theta^2_{20}&\theta^2_{21}&\theta^2_{22}\\ \theta^2_{30}&\theta^2_{31}&\theta^2_{32}\end{bmatrix}_{3\times3}\times\begin{bmatrix}1&1&1\\ a^2_{11}&a^2_{12}&a^2_{13}\\ a^2_{21}&a^2_{22}&a^2_{23}\end{bmatrix}_{3\times3}
$$
$$
\implies h_\theta(x)=a^3=g(z^3)=\begin{bmatrix}g(z^3_{11})&g(z^3_{12})&g(z^3_{13})\\ g(z^3_{21})&g(z^3_{22})&g(z^3_{23})\\ g(z^3_{31})&g(z^3_{32})&g(z^3_{33})\end{bmatrix}
=\begin{bmatrix}h(x_1)_1&h(x_2)_1&h(x_3)_1\\ h(x_1)_2&h(x_2)_2&h(x_3)_2\\ h(x_1)_3&h(x_2)_3&h(x_3)_3\end{bmatrix}
$$
Column $i$ holds all the outputs for the $i$-th example.
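The same computation in MATLAB, processing all $m$ examples at once (a sketch; `X` is assumed to already contain the bias column, as above):

g = @(z) 1 ./ (1 + exp(-z));  % sigmoid activation
m = size(X, 1);               % number of examples (3 here)
a1 = X';                      % 4x3: column i is example i, bias row included
z2 = Theta1 * a1;             % 2x3
a2 = [ones(1, m); g(z2)];     % prepend a bias row -> 3x3
z3 = Theta2 * a2;             % 3x3
Htheta = g(z3);               % 3x3: column i holds all K outputs for example i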
Suppose
$$
input: X=\begin{bmatrix} 1 &x^1_1 & x^1_2&x^1_3\\ 1 &x^2_1 & x^2_2&x^2_3\\ 1 &x^3_1 & x^3_2&x^3_3\end{bmatrix};\quad output: y=\begin{bmatrix} 1 \\ 2 \\ 2\end{bmatrix}=\begin{bmatrix} y_1 \\ y_2 \\ y_3\end{bmatrix}
$$
The background of this example is handwritten digit recognition with a neural network: $y_1=1$ means the expected output is class 1, and $y_2=y_3=2$ means the expected output is class 2. When computing the cost function, each label must first be converted into a vector containing only 0s and 1s.
Then:
$$
y_1=\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix};\quad y_2=\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix};\quad y_3=\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\implies label_Y=\begin{bmatrix} 1 &0&0\\ 0&1&1 \\ 0&0&0 \end{bmatrix},
$$
where column $i$ ($m=1,2,3$) is the one-hot vector for the $i$-th example.
For how to convert ordinary label values into vectors containing only 0s and 1s, see the linked post.
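A minimal conversion sketch (an assumption-level illustration, with labels `y(i)` taking values in `1..K`):

labelY = zeros(K, m);
for i = 1:m
    labelY(y(i), i) = 1;      % column i becomes the one-hot vector for example i
end
% equivalent vectorized form: labelY = full(sparse(y(:), (1:m)', 1, K, m));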
Then we have (vectorized MATLAB form, with `labelY` and `Htheta` now $K\times m$ matrices):
J = (1/m) * sum(sum(-labelY .* log(Htheta) - (1 - labelY) .* log(1 - Htheta)));
% inner sum adds over the K classes of each example; outer sum adds over the m examples
$$
regular = \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2
$$
In fact, the regularized term is just the sum of the squares of every layer's parameters $\Theta^l_{j,i}$ with $i\neq0$, i.e. excluding the first column of each layer's matrix, which corresponds to the bias units.
For the example in this post, that is:
$$
\Theta^1= \begin{bmatrix}\theta^1_{10}&\theta^1_{11}&\theta^1_{12}&\theta^1_{13}\\ \theta^1_{20}&\theta^1_{21}&\theta^1_{22}&\theta^1_{23}\end{bmatrix}_{2\times4},\quad
\Theta^2= \begin{bmatrix}\theta^2_{10}&\theta^2_{11}&\theta^2_{12}\\ \theta^2_{20}&\theta^2_{21}&\theta^2_{22}\\ \theta^2_{30}&\theta^2_{31}&\theta^2_{32}\end{bmatrix}_{3\times3}
$$
$$
\begin{aligned} regular &=(\theta^1_{11})^2+(\theta^1_{12})^2+(\theta^1_{13})^2+(\theta^1_{21})^2+(\theta^1_{22})^2+(\theta^1_{23})^2\\ &+(\theta^2_{11})^2+ (\theta^2_{12})^2+ (\theta^2_{21})^2+ (\theta^2_{22})^2+ (\theta^2_{31})^2+ (\theta^2_{32})^2\end{aligned}
$$
Vectorized in MATLAB:
s_Theta1 = sum(Theta1 .^ 2);              % square every element, then sum each column
r_Theta1 = sum(s_Theta1) - s_Theta1(1,1); % subtract the first (bias) column's sum
s_Theta2 = sum(Theta2 .^ 2);
r_Theta2 = sum(s_Theta2) - s_Theta2(1,1);
regular = (lambda/(2*m)) * (r_Theta1 + r_Theta2);
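An equivalent, arguably cleaner form drops each bias column up front; combining it with the unregularized term from earlier gives the full cost (a sketch reusing the variable names assumed above):

r_Theta1 = sum(sum(Theta1(:, 2:end) .^ 2));  % skip the bias column directly
r_Theta2 = sum(sum(Theta2(:, 2:end) .^ 2));
regular  = (lambda/(2*m)) * (r_Theta1 + r_Theta2);
J = (1/m) * sum(sum(-labelY .* log(Htheta) - (1 - labelY) .* log(1 - Htheta))) + regular;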