Notes on Machine Learning By Andrew Ng (5)

Neural Networks: Representation

Non-linear hypotheses

Non-linear classification

You can use polynomial features to build a non-linear classifier, but when there are many raw features the number of polynomial terms blows up, making the hypothesis expensive to compute and prone to overfitting.
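A quick sanity check (a sketch, not from the notes) of how fast the degree-2 feature count grows, roughly $n^2/2$, which is what makes this approach impractical for inputs like images:

```python
from math import comb

# Count the degree-2 terms x_i * x_j (with i <= j) for n raw features.
for n in (10, 100, 1000):
    print(n, comb(n, 2) + n)  # 55, 5050, 500500 -- roughly n^2 / 2
```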

Neurons and the brain

Neural networks were originally motivated by the goal of building machines that mimic the brain; the "one learning algorithm" hypothesis suggests that much of the brain uses a single learning mechanism.

Model representation I

Neuron model: Logistic unit

A single neuron is modeled as a logistic unit: with inputs $x_1, \dots, x_n$ and a bias unit $x_0 = 1$, it outputs $h_\Theta(x) = g(\Theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid (logistic) activation function. A neural network is a group of these units wired together in layers.

Notation

$a_i^{(j)}$ = "activation" of unit $i$ in layer $j$.

$\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$.

(Figure: a three-layer network with inputs $x_1, x_2, x_3$, one hidden layer $a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$, and a single output unit.)
$$
\begin{aligned}
a_1^{(2)} &= g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3)\\
a_2^{(2)} &= g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3)\\
a_3^{(2)} &= g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3)\\
h_\Theta(x) &= a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})
\end{aligned}
$$
If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j + 1)$; the $+1$ accounts for the bias unit. For the network above, $s_1 = 3$ and $s_2 = 3$, so $\Theta^{(1)}$ is $3 \times 4$ and $\Theta^{(2)}$ is $1 \times 4$.
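A quick shape check of this rule for the 3-3-1 network above (a sketch; the layer sizes are just read off the figure):

```python
import numpy as np

# Units per layer, excluding bias: s_1 = 3 (input), s_2 = 3 (hidden), s_3 = 1 (output).
s = [3, 3, 1]
thetas = [np.zeros((s[j + 1], s[j] + 1)) for j in range(len(s) - 1)]
for j, theta in enumerate(thetas, start=1):
    print(f"Theta^({j}):", theta.shape)  # (3, 4), then (1, 4)
```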

Model representation II

Forward propagation: Vectorized implementation

Let $z_1^{(2)} = \Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3$, so that $a_1^{(2)} = g(z_1^{(2)})$.

Turn it into a vector!
$$
\mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad
\mathbf{z}^{(2)} = \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix},
$$
$$
z^{(2)} = \Theta^{(1)} x \quad (x = a^{(1)}), \qquad a^{(2)} = g(z^{(2)}).
$$
Add $a_0^{(2)} = 1$; then $z^{(3)} = \Theta^{(2)} a^{(2)}$ and $h_\Theta(x) = a^{(3)} = g(z^{(3)})$.
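A minimal NumPy sketch of this vectorized forward pass (the `forward_propagate` helper and the random example weights are illustrative, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Run forward propagation.

    x      : input vector of shape (n,), without the bias unit.
    thetas : list [Theta^(1), Theta^(2), ...], where Theta^(j) has
             shape (s_{j+1}, s_j + 1), matching the dimension rule above.
    """
    a = x
    for theta in thetas:
        a = np.insert(a, 0, 1.0)  # prepend the bias unit a_0 = 1
        a = sigmoid(theta @ a)    # z^{(j+1)} = Theta^{(j)} a^{(j)}, then a = g(z)
    return a                      # h_Theta(x), the output-layer activation

# Example on the 3-3-1 network above, with random weights.
rng = np.random.default_rng(0)
theta1 = rng.standard_normal((3, 4))  # Theta^(1): 3 x 4
theta2 = rng.standard_normal((1, 4))  # Theta^(2): 1 x 4
print(forward_propagate(np.array([1.0, 0.5, -2.0]), [theta1, theta2]))
```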

Examples and intuitions I

Examples and intuitions II

A network with one hidden layer can compute XNOR by combining simpler logical units: $a_1^{(2)} = x_1 \text{ AND } x_2$, $a_2^{(2)} = (\text{NOT } x_1) \text{ AND } (\text{NOT } x_2)$ (i.e. NOR), and the output is $h_\Theta(x) = a_1^{(2)} \text{ OR } a_2^{(2)}$.
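A minimal NumPy sketch of that XNOR network; the weight values ($-30, 20, 20$ for AND, and so on) are the ones used in the lecture slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Large-magnitude weights push the sigmoid to ~0 or ~1,
# so a single logistic unit behaves like a logic gate.
AND = np.array([-30.0,  20.0,  20.0])  # ~1 only when x1 = x2 = 1
OR  = np.array([-10.0,  20.0,  20.0])  # ~1 when x1 = 1 or x2 = 1
NOR = np.array([ 10.0, -20.0, -20.0])  # ~1 only when x1 = x2 = 0

def unit(theta, x1, x2):
    """One logistic unit with a bias input: g(theta . [1, x1, x2])."""
    return sigmoid(theta @ np.array([1.0, x1, x2]))

def xnor(x1, x2):
    """Hidden layer computes AND and NOR; the output unit ORs them."""
    return unit(OR, unit(AND, x1, x2), unit(NOR, x1, x2))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))  # prints 1, 0, 0, 1
```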

Multi-class classification

Multiple output units: One-vs-all

(Figure: a network whose output layer has four units, one per class, so that $h_\Theta(x) \in \mathbb{R}^4$.)

Training set: $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$,

$$
y^{(i)} \in \left\{
\begin{bmatrix}1\\0\\0\\0\end{bmatrix},
\begin{bmatrix}0\\1\\0\\0\end{bmatrix},
\begin{bmatrix}0\\0\\1\\0\end{bmatrix},
\begin{bmatrix}0\\0\\0\\1\end{bmatrix}
\right\}.
$$
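A small sketch of how these label vectors are used (the `one_hot` and `predict` helpers are hypothetical names, not course code):

```python
import numpy as np

def one_hot(label, num_classes=4):
    """Map an integer class label in 1..K to the vector form of y^(i) above."""
    y = np.zeros(num_classes)
    y[label - 1] = 1.0
    return y

def predict(h):
    """Given the output activations h_Theta(x), pick the strongest class."""
    return int(np.argmax(h)) + 1  # back to a 1-based class label

print(one_hot(3))                                # [0. 0. 1. 0.]
print(predict(np.array([0.1, 0.05, 0.8, 0.2])))  # 3
```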
