$$\text{ReLU}(x) = (x)^+ = \max(0, x)$$
Purpose

With a linear activation function, the model's output is nothing more than a linear combination of the input features $x$. Using a nonlinear activation function lets a neural network approximate arbitrary nonlinear functions, which is what makes neural networks applicable to the many problems that are nonlinear.
$$output^{(1)} = w^{(1)}x + bias^{(1)}$$
$$output^{(2)} = w^{(2)}output^{(1)} + bias^{(2)}$$
$$output^{(2)} = w^{(2)}w^{(1)}x + w^{(2)}bias^{(1)} + bias^{(2)}$$
If we let $w' = w^{(2)}w^{(1)}$ and $bias' = w^{(2)}bias^{(1)} + bias^{(2)}$, then
$$output^{(2)} = w'x + bias'$$
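The collapse of two stacked linear layers into a single linear map can be checked numerically. This is a minimal NumPy sketch with arbitrary toy dimensions; note that the first layer's bias picks up a factor of $w^{(2)}$ when the layers are merged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                                 # input features
w1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1 weight and bias
w2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2 weight and bias

# Two linear layers applied in sequence, with no activation in between.
out2 = w2 @ (w1 @ x + b1) + b2

# They collapse into a single linear layer w' x + bias'.
w_prime = w2 @ w1
b_prime = w2 @ b1 + b2
assert np.allclose(out2, w_prime @ x + b_prime)
```

Inserting any nonlinearity (e.g. ReLU) between the two layers breaks this collapse, which is exactly why activations are needed.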
$$Z = f(X, A) = \mathrm{softmax}\big(\widehat{A}\,\mathrm{ReLU}(\widehat{A}XW^{(0)})\,W^{(1)}\big) \tag{9}$$
$$H^1 = \mathrm{ReLU}(\widehat{A}XW^{(0)})$$
$$output = \widehat{A}H^1W^{(1)}$$
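The whole two-layer forward pass can be sketched in a few lines. This is a minimal NumPy illustration, with toy sizes standing in for Cora's 2708 nodes, 1433 features, 16 hidden units, and 7 classes, and a placeholder uniform matrix in place of the real normalized adjacency $\widehat{A}$.

```python
import numpy as np

def gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN forward pass: Z = softmax(A_hat ReLU(A_hat X W0) W1)."""
    H1 = np.maximum(A_hat @ X @ W0, 0)           # ReLU(A_hat X W0)
    logits = A_hat @ H1 @ W1                     # A_hat H1 W1
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)      # row-wise softmax

rng = np.random.default_rng(0)
n, f, h, c = 6, 5, 4, 3                          # toy sizes for 2708, 1433, 16, 7
A_hat = np.full((n, n), 1.0 / n)                 # placeholder normalized adjacency
X = rng.normal(size=(n, f))
W0 = rng.normal(size=(f, h))
W1 = rng.normal(size=(h, c))

Z = gcn_forward(A_hat, X, W0, W1)
assert Z.shape == (n, c)                         # one class distribution per node
assert np.allclose(Z.sum(axis=1), 1.0)           # each softmax row sums to 1
```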
$$output(\text{layer 1}) = \widehat{A} \cdot support =
\begin{bmatrix}
a_{1,1}x_{1}w_{1}+\dots+a_{1,2708}x_{2708}w_{1} & \cdots & a_{1,1}x_{1}w_{16}+\dots+a_{1,2708}x_{2708}w_{16}\\
a_{2,1}x_{1}w_{1}+\dots+a_{2,2708}x_{2708}w_{1} & \cdots & a_{2,1}x_{1}w_{16}+\dots+a_{2,2708}x_{2708}w_{16}\\
\vdots & \ddots & \vdots\\
a_{2708,1}x_{1}w_{1}+\dots+a_{2708,2708}x_{2708}w_{1} & \cdots & a_{2708,1}x_{1}w_{16}+\dots+a_{2708,2708}x_{2708}w_{16}
\end{bmatrix}$$
Define $a_{i,1}x_{1}w_{j}+\dots+a_{i,2708}x_{2708}w_{j}$ as $h_{i,j}$, where $i\in[1,2708]$ and $j\in[1,16]$.
$$H^1 = output =
\begin{bmatrix}
h_{1,1} & h_{1,2} & \cdots & h_{1,16}\\
h_{2,1} & h_{2,2} & \cdots & h_{2,16}\\
\vdots & \vdots & \ddots & \vdots\\
h_{2708,1} & h_{2708,2} & \cdots & h_{2708,16}
\end{bmatrix} \quad (\text{input to the second layer})$$
$$W^{1} =
\begin{bmatrix}
w_{1,1} & w_{1,2} & \cdots & w_{1,7}\\
w_{2,1} & w_{2,2} & \cdots & w_{2,7}\\
\vdots & \vdots & \ddots & \vdots\\
w_{16,1} & w_{16,2} & \cdots & w_{16,7}
\end{bmatrix} \quad (\text{second-layer weight matrix})$$
$$\widehat{A} =
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,2708}\\
a_{2,1} & a_{2,2} & \cdots & a_{2,2708}\\
\vdots & \vdots & \ddots & \vdots\\
a_{2708,1} & a_{2708,2} & \cdots & a_{2708,2708}
\end{bmatrix} \quad (\text{the normalized symmetric matrix for the Cora dataset})$$
$$support = H^1W^{1} =
\begin{bmatrix}
h_{1,1} & h_{1,2} & \cdots & h_{1,16}\\
h_{2,1} & h_{2,2} & \cdots & h_{2,16}\\
\vdots & \vdots & \ddots & \vdots\\
h_{2708,1} & h_{2708,2} & \cdots & h_{2708,16}
\end{bmatrix}
\begin{bmatrix}
w_{1,1} & w_{1,2} & \cdots & w_{1,7}\\
w_{2,1} & w_{2,2} & \cdots & w_{2,7}\\
\vdots & \vdots & \ddots & \vdots\\
w_{16,1} & w_{16,2} & \cdots & w_{16,7}
\end{bmatrix}$$
$$H^1W^{1} =
\begin{bmatrix}
h_{1,1}w_{1,1}+h_{1,2}w_{2,1}+\cdots+h_{1,16}w_{16,1} & \cdots & h_{1,1}w_{1,7}+h_{1,2}w_{2,7}+\cdots+h_{1,16}w_{16,7}\\
h_{2,1}w_{1,1}+h_{2,2}w_{2,1}+\cdots+h_{2,16}w_{16,1} & \cdots & h_{2,1}w_{1,7}+h_{2,2}w_{2,7}+\cdots+h_{2,16}w_{16,7}\\
\vdots & \ddots & \vdots\\
h_{2708,1}w_{1,1}+h_{2708,2}w_{2,1}+\cdots+h_{2708,16}w_{16,1} & \cdots & h_{2708,1}w_{1,7}+h_{2708,2}w_{2,7}+\cdots+h_{2708,16}w_{16,7}
\end{bmatrix}$$
$$(\text{shape} = [2708, 7])$$
Write $h_{i,1}w_{1,j}+h_{i,2}w_{2,j}+\cdots+h_{i,16}w_{16,j}$ as $\vec{h_{i}}\vec{w_{j}}$, where
$$\vec{h_{i}} = [h_{i,1}, h_{i,2}, \cdots, h_{i,16}],\ i\in[1,2708]; \quad \vec{w_{j}} = [w_{1,j}, w_{2,j}, \cdots, w_{16,j}]^{T},\ j\in[1,7].$$
$$output = \widehat{A} \cdot support =
\begin{bmatrix}
a_{1,1}\vec{h_{1}}\vec{w_{1}}+\dots+a_{1,2708}\vec{h_{2708}}\vec{w_{1}} & \cdots & a_{1,1}\vec{h_{1}}\vec{w_{7}}+\dots+a_{1,2708}\vec{h_{2708}}\vec{w_{7}}\\
a_{2,1}\vec{h_{1}}\vec{w_{1}}+\dots+a_{2,2708}\vec{h_{2708}}\vec{w_{1}} & \cdots & a_{2,1}\vec{h_{1}}\vec{w_{7}}+\dots+a_{2,2708}\vec{h_{2708}}\vec{w_{7}}\\
\vdots & \ddots & \vdots\\
a_{2708,1}\vec{h_{1}}\vec{w_{1}}+\dots+a_{2708,2708}\vec{h_{2708}}\vec{w_{1}} & \cdots & a_{2708,1}\vec{h_{1}}\vec{w_{7}}+\dots+a_{2708,2708}\vec{h_{2708}}\vec{w_{7}}
\end{bmatrix}$$
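The element-wise expansion above can be verified against plain matrix multiplication. A minimal NumPy check with toy sizes (4 nodes, 3 hidden units, 2 classes standing in for 2708, 16, 7): entry $(i, j)$ of $\widehat{A}(H^1W^1)$ equals $\sum_k a_{i,k}\,\vec{h_k}\vec{w_j}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, c = 4, 3, 2                     # toy sizes in place of 2708, 16, 7
A_hat = rng.random((n, n))            # stand-in for the normalized adjacency
H1 = rng.normal(size=(n, h))          # first-layer output
W1 = rng.normal(size=(h, c))          # second-layer weights

support = H1 @ W1
output = A_hat @ support

# Entry (i, j) is a_{i,1} h_1.w_j + ... + a_{i,n} h_n.w_j, as in the expansion.
i, j = 2, 1
manual = sum(A_hat[i, k] * (H1[k] @ W1[:, j]) for k in range(n))
assert np.isclose(output[i, j], manual)
```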
$$\frac{\partial Loss}{\partial w_{1,1}} = \frac{\partial Loss}{\partial \ln Z} \cdot \frac{\partial \ln Z}{\partial H^1} \cdot \frac{\partial H^1}{\partial w_{1,1}}$$
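The chain rule above can be illustrated on a scalar toy case and checked against a finite difference. The three functions below are hypothetical stand-ins for the layer map, the log, and the loss, chosen only so each partial derivative is easy to write down.

```python
import numpy as np

# Scalar stand-ins: H(w) = 3w, lnZ(h) = ln(h), Loss(l) = -l,
# so dLoss/dw = dLoss/dlnZ * dlnZ/dH * dH/dw = (-1) * (1/H) * 3.
def H(w):    return w * 3.0          # dH/dw = 3
def lnZ(h):  return np.log(h)        # dlnZ/dH = 1/h
def loss(l): return -l               # dLoss/dlnZ = -1

w = 2.0
analytic = (-1.0) * (1.0 / H(w)) * 3.0          # chain rule: -1/w = -0.5

eps = 1e-6                                       # central finite difference
numeric = (loss(lnZ(H(w + eps))) - loss(lnZ(H(w - eps)))) / (2 * eps)
assert np.isclose(analytic, numeric, atol=1e-6)  # both give -0.5
```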
$$X\ (\text{input node features}) = [\vec{x}_1[1433], \vec{x}_2[1433], \vec{x}_3[1433], \dots, \vec{x}_{2708}[1433]]^T$$
$$XW^0 = [\vec{x}'_1[16], \vec{x}'_2[16], \vec{x}'_3[16], \dots, \vec{x}'_{2708}[16]]^T$$
$$\vec{x}'_1[0] = x_{1,1}w_{1,1} + x_{1,2}w_{2,1} + \cdots + x_{1,1433}w_{1433,1} + bias$$
$$\widehat{A} = \overline{D}^{-1}\overline{A}, \quad \text{where}\ \widehat{a_{ij}} = \frac{\overline{a_{ij}}}{\overline{d_{ii}}}$$
Here $\overline{A}$ is the result of symmetrizing the original matrix $A$ and then adding the identity matrix (self-loops).
The original adjacency matrix:
$$A =
\begin{bmatrix}
0 & 1 & 1 & 0\\
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0
\end{bmatrix}$$
$$\widetilde{A} =
\begin{bmatrix}
1 & 1 & 1 & 0\\
1 & 1 & 0 & 0\\
0 & 1 & 1 & 0\\
0 & 0 & 1 & 1
\end{bmatrix}, \quad
\widetilde{D} =
\begin{bmatrix}
3 & 0 & 0 & 0\\
0 & 2 & 0 & 0\\
0 & 0 & 2 & 0\\
0 & 0 & 0 & 2
\end{bmatrix}$$
$$\overline{A} =
\begin{bmatrix}
1 & 1 & 1 & 0\\
1 & 1 & 1 & 0\\
1 & 1 & 1 & 1\\
0 & 0 & 1 & 1
\end{bmatrix}, \quad
\overline{D} =
\begin{bmatrix}
3 & 0 & 0 & 0\\
0 & 3 & 0 & 0\\
0 & 0 & 4 & 0\\
0 & 0 & 0 & 2
\end{bmatrix}$$
$$\widehat{A} = \overline{D}^{-1}\overline{A} =
\begin{bmatrix}
1/3 & 1/3 & 1/3 & 0\\
1/3 & 1/3 & 1/3 & 0\\
1/4 & 1/4 & 1/4 & 1/4\\
0 & 0 & 1/2 & 1/2
\end{bmatrix}$$
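The row normalization for this 4-node example can be reproduced directly. A minimal NumPy sketch: symmetrize $A$, add self-loops, then left-multiply by the inverse degree matrix; every row of the result sums to 1.

```python
import numpy as np

# The 4-node example: symmetrize A, add self-loops, then row-normalize.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)

A_bar = np.maximum(A, A.T) + np.eye(4)        # symmetrized A plus identity
D_bar = np.diag(A_bar.sum(axis=1))            # diagonal degree matrix
A_hat = np.linalg.inv(D_bar) @ A_bar          # D^{-1} A (row normalization)

assert np.allclose(A_hat.sum(axis=1), 1.0)    # every row sums to 1
assert np.allclose(A_hat[2], [0.25, 0.25, 0.25, 0.25])  # matches row 3 above
```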
$$\widehat{A} = \widetilde{D}^{-\frac{1}{2}}\widetilde{A}\widetilde{D}^{-\frac{1}{2}}, \quad \text{where}\ \widehat{a_{ij}} = \frac{\widetilde{a_{ij}}}{\sqrt{d_{ii}}\sqrt{d_{jj}}}, \quad \widetilde{A} = A + I_N$$
$$\widehat{A} = \widetilde{D}^{-\frac{1}{2}}\widetilde{A}\widetilde{D}^{-\frac{1}{2}} =
\begin{bmatrix}
\frac{1}{\sqrt{3}} & 0 & 0 & 0\\
0 & \frac{1}{\sqrt{2}} & 0 & 0\\
0 & 0 & \frac{1}{\sqrt{2}} & 0\\
0 & 0 & 0 & \frac{1}{\sqrt{2}}
\end{bmatrix}
\begin{bmatrix}
1 & 1 & 1 & 0\\
1 & 1 & 0 & 0\\
0 & 1 & 1 & 0\\
0 & 0 & 1 & 1
\end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{3}} & 0 & 0 & 0\\
0 & \frac{1}{\sqrt{2}} & 0 & 0\\
0 & 0 & \frac{1}{\sqrt{2}} & 0\\
0 & 0 & 0 & \frac{1}{\sqrt{2}}
\end{bmatrix}$$
$$=
\begin{bmatrix}
\frac{1}{3} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & 0\\
\frac{1}{\sqrt{6}} & \frac{1}{2} & 0 & 0\\
0 & \frac{1}{2} & \frac{1}{2} & 0\\
0 & 0 & \frac{1}{2} & \frac{1}{2}
\end{bmatrix}$$
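The symmetric normalization can likewise be reproduced for the same 4-node example. A minimal NumPy sketch: add self-loops to $A$, take the degrees of $\widetilde{A}$, and sandwich with $\widetilde{D}^{-1/2}$ on both sides.

```python
import numpy as np

# Symmetric normalization of the 4-node example: A_hat = D^{-1/2} (A+I) D^{-1/2}.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)

A_tilde = A + np.eye(4)                        # add self-loops
d = A_tilde.sum(axis=1)                        # degrees: 3, 2, 2, 2
D_inv_sqrt = np.diag(d ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt      # entry ij is a_ij / sqrt(d_i d_j)

assert np.allclose(A_hat[0], [1/3, 1/np.sqrt(6), 1/np.sqrt(6), 0])
assert np.allclose(A_hat[1], [1/np.sqrt(6), 1/2, 0, 0])
```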
$$H^{(i)} = \alpha\big(X^{(i)}W^{(i-1)} + bias^{(i)}\big), \quad 1 \leqslant i$$
where $i$ indexes the hidden layers; $\alpha$ is the activation function; $X^{(i)}$ is the input matrix of layer $i$; $W^{(i-1)}$ is the weight matrix of layer $i$; $bias^{(i)}$ is the bias of layer $i$; and $X^{(i+1)} = H^{(i)}$.
$$H^{(1)} = \alpha(XW^{(0)} + bias^{(1)})$$
$$XW^{0} =
\begin{bmatrix}
x_{1,1}w_{1,1}+x_{1,2}w_{2,1}+\cdots+x_{1,1433}w_{1433,1} & \cdots & x_{1,1}w_{1,16}+x_{1,2}w_{2,16}+\cdots+x_{1,1433}w_{1433,16}\\
x_{2,1}w_{1,1}+x_{2,2}w_{2,1}+\cdots+x_{2,1433}w_{1433,1} & \cdots & x_{2,1}w_{1,16}+x_{2,2}w_{2,16}+\cdots+x_{2,1433}w_{1433,16}\\
\vdots & \ddots & \vdots\\
x_{2708,1}w_{1,1}+x_{2708,2}w_{2,1}+\cdots+x_{2708,1433}w_{1433,1} & \cdots & x_{2708,1}w_{1,16}+x_{2708,2}w_{2,16}+\cdots+x_{2708,1433}w_{1433,16}
\end{bmatrix}$$
$$(\text{shape} = [2708, 16])$$
Write $x_{i,1}w_{1,j}+x_{i,2}w_{2,j}+\cdots+x_{i,1433}w_{1433,j}$ as $\vec{x_{i}}\vec{w_{j}}$, where
$$\vec{x_{i}} = [x_{i,1}, x_{i,2}, \cdots, x_{i,1433}],\ i\in[1,2708]; \quad \vec{w_{j}} = [w_{1,j}, w_{2,j}, \cdots, w_{1433,j}]^{T},\ j\in[1,16].$$
$$bias^{(1)} = [bias_1, bias_2, \dots, bias_{16}]^{T}$$
$$\alpha(XW^{0} + bias^{(1)}) =
\begin{bmatrix}
\alpha(\vec{x_{1}}\vec{w_{1}}+bias_1) & \alpha(\vec{x_{1}}\vec{w_{2}}+bias_2) & \cdots & \alpha(\vec{x_{1}}\vec{w_{15}}+bias_{15}) & \alpha(\vec{x_{1}}\vec{w_{16}}+bias_{16})\\
\alpha(\vec{x_{2}}\vec{w_{1}}+bias_1) & \alpha(\vec{x_{2}}\vec{w_{2}}+bias_2) & \cdots & \alpha(\vec{x_{2}}\vec{w_{15}}+bias_{15}) & \alpha(\vec{x_{2}}\vec{w_{16}}+bias_{16})\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
\alpha(\vec{x_{2708}}\vec{w_{1}}+bias_1) & \alpha(\vec{x_{2708}}\vec{w_{2}}+bias_2) & \cdots & \alpha(\vec{x_{2708}}\vec{w_{15}}+bias_{15}) & \alpha(\vec{x_{2708}}\vec{w_{16}}+bias_{16})
\end{bmatrix}$$
$$(\text{shape} = [2708, 16])$$
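The fully connected layer $\alpha(XW^{0} + bias^{(1)})$ above can be sketched in NumPy. Toy sizes stand in for 2708, 1433, and 16, and tanh is used as an arbitrary choice of $\alpha$; entry $(i, j)$ of the result is $\alpha(\vec{x_i}\vec{w_j} + bias_j)$, exactly as in the matrix.

```python
import numpy as np

def dense(X, W, bias, activation=np.tanh):
    """One fully connected layer: alpha(XW + bias), bias broadcast across rows."""
    return activation(X @ W + bias)

rng = np.random.default_rng(0)
n, f, h = 5, 8, 4                 # toy sizes in place of 2708, 1433, 16
X = rng.normal(size=(n, f))
W0 = rng.normal(size=(f, h))
b1 = rng.normal(size=h)

H1 = dense(X, W0, b1)
assert H1.shape == (n, h)

# Entry (i, j) equals alpha(x_i . w_j + bias_j), matching the matrix above.
i, j = 3, 2
assert np.isclose(H1[i, j], np.tanh(X[i] @ W0[:, j] + b1[j]))
```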
Reference: PyTorch documentation (Chinese mirror): https://pytorch.apachecn.org