Deep Learning Basics - The Chain Rule

flyfish

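Composite functions
Function composition literally means combining functions.
Intuitively, several functions are chained together, with the output of one becoming the input of the next.
In code: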

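#include <iostream>

// f adds 1 to its argument.
int f(int x) {
    return x + 1;
}

// g adds 2 to its argument.
int g(int x) {
    return x + 2;
}

int main() {
    int x = 3;
    // Compose the two functions: g runs first, then f is applied to its result.
    std::cout << f(g(x)) << std::endl;  // f(g(3)) = f(5) = 6
    return 0;
}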

The formal definition of a composite function is far removed from everyday language.
The derivative of the composite function $(f\circ g)(x)$ is:
$$(f \circ g)^{\prime}(x)=f^{\prime}(g(x))\, g^{\prime}(x)$$
The circle in $f\circ g$ can be read in several ways:
f circle g
f round g
f about g
f composed with g
f after g
f following g
f of g
f on g

Alternative notations
$$F(x) = f(g(x)) \qquad F'(x) = f'(g(x))\, g'(x)$$
If $z=f(y)$ and $y=g(x)$, then
$$\frac{dz}{dx}=\frac{dz}{dy} \cdot \frac{dy}{dx}=f^{\prime}(y)\, g^{\prime}(x)=f^{\prime}(g(x))\, g^{\prime}(x)$$
How to decompose
$$\begin{array}{ll} y=\sin 2x & y=\sin u;\ u=2x \\ y=\ln \sin x & y=\ln u;\ u=\sin x \\ y=\ln \cos e^{x} & y=\ln u;\ u=\cos v;\ v=e^{x} \end{array}$$

How to differentiate
Examples
Find the derivative of $f(g(x))=(3x+1)^2$.
Solution:
$$f(g)=g^2,\quad g(x)=3x+1$$
$$f'(g)=2g,\quad g'(x)=3$$
$$\bigl(f(g(x))\bigr)'=2(3x+1)(3)=18x+6$$
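As a quick sanity check, this particular result can also be obtained without the chain rule by expanding the square first and differentiating term by term:

$$(3x+1)^2 = 9x^2+6x+1 \quad\Longrightarrow\quad \frac{d}{dx}\left(9x^2+6x+1\right) = 18x+6$$

Both routes give $18x+6$.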

Find the derivative of $f(g(x))=\sin(x^2+2)$.
Solution:
$$f(g)=\sin(g),\quad g(x)=x^2+2$$
$$f'(g)=\cos(g),\quad g'(x)=2x$$
$$\bigl(f(g(x))\bigr)'=\cos(x^2+2)\cdot 2x=2x\cos(x^2+2)$$

Find the derivative of $H(x)=(2x+1)^{3}$.
Let $f(x)=2x+1$ and $g(x)=x^{3}$, so that $H(x)=g(f(x))$.
$$f^{\prime}(x)=2$$
$$g^{\prime}(x)=3x^{2}$$
$$\begin{aligned} H^{\prime}(x) &=g^{\prime}(f(x))\, f^{\prime}(x) \\ &=g^{\prime}(2x+1)(2) \\ &=3(2x+1)^{2}(2)=6(2x+1)^{2} \end{aligned}$$

Composition of three functions
$$\begin{aligned}(f \circ g \circ h)^{\prime}(a) &=f^{\prime}((g \circ h)(a)) \cdot (g \circ h)^{\prime}(a) \\ &=f^{\prime}((g \circ h)(a)) \cdot g^{\prime}(h(a)) \cdot h^{\prime}(a) \\ &=\left(f^{\prime} \circ g \circ h\right)(a) \cdot \left(g^{\prime} \circ h\right)(a) \cdot h^{\prime}(a) \end{aligned}$$

In shorthand, with $y=f(u)$, $u=g(v)$ and $v=h(x)$:
$$\frac{dy}{dx}=\frac{dy}{du} \cdot \frac{du}{dv} \cdot \frac{dv}{dx}$$

The chain rule is easier to prove with nonstandard calculus.

If $y=f(x)$ and $x=g(t)$, then for a nonzero infinitesimal $\Delta t$:
$$\Delta x=g(t+\Delta t)-g(t),\qquad \Delta y=f(x+\Delta x)-f(x)$$
so, provided $\Delta x \neq 0$,
$$\frac{\Delta y}{\Delta t}=\frac{\Delta y}{\Delta x} \cdot \frac{\Delta x}{\Delta t}$$
Taking the standard part of both sides gives
$$\frac{dy}{dt}=\frac{dy}{dx} \cdot \frac{dx}{dt}$$

Nonstandard calculus is as simple as adding one more function call at the end of a computer program.
st denotes the standard part function; some books write std instead.
$$f^{\prime}(x)=\operatorname{st}\left(\frac{f(x+h)-f(x)}{h}\right)$$
where $h$ is a nonzero infinitesimal.
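As a small worked example of this definition, take $f(x)=x^{2}$ (a function chosen here only for illustration) and a nonzero infinitesimal $h$:

$$f^{\prime}(x)=\operatorname{st}\left(\frac{(x+h)^{2}-x^{2}}{h}\right)=\operatorname{st}\left(\frac{2xh+h^{2}}{h}\right)=\operatorname{st}(2x+h)=2x$$

The quotient is simplified with ordinary algebra, and the standard part function is applied only at the very end to discard the infinitesimal remainder $h$.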
