对微分的简单理解

目录:

      • 一、一元函数的可微性
      • 二、偏导数
      • 三、二元函数的可微性
      • 四、 n n n 元函数的可微性
      • 五、向量函数的导数
      • 作者留言

研究生复试现场:

 T:“这位同学,请你解释一下微分是什么。”
 S:“啊,老师,微分的定义是~~~~”
 T:“嗯,定义背得不错,不过你能说下自己对微分的理解吗?”
 S:“这个,呃,这个… …”
 T:“好了,回去等通知吧!”

那微分到底是什么意思呢? 一言以蔽之,曰"线性近似".

一、一元函数的可微性

  设函数 f ( x ) \small f(x) f(x) x 0 x_0 x0 的某邻域 U ( x 0 ) \small U(x_0) U(x0) 内有定义,若对 U ( x 0 ) \small U(x_0) U(x0) 中的任意一点 x = x 0 + Δ x x=x_0+\Delta x x=x0+Δx,函数增量 Δ f \Delta f Δf 都可以表示为 Δ f = f ( x 0 + Δ x ) − f ( x 0 ) = A Δ x + o ( Δ x ) \Delta f=f(x_0+\Delta x)-f(x_0)=A\Delta x+o(\Delta x) Δf=f(x0+Δx)f(x0)=AΔx+o(Δx)其中 A \small A A 为常数,仅与 x 0 x_0 x0 有关,则称函数 f f f x 0 x_0 x0可微.

另外,还可以求出 A = lim ⁡ Δ x → 0 f ( x 0 + Δ x ) − f ( x 0 ) Δ x = f ′ ( x 0 ) \displaystyle A=\lim_{\Delta x \to0}\frac{f(x_0+\Delta x)-f(x_0)}{\Delta x}=f'(x_0) A=Δx0limΔxf(x0+Δx)f(x0)=f(x0).

可以看到,可微是函数 局部性质 的一个刻画.

二、偏导数

  设函数 f ( x ⃗ )    ( x ⃗ = ( x 1 , x 2 , ⋯   , x n ) ) f(\vec{x}) \,\,(\vec{x}=(x_1,x_2,\cdots,x_n)) f(x )(x =(x1,x2,,xn)) 在点 P 0 = ( x 10 , x 20 , ⋯   , x n 0 ) \small P_0=(x_{10},x_{20},\cdots,x_{n0}) P0=(x10,x20,,xn0) 处有定义. 将 x 1 x_1 x1 以外的其他变量当作常数,令 g ( x 1 ) = f ( x 1 , x 20 , ⋯   , x n 0 ) g(x_1)=f(x_{1},x_{20},\cdots,x_{n0}) g(x1)=f(x1,x20,,xn0),则 g ( x 1 ) g(x_1) g(x1) 是关于 x 1 x_1 x1 的一元函数. 若 g ( x 1 ) g(x_1) g(x1) x 10 x_{10} x10 处的导数 g ′ ( x 10 ) g'(x_{10}) g(x10) 存在,则称其为函数 f f f 在点 P 0 \small P_0 P0 处关于 x 1 x_1 x1 的偏导数,记作 f x 1 ( P 0 )    or    ∂ f ∂ x 1 ∣ P 0 f_{x_1}(P_0) \,\,\textmd{or}\,\, \frac{\partial f}{\partial x_1} \Big|_{P_0} fx1(P0)orx1fP0

三、二元函数的可微性

  设函数 f ( x , y ) f(x,y) f(x,y) 在点 P 0 = ( x 0 , y 0 ) \small P_0=(x_0,y_0) P0=(x0,y0) 的某邻域 U ( P 0 ) \small U(P_0) U(P0) 内有定义,若对 U ( P 0 ) \small U(P_0) U(P0) 中的任意一点 P = ( x , y ) = ( x 0 + Δ x , y 0 + Δ y ) \small P=(x,y)=(x_0+\Delta x,y_0+\Delta y) P=(x,y)=(x0+Δx,y0+Δy),函数增量 Δ f \Delta f Δf 都可以表示为 Δ f = f ( x 0 + Δ x , y 0 + Δ y ) − f ( x 0 , y 0 ) = A Δ x + B Δ y + o ( ( Δ x ) 2 + ( Δ y ) 2   ) \begin{aligned}\Delta f&=f(x_0+\Delta x,y_0+\Delta y)-f(x_0,y_0)\\&=A\Delta x+ B\Delta y+o\big(\sqrt{(\Delta x)^2+(\Delta y)^2}\,\big)\end{aligned} Δf=f(x0+Δx,y0+Δy)f(x0,y0)=AΔx+BΔy+o((Δx)2+(Δy)2 )其中 A , B \small A,B A,B 为常数,仅与 P 0 \small P_0 P0 有关,则称函数 f f f P 0 \small P_0 P0可微.

  一元函数情形中,函数增量 Δ f \Delta f Δf 可以表示为 Δ f = f ( x 0 + Δ x ) − f ( x 0 ) = A Δ x + o ( Δ x ) \Delta f=f(x_0+\Delta x)-f(x_0)=A\Delta x+o(\Delta x) Δf=f(x0+Δx)f(x0)=AΔx+o(Δx)并求得 A = f ′ ( x 0 ) A=f'(x_0) A=f(x0).

  那么在二元函数中 A , B \small A,B A,B又是什么呢?直觉性较强的读者应该能够猜到 A = f x ( x 0 , y 0 ) , B = f y ( x 0 , y 0 ) A=f_x(x_0,y_0), B=f_y(x_0,y_0) A=fx(x0,y0),B=fy(x0,y0)(不然我为什么要先介绍偏导数呢?嘿嘿 )

下面来证明这个直觉是对的.
1.    1.\,\, 1.证明 o ( ( Δ x ) 2 + ( Δ y ) 2   ) = α Δ x + β Δ y o\big(\sqrt{(\Delta x)^2+(\Delta y)^2}\,\big)=\alpha\Delta x+\beta\Delta y o((Δx)2+(Δy)2 )=αΔx+βΔy,其中 α , β \alpha,\beta α,β ( Δ x , Δ y ) (\Delta x,\Delta y) (Δx,Δy) 的函数且满足 lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) α = lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) β = 0 \lim_{(\Delta x,\Delta y)\to(0,0)}\alpha=\lim_{(\Delta x,\Delta y)\to(0,0)}\beta=0 (Δx,Δy)(0,0)limα=(Δx,Δy)(0,0)limβ=0证明:
( 1 ) ⇐ : (1) \Leftarrow: (1): α Δ x + β Δ y = o ( ( Δ x ) 2 + ( Δ y ) 2   ) \alpha\Delta x+\beta\Delta y=o\big(\sqrt{(\Delta x)^2+(\Delta y)^2}\,\big) αΔx+βΔy=o((Δx)2+(Δy)2 ) ∣ α Δ x + β Δ y ( Δ x ) 2 + ( Δ y ) 2 ∣ = ∣ α Δ x ( Δ x ) 2 + ( Δ y ) 2 + β Δ y ( Δ x ) 2 + ( Δ y ) 2 ∣ ≤ ∣ α ∣ ∣ Δ x ( Δ x ) 2 + ( Δ y ) 2 ∣ + ∣ β ∣ ∣ Δ y ( Δ x ) 2 + ( Δ y ) 2 ∣ ≤ ∣ α ∣ + ∣ β ∣ \begin{aligned} \Big| \frac{\alpha\Delta x+\beta\Delta y}{\sqrt{(\Delta x)^2+(\Delta y)^2}}\Big|&=\Big| \alpha\frac{\Delta x}{\sqrt{(\Delta x)^2+(\Delta y)^2}}+\beta\frac{\Delta y}{\sqrt{(\Delta x)^2+(\Delta y)^2}}\Big| \\& \leq \vert \alpha\vert\Big|\frac{\Delta x}{\sqrt{(\Delta x)^2+(\Delta y)^2}}\Big|+\vert \beta\vert\Big|\frac{\Delta y}{\sqrt{(\Delta x)^2+(\Delta y)^2}}\Big| \\& \leq\vert \alpha \vert+\vert \beta \vert \end{aligned} (Δx)2+(Δy)2 αΔx+βΔy=α(Δx)2+(Δy)2 Δx+β(Δx)2+(Δy)2 Δyα(Δx)2+(Δy)2 Δx+β(Δx)2+(Δy)2 Δyα+β两边取极限,得 lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) α Δ x + β Δ y ( Δ x ) 2 + ( Δ y ) 2 = 0 \lim_{(\Delta x,\Delta y)\to(0,0)} \frac{\alpha\Delta x+\beta\Delta y}{\sqrt{(\Delta x)^2+(\Delta y)^2}}=0 (Δx,Δy)(0,0)lim(Δx)2+(Δy)2 αΔx+βΔy=0因此, α Δ x + β Δ y = o ( ( Δ x ) 2 + ( Δ y ) 2   ) \alpha\Delta x+\beta\Delta y=o\big(\sqrt{(\Delta x)^2+(\Delta y)^2}\,\big) αΔx+βΔy=o((Δx)2+(Δy)2 ).
( 2 ) ⇒ : (2) \Rightarrow: (2): o ( ( Δ x ) 2 + ( Δ y ) 2   ) = α Δ x + β Δ y o\big(\sqrt{(\Delta x)^2+(\Delta y)^2}\,\big)=\alpha\Delta x+\beta\Delta y o((Δx)2+(Δy)2 )=αΔx+βΔy,其中 lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) α = lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) β = 0 \lim_{(\Delta x,\Delta y)\to(0,0)}\alpha=\lim_{(\Delta x,\Delta y)\to(0,0)}\beta=0 (Δx,Δy)(0,0)limα=(Δx,Δy)(0,0)limβ=0  由定义易知, o ( ( Δ x ) 2 + ( Δ y ) 2   ) = ε ( Δ x ) 2 + ( Δ y ) 2 o\big(\sqrt{(\Delta x)^2+(\Delta y)^2}\,\big)=\varepsilon\sqrt{(\Delta x)^2+(\Delta y)^2} o((Δx)2+(Δy)2 )=ε(Δx)2+(Δy)2 ,其中 ε \varepsilon ε ( Δ x , Δ y ) (\Delta x,\Delta y) (Δx,Δy) 的函数且满足 lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) ε = 0 \displaystyle \lim_{(\Delta x,\Delta y)\to(0,0)}\varepsilon=0 (Δx,Δy)(0,0)limε=0.    ε ( Δ x ) 2 + ( Δ y ) 2 =    ε ( Δ x ) 2 + ( Δ y ) 2 ( Δ x ) 2 + ( Δ y ) 2 =   ε Δ x ( Δ x ) 2 + ( Δ y ) 2 Δ x + ε Δ y ( Δ x ) 2 + ( Δ y ) 2 Δ y \begin{aligned}&\,\,\varepsilon\sqrt{(\Delta x)^2+(\Delta y)^2}\\=&\,\,\varepsilon\frac{(\Delta x)^2+(\Delta y)^2}{\sqrt{(\Delta x)^2+(\Delta y)^2}}\\=&\,\frac{\varepsilon\Delta x}{\sqrt{(\Delta x)^2+(\Delta y)^2}}\Delta x+\frac{\varepsilon\Delta y}{\sqrt{(\Delta x)^2+(\Delta y)^2}}\Delta y \end{aligned} ==ε(Δx)2+(Δy)2 ε(Δx)2+(Δy)2 (Δx)2+(Δy)2(Δx)2+(Δy)2 εΔxΔx+(Δx)2+(Δy)2 εΔyΔy α = ε Δ x ( Δ x ) 2 + ( Δ y ) 2 ,   β = ε Δ y ( Δ x ) 2 + ( Δ y ) 2 \displaystyle \alpha=\frac{\varepsilon\Delta x}{\sqrt{(\Delta x)^2+(\Delta y)^2}},\,\beta=\frac{\varepsilon\Delta y}{\sqrt{(\Delta x)^2+(\Delta y)^2}} α=(Δx)2+(Δy)2 εΔx,β=(Δx)2+(Δy)2 εΔy,则 ε ( Δ x ) 2 + ( Δ y ) 2 = α Δ x + β Δ y \varepsilon\sqrt{(\Delta x)^2+(\Delta y)^2}=\alpha\Delta x+\beta\Delta y ε(Δx)2+(Δy)2 =αΔx+βΔy ∣ α ∣ = ∣ ε ∣ ∣ Δ x ( Δ x ) 2 + ( Δ y ) 2 ∣ ≤ ∣ ε ∣ \vert\alpha\vert=\vert \varepsilon\vert \Big| \frac{\Delta x}{\sqrt{(\Delta x)^2+(\Delta y)^2}}\Big| \leq \vert \varepsilon\vert α=ε(Δx)2+(Δy)2 Δxε两边取极限,得 lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) α = 0 \displaystyle \lim_{(\Delta x,\Delta y)\to(0,0)}\alpha=0 (Δx,Δy)(0,0)limα=0. 同理可证 lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) β = 0 \displaystyle \lim_{(\Delta x,\Delta y)\to(0,0)}\beta=0 (Δx,Δy)(0,0)limβ=0.
因此, o ( ( Δ x ) 2 + ( Δ y ) 2   ) = α Δ x + β Δ y o\big(\sqrt{(\Delta x)^2+(\Delta y)^2}\,\big)=\alpha\Delta x+\beta\Delta y o((Δx)2+(Δy)2 )=αΔx+βΔy.

2.    2.\,\, 2.由上述结论,函数增量 Δ f \Delta f Δf 可以表示为
Δ f = A Δ x + B Δ y + o ( ( Δ x ) 2 + ( Δ y ) 2   ) = A Δ x + B Δ y + α Δ x + β Δ y \begin{aligned}\Delta f&=A\Delta x+B\Delta y+o\big(\sqrt{(\Delta x)^2+(\Delta y)^2}\,\big)\\&=A\Delta x+B\Delta y+\alpha\Delta x+\beta\Delta y\end{aligned} Δf=AΔx+BΔy+o((Δx)2+(Δy)2 )=AΔx+BΔy+αΔx+βΔy其中 lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) α = lim ⁡ ( Δ x , Δ y ) → ( 0 , 0 ) β = 0 \displaystyle \lim_{(\Delta x,\Delta y)\to(0,0)}\alpha=\lim_{(\Delta x,\Delta y)\to(0,0)}\beta=0 (Δx,Δy)(0,0)limα=(Δx,Δy)(0,0)limβ=0.
Δ y = 0 \Delta y=0 Δy=0,则 Δ f = A Δ x + α Δ x = A Δ x + o ( Δ x ) \Delta f=A\Delta x+\alpha\Delta x=A\Delta x+o(\Delta x) Δf=AΔx+αΔx=AΔx+o(Δx).
由一元函数可微性及偏导数定义,可得 A = f x ( x 0 , y 0 ) A=f_x(x_0,y_0) A=fx(x0,y0). 同理 B = f y ( x 0 , y 0 ) B=f_y(x_0,y_0) B=fy(x0,y0).

于是 Δ f = f x ( x 0 , y 0 ) Δ x + f y ( x 0 , y 0 ) Δ y + o ( ( Δ x ) 2 + ( Δ y ) 2   ) = c ⃗   Δ x ⃗ + o ( ∥ Δ x ⃗ ∥ ) \begin{aligned}\Delta f&=f_x(x_0,y_0)\Delta x+f_y(x_0,y_0)\Delta y+o\big(\sqrt{(\Delta x)^2+(\Delta y)^2}\,\big)\\&=\vec{c}\,\Delta \vec{x}+o(\Vert \Delta \vec{x}\Vert)\end{aligned} Δf=fx(x0,y0)Δx+fy(x0,y0)Δy+o((Δx)2+(Δy)2 )=c Δx +o(Δx )其中, c ⃗ = ( f x ( x 0 , y 0 ) , f y ( x 0 , y 0 ) ) , Δ x ⃗ = ( Δ x , Δ y ) T , ∥ Δ x ⃗ ∥ = ( Δ x ) 2 + ( Δ y ) 2 \vec{c}=(f_x(x_0,y_0),f_y(x_0,y_0)),\Delta \vec{x}=(\Delta x,\Delta y)^T,\Vert \Delta \vec{x}\Vert=\sqrt{(\Delta x)^2+(\Delta y)^2} c =(fx(x0,y0),fy(x0,y0)),Δx =(Δx,Δy)T,Δx =(Δx)2+(Δy)2 .

四、 n n n 元函数的可微性

新酒装旧壶
  设 n n n 元函数 f ( x ⃗ )   ( x ⃗ = ( x 1 , x 2 , ⋯   , x n ) T ) f(\vec{x})\,(\vec{x}=(x_1,x_2,\cdots,x_n)^T) f(x )(x =(x1,x2,,xn)T) 在点 x ⃗ 0 = ( x 10 , x 20 , ⋯   , x n 0 ) T \vec{x}_0=(x_{10},x_{20},\cdots,x_{n0})^T x 0=(x10,x20,,xn0)T 的某邻域 U ( x ⃗ 0 ) \small U(\vec{x}_0) U(x 0) 内有定义,若对 U ( x ⃗ 0 ) \small U(\vec{x}_0) U(x 0) 中的任意一点 x ⃗ = x ⃗ 0 + Δ x ⃗    ( Δ x ⃗ = ( Δ x 1 , Δ x 2 , ⋯   , Δ x n ) T ) \vec{x}=\vec{x}_0+\Delta \vec{x}\,\, (\Delta \vec{x}=(\Delta x_1,\Delta x_2,\cdots,\Delta x_n)^T) x =x 0+Δx (Δx =(Δx1,Δx2,,Δxn)T),函数增量 Δ f \Delta f Δf 都可以表示为 Δ f = f ( x ⃗ 0 + Δ x ⃗ ) − f ( x ⃗ 0 ) = c 1 Δ x 1 + c 2 Δ x 2 + ⋯ + c n Δ x n + o ( ∥ Δ x ⃗ ∥ ) \begin{aligned}\Delta f&=f(\vec{x}_0+\Delta \vec{x})-f(\vec{x}_0)\\&=c_1\Delta x_1+ c_2\Delta x_2+\cdots+c_n\Delta x_n+o(\Vert\Delta \vec{x}\Vert)\end{aligned} Δf=f(x 0+Δx )f(x 0)=c1Δx1+c2Δx2++cnΔxn+o(Δx )其中 c 1 , c 2 , ⋯   , c n c_1,c_2,\cdots,c_n c1,c2,,cn 为常数,仅与 x ⃗ 0 \vec{x}_0 x 0 有关,则称函数 f f f x ⃗ 0 \vec{x}_0 x 0可微.

同理可证, o ( ∥ Δ x ⃗ ∥ ) = ε 1 Δ x 1 + ε 2 Δ x 2 + ⋯ + ε n Δ x n o(\Vert\Delta \vec{x}\Vert)=\varepsilon_1\Delta x_1+\varepsilon_2\Delta x_2+\cdots+\varepsilon_n\Delta x_n o(Δx )=ε1Δx1+ε2Δx2++εnΔxn,其中 lim ⁡ Δ x ⃗ → 0 ⃗ ε i = 0 \displaystyle \lim_{\Delta \vec{x}\to\vec{0}}\varepsilon_i=0 Δx 0 limεi=0.
同理亦可证, c i = f x i ( x ⃗ 0 ) c_i=f_{x_i}(\vec{x}_0 ) ci=fxi(x 0).

c = ( c 1 , c 2 , ⋯   , c n ) = ( f x 1 ( x ⃗ 0 ) , f x 2 ( x ⃗ 0 ) , ⋯   , f x n ( x ⃗ 0 ) ) c=(c_1,c_2,\cdots,c_n)=(f_{x_1}(\vec{x}_0 ),f_{x_2}(\vec{x}_0 ),\cdots,f_{x_n}(\vec{x}_0 )) c=(c1,c2,,cn)=(fx1(x 0),fx2(x 0),,fxn(x 0))

Δ f = c   Δ x ⃗ + o ( ∥ Δ x ⃗ ∥ ) \Delta f=c\,\Delta \vec{x}+o(\Vert\Delta \vec{x}\Vert) Δf=cΔx +o(Δx ),其中 d y = c   Δ x ⃗ dy=c\,\Delta \vec{x} dy=cΔx 称为全微分.

由此推知 n n n 元函数的导数为 f ′ ( x ⃗ ) = ( f x 1 ( x ⃗ ) , f x 2 ( x ⃗ ) , ⋯   , f x n ( x ⃗ ) ) = ( ∂ f ∂ x 1 , ∂ f ∂ x 2 , ⋯   , ∂ f ∂ x n ) f'(\vec{x})=(f_{x_1}(\vec{x}),f_{x_2}(\vec{x}),\cdots,f_{x_n}(\vec{x}))=(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}) f(x )=(fx1(x ),fx2(x ),,fxn(x ))=(x1f,x2f,,xnf)咦?这不就是梯度吗?或者说是梯度的转置

那,微分怎么就是线性近似了呢?
  不急不急,先来看下 n n n 维空间中的超平面,定义为 { x ⃗   ∣   c ⃗   x ⃗ = d , c ⃗   ( c ⃗ ≠ 0 ⃗ ) \lbrace \vec{x}\,| \,\vec{c}\,\vec{x}=d,\vec{c}\,(\vec{c}\neq \vec{0}) {x c x =d,c (c =0 ) 为"平面"的法向量(行向量), d d d 为常数 } ⊂ R n \small \rbrace\subset R^n }Rn.
n = 2 n=2 n=2 时,是平面 R 2 \small R^2 R2 中的一条直线;
n = 3 n=3 n=3 时,是三维空间 R 3 \small R^3 R3 中的一个平面;
n > 3 n>3 n>3 时,便是 R n \small R^n Rn 中的超平面.

  有了微分,我们便可以据此给出 x ⃗ 0 \vec{x}_0 x 0 附近 x ⃗ \vec{x} x 处函数的估计值,即 f ^ ( x ⃗ ) = f ( x ⃗ 0 ) + d y = f ( x ⃗ 0 ) + f ′ ( x ⃗ 0 ) ( x ⃗ − x ⃗ 0 ) \hat{f}(\vec{x})=f(\vec{x}_0)+dy=f(\vec{x}_0)+f'(\vec{x}_0)(\vec{x}-\vec{x}_0) f^(x )=f(x 0)+dy=f(x 0)+f(x 0)(x x 0).

c ⃗ = ( f ′ ( x ⃗ 0 ) , − 1 ) \vec{c}=(f'(\vec{x}_0),-1) c =(f(x 0),1),令 x ⃗ n e w = ( x ⃗ T , f ^ ( x ⃗ ) ) T , x ⃗ o l d = ( x ⃗ 0 T , f ( x ⃗ 0 ) ) T \vec{x}_{new}=(\vec{x}^T,\hat{f}(\vec{x}))^T,\vec{x}_{old}=(\vec{x}_0^T,f(\vec{x}_0))^T x new=(x T,f^(x ))T,x old=(x 0T,f(x 0))T

  c ⃗   x ⃗ n e w = c ⃗   x ⃗ o l d = d 0 \,\vec{c}\,\vec{x}_{new}=\vec{c}\,\vec{x}_{old}=d_0 c x new=c x old=d0.

因此,近似点 x ⃗ n e w \vec{x}_{new} x new 与已知点 x ⃗ o l d \vec{x}_{old} x old 位于同一超平面上,这便是线性近似.

 一元函数,近似点位于切线上;
 二元函数,近似点位于切平面上;
n n n 元函数,近似点位于 R n + 1 R^{n+1} Rn+1 中的“切超平面”上.

对微分的简单理解_第1张图片 对微分的简单理解_第2张图片


五、向量函数的导数

(这部分可以跳过,不影响对“微分”的理解,写出来是为了保证文章的完整性)

换汤不换药
  什么是向量函数?简单来讲,就是函数"值"不再是数(实数或复数),而是向量. 具体定义如下:
  设 X ⊂ R n ,   Y ⊂ R m \small X\subset R^n,\,Y\subset R^m XRn,YRm,若对任意的 x ⃗ ∈ X \vec{x}\in X x X,都存在唯一的 y ⃗ ∈ Y \vec{y}\in Y y Y 与之对应,则将此映射称为向量函数,记作 f :   X → Y x ↦ y \begin{aligned}f:\,X&\to Y\\ x&\mapsto y\end{aligned} f:XxYy

嘶,向量到向量的映射,啊,这让人怎么想象?
  让我来给您解释一下
:记 f ( x ⃗ ) = ( f 1 ( x ⃗ ) , f 2 ( x ⃗ ) , ⋯   , f m ( x ⃗ ) ) T f(\vec{x})=(f_1(\vec{x}),f_2(\vec{x}),\cdots,f_m(\vec{x}))^T f(x )=(f1(x ),f2(x ),,fm(x ))T,则 f i ( x ⃗ ) f_i(\vec{x}) fi(x ) n n n 元函数. 于是,向量函数便可以理解为 m m m n n n 元函数拼成的东东.

emm, n   n\, n元函数的导数是个向量,那向量函数的导数该是什么?
数学直觉动起来,没错,答案就是——矩阵,下面来验证这一猜想.

  先来回顾下多元函数(可微时)函数增量的表达式
Δ f i = f i ( x ⃗ 0 + Δ x ⃗ ) − f i ( x ⃗ 0 ) = c i Δ x ⃗ + o ( ∥ Δ x ⃗ ∥ ) \Delta f_i=f_i(\vec{x}_0+\Delta \vec{x})-f_i(\vec{x}_0)=c_{i}\Delta \vec{x}+o(\Vert\Delta \vec{x}\Vert) Δfi=fi(x 0+Δx )fi(x 0)=ciΔx +o(Δx )其中 c i = f i ′ ( x ⃗ 0 ) c_i=f'_i(\vec{x}_0) ci=fi(x 0). 把这些 Δ f i \Delta f_i Δfi 拼起来, Δ f = ( Δ f 1 Δ f 2 ⋮ Δ f m ) = ( c 1 c 2 ⋮ c m ) Δ x ⃗ + o ⃗ ( ∥ Δ x ⃗ ∥ ) = A Δ x ⃗ + o ⃗ ( ∥ Δ x ⃗ ∥ ) \Delta f=\begin{pmatrix}\Delta f_1 \\ \Delta f_2 \\ \vdots \\ \Delta f_m\\\end{pmatrix}=\begin{pmatrix}c_1 \\ c_2 \\ \vdots \\c_m\\\end{pmatrix}\Delta \vec{x}+\vec{o}(\Vert\Delta \vec{x}\Vert)=A\Delta \vec{x}+\vec{o}(\Vert\Delta \vec{x}\Vert) Δf=Δf1Δf2Δfm=c1c2cmΔx +o (Δx )=AΔx +o (Δx )其中 A = ( c 1 c 2 ⋮ c m ) = ( f 1 ′ ( x ⃗ 0 ) f 2 ′ ( x ⃗ 0 ) ⋮ f m ′ ( x ⃗ 0 ) ) = ( ∂ f 1 ∂ x 1 ∂ f 1 ∂ x 2 ⋯ ∂ f 1 ∂ x n ∂ f 2 ∂ x 1 ∂ f 2 ∂ x 2 ⋯ ∂ f 2 ∂ x n ⋮ ⋮ ⋱ ⋮ ∂ f m ∂ x 1 ∂ f m ∂ x 2 ⋯ ∂ f m ∂ x n ) A=\begin{pmatrix}c_1\\c_2\\\vdots\\c_m\end{pmatrix}= \begin{pmatrix}f'_1(\vec{x}_0)\\f'_2(\vec{x}_0)\\\vdots\\f'_m(\vec{x}_0)\end{pmatrix}= \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n}\\ \end{pmatrix} A=c1c2cm=f1(x 0)f2(x 0)fm(x 0)=x1f1x1f2x1fmx2f1x2f2x2fmxnf1xnf2xnfm A \small A A 就是向量函数 f f f 的导数,有时也被称为 f f f J a c o b i \small Jacobi Jacobi 矩阵,记作 J f ( x ⃗ 0 ) \small J_f(\vec{x}_0) Jf(x 0).

  巧的是,这样得到的导数同样满足求导的链式法则.

补充两个例子

1.   f ( x ) = A x 1. \,f(x)=Ax 1.f(x)=Ax A A A m × n m\times n m×n 阶矩阵.
f ( x ) = ( a 11 a 12 ⋯ a 1 n a 21 a 22 ⋯ a 2 n ⋮ ⋮ ⋱ ⋮ a m 1 a m 2 ⋯ a m n ) ( x 1 x 2 ⋮ x n ) = ( a 11 x 1 + a 12 x 2 + ⋯ + a 1 n x n a 21 x 1 + a 22 x 2 + ⋯ + a 2 n x n ⋮ a m 1 x 1 + a m 2 x 2 + ⋯ + a m n x n ) = ( f 1 f 2 ⋮ f n ) f(x)= \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix}= \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\ \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n\end{pmatrix}= \begin{pmatrix}f_1\\f_2\\\vdots\\f_n\end{pmatrix} f(x)=a11a21am1a12a22am2a1na2namnx1x2xn=a11x1+a12x2++a1nxna21x1+a22x2++a2nxnam1x1+am2x2++amnxn=f1f2fn

f ′ ( x ) = ( ∂ f 1 ∂ x 1 ∂ f 1 ∂ x 2 ⋯ ∂ f 1 ∂ x n ∂ f 2 ∂ x 1 ∂ f 2 ∂ x 2 ⋯ ∂ f 2 ∂ x n ⋮ ⋮ ⋱ ⋮ ∂ f m ∂ x 1 ∂ f m ∂ x 2 ⋯ ∂ f m ∂ x n ) = ( a 11 a 12 ⋯ a 1 n a 21 a 22 ⋯ a 2 n ⋮ ⋮ ⋱ ⋮ a m 1 a m 2 ⋯ a m n ) = A f'(x)= \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n}\\ \end{pmatrix}= \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}=A f(x)=x1f1x1f2x1fmx2f1x2f2x2fmxnf1xnf2xnfm=a11a21am1a12a22am2a1na2namn=A
2.   f ( x ) = x T A x 2. \,f(x)=x^TAx 2.f(x)=xTAx A A A n n n 阶方阵.
f ( x ) = a 11 x 1 x 1 + a 12 x 1 x 2 + ⋯ + a 1 n x 1 x n + a 21 x 2 x 1 + a 22 x 2 x 2 + ⋯ + a 2 n x 2 x n + ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ + a n 1 x n x 1 + a n 2 x n x 2 + ⋯ + a n n x n x n \begin{aligned} f(x)&=a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n\\ &+a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n\\ &+\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\\ &+a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n \end{aligned} f(x)=a11x1x1+a12x1x2++a1nx1xn+a21x2x1+a22x2x2++a2nx2xn++an1xnx1+an2xnx2++annxnxn
∂ f ∂ x 1 = 2 a 11 x 1 + a 12 x 2 + ⋯ + a 1 n x n + a 21 x 2 + a 31 x 3 + ⋯ + a n 1 x n = ( a 11 + a 11 ) x 1 + ( a 12 + a 21 ) x 2 + ⋯ + ( a 1 n + a n 1 ) x n = ( a 11 + a 11 a 12 + a 21 ⋯ a 1 n + a n 1 ) ( x 1 x 2 ⋮ x n ) \begin{aligned} \frac{\partial f}{\partial x_1}&=2a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n +a_{21}x_2+a_{31}x_3+\cdots+a_{n1}x_n\\&=(a_{11}+a_{11})x_1+(a_{12}+a_{21})x_2+\cdots+(a_{1n}+a_{n1})x_n\\&= \begin{pmatrix} a_{11}+a_{11} & a_{12}+a_{21} & \cdots & a_{1n}+a_{n1} \end{pmatrix}\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix} \end{aligned} x1f=2a11x1+a12x2++a1nxn+a21x2+a31x3++an1xn=(a11+a11)x1+(a12+a21)x2++(a1n+an1)xn=(a11+a11a12+a21a1n+an1)x1x2xn
∇ f = ( ∂ f ∂ x 1 ∂ f ∂ x 2 ⋮ ∂ f ∂ x n ) = ( a 11 + a 11 a 12 + a 21 ⋯ a 1 n + a n 1 a 21 + a 12 a 22 + a 22 ⋯ a 2 n + a n 2 ⋮ ⋮ ⋱ ⋮ a n 1 + a 1 n a n 2 + a 2 n ⋯ a n n + a n n ) ( x 1 x 2 ⋮ x n ) = ( A + A T ) x \nabla f= \begin{pmatrix}\frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n}\end{pmatrix}= \begin{pmatrix} a_{11}+a_{11} & a_{12}+a_{21} & \cdots & a_{1n}+a_{n1}\\ a_{21}+a_{12} & a_{22}+a_{22} & \cdots & a_{2n}+a_{n2}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1}+a_{1n} & a_{n2}+a_{2n} & \cdots & a_{nn}+a_{nn} \end{pmatrix}\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix}=(A+A^T)x f=x1fx2fxnf=a11+a11a21+a12an1+a1na12+a21a22+a22an2+a2na1n+an1a2n+an2ann+annx1x2xn=(A+AT)x  特别地,若 A = A T \small A=A^T A=AT,则 ∇ f = 2 A x \small \nabla f=2Ax f=2Ax.

作者留言

  线性近似其实是"一次多项式"近似,那更高次数的多项式近似是什么?

  答案是泰勒展开,欲知后事如何,请看下回分解.



参考文献:
1.华东师范大学数学系.数学分析(下册)
[M].第四版.北京:高等教育出版社,2010.


Plus: 如有错误、可以改进的地方、或任何想说的,请在评论区留言!

你可能感兴趣的:(数学园地)