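A numerical gradient is an estimate of the gradient. In learning tasks based on gradient descent, it can be used to check whether the code that computes the gradient is correct. Autodiff frameworks have long since taken care of gradient correctness, but I wanted to write this up anyway, so there.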
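Two forms of the numerical gradient are common, the forward difference and the central difference. The accuracy of each can be shown with a Taylor expansion.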
$$\dfrac{f(x+h)-f(x)}{h}$$
Its error is $O(h)$; the proof is as follows [2]:
$$
\begin{aligned}
&f(x+h)=f(x)+f'(x)h+O(h^2)\\
\Rightarrow \ &f'(x)h=f(x+h)-f(x)+O(h^2)\\
\Rightarrow \ &f'(x)=\dfrac{f(x+h)-f(x)}{h}+O(h)
\end{aligned}
$$
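The $O(h)$ claim is easy to probe numerically. The sketch below is only an illustration: the helper name `forward_diff` and the choice of $\sin$ at $x=1$ as the test function are assumptions of mine, not part of the references. If the error really is first order, shrinking $h$ by a factor of 10 should shrink the error by roughly a factor of 10 as well.

```python
import math

def forward_diff(f, x, h):
    """Forward-difference estimate of f'(x) with step h."""
    return (f(x + h) - f(x)) / h

# Assumed test case: f = sin, whose exact derivative is cos.
x = 1.0
exact = math.cos(x)
steps = (1e-2, 1e-3, 1e-4)
errors = [abs(forward_diff(math.sin, x, h) - exact) for h in steps]

# Ratio of consecutive errors; each step is 10x smaller than the previous one.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors)
print(ratios)
```

The step sizes are kept moderate on purpose: for very small $h$, floating-point round-off in $f(x+h)-f(x)$ starts to dominate and the clean first-order trend disappears.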
$$\dfrac{f(x+h)-f(x-h)}{2h}$$
Its error is $O(h^2)$; the proof is as follows [2]:
$$
\begin{aligned}
&f(x+h)=f(x)+f'(x)h+\dfrac{f''(x)}{2}h^2+O(h^3)\\
&f(x-h)=f(x)-f'(x)h+\dfrac{f''(x)}{2}h^2+O(h^3)\\
&f(x+h)-f(x-h)=2f'(x)h+O(h^3)\\
\Rightarrow \ &f'(x)= \dfrac{f(x+h)-f(x-h)}{2h}+O(h^2)
\end{aligned}
$$
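Here is a minimal sketch of the central difference in use, assuming NumPy is available. It first checks the second-order behaviour on $\sin$ (a 10x smaller $h$ should cut the error by roughly 100x), then applies the same formula coordinate by coordinate to check a hand-written gradient. The names `central_diff`, `numerical_gradient`, `loss` and `loss_grad` are hypothetical, chosen only for this example.

```python
import numpy as np

def central_diff(f, x, h):
    """Central-difference estimate of f'(x) with step h."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Second-order check on an assumed test case: f = sin at x = 1.
exact = np.cos(1.0)
errors = [abs(central_diff(np.sin, 1.0, h) - exact) for h in (1e-2, 1e-3)]
ratio = errors[0] / errors[1]
print(errors, ratio)

def numerical_gradient(f, x, h=1e-5):
    """Apply the central difference to each coordinate of a 1-D array x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h  # perturb only coordinate i
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

# Hypothetical function under test and its hand-derived gradient.
def loss(w):
    return np.sum(w ** 2) + np.sin(w[0])

def loss_grad(w):
    g = 2 * w               # gradient of sum(w^2); creates a new array
    g[0] += np.cos(w[0])    # gradient of sin(w[0])
    return g

w = np.array([0.5, -1.2, 2.0])
analytic = loss_grad(w)
numeric = numerical_gradient(loss, w)

# Relative error between the two gradients.
rel_err = np.linalg.norm(analytic - numeric) / (
    np.linalg.norm(analytic) + np.linalg.norm(numeric)
)
print(rel_err)
```

A tiny relative error suggests that the hand-written gradient agrees with the function. A value near 1 usually points to a bug in one of the two.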
Mathematically, when there exists a constant $L$ such that

$$\left|\dfrac{f(x)}{g(x)}\right|\leq L \ (x\rightarrow x_0)$$

we write $f(x)=O(g(x))$. In the derivations above, the variable is $h$ and the limit is $h\rightarrow 0$.
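To make the definition concrete, here is a small illustration with an example of my own choosing (it is not taken from the references): $\sin h - h = O(h^3)$ as $h\rightarrow 0$. By the Taylor series of $\sin$, the ratio $|\sin h - h|/|h|^3$ stays below $L = 1/6$ for small $h$, and the sketch below simply evaluates that ratio at a few points.

```python
import math

# Assumed example: f(h) = sin(h) - h, g(h) = h^3, with h -> 0.
L = 1.0 / 6.0
steps = (1e-1, 1e-2, 1e-3)
ratios = [abs(math.sin(h) - h) / abs(h) ** 3 for h in steps]

for h, r in zip(steps, ratios):
    print(h, r)

# The ratio stays bounded by L (a small tolerance absorbs round-off).
bounded = all(r <= L + 1e-6 for r in ratios)
print(bounded)
```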
[1] https://en.wikipedia.org/wiki/Numerical_differentiation
[2] http://math.mit.edu/classes/18.01/F2011/lecture14.pdf