Overview and references for gradient descent algorithms:
1: CS231n course notes
   http://cs231n.github.io/neural-networks-3/
   https://zhuanlan.zhihu.com/p/21798784?refer=intelligentunit
2: An overview of gradient descent optimization algorithms
   http://ruder.io/optimizing-gradient-descent/index.html
3: ADADELTA: An Adaptive Learning Rate Method
   https://arxiv.org/pdf/1212.5701.pdf
4: Adam: A Method for Stochastic Optimization
   https://arxiv.org/pdf/1412.6980.pdf
Vanilla gradient descent:
The core of the update is learning_rate * gradient, i.e. x = x - learning_rate * gradient.
def f(x):
    return x**3 - 2*x - 10 + x**2

def derivative_f(x):
    # f'(x) = 3x^2 + 2x - 2
    return 3*(x**2) + 2*x - 2

x = 0.0
y = 0.0
learning_rate = 0.001
gradient = 0

for i in range(1000000):
    print('x = {:6f}, f(x) = {:6f}, gradient = {:6f}'.format(x, y, gradient))
    # Stop once the gradient magnitude has shrunk into this small band;
    # the lower bound keeps the loop from breaking on the initial gradient = 0.
    if (abs(gradient) > 0.00001) and (abs(gradient) < 0.0001):
        print("break at " + str(i))
        break
    else:
        gradient = derivative_f(x)
        x = x - learning_rate * gradient   # core update: step against the gradient
        y = f(x)
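As a sanity check, the minimum this loop should find can be computed in closed form: setting f'(x) = 3x^2 + 2x - 2 = 0 and solving with the quadratic formula gives x = (-1 ± sqrt(7)) / 3, and the root where f''(x) > 0 is the local minimum at x ≈ 0.5486. A minimal sketch:

import math

# Roots of f'(x) = 3x^2 + 2x - 2 = 0 via the quadratic formula.
a, b, c = 3.0, 2.0, -2.0
disc = math.sqrt(b**2 - 4*a*c)
print((-b - disc) / (2*a), (-b + disc) / (2*a))  # about -1.2153 and 0.5486

All of the runs below are trying to reach this same point, x ≈ 0.5486.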
AdaGrad implementation (the version taken straight from the references converged poorly here, so the denominator was modified)
The core of the update is learning_rate * gradient / (math.sqrt(sum/(i+1)) + e): the accumulated squared gradient is averaged over the step count before taking the square root, instead of being used directly.
import math

def f(x):
    return x**3 - 2*x - 10 + x**2

def derivative_f(x):
    # f'(x) = 3x^2 + 2x - 2
    return 3*(x**2) + 2*x - 2

x = 0.0
y = 0.0
learning_rate = 0.001
gradient = 0
e = 0.00000001
sum = 0.0   # accumulated squared gradients

for i in range(100000):
    print('x = {:6f}, f(x) = {:6f}, gradient = {:6f}'.format(x, y, gradient))
    if (abs(gradient) > 0.00001) and (abs(gradient) < 0.0001):
        print("break at " + str(i))
        break
    else:
        gradient = derivative_f(x)
        sum += gradient**2
        # modified denominator: RMS of the gradients seen so far, not the raw accumulated sum
        x = x - learning_rate * gradient / (math.sqrt(sum/(i+1)) + e)
        y = f(x)
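For comparison, the textbook AdaGrad update (as described in the ruder.io overview) divides by the square root of the raw accumulated sum plus a small epsilon, so the denominator only grows and the effective step size only shrinks; that is the behaviour the averaging above works around. A minimal sketch of that variant, using the same function and learning rate:

import math

def derivative_f(x):
    return 3*(x**2) + 2*x - 2   # f'(x) for f(x) = x^3 - 2x - 10 + x^2

x = 0.0
learning_rate = 0.001
e = 0.00000001
sq_sum = 0.0   # raw accumulated sum of squared gradients, never averaged

for i in range(100000):
    gradient = derivative_f(x)
    sq_sum += gradient**2
    # textbook AdaGrad: the denominator grows monotonically, so steps keep shrinking
    x = x - learning_rate * gradient / (math.sqrt(sq_sum) + e)
print(x)  # compare with the final x of the modified version above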
AdaDelta implementation
import math

def f(x):
    return x**3 - 2*x - 10 + x**2

def derivative_f(x):
    # f'(x) = 3x^2 + 2x - 2
    return 3*(x**2) + 2*x - 2

x = 0.0
y = 0.0
gradient = 0
e = 0.00000001
d = 0.9      # decay rate rho
Egt = 0      # running average of squared gradients, E[g^2]
Edt = 0      # running average of squared updates, E[dx^2]
delta = 0

for i in range(100000):
    print('x = {:6f}, f(x) = {:6f}, gradient = {:6f}'.format(x, y, gradient))
    if (abs(gradient) > 0.00001) and (abs(gradient) < 0.0001):
        print("break at " + str(i))
        break
    else:
        gradient = derivative_f(x)
        # E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g_t^2
        Egt = d * Egt + (1-d) * (gradient**2)
        # dx_t = RMS[dx]_{t-1} / RMS[g]_t * g_t  (no global learning rate needed)
        delta = math.sqrt(Edt + e) * gradient / math.sqrt(Egt + e)
        # E[dx^2]_t = rho * E[dx^2]_{t-1} + (1 - rho) * dx_t^2
        Edt = d * Edt + (1-d) * (delta**2)
        x = x - delta
        y = f(x)
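Because Edt starts at zero, AdaDelta's very first update is on the order of sqrt(e), so it crawls at the beginning of this run. A quick check of the first step with the numbers above:

import math

e = 0.00000001
d = 0.9
gradient = -2.0                       # f'(0) for the toy function above
Egt = d * 0 + (1 - d) * gradient**2   # 0.4
delta = math.sqrt(0 + e) * gradient / math.sqrt(Egt + e)
print(delta)                          # about -3.2e-4, so x moves by only ~0.0003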
RMSprop implementation
import math

def f(x):
    return x**3 - 2*x - 10 + x**2

def derivative_f(x):
    # f'(x) = 3x^2 + 2x - 2
    return 3*(x**2) + 2*x - 2

x = 0.0
y = 0.0
learning_rate = 0.001
gradient = 0
e = 0.00000001
d = 0.9   # decay rate
Egt = 0   # running average of squared gradients, E[g^2]

for i in range(100000):
    print('x = {:6f}, f(x) = {:6f}, gradient = {:6f}'.format(x, y, gradient))
    if (abs(gradient) > 0.00001) and (abs(gradient) < 0.0001):
        print("break at " + str(i))
        break
    else:
        gradient = derivative_f(x)
        # E[g^2]_t = d * E[g^2]_{t-1} + (1 - d) * g_t^2
        Egt = d * Egt + (1-d) * (gradient**2)
        # divide the fixed learning rate by the RMS of recent gradients
        x = x - learning_rate * gradient / math.sqrt(Egt + e)
        y = f(x)
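RMSprop keeps the same E[g^2] running average as the AdaDelta code above, but pairs it with a fixed learning_rate instead of the RMS of past updates, so the gradient is multiplied by an adaptive coefficient learning_rate / sqrt(Egt + e). A small sketch that prints this coefficient during an otherwise identical run, to show how it adapts to the recent gradient magnitude:

import math

def derivative_f(x):
    return 3*(x**2) + 2*x - 2   # f'(x) for f(x) = x^3 - 2x - 10 + x^2

x, learning_rate, d, e, Egt = 0.0, 0.001, 0.9, 0.00000001, 0.0
for i in range(2000):
    gradient = derivative_f(x)
    Egt = d * Egt + (1 - d) * gradient**2
    coeff = learning_rate / math.sqrt(Egt + e)   # adaptive coefficient multiplying the gradient
    x = x - coeff * gradient
    if i % 500 == 0:
        print('i = {}, x = {:.6f}, coeff = {:.6f}'.format(i, x, coeff))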
Adam implementation
import math

def f(x):
    return x**3 - 2*x - 10 + x**2

def derivative_f(x):
    # f'(x) = 3x^2 + 2x - 2
    return 3*(x**2) + 2*x - 2

x = 0.0
y = 0.0
learning_rate = 0.001
gradient = 0
e = 0.00000001
b1 = 0.9     # beta_1, decay rate for the first moment
b2 = 0.995   # beta_2, decay rate for the second moment
m = 0        # first moment estimate
v = 0        # second moment estimate
t = 0        # time step

for i in range(10000):
    print('x = {:6f}, f(x) = {:6f}, gradient = {:6f}'.format(x, y, gradient))
    if (abs(gradient) > 0.00001) and (abs(gradient) < 0.0001):
        print("break at " + str(i))
        break
    else:
        gradient = derivative_f(x)
        t = t + 1
        # m_t = beta1 * m_{t-1} + (1 - beta1) * g_t
        m = b1 * m + (1-b1) * gradient
        # v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2
        v = b2 * v + (1-b2) * (gradient**2)
        # bias-corrected first moment: m_hat_t = m_t / (1 - beta1^t)
        mt = m / (1 - (b1**t))
        # bias-corrected second moment: v_hat_t = v_t / (1 - beta2^t)
        vt = v / (1 - (b2**t))
        x = x - learning_rate * mt / (math.sqrt(vt) + e)
        y = f(x)
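The two divisions are the bias corrections from the Adam paper: because m and v start at 0, the raw moving averages are biased toward zero during the first steps. A quick numeric check of the first step with b1 = 0.9:

b1 = 0.9
gradient = -2.0                      # f'(0) for the toy function above
m = b1 * 0 + (1 - b1) * gradient     # raw first-moment estimate after one step: -0.2
mt = m / (1 - b1**1)                 # bias-corrected estimate: -2.0, i.e. the actual gradient
print(m, mt)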
Overall impression: RMSprop performs best on this problem.
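To make that comparison easier to reproduce, here is a rough sketch that packages each update rule from the code above into a small function and runs all of them for the same fixed number of steps on the same f, reporting how far each final x is from the analytic minimum x* = (-1 + sqrt(7)) / 3 ≈ 0.5486. It drops the early-stopping band and uses a fixed budget, so iteration counts will not match the logs above; treat it as a sketch rather than a benchmark.

import math

def derivative_f(x):
    return 3*(x**2) + 2*x - 2   # f'(x) for f(x) = x^3 - 2x - 10 + x^2

def run(update, steps=5000):
    # Run one update rule for a fixed budget and return the final x.
    x, state = 0.0, {}
    for i in range(steps):
        g = derivative_f(x)
        x = update(x, g, i, state)
    return x

def vanilla(x, g, i, s):
    return x - 0.001 * g

def adagrad_modified(x, g, i, s):
    s['sum'] = s.get('sum', 0.0) + g**2
    return x - 0.001 * g / (math.sqrt(s['sum'] / (i + 1)) + 1e-8)

def adadelta(x, g, i, s):
    s['Eg'] = 0.9 * s.get('Eg', 0.0) + 0.1 * g**2
    delta = math.sqrt(s.get('Ed', 0.0) + 1e-8) * g / math.sqrt(s['Eg'] + 1e-8)
    s['Ed'] = 0.9 * s.get('Ed', 0.0) + 0.1 * delta**2
    return x - delta

def rmsprop(x, g, i, s):
    s['Eg'] = 0.9 * s.get('Eg', 0.0) + 0.1 * g**2
    return x - 0.001 * g / math.sqrt(s['Eg'] + 1e-8)

def adam(x, g, i, s):
    t = i + 1
    s['m'] = 0.9 * s.get('m', 0.0) + 0.1 * g
    s['v'] = 0.995 * s.get('v', 0.0) + 0.005 * g**2
    mt = s['m'] / (1 - 0.9**t)
    vt = s['v'] / (1 - 0.995**t)
    return x - 0.001 * mt / (math.sqrt(vt) + 1e-8)

x_star = (-1 + math.sqrt(7)) / 3   # analytic local minimum of f
for name, update in [('vanilla', vanilla), ('adagrad(mod)', adagrad_modified),
                     ('adadelta', adadelta), ('rmsprop', rmsprop), ('adam', adam)]:
    x_final = run(update)
    print('{:13s} x = {:.6f}  |x - x*| = {:.6f}'.format(name, x_final, abs(x_final - x_star)))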