Lecture 5 | Convergence in Neural Networks


Accuracy (a count of correct predictions) is not differentiable! The cross-entropy error is only an approximation (a surrogate) of the accuracy.
As a result, minimizing the cross-entropy does not always maximize the accuracy.
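A tiny sketch of this mismatch (the numbers are made up for illustration): a classifier that is barely right on every example gets a higher accuracy but also a higher cross-entropy than one that is confident but wrong on a single example.

```python
import numpy as np

# Four examples with true labels:
y_true = np.array([1, 1, 1, 0])

# Classifier A: confident, but wrong on the last example.
p_a = np.array([0.9, 0.9, 0.9, 0.9])
# Classifier B: barely right on every example.
p_b = np.array([0.51, 0.51, 0.51, 0.49])

def accuracy(y, p):
    # Counting correct predictions: a step function, gradient is 0 almost everywhere.
    return np.mean((p > 0.5) == y)

def cross_entropy(y, p):
    # Smooth, differentiable surrogate for the error.
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# B is perfectly accurate yet has HIGHER cross-entropy than A.
print(accuracy(y_true, p_a), accuracy(y_true, p_b))          # 0.75 vs 1.0
print(cross_entropy(y_true, p_a), cross_entropy(y_true, p_b))
```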


Both the perceptron and the sigmoid NN can find the decision boundary successfully.


Now one more point: the perceptron reaches 100% accuracy, while the sigmoid NN cannot reach 100% accuracy (assuming the NN's weights are bounded, e.g. the weight vector has length 1, so the sigmoid can never fully saturate).


In high dimensions, no one knows what the loss surface really looks like; we only have hypotheses.

Saddle point: a critical point where some eigenvalues of the Hessian matrix are positive and some are negative.
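As a concrete sketch, f(x, y) = x² − y² has a saddle at the origin; its (constant) Hessian has one positive and one negative eigenvalue:

```python
import numpy as np

# f(x, y) = x**2 - y**2 has a saddle point at the origin.
# Its Hessian is constant:
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])

eigvals = np.linalg.eigvalsh(H)
# One positive and one negative eigenvalue -> saddle point,
# not a minimum (all positive) or a maximum (all negative).
print(eigvals)
```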


R measures how fast the iteration converges:

R > 1 => diverging (getting worse)
R = 1 => neither better nor worse
R < 1 => converging (getting better)
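For gradient descent on a one-dimensional quadratic E(w) = ½λw², one step maps w to (1 − ηλ)w, so the convergence ratio is R = |1 − ηλ|; the curvature and step sizes below are illustrative:

```python
lam = 2.0  # curvature of E(w) = 0.5 * lam * w**2

def rate(eta):
    # One gradient step: w <- w - eta * lam * w = (1 - eta*lam) * w,
    # so the distance to the optimum shrinks by R = |1 - eta*lam|.
    return abs(1 - eta * lam)

print(rate(0.25))  # 0.5 -> R < 1, converges
print(rate(0.50))  # 0.0 -> optimal step, converges in one step
print(rate(1.00))  # 1.0 -> R = 1, oscillates without progress
print(rate(1.25))  # 1.5 -> R > 1, diverges
```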


First, consider the quadratic case.


Newton's method: see https://zhuanlan.zhihu.com/p/83320557, chapter 4.1.
Note the difference: in 4.1, Newton's method finds a root of the function itself, while here we need a root of the derivative, so taking one more derivative makes the two forms match. The optimal step size for gradient descent is the inverse of the second derivative (in the multivariate case, the inverse of the Hessian matrix).
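A minimal illustration on a one-dimensional quadratic: stepping with the inverse of the second derivative lands on the minimum in a single step (the curvature and target values below are arbitrary):

```python
# Minimize E(w) = 0.5 * a * (w - w_star)**2.
a, w_star = 4.0, 3.0

grad = lambda w: a * (w - w_star)  # E'(w)
hess = a                           # E''(w): constant for a quadratic

w = 0.0
w = w - (1.0 / hess) * grad(w)     # step size = inverse second derivative
print(w)                           # exactly w_star: one step is enough
```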


Different dimensions may have different optimal step sizes, so a single learning rate may converge in one direction but diverge in another. To be safe in every direction, we have to take the minimum of all the per-dimension optimal step sizes.
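A sketch with a decoupled two-dimensional quadratic E(w) = ½(λ₁w₁² + λ₂w₂²): the step that is optimal for the flat direction blows up in the steep one, while the minimum of the per-dimension optima is safe (the curvatures are chosen arbitrarily):

```python
import numpy as np

lams = np.array([1.0, 10.0])       # curvatures along the two axes

def run_gd(eta, steps=50):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - eta * lams * w     # decoupled quadratic: grad = lams * w
    return w

w_bad  = run_gd(1.0 / lams[0])     # optimal for dim 0, but diverges in dim 1
w_good = run_gd(1.0 / lams[1])     # min of the per-dimension optima: converges
print(w_bad, w_good)
```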

When the dimensions are coupled (the Hessian is not diagonal), the per-dimension analysis no longer applies directly.
Solution: normalize the data, so the dimensions have comparable scale.
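A minimal sketch of the standard fix, standardizing each input feature to zero mean and unit variance so all dimensions have comparable curvature (the data here is synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two features on wildly different scales.
X = rng.normal(size=(1000, 2)) * np.array([1.0, 100.0])

# Standardize: zero mean and unit variance per feature.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm.mean(axis=0), X_norm.std(axis=0))
```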
The quadratic term (of the Taylor expansion of the loss) is given by the Hessian matrix.
If eta = 1, this is equivalent to Newton's method.
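The same holds in the multivariate quadratic case: with η = 1, the Hessian-scaled step is exactly Newton's method and reaches the minimum in one step (the matrix and target below are arbitrary):

```python
import numpy as np

# Quadratic loss E(w) = 0.5 * (w - w_star)^T A (w - w_star), A symmetric PD.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
w_star = np.array([1.0, -2.0])

grad = lambda w: A @ (w - w_star)    # gradient of the quadratic
H = A                                # Hessian of a quadratic is constant

w = np.zeros(2)
w = w - np.linalg.solve(H, grad(w))  # Newton step: eta = 1, scaled by H^{-1}
print(w)                             # lands exactly on w_star
```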
Curse of dimensionality: the full Hessian is a d×d matrix, which is intractable for large networks.

But we don't need to capture the whole Hessian, right?

The Hessian-based quadratic approximation may not point in the right direction (e.g. when the Hessian is not positive definite).
There are a number of methods to approximate the Hessian.

But all of these second-order methods become impractical in high dimensions.
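Back-of-the-envelope arithmetic for why: even for a modestly sized network, just storing the full Hessian is hopeless (the parameter count below is illustrative):

```python
d = 10**7                 # parameters (a modest modern network)
entries = d * d           # full Hessian is d x d
bytes_fp32 = entries * 4  # 4 bytes per float32 entry
terabytes = bytes_fp32 / 1e12
print(terabytes)          # 400.0 TB just to STORE H, before inverting it
```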


Do BFGS and Levenberg-Marquardt (LM) solve the stability issue?


Why not use information from multiple past steps?
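This is the motivation for momentum: accumulate a running average of past gradients so consistent directions speed up and oscillating ones cancel out. A minimal sketch on the same kind of decoupled quadratic (the hyperparameters are chosen for illustration):

```python
import numpy as np

lams = np.array([1.0, 10.0])          # curvatures of E(w) = 0.5 * sum(lams * w**2)
grad = lambda w: lams * w

def gd_momentum(eta=0.09, beta=0.9, steps=200):
    w = np.array([1.0, 1.0])
    v = np.zeros(2)
    for _ in range(steps):
        v = beta * v - eta * grad(w)  # blend in steps from the past
        w = w + v
    return w

print(gd_momentum())                  # close to the optimum (0, 0)
```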

The inverse of the Hessian can be approximated component-wise by the inverse of each second partial derivative.
