Deep Learning: Gradient Descent, Part 1

An Introduction to Optimization

Starting from a Simple Problem

1.1 The Minimum of a Function

Find the minimum of J(w)=w^{2}+2w+1=(w+1)^{2}.

Three levels of understanding gradient descent:

Awareness level

  • Starting from some point on a quadratic function, move downhill along the gradient direction
  • Finding the minimum of a quadratic function, at the level of the mathematical formula

Beginner level

  • Find the minimum of a quadratic function with code written from scratch
  • A simple mathematical proof

Introductory level

  • Generalize to machine learning
  • Generalize to the general gradient descent method

\min J(W)    (a quadratic function)

Gradient 

One intuitive interpretation: does the function descend along the tangent direction? (No, it does not.)

J(w)=w^{2}+2w+1=(w+1)^{2}

w ← w - α * J'(w)

Initial values: w_{0}=1,  α=0.01 (learning rate)

\vec{d}=(x_{1},\dots,x_{n}),\quad \left\|\vec{d}\right\|^{2}=1    (a unit direction vector)

J'(w)=2(w+1)

w_{n+1}=w_{n}-\alpha J'(w_{n})=w_{n}-\alpha\cdot 2(w_{n}+1)=(1-2\alpha)w_{n}-2\alpha=(1-2\cdot 0.01)w_{n}-2\cdot 0.01=0.98\,w_{n}-0.02,\quad\text{so}\quad w_{1}=0.98\cdot 1-0.02=0.96
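Carrying the hand calculation forward two more steps (still with α = 0.01) shows the iterates creeping toward -1:

w_{2}=0.98\cdot 0.96-0.02=0.9208,\qquad w_{3}=0.98\cdot 0.9208-0.02\approx 0.8824

In fact w_{n+1}+1=0.98\,(w_{n}+1), so w_{n}=2\cdot 0.98^{n}-1, which tends to -1 geometrically.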

The function attains its minimum at w=-1, where the update stops: if w_{n}=-1, then w_{n+1}=0.98\cdot(-1)-0.02=-1.

If the iterate steps past the axis of symmetry at -1, the next update moves it back toward the minimum at -1; for example:

w_{n}=-2:\quad w_{n+1}=0.98\cdot(-2)-0.02=-1.98

Beginner level:

# Define the function

def J(w):
    return w**2 + 2*w + 1
# Its derivative
def J_prime(w):
    return 2*w + 2

# Gradient descent: repeatedly update the parameter against the gradient direction
# w ←  w - α * J_prime(w)
# here epoch = 100, alpha = 0.1

epoch = 100
alpha = 0.1
w = 1
for i in range(epoch):
    w = w - alpha*J_prime(w)
    print(w)

# Output: keeps approaching -1
0.6
0.2799999999999999
0.023999999999999966
-0.18080000000000004
-0.34464000000000006
-0.475712
-0.5805696
-0.66445568
-0.731564544
-0.7852516352000001
-0.82820130816
-0.862561046528
-0.8900488372224
-0.91203906977792
-0.9296312558223361
-0.9437050046578689
-0.9549640037262951
-0.9639712029810361
-0.9711769623848289
-0.9769415699078631
-0.9815532559262905
-0.9852426047410324
-0.9881940837928259
-0.9905552670342608
-0.9924442136274086
-0.9939553709019269
-0.9951642967215415
-0.9961314373772332
-0.9969051499017866
-0.9975241199214293
-0.9980192959371434
-0.9984154367497148
-0.9987323493997718
-0.9989858795198174
-0.999188703615854
-0.9993509628926832
-0.9994807703141466
-0.9995846162513173
-0.9996676930010538
-0.9997341544008431
-0.9997873235206745
-0.9998298588165395
-0.9998638870532316
-0.9998911096425853
-0.9999128877140683
-0.9999303101712547
-0.9999442481370038
-0.999955398509603
-0.9999643188076824
-0.9999714550461459
-0.9999771640369167
-0.9999817312295334
-0.9999853849836267
-0.9999883079869013
-0.9999906463895211
-0.9999925171116169
-0.9999940136892935
-0.9999952109514348
-0.9999961687611478
-0.9999969350089183
-0.9999975480071346
-0.9999980384057077
-0.9999984307245662
-0.999998744579653
-0.9999989956637224
-0.9999991965309779
-0.9999993572247823
-0.9999994857798258
-0.9999995886238606
-0.9999996708990885
-0.9999997367192708
-0.9999997893754167
-0.9999998315003333
-0.9999998652002666
-0.9999998921602133
-0.9999999137281707
-0.9999999309825365
-0.9999999447860292
-0.9999999558288233
-0.9999999646630586
-0.9999999717304469
-0.9999999773843575
-0.9999999819074861
-0.9999999855259889
-0.9999999884207911
-0.9999999907366328
-0.9999999925893063
-0.999999994071445
-0.999999995257156
-0.9999999962057248
-0.9999999969645799
-0.9999999975716639
-0.9999999980573311
-0.9999999984458648
-0.9999999987566919
-0.9999999990053535
-0.9999999992042828
-0.9999999993634263
-0.999999999490741
-0.9999999995925928

The function is not descending along the tangent direction; each step simply moves w a small amount opposite to the derivative, and it is the function value J(w) that keeps dropping.
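A quick numerical check of this, as a minimal sketch reusing the J and J_prime defined above: print the function value next to the parameter and watch J(w) shrink at every step.

# Sketch: J(w) decreases at every gradient-descent step
w = 1
alpha = 0.1
for i in range(10):
    w = w - alpha * J_prime(w)
    print(i, w, J(w))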

Level two: understanding and proving the formula

Given: w_{n+1}=w_{n}-\alpha J'(w_{n})

To prove: J(w_{n+1})<J(w_{n})

Taylor's formula: f(x)=f(x_{0})+(x-x_{0})f'(x_{0})+\frac{(x-x_{0})^{2}}{2}f''(x_{0})+\cdots. For x close to x_{0}, the first-order approximation is f(x)\approx f(x_{0})+f'(x_{0})(x-x_{0}).

J(w_{n+1})=J\big(w_{n}-\alpha J'(w_{n})\big)\approx J(w_{n})-\alpha J'(w_{n})\cdot J'(w_{n})=J(w_{n})-\alpha\,[J'(w_{n})]^{2}<J(w_{n})\qquad(\text{for }\alpha>0\text{ and }J'(w_{n})\neq 0)
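For this particular quadratic the decrease can also be computed exactly, without the Taylor approximation: since J(w)=(w+1)^{2} and, from the update rule above, w_{n+1}+1=(1-2\alpha)(w_{n}+1),

J(w_{n+1})=(1-2\alpha)^{2}(w_{n}+1)^{2}=(1-2\alpha)^{2}\,J(w_{n})

So each step shrinks the function value by the constant factor (1-2\alpha)^{2}, which is below 1 exactly when 0<\alpha<1; for \alpha\geq 1 the iterates no longer converge, which is why the learning rate cannot be chosen arbitrarily large.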

Level three: from optimization to machine learning

J(w)=w^{2}+2w+1=(w+1)^{2}

In machine learning the loss compares predictions with targets, J(Y_{pred},\,Y), and training minimizes it over the model parameters: \min J(w,\,b).
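For instance (a standard illustrative choice, not something fixed by the text), with a linear model Y_{pred}=w x+b and a mean-squared-error loss over n samples:

J(w,\,b)=\frac{1}{n}\sum_{i=1}^{n}\left(w x_{i}+b-y_{i}\right)^{2}

Training means finding the w and b that make this loss as small as possible.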

One dimension: w\leftarrow w-\alpha J'(w)

Multiple dimensions (the derivative becomes the gradient, which has both a magnitude and a direction): w\leftarrow w-\alpha \nabla J(w)

                                  w_{0}\rightarrow w_{1}\rightarrow w_{2}\rightarrow \cdots \rightarrow w_{n}

                                  J(w_{0})\rightarrow J(w_{1})\rightarrow \cdots \rightarrow \min J(w)
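A minimal sketch putting the pieces together: gradient descent on an MSE loss like the one above, over the two parameters w and b. The toy data and the linear model here are assumptions made only for illustration.

# Sketch: gradient descent on an MSE loss over two parameters (w, b).
# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def mse(w, b):
    # mean squared error between predictions w*x + b and targets y
    return sum((w*x + b - y)**2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, b):
    # partial derivatives dJ/dw and dJ/db of the MSE loss
    n = len(xs)
    dw = sum(2*(w*x + b - y)*x for x, y in zip(xs, ys)) / n
    db = sum(2*(w*x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

w, b, alpha = 0.0, 0.0, 0.05
for i in range(2000):
    dw, db = mse_grad(w, b)
    w, b = w - alpha * dw, b - alpha * db
print(w, b, mse(w, b))   # approaches w ≈ 2, b ≈ 1, loss ≈ 0

Each parameter is updated with its own partial derivative, which is exactly the vector rule w ← w - α∇J(w) written out component by component.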

Gradient descent with momentum:
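The update rule implemented by the code below: v keeps an exponentially weighted running average of recent derivatives, and the parameter moves along v instead of along the raw derivative.

v\leftarrow \beta v+(1-\beta)J'(w),\qquad w\leftarrow w-\alpha v

Because v lags behind the current derivative, the iterate can carry momentum past the minimum at -1 and then swing back, which is exactly the oscillation visible in the output below.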

# Gradient descent with momentum

J = lambda w: w**2 + 2*w + 1
J_prime = lambda w: 2*w + 2

epoch = 200
w = 1
alpha = 0.1
v = 0
beta = 0.9

for i in range(epoch):
    v = beta * v + (1 - beta) * J_prime(w)
    w = w - alpha * v
    print(w)

# Output: overshoots -1, oscillates, and converges to around -1

0.96 0.8848 0.779424 0.64899712 0.4986329856 0.33333260492800004 0.15789561022464008 -0.023155597212876716 -0.20563857196238428 -0.3857604777976934 -0.5601549834935178 -0.7259069389498893 -0.880565560081626 -1.0221470078985564 -1.1491273707758227 -1.2604271499498458 -1.3553884082074699 -1.433745772475182 -1.4955924848666193 -1.5413426763215805 -1.571690995104614 -1.5875706621072518 -1.590110949167481 -1.5805949885383375 -1.5604187242013416 -1.5310517118140186 -1.4940003664291475 -1.4507741482541805 -1.4028550689316266 -1.3516707961626955 -1.2985715347474036 -1.244810768778693 -1.1915298640312795 -1.1397464524779817 -1.090346453030454 -1.0440795244670702 -1.0015576982706833 -0.9632569007285213 -0.9295210449260052 -0.9005683538052205 -0.8764995647204099 -0.8573076632496721 -0.8428887986610147 -0.8330540445580028 -0.8275416849741319 -0.8260297276491656 -0.8281483715037126 -0.8334921835427306 -0.8416317707069921 -0.8521247637406877 -0.8645259621961999 -0.878396521562237 -0.8933120945604256 -0.9088698683675869 -0.9246944674266803 -0.9404427172313307 -0.9558072877108894 -0.9705192553882745 -0.9843496411901557 -0.9971099955880456 -1.0086521146343856 -1.018866979483404 -1.0276830182578522 -1.0350637927896986 -1.0410052140125665 -1.0455323888328962 -1.048696198394535 -1.0505697030321193 -1.0512444631453026 -1.0508268579842617 -1.0494344761796397 -1.047192643031887 -1.0442311403382718 -1.0406811651072527 -1.0366725640971906 -1.0323313719061908 -1.0277776714961673 -1.0231237876972226 -1.018472816524228 -1.0139174861380484 -1.0095393390677256 -1.0054082199230807 -1.0015820482944386 -0.998106852862772 -0.9950170399170166 -0.9923358674674965 -0.9900760949135784 -0.9882407777167805 -0.9868241766853268 -0.985812752223312 -0.9851862151630324 -0.9849186075055201 -0.9849793884636486 -0.9853345035566914 -0.985947417069296 -0.9867800908892542 -0.9877938955094315 -0.9889504417574024 -0.9902123245454283 -0.9915437725637429 -0.9929112003289513 -0.9942836613110598 -0.9956332029687363 -0.9969351264012704 -0.9981681549625256 -0.9993145175684048 -1.000359953562328 -1.0012936468856124 -1.0021080979388561 -1.0027989419279983 -1.0033647226796663 -1.0038066309025742 -1.0041282156851397 -1.0043350776757458 -1.0044345519137765 -1.0044353876897285 -1.004347432134291 -1.0041813234917112 -1.0039481992435553 -1.0036594234353438 -1.0033263367392464 -1.002960031977974 -1.0025711570532692 -1.0021697464799695 -1.0017650820344004 -1.0013655823927003 -1.0009787210673162 -1.0006109714531242 -1.000267777371289 -0.9999535471502113 -0.9996716690082373 -0.999424545300296 -0.9992136430571429 -0.9990395581771622 -0.9989020906216363 -0.9988003280092302 -0.9987327350978802 -0.9986972467757076 -0.998691362350238 -0.9987122391203107 -0.99875678343097 -0.9988217376419438 -0.9989037616789814 -0.9989995080787356 -0.9991056896769397 -0.9992191393217846 -0.9993368612157093 -0.9994560736959274 -0.9995742434542051 -0.9996891113675709 -0.9997987102622488 -0.9999013750622139 -0.9999957458809382 -1.0000807647001713 -1.0001556663434776 -1.0002199644955838 -1.0002734335425676 -1.0003160870140018 -1.0003481533980125 -1.0003700500756618 -1.000382356084033 -1.0003857843698865 -1.0003811541397567 -1.000369363849845 -1.0003513653119274 -1.000328139321563 -1.000300673143804 -1.0002699401209447 -1.0002368815979525 -1.0002023912953004 -1.0001673021970074 -1.0001323759646037 -1.0000982948361483 -1.0000656559238155 -1.0000349677842395 -1.0000066491029365 -0.9999810293077049 -0.9999583509058425 -0.9999387733260494 -0.9999223780377147 -0.9999091747174591 
-0.9998991082348799 -0.999892066235861 -0.9998878871120268 -0.9998863681583355 -0.9998872737368466 -0.9998903432827697 -0.9998952990084451 -0.999901853181384 -0.9999097148734014 -0.999918596098749 -0.9999282172795868 -0.9999383119967491 -0.9999486310022602 -0.999958945487175 -0.9999690496138548 -0.9999787623355896 -0.9999879285384391 -0.9999964195502349 -1.0000041330698464

