Reinforcement Learning Reproduction Notes (3): Proof of the Robbins-Monro Algorithm

Summary: Neither proof below is complete, and both seem to have gaps.
Let $M(x)$ be a monotonically increasing function whose explicit form is unknown (a black box), satisfying $0<c_1\le M'(x)\le c_2$. For each input $x$ we obtain an observation $Y(x)=M(x)+w$, where $w$ is noise that need not be Gaussian or white, but satisfies $\mathbb{E}(w)=0$ and $\mathbb{E}(w^2)\le C<\infty$.
We want to solve $M(x)=0$ and denote the solution by $x^*=\theta$. The method is to query a sequence of points and observe the outputs iteratively: starting from an arbitrary initial point $x_0$, iterate
$$x_{n+1}=x_n-a_nY(x_n)$$
If $a_n$ satisfies
$$a_n>0,\quad\sum_{n=1}^\infty a_n=\infty,\quad\sum_{n=1}^\infty a_n^2<\infty$$
then $x_n$ converges to $x^*$.
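A minimal sketch of the iteration above, assuming a hypothetical linear black box $M(x)=2(x-\theta)$ with $\theta=3$ (so $c_1=c_2=2$), standard Gaussian noise, and the step size $a_n=1/n$, which satisfies both series conditions. The names `robbins_monro` and `noisy_Y` are illustrative, not from the original.

```python
import random

def robbins_monro(Y, x0, n_steps, step=lambda n: 1.0 / n):
    """Run x_{n+1} = x_n - a_n * Y(x_n) for n = 1..n_steps and return the final iterate."""
    x = x0
    for n in range(1, n_steps + 1):
        x = x - step(n) * Y(x)
    return x

# Hypothetical black box: M(x) = 2 (x - 3), root theta = 3, observed with N(0, 1) noise.
THETA = 3.0
rng = random.Random(0)

def noisy_Y(x):
    return 2.0 * (x - THETA) + rng.gauss(0.0, 1.0)

estimate = robbins_monro(noisy_Y, x0=-10.0, n_steps=10_000)
print(estimate)  # close to 3.0
```

Only noisy values of $Y$ are ever used; the solver never sees the formula for $M$.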

First proof

This is essentially the proof from the original paper. The squared estimation error evolves as
$$\begin{aligned} J_{n+1} &= (x_{n+1}-\theta)^2 \\ &= (x_n-a_nY(x_n)-\theta)^2 \\ &= (x_n-\theta)^2-2a_nY(x_n)(x_n-\theta)+a_n^2Y^2(x_n) \end{aligned}$$
First note which quantities are random. Since $w$ is random, $x_n$, $Y(x_n)$, $J_n$ and $M(x_n)$ are all random variables; only $\theta$ and $M(\theta)$ are not. Taking the expectation of the estimation error:
$$\begin{aligned} \mathbb{E}J_{n+1} &= \mathbb{E}J_n-2a_n\mathbb{E}[Y(x_n)(x_n-\theta)]+a_n^2\mathbb{E}[Y^2(x_n)] \\ \mathbb{E}J_{n+1}-\mathbb{E}J_n &= a_n^2\mathbb{E}[Y^2(x_n)]-2a_n\mathbb{E}[Y(x_n)(x_n-\theta)] \end{aligned}$$
The goal is to find a sequence $a_n$ that drives $\mathbb{E}J_n$ to 0. Define
$$\begin{aligned} b_n &= \mathbb{E}J_n \\ d_n &= \mathbb{E}[Y(x_n)(x_n-\theta)] \\ e_n &= \mathbb{E}[Y^2(x_n)] \end{aligned}$$
Since $M(x)$ is monotonically increasing and $M(\theta)=0$, the product $M(x_n)(x_n-\theta)$ is non-negative. Taking the noise $w$ of step $n$ to be independent of $x_n$ (which depends only on earlier noise), the cross term factors, so
$$\begin{aligned} d_n &= \mathbb{E}[(M(x_n)+w)(x_n-\theta)] \\ &= \mathbb{E}[M(x_n)(x_n-\theta)]+\mathbb{E}[w(x_n-\theta)] \\ &= \mathbb{E}[M(x_n)(x_n-\theta)]+\mathbb{E}[x_n-\theta]\,\mathbb{E}[w] \\ &= \mathbb{E}[M(x_n)(x_n-\theta)] \\ &\geq 0 \end{aligned}$$
Since $\mathbb{E}(w^2)<\infty$, and assuming in addition that $M(x_n)$ stays bounded (a bounded-observation assumption such as $|Y(x)|\le C$ would also do), we get, with $C$ a possibly larger constant,
$$\begin{aligned} e_n &= \mathbb{E}[(M(x_n)+w)^2] \\ &= \mathbb{E}[M^2(x_n)+w^2+2wM(x_n)] \\ &\leq C \end{aligned}$$
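Since $b_n$, $d_n$ and $e_n$ are expectations over the noise, they can be estimated by averaging over many independent runs. A sketch, assuming the same hypothetical $M(x)=2(x-3)$, standard Gaussian noise and $a_n=1/n$ (none of these choices come from the original text); the loop indexes the initial point as $x_1$, and `estimate_b_d_e` is an illustrative name.

```python
import random

THETA = 3.0

def M(x):
    return 2.0 * (x - THETA)  # hypothetical monotone black box

def estimate_b_d_e(n_runs=2000, n_steps=50, x0=-10.0, seed=0):
    """Monte Carlo estimates, for n = 1..n_steps, of
    b_n = E[(x_n - theta)^2], d_n = E[Y(x_n)(x_n - theta)], e_n = E[Y^2(x_n)]."""
    rng = random.Random(seed)
    b = [0.0] * n_steps
    d = [0.0] * n_steps
    e = [0.0] * n_steps
    for _ in range(n_runs):
        x = x0
        for n in range(1, n_steps + 1):
            y = M(x) + rng.gauss(0.0, 1.0)  # noisy observation Y(x_n)
            b[n - 1] += (x - THETA) ** 2 / n_runs
            d[n - 1] += y * (x - THETA) / n_runs
            e[n - 1] += y * y / n_runs
            x = x - y / n  # Robbins-Monro update with a_n = 1/n
    return b, d, e

b, d, e = estimate_b_d_e()
print(b[0], b[-1])  # b_1 = 169 (deterministic start); b_50 is small
print(min(d))       # every estimated d_n is positive
```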
b n + 1 − b n b_{n+1}-b_n bn+1bn 求和得到
$$\sum_{j=1}^n(b_{j+1}-b_j)=b_{n+1}-b_1=\sum_{j=1}^na_j^2e_j-2\sum_{j=1}^na_jd_j$$
Since $b_{n+1}\geq0$,
$$\begin{aligned} \sum_{j=1}^na_jd_j &= \frac{1}{2}\left(\sum_{j=1}^na_j^2e_j+b_1-b_{n+1}\right) \\ &\leq \frac{1}{2}\left(\sum_{j=1}^na_j^2e_j+b_1\right) \\ &\leq \frac{C}{2}\sum_{j=1}^na_j^2+\frac{b_1}{2} \\ &< \infty \end{aligned}$$
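The last step of this bound uses $\sum a_j^2<\infty$. The standard choice $a_n=1/n$ (assumed here only for illustration) satisfies it while still having $\sum a_n=\infty$. A quick check of the partial sums, with `partial_sums` a hypothetical helper:

```python
import math

def partial_sums(N):
    """Partial sums of a_n = 1/n and a_n^2 = 1/n^2 for n = 1..N."""
    s1 = sum(1.0 / n for n in range(1, N + 1))
    s2 = sum(1.0 / n**2 for n in range(1, N + 1))
    return s1, s2

LIMIT = math.pi ** 2 / 6  # value of sum 1/n^2 over all n
for N in (10, 1000, 100000):
    s1, s2 = partial_sums(N)
    print(N, s1, s2, LIMIT)
```

The first column keeps growing roughly like $\ln N$, while the second approaches $\pi^2/6\approx1.6449$.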
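The right-hand side is bounded uniformly in $n$, so the partial sums of the non-negative series $\sum a_nd_n$ are bounded and the series converges. Consequently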
$$b_{n+1}=b_1+\sum_{j=1}^na_j^2e_j-2\sum_{j=1}^na_jd_j$$
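also converges, since both sums on the right converge. Now suppose there is a sequence of non-negative constants $\{k_n\}$ such that
$$d_n\geq k_nb_n,\quad\sum_{n=1}^\infty a_nk_n=\infty$$
Then $\sum a_nk_nb_n\le\sum a_nd_n<\infty$ converges. Since $b_n$ converges while $\sum a_nk_n$ diverges, the limit of $b_n$ must be 0: otherwise $a_nk_nb_n\ge\varepsilon a_nk_n$ for some $\varepsilon>0$ and all large $n$, and $\sum a_nk_nb_n$ would diverge.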
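Under the standing assumption $M'(x)\ge c_1>0$, a constant sequence already works. Using $M(\theta)=0$, the mean value theorem with some $\xi_n$ between $x_n$ and $\theta$, and $\mathbb{E}[w(x_n-\theta)]=0$ as in the computation of $d_n$:

$$d_n=\mathbb{E}[M(x_n)(x_n-\theta)]=\mathbb{E}[M'(\xi_n)(x_n-\theta)^2]\ge c_1\,\mathbb{E}[(x_n-\theta)^2]=c_1b_n,\qquad\sum_{n=1}^\infty a_nc_1=c_1\sum_{n=1}^\infty a_n=\infty$$

so $k_n=c_1$ satisfies both requirements.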

Reference: "Convergence Analysis of the Stochastic Gradient Descent (SGD) Algorithm", Zhihu

Second proof

Let $J_n=x_n-\theta$ denote the error at step $n$. Between two iterations,
$$\begin{aligned} J_{n+1} &= x_{n+1}-\theta \\ &= x_n-\theta-a_nY(x_n) \\ &= x_n-\theta-a_n[M(x_n)+w] \\ &= x_n-\theta-a_n[M(\theta)+M'(\xi)(x_n-\theta)+w] \\ &= (1-a_nM'(\xi))(x_n-\theta)-a_n[M(\theta)+w] \\ &= (1-a_nM'(\xi))J_n-a_nw \end{aligned}$$
where $\xi$ lies between $x_n$ and $\theta$ by the mean value theorem, and the last step uses $M(\theta)=0$. Below, $M'(\xi)$ is written simply as $M'$. The goal is to show that $J_n\rightarrow0$ as $n\rightarrow\infty$.
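Because $M(\theta)=0$, the mean-value slope $M'(\xi)$ equals the secant slope $M(x_n)/(x_n-\theta)$, so the recursion can be checked numerically. A sketch assuming the hypothetical black box $M(x)=2x+\sin x$ (root $\theta=0$, $M'(x)=2+\cos x\in[1,3]$, so $c_1=1$, $c_2=3$), Gaussian noise with standard deviation 0.5 and $a_n=1/(n+2)$; it checks that the secant slopes stay in $[c_1,c_2]$ and that $(1-a_nM')J_n-a_nw$ reproduces the actual update.

```python
import math
import random

def M(x):
    return 2.0 * x + math.sin(x)  # hypothetical black box: M' = 2 + cos x in [1, 3], root 0

THETA = 0.0
rng = random.Random(2)
x = 5.0
max_mismatch = 0.0
slopes = []
for n in range(1, 201):
    a = 1.0 / (n + 2)                    # hypothetical step size; a_n * M' <= 1 here
    w = rng.gauss(0.0, 0.5)
    J = x - THETA
    x_next = x - a * (M(x) + w)          # Robbins-Monro update
    if J != 0.0:
        s = (M(x) - M(THETA)) / J        # secant slope = M'(xi) for some xi between x_n and theta
        slopes.append(s)
        J_next = (1.0 - a * s) * J - a * w  # recursion obtained from the mean value theorem
        max_mismatch = max(max_mismatch, abs(J_next - (x_next - THETA)))
    x = x_next

print(min(slopes), max(slopes), max_mismatch)
```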
$$\begin{aligned} J_{n+1}^2-J_n^2 &= (J_{n+1}+J_n)(J_{n+1}-J_n) \\ &= [(2-M'a_n)J_n-a_nw][-M'a_nJ_n-a_nw] \\ &= -M'a_n(2-M'a_n)J_n^2+a_n^2w^2+M'a_n^2wJ_n-(2-M'a_n)a_nwJ_n \\ &= -M'a_n(2-M'a_n)J_n^2+a_n^2w^2+2(M'a_n-1)a_nwJ_n \end{aligned}$$
0 < a n < 1 00<an<1 时,第一项小于0;由已知条件 E ( w 2 ) < ∞ , a n 2 < ∞ \mathbb{E}(w^2)<\infty,a_n^2<\infty E(w2)<,an2<,第二项大于0且有界。设 b n b_n bn 为前两项的和
b n = − M ′ a n ( 2 − M ′ a n ) J n 2 + a n 2 w 2 b_n=-M'a_n(2-M'a_n)J_n^2 +a_n^2w^2 bn=Man(2Man)Jn2+an2w2
b n < ∞ b_n<\infty bn<。因为 J 0 2 ≥ 0 J_0^2\geq 0 J020 且有界,所以 b n > − ∞ b_n>-\infty bn>
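The first term $-M'a_n(2-M'a_n)J_n^2$ is negative exactly when $0<M'a_n<2$. In the noise-free case with a constant product $aM'$ this is the contraction condition for $J_{n+1}=(1-aM')J_n$, as the following sketch shows (`noise_free_error` is a hypothetical helper):

```python
def noise_free_error(a_times_m, n_steps=50, J0=1.0):
    """|J_n| after n_steps of J_{n+1} = (1 - a M') J_n, with no noise and constant a M'."""
    J = J0
    for _ in range(n_steps):
        J = (1.0 - a_times_m) * J
    return abs(J)

for am in (0.5, 1.5, 1.99, 2.5):
    print(am, noise_free_error(am))
```

For $aM'$ strictly between 0 and 2 the error shrinks (slowly near 2), while $aM'=2.5$ makes it blow up.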
For the third term, the conditional-expectation property $\mathbb{E}[u\,g(v)\mid v]=g(v)\,\mathbb{E}[u\mid v]$, applied with $u=w$ and $v=x_n$ (both $M'$ and $J_n$ are functions of $x_n$) and with $\mathbb{E}[w\mid x_n]=0$, shows that its expectation is 0:
$$\begin{aligned} \mathbb{E}[J_{n+1}^2-J_n^2] &= \mathbb{E}[-M'a_n(2-M'a_n)J_n^2+a_n^2w^2]+2\mathbb{E}[(M'a_n-1)a_nwJ_n] \\ &= \mathbb{E}[-M'a_n(2-M'a_n)J_n^2+a_n^2w^2]+2\mathbb{E}\big[(M'a_n-1)a_nJ_n\,\mathbb{E}[w\mid x_n]\big] \\ &= -\mathbb{E}[M'a_n(2-M'a_n)J_n^2]+a_n^2\mathbb{E}[w^2] \end{aligned}$$
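A Monte Carlo sketch of why the third term has zero mean, assuming the hypothetical $M(x)=2x+\sin x$ (root $\theta=0$, $1\le M'\le3$), Gaussian noise with standard deviation 0.5 and $a_n=1/(n+2)$. The noise $w$ at step $n$ is drawn independently of $J_n$, so the sample mean of $(M'a_n-1)a_nwJ_n$ should lie within a few standard errors of 0. The name `cross_term_samples` is illustrative.

```python
import math
import random
import statistics

def M(x):
    return 2.0 * x + math.sin(x)  # hypothetical black box with root theta = 0

def cross_term_samples(n_target=20, n_runs=20_000, x0=5.0, seed=4):
    """Samples of (M' a_n - 1) a_n w J_n at step n_target, where the fresh noise w
    is drawn independently of the current error J_n = x_n - theta = x_n."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_runs):
        x = x0
        for n in range(1, n_target + 1):
            a = 1.0 / (n + 2)
            w = rng.gauss(0.0, 0.5)
            if n == n_target:
                m = M(x) / x if x != 0.0 else 3.0  # secant slope M'(xi); M'(0) = 3
                samples.append((m * a - 1.0) * a * w * x)
            x = x - a * (M(x) + w)
    return samples

s = cross_term_samples()
mean = statistics.fmean(s)
stderr = statistics.stdev(s) / math.sqrt(len(s))
print(mean, stderr)  # the mean sits within a few standard errors of 0
```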
$$\sum_{n=1}^{N}\mathbb{E}(J_{n+1}^2-J_n^2)=\mathbb{E}J_{N+1}^2-\mathbb{E}J_1^2$$

Example

Solve $x^3=5$, i.e. find the root of $M(x)=x^3-5$.

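```python
import random

def M(x):
    return x**3 - 5                  # black box; root is 5 ** (1/3) ≈ 1.710

x = 0.0                              # initial point x_0
for n in range(100):
    a_n = 1 / (n + 10)               # step size: sum a_n = inf, sum a_n^2 < inf
    y = M(x) + random.gauss(0, 0.1)  # noisy observation Y(x_n)
    x = x - a_n * y                  # Robbins-Monro update
    print(x)
```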
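To see why $\sum a_n^2<\infty$ matters, a sketch comparing the decaying step $1/(n+10)$ above with a hypothetical constant step $0.05$ (which violates $\sum a_n^2<\infty$) over 2000 iterations with a fixed seed; `run` is an illustrative helper. The constant step keeps jittering around the root, while the decaying step settles down.

```python
import random
import statistics

def M(x):
    return x**3 - 5  # same black box as above; root is 5 ** (1/3)

def run(step, n_steps=2000, seed=0):
    """Return all iterates of x_{n+1} = x_n - a_n (M(x_n) + noise), starting from x_0 = 0."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for n in range(n_steps):
        x = x - step(n) * (M(x) + rng.gauss(0, 0.1))
        xs.append(x)
    return xs

decaying = run(lambda n: 1 / (n + 10))  # step size from the example above
constant = run(lambda n: 0.05)          # hypothetical constant step
root = 5 ** (1 / 3)
print(abs(decaying[-1] - root), statistics.pstdev(decaying[-100:]))
print(abs(constant[-1] - root), statistics.pstdev(constant[-100:]))
```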
