Machine Learning Foundations: Derivation of Hoeffding's Inequality

This derivation draws on the following posts:

https://www.cnblogs.com/kolmogorov/p/9518867.html

https://blog.csdn.net/hedan2013/article/details/76337040


 

0. Hoeffding's inequality

P\left ( \left | \frac{1}{m}\sum_{i=1}^{m}Z_i-\mu \right |\geq \varepsilon \right )\leq 2 e^{\frac{-2m\varepsilon ^2}{\left ( b-a \right )^2}}   

Here Z_1,Z_2,\dots ,Z_m are independent random variables (not necessarily identically distributed);

\mu =E\left ( \bar{Z} \right )=E\left ( \frac{1}{m}\sum_{i=1}^{m}Z_i\right )

\varepsilon > 0, and P\left [ a\leq Z_i\leq b \right ]=1 for every i.
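As a quick sanity check of the statement, the sketch below compares the bound with an exactly computed tail probability. It assumes a special case that the text does not discuss: the Z_i are i.i.d. Bernoulli(q) variables, so a=0, b=1 and the tail is a finite binomial sum. The function names `hoeffding_bound` and `exact_bernoulli_tail` and the values of `m`, `q` and `eps` are illustrative choices.

```python
import math


def hoeffding_bound(m, eps, a=0.0, b=1.0):
    """Right-hand side of the inequality: 2 * exp(-2 m eps^2 / (b - a)^2)."""
    return 2.0 * math.exp(-2.0 * m * eps ** 2 / (b - a) ** 2)


def exact_bernoulli_tail(m, q, eps):
    """Exact P(|mean - q| >= eps) for m i.i.d. Bernoulli(q) variables (assumed case)."""
    total = 0.0
    for k in range(m + 1):
        # k successes give a sample mean of k / m; the small tolerance guards
        # against floating-point rounding when the deviation equals eps exactly.
        if abs(k / m - q) >= eps - 1e-12:
            total += math.comb(m, k) * q ** k * (1.0 - q) ** (m - k)
    return total


m, q, eps = 100, 0.5, 0.1  # illustrative values
tail = exact_bernoulli_tail(m, q, eps)
bound = hoeffding_bound(m, eps)
print(f"exact tail = {tail:.4f}, Hoeffding bound = {bound:.4f}")
print("bound holds:", tail <= bound)
```

The bound is distribution-free, so for a specific distribution like this one the exact tail is typically much smaller than the right-hand side.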


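1. To prove Hoeffding's inequality, we first need to prove the following bound (Hoeffding's lemma):

E\left ( e^{\lambda X} \right )\leq e^{\frac{\lambda ^2\left ( b-a \right )^2}{8}}

where X is a random variable with a\leq X\leq b, E\left ( X \right )=0, and \lambda > 0.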

Since f\left ( x \right )=e^{\lambda x} is convex, its graph on [a,b] lies below the chord through the two endpoints, so for a\leq X\leq b

f\left ( X \right )\leq f\left ( a \right )+\frac{f\left ( b \right )-f\left ( a \right )}{b-a}\left ( X-a \right )

This is a standard property of convex functions and is not elaborated further here.

Substituting f\left ( X \right )=e^{\lambda X}, f\left ( a \right )=e^{\lambda a}, f\left ( b \right )=e^{\lambda b} gives

e^{\lambda X}\leq \frac{b-X}{b-a}e^{\lambda a}+\frac{X-a}{b-a}e^{\lambda b}

Taking expectations of both sides of this inequality,

E\left (e^{\lambda X} \right )\leq \frac{b-E\left ( X\right )}{b-a}e^{\lambda a}+\frac{E\left (X \right )-a}{b-a}e^{\lambda b}

Since E\left ( X \right )=0,

E\left (e^{\lambda X} \right )\leq \frac{b}{b-a}e^{\lambda a}-\frac{a}{b-a}e^{\lambda b}
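The convexity step can be checked numerically. The following sketch uses a hypothetical three-point, zero-mean distribution on [a,b]=[-1,2] (the values, probabilities and \lambda =0.7 are assumptions made purely for illustration) and verifies both the pointwise chord bound and the resulting bound on E\left (e^{\lambda X} \right ).

```python
import math

# Hypothetical zero-mean distribution on [a, b] = [-1, 2], chosen for illustration.
values = [-1.0, 0.0, 2.0]
probs = [0.5, 0.25, 0.25]
a, b = min(values), max(values)
lam = 0.7  # any lambda > 0 could be used here


def chord(x):
    """Chord of exp(lam * x) through the endpoints a and b."""
    return (b - x) / (b - a) * math.exp(lam * a) + (x - a) / (b - a) * math.exp(lam * b)


mean = sum(p * x for p, x in zip(probs, values))
mgf = sum(p * math.exp(lam * x) for p, x in zip(probs, values))
rhs = b / (b - a) * math.exp(lam * a) - a / (b - a) * math.exp(lam * b)

# Pointwise convexity bound exp(lam * x) <= chord(x) on a grid of x in [a, b].
grid = [a + (b - a) * k / 200 for k in range(201)]
pointwise_ok = all(math.exp(lam * x) <= chord(x) + 1e-12 for x in grid)

print("E(X) =", mean)
print("pointwise chord bound holds:", pointwise_ok)
print(f"E(exp(lam X)) = {mgf:.4f}, chord bound = {rhs:.4f}")
```

Because the chord is linear in x, its expectation is simply the chord evaluated at E(X)=0, which is why the bound no longer depends on the distribution of X.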

Let h=\lambda \left ( b-a \right ) and p=\frac{-a}{b-a}. Because E\left ( X \right )=0 forces a\leq 0\leq b, we have 0\leq p\leq 1.

Then \lambda a=-hp and \lambda b=h-hp, so

\frac{b}{b-a}e^{\lambda a}-\frac{a}{b-a}e^{\lambda b}=\left ( 1-p \right )e^{-hp}+pe^{h-hp}=e^{-hp+\ln \left ( 1-p+pe^h \right )}=e^{L\left ( h \right )}

and

e^\frac{\lambda ^2\left ( b-a \right )^2}{8}=e^\frac{h^2}{8}

It therefore suffices to prove

L\left ( h \right )=-hp+\ln \left ( 1-p+pe^h \right )\leq \frac{\lambda ^2\left ( b-a \right )^2}{8}=\frac{h^2}{8}

which yields

E\left (e^{\lambda X} \right )\leq \frac{b}{b-a}e^{\lambda a}-\frac{a}{b-a}e^{\lambda b}\leq e^\frac{\lambda ^2\left ( b-a \right )^2}{8}

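Define

g\left ( h \right )=-hp+\ln \left ( 1-p+pe^h \right )-\frac{h^2}{8}

A direct computation gives g\left ( 0 \right )=0 and g^{'}\left (0 \right )=0, where

g^{'}\left ( h \right )=-p+\frac{pe^h}{1-p+pe^h}-\frac{h}{4}

Writing t=\frac{pe^h}{1-p+pe^h}\in \left [ 0,1 \right ], the second derivative is

g^{''}\left ( h \right )=t\left ( 1-t \right )-\frac{1}{4}=-\left ( t-\frac{1}{2}\right )^2\leq 0

for every h (at h=0 this is g^{''}\left (0 \right )=-\left ( p-\frac{1}{2}\right )^2). Hence g^{'} is non-increasing, so g^{'}\left ( h \right )\leq g^{'}\left ( 0 \right )=0 for h\geq 0, and therefore g\left ( h \right )\leq g\left ( 0 \right )=0 for h\geq 0:

g\left ( h \right )=-hp+\ln \left ( 1-p+pe^h \right )-\frac{h^2}{8}\leq 0

Therefore

E\left (e^{\lambda X} \right )\leq \frac{b}{b-a}e^{\lambda a}-\frac{a}{b-a}e^{\lambda b}\leq e^\frac{\lambda ^2\left ( b-a \right )^2}{8}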
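The key inequality L\left ( h \right )\leq \frac{h^2}{8} can also be spot-checked numerically. The sketch below evaluates g\left ( h \right )=L\left ( h \right )-\frac{h^2}{8} on a grid of p in (0,1) and h in [0,20]; the grid ranges are arbitrary assumptions, and a grid check is only a plausibility test, not a substitute for the proof.

```python
import math


def L(h, p):
    """L(h) = -h p + ln(1 - p + p e^h), as defined in the text."""
    return -h * p + math.log(1.0 - p + p * math.exp(h))


def g(h, p):
    """g(h) = L(h) - h^2 / 8; the lemma needs g(h) <= 0 for h >= 0."""
    return L(h, p) - h * h / 8.0


p_values = [i / 20 for i in range(1, 20)]    # p = 0.05, 0.10, ..., 0.95
h_values = [j / 10 for j in range(0, 201)]   # h = 0.0, 0.1, ..., 20.0
worst = max(g(h, p) for p in p_values for h in h_values)

# The maximum is attained at h = 0, where g vanishes up to rounding error.
print("largest g(h, p) on the grid:", worst)
```

For p=\frac{1}{2} the function L\left ( h \right ) reduces to \ln \cosh \left ( \frac{h}{2} \right ), which is the symmetric case where the bound is tightest near h=0.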


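2. Next we need Markov's inequality: for a nonnegative random variable Y and any t > 0,

P\left ( Y\geq t \right )\leq \frac{E\left ( Y \right )}{t}

The proof of this inequality was taken directly from another author's write-up.

[Figure 1: screenshot of the proof of Markov's inequality]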
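Markov's inequality states that P\left ( Y\geq t \right )\leq \frac{E\left ( Y \right )}{t} for a nonnegative random variable Y and t > 0. A minimal sketch that checks it exactly on one example follows; the choice of a fair six-sided die for Y is an assumption made only for illustration.

```python
from fractions import Fraction

# Assumed example: Y is the outcome of a fair six-sided die (a nonnegative variable).
outcomes = range(1, 7)
prob = Fraction(1, 6)
expectation = sum(prob * y for y in outcomes)


def tail(t):
    """Exact P(Y >= t)."""
    return sum(prob for y in outcomes if y >= t)


for t in outcomes:
    bound = expectation / t
    print(f"t={t}: P(Y>=t)={tail(t)}  E(Y)/t={bound}  holds={tail(t) <= bound}")
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.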


3. Now we prove Hoeffding's inequality.

Let X_i=Z_i-E\left ( Z_i \right ) and \bar{X}=\frac{1}{m}\sum_{i=1}^{m}X_i, so that E\left ( X_i \right )=0 for every i.

P\left ( \left | \frac{1}{m}\sum_{i=1}^{m}Z_i-\mu \right |\geq \varepsilon \right )=P\left ( \left | \frac{1}{m}\sum_{i=1}^{m}Z_i-E\left ( \frac{1}{m}\sum_{i=1}^{m}Z_i \right ) \right |\geq \varepsilon \right )=P\left ( \left |\bar{X} \right |\geq \varepsilon \right )

Since \varepsilon > 0, the events \bar{X}\geq \varepsilon and -\bar{X}\geq \varepsilon are disjoint, so

P\left ( \left |\bar{X} \right |\geq \varepsilon \right )=P\left ( \bar{X}\geq \varepsilon \right )+P\left ( -\bar{X}\geq \varepsilon \right )

We first prove P\left ( \bar{X}\geq \varepsilon \right )\leq e^{\frac{-2m\varepsilon ^2}{\left ( b-a \right )^2}}

First, for any \lambda > 0, Markov's inequality applied to the nonnegative random variable e^{\lambda \bar{X}} gives

P\left ( \bar{X}\geq \varepsilon \right )=P\left ( e^{\lambda \bar{X}}\geq e^{\lambda \varepsilon } \right )\leq \frac{E\left (e^{\lambda \bar{X}} \right )} {e^{\lambda \varepsilon }}=\frac{E\left (e^{\lambda \frac{1}{m}\sum_{i=1}^{m}X_i} \right )} {e^{\lambda \varepsilon }}

Since Z_1,Z_2,\dots ,Z_m are independent random variables, X_1,X_2,\dots ,X_m are independent as well, so the expectation of the product factorizes:

P\left ( \bar{X}\geq \varepsilon \right )\leq \frac{E\left (e^{\lambda \frac{1}{m}\sum_{i=1}^{m}X_i} \right )} {e^{\lambda \varepsilon }}=e^{-\lambda \varepsilon }\prod_{i=1}^{m}E\left ( e^{\frac{\lambda }{m}X_i} \right )

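Each X_i has mean zero and lies in the interval \left [ a-E\left ( Z_i \right ),b-E\left ( Z_i \right ) \right ] of length b-a, so the result of step 1, applied with \frac{\lambda }{m} in place of \lambda , gives

E\left ( e^{\frac{\lambda }{m}X_i} \right )\leq e^{\frac{\lambda ^2\left ( b-a \right )^2}{8m^2}}

Hence

P\left ( \bar{X}\geq \varepsilon \right )\leq e^{-\lambda \varepsilon }\prod_{i=1}^{m}E\left ( e^{\frac{\lambda }{m}X_i} \right )\leq e^{\frac{\lambda ^2\left ( b-a \right )^2}{8m}-\lambda \varepsilon }

This holds for every \lambda > 0. The exponent is a quadratic in \lambda that is minimized at

\lambda =\frac{4m\varepsilon }{\left (b-a \right )^2}

Substituting this value gives

P\left ( \bar{X}\geq \varepsilon \right )\leq e^{\frac{-2m\varepsilon ^2}{\left ( b-a \right )^2}}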
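The choice \lambda =\frac{4m\varepsilon }{\left (b-a \right )^2} minimizes the exponent \frac{\lambda ^2\left ( b-a \right )^2}{8m}-\lambda \varepsilon . The sketch below checks this on one set of values; `m`, `eps`, `a` and `b` are arbitrary illustrative assumptions, and the grid search is only a plausibility check of the closed form.

```python
def exponent(lam, m, eps, a, b):
    """Exponent lam^2 (b - a)^2 / (8 m) - lam * eps of the one-sided bound."""
    return lam ** 2 * (b - a) ** 2 / (8.0 * m) - lam * eps


def optimal_lambda(m, eps, a, b):
    """Minimizer of the exponent, obtained by setting its derivative to zero."""
    return 4.0 * m * eps / (b - a) ** 2


m, eps, a, b = 50, 0.2, -1.0, 3.0  # illustrative values
lam_star = optimal_lambda(m, eps, a, b)
best = exponent(lam_star, m, eps, a, b)
closed_form = -2.0 * m * eps ** 2 / (b - a) ** 2

# Compare against a grid of lambda values between 0 and 3 * lam_star.
grid = [lam_star * k / 100 for k in range(1, 301)]
grid_min = min(exponent(lam, m, eps, a, b) for lam in grid)

print("lambda* =", lam_star)
print("exponent at lambda* =", best, " closed form =", closed_form)
print("no grid point does better:", grid_min >= best - 1e-12)
```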

The same argument applied to -X_1,-X_2,\dots ,-X_m, which also have mean zero and lie in intervals of length b-a, shows that P\left ( -\bar{X}\geq \varepsilon \right )\leq e^{\frac{-2m\varepsilon ^2}{\left ( b-a \right )^2}}

Therefore P\left ( \left |\bar{X} \right |\geq \varepsilon \right )=P\left ( \bar{X}\geq \varepsilon \right )+P\left ( -\bar{X}\geq \varepsilon \right )\leq 2e^{\frac{-2m\varepsilon ^2}{\left ( b-a \right )^2}}
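The final two-sided bound does not require the Z_i to be identically distributed. A small Monte Carlo sketch of that point follows. It assumes a mix of Uniform(0,1) and Bernoulli(0.3) variables (so a=0, b=1); the helper name `sample_mean_deviation`, the seed, and the values of `m`, `eps` and `trials` are all illustrative choices, and the empirical frequency is only an estimate of the true probability.

```python
import math
import random


def sample_mean_deviation(rng, m):
    """Draw independent Z_1..Z_m in [0, 1] and return mean(Z) - E(mean(Z))."""
    total = 0.0
    expected = 0.0
    for i in range(m):
        if i % 2 == 0:
            total += rng.random()                         # Uniform(0, 1), mean 0.5
            expected += 0.5
        else:
            total += 1.0 if rng.random() < 0.3 else 0.0   # Bernoulli(0.3), mean 0.3
            expected += 0.3
    return (total - expected) / m


rng = random.Random(0)  # fixed seed so the run is reproducible
m, eps, trials = 50, 0.15, 5000
hits = sum(abs(sample_mean_deviation(rng, m)) >= eps for _ in range(trials))
empirical = hits / trials
bound = 2.0 * math.exp(-2.0 * m * eps ** 2)  # (b - a)^2 = 1 here

print(f"empirical frequency = {empirical:.4f}, Hoeffding bound = {bound:.4f}")
```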

 

 

 
