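In one sentence, put plainly: resample the observed data repeatedly with replacement, compute the statistic on each resample, and use those results to obtain the distribution of the statistic.
Below are some examples I have collected: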
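In 1979, Bradley Efron[1], a professor in the Department of Statistics at Stanford University in the United States, drew on a summary and synthesis of earlier work to propose a new nonparametric statistical method: the Bootstrap. In 1980, Professor Wei Zongshu (魏宗舒) gave the first introduction to it in China and translated "Bootstrap" as "自助法" (the "self-help method"). The Bootstrap is a class of nonparametric Monte Carlo methods. In essence, it resamples the observed information and then makes statistical inferences about the distributional properties of the population. Because it makes full use of the given observations, needs no further model assumptions and no new observations, and is robust and efficient, the Bootstrap has become increasingly popular.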
Basic idea: resampling
We have an observed data set
$$D: \{(x_i, y_i),\ 1 \le i \le N\}$$
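We then resample these N observations with replacement. In each round we again draw N observations, and we run B rounds in total (say, a few hundred; by the way, a few days ago someone on Weibo asked "if you were given ten thousand people, what would you have them do?", and here I would have them draw balls over and over and over, haha!). This gives us new observed samples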
$$D_b: \{(x_i^b, y_i^b),\ 1 \le i \le N\}, \quad 1 \le b \le B$$
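The resampling step can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `bootstrap_resamples`, the fixed seed, and the toy data are all assumptions made for the example, and only the standard library is used.

```python
import random


def bootstrap_resamples(data, num_rounds, seed=0):
    """Return num_rounds resamples of data (hypothetical helper).

    Each resample has the same size N as data and is drawn with
    replacement, so an observation may appear several times or not at all.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(data)
    return [rng.choices(data, k=n) for _ in range(num_rounds)]


# Toy data set D of (x_i, y_i) pairs; the values are made up for illustration.
D = [(1, 2.0), (2, 2.9), (3, 4.1), (4, 5.2), (5, 5.8)]

# B = 200 rounds, giving D_1, ..., D_B, each with N = 5 pairs.
resamples = bootstrap_resamples(D, num_rounds=200)
```

Each `(x, y)` pair is resampled as a unit, so the relationship between `x` and `y` within an observation is preserved.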
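The idea of the bootstrap is to generate a series of bootstrap pseudo-samples, each drawn with replacement from the original data. Computing a statistic on every pseudo-sample yields the distribution of that statistic. For example, to run 1000 bootstrap rounds and obtain a confidence interval for the mean, compute the mean of each pseudo-sample. This gives 1000 means. Taking quantiles of these 1000 means (for example, the 2.5% and 97.5% quantiles for a 95% interval) gives the confidence interval. It has been shown that, when the original sample is large enough, bootstrap resampling approximates the population distribution without bias.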
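The confidence-interval example for the mean might look like the following in Python. Treat it as a sketch under stated assumptions: the function name `bootstrap_mean_ci`, the default `alpha=0.05`, the seed, and the sample values are illustrative, and the quantiles are taken with a simple nearest-rank rule (other interpolation conventions exist and give slightly different endpoints).

```python
import math
import random
import statistics


def bootstrap_mean_ci(data, num_rounds=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(data)

    # One mean per pseudo-sample, each pseudo-sample drawn with replacement.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(num_rounds)
    )

    def nearest_rank(q):
        # Nearest-rank quantile: smallest value with at least a fraction q
        # of the sorted means at or below it.
        k = math.ceil(q * len(means)) - 1
        return means[max(0, min(len(means) - 1, k))]

    return nearest_rank(alpha / 2), nearest_rank(1 - alpha / 2)


# Made-up observations, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
lower, upper = bootstrap_mean_ci(sample)
print(f"95% bootstrap CI for the mean: [{lower:.2f}, {upper:.2f}]")
```

With a sample as small as this one the interval is only a rough guide. The approach leans on the original sample being large enough to represent the population.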