BOOTSTRAP RESAMPLING DISTRIBUTIONS OF THE MEAN WHEN THE CENTRAL LOCATION PARAMETER IS KNOWN

——— Peter M. Saama

作者分别利用Mathematica、sas/iml、splus及stat完成该bootstrap实验,这里摘录仅sas/iml代码。值得强调的是,它属于percentile-bootstrap。

1.主要算法步骤

a) Define population of 10 cases from the uniform distribution on the unit interval [0,1].定义一个总体,它由服从[01]均匀分布的10个数据组成。

b) Draw i.i.d sample of n=50 with replacement from population in (a).从(a)得到的总体中有放回抽取n=50的独立同分布样本

c) From (b) draw 50 observations with replacement.从(b)样本中有放回抽取50个观测值

d) Subtract average of the sample in (b) from the average of the sample in (c).(c)样本得到均值减去(b)得到样本均值

e) Repeat steps (c) through (d), 2000 times to generate a 2000x1 vector bootstrap mean differences.重复(c)、(d)步骤2000次,生成一个2000*1的bootstrap均值差向量
f) Plot CDF of (e).绘制累积分布函数曲线

h) To get 100(1−a )% CI on X by BS,获得变量X的bootstrap抽样的100(1−a )% 置信区间,

   h.i   Arrange in ascending order the absolute value of BS mean difference obtained in (e).首先.将(e)中获得的均值差绝对值按升序排列
   h.ii  The band-width is given by the pth value of the vector in (h.i), where p = (1−a )*2000 .  其次.上步均值差向量中的第p个值决定了区间宽度,其中p = (1−a )*2000 .

2.文章模拟得到的数据

总体

pop = {0.954552, 0.0923216, 0.517541, 0.25118, 0.778512, 0.747938, 0.53945,
             0.999825, 0.102718, 0.702373};

独立同分布样本

mysam={0.702373, 0.0923216, 0.53945, 0.53945, 0.0923216, 0.102718, 0.102718,
                 0.25118, 0.747938, 0.25118, 0.778512, 0.517541, 0.25118, 0.25118,
                 0.102718, 0.0923216, 0.53945, 0.702373, 0.53945, 0.517541, 0.102718,
                 0.702373, 0.999825, 0.778512, 0.954552, 0.954552, 0.53945, 0.102718,
                 0.778512, 0.747938, 0.954552, 0.747938, 0.25118, 0.702373, 0.747938,
                 0.25118, 0.517541, 0.954552, 0.53945, 0.747938, 0.102718, 0.778512,
                 0.53945, 0.53945, 0.102718, 0.0923216, 0.25118, 0.53945, 0.747938,
                 0.102718};

该样本的均值为0.492,方差为0.082,根据正态分布理论,知其均值95%区间为(0.407,0.576).

3.iml代码

libname ssd '.';
Options fullstimer;
     /* Setup to generate a PostScript file */
filename gsasfile 'bootci4.ps';
goptions reset=global target=ps GSFNAME = gsasfile gsfmode=replace noprompt
simfont=swissxb nocharacters gunit=pct device= ps300
notext82 htitle=6 htext=3 ctext=blue ctitle=white
COLORS = (BLACK,GRAY22,GRAY33,GRAY66,GRAY88,GRAYAA,GRAYCC)
border;
       * Start SAS/IML and do the bootstrapping;
PROC IML;
       /* Initialize data matrices */
pop_size=10;                  /* Population size */
sam_size=50;                  /* Bootstrap sample size */
niter=2000;                   /* number of iterations */
pop=J(pop_size,1,0);          /* Population */
mysam=J(sam_size,1,0);        /* Sample from population */
bssam=J(sam_size,1,0);        /* Bootstrap sample */
msm=J(niter,1,0);             /* Bootstrap means */
absmbs=J(niter,1,0);          /* Absolute value of Bootstrap differences */
y=J(niter,1,0);               /* vector - stores step function for CDF */
                              /* Use sample generated from Mathematica for Comparison */
mysam={0.702373, 0.0923216, 0.53945, 0.53945, 0.0923216, 0.102718, 0.102718,
       0.25118, 0.747938, 0.25118, 0.778512, 0.517541, 0.25118, 0.25118,
       0.102718, 0.0923216, 0.53945, 0.702373, 0.53945, 0.517541, 0.102718,
       0.702373, 0.999825, 0.778512, 0.954552, 0.954552, 0.53945, 0.102718,
       0.778512, 0.747938, 0.954552, 0.747938, 0.25118, 0.702373, 0.747938,
       0.25118, 0.517541, 0.954552, 0.53945, 0.747938, 0.102718, 0.778512,
       0.53945, 0.53945, 0.102718, 0.0923216, 0.25118, 0.53945, 0.747938,
       0.102718};
print mysam;
                              /* Compute mean of sample from population */
psm = mysam(|:|);
print psm;
            /* Define module to generate i.i.d. bootstrap and sample, compute its mean,then compute the bootstrap difference */
Start bsmv;
                             /* Generate the Boostrap Sample */
do j=1 to sam_size;
point=ceil(sam_size*uniform(070164));
bssam[j,] = mysam[point,];
end;
                             /* Compute mean of bootstrap sample */
sm = bssam(|:|);
                             /* Compute bootstrap differences */
bsm = sm - psm;
Finish bsmv;
                             /* Define module to perform replications of the bootstrap */
Start bsrep;
                             /* Module to perform bootstrap analysis and store results */
do r=1 to niter;
call bsmv;
msm[r,]=sm;                  /* The bootstrap means */
absmbs[r,]=ABS(bsm);         /* Absolute value of the Bootstrap differences */
y[r,] = r/niter;
end;
Finish bsrep;
START qsort(arr);
                             /* Implementation of QuickSort sorting algorithm. Sorts an array (1:n) into
ascending numerical order. arr is replaced on output by its sorted arrangement.
Parameters: M is the size of subarrays sorted by straight insertion. NSTACK is
the required auxilliary storage. */
n=NROW(arr);
jstack=0;
m=7;
nstack=50;
l=1;
ir=n;
istack=J(nstack,1,0);
line1:
if (ir-l) < M then goto insert;
else goto median;
insert:
                             /* Insertion sort when subarray is small enough */
DO j=l+1 TO ir;
a=arr[j];
DO i=j-1 TO 1 BY -1;
if arr[i] <= a THEN goto line2;
arr[i+1]=arr[i];
END;
line11:
i=0;
line2:
arr[i+1]=a;
END;
line12:
if jstack = 0 THEN return;
                            /* Pop the stack and begin a new round of partitioning */
ir=istack[jstack];
l=istack[jstack-1];
jstack=jstack-2;
goto insert0;
median:
                           /* Otherwise,choose median of left, center, and right elements as partitioning
element a. Also rearrange so that a(l+1) <= a(l) <= a(ir) */
k=(l+ir)/2;
temp=arr[k];
arr[k]=arr[l+1];
arr[l+1]=temp;
if arr[l+1] > arr[ir] then DO;
temp=arr[l+1];
arr[l+1]=arr[ir];
arr[ir]=temp;
END;
if arr[l] > arr[ir] then DO;
temp=arr[l];
arr[l]=arr[ir];
arr[ir]=temp;
END;
if arr[l+1] > arr[l] then DO;
temp=arr[l+1];
arr[l+1]=arr[l];
arr[l]=temp;
END;
                               /* Initialize pointers for partitioning */
i=l+1;
j=ir;
a=arr[l];                      /* Partitioning element */
line3:
i=i+1;                         /* Beginning of inmost loop */
if arr[i] < a THEN goto line3; /* scan up to find element < a */
line4:
j=j-1;                         /* scan down to find element < a */
if arr[j] > a THEN goto line4;
if j < i THEN goto line5;      /* pointers crossed, get out */
temp=arr[i];                   /* exchange elements */
arr[i]=arr[j];
arr[j]=temp;
goto line3;                    /* end of inmost loop */
line5:
arr[l]=arr[j];                 /* Insert partitioning element */
arr[j]=a;
jstack=jstack+2;
                               /* Push pointers to larger subarray on stack, process smaller subarray */
if jstack > NSTACK THEN DO;
PUT 'NSTACK too small in sort';
END;
if ((ir-i+1) >= (j-l)) then goto inmost;
else goto closeit;
inmost:
istack[jstack]=ir;
istack[jstack-1]=i;
ir=j-1;
goto insert0;
closeit:
istack[jstack]=j-1;
istack[jstack-1]=l;
l=i;
insert0:
goto line1;
FINISH qsort;
                             /* Perform the analysis */
run bsrep;
                             /* Sort the output for POST bootstrap processing */
run qsort(absmbs);
run qsort(msm);
                             /* Extract the bootstrap band-width */
k = ROUND(.95*niter);
bwidth= absmbs[k];
print "Bootstrap band-width is";
print bwidth;
                             /* Reclaim memory by getting rid of large matrices */
FREE absmbs mysam bssam;
                             /* Define graphics module for plotting CDF - POST Bootstrap processing */
CALL GSTART;
CALL GOPEN("bootstr");
CALL GSET("font","swiss");
CALL GWINDOW({0 0, .8 1});
CALL GPORT({20 20 85 85});
START xyplot(xdata,ydata,labels);
                            * {matching X-Y co-ord}, length, # of major, # minor, flag for labels,format, height, {window};
CALL GXAXIS({.35 0},.45,6,5,0) COLOR='BLACK';
CALL GYAXIS({.35 0},1,6,5,0) COLOR='BLACK';
CALL GSCRIPT(50,5,LABELS(|1|)) COLOR='BLACK';
IF NROW(labels)>1 | NCOL(labels) > 1
THEN CALL GSCRIPT(3,80,labels(|2|)) ANGLE=270 ROTATE=90
COLOR='BLACK';
CALL GPOINT(xdata,ydata) COLOR='BLACK';
CALL GSCRIPT(40,82,'Cumulative Distribution Function')
COLOR='BLACK';
CALL GSHOW;
FINISH xyplot;
RUN xyplot(msm,y,{'Average' 'F'});
stop;
RUN;

quit;


 

你可能感兴趣的:(BOOTSTRAP RESAMPLING DISTRIBUTIONS OF THE MEAN WHEN THE CENTRAL LOCATION PARAMETER IS KNOWN)