Java: Reservoir Sampling - randomly select k elements from N elements, where N is not known in advance

Reservoir Sampling
For a proof, see this link: [url]http://www.cnblogs.com/HappyAngel/archive/2011/02/07/1949762.html[/url]
The two screenshots below are taken from that link:

[img]http://dl.iteye.com/upload/attachment/0065/6909/b405ec99-4c89-3ecc-a15a-658f089123f5.jpg[/img]

[img]http://dl.iteye.com/upload/attachment/0065/6911/5e74e239-4de7-3197-80f5-e65f7ad14889.jpg[/img]
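
As a rough summary of why the method is uniform (my own paraphrase, not copied from the screenshots, with items numbered from 1): when item t > k arrives, it is accepted with probability k/t and then overwrites one of the k slots chosen uniformly, so any item already in the reservoir is evicted with probability (k/t)(1/k) = 1/t. For an item i > k, the chance of being in the final sample is therefore

$$
P(i) = \frac{k}{i} \prod_{t=i+1}^{N} \left(1 - \frac{1}{t}\right) = \frac{k}{i} \cdot \frac{i}{N} = \frac{k}{N}
$$

and for one of the first k items, which start out in the reservoir,

$$
P(i) = \prod_{t=k+1}^{N} \left(1 - \frac{1}{t}\right) = \frac{k}{N}
$$

Both products telescope because 1 - 1/t = (t-1)/t, so every item ends up in the sample with the same probability k/N.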


import java.util.Arrays;
import java.util.Random;


public class ReservoirSampling {

    /**
     * Problem: given a data stream containing an endless sequence of search keywords
     * (for example, the keywords people keep typing into Google search),
     * how can we randomly select 1000 keywords from this endless stream?
     * Reservoir Sampling.
     * I read several proofs online, but found them hard to follow except this one:
     * http://www.cnblogs.com/HappyAngel/archive/2011/02/07/1949762.html
     * It is excellent.
     */
    public static void main(String[] args) {
        int k = 100;
        int n = 1000;
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i;
        }
        int[] sample = reservoirSampling(data, k);
        System.out.println(Arrays.toString(sample));
    }

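    public static int[] reservoirSampling(int[] data, int k) {
        if (data == null) {
            // It is advised to return an empty array instead of null. Am I doing this right here?
            return new int[0];
        }
        if (data.length < k) {
            return new int[0];
        }
        int[] sample = new int[k];
        int n = data.length;
        Random random = new Random(); // one generator reused for every draw
        for (int i = 0; i < n; i++) {
            if (i < k) {
                // Fill the reservoir with the first k elements.
                sample[i] = data[i];
            } else {
                // data[i] is element number i + 1, so draw j uniformly from [0, i];
                // j < k then happens with probability k / (i + 1).
                int j = random.nextInt(i + 1);
                if (j < k) {
                    sample[j] = data[i];
                }
            }
        }
        return sample;
    }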

}
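
The class above samples from an array, so its length is known up front. For the case in the title, where N cannot be determined, the same idea can be sketched over an `Iterator`. This is only a minimal sketch under my own assumptions: the class name `StreamReservoirSampler`, the `sample` method and its parameters are hypothetical and not part of the original code, and the item counter is an `int`, so it only works for streams shorter than about 2^31 items.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; not part of the original post.
final class StreamReservoirSampler {

    // Returns up to k items chosen uniformly at random from the stream.
    // If the stream yields fewer than k items, all of them are returned.
    static <T> List<T> sample(Iterator<T> stream, int k, Random random) {
        List<T> reservoir = new ArrayList<>();
        if (stream == null || k <= 0) {
            return reservoir;
        }
        int seen = 0; // number of items consumed so far (assumed to fit in an int)
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                // The first k items fill the reservoir directly.
                reservoir.add(item);
            } else {
                // Draw j uniformly from [0, seen - 1]; j < k has probability k / seen.
                int j = random.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }
}
```

It could be called as `StreamReservoirSampler.sample(keywords.iterator(), 1000, new Random())`, where `keywords` stands for any iterable source of search keywords. Passing the `Random` in, rather than creating it inside the method, makes it possible to fix a seed for repeatable runs.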
