Java implementation of the bagging algorithm (drawing N times with replacement from N samples)

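This experiment implements bagging (bootstrap sampling), a classic sampling method from statistics. By repeating the experiment many times (iterations) and plotting the resulting distribution, it checks that the number of distinct samples in a bag is approximately normally distributed and that the proportion of distinct samples converges to the value predicted by probability theory.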
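As a short supporting derivation (not part of the original experiment), the value the simulation should converge to can be worked out directly. Here N is the number of samples and D is a name introduced only for this sketch, denoting the number of distinct samples in one bag:

```latex
% Probability that a given sample is never picked in N draws with replacement
P(\text{sample } i \text{ is never drawn}) = \left(1 - \frac{1}{N}\right)^{N}
  \xrightarrow{\,N \to \infty\,} e^{-1} \approx 0.368

% Expected number of distinct samples D in the bag (linearity of expectation)
E[D] = N\left[1 - \left(1 - \frac{1}{N}\right)^{N}\right] \approx \left(1 - e^{-1}\right) N \approx 0.632\,N
```

For N = 100 the exact expression gives roughly 63.4, slightly above the limiting value of 63.2.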


package cn.melina.classification.test;

import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;
import java.util.Set;

import org.apache.mahout.common.RandomUtils;

public class BaggingTest {
	/**
	 * Given data with N cases, sample N cases at random with replacement.
	 * 
	 * @author melina
	 * 
	 * @param N
	 *          number of cases
	 * 
	 * @return the number of distinct values left after N draws and de-duplication
	 */
	public static int runBagging(int N){
		Random rng = RandomUtils.getRandom();
		ArrayList<Integer> list = new ArrayList<Integer>();

		// Draw N indices from [0, N) with replacement
		for (int i = 0; i < N; i++) {
			int index = rng.nextInt(N);
			list.add(index);
		}
		// Merge identical draws: map each drawn index to the number of times it was drawn.
		// The key type is Integer so that lookups actually match the stored keys.
		HashMap<Integer, Integer> hash = new HashMap<Integer, Integer>();
		for (int index : list) {
			Integer count = hash.get(index);
			hash.put(index, count == null ? 1 : count + 1);
		}
		/*for (Integer key : hash.keySet()) {
			System.out.println(key + "==>" + hash.get(key));
		}*/
		return hash.size();
	}
	
	public static void main(String []args){
		int itr_num = 10000;   // number of iterations
		int datasize = 100;    // number of samples to bag; here the 100 integers 0~99
		ArrayList<Integer> list = new ArrayList<Integer>();		
		for(int i = 0; i < itr_num; i ++){
			int num = runBagging(datasize);
			list.add(num);
		    //System.out.println("Distinct count after bagging round " + i + ": " + num);
		    //System.out.println(num);
		}
		
		// Count how often each distinct-count value occurred across all iterations
		HashMap<Integer, Integer> hash = new HashMap<Integer, Integer>();
		for (int count : list) {
			Integer seen = hash.get(count);
			hash.put(count, seen == null ? 1 : seen + 1);
		}
		// Print each value with its relative frequency as "value,percentage"
		DecimalFormat df = new DecimalFormat("0.00%");
		Set<Integer> set = hash.keySet();
		for (Integer key : set) {
			double value = hash.get(key) / (double) itr_num;
			System.out.println(key + "," + df.format(value));
		}
        
	}

}


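With the bagging sample size set to 100, we draw 100 times at random with replacement from the 100 integers 0~99. The number of distinct values left after de-duplication peaks near 63.2, that is, about 1 - 1/e of the samples. Repeating the experiment 10,000 times gives the following chart: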

[Figure 1: Java implementation of the bagging algorithm (drawing N times with replacement from N samples)]
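The peak can also be cross-checked without plotting. The following is a minimal sketch, not the author's code: it compares the expected number of distinct values, N(1 - (1 - 1/N)^N), with the average over many simulated rounds. The class name `BaggingCheck`, its method names, and the seed are all made up for illustration, and it uses `java.util.Random` and `java.util.BitSet` instead of Mahout's `RandomUtils` so that it runs with the JDK alone.

```java
import java.util.BitSet;
import java.util.Random;

// Hypothetical helper class, not part of the original post.
class BaggingCheck {

    // Closed-form expected number of distinct values after
    // n draws with replacement from n items.
    static double expectedDistinct(int n) {
        return n * (1.0 - Math.pow(1.0 - 1.0 / n, n));
    }

    // One bagging round: n draws with replacement from [0, n);
    // returns how many distinct indices were drawn.
    static int distinctCount(int n, Random rng) {
        BitSet seen = new BitSet(n);
        for (int i = 0; i < n; i++) {
            seen.set(rng.nextInt(n));
        }
        return seen.cardinality();
    }

    // Average distinct count over `rounds` independent rounds.
    // The seed is fixed only to make runs repeatable.
    static double simulatedMean(int n, int rounds, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (int r = 0; r < rounds; r++) {
            total += distinctCount(n, rng);
        }
        return (double) total / rounds;
    }

    public static void main(String[] args) {
        int n = 100;        // same sample size as in the experiment
        int rounds = 10000; // same number of iterations
        System.out.printf("expected  = %.3f%n", expectedDistinct(n));
        System.out.printf("simulated = %.3f%n", simulatedMean(n, rounds, 42L));
    }
}
```

If the simulation behaves as described, the two printed numbers should agree closely, and both should sit near the peak of the chart.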

 GOOD LUCK!

Keep it up, everyone! If you have questions, feel free to add me as a friend to discuss~
